Find Lines in One File Not Found in Another File

Objective: Find lines that are present in one file but not in another.

The diff utility, or some combination of grep, awk, sort, and sed, might come to mind for this task, but two purpose-built tools do the job more directly: comm and combine.

Let’s assume we have two files, a.txt (first listing) and b.txt (second listing), with the following content:

a
b
d
c
1

d
f
h
g
a
b
c
e
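To follow along, the two sample files can be recreated with printf. This is only a convenience sketch: it writes a.txt and b.txt into the current directory and overwrites any existing files with those names.

```shell
# Recreate the sample files used in this article.
# Warning: this overwrites a.txt and b.txt in the current directory.
printf '%s\n' a b d c 1 > a.txt
printf '%s\n' d f h g a b c e > b.txt
```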

Let’s say we want to find the lines that are found only in b.txt, which are the lines containing the letters e, f, g, and h.

The first tool that we can use is the comm utility, which is part of the GNU coreutils package and is therefore installed on most Linux systems.

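To use comm, both files must first be sorted lexically. By default comm prints three columns: lines only in the first file, lines only in the second file, and lines common to both. The -1, -2, and -3 options suppress the corresponding column. So, to get the lines found only in b.txt, suppress columns 1 and 3: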

$ comm -13 <(sort a.txt) <(sort b.txt)
e
f
g
h

To print the lines found only in a.txt, use the -23 option:

$ comm -23 <(sort a.txt) <(sort b.txt)
1

To print all lines common to both files, use the -12 option:

$ comm -12 <(sort a.txt) <(sort b.txt)
a
b
c
d
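The commands above use process substitution, which needs a shell such as bash. A minimal sketch of the same idea with ordinary temporary files follows. The names a.sorted and b.sorted are made up for this example, and LC_ALL=C is an assumption added here so that sort and comm agree on the collation order regardless of the system locale.

```shell
# Recreate the sample files so this sketch runs on its own.
printf '%s\n' a b d c 1 > a.txt
printf '%s\n' d f h g a b c e > b.txt

# Sort each file once, using the same locale that comm will use.
# a.sorted and b.sorted are hypothetical file names for this sketch.
LC_ALL=C sort a.txt > a.sorted
LC_ALL=C sort b.txt > b.sorted

# With no options, comm prints all three columns, separated by tabs:
# lines only in a.sorted, lines only in b.sorted, lines in both.
LC_ALL=C comm a.sorted b.sorted

# Suppress columns 1 and 3 to keep only the lines unique to b.txt.
LC_ALL=C comm -13 a.sorted b.sorted
```

In the three-column output, lines unique to the first file are not indented, lines unique to the second file are indented by one tab, and common lines by two tabs. Sorting into files is also convenient when the same inputs are compared several times.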

The second method is to use combine from the moreutils package, a utility that supports the not, and, or, and xor operations.

To get the lines in b.txt that are not found in a.txt:

$ combine <(sort b.txt) not <(sort a.txt)
e
f
g
h
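The other operations follow the same "file1 OPERATION file2" pattern. The sketch below tries and and xor on the sample files. It assumes moreutils may not be installed, so it checks for combine first. As far as the combine man page describes, the inputs do not need to be sorted, so the files are passed directly here and the result is piped through sort only to make the display order predictable.

```shell
# Recreate the sample files so this sketch runs on its own.
printf '%s\n' a b d c 1 > a.txt
printf '%s\n' d f h g a b c e > b.txt

if command -v combine >/dev/null 2>&1; then
    # Lines of a.txt that also appear in b.txt.
    combine a.txt and b.txt | sort

    # Lines that appear in exactly one of the two files.
    combine a.txt xor b.txt | sort
else
    # combine ships with moreutils, which is often not installed by default.
    echo "combine not found; install the moreutils package" >&2
fi
```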

For more information on comm and combine, refer to their man pages.

Mohamed Ibrahim

ibrahim = { interested_in(unix, linux, android, open_source, reverse_engineering); coding(c, shell, php, python, java, javascript, nodejs, react); plays_on(xbox, ps4); linux_desktop_user(true); }