Objective: Find lines that are present in one file but not in another.
The diff utility, or a combination of grep, awk, sort and/or sed, might come to mind for this task, but there are better tools for it.
Let’s assume we have two files, a.txt and b.txt, with the following content:

a.txt:
a
b
d
c
1

b.txt:
d
f
h
g
a
b
c
e
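To follow along, the two sample files can be recreated with printf. This is a minimal sketch; the scratch directory created by mktemp is an assumption of this example and not part of the original setup.

```shell
# Work in a throwaway directory so existing files are not overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# printf reuses the '%s\n' format for every argument,
# so each value ends up on its own line.
printf '%s\n' a b d c 1 > a.txt
printf '%s\n' d f h g a b c e > b.txt

# Quick check of the result.
cat a.txt
cat b.txt
```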
Let’s say we want to find lines that are only found in b.txt, which are the lines containing the letters e, f, g and h.
The first tool that we can use is the comm utility, which is normally part of the Linux coreutils package.
To use comm, the files need to be lexically sorted first. So, to get the lines found only in b.txt:
$ comm -13 <(sort a.txt) <(sort b.txt)
e
f
g
h
To print lines found only in a.txt, use the -23 argument:
$ comm -23 <(sort a.txt) <(sort b.txt)
1
To print all common lines between the two files, use the -12 argument:
$ comm -12 <(sort a.txt) <(sort b.txt)
a
b
c
d
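The <( ... ) process substitution used above is available in bash, zsh and ksh, but not in plain POSIX sh. As a hedged alternative, the sketch below sorts into temporary files first, which should work in any POSIX shell. The scratch directory, the *.sorted file names and the variable names are assumptions made for this example. LC_ALL=C is set so that sort and comm agree on the same collation order.

```shell
# Use one fixed collation so sort and comm agree on the ordering.
export LC_ALL=C

# Scratch directory and sample files (names are illustrative).
workdir=$(mktemp -d)
printf '%s\n' a b d c 1 > "$workdir/a.txt"
printf '%s\n' d f h g a b c e > "$workdir/b.txt"

# comm expects sorted input, so sort each file into a temporary copy.
sort "$workdir/a.txt" > "$workdir/a.sorted"
sort "$workdir/b.txt" > "$workdir/b.sorted"

# comm prints three columns: 1 = only in file 1, 2 = only in file 2,
# 3 = in both. Each digit after the dash suppresses that column.
# paste -s joins the result onto one line for compact display.
only_in_b=$(comm -13 "$workdir/a.sorted" "$workdir/b.sorted" | paste -s -d ' ' -)
only_in_a=$(comm -23 "$workdir/a.sorted" "$workdir/b.sorted" | paste -s -d ' ' -)
in_both=$(comm -12 "$workdir/a.sorted" "$workdir/b.sorted" | paste -s -d ' ' -)

echo "only in b.txt: $only_in_b"
echo "only in a.txt: $only_in_a"
echo "in both files: $in_both"

# Remove the scratch directory again.
rm -r "$workdir"
```

With the sample files, this should report e f g h as only in b.txt, 1 as only in a.txt, and a b c d as common to both.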
The second method is to use combine from the moreutils package, a utility that supports the not, and, or and xor operations.
To get the lines in b.txt that are not found in a.txt:
$ combine <(sort b.txt) not <(sort a.txt)
e
f
g
h
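The other operators follow the same pattern. The sketch below tries not, and and xor on the sample files. It assumes moreutils is installed and skips the demo otherwise; the scratch directory and variable names are made up for this example. As far as the combine man page describes, the input does not have to be sorted, so here the output is piped through sort only to get a predictable order.

```shell
# Fixed collation so the sorted output is predictable.
export LC_ALL=C

# Assumption: combine comes from moreutils; skip the demo if it is missing.
if command -v combine >/dev/null 2>&1; then
    workdir=$(mktemp -d)
    printf '%s\n' a b d c 1 > "$workdir/a.txt"
    printf '%s\n' d f h g a b c e > "$workdir/b.txt"

    # not: lines of the first file that are absent from the second.
    only_in_b=$(combine "$workdir/b.txt" not "$workdir/a.txt" | sort | paste -s -d ' ' -)
    # and: lines of the first file that also appear in the second.
    in_both=$(combine "$workdir/a.txt" and "$workdir/b.txt" | sort | paste -s -d ' ' -)
    # xor: lines that appear in exactly one of the two files.
    in_one_only=$(combine "$workdir/a.txt" xor "$workdir/b.txt" | sort | paste -s -d ' ' -)

    echo "only in b.txt:       $only_in_b"
    echo "in both files:       $in_both"
    echo "in exactly one file: $in_one_only"

    rm -r "$workdir"
else
    echo "combine not found; install the moreutils package first" >&2
fi
```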
For more information on the comm and combine tools, refer to their man pages.