The comm command can be used to compare two files line by line. It’s particularly useful when writing shell scripts. Take for example the following two files:

$ cat file1
a
b
c
$ cat file2
b
c
d
e

You can quickly see which lines are common to the two files and which are present in only one:

$ comm file1 file2
a
		b
		c
	d
	e

The first column lists the lines present only in the first file, the second column those present only in the second file, and the third shows the lines that are identical in both files.
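To make the column layout easier to see, GNU comm's --output-delimiter option can replace the TAB separator with a visible string. The sketch below assumes GNU coreutils (the option is a GNU extension) and recreates the two sample files in a temporary directory. The '|' delimiter is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch: show comm's columns with a visible separator instead of TABs.
# Assumes GNU comm (--output-delimiter is a GNU extension).
set -euo pipefail

# Recreate the article's sample files in a throwaway directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' a b c   > "$tmp/file1"
printf '%s\n' b c d e > "$tmp/file2"

# Prints a, ||b, ||c, |d and |e, one per line: the number of '|'
# characters before a line tells you which column it belongs to.
comm --output-delimiter='|' "$tmp/file1" "$tmp/file2"
```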

Keeping Things In Order

Before we delve further, it’s important to note that one of comm’s restrictions is that the input files must be sorted. This is easily rectified by passing each file through sort first (no extra options are needed, as long as sort and comm run in the same locale). For example:

$ # First, randomise file2 and save as file3:
$ shuf file2 | tee file3
d
e
b
c
$ comm file1 file3
a
b
c
	d
	e
comm: file 2 is not in sorted order
	b
	c
$ # Now sort file3 before using it:
$ comm file1 <(sort file3)
a
		b
		c
	d
	e

When the input files are not sorted, the output of comm cannot be relied on: GNU comm reports the problem on standard error, as seen above, and exits with an error status.
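The <( ) process substitution used above is a bash feature that plain POSIX sh lacks. One alternative, sketched below, sorts both inputs into temporary files before calling comm. The helper name sorted_comm is hypothetical, mktemp is assumed to be available (it is on most Linux systems), and pinning LC_ALL=C is a choice made here so that sort and comm are guaranteed to agree on the collation order.

```shell
#!/bin/sh
# Sketch: compare two unsorted files without bash's <( ) syntax.
# sorted_comm is a hypothetical helper: sorted_comm FILE1 FILE2 [COMM-OPTIONS...]
# Assumes mktemp is available. Variables are global (plain sh has no "local").
set -eu

sorted_comm() {
  sc_a=$1 sc_b=$2
  shift 2
  sc_t1=$(mktemp)
  sc_t2=$(mktemp)
  # Pin the collation so sort and comm agree on what "sorted" means.
  LC_ALL=C sort "$sc_a" > "$sc_t1"
  LC_ALL=C sort "$sc_b" > "$sc_t2"
  sc_status=0
  # Any remaining arguments (e.g. -23) are passed straight to comm.
  LC_ALL=C comm "$@" "$sc_t1" "$sc_t2" || sc_status=$?
  rm -f "$sc_t1" "$sc_t2"
  return "$sc_status"
}

# Usage, with the shuffled file3 from above: sorted_comm file1 file3 -23
```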

Using comm In Scripts

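comm separates its columns with TAB characters: in the default output, lines in the second column are preceded by one TAB and lines in the third column by two. Scripts can therefore parse comm's output fairly easily to extract the information they need. Often, though, you only need one of the columns, and nobody wants to reach for cut or even awk without good cause. Thankfully, comm can be told to omit columns from its output entirely: -1, -2 and -3 suppress the first, second and third columns respectively, and they can be combined.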
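As a rough illustration of parsing the default output, the bash function below tags each line with the column it came from by looking at its leading TABs. classify_comm is a hypothetical name, and the approach assumes the input lines do not themselves begin with a TAB, since such lines would be misclassified.

```shell
#!/usr/bin/env bash
# Sketch: tag each line of comm's output with the column it came from.
# classify_comm is a hypothetical helper; it assumes comm's default TAB
# separator and that the input lines themselves do not start with a TAB.
set -euo pipefail

classify_comm() {
  local tab=$'\t'
  # IFS= and -r preserve the leading TABs and any backslashes.
  comm "$1" "$2" | while IFS= read -r line; do
    case $line in
      "$tab$tab"*) printf 'both: %s\n'  "${line#"$tab$tab"}" ;;  # column 3
      "$tab"*)     printf 'file2: %s\n' "${line#"$tab"}" ;;      # column 2
      *)           printf 'file1: %s\n' "$line" ;;               # column 1
    esac
  done
}

# Usage with the article's sample files, recreated in a temporary directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' a b c   > "$tmp/file1"
printf '%s\n' b c d e > "$tmp/file2"
classify_comm "$tmp/file1" "$tmp/file2"
```

The two-TAB pattern has to be tested before the one-TAB pattern, because a column 3 line also matches the shorter prefix.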

To display only the lines unique to file1, use -23 to exclude the second and third columns:

$ comm -23 file1 file2
a

To display only the lines unique to file2, use -13 to exclude the first and third columns:

$ comm -13 file1 file2
d
e

And finally, to display only the lines common to both files, use -12 to exclude the first and second columns, leaving only the third:

$ comm -12 file1 file2
b
c
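These three combinations fit together naturally in a script. The bash sketch below reports how a list changed between two snapshots, for example two exported lists of user names. report_changes and the example file names are hypothetical, and each input is sorted on the fly with process substitution, as in the sorting example earlier.

```shell
#!/usr/bin/env bash
# Sketch: report what was added, removed and kept between two snapshots of
# a newline-separated list. report_changes is a hypothetical helper name.
set -euo pipefail

report_changes() {
  local old=$1 new=$2
  echo "Added:"
  # -13 hides columns 1 and 3: lines only in the new snapshot.
  comm -13 <(sort "$old") <(sort "$new") | sed 's/^/  /'
  echo "Removed:"
  # -23 hides columns 2 and 3: lines only in the old snapshot.
  comm -23 <(sort "$old") <(sort "$new") | sed 's/^/  /'
  # -12 hides columns 1 and 2: lines in both; wc -l counts them
  # (tr strips the padding some wc implementations add).
  echo "Unchanged: $(comm -12 <(sort "$old") <(sort "$new") | wc -l | tr -d ' ')"
}

# Usage: report_changes users-last-week.txt users-today.txt
```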

comm is part of GNU coreutils and should be available out of the box on most Linux systems. It offers more options, such as --total, which adds a summary of the number of lines in each column, and --zero-terminated (-z), which makes comm treat NUL rather than newline as the line terminator. The latter is useful when dealing with file names that can contain spaces or even newlines, together with find -print0 and xargs -0, for example. Be sure to check the comm(1) man page as well as the online documentation to get the full picture.
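As an example of -z, the sketch below lists files that exist under one directory but not under another, passing NUL-terminated names from find through sort and comm. It assumes GNU find (for -printf '%P\0', which prints each path relative to its starting directory), GNU sort -z and GNU comm -z; missing_files is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print, NUL-terminated, the relative paths of regular files that
# exist under directory $1 but not under directory $2.
# Assumes GNU find, sort and comm; missing_files is a hypothetical name.
set -euo pipefail

missing_files() {
  local a=$1 b=$2
  # %P strips the starting directory, so both trees yield comparable paths;
  # \0 terminates each path with a NUL byte instead of a newline.
  # -z makes sort and comm treat NUL, not newline, as the line terminator;
  # -23 keeps only column 1 (paths found only under $a).
  comm -z -23 \
    <(find "$a" -type f -printf '%P\0' | sort -z) \
    <(find "$b" -type f -printf '%P\0' | sort -z)
}

# Example: show the result one path per line (for display only).
# missing_files ./src ./backup | tr '\0' '\n'
```

Because the output stays NUL-terminated, it can be handed on safely with something like missing_files src backup | xargs -0 -r printf 'missing: %s\n' (the -r flag is a GNU xargs option that skips running the command when there is no input).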

Photo by Paul Gilmore on Unsplash
