The comm command can be used to compare two files line by line. It’s particularly useful when writing shell scripts. Take for example the following two files:
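As a minimal sketch, assume two small hypothetical files named file1 and file2, already in sorted order. The fruit names are invented purely for illustration:

```shell
# Work in a scratch directory so nothing existing is overwritten
cd "$(mktemp -d)"

# Two hypothetical, already-sorted sample files
printf '%s\n' apple banana cherry fig > file1
printf '%s\n' banana date fig grape > file2

# Show their contents
cat file1
cat file2
```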

You can quickly see which lines are common to the two files and which are present in only one:
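As an illustration with two made-up files (the names file1 and file2 and their contents are assumptions, not the author's data), a run could look like this:

```shell
# Scratch directory with two hypothetical sorted files
cd "$(mktemp -d)"
printf '%s\n' apple banana cherry fig > file1
printf '%s\n' banana date fig grape > file2

# Compare the files; keep the result in a variable and print it
result=$(comm file1 file2)
printf '%s\n' "$result"
# Output (columns two and three are indented with TAB characters):
# apple
# 		banana
# cherry
# 	date
# 		fig
# 	grape
```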

The first column lists the lines present only in the first file, the second column those present only in the second file, and the third shows the lines that are identical in both files.

Keeping Things In Order

Before we delve further, it’s important to note that one of comm’s restrictions is that the input files must be sorted. That is easily rectified using sort (without any extra options), as long as sort and comm run with the same locale settings. For example:
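A minimal sketch of sorting first, assuming two hypothetical unsorted files called list1.txt and list2.txt (names and contents are invented for illustration):

```shell
# Scratch directory with two hypothetical unsorted files
cd "$(mktemp -d)"
printf '%s\n' fig apple cherry banana > list1.txt
printf '%s\n' grape banana fig date > list2.txt

# Sort each file with plain sort, then hand the sorted copies to comm
sort list1.txt > list1.sorted
sort list2.txt > list2.sorted
result=$(comm list1.sorted list2.sorted)
printf '%s\n' "$result"
```

In bash, process substitution such as comm <(sort list1.txt) <(sort list2.txt) avoids the intermediate files, at the cost of no longer being portable to plain sh.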

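When the input files are not sorted, the output of comm is not defined. GNU comm usually notices the problem, prints a warning to standard error and exits with a non-zero status, but it may still write some output first, so check the exit status before trusting the result.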
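The following sketch shows how a script might detect this. It assumes GNU comm (other implementations may not diagnose unsorted input at all), and the file names unsorted1 and file2 are hypothetical:

```shell
# Scratch directory; unsorted1 is deliberately out of order
cd "$(mktemp -d)"
printf '%s\n' banana apple > unsorted1
printf '%s\n' apple banana > file2

# Capture the exit status without aborting the script
status=0
comm unsorted1 file2 > out.txt 2> err.txt || status=$?

echo "exit status: $status"
cat err.txt   # GNU comm's complaint about the sort order
```

Passing --check-order makes GNU comm treat any unsorted input as a fatal error, which can be a safer default in scripts.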

Using comm In Scripts

The columns output by comm are delimited by single TAB characters (a line in the second column is preceded by one TAB, a line in the third column by two), so scripts can reasonably easily parse comm’s output to glean the information they need. Sometimes you only need what’s in one of the columns, though, and nobody wants to reach for cut or even awk without good cause. Thankfully comm can be told to omit columns from its output entirely.

To display only the lines unique to file1, use -23 to exclude the second and third columns:
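A short sketch, using two hypothetical sorted files whose contents are invented for illustration:

```shell
# Scratch directory with two hypothetical sorted files
cd "$(mktemp -d)"
printf '%s\n' apple banana cherry fig > file1
printf '%s\n' banana date fig grape > file2

# -23 suppresses columns 2 and 3, leaving lines found only in file1
only_in_file1=$(comm -23 file1 file2)
printf '%s\n' "$only_in_file1"
# Output:
# apple
# cherry
```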

To display only the lines unique to file2, use -13 to exclude the first and third columns:
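A short sketch, again with two hypothetical sorted files invented for illustration:

```shell
# Scratch directory with two hypothetical sorted files
cd "$(mktemp -d)"
printf '%s\n' apple banana cherry fig > file1
printf '%s\n' banana date fig grape > file2

# -13 suppresses columns 1 and 3, leaving lines found only in file2
only_in_file2=$(comm -13 file1 file2)
printf '%s\n' "$only_in_file2"
# Output:
# date
# grape
```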

And finally, to display only the lines common to both files, use -12 to exclude the first two columns and keep only the third:
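A short sketch with two hypothetical sorted files (names and contents are assumptions made for illustration):

```shell
# Scratch directory with two hypothetical sorted files
cd "$(mktemp -d)"
printf '%s\n' apple banana cherry fig > file1
printf '%s\n' banana date fig grape > file2

# -12 suppresses columns 1 and 2, leaving lines present in both files
in_both=$(comm -12 file1 file2)
printf '%s\n' "$in_both"
# Output:
# banana
# fig
```

Because the remaining column is not indented, the result can be fed straight into a loop or another command without stripping TAB characters.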

comm is part of GNU coreutils and should be available out of the box on most Linux systems. More options are available, such as --total, which prints a summary line with the number of lines in each column, or --zero-terminated (-z), which is useful when dealing with file names that can contain spaces or even newlines (together with find -print0 and xargs -0, for example). Be sure to check the comm(1) man page as well as the online documentation to get the full picture.
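The sketch below tries both options. It assumes a reasonably recent GNU coreutils (older releases lack --total and -z), and every file and directory name in it is hypothetical:

```shell
# Scratch directory with two hypothetical sorted files
cd "$(mktemp -d)"
printf '%s\n' apple banana cherry fig > file1
printf '%s\n' banana date fig grape > file2

# Suppress all three columns and print only the summary line:
# three TAB-separated counts followed by the word "total"
summary=$(comm --total -123 file1 file2)
printf '%s\n' "$summary"

# NUL-terminated file name lists from two hypothetical directories
mkdir dir1 dir2
touch "dir1/a file.txt" dir1/shared.txt "dir2/b file.txt" dir2/shared.txt
(cd dir1 && find . -type f -print0 | sort -z) > names1
(cd dir2 && find . -type f -print0 | sort -z) > names2

# Names present only in dir1; tr turns the NULs into newlines for display
only_in_dir1=$(comm -z -23 names1 names2 | tr '\0' '\n')
printf '%s\n' "$only_in_dir1"
```

In a real script the NUL-separated output would more likely be piped into xargs -0 than converted with tr, which is used here only to make the result readable.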

