The join command, as the name suggests, outputs the contents of two files joined together based on a common field. For example, consider the following files of names:

I would like to see the full names for each individual. It’s as simple as join names surnames:

That was easy! Note the numbers at the start of each line – this is the field join uses to decide which lines from the two files belong together. Unless told otherwise, join uses the first field of each file. If I append another line to names, for example 7 Percy, and join the files again, there is no line numbered 7 in the second file to pair it with:
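A minimal sketch of the same behaviour, using small hypothetical files (the names, surnames and keys below are invented for illustration). join expects both inputs to be sorted on the join field, which these single-digit keys already are. The third name has no partner in surnames, so it is dropped, just as Percy is.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

# Hypothetical sample data: field 1 is the shared key, field 2 the name.
printf '1 Pike\n2 Grog\n3 Percy\n' > names
printf '1 Smith\n2 Jones\n' > surnames

# Lines are paired on field 1 of each file; "3 Percy" has no
# partner in surnames, so join leaves it out.
join names surnames
# Prints:
# 1 Pike Smith
# 2 Grog Jones
```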

No Percy there. However, if I add the -a option with a value of 1, we get the following:

The -a switch tells join what to do with unpairable lines; the value 1 here refers to the first file argument, names. In our example, Percy has no matching line in surnames, so join prints his line from names on its own. To display the unmatched lines from surnames instead, I’d use -a2. To see all lines from both files, matched or not, I could use -a1 -a2.
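A sketch of the three -a variants against hypothetical files in which each file has one key the other lacks (3 appears only in names, 4 only in surnames). The data is made up for illustration.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

# Hypothetical sample data: key 3 exists only in names, key 4 only in surnames.
printf '1 Pike\n2 Grog\n3 Percy\n' > names
printf '1 Smith\n2 Jones\n4 Brown\n' > surnames

printf '%s\n' '-a 1: also print unpaired lines from names'
join -a 1 names surnames        # ends with "3 Percy"

printf '%s\n' '-a 2: also print unpaired lines from surnames'
join -a 2 names surnames        # ends with "4 Brown"

printf '%s\n' '-a 1 -a 2: unpaired lines from both files'
join -a 1 -a 2 names surnames   # ends with "3 Percy" then "4 Brown"
```

In every case the paired lines (1 and 2) come out as before; -a only adds the leftovers, in key order.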

Bringing it all together

Let’s put all of this together and write the full names to a single file, but without those line numbers. While we’re at it, we can also add a surname for Percy. The -o switch selects the output fields, each written as file.field: 1.1 is the first field of the first file (in this case, the numbers), 1.2 is the second field of the first file (Pike, Grog), and so on. The fields are given as a comma-separated list.

An additional -e switch, which only takes effect together with -o, specifies the text to print in place of an output field that has no value, for example the surname field for a line with no match in the other file.
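A minimal sketch combining -a, -o and -e on hypothetical files: -o 1.2,2.2 keeps only the first name and the surname, and -e fills the gap where a surname is missing. The placeholder string MISSING is an arbitrary choice for illustration.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

# Hypothetical sample data: key 3 has no surname yet.
printf '1 Pike\n2 Grog\n3 Percy\n' > names
printf '1 Smith\n2 Jones\n' > surnames

# -a 1 keeps the unpaired "3 Percy" line, -o 1.2,2.2 prints the first
# name then the surname (dropping the key), and -e MISSING stands in
# for the surname field that does not exist for key 3.
join -a 1 -e MISSING -o 1.2,2.2 names surnames
# Prints:
# Pike Smith
# Grog Jones
# Percy MISSING
```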

Let’s give Percy his surname and redirect the output to a file.

Other formats

What if the data is not space-separated? By default join splits fields on blanks; the -t switch sets a different field separator. For example, we can specify a comma for use with simple CSV files:

When joined, they become:

Those final arguments, -1 2 -2 1, are not output control at all; they tell join which fields to compare when pairing lines: field two of file one (-1 2) and field one of file two (-2 1). The -2 1 isn’t strictly necessary, as join assumes the first field unless otherwise instructed, but being explicit is never a bad thing.
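A sketch of -t together with -1 and -2, using two hypothetical comma-separated files in which the shared key sits in a different column of each. Because join simply splits on every comma, this only suits plain CSV without quoted fields that themselves contain commas. In the output, join prints the join field first, followed by the remaining fields of file one and then those of file two.

```shell
#!/bin/sh
set -eu
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir"

# Hypothetical data: in people.csv the key is field 2,
# in cities.csv it is field 1. Both are sorted on their key.
printf 'Pike,1\nGrog,2\nPercy,3\n' > people.csv
printf '1,Denver\n2,Oslo\n3,Lima\n' > cities.csv

# -t , splits on commas; -1 2 and -2 1 name the join field in each file.
join -t , -1 2 -2 1 people.csv cities.csv
# Prints:
# 1,Pike,Denver
# 2,Grog,Oslo
# 3,Percy,Lima
```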


