The join command, as the name suggests, outputs the contents of two files joined together based on a common field. For example, consider the following files of names:

$ cat names
1 Pike
2 Grog
3 Scanlan
4 Shaun
5 Tiberius
6 Taryon

$ cat surnames
1 Trickfoot
2 Strongjaw
3 Shorthalt
4 Gilmore
5 Stormwind
6 Darrington
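
To follow along, you can recreate the two sample files with printf, which repeats its format string for each remaining argument. This is a minimal sketch. It writes names and surnames into the current directory, so run it somewhere you don't mind new files appearing:

```shell
# One record per line: a numeric key, a space, then the value
printf '%s\n' '1 Pike' '2 Grog' '3 Scanlan' '4 Shaun' '5 Tiberius' '6 Taryon' > names
printf '%s\n' '1 Trickfoot' '2 Strongjaw' '3 Shorthalt' '4 Gilmore' '5 Stormwind' '6 Darrington' > surnames
```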

I would like to see the full names for each individual. It’s as simple as join names surnames:

$ join names surnames
1 Pike Trickfoot
2 Grog Strongjaw
3 Scanlan Shorthalt
4 Shaun Gilmore
5 Tiberius Stormwind
6 Taryon Darrington

That was easy! Note the numbers at the start of each line: join uses them as the join field to decide which lines from the two files belong together. Unless told otherwise, join uses the first field of each file. If I append another line to names, say 7 Percy, and join the files again, surnames has no line 7 to pair it with, so Percy is left out:

$ join names surnames
[...]
6 Taryon Darrington

No Percy there. However, if I add the -a option with a value of 1, we get the following:

$ join names surnames -a1
[...]
6 Taryon Darrington
7 Percy

The -a switch tells join to also print unpairable lines, meaning lines with no match in the other file. Its value is a file number, and 1 here means “the first file argument”. In our example, surnames has no line 7, so the line from names is printed on its own. To display the unpairable lines from surnames instead, I’d use -a2. To see all lines from both files, matched or not, I could use -a1 -a2.
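
To see -a1 and -a2 working together, here is a sketch on cut-down copies of the files. It also uses the related -v option, which prints only the unpairable lines from the given file and suppresses the joined ones. The sketch runs in a fresh temporary directory. The record 8 Nobody is hypothetical. It exists only to give surnames a line with no partner in names:

```shell
# Work in a throwaway directory so any existing names/surnames are untouched
cd "$(mktemp -d)" || exit 1
printf '%s\n' '1 Pike' '2 Grog' '7 Percy' > names
# '8 Nobody' is a hypothetical record with no match in names
printf '%s\n' '1 Trickfoot' '2 Strongjaw' '8 Nobody' > surnames

# Everything from both files: the two matches, then 7 Percy, then 8 Nobody
join -a1 -a2 names surnames

# Only the unpairable lines from each file
join -v1 names surnames   # 7 Percy
join -v2 names surnames   # 8 Nobody
```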

Bringing it all together

Let’s put all of this together and produce a single file of full names, this time without the line numbers. While we’re at it, we can also add a surname for Percy. The -o switch selects the output fields, each written in the form file.field. For example, 1.1 is the first field of the first file (in this case, the numbers), 1.2 is the second field of the first file (Pike, Grog), and 2.2 is the second field of the second file (Trickfoot, Strongjaw). The fields are given as a comma-separated list.

$ join names surnames -o 1.2,2.2 -a1
Pike Trickfoot
Grog Strongjaw
Scanlan Shorthalt
Shaun Gilmore
Tiberius Stormwind
Taryon Darrington
Percy
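
Because -o takes an explicit list, it can also reorder fields or pick out the join field itself, which the list refers to as 0. Here is a small sketch on the first three records, again in a temporary directory:

```shell
cd "$(mktemp -d)" || exit 1
printf '%s\n' '1 Pike' '2 Grog' '3 Scanlan' > names
printf '%s\n' '1 Trickfoot' '2 Strongjaw' '3 Shorthalt' > surnames

# Surname first: 2.2 (second field of surnames), then 1.2 (second field of names)
join -o 2.2,1.2 names surnames   # first line: Trickfoot Pike

# 0 is the join field, so this brings the numbers back
join -o 0,1.2 names surnames     # first line: 1 Pike
```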

An additional -e switch, which requires -o to be given, specifies the text to print in place of any output field that has no match:

$ join names surnames -o 1.2,2.2 -a1 -e "has no surname"
Pike Trickfoot
Grog Strongjaw
Scanlan Shorthalt
Shaun Gilmore
Tiberius Stormwind
Taryon Darrington
Percy has no surname

Let’s give Percy his surname and redirect the output to a file.

$ join names surnames -o 1.2,2.2 -a1 -e "DeRolo" > fullnames
$ cat fullnames
Pike Trickfoot
Grog Strongjaw
Scanlan Shorthalt
Shaun Gilmore
Tiberius Stormwind
Taryon Darrington
Percy DeRolo

Other formats

What if the data is not space-separated? The -t switch sets the field separator. For example, we can use a comma for simple CSV files. join does not understand quoted fields, so this only works when no field itself contains a comma:

$ cat species
Grog Strongjaw,Goliath
Percy DeRolo,Human
Pike Trickfoot,Gnome
Scanlan Shorthalt,Gnome
Shaun Gilmore,Human
Taryon Darrington,Human
Tiberius Stormwind,Dragonborn

$ cat players
Travis,Grog Strongjaw
Taliesin,Percy DeRolo
Ashley,Pike Trickfoot
Sam,Scanlan Shorthalt
Matt,Shaun Gilmore
Sam,Taryon Darrington
Orion,Tiberius Stormwind

When joined, they become:

$ join -t, players species -1 2 -2 1
Grog Strongjaw,Travis,Goliath
Percy DeRolo,Taliesin,Human
Pike Trickfoot,Ashley,Gnome
Scanlan Shorthalt,Sam,Gnome
Shaun Gilmore,Matt,Human
Taryon Darrington,Sam,Human
Tiberius Stormwind,Orion,Dragonborn

The final arguments, -1 2 -2 1, control which fields join compares when matching lines. They tell it to use field two of file one (-1 2) and field one of file two (-2 1). Strictly speaking, -2 1 isn’t necessary, because join uses the first field unless instructed otherwise, but being explicit is never a bad thing. In the output, the join field comes first, followed by the remaining fields of each file in turn. That is why the character names now lead each line.
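
All of the examples above rely on one requirement: join expects both inputs to be sorted on their join fields, in the same collating order that join uses. The players file is ordered by character name rather than by player, and that is the order join needs here. If your input isn't already in that order, sort it first. The sketch below uses a shuffled three-line subset of the files. It sets LC_ALL=C for both sort and join so that the two commands agree on the order. The *.sorted file names are arbitrary:

```shell
cd "$(mktemp -d)" || exit 1
# Deliberately out of order
printf '%s\n' 'Orion,Tiberius Stormwind' 'Travis,Grog Strongjaw' 'Taliesin,Percy DeRolo' > players
printf '%s\n' 'Tiberius Stormwind,Dragonborn' 'Percy DeRolo,Human' 'Grog Strongjaw,Goliath' > species

# Sort each file on the field it will be joined on
LC_ALL=C sort -t, -k2,2 players > players.sorted
LC_ALL=C sort -t, -k1,1 species > species.sorted

# Same join as above, now on the sorted copies
LC_ALL=C join -t, -1 2 -2 1 players.sorted species.sorted
```

This prints the Grog, Percy and Tiberius lines in that order, in the same format as the full example above.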

Credits

Photo by rawpixel on Unsplash