Let's say the data contains multiple fields/columns separated by a space, a comma or some other delimiter, and we want to compare two files while ignoring a specific column. Let's divide the work into two smaller problems. The first is to ignore the given field/column.
If we simply want to ignore the first column, we can use either of the following cut constructs. Note that --complement is a GNU cut option, so the second form is the more portable one.
cut -d',' -f 1 --complement datafile
cut -d',' -f 2- fileName.csv
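To make the effect concrete, here is a small illustration on made-up data. The file name sample.csv and its contents are hypothetical; only the cut invocation comes from the text above.

```shell
# Hypothetical sample data with three columns: id,name,city
printf '%s\n' 'id,name,city' '1,alice,paris' '2,bob,rome' > sample.csv

# Keep fields 2 onward, which drops the first column
cut -d',' -f 2- sample.csv
# name,city
# alice,paris
# bob,rome
```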
To ignore an arbitrary column, we can use a small awk script. This is much more general, because you can specify which column to ignore, be it the first, the third or the last.
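A minimal sketch of what such a script could look like is given here; treat it as an illustration rather than the canonical version. It assumes the variable name FieldToIgnore and the file name ignoreField.awk from the command line that follows, and it assumes the output should be joined with the same single-character delimiter that was passed with -F.

```shell
# Write a hypothetical ignoreField.awk into the current directory
cat > ignoreField.awk <<'EOF'
# Print every field except the one whose number is in FieldToIgnore.
BEGIN { OFS = FS }          # reuse the input delimiter for output
{
    line = ""
    sep = ""
    for (i = 1; i <= NF; i++) {
        if (i == FieldToIgnore) continue   # skip the ignored column
        line = line sep $i
        sep = OFS
    }
    print line
}
EOF
```

Building the line by hand, instead of blanking the field with $3 = "", avoids leaving a doubled delimiter where the removed column used to be.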
Saved as ignoreField.awk, the script can be used as follows:
awk -F',' -v FieldToIgnore=3 -f ignoreField.awk datafile
The next part is to diff the output after ignoring (read: removing) the column. That is where process substitution comes in handy. Here are two examples.
# ignore 1st column from two csv datafiles while comparing
diff -u <(cut -d, -f 2- datafile1) <(cut -d, -f 2- datafile2)
# ignore column 3 from two csv datafiles while comparing
diff -u <(awk -F',' -v FieldToIgnore=3 -f ignoreField.awk datafile1) <(awk -F',' -v FieldToIgnore=3 -f ignoreField.awk datafile2)
So instead of giving diff two real files, we give it two redirected streams. The same approach can be used to pre-process files in other ways (e.g. to ignore comments or empty lines, or to compare two unsorted files).
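As a rough sketch of those other kinds of pre-processing, the following strips comment lines and empty lines and sorts both inputs before comparing them. It assumes bash (or another shell that supports process substitution); the file names, the sample contents and the helper function clean are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample files: same records in a different order,
# with comment lines and a blank line mixed in
printf '%s\n' '# exported Monday' 'b,2' '' 'a,1' > sample1.csv
printf '%s\n' 'a,1' 'b,2' '# exported Tuesday' > sample2.csv

# Hypothetical helper: drop comment lines and empty lines, then sort
clean() {
    grep -v -e '^#' -e '^$' "$1" | sort
}

# diff exits with status 0 when the two streams are identical
if diff -u <(clean sample1.csv) <(clean sample2.csv); then
    echo "files match after pre-processing"
fi
```

Wrapping the pipeline in a function keeps the diff line readable and guarantees that both files go through exactly the same steps.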
See below for more information on Process Substitution.