Warning: preg_replace(): Compilation failed: escape sequence is invalid in character class at offset 4 in /home/customer/www/theunixtips.com/public_html/wp-content/plugins/resume-builder/includes/class.resume-builder-enqueues.php on line 59

Compare files ignoring a field or column using Process Substitution

Lets say the data contains multiple fields/columns separated by space or comma or some other delimiter. And we want to compare two files ignoring a specific column. Lets divide work in two small issues. First is to ignore the provided field/column.

If we simply want to ignore the first column, we can use one of the following cut constructs.

cut -d',' -f 1 --complement datafile
cut -d',' -f 2- fileName.csv

If we want to ignore a specific one we can use awk in following manner which is much more generalized because you can specify which column to ignore, be it first, third or last.

This can be used as

awk -F',' -v FieldToIgnore=3 -f ignoreField.awk datafile

Next part is to diff the output after ignoring (read removing) the column. That is where process substitution comes handy. Here are two examples.

# ignore 1st column from two csv datafiles while comparing
diff -u <(cut -d, -f 2- datafile1) <(cut -d, -f 2- datafile2)
# ignore column 3 from two csv datafiles while comparing
diff -u <(awk -F',' -v FieldToIgnore=3 -f ignoreField.awk datafile1) <(awk -F',' -v FieldToIgnore=3 -f ignoreField.awk datafile2)

So instead of giving it two real files, we give it two redirected streams. Same solution can be used to pre-process files differently (e.g. ignore any comments or empty lines or compare two unsorted files).

See below for more information on Process Substitution.
http://www.tldp.org/LDP/abs/html/process-sub.html
http://wiki.bash-hackers.org/syntax/expansion/proc_subst