Say we have a file or data set with many duplicate rows or entries. We want to find out how many times each one is repeated, and perhaps which one is repeated most often. Here is an elegant one-line script that does exactly that.
sort input.file | uniq -c | sort -n -r
Explanation:
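The first sort orders the records so that identical ones end up next to each other. This matters because uniq only compares adjacent lines. Then uniq -c collapses each run of identical records into a single line, prefixed with the number of times it occurred. Finally, sort -n -r sorts the output of uniq -c numerically in reverse order, listing the records from most repeated to least repeated.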
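To see why the first sort is needed, here is a small sketch comparing uniq -c with and without it. The sample data and the variable names (tmpfile, unsorted_counts, sorted_counts) are made up for illustration and are not part of the original example.

```shell
# Hypothetical sample data: "a" appears three times, but not all in a row.
tmpfile=$(mktemp)
printf 'a\nb\na\na\n' > "$tmpfile"

# Without sorting, uniq -c counts each run of adjacent identical lines
# separately, so "a" is reported twice (once per run).
unsorted_counts=$(uniq -c "$tmpfile")

# With sorting, identical lines become adjacent and each record gets one count.
sorted_counts=$(sort "$tmpfile" | uniq -c)

echo "Without sort:"
echo "$unsorted_counts"
echo "With sort:"
echo "$sorted_counts"

rm -f "$tmpfile"
```

The unsorted run should list "a" on two separate lines, while the sorted run should list it once with the full count.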
Let's see an example. Let's say our data file contains the following.
unixite@sandbox:~$ cat test.txt
one
one
one
two
two
one
five
one
one
two
five
two
one
one
one
two
two
one
five
one
one
two
three
four
two
five
unixite@sandbox:~$ sort test.txt | uniq -c
      4 five
      1 four
     12 one
      1 three
      8 two
unixite@sandbox:~$ sort test.txt | uniq -c | sort -n -r
     12 one
      8 two
      4 five
      1 three
      1 four
unixite@sandbox:~$
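If you only care about the single most repeated record, one possible extension is to pipe the result through head -n 1. This is a minimal sketch, not part of the original one-liner. The sample data and the variable names (datafile, top, top_record) are hypothetical, and the awk step assumes each record is a single word without spaces.

```shell
# Hypothetical sample data standing in for a real input file.
datafile=$(mktemp)
printf 'one\ntwo\none\nfive\none\ntwo\n' > "$datafile"

# Keep only the first line of the sorted counts: the most frequent record.
top=$(sort "$datafile" | uniq -c | sort -n -r | head -n 1)
echo "$top"

# Drop the count and keep just the record itself
# (assumes the record contains no whitespace).
top_record=$(echo "$top" | awk '{print $2}')
echo "$top_record"

rm -f "$datafile"
```

To see the top few records instead of just one, change the number passed to head -n.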