bash: grep for a pattern from a certain location in a file

The syntax for using grep to search for a pattern in a file is well known. But sometimes we need to grep for a pattern starting from a certain location in the file, that is, after a certain byte offset. For example, suppose we are searching a log file for a pattern that can appear many times. Each time we run grep, it returns every matching line from the top of the file to the bottom, and we then have to work out which lines are new since our last run. Using dd, we can slice the file at a byte offset and apply grep to that slice only. Let's see an example.

We run a simple command in a while loop in the background. It keeps appending the current date to a log file, which will be our input.

unixite@sandbox:~/ > while [ 1 ] ; do sleep 1; date >> mytest.log ; done &
[1] 27812
unixite@sandbox:~/ > grep Oct mytest.log
Mon Oct  3 12:24:09 EDT 2011
unixite@sandbox:~/ > grep Oct mytest.log
Mon Oct  3 12:24:09 EDT 2011
Mon Oct  3 12:24:10 EDT 2011
unixite@sandbox:~/ > grep Oct mytest.log
Mon Oct  3 12:24:09 EDT 2011
Mon Oct  3 12:24:10 EDT 2011
Mon Oct  3 12:24:11 EDT 2011
unixite@sandbox:~/ >

Now we want to see only what has been added since we last ran our search. We leave the background loop running so the input file keeps growing.

unixite@sandbox:~/ > ls -l ~/mytest.log | awk '{print $5}'
87
unixite@sandbox:~/ > dd if=~/mytest.log skip=87 bs=1 2>/dev/null | grep Oct
Mon Oct  3 12:24:12 EDT 2011
Mon Oct  3 12:24:13 EDT 2011
Mon Oct  3 12:24:14 EDT 2011
Mon Oct  3 12:24:15 EDT 2011
Mon Oct  3 12:24:16 EDT 2011
Mon Oct  3 12:24:17 EDT 2011
Mon Oct  3 12:24:18 EDT 2011
Mon Oct  3 12:24:19 EDT 2011
Mon Oct  3 12:24:20 EDT 2011
Mon Oct  3 12:24:21 EDT 2011
unixite@sandbox:~/ > ls -l ~/mytest.log | awk '{print $5}'
377
unixite@sandbox:~/ > dd if=~/mytest.log skip=377 bs=1 2>/dev/null | grep Oct
Mon Oct  3 12:24:22 EDT 2011
Mon Oct  3 12:24:23 EDT 2011
Mon Oct  3 12:24:24 EDT 2011
unixite@sandbox:~/ >
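The byte counts in the transcript add up: each line written by date here, such as "Mon Oct  3 12:24:09 EDT 2011", is 28 characters plus a newline, i.e. 29 bytes, so 3 lines make 87 bytes and 13 lines make 377. The sketch below checks that arithmetic on a scratch file (a temporary file standing in for mytest.log, not part of the original post) and, as an alternative to parsing ls -l output, reads the size with "wc -c < file", which prints only the byte count.

```bash
#!/bin/bash
# Scratch file standing in for mytest.log (a temporary name, not from the post).
f=$(mktemp)
printf '%s\n' 'Mon Oct  3 12:24:09 EDT 2011' \
              'Mon Oct  3 12:24:10 EDT 2011' \
              'Mon Oct  3 12:24:11 EDT 2011' > "$f"

# One line: 28 characters plus the trailing newline.
line_bytes=$(printf '%s\n' 'Mon Oct  3 12:24:09 EDT 2011' | wc -c | tr -d ' ')

# wc -c reading from stdin prints only the byte count, no file name;
# tr strips the padding some wc implementations add.
size=$(wc -c < "$f" | tr -d ' ')

echo "$line_bytes"   # 29
echo "$size"         # 87, i.e. 3 x 29, the same size ls -l reported
rm -f "$f"
```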

Explanation: Here we tell dd which file to read (if=) and how many blocks to skip from the start of that file (skip=). The default block size is 512 bytes, so we also pass bs=1 to set the block size to 1 byte; skip= then counts exact bytes, matching the file size reported by ls -l. dd therefore prints the contents of the file after the first skip bytes, and grep searches only that part. dd also writes a short transfer summary (records in/out) to stderr, which can be discarded with 2>/dev/null.
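As a minimal sketch, the dd step can be wrapped in a small function; the names grep_after and grep_after_tail are hypothetical. One thing to keep in mind is that bs=1 makes dd issue one read per byte, which gets slow on large slices. The sketch therefore also shows tail -c +N as an alternative: it starts output at byte N, counting from 1, so skipping OFFSET bytes means starting at OFFSET+1. Both should print the same lines.

```bash
#!/bin/bash
# Hypothetical helper: print lines matching PATTERN found after the first
# OFFSET bytes of FILE, using dd as in the example above.
grep_after() {
    local file=$1 offset=$2 pattern=$3
    # bs=1 makes skip= count single bytes; dd's transfer summary goes to
    # stderr, which is discarded here.
    dd if="$file" bs=1 skip="$offset" 2>/dev/null | grep -- "$pattern"
}

# Alternative with tail: -c +N starts output at byte N (counting from 1),
# so skipping OFFSET bytes means starting at byte OFFSET+1.
grep_after_tail() {
    local file=$1 offset=$2 pattern=$3
    tail -c +"$((offset + 1))" "$file" | grep -- "$pattern"
}

# Usage on a scratch log: the first 3 lines (87 bytes) were seen before.
log=$(mktemp)
for s in 09 10 11 12 13; do
    printf 'Mon Oct  3 12:24:%s EDT 2011\n' "$s" >> "$log"
done

out_dd=$(grep_after "$log" 87 Oct)
out_tail=$(grep_after_tail "$log" 87 Oct)
printf '%s\n' "$out_dd"    # the 12:24:12 and 12:24:13 lines
rm -f "$log"
```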

So in a script or application, you can record the file size after each run and use it as the skip value for the next dd | grep run.
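A minimal sketch of that idea, assuming the offset is kept in a state file next to the log; the function name grep_new and the ".offset" file naming are hypothetical. It reads the current size once, before reading the file, and passes count= to dd so only the bytes up to that size are searched. The saved offset then matches exactly what was scanned, so lines appended while grep runs are picked up on the next call instead of being skipped. If the log has shrunk (for example after rotation), it starts again from byte 0. A line that the writer is still in the middle of appending may be split across two runs.

```bash
#!/bin/bash
# Hypothetical helper: grep only the part of LOG written since the last call.
# The byte offset is kept in a state file next to the log (an assumption).
grep_new() {
    local log=$1 pattern=$2
    local state="$log.offset"
    local offset=0 size

    if [ -f "$state" ]; then
        offset=$(cat "$state")
    fi
    # Take the size once, before reading, so the saved offset matches
    # exactly the bytes searched in this run.
    size=$(wc -c < "$log" | tr -d ' ')
    # If the log shrank (e.g. it was rotated), start over from the beginning.
    if [ "$size" -lt "$offset" ]; then
        offset=0
    fi

    # Read only the bytes between the old offset and the current size.
    dd if="$log" bs=1 skip="$offset" count=$((size - offset)) 2>/dev/null \
        | grep -- "$pattern"

    echo "$size" > "$state"
}

# Usage on a scratch log.
log=$(mktemp)
printf 'Mon Oct  3 12:24:09 EDT 2011\n' >> "$log"
first=$(grep_new "$log" Oct)     # the 12:24:09 line
printf 'Mon Oct  3 12:24:10 EDT 2011\n' >> "$log"
second=$(grep_new "$log" Oct)    # only the 12:24:10 line
third=$(grep_new "$log" Oct)     # empty: nothing new since the last call
rm -f "$log" "$log.offset"
```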

PS: Don't forget to stop the background job (for example with kill %1), otherwise it will slowly eat up disk space. Also, I have edited some of the output to show the outcome clearly. In practice I could not run ls and then the dd | grep chain at exactly the right moments, so there were time lapses between them and the timestamps would have had gaps; I adjusted the output so that the timeline moves smoothly and the example makes sense.