Unix: Find All Text Below String in a File
Check out how author Mark Needham tackles the issue of parsing out text from the end of a file that occurs after a given string.
I recently wanted to parse some text out of a bunch of files so that I could do some sentiment analysis on it. Luckily, the text I want is at the end of the file and doesn’t have anything after it, but there is text before it that I want to get rid of.
The files look like this:
# text I don't care about
= Heading of the bit I care about

# text I care about
In other words, I want to find the line that contains the heading and then get all the text after that point.
I figured sed was the tool for the job, but my knowledge of the syntax was a bit rusty. Luckily this post served as a refresher.
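Effectively, what we want to do is delete everything from the first line of the file up to and including the heading line. The sed range address '1,/pattern/' selects that span and the 'd' command deletes it, so we can do this with the following command: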
$ cat /tmp/foo.txt
# text I don't care about
= Heading of the bit I care about

# text I care about
$ cat /tmp/foo.txt | sed '1,/Heading of the bit I care about/d'

# text I care about
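That still leaves an extra empty line after the heading, which is a bit annoying but easy enough to get rid of by passing sed a second command that strips empty lines. Note that this removes every empty or whitespace-only line in the remaining text, not just the first one: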
$ cat /tmp/foo.txt | sed -e '1,/Heading of the bit I care about/d' -e '/^\s*$/d'
# text I care about
The only difference here is that we're now passing the '-e' flag, which lets us specify multiple commands. If we just passed the two commands one after the other without '-e', sed would interpret the second one as the name of a file.
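The two sed commands can be wrapped up so they are easy to run over a bunch of files. What follows is only a minimal sketch, not the author's script: the function name text_after_heading and the temporary sample file are hypothetical, and it assumes the heading contains no '/' or regex metacharacters, since it is pasted straight into the sed pattern. It also swaps '\s', which is a GNU sed extension, for the POSIX class '[[:space:]]' so the same line should work on non-GNU seds.

```shell
#!/bin/sh
# Hypothetical helper (not from the article): print everything after the
# heading line, dropping empty and whitespace-only lines.
# Usage: text_after_heading 'heading pattern' file
text_after_heading() {
    heading=$1
    file=$2
    # 1,/re/d deletes from line 1 through the first later line matching re.
    # [[:space:]] is the POSIX equivalent of GNU sed's \s.
    sed -e "1,/$heading/d" -e '/^[[:space:]]*$/d' "$file"
}

# Recreate the sample file in a temporary location (assumed layout:
# one empty line between the heading and the text we care about).
sample=$(mktemp)
printf '%s\n' \
    "# text I don't care about" \
    '= Heading of the bit I care about' \
    '' \
    '# text I care about' > "$sample"

result=$(text_after_heading 'Heading of the bit I care about' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

One caveat with the '1,/pattern/' range: the end pattern is only checked from line 2 onwards, so if the heading happens to be the very first line of a file, sed keeps deleting until a later match or the end of the file. GNU sed accepts '0,/pattern/' for that case.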
Published at DZone with permission of Mark Needham, DZone MVB. See the original article here.