Unix: Find All Text Below String in a File

Check out how author Mark Needham tackles the issue of parsing out text from the end of a file that occurs after a given string.

I recently wanted to parse some text out of a bunch of files so that I could do some sentiment analysis on it. Luckily, the text I wanted was at the end of each file with nothing after it, but there was text before it that I needed to get rid of.

The files look like this:

# text I don't care about

= Heading of the bit I care about

# text I care about

In other words, I want to find the line that contains the heading and then get all the text after that point.

I figured sed was the tool for the job, but my knowledge of the syntax was a bit rusty. Luckily this post served as a refresher.

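Effectively, what we want to do is delete everything from the beginning of the file up to and including the heading line. The address range 1,/pattern/ selects line 1 through the first line that matches the pattern, and the d command deletes the selected lines. We can do this with the following command: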

$ cat /tmp/foo.txt 
# text I don't care about

= Heading of the bit I care about

# text I care about
$ cat /tmp/foo.txt | sed '1,/Heading of the bit I care about/d'

# text I care about
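
To make the range address easier to experiment with, here is a minimal self-contained sketch that recreates the sample file and runs the same sed command on it. The temporary file and the variable names (sample, below) are assumptions made for illustration; the only change from the command above is that sed reads the file directly instead of via cat.

```shell
#!/bin/sh
# Recreate the sample file from the article in a temporary location.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# text I don't care about

= Heading of the bit I care about

# text I care about
EOF

# Delete line 1 through the first line matching the heading (inclusive).
# sed takes the file name as an argument, so no cat is needed.
below=$(sed '1,/Heading of the bit I care about/d' "$sample")

# Prints the blank line that followed the heading, then the text we want.
printf '%s\n' "$below"

rm -f "$sample"
```

Passing the file name to sed behaves the same as piping it in through cat for a single file; it just saves a process.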

That still leaves the empty line that followed the heading, which is a bit annoying but easy enough to get rid of by passing sed a second command that strips empty lines:

$ cat /tmp/foo.txt | sed -e '1,/Heading of the bit I care about/d' -e '/^\s*$/d'
# text I care about

The only difference here is that we're now passing the -e flag, which lets us specify multiple commands. If we just passed the two commands one after the other without -e, sed would treat the second one as the name of an input file.
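
Since the goal was to process a bunch of files, the two-command form can be wrapped in a small shell function. This is only a sketch: text_below is a hypothetical helper name, not something from the original post, and it swaps \s (a GNU sed extension) for the POSIX character class [[:space:]] on the assumption that portability to other sed implementations matters.

```shell
#!/bin/sh
# Hypothetical helper (not from the article): print the non-blank lines that
# follow the first line matching a heading pattern.
# $1 is interpolated into a sed regular expression, so a literal "/" or other
# regex metacharacter in the heading would need escaping by the caller.
text_below() {
    heading=$1
    file=$2
    # First command: delete line 1 through the heading line.
    # Second command: delete lines that are empty or whitespace-only.
    sed -e "1,/$heading/d" -e '/^[[:space:]]*$/d' "$file"
}

# Demo on a throwaway copy of the article's sample file.
demo=$(mktemp)
printf '%s\n' "# text I don't care about" "" \
    "= Heading of the bit I care about" "" \
    "# text I care about" > "$demo"
text_below 'Heading of the bit I care about' "$demo"
rm -f "$demo"
```

The function can then be called once per file from a for loop, redirecting each result wherever the extracted text should be stored.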

Published at DZone with permission of
