How to Extract Content From a PDF

Need to grab text or images from a PDF programmatically on your Mac? Then read on, because this post has a solution for you.

I've occasionally needed to extract text and/or images from a PDF, and I've found a couple of easy, free ways to do it on macOS.

Commercial software such as Adobe Acrobat can extract images from a PDF, of course, but there's an easier way: a free application called The Unarchiver, which treats a PDF file as if it were a zip file and extracts everything into a folder. Just install the app, right-click a PDF file, and choose Open With > The Unarchiver.
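
If you'd rather stay in Terminal, The Unarchiver project also ships a command-line tool, unar (available on Homebrew as `brew install unar`). The sketch below assumes unar is installed and extracts a PDF the same way the app does; the function name extract_pdf_assets and the default "-extracted" output folder are illustrative choices, not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the embedded contents of a PDF with unar,
# the command-line counterpart of The Unarchiver (assumed to be installed).
extract_pdf_assets() {
  local pdf="$1"
  # Default output folder: the PDF's name minus ".pdf", plus "-extracted".
  local outdir="${2:-${pdf%.pdf}-extracted}"

  if [ ! -f "$pdf" ]; then
    echo "extract_pdf_assets: no such file: $pdf" >&2
    return 1
  fi

  # -o sets the directory that unar extracts into.
  unar -o "$outdir" "$pdf"
}

# Usage:
#   extract_pdf_assets report.pdf          # extracts under ./report-extracted
#   extract_pdf_assets report.pdf images   # extracts under ./images
```

The existence check runs before unar is called, so a mistyped path fails with a clear message instead of an unar error.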

Related pro-tip: if you want to extract all the images from a Keynote presentation, you can simply unzip the presentation with the command-line unzip utility. It'll expand into a folder that contains all the images and other assets (or you can right-click it and open it with the Archive Utility app).
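
The unzip step can be wrapped in a small function that also lists the image files it finds. This is a sketch assuming the presentation is saved as a single .key file (Keynote can also save presentations as packages, which are folders rather than zip archives); the function name extract_keynote_images, the "-assets" folder suffix, and the list of image extensions are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unzip a single-file Keynote presentation into a folder
# and print the paths of the image files found inside it.
extract_keynote_images() {
  local key="$1"
  local outdir="${2:-${key%.key}-assets}"

  if [ ! -f "$key" ]; then
    echo "extract_keynote_images: no such file: $key" >&2
    return 1
  fi

  # -q: quiet, -o: overwrite without prompting, -d: destination folder.
  unzip -q -o "$key" -d "$outdir" || return 1

  # Case-insensitive match on common image extensions.
  find "$outdir" -type f \( -iname '*.jpg' -o -iname '*.jpeg' \
    -o -iname '*.png' -o -iname '*.tif' -o -iname '*.tiff' \)
}

# Usage:
#   extract_keynote_images Talk.key   # unzips into ./Talk-assets, lists images
```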

Mission accomplished, but you'll probably end up with a bunch of .tiff files when you want compact .jpg or compressed .png files instead. If you're a command-line user and have ImageMagick installed, you can convert them all at once with a loop that uses Bash parameter expansion to build each output name:

find . -name '*.tiff' -print0 | while IFS= read -r -d '' file; do
  # Replace the .tiff suffix with .jpg to get the output file name
  convert "$file" "${file%.tiff}.jpg"
done
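
The renaming in that loop is done entirely by Bash parameter expansion: ${file%.tiff} removes the shortest trailing match of ".tiff", and ".jpg" is then appended. Pulled out into a function (jpg_name is a hypothetical name used only for illustration), the rule is easy to check on its own:

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a .tiff path to the .jpg path the loop would write.
jpg_name() {
  local path="$1"
  # ${path%.tiff} strips one trailing ".tiff"; if the name has no such
  # suffix, the expansion leaves it unchanged before ".jpg" is appended.
  printf '%s\n' "${path%.tiff}.jpg"
}

jpg_name "./Images/page 1.tiff"   # prints: ./Images/page 1.jpg
jpg_name "./scan.tiff.tiff"       # prints: ./scan.tiff.jpg
```

Quoting "$path" keeps names with spaces intact, which is also why the loop reads NUL-separated names from find -print0.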

That takes care of the images. For the text, open the PDF in Preview, the Mac's default PDF viewer. Press Cmd-A to select all of the text and other content, press Cmd-C to copy it, and then paste it into any plain-text destination. If you don't have a favorite text editor such as Atom or Sublime Text, you can use the Mac's built-in TextEdit app; just choose Format > Make Plain Text to switch it to plain-text mode.
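
If you're already in Terminal, you can skip the text editor: after copying the text in Preview, macOS's built-in pbpaste command writes the clipboard to standard output, so a redirect saves it as a plain-text file. The wrapper function save_clipboard_text and its default file name extracted.txt are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save the current clipboard (e.g. text copied from
# Preview) to a plain-text file. Relies on macOS's pbpaste.
save_clipboard_text() {
  local out="${1:-extracted.txt}"

  if ! command -v pbpaste >/dev/null 2>&1; then
    echo "save_clipboard_text: pbpaste not found (macOS only)" >&2
    return 1
  fi

  # Write the clipboard's text to the output file.
  pbpaste > "$out"
}

# Usage (after Cmd-A and Cmd-C in Preview):
#   save_clipboard_text report.txt
```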

Published at DZone with permission of
