
How to Extract Content From a PDF

Need to grab text or images from a PDF programmatically on your Mac? Then read on, because this post has a solution for you.


I've occasionally needed to extract text and/or images from a PDF. I've found a couple of easy, free ways to do this on macOS.

There's commercial software such as Adobe Acrobat that will extract images from a PDF, of course, but there's an easier way: a free application called The Unarchiver, which treats a PDF file as if it were a zip file and extracts everything into a folder. Just install the app, then right-click on a PDF file, choose Open With, and pick The Unarchiver.

Related pro tip: if you want to extract all the images from a Keynote presentation, you can simply unzip the presentation using the command-line unzip tool. It'll expand into a folder that contains all the images and other assets (or you can right-click the file and open it with the Archive Utility app).
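Here's a minimal sketch of that Keynote tip as a shell function. The function name, the example file name, and the list of image extensions are my own assumptions, not something Keynote prescribes, and it only works when the presentation really is a single zip-style file, as described above.

```shell
# Unzip a Keynote file into a folder, then list the image assets found inside.
# extract_keynote_assets is a hypothetical helper name; the extension list
# is an assumption about which asset types you care about.
extract_keynote_assets() {
  src="$1"    # presentation file, e.g. Talk.key (hypothetical name)
  dest="$2"   # folder to expand the presentation into

  # -q: quiet, -o: overwrite existing files without prompting, -d: target folder
  unzip -q -o "$src" -d "$dest" || return 1

  # Print the path of every extracted file that looks like an image
  find "$dest" -type f \( -name '*.png' -o -name '*.jpg' -o -name '*.jpeg' \
    -o -name '*.tiff' -o -name '*.pdf' \)
}
```

Call it as extract_keynote_assets Talk.key Talk_assets and you get one image path per line, which is handy for piping into other tools.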

Mission accomplished, but you'll probably end up with a bunch of .tiff files where you'd rather have compact .jpg or compressed .png files. If you're a command-line user and you have ImageMagick installed, you can convert them all at once with a Bash loop and a parameter expansion like this:

find . -name '*.tiff' | while IFS= read -r line; do
  # Strip the trailing "tiff" from the path and append "jpg"
  convert "$line" "${line%%tiff}jpg"
done
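If the expansion in that loop looks cryptic, here's a small sketch that shows the renaming on its own, without touching any files. The helper name jpg_name and the sample path are made up for illustration.

```shell
# Map a .tiff path to the matching .jpg path.
# jpg_name is a hypothetical helper; it only builds the new name.
jpg_name() {
  # ${1%%tiff} removes a trailing "tiff" from the first argument and keeps
  # the dot, so appending "jpg" yields the new extension.
  printf '%s\n' "${1%%tiff}jpg"
}

jpg_name "./slides/figure 1.tiff"   # prints ./slides/figure 1.jpg
```

Because the pattern has no wildcard, %% and the single % form behave the same here. Only the trailing "tiff" is removed, so a name like a.tiff.tiff becomes a.tiff.jpg.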

That'll do the trick for the images. For the text, you can just open the PDF in the Mac's default PDF viewer, the Preview app. Use Cmd-A to select all of the text and other content, copy it with Cmd-C, and then paste it into any plain-text destination. If you don't have a favorite text editor such as Atom or Sublime Text, you can use the Mac's default TextEdit app. Just use Format > Make Plain Text to switch it to plain-text mode.



Published at DZone with permission of
