Make PDFs Searchable (OCR) After Importing Into SharePoint
Use Microsoft Power Automate and the Adobe PDF Tools connector to automatically OCR PDFs so that they are searchable in SharePoint.
Let’s say you scan a piece of paper and convert it to a PDF. Did you know your PDF can have its text and images processed to make it easy to search through? This also makes it easy for other applications like SharePoint, Box, Dropbox, and others to index your content, so you can search for it in those applications.
Depending on which PDF engine you use, you might run into issues. Some PDF engines just take an image and create a PDF wrapper around it. Then, your file repositories can’t index it because it’s just a glorified picture of a page.
Free tools like Adobe Scan allow you to turn your smartphone into a portable scanner. Unlike some PDF engines out there, Adobe Scan will use your Adobe Document Cloud account to OCR (optical character recognition) your PDFs, so they are indexable by your repositories.
But what about all those other PDFs that are already scanned and aren’t searchable? What if you have decades of PDFs that have been collecting digital dust because they aren’t searchable? How do you make sure that whatever documents you are importing into tools like SharePoint are searchable? Fortunately, with Adobe PDF Tools for Microsoft Power Automate, this is super easy to do.
In this article, you’ll learn how to use Power Automate to automatically OCR PDFs when they are placed in a folder in SharePoint. The method is not limited to SharePoint: Power Automate also integrates with apps like Box, Dropbox, and others, so this walkthrough can serve as a reference for those as well.
Important Note: This article typically applies to scanned documents. Many of your PDFs, such as ones converted from Word documents, are already indexable and searchable.
How Do I Know if My PDF Isn’t Searchable?
There is an easy test to figure out whether a PDF is searchable. Open your PDF in Adobe Acrobat Reader. If you can’t select the text in the PDF, you can’t search the document. If you can, it is probably already searchable.
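If you have a whole folder of PDFs to triage, opening each one by hand gets tedious. Here is a rough Python sketch of a scripted version of the same idea. It is not part of the Adobe or Power Automate tooling. It is only a heuristic, and it assumes that a PDF with no font references contains no real text. The function name `looks_searchable` is made up for this example, and encrypted or unusual PDFs may fool it.

```python
import re
import zlib

# Captures the raw bytes between the "stream" and "endstream" keywords.
_STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def looks_searchable(pdf_bytes: bytes) -> bool:
    """Heuristic (an assumption, not a guarantee): a PDF that references
    at least one font probably contains selectable text."""
    if b"/Font" in pdf_bytes:
        return True
    # Newer PDFs can pack their dictionaries into Flate-compressed streams,
    # so also look inside every stream that zlib is able to inflate.
    for match in _STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates trailing bytes after the compressed data.
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (for example a JPEG image); skip it
        if b"/Font" in inflated:
            return True
    return False
```

You could call it as `looks_searchable(open("scan.pdf", "rb").read())`, where `scan.pdf` is a placeholder file name. Treat a `False` result as "probably needs OCR" rather than a definitive answer.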
What You Will Need
- Adobe PDF Tools (you can get a free 6-month trial here).
- Microsoft Power Automate (you will need premium licensing to take advantage of PDF tools).
Create Your Adobe PDF Tools Credentials
If you haven’t already created credentials to use with your Adobe PDF Tools, you can create them here.
Once you supply a name and description, your client credentials will be generated. Keep this window open; you will need this information to create a connection in Microsoft Power Automate.
Create a Flow From a Template
Adobe PDF Tools has a variety of templates pre-created to make it easy for you to get started. We are going to use one of these for this example.
- Log into Microsoft Power Automate.
- Use this template to get started.
- You will need to create a connection to PDF Tools. Use the credentials you created earlier.
- Click Create.
- Now that you have all your connections, you only need to make a few changes to get this template working.
- In the trigger called When a file is created in a folder, set the Site Address to the SharePoint site you want to reference.
- In Folder Id, set the path to the folder you want Microsoft Power Automate to watch for PDFs to OCR.
- Scroll down to the Create file action and set your Site Address and Folder Id to the specified place where you want the generated documents to go.
Pro Tip: If you want to decide dynamically where the generated documents go, you can use Dynamic Content to build the path from variables. Here is a helpful video to learn more about that.
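To make the idea of a computed destination concrete, here is a small Python sketch of the kind of logic you might express with Dynamic Content: filing each generated PDF under a year and month folder. It is not Power Automate syntax. The base folder name and the function name `destination_folder` are hypothetical, chosen only for illustration.

```python
from datetime import datetime, timezone
from posixpath import join


def destination_folder(base_folder: str, created: datetime) -> str:
    """Build a folder path such as <base>/<year>/<month> from a timestamp."""
    # posixpath.join always uses forward slashes, matching SharePoint paths.
    return join(base_folder, f"{created:%Y}", f"{created:%m}")


# Hypothetical example: a file created on 9 March 2021 (UTC).
example = destination_folder(
    "/Shared Documents/OCR Output",
    datetime(2021, 3, 9, tzinfo=timezone.utc),
)
```

In the flow itself, the same pieces would come from the trigger's outputs and be combined in the Folder Id field of the Create file action.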
See It in Action
That’s it! Now, give it a try by placing a file in the Input folder. Microsoft SharePoint is not instantaneous when triggering a flow, so you might need to wait 30–60 seconds before it reacts and triggers the flow.
If you don’t want to wait that long, click the Test button in your flow before you drop the files in. This makes the flow pick up the file faster.
Search for Files in SharePoint
Now that you have the file in an indexable PDF format, try searching for its contents using SharePoint search. After SharePoint indexes the file, the document will appear in your search results, making your content easier to find even if it was physically scanned.
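If you would rather check from a script than from the search box, SharePoint also exposes search over REST at `/_api/search/query`. The Python sketch below only builds the request URL. The site URL and search phrase are placeholders, the helper name `build_search_url` is hypothetical, and sending the request requires an authenticated session, which is not shown here.

```python
from urllib.parse import quote


def build_search_url(site_url: str, phrase: str) -> str:
    """Return a SharePoint Search REST URL that looks for an exact phrase."""
    # Wrap the phrase in double quotes so it is matched as a phrase.
    query = f'"{phrase}"'
    # The query sits inside a single-quoted literal in the URL, so any
    # single quote in the phrase is escaped by doubling it.
    literal = query.replace("'", "''")
    encoded = quote(literal, safe="")
    return f"{site_url.rstrip('/')}/_api/search/query?querytext='{encoded}'"


# Placeholder site and phrase, for illustration only.
url = build_search_url("https://contoso.sharepoint.com/sites/docs/", "invoice 1234")
```

Remember that a newly created file only shows up once SharePoint has indexed it, so an empty result right after the flow runs does not necessarily mean the OCR failed.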
Being able to OCR documents automatically when they are imported into SharePoint makes it much easier to find the content you are looking for and to convert legacy documents into a searchable format. However, this doesn’t stop at OCR. Have a look at the many other actions available in Adobe PDF Tools, such as Create PDF, Combine, and Export.
Published at DZone with permission of Ben Vanderberg. See the original article here.