DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Last call! Secure your stack and shape the future! Help dev teams across the globe navigate their software supply chain security challenges.

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Releasing software shouldn't be stressful or risky. Learn how to leverage progressive delivery techniques to ensure safer deployments.

Avoid machine learning mistakes and boost model performance! Discover key ML patterns, anti-patterns, data strategies, and more.

Related

  • Parent Document Retrieval (PDR): Useful Technique in RAG
  • How To Validate Three Common Document Types in Python
  • Leveraging LLMs for Software Testing
  • Chat Completion Models vs OpenAI Assistants API

Trending

  • Power BI Embedded Analytics — Part 2: Power BI Embedded Overview
  • Developers Beware: Slopsquatting and Vibe Coding Can Increase Risk of AI-Powered Attacks
  • A Guide to Developing Large Language Models Part 1: Pretraining
  • AWS to Azure Migration: A Cloudy Journey of Challenges and Triumphs
  1. DZone
  2. Coding
  3. Frameworks
  4. Getting Started With OCR docTR on Ubuntu

Getting Started With OCR docTR on Ubuntu

Explore docTR, an open-source Optical Character Recognition (OCR) solution. Walk through what is needed to install it on Ubuntu as well as a test example.

By 
Sylvain Kalache user avatar
Sylvain Kalache
·
May. 08, 24 · Tutorial
Likes (2)
Comment
Save
Tweet
Share
3.3K Views

Join the DZone community and get the full member experience.

Join For Free

In this tutorial, I'll explore how to set up and utilize docTR, the open-source OCR (Optical Character Recognition) solution of the document parsing API startup Mindee. I’ll go through what you need to install docTR on Ubuntu. It accepts PDFs, images, and even a website URL as an input. In this example, I will parse a grocery store receipt. Let’s get started.

Setting Up docTR on Ubuntu

docTR is compatible with any Linux distribution, macOS, and Windows. It is also available as a Docker image. I will use Ubuntu 22.04 LTS (Jammy Jellyfish) for this tutorial. Hardware-wise, you don’t need anything specific, but if you want to do extensive testing, I recommend using a GPU instance; OVHcloud offers affordable options, with servers starting at less than a dollar per hour.

Let’s start by installing Python. At the time of writing, docTR requires Python 3.8 (or higher).

Shell
 
sudo apt install -y  python3


To avoid messing with system libraries, let’s use a virtual environment.

Shell
 
sudo apt install -y python3.10-venv
python3 -m venv testing-Mindee-docTR


Then we install the OpenGL Mesa 3D Graphics Library, used for the computer vision part of docTR.

Shell
 
sudo apt install -y libgl1-mesa-glx


We install pango, which is a text layout engine library.

Shell
 
sudo apt-get install -y libpango-1.0-0 libpangoft2-1.0-0


Then, we install pip so that we can install docTR.

Shell
 
sudo apt install -y python3-pip


Finally, we install docTR within our virtual environment. This version is specifically for PyTorch. If you choose to use TensorFlow, change the command accordingly.

Shell
 
testing-Mindee-docTR/bin/pip3 install "python-doctr[torch]"


Using docTR

Now that docTR is installed, let’s start playing with it. In this example, I will test it with a grocery store receipt. You can download the receipt using the command below.

Shell
 
wget "https://media.istockphoto.com/id/889405434/vector/realistic-paper-shop-receipt-vector-cashier-bill-on-white-background.jpg?s=612x612&w=0&k=20&c=M2GxEKh9YJX2W3q76ugKW23JRVrm0aZ5ZwCZwUMBgAg=" -O receipt.jpeg


Receipt example

Create a testing-docTR.py file and insert the following code into it.

Python
 
from doctr.io import DocumentFile
from doctr.models import ocr_predictor

# Load the grocery receipt
doc = DocumentFile.from_images("receipt.jpeg")

# Load the OCR model
model = ocr_predictor(pretrained=True)

# Perform OCR
result = model(doc)

# Display the OCR result
print(result.export())


Note that docTR uses a two-stage approach: 

  1. First, it performs text detection to localize words.
  2. Then, it conducts text recognition to identify all characters in the word.

The ocr_predictor function accepts additional parameters to select the text detection and recognition architecture. For simplicity, I used the default ones in this example. You can find information about other models on the docTR documentation.

Reading a Receipt Using docTR

Now you just need to run your Python script:

Shell
 
testing-Mindee-docTR/bin/python3 testing-docTR.py


You will get an output such as the one below:

JSON
 
{"pages": [{"page_idx": 0, "dimensions": [612, 612], "orientation": {"value": null, "confidence": null}, "language": {"value": null, "confidence": null}, "blocks": [{"geometry": [[0.44140625, 0.1201171875], [0.548828125, 0.14453125]], "lines": [{"geometry": [[0.44140625, 0.1201171875], [0.548828125, 0.14453125]], "words": [{"value": "RECEIPT", "confidence": 0.9695481061935425, "geometry": [[0.44140625, 0.1201171875], [0.548828125, 0.14453125]]}]}], "artefacts": []}]}]} 


Note that I drastically shortened the JSON output for readability and only kept the part showing the “RECEIPT” word.

Here is the JSON structure you’d be looking at without truncating the result. I have expanded the part of the tree that I kept in the JSON output. docTR will provide a bunch of information about the document but the important part is about how it breaks down the document into lines, and for each line, provides an array containing the words it detected along with the degree of confidence. Here, we can see it spotted the word RECEIPT with a confidence of 96%.

JSON structure

docTR offers an efficient OCR solution that simplifies text recognition processes. Depending on the document type, you may need to change the text detection and text recognition architectures to improve accuracy. Comprehensive docTR documentation is available here.

Considerations When Using docTR

Deploying docTR entails certain complexities. First, you must create a dataset and train docTR to achieve satisfactory accuracy. This means dealing with data annotation on many images. Since OCR systems typically serve as backend services for other apps, it may be necessary to integrate docTR via an API and scale it according to the app’s needs. docTR does not provide this out of the box, but there are many open-source technologies that can help facilitate this step.

Conclusion

Document processing technologies have come a long way since the advent of OCR tools, which are limited to character recognition. Intelligent Document Processing (IDP) platforms represent the next step; they utilize OCR (such as docTR) along with additional layers of intelligence like table reconstruction, document classification, and natural language understanding, to achieve better accuracy and precision.

Additionally, for those seeking a scalable IDP solution without the complexities of data collection and model training, I recommend trying out Mindee’s latest solution, docTI. This training-free IDP solution leverages Large Language Models (LLMs) to eliminate the need for data collection, annotation, and the model training process. You can use the free-tier plan, configure an instance, and start querying the API in minutes.

API Data set Document Python (language) ubuntu

Opinions expressed by DZone contributors are their own.

Related

  • Parent Document Retrieval (PDR): Useful Technique in RAG
  • How To Validate Three Common Document Types in Python
  • Leveraging LLMs for Software Testing
  • Chat Completion Models vs OpenAI Assistants API

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!