Deep Dive Into OCR for Receipt Recognition

No matter what you choose, an LSTM or another complex method, there is no silver bullet. Some methods are hard to use and not always useful.

By Ivan Ozhiganov · Jun. 21, 17 · Tutorial

Optical character recognition (OCR) is the process of converting images of handwritten, printed, or typed text into machine-encoded text. Automated recognition of documents, credit cards, license plates, and billboards significantly simplifies the way we collect and process data.

The growth of machine learning and convolutional neural networks (CNNs) has helped text recognition make a huge leap forward. We used a CNN in our research to recognize paper receipts from retail stores. The system can be adjusted to process different languages, but we tested it on Russian.

The goal of our project was to develop an app using the client-server architecture for receipt recognition. Let's take a closer look, step by step.

Preprocessing

First things first: we rotated the receipt image so that the text lines were horizontally oriented, made the algorithm detect the receipt, and binarized it.

Rotating Image to Recognize a Receipt

We tried three approaches to locate the receipt in the image:

  1. Adaptive binarization with a high threshold.

  2. Convolutional Neural Network.

  3. Haar cascade classifier.

Using Adaptive Binarization

Original receipt view

First, we recognized the area on the image that contains the full receipt and almost no background. To achieve this, we rotated the image so that text lines are horizontally oriented.

Receipt rotation

We used scikit-image's adaptive thresholding function (threshold_adaptive, renamed threshold_local in later releases) to find the receipt. With a high threshold, this function keeps white pixels in areas with a high gradient, while more homogeneous areas turn black. The result was a homogeneous black background with a small number of white pixels, which we searched for to define the bounding rectangle of the receipt.
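The idea can be illustrated with a minimal pure-Python sketch. It is not the project's code and does not call scikit-image: the functions `adaptive_threshold` and `bounding_rectangle`, the window size, and the offset are all assumptions made for illustration. A pixel is kept white only when it is much brighter than the mean of its neighbourhood, so uniform regions turn black and only high-gradient pixels survive.

```python
def adaptive_threshold(gray, window=3, offset=40):
    """Mark a pixel white (1) when it exceeds its local mean by more than `offset`."""
    h, w = len(gray), len(gray[0])
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Collect the neighbourhood, clipped at the image border.
            ys = range(max(0, y - r), min(h, y + r + 1))
            xs = range(max(0, x - r), min(w, x + r + 1))
            vals = [gray[j][i] for j in ys for i in xs]
            local_mean = sum(vals) / len(vals)
            out[y][x] = 1 if gray[y][x] - local_mean > offset else 0
    return out


def bounding_rectangle(mask):
    """Return (top, left, bottom, right) of all white pixels, or None if there are none."""
    points = [(y, x) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None
    ys = [p[0] for p in points]
    xs = [p[1] for p in points]
    return min(ys), min(xs), max(ys), max(xs)


# Toy image: dark background (10) with a bright "receipt" (200) in the middle.
image = [[10] * 7 for _ in range(7)]
for y in range(2, 5):
    for x in range(1, 6):
        image[y][x] = 200

mask = adaptive_threshold(image)
rect = bounding_rectangle(mask)
```

On this toy image only the outline of the bright block stays white, and its bounding rectangle gives the receipt area. A real implementation would use a much larger window and a library routine rather than nested Python loops.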

Identified area containing receipt

Using a CNN

We decided to find receipt keypoints using a convolutional neural network, as we had done before in an object detection project. We chose the receipt corners as keypoints. This method performed reasonably well, but worse than adaptive binarization with a high threshold.

The CNN could only locate the corner coordinates relative to the text it found. Because the position of the text relative to the corners varies greatly between receipts, this CNN model is not very precise.

See CNN results below.

Examples of finding the receipt corners using the CNN

Using the Haar Cascade Classifier to Recognize a Receipt

As a third alternative, we tried the Haar cascade classifier. However, after a week of training the classifier and tuning recognition parameters, we did not get any usable results. Even the CNN performed much better.

Haar cascade classifier results:

The best results we managed to get from the classifier

Binarization

In the end, we used the same adaptive thresholding function for binarization, with a window large enough to contain both the text and the background.

Receipt binarization

Text Detection

Let's cover a few different components of text detection.

Detecting Text Via Connected Components

First, we found the connected components using the findContours function from OpenCV. The majority of connected components are characters, but a few are just noise fragments left after binarization. We filtered these out using limits on the maximal and minimal axis lengths of each component.
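As a rough illustration of this step, here is a small sketch that does not use OpenCV: it labels 4-connected white regions with a plain flood fill and then drops components that are too small or too large. The function names and the size limits are hypothetical, and reading "maximal/minimal axis" as bounds on the longer side of the bounding box is an assumption.

```python
from collections import deque


def connected_components(mask):
    """Return bounding boxes (x, y, w, h) of 4-connected white regions."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill starting from an unvisited white pixel.
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            ys, xs = [y0], [x0]
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
                        ys.append(ny)
                        xs.append(nx)
            boxes.append((min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1))
    return boxes


def filter_by_axis(boxes, min_axis=2, max_axis=4):
    """Keep boxes whose longer side lies within [min_axis, max_axis] (assumed rule)."""
    return [b for b in boxes if min_axis <= max(b[2], b[3]) <= max_axis]


mask = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],   # the lone pixel at column 3 is a speck of noise
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],   # a long horizontal rule, not a character
]
boxes = connected_components(mask)
characters = filter_by_axis(boxes)
```

Here the speck and the horizontal rule are rejected and only the character-sized component remains.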

Then we applied a merging algorithm to characters that consist of several components, such as :, Й, and =. The characters were then combined into words via a nearest-neighbor search, which works as follows. For every character, find its closest neighbor. Then choose the most suitable candidate to join from the right and from the left side. The algorithm repeats until there are no characters left.
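A simplified sketch of the word-forming step might look like the following. It is an illustration under assumptions, not the original algorithm: characters are bounding boxes `(x, y, w, h)`, chaining only extends to the right from the leftmost unused character, overlapping boxes are not handled, and `max_gap` is a hypothetical threshold.

```python
def group_into_words(boxes, max_gap=3):
    """Chain character boxes (x, y, w, h) into words, left to right."""

    def overlaps_vertically(a, b):
        return a[1] < b[1] + b[3] and b[1] < a[1] + a[3]

    remaining = sorted(boxes)          # sort by x, then y
    words = []
    while remaining:
        word = [remaining.pop(0)]      # the leftmost unused character starts a word
        while True:
            last = word[-1]
            right_edge = last[0] + last[2]
            # Candidates: characters on the same line, to the right, close enough.
            candidates = [
                b for b in remaining
                if overlaps_vertically(last, b) and 0 <= b[0] - right_edge <= max_gap
            ]
            if not candidates:
                break
            nearest = min(candidates, key=lambda b: b[0] - right_edge)
            remaining.remove(nearest)
            word.append(nearest)
        words.append(word)
    return words


chars = [
    (0, 0, 5, 8), (6, 0, 5, 8), (12, 0, 5, 8),    # first word on line 1
    (30, 0, 5, 8), (36, 0, 5, 8),                 # second word on line 1
    (0, 10, 5, 8), (6, 10, 5, 8),                 # a word on line 2
]
words = group_into_words(chars)
```

The loop ends when every character has been assigned to a word, which mirrors the "until there are no characters left" condition.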

Finding connected components and forming words (words are highlighted in one color)

The words were then grouped into text lines, on the hypothesis that words in a single line are located at the same height.
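The same-height hypothesis can be sketched as a one-pass clustering of word boxes by their vertical centre. The function name and the `tolerance` value are assumptions for illustration only.

```python
def group_into_lines(word_boxes, tolerance=3):
    """Cluster word boxes (x, y, w, h) into text lines by vertical centre."""
    lines = []
    for box in sorted(word_boxes, key=lambda b: b[1] + b[3] / 2):
        centre = box[1] + box[3] / 2
        if lines and abs(centre - lines[-1]["centre"]) <= tolerance:
            line = lines[-1]
            line["boxes"].append(box)
            # Track the running mean so a small drift along the line is tolerated.
            line["centre"] += (centre - line["centre"]) / len(line["boxes"])
        else:
            lines.append({"centre": centre, "boxes": [box]})
    # Read each line left to right.
    return [sorted(line["boxes"]) for line in lines]


word_boxes = [(30, 1, 11, 8), (0, 10, 11, 8), (0, 0, 17, 8), (20, 11, 5, 8)]
lines = group_into_lines(word_boxes)
```

A fixed tolerance like this breaks down on skewed or curled receipts, which is one reason the rotation step comes first.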

Forming the lines (lines are highlighted in one color)

The disadvantage is that this algorithm cannot correctly recognize noisy text.

Using a Grid for Text Detection

We found that almost all receipts had monospaced text, so we tried drawing a grid on the receipt and separating characters from each other using the grid lines:

Grid example

The grid simplifies further receipt recognition. A neural network can be applied to every cell of the grid, and every character can be easily recognized. The problem of noisy text is gone. Finally, the number of consecutive spaces can be determined precisely.

We tried the following algorithm to recognize the grid. First, we found connected components in the binary image:

Finding connected components

Then we took the lower-left corners of the green rectangles, which gave us a set of points specified by coordinates. To determine the distortion, we used a 2D periodic function:

Formula of the 2D periodic function

Graph of the function in formula

The main idea behind the receipt grid was to find the non-linear geometric distortion under which the points land on the peaks of this function. In other words, we had to find the distortion that maximizes the sum of the function's values over all points.

We parametrized the geometric distortion using the RectBivariateSpline class from SciPy's interpolate module and used SciPy's minimize function for the optimization.

Here’s what we got:

Incorrectly found grid

All in all, this method turned out to be slow and unstable, so we decided not to use it.

Optical Character Recognition

Let's deal with recognizing text we found via connected components and recognizing complete words.

Recognizing Text We Found Via Connected Components

For text recognition, we used a convolutional neural network (CNN) trained on receipt fonts. Its output was a probability for every possible character. For each position, we kept the top few candidates whose probabilities together added up to 99%. Then we used a dictionary to check all possible words that could be formed from these candidates. This helped improve the recognition accuracy and eliminate errors caused by similar-looking characters (for example, the Cyrillic letters "З" and "Э").
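The candidate selection and dictionary check can be sketched as follows. This is an illustration, not the project's code: the per-character probabilities are made up, the function names are hypothetical, and scoring a word by the product of its character probabilities is an assumption.

```python
from itertools import product
from math import prod


def top_candidates(probs, mass=0.99):
    """Return the most likely characters whose probabilities sum to at least `mass`."""
    chosen, total = [], 0.0
    for char, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        chosen.append((char, p))
        total += p
        if total >= mass:
            break
    return chosen


def best_dictionary_word(char_probs, dictionary, mass=0.99):
    """Pick the dictionary word with the highest product of character probabilities."""
    options = [top_candidates(p, mass) for p in char_probs]
    best_word, best_score = None, 0.0
    for combo in product(*options):
        word = "".join(c for c, _ in combo)
        score = prod(p for _, p in combo)
        if word in dictionary and score > best_score:
            best_word, best_score = word, score
    return best_word


# Made-up CNN output for a four-letter word; the first letter is ambiguous.
char_probs = [
    {"Э": 0.55, "З": 0.445, "В": 0.005},
    {"И": 0.995, "Н": 0.005},
    {"М": 0.999, "Н": 0.001},
    {"А": 0.999, "Д": 0.001},
]
dictionary = {"ЗИМА", "ЛЕТО"}
word = best_dictionary_word(char_probs, dictionary)
```

Taking the single most likely character at each position would give "ЭИМА", which is not a word; the dictionary check recovers "ЗИМА". The number of combinations grows quickly with word length, so a real system would need to prune the search.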

However, this method performs poorly on noisy text.

Recognizing Complete Words

When the text is too noisy to be recognized character by character, complete words have to be recognized instead. We tried two methods for this:

  • LSTM network.

  • Uniform segmentation.

LSTM Network

You can read these articles to learn more about reading text in deep convolutional sequences and using LSTM networks for language-independent OCR. For this purpose, we used the OCRopus library.

We used monospaced fonts and prepared an artificial sample for training.

Artificial set

After the training, we tested our network using a validation set. The test result appeared to be positive. Then we tested it using real receipts. Here is what we got:

Recognition results on real receipts

The trained neural network performed well on simple examples, which we had already recognized successfully with other methods. It did not work for complex cases.

We then distorted the training sample so that it more closely resembled the words found on real receipts.

Examples of the artificial set

To avoid overfitting, we stopped the training process several times and continued training the network with a new dataset. Finally, we got the following results:

Recognition results after training on the distorted set

Our new network was good at recognizing complex words. But simple word recognition was not so good.

We believe this network can perform much better with a single font and minor distortions.

Uniform Segmentation

The receipt font was monospaced, so we decided to split the words into characters. First, we needed to know the character width, so we estimated the mode of the character widths for every receipt. If the distribution of character widths was bimodal, we chose two modes and picked the appropriate width for every text line.

Bimodal distribution of character widths in the receipt

Once we had an approximate character width, we divided the length of the word by the character width to get the approximate number of characters. Then we divided the length of the word by that approximate number of characters, give or take one character:

Finding the optimal segmentation

Choosing the best option for division:

Optimal segmentation
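The segmentation steps described above can be sketched as follows. The selection criterion is not spelled out in the text, so this sketch assumes one: `column_ink` is a hypothetical per-column count of dark pixels in the word image, and the best option is the one whose cuts fall on the emptiest columns. All function names are illustrative.

```python
from collections import Counter


def mode_width(widths):
    """Most common character width on a receipt."""
    return Counter(widths).most_common(1)[0][0]


def candidate_cuts(word_len, char_width):
    """Cut positions for n-1, n, and n+1 equal cells, n = round(word_len / char_width)."""
    n = max(1, round(word_len / char_width))
    result = {}
    for count in (n - 1, n, n + 1):
        if count < 1:
            continue
        step = word_len / count
        # Clamp so a cut never falls outside the word image.
        result[count] = [min(word_len - 1, round(step * i)) for i in range(1, count)]
    return result


def best_segmentation(column_ink, char_width):
    """Choose the cell count whose cuts cross the least ink (assumed criterion)."""
    cuts_by_count = candidate_cuts(len(column_ink), char_width)
    mean_ink = sum(column_ink) / len(column_ink)

    def score(count):
        # Cuts through empty columns add to the score, cuts through strokes subtract.
        return sum(mean_ink - column_ink[c] for c in cuts_by_count[count])

    return max(cuts_by_count, key=score)


# Toy word: each character cell is one empty column followed by five inked columns.
three_chars = [0, 3, 3, 3, 3, 3] * 3
four_chars = [0, 3, 3, 3, 3, 3] * 4

width = mode_width([6, 6, 5, 6, 7])
count_a = best_segmentation(three_chars, width)
count_b = best_segmentation(four_chars, 7)   # width estimate is off by one pixel
```

In the second call the rounded estimate is three characters, but the "give or take one" candidates include four, which is the option whose cuts land in the gaps.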

The accuracy of such segmentation is quite high.

Algorithm working correctly

Sometimes our algorithms performed incorrectly:

Incorrect performance

Every fragment was processed by a CNN after the segmentation.

Extracting Meaning From Receipts

We used regular expressions to find purchases in receipts. All the receipts have one feature in common: the price of each purchase is written in the XX.XX format, where X is a digit. Therefore, it's possible to extract the lines containing purchases. The Individual Taxpayer Number can be found by searching for a 10-digit number and validating it with its checksum. The Cardholder Name has the format NAME/SURNAME.
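A minimal sketch of this extraction step is shown below. The exact expressions used in the project are not given, so the patterns and the `parse_receipt` helper are assumptions. The checksum assumes the number is a 10-digit Russian INN, whose last digit is a weighted sum of the first nine taken modulo 11 and then modulo 10. The sample lines are invented, except for the INN, which is a publicly known valid one used only to exercise the check.

```python
import re

PRICE = re.compile(r"\b\d+\.\d{2}\b")          # prices such as 59.90
INN = re.compile(r"\b\d{10}\b")                # any standalone 10-digit number
CARDHOLDER = re.compile(r"\b[A-Z]+/[A-Z]+\b")  # NAME/SURNAME
INN_WEIGHTS = (2, 4, 10, 3, 5, 9, 4, 6, 8)     # weights for a 10-digit Russian INN


def inn_is_valid(inn):
    """Check the control digit of a 10-digit INN."""
    digits = [int(c) for c in inn]
    control = sum(w * d for w, d in zip(INN_WEIGHTS, digits)) % 11 % 10
    return control == digits[9]


def parse_receipt(lines):
    """Return (purchases, inns, cardholders) found in the recognized text lines."""
    purchases, inns, cardholders = [], [], []
    for line in lines:
        prices = PRICE.findall(line)
        if prices:
            # Treat the last price on the line as the purchase amount.
            purchases.append((line, prices[-1]))
        inns += [m for m in INN.findall(line) if inn_is_valid(m)]
        cardholders += CARDHOLDER.findall(line)
    return purchases, inns, cardholders


lines = [
    "MILK 3.2% 1L 59.90",
    "INN 7707083893",
    "REF 7707083894",      # ten digits, but the control digit does not match
    "IVAN/PETROV",
    "THANK YOU",
]
purchases, inns, cardholders = parse_receipt(lines)
```

Note that a line with a price is not necessarily a purchase (totals and change use the same format), so a real parser would need additional rules to tell them apart.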

Extracting meaning from receipts

Takeaways

  1. No matter what you choose, an LSTM or another complex method, there is no silver bullet. Some methods are hard to use and not always useful.

  2. We'll continue working on the project. For now, the system shows good performance when the recognized text is not noisy.


Published at DZone with permission of Ivan Ozhiganov. See the original article here.

Opinions expressed by DZone contributors are their own.
