DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Because the DevOps movement has redefined engineering responsibilities, SREs now have to become stewards of observability strategy.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Related

  • How To Convert Common Documents to PNG Image Arrays in Java
  • How to Convert Excel and CSV Documents to HTML in Java
  • How to Convert a PDF to Text (TXT) Using Java
  • How to Merge HTML Documents in Java

Trending

  • What’s Got Me Interested in OpenTelemetry—And Pursuing Certification
  • Proactive Security in Distributed Systems: A Developer’s Approach
  • How To Build Resilient Microservices Using Circuit Breakers and Retries: A Developer’s Guide To Surviving
  • Tired of Spring Overhead? Try Dropwizard for Your Next Java Microservice
  1. DZone
  2. Data Engineering
  3. Databases
  4. How To Convert Scanned or Photographed Documents Into Text Using Java

How To Convert Scanned or Photographed Documents Into Text Using Java

Learn how you can harness two API solutions to convert scanned or photographed images into plain, machine-readable text.

By 
Brian O'Neill user avatar
Brian O'Neill
DZone Core CORE ·
Updated Dec. 13, 22 · Tutorial
Likes (1)
Comment
Save
Tweet
Share
5.8K Views

Join the DZone community and get the full member experience.

Join For Free

As many of our most pressing needs in the digital age are satisfied at the click of a button (or, perhaps more accurately, the touch of a screen), one might reasonably assume that the value of putting pen to paper has diminished to a level just shy of complete irrelevance. Yet, even as the business world continues its steady march towards complete digitization, we still somehow find ourselves writing out hand-written materials for many of our life’s most important transactions. For example, while many of us elect health insurance coverage through our employer’s online portal, we still typically need to write out various important details when we arrive at the doctor’s office – such as allergies, medications, recent disease exposures, etc. –  and from there, we need to hand these filled-out forms into an office attendant for processing. Further, while (likely) most of us check finances and pay bills through secure banking applications, we still can’t seem to escape the use of in-person, hand-written materials in the process of acquiring loans for our future ventures, and we still often receive handwritten checks for payment. All things considered, the pertinence of hand-written materials may not be dying quite as quickly as anticipated.  

However, regardless of how our medical or financial information is filled out, one thing is certain: that information, one way or another, is going to be stored in digital form.  The reason for that is straightforward – relying on stacks of paper in cluttered filing cabinets to store all-important reference data is clearly irresponsible in the modern day.  Even if mountainous filing cabinets still do exist in some over-burdened offices in the world, no one would willfully accept the storage of such files without some form of digital redundancy.  The possibility of losing potentially critical data to a natural disaster – whether it be an age-old case of crippling human disorganization, or a truly unpredictable and tragic case like an earthquake or fire – is too devastating to fathom.  After all, digitizing a physical form is a simple and cheap process.  All you need is a camera or a scanning machine, and you’re immediately equipped to create digital copies of just about any paper-bound materials you can think of.

Once stored in digital form, the quality and usability of paper data become defined through a new metric: how well can that data now be queried, accessed, and manipulated? Once photographed, a document exists as a two-dimensional rasterized file composed entirely of pixels. As such, the data displayed within that picture isn’t machine-readable: it can’t be queried directly on its own. For that data to become digitally accessible, its pixelated text needs to be translated into a plain text string, which is a form of data computers are well equipped to read.  

To accomplish this analog-to-digital transition, Optical Character Recognition (OCR) is the go-to solution.  OCR is the simple process of converting an image of text into machine-readable text, and there are two major ways it can be accomplished – namely, via Pattern Matching or Feature Extraction – but I’ll defer a deeper discussion regarding those methodologies to a more broadly focused article on the subject. Once text becomes machine-readable, it can be stored independently of the two-dimensional file it arrived in.  This means important text can now be queried, manipulated, and stored however we see fit.

Demonstration

The main purpose of this article is to demonstrate two OCR API solutions that you can use to convert your scanned and photographed images into plain, machine-readable text.  Both services offer customizable fault-tolerance settings (basic, normal, and advanced; these respectively increase the number of API calls consumed per conversion) and offer excellent flexibility with support for dozens of common languages, including English, Chinese, Dutch, French, and more.  While these APIs perform a very similar function from a birds-eye view, the primary difference between them is their degree of built-in fault tolerance.  While any adequately tuned OCR service must first preprocess (i.e., remove imperfections from) an image prior to performing its text-recognition function, it’s important to acknowledge that scanned images are necessarily taken under much more favorable conditions than handheld photographs (which are prone to bad angles, prominent background details, etc.), and thus require less exhaustive preprocessing attention.

Each of the below API solutions can be used with a free Cloudmersive account (yields 800 API calls per month), and use the ready-to-run Java code examples provided in this article to structure your API calls.  

Before calling either API, our first step is to install the Cloudmersive OCR API client.  We can do so with Maven by first adding a Jitpack reference to the repository in pom.xml:

 
<repositories>
    <repository>
        <id>jitpack.io</id>
        <url>https://jitpack.io</url>
    </repository>
</repositories>


After that, we need to add a reference to the dependency in pom.xml:

 
<dependencies>
<dependency>
    <groupId>com.github.Cloudmersive</groupId>
    <artifactId>Cloudmersive.APIClient.Java</artifactId>
    <version>v4.25</version>
</dependency>
</dependencies>


Alternatively, we can install with Gradle by first adding a reference in our root build.gradle at the end of repositories:

 
allprojects {
	repositories {
		...
		maven { url 'https://jitpack.io' }
	}
}


And then adding a dependency in build.gradle:

 
dependencies {
        implementation 'com.github.Cloudmersive:Cloudmersive.APIClient.Java:v4.25'
}


With installation finished, we can turn our attention to the larger API call structure.  To convert a scanned image of a document to text, we first need to copy and paste the imports at the top of our file, and then call the function, including our API key and scanned document image in the designated fields:

 
// Import classes:
//import com.cloudmersive.client.invoker.ApiClient;
//import com.cloudmersive.client.invoker.ApiException;
//import com.cloudmersive.client.invoker.Configuration;
//import com.cloudmersive.client.invoker.auth.*;
//import com.cloudmersive.client.ImageOcrApi;

ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");

ImageOcrApi apiInstance = new ImageOcrApi();
File imageFile = new File("/path/to/inputfile"); // File | Image file to perform OCR on.  Common file formats such as PNG, JPEG are supported.
String recognitionMode = "recognitionMode_example"; // String | Optional; possible values are 'Basic' which provides basic recognition and is not resillient to page rotation, skew or low quality images uses 1-2 API calls; 'Normal' which provides highly fault tolerant OCR recognition uses 26-30 API calls; and 'Advanced' which provides the highest quality and most fault-tolerant recognition uses 28-30 API calls.  Default recognition mode is 'Advanced'
String language = "language_example"; // String | Optional, language of the input document, default is English (ENG).  Possible values are ENG (English), ARA (Arabic), ZHO (Chinese - Simplified), ZHO-HANT (Chinese - Traditional), ASM (Assamese), AFR (Afrikaans), AMH (Amharic), AZE (Azerbaijani), AZE-CYRL (Azerbaijani - Cyrillic), BEL (Belarusian), BEN (Bengali), BOD (Tibetan), BOS (Bosnian), BUL (Bulgarian), CAT (Catalan; Valencian), CEB (Cebuano), CES (Czech), CHR (Cherokee), CYM (Welsh), DAN (Danish), DEU (German), DZO (Dzongkha), ELL (Greek), ENM (Archaic/Middle English), EPO (Esperanto), EST (Estonian), EUS (Basque), FAS (Persian), FIN (Finnish), FRA (French), FRK (Frankish), FRM (Middle-French), GLE (Irish), GLG (Galician), GRC (Ancient Greek), HAT (Hatian), HEB (Hebrew), HIN (Hindi), HRV (Croatian), HUN (Hungarian), IKU (Inuktitut), IND (Indonesian), ISL (Icelandic), ITA (Italian), ITA-OLD (Old - Italian), JAV (Javanese), JPN (Japanese), KAN (Kannada), KAT (Georgian), KAT-OLD (Old-Georgian), KAZ (Kazakh), KHM (Central Khmer), KIR (Kirghiz), KOR (Korean), KUR (Kurdish), LAO (Lao), LAT (Latin), LAV (Latvian), LIT (Lithuanian), MAL (Malayalam), MAR (Marathi), MKD (Macedonian), MLT (Maltese), MSA (Malay), MYA (Burmese), NEP (Nepali), NLD (Dutch), NOR (Norwegian), ORI (Oriya), PAN (Panjabi), POL (Polish), POR (Portuguese), PUS (Pushto), RON (Romanian), RUS (Russian), SAN (Sanskrit), SIN (Sinhala), SLK (Slovak), SLV (Slovenian), SPA (Spanish), SPA-OLD (Old Spanish), SQI (Albanian), SRP (Serbian), SRP-LAT (Latin Serbian), SWA (Swahili), SWE (Swedish), SYR (Syriac), TAM (Tamil), TEL (Telugu), TGK (Tajik), TGL (Tagalog), THA (Thai), TIR (Tigrinya), TUR (Turkish), UIG (Uighur), UKR (Ukrainian), URD (Urdu), UZB (Uzbek), UZB-CYR (Cyrillic Uzbek), VIE (Vietnamese), YID (Yiddish)
String preprocessing = "preprocessing_example"; // String | Optional, preprocessing mode, default is 'Auto'.  Possible values are None (no preprocessing of the image), and Auto (automatic image enhancement of the image before OCR is applied; this is recommended).
try {
    ImageToTextResponse result = apiInstance.imageOcrPost(imageFile, recognitionMode, language, preprocessing);
    System.out.println(result);
} catch (ApiException e) {
    System.err.println("Exception when calling ImageOcrApi#imageOcrPost");
    e.printStackTrace();
}


To convert a photograph of a document to text, we’ll instead perform the identical process for a slightly different body of code.  Again, we’ll first copy the imports to the top of our file, and then we’ll include the function, adding our API key and handheld document photograph into the same fields as before:

 
// Import classes:
//import com.cloudmersive.client.invoker.ApiClient;
//import com.cloudmersive.client.invoker.ApiException;
//import com.cloudmersive.client.invoker.Configuration;
//import com.cloudmersive.client.invoker.auth.*;
//import com.cloudmersive.client.ImageOcrApi;

ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");

ImageOcrApi apiInstance = new ImageOcrApi();
File imageFile = new File("/path/to/inputfile"); // File | Image file to perform OCR on.  Common file formats such as PNG, JPEG are supported.
String recognitionMode = "recognitionMode_example"; // String | Optional; possible values are 'Basic' which provides basic recognition and is not resillient to page rotation, skew or low quality images uses 1-2 API calls; 'Normal' which provides highly fault tolerant OCR recognition uses 26-30 API calls; and 'Advanced' which provides the highest quality and most fault-tolerant recognition uses 28-30 API calls.  Default recognition mode is 'Advanced'
String language = "language_example"; // String | Optional, language of the input document, default is English (ENG).  Possible values are ENG (English), ARA (Arabic), ZHO (Chinese - Simplified), ZHO-HANT (Chinese - Traditional), ASM (Assamese), AFR (Afrikaans), AMH (Amharic), AZE (Azerbaijani), AZE-CYRL (Azerbaijani - Cyrillic), BEL (Belarusian), BEN (Bengali), BOD (Tibetan), BOS (Bosnian), BUL (Bulgarian), CAT (Catalan; Valencian), CEB (Cebuano), CES (Czech), CHR (Cherokee), CYM (Welsh), DAN (Danish), DEU (German), DZO (Dzongkha), ELL (Greek), ENM (Archaic/Middle English), EPO (Esperanto), EST (Estonian), EUS (Basque), FAS (Persian), FIN (Finnish), FRA (French), FRK (Frankish), FRM (Middle-French), GLE (Irish), GLG (Galician), GRC (Ancient Greek), HAT (Hatian), HEB (Hebrew), HIN (Hindi), HRV (Croatian), HUN (Hungarian), IKU (Inuktitut), IND (Indonesian), ISL (Icelandic), ITA (Italian), ITA-OLD (Old - Italian), JAV (Javanese), JPN (Japanese), KAN (Kannada), KAT (Georgian), KAT-OLD (Old-Georgian), KAZ (Kazakh), KHM (Central Khmer), KIR (Kirghiz), KOR (Korean), KUR (Kurdish), LAO (Lao), LAT (Latin), LAV (Latvian), LIT (Lithuanian), MAL (Malayalam), MAR (Marathi), MKD (Macedonian), MLT (Maltese), MSA (Malay), MYA (Burmese), NEP (Nepali), NLD (Dutch), NOR (Norwegian), ORI (Oriya), PAN (Panjabi), POL (Polish), POR (Portuguese), PUS (Pushto), RON (Romanian), RUS (Russian), SAN (Sanskrit), SIN (Sinhala), SLK (Slovak), SLV (Slovenian), SPA (Spanish), SPA-OLD (Old Spanish), SQI (Albanian), SRP (Serbian), SRP-LAT (Latin Serbian), SWA (Swahili), SWE (Swedish), SYR (Syriac), TAM (Tamil), TEL (Telugu), TGK (Tajik), TGL (Tagalog), THA (Thai), TIR (Tigrinya), TUR (Turkish), UIG (Uighur), UKR (Ukrainian), URD (Urdu), UZB (Uzbek), UZB-CYR (Cyrillic Uzbek), VIE (Vietnamese), YID (Yiddish)
try {
    ImageToTextResponse result = apiInstance.imageOcrPhotoToText(imageFile, recognitionMode, language);
    System.out.println(result);
} catch (ApiException e) {
    System.err.println("Exception when calling ImageOcrApi#imageOcrPhotoToText");
    e.printStackTrace();
}


As a reminder, you can set either API to detect the language of your choice by including your preferred language’s three-letter abbreviation in the “string language” field (each applicable language is listed in the code comments adjacent to this field). Additionally, if you’d like to get a better sense of the difference between basic, normal, and advanced fault tolerance for each solution, I recommend testing the same document once with each different value.  The default – and most resource-intensive – setting is ”Advanced” (this consumes about 28-30 API calls per document).  

Please refer to the below for the comprehensive list of languages supported by both of the above APIs:

  • ENG (English), ARA (Arabic), ZHO (Chinese - Simplified), ZHO-HANT (Chinese - Traditional), ASM (Assamese), AFR (Afrikaans), AMH (Amharic), AZE (Azerbaijani), AZE-CYRL (Azerbaijani - Cyrillic), BEL (Belarusian), BEN (Bengali), BOD (Tibetan), BOS (Bosnian), BUL (Bulgarian), CAT (Catalan; Valencian), CEB (Cebuano), CES (Czech), CHR (Cherokee), CYM (Welsh), DAN (Danish), DEU (German), DZO (Dzongkha), ELL (Greek), ENM (Archaic/Middle English), EPO (Esperanto), EST (Estonian), EUS (Basque), FAS (Persian), FIN (Finnish), FRA (French), FRK (Frankish), FRM (Middle-French), GLE (Irish), GLG (Galician), GRC (Ancient Greek), HAT (Hatian), HEB (Hebrew), HIN (Hindi), HRV (Croatian), HUN (Hungarian), IKU (Inuktitut), IND (Indonesian), ISL (Icelandic), ITA (Italian), ITA-OLD (Old - Italian), JAV (Javanese), JPN (Japanese), KAN (Kannada), KAT (Georgian), KAT-OLD (Old-Georgian), KAZ (Kazakh), KHM (Central Khmer), KIR (Kirghiz), KOR (Korean), KUR (Kurdish), LAO (Lao), LAT (Latin), LAV (Latvian), LIT (Lithuanian), MAL (Malayalam), MAR (Marathi), MKD (Macedonian), MLT (Maltese), MSA (Malay), MYA (Burmese), NEP (Nepali), NLD (Dutch), NOR (Norwegian), ORI (Oriya), PAN (Panjabi), POL (Polish), POR (Portuguese), PUS (Pushto), RON (Romanian), RUS (Russian), SAN (Sanskrit), SIN (Sinhala), SLK (Slovak), SLV (Slovenian), SPA (Spanish), SPA-OLD (Old Spanish), SQI (Albanian), SRP (Serbian), SRP-LAT (Latin Serbian), SWA (Swahili), SWE (Swedish), SYR (Syriac), TAM (Tamil), TEL (Telugu), TGK (Tajik), TGL (Tagalog), THA (Thai), TIR (Tigrinya), TUR (Turkish), UIG (Uighur), UKR (Ukrainian), URD (Urdu), UZB (Uzbek), UZB-CYR (Cyrillic Uzbek), VIE (Vietnamese), YID (Yiddish)
Document Convert (command) Java (programming language) API

Opinions expressed by DZone contributors are their own.

Related

  • How To Convert Common Documents to PNG Image Arrays in Java
  • How to Convert Excel and CSV Documents to HTML in Java
  • How to Convert a PDF to Text (TXT) Using Java
  • How to Merge HTML Documents in Java

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!