Extracting Text From an Image

Advances in OCR tooling have made extracting text from images a much easier task than it was years ago.

By Amuda Adeolu · Apr. 21, 17 · Tutorial

Years ago, extracting text from images was one of the harder problems a developer could face. Now, with mature OCR tools, reading and extracting text from images is easy.

Today, I'll explain how to extract text from images using Tess4J (net.sourceforge.tess4j), a Java wrapper for the Tesseract OCR engine.

Text extraction is often one stage of a larger pipeline. For example, when processing a flowchart image, the text components are extracted along with the geometric shape components, and the relationships between components are recovered by tracing the flow lines that connect them. The extracted components are then written out as machine-readable metadata (in XML format), which can be archived, stored in a knowledge base, or shared with others.
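The last stage of that pipeline, serializing extracted components to XML, could look something like the minimal sketch below. It uses only the JDK's DOM and Transformer APIs (and records, so Java 16+). The Component and Connection records, the element and attribute names, and the sample values are all hypothetical, and the sketch assumes shape detection and flow-line tracing have already been done elsewhere; Tesseract itself only recognizes text.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class ComponentMetadata {

    // Hypothetical: one extracted component (shape type, recognized text, bounding box in pixels)
    record Component(String id, String type, String text, int x, int y, int width, int height) {}

    // Hypothetical: a traced flow line from one component to another
    record Connection(String from, String to) {}

    /** Serializes components and connections into an XML metadata document. */
    public static String toXml(List<Component> components, List<Connection> connections) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("flowchart");
            doc.appendChild(root);

            for (Component c : components) {
                Element el = doc.createElement("component");
                el.setAttribute("id", c.id());
                el.setAttribute("type", c.type());
                el.setAttribute("x", String.valueOf(c.x()));
                el.setAttribute("y", String.valueOf(c.y()));
                el.setAttribute("width", String.valueOf(c.width()));
                el.setAttribute("height", String.valueOf(c.height()));
                el.setTextContent(c.text()); // special characters are escaped on output
                root.appendChild(el);
            }
            for (Connection conn : connections) {
                Element el = doc.createElement("connection");
                el.setAttribute("from", conn.from());
                el.setAttribute("to", conn.to());
                root.appendChild(el);
            }

            // Pretty-print the DOM tree into a string
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build XML metadata", e);
        }
    }

    public static void main(String[] args) {
        List<Component> components = List.of(
                new Component("c1", "process", "Read image", 10, 10, 120, 40),
                new Component("c2", "decision", "Text found?", 10, 80, 120, 60));
        List<Connection> connections = List.of(new Connection("c1", "c2"));
        System.out.println(toXml(components, connections));
    }
}
```

Keeping the recognized text as element content and the geometry as attributes is just one possible layout; any schema that a downstream knowledge base can parse would serve the same purpose.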

The steps below show how to extract text from images with Tess4J.

1. Add the Dependency

Add the Tess4J dependency (group ID net.sourceforge.tess4j) to your pom.xml:

<dependency> 
 <groupId>net.sourceforge.tess4j</groupId> 
 <artifactId>tess4j</artifactId> 
 <version>3.2.1</version> 
</dependency>

This is the image that we're extracting the text from:
[Image: sample input image]

2. Download the Language Data

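Tesseract needs a trained data file for each language it recognizes. Download the file for the language of your text and put it in a folder named tessdata. Use data that matches the Tesseract version your Tess4J release wraps.

For example, for English text, download eng.traineddata and put it at tessdata/eng.traineddata in the project root folder.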
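Tesseract cannot initialize if the language file is missing or misnamed, and a typo such as "trainedata" is easy to make. A minimal pre-flight check using only java.nio.file might look like this. The class and method names are hypothetical, and main assumes the tessdata folder sits in the working directory, matching the project-root layout above.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TessdataCheck {

    /** Expected location of the trained data file, e.g. tessdata/eng.traineddata for "eng". */
    public static Path trainedDataFile(Path tessdataDir, String language) {
        return tessdataDir.resolve(language + ".traineddata");
    }

    /** True if the trained data file for the language exists as a regular file. */
    public static boolean hasLanguage(Path tessdataDir, String language) {
        return Files.isRegularFile(trainedDataFile(tessdataDir, language));
    }

    public static void main(String[] args) {
        // Assumption: tessdata is in the project root, i.e. the working directory when run
        Path tessdata = Paths.get("tessdata");
        String language = args.length > 0 ? args[0] : "eng";
        Path expected = trainedDataFile(tessdata, language).toAbsolutePath();
        if (hasLanguage(tessdata, language)) {
            System.out.println("Found " + expected);
        } else {
            System.err.println("Missing " + expected + "; download "
                    + language + ".traineddata into the tessdata folder");
        }
    }
}
```

Running this once before the OCR code turns a confusing native initialization error into a plain message naming the file that is missing.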

3. Read the Code

Here's the Java code that reads the text from an image in common formats such as PNG, JPEG, and TIFF:

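package com.amudabadmus.awfa;

import java.io.File;

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public class App {

    /**
     * Runs Tesseract OCR on the image at imageLocation and returns the recognized text.
     * Expects the tessdata folder from step 2 in the project root.
     */
    public String getImgText(String imageLocation) {
        ITesseract instance = new Tesseract();
        // If tessdata lives elsewhere, point Tess4J at it with instance.setDatapath(...)
        try {
            return instance.doOCR(new File(imageLocation));
        } catch (TesseractException e) {
            // Report the cause instead of silently discarding it
            System.err.println("OCR failed: " + e.getMessage());
            return "Error while reading image";
        }
    }

    public static void main(String[] args) {
        App app = new App();
        // Use the path given on the command line, or fall back to the example path
        String imagePath = args.length > 0 ? args[0] : "C:\\Users\\User\\Pictures\\img.png";
        System.out.println(app.getImgText(imagePath));
    }
}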
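The "any format" promise deserves a caveat: as I understand it, Tess4J decodes most image files through Java's ImageIO, so what it accepts depends largely on which ImageIO readers are installed. The sketch below (class and method names are hypothetical) checks a file's extension against ImageIO.getReaderFileSuffixes() before OCR is attempted.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Locale;
import javax.imageio.ImageIO;

public class ImageFormatCheck {

    /** Returns the lowercase extension of the file name (directories ignored), or "" if none. */
    static String extensionOf(String path) {
        String name = new File(path).getName();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** True if an ImageIO reader registered on this JVM claims the file's extension. */
    public static boolean isReadableImage(String path) {
        String ext = extensionOf(path);
        return !ext.isEmpty() && Arrays.asList(ImageIO.getReaderFileSuffixes()).contains(ext);
    }

    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : "img.png";
        if (isReadableImage(path)) {
            System.out.println(path + ": an ImageIO reader is available");
        } else {
            System.out.println(path + ": no ImageIO reader for this extension; available: "
                    + String.join(", ", ImageIO.getReaderFileSuffixes()));
        }
    }
}
```

This only inspects the extension, so a mislabeled or corrupt file can still fail inside doOCR; the check is a cheap first filter, not a guarantee.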


For more information, check out the source code and the demo. 


Published at DZone with permission of Amuda Adeolu. See the original article here.
