DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
View Events Video Library
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Integrating PostgreSQL Databases with ANF: Join this workshop to learn how to create a PostgreSQL server using Instaclustr’s managed service

Mobile Database Essentials: Assess data needs, storage requirements, and more when leveraging databases for cloud and edge applications.

Monitoring and Observability for LLMs: Datadog and Google Cloud discuss how to achieve optimal AI model performance.

Automated Testing: The latest on architecture, TDD, and the benefits of AI and low-code tools.

Related

  • How to Convert Excel and CSV Documents to HTML in Java
  • How To Approach Java, Databases, and SQL [Video]
  • An Overview of Programming Languages
  • How To Convert HTML to PNG in Java

Trending

  • Unleashing the Power of Microservices With Spring Cloud
  • Exploring Sorting Algorithms: A Comprehensive Guide
  • A Better Web3 Experience: Account Abstraction From Flow (Part 2)
  • DevSecOps: Integrating Security Into Your DevOps Workflow
  1. DZone
  2. Coding
  3. Languages
  4. How do you parse HTML in Java?

How do you parse HTML in Java?

Geertjan Wielenga user avatar by
Geertjan Wielenga
·
Jan. 16, 08 · News
Like (0)
Save
Tweet
Share
20.29K Views

Join the DZone community and get the full member experience.

Join For Free

The Open Source HTML Parsers in Java page is useful in listing the HTML parsers that are out there. But it doesn't give much of a clue about which are the "best" in a given situation. In other words, how should one decide which HTML parser to use? And, doesn't the proliferation of HTML parsers out there imply that there is something wrong with the JDK's own HTML parser, javax.swing.text.html.HTMLEditorKit.Parser?

All things being equal, shouldn't one prefer to use a utility provided by the JDK over one provided by a third party library? (For this reason, I'm assuming that all things are not equal in this case.) I've been parsing HTML using the JDK's HTML parser, based on the approach described in Parsing HTML with Swing, although that's an article written in 2003, so it may be dated. The author of that article points to this weakness of the Swing HTML parser: "The biggest downside to this HTML parser is that it is not thread safe (thread safety has always been a problem with Swing components). This HTML processor is no different. I have used the Swing parser in heavily threaded environments, and it has resulted in a crash—eventually. If you want to use this HTML processor in a heavily threaded environment, you need to take steps to ensure that only one thread uses it at a time."

Is that the only weakness here? (By the way, on the positive side, the author writes: "I have used this parser with a number of programs that I have written, and I have found it to be very useful. It is particularly helpful for handling improperly formatted HTML, which can trip up some HTML parsers.") I guess the other HTML parsers may have additional features, those that relate to transformation in addition to parsing. And the other parsers probably allow for walking the DOM, rather than inspecting tags in the way that the Swing HTML Parser does. I have used JTidy before, but didn't find the benefits to outweigh the cumbersomeness of having to deal with a third party library.

Anyone care to share their experiences with these utilities?

HTML Java (programming language) Parser (programming language)

Opinions expressed by DZone contributors are their own.

Related

  • How to Convert Excel and CSV Documents to HTML in Java
  • How To Approach Java, Databases, and SQL [Video]
  • An Overview of Programming Languages
  • How To Convert HTML to PNG in Java

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends: