DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones AWS Cloud
by AWS Developer Relations
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones
AWS Cloud
by AWS Developer Relations
  1. DZone
  2. Coding
  3. Languages
  4. Cool Python Module: html2text

Cool Python Module: html2text

Mike Dirolf user avatar by
Mike Dirolf
·
Jan. 12, 12 · Interview
Like (0)
Save
Tweet
Share
11.02K Views

Join the DZone community and get the full member experience.

Join For Free

Parts of the Fiesta API (as well as some new features we’re rolling out for Fiesta itself) rely on the ability to automatically generate clean text (markdown) versions of incoming messages. Our parser tends to prefer using the text version of a message if possible, as it’s generally easier to parse than the HTML version. That said, sometimes a message only contains an HTML version - we need a way to generate our canonical markdown representation from that alone. Enter html2text.

html2text is a great little Python module by Aaron Swartz that takes HTML as input and generates a markdown version as output. Here’s an example:

>>> import html2text
>>> print html2text.html2text('''<html><body><p>Hello World</p>
<ul><li>Here's one thing</li>
<li>And here's another!</li></ul></body></html>''')
Hello World

  * Here's one thing
  * And here's another!

The output is nicely formatted markdown text, exactly what we were looking for. The only problem we’ve noticed is that the module has some trouble dealing with malformed HTML. Our approach has been to run things through BeautifulSoup first, which tends to do a great job even with crappy markup.

 

Source: http://blog.fiesta.cc/post/14575090271/cool-python-module-html2text

Python (language)

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • A Gentle Introduction to Kubernetes
  • Stop Using Spring Profiles Per Environment
  • Kubernetes-Native Development With Quarkus and Eclipse JKube
  • File Uploads for the Web (2): Upload Files With JavaScript

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: