
Python - Very Simple Parser

by Snippets Manager · Jan. 09, 07 · Code Snippet

# Very Simple Parser (Python 2 only: sgmllib was removed in Python 3)


from sgmllib import SGMLParser

class ParserHTML(SGMLParser):

	def scrivi(self):
		# "scrivi" is Italian for "write": open the output file
		self.f = open('/tmp/fileOUT.html', 'w')

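	def unknown_starttag(self, tag, attrs):

		# sgmllib lowercases attribute names and reports a bare word as
		# (name, word). An unquoted multi-word value such as alt=Foo Bar
		# therefore arrives as ('alt', 'Foo'), ('bar', 'Bar'); the loop
		# below glues such a word back onto the previous value.
		value = 0  # 1 while the last value still lacks its closing quote
		startTAG = '<' + tag

		for i in attrs:
			if(i[0].lower() == i[1].lower() and not i[0] == i[1]
					and (value == 1 or startTAG.endswith('"'))):
				# continuation word: drop the closing quote once, then append
				if(value == 0): startTAG = startTAG[:-1]
				startTAG += ' ' + str(i[1])
				value = 1
			else:
				# ordinary attribute: close any pending quote first
				if(value == 1): startTAG += '"'
				startTAG += ' ' + str(i[0]) + '="' + str(i[1]) + '"'
				value = 0

		if(value == 1): startTAG += '"'

		startTAG += '>'
		self.f.write(startTAG + "\n")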

	def handle_data(self, data):

		self.f.write(data + "\n")

	def unknown_endtag(self, tag):

		self.f.write('</' + tag + '>' + "\n")

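if __name__ == '__main__':

	p = ParserHTML()
	p.scrivi()
	p.feed(open('/tmp/fileIN.html', 'r').read())
	p.close()    # let the parser process any data it is still buffering
	p.f.close()  # make sure the output reaches disk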
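
The snippet above depends on sgmllib, which exists only in Python 2. As a rough sketch of the same idea on Python 3, the standard library's html.parser can be used instead. The class name TagPerLineParser and the `out` parameter are hypothetical names chosen for this illustration, not part of the original snippet, and the sketch does not try to reproduce the gluing of unquoted multi-word attribute values.

```python
from html.parser import HTMLParser
import io


class TagPerLineParser(HTMLParser):
    """Hypothetical Python 3 counterpart of ParserHTML: one tag or text run per line."""

    def __init__(self, out):
        super().__init__(convert_charrefs=True)
        self.out = out  # any object with a write() method (assumption of this sketch)

    def handle_starttag(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            # html.parser reports an attribute without a value as (name, None)
            parts.append(name if value is None else '%s="%s"' % (name, value))
        self.out.write('<' + ' '.join(parts) + '>\n')

    def handle_data(self, data):
        self.out.write(data + '\n')

    def handle_endtag(self, tag):
        self.out.write('</' + tag + '>\n')


if __name__ == '__main__':
    buf = io.StringIO()
    p = TagPerLineParser(buf)
    p.feed('<p class="x">Hi <b>there</b></p>')
    p.close()
    print(buf.getvalue(), end='')
```

Writing to a caller-supplied object rather than a fixed path under /tmp keeps the sketch easy to try in memory; passing an open file in place of the StringIO buffer would write the result to disk, as the original does.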