Python Html2txt

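// Converts HTML to raw text with regular expressions: <p> and <tr> become newlines, <td> becomes a tab, and comments and all remaining tags are removed.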


import re

p = re.compile(r'(<p\b.*?>)|(<tr\b.*?>)', re.I)  # paragraph and table-row openers
t = re.compile(r'<td\b.*?>', re.I)               # table-cell openers
comm = re.compile(r'<!--.*?-->', re.M)           # HTML comments
tags = re.compile(r'<.*?>', re.M)                # any remaining tag

def html2txt(s, hint = 'entity', code = 'ISO-8859-1'):
    """Convert the html to raw txt
    - suppress all returns
    - <p>, <tr> to return
    - <td> to tab
    Needs the module-level regexes p, t, comm and tags defined above.
    version 0.0.1 20020930
    """
    s = s.replace('\n', '')   # remove returns (TODO: time this against split/filter/join)
    s = p.sub('\n', s)        # replace p and tr by \n
    s = t.sub('\t', s)        # replace td by \t
    s = comm.sub('', s)       # remove comments
    s = tags.sub('', s)       # remove all remaining tags
    s = re.sub(' +', ' ', s)  # collapse running spaces; '\s+' would also remove the \n and \t
    # handling of entities: not implemented yet, so hint and code are unused
    result = s
    return result
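The snippet leaves "handling of entities" as a stub. One possible way to fill it in, sketched here under assumptions rather than taken from the original author, is to decode entities with the standard library's html.unescape as the very last step. The function name html2txt_with_entities, the constant names, and the exact tag patterns are hypothetical; the patterns only follow the snippet's comments (p and tr to newline, td to tab).

```python
import html
import re

# Assumed patterns, chosen to match the behaviour described in the snippet's comments.
P_OR_TR = re.compile(r'<(?:p|tr)\b[^>]*>', re.I)  # <p ...> and <tr ...> openers
TD = re.compile(r'<td\b[^>]*>', re.I)             # <td ...> openers
COMMENT = re.compile(r'<!--.*?-->', re.S)         # HTML comments
TAGS = re.compile(r'<[^>]*>')                     # any remaining tag


def html2txt_with_entities(s):
    """Hypothetical variant of html2txt that also decodes entities."""
    s = s.replace('\n', '')    # drop the source line breaks
    s = P_OR_TR.sub('\n', s)   # p and tr become newlines
    s = TD.sub('\t', s)        # td becomes a tab
    s = COMMENT.sub('', s)     # remove comments
    s = TAGS.sub('', s)        # remove all remaining tags
    s = re.sub(' +', ' ', s)   # collapse runs of spaces, keeping \n and \t
    # Decode entities last, so that text such as "&lt;b&gt;" is not
    # mistaken for a tag and stripped by the step above.
    return html.unescape(s)


sample = ('<table><tr><td>a &amp; b</td><td>1 &lt; 2</td></tr></table>'
          '<!-- note --><p>caf&eacute;   au lait</p>')
print(repr(html2txt_with_entities(sample)))
# prints '\n\ta & b\t1 < 2\ncafé au lait'
```

Decoding after tag removal is a deliberate ordering choice in this sketch: done earlier, an escaped "<" would turn into a literal one and could be swallowed by the tag-stripping regex.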
