
Article: Wrestling HTML
Subject: htmldata
Date: 2005-11-26 03:47:06
From: Connelly


I wrote an HTML/XHTML parser, which is available at http://oregonstate.edu/~barnesc/htmldata/ . It's straightforward: it just converts HTML to a flat list in which each tag is a (name, attribute dictionary) tuple and each run of text is a plain string:


>>> htmldata.tagextract('<tr><td>foo<tr a=5>')
[('tr', {}), ('td', {}), 'foo', ('tr', {'a': '5'})]
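
For readers who want to experiment with this list shape without installing anything, here is a rough sketch built on the standard library's html.parser. It is not htmldata's implementation; the names TagCollector and extract are made up for illustration, and the leading-slash form for closing tags is an assumption of this sketch, not something shown above.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect tags and text into one flat list, in document order."""

    def __init__(self):
        super().__init__()
        self.items = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs; store it as a dict
        self.items.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Assumed convention for this sketch: a leading slash marks a closing tag
        self.items.append(('/' + tag, {}))

    def handle_data(self, data):
        # Text between tags is stored as a plain string
        self.items.append(data)


def extract(html_text):
    """Hypothetical helper: parse html_text and return the flat list."""
    parser = TagCollector()
    parser.feed(html_text)
    parser.close()
    return parser.items


result = extract('<tr><td>foo<tr a=5>')
print(result)
```

Given the same unclosed, unquoted input as the example above, this sketch should yield the same list. It makes no attempt to match htmldata on harder cases.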


It's meant to handle "real" HTML the way a browser does, unlike the other Python HTML parsers I tried, which balk at even slightly malformed input.


It can also convert the list data structure back to HTML, which is useful for filtering, mirroring, or proxying. And it can extract and modify URLs in HTML, XHTML, and CSS.
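
To make the round trip concrete, here is a minimal sketch of turning such a list back into HTML and rewriting URLs along the way. The functions join and rewrite_urls are hypothetical names written for this illustration and are not htmldata's own API. The sketch assumes closing tags are stored with a leading slash, treats only href and src as URL attributes, and ignores URLs inside CSS.

```python
from html import escape


def join(items):
    """Rebuild an HTML string from text strings and (tag, attrs) tuples."""
    parts = []
    for item in items:
        if isinstance(item, str):
            # Re-escape text so characters like < and & stay literal
            parts.append(escape(item, quote=False))
            continue
        tag, attrs = item
        # A value of None stands for a bare attribute with no value
        attr_text = ''.join(
            ' %s="%s"' % (name, escape(value)) if value is not None else ' ' + name
            for name, value in attrs.items()
        )
        parts.append('<%s%s>' % (tag, attr_text))
    return ''.join(parts)


def rewrite_urls(items, func, url_attrs=('href', 'src')):
    """Return a new list with func applied to every URL-bearing attribute."""
    out = []
    for item in items:
        if isinstance(item, tuple):
            tag, attrs = item
            attrs = {
                name: func(value) if name in url_attrs and value is not None else value
                for name, value in attrs.items()
            }
            item = (tag, attrs)
        out.append(item)
    return out


items = [('a', {'href': '/docs'}), 'manual', ('/a', {})]
mirrored = rewrite_urls(items, lambda url: 'http://example.org' + url)
html_out = join(mirrored)
print(html_out)
```

A mirroring or proxying tool would pass a function that maps each original URL to its local or proxied equivalent; the example.org prefix here is only a placeholder.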


- Connelly Barnes

