> If you have other suggestions I haven't covered, please post them
Since you ask, you might try
http://www.dcarlisle.demon.co.uk/htmlparse.xsl
which is an HTML parser written in XSLT 2.0 (not sure if it makes sense to call Saxon 8 from Python?).
It currently doesn't fix up bad comments (but could do, I suppose).
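One way to call Saxon 8 from Python is to run it as an external `java` process and read its output. This is a minimal sketch, not a tested recipe. It assumes Saxon's usual command-line shape, `java -jar saxon8.jar [-it template] [source] stylesheet [name=value ...]`. The initial template name `main` and the parameter name `html` in the usage line are hypothetical. Check what htmlparse.xsl actually exposes (it may be easier to call its parsing function from a small wrapper stylesheet).

```python
import subprocess
from typing import Dict, List, Optional


def build_saxon_command(
    jar: str,
    stylesheet: str,
    params: Dict[str, str],
    initial_template: Optional[str] = None,
    source: Optional[str] = None,
) -> List[str]:
    """Build an argv list for Saxon: java -jar JAR [-it T] [SOURCE] STYLESHEET [name=value ...]."""
    cmd = ["java", "-jar", jar]
    if initial_template is not None:
        # -it starts at a named template, so no XML source document is required
        cmd += ["-it", initial_template]
    if source is not None:
        cmd.append(source)
    cmd.append(stylesheet)
    # Stylesheet parameters follow the stylesheet as name=value pairs
    cmd += [f"{name}={value}" for name, value in params.items()]
    return cmd


def run_saxon(cmd: List[str]) -> str:
    """Run the command and return what the transformation wrote to stdout."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical template name "main" and parameter "html": adjust to the real stylesheet
    cmd = build_saxon_command(
        "saxon8.jar", "htmlparse.xsl", {"html": "page.html"}, initial_template="main"
    )
    print(" ".join(cmd))
    # print(run_saxon(cmd))  # needs Java and saxon8.jar on this machine
```

Shelling out keeps the Python side trivial, at the cost of starting a JVM per call. For many documents it is probably better to have one wrapper stylesheet process them all in a single run.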