XML.com: XML From the Inside Out


Article: Wrestling HTML
Subject: parsing html
Date: 2004-09-10 05:33:28
From: David Carlisle


> If you have other suggestions I haven't covered, please post them


Since you ask, you might try
http://www.dcarlisle.demon.co.uk/htmlparse.xsl
which is an HTML parser written in XSLT 2.0 (I'm not sure whether it makes sense to call Saxon 8 from Python).
It currently doesn't fix up bad comments, though I suppose it could.
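As a rough illustration of what "fixing up bad comments" involves, here is a minimal Python sketch. It is not part of htmlparse.xsl, and the function names are hypothetical. It rewrites HTML comments so they satisfy XML 1.0, which forbids "--" inside a comment and forbids a comment whose body ends in "-" (giving "--->"). It assumes every comment is terminated by "-->"; unterminated comments are left untouched.

```python
import re

# An HTML comment: "<!--", then the shortest run of characters up to "-->".
# Assumption: comments are terminated; an unterminated "<!--" is not matched.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def fix_comment_body(body: str) -> str:
    """Make a comment body acceptable to XML 1.0 (no "--", no trailing "-")."""
    # Insert a space after every hyphen that is immediately followed by another
    # hyphen, so "--" becomes "- -" and "---" becomes "- - -".
    body = re.sub(r"-(?=-)", "- ", body)
    # A body ending in "-" would produce "--->", which XML also forbids.
    if body.endswith("-"):
        body += " "
    return body


def fix_bad_comments(html: str) -> str:
    """Rewrite every terminated comment in an HTML string so it is legal XML."""
    return COMMENT_RE.sub(
        lambda m: "<!--" + fix_comment_body(m.group(1)) + "-->", html
    )


if __name__ == "__main__":
    sample = "<p>a</p><!-- old -- style ---><p>b</p>"
    print(fix_bad_comments(sample))
```

Run on the sample, this prints `<p>a</p><!-- old - - style - --><p>b</p>`, which an XML parser will accept. A pass like this could run on the HTML string before it is handed to an XML-based parser.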


Copyright © 2008 O'Reilly Media, Inc. | (707) 827-7000 / (800) 998-9938