Processing Atom 1.0
by Uche Ogbuji

Using Amara Bindery

Because the DOM code above is so clumsy, I shall present similar code using a friendlier Python library, Amara Bindery, which I covered in an earlier article, Introducing the Amara XML Toolkit. Listing 3 does the same thing as listing 2.

Listing 3. Amara Bindery Code to Print a Text Outline of an Atom Feed

from amara import binderytools

doc = binderytools.bind_file('atomexample.xml')

def get_text_from_construct(element):
    '''
    Return the content of an Atom element declared with the
    atomTextConstruct pattern.  Handle both plain text and XHTML
    forms.  Return a UTF-8 encoded string.
    '''
    if hasattr(element, 'type') and element.type == u'xhtml':
        #Grab the XML serialization of each child
        childtext = [ (not isinstance(c, unicode)
                       and c.xml(encoding=u'utf-8') or c)
                      for c in element.xml_children ]
        #And stitch it together
        content = u''.join(childtext).strip().encode('utf-8')
        return content
    else:
        return unicode(element).encode('utf-8')

print 'Feed title:', get_text_from_construct(doc.feed.title)
print 'Feed link:', doc.feed.link.href

print
print 'Entries:'

for entry in doc.feed.entry:
    etitletext = get_text_from_construct(entry.title)
    print etitletext, '(', entry.link.href, ')'
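
For readers who do not have Amara installed, the text construct handling can also be sketched with the standard library's xml.etree.ElementTree. This is not Amara's API and not the code of listing 3: it is a minimal Python 3 sketch, and the sample feed, the atom() helper, and the outline() function are all invented for illustration. One deliberate difference is that it serializes the children of the XHTML wrapper div rather than the div itself.

```python
import xml.etree.ElementTree as ET

ATOM_NS = 'http://www.w3.org/2005/Atom'
XHTML_NS = 'http://www.w3.org/1999/xhtml'

# Serialize XHTML with a default namespace rather than an invented prefix
ET.register_namespace('', XHTML_NS)

# Hypothetical sample feed, invented for this sketch
SAMPLE_FEED = '''<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <link href="http://example.org/"/>
  <entry>
    <title type="xhtml"><div xmlns="http://www.w3.org/1999/xhtml">Hello <em>world</em></div></title>
    <link href="http://example.org/1"/>
  </entry>
</feed>'''


def atom(name):
    '''Return the namespace-qualified tag for an Atom element name.'''
    return '{%s}%s' % (ATOM_NS, name)


def get_text_from_construct(element):
    '''
    Return the content of an Atom text construct as a str.
    For type="xhtml", serialize the children of the wrapper div;
    otherwise return the element's character data.
    '''
    if element.get('type') == 'xhtml':
        div = element.find('{%s}div' % XHTML_NS)
        if div is None:
            return ''
        # Leading text of the div, then each child element with its tail
        parts = [div.text or '']
        parts.extend(ET.tostring(child, encoding='unicode') for child in div)
        return ''.join(parts).strip()
    return (element.text or '').strip()


def outline(feed_xml):
    '''Return the lines of a text outline for an Atom 1.0 feed document.'''
    feed = ET.fromstring(feed_xml)
    lines = [
        'Feed title: ' + get_text_from_construct(feed.find(atom('title'))),
        'Feed link: ' + feed.find(atom('link')).get('href'),
        '',
        'Entries:',
    ]
    for entry in feed.findall(atom('entry')):
        title = get_text_from_construct(entry.find(atom('title')))
        # Only the first link element of each entry is used here
        href = entry.find(atom('link')).get('href')
        lines.append('%s ( %s )' % (title, href))
    return lines


print('\n'.join(outline(SAMPLE_FEED)))
```

The sketch assumes every feed and entry has at least one link element with an href attribute; a real feed may carry several links with different rel values, so production code would need to choose among them.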

Using Feedparser (Atom Processing for the Desperate Hacker)

A third approach to reading Atom is to let someone else handle the parsing and just deal with the resulting data structure. This might be especially convenient if you have to deal with broken feeds (and fixing the broken feeds is not an option). It does usually rob you of some flexibility of interpretation of the data, although a really good library would be flexible enough for most users. Probably the best option is Mark Pilgrim's Universal Feed Parser, which parses almost every flavor of RSS and Atom. In my case, I downloaded the 3.3 zip package and installed using python setup.py install. Listing 4 is code similar in function to that of listings 2 and 3.

Listing 4. Universal Feed Parser Code to Print a Text Outline of an Atom Feed

import feedparser

#A hack until Feedparser supports Atom 1.0 out of the box
#(Feedparser 3.3 does not)
from feedparser import _FeedParserMixin
_FeedParserMixin.namespaces["http://www.w3.org/2005/Atom"] = ""

feed_data = feedparser.parse('atomexample.xml')
channel, entries = feed_data.feed, feed_data.entries

print 'Feed title:', channel['title']
print 'Feed link:', channel['link']

print
print 'Entries:'

for entry in entries:
    print entry['title'], '(', entry['link'], ')'

Overall the code is shorter because we no longer have to worry about the different forms of Atom text construct; the library takes care of that for us. Of course, I'm pretty leery of how it does so, especially the fact that it strips Namespaces from XHTML content. This is an example of the flexibility you lose when using a generic parser, especially one designed to be as liberal as Universal Feed Parser. That is the trade-off for the obvious gain in simplicity. Notice the hack near the top of listing 4: the import of _FeedParserMixin and the assignment to its namespaces table. These two lines should be temporary, no longer needed once Mark Pilgrim updates his package to support Atom 1.0.

Wrapping up, on a Grand Scale

Atom 1.0 is pretty easy to parse and process. I may have serious trouble with some of the design decisions for the format, but I do applaud its overall cleanliness. I've presented several approaches to processing Atom in this article. If I needed to reliably process feeds retrieved from arbitrary locations on the Web, I would definitely go for Universal Feed Parser. Mark Pilgrim has dunked himself into the rancid mess of broken Web feeds so you don't have to. In a project where I controlled the environment, and I could fix broken feeds, I would parse them myself, for the greater flexibility. One trick I've used in the past is to use Universal Feed Parser as a proxy tool to convert arbitrary feeds to a single, valid format (RSS 1.0 in my past experience), so that I could use XML (or in that case RDF) tools to parse the feeds directly.
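
The proxy trick in the last sentence can be sketched roughly as follows. This is a Python 3 illustration under several assumptions: the input is a pair of plain dictionaries shaped loosely like the output of a liberal parser, the key names (title, link, id, updated) are hypothetical, and the target format is Atom 1.0 rather than the RSS 1.0 I used in practice. It does not try to produce a fully valid feed (it omits the author element, for instance).

```python
import xml.etree.ElementTree as ET

ATOM_NS = 'http://www.w3.org/2005/Atom'

# Emit Atom elements in a default namespace instead of a generated prefix
ET.register_namespace('', ATOM_NS)


def _qname(name):
    '''Return the namespace-qualified tag for an Atom element name.'''
    return '{%s}%s' % (ATOM_NS, name)


def _add_common(parent, data):
    '''Add the title, link, id, and updated children shared by feeds and entries.'''
    ET.SubElement(parent, _qname('title')).text = data['title']
    ET.SubElement(parent, _qname('link'), href=data['link'])
    # Fall back to the link as an identifier when no id was parsed (an assumption)
    ET.SubElement(parent, _qname('id')).text = data.get('id', data['link'])
    ET.SubElement(parent, _qname('updated')).text = data['updated']


def to_atom(feed, entries):
    '''Serialize already-parsed feed data as a single, uniform Atom document.'''
    root = ET.Element(_qname('feed'))
    _add_common(root, feed)
    for entry in entries:
        _add_common(ET.SubElement(root, _qname('entry')), entry)
    return ET.tostring(root, encoding='unicode')


# Hypothetical parsed data, invented for this sketch
feed = {'title': 'Example feed', 'link': 'http://example.org/',
        'updated': '2005-08-30T00:00:00Z'}
entries = [{'title': 'First entry', 'link': 'http://example.org/1',
            'updated': '2005-08-30T00:00:00Z'}]
print(to_atom(feed, entries))
```

Once every incoming feed has been funneled through a step like this, the downstream code only ever has to handle one format with ordinary XML tools.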

And with this month's exploration, the Python-XML column has come to an end. After discussions with my editor, I'll replace this column with one with a broader focus. It will cover the intersection of Agile Languages and Web 2.0 technologies. The primary language focus will still be Python, but there will sometimes be coverage of other languages such as Ruby and ECMAScript. I think many of the topics will continue to be of interest to readers of the present column. I look forward to continuing my relationship with the XML.com audience.

This brings me to the last hurrah of the monthly round up of Python-XML community news. Firstly, given the topic of this article, I wanted to mention Sylvain Hellegouarch's atomixlib, a module providing a simple API for generation of Atom 1.0, based on Amara Bindery. See his announcement. And relevant to recent articles in this column, Andrew Kuchling wrote up a Python Unicode HOWTO.

Julien Anguenot writes in XML Schema Support on Zope3:

I added a demo package to illustrate the zope3/xml schema integration. [Download the code here]

The goal of the demo is to get a new content object registered within Zope3, with an "add" and "edit" form driven by an XML Schema definition.

    

The article goes on to show a bunch of Python and XML code to work a sample W3C XML schema file into a Zope component.

Mark Nottingham announced sparta.py 0.8, a simple API for RDF.

Sparta is a Python API for RDF that is designed to help easily learn and navigate the Semantic Web programmatically. Unlike other RDF interfaces, which are generally triple-based, Sparta binds RDF nodes to Python objects and RDF arcs to attributes of those Python objects.

This makes using RDF very natural for people who understand (and sometimes think in terms of) objects. One way to think of it is as a databinding from RDF to Python objects.

See the announcement.

Guido Wesdorp announced Templess 0.1.

Templess is an XML templating library for Python, which is very compact and simple, fast, and has a strict separation of logic and design. It is different from other templating languages because instead of "asking" for data from the template, you "tell" the template what content there is to render, and the template just provides placeholders. Instead of calling into your code from the template, all data for the template is prepared in the code before it is handed over to the templating engine to render. This makes Templess very suitable for programmers, since everything is done from the Python code layer rather than using some domain-specific language from the XML.


Comments on this article

  • atomixlib
    2005-10-16 11:38:47 SylvainH

    Just to let you know that I've updated atomixlib to support some new functionality from Amara.

    Get it at: http://www.defuze.org/oss/atomixlib/

  • Dateutil moved and more accessible
    2005-10-03 07:12:42 Uche Ogbuji

    Gustavo Niemeyer moved Dateutil to a spot that doesn't have the HTTPS hiccup I mentioned in the article. The new spot is:


    http://labix.org/python-dateutil


  • Atom 1.0 support for FeedParser
    2005-09-15 11:57:09 aristotle

    Be aware that the FeedParser hack will only allow extremely rudimentary consumption of Atom 1.0 feeds. Because several elements have been renamed between 0.3 and 1.0 and the content model is completely different, you will only get useful results if you want nothing more than titles and links.


    However, there is a patch that adds decent support for Atom 1.0 at


    http://fucoder.com/2005/08/feedparser-atom-10-patch/