Article: Parsing RSS At All Costs
Subject: Benefits and harms are not evenly distributed
Date: 2003-01-22 21:24:36
From: Mark Pilgrim
Response to: Agreed. So what's your solution?
re: "The temporary benefit of being able to read ill-formed RSS feeds is outweighed by the harm caused to XML and the Web"
The problem is that the benefit is accrued by the software vendor, and is direct and immediate, but the harm is caused to everyone equally, and is long-term and abstract. Direct and immediate wins every time.
- Benefits and harms are not evenly distributed
2003-01-23 08:03:10 bryan rasmussen
"Direct and immediate wins every time" reminds me of Hardin's arguments vis-a-vis the commons, which have since come under some controversy.
It is in the main a philosophical argument, but as such I cannot see how it is a sensible one.
You say the direct and immediate wins every time, implying that newsreaders will have to parse everything that proclaims itself RSS whether it is or not because of business pressures to do so. But if a public newsreader did not parse the RSS instead returning a broken message to the clients of said feed then would this not create direct and immediate pressures on feed authors and sites to produce valid xml, and would this not spur sales of RSS producers that produce valid RSS?
Part of the reason RSS feeds end up as XML that is not well-formed (even though XML is, after all, a simpler set of rules than most other languages) is of course that RSS (2.0 and pre-1.0) allows escaped HTML inside the description element, a practice I believe makes broken feeds much more likely. As I've harped on before, this hampers the transportability of feeds across media: to a non-HTML email newsletter format, for example, to various phone media, or even to specific browsers.
It seems to me that a vendor offering both an RSS producer and an RSS consumer that could be relied on to produce only well-formed feeds could derive direct and immediate benefits over other vendors, because the XML could be reused in other media.
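To make the escaped-HTML point concrete, here is a minimal sketch (not code from the thread) using Python's standard `xml.etree.ElementTree`. The two feed fragments and the helper name `is_well_formed` are invented for illustration. Properly escaped HTML inside `description` stays well-formed, while a single bare ampersand or unclosed tag pasted in from HTML breaks the whole document.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical fragments, invented for this example.
# HTML markup escaped correctly inside <description>:
escaped = "<item><description>&lt;b&gt;Fish &amp; chips&lt;/b&gt;</description></item>"
# The same content pasted in carelessly: bare "&" and an unclosed <b>.
sloppy = "<item><description><b>Fish & chips</description></item>"

print(is_well_formed(escaped))
print(is_well_formed(sloppy))

# After parsing, the escaped payload comes back as an HTML string, which a
# non-HTML medium (plain-text email, phone) would still have to strip or convert.
print(ET.fromstring(escaped).find("description").text)
```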
- End-user perspective
2003-01-23 08:37:02 Mark Pilgrim
> "implying that newsreaders will have to parse everything that proclaims itself RSS whether it is or not because of business pressures to do so."
Exactly.
> "But if a public newsreader did not parse the RSS instead returning a broken message to the clients of said feed then would this not create direct and immediate pressures on feed authors and sites to produce valid xml"
No. You are punishing the wrong people. You are still operating under the mistaken impression that XML, in and of itself, is important. It is not. It is a means to an end. End users don't care. And they shouldn't have to care.
Look, I was in this position: I tried several first-generation news aggregators that only used real XML parsers. Feeds would go unreadable for days at a time, and by the time they came back I had missed dozens of articles. I tried to switch to another aggregator that would let me follow the sites I wanted to follow, but none satisfied me, so I ended up writing the parse-at-all-costs RSS parser and building a homegrown aggregator around it for my own use.
And I'm *technically inclined*. I *care* about XML. Imagine the reaction of an end user who isn't, and doesn't. They bought (downloaded/whatever) a program that purports to help them read all the news and follow all the sites that they care about. They like this idea. Then they find out that sometimes it doesn't work, sometimes sites that worked yesterday don't work today, and some sites don't work at all, because of something called "XML". They don't know from XML, they've never seen XML, they don't care about XML, but this stupid POS program is complaining and saying there's nothing it can do about this "XML" problem and suggesting, in its infinite wisdom, that the end user should take it upon themselves to work around this problem by sending an email to the site owner and waiting an indeterminate length of time before they can read the news they care about, if ever.
You're kidding, right?
Then the user hears about another aggregator, a direct competitor, which claims to be able to let them follow *all* the sites they care about. It doesn't complain; it doesn't whine; it doesn't suggest that they work around the developer's laziness by firing off emails to random people they've never met. It just works.
Which would *you* choose?
- End-user perspective
2003-01-24 03:55:28 bryan rasmussen
>You're kidding, right?
No, but that is because I'm not really viewing an aggregator as a tool in itself; I don't think aggregators have much of a business future. I think they're destined to become part of other products.
>Then the user hears about another aggregator, a direct competitor, which claims to be able to let them follow *all* the sites they care about. It doesn't complain; it doesn't whine; it doesn't suggest that they work around the developer's laziness by firing off emails to random people they've never met. It just works.
Again, I don't believe in aggregators as stand-alone tools, I believe that they will become part of more wide-ranging products.
If such a product has to do with handling XML of widely different formats then it cannot devote development resources to handling stuff that thinks it's XML but really isn't.
A product can provide add-ins to convert legacy formats, but I don't think badly formed RSS will qualify for such attentions.
If such a product is the object, then the well-formedness of the XML becomes integral to the product; development will have to provide ways to report errors in individual XML instances, such as those originating from a feed.
This is not developer laziness, but developer ambition.
Error reporting to a user has always seemed to me to be an exercise in the art of communication. If a non-technical user receives the error message "XML error at hello world", then they might well be expected to say "This program sucks". If, on the other hand, they receive information like "Newsfeed at http://www.myinfo.com/newsfeed7 is not conforming to the technical standards for newsfeeds, if you would like to learn more click More Info", then I would expect the user to think something like "Frigging amateurs at www.myinfo.com". Despite not automatically fixing www.myinfo.com for the user, the program may still command market share if it does enough other things with various other XML technologies.
This may cause you to think again that I'm kidding, but I'm not. I think a lot of these problems stem from the technical community's belief that the end user is an idiot. The end user may not understand XML or any other standard, but I have faith enough in the intelligence of people to understand a claim that such and such a thing does not conform to a standard.
But I guess we can't agree on that matter.
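The kind of error reporting described above could be sketched roughly as follows. This is an illustration, not code from any real aggregator: the function name `describe_feed_problem` is hypothetical, and the message wording is adapted from the comment. The idea is simply to catch the parser's exception and name the feed, rather than surfacing the raw parser error.

```python
import xml.etree.ElementTree as ET


def describe_feed_problem(feed_url, xml_text):
    """Return None if the feed is well-formed, else a plain-language message.

    The raw parser error (line, column, token) is deliberately withheld from
    the message; a real program might put it behind a "More Info" link.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return (
            f"Newsfeed at {feed_url} is not conforming to the technical "
            "standards for newsfeeds. Click More Info to learn more."
        )
    return None


# Hypothetical broken feed: <channel> is never closed.
broken = "<rss><channel><title>My Info</title></rss>"
print(describe_feed_problem("http://www.myinfo.com/newsfeed7", broken))
```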
- Breaking Industry Standards A Competitive Advantage?
2003-01-23 10:08:12 Dare Obasanjo
I've heard your arguments before from other people and don't agree with them. Thankfully, those of us who work on core XML technologies at Microsoft don't take this attitude towards XML and related standards simply to gain "competitive advantage". If we did, many of the gains that XML brings to our users due to its reusability and ability to foster interoperability would be lost.
Your article highlights a mini tragedy of the commons. If XML applications that process RSS documents begin to lean towards processing ill-formed XML, then when RSS files are reused, as many XML formats are wont to be (e.g. some mention using RSS for weblog archives, others have suggested using it as a general push technology), this sloppiness and lack of standards adherence will creep into these avenues as well.
All in all it's interesting to read a column called Dive Into XML on a website called XML.com which encourages poisoning the XML in the name of "competitive advantage".
- Robustness Principle
2003-01-23 10:40:11 Mark Pilgrim
This has nothing to do with the tragedy of the commons (boy, there's an overused phrase). It has everything to do with the Robustness Principle that Postel nailed years ago in RFC 793: "TCP implementations will follow a general principle of robustness: be conservative in what you do, be liberal in what you accept from others." The same applies here: validators and programs that produce RSS should be as conservative as possible; end user tools that consume RSS should be as liberal as possible. They serve different masters.
I'm tired of arguing with you, Dare. Despite your misrepresentation, we can all see for ourselves that my article clearly demonstrates an actual problem, describes a workaround for consuming tools, and pushes for not one but two long-term social solutions (the centralized advocacy effort at Syndic8, and the decentralized solution of making non-well-formedness visible to the end user).
Meanwhile, it's ironic that you hold up Microsoft as the epitome of XML standards compliance. What short memories we have! Have a quick look back in the XML mailing list archives to see all the confusion their ultra-liberal MSXML parser caused with people who mistook it for an actual validating XML parser. ("Whatdya mean my XML's not well-formed? It looks fine when I open it in IE!") That was not the place to parse at all costs; this is.
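A consumer that is liberal in what it accepts, while still making non-well-formedness visible, might look something like the sketch below. It is a deliberately simplified illustration and not the parser from the article: the function name `feed_titles` and the regular-expression fallback are assumptions made here, and the code only handles un-namespaced `title` elements as found in RSS 0.9x/2.0.

```python
import re
import xml.etree.ElementTree as ET

# Crude fallback pattern; assumes plain <title>...</title> with no attributes.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.DOTALL | re.IGNORECASE)


def feed_titles(xml_text):
    """Return (titles, was_well_formed) for every <title> in the feed.

    A real XML parser is tried first. Only if that fails does the function
    fall back to pattern matching, and the flag records that it had to.
    """
    try:
        root = ET.fromstring(xml_text)
        return [t.text or "" for t in root.iter("title")], True
    except ET.ParseError:
        return [m.strip() for m in TITLE_RE.findall(xml_text)], False


# Hypothetical feed with a bare ampersand and a missing </rss>.
bad = "<rss><channel><title>A</title><item><title>B & C</title></item></channel>"
titles, ok = feed_titles(bad)
print(titles, ok)
```

The returned flag is what lets a user interface mark the feed as broken while still showing its contents.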
- You Prove My Point
2003-01-23 11:11:58 Dare Obasanjo
Actually a number of our customers regularly praise the standards compliance of MSXML.
Unfortunately, we also have customers who mistakenly assume that viewing XML in Internet Explorer causes it to be processed by the validating XML parser rather than the parser that checks only well-formedness, which is not the case. This design decision was before my time, but it was most likely motivated by good intentions similar to yours: reducing user pain and ensuring that even invalid but well-formed XML was viewable in the browser. No one stopped to think about what would happen downstream when people assumed that
viewable in IE == well-formed & validated XML
instead of just
viewable in IE == well-formed XML
Your attempted slur actually helps bolster my point as to why your article should not be encouraging supposedly "user-friendly" but standards-nonconformant behavior.
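The gap between "well-formed" and "well-formed and validated" can be illustrated with a small sketch. Python's standard library has no validating XML parser, so the second step here is a hand-written structural check standing in for real DTD or schema validation; that substitution, the function name `classify`, and the specific rules checked are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def classify(xml_text):
    """Classify a document as 'ill-formed', 'well-formed only', or 'conforming'.

    The 'conforming' test is a simplified stand-in for validation: it only
    checks for an <rss> root containing a <channel> with a <title>.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return "ill-formed"
    channel = root.find("channel") if root.tag == "rss" else None
    if channel is None or channel.find("title") is None:
        return "well-formed only"
    return "conforming"


# Hypothetical documents, invented for this example.
print(classify("<rss><channel><title>News</title></channel></rss>"))
print(classify("<rss><chanel><title>News</title></chanel></rss>"))
print(classify("<rss><channel><title>News</channel></rss>"))
```

The second document would display without complaint in any tool that checks only well-formedness, which is the confusion described above.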