Article: Parsing RSS At All Costs
Subject: Agreed. So what's your solution?
Date: 2003-01-22 20:47:33
From: Max Daymon
Response to: Agreed. So what's your solution?
Build in functionality to report back to feeds providing garbage data. Make it easy to report to sites that their feeds are causing a problem.
The path of silently dealing with garbage data leads to excessive amounts of development time being spent on a problem which should take virtually no time. Further, it reflects poorly on the aggregator when it does run into a feed it can't deal with. Instead of blaming the feed, users now blame the tool for not handling it.
If I can't reasonably rely on RSS being well-formed and complying with an industry-standard specification, I'm more inclined to simply remove the functionality than to enter an endless back-and-forth battle of regular expressions and garbage data.
Put a fence at the top of the cliff, not an ambulance force at the bottom. Tools which generate problems will eventually fall from favor. All things considered, a 10% failure rate for such a technology seems promising. There was a time when it was hardly possible to find ANY well-formed web pages.
- Agreed. So what's your solution?
2004-03-04 09:56:00 Richard Prosser
[Reply]
As an end user, I want a news aggregator that works for whatever feeds I refer to, so I am very grateful for Mark's efforts.
I do understand the "well-formed" arguments, however, and the difficulties inherent in providing feedback. I suggest that we shame the poorly behaving sites by publishing their URLs for all the world to see, then issuing a press release.
How about naffrss.org?
- Auto-reporting
2003-01-22 21:07:36 Mark Pilgrim
[Reply]
Many feeds have no contact information, so this cannot be easily automated. Regardless, I believe efforts are underway to do exactly this (when possible) in the next release of Aggie. Users who care about such things can take the time to contact the content provider.
However, this does not negate the fact that, as an end-user product, the #1 responsibility of the software is to the end user. The end user wishes to read news, and has downloaded, installed, and possibly paid for a program to help them read news. If the program refuses to display news for reasons that the end user considers arcane and trivial, the user will find another program that does not throw such technical hissy fits.
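To make the contact-information problem concrete, here is a minimal sketch of what an aggregator could look for before attempting an automated report. It assumes an un-namespaced RSS 0.9x/2.0 feed that is already well-formed, and it checks only the optional `webMaster` and `managingEditor` channel elements. The function name `find_contact` is made up for illustration and is not taken from Aggie or any other aggregator.

```python
import xml.etree.ElementTree as ET


def find_contact(feed_xml):
    """Return the first contact address in an RSS channel, or None.

    Assumption: the feed is well-formed, un-namespaced RSS 0.9x/2.0.
    ET.fromstring raises ET.ParseError on anything that is not well-formed.
    """
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    if channel is None:
        return None
    # Both elements are optional, so many feeds will fall through to None.
    for tag in ("webMaster", "managingEditor"):
        text = channel.findtext(tag)
        if text and text.strip():
            return text.strip()
    return None
```

A result of `None` is the case described above: the feed gives the tool nobody to report to, so the report has to come from a user who cares enough to track down the provider.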
- Do both
2003-01-22 22:42:19 Chris Adams
[Reply]
Why not do both? If the XML validator fails, display an unobtrusive quality indicator like iCab's (the smiling face in the throbber changes to a frown for malformed HTML), automatically send some sort of request to a tracking site, and fall back to the error-prone all-costs parser.
The tracking site would be extremely valuable if it could track the buggy software instead of just individual sites. Feeding a crawler with, say, the weblogs.com feed would probably give a pretty accurate indicator of the relative quality of the RSS implementations. While the users may not care, the authors might be more motivated to get unlisted from the hall of shame.
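A rough sketch of the "do both" idea might look like the following. It is an illustration only, under several assumptions: the feed is un-namespaced RSS 0.9x/2.0, an in-memory `Counter` stands in for the tracking site, the producing software is guessed from the optional `generator` element, and the regex fallback is a deliberately crude stand-in for a real liberal parser. All function and variable names here are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical stand-in for the tracking site: malformed feeds per generator.
malformed_by_generator = Counter()


def guess_generator(text):
    """Best-effort guess at the producing software from a <generator> element."""
    match = re.search(r"<generator>(.*?)</generator>", text, re.S | re.I)
    return match.group(1).strip() if match else "unknown"


def liberal_titles(text):
    """Crude regex fallback: pull item titles out of text that is not well-formed."""
    pattern = r"<item\b.*?<title>(.*?)</title>"
    return [t.strip() for t in re.findall(pattern, text, re.S | re.I)]


def read_feed(text):
    """Return (item_titles, well_formed).

    Try a strict XML parse first. On failure, record the offending generator
    and fall back to the regex parser so the user still gets the news.
    """
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        malformed_by_generator[guess_generator(text)] += 1
        return liberal_titles(text), False
    titles = [(item.findtext("title") or "").strip() for item in root.iter("item")]
    return titles, True
```

The boolean in the return value is what would drive the smile-or-frown indicator, and the tally is grouped by generator rather than by URL, which is the piece that lets the report point at buggy software instead of individual sites.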