Sporkmonger

purveyor of fabulously ambiguous eating utensils

Directory Of Feed Parsers

Posted by sporkmonger
Written February 27th, 2006

I’m only comparing parsers here, not feed readers or parsers embedded within feed readers that aren’t available as a separate download. Even so, I suppose I’m using a very loose interpretation of the word “parser”.


Parser Language Rating
13th RSS 1.0 to Anything PHP Useless
Only supports RSS 1.0. As such, not very useful.
Atom.NET .NET Poor
Supports only Atom 0.3. Badly.
CaRP / Grouper Evolution PHP Free: Useless; Commercial: Not Sure
CaRP has built-in caching support, but the cache can be difficult to set up. The free version of CaRP is decent for just displaying someone else’s content, but utterly useless for anyone who actually wants a proper parser. Apparently, Grouper Evolution has support for Atom, and if you want access to the actual data, the non-free API will give it to you.
FeedTools Ruby I’m Biased
I wrote it. I think it’s pretty good, and there are a bunch of people who use it and seem to like it. It’s far from perfect, but it does a lot better than most. Which isn’t saying much.
Informa Java Halfway Decent
I haven’t used it, but from what I’ve seen of the output of programs that do, it does a fairly good job. However, it doesn’t support Atom 1.0.
Jakarta FeedParser Java Halfway Decent
I haven’t used it. The output of programs that do seems pretty decent. Supports Atom up through the version 0.5 draft.
lastRSS.php PHP Useless
Doesn’t support any version of Atom and uses regular expressions to parse.
Magpie PHP Decent
Exposes the data pretty well, but can be difficult to use.
PEAR::Package::XML_Feed_Parser PHP Not Sure
Looks like one of the better PHP parsers around, at least on paper, but I haven’t used it, so I don’t want to call it “good” unless someone wants to vouch for it.
PEAR::Package::XML_RSS PHP Useless
Only supports RSS 1.0. As such, not very useful.
PyFeed Python Fair
Support for RSS 2.0 and Atom 1.0 parsing and generation. The code is still at a very early stage. Also supports OPML.
RSS.NET .NET Poor
No support for Atom of any kind.
Ruby Standard Library RSS Parser Ruby Poor
No support for Atom of any kind.
RDF (RSS) Parser PHP Useless
Only supports RSS 1.0. As such, not very useful.
Rome Java Good
Supports all of the major feed formats, including Atom 1.0. It’s a solid contender.
RSS-Parser PHP Useless
Doesn’t support Atom. Has caching support that requires MySQL.
rss2array PHP Useless
Judging from the code, this script will die on redirects. That’s kinda bad.
SimplePie PHP Good
Passes many of the Atom conformance tests, and can read all but the most obscure Atom edge cases. Parses RSS quite well. It’s almost certainly the best PHP-based parser right now.
Simple RSS Ruby Halfway Decent
Very, very flexible, but also easy to break.
Suttree PHP RSS parser PHP Useless
Not really a proper parser. Doesn’t seem to handle Atom.
TailRank FeedParser Java Decent
An upgraded version of the Jakarta FeedParser that handles Atom 1.0, among other things.
Universal Feed Parser Python Excellent
Considered by some to be the “golden standard in complete liberal feed parsing”, it offers more unit tests per square inch than all of the competing solutions combined. It’s still not quite perfect though, and does fail a few of the Atom conformance tests at the moment.
Untitled RSS Parser PHP Useless
Doesn’t support Atom, and would require some effort to adapt to other settings.
XML::RSS Perl Not Sure
I’ve never used it, and know nothing of its capabilities.
XML::RSS::Parser Perl Not Sure
I’ve never used it, and know nothing of its capabilities.
XML::RSSLite Perl Not Sure
I’ve never used it, and know nothing of its capabilities.
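lastRSS.php loses points partly for parsing with regular expressions instead of a real XML parser. As a small illustration of why that’s fragile, here is a sketch in Python (not PHP, so it is not lastRSS.php’s actual code). The two feed snippets and both function names are hypothetical. The regex misses a namespace-prefixed Atom title entirely and leaves escaped entities undecoded, while the standard-library XML parser handles both.

```python
import re
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Two hypothetical Atom 1.0 snippets; both are perfectly legal XML.
PREFIXED = ('<a:feed xmlns:a="http://www.w3.org/2005/Atom">'
            '<a:title><![CDATA[Forks & Spoons]]></a:title></a:feed>')
ESCAPED = ('<feed xmlns="http://www.w3.org/2005/Atom">'
           '<title>Forks &amp; Spoons</title></feed>')


def title_by_regex(xml_text):
    """Naive extraction: assumes a literal, unprefixed <title> tag."""
    match = re.search(r"<title>(.*?)</title>", xml_text, re.S)
    return match.group(1) if match else None


def title_by_xml_parser(xml_text):
    """A real XML parser resolves namespaces, CDATA sections and entities."""
    root = ET.fromstring(xml_text)
    element = root.find(ATOM_NS + "title")
    return element.text if element is not None else None


for label, feed in (("prefixed", PREFIXED), ("escaped", ESCAPED)):
    print(label, repr(title_by_regex(feed)), repr(title_by_xml_parser(feed)))
```

Both snippets mean exactly the same thing to an XML parser; the regex only “works” on one of them, and even then it hands back the raw `&amp;`.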

This is, more accurately, a list of things that claim to parse feeds but generally do a bloody terrible job of it. That said, a few on the list actually manage to do reasonably well.

I will try to keep this list up to date, so let me know if I’ve missed a parser or if the links to any of the parsers on the list go dead. Feel free to argue with my ratings, since they’re completely subjective, or to suggest a rating for the ones I’ve marked “Not Sure”. Be aware, though, that I consider any parser that fails to parse Atom (or, similarly, one that fails to parse RSS) to be “poor” unless its author also wrote an adequate sibling parser that does.
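Since the ratings hinge on whether a parser copes with both Atom and RSS, the first job of any parser is working out which format it’s looking at. Here is a minimal sketch in Python, assuming the feed is well-formed XML (liberal parsers like the Universal Feed Parser also handle malformed feeds; this does not). The namespace URIs are the published ones for each format; the function name `detect_format` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Atom versions are identified by the namespace of the <feed> root.
ATOM_ROOTS = {
    "{http://www.w3.org/2005/Atom}feed": "Atom 1.0",
    "{http://purl.org/atom/ns#}feed": "Atom 0.3",
}

# RDF-based feeds share an rdf:RDF root; the <channel> namespace decides.
RDF_CHANNELS = {
    "{http://purl.org/rss/1.0/}channel": "RSS 1.0",
    "{http://my.netscape.com/rdf/simple/0.9/}channel": "RSS 0.90",
}


def detect_format(xml_text):
    """Guess the feed format from the root element and its namespace."""
    root = ET.fromstring(xml_text)
    if root.tag in ATOM_ROOTS:
        return ATOM_ROOTS[root.tag]
    if root.tag == "rss":
        # RSS 0.91, 0.92 and 2.0 use a namespace-free <rss> root and are
        # told apart by its version attribute.
        return "RSS " + root.get("version", "unknown")
    if root.tag == "{%s}RDF" % RDF_NS:
        for child in root:
            if child.tag in RDF_CHANNELS:
                return RDF_CHANNELS[child.tag]
        return "RDF (unknown)"
    return "unknown"


print(detect_format('<rss version="2.0"><channel/></rss>'))
print(detect_format('<feed xmlns="http://www.w3.org/2005/Atom"/>'))
```

A parser that only ever takes one of these branches is exactly the kind that ends up rated “poor” or “useless” above.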

Y2038 Feed

Posted by sporkmonger
Written December 28th, 2005

So, I’m curious: how many parsers and feed readers show an incorrect date for this feed, or just blow up, for that matter?

So far in my testing, NetNewsWire has been the only parser that correctly displays the date given in the feed. FeedTools can’t parse the date and reverts to the current date and time instead. So do Google Reader and Bloglines, as well as most of the online feed readers I tried it out with.

The Feed Validator does helpfully warn me of an “implausible date”, but the feed is perfectly valid Atom 1.0, so far as I can tell.
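Atom 1.0 dates are RFC 3339 timestamps, and nothing in that format stops at 2038. As a sketch, here is a post-2038 date parsed in Python, whose `datetime` type covers years 1 through 9999. The date string is hypothetical (it is not the date from this feed), `parse_atom_date` is a hypothetical helper, and fractional seconds are not handled. The `%z` directive accepts a trailing `Z` from Python 3.7 onward.

```python
from datetime import datetime


def parse_atom_date(value):
    """Parse an RFC 3339 date-time such as 2040-01-01T00:00:00Z.

    Sketch only: fractional seconds (e.g. ...:00.5Z) are not handled.
    """
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")


updated = parse_atom_date("2040-01-01T00:00:00Z")  # hypothetical date
print(updated.isoformat())
print(int(updated.timestamp()))  # well past what a signed 32-bit counter holds
```

Nothing here is exotic; a parser only has trouble if it squeezes the result into a type that can’t represent it.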

I’m honestly not too worried about parsers being unable to handle the concept of 2038 by the time it rolls around. By 2038, the concept of feeds will likely seem utterly obsolete. But I can’t help but wonder if some parsers will end up tossing an exception on this feed. I couldn’t find anything in Mark Pilgrim’s Universal Feed Parser (UFP) test suite for dates being out of range. The largest date among the tests seems to be some time in 2004.
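Why 2038 in particular: many systems store times as a signed 32-bit count of seconds since 1970-01-01 UTC (a 32-bit `time_t`), which runs out on 2038-01-19. Whether that is actually what trips up FeedTools, Google Reader or Bloglines here is an assumption, not something I’ve confirmed. The sketch below simulates such a counter in Python; `as_int32` and `to_datetime` are hypothetical helpers, not part of any feed parser.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1


def as_int32(seconds):
    """Store a seconds count the way a signed 32-bit time_t would,
    wrapping around on overflow (two's complement)."""
    return struct.unpack("<i", struct.pack("<I", seconds & 0xFFFFFFFF))[0]


def to_datetime(seconds):
    """Convert seconds since the Unix epoch to an aware UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


print(to_datetime(INT32_MAX))                # last second a 32-bit time_t can hold
print(to_datetime(as_int32(INT32_MAX + 1)))  # one second later, wrapped around
```

One second past the limit, the wrapped counter lands in December 1901, which is the sort of nonsense (or exception) a parser built on 32-bit timestamps might produce for a feed like this one.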