Sporkmonger

purveyor of fabulously ambiguous eating utensils

Directory Of Feed Parsers

Posted by sporkmonger
Written February 27th, 2006

I’m only comparing parsers here, not feed readers or parsers embedded within feed readers that aren’t available as a separate download. That said, I suppose I’m really using a very loose interpretation of the word “parser” here.


Parser | Language | Rating | Notes
13th RSS 1.0 to Anything | PHP | Useless | Only supports RSS 1.0. As such, not very useful.
Atom.NET | .NET | Poor | Supports only Atom 0.3. Badly.
CaRP / Grouper Evolution | PHP | Free: Useless; Commercial: Not Sure | CaRP has built-in caching support, but the cache can be difficult to set up. The free version of CaRP is decent for just displaying someone else’s content, but utterly useless for anyone who actually wants a proper parser. Apparently, Grouper Evolution has support for Atom, and if you want access to the actual data, the non-free API will give it to you.
FeedTools | Ruby | I’m Biased | I wrote it. I think it’s pretty good, and there are a bunch of people who use it and seem to like it. It’s far from perfect, but it does a lot better than most. Which isn’t saying much.
Informa | Java | Halfway Decent | I haven’t used it, but from what I’ve seen of the output from programs that do, it does a fairly good job. However, it doesn’t support Atom 1.0.
Jakarta FeedParser | Java | Halfway Decent | I haven’t used it. The output of programs that do seems to be pretty decent. Supports Atom up through the version 0.5 draft.
lastRSS.php | PHP | Useless | Doesn’t support any version of Atom and uses regular expressions to parse.
Magpie | PHP | Decent | Exposes the data pretty well, but can be difficult to use.
PEAR::Package::XML_Feed_Parser | PHP | Not Sure | Looks like one of the better PHP parsers around, at least on paper, but I haven’t used it, so I don’t want to call it “good” unless someone wants to vouch for it.
PEAR::Package::XML_RSS | PHP | Useless | Only supports RSS 1.0. As such, not very useful.
PyFeed | Python | Fair | Supports RSS 2.0 and Atom 1.0 parsing and generation, and also supports OPML. The code is still at a very early stage.
RSS.NET | .NET | Poor | No support for Atom of any kind.
Ruby Standard Library RSS Parser | Ruby | Poor | No support for Atom of any kind.
RDF (RSS) Parser | PHP | Useless | Only supports RSS 1.0. As such, not very useful.
Rome | Java | Good | Supports all of the major feed formats, including Atom 1.0. It’s a solid contender.
RSS-Parser | PHP | Useless | Doesn’t support Atom. Has caching support that requires MySQL.
rss2array | PHP | Useless | Judging from the code, this script will die on redirects. That’s kinda bad.
SimplePie | PHP | Good | Passes many of the Atom conformance tests, and can read all but the most obscure Atom edge cases. Parses RSS quite well. It’s almost certainly the best PHP-based parser right now.
Simple RSS | Ruby | Halfway Decent | Very, very flexible, but also easy to break.
Suttree PHP RSS parser | PHP | Useless | Not really a proper parser. Doesn’t seem to handle Atom.
TailRank FeedParser | Java | Decent | An upgraded version of the Jakarta FeedParser that handles Atom 1.0, among other things.
Universal Feed Parser | Python | Excellent | Considered by some to be the “golden standard in complete liberal feed parsing”, it offers more unit tests per square inch than all of the competing solutions combined. It’s still not quite perfect, though, and does fail a few of the Atom conformance tests at the moment.
Untitled RSS Parser | PHP | Useless | Doesn’t support Atom, and would require some effort to adapt to other settings.
XML::RSS | Perl | Not Sure | I’ve never used it, and know nothing of its capabilities.
XML::RSS::Parser | Perl | Not Sure | I’ve never used it, and know nothing of its capabilities.
XML::RSSLite | Perl | Not Sure | I’ve never used it, and know nothing of its capabilities.

Or more accurately, a list of things that claim to parse feeds but generally do a bloody terrible job of it. There are, however, a few on the list that actually manage to do reasonably well.

I will try to keep this list up to date, so let me know if I’ve missed a parser or if the links to any of the parsers on the list go dead. Feel free to argue with my ratings, since they’re completely subjective, or suggest a rating for the ones I’ve marked “Not Sure.” Be aware, though, that I consider any parser that fails to parse Atom (or, similarly, one that fails to parse RSS) to be “poor” unless the author also wrote an adequate sibling parser that handles the missing format.
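The lastRSS.php entry in the table is marked down partly for using regular expressions to parse. Here’s a minimal Python sketch of why that approach is fragile. It uses a generic regex, not lastRSS.php’s actual code, and the sample feed is made up: a pattern like `<title>(.*?)</title>` returns entities still encoded and CDATA markers still attached, while a real XML parser hands back the decoded text.

```python
import re
import xml.etree.ElementTree as ET

# Made-up sample: one entity-encoded title and one CDATA-wrapped title
FEED = """<rss version="2.0"><channel>
<title>Tom &amp; Jerry</title>
<item><title><![CDATA[Cats <b>and</b> mice]]></title></item>
</channel></rss>"""

# Generic regex scraping: grabs whatever raw characters sit between the tags
regex_titles = re.findall(r"<title>(.*?)</title>", FEED)

# An XML parser decodes the entity and unwraps the CDATA section
parsed_titles = [t.text for t in ET.fromstring(FEED).iter("title")]

print(regex_titles)
print(parsed_titles)
```

Here `regex_titles` comes back as `['Tom &amp; Jerry', '<![CDATA[Cats <b>and</b> mice]]>']`, while `parsed_titles` is `['Tom & Jerry', 'Cats <b>and</b> mice']`. The regex would also miss a title written as `<title type="html">`, which is common in Atom.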

  1. Written February 27th, 2006 at 10:44 PM

    Thanks for including my XML_Feed_Parser (PHP). I started on it as I’d appreciated the Universal Feed Parser when working in python, but sometimes have to work with PHP and needed a parser I could rely on that had a decent level of abstraction.

    I’ve spent some time harnessing the UFP tests, and most of the relevant ones seem to run well. There are a few Unicode bugs that I am aware of and am trying to find workarounds for, but I’m hampered by a lack of time and PHP5’s lack of a good (core) Unicode implementation.

  2. Written February 27th, 2006 at 11:34 PM

    James: Ruby doesn’t have a particularly good Unicode implementation either. I’m really curious what the switch to Ruby 2.0 will mean for FeedTools.

    The XML_Feed_Parser package looked pretty good on paper, but I couldn’t tell how complete it was and I didn’t want to mistakenly call it good without having tried it. I was a lot more willing to call stuff “Worthless” than “Good” since it’s pretty easy to spot crap, but quality is a lot more subtle and very often a matter of taste.

  3. Written February 28th, 2006 at 11:12 AM

    SimplePie shows some promise (and “it’ll parse [Atom] as long as you’re not trying to show off how smart you are by doing all kinds of wicked crazy things with it”), but I’ll leave the rating determination to you.

  4. Migs:
    Written February 28th, 2006 at 11:25 AM

    TailRank’s Feed Parser is good. It’s based on the Jakarta FeedParser contributed by Rojo.

  5. Written February 28th, 2006 at 11:26 AM

    Could you be prevailed upon to add a specific comment or two explaining why you rated each one as you did? As the list now stands, there’s no way to know whether your assessments are based on a quick glance at each product that missed a bunch of features, whether they are biased by your personal preference for specific features that may not matter to people looking at the list (e.g., some people may not care whether a parser handles Atom or has a companion script that does), or whether you’ve really given each a thorough look and rated it accordingly. Do you just care about the quality of what the parser spits out for another program to use, or are you looking for strong HTML formatting features so that you don’t have to write your own code to handle that?

    All that said, I see that this list is only a day old. I presume it will fill in more with age.

  6. Written February 28th, 2006 at 11:46 AM

    Peter: Well, SimplePie certainly has the whole marketing thing going on, though I’m not sure that yoinking the 37signals design gets them bonus points. But the demo link certainly does make it easier to evaluate their parser (well, not the part where they used Flash, but otherwise it’s handy). If only every parser on the list had such a thing. (Hmm, maybe I should start with myself.)

    …pulls out Atom conformance tests…

    • Doesn’t know a self link from a hole in the ground.
    • Fails link conformance test badly.
    • Fails order conformance test badly for the same reasons as the link conformance test. Not looking good so far.
    • Fails title conformance test badly.
    • Dies a horrible death trying to read Tim Bray’s feed. Actually explodes.
    • Fails the xml:base conformance test in exceptionally strange ways.
    • Passes the baseline test, but fails all xml namespace tests. The second test produces the text, but strips the tags, so I can only assume that it mistakenly identified them as unsafe markup.

    Since it failed almost every Atom conformance test, I think it’s safe to say that SimplePie is currently not a good choice for anything but RSS, which it seems to parse fairly reliably. In all fairness, however, I don’t think any parser library manages to pass the entire Atom conformance test suite. Only Snarfer seems to pass the whole thing, but it’s a reader, not a parser.
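Two of the failures listed above, confusing self links with other links and mishandling xml:base, come down to fairly small rules. In Atom 1.0 (RFC 4287), an atom:link with no rel attribute is an "alternate" link. Under XML Base, a relative href resolves against the xml:base in scope, which is inherited from ancestor elements. Here’s a minimal Python sketch of both rules using only the standard library. The function name `collect_links` and the sample feed are hypothetical. The sketch also ignores the URL the feed was fetched from, which a real parser would use as the outermost base.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = "{http://www.w3.org/2005/Atom}"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def collect_links(elem, base=""):
    """Yield (rel, absolute_href) for every atom:link under elem, in document order."""
    # xml:base on this element is resolved against the base inherited from ancestors
    base = urljoin(base, elem.get(XML_BASE, ""))
    if elem.tag == ATOM + "link":
        # RFC 4287: a link without a rel attribute is an "alternate" link
        rel = elem.get("rel", "alternate")
        yield rel, urljoin(base, elem.get("href", ""))
    for child in elem:
        yield from collect_links(child, base)


FEED = """<feed xmlns="http://www.w3.org/2005/Atom" xml:base="http://example.org/blog/">
  <link rel="self" href="feed.atom"/>
  <link href="index.html"/>
  <entry xml:base="2006/02/">
    <link href="post.html"/>
  </entry>
</feed>"""

for rel, href in collect_links(ET.fromstring(FEED)):
    print(rel, href)
```

The self link (`http://example.org/blog/feed.atom`) is where the feed itself lives. The rel-less link is the alternate HTML page, and the entry’s link picks up the nested base (`http://example.org/blog/2006/02/post.html`). The sketch doesn’t handle rel values written as full IRIs, which RFC 4287 also allows.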

  7. Written February 28th, 2006 at 12:01 PM

    Antone: The “directory” was mainly created for my own purposes—namely, determining how prevalent the practice of supporting RSS but not Atom was. I’m working on a program that will produce feeds, and I have been strongly considering outputting Atom exclusively.

    And honestly, I’m also hoping that giving “worthless” ratings will cause people to adapt. Google gives this site a fairly decent PageRank, which means that chances are pretty good that this page is going to show up on a lot of people’s searches. If I influence parser authors for the better, that can only be a good thing.

    As for the ratings, they’re pretty simple. For a “decent” rating, you need to support, at a minimum, RSS 1.0, RSS 2.0, Atom 0.3, and some portion of Atom 1.0 (where “support” is mostly defined as “lets the programmer get at the relevant information easily”). The parser doesn’t have to get it right every time, but it shouldn’t do anything blatantly and horrendously stupid, like confusing a self link with an alternate link. For a “good” rating or better, the parser author needs to have actually read the specifications and paid some attention to code design. A rating of “poor” generally means that the parser code works but doesn’t support enough of the common feed formats. A rating of “worthless” means that the code is sufficiently broken that it won’t display many common feeds and/or fails to support either Atom or RSS.

    Also, I was going to put my reasoning in on the table, but there wasn’t really enough room. :-P

    As for the cursory-glance thing, yeah: for most of these, I’m mostly just giving them a once-over, looking at example code, feature lists, and changelogs. The primary exceptions are CaRP, FeedTools (which I wrote), lastRSS.php, Magpie, Rome, and the Universal Feed Parser, all of which I’ve used in some capacity at some point. For those parsers, I definitely stand by my ratings (or, in the case of FeedTools, lack of rating). But seriously, feel free to argue over a rating if you’ve actually used a particular parser and think I’ve rated it too harshly or overrated it.
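The “decent” bar in the ratings explanation above starts with simply recognizing the four formats. Here’s a minimal Python sketch of how a parser might tell them apart by root element and namespace. RSS 2.0 uses a bare `<rss version="2.0">` root. RSS 1.0 is an `rdf:RDF` document whose channel sits in the `http://purl.org/rss/1.0/` namespace. Atom 0.3 and Atom 1.0 both use a `<feed>` root, but in different namespaces (`http://purl.org/atom/ns#` versus `http://www.w3.org/2005/Atom`). The function name `detect_format` is hypothetical. Real-world feeds are often malformed XML, which this sketch doesn’t attempt to recover from.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS10_NS = "http://purl.org/rss/1.0/"
ATOM03_NS = "http://purl.org/atom/ns#"
ATOM10_NS = "http://www.w3.org/2005/Atom"


def detect_format(xml_text):
    """Guess a feed's format from its root element and namespaces.

    Sketch only: assumes well-formed XML and raises ET.ParseError otherwise.
    """
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        # RSS 0.9x/2.0: un-namespaced <rss> root carrying a version attribute
        return "RSS " + root.get("version", "(no version)")
    if root.tag == "{" + RDF_NS + "}RDF":
        # RSS 1.0 is RDF whose <channel> lives in the RSS 1.0 namespace
        if root.find("{" + RSS10_NS + "}channel") is not None:
            return "RSS 1.0"
        return "unknown RDF"
    if root.tag == "{" + ATOM10_NS + "}feed":
        return "Atom 1.0"
    if root.tag == "{" + ATOM03_NS + "}feed":
        return "Atom 0.3"
    return "unknown"


SAMPLES = [
    '<rss version="2.0"><channel/></rss>',
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
    ' xmlns="http://purl.org/rss/1.0/"><channel/></rdf:RDF>',
    '<feed version="0.3" xmlns="http://purl.org/atom/ns#"/>',
    '<feed xmlns="http://www.w3.org/2005/Atom"/>',
]
for sample in SAMPLES:
    print(detect_format(sample))
```

Telling the formats apart is only the first step. Getting “decent” also means mapping each format’s elements onto one set of fields the programmer can get at easily.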

  8. Antone Roundy:
    Written February 28th, 2006 at 12:38 PM

    Thanks for the clarification.

    A comment about CaRP: it has a companion script, Grouper Evolution, for handling Atom feeds (the next major release will have native Atom support). You can see the two in action on, for example, Tim Bray’s feed at API. So, certainly, if what you’re after is a tree full of parsed data or something like that, CaRP Free isn’t going to do you much good.

  9. Written February 28th, 2006 at 01:21 PM

    Antone: I didn’t realize Grouper Evolution existed. I’ll update the description to reflect this. And yes, getting the data out is kind of important.

  10. jim:
    Written February 28th, 2006 at 06:21 PM

    I’ve used all of the Ruby ones and most of the PHP ones, and I can rave about FeedTools if you like. Since switching, I haven’t noticed any of the broken-feed problems I was seeing with SimpleRSS, and I got a ton more functionality.

    Thanks for writing it.

  11. Written February 28th, 2006 at 07:19 PM

    Jim: No raving necessary, as Ruby stuff is a lot easier to try out thanks to RubyGems and irb. I’d just as soon let the code speak for itself.

  12. Nick Lothian:
    Written February 28th, 2006 at 10:21 PM

    Glad you found ROME useful.

  13. Written February 28th, 2006 at 10:33 PM

    Nick: If I’m stuck in Java, Rome is definitely where I end up turning. But I can only remember one time I’ve had to deal with feeds in Java, mainly because my employer (a Java-only shop) isn’t likely to require feed parsing anytime soon. It really doesn’t make sense within the context of what they’re doing.

    Fortunately, I get to play with Ruby in my spare time. Nothing against Rome, you see, but Ruby just makes me happier.

  14. Written March 1st, 2006 at 01:21 AM

    As a note, Atom support is a very high priority for SimplePie. We’ve actually got the testcases from feedparser.org, and we’re in the process of going through all 3000+ of them to test (and significantly improve) support.

    Our current goal is to have SimplePie pass at least 75% of the testcases by the time we release 1.0. Our goal is to hit 100% compliance as soon as possible afterwards (no later than a 1.2 release). Keep in mind that we’re still in beta 1 stage.

    As far as “yoinking” the 37signals design goes, that wasn’t my intention. 37signals’ design is very inspiring, yes. But black text on a plain white background with light yellow highlighted text for emphasis isn’t exactly ripping off a design, any more than the millions of other sites that do the same thing are. I think the thing that screams 37signals is the blue gradient at the top and the location of the menu. I created the graphic myself by eyeballing the other one, yes, and all of the code is my own.

    Either way, I’ve gotten a few comments claiming “rip-off”, and although that isn’t true, I’m in the process of modifying the design so that people stop crying foul.

    But please, keep an eye on SimplePie. It’s critics like you that help us make a better parser. Keep the comments coming.

  15. Written March 1st, 2006 at 11:52 AM

    You are right: my clunky old PHP RSS parser doesn’t support Atom (I’m not sure Atom even existed all those years ago), nor is it of any use.

    When I wrote it, of course, it was a bit more useful but, to be honest, I’d go with the Magpie parser myself.

  16. Written March 1st, 2006 at 12:04 PM

    Ryan: Well, personally, I couldn’t care less about “ripping” anything off. I’m a firm proponent of “view source.” But it’s always nice if you put a link, say in the footer, along the lines of “Inspired by the fine 37Signals.” It’s not like we aren’t already aware of this.

  17. Written March 1st, 2006 at 12:10 PM

    Duncan: It quite probably didn’t exist at the time. I think that’s the case for a lot of the “parsers” on this list that support only RSS and not Atom. Unfortunately, with so many people referring to “feeds” as “RSS”, I’m slightly concerned that people don’t realize that their parser won’t do everything it should.

  18. Written November 3rd, 2006 at 05:17 PM

    SimplePie Beta 3 is out now. Support for gzipped feeds, IDNs, date-related RFCs/ISOs, and extending built-in classes!

  19. Written November 4th, 2006 at 06:36 AM

    Awesome job, Ryan. May I recommend that you change your code for autodiscovery to select Atom feeds over RSS 2.0 feeds?
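The autodiscovery suggestion above is easy to picture as code. HTML pages advertise their feeds with `<link rel="alternate" type="...">` elements, using `application/atom+xml` for Atom and `application/rss+xml` for RSS. Here’s a minimal Python sketch that collects those links and sorts Atom ahead of RSS. It is not SimplePie’s PHP code, and `discover_feeds` and `FeedLinkFinder` are hypothetical names.

```python
from html.parser import HTMLParser

# Lower number = preferred, so Atom sorts ahead of RSS
FEED_TYPES = {
    "application/atom+xml": 0,
    "application/rss+xml": 1,
}


class FeedLinkFinder(HTMLParser):
    """Collect autodiscovery <link rel="alternate"> feed URLs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.found = []  # (preference, document_order, href)

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        mime = (a.get("type") or "").lower().strip()
        href = a.get("href")
        if "alternate" in rels and mime in FEED_TYPES and href:
            self.found.append((FEED_TYPES[mime], len(self.found), href))


def discover_feeds(html):
    """Return feed URLs, Atom before RSS, keeping page order within each type."""
    finder = FeedLinkFinder()
    finder.feed(html)
    finder.close()
    return [href for _, _, href in sorted(finder.found)]


PAGE = """<html><head>
<link rel="alternate" type="application/rss+xml" href="/index.rss">
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/atom+xml" href="/index.atom">
</head><body></body></html>"""

print(discover_feeds(PAGE))
```

On the sample page, this returns `['/index.atom', '/index.rss']`, even though the RSS link comes first. The hrefs are returned as written, so real code would also resolve relative ones against the page URL with `urllib.parse.urljoin`.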
