Directory Of Feed Parsers
I’m only doing a comparison of parsers here, not feed readers or parsers embedded within feed readers that aren’t available as a separate download, although I suppose… I’m really using a very loose interpretation of the word “parser” here.
| Parser | Language | Rating | Notes |
|---|---|---|---|
| 13th RSS 1.0 to Anything | PHP | Useless | Only supports RSS 1.0. As such, not very useful. |
| Atom.NET | .NET | Poor | Supports only Atom 0.3. Badly. |
| CaRP / Grouper Evolution | PHP | Free: Useless; Commercial: Not Sure | CaRP has built-in caching support, but the cache can be difficult to set up. The free version of CaRP is decent for just displaying someone else’s content, but utterly useless for anyone who actually wants a proper parser. Apparently, Grouper Evolution has support for Atom, and if you want access to the actual data, the non-free API will give it to you. |
| FeedTools | Ruby | I’m Biased | I wrote it. I think it’s pretty good, and there’s a bunch of people who use it and seem to like it. It’s far from perfect, but it does a lot better than most. Which isn’t saying much. |
| Informa | Java | Halfway Decent | I haven’t used it, but from what I’ve seen of output from programs that do, it does a fairly good job. However, it doesn’t support Atom 1.0. |
| Jakarta FeedParser | Java | Halfway Decent | I haven’t used it. Output of programs that do seems to be pretty decent. Supports Atom up through the version 0.5 draft. |
| lastRSS.php | PHP | Useless | Doesn’t support any version of Atom and uses regular expressions to parse. |
| Magpie | PHP | Decent | Exposes the data pretty well, but can be difficult to use. |
| PEAR::Package::XML_Feed_Parser | PHP | Not Sure | Looks like one of the better PHP parsers around, at least on paper, but I haven’t used it, so I don’t want to call it “good” unless someone wants to vouch for it. |
| PEAR::Package::XML_RSS | PHP | Useless | Only supports RSS 1.0. As such, not very useful. |
| PyFeed | Python | Fair | Support for RSS 2.0 and Atom 1.0 parsing and generation. The code is still at a very early stage. Also supports OPML. |
| RSS.NET | .NET | Poor | No support for Atom of any kind. |
| Ruby Standard Library RSS Parser | Ruby | Poor | No support for Atom of any kind. |
| RDF (RSS) Parser | PHP | Useless | Only supports RSS 1.0. As such, not very useful. |
| Rome | Java | Good | Supports all of the major feed formats, including Atom 1.0. It’s a solid contender. |
| RSS-Parser | PHP | Useless | Doesn’t support Atom. Has caching support that requires MySQL. |
| rss2array | PHP | Useless | Judging from the code, this script will die on redirects. That’s kinda bad. |
| SimplePie | PHP | Good | Passes many of the Atom conformance tests, and can read all but the most obscure Atom edge cases. Parses RSS quite well. It’s almost certainly the best PHP-based parser right now. |
| Simple RSS | Ruby | Halfway Decent | Very, very flexible, but also easy to break. |
| Suttree PHP RSS parser | PHP | Useless | Not really a proper parser. Doesn’t seem to handle Atom. |
| TailRank FeedParser | Java | Decent | An upgraded version of the Jakarta FeedParser that handles Atom 1.0, among other things. |
| Universal Feed Parser | Python | Excellent | Considered by some to be the “golden standard in complete liberal feed parsing”, it offers more unit tests per square inch than all of the competing solutions combined. It’s still not quite perfect though, and does fail a few of the Atom conformance tests at the moment. |
| Untitled RSS Parser | PHP | Useless | Doesn’t support Atom, and would require effort to adapt to other settings. |
| XML::RSS | Perl | Not Sure | I’ve never used it, and know nothing of its capabilities. |
| XML::RSS::Parser | Perl | Not Sure | I’ve never used it, and know nothing of its capabilities. |
| XML::RSSLite | Perl | Not Sure | I’ve never used it, and know nothing of its capabilities. |
Or more accurately, a list of things which claim to parse feeds but generally do a bloody terrible job of it. However, there are a few on the list that actually manage to do reasonably well.
I will try to keep this list up to date, so let me know if I’ve missed a parser or if the links to any of the parsers on the list go dead. Feel free to argue with my ratings, since they’re completely subjective, or suggest a rating for the ones I’ve marked “Not Sure”, but be aware that I consider any parser that fails to parse Atom (or similarly, if it fails to parse RSS) to be “poor” unless the author also wrote an adequate sibling parser that does parse Atom.
Thanks for including my XML_Feed_Parser (PHP). I started on it as I’d appreciated the Universal Feed Parser when working in python, but sometimes have to work with PHP and needed a parser I could rely on that had a decent level of abstraction.
I’ve spent some time harnessing the ufp tests and most of the relevant ones seem to run well. There are a few unicode bugs that I am aware of and trying to find workarounds for, but I’m hampered by a lack of time and PHP5’s lack of a good (core) unicode implementation.
James: Ruby doesn’t have a particularly good unicode implementation either. I’m really curious what the switch to Ruby 2.0 will mean for FeedTools.
The XML_Feed_Parser package looked pretty good on paper, but I couldn’t tell how complete it was and I didn’t want to mistakenly call it good without having tried it. I was a lot more willing to call stuff “Worthless” than “Good” since it’s pretty easy to spot crap, but quality is a lot more subtle and very often a matter of taste.
SimplePie shows some promise—and <q>it’ll parse [Atom] as long as you’re not trying to show off how smart you are by doing all kinds of wicked crazy things with it</q>—but I’ll leave the rating determination to you.
TailRank’s Feed Parser is good. It’s based on the Jakarta FeedParser contributed by Rojo.
Could you be prevailed upon to add a specific comment or two to explain why you rated each as you did? As the list now stands, there’s no way to know whether your assessments are based on a quick glance at each product which missed a bunch of features, whether your assessment is biased by your personal preference for specific features which may not matter to people looking at the list (e.g. some people may not care whether a parser handles Atom or has a companion script that does), or whether you’ve really given each a thorough look and rated it accordingly. Do you just care about the quality of what the parser spits out for another program to use, or are you looking for strong HTML formatting features so that you don’t have to write your own code to handle that?
All that said, I see that this list is only a day old. I presume it will fill in more with age.
Peter: Well, SimplePie certainly has the whole marketing thing going on, though I’m not sure that yoinking the 37signals design gets them bonus points. But the demo link certainly does make it easier to evaluate their parser (well, not the part where they used Flash, but otherwise it’s handy). If only every parser on the list had such a thing. (Hmm, maybe I should start with myself.)
...pulls out Atom conformance tests…
Since it pretty much failed almost every single Atom conformance test, I think it’s safe to say that SimplePie is currently not a good choice for anything but RSS, which it seems to parse fairly reliably. In all fairness, however, I don’t think there’s any parser library that manages to pass the entire Atom conformance test suite. Only Snarfer seems to pass the whole thing, but it’s a reader, not a parser.
Antone: The “directory” was mainly created for my own purposes—namely, determining how prevalent the practice of supporting RSS but not Atom was. I’m working on a program that will produce feeds, and I have been strongly considering outputting Atom exclusively.
And honestly, I’m also hoping that giving “worthless” ratings will cause people to adapt. Google gives this site a fairly decent PageRank, which means that chances are pretty good that this page is going to show up on a lot of people’s searches. If I influence parser authors for the better, that can only be a good thing.
As for the ratings, they’re pretty simple. For a “decent” rating, you need to support, at a minimum, RSS 1.0, RSS 2.0, Atom 0.3, and some portion of Atom 1.0 (where “support” is mostly defined as “lets the programmer get at the relevant information easily”). The parser doesn’t have to get it right every time, but it shouldn’t do anything blatantly and horrendously stupid. Like confuse a self link with an alternate link. For a “good” rating or better, the parser author needs to have actually read the specifications and paid some attention to code design. A rating of “poor” generally means that the parser code works but doesn’t support enough of the common feed formats. A rating of “worthless” means that the code is sufficiently broken that it won’t display many common feeds and/or fails to support either Atom or RSS.
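To make the self-versus-alternate mix-up concrete, here’s a minimal sketch in Python using only the standard library. The helper name `pick_alternate_link` and the sample entry are made up for illustration, and this isn’t how any parser on the list actually does it. The one rule it leans on comes from the Atom 1.0 spec (RFC 4287): a link with no `rel` attribute is treated as `alternate`.

```python
import xml.etree.ElementTree as ET

# Namespace of Atom 1.0 elements, in ElementTree's {uri}tag notation.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def pick_alternate_link(entry):
    """Return the href of the first alternate link in an Atom 1.0 entry, or None.

    Hypothetical helper for illustration only; a real parser would also
    weigh the type and hreflang attributes.
    """
    for link in entry.findall(ATOM_NS + "link"):
        # RFC 4287: a missing rel attribute means rel="alternate".
        rel = link.get("rel", "alternate")
        if rel == "alternate":
            return link.get("href")
    return None


# Made-up sample entry: the self link comes first on purpose, so a parser
# that just grabs the first <link> would get the wrong one.
SAMPLE_ENTRY = """
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Example</title>
  <link rel="self" href="http://example.org/feed/entry-1.atom"/>
  <link href="http://example.org/2006/entry-1.html"/>
</entry>
"""

entry = ET.fromstring(SAMPLE_ENTRY.strip())
print(pick_alternate_link(entry))
```

A parser that simply takes the first link element here would hand back the self link, which is exactly the kind of blatantly stupid behavior that costs a “decent” rating.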
Also, I was going to put my reasoning in on the table, but there wasn’t really enough room. :-P
As for the cursory glance thing, yeah, for most of these, mostly I’m just giving them a once over, looking at example code, feature lists, and changelogs. The primary exceptions are CaRP, FeedTools (which I wrote), lastRSS.php, Magpie, Rome, and the Universal Feed Parser, all of which I’ve used in some capacity at some point. For those parsers, I definitely stand by my ratings (or in the case of FeedTools, lack of rating). But seriously, feel free to argue over a rating if you’ve actually used a particular parser and you think I’ve rated it too harshly or if I’ve overrated it.
Thanks for the clarification.
A comment about CaRP: it has a companion script, Grouper Evolution, for handling Atom feeds (the next major release will have native Atom support). You can see the two in action on, for example, Tim Bray’s feed at API. So certainly if what you’re after is a tree full of parsed data or something like that, CaRP Free isn’t going to do you much good.
Antone: I didn’t realize Grouper Evolution existed. Will update the description to reflect this. And yes, getting the data out is kind of important.
I’ve used all of the Ruby ones and most of the PHP ones and I can rave about FeedTools if you like. So far, since switching I haven’t noticed any broken feed stuff that I was seeing before with SimpleRSS and I got a ton more functionality.
Thanks for writing it.
Jim: No raving necessary as ruby stuffs are a lot easier to try out thanks to rubygems and irb. I’d just as soon let the code speak for itself.
Glad you found ROME useful.
Nick: If I’m stuck in Java, Rome is definitely where I end up turning to. But I’ve only once had to deal with feeds in Java that I can remember. Mainly because my employer (a Java-only shop) isn’t likely to require feed parsing anytime soon. It really doesn’t make sense within the context of what they’re doing.
Fortunately, I get to play with Ruby in my spare time. Nothing against Rome, you see, but Ruby just makes me happier.
As a note, Atom support is a very high priority for SimplePie. We’ve actually got the testcases from feedparser.org, and we’re in the process of going through all 3000+ of them to test (and significantly improve) support.
Our current goal is to have SimplePie pass at least 75% of the testcases by the time we release 1.0. Our goal is to hit 100% compliance as soon as possible afterwards (no later than a 1.2 release). Keep in mind that we’re still in beta 1 stage.
As far as “yoinking” the 37signals design, that wasn’t my intention. 37signals’ design is very inspiring, yes. But black text on a plain white background with light yellow highlighted text for emphasis isn’t exactly ripping off a design any more than the millions of other sites that do the same thing. I think the thing that screams 37signals is the blue gradient at the top, and the location of the menu. I created the graphic myself by eyeballing the other one, yes, and all of the code is my own.
Either way, I’ve gotten a few comments claiming “rip-off”, and although that isn’t true, I’m in the process of modifying the design so that people stop crying foul.
But please, keep an eye on SimplePie. It’s critics like you that help us make a better parser. Keep the comments coming.
You are right, my clunky old PHP RSS parser doesn’t support Atom (not sure that it even existed all those years ago) nor is it any use.
When I wrote it, of course, it was a bit more useful but, to be honest, I’d go with the Magpie parser myself.
Ryan: Well, personally, I couldn’t care less about “ripping” anything off. I’m a firm proponent of “view source.” But it’s always nice if you put a link, say in the footer, along the lines of “Inspired by the fine 37Signals.” It’s not like we aren’t already aware of this.
Duncan: It quite probably didn’t exist at the time. I think that’s the case for a lot of the “parsers” on this list that support only RSS and not Atom. Unfortunately, with so many people referring to “feeds” as “RSS”, I’m slightly concerned that people don’t realize that their parser won’t do everything it should.
SimplePie Beta 3 is out now. Support for gzipped feeds, IDNs, date-related RFCs/ISOs, and extending built-in classes!
Awesome job Ryan. May I recommend that you change your code for autodiscovery to select Atom feeds over RSS 2.0 feeds?
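For what it’s worth, here’s roughly what I mean, as a minimal sketch in Python rather than SimplePie’s PHP. It is not SimplePie’s actual code. The names `FeedLinkFinder`, `discover_feed`, and `PREFERRED_TYPES` are made up for this example, and the preference order is just my suggestion, not anything a spec requires.

```python
from html.parser import HTMLParser

# Assumed preference order for this sketch: Atom first, then RSS.
PREFERRED_TYPES = ["application/atom+xml", "application/rss+xml"]


class FeedLinkFinder(HTMLParser):
    """Collect (type, href) pairs from <link rel="alternate"> tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        # rel can hold several space-separated values; attribute values
        # may be None when the attribute has no value.
        rels = (attrs.get("rel") or "").lower().split()
        mime = (attrs.get("type") or "").lower().strip()
        href = attrs.get("href")
        if "alternate" in rels and mime in PREFERRED_TYPES and href:
            self.feeds.append((mime, href))


def discover_feed(html):
    """Return the href of the most preferred feed link, or None."""
    finder = FeedLinkFinder()
    finder.feed(html)
    # sorted() is stable, so document order breaks ties within a type.
    ranked = sorted(finder.feeds, key=lambda f: PREFERRED_TYPES.index(f[0]))
    return ranked[0][1] if ranked else None


# Made-up page that lists the RSS feed before the Atom feed.
SAMPLE_PAGE = """
<html><head>
<link rel="alternate" type="application/rss+xml" href="/feed.rss">
<link rel="alternate" type="application/atom+xml" href="/feed.atom">
</head><body></body></html>
"""

print(discover_feed(SAMPLE_PAGE))
```

The sketch returns the href exactly as written in the page, so a real implementation would still need to resolve relative URLs against the page address.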