FeedTools 0.2.27
It’s been a long time coming I guess. Haven’t really had much time to work on it, and when I have, I really haven’t wanted to. Of all the software I’ve ever written, I consider FeedTools to be the most embarrassingly bad. Ironically, it’s probably also the most popular piece of software I’ve ever written.
Anyways, it’s been so long since I made some of the changes that I have listed in the CHANGELOG that I don’t even remember making them. They probably really did happen. Everything finally green-bars in any case. I ditched HTree and replaced it with html5lib. I was surprised by how easy the transition was, but it’s definitely slower for it. I fixed the issues with resolving relative URIs. Got rid of some ugly hacks, added a few more. The schema for the cache changed slightly. On balance, it’s better than 0.2.26, but in keeping with tradition, it’s also a little slower. If you need speed, you’ve come to the wrong place. That hasn’t changed at all.
I’ve learned a lot in the time (What has it been now? 3 years?) since I originally set out to write this parser. Certainly the most important lesson I learned was, “If you build on a lousy foundation, the entire thing’s going to be unstable. At best.” I’m not sure why I didn’t realize much earlier on that REXML was a mistake. Once I was far enough in that I couldn’t easily turn back, I started trying very hard not to make the original mistake in other areas. Incidentally, that’s how Addressable came into existence. It became clear to me that Ruby’s built-in URI parser was terrible, so I stopped using it and wrote a replacement. A replacement that I now swear by. If FeedTools is my greatest embarrassment, Addressable is (for the moment) my pride and joy. (Actually, that might be a bit much. It’s still just a URI parser.)
Also, while I was at it, I put the API up again.
Hopefully I won’t have to touch FeedTools again for a very long time.
I agree that rexml is painful. What would you do if you had it all to do over again? Would you choose to use something other than rexml?
(Hobo requires rexml 3.1.6 and breaks with later packages, but I need bug fixes present in 3.1.7.1. REXML FAIL.)
Yeah, I’d probably opt for libxml2.
I have a pure Ruby script (not using Rails) that is using FeedTools and sqlite3 for caching.
I can run the script fine from the command line via:
<macro:code>ruby myscript.rb</macro>
But when I run the script via cron with the following:
<macro>/usr/bin/ruby /path/to/myscript.rb</macro>
I get the following error for each feed I try to process:
<macro:code>
The feed cache seems to be having trouble with the find_by_href method. This may cause unexpected results.
The feed cache seems to be having trouble with the find_by_href method. This may cause unexpected results.
The feed cache seems to be having trouble with the find_by_href method. This may cause unexpected results.
The feed cache seems to be having trouble with the find_by_href method. This may cause unexpected results.
The feed cache seems to be having trouble with the find_by_href method. This may cause unexpected results.
Error processing group: 1, feed: http://hosted.ap.org/lineups/POLITICSHEADS-rss_2.0.xml?SITE=CTNHR&SECTION=HOME
#<runtimeerror:>
/usr/lib/ruby/gems/1.8/gems/feedtools-0.2.29/lib/feed_tools/feed.rb:190:in `update!'
/usr/lib/ruby/gems/1.8/gems/feedtools-0.2.29/lib/feed_tools/feed.rb:155:in `open'
</macro>
Any thoughts? I’m baffled.
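For anyone hitting the same thing: a script that works from a shell but fails under cron is often tripping over cron's different working directory and stripped-down environment, which breaks relative paths to things like a SQLite cache file. That is only a guess about the report above, not a confirmed diagnosis. A minimal sketch of anchoring paths to the script's own location, using only Ruby's standard library (the helper names and `feeds.db` are hypothetical, not part of FeedTools):

```ruby
# Directory containing the given script, as an absolute path.
def script_dir(script_file)
  File.dirname(File.expand_path(script_file))
end

# Absolute path for a file that lives beside the script.
# Both the helper and the file name passed to it are hypothetical.
def path_beside(script_file, name)
  File.join(script_dir(script_file), name)
end

if __FILE__ == $0
  # Make relative lookups behave the same under cron as in a shell.
  Dir.chdir(script_dir(__FILE__))
  # Log what the job actually sees; cron's PATH is usually much shorter.
  warn "cwd=#{Dir.pwd} PATH=#{ENV['PATH']}"
end
```

Called as `path_beside(__FILE__, "feeds.db")`, this gives the same absolute path no matter which directory the job was started from. Comparing the logged `cwd` and `PATH` between a shell run and a cron run should show whether the environment is the culprit.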
Oh great. At the moment we have the same problem. But FeedTools seems to be a nice solution for this. Thanks for the post.