Sporkmonger

purveyor of fabulously ambiguous eating utensils

FeedTools 0.2.27

Posted by sporkmonger
Written January 31st, 2008

It’s been a long time coming, I guess. Haven’t really had much time to work on it, and when I have, I really haven’t wanted to. Of all the software I’ve ever written, I consider FeedTools to be the most embarrassingly bad. Ironically, it’s probably also the most popular piece of software I’ve ever written.

Anyways, it’s been so long since I made some of the changes that I have listed in the CHANGELOG that I don’t even remember making them. They probably really did happen. Everything finally green-bars in any case. I ditched HTree and replaced it with html5lib. I was surprised by how easy the transition was, but it’s definitely slower for it. I fixed the issues with resolving relative URIs. Got rid of some ugly hacks, added a few more. The schema for the cache changed slightly. On balance, it’s better than 0.2.26, but in keeping with tradition, it’s also a little slower. If you need speed, you’ve come to the wrong place. That hasn’t changed at all.

I’ve learned a lot in the time (What has it been now? 3 years?) since I originally set out to write this parser. Certainly the most important lesson I learned was, “If you build on a lousy foundation, the entire thing’s going to be unstable. At best.” I’m not sure why I didn’t realize much earlier on that REXML was a mistake. Once I was far enough in that I couldn’t easily turn back, I started trying very hard not to make the original mistake in other areas. Incidentally, that’s how Addressable came into existence. It became clear to me that Ruby’s built-in URI parser was terrible, so I stopped using it and wrote a replacement. A replacement that I now swear by. If FeedTools is my greatest embarrassment, Addressable is (for the moment) my pride and joy. (Actually, that might be a bit much. It’s still just a URI parser.)

Also, while I was at it, I put the API up again.

Hopefully I won’t have to touch FeedTools again for a very long time.

Project List

Posted by sporkmonger
Written April 1st, 2007

I’m willing to admit to being responsible for:

There are some other bits of code floating around in various states of disrepair.

I’ve also got a couple of C libraries I’m working on, but I’m not likely to talk much about them until they’re further along.

Monkey Patching Goodness

Posted by sporkmonger
Written July 8th, 2006

FeedTools 0.2.25: Now with 625 lines of monkey patching, and all the same terrible performance you’ve come to expect!

I decided to extract all of my REXML monkey patches into a single file for this release instead of leaving them all in feed_tools.rb. Tests should all pass on Ruby 1.8.4 now. Sam Ruby’s feed should also be handled correctly again. His use of “.” as his link URI caused one of the parser’s heuristics to throw a hissy fit and misreport the feed’s URI as nil and the value of the feed’s link as the feed’s URI. Weird stuff. Anyways, that works again. (NetNewsWire was breaking on Sam’s feed last I checked.) HTTP redirection handling has been changed so that FeedTools won’t barf if a relative Location: header is supplied. And the parser should generally work a little better with FeedUpdater.
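Coping with a relative Location: header mostly comes down to resolving it against the URI that was just requested before following the redirect. The sketch below shows that with Ruby’s standard URI and Net::HTTP libraries. The helper names resolve_location and fetch_following_redirects are hypothetical, and this illustrates the technique rather than reproducing FeedTools’ actual redirect code.

```ruby
require 'net/http'
require 'uri'

# Resolve a Location: header value against the URI that was requested.
# Absolute values come back unchanged; relative ones such as
# "/feeds/atom.xml" or "atom.xml" are resolved against the request URI.
def resolve_location(request_uri, location)
  URI.join(request_uri.to_s, location.strip)
end

# Hypothetical helper: GET +url+, following at most +limit+ redirects.
def fetch_following_redirects(url, limit = 5)
  uri = URI(url)
  limit.times do
    response = Net::HTTP.get_response(uri)
    return response unless response.is_a?(Net::HTTPRedirection)

    # fetch raises KeyError if a redirect arrives without a Location: header.
    uri = resolve_location(uri, response.fetch('location'))
  end
  raise "Too many redirects while fetching #{url}"
end
```

Resolving every Location: value this way means absolute and relative redirects go through the same code path, so there is no special case to forget.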

I’ll probably make another release when I get around to integrating my new URI code. That will likely be the last release for quite some time, since virtually all of my free coding time will be spent on GentleCMS instead. Just a heads-up.

Please Don't Waste Other People's Bandwidth

Posted by sporkmonger
Written May 28th, 2006

This is just a quick note to everyone out there who’s using FeedTools. If you don’t enable some sort of caching mechanism, you are inevitably wasting other people’s bandwidth, and by extension, their money. Please be courteous and enable caching. It’s not hard to do so, just run this prior to using any other FeedTools functionality:

require 'feed_tools'
FeedTools.configurations[:feed_cache] = "FeedTools::DatabaseFeedCache"

For Rails users, those lines should go in config/environment.rb.

You’ll also need to create the cache table: either run the SQL table creation code in FeedTools’ db folder or, if you’re a Rails user, use the supplied migration file (also in the db folder) to update your database.

If you need to, it is also possible to create your own caching mechanisms for use with FeedTools.

In any case, the point is, you should never, ever use FeedTools in a production capacity without first enabling a caching mechanism. It’s just impolite.
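Part of what a cache buys you is that a feed only needs to be downloaded in full when it has actually changed. One common way to get there is an HTTP conditional GET: remember the ETag and Last-Modified headers from the last response, send them back as If-None-Match and If-Modified-Since, and treat a 304 Not Modified as “use the copy you already have.” The class below is a minimal, hypothetical sketch of that idea using only Ruby’s standard library. It is not FeedTools’ DatabaseFeedCache, and it keeps everything in memory rather than in a database.

```ruby
require 'net/http'
require 'uri'

# Hypothetical in-memory cache illustrating HTTP conditional GET.
# An unchanged feed costs a bodiless 304 response instead of a full download.
class ConditionalFeedCache
  Entry = Struct.new(:body, :etag, :last_modified)

  def initialize
    @entries = {}
  end

  # Validator headers to send with the next request for +url+.
  def request_headers(url)
    entry = @entries[url]
    return {} unless entry

    headers = {}
    headers['If-None-Match'] = entry.etag if entry.etag
    headers['If-Modified-Since'] = entry.last_modified if entry.last_modified
    headers
  end

  # Remember a full response so its validators can be reused later.
  def store(url, body, etag, last_modified)
    @entries[url] = Entry.new(body, etag, last_modified)
  end

  # Return the feed body, downloading it only if it has changed.
  def fetch(url)
    uri = URI(url)
    request = Net::HTTP::Get.new(uri)
    request_headers(url).each { |name, value| request[name] = value }

    response = Net::HTTP.start(uri.hostname, uri.port,
                               use_ssl: uri.scheme == 'https') do |http|
      http.request(request)
    end

    case response
    when Net::HTTPNotModified
      @entries[url].body # unchanged on the server: reuse the cached copy
    when Net::HTTPSuccess
      store(url, response.body, response['etag'], response['last-modified'])
      response.body
    else
      raise "Unexpected response #{response.code} for #{url}"
    end
  end
end
```

Calling cache.fetch on the same URL twice downloads the body once; on the second call the server can answer 304 if the feed hasn’t changed, provided it sent an ETag or Last-Modified header the first time.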

FeedUpdater and FeedTools

Posted by sporkmonger
Written April 13th, 2006

Sitting here at Canada on Rails… Just released two new gems, FeedTools 0.2.24 and FeedUpdater 0.1.0.

I rewrote the HTTP retrieval code for FeedTools, and it now has full support for HTTP proxies as well as basic auth. Even better, I pulled all the HTTP stuff out of the load_remote_feed! method and put it into its own helper module, so the advanced HTTP features are available throughout FeedTools. Additionally, somewhere in the midst of the rewrite, the strange timeouts that sometimes occurred when caching was enabled went away by magic. Yay.
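FeedTools’ own HTTP helper module isn’t shown here, but the two features mentioned, proxies and basic auth, map directly onto Ruby’s standard Net::HTTP. The following is a hypothetical sketch of how a feed request might be set up with both; build_feed_request and its keyword arguments are made up for illustration and are not part of the FeedTools API.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: prepare a GET for +url+ that optionally goes through
# an HTTP proxy (given as "http://user:pass@host:port") and optionally
# carries HTTP basic auth credentials for the feed itself.
def build_feed_request(url, proxy: nil, username: nil, password: nil)
  uri = URI(url)
  proxy_uri = proxy && URI(proxy)

  # Passing nil as the proxy address means "no proxy"; in that case
  # environment variables such as http_proxy are ignored.
  http = Net::HTTP.new(uri.hostname, uri.port,
                       proxy_uri&.hostname, proxy_uri&.port,
                       proxy_uri&.user, proxy_uri&.password)
  http.use_ssl = (uri.scheme == 'https')

  request = Net::HTTP::Get.new(uri)
  # Adds an "Authorization: Basic ..." header.
  request.basic_auth(username, password.to_s) if username
  [http, request]
end
```

To actually send it: `http, request = build_feed_request(url, proxy: "http://proxy.local:8080")`, then `response = http.start { |conn| conn.request(request) }`. Keeping connection setup in one helper is the same idea as pulling the HTTP code out of a single loading method: every caller gets proxy and auth support for free.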

There is also a handy new tool I wrote for using FeedTools correctly in a Rails app. It doesn’t have to be used within a Rails app, but that’s certainly where it will work best. It probably won’t work on Windows, though, because Windows lacks fork().

To use it, run sudo gem install feedupdater, then cd to your Rails app and run feed_updater install. This will install the feed_updater script into your Rails app’s script directory, add a new config file for controlling it, and unpack the FeedTools and FeedUpdater gems into your vendor directory. You will also need to write an updater script for FeedUpdater. You’ll probably want to put it in your lib folder and point to it in the config/feed_updater.yml config file.

Note: The install command will overwrite the currently existing feedtools or feedupdater directory in the vendor directory.

Here’s an example file (included with FeedUpdater, in the example directory):

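class CustomUpdater < FeedTools::FeedUpdater
  # Runs after each feed is loaded; seconds is how long the update took.
  on_update do |feed, seconds|
    logger.info(
      "Loaded '#{feed.href}'.")
    logger.info(
      "=> Updated (#{feed.title}) in #{seconds} seconds.")
  end

  # Runs when a feed fails to update.
  on_error do |href, error|
    logger.info("Error updating '#{href}':")
    logger.info(error)
  end

  # Runs once updating finishes, with the hrefs of the updated feeds.
  on_complete do |updated_feed_hrefs|
  end
end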

You would use the on_update method to copy data from the parsed feed into whatever database tables or other locations where it’s required.
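As a hypothetical sketch of that copying step, the class below stores one row per feed item, keyed by the item’s link so that repeated updates overwrite rows instead of duplicating them. It assumes the feed responds to href, title and items, and each item to title and link, as FeedTools::Feed and FeedTools::FeedItem are expected to (check the API docs). ItemStore is an invented in-memory stand-in for a real table; in a Rails app you would do the equivalent with your own models.

```ruby
# Hypothetical in-memory stand-in for a database table of feed items.
# Rows are keyed by item link, so saving the same feed twice replaces
# the earlier rows instead of adding duplicates.
class ItemStore
  def initialize
    @rows = {}
  end

  # Copy the fields we care about out of a parsed feed.
  # Assumes feed#href, feed#title, feed#items, item#title and item#link.
  def save_feed(feed)
    feed.items.each do |item|
      @rows[item.link] = {
        feed_href: feed.href,
        feed_title: feed.title,
        title: item.title,
        link: item.link
      }
    end
  end

  def rows
    @rows.values
  end
end
```

An on_update block would then just call save_feed(feed) on a store like this. Keying on the link matters because the updater revisits the same feeds over and over, and most items will have been seen before.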

Then once you’ve got your custom updater script, cd to your Rails app and run script/feed_updater start.

I will likely update FeedUpdater again fairly soon.

Update: Now, with multithreading!
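The post doesn’t describe how FeedUpdater’s multithreading works. As a hypothetical illustration of the general technique only, here is a small fixed-size worker pool built on Ruby’s Thread and Queue: each worker pulls feed hrefs off a shared queue and runs an update block, and a failure is recorded for that feed instead of aborting the whole run. The method name and its worker_count parameter are invented for this sketch.

```ruby
# Hypothetical worker pool: +worker_count+ threads share a queue of hrefs
# and call the given block on each. Returns { href => :ok or exception }.
def update_feeds_concurrently(hrefs, worker_count: 4, &update)
  queue = Queue.new
  hrefs.each { |href| queue << href }
  results = {}
  lock = Mutex.new

  workers = Array.new(worker_count) do
    Thread.new do
      loop do
        href = begin
          queue.pop(true) # non-blocking pop; raises ThreadError once empty
        rescue ThreadError
          break
        end

        outcome = begin
          update.call(href)
          :ok
        rescue StandardError => e
          e # keep the error so one bad feed doesn't stop the others
        end
        lock.synchronize { results[href] = outcome }
      end
    end
  end

  workers.each(&:join)
  results
end
```

Under MRI only one thread runs Ruby code at a time, but a thread blocked on network I/O releases that lock, so fetching many feeds this way can still overlap the slow part, which is waiting on remote servers.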
