Sporkmonger

purveyor of fabulously ambiguous eating utensils

Five Hundred and Two

Posted by sporkmonger
Written June 6th, 2006

I’m beginning to think that perhaps, maybe, I should write a daemon that will, oh, I don’t know, auto-repair my blog when it breaks. Nothing like Daedalus of course, TextDrive doesn’t like that. Just something to make these incessant 502 errors go away, since it seems that lighttpd wants to just die off every once in a while. That and the cron job for bringing the site up after a reboot always seems to go off before PostgreSQL is good and ready to do its thing.

Yeah, that might be a good idea.

More FeedTools Tastiness

Posted by sporkmonger
Written September 27th, 2005

The FeedTools schema has changed.

You’ll want to take a look at the schema.*.sql files in the /db folder.

I finally got around to renaming the xml_data field to feed_data and adding the feed_data_type field. It wouldn’t make any sense to be putting yaml (!okay/news) into a field named xml_data now would it?

I also fixed a couple of bugs caused by the redirect improvements. They are pretty much guaranteed to rear their ugly heads (and yikes, probably mess up your feeds table as well). Not sure how they slipped through all those unit tests, but they did. So now there are a couple of new tests in place to make sure that doesn’t happen again.

So try to avoid 0.2.11 if you can help it. 0.2.12 is much more cuddly.

FeedTools Schema And Other Short Stories

Posted by sporkmonger
Written September 27th, 2005

Now that FeedTools no longer automatically creates the database schema for you, I thought it might be best to put the schema files into rdoc. Of course, rdoc runs those schema files through its text formatter, and the schema files come out more typographically correct on the other side. Except that that’s not really what we want. After a little bit of experimentation, I discovered that if I prefixed the SQL with a SQL comment and then indented the SQL that followed the comment by two spaces, rdoc would parse it in such a way that you could still copy-paste from the docs straight to whatever SQL frontend you happen to be using.

E.g.:

-- Example PostgreSQL schema
  CREATE TABLE feeds (
    id                SERIAL PRIMARY KEY NOT NULL,
    url               varchar(255) default NULL,
    title             varchar(255) default NULL,
    link              varchar(255) default NULL,
    xml_data          text default NULL,
    http_headers      text default NULL,
    last_retrieved    timestamp default NULL
  );

By the way, does anyone know of a good SQL frontend for PostgreSQL for OS X? pgAdmin3 crashes on me every 5 seconds or so, and that’s more than a little irritating. At this point, I don’t even care if it’s free/open-source (though that’s a huge bonus). I just want something that works well and doesn’t look hideous.

FeedTools also got a significant speed-up for instances in which http redirection occurs, and the url doesn’t get updated (usually because it’s a permanent redirection instead of a temporary one). In other words, the cache gets updated with the new url, but the open method continues to get called with the old url. FeedTools used to be unaware of the updated feed in the cache and would go out and pull the feed again. This has been changed so that now FeedTools will check the cache before following a redirection to see if the feed is in the cache already and to see whether it’s expired or not. While this definitely does increase the number of cache misses during redirection, misses are pretty painless, and the potential speed-up for a hit far outweighs the potential slow-down from the extra misses. I’ll take one or two extra SQL queries over an unnecessary HTTP request any day of the year.
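
To make the lookup order concrete, here is a rough Ruby sketch of checking a cache before following a redirect. This is not the actual FeedTools code: CachedFeed, fetch_feed, and the http_get callable are hypothetical names made up for illustration, and the cache is a plain in-memory Hash instead of a database table.

```ruby
# Hypothetical cache entry; FeedTools itself keeps feeds in a database table.
CachedFeed = Struct.new(:url, :body, :expires_at) do
  def expired?(now = Time.now)
    expires_at <= now
  end
end

# http_get is any callable that takes a url and returns
# [status_code, location_or_nil, body]. Passing it in keeps the
# sketch free of real network access.
def fetch_feed(url, cache, http_get, max_requests = 5)
  max_requests.times do
    # Consult the cache on the original url and on every redirect target.
    hit = cache[url]
    return hit.body if hit && !hit.expired?

    status, location, body = http_get.call(url)
    case status
    when 200
      # One-hour lifetime is an arbitrary assumption for the sketch.
      cache[url] = CachedFeed.new(url, body, Time.now + 3600)
      return body
    when 301, 302, 307
      # Loop around: the cache is checked before the next request goes out.
      url = location
    else
      raise "HTTP #{status} for #{url}"
    end
  end
  raise "Too many redirects"
end
```

If the redirect target is already cached and fresh, the second HTTP request never happens. The cost is one extra cache lookup per hop, which is the trade-off described above.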

HTTP error messages should now include a list of locations that FeedTools was redirected through before hitting the error. This was inserted primarily for the purposes of debugging.
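
As an illustration of the idea, an error that carries its redirect trail might look something like the sketch below. FeedAccessError and its redirects attribute are hypothetical names, not necessarily what FeedTools uses, and the message format is only an assumption.

```ruby
# Hypothetical error class; the real FeedTools exception may look different.
class FeedAccessError < StandardError
  attr_reader :redirects

  # redirects is the ordered list of urls visited before the failure.
  def initialize(message, redirects = [])
    @redirects = redirects
    if redirects.empty?
      super(message)
    else
      super("#{message} (redirected through: #{redirects.join(' -> ')})")
    end
  end
end

# Example with made-up urls:
error = FeedAccessError.new(
  '503 Service Unavailable',
  ['http://a.example/feed', 'http://b.example/feed'])
puts error.message
```

Keeping the list as an attribute as well as in the message means a debugger or log handler can inspect each hop without parsing the string.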

I removed the global FeedTools.cache_only option in favor of a more granular approach. You can now say:

feed = FeedTools::Feed.open(
  'http://rss.slashdot.org/Slashdot/slashdot',
  :cache_only => true)

You may notice that I removed the attribute dictionary functionality. If you were using it, sorry about that, but I decided it was too ugly and hackish, not to mention slow. It had to go.

I split the feed_tools.rb file into a couple pieces as well. No more 5000 line files that are a huge pain to navigate.

FeedTools should now also automatically detect User-Agent blocking and issue a warning if it runs into that.

Update:

The :cache_only configuration option has been renamed to :disable_update_from_remote. So the code should now be:

feed = FeedTools::Feed.open(
  'http://rss.slashdot.org/Slashdot/slashdot',
  :disable_update_from_remote => true)

Net::HTTP

Posted by sporkmonger
Written July 28th, 2005

I decided to move away from the delightful open-uri method of dealing with feed retrieval in favor of using the Net::HTTP module because of the lack of control over http redirection that open-uri has.

Unfortunately, this has proven to be a lot more difficult than I would have expected—even with the source code for open-uri right in front of me, I’m still having trouble figuring out why, for example, Slashdot’s rss feed seems to be giving me a 301 Moved Permanently redirect to http://slashdot.org/index.rss, and then a 503 Service Unavailable error when I try to retrieve that.

This is irksome because the feed shows up fine in Firefox, I can retrieve it easily with open-uri (which wraps Net::HTTP, so I know it’s not an issue with Net::HTTP itself), and the same code I’m using above manages to retrieve everything but Slashdot quite well.

Basically, I know I’ve got a phantom bug in here somewhere… just gotta track it down somehow…

<grumble>There should be cheat codes for programming.</grumble>

Update (2 hours later):

Wrong:

Net::HTTP.start('slashdot.org', 80) do |http|
  response = http.request_get(
    'http://slashdot.org/index.rss', http_headers)
end

(Apparently translates to this.)

Right:

Net::HTTP.start('slashdot.org', 80) do |http|
  response = http.request_get('/index.rss', http_headers)
end

I’m pretty much kicking myself, because that really should have been obvious…
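
One way to avoid this mistake in the first place is to let the URI library split the url, so the host and port go to the connection and only the path goes to the request. A small sketch follows; connection_parts and get_feed are hypothetical helper names, and get_feed does a single GET with no redirect handling.

```ruby
require 'uri'
require 'net/http'

# Split a full feed url into the pieces Net::HTTP wants:
# host and port for the connection, path (plus query) for the request.
def connection_parts(url)
  uri = URI.parse(url)
  [uri.host, uri.port, uri.request_uri]
end

# Hypothetical helper: one GET request, returns the Net::HTTPResponse.
def get_feed(url, http_headers = {})
  host, port, path = connection_parts(url)
  Net::HTTP.start(host, port) do |http|
    http.request_get(path, http_headers)
  end
end

p connection_parts('http://slashdot.org/index.rss')
```

The last line only parses the url and touches no network; calling get_feed is what actually opens a connection.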

Also… I should fix Typo so that it’ll let me put <pre> blocks into the comments. Posting code snippets in the comments doesn’t really work too well without it, as I just discovered (tried to put the post update into a comment)…

Update: Code in the comments works now.