Now that FeedTools no longer automatically creates the database schema for you, I thought it might be best to put the schema files into rdoc. Of course, rdoc runs those schema files through its text formatter, and the schema files come out more typographically correct on the other side. Except that that’s not really what we want. After a little bit of experimentation, I discovered that if I prefixed the SQL with a SQL comment and then indented the SQL that followed the comment by two spaces, rdoc would parse it in such a way that you could still copy-paste from the docs straight into whatever SQL frontend you happen to be using.
E.g.:
    -- Example PostgreSQL schema
    CREATE TABLE feeds (
      id SERIAL PRIMARY KEY NOT NULL,
      url varchar(255) default NULL,
      title varchar(255) default NULL,
      link varchar(255) default NULL,
      xml_data text default NULL,
      http_headers text default NULL,
      last_retrieved timestamp default NULL
    );
By the way, does anyone know of a good SQL frontend for PostgreSQL for OS X? pgAdmin3 crashes on me every 5 seconds or so, and that’s more than a little irritating. At this point, I don’t even care if it’s free/open-source (though that’s a huge bonus). I just want something that works well and doesn’t look hideous.
FeedTools also got a significant speed-up for instances in which HTTP redirection occurs and the URL doesn’t get updated (usually because it’s a permanent redirection instead of a temporary one). In other words, the cache gets updated with the new URL, but the open method continues to get called with the old URL. FeedTools used to be unaware of the updated feed in the cache and would go out and pull the feed again. This has been changed so that FeedTools now checks the cache before following a redirection to see whether the feed is already in the cache and whether it has expired. While this definitely does increase the number of cache misses during redirection, misses are pretty painless, and the potential speed-up for a hit far outweighs the potential slow-down from the extra misses. I’ll take one or two extra SQL queries over an unnecessary HTTP request any day of the year.
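To make the idea concrete, here is a rough sketch of checking the cache before each hop of a redirect chain. This is not the actual FeedTools code: `CacheEntry`, `open_feed`, the Hash-based cache, and the `fetcher` callable are all made-up names for illustration, and the real cache is backed by the database rather than a Hash.

```ruby
# Hypothetical sketch, not FeedTools' actual implementation.
CacheEntry = Struct.new(:body, :expires_at) do
  def expired?(now)
    now >= expires_at
  end
end

# `cache` is a Hash of url => CacheEntry (an assumption for this sketch).
# `fetcher` is any callable that returns [:ok, body] or [:redirect, new_url].
def open_feed(url, cache, fetcher, now: Time.now, ttl: 3600, max_redirects: 10)
  current = url
  (max_redirects + 1).times do
    # Check the cache before every request, including each redirect target.
    entry = cache[current]
    return entry.body if entry && !entry.expired?(now)

    status, value = fetcher.call(current)
    if status == :redirect
      # Don't fetch the new location yet; loop around and try the cache first.
      current = value
    else
      cache[current] = CacheEntry.new(value, now + ttl)
      return value
    end
  end
  raise "too many redirects starting from #{url}"
end
```

Each redirect costs one extra cache lookup, but a hit on the redirect target skips the second HTTP request entirely.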
HTTP error messages should now include a list of locations that FeedTools was redirected through before hitting the error. This was inserted primarily for the purposes of debugging.
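Something along these lines shows how a redirect trail can end up in an error message. The `FeedAccessError` class, the `follow` method, and the `fetcher` callable are hypothetical names for this sketch; the real FeedTools error classes and message format may well differ.

```ruby
# Hypothetical error class; FeedTools' real error types may differ.
class FeedAccessError < StandardError
  attr_reader :locations

  def initialize(message, locations)
    @locations = locations
    super("#{message} (locations visited: #{locations.join(' -> ')})")
  end
end

# `fetcher` is any callable returning [:ok, body], [:redirect, new_url],
# or [:error, status_code] for a given url (an assumption for this sketch).
def follow(url, fetcher, max_redirects: 10)
  locations = [url]
  loop do
    status, value = fetcher.call(locations.last)
    case status
    when :ok
      return value
    when :redirect
      # Remember every location so a later failure can report the full trail.
      locations << value
      if locations.size > max_redirects + 1
        raise FeedAccessError.new('too many redirects', locations)
      end
    else
      raise FeedAccessError.new("HTTP error #{value}", locations)
    end
  end
end
```

When a request fails three redirects deep, the message then shows which hop actually broke instead of only the URL you started with.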
I removed the global FeedTools.cache_only option in favor of a more granular approach. You can now say:
    feed = FeedTools::Feed.open(
      'http://rss.slashdot.org/Slashdot/slashdot',
      :cache_only => true)
You may notice that I removed the attribute dictionary functionality. If you were using it, sorry about that, but I decided it was too ugly and hackish, not to mention slow. It had to go.
I split the feed_tools.rb file into a couple of pieces as well. No more 5000-line files that are a huge pain to navigate.
FeedTools should now also automatically detect User-Agent blocking and issue a warning if it runs into that.
Update:
The :cache_only configuration option has been renamed to :disable_update_from_remote. So the code should now be:
    feed = FeedTools::Feed.open(
      'http://rss.slashdot.org/Slashdot/slashdot',
      :disable_update_from_remote => true)