Tutorial
Since a tutorial was requested for setting FeedTools up with Rails, I thought I’d do a quick example. It’s really, really not hard. I packaged the whole thing up and put it on RubyForge, and maybe someday I’ll polish it up. Probably not until I have support for subscriptions in place. Not really worth it until then.
Anyhow, without further ado, I give you… your really short tutorial.
Ok, first off, you’re going to want to start loading the feed as early as possible. Ideally, you should start loading it within a thread at the beginning of the controller’s action, do whatever other activities you need to, then join on the thread. This is especially true if you’re loading multiple feeds at once. In the interests of simplicity though, we’re only going to load one feed, and since the feed is the only thing we care about in this action, we’ll just skip the threading bit.
So our ‘view’ action will simply look like this:
def view
  @feed = FeedTools::Feed.open(@params['feed_url'])
end
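If you did want the threaded variant mentioned above, a minimal sketch might look like the following. It only uses Ruby's Thread class; `fetch_feed` and `load_feeds` are hypothetical names, and `fetch_feed` is a stand-in for `FeedTools::Feed.open` so the snippet runs without the gem. Treat it as an illustration of the start-early, join-late pattern rather than tested FeedTools code.

```ruby
# Sketch only: start every fetch in its own thread, do other work,
# then join. `fetch_feed` is a hypothetical stand-in for
# FeedTools::Feed.open so this runs without the gem installed.
def fetch_feed(url)
  sleep 0.05                      # simulate network latency
  "parsed feed for #{url}"
end

def load_feeds(urls)
  # Kick off all of the loads as early as possible.
  threads = urls.map do |url|
    Thread.new { [url, fetch_feed(url)] }
  end

  yield if block_given?           # other work for the action goes here

  # Thread#value joins the thread and returns the block's result.
  threads.map(&:value).to_h
end

feeds = load_feeds(['http://example.com/a.xml', 'http://example.com/b.xml'])
puts feeds.keys
```

With several feeds, the total wait is roughly that of the slowest feed instead of the sum of all of them.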
We have a form as part of our layout that points to the view action and passes off a feed_url, so you can pick whatever feed you want to display.
In the view, it’s a simple matter of displaying the feed’s attributes and looping through the feed items and then displaying their attributes as well.
Take a look:
<div class="contentBox">
<h1><a href="<%= h @feed.link %>">
<%= @feed.title %></a></h1>
<p><%= @feed.description %></p><%
for feed_item in @feed.items %>
<div class="feedItemBox">
<h2><a href="<%= h feed_item.link %>">
<%= feed_item.title %></a></h2>
<p><%= feed_item.description %></p>
</div><%
end
%>
</div>
The second part of the tutorial is even simpler. Just for fun I threw in some of the xml building capabilities. If you switch over to the ‘xml’ action, you can translate any feed to another variety of xml feed. Right now, RSS 1.0, 2.0, and Atom 0.3 are the only valid options, but eventually the xml generation should run the whole gamut of varieties.
Here’s the ‘xml’ action:
def xml
  @feed_type = @params['id']
  @feed = FeedTools::Feed.open(@params['feed_url'])
end
@feed.build_xml((@feed_type or "atom"), 0.0, xml)
That’s all there is to it.
You’ll probably notice in the code that I disabled the cache because I wanted to have the thing be configuration-free, but feel free to point the database.yml file to your own database and turn caching back on.
The hardcore XHTML fans should also be happy to discover that I chose to deliver the content for this tutorial as XHTML 1.1 with the application/xhtml+xml content type. Thanks to Tidy, all of the nasty-looking html from the feed gets polished up, and everything works out perfectly.
Also, FeedTools 0.2.3 is out. It’s just a hotfix for a nasty bug that several people have pointed out, where GET parameters were ignored.
Update: Martin Dittus wrote a really nice introduction to FeedTools, focussing on how to get it working in a non-Rails project.
Update: Apparently, threads aren’t so hot when a database-backed cache is involved because it opens up tons of extra connections to the database.
What, actually, would one want to use this for? Republishing feed content?
The example seems focused on reassuring Railer Nubies rather than exposing the general value of the tool for general Ruby hacking. Are there more examples forthcoming?
Hmm, what would you use it for? I doubt you would want to use the tutorial code for anything in the real world. That wasn’t the point. The point was to demonstrate that it’s not nearly as difficult to use as some people thought it was.
At the time I wrote it, people were wondering where all the documentation was, and the point I was trying to make with the tutorial was mostly that there wasn’t a pressing need for extensive documentation, because the library itself had such a simple interface.
This article was written quite a while ago. Since then, documentation has gotten a lot more complete, and there’s a lot more example code floating around that would probably be a better starting point for a new user. I just linked one such example at the end of the article.
And yes, more examples will likely show up from time to time, but as more people start to use it, you should expect that an increasing number of those examples will be coming from not-me. When good ones do appear, I’ll try to make sure to link to them.
Excellent. But I have two questions concerning this framework. Is feed_tools thread safe? Can you add support for HTTP proxies to feed_tools?
Thank You
Regarding thread safety: To the best of my knowledge, threading should not be an issue. However, there are known to be major issues with trying to use the caching system within threads. If you wrote your own custom caching system, everything should be ok.
Regarding http proxying: There’s no reason why it couldn’t happen, but it’s not something I think many people need. Meanwhile there are lots of other features that are needed by lots of people. I don’t really have the time to implement http proxy support myself, but you are quite welcome to create it and submit a patch. That’s the beauty of open source! If you were going to do this, take a look at the open-uri source code. I know there’s some good code in there for dealing with http proxies.
Bob, how do you include this in your code? I’ve tried require 'feedtools' but no luck.
Close.
require 'feed_tools'
Hi Bob,
I am fighting to get FeedTools (0.2.24) to work from within a Rails App. Parsing feeds works fine, but I cannot enable caching. I used the provided migration file, my db is mysql.
the logfile just says ‘select id, href, title, link, feed_data, feed_data_type, http_headers, last_retrieved from cached_feeds limit 1’ a few times.
Any ideas what could be wrong? thanks Frank
Add this to environment.rb:
The feed cache is off by default. You have to turn it on.
Ooh, nice. Thanks for producing feedtools—you’ve just made a project I’m working on soo much easier. :)
I’m hoping that my cunning plan of having thin wrapper models (for example the Feed model will just have the feed url itself, the rest will be pulled from the feed itself) along with cached feeds will work quite nicely and still perform OK… Perhaps even with a cron `script/runner refresh_all_feeds` to decrease the likelihood of a browser request catching an expired feed…
That’s largely what the FeedUpdater is for. As for your plan, yes, it will work just fine, but it will be slow. My recommendation currently is to decide ahead of time which fields you need, create columns in your tables for each of them, and copy them over when the feed is retrieved so that the user’s page request doesn’t ever even touch FeedTools. If you were going to implement it like that, you’d also obviously need a FeedEntries model as well.
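The copy-on-retrieval approach recommended above could be sketched roughly like this. Everything here is hypothetical: `FeedRecord` and `EntryRecord` stand in for your own models (plain Structs so the snippet runs by itself), and the `parsed` object only needs to respond to `title`, `link` and `items` the way the feed in the tutorial does.

```ruby
# Hypothetical stand-ins for ActiveRecord models; in a real app these
# would be tables with one column per field you care about.
FeedRecord  = Struct.new(:url, :title, :link, :items)
EntryRecord = Struct.new(:title, :link, :description)

# Copy the fields we need out of a parsed feed exactly once, at
# retrieval time, so page requests only ever read FeedRecord.
def copy_feed(url, parsed)
  items = parsed.items.map do |item|
    EntryRecord.new(item.title, item.link, item.description)
  end
  FeedRecord.new(url, parsed.title, parsed.link, items)
end

# A fake parsed feed, standing in for what the feed library returns.
ParsedItem = Struct.new(:title, :link, :description)
ParsedFeed = Struct.new(:title, :link, :items)
parsed = ParsedFeed.new('Example', 'http://example.com/',
                        [ParsedItem.new('Hello', 'http://example.com/1', 'First post')])

record = copy_feed('http://example.com/feed.xml', parsed)
puts record.items.first.title
```

The page request then reads only the copied records, so the slow parsing step stays out of the user's way.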
I’m so glad I found FeedTools. I got it installed and it seems to be configured properly, as Rails is calling all the necessary files when it tries to load a feed. Then it bottoms out with the error “Cannot retrieve feed using invalid URL:” My URL is relative, and searching the excellent documentation I see this isn’t a problem. Is there something I am missing?
Adam: It’s a problem if you try to load the feed with a relative path; it’s not a problem if the feed itself contains relative paths.
Hi, thanks for your library… most examples I found were aimed at loading an RSS feed and then using the data from it. I want to do quite the opposite: simply create an RSS feed.
So I followed your quick howto on http://rails.techno-weenie.net/question/2005/12/25/how_can_i_generate_rss_fees_in_a_rails_app
It works as is in IRB, but in my app I keep getting:
Cannot build feed, missing feed unique id.
although @feed.id returns a string and @feed.link returns the URL of my blog.
What’s wrong?
completely useless tutorial. utter crap.
On the off-chance that the above comment isn’t actually spam, I’m going to leave it there. (I’m getting a lot of context-sensitive spam that doesn’t make sense.)
Somekool:
Try:
I’m writing a Rails app that consumes feeds. I found FeedTools, which seems to be just what I’m looking for, but I think I might be missing something.
I need to put each element of a feed item into a column in a separate table. I can see, from the API doc, how to get things like item description, item title, item link, but how do you get other stuff? For example, digg.com has diggCount in their feed, how would I go about pulling that element out? Can FeedTools do this with a built-in, or do i need to access item.feed_data and parse it out myself with a regexp or some other library?
Shawn:
Use feed.find_node("xpath query") to get at those nodes. You’ll have to do some of the work, but most of the hard stuff (namespaces, etc.) is done for you with the find_node method.
Thank you for your quick reply! That helps a ton.
Great work on FeedTools!
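For anyone wondering what an XPath lookup on a namespaced element looks like in general, here is a rough sketch that uses REXML directly rather than FeedTools' find_node. The sample XML and the `http://example.com/ns/digg` namespace URI are made up for illustration; a real feed declares its own URI, which you would copy from the feed's root element.

```ruby
require 'rexml/document'

# Made-up sample item; the namespace URI is a placeholder, not Digg's.
xml = <<-XML
<rss version="2.0" xmlns:digg="http://example.com/ns/digg">
  <channel>
    <item>
      <title>Example story</title>
      <digg:diggCount>42</digg:diggCount>
    </item>
  </channel>
</rss>
XML

doc = REXML::Document.new(xml)

# Map the prefix used in the query to the namespace URI. The prefix in
# the query does not have to match the one used in the document.
namespaces = { 'd' => 'http://example.com/ns/digg' }
node = REXML::XPath.first(doc, '//item/d:diggCount', namespaces)

# Guard against feeds that do not carry the element at all.
digg_count = node ? node.text.to_i : nil
puts digg_count
```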
Hey, many thanks for this. I’m just getting my feet wet with RoR and little by little it gets more exciting.
After spending a few hours looking at various ways to consume RSS feeds and weighing the options I installed FeedTools and had it up and running in about 10 minutes.
Great, great work, thanks again!
How can I limit the number of items that are displayed from a feed?
Kevin:
Just slice the entries array like you would any other array:
feed.entries[0..9]
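As a small illustration of the slice itself, this sketch uses a plain array in place of `feed.entries` (the entry strings are placeholders). Doing the slice in the controller and assigning the result to an instance variable keeps the loop in the view unchanged.

```ruby
# Placeholder data standing in for feed.entries.
entries = (1..25).map { |n| "Entry #{n}" }

# Both forms return at most the first ten elements and are safe when
# the array holds fewer than ten.
latest = entries[0..9]
same   = entries.first(10)

latest.each { |entry| puts entry }
```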
Cannot retrieve feed using invalid URL:
for http://feeds.feedburner.com/Techcrunch
How is this URL invalid, or how should I represent it for feedtools?
jayjee:
Can’t duplicate:
Perhaps you should update to the latest version? I may have fixed the bug you’re seeing.
Perfect.
Amazing product you have here.
Could you perhaps get a little more in depth regarding how to limit the number of headlines (feed items)? I’m totally new to Rails, so I’m sorry for the dumb question.
Where/how should I slice the array?
Nevermind! Awesome!
I’m trying to do this with 3-4 feeds. I just duplicated the @feed variable, changing it to @feed_two, etc, and changing the code within the div to reflect it (again, basically just duplicating).
Is this what I should be doing in order to display multiple feeds? I ask because I’m currently in development, have a cached_feeds table set up according to the SQL schema file, but there’s no way they’re caching. It’s still taking forever to load a friggin’ page.
But no exceptions are being thrown.
No, they’re caching, but it’s honestly taking forever for a page to load.
Also getting this error in Terminal:
FeedTools may have been loaded improperly. This may be caused by the presence of the RUBYOPT environment variable or by using load instead of require. This can also be caused by missing the Iconv library, which is common on Windows.
werd:
It’s taking forever to load because FeedTools is slow. Not much that can be done about that without rewriting it. However, most people who are using it in any kind of production environment get around this by using FeedTools in conjunction with FeedUpdater, which is a daemon that sits in the background and does the heavy lifting, then copies the desired information back to the database after parsing has occurred.
As for the error, did you make sure that iconv is available?
How can I be sure that iconv is available?
Thank you so much for your help by the way.
And how do I exactly get feedupdater working?
werd:
require 'iconv'
If that errors out, iconv is missing and needs to be present. You should easily find the information for getting iconv installed if you Google for it.
FeedUpdater is a non-trivial system and is more difficult to get going, but you should find the information you need if you search for FeedUpdater on my site.
Should I include “require ‘iconv’” in the same controller that I declare the @feed variables?
Also, you mentioned that most users work around the slow speeds by using FeedUpdater in production environments. Does that mean it doesn’t work in development?
Thanks.
In this context, “production” simply means “on a live site that’s publicly facing”. FeedUpdater doesn’t care what your Rails environment is set to.
Regarding iconv, I intended for you to simply type that into IRB. If it errors out, that’s your problem. If it doesn’t, you have some other problem.
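Outside of IRB, the same check can be scripted. This is just a sketch; `library_available?` is a hypothetical helper name, not part of FeedTools. Note that on newer Rubies (2.0 and later) iconv is no longer bundled with the interpreter, so a result of false there is expected unless the iconv gem has been installed.

```ruby
# Returns true if `name` can be required, false if it is missing.
def library_available?(name)
  require name
  true
rescue LoadError
  false
end

puts "iconv available: #{library_available?('iconv')}"
```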
Bob, thanks for making FeedTools. It’s been great for reading feeds, but I’ve noticed that, for writing outgoing feeds, calling feed.build_xml on a feed with 100 entries takes on the order of 30 seconds on a high-end multicore development machine, e.g. in rhtml where @feed has 100 entries: <%= @feed.build_xml("rss", 2.0) %>
Is there something I can do (configuration, parameter, etc) to speed xml creation up? Thanks again.
Chris:
Hmmm… that is an awfully long time… The only thing I can think of that might be causing that is if you weren’t creating the feed from scratch, because then the generation code also has to parse at the same time. That said, FeedTools in general is extremely slow due to a dependency on REXML, and isn’t really in active development anymore. I intend to do a complete rewrite in C eventually, but it’s a long way off. In the meantime, if you want to do some performance improvements, you are quite welcome to do so. Patches are always welcome.
Is there a way to specify your own expiration time for a feed so FeedTools will download updated feeds to the cache less often? I believe by default the expiration time is taken from the feed itself.
Kevin:
Not with the current version of FeedTools. The next version will allow you to change the default time to live to something else, but that still doesn’t override the time to live values that are actually supplied by the feeds themselves.
Have you considered replacing REXML with LibXML-Ruby? I understand it is significantly faster, and the APIs are almost plug and play. Nice work.
Michael:
I considered it, but it’s not going to happen. I had to monkey patch REXML to get the behavior I needed in the first place. Monkey patching doesn’t work quite so well on C code.
The plan is, once I have the time, to rewrite the whole thing from scratch in straight C, with a slightly different approach to parsing stuff, and then just give it a Ruby wrapper. But honestly, that could be a year or two off. (Unless, of course, someone decides to fund the project.)
There is no date?
feed.date does not work? how do I get the date of a post?
somekool:
You might want to try feed.entries[i].time.
I’d recommend adding this
execute "ALTER TABLE cached_feeds CHANGE COLUMN feed_data feed_data MEDIUMTEXT"
to your FeedTools data migration for MySQL. I’ve run into a number of feeds that have more than 2^16 bytes of data.
Brent:
Hmm, that’s a good point. I’ll see what I can do.
Is IDN required? My logs keep filling up with this error:
no such file to load -- idn (MissingSourceFile)
/home/local/lib/site_ruby/1.8/rubygems/custom_require.rb:27:in `gem_original_require'
/home/local/lib/site_ruby/1.8/rubygems/custom_require.rb:27:in `require'
/home/.gems/gems/activesupport-1.3.1/lib/active_support/dependencies.rb:147:in `require'
/home/.gems/gems/feedtools-0.2.26/lib/feed_tools/helpers/uri_helper.rb:42:in `idn_enabled?'
/home/.gems/gems/feedtools-0.2.26/lib/feed_tools/helpers/uri_helper.rb:118:in `normalize_url'
/home/.gems/gems/feedtools-0.2.26/lib/feed_tools/feed_item.rb:530:in `links'
/home/.gems/gems/feedtools-0.2.26/lib/feed_tools/feed_item.rb:503:in `each'
/home/.gems/gems/feedtools-0.2.26/lib/feed_tools/feed_item.rb:503:in `links'
/home/.gems/gems/feedtools-0.2.26/lib/feed_tools/feed_item.rb:422:in `link'
Brent:
Blame ActiveSupport. No, IDN is not required, it’s optional, but FeedTools determines whether it’s available or not by trying to load it. If loading fails, it moves on and doesn’t worry about it. I assume ActiveSupport logs the failure anyways even though the exception is caught and handled. You can fix the log issue simply by installing IDN, I guess. You might also try seeing if there isn’t a way to change ActiveSupport’s default logging level. Are you running this in the development environment or the production environment? The log errors may not show up in production, but only development. Not sure.
I’m running this in a semi-production environment with logging turned down but I’m using FeedTools from a daemon and the daemon’s log is catching all of this rather than production.log. It doesn’t seem to affect application functionality however, so you’re probably right that it’s just being logged.
I am having a problem getting FeedTools to parse. I am getting the following error message: NoMethodError: undefined method `parse_xml' for HTree:Module
Any ideas? I used gem to install. I also uninstalled and reinstalled it. I am running on a Windows XP machine with Ruby 1.8.2.
Thanks for any help you can provide.
Alex:
I no longer target 1.8.2. Upgrade to 1.8.5 and see if that helps. Also verify that you have Iconv properly installed. (You can check by simply doing require 'iconv' in an irb session.)
Brent:
If you mean the FeedUpdater daemon, you can change the logging level in the config file.
Excellent. Thanks for the quick response. Upgrading to 1.8.5 fixed the problem. And many thanks for feedtools. I’m going to see if I can get FeedUpdater working as well.
FeedUpdater won’t work on Windows XP. No support for daemons.
From the on_update() function in the custom updater script, is it possible to access the feed’s ID record?
Custom updater loads all feeds from feeds table. On update, it creates or updates the feed_items table. However, I need to assign feed_items.feed_id to the feed table’s ID.
Hi there and thanks a lot for your great work.
I’m using the latest version of FeedTools but I’m having problems with UTF-8 encoded feeds.
Are there any specific settings I should use?
I added a set_charset method with “charset=utf-8” as an after_filter on my application_controller, but this does not solve the issues I’m having.
If I parse an RSS feed with special chars (like ‘à’), somehow the results I get from Feed.title (as an example) are badly encoded (at least that’s what I see if I do a puts feed.title).
Thanks for any suggestion!
Andrea:
Unfortunately, there’s no way to tell for sure what the problem is without having the feed in question.
What’s the URI?
I’m having the problem with any feed containing some italian text, like this one: http://feeds.feedburner.com/IlNissardo
I’ve been trying to generate a RSS 2.0 feed.
Your post here suggests that FeedTools can generate RSS 2.0: http://www.railsweenie.com/forums/1/topics/56
However, when I looked at the code for FeedTools::Feed.build_xml
I did not see any code that suggested support for RSS 2.0
FeedTools::FeedItem doesn’t seem to have all the methods to support RSS 2.0 either (e.g. pubDate).
Any suggestions are appreciated.
thanks, RailsRoad
Railsroad:
Not sure how you missed the code, it’s pretty obviously there. If you want to generate RSS 2.0, it’s as simple as:
And as the API indicates, FeedTools uses the FeedItem#time and FeedItem#updated methods rather than a pubDate method. But it obtains the data from the pubDate element. So yes, it supports RSS 2.0.
Hi Bob, Quick question – (I’d use the official forum if there was one)
I’m trying to parse an Atom feed from Flickr, specifically trying to access the url to the image.
<link href="http://farm1.static.flickr.com/175/372236890_5dba07aabc_o.jpg" />
I’ve been struggling to find where in the documentation this is available.
I assumed this would be the link() method, but link() is returning the URL from <link>
Any help on this is greatly appreciated. Thanks!
Whoops, looks like tags are being stripped out of the comments.
The XML tag I’m trying to access from the feed is:
<link rel="enclosure" type="image/jpeg" href="..." />
But the link() method is returning the URL from this tag:
<link rel="alternate" type="text/html" href="..." />
Jason:
Use the FeedItem#enclosures method or, alternatively, the FeedItem#links method.
Why does it not support http_proxy?
Most corporations do not allow http access directly from clients on the inside.
An example of http_proxy would be nice.
/MartOn
MartOn:
It does support http proxy. Use the :proxy_address, :proxy_port, :proxy_user, and :proxy_password configuration options.
Hi Bob,
I just installed it and it works great!
In a tag like:
<enclosure>
How can I access the URL, LENGTH or TYPE?
Sorry for the nubyness
The comment removed my example tag, I meant:
<enclosure url="urlhere" size="34" type="audio/mpeg">
Hi,
I tried FeedTools with Ruby on Rails behind a proxy. I know there are proxy options and I tried to configure them, but I can’t make it work. I tried several methods, including this one:
feed = FeedTools::Feed.open(url, { :proxy_address => "http://172.16.0.30", :proxy_port => 8080 })
and I always get this error: “Socket error prevented feed retrieval”
My proxy address and port are good (it works with other RSS parsers), and the syntax looks ok.
If anybody knows the exact configuration of FeedTools with a proxy, it would be great ;)
Thanks.
Ok I got it…
The :proxy_address parameter should be called :proxy_host. Indeed, without ‘http://’, it works :)
Thanks anyway for this great parser ;)
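The scheme-versus-host mix-up above is easy to guard against. The sketch below is not FeedTools code: it uses Ruby's standard Net::HTTP and URI libraries to show the general idea, and `proxy_host_from` is a hypothetical helper name. The host `feeds.example.com` is a placeholder.

```ruby
require 'net/http'
require 'uri'

# Accept either "http://172.16.0.30" or "172.16.0.30" and return the
# bare host name that proxy settings generally expect.
def proxy_host_from(value)
  value.include?('://') ? URI.parse(value).host : value
end

host = proxy_host_from('http://172.16.0.30')

# Net::HTTP.new(address, port, proxy_address, proxy_port); nothing is
# sent over the network until a request is made.
http = Net::HTTP.new('feeds.example.com', 80, host, 8080)
puts http.proxy_address
```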
Hi, FeedTools looks like the solution for my project, but I am having a bit of an issue getting it to run. I type:
If I run the code, I get "no such file to load feed_tools". This sounds like I don’t have the gem, but I did install FeedTools 2.26.
Any ideas?
paul
Paul,
Well, first, you need to make sure you’ve called:
require 'rubygems'
Assuming you’ve done that, the only other reason I can think of is that you might be running a different copy of ruby than the one you think you’re running. Try calling which ruby to check that you’re using the runtime that actually has the FeedTools gem. On OS X, it’s fairly common to see paths that have been set up in such a way that /usr/bin/ruby is the runtime being used, but you may have the gems installed for the /usr/local/bin/ruby runtime. If that doesn’t help, I don’t really know what to tell you. If the gem is installed, and RubyGems has been loaded, it should work.
That said, I think you’ve misunderstood how the code in this tutorial works. In the example above, 'feed_url' is a String key that allows you to pass a URI in through Rails’ params Hash. You’re not supposed to replace that string with the URI of the feed you want to retrieve.
This is the code for what I assume you wanted to do:
Hope that helps!
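The "which ruby am I actually running" question can also be answered from inside Ruby itself. This is a rough sketch that relies on RbConfig and the RubyGems API as they exist in current Ruby versions, which may differ from the 1.8-era setup discussed here; the gem name 'feedtools' is taken from the gem list output elsewhere in this thread.

```ruby
require 'rbconfig'
require 'rubygems'

# Full path of the interpreter that is executing this script.
puts "ruby: #{RbConfig.ruby}"

# Directories this interpreter searches for installed gems.
puts "gem path: #{Gem.path.join(', ')}"

# Is the gem visible to this interpreter? (empty array if not)
installed = Gem::Specification.find_all_by_name('feedtools')
puts "feedtools installed: #{!installed.empty?}"
```

If the interpreter path printed here is not the one you installed the gem for, that mismatch is the likely cause of the load error.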
Hi All,
I am new to ROR. Please suggest how to find the list of keys in “item” variable in the following code.
thanks in advance.
FeedItems aren’t Hash objects. To access the fields of an item you need to look at the API. Which apparently has bit-rotted out of existence. Use the gem server to access the API instead.
Then browse to http://localhost:8808/
The API has been uploaded to RubyForge now. Should be easier to just use that.
It seems FeedTools doesn’t manage UTF-8 strings very well. Look at the following code. @feed.description should return “Tecnologia, cultura, costume, novità….” but the last word is incorrectly reported as “novitÔ.
Any help?
Franco, which version of FeedTools were you using?
Thanks Bob, I’m using 0.2.28 I downloaded with gem
Macintosh:~ francosolerio$ gem list feedtools -l
feedtools (0.2.28)
    Parsing, generation, and caching system for xml news feeds.
Macintosh:~ francosolerio$
Looks like feedtools formats everything in ISO-8859-1. Doing the following gives the correct result:
(I’m using ruby 1.8.6 on OSX Leopard)
I’m having the same problem with UTF-8 on Debian Testing and PostgreSQL (all in utf8). Using iconv it solves the problem
The problem isn’t gone. When I do ic = Iconv.new('ISO-8859-1', 'UTF-8') and try to insert the data, the DB says that I have a “PGError: ERROR, the bytes sequence is not a valid codification UTF8”, because my DB needs UTF8 (but feed.title seems not to be UTF8).
Does anybody know how to solve the problem?
Unfortunately, there are obviously utf-8 and iso-8859-1 issues going on. I currently have no plans to fix these issues or to invest any further time in FeedTools, but if someone wants to submit a patch and test it, I will happily commit the patch and put out a new release.
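To make the symptom concrete, here is a small sketch of how this kind of garbling arises. It uses String#encode and String#force_encoding from Ruby 1.9 and later, not the Iconv calls the commenters used on 1.8, and it is not FeedTools code. It only demonstrates the general UTF-8 versus ISO-8859-1 mix-up.

```ruby
original = "novit\u00E0"   # "novità", stored as UTF-8

# Misreading the UTF-8 bytes as ISO-8859-1 turns the two bytes of "à"
# into two separate characters, which is the garbling reported above.
garbled = original.dup.force_encoding('ISO-8859-1').encode('UTF-8')

# Reversing the mistake recovers the text, as long as no bytes were lost.
repaired = garbled.encode('ISO-8859-1').force_encoding('UTF-8')

puts garbled.inspect
puts repaired == original
```

The second garbled character is a non-breaking space, which is why the broken word appears to end in a lone "Ã".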
I traced the problem to html_Helper.rb -> resolve_relative_uris(), which calls parse_fragment() without specifying the encoding.
I’m an absolute beginner in development, so I just hard-coded the encoding like this:
It works! I suppose the correct solution would be to retrieve the exact encoding of the original feed and submit that to parse_fragment().
Franco Solerio is right. I made a small patch to fix this encoding problem. I don’t know if it’s the best way to do this, but it works for me.
http://n0life.org/~julbouln/feedtools_encoding.patch
Hi, I was wondering why activerecord is a dependency for feedtools. Is there an automatic database interaction that isn’t mentioned in the tutorials?
It would be nice if you could modularize feedtools into its finer components: reader, builder, and … cacher?
That way, I don’t need to install builder and activerecord.
Franco, Julien:
Good catch on the encoding thing. My bad there. I’m pretty sure at that point in the parsing processing, everything should have been converted to utf-8 anyways, so Franco’s code should probably be correct as-is. I’ll give it a try and see about releasing another gem.
David:
Yes, there is a caching system built into FeedTools. Yes, if I was writing the library today, that would almost certainly be the route I would go. However, as you will discover if you play with the actual code, it’s a big project, and it’s not something you can easily fix. Plus, I’ve effectively abandoned it, so I’m really not going to go taking on huge code reorganization efforts.
Will the ‘UTF-8’ changes be packaged into a gem in the near term?
If a gem fix won’t be available soon, is the patch the preferred method of solving the problem?
Hi! I have a little problem when I try to show images from a feed, it works fine, the html code is correct (
Thanks in advance!
Oh, there was some problem with the last comment, sorry. The problem is showing images that are inside the description of a feed item. The html code is correct but I can’t see anything. Any idea?
Thanks in advance!