As someone who works on an RSS crawler day to day, I can tell you this is much harder than you'd think, because of how poorly most sites implement RSS.
If everyone stuck to the standards and did things the same way, it would be easy. But if you can't account for these errors, your reader won't work with 80% of feeds, and nobody will use it.
RSS feeds are usually programmed more poorly than the website itself, because it's one of those features that comes out of a discussion that goes: "well, everyone has an RSS feed, so we should have one, so make one." Yet everyone involved knows people will rarely use it.
You could make it work for just the popular CMSes, but even among those there are standards problems. The majority of the problems, though, are in storing what you've crawled.
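To make the "account for these errors" point concrete, here's a minimal sketch (my own, stdlib-only; the heuristics are illustrative, not from any particular crawler) of leniently pulling items out of a feed that might be RSS 2.0 or Atom:

```python
# Lenient item extraction that tolerates both RSS 2.0 ("item") and
# Atom ("entry") layouts in the same code path. Stdlib only.
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

def extract_items(feed_xml: str):
    """Return (title, link) pairs from RSS 2.0 or Atom XML."""
    root = ET.fromstring(feed_xml)
    rss_items = list(root.iter("item"))            # RSS 2.0
    atom_entries = list(root.iter(ATOM_NS + "entry"))  # Atom
    results = []
    for node in rss_items + atom_entries:
        title = node.findtext("title") or node.findtext(ATOM_NS + "title") or ""
        # RSS puts the URL in <link>'s text; Atom puts it in an href attribute.
        link = node.findtext("link") or ""
        if not link:
            atom_link = node.find(ATOM_NS + "link")
            if atom_link is not None:
                link = atom_link.get("href", "")
        results.append((title.strip(), link.strip()))
    return results
```

And this only covers two of the well-formed cases; real feeds also arrive with broken encodings, unescaped entities, and namespaces nobody declared.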
Upvoted! I agree, it's horrible. It's not just the many different versions of RSS and Atom; it's the freedom to put any kind of data in any field. I've seen descriptions in the "title" property, images in "links", links in "images", links as "urls", UUIDs as "ids", and so on and on. If it weren't for Java's ROME library, I'd be spending dozens of hours just making our aggregator compatible with whatever newsfeed our users come up with.
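The field abuse above is the kind of thing a normalization pass has to guess its way through. A rough sketch of such heuristics (my own illustrative assumptions, not ROME's actual rules):

```python
# Hypothetical field-normalization pass: when publishers misuse a field,
# guess what they probably meant. The thresholds here are made up.
def normalize_entry(raw: dict) -> dict:
    entry = dict(raw)
    # A "title" that is actually a URL probably belongs in "link".
    if entry.get("title", "").startswith(("http://", "https://")) and not entry.get("link"):
        entry["link"], entry["title"] = entry["title"], ""
    # A very long "title" is usually a description shoved into the wrong field.
    if len(entry.get("title", "")) > 200 and not entry.get("description"):
        entry["description"], entry["title"] = entry["title"], entry["title"][:80] + "..."
    # No usable "id"? Fall back to the link or title as an opaque key.
    if not entry.get("id"):
        entry["id"] = entry.get("link") or entry.get("title", "")
    return entry
```

Every one of those branches exists because some real feed forced it, which is exactly why hand-rolling this is a time sink.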
We should really be ditching RSS in favor of more flexible vocabularies like those on schema.org. The web is constantly evolving and changing; trying to keep such legacy tech in tow is just asking for trouble. It's not even useful for its structure, since people misuse it anyway, so I don't see why we should keep it. At least following schema.org conventions aids your SEO.
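For what it's worth, a minimal schema.org Article as JSON-LD looks like this (the property names are schema.org's; the values are invented for illustration):

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example post title",
  "datePublished": "2015-01-01",
  "author": { "@type": "Person", "name": "Jane Doe" },
  "url": "https://example.com/post"
}
```

Of course, nothing stops publishers from misusing these fields too.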