I have also been exploring what it might mean to automagically categorize news content. On http://www.rivyr.com I took the approach of following RSS feeds, then reverse-searching the titles against my product database to try to auto-classify news by product. I wish I could see your backend, since that's where the secret sauce is, but I definitely understand not sharing it ;)
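For illustration only, the title-matching idea could look roughly like the sketch below. This is not the actual rivyr.com code: the product list, `normalize`, and `classify_title` are hypothetical names, and a real system would presumably query a product database rather than an in-memory list.

```python
import re

# Hypothetical product catalog; stands in for a real product database.
PRODUCTS = ["Kindle Fire", "iPhone 5", "Galaxy Note"]

def normalize(text):
    """Lowercase the text and reduce it to letter/digit tokens joined by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))

def classify_title(title, products=PRODUCTS):
    """Return every product whose normalized name occurs as a whole-word phrase in the title."""
    padded = f" {normalize(title)} "
    return [p for p in products if f" {normalize(p)} " in padded]
```

Padding with spaces keeps "iPhone 5" from matching a title that mentions "iPhone 50"; titles that name no known product simply come back with an empty list.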
That is super interesting. I'm actually working more closely with Apache OpenNLP for feature extraction and LibLINEAR to classify news items by category. It's not too far from moving out of alpha into an open beta. Why not subscribe to the mailing list, and I'll send you more info over the coming weeks?