I recently built a crawler of my own in Ruby after trying, and ultimately deciding against, Nutch. Depending on what you want to do with your crawl, there is a very good chance that you'll be able to write a small crawler of your own that is much easier to extend, and you'll probably be able to write it in the time it would take you to install, set up, and configure Nutch.
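To make the "small crawler" point concrete, here's a rough sketch of the core loop such a crawler needs: a frontier queue, a visited set, and pluggable fetch/parse steps. This is a hypothetical illustration, not the Ruby code from my crawler — the fetch function is a stub (a dict lookup) so the logic runs without touching the network, and the link extraction is a naive regex.

```python
import re
from collections import deque
from urllib.parse import urljoin

def extract_links(html):
    # Naive stand-in for a real HTML parser: grab href attribute values.
    return re.findall(r'href="([^"]+)"', html)

def crawl(start_url, fetch, max_pages=100):
    # Breadth-first crawl: frontier queue plus a seen-set to avoid revisits.
    seen = {start_url}
    frontier = deque([start_url])
    pages = {}
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        for link in extract_links(html):
            absolute = urljoin(url, link)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return pages

# Stub "web" for demonstration: two pages linking to each other.
site = {
    "http://example.com/": '<a href="/about">about</a>',
    "http://example.com/about": '<a href="/">home</a>',
}
pages = crawl("http://example.com/", site.get)
print(sorted(pages))  # both pages visited, no infinite loop on the cycle
```

In a real crawler you'd swap the dict lookup for an HTTP fetch and add politeness (robots.txt, rate limiting), but the control flow stays this simple.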
As I said, I used Ruby, and specifically Hpricot, for the page parsing. I'm starting to run into problems with Hpricot now, though, and I may try a Python version with Beautiful Soup very soon. Let me know how it goes for you and maybe we can share some code.
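For anyone curious what the parsing step looks like on the Python side, here's a minimal link extractor using only the standard library's `html.parser` — a stand-in for what Hpricot does in Ruby, and for what Beautiful Soup's `find_all('a')` would do with far messier real-world HTML:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from anchor tags as the parser streams the page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

parser = LinkExtractor()
parser.feed('<p><a href="/docs">docs</a> <a href="http://example.com">ext</a></p>')
print(parser.links)
```

Beautiful Soup buys you tolerance for broken markup and a much nicer query API on top of this same idea.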
Hi there. I think I've decided we need to build our own, mainly because what we want to do is quite specific — monitoring sites for keywords — and we really should understand this technology, to a degree, ourselves. Happy to share if you end up heading down the Python route. If anyone else on here is doing crawling/data mining, maybe we could share ideas and help each other somehow :) My email is in my profile
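Since the monitoring step is the specific part here, a rough sketch of what it might look like once pages are fetched — this is a hypothetical illustration (the tag-stripping regex and function name are mine, not from any real project): strip the markup, then report which watched keywords appear in the page text.

```python
import re

def find_keywords(html, keywords):
    # Crude tag stripping: replace anything in angle brackets with a space,
    # then do a case-insensitive substring check for each watched keyword.
    text = re.sub(r"<[^>]+>", " ", html).lower()
    return sorted(k for k in keywords if k.lower() in text)

page = "<h1>Release notes</h1><p>Now with Python and Ruby bindings.</p>"
print(find_keywords(page, ["python", "perl", "ruby"]))  # → ['python', 'ruby']
```

For production monitoring you'd want word-boundary matching and diffing against the previous crawl so you only alert on new mentions, but this is the gist.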