http://mojolicio.us is way better for this kind of stuff. Here's the synopsis ex...

chrisohara · on Jan 23, 2011

The one-liner is cool, but I guarantee that node.js's non-blocking IO will outperform Perl any day of the week. Try scraping thousands of pages at once using Perl.

marcusramberg · on Jan 23, 2011

Mojolicious uses a non-blocking async run loop as well =)

harryf · on Jan 23, 2011

The problem with anything that represents a page as some kind of graph is that you have to construct the whole tree before you can start doing anything with it. The API largely precludes streams. Callbacks would be possible, but some of the conditional CSS selectors need complete knowledge of the page before they can be resolved.
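
To make that concrete, here is a rough sketch in Python (standard library only, not the API of any scraper mentioned in this thread). It feeds HTML to a streaming parser chunk by chunk and tries to answer something like `li:last-child`. The class `LastItemFinder` and the helper `confirmed_after_each_chunk` are hypothetical names made up for illustration. The point is that the parser always has a candidate `<li>`, but it cannot confirm it as the last one until the closing `</ul>` arrives.

```python
from html.parser import HTMLParser


class LastItemFinder(HTMLParser):
    """Streams HTML and records the text of the last <li> in each <ul>."""

    def __init__(self):
        super().__init__()
        self.in_li = False
        self.candidate = None  # most recent <li> text; may not be the last one
        self.confirmed = []    # only filled once a </ul> has been seen

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.in_li = True
            self.candidate = ""

    def handle_data(self, data):
        if self.in_li:
            self.candidate += data

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_li = False
        elif tag == "ul" and self.candidate is not None:
            # Only now do we know the candidate really was the last child.
            self.confirmed.append(self.candidate)
            self.candidate = None


def confirmed_after_each_chunk(chunks):
    """Feed chunks one at a time; report how many answers were confirmed after each."""
    finder = LastItemFinder()
    counts = []
    for chunk in chunks:
        finder.feed(chunk)
        counts.append(len(finder.confirmed))
    finder.close()
    return counts, finder.confirmed
```

With a list split across three chunks, nothing can be confirmed until the final chunk closes the list, which is the "complete knowledge" problem in miniature.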

So while GET-ting pages to scrape can benefit from async IO, you're effectively "blocked" while scraping pieces out of the page itself.
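
A minimal sketch of that split, again in Python and under loud assumptions: the "network" is an `asyncio.sleep` over a hypothetical in-memory `FAKE_PAGES` dict, and `extract_title` is a stand-in for whatever tree-based scraping a real tool would do. The fetches overlap on the event loop, while each parse step is plain synchronous code that runs start to finish with nothing else interleaved.

```python
import asyncio

# Hypothetical in-memory "site"; no real network access happens here.
FAKE_PAGES = {
    "/a": "<html><title>A</title></html>",
    "/b": "<html><title>B</title></html>",
    "/c": "<html><title>C</title></html>",
}


async def fetch(path, log):
    """Simulated non-blocking GET: yields to the event loop while 'waiting'."""
    log.append(("fetch-start", path))
    await asyncio.sleep(0.01)  # stands in for network latency
    log.append(("fetch-end", path))
    return FAKE_PAGES[path]


def extract_title(html, path, log):
    """Synchronous parse step: the event loop does nothing else while this runs."""
    log.append(("parse-start", path))
    start = html.index("<title>") + len("<title>")
    end = html.index("</title>")
    log.append(("parse-end", path))
    return html[start:end]


async def fetch_and_parse(path, log):
    html = await fetch(path, log)
    return extract_title(html, path, log)


async def scrape(paths):
    log = []
    titles = await asyncio.gather(*(fetch_and_parse(p, log) for p in paths))
    return list(titles), log


def run_demo():
    return asyncio.run(scrape(["/a", "/b", "/c"]))
```

In the event log returned by `run_demo()`, all three fetches start before any parsing begins, and every `parse-start` is immediately followed by its own `parse-end`. For a tiny page that blocked stretch is negligible; the concern above is what happens when the parse is a full tree build over a large document.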