
I'm not sure that reading and fully understanding the complete JavaScript source of every site you might want to scrape is actually easier than building a WebKit-based scraper that runs the pages' load and click handlers.


Relevant: PhantomJS (http://code.google.com/p/phantomjs/) is a headless browser based on WebKit.

I don't have any hands-on experience with it, but if one were to go down the path you just described, that project would likely be a great start on that journey.
