
How do you crawl the web? Do you follow links around? How do you reach a page that isn't linked from anywhere you've crawled?


I'm just using Common Crawl for now.


I mean, that's what web crawling is, right? By extension, you just can't reach a page unless you stumble upon a link to it _somewhere_. Google lets you submit a URL and request a crawl that way, so that's another option if nothing links to the page.
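
To make the reachability point concrete, here's a minimal sketch of link-following as a breadth-first traversal. It assumes a made-up in-memory link graph (`LINKS`, the `*.example` pages, and the `crawl` function are all hypothetical) instead of real HTTP fetching, so it only illustrates the idea and isn't how any particular crawler is built.

```python
from collections import deque

# Hypothetical toy link graph: page -> pages it links to.
LINKS = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": [],
    "orphan.example/": ["a.example/"],  # links out, but nothing links to it
}


def crawl(seeds, links=LINKS):
    """Breadth-first crawl: return every page reachable from the seeds."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        # A real crawler would fetch the page and parse its links here.
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen
```

Starting from `a.example/` alone, the crawl never finds `orphan.example/`, because no crawled page links to it. Adding it to the seed list (roughly what submitting a URL does) is the only way in.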



