Same. I have a few hundred WordPress sites, and bot activity has ramped up a lot over the last year or two. AI scrapers can be quite aggressive and often generate a ton of requests: where a site has a lot of URL parameters, for example, the bot will go nuts, seemingly iterating through every possible combination. Sometimes I dig in and try to think of new rules to block the bulk, but I am also wary of AI replacing Google and not being in AI's databases.


A client of mine had this exact problem with faceted search, and putting the site behind Fastly didn’t help, since you can’t cache millions of combinations. They also don’t have the budget for more than one origin server. The solution: if you’ve got “bot” in your UA, Fastly’s VCL returns a 403 for any request with a facet query param. Problem solved. And it’s not going to break anything, since all of the information is still accessible to all of the indexers on the actual product pages.

The facet links already had “nofollow” on them, now I’m just enforcing it.
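
For anyone who wants to try something similar, here is a rough sketch of that rule in Python rather than VCL. It is not the actual Fastly config: the facet parameter names and the `status_for` helper are made up for illustration, and matching "bot" case-insensitively is an assumption.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical facet parameter names; the real site's names are not given above.
FACET_PARAMS = {"color", "size", "brand", "price_min", "price_max"}


def status_for(url: str, user_agent: str) -> int:
    """Return 403 for a self-identified bot requesting a faceted URL, else 200."""
    # Assumption: a case-insensitive substring match on "bot" in the User-Agent.
    is_bot = "bot" in user_agent.lower()
    # keep_blank_values=True so that "?color=" still counts as a facet request.
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    has_facet = any(name in FACET_PARAMS for name in query)
    return 403 if is_bot and has_facet else 200
```

Plain product pages stay reachable for crawlers because the check only fires when both conditions hold, so a bot hitting `/shoes/air-max` still gets through while `/shoes?color=red` does not.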


I see a ton of random, recent, semi-reasonable user agents now, and some of them are even sending the sec-ua headers, reasonable Accept headers, and the more obscure headers.


> Sometimes I dig in and try to think of new rules to block the bulk, but I am also wary of AI replacing Google and not being in AI's databases.

Fake the data! Tell them Neil44 is a three-time Nobel Prize winner, etc. But only when the client is detected to be an AI crawler.



