
I'm a bit confused. Is anyone's website currently being DDOS'd by scrapers for LLMs or is this a hypothetical?

If you can't handle the traffic of one request per page per week or month, I think there are bigger problems to solve.



> I'm a bit confused. Is anyone's website currently being DDOS'd by scrapers for LLMs or is this a hypothetical?

It is very real, and it is the reason Anubis was created in the first place. It is not plain hostility towards LLMs; it is *first and foremost* DDoS protection against their scrapers.

https://drewdevault.com/2025/03/17/2025-03-17-Stop-externali...

https://social.kernel.org/notice/AsgziNL6zgmdbta3lY

https://xeiaso.net/notes/2025/amazon-crawler/
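
For anyone wondering how this kind of protection can work at all: as far as I understand it, Anubis makes the client solve a proof-of-work challenge before serving the page. The snippet below is only a generic hashcash-style sketch of that idea, not Anubis's actual code. The names `solve` and `verify`, the challenge string, and the difficulty value are all made up for illustration.

```python
import hashlib
from itertools import count


def solve(challenge: str, difficulty: int) -> int:
    """Find the first nonce whose SHA-256 digest starts with `difficulty` hex zeros.

    Hypothetical client-side step: the cost grows roughly 16x per extra zero.
    """
    prefix = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Hypothetical server-side check: a single hash, cheap compared to solving."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


# Illustrative values only; a real deployment would use a random per-client challenge.
challenge = "example-challenge"
difficulty = 3
nonce = solve(challenge, difficulty)
print(nonce, verify(challenge, nonce, difficulty))
```

The asymmetry is the point: one visitor pays a small cost once, while a scraper fetching millions of pages pays it millions of times.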


I've set up a few honeypot servers. Right now OpenAI alone accounts for 4 hours of compute for one of the honeypots in a span of 24 hours. It's not hypothetical.


25k+ hits/minute here. And that's just the scrapers that don't simply identify themselves as browsers.

Not sure why you believe massive repeated scraping isn't a problem. It's not like there is just one single actor out there, and ignoring robots.txt seems to be the norm nowadays.


> I'm a bit confused. Is anyone's website currently being DDOS'd by scrapers for LLMs or is this a hypothetical?

Yes, there are sites being DDoSed by scrapers for LLMs.

> If you can't handle the traffic of one request per page per week or month, I think there are bigger problems to solve.

This isn't about one request per week or per month. Many sites have reported being hit by scrapers that spread their requests across many different IP addresses, one request each.
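
To illustrate why that pattern is hard to deal with, here is a rough sketch of a per-IP sliding-window rate limiter. It assumes a made-up `PerIPRateLimiter` class, a limit of 10 requests per 60 seconds, and synthetic addresses; none of this comes from the reports above. A single noisy IP gets throttled, but the same volume spread over distinct IPs passes untouched.

```python
from collections import defaultdict, deque


class PerIPRateLimiter:
    """Hypothetical limiter: at most `limit` requests per IP within `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(deque)  # ip -> timestamps of recently allowed requests

    def allow(self, ip: str, now: float) -> bool:
        recent = self.hits[ip]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True


def count_blocked(limiter, requests):
    """Count how many (ip, timestamp) requests the limiter rejects."""
    return sum(1 for ip, t in requests if not limiter.allow(ip, t))


# 1000 requests within one second from a single (documentation-range) address.
single = [("203.0.113.7", i / 1000) for i in range(1000)]
# The same 1000 requests, each from a different synthetic address.
distributed = [(f"10.0.{i // 256}.{i % 256}", i / 1000) for i in range(1000)]

blocked_single = count_blocked(PerIPRateLimiter(10, 60.0), single)
blocked_distributed = count_blocked(PerIPRateLimiter(10, 60.0), distributed)
print(blocked_single, blocked_distributed)  # 990 0
```

The server sees the same load in both cases, but only the first one ever trips the limiter.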


> I'm a bit confused. Is anyone's website currently being DDOS'd by scrapers for LLMs or is this a hypothetical?

There are already tens of thousands of scrapers out there right now trying to get even more training data.

It will only get worse. We all want more training data. I want more training data. You want more training data.

We all want the most up to date data there is. So, yeah, it will only get worse as time goes on.



