Why not just rate limit responses? That ends up costing bot makers about the same amount of money as captchas (which are often solved by workers earning slave wages). This arms race will never end and if the insistence is always to prove you're human, then humans will always be exploited for this proof. Imagine one day when we've automated the world and the only reason humans have to do any work is so that robots can prove they're human. This whole thing is ridiculous.
Specifically, one way to rate limit would be a cookie value that changes on each request: the previous value expires, and only the site knows what the next valid value is. Bots pay for the waiting in computing time, or in memory if they get around it by parallelizing. As those costs fall with cheaper computers, so too will the cost of serving the site.
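To make the rotating-cookie idea concrete, here is a rough sketch, not a production design. The class name `RotatingCookieLimiter`, the `min_interval` parameter, and the in-memory dict are all my own assumptions for illustration; a real site would need shared storage, expiry of abandoned sessions, and so on.

```python
import secrets
import time


class RotatingCookieLimiter:
    """Hypothetical sketch: each request must present the cookie issued by the
    previous response, and no sooner than min_interval seconds after it was issued."""

    def __init__(self, min_interval=1.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self._sessions = {}  # current cookie value -> time it was issued

    def _issue(self):
        # Random value, so only the site knows what the next valid cookie is.
        value = secrets.token_urlsafe(16)
        self._sessions[value] = self.clock()
        return value

    def start(self):
        """Issue the first cookie for a brand-new session."""
        return self._issue()

    def handle(self, cookie):
        """Return (allowed, cookie_to_send_back)."""
        issued = self._sessions.get(cookie)
        if issued is None:
            return False, None  # unknown or already-used value
        if self.clock() - issued < self.min_interval:
            return False, cookie  # too early: the client has to wait and retry
        del self._sessions[cookie]  # the previous value expires on use
        return True, self._issue()
```

A bot that wants N requests per `min_interval` has to keep N sessions alive at once, which is where the memory cost comes from.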
> Imagine one day when we've automated the world and the only reason humans have to do any work is so that robots can prove they're human. This whole thing is ridiculous.
... did you just solve the whole problem of on-going automation? Capitalism is saved! We will all work as CAPTCHA breakers. ;).
The article claims bots can solve the most distorted text with 99.8% accuracy. I'm not that accurate. Perhaps someone can write a CAPTCHA-breaking Chrome extension, so I don't have to bother.
An issue that I've run into: 1) Google registers traffic from Tor proxies as suspicious (with some reason), 2) it puts CAPTCHAs in front of you (which are getting quite difficult to solve), and 3) if you rate-limit Tor proxies (around 6,000 - 7,000 worldwide when I checked earlier), you're going to block a lot of legitimate Tor traffic.
The same goes for VPNs and other tools, whose use is fairly likely to increase as people start seeking ways to avoid ubiquitous surveillance.
There are other options, including a few tools that look at how to provide a fair and anonymous reputation system for Tor clients:
I think you can rate limit without tying it to IP address. If each page returns a session key that is only valid for the next page request, then you force bots to wait as long as you want and/or spend money on extra memory for parallel sessions. One problem with this is that people may come to your site from an indexed link and have no session yet. In that case you would probably want to add a delay after some number of requests per IP, so you'd slow Tor users down, but only on the first request to your site. If your page is JS or browser dependent in some way, then bots would probably need about 100 MB per thread. All of this is in the ballpark of paying people to solve captchas.
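For the sessionless first request, the per-IP delay could look roughly like the sketch below. Everything here is assumed for illustration: the name `NewSessionThrottle`, the sliding window, and the numbers for `free_requests`, `delay_step`, and `max_delay` are arbitrary. It only computes how long to stall a request; actually sleeping or queueing is left to the server.

```python
import time
from collections import defaultdict, deque


class NewSessionThrottle:
    """Hypothetical sketch: delay requests that arrive without a session,
    growing the delay once an IP exceeds free_requests within the window."""

    def __init__(self, free_requests=5, window=60.0, delay_step=0.5,
                 max_delay=10.0, clock=time.monotonic):
        self.free_requests = free_requests
        self.window = window
        self.delay_step = delay_step
        self.max_delay = max_delay
        self.clock = clock
        self._hits = defaultdict(deque)  # ip -> timestamps of recent sessionless requests

    def delay_for(self, ip):
        """Record one sessionless request from ip and return the delay in seconds."""
        now = self.clock()
        hits = self._hits[ip]
        # Forget requests that have fallen out of the sliding window.
        while hits and now - hits[0] > self.window:
            hits.popleft()
        hits.append(now)
        extra = len(hits) - self.free_requests
        if extra <= 0:
            return 0.0
        return min(extra * self.delay_step, self.max_delay)
```

Requests that already carry a valid session key would skip this check entirely, so a shared Tor exit only pays the delay when many new visitors arrive through it at once.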
This was a problem before Tor anyhow. You can run a proxy for a few cents a day.
I just don't see how captchas are some awesome solution. In any anti-bot technology, the cost to circumvent it is pennies. It strikes me more as something like DRM which just makes content producers feel good, but really only punishes average people.
edit: sorry I hadn't read your links. Good points and hopefully someone like CloudFlare would make this easy for people to add to their sites.
You'd have to limit novel sessions to very low activity rates. That would require some sort of persistence token (not necessarily a cookie), and if provided on an anonymised basis, one that's verifiable but not predictable or traceable to prior cookies, which is what most of the references I provided cover.
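As a very rough illustration of "verifiable but not predictable", here is a sketch using a random nonce plus an HMAC tag. It is not what the referenced schemes do: the names `SECRET_KEY`, `issue_token`, and `verify_token` are made up, and this covers only the first two properties. It does nothing for untraceability, since the issuing site can simply log which token it handed to whom; that part needs something like blinded issuance.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; a real deployment would load and rotate this.
SECRET_KEY = secrets.token_bytes(32)


def _tag(nonce):
    return hmac.new(SECRET_KEY, nonce.encode(), hashlib.sha256).hexdigest()


def issue_token():
    """Return 'nonce.tag'. The nonce is random, so tokens cannot be predicted."""
    nonce = secrets.token_hex(16)
    return f"{nonce}.{_tag(nonce)}"


def verify_token(token):
    """Check the tag without any per-token server state."""
    nonce, sep, tag = token.partition(".")
    if not sep:
        return False
    # Constant-time comparison to avoid leaking how much of the tag matched.
    return hmac.compare_digest(tag.encode(), _tag(nonce).encode())
```

To make a token single-use or rate limited, the site would still have to remember which nonces it has already seen.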
Sorting out a mechanism for allocating those tokens' seed values is difficult. FAUST requires an unblinded token request initially.
CAPTCHAs have been useful, though always problematic. The goal isn't perfection but raising costs. The problem is that those costs keep falling.
Request rate is definitely one thing you can limit, but it's tricky when attackers potentially control large numbers of IP addresses.
There's an annoying triangle here: wanting to preserve privacy (== unlinkability), machine-independence, and "working well for good traffic with limited resources, as well as blocking attackers with substantially more resources". Ideally you'd give up none of the three ("choose zero"); I'd be happy if the state of the art only made you give up one ("choose one").