Does anyone have good alternatives to old/new ReCAPTCHA? I've been scratching the surface of academic research in the area, and it's all kind of messy.
(It's no great secret that CloudFlare would love to switch away from ReCAPTCHA, for a whole variety of reasons. It's one of the things Tor users complain about the most, but it's an issue for a lot more users than that. We're doing a lot of stuff to reduce reliance on CAPTCHAs overall throughout 2015, but we still need a good one for some checks.)
I wonder if some kind of prize (anti-Turing prize?) would help. There's the core algorithm/approach question, as well as the infrastructure and deployment model question. I'm a lot more comfortable answering the latter; the former is a black art mixture of science and art.
You could disable CAPTCHAs for Tor at any point, but don't, presumably because there actually is a lot of scraping and other abuse coming through there. I don't think replacing reCAPTCHA is going to make Tor users magically happy. They will just complain about whatever alternative is used.
One thing you could look into (email me if you'd like more info on this) is the notion of using Bitcoin-based "anonymous passports". The idea here is that someone sacrifices some Bitcoins to miner fees in such a way that they effectively mint themselves a certificate over a public key, without paying a certificate authority. Seen one way, the block chain itself is the CA.
Once such a certificate/"anonymous passport" has been created the owner can sign challenges with it to prove ownership, and if the user is observed engaging in abusive activity it can be blacklisted - forcing another sacrifice of money if they want to keep going.
The downside of this is that of course it relies on Bitcoin. However, there's a whole army of people working on the problem of obtaining bitcoins. There are local traders. There are Bitcoin ATMs being deployed throughout the world. If you want a private way to demonstrate some sacrifice of effort or wealth I don't think there's a better alternative.
Currently there's no convenient GUI for making these certificates. If a major provider like CloudFlare were willing to get behind the concept, such a GUI could be easily built though. For instance I could add it to Lighthouse, which is a cross platform Bitcoin wallet app that specialises in smart contracts.
Why not just rate limit responses? That ends up costing bot makers about the same amount of money as captchas (which are often solved by workers earning slave wages). This arms race will never end and if the insistence is always to prove you're human, then humans will always be exploited for this proof. Imagine one day when we've automated the world and the only reason humans have to do any work is so that robots can prove they're human. This whole thing is ridiculous.
Specifically, one way to rate limit would be a cookie value that changes on each request: the previous cookie value expires, and only the site knows what the next valid cookie value is. Bots will pay for the cost of waiting in terms of computing time, and in terms of memory if they get around this by parallelization. As these costs go down due to cheaper computers, so too will the costs of serving the site.
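To make the rotating-cookie idea concrete, here's a minimal sketch in Python. Everything in it is my own assumption for illustration (the `RotatingTokenGate` name, the HMAC-based token derivation, the in-memory session table, the 2-second wait), not anything a real site is known to use. Each response hands out a fresh token, only the latest token is accepted, and it is only accepted after a minimum wait.

```python
import hashlib
import hmac
import secrets


class RotatingTokenGate:
    """Hypothetical gate: one valid token per session, rotated on every request."""

    def __init__(self, min_interval=2.0):
        self.secret = secrets.token_bytes(32)  # known only to the site
        self.min_interval = min_interval       # enforced wait between requests (seconds)
        self.sessions = {}                     # session id -> (counter, issued_at)

    def _token(self, session_id, counter):
        # Only the holder of self.secret can compute the next valid value.
        msg = f"{session_id}:{counter}".encode()
        return hmac.new(self.secret, msg, hashlib.sha256).hexdigest()

    def start(self, now):
        """Open a session and return (session_id, first_token)."""
        session_id = secrets.token_hex(8)
        self.sessions[session_id] = (0, now)
        return session_id, self._token(session_id, 0)

    def request(self, session_id, token, now):
        """Return the next token if the request is allowed, else None."""
        state = self.sessions.get(session_id)
        if state is None:
            return None
        counter, issued_at = state
        if not hmac.compare_digest(self._token(session_id, counter), token):
            return None  # stale or forged token
        if now - issued_at < self.min_interval:
            return None  # too early: the client has to wait
        self.sessions[session_id] = (counter + 1, now)
        return self._token(session_id, counter + 1)
```

A bot that wants more throughput than one request per `min_interval` has to hold many sessions open at once, which is where the memory cost comes from.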
> Imagine one day when we've automated the world and the only reason humans have to do any work is so that robots can prove they're human. This whole thing is ridiculous.
... did you just solve the whole problem of on-going automation? Capitalism is saved! We will all work as CAPTCHA breakers. ;).
The article claims bots can solve the most distorted text with 99.8% accuracy. I'm not that accurate. Perhaps someone can write a captcha-breaking Chrome extension, so I don't have to bother.
An issue that I've run into is that 1) Google registers traffic from Tor proxies as suspicious (with some reason), 2) it puts CAPTCHAs in front of you (which are getting quite difficult to solve), and 3) if you're rate-limiting Tor proxies (around 6,000 - 7,000 worldwide as I was checking earlier), you're going to block a lot of legitimate Tor traffic.
The same goes for VPNs and other tools, whose use is fairly likely to increase as people start seeking ways to avoid ubiquitous surveillance.
There are other options, including a few tools that look at how to provide a fair and anonymous reputation system for Tor clients:
I think you can rate limit without tying it to IP address. If each page returns a session key only valid for the next page request, then you force bots to wait as long as you want and/or spend money on extra memory for parallel sessions. One problem with this is, e.g., if people come to your site from an indexed link and have no possible session yet. In that case you probably would want to add a delay after some amount of requests per IP, so you'd slow Tor users down but only on the first request to your site. If your page is JS or browser dependent in some way, then bots would probably need about 100 MB per thread. All of this is in the ballpark of paying people to solve captchas.
This was a problem before Tor anyhow. You can run a proxy for a few cents a day.
I just don't see how captchas are some awesome solution. In any anti-bot technology, the cost to circumvent it is pennies. It strikes me more as something like DRM which just makes content producers feel good, but really only punishes average people.
edit: sorry I hadn't read your links. Good points and hopefully someone like CloudFlare would make this easy for people to add to their sites.
You'd have to limit novel sessions to very low activity rates. That would require some sort of persistence token (not necessarily a cookie), and if provided on an anonymised basis, one that's verifiable but not predictable or traceable to prior cookies. Which is what most of the references I provided cover.
Sorting out a mechanism for allocating those tokens' seed values is difficult. FAUST requires an unblinded token request initially.
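For a feel of what "verifiable but not traceable" can look like, here's a toy sketch of a textbook RSA blind signature in Python. To be clear about the assumptions: this is the classic Chaum-style construction, not necessarily what FAUST or the other referenced schemes actually do, and the key (built from two Mersenne primes), the function names and the hash-then-sign shortcut are all mine and far too weak for real use. It needs Python 3.8+ for `pow(x, -1, n)`.

```python
import hashlib
import math
import secrets

# Toy RSA key: hypothetical and insecure, chosen only so the example runs.
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, held by the site


def digest(token):
    """Map a token (bytes) to an integer modulo N."""
    return int.from_bytes(hashlib.sha256(token).digest(), "big") % N


def blind(token):
    """Client: hide the token behind a random factor r before sending it."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            break
    return (digest(token) * pow(r, E, N)) % N, r


def sign_blinded(blinded):
    """Site: sign without learning which token is inside."""
    return pow(blinded, D, N)


def unblind(blind_sig, r):
    """Client: strip the blinding factor to get a signature on the token."""
    return (blind_sig * pow(r, -1, N)) % N


def verify(token, sig):
    """Anyone with the public key (N, E) can check the signature."""
    return pow(sig, E, N) == digest(token)
```

The site sees only the blinded value when it signs, so when the unblinded token shows up later it can check the signature but cannot link it back to the request that produced it.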
CAPTCHAs had been useful, though always problematic. The goal isn't perfection but costs. Problem is that costs keep falling.
Request rate is definitely one thing you can limit, but it's tricky when attackers potentially control large numbers of IP addresses.
There's an annoying triangle here: wanting to preserve privacy (== unlinkability), machine-independence, and "working well for good traffic with limited resources, as well as blocking attackers with substantially more resources". Ideally it is "choose zero", I'd be happy if the state of the art were even at "choose one".
Apologies if this feels promotional - if you have any questions, I'd be happy to answer them. This is an area of web sec that we're, obviously, very dedicated to.
I looked into this research area as well. In fact, I considered writing a thesis about CAPTCHA. The problem is really that there is little to no formal theory about this subject. CAPTCHAs remind me of how cryptography used to work: Someone invented a cipher, someone else broke it and so the cipher had to be improved to prevent this attack (well, it still works kinda like that, but the theoretical background is much better now). Much more an art than a science. What is a strong CAPTCHA? Again, almost no theory behind this. It's strong if programmers have a hard time figuring out how to break it with software. It is a frustrating situation, really, and computers aren't getting any dumber so CAPTCHAs must get harder to the point where they're not recognizable by humans.
Yes, but unlike crypto, a CAPTCHA doesn't need forward protection. If I deploy one, and it turns out to be weak, I can just upgrade and then not worry. I'm not worried about people time-traveling to the past.
I run a help site for an online game, and I used to use CAPTCHAs on our various forms. We had tons of bot problems. So, one day I switched the CAPTCHA to a simple trivia question that only people who play the game would know the answer to. (And helped the people who didn't/were newer by linking to the answer.)
Our bot problem disappeared overnight, and we haven't had a problem since. Definitely not a solution for everyone, but it could be a great solution for some.
You should reduce reliance on IPs and instead give people unique IDs once they solve your captcha. Then any abusive traffic will be de-muxed based on ID and so legitimate users will be affected much less by the bad behavior of others.
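A rough sketch of what I mean, in Python. The names (`PerIdLimiter`, `issue_id`) and the token-bucket numbers are made up for illustration, and a real deployment would need signed IDs and shared storage rather than a dict. The point is only that the budget is tracked per ID, so one abusive ID running dry doesn't touch anyone else behind the same IP.

```python
import secrets


class PerIdLimiter:
    """Hypothetical token-bucket limiter keyed by a post-CAPTCHA ID, not by IP."""

    def __init__(self, rate=1.0, burst=5):
        self.rate = rate     # tokens refilled per second
        self.burst = burst   # maximum tokens an ID can accumulate
        self.buckets = {}    # client id -> (tokens, last_seen)

    def issue_id(self):
        """Call once the client has solved the CAPTCHA."""
        client_id = secrets.token_urlsafe(16)
        self.buckets[client_id] = (float(self.burst), None)
        return client_id

    def allow(self, client_id, now):
        """Return True if this ID may make a request at time `now`."""
        if client_id not in self.buckets:
            return False  # no ID yet: send the client to the CAPTCHA
        tokens, last = self.buckets[client_id]
        if last is not None:
            # Refill in proportion to the time elapsed, capped at the burst size.
            tokens = min(float(self.burst), tokens + (now - last) * self.rate)
        if tokens < 1.0:
            self.buckets[client_id] = (tokens, now)
            return False
        self.buckets[client_id] = (tokens - 1.0, now)
        return True
```

Blacklisting an abusive ID is then just deleting its bucket, which sends that one client back to the CAPTCHA.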