
I've got the impression that lots of sites block AWS IP addresses. I wonder whether that would hamper practical use of this on Lambda.

I'm doing something similar, and this concern was one motivation for running in our datacentre vs EC2.

Does anyone have concrete numbers on how often bots running from AWS IPs get blocked?



I assume the number one use of this would be test automation for one's own sites so blocking would not be an issue.

What are sites' motivations for blocking AWS IPs? I bet there are some reasons I would agree with, even though the somewhat crude method of blocking a whole IP range has unintended consequences (e.g. blocking people running a personal VPN).
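For anyone unsure what blocking by IP range looks like in practice, here is a minimal Python sketch. The CIDR blocks are placeholders taken from the reserved documentation ranges, not real AWS ranges (AWS publishes its current ranges in its ip-ranges.json file), and the name `in_blocked_range` is made up for illustration.

```python
import ipaddress

# Placeholder CIDR blocks from the reserved documentation ranges.
# These are NOT real AWS ranges; a real list would have to be loaded
# from the provider's published data.
BLOCKED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("203.0.113.0/24", "198.51.100.0/24")
]


def in_blocked_range(ip: str) -> bool:
    """Return True if the address falls inside any blocked CIDR range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_RANGES)
```

The crudeness mentioned above shows up here: every address in the range is treated the same, whether it belongs to a crawler or to somebody's personal VPN.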


>What are sites' motivations for blocking AWS IPs?

I block AWS. So many crawlers up to so much nonsense! I don't block by IP, but by hostname.

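  // Block visitors whose reverse-DNS hostname contains .amazonaws.com,
  // unless the user agent mentions Silk or Safari.
  $block = '.amazonaws.com';
  $ua = $_SERVER['HTTP_USER_AGENT'] ?? '';
  // $rh was not defined in the snippet as posted; assumed here to be
  // the reverse-DNS hostname of the client address.
  $rh = (string) gethostbyaddr($_SERVER['REMOTE_ADDR']);

  if (stripos($rh, $block) !== false &&
      stripos($ua, 'Silk') === false &&
      stripos($ua, 'Safari') === false) {

      $block_visitor = true;
      $message = "Blocked Host:Amazon Web Services";
  }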
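
For comparison, a rough Python sketch of the same rule, assuming the reverse-DNS hostname has already been looked up elsewhere. It differs from the snippet above in one respect: it matches `.amazonaws.com` as a suffix rather than anywhere in the string, so a name like `amazonaws.com.example.net` is not caught. The function name `should_block` is hypothetical.

```python
def should_block(hostname: str, user_agent: str) -> bool:
    """Block amazonaws.com hosts unless the user agent mentions Silk or Safari."""
    # Normalise case and drop any trailing dot from a fully qualified name.
    host = hostname.lower().rstrip(".")
    is_aws = host.endswith(".amazonaws.com")
    ua = user_agent.lower()
    exempt = "silk" in ua or "safari" in ua
    return is_aws and not exempt
```

Keep in mind that user agents are trivially spoofed, so the Silk/Safari exemption only filters out bots that identify themselves honestly.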


Just curious: what have you seen crawlers do that makes you conclude they're up to nonsense?


Well, from amazonaws.com, there are so many requests for wp-login.php!

And then all the off-brand scraping companies use amazonaws.com.


What do you mean by off-brand scraping? Do you mean search engines that you haven't heard of, or copyright-violating orgs?


Some AWS visitors:

Cliqzbot, VidibleScraper/1.0, CheckMarkNetwork, CCBot/2.0 (http://commoncrawl.org/faq/), linkdexbot/2.2; +http://www.linkdex.com/bots/

That last one is a "SEO platform".


Indeed, so many of AWS’s IP ranges are used for DDoS and other malicious behaviour that they end up blacklisted for poor reputation. It’s a bit like using a third-party reseller’s shared IP ranges for your mail relay: you’re asking to end up on reputation-based lists. There’s a lot to be said for owning your IP reputation on the internet; when you outsource your addresses, you also give up control over that reputation.


It's fairly simple (if this is like the other programmatic headless browsers) to make requests through a proxy.
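
A minimal sketch of what that might look like, assuming the tool ends up launching Chrome with ordinary command-line switches. `--headless` and `--proxy-server` are standard Chromium flags; the helper name, the proxy address and the binary name in the usage note are placeholders, and whether this particular project exposes the flags is not something I've checked.

```python
from typing import List


def headless_chrome_args(proxy: str, url: str) -> List[str]:
    """Build Chrome arguments that send all traffic through the given proxy."""
    return [
        "--headless",               # run without a visible window
        f"--proxy-server={proxy}",  # route requests via this proxy
        url,
    ]
```

Usage would be something like `subprocess.run(["chrome", *headless_chrome_args("http://proxy.example:3128", "https://example.com")])`, where `chrome` stands in for whatever the binary is called on your system.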


And where do you run the proxy?


In my experience this isn't happening.



