I've got the impression that lots of sites block AWS IP addresses. I wonder if t...

extra88 · on July 26, 2017

I assume the number one use of this would be test automation for one's own sites so blocking would not be an issue.

What are sites' motivations for blocking AWS IPs? I bet there are some reasons I would agree with, even though the somewhat crude method of blocking IP ranges would have some unintended consequences (e.g. blocking people running a personal VPN).

pravda · on July 27, 2017

>What are sites' motivations for blocking AWS IPs?

I block AWS. So many crawlers up to so much nonsense! I don't block by IP, but by hostname.

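  $block = '.amazonaws.com';
  // Assumption: $rh is the visitor's reverse-DNS hostname; it was never set in the snippet as posted.
  $rh = (string) @gethostbyaddr($_SERVER['REMOTE_ADDR']);
  $ua = $_SERVER['HTTP_USER_AGENT'] ?? '';

  // Block amazonaws.com hosts unless the user agent mentions Silk or Safari.
  if (stripos($rh, $block) !== false &&
      stripos($ua, 'Silk') === false &&
      stripos($ua, 'Safari') === false) {

      $block_visitor = true;
      $message = "Blocked Host:Amazon Web Services";
  }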
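
A minimal sketch of the same check as a reusable function, in case it helps to test it in isolation. The function name `should_block_aws` and its parameters are hypothetical, not from the snippet above. It also assumes you want a suffix match on the hostname rather than a substring match, so a host like `amazonaws.com.example.org` is not caught by accident.

```php
<?php
// Hypothetical helper: returns true when the reverse-DNS hostname ends in
// ".amazonaws.com" and the user agent does not mention Silk or Safari.
function should_block_aws(string $remoteHost, string $userAgent): bool
{
    $suffix = '.amazonaws.com';
    $len = strlen($suffix);

    // Case-insensitive "ends with" check; the length guard keeps the
    // negative offset within the bounds of the hostname.
    $isAws = strlen($remoteHost) >= $len
        && substr_compare($remoteHost, $suffix, -$len, $len, true) === 0;
    if (!$isAws) {
        return false;
    }

    // Same user-agent exceptions as the snippet above.
    foreach (['Silk', 'Safari'] as $allowed) {
        if (stripos($userAgent, $allowed) !== false) {
            return false;
        }
    }
    return true;
}
```

It could be called with the result of `gethostbyaddr($_SERVER['REMOTE_ADDR'])` cast to a string, plus the `HTTP_USER_AGENT` header. Since the user agent is client-supplied, the Silk/Safari exception is trivially spoofable either way.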

hackits · on July 27, 2017

Just curious: what have you seen crawlers do that makes you conclude they're up to nonsense?

pravda · on July 27, 2017

Well, from amazonaws.com, there are so many requests for wp-login.php!

And then all the off-brand scraping companies use amazonaws.com.

afandian · on July 27, 2017

What do you mean by off-brand scraping? Do you mean search engines that you haven't heard of, or copyright-violating orgs?

pravda · on July 27, 2017

Some AWS visitors:

Cliqzbot, VidibleScraper/1.0, CheckMarkNetwork, CCBot/2.0 (http://commoncrawl.org/faq/), linkdexbot/2.2; +http://www.linkdex.com/bots/

That last one is a "SEO platform".

smcleod · on July 27, 2017

Indeed, so many of AWS’s IP ranges are used for DDoS and malicious behaviour that they end up getting blacklisted due to their poor reputation. It’s a bit like using a third-party reseller’s shared IP ranges for your mail relay: you’re asking to end up on reputation-based lists. There’s a lot to be said for your IP reputation on the internet, and when you outsource that, you give up the freedom to manage your own reputation risk.

ingenuous2 · on July 27, 2017

Fairly simple (if this is like the other programmatic headless browsers) to make a request through a proxy.

afandian · on July 27, 2017

And where do you run the proxy?

bdcravens · on July 27, 2017

In my experience this isn't happening.