It’s hard to see how this isn’t anticompetitive. Using their market power in search to show their own ads but prevent anyone navigating to view any others seems like something that should end up in court.
Google has always respected robots.txt and has well documented how to restrict crawling.
I don’t see it as anticompetitive so much as copyright violations. The fact that Google makes advert revenue using this strategy just contributes to damages.
Calling it anticompetitive would likely require Alphabet (and potentially other companies) to be preventing users from accessing the original web page (which might be possible to show, but is more difficult to prove).
Now I’m curious: what would happen if someone had no idea that Google existed (insert wild reason here; I’m leaning towards someone who lived in the woods for 40 years and self-hosts content without reading anything, like a hermit author)?
Does Google still get to say “everyone knows we scrape your content” and “We have instructions about opting out”?
My take is that if you serve web content to unauthenticated clients, you must expect that human and robot users will look at, remember, and catalogue your content.
To the extent that you want to control your content, you need to take measures to do so: reject unauthenticated requests, assert copyright/trademark controls, or do some basic searching to implement the open standards which most/all legal agents adhere to.
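For anyone who hasn’t looked at what the robots.txt opt-out amounts to in practice, here is a minimal sketch using Python’s standard-library parser. The rules, the paths, and the `can_fetch` helper are all made up for illustration, and this is not a claim about how Google’s own crawler is implemented; it only shows the kind of check a well-behaved crawler is expected to make before fetching a page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an illustrative site (rules and paths are assumptions):
# Googlebot may crawl everything except /drafts/, all other crawlers are refused.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /
"""


def can_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Googlebot", "https://example.com/index.html"),
        ("Googlebot", "https://example.com/drafts/novel.html"),
        ("SomeOtherBot", "https://example.com/index.html"),
    ]:
        print(agent, url, can_fetch(agent, url))
```

Note that nothing here enforces anything: the file is a published request, and it only works to the extent that the crawler chooses to run a check like this.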
The thought exercise is moot if you consider how many laws people are governed by in the modern world that they don’t know about but are still expected to learn and abide by. Also worth thinking about what recourse you expect this naive website owner to be able to effect given they have done the absolute minimum to protect their content.
I feel like it might be dangerous to equate Google with societal laws, but you have some compelling arguments.
If I may counter to flesh this out:
Take books. Authors have copyright laws to protect the content but they also have to accept that humans will look at, remember, and catalog the content of their books.
And we ended up with libraries which catalog on a large scale, as well as laws that dictate how people can use the content of those books (especially when they do borrow them).
But Google could be considered a company that borrows all the books, has people transcribe and file them, returns the books, and then advertises that they are “better than the library” because they have hired the best and fastest librarians in the world. However, they charge a cover fee. (In reality this fee is paid by advertisers, but it’s still there and Google is still a for-profit company.)
I think that it would be hard to argue that that type of business is reasonable or expected. To bring it back to the example of a content creator: I think many of us would push back if a reclusive author brought a manuscript back to the city after 20 years of being a hermit, had it turned into a book, and then found out that every single person who read it did so without paying him, without him being able to connect with those readers, and without him even being able to know how many people had read it, because this for-profit library chose not to tell him.
You can absolutely copyright the packaging of facts (see any textbook ever printed).
Google often has entire paragraphs ripped from the underlying page in their knowledge graph. Would it be copyright infringement if Google displayed paragraphs of text from a copyrighted textbook? If so, is there a difference between doing that and doing the same with, say, "facts" from a blog post?
IANAL, but my understanding is that an individual fact can’t be copyrighted, but an organized collection absolutely can be.
The example I remember is that a baseball game score can’t be copyrighted, but if you run a website that lists all of the baseball games throughout history, and you gathered that data and did work to organize it (other than scraping it from another collection), you have a claim to a copyright.