Tip: The result count estimate on the first result page is often off by several orders of magnitude. Often it's possible to navigate to the end of the results pretty quickly, and the number drops to something like 469:
https://www.google.com/search?q="time+to+make+some+plans+for...
I agree that it's annoying when journalists do it to demonstrate that a topic is important/controversial/etc. But I'm not sure there's anything wrong with it when the point being made is literally (and only) about the number of times a specific piece of text appears on the Internet.
Yeah, I'm curious about that. My guess is it counts 600,000 results for things it thinks match your query, based on how it interpreted it. But once you're past the top 10 or 20, the query gets interpreted more strictly, and synonyms, slight misspellings, and related pages fall off the radar.
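If that guess is right, the gap would look roughly like this toy comparison (the documents, the synonym/misspelling table, and both matching functions are all invented here just to illustrate the idea, not anything Google actually does):

    # A loose interpretation (synonyms, near-misspellings) matches far more
    # documents than the strict one you hit once you page deep into results.
    docs = (["time to make some plans"] * 500
            + ["time to make a plan"] * 300_000        # loose match via word variant
            + ["thyme to make some plans"] * 300_000)  # loose match via near-misspelling

    def loose_match(doc: str, query: str) -> bool:
        variants = {"plans": {"plans", "plan"}, "time": {"time", "thyme"}}
        return all(any(v in doc for v in variants.get(w, {w})) for w in query.split())

    def strict_match(doc: str, query: str) -> bool:
        return all(w in doc for w in query.split())

    query = "time plans"
    print(sum(loose_match(d, query) for d in docs))   # ~600,500 "results" for the estimate
    print(sum(strict_match(d, query) for d in docs))  # 500 you can actually page to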
I don't get that. I clicked your link, then the first results page, and now I'm browsing result pages 10, 15, 25, 30 for some reason and it doesn't stop.
Personalization should affect ranking, but not so much the total set of results. Domain, language & safe search prefs could. I wonder if the dedup filter can vary a lot (did the results end in "similar results omitted"?).
Google knows people rarely browse beyond the first few pages of results. The rest is a holdover from the days when it was important to show off how big your archive was, and they're basically still playing that vanity game: 600,000 claimed results, nearly all of such low quality that it would be embarrassing for Google to actually show them, yet they're counted anyway to pad the number (and since users never go that deep, it doesn't matter).
I'm rather inclined to believe this is an artefact of the algorithm, something like:
1) grab the index for each word in the query
2) grab the first x results from each index
3) intersect these to filter down to actual matches
I believe step 3 is about as expensive as rendering the results, and nobody would want to wait for 600k results to be rendered before seeing any of them. Thus they stick with some simple estimate, like max(size of the matched indices), or an average, or whatnot.
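In toy form (everything here, from the posting lists to the max()-based estimate and the lazy intersection, is invented to illustrate the hypothesis, not Google's actual pipeline):

    from typing import Dict, List

    def estimate_and_first_page(index: Dict[str, List[int]], words: List[str], page_size: int = 10):
        postings = [index.get(w, []) for w in words]          # 1) grab the index for each word

        # Cheap estimate shown on page one: no intersection yet, just the size
        # of the largest posting list (could also be an average, etc.).
        estimate = max((len(p) for p in postings), default=0)

        # 2) + 3) walk one posting list and intersect lazily, stopping as soon
        # as the first page is full instead of materializing every match.
        others = [set(p) for p in postings[1:]]
        page = []
        for doc_id in (postings[0] if postings else []):
            if all(doc_id in s for s in others):
                page.append(doc_id)
            if len(page) >= page_size:
                break
        return estimate, page

    index = {"time": list(range(600_000)), "plans": list(range(0, 600_000, 1300))}
    estimate, page = estimate_and_first_page(index, ["time", "plans"])
    print(estimate, len(page))   # claims ~600,000, renders 10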
Seems like a reasonable hypothesis, but you can easily show that it's wrong by doing a search for a single word, where there is no step (3) and the count is still way off.
I did a search for "hackerspace" (no quotes) and it claims "About 588,000 results." Paging through the results, I eventually got "In order to show you the most relevant results, we have omitted some entries very similar to the 390 already displayed." Even with omitted entries included, there seem to be only 832 results.
I agree. When I follow kuschku's link (starting at the 900th result), there is no next-page link at the bottom, and if I try hacking the "start" value in the URL to go even one document farther, I get no results and a "Sorry, Google does not serve more than 1000 results for any query. (You asked for results starting from 901.)" message below the search box.
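For anyone who wants to reproduce that, this is the kind of URL hacking I mean; only the "start" parameter matters, and the query string here is just a placeholder:

    # Build a Google results-page URL by hand and bump the "start" offset.
    from urllib.parse import urlencode

    def results_page_url(query: str, start: int) -> str:
        return "https://www.google.com/search?" + urlencode({"q": query, "start": start})

    print(results_page_url("some query", start=900))  # still served in my case
    print(results_page_url("some query", start=901))  # one farther: the "Sorry, ..." page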
Google quickly gets a set of results from some servers, and then estimates the total.
Say, for example, that their search cluster has 10,000 nodes, and suppose the query returns 60 results from the first node alone; it could then multiply 60 * 10,000 and claim there may be 600,000 results. But when it's asked to actually go and fetch the results, for various reasons it may not get anywhere near that figure; most of the nodes may just shrug and say "we got nothin'".
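Back-of-the-envelope, that guess looks like this (the node count, which shards hold hits, and the idea of sampling only the first shard are all numbers I made up for the example):

    # Toy extrapolation: count hits on one sampled shard, multiply by the
    # number of shards, then compare with what actually comes back when
    # every shard is asked.
    NUM_NODES = 10_000

    # Hits per shard: the sampled shard happens to be unusually rich,
    # most of the rest have nothing for this query.
    hits_per_shard = [60] + [3 if n % 100 == 0 else 0 for n in range(1, NUM_NODES)]

    estimate = hits_per_shard[0] * NUM_NODES   # 60 * 10,000 = 600,000 claimed
    actual = sum(hits_per_shard)               # what a full fetch would return

    print(f"estimated ~{estimate:,}, actually fetched {actual:,}")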