Google Safe Browsing is blocking small Mastodon servers (snake.club)
106 points by brenns10 on Nov 9, 2022 | 52 comments


Hi folks, my Mastodon server is now blocked as a "deceptive site," likely because its login pages are similar to other Mastodon servers. To a machine learning or rule-based system, I could see how that seems deceptive. But I'm now stuck in a several days to several weeks long wait for manual review. And of course there's no guarantee that manual review will result in a correct decision.

Meanwhile, it's truly incredible how fully my site is now blocked off from the world. Firefox Android won't even let me access it, while Chrome hides the click-through underneath the "details" button. My friends who use the site are understandably concerned, but there's not much I can do. It's crazy how a false positive in one centralized system can impact my little piece of the fediverse.


Pushbullet has also learned the hard way how to respond to Google's faceless review process:

> Now that you know you’re up against an automated policy enforcement AI, your goal is to look as much as possible like the training data. Unfortunately, this can be easier said than done since we do not have access to the training data.

> All is not lost, however. You have actionable info in the simple fact that what your app looks like now is not acceptable.

https://blog.pushbullet.com/2022/10/27/how-we-became-the-wor...


As an update, Google's "warning level" seems to have dropped to Yellow from Red: https://transparencyreport.google.com/safe-browsing/search?u...

I still see an exclamation mark in mobile Chrome, but it looks like the big scary red interstitial is gone from most browsers I've seen. Perhaps enough different people reported the false positive? Or maybe a re-scan or manual action. It's unclear, but appreciated. I wish there were more transparency (ironic, given the "transparencyreport" subdomain that URL is hosted at).


Stuff like this is why I never use those "block malicious sites" sort of features -- not only because I assume they involve sending every URL I access to some kind of checking service, but also because I don't trust Google or Mozilla (or any other corporate software vendor) to be the arbiter of what makes a site "trustworthy". Further, maybe I wouldn't have to worry so much about whether I can "trust a website" if they would stop turning their web browsers into full-blown operating systems?

To me there's also some irony to the name "Google Safe Browsing", because IMHO that is an oxymoron -- browsing with Chrome is never "safe" for you because of the inherent/persistent violations of your privacy.


As much as I love to hate Google:

> not only because I assume they involve sending every URL I access to some kind of checking service

This isn't even remotely close to true.

> but also because I don't trust Google or Mozilla (or any other corporate software vendor) to be the arbiter of what makes a site "trustworthy"

Maybe you're a genius who can navigate the web with ease, but the reality is this feature has probably stopped millions or even billions of malware downloads and/or phishing attempts.

> Further, maybe I wouldn't have to worry so much about whether I can "trust a website" if they would stop turning their web browsers into full-blown operating systems?

Safe browsing is more about stopping people from accidentally downloading malware or falling for a phish, and very little about exploits in the browser itself which are few and far between.


At least some data is definitely shared with Google's safe browsing service by default, more and with greater detail still when "enhanced protection" or "make searches and browsing better" is enabled.

https://security.googleblog.com/2022/08/how-hash-based-safe-...

https://security.googleblog.com/2020/05/enhanced-safe-browsi...


"Some data" is WAY different than "Google receives the URL of every page you visit".

I'm not saying Google's Safe Browsing feature is immune to criticism, but when someone posts such blatantly false information it discredits everything else they have to say on the topic.


Please stop misrepresenting what I said and reacting vocally as a result of your misunderstanding. I didn't assert that every URL is sent. I _assume_ every URL is sent, because that's my "threat model" regarding this feature. I have no reason to trust them, and have every reason to _assume_ that every URL is sent if I use a feature like that.


Guess what. Nobody cares about your assumptions that are demonstrably untrue, as has been explained multiple times in this thread.

The Chromium project's Safe Browsing code is open source: https://source.chromium.org/chromium/chromium/src/+/main:com...

Of course given your "threat model" you are probably going to assume what's shipped in Chrome could be different from the open source code, but hey you are free to believe in whatever falsehoods you choose to believe in.


That's right. I am free to believe whatever I want, and I am free to share my thoughts directly related to the OP, in a sincere and respectful manner -- quite in contrast with how you've responded. Take your vitriolic sentiments elsewhere.


Dude they use predictive text in the address bar so your point is kinda moot. Yes, they DO see the address you are navigating to when done directly.

Just not in the way you are arguing about.


Given that Safe Browsing is used in other browsers, and that you can disable that feature and/or change your default search engine, yes it does matter.

Also, even if you are using Chrome with the default search settings, there are lots of cases where it would matter. I suspect people would care a lot more about Google logging the URL for their Amazon search for <embarrassing thing they purchase> than when they typed `amazon.com` in their URL bar to get to Amazon.


[flagged]


I didn't "understand" anything incorrectly -- I made zero attempt to understand the feature. How many times do I have to quote the word "assume"? I ASSUMED the worst without knowing anything about the feature, based on my prior experience with "protecting the user" features and the way megacorps treat their users. I thought "probably sends my stuff to their servers" and unchecked the box. Again, for the nth time, I have made no concrete assertion as to the actual functioning of the feature. The only misinformation here has been your repeated misrepresentation of what I communicated.


Sorry, let me restate my original post then:

I ASSUME that you're completely wrong about Safe Browsing and have no idea what you're talking about. I ASSUME you're intentionally posting misinformation on the internet. I ASSUME you're as evil as Adolf Hitler.

If you prove that what I ASSUMED is demonstrably false, I'll just fall back on saying it's an assumption so that makes being wrong completely ok (and it means that you are misrepresenting me when you call that out)!


Ah man... I mean, at no time did I indicate or suggest how Safe Browsing works. What I communicated to the world is that I am making a totally subjective judgement of a feature, perhaps totally ignorantly, without any suggestion of correlation with reality. For some reason, you missed this subtlety of me saying "I assume [thing does blah]". I'm not out here trying to convince anyone that Safe Browsing does bad things. I pretty clearly said I, myself, in my own head, made a subjective decision about the feature. That's all I said! Come back to this thread in a day or month or year and realize I was pretty forthright/honest and communicating in good faith the entire time.


Please stop misrepresenting what I said and reacting vocally as a result of your misunderstanding. I didn't assert that you were wrong. I _assume_ you have no idea what you're talking about.

I would appreciate if you stop acting like I was asserting something when I clearly was ASSUMING it, I even put ASSUME in all caps in my comment to make it clear.


These words you posted are a direct assertion: "when someone posts such blatantly false information", in regards to my comment. I didn't post false information, thus you misrepresented what I said. Please spend more time understanding the words people write before you respond to them in an inflammatory manner.


I clearly restated my original post here in a way that contains only assumptions and not assertions: https://hackertimes.com/item?id=33528294 - so I don't know why you're mad at me.

I can't edit my original post anymore, but just replace it in your mind with my edited version. Also, please spend more time understanding the words people write before you respond to them in an inflammatory manner.


It is also quite a ways away from "This isn't even remotely close to true.", which is what I felt the need to correct.


Yeah for the first thing, I said I assume it -- I wasn't actually making an assertion that all the requests get sent to a server. I can't realistically spend the effort to double-check every single thing like this (as in how badly every corp is spying on me), and it's substantially easier to just assume the worst and act accordingly.

Right, I'm sure the "Safe Browsing" stuff has prevented downloads of .exes that the user never would have run anyways (but I digress). Regardless, I still am not going to rely on shareholder-driven megacorps to decide what sites I can trust or not. They have proven they are not trustworthy themselves. To start, Google does things like hiding portions of the URL of the website you're on[0], making the web browsing experience even more opaque than it already is for the average user. In fact, Google has even engaged in phishing-like behaviour themselves, replacing an original website with their own AMP version, while showing the original URL[1].

Of course, this is all without even touching on the shocking depth of surveillance/tracking undertaken by Google.

Regarding exploits in the browser itself: the massive surface area of web browser software is indeed absolutely a vector for malware. In this sense, Safe Browsing is a solution to a problem browser vendors created by nearly turning the web browser into an OS. By April this year, Chrome had already hit its third zero-day exploit[2] affecting its billions of users. By last month, it had hit its seventh[3]. Basically every month or two there's a new actively-exploited vulnerability in Chrome that may very well allow malware to be installed on your system by simply visiting a web page. If only trying to read some text on a website didn't mean risking having my home network turned into a botnet, or getting ransomware-locked...

[0] https://www.bleepingcomputer.com/news/google/google-chrome-h...

[1] https://www.androidpolice.com/2019/04/16/amp-pages-will-now-...

[2] https://www.forbes.com/sites/gordonkelly/2022/04/16/google-c...

[3] https://www.forbes.com/sites/daveywinder/2022/10/28/emergenc...


Except [0] has been demonstrated to actually be more useful for average users, as it hides parts of the url that are not actually relevant. I don't know what google's current UI for it is, but in safari by default https:// is not shown. Instead, if you visit an http: site you get a completely different UI: "not secure - example.com".

The reason for these UI changes is very simple: signaling secure vs insecure via a single character in a string that is meaningless to most people makes it essentially useless. Similarly making "secure" websites get a padlock is not helpful - it's essentially the same as a car engine light, except the engine light is surrounded by a dozen other lights, and it turns off when there's a problem. Unending amounts of research show that that kind of design means people will not notice problems.

This is not to say Google is a shining example of privacy engineering (see their core business model), but you're being incredibly dismissive of the work that their chrome and browser privacy engineering teams do.


I am definitely dismissive of the benefits of work that obfuscates or conceals the most concrete piece of information I require to know what site I am on: the complete URL, including schema, that I have been looking at on every single web page I visit since the early 90's.

IMO when a browser vendor does something like this, I'm not in a rush to sit there and read all their blog posts and give huge consideration to yet another megacorp imposing their will upon me. I just don't want my URL hidden, and I will definitely turn off features like this where possible. (n.b. I don't use Chrome, so it's irrelevant for me)

Interesting. http "not secure", but I bet every piece of malware delivered to a user's machine via browser exploit was delivered on an https URL...

BTW, you make some assertions about the feature being more useful for average users -- can you provide any links/documents about this? I am very skeptical, though you are under no obligation to convince me (don't spend the time finding link(s) unless you specifically feel like it). I'd be curious, because I even feel doubtful that there's any solid measurement indicating hiding part of the URL has actually benefited anyone.


Ok, so it sounds like you misunderstand how the Safe Browsing system works.

If you want to be a provider of url blocking you do the following:

    1. Compile a list of "unsafe" urls (malware, phishing, cookie clicker, etc)
    2. Generate a SHA256 hash of every url in (1)
    3. Truncate those hashes to 32 bits (Because there are a lot of them :( )
    4. Publish that list
If you're writing a client that wants to use these providers you do this:

    1. Download the list published above
    2. Before you load a url calculate a SHA256 hash of that url
    3. Truncate that hash to 32 bits
    4. Check your giant list of hashes to see if there's a collision.
    5. If there is no collision you're done, and carry on the load as usual; otherwise
    6. Ask the source of step 1 to give you the full set of hashes corresponding to the prefix from (3)
    7. See if the hash from (2) is in the full hashes you get in (6).
    8. If there is no collision then load the url, otherwise display scary warnings.

You can see that at no point does the url being checked go in either direction.
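The client side of those steps can be sketched in a few lines of Python. This is a hypothetical illustration, not the real client: actual Safe Browsing canonicalizes the URL and hashes several host/path combinations of it, which is skipped here, and `fetch_full_hashes` stands in for the provider's network API.

```python
import hashlib

def sha256_full(url: str) -> bytes:
    """Full SHA-256 of the URL (step 2; real clients canonicalize first)."""
    return hashlib.sha256(url.encode()).digest()

def check_url(url, local_prefixes, fetch_full_hashes):
    """Steps 2-8: local 32-bit prefix check, with a full-hash round-trip
    to the provider only when the prefix collides."""
    full = sha256_full(url)
    prefix = full[:4]                      # step 3: truncate to 32 bits
    if prefix not in local_prefixes:       # steps 4-5: no collision
        return "safe"                      # load as usual, no network call
    # step 6: ask the provider for all full hashes behind this prefix
    if full in fetch_full_hashes(prefix):  # step 7: real match
        return "blocked"                   # step 8: scary warnings
    return "safe"                          # false positive on the prefix
```

Note that the provider is only ever contacted with a 32-bit prefix, and only in the (rare) collision case, which is the point being made above.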


Oh, no, I didn't even look into it. I only just tonight read the docs about these features in Chrome & Firefox for the first time. I see "modern-day NetNanny" and turn it off. I quite literally assume the worst and disable it.


> 3. Truncate those hashes to 32 bits (Because there are a lot of them :( )

Is it worth all this trouble just to get an Nx space saving?

Because if you just published the 256-bit hashes directly, you could skip the following steps entirely:

> 6. Ask the source of step 1 to give you the full set of hashes corresponding to the prefix from (3)

> 7. See if the hash from (2) is in the full hashes you get in (6).

which are the parts that require an actual network call, infrastructure costs, and privacy issues. Speaking of which:

> You can see that at no point does the url being checked go in either direction.

You're not sending the whole URL, but in step (6) you're still telling the source that you're visiting a website belonging to a particular subset of hashes, hashes which the source knows the plaintext of.

I guess the question is exactly how large the set of 'unsafe' urls is. If it's merely in the billions, then (a) truncating the hashes only gets you an 8x space saving (256 bits down to 32) and (b) the 32-bit hashes you send will almost uniquely identify the website you tried to visit. If it's in the trillions, then it makes more sense.
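The near-unique identification worry in (b) is easy to put numbers on. Assuming an ideal uniform hash and made-up list sizes, the expected number of listed URLs sharing any one 32-bit prefix is simply N / 2^32:

```python
# Expected number of listed URLs behind a single 32-bit prefix,
# for a few hypothetical list sizes (these N values are illustrative).
PREFIX_SPACE = 2 ** 32

for n_listed in (1_000_000, 1_000_000_000, 1_000_000_000_000):
    expected = n_listed / PREFIX_SPACE
    print(f"{n_listed:>16,} listed URLs -> {expected:.4f} per prefix")
```

With a billions-scale list, most prefixes map to at most one listed URL, so a prefix nearly pins down the site; only at trillions of entries does a prefix hide the target among hundreds of candidates.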


This list needs to be redownloaded multiple times daily by a billion or so devices.


A billion is about half an order of magnitude off. The official homepage says five billion.

https://safebrowsing.google.com/


It's terrifying that a small change at a single URL controlled by a single national government can censor content on five billion machines.

Is there anything else on the entire internet so centralized?


The root DNS, or DNS for "com"


> 6. Ask the source of step 1 to give you the full set of hashes corresponding to the prefix from (3)

This is quite similar to sending the url. There might be only a single site with that hash. Or there might be a handful, of which one is far more likely than the others because it has 1000x their daily active users.


The goal is to avoid leaking the url you actually loaded. In principle you could make a hash of every url and work back from there, but if you're Google you already have much more effective, "acceptable" ways of tracking what sites someone visits; if Google or some other vendor were discovered to be attempting that, I suspect there would be fairly substantial and immediate blowback.

There are obviously ways that you could improve the privacy aspects of this - you could have two versions of the database, hashing urls with slightly different initial conditions. A client can keep one up to date and only fetch the other when they get a collision. Assuming SHA256 is pretty good, we can just multiply the false positive probabilities (I think?).

An alternative would be for the client to request multiple regions at a time.
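Assuming ideal, independent hash functions, the payoff of the two-table idea is easy to estimate: a clean URL only triggers the privacy-leaking network fetch if it collides in both tables, so the false-positive probabilities do multiply. A back-of-envelope sketch with a made-up list size:

```python
# Chance that a *clean* URL causes a network fetch, comparing one
# 32-bit prefix table against two independently salted ones.
# N is a made-up list size; hashes are assumed ideal and independent.
N = 1_000_000_000
SPACE = 2 ** 32

p_one = N / SPACE        # approx. collision odds against one table (~23%)
p_two = p_one ** 2       # must collide in both salted tables (~5%)
```

(`p_one` here is the simple N/2^32 approximation, which slightly overstates the true 1 - (1 - 1/2^32)^N collision probability, but the comparison between the two schemes holds either way.)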


If you log into a Google account (Gmail, YouTube, Blogger, Google Drive, etc...) and have not adjusted your personal preferences (which even then may not mean much), they are tracking every place you go and more. The amount of information they are collecting on people is astounding. They will even lock people out of their accounts, for whatever reason, and keep the information as long as they want or hand it over to whichever 3rd parties they please.


Firefox on Android says this:

> has been reported as a deceptive site and has been blocked based on your security preferences.

I checked its settings page and saw no such preference, and about:config is a blank page there. So where is this alleged preference?


There's a setting on the Firefox desktop version; I guess it's not exposed in the Android version for some reason. These mobile phone browsers tend to be "dumbed-down" in general. Not hard to imagine it just re-uses the same error message from the desktop version, probably just as a simple oversight.


Yeah it's probably an oversight, but that oversight feels particularly irresponsible given that it completely blocks the ability to access web sites (the primary function of a web browser) based on a third-party block list which can't be disabled.


Remember that visited sites are also phoned home to google when the setting is on.


Last time I checked, which to be fair is a couple years ago now, only a partial hash is shared with google.


Are you sure that's true? It would be trivial to just package the whole block list locally.


They removed about:config from android firefox builds except for the nightly builds. Mildly enraging, that.


I don't know, I got that exact error and searched the settings. I couldn't find that preference either.


maybe the "safe browsing" setting that uses a list from google to block sites (which I always disable)... Mozilla is in bed with Google on so many levels...


On Firefox Android there is no such option displayed in the settings.


It is hidden in about:config, I believe....


Mozilla explicitly disables about:config in the stable release of Firefox on Android. Works fine in the beta and nightly releases though.

As for why they disabled it, no clue. Just bitter, disdainful speculation.


What good is that? 99.9% or more of users will never visit this page or know what to do with it, much less be able to figure out which knob to toggle.

It's a product design failure to say "blocked because your configuration says so!" and then hide it from settings.


Firefox for Android sucks... but it sucks less than the alternatives, as far as I know


This isn't just Firefox for android. Try visiting a previously HSTS enabled site with a now expired certificate. No option for the user to say "I don't really care that much about the MITM risk, just show me the damned page" even with desktop Firefox. Absolutely mindboggling disrespect for user agency for a so-called user agent.


Google has also been blocking access to a lot (read: more than 1) of Yunohost selfhosted servers lately: https://forum.yunohost.org/t/google-flags-my-sites-as-danger...

My server hosted at home got flagged. After 2 weeks of submitting appeals every 3-4 days it was finally reviewed and the flag was removed.

How did we let megacorps have so much control over who can access our websites?!


Good to see more people realizing the kind of control browser makers give Google via this misfeature. It should not be acceptable for an unaccountable for-profit corporation to decide what the average person can or cannot see without any due process for site operators. Yet here we are. Get your software flagged by machine-learning algorithms? Too bad, pretty much every browser will prevent users from downloading it. And yes, prevent - even those browsers that still provide an override hide it well enough that many users will not find it. Appeal? If you are lucky and Google doesn't just deny it without reason, then look forward to getting re-flagged in a couple of weeks.


I saw a similar aggressively red warning on Mobile Safari. Apparently it uses the same database of "deceptive" pages.

I turned off "Fraudulent Website Warning" in Settings and then I was able to access your site.


Yes, sadly all browsers just trust Google, even when Google has shown that they don't care about false positives.


This thread has been a good reminder to go check your settings in about:config. I was unaware that firefox had turned one single option into several which seem to be on by default ignoring my previous change.



