Google flags Immich sites as dangerous

(immich.app)

205 points | by janpio 4 hours ago

15 comments

  • arccy 2 hours ago
    If you're going to host user content on subdomains, then you should probably have your site on the Public Suffix List https://publicsuffix.org/list/ . That should eventually make its way into various services so they know that a tainted subdomain doesn't taint the entire site....
    • CaptainOfCoit 29 minutes ago
      I think it's somewhat tribal webdev knowledge that if you host user-generated content you need to be on the PSL, otherwise you'll eventually end up where Immich is now.

      I'm not sure how people who haven't already hit this very issue are supposed to know about it beforehand, though; it's one of those things you don't really come across until you're hit by it.
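For anyone unfamiliar with what a PSL entry actually changes, the effect can be sketched in a few lines of Python. This is a toy illustration, not the real algorithm: the hardcoded suffix set stands in for the actual list, and real code would use a maintained library such as `tldextract` rather than rolling its own lookup.

```python
def registrable_domain(hostname: str, suffixes: set) -> str:
    """Return the public suffix plus one label: the unit that
    reputation systems typically score as a whole."""
    labels = hostname.lower().split(".")
    # Scan from the longest candidate suffix down to the shortest.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in suffixes:
            # Suffix found; the registrable domain is one label deeper.
            return ".".join(labels[max(i - 1, 0):])
    return ".".join(labels[-2:])  # fallback when no listed suffix matches

# Without a PSL entry, every preview build shares one reputation unit:
base = {"cloud"}
registrable_domain("pr-1.preview.internal.immich.cloud", base)
# -> "immich.cloud"

# With "immich.cloud" listed, each subdomain owner gets its own unit:
with_psl = base | {"immich.cloud"}
registrable_domain("pr-1.preview.internal.immich.cloud", with_psl)
# -> "internal.immich.cloud"
```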

    • aftbit 4 minutes ago
      I thought this story would be about some malicious PR that convinced their CI to build a page featuring phishing, malware, porn, etc. It looks like Google is simply flagging their legit, self-created Preview builds as being phishing, and banning the entire domain. Getting immich.cloud on the PSL is probably the right thing to do for other reasons, and may decrease the blast radius here.
    • ggm 20 minutes ago
      I think this is only true if you host independent entities. If you simply construct deep names about yourself, with a demonstrable chain of authority back, I don't think the PSL wants to know. Otherwise there is no hierarchy; the dots are just convenience strings, and it's a flat namespace the size of the PSL's length.
    • o11c 1 hour ago
      Is that actually relevant when only images are user content?

      Normally I see the PSL in context of e.g. cookies or user-supplied forms.

      • dspillett 1 minute ago
        > Is that actually relevant when only images are user content?

        Yes. For instance, in circumstances exactly like those described in the thread you are commenting in now and the article it refers to.

        Services like Google's bad-site warning system may use it as a signal that a whole domain shouldn't be considered harmful when only a small number of its subdomains are, where otherwise it would be. It is no guarantee, of course.

    • r_lee 40 minutes ago
      Does Google use this for Safe Browsing though?
    • andrewstuart2 1 hour ago
      Aw. I saw Jothan Frakes and briefly thought my favorite Starfleet first officer's actor had gotten into writing software later in life.
  • NelsonMinar 2 hours ago
    Be sure to see the team's whole list of Cursed Knowledge. https://immich.app/cursed-knowledge
    • levkk 1 hour ago
      The Postgres query parameters one is funny. 65k parameters is not enough for you?!
      • strken 1 hour ago
        As it says, bulk inserts with large datasets can fail. Inserting a few thousand rows into a table with 30 columns will hit the limit. You might run into this if you were synchronising data between systems or running big batch jobs.

        SQLite used to have a limit of 999 query parameters, which was much easier to hit. It's now a roomy 32k.
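A minimal sketch of the usual workaround, assuming a driver that sends one bind parameter per value: cap the rows per statement so rows × columns stays under the 65,535 limit. The helper name and batch logic here are illustrative, not from Immich's code.

```python
PG_MAX_PARAMS = 65_535  # Postgres wire protocol limit on bind parameters

def batches(rows, n_columns):
    """Yield slices of rows small enough that a multi-row VALUES insert
    stays under Postgres's bind-parameter limit."""
    rows_per_batch = PG_MAX_PARAMS // n_columns  # 30 columns -> 2184 rows/batch
    for start in range(0, len(rows), rows_per_batch):
        yield rows[start:start + rows_per_batch]

# 10,000 rows of 30 columns would need 300,000 parameters in one statement;
# split up, each batch stays under the limit.
```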

        • tym0 47 minutes ago
          Right; for Postgres I would use unnest when inserting a non-static number of rows.
        • evertedsphere 1 hour ago
          COPY is often a usable alternative.
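A rough sketch of the unnest approach for a psycopg-style driver: pass one array parameter per column instead of one parameter per value, so the bind-parameter count stays constant no matter how many rows you insert. The helper below only builds the statement and parameters (no database required); the names are made up, and in real use you would cast each array to the column's type (e.g. `%s::int[]`).

```python
def unnest_insert(table, columns, rows):
    """Build (sql, params) for a psycopg-style driver; the parameter
    count equals the number of columns, independent of row count."""
    placeholders = ", ".join("%s" for _ in columns)
    sql = (f"INSERT INTO {table} ({', '.join(columns)}) "
           f"SELECT * FROM unnest({placeholders})")
    # Transpose row-major input into one list (array parameter) per column.
    params = [[row[i] for row in rows] for i in range(len(columns))]
    return sql, params
```

For example, inserting two rows into a two-column table produces `unnest(%s, %s)` with just two array parameters, however many rows follow.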
  • heavyset_go 13 minutes ago
    Insane that one company can dictate what websites you're allowed to visit. Telling you what apps you can run wasn't far enough.
  • kevinsundar 2 hours ago
    This may not be a huge issue depending on mitigating controls, but are they saying that anyone can submit a PR (containing anything) to Immich, tag the PR with `preview`, and have the contents of that PR hosted on https://pr-<num>.preview.internal.immich.cloud?

    Doesn't that effectively let anyone host anything there?

    • daemonologist 2 hours ago
      I think only collaborators can add labels on GitHub, so not quite. It does seem a bit hazardous, though (you could submit a legit PR, get the label, and then commit whatever you want?).
      • ajross 1 hour ago
        Exposure extends not just to the owner of the PR but to anyone with write access to the branch from which it was submitted. GitHub pushes are ssh-authenticated and often automated in many workflows.
    • warkdarrior 2 hours ago
      Excellent idea for cost-free phishing.
  • trollbridge 1 hour ago
    A friend / client of mine used some kind of WordPress type of hosting service with a simple redirect. The host got on the bad sites list.

    This also polluted their own domain, even when the redirect was removed, and had the odd side effect that Google would no longer accept email from them. We requested a review and passed it, but the email blacklist appears to be permanent. (I already checked and there are no spam problems with the domain.)

    We registered a new domain. Google’s behaviour here incidentally just incentivises bulk registering throwaway domains, which doesn’t make anything any better.

    • donmcronald 1 hour ago
      Wow. That scares me. I've been using my own domain for 25 years, it got (wrongly) blacklisted this week, and I can't imagine having email impacted.
  • akerl_ 49 minutes ago
    Tangential to the flagging issue, but is there any documentation on how Immich is doing the PR site generation feature? That seems pretty cool, and I'd be curious to learn more.
  • ggm 19 minutes ago
    Is there any linkage to the semi-factoid that the Immich web GUI looks a lot like Google Photos, or is that just one of those coincidences?
  • Animats 2 hours ago
    If you block those internal subdomains from search with robots.txt, does Google still whine?
    • snailmailman 1 hour ago
      I’ve heard anecdotes of people using an entirely internal domain like “plex.example.com”, never exposed to the public internet, and having Google flag it as impersonating Plex. Google will sometimes block a site based only on its name, if they think the name is impersonating another service.

      It’s unclear exactly what conditions cause a site to get blocked by Safe Browsing. My nextcloud.something.tld domain has never been flagged, but I’ve seen support threads of other people having issues, and the domain name is the best guess.

      • donmcronald 1 hour ago
        I'm almost positive GMail scanning messages is one cause. My domain got put on the list for a URL that would have been unknowable to anyone but GMail and my sister who I invited to a shared Immich album. It was a URL like this that got emailed directly to 1 person:

        https://photos.example.com/albums/xxxxxxxx-xxxx-xxxx-xxxx-xx...

        Then suddenly the domain is banned even though there was never a way to discover that URL besides GMail scanning messages. In my case, the server is public so my siblings can access it, but there's nothing stopping Google from banning domains for internal sites that show up in emails they wrongly classify as phishing.

        Think of how Google and Microsoft destroyed self hosted email with their spam filters. Now imagine that happening to all self hosted services via abuse of the safe browsing block lists.

        • beala 45 minutes ago
          It doesn’t seem like email scanning is necessary to explain this. It appears that simply having a “bad” subdomain can trigger this. Obviously this heuristic isn’t working well, but you can see the naive logic of it: anything with the subdomain “apple” might be trying to impersonate Apple, so let’s flag it. This has happened to me on internal domains on my home network that I've exposed to no one. This also has been reported at the jellyfin project: https://github.com/jellyfin/jellyfin-web/issues/4076
        • EdwardKrayer 35 minutes ago
          Well, that's potentially horrifying. I would love for someone to attempt this in as controlled a manner as possible. I would assume it's also possible for anyone using Google DNS servers to trigger some type of metadata inspection resulting in this situation.

          Also - when you say banned, you're speaking of the "red screen of death", right? Not a broader ban from the domain using Google Workspace services, yeah?

        • r_lee 35 minutes ago
          If it was just the domain: remember that there is a Certificate Transparency log for all TLS certs issued by valid CAs nowadays, which is probably also what Google uses to discover new active domains.
        • im3w1l 39 minutes ago
          Chrome sends visited urls to Google (ymmv depending on settings and consents you have given)
  • captnasia 2 hours ago
    This seems related to another hosting site that got caught out by this recently:

    https://news.ycombinator.com/item?id=45538760

    • o11c 1 hour ago
      Not quite the same (other than being an abuse of the same monopoly) since this one is explicitly pointing to first-party content, not user content.
  • jakub_g 1 hour ago
    Regarding how Google safe browsing actually works under the hood, here is a good writeup from Chromium team:

    https://blog.chromium.org/2021/07/m92-faster-and-more-effici...

    Not sure if this is exactly the scenario from the discussed article but it's interesting to understand it nonetheless.

    TL;DR the browser regularly downloads a dump of color profile fingerprints of known bad websites. Then when you load whatever website, it calculates the color profile fingerprint of it as well, and looks for matches.

    (This could be outdated and there are probably many other signals.)
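For context, the better-documented half of Safe Browsing is the hash-prefix Update API: the browser keeps only truncated hashes of known-bad URLs locally, and a local hit just triggers a server lookup for the full hashes. A toy sketch of that matching step, with a made-up bad-list entry (the visual-fingerprint classifier described above is a separate, additional signal):

```python
import hashlib

# The browser stores only 4-byte prefixes of SHA-256 hashes of
# known-bad canonicalized URLs, not the URLs themselves.
BAD_PREFIXES = {hashlib.sha256(b"evil.example/phish").digest()[:4]}

def locally_suspicious(canonical_url: str) -> bool:
    """A hit means 'ask the server for the full hashes',
    not 'definitely bad': prefixes can collide."""
    return hashlib.sha256(canonical_url.encode()).digest()[:4] in BAD_PREFIXES
```

The prefix design means Google never learns most URLs you visit; only on a prefix match does the browser contact the server, and only then is a warning shown.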

  • jstrong 24 minutes ago
    google: we make going to the DMV look delightful by comparison!
    • elphinstone 18 minutes ago
      They are not the government, and they should not have this vast monopoly power with no accountability and no customer service.
  • donmcronald 3 hours ago
    I tried to submit this, but the direct link here is probably better than the Reddit thread I linked to:

    https://old.reddit.com/r/immich/comments/1oby8fq/immich_is_a...

    I had my personal domain I use for self-hosting flagged. I've had the domain for 25 years and it's never had a hint of spam, phishing, or even unintentional issues like compromised sites / services.

    It's impossible to know what Google's black box is doing, but, in my case, I suspect the flagging was the result of failing to use a large email provider. I use MXRoute for locally hosted services and network devices because they do a better job of giving me simple, hard limits for sending accounts. That way, if anything I run ever gets compromised, the damage in terms of spam will be limited to (ex.) 10 messages every 24h.

    I invited my sister to a shared Immich album a couple days ago, so I'm guessing that GMail scanned the email notifying her, used the contents + some kind of not-google-or-microsoft sender penalty, and flagged the message as potential spam or phishing. From there, I'd assume the linked domain gets pushed into another system that eventually decides they should blacklist the whole domain.

    The thing that really pisses me off is that I just received an email in reply to my request for review, and the whole thing is a gaslighting extravaganza: "Google systems indicate your domain no longer contains harmful links or downloads. Keep yourself safe in the future by blah blah blah blah."

    Umm. No! It's actually Google's crappy, non-deterministic, careless detection that's flagging my legitimate resources as malicious. Then I have to spend my time running it down and double-checking everything before submitting a request to have Google's false positive fixed.

    Convince me that Google won't abuse this to make self hosting unbearable.

    • akerl_ 50 minutes ago
      > I suspect my flagging was the result of failing to use a large email provider.

      This seems like the flagging was a result of the same login page detection that the Immich blog post is referencing? What makes you think it's tied to self-hosted email?

    • foobarian 58 minutes ago
      Wonder if there would be any way to redress this in small claims court.
  • renewiltord 1 hour ago
    I think the other very interesting thing in the reddit thread[0] for this is that if you do well-known-domain.yourdomain.tld then you're likely to get whacked by this too. It makes sense I guess. Lots of people are probably clicking gmail.shady.info and getting phished.

    0: https://old.reddit.com/r/immich/comments/1oby8fq/immich_is_a...

    • donmcronald 1 hour ago
      So we can't use photos or immich or images or pics as a sub-domain, but anything nondescript will be considered obfuscated and malicious. Awesome!
  • 7363288236973 1 hour ago
    [dead]
  • nautilus12 2 hours ago
    [flagged]
    • ocdtrekkie 2 hours ago
      As someone who doesn't like Google and absolutely thinks they need to be broken up, no probably not. Google's algorithms around security are so incompetent and useless that stupidity is far more likely than malice here.
      • o11c 1 hour ago
        Incompetently or "coincidentally" abusing your monopoly in a way that "happens" to suppress competitors (while whitelisting your own sites) probably won't fly in court. Unless you buy the judge of course.

        Intent does not always matter to the law ... and if a C&D is sent, doesn't that imply that intent is subsequently present?

        Defamation laws could also apply independently of monopoly laws.

      • dare944 1 hour ago
        Callous disregard for the wellbeing of others is not stupidity, especially when demonstrated by a company ostensibly full of very intelligent people. This behavior - in particular, implementing an overly eager mechanism for damaging the reputation of other people - is simply malicious.