I got into the self-hosting scene this year when I wanted to stand up my own website on an old recycled ThinkPad. I spent a lot of time learning about ufw, reverse proxies, security header hardening, and fail2ban.

Despite all that, I still had a problem with bots knocking on my ports and spamming my logs. I tried some hackery to get fail2ban to read Caddy's logs, but that didn't work for me. I nearly gave up and went with Cloudflare like half the internet does, but my stubbornness about open-source self-hosting, plus this year's Cloudflare outages, encouraged me to try alternatives.

Coinciding with that, I've been seeing this thing more and more in the places I frequent, like Codeberg: Anubis, a proxy-style firewall that forces the browser client to complete a proof-of-work check, along with some other clever tricks to stop bots from knocking. I got interested and started thinking about beefing up my security.

I’m here to tell you to try it if you have a public-facing site and want to break away from Cloudflare. It was VERY easy to install and configure with a Caddyfile on a Debian system with systemd. Within an hour it had filtered multiple bots, and so far the knocking seems to have slowed down.

https://anubis.techaro.lol/

My botspam woes have seemingly been seriously mitigated, if not completely eradicated. I’m very happy with tonight’s little security upgrade project, which took no more than an hour of installing and reading through documentation. Current chain: Caddy reverse proxy -> Anubis -> services.
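
In practice, the Caddy side of that chain is just a reverse_proxy pointed at Anubis. A minimal sketch (the domain and the service port are placeholders from my setup, and 8923 is Anubis’s default listen port as far as I know):

example.com {
    reverse_proxy localhost:8923  # Caddy terminates TLS, Anubis handles the bot check
}

Anubis then forwards whatever passes the challenge to the real service, which you point it at with the TARGET variable in its env file (variable names as I remember them from the native-install docs, so double-check against your version):

BIND=:8923
TARGET=http://localhost:3000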

A good place to start for the install is here:

https://anubis.techaro.lol/docs/admin/native-install/

  • non_burglar@lemmy.world · 9 hours ago

    Anubis is an elegant solution to the AI bot scraper issue; I just wish the solution to everything wasn’t spending compute everywhere. In a world where we need to rethink our energy consumption and generation, even on clients, this is a stupid use of computing power.

    • Leon@pawb.social · 9 hours ago

      It also doesn’t function without JavaScript. If you’re security- or privacy-conscious, chances are not zero that you have JS disabled, in which case this presents a roadblock.

      On the flip side, if you are a creator and you’d prefer not to make use of JS (there’s dozens of us), then forcing people through a JS “security check” feels kind of shit. The alternative is to just take the hammering, and that feels just as bad.

      No hate on Anubis. Quite the opposite, really. It just sucks that we need it.

      • SmokeyDope@piefed.social (OP) · 8 hours ago

        There’s a challenge option that doesn’t require JavaScript. The responsibility lies on site owners to configure it properly, IMO, though you can make the argument that it’s not the default, I guess.

        https://anubis.techaro.lol/docs/admin/configuration/challenges/metarefresh

        From the docs on the Meta Refresh method:

        Meta Refresh (No JavaScript)

        The metarefresh challenge sends a browser a much simpler challenge that makes it refresh the page after a set period of time. This enables clients to pass challenges without executing JavaScript.

        To use it in your Anubis configuration:

        # Generic catchall rule
        - name: generic-browser
          user_agent_regex: >-
            Mozilla|Opera
          action: CHALLENGE
          challenge:
            difficulty: 1 # Number of seconds to wait before refreshing the page
            algorithm: metarefresh # Specify a non-JS challenge method
        

        This is not enabled by default while this method is tested and its false positive rate is ascertained. Many modern scrapers use headless Google Chrome, so this will have a much higher false positive rate.

        • z3rOR0ne@lemmy.ml · 4 hours ago

          Yeah, I actually use the NoScript extension, and I refuse to just whitelist sites unless I’m very certain I trust them.

          I run into Anubis checks all the time, and while I appreciate the software, having to temporarily whitelist these sites again and again does get cumbersome. I hope they make this no-JS implementation the default soon.

      • cecilkorik@piefed.ca · 8 hours ago

        if you are a creator and you’d prefer to not make use of JS (there’s dozens of us) then forcing people to go through a JS “security check” feels kind of shit. The alternative is to just take the hammering, and that feels just as bad.

        I’m with you here. I come from an older time on the Internet. I’m not much of a creator, but I do have websites, and unlike many self-hosters I think that, in the spirit of the internet, they should be open to the public as a matter of principle, not cowering away behind some encrypted VPN for my own private use. I want them to be shared. Sometimes that means taking a hammering. It’s fine. Nothing world-ending happens if they go down or go away, and I try not to make a habit of being so irritating that anyone would have much legitimate reason to target me.

        I don’t like any of these sort of protections that put the burden onto legitimate users. I get that’s the reality we live in, but I reject that reality, and substitute my own. I understand that some people need to be able to block that sort of traffic to be able to limit and justify the very real costs of providing services for free on the Internet and Anubis does its job for that. But I’m not one of those people. It has yet to cost me a cent above what I have already decided to pay, and until it does, I have the freedom to adhere to my principles on this.

        To paraphrase another great movie: why should any legitimate user be inconvenienced when the bots are the ones who suck? I refuse to punish the wrong party.

      • Nate Cox@programming.dev · 9 hours ago

        I feel comfortable hating on Anubis for this. The compute cost per validation is vanishingly small to someone with the existing budget to run a cloud scraping farm; it’s just another cost of doing business.

        The cost to actual users though, particularly to lower income segments who may not have compute power to spare, is annoyingly large. There are plenty of complaints out there about Anubis being painfully slow on old or underpowered devices.

        Some of us do actually prefer to use the internet minus JS, too.

        Plus the minor irritation of having anime catgirls suddenly be a part of my daily browsing.

    • cadekat@pawb.social · 9 hours ago

      Scarcity is what powers this type of challenge: you have to prove you spent a certain amount of electricity in exchange for access to the site, and because electricity isn’t free, this imposes a dollar cost on bots.

      You could skip the detour through hashes/electricity and do something with a proof-of-stake cryptocurrency, and just pay for access. The site owner actually gets compensated instead of burning dead dinosaurs.

      Obviously there are practical roadblocks to this today that a JavaScript proof-of-work challenge doesn’t face, but longer term…
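
      To make the scarcity concrete, here’s a toy proof-of-work check in Python. This is an illustrative sketch of the general technique, not Anubis’s actual scheme; the difficulty convention (leading zero hex digits) is my assumption:

      import hashlib
      import os

      def solve(challenge: bytes, difficulty: int) -> int:
          # Grind nonces until sha256(challenge + nonce) starts with
          # `difficulty` zero hex digits; expected work is 16**difficulty hashes.
          nonce = 0
          target = "0" * difficulty
          while True:
              digest = hashlib.sha256(challenge + str(nonce).encode()).hexdigest()
              if digest.startswith(target):
                  return nonce
              nonce += 1

      challenge = os.urandom(16)
      print(solve(challenge, difficulty=4))  # ~65,536 hashes on average

      The asymmetry is the point: the client grinds through tens of thousands of hashes, but the server verifies the submitted nonce with a single hash.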

      • Nate Cox@programming.dev · 8 hours ago

        The cost here only really impacts regular users, too. The type of users you actually want to block have budgets that easily allow for the compute needed anyway.

        • chicken@lemmy.dbzer0.com · 6 hours ago

          I think maybe they wouldn’t, if they’re trying to scale their operations to scan through millions of sites and your site is just one of them.

          • cadekat@pawb.social · 6 hours ago

            Yeah, exactly. A regular user isn’t going to notice an extra few cents on their electricity bill (boiling water costs more), but a data centre certainly will when you scale up.
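
            As a back-of-envelope sketch in Python (every number below is an assumption for illustration, not a measurement of Anubis):

            seconds_per_challenge = 1.0   # time a client burns solving one challenge
            cpu_watts = 10.0              # power draw while solving
            usd_per_kwh = 0.15

            kwh_per_challenge = seconds_per_challenge * cpu_watts / 3.6e6
            print(f"one visit: ${kwh_per_challenge * usd_per_kwh:.8f}")  # fractions of a cent

            pages = 1_000_000_000         # a big scraping run across many sites
            print(f"{pages:,} pages: ${pages * kwh_per_challenge * usd_per_kwh:,.2f} "
                  f"and {pages * seconds_per_challenge / 86400:,.0f} CPU-days")

            Under these assumptions a single visit costs a regular user essentially nothing, while a billion challenged pages costs a scraper a few hundred dollars of electricity plus tens of CPU-years of machine time, which is the part that actually shows up on a data centre’s bill.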