Incoherent rant.

Once again, I’ve noticed Amazon and Anthropic absolutely hammering my Lemmy instance, to the point of crashing the lemmy-ui container, from multiple IPs all over the US.

So I’ve decided to restructure how I run things. I ditched Fedora on my VPS in favour of Alpine, just to start with a clean slate, and started looking into better ways to fight back.

Behold, Anubis.

“Weighs the soul of incoming HTTP requests to stop AI crawlers”

As I understand it, it works as a reverse proxy in front of each service. It took me a while to understand how it’s supposed to integrate, but once I figured it out, all bot activity stopped instantly. Not a single one has got through yet.

My setup is basically just home server -> Tailscale tunnel (not Funnel) -> VPS -> Caddy reverse proxy, now with Anubis integrated.

I’m not really sure why I’m posting this, but I hope at least one other goober looking for a solution to these things finds this post.

Anubis GitHub, Anubis website

Edit: Further elaboration for those who care, since I realized that might be important.

  • You don’t need Caddy/nginx/whatever as your reverse proxy in the first place; that’s just how my setup works.
  • My Anubis instance sits between my local server and Caddy, inside the Caddy reverse proxy’s Docker Compose stack. When a request comes in, Caddy forwards it to Anubis (configured in the Caddyfile), and Anubis decides whether to pass the request on to the service or stop it in its tracks.
  • There are some minor issues, like requiring JavaScript to be enabled, which can get a bit annoying for NoScript/LibreWolf/whatever users. But since most crawlers don’t run JS at all, I think it’s a great tradeoff.
  • The most confusing parts were the docs and understanding what it’s supposed to do in the first place.
  • There’s an option to apply your own rules via JSON/YAML, but I haven’t figured out how to do that properly in Docker yet. There’s a main configuration file you can override, but apparently there’s also a way to add extra bots to block in separate files in a subdirectory. I’m sure I’ll figure that out eventually.

Cheers and I really hope someone finds this as useful as I did.

  • BakedCatboy@lemmy.ml
    23 hours ago

    FWIW, Anubis is adding a no-JS meta refresh challenge that, if it doesn’t run into issues, will soon become the new default challenge.

    • dan@upvote.au
      17 hours ago

      Won’t the bots just switch to using that instead of the heavier JS challenge?

      • Sekoia@lemmy.blahaj.zone
        17 hours ago

        They can, but it’s not trivial. The challenge relies on a bunch of modern browser features that these scrapers don’t implement, involving metadata, compression and a few other things: features that are annoying to implement and not worth the effort. Check the recent discussion on lobste.rs if you’re interested in the exact details.