Incoherent rant.
I’ve once again noticed Amazon and Anthropic crawlers absolutely hammering my Lemmy instance, to the point of the lemmy-ui container crashing. Multiple IPs, all over the US.
So I’ve decided to restructure how I run things. I ditched Fedora on my VPS in favour of Alpine, just to start with a clean slate, and started looking into better ways to fight the crawlers.
Behold, Anubis.
“Weighs the soul of incoming HTTP requests to stop AI crawlers”
From how I understand it, it works like a reverse proxy in front of each service. It took me a while to actually understand how it’s supposed to integrate, but once I figured it out, all bot activity instantly stopped. Not a single one has gotten through yet.
My setup is basically just a home server -> tailscale tunnel (not funnel) -> VPS -> caddy reverse proxy, now with anubis integrated.
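For a concrete picture, here’s roughly what the Caddy side ends up looking like. Treat it as a sketch, not my exact config: `lemmy.example.com` is a placeholder, and `anubis:8923` assumes an Anubis container named `anubis` on the same Docker network, listening on whatever port you set via its `BIND` variable (8923 here).

```
# Sketch only: placeholder domain; "anubis" is the Anubis container
# on the same Docker network, listening on :8923.
lemmy.example.com {
	# Hand every request to Anubis first. Anubis either serves its
	# challenge page or proxies the request on to the real service.
	reverse_proxy anubis:8923
}
```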
I’m not really sure why I’m posting this, but I hope at least one other goober looking for a solution to this finds this post.
Edit: Further elaboration for those who care, since I realized that might be important.
- You don’t have to use Caddy/nginx/whatever as your reverse proxy in the first place; that’s just how my setup works.
- My Anubis instance sits inside the Caddy reverse proxy Docker Compose stack, between Caddy and my local server. So when a request comes in, Caddy forwards it to Anubis (per the Caddyfile snippet above), and Anubis decides whether to pass the request on to the service or stop it in its tracks. There’s a rough compose sketch after this list.
- There are some minor issues, like it requiring JavaScript to be enabled, which might get a bit annoying for NoScript/LibreWolf/whatever users, but considering most crawlbots don’t run JS at all, I believe this is a great tradeoff.
- The most confusing part was the docs and understanding what it’s supposed to do in the first place.
- There’s an option to apply your own rules via JSON/YAML, but I haven’t figured out how to do that properly in Docker yet. As in, there’s a main policy file you can override, but there’s apparently also a way to add additional bots to block via separate files in a subdirectory. I’m sure I’ll figure it out eventually; my rough guess at the override is sketched below, after the compose example.
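For the curious, the compose wiring looks roughly like this. It’s a sketch under my own assumptions, not a drop-in file: the service names, ports, and the tailscale IP are placeholders, and you should double-check the image path and env vars (`BIND`, `TARGET`, `DIFFICULTY`) against the Anubis README rather than trusting me.

```yaml
# Sketch only: names, ports, and the tailscale IP are placeholders.
services:
  caddy:
    image: caddy:2
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile:ro

  anubis:
    image: ghcr.io/techarohq/anubis:latest
    environment:
      BIND: ":8923"     # where Anubis listens; Caddy reverse-proxies here
      DIFFICULTY: "4"   # proof-of-work difficulty for the JS challenge
      # Where legitimate traffic goes once the challenge passes; for me,
      # the service on the home server, reachable over the tailscale tunnel.
      TARGET: "http://100.64.0.10:1234"
```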
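And for the custom rules bit: this is my best guess at the shape of the main policy file, pieced together from the docs. The names and regexes are made-up examples, and `POLICY_FNAME` is (as far as I can tell) the env var that points Anubis at your override, so verify everything against the upstream defaults before copying.

```yaml
# botPolicies.yaml: mount it into the container and point POLICY_FNAME
# at it (the exact mount path is up to you; mine is an assumption).
# Rules are checked in order; actions are ALLOW, DENY, or CHALLENGE.
bots:
  # Made-up examples: hard-deny a couple of AI crawlers by user agent.
  - name: amazonbot
    user_agent_regex: "Amazonbot"
    action: DENY
  - name: anthropic
    user_agent_regex: "(?i)claudebot|anthropic"
    action: DENY
  # Everything else gets the proof-of-work challenge.
  - name: everyone-else
    user_agent_regex: ".*"
    action: CHALLENGE
```

Note that a catch-all like that last rule would challenge literally everything, including API clients, so the real default policy is presumably smarter about what it lets through unchallenged.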
Cheers and I really hope someone finds this as useful as I did.
Could you elaborate on how it’s ableist?
As far as I’m aware, not only are they working on a version that doesn’t require JS at all, but the JS is only needed for the challenge itself; once the challenge is solved, the browser can view the page(s) entirely without JS being necessary to parse the content in any way. Things like screen readers should still parse the content perfectly fine after the browser solves the challenge.