• r00ty@kbin.life
    11 hours ago

    Yeah, I should probably check whether there are any good plugins that do this from community-submitted lists, because yes, it’s a pain to keep up with whatever trick they try next.

    And unlike web crawlers, which generally check a URL here and there, AI bots absolutely rip through your sites like something rabid.

    • Admiral Patrick@dubvee.org
      11 hours ago

      > AI bots absolutely rip through your sites like something rabid.

      SemrushBot is the most rabid in my experience. It just will not take “fuck off” for an answer.

      That looks pretty much like how I’m doing it, also as an include for each virtual host. The only difference is I don’t even bother with a 403. I just use Nginx’s 444 “response” to immediately close the connection.
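
      For anyone wanting to copy the 444 approach, here is a minimal sketch of what such a setup could look like. It is not either poster's actual config: the variable name `$block_bot` and the single SemrushBot pattern are just illustrative assumptions, and the `map` has to live in the `http {}` context while only the `if` goes into the per-vhost include.

      ```nginx
      # http {} context (for example a file under conf.d/).
      # $block_bot is a hypothetical variable name chosen for this sketch.
      map $http_user_agent $block_bot {
          default        0;
          ~*SemrushBot   1;   # case-insensitive regex match on the User-Agent header
      }

      # Per-vhost include, pulled into each server {} block.
      if ($block_bot) {
          # 444 is an nginx-specific code: close the connection
          # without sending any response to the client.
          return 444;
      }
      ```

      Compared with a 403, this sends nothing back at all, although the TCP connection is still accepted first, so the client can tell something is listening.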

      Are you also doing the IP blocks in Nginx, or lower down at the firewall level? Currently I’m doing it at the firewall level, since many of those will also attempt SSH brute forces (good luck since I only use keys, but still…)

      • r00ty@kbin.life
        11 hours ago

        My mbin instance is behind Cloudflare, so I filter the AS numbers there. They don’t even reach my server.

        On the sites that aren’t behind Cloudflare, yep, it’s at the nginx level. I did consider doing it at the firewall level, maybe with a specific chain for it, but since I was already blocking at the nginx level I just did it there for now. It keeps them off the content, but yes, it does tell them there’s a website there to leech if they change their tactics.

        You need to block the whole ASN too. The ones using Chrome/Firefox UAs switch IP every 5 minutes to a random other one from their huuuuuge pools.
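
        A rough sketch of blocking a whole ASN at the nginx level, assuming the stock `geo` module is available (it is built by default). Nginx has no notion of AS numbers on its own, so the ASN has to be expanded into its announced prefixes from some external source first. The prefixes below are documentation-only placeholder ranges, not a real network, and `$block_asn` is a made-up variable name.

        ```nginx
        # http {} context.
        # $block_asn is a hypothetical variable name; the prefixes are
        # placeholder documentation ranges standing in for an ASN's prefix list.
        geo $block_asn {
            default          0;
            192.0.2.0/24     1;
            198.51.100.0/24  1;
            2001:db8::/32    1;
        }

        # server {} context, e.g. in the same per-vhost include.
        if ($block_asn) {
            return 444;   # drop the connection without a response
        }
        ```

        Note that `geo` keys on the connecting address by default, so this only makes sense on sites that aren’t sitting behind a proxy or CDN.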