A tiny mouse, a hacker.

See here for an introduction, and my link tree for socials.

  • 0 Posts
  • 37 Comments
Joined 2 years ago
Cake day: December 24th, 2023


  • I… have my doubts. I do not doubt that a wider variety of poisoned data can improve training, by forcing the development of new ways to filter out unusable training data. In itself, that would indeed improve the model.

    But in many cases, the point of poisoning is not to poison the data, but to deny the crawlers access to the real work (and provide an opportunity to poison their URL queue, which is something I can demonstrate as working). If poison is served instead of the real content, that will hurt the model, because even if it filters out the junk, it will have access to less new data to train on.



  • I had a short tootstorm about this, because oh my god, this is some terribly ineffective, useless piece of nothing.

    For one, Poison Fountain tells us to join the war effort and cache responses. Okay…

     curl -i https://rnsaffn.com/poison2/ --compressed -s
    HTTP/2 200
    content-disposition: inline
    content-encoding: gzip
    content-type: text/plain; charset=utf-8
    x-content-type-options: nosniff
    content-length: 959
    date: Sun, 11 Jan 2026 21:17:36 GMT
    
    

    Yeaah… how am I supposed to cache this? Do I cache one response and then continue serving that for the 50+ million crawlers that visit my sites every day? And you think a single, repetitive thing will poison anything at all? Really?

    Then, the Poison Fountain page goes on to claim that garbage served to the crawlers will end up in the training data. I’m fairly sure the person who set this up has never worked with model training, because that is not what happens. Not even the AI companies are that clueless: they do not train on anything and everything, they filter it down.

    And what this fountain provides is trivial to filter.
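
    A hedged sketch of what “trivial to filter” means here: exact-duplicate removal is among the very first passes of any training-data pipeline, and a fountain that serves the same low-variety text to everyone gets caught by it. With coreutils alone (the sample lines are made up):

    ```shell
    # Simulated corpus: the same poison line shows up over and over,
    # real content appears once. A plain dedup pass strips the poison's volume.
    printf 'poison\nreal text one\npoison\nreal text two\npoison\n' | sort -u
    ```

    Real pipelines use shingling/MinHash for near-duplicates on top of this, but the point stands: repetition is the easiest possible signal to filter on.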

    It’s also mighty hard to set up! It’s not just a reverse_proxy https://rnsaffn.com/poison2, because then you leak all the headers you got. No, you have to make a sanitized request that doesn’t leak data. Good luck!
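
    For illustration, a hypothetical Caddyfile fragment of what a “sanitized” setup would have to do. By default, reverse_proxy forwards the client’s request headers (cookies, auth tokens, forwarding headers) to the upstream, which here is a third party. The path and header list are examples, not a recommendation:

    ```
    handle /poison* {
        reverse_proxy https://rnsaffn.com {
            header_up Host rnsaffn.com    # address the upstream by its own name
            header_up -Cookie             # strip anything identifying the visitor
            header_up -Authorization
            header_up -X-Forwarded-For
            header_up -X-Forwarded-Host
        }
    }
    ```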

    Meanwhile, there are a gazillion self-hostable garbage generators and tarpits that you can literally shove into a Docker container and reverse proxy tarpit URLs to, safely, locally. Much more efficient, far more effective. And, seeing as this is practically uncacheable, if I were to use it, I’d have to send all the shit that hits my servers their way. As far as I can tell, this is a single Linode server. It probably wouldn’t crumble under my 50 million requests / day, but if ten more people joined the “war effort” without caching, my well-educated guess is that it would fall over and die.
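
    The self-hosted pattern, as a hypothetical Caddyfile sketch (the port and path are made up): the garbage never leaves the box, and there is no third party to fall over.

    ```
    handle /tarpit* {
        # a local garbage generator / tarpit container, e.g. listening on 8893
        reverse_proxy localhost:8893
    }
    ```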

    Besides, we have no idea whether poisoning works. We can’t measure that. What we can measure is the load on our servers, and this helps fuck all in that regard. The bots will still come, they’ll still hit everything, and I’d have additional load due to the network traffic between my server and theirs (remember: the returned response provides no sane indicators that’d allow caching while keeping the responses useful for poisoning purposes).

    Not only is this ineffective in poisoning, it’s not usable at all in its current state. And they call for joining the war effort. C’mon.




  • What are the pros of XMPP?

    The pros of XMPP are that I can fully self-host it, it can do video & audio calls too, and it has good clients that aren’t just a webpage wrapped in Blink (aka, Electron). Matrix is a pain in the ass to self-host, especially if I don’t want to federate. My XMPP server is private: friends & family can use it, and that’s it. That’s what I needed, and it delivered perfectly. It does end-to-end encryption. It is weaker than Signal, for sure, but it’s enough for what I need it for. In short: it’s reasonably simple to self-host, has good, usable clients for both platforms I care about (Linux & Android), and we can chat, have group chats, and do audio & video calls.

    Also could u tell me about self hosting cost and time you spend on it?

    Well, I’ve been self-hosting since about 1998, so the time I spend on it nowadays is very little. One of my servers has been running for ~4 years without any significant change. I upgrade it once in a while, tweak my spam filters once a week or so, and go my merry way. I haven’t rebooted it in… checks uptime 983 days. Maybe I should. My other, newer server is only about a year old - it took a LOT of time to set up, and the first few months required a lot of attention. But that was because I switched from Debian to NixOS, and had to figure out a lot of stuff. Nowadays, I run just sys update && just sys deploy (at home, on my desktop PC), and both my tiny VPS and my homelab are upgraded. I do tweak it from time to time - because I want to, and I enjoy doing so. I don’t have to. Strictly necessary maintenance time is about an hour a week if I try to be a good sysadmin, ~10-15 minutes otherwise. It Just Works™.
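
    A hypothetical sketch of what such a justfile recipe might look like. The recipe layout and flake names are assumptions; only the just sys update && just sys deploy invocation comes from the comment:

    ```
    # dispatch on a subcommand, so `just sys update` and `just sys deploy` work
    sys action:
        #!/usr/bin/env bash
        case "{{action}}" in
          update) nix flake update ;;
          deploy) nixos-rebuild switch --flake .#desktop
                  nixos-rebuild switch --flake .#vps --target-host vps ;;
        esac
    ```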

    As for costs: my setup is… complicated. I have a 2014-era Mac Mini in my home office, which hosts half my self-hosted things (Miniflux, Atuin server, EteBase, Grafana, Prometheus, ntfy, Readeck, Vaultwarden, VictoriaLogs, and Postgres to serve as a database for many of these). Its power consumption is inconsequential, and the network traffic is negligible too - in large part because I’m the primary user of it anyway. It is not connected to the public internet directly, however: I have a €5/month tiny VPS rented from Hetzner that fronts for it. The VPS runs WireGuard, and fronts the services on the Mac Mini through Caddy. iocaine takes care of the scrapers and other web-based annoyances (so hardly anything reaches my backend), unbound provides a resolver for my infra, vector ferries select logs from the VPS to VictoriaLogs in my homelab, and I’m running HAProxy to front for stuff Caddy isn’t good for (i.e., anything other than HTTP).
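
    The fronting pattern, as a hypothetical Caddyfile sketch: the VPS terminates TLS and proxies over the WireGuard tunnel to the Mac Mini. The hostnames, tunnel address and ports here are made up:

    ```
    miniflux.example.org {
        reverse_proxy 10.0.0.2:8080    # Miniflux, reached via the WireGuard tunnel
    }
    vault.example.org {
        reverse_proxy 10.0.0.2:8222    # Vaultwarden, same tunnel
    }
    ```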

    Oh, yeah, I forgot… we have power outages here every once in a while, so I have to turn the Mac Mini back on once a month or so. It happens so rarely that I didn’t set up proper Clevis + Tang-based LUKS unlocking, so I have to plug a monitor and a keyboard in. It hasn’t reached a level of annoyance that would make me address it properly.

    A bunch of my other services (GoToSocial, Forgejo + Forgejo runner, Minio [to be replaced with SeaweedFS or Garage], and my email) are still on an old server, because the mac mini doesn’t have enough juice to run them along with everything else it is already running. I plan to buy a refurbished ThinkCentre or similar, and host these in my homelab too. That’s going to be a notable up front cost, but as I plan to run the same thing for a decade, it will be a lot cheaper than paying for a similarly sized VPS for 10 years. The expensive part of this is storage (I have a lot of Stuff™), but only comparatively.

    By far the most expensive part of my self-hosting is backups. I like to have at least two backups (so three copies total, including the original) of important things, and that’s not cheap - I have a lot of data to back up (granted, that includes my music and my photo & media library, both of which are large).


    • Music -> Navidrome / mpd + various clients
    • Google maps -> When we’re driving, I have an offline GPS. Otherwise CoMaps.
    • Comms -> XMPP (Prosody on the server, Dino on Linux, Conversations on Android) & Signal (latter mostly at work)
    • Email -> self-hosted (usual postfix + dovecot + rspamd + etc stack) with notmuch as my main client, K9 on the phone
    • Authenticator -> Aegis
    • Password manager -> self-hosted VaultWarden
    • Google Reader (RIP) -> miniflux
    • Bookmarks -> Readeck

  • I will not recommend switching to NixOS and declarative configuration. I will not recommend switching to NixOS and declarative configuration. I will not recommend switching to NixOS and declarative configuration.

    …fuck. I failed the saving throw. I’m sorry.

    Do look into Ansible, and the whole configuration management topic, though.



  • I have an unfederated XMPP server (running Prosody), family’s using Conversations (Android) & Dino (Linux) with it. We can chat, send images, do voice & video calls. Has been working fine & reliably for the past ~6 years or so. Took about 1.5 minutes for them to get used to the clients.

    I’m slowly opening it up for friends too, so friends, neighbours, classmates, etc can chat with us too. It’s going great so far, no complaints.
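
    For the curious, keeping a Prosody server unfederated mostly amounts to not doing server-to-server at all. A hypothetical prosody.cfg.lua fragment (the hostname is a placeholder):

    ```lua
    -- no federation: disable server-to-server entirely
    modules_disabled = { "s2s" }
    -- require encrypted client connections
    c2s_require_encryption = true

    VirtualHost "xmpp.example.org"
    ```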


  • We pay more for ingress of logs than service uptime

    I cried on this part, it hit home so hard. My homelab went down a couple of months ago, when Chinese LLM scrapers hit me with a wave of a few thousand requests per second. It didn’t go down because my services couldn’t serve a few k requests/second - they could, without batting an eye. However, every request also produced a log, which was sent over to my VictoriaLogs, behind a WireGuard tunnel, running on an overloaded 2014-era Mac Mini. VictoriaLogs could kind of maybe handle it, but the amount of traffic on the WireGuard tunnel saturated my connection at home, which meant that the fronting VPS started to buffer them, and that cascaded into disaster.
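
    One hedged mitigation for that failure mode: bound the log sink’s buffer so a crawler wave degrades into dropped logs instead of a saturated tunnel. A hypothetical vector.toml fragment (the sink type, address and sizes are assumptions):

    ```toml
    [sinks.victorialogs]
    type   = "http"
    inputs = ["caddy_logs"]
    uri    = "http://10.0.0.2:9428/insert/jsonline"

    [sinks.victorialogs.buffer]
    type      = "disk"
    max_size  = 268435488      # bytes: a hard cap instead of an ever-growing queue
    when_full = "drop_newest"  # shed logs rather than back-pressure the tunnel
    ```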



    1. Email

    I self-host my email using postfix, dovecot, rspamd and others. The only tradeoff I had to make here is that some of the entities I have to communicate with via email use an allow-list, so some of my outgoing mail is sent through a relay (SMTP2Go).

    2. Cloud storage / file sync

    I self-host MinIO for cloud storage. I don’t need file sync, so nothing there. If I did, I’d likely use Syncthing.

    3. Maps & navigation

    OpenStreetMap & CoMaps. Works much better than Google Maps did.

    4. Search engine

    Currently a self-hosted YaCy. I have my own index. Not entirely happy with this setup, will switch to something else (still self-hosted, I have no need for a general purpose search engine that indexes the entire internet of slop).

    5. Web browser

    LibreWolf

    6. Calendar

    I’m using Emacs & Org for most calendaring. Wife’s using GNOME Calendar & a Calendar app I found for her on f-droid (unsure which one).

    7. Contacts management

    Nothing on desktop, some random contacts app from f-droid on the phone. I do use EteSync to keep a backup, and potentially sync later. (EteSync syncs her calendar too)

    8. Notes / to-do lists

    Emacs & Org.

    9. Office suite (docs, spreadsheets, etc.)

    Most of my “office” needs are covered by a combination of Emacs, Typst and Zola one way or another. For the rare case where I need Office compatibility: LibreOffice.

    10. Messaging / chat

    XMPP. Dino on Linux, Conversations on Android. I use Matrix too, from time to time (Element), and have Signal too. Not a big fan of the latter two, because it isn’t practical to self-host those.

    11. Video calling

    XMPP. Dino & Conversations. If I need to video call with someone else, I’ll use whatever they use, usually.

    12. Social media / microblogging, RSS reader / news

    The Fediverse is my only social media; I’m using Tuba on desktop and Tusky on the phone for it. For RSS, self-hosted Miniflux. For Lemmy, the web UI on desktop, Voyager on the phone.

    13. Music streaming / podcast app

    Lollypop & Shortwave.

    14. Video streaming / YouTube alternative

    FreeTube or yt-dlp if I need to watch YouTube, PeerTube otherwise.

    15. Password manager

    Bitwarden (via a self-hosted Vaultwarden on the server side).

    16. VPN / DNS / Firewall

    The only VPN I use is WireGuard between my systems, but I don’t tunnel everything through it. For DNS, I’m using unbound on my VPS, which in turn dispatches to Quad9. Firewall? nftables.
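
    That resolver setup, as a hypothetical unbound.conf fragment (the interface and network are placeholders): unbound answers on the WireGuard side and forwards to Quad9 over TLS.

    ```
    server:
        interface: 10.0.0.1                  # WireGuard-side address only
        access-control: 10.0.0.0/24 allow

    forward-zone:
        name: "."
        forward-tls-upstream: yes
        forward-addr: 9.9.9.9@853#dns.quad9.net
        forward-addr: 149.112.112.112@853#dns.quad9.net
    ```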

    17. Launcher / Android OS (if you use custom ROMs)

    I haven’t de-googled my phone, because my bank app refuses to work on rooted phones, and I unfortunately need that for the bank’s 2FA. No, I am not changing banks. I do use a custom launcher (Nova), though.

    18. App store / APKs

    F-droid.

    19. Photo backup / gallery

    I manually copy photos from the phone to my PC, and it gets backed up with the rest of the stuff. I do my backups with restic, and save a copy on my own server, and another at BorgBase. I’ll have a third copy at a third place later.
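
    That flow, sketched as a hypothetical shell session (the repository paths, remote host and retention policy are made up):

    ```
    $ restic -r /srv/backup/photos backup ~/Pictures              # local copy
    $ restic -r sftp:user@repo.example.com:photos backup ~/Pictures   # off-site copy
    $ restic -r /srv/backup/photos forget --prune \
          --keep-daily 7 --keep-weekly 4 --keep-monthly 12        # bounded history
    ```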

    20. Weather

    wttr.in, mostly.

    21. Smart assistant (if any)

    My wife. <3

    22. Anything else you’ve replaced?

    Not strictly de-googling, but I’m using Codeberg & my own self-hosted Forgejo instead of GitHub. I replaced LibreWolf’s bookmark manager with Readeck. For push notifications on Android, I’m using a self-hosted ntfy.

    Would love to hear about your setup - both what works well and any trade-offs you’ve had to make. Always looking for better FOSS or privacy-friendly alternatives.

    Oh dear. Strap in, for you’re in for a Journey! The entire configuration of both my desktop and the rest of my fleet (my VPS, my homelab server, and my Mom’s mini PC at the moment) is free software. It’s based on NixOS, with declarative configuration written in a literate programming manner using Org mode. There is a lot of documentation.




  • While I am not a fan of Nix the language, it is no more insane than ansible or kubernetes yaml soups.

    As for packages… nixpkgs is by far the largest repo of packaged software. There are very few things I haven’t found there - and they are usually not in any other distro either.


  • I switched to NixOS because I wanted a declarative system that isn’t yaml soup bolted onto a generic distro.

    By 2022, my desktop system was an unmanageable mess. It was a direct descendant of the Debian I installed in 1997. I migrated it piece by piece, even switched architectures (multiple times! i386 -> ppc -> i386 -> amd64), but its roots remained firmly in 1997. It was an unsalvageable mess.

    My server, although much younger, also showed signs of accumulating junk, even though it was ansible-managed.

    I tried documenting my systems, but it was a pain to maintain. With NixOS, due to it being declarative, I was able to write my configuration in a literate programming style. That helps immensely in keeping my system sane. It also makes debugging easy.

    On top of that, with stuff like Impermanence, my backups are super simple: btrfs snapshot of /persist, exclude a few things, ship it to backup. Done. And my systems always have a freshly installed feel! Because they are! Every boot, they’re pretty much rebuilt from the booted config + persisted data.
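
    That backup flow, as a hypothetical session sketch (the snapshot name and exclude pattern are made up):

    ```
    $ btrfs subvolume snapshot -r /persist /persist/.snap     # atomic, read-only
    $ restic backup /persist/.snap --exclude='**/.cache'      # ship it to backup
    $ btrfs subvolume delete /persist/.snap                   # done
    ```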

    In short, declarative NixOS + literate style config gave me superpowers.

    Oh, and NixOS’s packaging story is much more convenient than Debian’s (and I say that as an ex-DD, who used to be intimately familiar with Debian packaging).