

I… have my doubts. I do not doubt that a wider variety of poisoned data can improve training, by driving the implementation of new ways to filter out unusable training data. In itself, that would indeed improve the model.
But in many cases, the point of poisoning is not to poison the training data, but to deny the crawlers access to the real work (and to provide an opportunity to poison their URL queue, which is something I can demonstrate working). If poison is served instead of the real content, that will hurt the model: even if the junk is filtered out, there is less new data left to train on.
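
To illustrate what I mean by poisoning the URL queue, here's a rough Python sketch of the kind of thing that works: a tarpit-style handler that answers every path with a page of generated links back into itself, so a crawler that follows them fills its queue with garbage instead of the real pages. The names, port, and routing are just for illustration, not my actual setup.

```python
# Sketch only: an HTTP handler that returns an endless maze of fake links.
# A crawler that follows them fills its URL queue with garbage instead of
# the real site's content.
import random
import string
from http.server import BaseHTTPRequestHandler, HTTPServer


def fake_slug(rng: random.Random) -> str:
    """Generate a plausible-looking but meaningless path segment."""
    return "".join(rng.choices(string.ascii_lowercase, k=rng.randint(5, 12)))


class TarpitHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Seed from the path so the same URL always yields the same page,
        # which makes the maze look like a stable (if useless) site.
        rng = random.Random(self.path)
        links = "".join(
            f'<li><a href="{self.path.rstrip("/")}/{fake_slug(rng)}">{fake_slug(rng)}</a></li>'
            for _ in range(20)
        )
        body = f"<html><body><ul>{links}</ul></body></html>".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # The real content lives elsewhere; whatever rule identifies crawlers
    # (user agent, rate, robots.txt violations) routes them here instead.
    HTTPServer(("127.0.0.1", 8080), TarpitHandler).serve_forever()
```

The detail that matters is the second part: it's not the junk text that does the damage so much as the crawler spending its crawl budget and queue space on URLs that lead nowhere real.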





Most often, yes. But there are exceptions. A lot of Ubuntu developers were on Debian in the early days, for example. I imagine some still are - there’s a bit of overlap between Debian & Ubuntu developers here and there.
I maintained packages in various BSDs' pkgsrc/ports trees, even though I never daily drove any of them and ran them in a virtual machine at best.