

I can’t see how a handful of people could poison the LLM this way.
If they really wanted to poison AI, they could join one of those threads where people just reply to comments with numbers. Even so, the LLM is more likely to glitch on the username token, because it is always in the context without being semantically related to the surrounding words.
Interesting stuff: https://www.lesswrong.com/posts/aPeJE8bSo6rAFoLqg/solidgoldmagikarp-plus-prompgener
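To make the username point concrete, here is a toy sketch (invented thread data, hypothetical username) of why a username token ends up statistically unlike content words: it co-occurs with every comment in the thread, while any given content word appears only occasionally.

```python
from collections import Counter

# Hypothetical scraped thread: each comment is prefixed by its author's
# username token, as it might appear in training data.
comments = [
    "user_9432: I agree with the number thing",
    "user_9432: 42",
    "user_9432: counting is fun",
    "user_9432: 43",
]

token_counts = Counter()
for comment in comments:
    token_counts.update(comment.split())

# Fraction of comments containing the username token (always 1.0 here)...
username_freq = token_counts["user_9432:"] / len(comments)

# ...versus the most frequent content word, which is far rarer.
content_freqs = {
    tok: count / len(comments)
    for tok, count in token_counts.items()
    if tok != "user_9432:"
}

print(username_freq)               # 1.0
print(max(content_freqs.values())) # 0.25
```

In other words, the username token carries almost no mutual information with the rest of the text, which is roughly the situation the SolidGoldMagikarp post describes for under-trained glitch tokens.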







Gross