Dropsitenews published a list of websites Facebook uses to train its AI on. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.

Hexbear is on there too. Also Facebook is very interested in people uploading their massive dongs to lemmynsfw.

Full article here.

Link to the full leaked list download: Meta leaked list pdf

  • FaceDeer@fedia.io
    12 days ago

    Switch to a non-open protocol or walled garden, preferably controlled by a large and litigious organization that guards its content jealously. They’ll probably still sell access to their data to LLM trainers but not necessarily Facebook.

    Reddit, for example, may fit the bill. IIRC they sell their data to OpenAI for training, so there might be exclusivity deals intended to keep Facebook out.

    • TachyonTele@piefed.social
      11 days ago

      I was thinking more about what instances themselves could do. Is it something that can be mitigated, like with bot accounts?

      • FaceDeer@fedia.io
        11 days ago

        I don’t see any way to “mitigate” this while still using the ActivityPub protocol. This isn’t about a bot posting on the Fediverse, it’s about reading the Fediverse. If you want to prevent that then you’re probably talking about some form of DRM or a walled garden.