• PierceTheBubble@lemmy.ml
    11 hours ago

    So the amended complaint alleges that Nvidia used/stored/copied/obtained/distributed copyrighted works (including the plaintiffs’), both through datasets available on Hugging Face (‘Books3’, featured in both ‘The Pile’ and ‘SlimPajama’) and by pirating from shadow libraries (like Anna’s Archive), to train multiple LLMs (primarily their ‘NeMo Megatron’ series), and that it distributed the copyrighted data through the ‘NeMo Megatron Framework’; data which was ultimately sourced from shadow libraries.

    It’s quite an interesting read actually, especially the link to this Anna’s Archive blog post, which the complaint grossly pulls out of context. The plaintiffs clearly despise the shadow libraries too, as these have ultimately provided access to their copyrighted material.

    Especially the part: “Most (but not all!) US-based companies reconsidered once they realized the illegal nature of our work. By contrast, Chinese firms have enthusiastically embraced our collection, apparently untroubled by its legality.” makes me wonder whether that’s the reason why models like DeepSeek initially blew Western models out of the water.

    • Knock_Knock_Lemmy_In@lemmy.world
      5 hours ago

      You can ask DeepSeek detailed questions about the Harry Potter books, and it responds intelligently with (almost) direct quotes from the books.

      Ask ChatGPT and it will respond to questions, but it denies having read any book.

      • Corkyskog@sh.itjust.works
        1 hour ago

        Interesting. I was using DeepSeek for book recommendations, and compared to other models it was exceptionally good at recommending books similar to one I had just read.