• Hackworth@piefed.ca
    12 hours ago

    The Firefly image generator is a diffusion model, and the Firefly video generator is a diffusion transformer. LLMs aren’t involved in either process; rather, the models learn image-text relationships from meta tags. I believe there are some ChatGPT integrations with Reader and Acrobat, but those are unrelated to Firefly.

    • utopiah@lemmy.world
      2 hours ago

      Surprising. I would expect it to rely at some point on something like CLIP in order to be prompted.