• WalnutLum@lemmy.ml · 21 points · 18 hours ago

    I think most ML experts (those not being paid out the wazoo to say otherwise) have been saying we’re on the tail end of the LLM technology sigmoid curve. (Basically, treating an LLM as a stochastic index, the real measure of training-algorithm quality is query accuracy per training datum.)

    Even with DeepSeek’s methodology, you see smaller and smaller returns on each additional unit of training input.
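    (To make the diminishing-returns point concrete, here’s a toy logistic curve in Python; the midpoint and scale values are made up purely for illustration, not fitted to any real model.)

```python
import math

def accuracy(n_tokens: float, midpoint: float = 1e12, scale: float = 3e11) -> float:
    """Toy sigmoid: query accuracy as a function of training data volume."""
    return 1 / (1 + math.exp(-(n_tokens - midpoint) / scale))

# Same-sized training increment, applied near the midpoint vs. on the tail:
step = 1e11
early = accuracy(1e12 + step) - accuracy(1e12)  # steep part of the curve
late = accuracy(3e12 + step) - accuracy(3e12)   # tail of the curve
print(f"early gain: {early:.4f}, late gain: {late:.6f}")
```

    The same amount of extra data buys orders of magnitude less accuracy on the tail, which is the "tail end of the sigmoid" claim in miniature.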

  • MDCCCLV@lemmy.ca · 10 points · 12 hours ago

      At this point it’s useful for doing some specific things, so the way to make it great is to make it cheap and accessible. Being able to run it locally would be way more useful.

    • dustyData@lemmy.world · 4 points · 3 hours ago

        Sure, but then what would they do with their billions of dollars data center plugged into a nuclear power plant?

      • WhatAmLemmy@lemmy.world · 3 points · 2 hours ago (edited)

          Can we skip the dog and pony show, and get straight to paying the orphan crushing machine directly?

    • makyo@lemmy.world · 1 point · 4 hours ago

        100% this. Wouldn’t it be something if they weren’t overtly running their companies to replace all of us? I feel like focusing instead on creating great personal assistants that make our lives easier in various ways would get a lot of support from the public.

        And don’t get me wrong, these LLMs are great at helping people already but that’s definitely not the obvious end goal of OpenAI or any of the others.

    • ugjka@lemmy.world · 1 point · 5 hours ago

        Yeah it is useful, but it is not an industry worth trillions of dollars in valuation. The only use cases LLMs have are making shitty summarizations of text, serving as a shitty Google Search alternative, or writing shitty code.

  • simple@lemm.ee · 70 points · 22 hours ago

    With this, OpenAI is officially starting to crack. They’ve been promising a lot and not delivering, and the only reason they would push out GPT-4.5 even though it’s worse and more expensive than the competition is that the investors are starting to get mad.

    • Balder@lemmy.world · 9 points · 13 hours ago (edited)

      Who wouldn’t be mad, considering the amount of money OpenAI is burning? They’re already taking a huge risk, and I believe it’s mostly out of ideology: believing this time it’ll be the singularity, simply because ChatGPT has this ability to fool humans into thinking there’s some humanity there.

  • Grandwolf319@sh.itjust.works · 26 points · 20 hours ago (edited)

    Is it because they used data from after ChatGPT was released?

    Edit:

    marginally better performance than GPT-4o at 30x the cost for input and 15x the cost for output.

    Ahh, good old-fashioned law of diminishing returns.
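    Those multipliers compound on a typical request. A quick sanity check in Python (the GPT-4o per-million-token prices below are placeholders for illustration, not official figures):

```python
# Placeholder GPT-4o prices per million tokens (illustrative only).
input_price, output_price = 2.50, 10.00

# Quoted multipliers: 30x for input tokens, 15x for output tokens.
gpt45_input, gpt45_output = 30 * input_price, 15 * output_price

def request_cost(in_price, out_price, in_tokens=2_000, out_tokens=500):
    """Dollar cost of one request: 2,000 prompt tokens, 500 completion tokens."""
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000

base = request_cost(input_price, output_price)
new = request_cost(gpt45_input, gpt45_output)
print(f"GPT-4o: ${base:.4f}  GPT-4.5: ${new:.4f}  ratio: {new / base:.1f}x")
```

    With this (hypothetical) prompt/completion split, the blended cost per request comes out over 20x for a marginal quality gain.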

  • Catoblepas@lemmy.blahaj.zone · 38 points (1 down) · 21 hours ago

    I’m sure turning on a few more nuclear plants to power shoveling an ever larger body of AI slop-contaminated text into the world’s most expensive plagiarism machine will fix it!

  • humanspiral@lemmy.ca · 4 points (1 down) · 18 hours ago

    Not an expert in the field, but OP seems to be using relevant metrics to criticize model cost/performance.

    One reason to dislike OpenAI is its “national security ties”. It can probably get the “wrong customers” paying whatever expense it is.

    • thatsnothowyoudoit@lemmy.ca · 10 points · 18 hours ago (edited)

      I think that depends on what you’re doing. I find Claude miles ahead of the pack on practical but fairly nuanced coding issues, particularly when used as a pair programmer with strongly typed FP patterns.

      It’s almost as if it’s better in real-world situations than artificial benchmarks.

      And their new CLI client is pretty decent - it seems to really take advantage of the hybrid CoT/standard auto-switching model advantage Claude now has with this week’s update.

      I don’t use it often anymore but when I reach for a model first for coding - it’s Claude. It’s the most likely to be able to grasp the core architectural patterns in a codebase (like a consistent monadic structure for error handling or consistently well-defined architectural layers).
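      (For anyone unfamiliar, the “consistent monadic structure for error handling” mentioned above looks roughly like the Result type sketched below; the names are hypothetical, not from the poster’s actual codebase.)

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar, Union

T = TypeVar("T")
U = TypeVar("U")

@dataclass
class Ok(Generic[T]):
    value: T

@dataclass
class Err:
    error: str

Result = Union[Ok[T], Err]

def bind(r: "Result[T]", f: Callable[[T], "Result[U]"]) -> "Result[U]":
    """Chain a fallible step; short-circuit on the first Err."""
    return f(r.value) if isinstance(r, Ok) else r

def parse_int(s: str) -> "Result[int]":
    try:
        return Ok(int(s))
    except ValueError:
        return Err(f"not an int: {s!r}")

def reciprocal(n: int) -> "Result[float]":
    return Ok(1 / n) if n != 0 else Err("division by zero")

print(bind(parse_int("4"), reciprocal))  # Ok(value=0.25)
print(bind(parse_int("0"), reciprocal))  # Err(error='division by zero')
```

      A codebase that threads every fallible operation through one shape like this is exactly the kind of consistent architectural pattern a model either grasps or doesn’t.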

      I just recently cancelled my one month trial of Gemini - it was pretty useless; easy to get stuck in a dumb loop even with project files as context.

      And GPT-4/o1/o3 seem to really suck at being prescriptive, often providing walls of multiple solutions that all somehow narrowly miss the plot, even with tons of context.

      That said, Claude sucks - SUCKS - at statistics, being completely unreliable where GPT-4 is often pretty good and provides code (Python) for verification.