People have been betting on independent reasoning as an emergent property of AI, without much success so far. So it was exciting when OpenAI said its AI had scored at a Gold Medal level on the International Mathematical Olympiad (IMO), a test of mathematical reasoning among the world’s best high-school students.

However, Australian mathematician Terence Tao says it may not be as impressive as it seems. In short, the test conditions were potentially far easier for the AI than for the humans, and the AI was given far more time and resources to achieve the same results. On top of that, we don’t know how many wrong attempts OpenAI discarded before selecting the best answers, something human contestants don’t get to do.

There’s another problem, too. Unlike with humans, an AI being good at math is not a good indicator of general reasoning skill. A model can easily copy techniques from the corpus of human knowledge it was trained on, which gives the semblance of understanding. AI still doesn’t seem good at transferring that reasoning to novel, unrelated problems.

  • hendrik@palaver.p3x.de · 3 days ago

    These claims often come with limited significance. ChatGPT also supposedly passed the bar exam in early 2023, along with all kinds of other benchmarks. Yet it still messed up my fairly simple emails and other tasks back then, and to this day it often fails at summarizing random news articles, missing the gist entirely. These claims make good headlines, but that’s pretty much it. And it’s really hard to come up with meaningful benchmarks.

    • Tar_Alcaran@sh.itjust.works · 3 days ago

      Yeah, when I read that, I decided to ask it a few bar exam test questions that you can find online. It scored worse than my guesses, and I’m nooooot a lawyer.

    • givesomefucks@lemmy.world · 2 days ago

      It all reminds me of how everyone made a huge deal the first time a human beat a horse in an endurance race…

      Which used to be a normal thing. People would hold races and advertise them as “try to beat a horse.”

      People noticed which conditions made the race closer, and the closer it was, the more attention your event got.

      So people kept fine-tuning those conditions until they found exactly the right scenario in which a human could outrun a horse.

      But a horse is still faster 99.99999% of the time.

      https://en.m.wikipedia.org/wiki/Man_versus_Horse_Marathon

      So yeah, if I had billions of dollars in VC capital on the line, I could design a test showing that AI just broke an important milestone and there’s no better time to invest in AI…

      But that doesn’t mean it actually did, just that I have a financial incentive to convince you it’s true.

      VCs are like the guys who think every stripper loves them cuz they got a fake phone number one time 20 years ago.

    • phdepressed@sh.itjust.works · 2 days ago

      More likely he believes the con himself. All these rich people are convincing themselves that it just needs “a bit more,” when the reality is that an LLM is closer to T9 text prediction than to the general AI (gAI) of sci-fi. The gAI they want needs a different paradigm of computing; few if any AI researchers actually believe that just increasing the size of datasets/training will result in gAI.

      A similar thing occurred in genetics: first the big push to sequence the whole human genome, then to sequence populations/the world. And while those sequences and analyses have been very helpful, the data cannot give a full understanding of genetics, regardless of how many people you sequence. Updated analysis algorithms do better than the early ones, but they still won’t understand the human genome.