• WalnutLum@lemmy.ml · 21 points · 18 hours ago

    I think most ML experts (those not being paid out the wazoo to say otherwise) have been saying we’re on the tail end of the LLM technology sigmoid curve. (Basically, treating an LLM as a stochastic index, the real measure of training-algorithm quality is query accuracy per training datum.)

    Even with DeepSeek’s methodology, you see smaller and smaller returns on each additional unit of training input.
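    (To make the diminishing-returns point concrete, here’s a toy logistic curve in Python; the midpoint and scale values are made up purely for illustration, not fitted to any real model.)

```python
import math

def accuracy(n_tokens: float, midpoint: float = 1e12, scale: float = 3e11) -> float:
    """Toy sigmoid: query accuracy as a function of training data volume."""
    return 1 / (1 + math.exp(-(n_tokens - midpoint) / scale))

# Same-sized training increment, applied near the midpoint vs. on the tail:
step = 1e11
early = accuracy(1e12 + step) - accuracy(1e12)  # steep part of the curve
late = accuracy(3e12 + step) - accuracy(3e12)   # tail of the curve
print(f"early gain: {early:.4f}, late gain: {late:.6f}")
```

    The same amount of extra data buys orders of magnitude less accuracy on the tail, which is the "tail end of the sigmoid" claim in miniature.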

  • MDCCCLV@lemmy.ca · 10 points · 12 hours ago

      At this point it’s useful for doing some specific things, so the way to make it great is to make it cheap and accessible. Being able to run it locally would be way more useful.

    • dustyData@lemmy.world · 4 points · 3 hours ago

        Sure, but then what would they do with their billions of dollars data center plugged into a nuclear power plant?

      • WhatAmLemmy@lemmy.world · 3 points · 2 hours ago (edited)

          Can we skip the dog and pony show, and get straight to paying the orphan crushing machine directly?

    • makyo@lemmy.world · 1 point · 4 hours ago

        100% this. Wouldn’t it be something if they weren’t overtly running their companies to replace all of us? I feel like focusing instead on creating great personal assistants that make our lives easier in various ways would get a lot of support from the public.

        And don’t get me wrong, these LLMs are great at helping people already but that’s definitely not the obvious end goal of OpenAI or any of the others.

    • ugjka@lemmy.world · 1 point · 5 hours ago

        Yeah it is useful, but it is not an industry worth trillions of dollars in valuation. The only use cases LLMs have are making shitty summarizations of text, serving as a shitty Google Search alternative, or writing shitty code.

  • simple@lemm.ee · 70 points · 22 hours ago

    With this, OpenAI is officially starting to crack. They’ve been promising a lot and not delivering, and the only reason they would push out GPT-4.5 even though it’s worse and more expensive than the competition is that the investors are starting to get mad.

    • Balder@lemmy.world · 9 points · 13 hours ago (edited)

      Who wouldn’t be mad, considering the amount of money OpenAI is burning? They’re already taking a huge risk, and I believe it’s mostly out of ideology: believing this time it’ll be the singularity, simply because ChatGPT has this ability to fool humans into thinking there’s some humanity there.

  • Grandwolf319@sh.itjust.works · 26 points · 20 hours ago (edited)

    Is it because they used data from after ChatGPT was released?

    Edit:

    marginally better performance than GPT-4o at 30x the cost for input and 15x the cost for output.

    Ahh, good old-fashioned law of diminishing returns.
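    Those multipliers compound on a typical request. A quick sanity check in Python (the GPT-4o per-million-token prices below are placeholders for illustration, not official figures):

```python
# Placeholder GPT-4o prices per million tokens (illustrative only).
input_price, output_price = 2.50, 10.00

# Quoted multipliers: 30x for input tokens, 15x for output tokens.
gpt45_input, gpt45_output = 30 * input_price, 15 * output_price

def request_cost(in_price, out_price, in_tokens=2_000, out_tokens=500):
    """Dollar cost of one request: 2,000 prompt tokens, 500 completion tokens."""
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000

base = request_cost(input_price, output_price)
new = request_cost(gpt45_input, gpt45_output)
print(f"GPT-4o: ${base:.4f}  GPT-4.5: ${new:.4f}  ratio: {new / base:.1f}x")
```

    With this (hypothetical) prompt/completion split, the blended cost per request comes out over 20x for a marginal quality gain.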

  • Catoblepas@lemmy.blahaj.zone · 38 points (1 down) · 21 hours ago

    I’m sure turning on a few more nuclear plants to power shoveling an ever larger body of AI slop-contaminated text into the world’s most expensive plagiarism machine will fix it!

  • humanspiral@lemmy.ca · 4 points (1 down) · 18 hours ago

    Not an expert in the field, but OP seems to be using relevant metrics to criticize model cost/performance.

    One reason to dislike OpenAI is its “national security ties”. It can probably get the “wrong customers” paying whatever expense it is.

    • thatsnothowyoudoit@lemmy.ca · 10 points · 18 hours ago (edited)

      I think that depends on what you’re doing. I find Claude miles ahead of the pack on practical but fairly nuanced coding issues, particularly when used as a pair programmer with strongly typed FP patterns.

      It’s almost as if it’s better in real-world situations than artificial benchmarks.

      And their new CLI client is pretty decent - it seems to really take advantage of the hybrid CoT/standard auto-switching model advantage Claude now has with this week’s update.

      I don’t use it often anymore but when I reach for a model first for coding - it’s Claude. It’s the most likely to be able to grasp the core architectural patterns in a codebase (like a consistent monadic structure for error handling or consistently well-defined architectural layers).
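      (For anyone unfamiliar, the “consistent monadic structure for error handling” mentioned above looks roughly like the Result type sketched below; the names are hypothetical, not from the poster’s actual codebase.)

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar, Union

T = TypeVar("T")
U = TypeVar("U")

@dataclass
class Ok(Generic[T]):
    value: T

@dataclass
class Err:
    error: str

Result = Union[Ok[T], Err]

def bind(r: "Result[T]", f: Callable[[T], "Result[U]"]) -> "Result[U]":
    """Chain a fallible step; short-circuit on the first Err."""
    return f(r.value) if isinstance(r, Ok) else r

def parse_int(s: str) -> "Result[int]":
    try:
        return Ok(int(s))
    except ValueError:
        return Err(f"not an int: {s!r}")

def reciprocal(n: int) -> "Result[float]":
    return Ok(1 / n) if n != 0 else Err("division by zero")

print(bind(parse_int("4"), reciprocal))  # Ok(value=0.25)
print(bind(parse_int("0"), reciprocal))  # Err(error='division by zero')
```

      A codebase that threads every fallible operation through one shape like this is exactly the kind of consistent architectural pattern a model either grasps or doesn’t.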

      I just recently cancelled my one month trial of Gemini - it was pretty useless; easy to get stuck in a dumb loop even with project files as context.

      And GPT-4/o1/o3 seem to really suck at being prescriptive, often providing walls of multiple solutions that all somehow narrowly miss the plot, even with tons of context.

      That said, Claude sucks - SUCKS - at statistics, being completely unreliable where GPT-4 is often pretty good and provides code (Python) for verification.