Apple just proved AI "reasoning" models like Claude, DeepSeek-R1, and o3-mini don't actually reason at all. They just memorize patterns really well.

AbuTahir@lemm.ee · edit-2 1 day ago

Knock_Knock_Lemmy_In@lemmy.world · 5 hours ago

Do we know that they don’t and are incapable of reasoning?

“even when we provide the algorithm in the prompt—so that the model only needs to execute the prescribed steps—performance does not improve”

Communist@lemmy.frozeninferno.xyz · edit-2 3 hours ago

That indicates that this particular model does not follow instructions, not that it is architecturally fundamentally incapable.

Knock_Knock_Lemmy_In@lemmy.world · 2 hours ago

Not “this particular model”. Frontier LRMs such as OpenAI’s o1/o3, DeepSeek-R1, Claude 3.7 Sonnet Thinking, and Gemini Thinking.

The paper shows that Large Reasoning Models as defined today cannot interpret instructions. Their architecture does not allow it.

Communist@lemmy.frozeninferno.xyz · edit-2 46 minutes ago

Those particular models. It does not prove the architecture doesn’t allow it at all. It’s still possible that this is solvable with a different training technique, and none of those are using the right one. That’s what they need to prove wrong.

This proves the issue is widespread, not fundamental.

0ops@lemm.ee · 11 minutes ago

Is “model” not defined as architecture + weights? Those models certainly don’t share the same architecture. I might just be confused about your point, though.

archive.is