• voodooattack@lemmy.world · 16 hours ago

    This person is right. But I think the methods we use to train them are what’s fundamentally wrong. Brute-force learning? Randomised datasets past the coherence/comprehension threshold? And the rationale is that this is done for the sake of optimisation and in the name of efficiency? I can see that overfitting is a problem, but did anyone look hard enough at it? Or did someone just jump a fence at the time, everyone decided to follow along and roll with it because it “worked”, and it somehow became the gold standard that nobody can question at this point?
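    Overfitting itself is easy to see outside of LLMs. A minimal numpy sketch with made-up data (a toy polynomial-regression example, not how LLM training actually works): the flexible model nails the training points but generalizes worse than a plain line.

```python
# Toy overfitting demo (synthetic data): a degree-9 polynomial fits
# 10 noisy training points almost exactly, yet generalizes worse than
# a straight line, because it memorizes the noise.
import numpy as np

rng = np.random.default_rng(0)
true_fn = lambda x: 2 * x + 1                        # underlying relationship
x_train = np.linspace(0, 1, 10)
y_train = true_fn(x_train) + rng.normal(0, 0.1, 10)  # noisy observations
x_test = np.linspace(0, 1, 100)

for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - true_fn(x_test)) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```

    The degree-9 fit wins on training error and loses on test error, which is the whole problem in miniature.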

    • VoterFrog@lemmy.world · 15 hours ago

      The generalized learning is usually just the first step. Coding LLMs typically go through further rounds of specialized training afterwards to tune and focus them on solving those kinds of problems. Then there’s RAG, MCP, and simulated reasoning, which technically aren’t training methods but do further improve the relevance of the outputs. There’s still a lot of ongoing work in this space; we haven’t even seen the standard settle yet.
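      For what it’s worth, the retrieval half of RAG is conceptually simple: fetch the most relevant documents and prepend them to the prompt. A minimal sketch with a hypothetical corpus and a naive word-overlap scorer (real systems use embedding similarity, not word overlap):

```python
# Minimal sketch of the retrieval step in RAG. The corpus and query
# are invented for illustration; production systems score documents
# with vector embeddings rather than shared-word counts.
def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    q_words = set(query.lower().split())
    # Rank documents by how many words they share with the query.
    ranked = sorted(
        docs,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]

docs = [
    "The deploy script requires Python 3.11.",
    "Unit tests live under tests/ and run with pytest.",
    "The cache layer uses Redis with a 60s TTL.",
]
query = "which python version does the deploy script need"
context = retrieve(query, docs)[0]
prompt = f"Context: {context}\nQuestion: {query}"
print(prompt)
```

      The retrieved snippet grounds the model’s answer in your own docs, which is why it improves relevance without any retraining.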

      • voodooattack@lemmy.world · 7 hours ago

        Yeah, but what I meant was: we took a wrong turn along the way, and now that it’s set in stone, the sunk-cost fallacy has taken over. We (as senior developers) are applying knowledge and approaches obtained through a trap we would absolutely caution a junior against until the lesson sticks, because it IS a big deal.

        Reminds me of this gem:

        https://www.monkeyuser.com/2018/final-patch/

    • bitcrafter@programming.dev · 15 hours ago

      The researchers in the academic field of machine learning who came up with LLMs are certainly aware of their limitations and are exploring other possibilities. Unfortunately, what happened in industry is that people noticed one particular approach was good enough to look impressive, and everyone jumped on that bandwagon.

      • voodooattack@lemmy.world · edit-2 · 7 hours ago

        That’s not the problem, though. From my perspective, it looks like this:

        Someone took a shortcut because of an external time-crunch, left a comment about how this is a bad idea and how we should reimplement this properly later.

        But the code worked and was deployed in a production environment despite the warning, and at that specific point it transformed from being “abstract procedural logic” to being “business logic”.
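        A hypothetical illustration of that transformation (the function, the figure, and the comment are all invented for the sake of the example):

```python
# Hypothetical example of a flagged shortcut that production now
# depends on. None of this is real code from any actual project.
def calculate_discount(total: float) -> float:
    # HACK: hardcoded 10% because the pricing service missed the
    # launch deadline. TODO: replace with the real rules engine later!
    return total * 0.90

# Years later, invoices, tests, and customer contracts all assume
# the 10% figure. The "temporary" shortcut is now business logic.
print(calculate_discount(100.0))  # → 90.0
```

        Once customers can observe the behavior, the warning comment stops mattering: the shortcut has become the spec.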