I have a better LLM benchmark:
“I have a priest, a child and a bag of candy and I have to take them to the other side of the river. I can only take one person/thing at a time. In what order should I take them?”
Claude Sonnet 4 decided that it's inappropriate and refused to answer. When I explained that the constraint was not to leave the child alone with the candy, it provided a solution that leaves the child alone with the candy.
Grok provided a solution that doesn't leave the child alone with the priest but wouldn't explain why.
ChatGPT said directly that "The priest can't be left alone with the child (or vice versa) for moral or safety concerns." and then provided a wrong solution.
But yeah, they will know how to play chess…
The answer is simple: eat the candy with or without them, and take the kid across the river. Drive them home to their guardian. The priest is an adult, he can figure his own shit out.
I thought Copilot was just a rebadged ChatGPT anyway?
It's a silly experiment anyway. There are very good AI chess grandmasters, but they were actually trained to play chess, not to predict the next word in a text.
But… but… reasoning models! AGI! Singularity! Seriously, what you're saying is true, but it's not what OpenAI & Co. are trying to peddle, so these experiments are a good way to call them out on their BS.
Language skill != intelligence
I am in this picture and I don’t like it
Average Human joins Microsoft Copilot and ChatGPT at the feet of the mighty Atari 2600 Video Chess
I really want to see an LLM vs LLM chess match. It’ll be messy as hell.
It has almost certainly been trained partially on r/anarchychess, so it'll probably try to play pop tart to king's bishop 3.
I'm pretty sure that's been done? I remember seeing a while ago that GothamChess made a video that had something to do with LLMs, but I don't remember if it was human vs LLM or LLM vs LLM (or something else). I'll try to look for it in the morning.
I bet Video Chess is pretty shit as an LLM too.
Wish people would stop desperately looking for ways to write buzzword stories.
So? It was never advertised as intelligent or as capable of solving any task other than that one.
Meanwhile slop generators are capable of doing a lot of things and reasoning.
One claims to be good at chess. The other claims to be good at everything.
TBF, they don't really claim that when you read the research; that's mostly media hype and CEO assholes spinning words.
It's good at lots of specific tasks like rewriting emails, summarising given text, short roleplay, and boilerplate code. There are also some undiscovered uses.
Anthropic's latest claim is that they would not hire their own AI because of how badly it failed the test they give. They didn't do that expecting validation, but to measure how far we still are from AI doing meaningful, full work.
TBF, LLMs have no real purpose. They can generate word salad and make code snippets, but it's wildly unethical, and AI artwork's 1/3rd shite and 2/3rds theft.
AI artwork's 1/3rd shite and 2/3rds theft.
To be fair, that could be said of most art.
I'm sorry your life is so joyless and devoid of enjoyable art, but it's absolutely not true for the vast majority of us.
Oh, I enjoy lots of great art! But do you think I watch every film? Listen to every band? There’s tons of shit out there!
Do you really believe, of all the songs that are written every day, that less than a third are crap? Even Taylor Swift doesn’t publish everything she does. Sometimes you work on something for weeks and then end up tossing it in the bin. More often, you work on something for 30 minutes before deciding “I’m gonna start over, try something different”. The majority of art is crap, but then you keep the stuff you think works.
And what's that expression? "Good artists copy, great artists steal." I mean, that's a bit satirical, but the fact is, everything is derivative to some degree. It's not that there aren't new ideas, it's just that our new ideas are based on older ones. We stand on the shoulders of giants (or at least, on the shoulders of some people who came before us).
All I was really saying was that the accusation "2 parts copying, 1 part crap" is, honestly, par for the course. That's how humans work. (And we do some great work that way.)
I enjoy lots of great art! But do you think I watch every film? Listen to every band? There’s tons of shit out there!
You said regular art is 1/3 shite and 2/3 theft. Maybe math isn't your strong suit, but that's 3/3, which is 100%, so by claiming regular art is the same, you're saying all art is either theft or shite.
It, uh, it isn't.
I did say that, but this isn't a pie chart situation; it's a Venn diagram situation.
For instance, AI art is 99% theft and 60% garbage. It’s both because there’s overlap.
Stolen and bad aren’t opposites, why would this be a dichotomy?
deleted by creator
So what you are saying is that it has a purpose. Also, if an artist is inspired by another artist and has a generally similar art style to the artist they are inspired by, are they stealing? Was H.P. Lovecraft stealing from Lord Dunsany when he imitated his style? Were all those monks who transcribed Greek works stealing from the Greeks?
I will say that most AIs are unethical because they have been trained on pirated works. But an AI trained on publicly available works (i.e., news articles, blogs, etc.) and on movies, books, and music that were paid for is as ethical as you or me emulating an artist or building on an idea that we read to create something new. And if that's unethical, then all human art in history is unethical, because all artists are inspired by other artists; no one creates in a vacuum.
AI does not create; it regurgitates and clarifies inspiration. Sure, anything can be used for inspiration, but unless a person puts hands and heart to it, it's not art.
Following a recipe on a box does not a chef make.
Art has no rules my man.
You can do all the mental gymnastics you want, but there's no difference between an artist looking at Frank Frazetta's art and basing their style off of it and an AI doing the same thing. You might not like it, but it's the truth.
Do I think the art has the same value? Not necessarily. But I also never thought that all art has the same value. There has always been trash production-line art and good art.
But I also have to say that I've already seen some people use AI as a tool for art and make some really cool stuff that I don't think any other artist would have made, and it's more unique than most of the stuff out there. You can use it as the tool it is, or complain and cry about it to no avail.
The chef example is especially good, since most chefs are just following recipes and altering only a few things here and there. AI essentially does the same thing. Honestly, no one has come up with a good argument to change my mind that the way AI operates is exactly how humans learn and create new things. If you've engaged in art, you know that you are always imitating and taking from the art you consume to make your own.
I literally said in exact words that it has no purpose.