I get the agenda of the study, and I also agree with it, but the study itself is really low effort.
Obviously, an experienced developer working on a highly specialized project, who already has all the needed context and has no experience using AI, will beat a clueless AI.
How would the results look if the developer had experience with AI and were starting on a new project, without any existing context? A lot different, I would imagine. AI is also not only for code generation. After a year of working as a software developer, I could no longer gain much experience from my senior colleagues (which says much more about them than about me or AI), and I was kinda forced to look for sparring elsewhere. I feel like I’ve been speedrunning my experience and career by using AI. I have never used code generation that much; instead, I’ve used it to learn about the things I don’t know I don’t know about. That has been an accelerator.
Today, I’m using code generation much more: when starting a new project, when I need to prototype something, to complete mundane tasks on existing projects, to write non-critical Python scripts, to get useful bash scripts, to spin up internal UI projects, etc.
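To give a sense of what I mean by non-critical Python scripts: throwaway tooling along these lines (a made-up example, nothing from the study), where a bug costs you nothing:

    # Throwaway script: list the ten largest files under a directory.
    # Exactly the kind of non-critical task I'd hand to code generation.
    import sys
    from pathlib import Path

    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path(".")
    files = [(p.stat().st_size, p) for p in root.rglob("*") if p.is_file()]
    for size, path in sorted(files, reverse=True)[:10]:
        print(f"{size:>12,}  {path}")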
Sometimes I naturally waste time, since it takes time for an AI to produce code and then more time to review it, but in general I feel my productivity has improved by using AI.
Yeah, I think it’s weird how people need to think in such a binary manner. AI can suck in almost every way and still save you time as a quick autocomplete in an IDE. You’d have to be an idiot to have it write big blocks of code you don’t understand; that’s on you if you do it. If you want to use it to improve productivity, just let it write a few lines here and there that would otherwise cost you several seconds each. When it comes to refactoring, I’ve found GitHub Copilot helps a lot, because what I’m doing is changing from one common pattern to another, probably even more common, pattern. It’s predictable, so it usually gets it fairly right.
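To illustrate the kind of mechanical, one-common-pattern-to-another change I mean (a made-up example; your patterns will differ):

    from dataclasses import dataclass

    @dataclass
    class User:
        id: int
        email: str
        active: bool

    users = [User(1, "a@example.com", True), User(2, "b@example.com", False)]

    # Before: the verbose accumulator pattern.
    emails = {}
    for user in users:
        if user.active:
            emails[user.id] = user.email

    # After: the same logic as a dict comprehension -- a predictable
    # rewrite that autocomplete usually finishes correctly once you
    # start typing it.
    emails = {user.id: user.email for user in users if user.active}
    print(emails)  # {1: 'a@example.com'}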
If it were really artificially intelligent, you could just describe a program and in seconds get a nearly bug-free, production-ready app. That’s a LONG way off, if it ever happens. People treating LLMs like they are actually AI is the issue. Stop misusing the tool.
Use judgement, people.
It’s a study done by METR; they constantly pump out papers talking about the “significant catastrophic risks via AI self-improvement or rogue replication”.
The fact that they’re publishing something negative is what’s interesting here. Except that they’re reporting on a six-month-old study lol
Did these developers not have experience with AI?
This is from the article: “But Rush and Becker have shied away from making sweeping claims about what the results of their study mean for the future of AI. For one, the study’s sample was small and non-generalizable, including only a specialized group of people to whom these AI tools were brand new.”
I’m not sure that focusing on one aspect to scope a reasonable and doable study automatically makes it “really low effort”. If they were to test a range of project types, it’d have to be a much bigger study.
You are right, but I believe they should at least have chosen another use case to make it interesting. I wouldn’t have needed a study to know that an AI performs worse than a developer on a project the developer most likely built themselves. The existing project might have some really weird code smells and workarounds that only the developer on the project knows about and understands. There might be relevant context external to the solution. The AI would have to be a mind reader in these cases.
But if you gave the AI and the developer a blank canvas and a clearly defined task, I just believe it would be a more interesting study.*
It kind of sounds like they were just handed a tool they knew nothing about and were asked to perform better with it. A miter saw is way better and faster than a regular saw, if you know how to use it.
*edit
To make my point more clear: I don’t mean the developer needed to solve an issue unrelated to his daily work, but rather a task that doesn’t depend on years of tech debt or on context that isn’t provided to the AI. And yes, by that I mean I don’t believe code generation from an AI has a big use case in scenarios where the project has too many dependencies and touches on niche solutions, but you can still use it for other purposes than building features.