Experienced software developers assumed AI would save them a chunk of time. But in one experiment, their tasks took 20% longer

codeinabox@programming.dev · 1 day ago

Experienced software developers assumed AI would save them a chunk of time. But in one experiment, their tasks took 20% longer

mayorchid@lemmy.world · 8 hours ago

Someone on Mastodon was saying that whether you consider AI coding an advantage completely depends on whether you think of prompting the AI and verifying its output as “work.” If that’s work to you, the AI offers no benefit. If it’s not, then you may think you’ve freed up a bunch of time and energy.

The problem for me, then, is that I enjoy writing code. I do not enjoy telling other people what to do or reviewing their code. So AI is a valueless proposition to me because I like my job and am good at it.

fodor@lemmy.zip · 17 hours ago

No. Experienced devs knew it would make tasks take longer, because we have common sense and technical knowledge.

I don’t blame randos for buying into the hype; what do they know? But by now we’re seeing that they have caught on to the scam.

xep@discuss.online · 17 hours ago

I assumed nothing, and evaluated it like I would any other tool. It’s ok for throwaway scripts but if the script does anything non-trivial that could affect anything external the time spent making sure nothing goes awfully wrong is at least as much as the time saved generating the script, at least in my domain.

ExLisper@lemmy.curiana.net · 9 hours ago

I got an email couple of weeks ago with invitation to some paid study about AI. They were looking for programmers that would solve some tasks with and with AI help. I didn’t have time or felt like participating but if I did I would 100% work slower on task with AI just to help derail the pro-AI narrative. It’s not in my interest to help promote it. Just saying…

zaphod@sopuli.xyz · 1 day ago

Writing code with an AI as an experienced software developer is like writing code by instructing a junior developer.

clif@lemmy.world · 1 day ago

Without the payoff of the next generation of developers learning.

Management: “Treat it like a junior dev”

… So where are we going to get senior devs if we’re not training juniors?

BradleyUffner@lemmy.world · edit-2 1 day ago

… That keeps making the same mistakes over and over again because it never actually learns from what you try to teach it.

zaphod@sopuli.xyz · 1 day ago

Yep, the junior is capable of learning.

InternetCitizen2@lemmy.world · 1 day ago

Wait till I get hired as junior

Clent@lemmy.dbzer0.com · 9 hours ago

Yeah, not all people who enter the industry should be doing so.

Most of this was boomers being boomers and claiming anyone and everyone should code.

aport@programming.dev · 1 day ago

My job believes the solution to this is a 7,000 line agents.md file

Lucy :3@feddit.org · 1 day ago

Sometimes. And if they’re not, they’ll be replaced or replace themselves.

VoterFrog@lemmy.world · 1 day ago

This is not really true.

The way you teach an LLM, outside of training your own, is with rules files and MCP tools. Record your architectural constraints, favored dependencies, and style guide information in your rule files and the output you get is going to be vastly improved. Give the agent access to more information with MCP tools and it will make more informed decisions. Update them whenever you run into issues and the vast majority of your repeated problems will be resolved.

UnspecificGravity@piefed.social · edit-2 1 day ago

Well, that’s what they say, but then it doesn’t actually work, and even if it did it’s not any easier or cheaper than teaching humans to do it.

More to the point, that is exactly what the people in this study were doing.

Clent@lemmy.dbzer0.com · 9 hours ago

If it’s doesn’t work for you, it’s because you’re a failure!

Still not convinced these LLM bros aren’t junior developers (at best) who someone gave a senior title to because everyone else left their shit hole company.

VoterFrog@lemmy.world · edit-2 1 day ago

More to the point, that is exactly what the people in this study were doing.

They don’t really do into a lot of detail about what they were doing. But they have a table on limitations of the study that would indicate it is not.

We do not provide evidence that: There are not ways of using existing AI systems more effectively to achieve positive speedup in our exact setting. Cursor does not sample many tokens from LLMs, it may not use optimal prompting/scaffolding, and domain/repository-specific training/finetuning/few-shot learning could yield positive speedup.

Back to this:

even if it did it’s not any easier or cheaper than teaching humans to do it.

In my experience, the kinds of information that an AI needs to do its job effectively has a significant overlap with the info humans need when just starting on a project. The biggest problem for onboarding is typically poor or outdated internal documentation. Fix that for your humans and you have it for your LLMs at no extra cost. Use an LLM to convert your docs into rules files and to keep them up to date.

UnspecificGravity@piefed.social · 1 day ago

Your argument depends entirely on the assumption that you know more about using AI to support coding than the experienced devs that participated in this study. You want to support that claim with more than a “trust me, bro”?

VoterFrog@lemmy.world · 1 day ago

Do you think that like nobody has access to AI or something? These guys are the ultimate authorities on AI usage? I won’t claim to be but I am a 15 YOE dev working with AI right now and I’ve found the quality is a lot better with better rules and context.

And, ultimately, I don’t really care if you believe me or not. I’m not here to sell you anything. Don’t use it the tools, doesn’t matter to me. Anybody else who does use them, give my advice a try an see if it helps you.

UnspecificGravity@piefed.social · 1 day ago

These guys all said the same thing before they participated in a study that proved that they were less efficient than their peers.

criss_cross@lemmy.world · 17 hours ago

In theory yes.

In practice I find the more stuff like this you throw at it the more rope it has to hang itself with. And you spend so much time prompt adjusting so it doesn’t do the wrong things that you were better off just doing half of the tasks yourself.

aport@programming.dev · 1 day ago

Codex literally lies about being connected to configured MCP servers.

VoterFrog@lemmy.world · 1 day ago

Are you trying to make a point that agents can’t use MCP based off of a picture of a tweet you saw or something?

aport@programming.dev · 22 hours ago

I’m talking from my personal, daily experience using codex.

raspberriesareyummy@lemmy.world · 1 day ago

That is a moronic take. You would be better off learning to structure your approach to SW development than trying to learn how to use a glorified slop machine to plagiarize other people’s works.

plantfanatic@sh.itjust.works · 1 day ago

This is why you use a downloaded llm and customize it, there’s ways to fix these issues.

BradleyUffner@lemmy.world · edit-2 1 day ago

Unless you are retraining the model locally at your 23 acre data center in your garage after every interaction, it’s still not learning anything. You are just dumping more data in to its temporary context.

SchmidtGenetics@lemmy.world · 1 day ago

Sounds like you have no clue what an LLM/AI actually is or is capable of.

https://medium.com/sciforce/step-by-step-guide-to-your-own-large-language-model-2b3fed6422d0

It’s not hard to keep a data library updated for context, and some are under a TB in siz.

Where are you getting your information from?

TJA!@sh.itjust.works · 1 day ago

It seems you are still confusing context with training? Did you read that text and understand it?

Did you follow it yourself to build an llm?

WolfLink@sh.itjust.works · 1 day ago

I bet they had an LLM read it and summarize it for them

SchmidtGenetics@lemmy.world · edit-2 1 day ago

Why do you think it’s solely a training issue?

TJA!@sh.itjust.works · edit-2 1 day ago

So, you did not? Ok

plantfanatic@sh.itjust.works · edit-2 1 day ago

What part of customize did you not understand?

And lots fit on personal computers dude, do you even know what different llms there are…?

One for programming doesn’t need all the fluff of books and art, so now it’s a manageable size. Llms are customizable to any degree, use your own data library for the context data even!

BradleyUffner@lemmy.world · edit-2 1 day ago

What part about how LLMs actually work do you not understand?

“Customizing” is just dumping more data in to it’s context. You can’t actually change the root behavior of an LLM without rebuilding it’s model.

plantfanatic@sh.itjust.works · edit-2 1 day ago

“Customizing” is just dumping more data in to it’s context.

Yes, which would fix the incorrect coding issues. It’s not an llm issue, it’s too much data. Or remove the context causing that issue. These require a little legwork and knowledge to make useful. Like anything else.

You really don’t know how these work do you?

BradleyUffner@lemmy.world · edit-2 1 day ago

You do understand that the model weights and the context are not the same thing right? They operate completely differently and have different purposes.

Trying to change the model’s behavior using instructions in the context is going to fail. That’s like trying to change how a word processor works by typing in to the document. Sure, you can kind of get the formatting you want if you manhandle the data, but you haven’t changed how the application works.

TJA!@sh.itjust.works · 1 day ago

But

All the fluff from books and art

Is not inside the context, that comes from training. So you know how an llm works?

SchmidtGenetics@lemmy.world · edit-2 1 day ago

If it’s constantly making an error, fix the context data dude. What about it an llm/ai makes you think this isn’t possible…? Lmfao, you just want to bitch about ai, not comprehend how they work.

Eheran@lemmy.world · 1 day ago

This is Lemmy, bitching about AI is the norm.

moomoomoo309@programming.dev · edit-2 1 day ago

Yeah, but LLMs still consistently don’t follow all rules they’re given, they randomly will not follow one or more with no indication they did so, so you can’t really fix these issues consistently, just most of the time.

Edit: to put this a little more clearly after a bit more thought: It’s not even necessarily a problem that it doesn’t always follow rules, it’s more so a problem that when it doesn’t follow the rules, there’s no indication it did so. If it had that, it would actually be fine!

folekaule@lemmy.world · 1 day ago

Very true. I’ve been saying this for years. However, the flip side is you get the best results from AI by treating it as a junior developer as well. When you do, you can in fact have a fleet of virtual junior developer working for you as a senior.

However, and I tell this to the junior I work with: you are responsible for the code you put into production, regardless if you write it yourself or you used AI. You must review what it creates because you’re signing off on it.

That in turn means you may not save as much time as you think, because you have to review everything, and you have to make sure you understand everything.

But understanding will get progressively harder the more code is written by other people or AI. It’s best to try to stay current with the code base as it develops.

Unfortunately this cautious approach does not align with the profit motives of those trying to replace us with AI, so I remain cynical about the future.

AnyOldName3@lemmy.world · 1 day ago

Usually, having to wrangle a junior developer takes a senior more time than doing the junior’s job themselves. The problem grows the more juniors they’re responsible for, so having LLMs stimulate a fleet of junior developers will be a massive time sink and not faster than doing everything themselves. With real juniors, though, this can still be worthwhile, as eventually they’ll learn, and then require much less supervision and become a net positive. LLMs do not learn once they’re deployed, though, so the only way they get better is if a cleverer model is created that can stimulate a mid-level developer, and so far, the diminishing returns of progressively larger and larger models makes it seem pretty likely that something based on LLMs won’t be enough.

folekaule@lemmy.world · edit-2 1 day ago

I’m a senior working with junior developers, guiding them through difficult tasks and delegating work to them. I also use AI for some of the work. Everything you say is correct.

However, that doesn’t stop a) some seniors from spinning up several copies of AI and test them like a group of juniors and b) management from seeing this as a way to cut personnel.

I think denying these facts as a senior is just shooting yourself in the foot. We need to find the most productive ways of using AI or become obsolete.

At the same time we need to ensure that juniors can develop into future seniors. AI is throwing a major wrench in the works of that, but management won’t care.

Basically, the smart thing to do is to identify where AI, seniors, and juniors all fit in. I think the bubble needs to pop before that truly happens, though. Right now there’s too much excitement to cut cost/salaries with the people holding the purse strings. Until AI companies start trying to actually make a profit, that won’t happen.

AnyOldName3@lemmy.world · 1 day ago

If LLMs aren’t going to reach a point where they outperform a junior developer who needs too much micromanaging to be a net gain to productivity, then AI’s not going to be a net gain to productivity, and the only productive way to use it is to fight its adoption, much like the only way to productively use keyboards that had a bunch of the letters missing would be to refuse to use them. It’s not worth worrying about obsolescence until such a time as there’s some evidence that they’re likely to be better, just like how it wasn’t worth worrying about obsolescence yet when neural nets were being worked on in the 80s.

folekaule@lemmy.world · 1 day ago

You’re not wrong, but in my personal experience AI that I’ve used is already at the level of a decent intern, maybe fresh junior level. There’s no reason it can’t improve from there. In fact I get pretty good results by working incrementally to stay within its context window.

I was around for the dotcom bubble and I expect this to go similarly: at first there is a rush to put AI into everything. Then they start realizing they have to actually make money and the frivolous stuff drops by the wayside and the useful stuff remains.

But it doesn’t go away completely. After the dotcom bust, the Internet age was firmly upon us, just with less hype. I expect AI to follow a similar trend. So, we can hope for another AI winter or we can figure out where we fit in. I know which one I’m doing.

AnyOldName3@lemmy.world · 1 day ago

There’s a pretty good reason to think it’s not going to improve much. The size of models and amount of compute and training data required to create them is increasing much faster than their performance is increasing, and they’re already putting serious strain on the world’s ability to build and power computers, and the world’s ability to get human-written text into training sets (hence why so many sites are having to deploy things like Anubis to keep themselves functioning). The levers AI companies have access to are already pulled as far as they can go, and so the slowing of improvement can only increase, and the returns can only diminish faster.

folekaule@lemmy.world · 1 day ago

I can only say I hope you’re right. I don’t like the way things are going, but I need to do what I can to adapt and survive so I choose to not put my hopes on AI failing anytime soon.

By the way, thank you for the thoughtful responses and discussion.

Gamma@beehaw.org · 1 day ago

Apparently some people would love to manage a fleet of virtual junior devs instead of coding themselves, I really don’t see the appeal.

pinball_wizard@lemmy.zip · 1 day ago

I think the appeal is that they already tried to lean to code and failed.

Folks I know who are really excited about vibe coding are the ones who are tired of not having access to a programmer.

In some of their cases, vibe coding is a good enough answer. In other cases, it is not.

Their workplaces get to find out later which cases were which.

Zos_Kia@lemmynsfw.com · 1 day ago

Funny cause my experience is completely the reverse. I’ve seen a ton of medium level developers just use copilot style auto complete without really digging into new workflows, and on the other end really experienced people spinning agents in parallel and getting a lot of shit done.

The “failed tech business people” are super hyped for ten minutes when cursor gives them a static html page for free, but they quickly grow very depressed when the actual work starts. Making sense of a code base is where the rubber meets the road, and agents won’t help if you have zero experience in a software factory.

OldMrFish@lemmy.one · 1 day ago

That’s the funny thing. I definitely fall into the ‘medium level’ dev group (Coding is my job, but I haven’t written a single line of code in my spare time for years), and frankly - I really like Copilot. It’s like the standard code-completion on steroids. No need to spend excessive amounts of time describing the problem and review a massive blob of dubious code, just short-ish snippets of easily reviewed code based on current context.

Everyone seems to argue against AI as if vibe coding is the only option and you have to spend time describing every single task, but I’ve changed literally nothing in my normal workflow and get better and more relevant code completion results.

Obviously having to describe every task in detail taking edge cases into account is going to be a waste of time, but fortunately that’s not the only option.

thingsiplay@beehaw.org · 1 day ago

What a wonderful statement.

myfunnyaccountname@lemmy.zip · 1 day ago

I get what you are saying and agree. But corporations doing give a fuck. As long as they can keep seeing increased profits from it, it’s coming. It’s not about code quality or time or humans. It’s about profits.

UnspecificGravity@piefed.social · 1 day ago

Are they though? They’ve invested like a trillion dollars into this and it doesn’t seem any closer to actually making money.

myfunnyaccountname@lemmy.zip · 1 day ago

True. The AI parents are having issues. We all know OpenAI is hemorrhaging money. I think Anthropic is as well. They are all passing money between each other. But software companies, like the one I work for, don’t care what those companies are doing. As long as my company can use services provided by the AI parents, it’s not an issue if the AI parents themselves are losing money. Or if software companies can shove out their own AI feature (like the AI in ServiceNow or how Office 365 is getting some rebranding), all is well and they can brag about having AI to the shareholders.

UnspecificGravity@piefed.social · 1 day ago

That’ll work right up until the shareholders start hearing “we got AI!” as the equivalent to “we invested in Enron!”. I hope they have a plan for that.

Zos_Kia@lemmynsfw.com · 1 day ago

Me it reminds me of that period of time where A/B testing was big and everybody and their mother had to at least do some. Never mind that it solved problems we didn’t have, it still was a cool thing to say in a meeting lol

fluxx@lemmy.world · 1 day ago

Wow, great analogy. Might steal this to use myself.

Zos_Kia@lemmynsfw.com · 1 day ago

And that’s what I don’t understand. Instructing a team of juniors works very well, in fact it has been the predominant way of making software for some time now. Hire a bit more junior than what you need, and work them a bit above their pay grade thanks to your experience. That’s just business as usual.

So I guess what these studies show is that most engineers are not really good when it comes to piloting juniors, which has been a known fact forever. That’s often cited as a reason why most seniors will never make it to staff level.

dejected_warp_core@lemmy.world · edit-2 1 day ago

When writing code, I don’t let AI do the heavy lifting. Instead, I use it to push back the fog of war on tech I’m trying to master. At the same time, keep the dialogue to a space where I can verify what it’s giving me.

Never ask leading questions. Every token you add to the conversation matters, so phrase your query in a way that forces the AI to connect the dots for you
Don’t ask for deep reasoning and inference. It’s not built for this, and it will bullshit/hallucinate if you push it to do so.
Ask for live hyperlinks so it’s easier to fact-check.
Ask for code samples, algorithms, or snippets to do discrete tasks that you can easily follow.
Ask for A/B comparisons between one stack you know by heart, and the other you’re exploring.
It will screw this up, eventually. Report hallucinations back to the conversation.

About 20% of the time, it’ll suggest things that are entirely plausible and probably should exist, but don’t. Some platforms and APIs really do have barn-door-sized holes in them and it’s staggering how rapidly AI reports a false positive in these spaces. It’s almost as if the whole ML training stratagem assumes a kind of uniformity across the training set, on all axes, that leads to this flavor of hallucination. In any event, it’s been helpful to know this is where it’s most likely to trip up.

Edit: an example of one such API hole is when I asked ChatGPT for information about doing specific things in Datastar. This is kind of a curveball since there’s not a huge amount online about it. It first hallucinated an attribute namespace prefix of data-star- which is incorrect (it uses data- instead). It also dreamed up a JavaScript-callable API parked on a non-existent Datastar. object. Both of those concepts conform strongly to the broader world of browser-extending APIs, would be incredibly useful, and are things you might expect to be there in the first place.

clif@lemmy.world · 1 day ago

My problem with this, if I understand correctly, is I can usually do all of this faster without having to lead a LLM around by the nose and try to coerce it into being helpful.

That said, search engines do suck ass these days (thanks LLMs)

dejected_warp_core@lemmy.world · 24 hours ago

That’s been my biggest problem with the current state of affairs. It’s now easier to research newer tech through an LLM than it is to play search-result-wack-a-mole, on the off chance that what you need is on a forum that’s not Discord. At least an AI can mostly make sense of vendor docs and extrapolate a bit from there. That said, I don’t like it.

Feyd@programming.dev · 23 hours ago

People will literally do anything to avoid rtfm

xthexder@l.sw0.com · 22 hours ago

It’s a struggle even finding the manual these days if you don’t already know where it is / what it’s called. I was searching about an issue with my car recently and like 90% of the results are generic AI-generated “How to fix ______” with no actual information specific to the car I’m searching for.

boonhet@sopuli.xyz · edit-2 16 hours ago

I searched up a video to replace a part on my car. I did find it, but I also found 15 videos that were AI generated product reviews of the part.

I definitely also want my car parts to be “sleek and stylish” when hidden away under a plastic cover under the hood lmao

VoterFrog@lemmy.world · 24 hours ago

I find it best to get the agent into a loop where it can self-verify. Give it a clear set of constraints and requirements, give it the context it needs to understand the space, give it a way to verify that it’s completed its task successfully, and let it go off. Agents may stumble around a bit but as long as you’ve made the task manageable it’ll self correct and get there.

SleeplessCityLights@programming.dev · 24 hours ago

I like your strategy. I use a system prompt that forces it to ask a question if there are options or if it has to make assumptions. Controlling context is key. It will get lost if it has too much, so I start a new chat frequently. I also will do the same prompts on two models from different providers at the same time and cross reference the idiots to see if they are lying to me.

dejected_warp_core@lemmy.world · 24 hours ago

I use a system prompt that forces it to ask a question if there are options or if it has to make assumptions

I’m kind of amazed that even works. I’ll have to try that. Then again, I’ve asked ChatGPT to “respond to all prompts like a Magic 8-ball” and it knocked it out of the park.

so I start a new chat frequently.

I do this as well, and totally forgot to mention it. Yes, I keep the context small and fresh so that prior conversations (and hallucinations) can’t poison new dialogues.

I also will do the same prompts on two models from different providers at the same time and cross reference the idiots to see if they are lying to me.

Oooh… straight to my toolbox with that one. Cheers.

SleeplessCityLights@programming.dev · 10 hours ago

I forgot another key. The code snippets they give you are bloated and usually do unnecessary things. You are actually going to have to think to pull out the needed line(s) and clean it up. I never copy paste.

BlameTheAntifa@lemmy.world · 1 day ago

And this gets worse over time because you still have to maintain it.

And as the cherry on top - https://www.techradar.com/pro/nearly-half-of-all-code-generated-by-ai-found-to-contain-security-flaws-even-big-llms-affected

PurpleFanatic@quokk.au · 1 day ago

Not surprised.

In my last job, my boss used more and more AI. As a senior dev, I was very used to his coding patterns. I knew the code that he wrote and could generally follow what he made. The more he used AI? The less understandable, confusing and buggy the code became.

Eventually, the CEO of the company abused the “gains” of the AI “productivity” to push for more features with tighter deadlines. This meant the technical debt kept growing, and I got assigned to fixing the messes the AI was shitting all over the code base with.

In the end? We had several critical security vulnerabilities and a code base that even I couldn’t understand. It was dogshit. AI will only ever be used to “increase productivity” and profit while ignoring the chilling effects: lower quality code, buggy software and dogshit working conditions.

Enduring 3 months of this severely burnt me out, I had to quit. The rabid profit incentive needs to go to fucking hell. God I despise of tech bros.

fluxx@lemmy.world · edit-2 1 day ago

The real slowdown comes after when you realize you don’t understand your own codebase because you relied too much on AI. To understand it well enough requires discipline, which in the current IT world is lacking anyway. Either you can rely entirely on AI or you need to monitor its every action, in which case you may be better off writing yourself. But this hybrid approach I don’t think will pan out particularly well.

NoiseColor @lemmy.world · 1 day ago

Yeah, it’s interesting how strangely development is presented, like programming is only about writing code. They still do that when they tout ai coding capabilities.

I’m not against ai, it’s amazing how quickly you can build something. But something small and limited one person can build. The whole human experience is missing, laziness, boredom, communication and issues with communication,… to actually build a good product that’s more than a simple app.

Zos_Kia@lemmynsfw.com · 1 day ago

I think we’ll find a reasonable way to do things cause all of those problems also happen to any CTO of a growing tech team. And some of them have methods to make it work, that are neither letting the team run wild, nor inspecting every line of code they commit.

fluxx@lemmy.world · 1 day ago

Definitely. Or possibly AI will become vastly superior to developers and will require no supervision. In that case, the whole paradigm changes and I don’t know how the software development will look like then. But these are definitely still early days of AI software development, we have a lot to figure out.

plantfanatic@sh.itjust.works · edit-2 1 day ago

Any new tool or technique will slow ANYONE down until you familiarize yourself and get used to it.

This article mind was say the sky is blue and grass is green, it isn’t news and it’s quite obvious it will take a few uses to get decent with it. Like any other new tool, software, etc.

fluxx@lemmy.world · edit-2 1 day ago

This is true. However, the issue is we keep oscillating between AI is useless and over hyped; and it will solve all of life’s problems and you should not call it slop out of respect. The truth is somewhere in between, but we need to fight for it to find it.

porous_grey_matter@lemmy.ml · 1 day ago

deleted by creator

plantfanatic@sh.itjust.works · 1 day ago

The article clearly mentioned they weren’t experienced with AI, that’s the new tool.

curious_dolphin@slrpnk.net · 1 day ago

devs who were used to the tools

Not true - here’s an excerpt from the article:

including only a specialized group of people to whom these AI tools were brand new.

curious_dolphin@slrpnk.net · 1 day ago

Here’s the full paper for the study this article is about: Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity (PDF).

Kissaki@programming.dev · 1 day ago

Before starting tasks, developers forecast that allowing AI will reduce completion time by 24%. After completing the study, developers estimate that allowing AI reduced completion time by 20%. Surprisingly, we find that allowing AI actually increases completion time by 19%–AI tooling slowed developers down.

The gap and miss is insane.

tauonite@lemmy.world · 1 day ago

Thank you, the article is shit

raspberriesareyummy@lemmy.world · 1 day ago

No shit, Sherlock. Except that “AI” is a wrongly attributed marketing buzzword.

TwoTiredMice@feddit.dk · edit-2 1 day ago

I get the agenda of the study and I also agree with it, but the study itself, is really low effort.

Obviously, an experienced developer working on a highly specialized project, where the software developer already have all the needed context, and have no experience with using AI, will beat a clueless AI.

How would the results look like, if the software developer had experience with AI, and were to start on a new project, without any existing context? A lot different, i would imagine. AI is also not only for code generation. After a year of working as a software developer, I could no longer gain much experience from my senior colleagues (says much more about them, than me or AI) and I kinda was forced to look for sparring elsewhere. I feel like I have been speed running my experience and career, by using AI. I have never used code generation that much, but instead I’ve used it to learn about things i don’t know i don’t know about. That have been an accelerator.

Today, I’m using code generation much more; when starting a new project, or when i need to prototype something, complete mundane tasks on existing projects, make some none-critical python scripts, get useful bash scripts, spin up internal UI projects, etc…

Sometimes, i naturally waste time, as it takes time for an AI to produce code, and then it takes time to review the code, but in general I feel my productivity have gained by using AI.

TrickDacy@lemmy.world · 1 day ago

Yeah, I think it’s weird how people need to think in such a binary manner. AI sucks in almost every way and it can also save you time as a quick auto complete in an IDE. You’d have to be an idiot to have it write big blocks of code you don’t understand. That’s on you if you do it. If you want to use it to improve productivity, you should just let it write a few lines here and there which otherwise costs you several seconds if you didn’t. When it comes to refactoring, I’ve found GitHub copilot helps a lot because what I’m doing is changing from one common pattern to another, probably even more common pattern. It’s predictable, so it usually gets it fairly right.

If it were really artificially intelligent you could just describe a program and in seconds get a nearly bug-free, productiom-ready app. That’s a LONG way off, if it ever happens. People treating LLMs like they are actually AI is the issue. Stop misusing the tool.

Use judgement, people.

Gamma@beehaw.org · 1 day ago

It’s a study done by METR, they constantly pump out papers talking about the “significant catastrophic risks via AI self improvement or rogue replication”

The fact that they’re publishing something negative is what’s interesting here. Except that they’re reporting on a 6 month old study lol

Kissaki@programming.dev · 1 day ago

if the software developer had experience with AI

Did these developers not have experience with AI?

and were to start on a new project, without any existing context

I’m not sure focusing on one aspect to scope a reasonable and doable study automatically makes it “really low effort”.

If they were to test a range of project types, it’d have to be a much bigger study.

TwoTiredMice@feddit.dk · edit-2 1 day ago

Did these developers not have experience with AI?

This is from the article

But Rush and Becker have shied away from making sweeping claims about what the results of their study mean for the future of AI. For one, the study’s sample was small and non-generalizable, including only a specialized group of people to whom these AI tools were brand new.

I’m not sure focusing on one aspect to scope a reasonable and doable study automatically makes it “really low effort”.

You are right, but I believe they should at least have chosen another use case, to make it interesting. I wouldn’t have needed a study to know that an AI performs worse than a developer in a project the developer most likely built them self. The existing project might have some really weird code smells and work arounds that only the developer on the project knows about and understand. There might be relevant context external to the solution. The AI have to be a mind reader in these cases.

But, if you gave the AI and the developer a blank canvas a clear defined task, I just believe it would be a more interesting study. *

It kind of sounds like they were just handed a tool they knew nothing about and were asked to perform better with it. A mitter saw is way better and faster than a regular saw, if you know how to use it.

*edit

To make my point more clear, I don’t mean the developer needed to solve an issue that’s not related to his daily work, but a task that’s not dependent on years of tech debt or context that is not provided to the AI. And yes, by that, I don’t believe code generation from an AI have a big use case in scenarios where the project have too many dependencies and touches on niche solutions, but you can still use it for other purposes than building features.

melfie@lemy.lol · 1 day ago

I think it’s fair to say that AI yields a modest productivity boost in many cases when used appropriately. It’s quicker to write a detailed prompt than it is to write code most of the time. If you have a good test setup with BDD, you can read the descriptions to make sure the behavior it is implementing is correct and also review the test code to ensure it matches the descriptions. From there, you can make it iterate however long it takes to get the tests passing while you’re only halfway paying attention. Then review the entire git diff and have it refactor as required to ensure clean and maintainable code while fixing any broken tests, lint errors, etc.

It often takes longer to get tests passing than it would take me, and in order to get code I’m happy with like I would write, I usually need to make it do a fair amount of refactoring. However, the fact that I can leave it churning on tests, lint errors, etc. while doing something else is nice. It’s also nice that I can have it write all the code for languages I don’t particularly like and don’t want to learn like Ruby, and I only have to know enough to read it and not have to write any of it.

Honestly, if a candidate in a coding interview made as many mistakes and churned on getting tests passing as long as GH Copilot does, I’d probably mark it as a no hire. With AI, unlike a human, you can be brutally honest and make it do everything according to your specifications without hurting its feelings, whereas micromanaging a bad human coder to the same extent won’t work.

porous_grey_matter@lemmy.ml · 1 day ago

I think it’s fair to say that AI yields a modest productivity boost in many cases when used appropriately

I think this is a mistake. The example in this post is some empirical evidence for that.

It’s not clear that we can know in advance whether it’s appropriate for any given usecase; rather it seems more likely that we are just pigeons pecking at the disconnected button and receiving random intermittent reinforcement.

WolfLink@sh.itjust.works · edit-2 1 day ago

This sounds awful to me. Passing the tests is the starting point. It’s also important to make sure the code makes sense and is documented so whoever reads it 2 years from now (be that you, someone else, or I guess another llm) will understand what they are looking at. And that’s not even getting into if the code is efficient, or has other bugs or opportunities for bugs not captured by the test cases.

yo_scottie_oh@lemmy.ml · 1 day ago

It’s also important to make sure the code makes sense and is documented so whoever reads it 2 years from now (be that you, someone else, or I guess another llm) will understand what they are looking at.

Fair points, although it seems to me the original commenter addresses this at the end of their first paragraph:

Then review the entire git diff and have it refactor as required to ensure clean and maintainable code while fixing any broken tests, lint errors, etc.

melfie@lemy.lol · edit-2 1 day ago

BDD testing frameworks produce useful documentation, which is why it’s critical to carefully review any AI generated “spec” to ensure the behavior is going to be what you want and that all of the scenarios are covered. Even if all the code is AI generated, humans should be deeply involved in making sure the “spec” is right, manually blocking out describe / it blocks or Cucumber rule / scenario / GWT as necessary. In the case of Cucumber with Acceptance Test-Driven development, it’s still useful to have humans write most of the feature file with AI assisting with example mapping and step definition implementation.

You’re right that there are also non-functional concerns like performance, security, maintainability, etc. that AI will often get wrong, which is why reviewing and refactoring are mandatory. AI code reviews in CI can sometimes catch some of these issues, and other checks in the pipeline like performance tests, CodeQL for security issues, Sonarqube for code smells / security, etc. can help as well.

AI can be a modest productivity boost when the proper guardrails are in place, but definitely not a free lunch. You can quickly end up with AI slop without the proper discipline, and there are definitely times when it’s appropriate to stop using AI and do it yourself, of not use AI in the first place for certain work that requires creative problem solving, that is too complex for AI to get right, etc. Usually anything I wouldn’t delegate to a junior engineer, I just do it myself and don’t bother with AI—that’s a good rule of thumb.

VoterFrog@lemmy.world · 1 day ago

This lines up with my experience as well and what you’ve described is very close to how I work with LLM agents. The people bragging about 10x are either blowing smoke or producing garbage. I mean, I guess in some limited contexts I might get 10x out of taking a few seconds to write a prompt vs a couple of minutes of manual hunting and typing. But on the whole, software engineering is about so much more than just coding and those things have become no less important these days.

But the people acting like the tech is a useless glorified Markov generator are also out of their mind. There are some real gains to be had by properly using the tech. Especially once you’ve laid the groundwork by properly documenting things like your architecture and dependencies for LLM consumption. I’m not saying this to try to sell anybody on it but I really, truly, can’t imagine that we’re ever going back to the before times. Maybe there’s a bubble burst like the dotcom bubble but, like the internet, agentic coding is here to stay.

melfie@lemy.lol · 1 day ago

Precisely. I think a big part of maturing as an engineer is being able to see past both the hype and cynicism with new tech and instead understand that everything has strengths, weaknesses, and trade-offs, and that some things are also a matter of opinion because software development is an art as much as it is a science. The goal is to have a nuanced understanding the capabilities of each tool in order to use the right tool in the right context, mitigate its weaknesses, and pair it with your own and your team’s strengths, weaknesses, preferences, career goals, etc.