This sounds awful to me. Passing the tests is the starting point. It’s also important to make sure the code makes sense and is documented so whoever reads it 2 years from now (be that you, someone else, or I guess another LLM) will understand what they are looking at. And that’s not even getting into whether the code is efficient, or has other bugs or opportunities for bugs not captured by the test cases.
Fair points, although it seems to me the original commenter addresses this at the end of their first paragraph:
Then review the entire git diff and have it refactor as required to ensure clean and maintainable code while fixing any broken tests, lint errors, etc.
BDD testing frameworks produce useful documentation, which is why it’s critical to carefully review any AI-generated “spec” to ensure the behavior is what you actually want and that all of the scenarios are covered. Even if all the code is AI-generated, humans should be deeply involved in making sure the “spec” is right, manually blocking out describe / it blocks or Cucumber rule / scenario / Given-When-Then steps as necessary. In the case of Cucumber with Acceptance Test-Driven Development, it’s still useful to have humans write most of the feature file, with AI assisting in example mapping and step-definition implementation.
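To make the “blocking out” part concrete, here’s a minimal sketch of what a human-authored spec skeleton might look like in a Jest/Vitest-style runner. The domain (a discount calculator), the module name, and the scenarios are all made up for illustration; the idea is that a human writes and reviews this structure first, and the AI only fills in the remaining it bodies afterward:

    // Spec skeleton written by a human before any implementation exists.
    // The scenario list is the reviewable "spec"; AI fills in bodies later.
    import { describe, it, expect } from 'vitest'; // or Jest globals
    import { applyDiscount } from './discount';    // hypothetical module under test

    describe('applyDiscount', () => {
      describe('given a standard customer', () => {
        it('applies no discount to orders under the threshold', () => {
          expect(applyDiscount({ tier: 'standard', total: 50 })).toBe(50);
        });

        it.todo('applies a 5% discount at or above the threshold');
      });

      describe('given a VIP customer', () => {
        it.todo('applies a 10% discount regardless of order size');
        it.todo('never discounts below the cost floor'); // edge case added during human review
      });
    });

The it.todo placeholders are the point: they keep the human-authored scenario list visible and reviewable before any generated implementation lands, so gaps in coverage show up as missing or pending scenarios rather than silently absent tests.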
You’re right that there are also non-functional concerns like performance, security, and maintainability that AI will often get wrong, which is why reviewing and refactoring are mandatory. AI code reviews in CI can sometimes catch these issues, and other checks in the pipeline, like performance tests, CodeQL for security issues, and SonarQube for code smells and security, can help as well.
AI can be a modest productivity boost when the proper guardrails are in place, but it’s definitely not a free lunch. You can quickly end up with AI slop without the proper discipline, and there are definitely times when it’s appropriate to stop using AI and do it yourself, or not to use AI in the first place: work that requires creative problem solving, work that’s too complex for AI to get right, and so on. A good rule of thumb: anything I wouldn’t delegate to a junior engineer, I just do myself and don’t bother with AI.