I think it’s fair to say that AI yields a modest productivity boost in many cases when used appropriately. It’s quicker to write a detailed prompt than it is to write code most of the time. If you have a good test setup with BDD, you can read the descriptions to make sure the behavior it is implementing is correct and also review the test code to ensure it matches the descriptions. From there, you can make it iterate however long it takes to get the tests passing while you’re only halfway paying attention. Then review the entire git diff and have it refactor as required to ensure clean and maintainable code while fixing any broken tests, lint errors, etc.
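To make the BDD part concrete, here’s a minimal sketch of the kind of spec I mean (Vitest syntax; the Cart module and its behavior are made up for illustration):

    import { describe, it, expect } from "vitest";
    import { Cart } from "./cart"; // hypothetical module under test

    describe("Cart", () => {
      it("starts empty with a total of zero", () => {
        expect(new Cart().total()).toBe(0);
      });

      it("sums line items as unit price times quantity", () => {
        const cart = new Cart();
        cart.add({ sku: "apple", unitPrice: 2, quantity: 3 });
        expect(cart.total()).toBe(6);
      });

      it("rejects a negative quantity", () => {
        const cart = new Cart();
        expect(() => cart.add({ sku: "apple", unitPrice: 2, quantity: -1 })).toThrow();
      });
    });

Reading just the it descriptions tells you what behavior is being implemented; reading the bodies tells you whether the tests actually check what the descriptions claim.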
It often takes longer to get tests passing than it would take me, and to end up with code I’m happy with, the kind I would write myself, I usually need to make it do a fair amount of refactoring. However, the fact that I can leave it churning on tests, lint errors, etc. while doing something else is nice. It’s also nice that I can have it write all the code for languages I don’t particularly like and don’t want to learn, like Ruby, where I only have to know enough to read the code, not write any of it.
Honestly, if a candidate in a coding interview made as many mistakes and churned as long on getting tests passing as GH Copilot does, I’d probably mark them as a no-hire. With AI, unlike a human, you can be brutally honest and make it do everything according to your specifications without hurting its feelings, whereas micromanaging a bad human coder to the same extent won’t work.
I think it’s fair to say that AI yields a modest productivity boost in many cases when used appropriately
I think this is a mistake. The example in this post provides some empirical evidence of that.
It’s not clear that we can know in advance whether it’s appropriate for any given use case; rather, it seems more likely that we’re just pigeons pecking at the disconnected button and receiving random intermittent reinforcement.
This sounds awful to me. Passing the tests is just the starting point. It’s also important to make sure the code makes sense and is documented so whoever reads it 2 years from now (be that you, someone else, or I guess another LLM) will understand what they’re looking at. And that’s not even getting into whether the code is efficient, or has other bugs or opportunities for bugs not captured by the test cases.
It’s also important to make sure the code makes sense and is documented so whoever reads it 2 years from now (be that you, someone else, or I guess another LLM) will understand what they’re looking at.
Fair points, although it seems to me the original commenter addresses this at the end of their first paragraph:
Then review the entire git diff and have it refactor as required to ensure clean and maintainable code while fixing any broken tests, lint errors, etc.
BDD testing frameworks produce useful documentation, which is why it’s critical to carefully review any AI-generated “spec” to ensure the behavior is going to be what you want and that all of the scenarios are covered. Even if all the code is AI-generated, humans should be deeply involved in making sure the “spec” is right, manually blocking out describe/it blocks or Cucumber Rule/Scenario/Given-When-Then as necessary (see the sketch below). In the case of Cucumber with Acceptance Test-Driven Development, it’s still useful to have humans write most of the feature file, with AI assisting with example mapping and step definition implementation.
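Concretely, the blocking-out step might look something like this (Vitest syntax; the feature and scenarios are hypothetical). The human owns the skeleton and the descriptions; the AI only gets to fill in the bodies:

    import { describe, it } from "vitest";

    describe("password reset", () => {
      describe("when the token is valid", () => {
        it.todo("updates the stored password hash");
        it.todo("invalidates the token after a single use");
      });

      describe("when the token is expired", () => {
        it.todo("rejects the request without revealing whether the account exists");
      });
    });

The it.todo placeholders keep every scenario visible in the test report until the implementation, human- or AI-written, replaces them with real assertions.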
You’re right that there are also non-functional concerns like performance, security, and maintainability that AI will often get wrong, which is why reviewing and refactoring are mandatory. AI code reviews in CI can sometimes catch some of these issues, and other checks in the pipeline, like performance tests, CodeQL for security issues, and SonarQube for code smells and security hotspots, can help as well.
AI can be a modest productivity boost when the proper guardrails are in place, but it’s definitely not a free lunch. You can quickly end up with AI slop without the proper discipline, and there are definitely times when it’s appropriate to stop using AI and do it yourself, or not to use AI in the first place: work that requires creative problem solving, work that is too complex for AI to get right, and so on. A good rule of thumb: anything I wouldn’t delegate to a junior engineer, I just do myself and don’t bother with AI.
This lines up with my experience as well, and what you’ve described is very close to how I work with LLM agents. The people bragging about 10x are either blowing smoke or producing garbage. I mean, I guess in some limited contexts I might get 10x out of taking a few seconds to write a prompt vs. a couple of minutes of manual hunting and typing. But on the whole, software engineering is about so much more than just coding, and those other parts are no less important these days.
But the people acting like the tech is a useless glorified Markov generator are also out of their minds. There are some real gains to be had by properly using the tech, especially once you’ve laid the groundwork by documenting things like your architecture and dependencies for LLM consumption. I’m not saying this to try to sell anybody on it, but I really, truly can’t imagine that we’re ever going back to the before times. Maybe there’s a bubble burst like the dotcom bubble, but, like the internet, agentic coding is here to stay.
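To give a made-up example of the kind of groundwork I mean, a short instructions file at the repo root, under whatever name your tooling reads (AGENTS.md, CLAUDE.md, and so on):

    # Project notes for coding agents (illustrative; adapt to your repo)
    ## Architecture
    - Monorepo: api/ (REST backend + Postgres), web/ (frontend), shared/ (common types)
    - All database access goes through api/src/repo/; never query from request handlers
    ## Conventions
    - Tests live next to their source as *.spec.ts and run with: npm test
    - Run the linter and fix all warnings before committing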
Precisely. I think a big part of maturing as an engineer is being able to see past both the hype and the cynicism around new tech, and instead understand that everything has strengths, weaknesses, and trade-offs, and that some things are also a matter of opinion, because software development is as much an art as it is a science. The goal is to have a nuanced understanding of the capabilities of each tool in order to use the right tool in the right context, mitigate its weaknesses, and pair it with your own and your team’s strengths, weaknesses, preferences, career goals, etc.