It’s like a new telling of the “Tortoise and the Hare”: A group of experienced software engineers entered an experiment in which they were tasked with completing some of their work with the help of AI tools. Thinking like the speedy hare, the developers expected AI to expedite their work and boost their productivity. Instead, the technology slowed them down. The AI-free tortoise approach, in the context of the experiment, would have been faster.
The researchers enlisted 16 software developers, with an average of five years of experience, to complete 246 tasks, each part of a project on which they were already working. For half the tasks, the developers were allowed to use AI tools—most chose the code editor Cursor Pro or Claude 3.5/3.7 Sonnet—and for the other half, they worked on their own.
Believing the AI tools would make them more productive, the software developers predicted the technology would cut their task completion time by an average of 24%. Instead, using AI increased their completion time by 19% compared with working without it.
So where did the hares veer off the path? According to the study, the experienced developers, working within their own projects, brought plenty of context that their AI assistants lacked. That meant they had to retrofit their own plans and problem-solving strategies onto the AI’s outputs, which they also spent ample time debugging.
“The majority of developers who participated in the study noted that even when they get AI outputs that are generally useful to them—and speak to the fact that AI generally can often do bits of very impressive work, or sort of very impressive work—these developers have to spend a lot of time cleaning up the resulting code to make it actually fit for the project,” study author Nate Rush told Fortune.
Other developers lost time writing prompts for the chatbots or waiting for the AI to generate results.
But Rush and Becker have shied away from making sweeping claims about what the results of the study mean for the future of AI.
For one, the study’s sample was small and non-generalizable, comprising only a specialized group of developers for whom these AI tools were brand new. The study also measured the technology at a specific moment in time, the authors said, and does not rule out the possibility that future AI tools could indeed help developers enhance their workflows.
The purpose of the study, broadly speaking, was to pump the brakes on the rapid rollout of AI in the workplace and elsewhere, on the grounds that more data about AI’s actual effects needs to be gathered and made accessible before further decisions are made about its applications.
“Some of the decisions we’re making right now around development and deployment of these systems are potentially very high consequence,” Rush said. “If we’re going to do that, let’s not just take the obvious answer. Let’s make high-quality measurements.”
“For those people who have already had 20 years, or in this specific example, five years of experience, maybe it’s not their main task that we should look for and force them to start using these tools if they’re already well functioning in the job with their existing work methods,” Anders Humlum, an assistant professor of economics at the University of Chicago’s Booth School of Business, told Fortune.
Humlum’s research supports MIT economist and Nobel laureate Daron Acemoglu’s assertion that markets have overestimated productivity gains from AI. Acemoglu argues that only 4.6% of tasks within the U.S. economy will be made more efficient with AI.
“In the real world, many tasks are not as easy as just typing into ChatGPT,” Humlum said. “Many experts have a lot of experience [they’ve] accumulated that is highly beneficial, and we should not just ignore that and give up on that valuable expertise that has been accumulated.”
“I would just take this as a good reminder to be very cautious about when to use these tools,” he added.