We Were Promised Jetpacks: Why AI Isn't Accelerating Feature Delivery
Where are the engineering productivity and velocity gains we were promised with AI coding tools?
AI tools assist in writing half of Google's code. Microsoft is not far behind at 30%. With so many more lines of code generated by AI, you may wonder: where are the massive engineering productivity improvements? Where is the flood of new features being delivered? Consider this: shovelware apps have not exploded since AI coding tools emerged. They've actually declined. In fact, there's a growing swell of concern that AI tools are slowing coding down.
Why is this happening? A clue comes from a recent Stack Overflow survey of over 30,000 developers. In the study, 45% of developers said that one of their biggest frustrations with AI was that debugging had become more time-consuming. Why? Because developers haven't written the code themselves, they're less familiar with it. They take longer to find the root cause of an issue because they first have to orient themselves in unfamiliar code. While generating lines of code is now nearly instantaneous, the real challenge is making that code work in production. And unfortunately, for reasons we'll get into, current AI models are not good at helping us operate code—just build it.
An Asymmetry Problem
AI hasn't boosted engineering productivity because the tools excel at writing new code but not at operationalizing it: testing, deploying, and debugging it. The result is a stark asymmetry between how quickly we can produce new code and what happens afterwards. Typically, the rest of the team has to put in significant effort to keep that code working, including deploying it, maintaining it, and supporting it. No wonder some studies show that using AI actually slows teams down by almost 20%. We've basically been ignoring a massive part of the SDLC—the part where we spend most of our time and money.
It's time to focus on the second half of the SDLC. Sure, there are still improvements we can make in AI code generation, but we will not reap the benefits of AI until we look at the picture more holistically.

The Art and Science of Coding
Before I dive into what I think the answer is, I want to back up and explain why tackling the second half of the SDLC will be significantly harder than the first half. The explanation lies in a fundamental cognitive mismatch: building software and maintaining production systems require opposite mental processes—and current AI excels at only one of them.
The Artistic Process: Building Up
Building new software is, in a way, an artistic process. Your starting point is a vision of how things should behave: data enters here, flows through your control logic, and comes out there, transformed along the way.
You're not thinking deeply about edge cases. You're not enumerating every possible state the system could enter. You can't: the complexity of everything that could go wrong would paralyze you, so you're forced to simplify. If you tried to account for everything while building, you'd never write the first line of code. The creative process requires focusing on the happy path, the core functionality, the intended behavior.
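To make that concrete, here's a minimal, hypothetical sketch of what that happy-path mode produces (the Cart shape, the applyDiscount helper, and the "SAVE10" rule are all invented for illustration): the code translates intent directly and ignores everything that could go wrong.

```typescript
// A hypothetical checkout helper, written the way you build up: intent first.
interface Cart {
  items: { price: number; quantity: number }[];
  discountCode?: string;
}

// Happy path: total the items, apply the discount, return the amount to charge.
function applyDiscount(cart: Cart): number {
  const subtotal = cart.items.reduce(
    (sum, item) => sum + item.price * item.quantity,
    0
  );
  // Assumed rule for this sketch: "SAVE10" means 10% off. No thought yet about
  // negative quantities, unknown codes, rounding, or an empty cart.
  const discount = cart.discountCode === "SAVE10" ? 0.1 : 0;
  return subtotal * (1 - discount);
}
```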
That is the reason AI coding assistants make such great prototypes. They're phenomenal at this forward-looking, creative generation. Tell an AI what you want to build, and it constructs a plausible implementation remarkably well. It thinks in terms of how things should work, translating intent into code without getting bogged down in defensive programming.
The "vibe coding" phenomenon—describing what you want and watching it materialize—works precisely because it aligns with this artistic, building-up cognitive mode. It's why one-shot application generators can produce startling results. Creating an HTML file, CSS styling, and JavaScript interactivity to build a prototype is artistically straightforward for these models.
The Scientific Process: Building Down
Running software in production tells a different story. When something breaks in a complex system, you have to switch to a scientific process. You aren't creating forward from intent; you're investigating backward from a failure.
To debug effectively, you must understand the whole architecture. You need to know how data actually flows through the system, not just how it was supposed to flow. You start from a wide range of possibilities and use different techniques to narrow them down until you arrive at an explanation for the unexpected behavior.
This means enumerating states. It means thinking about edge cases you deliberately ignored while building. It means pinning down the specific corner of reality where the system behaves differently than expected. Where building up is artistic and expansive, building down is scientific and investigative.
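Continuing the hypothetical applyDiscount sketch from above (it reuses the same Cart type and function), the investigative mode looks something like this: you start from an observed failure and enumerate the states the happy path never considered until one of them reproduces it.

```typescript
// Working backward from a report: "a customer was charged a negative amount."
// Enumerate the states the happy path ignored and see which one reproduces it.
const suspects: { name: string; cart: Cart }[] = [
  { name: "empty cart", cart: { items: [] } },
  { name: "zero-priced item", cart: { items: [{ price: 0, quantity: 3 }] } },
  {
    name: "refund line with negative quantity",
    cart: { items: [{ price: 20, quantity: -2 }], discountCode: "SAVE10" },
  },
  {
    name: "lowercased discount code",
    cart: { items: [{ price: 20, quantity: 1 }], discountCode: "save10" },
  },
];

for (const { name, cart } of suspects) {
  const total = applyDiscount(cart);
  // The hypothesis that survives: negative quantities flow straight through
  // the happy path and produce a negative charge.
  console.log(`${name}: ${total}${total < 0 ? "  <-- reproduces the failure" : ""}`);
}
```

Nothing in the original function changed between the two sketches; only the direction of thought did.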
The best developers are excellent at both, and they've learned to switch between these modes naturally. But they're complementary skills, not identical ones. You develop them separately, through different types of experience.

Where AI Hits the Wall
Current AI models are dramatically better at the artistic process than the scientific one.
We've spent two years optimizing for code generation speed. We've built tools that excel at taking specifications and producing implementations. First came enhanced autocomplete, then multi-line suggestions, then entire functions, and now complete applications. Each iteration improved the artistic, forward-building capability.
But to run high-quality production software effectively, you don't just need more code. You need to understand what already exists, what came before, and which paths weren't taken. You must be able to reverse-engineer a system from a defect. You need models that can understand complex, decade-old architectures and their behaviors.
This is an entirely distinct challenge. Many businesses maintain codebases dating back decades. Emergent behaviors, implicit dependencies, and past workarounds cause the technical debt in these systems to compound. Understanding them requires the scientific, building-down process that current AI models struggle with.
An AI assistant might generate perfect code for a new feature. But can it understand why that feature, once deployed, causes a cascade failure in a different module? Can it trace through years of architectural decisions to identify the conditions that produce a specific bug? Can it enumerate the edge cases that the original builders intentionally ignored but now matter in production?
Not yet. And this limitation explains the productivity paradox.
The Reality Gap
The distinction becomes even clearer when you think about what actually causes defects in production software. Most defects don't come from poor implementation by engineers (although some do). They usually stem from a misunderstanding of reality.
You sit down, design a feature based on your mental model of how users will interact with it, and implement it correctly according to that model. But when a user encounters the feature, you learn that your mental model was incomplete or off-kilter. Maybe the feature should behave differently in a case you didn't consider. Maybe there's a workflow you didn't imagine. Maybe the interaction with another system creates unexpected results.
This is why there's often a blurry boundary between bugs and feature requests. Both represent discoveries about reality that you didn't account for initially. Both require understanding the gap between your model and actual user behavior. Both need the scientific, investigative mindset to diagnose and address.
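As a toy illustration of that gap (the email-keyed account store below is invented for the example), the code is a correct implementation of its own mental model; the defect lives in the model's fit with reality, not in the code.

```typescript
// Mental model at design time: users log in with the exact email string
// they registered with, so a plain string key is a safe unique identifier.
const accountsByEmail = new Map<string, { name: string }>();

function register(email: string, name: string): void {
  accountsByEmail.set(email, { name });
}

function findAccount(email: string): { name: string } | undefined {
  return accountsByEmail.get(email);
}

register("Ada@example.com", "Ada");

// Reality check: the same person types their address in lowercase.
// The code does exactly what the model said; the model just wasn't reality.
console.log(findAccount("ada@example.com")); // undefined -> "bug" report filed
```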
AI code generators start with a specification—a model of reality—and generate forward from it. But they struggle with the nuanced understanding of reality required to test that model against actual behavior, identify where it breaks down, and reason backward to understand why.
The Path Forward
This doesn't mean we should only use AI for code generation and leave software operations to humans. It just means we need to evolve our thinking. The industry made a massive leap from simple autocomplete to conversational code generation—a complete workflow transformation. We need an equivalent transformation for the analytical, debugging, and maintenance phases of software development.
Many tools address pieces of this work today, but they operate in isolation, moderately improving an existing process rather than re-imagining it from scratch.

You can get incremental gains from a better AI-powered code review system or from agentic SRE, but the biggest advances will come from tools that rethink the entire software operations process rather than just enhance an existing one. Successful AI tools must reverse-engineer systems, enumerate their states, and help developers find the sources of unexpected behavior. Besides being artistic builders, they will also need to be scientific investigators, examining the problem as a whole, not in isolation.
In the meantime, anticipate great prototypes and annoying production issues. This cognitive disconnect is here to stay, as it's key to how these systems operate. Understanding it is the first step toward building the next generation of tools that can finally deliver on the productivity promises of AI-assisted software development.
The models that can master both the art of building up and the science of building down while rethinking the entire SDLC won't just change how we code. They'll change what's possible to build and maintain at scale.


