The Engineering Productivity Metrics Nobody's Tracking (But Should Be)
Most engineering leaders I talk to can tell me their deployment frequency, their cycle time, their sprint velocity. A few of them track DORA metrics. Some have dashboards that update in real time.
Almost none of them can tell me what percentage of their senior engineers' time is consumed by production firefighting.
That's not a data problem. It's a framing problem. The engineering productivity metrics that have become standard over the last decade are all optimized for measuring the throughput of feature development. They're good at telling you how fast code moves from a branch to production. They're not designed to measure what happens after it gets there — or the compounding organizational cost when things go wrong.
The result is a hidden drain that doesn't appear in any dashboard but shows up clearly in sprint retrospectives, in engineer attrition surveys, and in the gap between what your team planned to ship and what actually shipped.
What Standard Metrics Miss
Let's take DORA at face value. Deployment frequency, lead time for changes, change failure rate, time to restore service. These are genuinely useful signals, and tracking them matters.
But notice what they measure: the pipeline. They tell you how efficiently code moves through your development process. They don't tell you how much of your best engineers' time is being spent outside that pipeline entirely — in Slack threads, in Zoom calls reconstructing incidents, in Jira tickets that have been open for six weeks because nobody can reproduce the issue without a specific customer's environment.
Sprint velocity has the same blind spot. Velocity counts story points completed. It doesn't count the unplanned work that displaced the planned work. Engineering organizations routinely build 20-30% buffer into their sprints specifically to absorb the support triage burden — and then treat the fact that velocity held steady as evidence that everything's fine. It isn't. The buffer is the evidence. A team that needs 30% slack to absorb unplanned escalations is a team running 30% below its theoretical output, every single sprint.
The productivity drain is real and measurable. It's just that almost nobody is measuring it.
The Iceberg Problem
Here's a dynamic that makes the hidden cost even harder to see: most issues are never reported.
The tickets in your queue represent the customers who were frustrated enough to file a report. They don't represent the customers who hit the same issue, couldn't figure it out, and quietly churned. Industry estimates consistently put the reporting rate for software issues somewhere around one to five percent of affected users. The queue you're managing is the visible tip.
This creates two compounding problems.
The first is that your engineering team is calibrating its triage investment to the reported volume, not the actual volume. If three tickets a week are coming in about a specific error, the issue might be affecting 60 to 300 users a week — and the business impact of fixing it is proportionally larger than the ticket count suggests. Automated issue detection that surfaces problems proactively — before users report them — changes this calculus entirely.
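The scaling logic above can be sketched in a few lines. This is a minimal estimate, not a model: it simply divides reported volume by the one-to-five-percent reporting-rate range cited above, and the function name and defaults are illustrative.

```python
def estimate_affected_users(reported_per_week, rate_low=0.01, rate_high=0.05):
    """Estimate the range of actually-affected users per week from
    reported ticket volume, assuming a 1-5% reporting rate."""
    optimistic = reported_per_week / rate_high   # many affected users report
    pessimistic = reported_per_week / rate_low   # few affected users report
    return optimistic, pessimistic

# 3 reported tickets/week implies roughly 60 to 300 affected users/week
low, high = estimate_affected_users(3)
```

The point of the sketch is the ratio: at a 1% reporting rate, every ticket in the queue stands in for a hundred affected users.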
The second is that duplicate investigation is the default. When a similar issue surfaces six months after the last occurrence, there's rarely a reliable mechanism to connect the new ticket to the prior investigation. The engineering team re-discovers the root cause, re-traces the code path, re-documents the fix. The full triage cost is paid again from scratch.
The productivity loss here isn't visible in velocity metrics because it doesn't displace planned sprint work; it shows up as the gap between what engineers could accomplish and what they actually accomplish. That's a difficult thing to put a number on, which is exactly why nobody puts a number on it.
Knowledge That Evaporates
There's a third dimension of the problem that's harder to see than the first two and more expensive in the long run.
Every time an incident gets resolved, someone learns something. They learn which area of the codebase is fragile under specific conditions. They learn which customer configurations produce unexpected behavior. They learn which log signal is the reliable indicator and which is noise. That knowledge is real organizational value — but in most teams, it evaporates almost immediately. It lives in the engineer's head, maybe in a comment on a closed Jira ticket, possibly in a Slack thread that's impossible to search six months later.
The result is that organizations systematically fail to compound their debugging knowledge. Every resolved ticket should make the next similar ticket faster to close. In most cases, it doesn't — because there's no mechanism to encode what was learned in a way that's accessible the next time a related symptom appears.
This is the productivity failure that doesn't show up in any metric: you're paying the full investigation cost, every time, for categories of issues you've already solved. The third time a variant of the same problem surfaces, you're essentially starting from scratch.
When senior engineers describe the frustration of their role, this is often what they mean. It's not just that the escalation volume is high. It's that the work doesn't accumulate. You debug the same class of problem in slightly different configurations, repeatedly, with no mechanism for the organization to get smarter about it.
The Compounding Math
Let's make this concrete.
Take a mid-size engineering organization: 50 engineers, average fully-loaded cost of $250,000 per year. Assume senior engineers — the ones who actually touch the hard escalations — make up 20% of that team, at $350,000 loaded cost.
If those 10 senior engineers are spending 30% of their time on support triage and escalation work, that's $1.05M in annual capacity going to reactive investigation. That's before you account for the opportunity cost: the features that weren't built, the architecture improvements that weren't made, the onboarding that didn't happen because the senior engineers were occupied.
Now layer in the compounding effect. If each senior engineer touches 200 escalations per year, and each escalation requires an average of 3 hours of investigation, that's 600 hours per engineer — 15 full work weeks — spent on triage. Most of that investigation is work that doesn't accumulate. The knowledge produced by those 15 weeks of effort doesn't reduce the 15 weeks next year.
Compare that to an organization that has captured enough of that investigative knowledge to cut average investigation time in half. Same escalation volume, same complexity — but the organization is now returning 7+ weeks of senior engineer time per person per year to planned work. At scale, that's the difference between a team that ships and a team that maintains.
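The capacity math above can be written down directly. All of the inputs are the article's illustrative assumptions (headcount, loaded cost, triage fraction, escalation volume), not benchmarks; the value of the sketch is that you can swap in your own numbers.

```python
def triage_drain(senior_count=10, loaded_cost=350_000, triage_fraction=0.30,
                 escalations_per_year=200, hours_per_escalation=3):
    """Annual cost and per-engineer time consumed by reactive triage.
    Defaults mirror the worked example in the text."""
    annual_dollar_drain = senior_count * loaded_cost * triage_fraction
    hours_per_engineer = escalations_per_year * hours_per_escalation
    weeks_per_engineer = hours_per_engineer / 40  # 40-hour work week
    return annual_dollar_drain, weeks_per_engineer

drain, weeks = triage_drain()
# With the article's assumptions: $1.05M/year and 15 work weeks per engineer.
# Halving investigation time halves the weeks, returning 7.5 weeks per engineer.
```

Run it with your own team's numbers; the result is usually the first time the drain appears as a dollar figure rather than a vague sense of lost capacity.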
This is what production engineering is actually about at its core. Not just faster incident response — though that matters — but the systematic accumulation of organizational knowledge about how your system fails, so that the cost of the next incident is lower than the cost of the last one.
What "Good" Looks Like in Practice
The teams I've seen break this cycle share a few characteristics.
They track the right things. Not just velocity and deployment frequency, but escalation rate per sprint, average investigation time per ticket tier, and what percentage of incoming tickets match patterns seen before. These numbers tell a different story than DORA metrics, and they tend to surface the hidden capacity drain in a way that's actionable.
They invest in support ticket deflection at the right level. Most ticket deflection work focuses on L1: chatbots, knowledge bases, self-service FAQs. That's valuable, but it doesn't address the escalations that actually consume engineering capacity — the L2 and L3 tickets that require code-level investigation. Deflecting those requires not just better documentation but a system that can reason about whether a new ticket matches a prior resolution.
They treat resolved incidents as training data. Every ticket that gets fully investigated and closed contains a chain of reasoning: what was checked, what was ruled out, what the actual cause was, what the fix looked like. Organizations that build infrastructure to capture and reuse that reasoning chain get faster over time. Organizations that let it evaporate pay the full cost again on the next occurrence.
And critically: they measure the output of this work against engineering capacity, not just resolution time. Reducing debugging time is a metric. Returning that time to planned work — and tracking what gets built with it — is the actual goal.
The Metrics Worth Adding to Your Dashboard
If you're serious about understanding the true engineering productivity picture, here's what to start tracking alongside your existing DORA metrics:
Escalation rate as a percentage of sprint capacity. How much unplanned work is actually landing on engineering each sprint, measured in hours rather than story points? This number tells you what your buffer is actually absorbing.
Average investigation time by ticket tier. L1 tickets closed by support without engineering involvement, L2 tickets requiring engineering context, L3 tickets requiring senior engineer time. If L3 investigation time is growing quarter over quarter, the problem is compounding.
Repeat issue rate. What percentage of incoming tickets are variants of issues you've investigated before? If this number is high, you're paying the full triage cost repeatedly for problems you've already solved. This is the most direct measure of whether your organization is compounding its debugging knowledge or starting from scratch every time.
Deflection rate at L2/L3. Not just L1 deflection via self-service, but the percentage of escalations that support can resolve without engineering involvement because the relevant context is available to them. Cayuse reduced customer-facing issues by 90% by shifting this number. Key Data cut resolution time by 3x. These outcomes are measurable — but only if you're tracking the right baseline.
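The four metrics above can all be computed from a sprint's ticket log. Here's a minimal sketch assuming a simplified ticket record; the field names (`tier`, `matched_prior`, `resolved_by_support`) are hypothetical and would map onto whatever your tracker actually stores.

```python
from dataclasses import dataclass

@dataclass
class Ticket:
    tier: str                  # "L1", "L2", or "L3"
    investigation_hours: float
    matched_prior: bool        # variant of a previously investigated issue?
    resolved_by_support: bool  # closed without engineering involvement?

def sprint_metrics(tickets, sprint_capacity_hours):
    """Compute the four dashboard metrics from one sprint's tickets."""
    l2_l3 = [t for t in tickets if t.tier in ("L2", "L3")]
    by_tier = {}
    for t in tickets:
        by_tier.setdefault(t.tier, []).append(t.investigation_hours)
    return {
        # unplanned engineering hours as a share of sprint capacity
        "escalation_rate": sum(t.investigation_hours for t in l2_l3)
                           / sprint_capacity_hours,
        # average investigation hours per tier
        "avg_hours_by_tier": {k: sum(v) / len(v) for k, v in by_tier.items()},
        # share of tickets matching a prior investigation
        "repeat_rate": sum(t.matched_prior for t in tickets) / len(tickets),
        # L2/L3 escalations support resolved without engineering
        "l2_l3_deflection_rate": (sum(t.resolved_by_support for t in l2_l3)
                                  / len(l2_l3)) if l2_l3 else 0.0,
    }
```

None of this requires new tooling to start; even a quarterly export from your ticket system into a script like this will surface the trend lines.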
The goal isn't to make your dashboard more complex. It's to make the hidden drain visible. Once it's visible, the investment case for fixing it becomes much clearer — and the compound return on fixing it becomes much easier to communicate to leadership.
Frequently Asked Questions
Why don't standard engineering metrics capture the support triage burden?
DORA metrics and sprint velocity are designed to measure the throughput of planned development work. They measure how efficiently code moves through your pipeline. They don't measure the unplanned work that displaces planned work — which is exactly where support triage shows up. Teams typically absorb this through buffer, which makes sprint velocity look stable while hiding the underlying capacity drain.
What's a healthy escalation rate for a mid-size engineering team?
There's no universal benchmark, but a useful frame: if senior engineers are spending more than 15-20% of their time on unplanned escalation and triage work, the drain is large enough to materially impact planned work. Teams with well-functioning automated issue resolution and context-aware triage typically see this drop to under 10%.
What's the difference between L1 ticket deflection and L3 ticket deflection?
L1 deflection means a customer finds the answer through self-service — a knowledge base article, a chatbot response — before filing a ticket. L3 deflection means a support engineer resolves a technically complex issue without escalating to a software engineer, because they have access to the code context and prior resolution history they need. L1 deflection reduces ticket volume. L3 deflection reduces the engineering capacity drain — which is typically where the real cost lives.
How does the "repeat issue rate" metric work in practice?
Track incoming tickets and categorize them against prior investigations. If a new ticket involves the same error pattern, the same code path, or the same customer configuration class as a ticket you resolved previously, it's a repeat. Most teams don't have infrastructure to do this matching automatically — which is why the repeat rate stays invisible and the re-investigation cost gets paid repeatedly. Context graphs built from ticket history, code, and telemetry make this matching tractable.
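A crude version of that matching can start with error-message fingerprinting: normalize away the volatile details (ids, counts, hex addresses) so variants of the same underlying failure collapse to one signature. This is a deliberately simple sketch, far short of the context-graph matching described above, but it's enough to get a first repeat-rate number from ticket history.

```python
import re

def signature(error_text):
    """Normalize an error message into a matchable fingerprint by
    stripping volatile details, so variants of one failure collapse."""
    s = error_text.lower().strip()
    s = re.sub(r"0x[0-9a-f]+", "<addr>", s)  # hex addresses
    s = re.sub(r"\d+", "<n>", s)             # counts, ids, timestamps
    return s

def is_repeat(new_error, prior_signatures):
    """Does a new ticket's error match any previously investigated one?"""
    return signature(new_error) in prior_signatures
```

Two timeouts on different shards after different durations now share a signature, so the second ticket can be routed to the first investigation instead of starting from scratch.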
How does this connect to engineering world models?
An engineering world model is what makes it possible for debugging knowledge to accumulate rather than evaporate. Every resolved incident teaches the model something about how your system fails — which configurations cause which behaviors, which code changes introduce which categories of regression. Over time, the model gets better at connecting new symptoms to prior root causes, reducing investigation time and making the repeat issue rate drop systematically. That compounding effect is what turns triage from a drain into an investment.
PlayerZero builds the engineering world model that turns resolved incidents into institutional knowledge. If you're trying to quantify the triage drain and make the case for fixing it, the production engineering overview and Cayuse case study are good starting points. Or book a demo with a sprint's worth of your real tickets.