Metrics Are the Beginning. Go Ask Why.

The real AI code review problem has nothing to do with tokens.

I’ve been helping customers measure their AI toolchain. And the same problem keeps showing up.

They want to know if their AI investment is working. So we look at the metrics. Token count. Utilization rates. PR volume. And almost every time, the same question surfaces, not from me but from them:

“We’re shipping more. But are we reviewing less carefully?”

Nobody made that decision. It happened as a consequence of velocity pressure. Review queues got longer. Engineers started approving faster, in higher volumes, with less scrutiny. The standard drifted without anyone announcing it.
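One way to check whether that drift is showing up in your own data is to compare review effort per unit of change across two periods. The sketch below is a minimal illustration, not a standard metric: the `PullRequest` fields, the "minutes per 100 changed lines" ratio, and the sample numbers are all assumptions made up for the example. A falling number tells you where to look. It doesn't tell you why.

```python
from dataclasses import dataclass
from statistics import median
from typing import Dict, List


@dataclass
class PullRequest:
    # Hypothetical fields; map them to whatever your review tool exports.
    lines_changed: int
    review_minutes: float
    review_comments: int


def scrutiny_summary(prs: List[PullRequest]) -> Dict[str, float]:
    """Summarize review effort for a non-empty batch of PRs."""
    sized = [pr for pr in prs if pr.lines_changed > 0]
    if not sized:
        raise ValueError("need at least one PR with changed lines")
    minutes_per_100 = [pr.review_minutes * 100 / pr.lines_changed for pr in sized]
    return {
        "prs": len(sized),
        "median_minutes_per_100_lines": median(minutes_per_100),
        "median_comments": median(pr.review_comments for pr in sized),
    }


# Made-up sample data for two periods, for illustration only.
before = [PullRequest(200, 40, 5), PullRequest(100, 30, 3), PullRequest(400, 60, 8)]
after = [
    PullRequest(300, 15, 1),
    PullRequest(500, 20, 0),
    PullRequest(250, 10, 1),
    PullRequest(600, 30, 2),
]

print("before:", scrutiny_summary(before))
print("after: ", scrutiny_summary(after))
```

If volume goes up while minutes per 100 lines and comments per PR both fall, that is the prompt for the conversation, not the conclusion.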

The green dashboard problem

I’ve sat in enough leadership reviews to know that a green dashboard can be the most dangerous thing in the room. The number tells you where to look. It doesn’t tell you what you’re looking at.

The metric alone is never the answer. The “why” behind it is.

The framework I use when a metric moves

Cycle time spikes: Is work too big? Is review too slow? Is there a hidden dependency nobody surfaced?

Deployment frequency drops: Is the pipeline broken? Are PRs getting batched because review is the bottleneck? Or did scope creep make everything too big to ship?

Unlinked PRs climb: Is planning disconnected from execution? Are engineers solving problems that never made it to the backlog? Or is the ticketing system too painful to use?

Review time grows: Is one person the bottleneck? Are PRs too large to review quickly? Or does the team lack shared context on what’s being built?
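The framework above can be sketched as a simple lookup from "which metric moved, and in which direction" to the questions worth asking. This is only an illustration: the metric keys, the `MetricChange` type, and the `questions_for` helper are hypothetical names, and the questions are copied from the list above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Keys are (hypothetical metric name, direction); questions come from the framework.
FOLLOW_UP_QUESTIONS: Dict[Tuple[str, str], List[str]] = {
    ("cycle_time", "up"): [
        "Is work too big?",
        "Is review too slow?",
        "Is there a hidden dependency nobody surfaced?",
    ],
    ("deployment_frequency", "down"): [
        "Is the pipeline broken?",
        "Are PRs getting batched because review is the bottleneck?",
        "Did scope creep make everything too big to ship?",
    ],
    ("unlinked_prs", "up"): [
        "Is planning disconnected from execution?",
        "Are engineers solving problems that never made it to the backlog?",
        "Is the ticketing system too painful to use?",
    ],
    ("review_time", "up"): [
        "Is one person the bottleneck?",
        "Are PRs too large to review quickly?",
        "Does the team lack shared context on what's being built?",
    ],
}


@dataclass
class MetricChange:
    name: str
    previous: float
    current: float

    @property
    def direction(self) -> str:
        if self.current > self.previous:
            return "up"
        if self.current < self.previous:
            return "down"
        return "flat"


def questions_for(change: MetricChange) -> List[str]:
    """Return the follow-up questions for a metric movement, or an empty list."""
    return FOLLOW_UP_QUESTIONS.get((change.name, change.direction), [])


# Example: cycle time went from 3 days to 5.5 days.
for question in questions_for(MetricChange("cycle_time", 3.0, 5.5)):
    print(question)
```

The output is a list of questions rather than a verdict, which is the point: the code can route you to the conversation, but it can't have it for you.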

What this means for AI toolchains

The teams that get the most out of AI toolchains aren’t the ones with the best dashboards. They’re the ones that treat every metric as a door to a conversation, not a destination.

Metrics are the beginning. Go ask why.