Software Development Did It Backwards. We Scaled the Machine First.

Software development did it backwards. We scaled the machine first.

I’ve been helping customers measure their AI toolchain. And the same problem keeps showing up.

They want to know if their AI investment is working. So we look at the metrics. Token count. Utilization rates. PR volume. And almost every time, the same question surfaces — not from me, but from them:

“We’re shipping more. But are we reviewing less carefully?”

Nobody made that decision. It happened as a consequence of velocity pressure. Review queues got longer. Engineers started approving faster, in higher volumes, with less scrutiny. The standard drifted without anyone announcing it.

The CNC problem

There’s a manufacturing analogy that explains exactly how this happens.

CNC — Computer Numerical Control — arrived in manufacturing in the 1970s. It replaced manual machining with machine-executed precision. Fast. Consistent. Does exactly what it is told.

The problem is it also produces defects at scale — consistently, repeatedly, exactly.

So before any shop scaled CNC, they defined tolerances. What constitutes a passing part? Who inspects the output? Who is accountable when a defective part ships?

The machine didn’t define the standard. The standard existed before the machine was turned on.

Software development did it backwards. We scaled the machine first.

What regulated industries got right

Regulated industries are seeing the best AI rollouts right now — not because regulation makes them efficient, but because their standards existed before the tool arrived. They didn’t need a meeting. The conversation had already happened.

Healthcare. Finance. Defense. They defined what “adequate review” meant before they deployed the tool. The tolerance conversation happened before the machine was turned on.

The question that has to come first

What I am working on with customers is getting that conversation to happen before the drift becomes the default.

Before any measurement framework means anything, one question has to be formally answered:

What constitutes adequate human review of AI-generated code in your environment?

Not which dashboard to build. That question first.