Interactive Research Paper

The Consistency Tax

Why per-step variance — not average accuracy — determines end-to-end reliability in agentic AI pipelines

Ihnaee Choi · AI Solution Architect, ServiceNow · May 2026

Enterprise AI models increasingly report per-step accuracy at or above 99%. This paper demonstrates that such metrics create a dangerous illusion of reliability. We identify two compounding problems. First, we formalize the mental playground — the cognitive space where users unconsciously convert "99% accurate" into instance-level certainty, grounded in Sunstein's probability neglect (2002) and Kahneman & Tversky's base rate neglect (1973). Second, we derive the consistency tax (−σ²/(2μ²)) — a structural penalty for per-step variance that compounds linearly with pipeline length. Monte Carlo simulations (50,000 runs) confirm: a 99% accurate model traversing 20 enterprise workflow steps achieves 81.8% end-to-end success under deterministic conditions, but drops to 59.6% under high variance (σ=0.05). The 1% failure that users dismiss in their mental playground compounds into 40% failure across an enterprise pipeline.

Introduction

A 99% accurate AI model sounds nearly perfect. In the mental playground of human cognition, 99% is functionally indistinguishable from 100%. The 1% failure probability is acknowledged intellectually but processed as zero operationally. This paper demonstrates that this cognitive shortcut, combined with a mathematical property of sequential stochastic processes, creates a reliability crisis invisible until it manifests at organizational scale.

Even under deterministic conditions (each step succeeds at exactly 99%), a 20-step enterprise pipeline yields only 0.99²⁰ = 81.8% end-to-end success. The 1% that seemed negligible has compounded into 18.2% failure. But real AI systems are not deterministic — they are stochastic. When we model this variance, a 99% accurate model with high variance (σ=0.05) drops to 59.6% at 20 steps. The 1% failure that users dismiss becomes 40% failure across an enterprise pipeline.
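The deterministic compounding arithmetic is a two-line check:

```python
# Deterministic compounding: each of N steps must succeed,
# each with independent probability p, so end-to-end success is p**N.
p, N = 0.99, 20
print(f"{p ** N:.1%}")  # 81.8%
```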

This paper argues that the deterministic assumption conceals two compounding problems.

The first is cognitive. We introduce the mental playground — the cognitive space where users unconsciously convert aggregate accuracy ("99% accurate") into instance-level certainty ("this output is correct"). Discovering that you believe 99% confidence is 100% confidence is extraordinarily difficult — precisely because you consciously know it's 99%. The gap occurs entirely within the mental playground, invisible to the user.

The second is mathematical. Real LLM systems are fundamentally stochastic. When per-step accuracy follows Xi ~ N(μ, σ²), the typical (median) end-to-end accuracy carries a structural per-step penalty of −σ²/(2μ²) that we term the consistency tax.

These two problems reinforce each other. The mental playground causes users to ignore variance. The consistency tax ensures that ignored variance extracts a compounding mathematical toll. This dual blindness — not knowing the problem exists while the problem worsens — is a primary driver of the gap between enterprise AI adoption (88%) and performance (6% high performers).

The Mental Playground Problem

2.1 The Deterministic Privilege

The history of technology adoption reveals a widening gap between understanding and use. Users operate microwave ovens, light switches, and smartphones with no comprehension of underlying mechanisms. This is possible because these technologies are deterministic: the same input reliably produces the same output.

AI systems appear to operate identically. A user enters a prompt and receives a response. However, one critical difference exists: AI produces different outputs from identical inputs. The same prompt, submitted to the same model, yields meaningfully different responses across runs. This single property invalidates the deterministic mental model that users unconsciously apply.

2.2 Defining the Mental Playground

Definition

Mental playground — the unexamined cognitive space in which users form implicit expectations about AI outputs, accept or reject results without conscious probabilistic reasoning, and process stated accuracy metrics as instance-level certainty rather than aggregate-level distributions.

The mental playground has three defining properties: (1) invisible — users do not recognize they are making probabilistic assumptions; (2) non-falsifiable in real-time — any individual output could be correct; and (3) trust-reinforcing — each apparently successful interaction reduces the motivation to verify the next.

2.3 Probability Neglect in AI Contexts

Sunstein (2002) demonstrated probability neglect through experiments where subjects warned of an electric shock showed fear responses that varied with shock intensity but not with shock probability — even when probability ranged from 1% to 50%. People respond to the possibility of an outcome, not its likelihood.

In AI contexts, we observe the inverse pattern. Sunstein's subjects over-weighted negative possibilities regardless of probability; AI users under-weight the possibility of incorrect output regardless of probability. When told a system is "99% accurate," users process each individual output as near-certain. The 1% failure is acknowledged intellectually but neglected operationally. When the annotation reads "99% confidence," verifying every result feels like wasted effort.

Kahneman and Tversky's (1973) base rate neglect compounds this: users focus on the salient individuating information (the plausible AI response) while ignoring the base rate (the probability that any given output is incorrect).

2.4 Organizational Amplification

At the individual level, the mental playground is manageable. At the organizational level, containment breaks down. One person's AI output becomes another's input. The moment an unverified result becomes the premise for a subsequent decision, individual probabilistic error converts to system-level risk.

Each person in the chain applies their own mental playground independently. Verification responsibility diffuses across the organization while the underlying probabilistic error compounds across steps.

The Consistency Tax: Mathematical Framework

3.1 Problem Formulation

Let each step i in an N-step pipeline have accuracy Xi drawn independently from:

Xi ~ N(μ, σ²), μ ∈ (0, 1], σ ≥ 0

End-to-end accuracy is the product of all per-step accuracies (clamped to [0, 1]):

Ae2e = ∏ min(max(Xi, 0), 1) for i = 1 to N
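This formulation can be simulated directly. A minimal Monte Carlo sketch in NumPy, using the paper's high-variance parameters (variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, N, runs = 0.99, 0.05, 20, 50_000  # high-variance scenario

# Per-step accuracies X_i ~ N(mu, sigma^2), clamped to [0, 1]
X = np.clip(rng.normal(mu, sigma, size=(runs, N)), 0.0, 1.0)

# End-to-end accuracy is the product over the N steps of each run
e2e = X.prod(axis=1)
print(f"mean E2E: {e2e.mean():.1%}, median E2E: {np.median(e2e):.1%}")
```

With these settings the mean lands near the paper's simulated 59.6% figure, versus 81.8% for the deterministic case.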

3.2 Log-Normal Approximation

Taking the natural logarithm: ln(Ae2e) = Σ ln(Xi). By the central limit theorem, this sum of independent terms is approximately normal for moderate-to-large N, so Ae2e is approximately log-normal.
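A quick empirical check of the log-normal claim, under the same simulation model: if ln(Ae2e) is approximately normal, its mean and median should nearly coincide, since a normal distribution is symmetric.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, N, runs = 0.99, 0.02, 20, 50_000

# Lower clamp of 1e-9 (rather than 0) avoids log(0); with sigma = 0.02
# the probability mass near zero is negligible anyway.
X = np.clip(rng.normal(mu, sigma, size=(runs, N)), 1e-9, 1.0)
log_e2e = np.log(X).sum(axis=1)  # ln(A_e2e) = sum of ln(X_i)

print(abs(log_e2e.mean() - np.median(log_e2e)))  # small gap => near-symmetric
```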

3.3 Deriving the Consistency Tax

A second-order Taylor expansion of ln(Xi) around μ gives E[ln(Xi)] ≈ ln μ − σ²/(2μ²). Exponentiating N · E[ln(Xi)] yields the median of the log-normal approximation, i.e. the typical end-to-end accuracy:

median[Ae2e] ≈ exp(N · (ln μ − σ²/(2μ²)))

The term −σ²/(2μ²) is the consistency tax per step. It is strictly negative whenever σ > 0, its cumulative effect in the exponent scales linearly with N, and it grows quadratically with σ. This is a structural property of sequential stochastic processes, independent of any specific failure mode. Because the approximation ignores the clamping of Xi to [0, 1], it understates the simulated penalty at larger σ.
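A sketch of the closed form at the "Five Sigma" setting (σ = 0.008); the printed value differs slightly from the simulated 81.1% because the approximation omits the [0, 1] clamping:

```python
import math

mu, sigma, N = 0.99, 0.008, 20  # the "Five Sigma" variance level

tax_per_step = sigma**2 / (2 * mu**2)               # consistency tax per step
approx = math.exp(N * (math.log(mu) - tax_per_step))

print(f"no-variance baseline: {mu ** N:.1%}")       # 81.8%
print(f"with consistency tax: {approx:.1%}")        # 81.7% (simulation: 81.1%)
```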

Run the Simulation Yourself

The interactive version of this paper includes a simulation where you can adjust the mean per-step accuracy (default 0.99), pipeline length (default 20), and run count (default 10,000) to see how variance affects end-to-end pipeline reliability. Two charts summarize the output: "End-to-End Accuracy Distribution by Variance Level" (each curve is 10,000 simulated pipeline runs with the same mean accuracy but different consistency) and "Pipeline Length vs. Median E2E Success Rate" (how consistency degrades as workflows get longer), alongside a full results table.
Key Findings

- Deterministic (σ = 0): 81.8% end-to-end success over 20 steps
- Five Sigma (σ = 0.008): 81.1% end-to-end success over 20 steps
- High variance (σ = 0.05): 59.6% end-to-end success over 20 steps
- Consistency tax: −σ²/(2μ²) per step

The Interaction Effect

The mental playground causes users to ignore variance ("it's 99% accurate, what could go wrong?"). The consistency tax ensures that ignored variance produces a structural penalty that compounds across every step. The 1% failure that users dismiss in their mental playground becomes 40% failure across a 20-step enterprise pipeline. Organizations that do not measure per-step variance are simultaneously (a) unaware of the problem and (b) accumulating compounding reliability debt.

This dual blindness — not knowing the problem exists while the problem worsens — is a primary driver of the gap between enterprise AI adoption (88%) and performance (6% high performers).

Implications

For enterprise AI strategy: A 99% accurate model sounds nearly perfect. But the 1% you dismiss compounds into 40% failure across 20 steps. Average accuracy is the most dangerous metric. The question is not "how accurate is your model?" but "how consistent is it?"
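As an illustration, consider two hypothetical models under the paper's simulation setup: a perfectly consistent model with a lower mean can beat a more accurate but high-variance one end-to-end.

```python
import numpy as np

rng = np.random.default_rng(0)
N, runs = 20, 50_000

# Model A: higher mean per-step accuracy, but high variance
a = np.clip(rng.normal(0.99, 0.05, size=(runs, N)), 0, 1).prod(axis=1).mean()
# Model B: lower mean per-step accuracy, zero variance (hypothetical)
b = 0.98 ** N

print(f"A (mu=0.99, sigma=0.05): {a:.1%}")  # ~59-60%
print(f"B (mu=0.98, sigma=0.00): {b:.1%}")  # 66.8%
```

Despite a full percentage point less mean accuracy per step, the deterministic model B wins end-to-end, which is the "consistency over accuracy" point in numbers.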

Platforms that reduce per-step variance — through deterministic workflows, validated tool libraries, structured output schemas — deliver compounding reliability gains that raw model capability cannot match. This provides a mathematical foundation for why governed platform deployments consistently outperform DIY API connections.

References

[1] AgencyBench (2026). Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts. ACL 2026. arXiv:2601.11044.

[2] BPI Challenge 2018. TU Eindhoven. 4TU.ResearchData.

[3] DenoiseFlow (2026). Uncertainty-Aware Denoising for Reliable LLM Agentic Workflows. Tsinghua Univ.

[4] Kahneman, D. & Tversky, A. (1973). On the psychology of prediction. Psychological Review, 80(4).

[5] McKinsey (2025). The State of AI in 2025.

[6] MindStudio (2026). Best AI Models for Agentic Workflows in 2026.

[7] Patel, K. et al. (2026). The Six Sigma Agent. arXiv:2601.22290.

[8] Sunstein, C. R. (2002). Probability Neglect: Emotions, Worst Cases, and Law. Yale Law Journal, 112.

[9] Tangoe / Vanson Bourne (2024). GenAI Cloud Spending Survey.

[10] Information Fidelity in Tool-Using LLM Agents: A Martingale Analysis (2026). arXiv:2602.13320.