Why current AI safety testing cannot prove what regulators require
Pratinav Seth, Vinay Kumar Sankarapu
May 14, 2026
AI governance frameworks increasingly require evidence of properties such as the absence of hidden objectives and bounded catastrophic capability, but every major assurance methodology today is limited to observable outputs. This position paper formalizes that mismatch as the 'audit gap' and coins the term 'fragile assurance' for cases where the evidence does not logically support the safety claim being made. An analysis of 21 governance instruments finds systematic pressure toward superficial behavioral proxies, driven by geopolitical and commercial incentives. The authors propose capping the legal weight of behavioral evidence and supplementing it with mechanistic verification methods that can inspect internal model structure: linear probes, activation patching, and pre/post-training comparisons.
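To make the first of those mechanistic methods concrete, here is a minimal sketch of a linear probe, which is not taken from the paper. It assumes the activations are already available as a NumPy array; the data here is synthetic, with a binary property planted along one random direction, and the names `train_linear_probe` and `probe_accuracy` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def train_linear_probe(acts, labels, lr=0.1, steps=500):
    """Fit a logistic-regression probe on activations with plain gradient descent."""
    n, d = acts.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        logits = acts @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - labels              # gradient of the mean log-loss w.r.t. logits
        w -= lr * (acts.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(acts, labels, w, b):
    """Fraction of examples whose label the probe predicts correctly."""
    preds = (acts @ w + b) > 0
    return float((preds == labels.astype(bool)).mean())

# Synthetic stand-in for hidden-layer activations (assumption: no real model here).
rng = np.random.default_rng(0)
n, d = 400, 16
labels = rng.integers(0, 2, size=n).astype(float)
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
# Shift each activation by +/-1.5 along the planted direction, depending on its label.
acts = rng.normal(size=(n, d)) + 3.0 * np.outer(labels - 0.5, direction)

# Train on the first 300 examples, evaluate on the held-out 100.
w, b = train_linear_probe(acts[:300], labels[:300])
acc = probe_accuracy(acts[300:], labels[300:], w, b)
print(f"held-out probe accuracy: {acc:.2f}")
```

High held-out accuracy indicates only that the property is linearly decodable from the activations. It does not show that the model uses that information, which is one reason probes are usually paired with interventions such as activation patching.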