cs.AI

Why current AI safety testing cannot prove what regulators require

Pratinav Seth, Vinay Kumar Sankarapu

May 14, 2026

AI governance frameworks increasingly require evidence of properties such as the absence of hidden objectives and bounded catastrophic capability, but every major assurance methodology in use today is limited to observable outputs. This position paper formalizes that mismatch as the 'audit gap' and coins the term 'fragile assurance' for cases where the evidence does not logically support the safety claim being made. An analysis of 21 governance instruments finds systematic pressure, driven by geopolitical and commercial incentives, toward superficial behavioral proxies. The authors propose capping the legal weight of behavioral evidence and supplementing it with mechanistic verification methods that can inspect internal model structure: linear probes, activation patching, and pre/post-training comparisons.
Published as "Position: Behavioural Assurance Cannot Verify the Safety Claims Governance Now Demands", arXiv:2605.15164