How hackers trick audio AI into ignoring safety guardrails

Large audio-language models face a new class of jailbreak attacks. Unsafe behavior can be triggered through speech semantics, acoustic manipulation, signal artifacts, or embedding-layer tricks, not just through prompt injection. Researchers tested ten open-source models against more than 40 attack-defense combinations. They found that acoustic noise-based attacks and semantic framing tricks succeed in breaking safety guardrails. Current defenses either fail to stop these attacks or cause the systems to refuse benign requests. The work reveals a practical trade-off: robust safety comes at the cost of usability.