Running personal AI assistants entirely on your own device

Jon Saad-Falcon, Avanika Narayan, Robby Manihani, Tanvir Bhathal, Herumb Shandilya, Hakki Orhun Akengin, Gabriel Bo, Andrew Park, Matthew Hart, Caia Costello, Chuan Li, Christopher Ré, Azalia Mirhoseini

Personal AI assistants today send most queries to cloud services like Claude, raising privacy and cost concerns. Simply swapping in open local models causes accuracy drops of 25–39 percentage points. OpenJarvis decomposes the AI stack into a specification of five independent, optimizable primitives: Intelligence, Engine, Agents, Tools & Memory, and Learning. At design time, the system uses frontier cloud models to propose edits to this specification and accepts only non-regressive changes; at inference time, it runs entirely on-device. On eight benchmarks, on-device specs match or exceed cloud accuracy on four, with an average gap of just 3.2 percentage points, while cutting API costs by ~800× and end-to-end latency by 4×. Code and models are being released.
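
The design-time loop described above (propose an edit, keep it only if it does not regress) can be illustrated with a minimal Python sketch. This is not the OpenJarvis implementation: the `Spec` fields, the `optimize` function, the toy scoring function, and the hard-coded proposals are all hypothetical stand-ins, and a single scalar score is assumed where the real system may compare several benchmarks. In the abstract's setting the proposals would come from a frontier cloud model rather than a fixed list.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable


@dataclass(frozen=True)
class Spec:
    """Hypothetical specification: one field per primitive named in the abstract."""
    intelligence: str
    engine: str
    agents: str
    tools_memory: str
    learning: str


# An edit maps one spec to a modified copy (assumed representation).
Edit = Callable[[Spec], Spec]


def optimize(spec: Spec,
             proposals: Iterable[Edit],
             evaluate: Callable[[Spec], float]) -> Spec:
    """Greedy search that accepts only non-regressive edits."""
    best_score = evaluate(spec)
    for edit in proposals:
        candidate = edit(spec)
        score = evaluate(candidate)
        # Non-regressive: keep the candidate only if the score does not drop.
        if score >= best_score:
            spec, best_score = candidate, score
    return spec


def toy_evaluate(spec: Spec) -> float:
    """Stand-in metric (assumption); a real run would score benchmarks."""
    scores = {"single-shot": 0.4, "react": 0.6, "plan-then-act": 0.7}
    return scores.get(spec.agents, 0.0)


base = Spec("local-model", "local-engine", "react", "none", "none")
proposals = [
    lambda s: replace(s, agents="single-shot"),    # lowers the score: rejected
    lambda s: replace(s, agents="plan-then-act"),  # raises the score: accepted
]
final = optimize(base, proposals, toy_evaluate)
```

Because every accepted edit must score at least as well as the current best, the returned spec can never be worse than the starting one under the chosen metric. The trade-off is that a greedy rule of this kind cannot pass through a temporarily worse spec to reach a better one.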