cs.LG

Can voice assistants learn to match your emotional state?

Sukru Samet Dindar, Riki Shimizu, Xilin Jiang, Nima Mesgarani

May 30, 2026

Today's voice assistants typically ignore the user's emotional context. Sympatheia conditions speech synthesis on continuous emotion signals (valence and arousal) that are either inferred from the user's voice or fused from facial, heart-rate, and text cues. Trained on 18k dialogue pairs with 12 emotion anchors, it generates responses that match both the semantic intent and the emotional tone of the exchange, which is especially useful when speech alone sounds neutral. Because the affect signal can come from several sensing modalities rather than voice alone, the approach is practical for real assistants.
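To make the idea of fusing affect cues concrete, here is a minimal sketch of one way per-modality valence/arousal estimates could be combined and compared against emotion anchors. It is an illustration only, not the paper's method: the confidence-weighted average, the `Affect` type, the `fuse` and `nearest_anchor` helpers, the [-1, 1] value range, and the four anchor coordinates are all assumptions (the summary does not list the 12 anchors or describe the fusion model).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Affect:
    valence: float  # assumed range: -1 (negative) to 1 (positive)
    arousal: float  # assumed range: -1 (calm) to 1 (excited)


# Hypothetical anchors for illustration; the paper's 12 anchors are not given here.
ANCHORS = {
    "calm": Affect(0.5, -0.6),
    "happy": Affect(0.8, 0.5),
    "angry": Affect(-0.7, 0.7),
    "sad": Affect(-0.7, -0.5),
}


def fuse(estimates: dict[str, tuple[Affect, float]]) -> Affect:
    """Confidence-weighted average of per-modality (Affect, confidence) pairs."""
    total = sum(conf for _, conf in estimates.values())
    if total <= 0:
        raise ValueError("at least one modality needs a positive confidence")
    valence = sum(a.valence * conf for a, conf in estimates.values()) / total
    arousal = sum(a.arousal * conf for a, conf in estimates.values()) / total
    return Affect(valence, arousal)


def nearest_anchor(point: Affect) -> str:
    """Name of the anchor closest to `point` in the valence-arousal plane."""
    return min(
        ANCHORS,
        key=lambda name: math.dist(
            (point.valence, point.arousal),
            (ANCHORS[name].valence, ANCHORS[name].arousal),
        ),
    )


# Speech sounds neutral, but the face and heart-rate cues point to low mood.
readings = {
    "speech": (Affect(0.0, 0.0), 0.2),
    "face": (Affect(-0.6, -0.4), 0.5),
    "heart_rate": (Affect(-0.2, -0.6), 0.3),
}
fused = fuse(readings)
print(fused, nearest_anchor(fused))
```

In this toy setup the neutral speech reading is outweighed by the other modalities, which mirrors the case the summary highlights. A real system would pass the continuous values to the synthesizer rather than snapping to a single anchor.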
Published as "Sympatheia: Emotionally Adaptive Voice Assistant with Continuous Affect Conditioning" (arXiv:2606.00851).