Can voice assistants learn to match your emotional state?
Sukru Samet Dindar, Riki Shimizu, Xilin Jiang, Nima Mesgarani
May 30, 2026
Voice assistants today largely ignore the user's emotional state. Sympatheia conditions speech synthesis on continuous emotion signals (valence and arousal) that are either inferred from the user's voice or fused from facial, heart-rate, and text cues. Trained on 18k dialogue pairs with 12 emotion anchors, it generates spoken responses that match both the semantic intent and the emotional tone of the user. This is especially useful when speech alone sounds neutral, because the other cues can still reveal how the user feels. Since the system accepts emotion signals from multiple sensing modalities, it is practical for real-world assistants.
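The summary does not say how the per-modality valence/arousal signals are fused or how the 12 emotion anchors enter the synthesizer. As a minimal sketch, assuming a confidence-weighted average for fusion and anchors placed evenly on the unit circle of the valence-arousal plane (both hypothetical choices, as are all names below), the path from multimodal cues to a conditioning vector for a speech decoder might look like this:

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class EmotionEstimate:
    """Valence/arousal estimate from one sensing modality (hypothetical structure)."""
    valence: float     # assumed range [-1, 1]
    arousal: float     # assumed range [-1, 1]
    confidence: float  # assumed non-negative fusion weight


def fuse_estimates(estimates: Sequence[EmotionEstimate]) -> Tuple[float, float]:
    """Confidence-weighted average of per-modality estimates.
    Hypothetical fusion rule; the paper's actual method may differ."""
    total = sum(e.confidence for e in estimates)
    if total <= 0:
        raise ValueError("at least one estimate needs positive confidence")
    v = sum(e.confidence * e.valence for e in estimates) / total
    a = sum(e.confidence * e.arousal for e in estimates) / total
    return v, a


def make_anchors(n: int = 12) -> List[Tuple[float, float]]:
    """Place n hypothetical emotion anchors evenly on the unit circle (valence, arousal)."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)) for k in range(n)]


def anchor_weights(v: float, a: float, anchors: Sequence[Tuple[float, float]],
                   temperature: float = 0.25) -> List[float]:
    """Softmax over negative squared distances: anchors closer to (v, a) get more weight."""
    logits = [-((v - av) ** 2 + (a - aa) ** 2) / temperature for av, aa in anchors]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def conditioning_vector(weights: Sequence[float],
                        anchor_embeddings: Sequence[Sequence[float]]) -> List[float]:
    """Mix per-anchor embeddings into one vector that could condition a speech decoder."""
    dim = len(anchor_embeddings[0])
    return [sum(w * emb[d] for w, emb in zip(weights, anchor_embeddings)) for d in range(dim)]


# Usage: speech alone sounds neutral, but facial and heart-rate cues suggest tension.
speech = EmotionEstimate(valence=0.0, arousal=0.0, confidence=1.0)
face = EmotionEstimate(valence=-0.6, arousal=0.5, confidence=1.0)
heart_rate = EmotionEstimate(valence=0.0, arousal=0.8, confidence=0.5)  # assumed: mostly informs arousal
fused = fuse_estimates([speech, face, heart_rate])
anchors = make_anchors(12)
weights = anchor_weights(*fused, anchors)
# Stand-in one-hot embeddings; a real system would use learned anchor embeddings.
one_hot = [[1.0 if i == k else 0.0 for i in range(12)] for k in range(12)]
cond = conditioning_vector(weights, one_hot)
```

Here the fused point lands at roughly (-0.24, 0.36), so most weight goes to the anchor nearest that point even though the speech estimate alone was neutral. Soft weighting, rather than snapping to a single anchor, keeps the conditioning continuous, in line with the continuous valence/arousal signals described above; `temperature` controls how sharply the weights concentrate.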