Making robotic speech sound naturally conversational
Parshav Singla, Agnik Banerjee, Aaditya Arora, Shruti Aggarwal, Anil Kumar Verma, Vikram C M, Raj Prakash Gohil, Gopal Kumar Agarwal
May 18, 2026
Read speech, the kind produced by text-to-speech systems, lacks the variations in intonation, stress, and rhythm that make conversation sound natural. This work applies deep neural networks to analyze and modify prosodic features, and uses HiFi-GAN for high-quality synthesis. Tested on multiple datasets and rated by listeners using Mean Opinion Score (MOS), the method is reported to improve naturalness and accuracy over conventional approaches. The intended applications are virtual assistants, customer service bots, and language learning tools, where computational efficiency matters.
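As a rough illustration of the Mean Opinion Score evaluation mentioned above, the sketch below averages listener ratings on the usual 1 to 5 scale and attaches a 95% confidence interval under a normal approximation. The function name `mean_opinion_score` and the rating lists are hypothetical, made up for this example; they are not the paper's code or its results.

```python
from math import sqrt
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (MOS, 95% CI half-width) for a list of 1-5 listener ratings."""
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on the 1-5 scale")
    mos = mean(ratings)
    # Normal approximation: 1.96 standard errors on either side of the mean.
    half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings, invented for illustration only (not data from the paper).
baseline_ratings = [3, 3, 4, 2, 3, 4, 3, 3]
modified_ratings = [4, 4, 5, 3, 4, 4, 5, 4]

for name, ratings in [("baseline", baseline_ratings), ("modified", modified_ratings)]:
    mos, ci = mean_opinion_score(ratings)
    print(f"{name}: MOS = {mos:.2f} +/- {ci:.2f}")
```

With so few ratings the interval is wide; a real listening test would typically collect many ratings per system, and overlapping intervals would suggest the difference between systems is not yet established.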