cs.CL

Making robotic speech sound naturally conversational

Parshav Singla, Agnik Banerjee, Aaditya Arora, Shruti Aggarwal, Anil Kumar Verma, Vikram C M, Raj Prakash Gohil, Gopal Kumar Agarwal

May 18, 2026

Read speech, the kind produced by text-to-speech systems, lacks the variation in intonation, stress, and rhythm that makes conversation sound natural. This work uses deep neural networks to analyze and modify these prosodic features, with HiFi-GAN as the vocoder for high-quality waveform synthesis. Tested on multiple datasets and evaluated with listener ratings (Mean Opinion Score, MOS), the method improves naturalness and accuracy over conventional approaches. It is intended for virtual assistants, customer service bots, and language-learning tools, applications where computational efficiency matters.
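The results are reported as a Mean Opinion Score, so a short sketch of how a MOS is typically computed may help: it is the arithmetic mean of listener ratings, conventionally on a 1-5 scale, often reported with a confidence interval. The function name `mean_opinion_score`, the 1-5 range check, the normal-approximation interval, and the example ratings are illustrative assumptions, not details taken from the paper.

```python
import math
from statistics import mean, stdev


def mean_opinion_score(ratings, z=1.96):
    """Return (MOS, half-width of an approximate 95% confidence interval).

    Hypothetical helper for illustration. `ratings` are listener scores on
    the common 1-5 absolute category rating scale. The interval uses a
    normal approximation, which is a simplifying assumption.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings for a confidence interval")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = mean(ratings)  # MOS is the plain average of all ratings
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))  # z * standard error
    return mos, half_width


# Hypothetical ratings for one synthesized utterance from five listeners.
mos, hw = mean_opinion_score([3, 3, 4, 2, 3])
print(f"MOS {mos:.2f} +/- {hw:.2f}")  # prints "MOS 3.00 +/- 0.62"
```

Comparing two systems this way only says something when their intervals are reasonably narrow, which is why MOS studies usually collect many ratings per condition.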
Published as "Bridging the Gap: Converting Read Text to Conversational Dialogue" (arXiv:2605.18001).