Qwen TTS focuses on on-device processing with no external API; emotion control relies on precise prompts, shaping output ...
KittenTTS brings small text to speech models to edge devices; the Nano 8-bit model is about 25 MB, local playback is possible.
A duplex speech-to-speech model changes the premise: The intelligence layer consumes audio and produces audio directly. The model can attend to what was said and how it was said—content and delivery ...