Abstract: Prosody is a crucial speech feature in emotional text-to-speech (TTS), as different emotions have distinct prosodic characteristics. Existing works in emotional TTS have primarily utilized ...
AI-powered noise suppression for real-time audio processing with LiveKit. Based on the DeepFilterNet paper and implementation by Rikorose.
Abstract: A high-quality enrollment speech is crucial to target speaker extraction (TSE), since it provides essential cues for identifying the target speaker in the mixture. However, real applications ...