An interesting paper from Microsoft:
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
https://valle-demo.github.io/
Abstract (excerpt): Specifically, we train a neural codec language model (called VALL-E) using discrete codes derived from an off-the-shelf neural audio codec model, and regard TTS as a conditional language modeling task rather than continuous signal regression as in previous work.
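To make the "conditional language modeling" framing concrete, here is a toy sketch of my own, not code from the paper. It samples discrete codec tokens one at a time, conditioned on text tokens and on the codes generated so far, instead of regressing a continuous signal. Everything in it is an assumption for illustration: `toy_logits` is a hypothetical stand-in for a trained network, the codebook size of 8 is made up, and the integer text tokens are arbitrary.

```python
import math
import random

CODEBOOK_SIZE = 8  # hypothetical; chosen small for illustration only


def toy_logits(text_tokens, generated):
    """Hypothetical stand-in for a trained model.

    Returns one score per codebook entry, derived deterministically from
    the conditioning text and the codes generated so far.
    """
    context = list(text_tokens) + list(generated)
    seed = sum((i + 1) * tok for i, tok in enumerate(context))
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(CODEBOOK_SIZE)]


def softmax(logits):
    """Turn scores into a probability distribution over codebook entries."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def generate_codes(text_tokens, num_steps, rng):
    """Autoregressively sample discrete codes conditioned on text tokens."""
    generated = []
    for _ in range(num_steps):
        probs = softmax(toy_logits(text_tokens, generated))
        # Each step is a classification over discrete codes, not a
        # regression onto continuous audio features.
        next_code = rng.choices(range(CODEBOOK_SIZE), weights=probs, k=1)[0]
        generated.append(next_code)
    return generated


codes = generate_codes([3, 1, 4], num_steps=5, rng=random.Random(0))
print(codes)
```

In a real system the sampled codes would then go through the codec's decoder to produce a waveform; that step is left out here.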
#ai #language_model #speech_synthesis #vall_e #tts #neural_networks