AI is moving fast, very fast, as if ready to swallow up our every last atom. In the meantime, Vall-E, an AI from Microsoft, can now imitate a voice from a sample just three seconds long. Although it only works in English for now, it already raises many questions, starting with deepfakes, which will inevitably become more and more unsettling: to a digital image that is almost identical to the original, they can now add the right tone of voice, with no glitches or jerky effects. Impressive.
What if James Dean became…?
Vall-E is a text-to-speech (TTS) model: it takes a written text, which it will then speak, and a sample of the voice to imitate, whatever that sample happens to say. Far from sounding robotic, the output of an AI trained on 60,000 hours of English recordings from 7,000 different speakers is downright astonishing. Judge for yourself with the Vall-E demo published by Microsoft.
The opportunities for such AIs are, of course, immense, and in every field. But are we really looking forward to seeing the very first virtual James Dean or Marilyn Monroe, more lifelike than ever, on the big screen? Not so sure… Sources: Capital / arXiv