The results aren’t 100% convincing, but it’s a sign of things to come
By James Vincent | 24 April 2017
THE VERGE —
Artificial intelligence is making human speech as malleable and replicable as pixels. Today, a Canadian AI startup named Lyrebird unveiled its first product: a set of algorithms the company claims can clone anyone’s voice by listening to just a single minute of sample audio.
A few years ago this would have been impossible, but the analytic prowess of machine learning has proven to be a perfect fit for the idiosyncrasies of human speech. Using artificial intelligence, companies like Google have been able to create incredibly life-like synthesized voices, while Adobe has unveiled its own prototype software called Project VoCo that can edit human speech like Photoshop tweaks digital images.
But while Project VoCo requires at least 20 minutes of sample audio before it can mimic a voice, Lyrebird cuts this requirement down to just 60 seconds. The results certainly aren’t indistinguishable from human speech, but they’re impressive all the same, and will no doubt improve over time. Below you can hear the synthesized voices of Donald Trump, Barack Obama, and Hillary Clinton discussing the startup:
Lyrebird says its algorithms can also infuse the speech it creates with emotion, letting customers make voices sound angry, sympathetic, or stressed out. The resulting speech can be put to a wide range of uses, says Lyrebird, including “reading of audio books with famous voices, for connected devices of any kind, for speech synthesis for people with disabilities, for animation movies or for video game studios.” It takes quite a bit of computing power to generate a voice-print, but once done, the speech is easy to make — Lyrebird can create one thousand sentences in less than half a second. […]