Two years ago, a mystery disease took away longtime radio reporter Jamie Dupree's ability to speak.
This month, Dupree is back on the radio.
The voice going out over WSB radio (News 95.5 and AM 750 WSB) and stations in five other cities has been “cloned” from old tapes of Dupree’s broadcasts by a Scotland-based voice-synthesis company called CereProc.
Here’s how it works:
Harvesting a new voice
CereProc (the name comes from “cerebral processing,” but with a hard “c”) usually has its clients step into a recording studio and read a list of 620 sentences that include every sound in the English language.
Dupree couldn’t do that. In 2016, he developed a rare condition called tongue protrusion dystonia, which makes it impossible for him to string together more than a few words.
Luckily, Dupree had kept hours and hours of tapes of his old broadcasts. CereProc combed through about eight hours of them and harvested the sounds it needed to create what it calls a parametric clone.
To do that, the company’s computers analyzed the recordings and came up with the statistical parameters of the frequencies that make up Dupree’s voice.
Using a vocoder (familiar to fans of Kraftwerk), CereProc built a clone of his voice. The sound is synthesized, but the distinctive harmonics that make the voice recognizable as Dupree’s are there. Using any text-to-speech software, Dupree can render a story as a .wav or MP3 file read aloud in his cloned voice, which he is calling "Jamie Dupree 2.0."
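For the technically curious, the idea can be sketched in a few lines of code. The Python below is a toy source-filter synthesizer, assuming made-up pitch and formant values rather than anything from CereProc’s actual model: a pulse train at the speaker’s pitch is shaped by resonant filters set to the kinds of frequency parameters the company extracts from recordings, and the result is written out as a .wav file.

```python
# Toy source-filter (vocoder-style) synthesis, the idea behind a parametric
# voice clone. All numbers here are illustrative, not Dupree's real data.
import numpy as np
from scipy.io import wavfile
from scipy.signal import lfilter

RATE = 16000                                      # samples per second
F0 = 110.0                                        # pitch in Hz (assumed)
FORMANTS = [(730, 90), (1090, 110), (2440, 120)]  # (center Hz, bandwidth Hz) for an "ah"-like vowel

# Source: one second of glottal pulses at the speaker's pitch.
source = np.zeros(RATE)
source[::int(RATE // F0)] = 1.0

# Filter: one two-pole resonator per formant shapes the buzz into a vowel.
signal = source
for freq, bw in FORMANTS:
    r = np.exp(-np.pi * bw / RATE)
    theta = 2 * np.pi * freq / RATE
    signal = lfilter([1.0], [1.0, -2 * r * np.cos(theta), r * r], signal)

signal /= np.abs(signal).max()                    # normalize to avoid clipping
wavfile.write("vowel.wav", RATE, (signal * 32767).astype(np.int16))
```

Change the pitch and formant numbers and the same buzz becomes a different voice, which is why a statistical model of those parameters is enough to make synthesized speech recognizable as a particular speaker.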
How Siri was born
CereProc also offers a more expensive process, called “unit selection.” The client records a series of sentences that put every sound in the English language adjacent to every other.
This requires about 30 hours of recitation and produces the kind of artificial voice we hear every day when we talk to Siri or Alexa.
In 2005, Atlantan Susan Bennett, the original voice of Siri, spent a month in a recording studio, pronouncing an array of often meaningless words, such as "Cathexis fefatelly sexual ease stump."
These words were then diced into their component sounds, which are recombined, or “concatenated,” on the fly to make any of a million words in the English language. The result sounds more natural because the units are digital samples of a real voice rather than a synthetic re-creation.
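A toy version of that stitching is easy to sketch, too. The Python below joins hypothetical pre-recorded unit files end to end with a short crossfade at each seam; the file names are invented, and a real engine chooses among thousands of candidate units and smooths the joins far more carefully.

```python
# Toy concatenative ("unit selection") synthesis: stitch recorded snippets
# of a real voice together. File names are hypothetical mono recordings.
import numpy as np
from scipy.io import wavfile

def concatenate_units(unit_files, crossfade_ms=10):
    """Join recorded units end to end, crossfading at each seam."""
    rate, out = wavfile.read(unit_files[0])
    out = out.astype(np.float32)
    fade = int(rate * crossfade_ms / 1000)
    ramp = np.linspace(0.0, 1.0, fade)
    for path in unit_files[1:]:
        _, unit = wavfile.read(path)
        unit = unit.astype(np.float32)
        # Overlap the seam so the samples blend instead of clicking.
        out[-fade:] = out[-fade:] * (1 - ramp) + unit[:fade] * ramp
        out = np.concatenate([out, unit[fade:]])
    return rate, out.astype(np.int16)

# Hypothetical diphone recordings spelling out the word "hello."
rate, audio = concatenate_units(["h-eh.wav", "eh-l.wav", "l-ow.wav"])
wavfile.write("hello.wav", rate, audio)
```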
Lost Voice Guy
Synthetic voices have been used for digital assistants and for people who have lost the ability to speak because of cancer or ALS.
British comedian Lee Ridley, unable to speak because of cerebral palsy, used a text-to-speech program to compete on "Britain's Got Talent," and won. He's now known as Lost Voice Guy. (Since he's from Newcastle, the folks at CereProc were interested in upgrading his software with a Geordie accent.)
Dupree’s parametric voice lacks some of Siri’s naturalness and tends to end sentences abruptly, but there are ways he can improve it.
Using XML markup, he can tweak the pronunciation of certain words and phrases and soften the ends of sentences, as in the sketch below.
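The markup in question is typically SSML, the W3C’s XML vocabulary for speech synthesis. A snippet along these lines (whether CereProc’s engine honors these exact tags is an assumption) inserts a short pause and lowers the rate and pitch at the end of a sentence so it trails off less abruptly:

```xml
<!-- Illustrative SSML; the exact tags supported by CereProc may differ. -->
<speak>
  The Senate votes on the bill
  <break time="200ms"/>
  <prosody rate="90%" pitch="-5%">this afternoon.</prosody>
</speak>
```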
Jamie Dupree 2.0
How will Jamie Dupree use his new voice?
Dupree covers national politics for the Cox Media Group (which includes The Atlanta Journal-Constitution and is a part of Cox Enterprises). For 30 years, Atlanta audiences have heard his voice on WSB radio, reporting from his cubbyhole on Capitol Hill.
He will continue to type up reports on Congress and the White House, and Jamie Dupree 2.0 will read them. Other uses are problematic. “It probably isn’t going to work for the longer Q&A segments with our morning show and talk show hosts,” he said, in a text-only interview, “but I fully believe this type of software solution can only improve as we move forward.”
Dupree added, “My hope is that other companies will continue to expand their work in this arena, and maybe even better solutions will become available. Of course, option #1 is to get my real voice back — but that is not going to happen soon.”