The next steps in the evolution of AI-generated synthetic voices

Peter Saxon asks Raoul Wedel: How will this end?

The concept of synthetic voices generated by Artificial Intelligence (AI) has been around since 1956, when Robby the Robot came from the Forbidden Planet. About 30 years later, Max Headroom starred in his own TV show and in ads for New Coke.

Robby the Robot from Forbidden Planet

Both fictional android characters, Robby and Max, were played by real actors giving their interpretation of what an artificially generated voice might sound like, sometime in the future.

Well, that future has arrived. Synthetic voices like Alexa and Siri are used extensively in smart speakers, in car satnavs and much more.

Now, the people at the Adthos Creative Studio have taken the whole concept to a much higher plane, offering broadcast-quality audio ads using AI-generated voice technology.

As you are probably aware by now, Adthos is a major advertiser with radioinfo, which means you probably also know that they’ve created a readymade campaign for radio stations focused on encouraging vaccine uptake, which can be downloaded and used for free, covering 6,500 cities in 40 countries, and in 70 languages and dialects.

As exciting as all that sounds, one can’t help but feel that the better and more realistic synthetic voiceovers become, the more likely it is that some less positive outcomes will arise.

Will AI make voice artists redundant? Is this how Big Brother will finally take over?

To answer those and other questions about the ethics of AI is the man behind all the high tech, Raoul Wedel, founder of Wedelsoft Software, the organisation behind the Adthos brand.

I asked him straight out: Now that thousands of stations and production houses worldwide have had a chance to sample the product, have you had any negative feedback regarding the ethics involved?

“There has, and to be honest, it greatly varies by the country, the ad itself, and also by the voiceover artist, of course, because they’re still human beings who have different opinions and different views.

“One voiceover that we recorded is actually a major iHeart talent in the US. And he did many shows on KISS-FM Boston while located in Los Angeles, and he thought it was the coolest thing ever. He said, ‘You know, this is going to happen anyway. There’s no way that I’m going to change that. So, I might as well just take it and run with it.’”

There are myriad applications for AI that can “sample” a voice and generate audio from a script.

As Mr Wedel explains, “Our system comes with a set of default voices. If you’re a brand, let’s say McDonald’s and you have 400 locations in your country and you want to send every local station a different version of an ad with the local address of the local franchise, then we can take their regular corporate voice that does all their ads on TV and radio. We can put that voice into the system and create 600 versions. And the franchisees could potentially change that too. That’s on the highest end of the spectrum of what we’re doing.”

Does the voice get paid more for the coverage, even though they don’t have to physically read each individual version?

“Sure, they do. It’s different, though, from market to market. There are great differences because the smaller the market is, the more afraid the talent is of competing with an AI version of themselves in their own market, because then potentially they’re going to hear their voice on every station and then nobody’s going to book them in real life anymore, especially if the talent pool is small.

“We have two fee models for voice artists. We have a buyout model, where we give them a one-time fee and we can use their voice for it in smaller markets. We also have an option where the talent’s paid royalties.”

I saw a meme once on a car salesperson’s desk that said: ‘The key to salesmanship is sincerity. Once you learn to fake that, you’ve got it made.’

It was a joke, of course. But will there be a time when AI will include the full gamut of nuanced emotions, so that you won’t be able to tell the difference between a synthesised voice and a real or ‘analogue’ one?

“You can compare this with the first music synthesisers. It’s kind of the same thing. And when those first synthesisers came along and you heard a string orchestra, it didn’t sound very much like a string orchestra. But these days, most professional musicians wouldn’t be able to tell the difference between a synthesised violin and a real violin because the quality has improved so, so fast.

“This is growing extremely fast. The technology provider we’re working with has updates coming all the time on a day-by-day basis.”

There’s much more to synthetic voices than radio and audio advertising. They’re developing text-to-speech engines to record audio books, for example.

Mr Wedel says, “I personally think that in five years’ time, there’s not going to be a Starbucks or a McDonald’s that has a real person taking an order on the speaker of the Drive-Thru. The technology is there and it’s going to be very quick.”

Peter Saxon
