Voice mimicking AI dupes Alexa and other voice recognition devices

Deepfakes (a portmanteau of “deep learning” and “fake”) are synthetic media in which a real person’s pictures, videos, or speech are converted into someone else’s (often a celebrity’s) artificial, AI-generated likeness. You may have come across some on the internet before, such as Tom Cruise deepfakes on TikTok or Joe Rogan voice clones.

While image and video deepfakes can be quite convincing, the impression was that audio deepfakes lagged behind, at least when they lacked copious amounts of training audio. But a new study serves as a wake-up call, showing that voice-copying algorithms that are easy to find on the internet are already pretty good. In fact, the researchers found that with minimal amounts of training, these algorithms can fool voice recognition systems, such as Amazon’s Alexa.

Researchers at the University of Chicago’s Security, Algorithms, Networking and Data (SAND) Lab tested two of the most popular deepfake voice synthesis algorithms, SV2TTS and AutoVC, both of which are open-source and freely available on GitHub.

The two programs are known as ‘real-time voice cloning toolboxes’. The developers of SV2TTS boast that only five seconds’ worth of training recordings is enough to generate a passable imitation.

The researchers put both systems to the test by feeding them the same 90 five-minute voice recordings of different people talking. They also recorded their own samples from 14 volunteers, who gave permission for the researchers to test whether the computer-generated voices could unlock their voice recognition devices and services, such as Microsoft Azure, WeChat, and Amazon Alexa.

SV2TTS tricked Microsoft Azure about 30 percent of the time and fooled both WeChat and Amazon Alexa 63 percent of the time, or almost two-thirds. A hacker could use this to log into WeChat with a synthetic vocal message mimicking the real user or access a person’s Alexa to make payments to third-party apps.

AutoVC performed quite poorly, being able to fool Microsoft Azure only 15 percent of the time. Since it fell short of expectations, the researchers didn’t bother to test it against WeChat and Alexa voice recognition security.

In another experiment, the researchers enlisted 200 volunteers who were asked to listen to pairs of recordings and identify which of the two they thought was fake. The volunteers were tricked nearly half the time, which made their judgments no better than a coin toss.
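To make the coin-toss comparison concrete, here is a minimal sketch of how one could check whether a group's score on such a two-choice listening test differs from pure guessing. It uses an exact two-sided binomial test against a 50 percent guess rate. The function name and the counts in the example are hypothetical, chosen only for illustration; they are not figures from the study, and this is not necessarily the analysis the researchers ran.

```python
from math import comb


def coin_toss_p_value(correct: int, trials: int) -> float:
    """Exact two-sided binomial p-value for the hypothesis that
    listeners are guessing (true accuracy = 0.5).

    Hypothetical helper for illustration, not taken from the study.
    """
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Under p = 0.5 the distribution is symmetric, so take the tail
    # that starts at the more extreme of (correct, trials - correct).
    k = max(correct, trials - correct)
    tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    return min(1.0, 2 * tail)


# Hypothetical counts: 200 judgments, 104 of them correct.
near_chance = coin_toss_p_value(104, 200)
# Hypothetical counts: 200 judgments, 150 of them correct.
clearly_better = coin_toss_p_value(150, 200)

print(f"104/200 correct: p = {near_chance:.3f}")
print(f"150/200 correct: p = {clearly_better:.2e}")
```

A large p-value, as in the first made-up case, means the result is consistent with guessing. A very small one, as in the second, would indicate that listeners really can tell the fake apart.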

The most convincing audio deepfakes were those mimicking women’s voices and those of non-native English speakers. The researchers are currently looking into why.

‘We find that both humans and machines can be reliably fooled by synthetic speech and that existing defenses against synthesized speech fall short,’ the researchers wrote in a report posted on the open-access server arXiv.

‘Such tools in the wrong hands will enable a range of powerful attacks against both humans and software systems [aka machines].’

In 2019, a scammer performed an ‘AI heist’, using deepfake voice algorithms to impersonate a German executive at an energy company and convince employees to wire him $240,000. According to the Washington Post, the person who performed the wire transfer found it odd that their boss would make such a request, but the German accent and familiar voice heard over the phone were convincing. Cybersecurity firm Symantec says it has identified similar cases of deepfake voice scams that resulted in losses in the millions of dollars.
