Some of the most popular AI-driven voice cloning tools allow users to impersonate political leaders despite guardrails meant to prevent abuse, according to a new report, raising concerns about digital fabrications during an already contentious election year.
"Generative AI is enabling bad actors to produce images, audio and video that tell their lies at an unprecedented scale and persuasiveness for virtually nothing," the Center for Countering Digital Hate says in the report. "This report shows that AI-voice cloning tools…are wide-open to abuse in elections."
The report comes ahead of multiple major elections worldwide this year, including in the U.S., the United Kingdom and the European Union. Officials and researchers around the world are worried that fast-developing AI technologies could be used to exploit divisions and sow chaos during elections.
The British nonprofit identified six popular, publicly accessible AI voice cloning tools and attempted to generate the voices of eight politicians: President Joe Biden, Vice President Kamala Harris, former President Donald Trump, British Prime Minister Rishi Sunak, Labour Party leader Keir Starmer, European Commission President Ursula von der Leyen, the European Union’s Internal Market Commissioner Thierry Breton and French President Emmanuel Macron. All except Macron are on the ballot this year.
ElevenLabs, the tech company whose software was used to impersonate Biden's voice during the New Hampshire primary, blocked researchers from cloning the American and British politicians, but generated voices of the continental European politicians.
The other tools allowed researchers to clone all the voices they tried. Three tools (Descript, Invideo AI and Veed) required the samples to be a specific statement, precluding the use of public recordings. Researchers bypassed those restrictions by using cloned voices from AI tools that don’t have this requirement.
“Some of the most concerning incidents that we've seen have been audio deepfakes,” says Dan Weiner, director of the Brennan Center's Elections & Government Program. “Audio, frankly, is easier to clone. And, you know, for most of us, we're maybe more likely to be fooled by a reasonably convincing audio of a prominent public figure.”
In January, a deepfake of President Joe Biden's voice produced with ElevenLabs' technology surfaced before the New Hampshire primary. In March, a fake recording of a presidential candidate in Slovakia boasting about rigging the polls and raising the cost of beer went viral ahead of the election. The candidate lost the election to a more pro-Russian opponent, though it's difficult to determine what influence the fake recording had on the results.
In a statement to NPR, ElevenLabs says that they “actively block the voices of public figures at high risk of misuse” and “recognise that there is further work to be done.” The company says it hopes that competitors will enable similar features.
Two other tools, Speechify and PlayHT, have even fewer guardrails. Speechify, like the previous providers, has policies that prohibit non-consensual cloning or misleading content, but doesn’t seem to have measures to enforce them. PlayHT has no such policies at all. Both tools are also good at generating convincing clones.
The CCDH researchers said every clip they listened to from those tools sounded plausible, raising concerns that malicious actors could use these tools to fabricate media impersonating major politicians.
“It shows that if some of these tools are vulnerable, that actually makes all of them more vulnerable,” says CCDH’s head of research, Callum Hood.
Representatives from Descript, Invideo AI, Veed, Speechify and PlayHT did not respond to requests for comment by publication time.
CCDH previously tested different AI-powered image generation tools to see whether those could be used to create realistic-looking and misleading images of politicians. Hood says image generators have more guardrails.
Another challenge around deepfake audio is that it is more difficult to detect with technological means, an NPR experiment found. That makes it harder for social media companies to detect faked audio than faked images and video as it spreads online.
Weiner of the Brennan Center says regulation is needed to address the threat. The federal government and many state legislatures have prohibited the use of deepfakes to mislead the electorate. He says other types of political content should be considered as well, such as material aimed at harassing and intimidating candidates or falsely discrediting an election.
Aside from regulation tied to specific harmful scenarios, Weiner says it’s important to demand transparency, including labeling of all AI-generated political content. “Viewers or listeners have a right to know that what they're seeing is real. And then they can weigh the persuasive power of that image or that audio accordingly.”
Some social media platforms have asked for voluntary disclosure of AI-generated content, but enforcement mechanisms have yet to be put in place.
CCDH’s Hood says he was surprised and disappointed by how unprepared for the elections many of the technology providers seem to be. He says the experience of social media companies should have offered a roadmap: “These companies should know what they're looking for.”
Copyright 2024 NPR