Thanks to advances in deepfake technology, it's becoming easier to clone people's voices. Some uses of the tech, like creating voice-overs to fill in gaps in Roadrunner, the documentary about Anthony Bourdain released this past summer, are relatively harmless (though even the ethics of that move were hotly debated when the film came out). In other cases, though, deepfaked voices are being used for ends that are very clearly nefarious, like stealing millions of dollars.
An article published last week by Forbes revealed that a group of cybercriminals in the United Arab Emirates used deepfake technology as part of a bank heist that transferred a total of $35 million out of the country and into accounts all over the world.
Money Heist, Voice Edition
All you need to make a fake version of someone’s voice is a recording of that person speaking. As with any machine learning system whose output improves based on the quantity and quality of its input data, a deepfaked voice will sound more like the real thing if there are more recordings for the system to learn from.
In this case, criminals used deepfake software to recreate the voice of an executive at a large company (details about the company, the software used, and the recordings used to train it don't appear to be available). They then placed phone calls to a bank manager with whom the executive had a pre-existing relationship, meaning the bank manager knew the executive's voice. The impersonators also sent the bank manager forged emails confirming details of the requested transactions. Between the emails and the familiar voice, when the "executive" asked the manager to authorize the transfer of millions of dollars between accounts, the manager saw no problem going ahead and doing so.
The fraud took place in January 2020, but a relevant court document was just filed in the US last week. Officials in the UAE are asking investigators in the US for help tracing $400,000 of the stolen money that went to US bank accounts at Centennial Bank.
Our Voices, Our Selves
The old-fashioned way (“old” in this context meaning before machine learning was as ubiquitous as it is today) to make a fake human voice was to record a real human voice, split that recording into many distinct syllables of speech, then paste those syllables together in countless permutations to form the words you wanted the voice to say. It was tedious and yielded a voice that didn’t sound at all realistic.
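The cut-and-paste approach can be sketched in a few lines of toy Python. The syllable "clips" below are made-up lists of audio samples standing in for real recorded fragments; an actual system would splice digitized recordings, but the core idea, concatenating pre-recorded pieces in whatever order spells out the target phrase, is the same.

```python
# Toy sketch of old-fashioned concatenative speech synthesis.
# The clips here are hypothetical sample values, not real audio.
SYLLABLE_BANK = {
    "hel": [0.1, 0.3, 0.2],
    "lo": [0.4, 0.1],
    "there": [0.2, 0.5, 0.3],
}

def synthesize(syllables):
    """Paste pre-recorded syllable clips end to end to form a 'spoken' phrase."""
    samples = []
    for s in syllables:
        samples.extend(SYLLABLE_BANK[s])  # crude splice: no smoothing between clips
    return samples

print(synthesize(["hel", "lo"]))  # -> [0.1, 0.3, 0.2, 0.4, 0.1]
```

The abrupt joins between clips, with no smoothing of pitch or volume across the seams, are exactly why this method sounded so robotic compared with today's learned models.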
It's easy to differentiate the voices of people close to us and to recognize famous voices, but we don't often think through the many components that make a voice unique. There's pitch, which refers to how high or low a voice sounds, and timbre, the tonal quality that distinguishes two voices even when they're speaking at the same pitch. There's cadence: the speaker's rhythm and variations in pitch and emphasis on different words or parts of a sentence. There's pronunciation, and quirks like regional accents or lisps.
In short, each of our voices is unique, which makes it all the creepier that they're becoming easier to synthetically recreate.
Fake Voices to Come
Is the UAE bank heist a harbinger of crimes to come? Unfortunately, the answer is very likely yes. It's not the first such attempt, but it's the first to succeed at stealing such a large sum of money using a deepfaked voice. In 2019, criminals faked the voice of a UK-based energy firm's CEO to have $243,000 transferred to a Hungarian bank account.
Many different versions of audio deepfake software are already commercially available, including versions from companies like Lyrebird (which needs just a one-minute recording to create a fake voice, albeit slightly halting and robot-like), Descript, Sonantic, and Veritone, to name just a few.
These companies intend their products to be used for good, and some positive use cases certainly do exist; people with speech disabilities or paralysis could use the software to communicate with those around them, for example. Veritone is marketing its software to famous people who may want to license their voices for things like product endorsements. Sonantic recently created a voice clone for Val Kilmer, whose voice was damaged during his battle with throat cancer. Recording audiobooks or news podcasts could also be a productive application of the technology; right now, either a person has to read aloud for hours or the listener gets a computerized voice that isn't very pleasant to listen to.
Other companies are already using AI to fight back against AI; Microsoft’s Video Authenticator, released a little over a year ago, analyzes videos and images and tells users the percentage chance that they’ve been artificially manipulated. Similarly, the AI Foundation’s Reality Defender uses synthetic media detection algorithms to identify fake content. Facebook, Twitter, and YouTube have all taken steps to try to ban and remove deepfakes from their sites.
But this technology is only going to become more sophisticated, and across every realm: voice, image, and video. Fighting technology with more or better technology may be one of our best hopes, but it’s also important to raise awareness of deepfakes and instill a wide-ranging sense of skepticism in people around content they see online.
Let's just hope the UAE bank heist instills a similar skepticism in the people who work at banks, so that fraud enabled by deepfaked voices doesn't become a more common occurrence.