Have you noticed how bizarre social media and the news cycle have been lately?
In the age of digital media, journalism is changing significantly. Between widely available storytelling and distribution tools, misinformation spreading like wildfire, and social media filter bubbles, headlines and stories are increasingly vying for attention, plastered across a smorgasbord of platforms. Can media get any stranger? Without a doubt.
The videos we watch and the podcasts we listen to may soon be seamlessly manipulated themselves, distorting the truth in new ways. Photoshop was just the beginning. Advanced media creation tools are cheaper than ever, and innovative tech keeps pushing the bleeding edge forward, further blurring the line between fantasy and reality.
One of the latest developments was introduced last week at the Adobe MAX conference in San Diego. Engineered to make audio editing easier, Adobe's Project VoCo lets users edit recorded speech by rearranging words or generating phrases that were never actually spoken, all by typing.
The software requires a minimum of 20 minutes of recorded speech to work its magic. After that, you can produce an edited or brand-new snippet of speech: in a text box below a visualization of the audio, you can copy, paste, or type whatever you want. In a playful demo, Adobe presenter Zeyu Jin jokes around with comedian Jordan Peele, using the software to make a recording say things the speaker never said.
In short, this is the audio version of Photoshop—the ability to create something from nothing. A new generation of “sound-shopping,” à la photoshopping, has been born.
On the surface, the tool has many immediate practical applications. Dialogue editing for video, for instance, will become much easier. Gamers could also benefit from characters whose dialogue is more flexible instead of being limited to whatever the designers originally wrote. And voice interfaces such as Siri or Alexa are likely to sound more nuanced too.
But while the tone of the presentation was playful, the dark side of Project VoCo is hard to ignore, and Jin didn't hesitate to acknowledge the negative implications. To combat misuse, he said, Adobe is working on forgery prevention, using watermarks to distinguish between real and fake audio. It's also worth noting that the tool isn't publicly available, as the project is still under development.
Still, it likely won't be long before such tools are widely available.
Video and sound manipulation isn't new, as anyone who's ever seen a Hollywood film can attest. What's new is how affordable these tools have become and the scale they can achieve today, compared with the expensive and complicated software workflows of the past.
Anyone with a reasonably affordable computer and an internet connection could, in theory, achieve what once only major post-production studios could.
Software alone won't devalue big-budget Hollywood filmmaking (we seemingly can never have enough grandiose destruction in films these days), but it will make user-generated content easier to produce, at a much higher quality than previously imaginable. The future of media has already arrived, but as more new tools roll out and more people learn to use them, distribution may become much more bottom-up than the top-down model many have come to expect.
Fake audio is only one facet of a larger emerging trend of audiovisual distortion. Stanford's Face2Face has shown promising results in manipulating faces in video, and that software is similarly aimed at mass distribution. Beyond faces, Interactive Dynamic Video lets software manipulate physical objects onscreen, with shocking results.
And there's more: a newly developed machine learning algorithm can turn a single still image into a short video clip, with no video or audio input required. Last but not least, gaming graphics continue to advance rapidly.
Each of these tools isn't necessarily harmful on its own, but their convergence has huge implications. Computers are translating languages nearly as well as humans, and chatbots are becoming a way to communicate with deceased friends and relatives. Combining all of these tools is the magic glue that could one day create believable avatars of real, deceased, or entirely fictional people who can speak every language, personalize every one-on-one interaction, and perform something different for each new audience.
What does it all mean?
In an increasingly post-truth media landscape, end users arguably suffer the most. As we inch toward a new era of content manipulation, the ability to vet the information you receive, and the source content itself, may become a necessary skill whenever you navigate the online world.
The good news is that the democratization of all these capabilities is leveling the playing field like never before. And with each new possibility for misuse, we're also learning how to fight harder to make sure these tools are used for good.
How will you use all this new tech once it becomes widely available?
Image credit: Shutterstock