Jacob Johnson, Aves Lair
In late 2017, a Reddit user released a series of synthetic videos containing celebrity likenesses. Since then, deepfake technology has exploded in popularity as people speculate over its future applications. Concerns over the tech’s potential for political disinformation and unauthorized pornographic content have led to the implementation of regulations surrounding its use. Simultaneously, innovators and deepfake software startups are scrambling to find ways we can use the tech to revolutionize commercial industries.
The non-pornographic content that the Reddit user released in late 2017 and
early 2018 features actor Nicolas Cage’s face swapped into different movies he
Deepfake technology uses deep learning AI systems to generate audiovisual content. By using a system of neural networks to analyze patterns across datasets, we can use the tech to manipulate, alter and synthesize media. Popular applications of the tech include the face-swapping videos that have flooded social media, but several companies have recognized its potential for professional industries, including retail, customer service, and media.
Modulate’s speech-analyzing software uses machine learning to synthesize Voice Skins: vocal filters of famous characters and celebrities. Unlike traditional voice synthesizers that rely on text-to-speech tech, Modulate’s Voice Skins synthesize audio in real time, allowing for more seamless conversations. More importantly, because your real voice is synthesized, the Voice Skins convey the full range of emotional nuance. In other words, they sound real.
CEO of Modulate.ai, Mike Pappas, demonstrating the company’s voice skin
Modulate’s Voice Skins take gaming to the next level, enabling a more immersive experience with our digital avatars. Furthermore, the synthesized voices help protect users’ identities, which is especially beneficial for women and minority gamers who are often the victims of cyberbullying. The watermarks within the Voice Skins safeguard the owners of borrowed voices against retribution from inappropriate use and will soon include user IDs for verification purposes.
Voice Skins are bound to be a game-changer for the gaming industry, but the tech has a wide range of other potential applications. Last year, the startup raised $2 million from investors including 2Enable Partners and Hyperplane Venture Capital. Later, CEO Mike Pappas released a statement asserting that the tech’s utility surpasses gaming and could enhance immersion across all digital communities, advocating its potential for “far-reaching effects across almost every industry.”
Respeecher uses both classical digital signal processing algorithms and proprietary deep generative modeling techniques to “create speech that’s indistinguishable from the original speaker.” After analyzing a person’s voice, their software creates an exact replica that can later be synthesized over other voices. They can even recreate the voices of historical figures, given enough sample data.
Respeecher’s Chief Research Officer, Grant Reaber, demonstrating the company’s
voice cloning technology
Like Modulate, Respeecher prides itself on the quality of its Voice Clones and their capacity for emotional nuance. However, the companies differ in their target demographics. Whereas Modulate’s Voice Skins cater to the gaming industry, Respeecher’s Voice Clones have the potential to revolutionize the film and animation industries.
Like most talent-based businesses, film and animation studios are subject to human error. Too often, voice actors quit (or die, unfortunately), leaving projects half-finished, but with Respeecher’s Voice Clones, this will no longer be an issue. The product also benefits the actors themselves. Instead of recording their lines, they can simply submit a sample of their voices, saving both time and effort and possibly reducing costs.
Respeecher’s Voice Clones have garnered attention from powerful figures across various industries. In addition to collaborating with one of Hollywood’s top film studios, the company has raised an estimated $1.5 million in funding from investors including ff Venture Capital and Acrobator Ventures. Later, the company indicated plans to branch into the content creation and customer service industries and hinted at the tech’s adoption by call centers.
Rephrase.ai’s software creates fully AI-driven presentation videos. Their deep learning engine analyzes the facial movements and expressions that accompany speech to generate photorealistic faces for text or audio. While creating a video, Rephrase.ai users either select from a panel of onboarded public presenters or create their own. Users have complete control over their custom presenters and can extend access to other companies and organizations.
The customizability of Rephrase.ai’s text-to-video tech enables the creation of highly personalized sales and marketing materials. The software is currently being used by real estate, automotive and financial institutions.
Rephrase.ai’s text-to-video demo
Rephrase.ai impressed venture capitalists like AV8 Ventures and Lightspeed Venture Partners, earning the startup $1.5 million in funding. While investors praised the tech’s functionality, others expressed concerns over its threat to the broadcast industry. However, CEO Ashray Malhotra assures news anchors that their jobs are safe for at least the next two years “depending on the AI progress.”
In the face of privacy concerns around the rising use of facial-recognition software, the specialists at D-ID developed a solution. Their de-identification software uses AI and deep learning to remove identifiable facial features without compromising key attributes such as age, gender or emotion. Applicable to both images and videos, D-ID’s Smart Anonymization software provides top-of-the-line security for biometric databases.
D-ID’s Smart Anonymization guarantees adherence to the EU’s strict regulations on sensitive personal data. In addition to enabling the use of publicly available video data for analytics, it also reduces security and remediation costs by removing the risk of identification, even in the event of a data breach.
The company’s new facial reenactment solution, however, shifts toward automated video production. Users film a “driver video” and can then “use driver videos to control and modify facial expressions and movements in pictures and video footage.” Is the company ready to expand further into marketing, sales, and even the entertainment industry?
One of the most well-funded deepfake startups, D-ID has earned recognition from investors including AXA Venture Partners, Pitango Venture Capital, Y Combinator, AI Alliance, Hyundai Motor Company, OMRON Ventures, Maverick Ventures (U.S.), Mindset Ventures and Redds Capital, who collectively invested over $13 million as part of the startup’s round A fundraising. However, privacy proponents expressed concern over the nature of the tech, arguing that it protected corporate interests rather than sensitive data itself.
Topaz Labs uses machine learning to breathe new life into old media. Their Gigapixel AI software analyzes millions of pictures to deepen its understanding of the relationships between pixels, which allows it to enlarge images and lets users reenact their favorite scenes from Law & Order.
Enlarge photos with Gigapixel AI
Gigapixel AI’s functionality goes beyond enlarging pictures. It can also restore low-resolution pictures and upscale compressed images by 600% without sacrificing photo quality, and its automated face refinement features ensure the subject always stays in focus.
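To put the 600% figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes that 600% means each side of the image grows sixfold, and it uses a hypothetical 1200 x 800 source image and a common 300 DPI print resolution. None of these numbers or function names come from Topaz Labs; they only illustrate the arithmetic.

```python
def upscaled_size(width_px, height_px, scale_percent=600):
    """Return pixel dimensions after scaling each side by scale_percent.

    Assumption: the percentage applies to each side, so 600% means 6x
    the width and 6x the height (36x the total pixel count).
    """
    factor = scale_percent / 100
    return round(width_px * factor), round(height_px * factor)


def print_size_inches(width_px, height_px, dpi=300):
    """Return the print size in inches at a given dots-per-inch setting.

    Assumption: 300 DPI is used here as a typical print resolution.
    """
    return width_px / dpi, height_px / dpi


# Hypothetical source image: 1200 x 800 pixels.
width, height = upscaled_size(1200, 800)
print(width, height)                      # 7200 4800
print(print_size_inches(1200, 800))       # (4.0, 2.6666666666666665)
print(print_size_inches(width, height))   # (24.0, 16.0)
```

Under these assumptions, an image that would print at roughly 4 by 2.7 inches before upscaling could print at 24 by 16 inches afterward. Whether the result actually looks sharp at that size depends on the software, not on this arithmetic.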
Law & Order aside, law enforcement officials have expressed interest in Gigapixel AI’s potential for investigations. However, because the software specializes in scenery and landscapes, it’s currently marketed toward photographers along with Topaz Labs’ other image-editing tools.
The applications of deepfake software developed by the companies listed above suggest we’re on the cusp of another technological revolution. Soon, we may find ourselves in a futuristic society where Rephrase.ai’s AI presenters provide technical support using Voice Skins or Voice Clones to sound like our favorite celebrities.
Clearly, deepfake technology has more uses than creating fake videos of public officials. Voice synthesizers like those offered by Modulate and Respeecher could revitalize the audiobook industry. With Gigapixel AI, you can convert pictures taken on your smartphone into full-sized prints. The uses for deepfakes, machine learning and artificial intelligence are endless, and by studying their current applications, we can peek into the future of technology.
This article was originally published on Aves Lair’s official Hacker Noon page.