Columbia University researchers propose ‘neural voice camouflage’: an approach based on adversarial attacks that disrupts automatic speech recognition systems in real time

This article is written as a summary by Marktechpost Staff based on the research paper 'Real-Time Neural Voice Camouflage'. All credit for this research goes to the researchers of this project. Check out the paper and blog post.

Have you ever had the unpleasant feeling that someone is listening to your every word? It may well be true. For years, companies have used “bossware” to listen to employees when they are near their computers, and several “spyware” applications can record phone calls. Devices and assistants built on automatic speech recognition (ASR), such as Amazon’s Echo and Apple’s Siri, can also record everyday conversations while they wait for voice commands. To address this problem, a group of researchers at Columbia University developed a method called Neural Voice Camouflage. The core idea is to generate custom audio noise in the background while a person speaks, confusing the AI model that transcribes the recorded sound.

The system uses an “adversarial attack”, in which machine learning alters sounds so that other AI models misinterpret them as something else. In effect, it uses one machine learning model to fool another. This is harder than it sounds: a conventional attack must process the entire sound clip before it knows how to modify it, which rules out real-time use. Over the past decade, several research groups have tried to build attacks that are both robust enough to break neural networks and fast enough to run in real time, but none managed to meet both requirements.
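
To make the real-time constraint concrete, here is a minimal Python sketch. It is not the researchers' implementation. It assumes a 16 kHz sample rate and half-second chunks, and `toy_noise_from_past` is a hypothetical stand-in for a learned model. The only thing it illustrates is causality: the noise mixed into each chunk may depend on audio that has already been spoken, never on the chunk itself or on anything after it.

```python
import random

SAMPLE_RATE = 16_000        # assumed sample rate in Hz
CHUNK = SAMPLE_RATE // 2    # assumed chunk length: 0.5 s of audio


def toy_noise_from_past(past, length, rng):
    """Hypothetical stand-in for a learned predictor (NOT the paper's model).

    Returns Gaussian noise scaled to 10% of the loudest sample heard so far.
    """
    peak = max((abs(x) for x in past), default=0.0)
    return [0.1 * peak * rng.gauss(0.0, 1.0) for _ in range(length)]


def stream_with_camouflage(speech, seed=0):
    """Mix noise into speech chunk by chunk, using only past audio."""
    rng = random.Random(seed)
    out = []
    for start in range(0, len(speech), CHUNK):
        chunk = speech[start:start + CHUNK]
        past = speech[:start]   # audio already spoken; the future is unknown
        noise = toy_noise_from_past(past, len(chunk), rng)
        out.extend(s + n for s, n in zip(chunk, noise))
    return out
```

In this toy version the first chunk passes through unchanged because nothing has been heard yet, and editing later samples of `speech` never changes earlier output. An offline attack has no such restriction, which is why it cannot simply be run live.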

In their latest study, the team trained a neural network, a machine learning system loosely inspired by the brain, to effectively predict the future. Trained on many hours of recorded speech, it processes 2-second clips of audio on the fly and disguises what is likely to be said next. The algorithm takes into account what was just said and the characteristics of the speaker’s voice to generate sounds that disrupt a range of possible upcoming words. Because the audio disguise resembles background noise, humans have no trouble recognizing the spoken words. Machines, however, struggle. The technology raised the word error rate of the ASR system from 11.3% to 80.2%. Speech disguised by white noise and by a competing adversarial approach had error rates of only 12.8% and 20.5%, respectively. Even after the ASR system was trained to transcribe speech disrupted by Neural Voice Camouflage, its error rate remained at 52.5%. Short words were the hardest to disrupt, but they are also the least revealing parts of a conversation.
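
For readers unfamiliar with the metric, word error rate (WER) is conventionally the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of words in the reference. The Python sketch below computes it with a standard word-level edit distance. The function name and the lowercase-and-split normalization are assumptions made for illustration, and the paper's exact scoring setup may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()   # assumed normalization: lowercase, split on spaces
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


print(round(word_error_rate("the cat sat on the mat", "the bat sat on mat"), 3))  # 0.333
```

Here one substitution ("cat" to "bat") and one deletion (the second "the") give 2 errors over 6 reference words. Because insertions count as errors, WER can exceed 100% when a transcript contains many extra words.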


The researchers also tested the approach in the real world by playing a voice recording mixed with the camouflage through speakers in the same room as a microphone. The strategy still worked. Since many ASR systems use language models to predict likely word sequences, the attack was also tested in that setting. Even against an ASR system equipped with a defense mechanism, the attack performed well and remained far more effective than white noise. The team’s work was recently presented in a paper at the prestigious International Conference on Learning Representations (ICLR).

According to the lead researcher, this work is a first step toward protecting privacy in the face of AI. The ultimate goal is to create technologies that protect user privacy and give people control over their voice data. The predictive component could also benefit other applications that require real-time processing, such as driverless vehicles. The researchers also see the predictive approach as a step toward emulating how the brain works. Combining the classic machine learning problem of future prediction with the challenge of adversarial machine learning has opened up new areas of study in the field. Audio camouflage is much needed, as virtually everyone today is susceptible to having their speech misinterpreted by security algorithms.
