Meet the Amazon researchers helping robots hear through the noise

The Astro home robot has been turning heads since Amazon unveiled the device last fall. Customers can ask the pet-sized robot to patrol the house, check on pets, handle video calls, order groceries, and even deliver a drink. But few people are more amazed by its abilities than the scientists who brought it to life.

“Even as someone who works on this stuff for a living, it feels like magic,” says Wontak Kim, an Amazon audio engineer whose team helped Astro process sound accurately.

It may sound like magic, but Astro’s ability to keep up with the demands of a busy room is actually the result of countless hours of dedicated work. Kim’s team, part of Amazon’s Devices and Services organization, includes scientists and acoustic engineers from Amazon’s Audio Lab in Cambridge, Massachusetts. Working with colleagues in Sunnyvale, California, and Bellevue, Washington, they designed and built Astro’s audio features, including voice recognition and audio and video calling. They knew that for the home robot to succeed, it had to clearly understand and process audio requests. And not only that: Astro’s video calling feature needed to work in near real time for customers to use it.

“Humans can’t tolerate latency with audio,” says Mrudula Athi, an acoustics scientist on Kim’s team. “Even 20 milliseconds of lag is immediately noticeable. So for Astro, we needed to process and clean up 125 frames of audio signal per second.”
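
For a sense of scale: 125 frames per second means each frame spans 8 milliseconds. The sketch below is a minimal illustration of that arithmetic and of a frame-by-frame processing loop; it is not Amazon's code, and the 16 kHz sample rate is an assumption, since the article doesn't state one.

    # Minimal sketch of frame-based real-time audio processing.
    # NOT Amazon's implementation; the sample rate is an assumed 16 kHz.
    import numpy as np

    SAMPLE_RATE = 16_000        # assumed; the article doesn't specify
    FRAMES_PER_SECOND = 125     # from the article
    FRAME_SAMPLES = SAMPLE_RATE // FRAMES_PER_SECOND  # 128 samples
    FRAME_MS = 1000 / FRAMES_PER_SECOND               # 8 ms per frame

    def enhance(frame: np.ndarray) -> np.ndarray:
        """Placeholder for the per-frame cleanup step."""
        return frame

    def process_stream(stream):
        # Each frame must be cleaned up in well under 8 ms to keep
        # end-to-end latency below the ~20 ms humans start to notice.
        for frame in stream:
            yield enhance(frame)

    # Example: push ten frames of silence through the pipeline.
    frames = (np.zeros(FRAME_SAMPLES) for _ in range(10))
    cleaned = list(process_stream(frames))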

The magic is in unraveling the sound waves

Astro’s audio features are built on Alexa, the company’s voice AI. On any Alexa-enabled device, Alexa can’t effortlessly pick speech out of its surroundings the way people do. When you make a voice request, the sound waves bounce off walls and ceilings on their way to the device’s microphones, so what arrives is a smeared mixture of the original speech and its reflections.
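
In signal-processing terms, what the microphone receives can be modeled as the dry speech convolved with the room’s impulse response. Here is a minimal numpy sketch of that effect; the exponentially decaying noise tail is a crude stand-in for real reflections, not anything the article describes.

    # Sketch: reverberation as convolution with a room impulse response.
    # Purely illustrative; the decaying noise tail is a crude stand-in
    # for real wall/ceiling reflections.
    import numpy as np
    from scipy.signal import fftconvolve

    rng = np.random.default_rng(0)
    fs = 16_000                              # assumed sample rate

    # Fake "dry" speech: half a second of noise.
    dry = rng.standard_normal(fs // 2)

    # Synthetic impulse response: direct path + decaying reflections.
    rir = np.zeros(fs // 4)
    rir[0] = 1.0                             # direct path
    decay = np.exp(-np.arange(len(rir)) / (0.05 * fs))
    rir += 0.3 * rng.standard_normal(len(rir)) * decay

    # What the microphone actually hears: smeared, overlapping copies.
    wet = fftconvolve(dry, rir)[: len(dry)]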

With Astro, this challenge is compounded by the fact that the robot moves around the house. To satisfy customers, it had to accurately process voice requests without being thrown off by pets and other common household noises, the subtle sounds of the electric motors that power it, or music and any other audio the robot itself plays. For example, Amit Chhetri, a lead scientist on the Sunnyvale team, notes that when Astro is moving across a tiled floor, “the level of wheel noise at the microphones is even higher than that of speech.”

The magic lies in disentangling all the extra sounds.

“If you send all that noise to the voice recognition app, it won’t work very well,” says Athi. “Our job is to take those microphone signals and make sure they’re cleaned up enough for Alexa to perform at a level that results in a good customer experience.”

All this sorting of sounds must also take place rapidly.

This is a tough problem, and Amazon pulled together some serious brains to solve it. Astro’s audio team included acoustical scientists familiar with the physics of sound, applied researchers who created algorithms to manipulate sound waves, and software engineers who turned those algorithms into efficient code.

Taking AI-based algorithms to a new level

The team first focused on suppressing background noise during audio and video calls, so that people could hear and understand each other even while the robot was navigating a noisy space. To keep everything running at the required speed, the team used a deep neural network (DNN), an AI-based approach widely used for audio and computer vision problems. But they took it to a new level: Chhetri designed a new network architecture that both suppresses background noise and de-reverberates speech, allowing Astro to handle calls.
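
The article doesn’t detail Chhetri’s architecture, but a common pattern for real-time speech enhancement is a recurrent network that estimates a per-frequency mask applied to the noisy spectrogram. The PyTorch sketch below is a generic illustration of that masking approach, not Amazon’s design; the names and sizes are invented.

    # Generic mask-based speech enhancement sketch in PyTorch.
    # NOT the architecture Chhetri designed (the article doesn't
    # publish it); it illustrates the common spectral-masking idea.
    import torch
    import torch.nn as nn

    N_FFT, HOP = 256, 128   # 129 frequency bins per frame

    class MaskNet(nn.Module):
        def __init__(self, n_bins=N_FFT // 2 + 1, hidden=128):
            super().__init__()
            self.rnn = nn.GRU(n_bins, hidden, batch_first=True)
            self.out = nn.Linear(hidden, n_bins)

        def forward(self, mag):                # mag: (batch, frames, bins)
            h, _ = self.rnn(mag)
            return torch.sigmoid(self.out(h))  # mask in [0, 1] per bin

    def enhance(noisy: torch.Tensor, net: MaskNet) -> torch.Tensor:
        window = torch.hann_window(N_FFT)
        spec = torch.stft(noisy, N_FFT, HOP, window=window,
                          return_complex=True)
        mag = spec.abs().transpose(1, 2)       # (batch, frames, bins)
        mask = net(mag).transpose(1, 2)        # (batch, bins, frames)
        return torch.istft(spec * mask, N_FFT, HOP, window=window)

    net = MaskNet()
    clean_estimate = enhance(torch.randn(1, 16_000), net)

A recurrent layer is a natural fit here because it can run frame by frame with low latency, rather than waiting for a whole utterance.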

Using simulated data

[GIF: an audio test demonstration]

DNNs, especially ones as advanced as the one Chhetri, Athi, and the team developed, usually require a lot of data to train on. That’s where the team’s audio simulation expert came in. Using the data he generated, Athi says, the engineers could draw on simulated recordings of “someone speaking from different positions in different types of rooms, with different levels of artificial room noise.” Audio scientists at Amazon routinely use simulated data for projects such as helping devices locate sound sources. But with Astro, the team had to go further: because the robot makes its own noise, they needed Astro-specific data to build their speech enhancement model.
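
As an illustration of what that kind of simulation can look like, the sketch below uses the open-source pyroomacoustics package, an assumption on my part, since the article doesn’t say which tools the team used, to render a speaker at random positions in randomly sized rooms with added noise. API details may differ by package version.

    # Sketch: generating simulated training audio with pyroomacoustics
    # (pip install pyroomacoustics). Illustrative only; not the team's
    # actual pipeline.
    import numpy as np
    import pyroomacoustics as pra

    fs = 16_000
    rng = np.random.default_rng(1)
    speech = rng.standard_normal(fs)  # stand-in for a clean utterance

    pairs = []
    for _ in range(3):
        # Random shoebox room, source position, and mic position.
        room_dim = rng.uniform([3.0, 3.0, 2.5], [8.0, 6.0, 3.5])
        room = pra.ShoeBox(room_dim, fs=fs,
                           materials=pra.Material(0.3), max_order=10)
        room.add_source(rng.uniform([0.5] * 3, room_dim - 0.5),
                        signal=speech)
        room.add_microphone(rng.uniform([0.5] * 3, room_dim - 0.5))
        room.simulate()

        reverberant = room.mic_array.signals[0]
        noise = 0.05 * rng.standard_normal(len(reverberant))
        pairs.append((reverberant + noise, speech))  # (noisy, clean)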

Another Amazon team had recorded audio of Astro making its distinctive noises while driving through a house in all sorts of scenarios. Athi says this data was perfect for the speech enhancement problem, so she mixed it into the voice datasets she had assembled and used the combination to train the model.
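
A standard way to do this kind of mixing is to scale the recorded noise to a target signal-to-noise ratio before adding it to clean speech. A minimal sketch, assuming that recipe; the article doesn’t give the team’s exact procedure:

    # Sketch: mixing recorded robot self-noise into clean speech at a
    # chosen SNR to build training pairs. Illustrative only.
    import numpy as np

    def mix_at_snr(speech: np.ndarray, noise: np.ndarray,
                   snr_db: float) -> np.ndarray:
        """Scale `noise` so the mix has the requested SNR, then add it."""
        noise = noise[: len(speech)]
        p_speech = np.mean(speech ** 2)
        p_noise = np.mean(noise ** 2) + 1e-12
        scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
        return speech + scale * noise

    rng = np.random.default_rng(2)
    speech = rng.standard_normal(16_000)       # stand-in for clean speech
    wheel_noise = rng.standard_normal(16_000)  # stand-in for Astro noise
    # Negative SNR reflects Chhetri's point that wheel noise at the
    # microphones can be louder than the speech itself.
    noisy = mix_at_snr(speech, wheel_noise, snr_db=-5.0)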

A “state-of-the-art” solution

[Image: the Astro robot interacting with a dog in a living room]

The audio team was happy with the result, but they still had to fit all of this code inside the robot itself, another unique challenge. Once again, teams across Amazon’s audio labs stepped up. The result, says Athi, is incredibly advanced.

“The amount of noise reduction we get, with the speech enhancement performance we have, while operating in real time, not in the cloud but on the device…it’s absolutely state of the art,” she says.

Getting Astro’s speech enhancement feature to run on the device is one of the achievements Athi says she is most proud of in her professional career. But Kim, Athi, Chhetri, and the rest of the audio team aren’t stopping anytime soon. They continue to improve Alexa’s speech recognition and Astro’s speech enhancement, and they have a number of projects in the works that they’re excited to show customers.

“We’re very proud to work in this audio space for Amazon,” Kim says, “and for customers.”

Want to learn more about all the fun, convenience, and security Astro offers? Check out the updates Amazon announced for the home robot at its fall devices and services launch event.

Illustration by Mojo Wang.

