HARK! Sound Separation Software for Telepresence Robots (Video)

Part of the reason animals have two ears is to help them pinpoint where a sound is coming from and focus on it apart from the background noise. Most telepresence robots have only one microphone, robbing them of the ability to separate out different sounds in a crowded room. That’s where Hiroshi Okuno from Kyoto University and Kazuhiro Nakadai from the Honda Research Institute of Japan come in. They’ve created HARK (Honda…Audition for Robots with Kyoto…), a software package that uses multiple microphones to separate the different sounds in a crowded room, pinpoint where each is coming from, and let a telerobot user select the channel they want to focus on. Recently, the Honda-Kyoto team joined forces with Willow Garage to integrate HARK into Willow Garage’s open source telepresence robot, the Texai. HARK is now available with the Robot Operating System (ROS) and is free to use for research. Check out the cool HARK-Texai demo in the video below.
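To get a feel for how a microphone array can pinpoint a sound the way two ears do, here is a minimal sketch (my own illustration, not HARK's actual algorithm) of estimating the time difference of arrival (TDOA) between two microphones with the classic GCC-PHAT cross-correlation method. The function name `estimate_tdoa` is hypothetical; a real system would convert the delay into an angle using the microphone spacing and the speed of sound.

```python
import numpy as np

def estimate_tdoa(sig_a, sig_b, fs):
    """Estimate the time difference of arrival between two microphone
    signals using GCC-PHAT (generalized cross-correlation with phase
    transform). Returns the delay of sig_a relative to sig_b, in seconds."""
    n = len(sig_a) + len(sig_b)
    # Cross-power spectrum, whitened so that only phase information remains
    spec_a = np.fft.rfft(sig_a, n=n)
    spec_b = np.fft.rfft(sig_b, n=n)
    cross = spec_a * np.conj(spec_b)
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)
    # Re-center so that negative and positive lags are both searchable
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = np.argmax(np.abs(cc)) - max_shift
    return lag / fs
```

A positive result means the sound reached `sig_b`'s microphone first; repeating this across several microphone pairs is what lets an array like the Texai's eight-mic hat triangulate a speaker's direction.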

HARK allows a telerobot (Willow Garage's Texai) to identify speakers in a crowded room and focus on one. That silly green hat is hiding 8 microphones.

The Texai is already fairly impressive, providing an open source approach to a robust telepresence platform. It lets users do more than simply teleconference into a room – they can also drive the robot around to explore the space and have spontaneous meetings with people in the office. With HARK, that go-anywhere ability is improved even further, letting the Texai interact with individuals one-on-one in a crowded environment. This is another step towards fully integrating robots (teleoperated or otherwise) into work spaces with humans.

In the following demonstration video the HARK team illustrates several key features of their system. At 0:38 they show how a group of speakers (including someone on a telerobot) can be separated into different audio channels. Near 1:03 you’ll see how the 8 microphones installed on the Texai allow HARK to locate the direction each sound is coming from. These two capabilities come together around 1:21, when several speakers and the background music are selectively muted by HARK so that the Texai user can focus on just one person (Takeshi Mizumoto as he counts to 10).
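The "focus on one speaker" trick in the demo rests on spatial filtering: once you know a speaker's direction, you can combine the microphone channels so that sound from that direction adds up coherently while everything else partially cancels. Below is a toy delay-and-sum beamformer illustrating the idea (HARK itself uses more sophisticated separation methods; the function name, array geometry, and sign conventions here are my own assumptions for a simple linear array).

```python
import numpy as np

def delay_and_sum(mic_signals, mic_positions, angle_deg, fs, c=343.0):
    """Steer a linear microphone array toward angle_deg by time-shifting
    each channel so a plane wave from that direction lines up across all
    microphones, then averaging. Off-axis sources stay misaligned and are
    attenuated. mic_signals: (num_mics, num_samples); mic_positions: meters."""
    angle = np.deg2rad(angle_deg)
    # Per-microphone alignment shift for a plane wave from angle_deg
    delays = np.asarray(mic_positions) * np.sin(angle) / c
    n = mic_signals.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    out = np.zeros(n)
    for sig, d in zip(mic_signals, delays):
        # Apply a fractional-sample shift in the frequency domain
        spec = np.fft.rfft(sig) * np.exp(2j * np.pi * freqs * d)
        out += np.fft.irfft(spec, n=n)
    return out / len(mic_signals)
```

With more microphones (the Texai hat has 8), the averaging suppresses off-axis speakers and background music more strongly, which is roughly what you hear happening at 1:21 in the video.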

The sound separation technology isn’t new. Okuno and Nakadai started developing it together nearly a decade ago as Robot Audition. While there have been some technical improvements since their first published paper in 2000, perhaps the biggest difference between Robot Audition and HARK is that HARK is open source. Kind of. The code is available and integrated with ROS, and is free to use for research. Commercial applications require licensing.

I’m a strong proponent of open source technology, so I see this quasi-open sharing of HARK as an interesting choice. It should allow for rapid research progress, since anyone is free to use, adapt, and build upon the code. But it also allows Honda and Kyoto to recoup their financial investment if HARK applications ever become commercially viable. This isn’t the first example of open research/commercial licensing I’ve seen, and I wonder if this model will serve as a compromise between encouraging open research and protecting commercial investment.

For now, the HARK integration into the Texai serves as yet another example of how open source code and shared platforms are accelerating research in robotics. This is my favorite theme to talk about whenever I see a new Willow Garage project and I think it’s true every time. The open exchange of information is like nitrous in the engine of science. We really need to extend these kinds of open programs into the scientific community as a whole if we want to race towards the Singularity.

[screen capture and video credit: Willow Garage]
[source: HARK site, Willow Garage Blog]
