Gesture-based controls have become so widespread they seem almost mundane. We casually flip through photos on our smartphones and have no problem swinging our arms back and forth in front of an Xbox Kinect. However, both touchscreens and camera-based recognition systems require specialty hardware. At Microsoft, researchers have created a program called SoundWave that needs only a speaker and a microphone.
Sound might seem like an unusual choice for gesture recognition—after all, moving hands need not make any noise. But SoundWave isn’t trying to detect the sound of your hands. Instead, the speaker plays an ultrasonic tone, one that is high-pitched enough to be inaudible to the user. The microphone then detects changes in the ultrasound and interprets those changes as movement.
Sidhant Gupta, a graduate student at the University of Washington, stumbled upon the idea while working on a different project. While measuring the frequency of a 40-kHz ultrasound wave, he noticed that the computer’s microphone was behaving oddly, picking up rogue signals at 39 and 41 kHz. "I thought it was a loose wire and that I should fix it."
Even after he double-checked that all connections were secure, the abnormal signal persisted. He then realized that it wasn’t an equipment malfunction. The phantom ultrasound signals came from his leg idly bouncing up and down in his seat, by way of something out of high school physics: the Doppler effect. Think of an ambulance siren as it zooms down the street. The change in pitch that you hear is determined by the Doppler shift formula. If the sound source is zooming toward you, more wavefronts reach you each second, resulting in a higher frequency and a higher-pitched siren. If it’s speeding away, the opposite happens.
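The Doppler shift formula mentioned above can be written out in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the speed of sound is an assumed room-temperature value, and the second function assumes a speaker and microphone that sit side by side and do not move.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C (assumed value)


def heard_from_moving_source(f_emitted, v_toward):
    """Frequency heard by a stationary listener.

    v_toward is the source's speed in m/s: positive when it approaches
    the listener, negative when it moves away.
    """
    return f_emitted * SPEED_OF_SOUND / (SPEED_OF_SOUND - v_toward)


def echo_from_moving_reflector(f_emitted, v_toward):
    """Frequency of an echo off a surface moving at v_toward m/s.

    The surface first receives the tone as a moving listener, then
    re-emits it as a moving source, so the shift is applied twice.
    """
    return f_emitted * (SPEED_OF_SOUND + v_toward) / (SPEED_OF_SOUND - v_toward)


# A siren approaching and then receding (the 700 Hz pitch and 25 m/s
# speed are made-up illustration values).
print(heard_from_moving_source(700.0, 25.0))
print(heard_from_moving_source(700.0, -25.0))

# An echo of a 40-kHz tone off something moving toward the microphone at 0.5 m/s.
print(echo_from_moving_reflector(40_000.0, 0.5))
```

For speeds much smaller than the speed of sound, the echo's shift is approximately 2 × v × f / c, which is why even slow movements produce a measurable change at ultrasonic frequencies.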
When Gupta sits still, the sound waves bounce off his leg and back into the computer microphone, registering as 40 kHz. However, when he starts to fidget, his leg acts like a sound source moving toward and away from the microphone, so the reflected waves arrive at a slightly higher or lower frequency. The restless leg produces a small, but measurable, change in frequency. Based on the pattern of these changes, he wrote a program that recognized whether his leg was moving up or down and expanded it to detect other types of movement and gestures.
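To make the idea concrete, here is a minimal sketch of how a program might tell "toward" from "away" by comparing the energy just above and just below the emitted tone. It is not SoundWave's actual code: the signal is synthetic, and the sample rate, frame length, echo strength, threshold, and all function names are assumptions chosen for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature air
SAMPLE_RATE = 96_000    # Hz, assumed; must exceed twice the tone frequency
TONE_HZ = 40_000        # emitted tone, matching the anecdote above
FRAME = 960             # samples per frame, giving 100 Hz per DFT bin


def simulate_frame(velocity):
    """Synthetic microphone frame: the direct tone plus a weaker echo from
    a reflector moving at `velocity` m/s (positive = toward the mic)."""
    echo_hz = TONE_HZ * (SPEED_OF_SOUND + velocity) / (SPEED_OF_SOUND - velocity)
    return [
        math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE)
        + 0.2 * math.sin(2 * math.pi * echo_hz * n / SAMPLE_RATE)
        for n in range(FRAME)
    ]


def bin_power(samples, k):
    """Power in DFT bin k, computed directly so no FFT library is needed."""
    total = sum(
        x * cmath.exp(-2j * math.pi * k * n / len(samples))
        for n, x in enumerate(samples)
    )
    return abs(total) ** 2


def classify(samples, width=10, threshold=1e-3):
    """Label a frame by where the energy next to the tone's bin sits."""
    center = round(TONE_HZ * len(samples) / SAMPLE_RATE)
    pilot = bin_power(samples, center)
    above = sum(bin_power(samples, center + i) for i in range(1, width + 1))
    below = sum(bin_power(samples, center - i) for i in range(1, width + 1))
    if max(above, below) < threshold * pilot:
        return "still"  # no meaningful energy on either side of the tone
    return "toward" if above > below else "away"


for v in (0.0, 1.0, -1.0):
    print(v, classify(simulate_frame(v)))
```

With these assumed settings, a reflector approaching at 1 m/s pushes the echo a couple of hundred hertz above the tone, so the upper band dominates; a receding one does the reverse. A real system would also have to cope with noise, windowing, and several moving objects at once, none of which this sketch attempts.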
Though the use of the Doppler effect to track human gestures has been around for a decade, it has always required customized equipment. SoundWave eliminates the need for any specialized hardware, requiring only basic technology. Computers, cell phones, and other electronics already come equipped with a speaker and microphone.
Microsoft is not looking to revolutionize motion control with ultrasound. Dan Morris, another researcher involved with SoundWave, envisions the program more as a supporting actor than a one-man show. SoundWave wouldn’t replace Kinect, but it could work beside it and improve it by making gesture recognition more accurate in more directions. "Computer vision isn’t perfect, so anything you can do to provide more information will make [gesture recognition] more accurate."
If there’s one worry about SoundWave, it is that the technology relies on the limited hearing range of adults for the ultrasonic tone to go undetected, and not everyone’s hearing is so limited. For instance, products like the Mosquito antiloitering device play similar tones specifically to target and irritate the more sensitive ears of teenagers. Bhiksha Raj, a computer science professor at Carnegie Mellon University, worries about the side effects of SoundWave on young children and pets. "Don’t get me wrong, the premise is brilliant," he says, "but you know what happens when you hear a high-pitched tone. They have to take care of these issues first."
Morris says the team is aware of that concern, and early in the project’s development they tested SoundWave in crowded environments where both dogs and kids abound, and never received any complaints. "We tested it with a 4-kHz tone and it was about as loud as the sound effects that normally come out of a laptop."