Robotic Aircraft Controlled by Human Hand Gestures

Aircraft carrier crews already use a set of hand gestures and body positions to guide pilots around the deck. With unmanned planes on the rise, MIT researchers are setting out to make those same gestures work for autonomous aircraft.

By Mark Brown, Wired UK

Aircraft carrier crews already use a set of hand gestures and body positions to guide pilots around the deck. But with an increase in unmanned planes, what if the crew could use those same gestures to guide robotic aircraft?

A team of researchers at MIT – Computer Science student Yale Song, his advisor Randall Davis and Artificial Intelligence Laboratory researcher David Demirdjian – set out to answer that question.

They're developing a Kinect-like system (Microsoft's Xbox 360 peripheral wasn't available when the team started the project) that can recognise body shapes and hand positions in 3-D. It uses a single stereo camera to track crew members, and custom-made software to detect each gesture.

First, it captures a 3-D image of the crew member and removes the background. Then, to estimate which posture the body is in, it compares the person against a handful of skeleton-like models to see which one fits best. Once it's got a good idea of the body position, it also knows approximately where the hands are located. It zeroes in on these areas and looks at the shape, position and size of the hand and wrist. Then it estimates which gesture is being used: maybe the crew member has their palm open, their fist clenched, or their thumb pointing down.
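To make that pipeline concrete, here is a rough Python sketch of those stages. Everything in it (the skeleton templates, the toy hand-shape rule and the fake depth frame) is invented purely for illustration; the team's actual models aren't described in that level of detail.

```python
import numpy as np

# Hypothetical skeleton templates: for each posture, rough 3-D positions of
# four joints (left/right shoulder, left/right wrist), relative to the torso.
SKELETON_TEMPLATES = {
    "arms_down": np.array([[-0.3, 0.0, 0.0], [0.3, 0.0, 0.0],
                           [-0.35, -0.5, 0.0], [0.35, -0.5, 0.0]]),
    "arms_out":  np.array([[-0.3, 0.0, 0.0], [0.3, 0.0, 0.0],
                           [-0.8, 0.0, 0.0], [0.8, 0.0, 0.0]]),
    "arms_up":   np.array([[-0.3, 0.0, 0.0], [0.3, 0.0, 0.0],
                           [-0.35, 0.5, 0.0], [0.35, 0.5, 0.0]]),
}

def remove_background(points, max_depth=3.0):
    """Keep only 3-D points closer to the camera than max_depth metres."""
    return points[points[:, 2] < max_depth]

def estimate_posture(person):
    """Compare the person's 3-D points against each skeleton template and
    return the best-fitting posture plus that template's wrist positions."""
    best, best_score = None, float("inf")
    for name, joints in SKELETON_TEMPLATES.items():
        # Score: mean distance from each template joint to its nearest point.
        dists = np.linalg.norm(person[:, None, :] - joints[None, :, :], axis=2)
        score = dists.min(axis=0).mean()
        if score < best_score:
            best, best_score = name, score
    return best, SKELETON_TEMPLATES[best][2:]  # last two joints are the wrists

def classify_hand(hand_points):
    """Toy hand-shape rule: widely spread points read as an open palm,
    a tight cluster as a clenched fist."""
    return "palm_open" if hand_points.std(axis=0).mean() > 0.05 else "fist"

# One fake depth frame: 500 points scattered around the torso.
frame = np.random.randn(500, 3) * 0.4
person = remove_background(frame)
posture, wrists = estimate_posture(person)
hands = [person[np.linalg.norm(person - w, axis=1) < 0.2] for w in wrists]
print(posture, [classify_hand(h) if len(h) else "unseen" for h in hands])
```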

The biggest challenge is that there's no time for the software to wait until the crew member stops moving to begin its analysis. An aircraft carrier deck is in constant motion, with new hand gestures and body positions every few seconds. "We cannot just give it thousands of [video] frames, because it will take forever," Song said in a press release. Instead, it works on a series of short, overlapping body-pose sequences, each about 60 frames long (roughly three seconds of video). It also works with probabilities rather than exact matches, scoring how likely each sequence is to show each gesture rather than insisting on a perfect fit.
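The sliding-window idea itself is easy to sketch. In the Python below, classify_sequence is a placeholder for the real gesture model, the gesture names are made up, and the 50 per cent overlap between windows is an assumption: the article gives the 60-frame window length but not how far the sequences overlap.

```python
from collections import deque
import random

WINDOW = 60   # frames per sequence, roughly three seconds of video
STEP = 30     # how far the window slides each time -- an assumed 50% overlap

GESTURES = ["brakes_on", "turn_left", "turn_right", "all_clear"]

def classify_sequence(frames):
    """Placeholder for the real model: returns a probability for every
    gesture given one 60-frame body-pose sequence."""
    weights = [random.random() for _ in GESTURES]
    total = sum(weights)
    return {g: w / total for g, w in zip(GESTURES, weights)}

def stream_gestures(frame_source):
    """Slide a 60-frame window over the incoming video and yield the most
    probable gesture for each overlapping sequence, without ever waiting
    for the crew member to stop moving."""
    buffer = deque(maxlen=WINDOW)
    since_last = 0
    for frame in frame_source:
        buffer.append(frame)
        since_last += 1
        if len(buffer) == WINDOW and since_last >= STEP:
            probs = classify_sequence(list(buffer))
            yield max(probs, key=probs.get), probs
            since_last = 0

# Feed in 300 fake frames (here just integers standing in for pose data).
for gesture, probs in stream_gestures(range(300)):
    print(f"{gesture}: {probs[gesture]:.2f}")
```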

In tests, the algorithm correctly identified the gestures with 76 percent accuracy. Pretty impressive, but not good enough when you're guiding multimillion-dollar drones on a tiny deck in the middle of the ocean. But Song reckons he can increase the system's accuracy by considering arm position and hand position separately.
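The article doesn't say how that separate treatment would work, but one common way to combine two independent classifiers is to multiply their per-gesture probabilities and renormalise. The numbers and gesture names below are made up purely to illustrate the idea.

```python
# Hypothetical per-gesture probabilities from two separate classifiers,
# one looking only at arm position and one looking only at hand shape.
arm_probs  = {"brakes_on": 0.55, "turn_left": 0.30, "all_clear": 0.15}
hand_probs = {"brakes_on": 0.70, "turn_left": 0.10, "all_clear": 0.20}

# Fuse them by multiplying the scores for each gesture and renormalising,
# a standard trick when the two classifiers make independent errors.
fused = {g: arm_probs[g] * hand_probs[g] for g in arm_probs}
total = sum(fused.values())
fused = {g: p / total for g, p in fused.items()}

print(max(fused, key=fused.get), fused)  # brakes_on, with higher confidence than either classifier alone
```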

Image: Tommy Gilligan/U.S. Navy/Department of Defense [high-resolution]

Video: MITNewsOffice/YouTube

Source: Wired.co.uk