Editor's note: The following is a post from Athima Chansanchai, a staff writer at Microsoft News Center.
Kinect can be the bridge between folks who don’t speak the same language – and even those who can hear and those who can’t. A collaboration between Microsoft Research Asia, the Chinese Academy of Sciences and Beijing Union University has created a prototype that translates sign language into spoken language, and spoken language into sign language, in real time.
“There are more than 20 million people in China who are hard of hearing, and an estimated 360 million such people around the world, so this project has immense potential to generate positive social impact worldwide,” wrote Guobin Wu, program manager of the Kinect Sign Language Translator project for Microsoft Research Asia, on the Microsoft Research Connections Blog.
The system captures a conversation from both sides: when the person who is deaf signs, a written and spoken translation is rendered in real time, and the system takes a speaking person’s words and turns them into accurate, understandable signs. As you can see from the video, Dandan Yin, a 22-year-old Chinese computer science student who was born deaf, shows how this works by gesturing to a Kinect device connected to the sign language prototype – and words appear on the screen that translate what she just signed.
To see it is to be wowed by it. And for Yin, it’s the beginning of a childhood dream come true.
You may be more familiar with Kinect as an Xbox games accessory: the sensor that reads your movements while you’re jumping or dancing, so that you are the controller rather than any device you have to hold. It’s simple, but so sophisticated. The Kinect for Windows sensor is a device that, when used with a computer and the Kinect for Windows software developer kit, gives you the foundation to create interactive apps that recognize people’s natural movements, gestures and voice commands.
In Beijing, Microsoft Research celebrates Innovation Day Wednesday with the Kinect Sign Language Translator project. Audiences have already seen demos at the Microsoft Research Faculty Summits in Tianjin, China, in October 2012 and in Redmond, Wash., in July, as well as at the Microsoft company meeting in September and at a product fair this year.
“My sense is that it intrigues people. It touches people, tantalizes them with the possibilities. When you try it yourself, you’ll quickly see,” says Stewart Tansley, director of Natural User Interface for Microsoft Research Connections.
One person who helps bring the technology to life is Yin, who made her first trip outside China in July, to the annual Microsoft Research Faculty Summit in Redmond. She came to the project during its first stage, which focused mainly on Chinese sign language data collection and labeling. During her interview to be part of the project, Yin said, “When I was a child, my dream was to create a machine to help people who can’t hear.”
And thanks to this collaboration, her dream is coming true. As Tansley says, “When Dandan is in front of it and it recognizes her gestures, you can immediately see how it can work. As we develop the research further, it could be a viable solution.”
But for now, this system is a research prototype. As Wu writes on the Microsoft Research Connections Blog, “We are diligently working to overcome the technology hurdles so that the system can reliably understand and interpret in communication mode.”
Those hurdles aren’t insurmountable, but they are daunting. For instance, it takes five people to establish the recognition patterns for just one word. And so far, they’ve added 300 Chinese sign language words – out of 4,000. They’ve done it in a very compressed amount of time: just over a year. The project started in spring 2012 as one of three finalists when the call went out to Microsoft Research labs around the world to submit their best Kinect collaborations with the academic world. The other two finalists have projects that focus on assistive technology for the blind and on advancing Kinect for Windows 3D scanning through Kinect Fusion.
Wu says that recognition is by far the most challenging part of the project, but after the team tried data gloves and webcams, the Kinect struck the right balance in helping the technology move forward. Machine learning and pattern recognition programming enable the tool to interpret the meaning of the different gestures captured by Kinect. Bing Translator technology also helped with the translation part of the process.
The next milestone was to build a sign language recognition system. This was also a big challenge, because it meant bringing together experts from multiple disciplines: language modeling, translation, computer vision, speech recognition, 3D modeling, and special education. But by October 2012, only six months after the project began, they had a demo ready.
Tansley was there and he recalled it as “the initial realization that we all had something here that had the potential to touch a lot of people. It was not the breakthrough moment of innovation – but it was a breakthrough in realizing the communication potential.”
As he writes on the Microsoft Research Connections Blog, “Only hours earlier, I had watched a seminal on-stage demo of simultaneous language translation, during which Microsoft Research’s then leader, Rick Rashid, spoke English into a machine learning system that produced a pitch-perfect Chinese translation—all in real time, on stage before 2,000 Chinese students. It was a Star Trek moment. We are living in the future!”
And now we’re at a point where that sci-fi future starts turning into reality. Wu describes the next milestone as increasing the project’s impact by adding more partners for other sign languages (such as ASL and BSL) and dialects, expanding recognition capability and continuing the data collection.
As Tansley says, “Look at the magic that technology can do – and that should inspire us all in whatever work we’re doing.”