In this final post on NUI I'd like to go beyond Kinect and explain why NUI is much more than touch and gesture. There are several other trends that will help bring about these more natural user interfaces.
The Internet of Things

As more and more devices become connected to the Internet (and thus to each other), they'll have access to much more context. Take your phone, for example: my phone knows more about me than most of my family, so I'd like it to share more of that context with other devices in my life. HTC's phones have some NUI-like smarts built in, but there is room for much more. The Nokia Lumia 920 I have has some nice features, such as being usable with gloves on, and the Translator app from Bing is one of the best examples of NUI on a phone right now: hold it up to an image and it will translate the text into your native language, or speak to the phone and hear it translate your words into another language.

There is much more potential, though. My phone knows who I call often, where I visit, where I check in, what I search for, where I have been (via GPS), where I am going (via my calendar), and what I like (via apps like OpenTable or Facebook), and all of these things could be used independently or together to give me more personalized experiences.
A few scenarios I often think could be dramatically improved with context are travel-related. When I sit on an airplane it seems odd to me that I get the same interface in the inflight system as everyone else on the plane. The experience could be so much more personalized with context. And when I arrive at a hotel in a foreign country, surely in today's age of connected devices I should expect more than a "hello Mr. Clayton" on the TV screen when I walk into the room? I expect that in the future inflight systems will greet us like that hotel screen, then offer up the movies I haven't watched (maybe using my Netflix history to identify actors or genres I like), and know whether I am traveling on business or pleasure and adapt the destination guides accordingly. Similarly, when I arrive at the hotel room I expect it'll connect up all my devices seamlessly so I can stream movies from my devices to the TV and access the hotel's vault of media (movies, music, podcasts, newspapers, magazines, etc.).
You can see some of these scenarios play out in the Office Vision video.
Big Data and Machine Learning

All of the context derived from the Internet of Things will generate huge amounts of data (so-called Big Data), and using techniques such as Machine Learning to sift and learn from that data will enable technology to do much more on our behalf and actually begin to anticipate our needs. A simple example can be seen today in Windows Phone, where machine learning has been applied to enormous amounts of public data, resulting in a keyboard that sometimes seems to know me: it anticipates what I am about to type and corrects my errors with surprising intelligence. I have another post on that coming up soon, but you can read more on the Windows Phone blog.
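To make the keyboard example a little more concrete, here is a deliberately tiny sketch of how next-word prediction can work in principle: count word bigrams in a corpus, then suggest the most frequent followers of the last word typed. This is purely illustrative (the corpus and function names are mine, not how the Windows Phone keyboard is actually built); real systems use far larger language models plus per-user personalization.

```python
from collections import Counter, defaultdict

def train(corpus: str) -> dict:
    """Build a map from each word to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    words = corpus.lower().split()
    for current, following in zip(words, words[1:]):
        model[current][following] += 1
    return model

def suggest(model: dict, last_word: str, k: int = 3) -> list:
    """Return up to k most likely next words after last_word."""
    return [word for word, _ in model[last_word.lower()].most_common(k)]

corpus = "i am going to the store i am going to the office i am happy"
model = train(corpus)
print(suggest(model, "am"))  # → ['going', 'happy']
```

Even this toy version shows the core idea: the more (and the more personal) the text it learns from, the more the suggestions feel like the keyboard "knows" you.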
As machine learning is applied to monitor my workflow, in the future I expect I’ll be able to say something such as “clear my schedule for tomorrow” and my system will act like a smart assistant – canceling the stuff I don’t really need, moving meetings to timeslots (and locations) that work for everybody, delegating tasks on my behalf, automatically prioritizing the urgent/time-dependent, and so on.
Speech

Speech is already part of the NUI experience on Xbox with Kinect and Bing. You can simply use your voice to search for, say, "Batman" and receive results in a matter of seconds showing Batman movies, music, games and more. At CES this week I saw technology like this finding its way into TVs directly. That's NUI: rather than battling with remote controls, you simply use your voice. I expect speech to become a more dominant form of interaction, given improvements in accuracy and people becoming increasingly familiar with talking to their devices. However, it'll be interesting to watch the social norms that come into play, as there are times and places for talking to your devices. In the earlier example of the inflight system, I believe that will continue to be touch driven, as nobody wants to battle with 300 other travelers all talking to the back of the seat in front of them. NUI is all about the right technology for the moment, not technology for the sake of it.
Computer Vision

I'm no longer surprised when I come across another computer vision expert at Microsoft. For technology to respond and react to us in a more natural way, it'll help if it can see our world. Kinect has certainly helped to drive this field forward, but there is more potential, whether in the augmented reality scenarios we're familiar with today or in the move to a world that blends the physical and the digital. IllumiShare is a great example of this in action, creating a very natural way to collaborate across boundaries.
Machine Translation

Machine Translation (MT) brings some obvious benefits, such as the ability to reliably translate text from one language to another. It's used to great effect in applications such as Bing Translator on Windows Phone, and the technology will bring us some amazing new capabilities. Just last week, The Economist referred to a recent demo by Microsoft Research in China as "Conquering Babel", saying simultaneous translation by computer is getting closer. Jump to about 5:40 in this video if you want to see it in action.
To draw this long post to a close: there is clearly much more to NUI than touch and gesture, and we're seeing some of that play out today, with speech becoming more prominent. For me, though, NUI will really begin to have an impact when we take advantage of the enormous amount of contextual information available, and I believe most people will recognize the benefits of NUI when it's at its most invisible: doing things on our behalf based on an understanding of us and the world around us. I know to some people that may sound creepy, but there will always be an off switch. Even I'll hit that button some days, as even I still cherish those moments without technology, almost as much as I look forward to it becoming more intelligent.
"And when I arrive at a hotel in a foreign country, surely in today’s age of connected devices I should expect more than a “hello Mr. Clayton” "
The sad reality is that most cloud-based offerings from Microsoft, Google and Apple will insist on addressing you in the language of the IP address you're using - even if you're logged in with specific language preferences. So there's a good chance you'll have absolutely no idea what the device will say to you, or indeed if it's trying to communicate with you at all.
Just to confirm, I logged into Bing Maps and the user interface switched to French (because of my IP address), despite my browser and account settings being English. (If I click the Settings icon, it says my display language is English; return to the app and I'm in French again. At least I can read French, but if I were in any other country I wouldn't know where to begin.) There's no obvious way to switch it to my language, and as I have often seen, travellers are driven to tears of frustration by the experience. They just want to check their travel connections online, but Microsoft/Apple/Google think they know better and present the information unintelligibly.
So it's all very easy to go on about the future, but there are very real steps that can be taken today to reduce frustration. Somewhat tragically, Microsoft has gotten worse at this over the last decade.
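One of those real steps is simply honoring explicit preferences before guessing from an IP address. A minimal sketch of that fallback order, with purely illustrative names (this is not any real Microsoft, Google or Apple API): account setting first, then the browser's Accept-Language header, then the GeoIP guess last.

```python
def parse_accept_language(header: str) -> list:
    """Return language tags from an Accept-Language header, highest q-value first."""
    langs = []
    for part in header.split(","):
        piece = part.strip()
        if not piece:
            continue
        if ";q=" in piece:
            tag, q = piece.split(";q=", 1)
            try:
                weight = float(q)
            except ValueError:
                weight = 0.0
        else:
            tag, weight = piece, 1.0
        langs.append((tag.strip(), weight))
    return [tag for tag, _ in sorted(langs, key=lambda item: -item[1])]

def resolve_language(account_pref, accept_language_header, geoip_default, supported):
    """Pick the first supported language, preferring explicit user settings."""
    candidates = ([account_pref] if account_pref else []) \
        + parse_accept_language(accept_language_header or "") \
        + [geoip_default]
    for tag in candidates:
        if tag in supported:
            return tag
    return geoip_default

# A logged-in English user on a French IP should still get English:
print(resolve_language("en-GB", "fr-FR,fr;q=0.9,en;q=0.5",
                       "fr-FR", {"en-GB", "fr-FR", "de-DE"}))  # → en-GB
```

With this ordering, the Bing Maps scenario above would never happen: the GeoIP guess only wins when the user has expressed no preference at all.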
Part of the problem is that Microsoft keeps moving its international staff to Redmond (as happened to me many years ago) and completely loses touch with the way software interacts with people outside of a West Coast USA environment, which is a rather unique one. You want NUI? Make sure it's not just USA-NUI, because that just doesn't work for most of the planet.
The reality of perceptual computing is very exciting. Thank you for the updates.