Voice wars: Apple v Google v Microsoft
Asher Moses
October 27, 2011 - 2:16PM
Google says it's been doing voice search for years.
Voice is the new black when it comes to interacting with our gadgets - and Google and Microsoft aren't about to let Apple's Siri personal assistant hog all the limelight.
Yesterday, Microsoft switched on voice command capabilities for Kinect on the Xbox 360 for Australian users, allowing them to control games and, from mid-December, the games console itself, just by speaking.
Today, Google Australia released several videos to demonstrate its own voice search for Android, conducting voice searches in the middle of the desert and even underwater.
Australians can access voice search but Voice Actions, pictured above, are US only.
While he refused to even mention Siri by name, Mike Cohen, who leads Google's speech recognition efforts, said Google had for years offered many of the voice features found in Apple's smart voice recognition engine.
In fact, Google said it had seen voice inputs grow sixfold in the past year and every day the company received more than two years of non-stop speech input.
"We released voice search [for Android] three years ago and we released voice actions a year and a half ago," said Cohen in an interview with Fairfax Media from Google's headquarters in the US.
Apple's Siri personal assistant is one of the most talked about features on the iPhone 4S. Photo: Getty Images
"I've been talking about it [voice interaction] for twenty years."
Android voice slightly hobbled in Australia
The only problem is, the best voice features for Android haven't launched in Australia. While Australians have access to voice search - and a microphone icon built into Android's keyboard that can be used to enter text by voice rather than typing (even in third party apps) - voice actions is currently US-only.
Iris, the third-party Siri clone for Android.
Voice actions lets users control their phone by voice in a similar way to Siri. It can be used to send text messages to people, fire up music tracks, get directions, call businesses, send emails, view a map, go to websites and write notes.
Google Australia could not say when it would launch in Australia. However, Australians can still use voice commands while certain apps are open, such as the navigation app (e.g. "navigate to Potts Point").
Google's voice features aren't just limited to Android - it launched voice search on desktop for the Chrome browser in June this year.
Third-party Android developers have created a clone of Siri for Android, dubbed Iris, but early reviews suggest it is not yet as polished as Siri and isn't as competent at answering questions posed by the user.
On Windows Phone 7, the new 7.5 update brought voice commands, allowing users to open applications with voice, compose messages hands-free and search by voice.
Google's voice guru
Cohen joined Google in 2004 to spearhead its voice efforts. He had previously worked on voice-related projects for the US Defence Advanced Research Projects Agency (DARPA) and spent 10 years developing over-the-telephone spoken language applications at Nuance Communications, which he co-founded.
"The need for speech input is much greater now than it ever was before," he said, pointing out that we now used our phones for much more than just making calls.
"[Smartphone owners have] constant access to the internet, to search, to transactions, to everything that you used to be able to do on your desktop but now people want to do it all the time and therefore the need for speech technology is much greater."
But despite investing heavily in voice and employing, in Cohen's words, "big teams" of people building out voice functionality in Google's products, its recent messaging on the subject has been confused.
'You shouldn't be communicating with the phone'
Google's Android boss Andy Rubin recently dismissed Siri at a conference.
"I don't believe that your phone should be an assistant," he said. "Your phone is a tool for communicating. You shouldn't be communicating with the phone; you should be communicating with somebody on the other side of the phone."
This conflicts with Cohen's enthusiasm for voice and previous comments by Google executives, who have said they are working to turn our phones into our personal assistants. Cohen could not explain this mixed messaging.
"Whether it should be viewed as a personal assistant or not, I don't really care, what I care about is understanding end user needs and finding the best way to very seamlessly help them meet their needs," said Cohen.
Siri an 'existential threat' to Google?
Siri is considered a serious threat to Google as it brings Apple further into search, which is Google's core business.
Gary Morgenthaler, who was on the board at Cohen's previous company, Nuance, and was the first investor in Siri, said Siri was far and away better than other voice recognition systems as it not only had a top speech recognition engine but also "natural language understanding and various pieces of artificial intelligence that allows it to understand what you meant rather than simply recognise words and convert it into text".
"When you search, what you want back is not a million blue links. What you want back is one correct answer," Morgenthaler told CNET.
"Siri, because it has the semantic layer, is not just responding to keywords; it's responding to a conceptual understanding of what it is that you said. And therefore it's able to retrieve for you exactly the right information you want.
"Or, better still, if you intend to do something with that information - to make a transaction, say - Siri could take you all the way to that transaction. That's fundamentally new and fundamentally different and it is potentially very disruptive to the search industry."
Morgenthaler said Apple did not roll this out initially but Siri was capable of conducting transactions on behalf of the user such as buying a book on Amazon or sending someone flowers. This would likely appear in future updates.
Asked to comment on this perceived "existential threat" to Google, Cohen was dismissive.
"Gary was an investor in Siri, that's why he's good at promoting his stuff," said Cohen.
Cohen pointed out that Google was "one of the greatest natural language understanding engines that's ever come about" and could instantly answer almost any question, such as "what time is it in Tokyo" or "what's the weight of a rhinoceros".
How does it all work?
So how do engineers get computers to understand and interpret what we are saying? Cohen said the underlying approach was data-driven machine learning.
"We develop algorithms that learn from data and then based on that we then have to find lots and lots of representative data to feed to the systems so they can learn," he said.
"As you have more and more data you gradually cover loads of different accent types, all the different pronunciations, all the different word combinations ... so over time our systems learn about all the variations in spoken language."
Read more: http://www.smh.com.au/digital-life/mobiles/voice-wars-apple-v-google-v-microsoft-20111027-1mldj.html