Microsoft succeeds in developing speech recognition software more accurate than humans

In December last year, Xuedong Huang, Microsoft's chief scientist of speech, told Business Insider that within the next four to five years, computers would be just as good as humans at recognizing the words people say.

Barely 10 months later, the company has come up with a system it claims can transcribe a phone conversation with the same number of errors as, or even fewer than, trained professional transcribers.

This is a big milestone for the company, coming just as Apple's AirPods and Amazon's Echo have shown that voice is set to play a major role in changing the face of technology. It seems fair to say that what Huang predicted has been accomplished.

Geoffrey Zweig, a principal researcher at Microsoft, said the team achieved the feat sooner than expected with the help of acoustic technology and artificial intelligence.

Back in 1990, the National Institute of Standards and Technology (NIST) released a set of recorded phone conversations in Spanish, English, and Mandarin, called "Switchboard," which it used to keep research in speech recognition free and fair. Since everyone has to work from the same data, there is no way to cheat.

Ever since, companies like Microsoft, Google, and IBM have used Switchboard to benchmark the accuracy of their speech recognition software.

Zweig said a phone conversation is perfect for the test because, in real life, people tend to cough, mumble, mutter, and stumble over words, all of which makes automatic transcription very demanding.

In a blog post last month, September, Huang said the company had achieved a 6.3% error rate on the Switchboard test, the best result yet in the industry. That is slightly above the 5.9% error rate that professional transcribers make on average.
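For context on what those percentages measure: speech recognition accuracy on benchmarks like Switchboard is conventionally reported as word error rate (WER), the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the length of the reference. A minimal sketch of that calculation, using a standard Levenshtein alignment over words (this is an illustrative implementation, not Microsoft's scoring code):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference,
    computed via a word-level Levenshtein (edit distance) alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,        # deletion
                dp[i][j - 1] + 1,        # insertion
                dp[i - 1][j - 1] + cost, # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)

# One missed word out of six in the reference -> WER of 1/6, about 16.7%
print(word_error_rate("the cat sat on the mat", "the cat sat the mat"))
```

A 6.3% error rate, then, means roughly 6 words wrong, missing, or spurious for every 100 words actually spoken.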

The company also did something that, in Zweig's opinion, no one else had done: it gave the Switchboard test to professional transcribers so their results could be compared with the software's.

He suggested that the reason no one had thought of doing this before was that it seemed unimaginable for software to beat human accuracy, which, he added, has now been done and verified by NIST.

What this means is that the company has come up with a system that outdid humans at speech recognition. In the short term, that system should make Microsoft's virtual assistant, Cortana, a whole lot better at understanding people. In the longer term, Zweig noted, the company is working to adapt the system to other situations.

For now, it is built to listen to a stable conversation over a landline telephone. The next step is to tweak the technology to understand people even when they are in a noisy place, such as a McDonald's drive-thru or an echo-y conference room.

The algorithms will continue to improve as they learn from humans in different situations, Zweig says.