The History of Voice: From Audrey to Siri

Check Out Our Speech Infographic

Here at Wavelink, we’ve always had a special place in our hearts for speech recognition. It’s hardly surprising: we know first-hand the many benefits of voice recognition in the warehouse, including improved productivity, efficiency and safety. Since we introduced Speakeasy six years ago, we’ve watched the rise of applications like Siri and Google Voice Search, which have brought voice recognition to the masses. We thought it would be interesting to look back at the history of voice recognition and how it has evolved.

Before Siri, there was Audrey. Audrey was a speech recognition system developed by Bell Laboratories in the early 1950s. It was a pretty basic system and could only recognize the numbers one through nine. It also forced the speaker to pause between words, making it a bit cumbersome to actually use.

In the early 1960s, IBM improved on this with its “Shoebox” device, which could understand 16 words: 10 digits and 6 arithmetic commands. Needless to say, neither Audrey nor Shoebox was very portable, which makes them highly impractical by today’s standards. Still, considering the limited computing power available at the time, these were significant gains.


Speech recognition really started moving in the 1970s, when the Department of Defense took an interest in the field and funded research. Carnegie Mellon promptly came out with the “Harpy” system, whose vocabulary was far larger than that of earlier systems: it could recognize a little over 1,000 words, comparable to the vocabulary of a toddler. However, vocabulary was not the only limitation: speakers still had to pause between words for the computer to recognize them.

Following Harpy, speech recognition really began to take off with the introduction of the hidden Markov model (HMM), which would eventually become the basis for the voice recognition systems of IBM, Philips and Dragon Systems. Rather than matching sounds against a fixed set of stored templates, an HMM-based system estimates the probability that a sequence of unknown sounds corresponds to a given word. This statistical approach allowed a substantial expansion of the vocabulary a speech recognition system could handle, and it helped the technology find commercial applications, such as the Julie Doll by Worlds of Wonder, “the doll that understands you.”


With the growth of computers in the 1990s, and the explosion in processing power that made computers a practical commodity, came Dragon. DragonDictate was the first consumer speech recognition product, available for the jaw-dropping price of $9,000. A few years later, in 1997, the company released Dragon NaturallySpeaking, which removed the need to pause between words and allowed you to speak naturally. Although the price had dropped, it still cost $645, and users had to spend time training the program.


Things stalled a bit after that. There were a few false starts that never really took off: both Windows Vista and Mac OS X had speech recognition built in, though few users were aware of it. Accuracy had topped out at about 80 percent, and speech recognition seemed destined to become one of those novelty technologies that never finds its niche.

Then, in 2007, Speakeasy was launched. Shortly after, in 2008, Google Voice Search was released for the iPhone. The mobile interface was ideal for speech recognition, as we had discovered the year before with Speakeasy. On top of that, by offloading the processing needed for speech recognition to Google’s cloud data centers, the app had access to more computing power than ever, which produced more accurate matches between what the software transcribed and what was actually said. Personalized recognition was added in 2010, letting the software learn your voice and become more accurate than ever before.

And then: Siri. Introduced in 2011, Siri quickly took off. “She” also uses cloud-based processing for more accurate transcriptions, and has enough artificial intelligence and personality to make her fun to use. While usage has declined slightly since the novelty wore off, there’s no doubt that voice recognition has been brought to the masses.

What’s next?

I hope to see more voice-enabled devices. Many computers and almost every smartphone are already voice-enabled. As more everyday objects are connected through the Internet of Things, I expect they will eventually become voice-enabled as well. Imagine rifling through your fridge for creamer, noticing it’s running low and telling your fridge to add it to the grocery list while you pour the coffee. With the Internet of Things, the possibilities for voice control are nearly endless.
