How does Amazon Echo (Alexa) know what is said to it?

GMartin used Ask the Experts™
Hello and Good Afternoon Everyone,

            I am wondering how Amazon Echo (Alexa) understands human speech and is able to respond with answers.


Systems Analyst & Webmaster
The technology for computers to understand voice commands has been around for YEARS. Basically, the device has a microphone built into it, and when it hears a voice, it processes that audio based on frequency patterns to identify words, phrases, numbers, etc. Essentially, it's a glorified speech-to-text converter that runs the resulting text through a search engine.
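To make that "speech-to-text converter feeding a search engine" idea concrete, here is a deliberately toy sketch. The `transcribe` function is a stand-in (real systems use acoustic and language models, not a lookup table), and the frame values and knowledge base are made up for illustration:

```python
# Toy sketch: fake speech-to-text, then a crude "search engine" over answers.

def transcribe(audio_frames):
    """Pretend speech-to-text: map each fake audio frame to a word.
    A real recognizer matches frequency patterns, not exact values."""
    fake_acoustic_model = {0x1A: "what", 0x2B: "time", 0x3C: "is", 0x4D: "it"}
    return " ".join(fake_acoustic_model[f] for f in audio_frames)

def search(query, knowledge_base):
    """Crude search: return the stored question whose words overlap most."""
    q = set(query.split())
    return max(knowledge_base, key=lambda k: len(q & set(k.split())))

kb = {
    "what time is it": "It is 3:00 PM.",
    "what is the weather": "Sunny.",
}
text = transcribe([0x1A, 0x2B, 0x3C, 0x4D])
answer = kb[search(text, kb)]
```

The point is only the two-stage shape: audio becomes text, and text drives an ordinary lookup.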
William is correct; an explanation and an update on the types of models are at:
1. Basic Explanation -
2. The Latest -

Since computing "horsepower" increases as costs decrease (i.e., Moore's Law), difficult, resource-intensive tasks like voice commands, robotics and the like now get processed much more quickly, and some are effectively instantaneous ... when they used to be impossible, very slow, or only feasible on very expensive computers.
William Fulks, Systems Analyst & Webmaster

It works the same way your smartphone can respond to voice commands or even identify music. If you think of the physical representation of sound, it is a wave with a complex set of measurements that differentiate tones and such, so that the number 4 and the letter X will look different when shown as a waveform. With this info you can create a database saying "word" matches whatever criteria, and then you're just running a search.
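A minimal sketch of that waveform-matching idea, using a handful of DFT magnitudes as a crude frequency "signature" and nearest-neighbour lookup. The word labels and test tones here are invented for illustration; real recognizers use far richer features:

```python
import math

def fingerprint(samples, bins=4):
    """Magnitudes of the first few DFT bins: a crude frequency signature."""
    n = len(samples)
    mags = []
    for k in range(1, bins + 1):
        re = sum(s * math.cos(2 * math.pi * k * t / n) for t, s in enumerate(samples))
        im = sum(-s * math.sin(2 * math.pi * k * t / n) for t, s in enumerate(samples))
        mags.append(math.hypot(re, im))
    return mags

def closest_word(samples, database):
    """Nearest-neighbour match of a fingerprint against stored entries."""
    fp = fingerprint(samples)
    def dist(entry):
        return sum((a - b) ** 2 for a, b in zip(fp, entry[1]))
    return min(database.items(), key=dist)[0]

def tone(freq, n=64):
    """A pure sine wave at `freq` cycles per window, as a stand-in for speech."""
    return [math.sin(2 * math.pi * freq * t / n) for t in range(n)]

# Hypothetical "database": each word maps to a stored signature.
db = {"four": fingerprint(tone(2)), "x": fingerprint(tone(3))}
```

Calling `closest_word(tone(2), db)` finds the entry whose stored signature is nearest, which is the same "match measurements against a database, then search" flow described above.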

This is why people with strong accents will sometimes have issues with voice commands: the system hears certain words as they are actually spoken and doesn't take the accent into account. Some programs, like Dragon NaturallySpeaking, will learn from you the more you use it, effectively building a custom database based on your own voice and manner of speech. I know some disabled folks who use this for writing, etc.

They also have a larger database of phrases now to help with more natural speech patterns.  Early Dragon NaturallySpeaking (back in the late 90s) had you... talk... one... word... at... a... time... and... required... a... break... between... each... word....
If you didn't leave pauses, it would mess up.  These days, our computers hold much more data and can process them faster as well.


Hello and Good Afternoon Everyone,

         Thank you so very much for the enlightening feedback given in reply to my question.  I have to admit that I thoroughly enjoyed reading each person's shared thoughts and certainly did learn a great deal from this participation.


I hate to do this, but sorry William,  that answer is not correct for an Echo device.

An Amazon Echo only listens for its name, which it can usually recognize by simple pattern matching. Until it hears its wake word, it throws away all other sounds.
When it hears its name, it records the voice that comes after that until a reasonable pause is heard, and streams that voice clip up to the Amazon servers.
Then the Amazon servers use very fast voice recognition and translation software to do the voice-to-word conversion, creating a string of the words it heard. None of that is done on the Echo.
That string of words is then sent to a parser (in Amazon's cloud) to determine the best match to what you asked for.
Then, the Amazon servers send back a series of instructions and voice response info to do what you asked.

The actual voice to word conversion does not take place inside the Echo. This allows the processor to be lower speed and power, and allows the full power of high end servers to do the conversion.
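The steps above can be simulated in a few lines. This is a toy model, not Amazon's actual firmware: audio "frames" are already words, the pause is a sentinel value, and `cloud_speech_to_text` stands in for the server-side recognizer:

```python
# Toy simulation of the Echo-style flow: discard audio until the wake word,
# buffer until a pause, then hand the clip to a "cloud" stub.

WAKE_WORD = "alexa"
PAUSE = None  # stands in for a stretch of silence

def cloud_speech_to_text(clip):
    """Stand-in for the server-side recognizer; trivial here because our
    fake audio frames are already words."""
    return " ".join(clip)

def echo_loop(audio_stream):
    buffering = False
    clip = []
    for frame in audio_stream:
        if not buffering:
            if frame == WAKE_WORD:       # local pattern match only
                buffering = True         # everything before this was discarded
        elif frame is PAUSE:             # a reasonable pause ends the utterance
            return cloud_speech_to_text(clip)  # "streamed" to the servers
        else:
            clip.append(frame)
    return None  # wake word never heard: nothing leaves the device

stream = ["chatter", "tv", "alexa", "what", "time", "is", "it", PAUSE, "more"]
```

Running `echo_loop(stream)` shows the division of labor: the device only gates and buffers; the heavy recognition happens in the stub that represents the cloud.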

Siri and Google do the same things. The conversion of voice to words is typically not done on the phones. That is why an internet connection is required to use those services. It is also why Amazon Alexa will tell you it cannot understand you when the internet is down.
William Fulks, Systems Analyst & Webmaster

Owen, his question was how it understands human speech, not where the processing takes place. You are correct that it is a voice-activated interface for a cloud-based application. Same for phones and the like. My answer isn't incorrect, though.

Ok, I guess we read that differently. On a basic level, I agree, your answer explains speech-to-word conversion. But he asked how an Amazon Echo works, and to be clear, it does not do the conversion itself.
To the layperson, it doesn't matter.  The modern computer eventually will evolve and we'll call cloud based systems a computer system.

On the other hand, I have been nitpicked to death at times here when not putting in full details and trying to give a simple answer. Sorry, but if you are going to answer a question here, why not be as accurate as possible?  There is a big difference NOW between a device doing all the work and a device sending the work to the cloud. In this particular case, the answer does not explain why the Echo stops working and understanding when the network is lost. Since points were already awarded, I was simply trying to add more accurate details. Sorry if some of you are offended by an attempt to make an answer more accurate.
I know this question has been marked as SOLVED (so I'm not adding further comment just to try and gain points or anything), but:

@serialband - "The modern computer eventually will evolve and we'll call cloud based systems a computer system."

That's not the modern computer evolving really, and I thought we already DID call 'cloud' systems a computer system.... because that's what they are!
Or have I mis-interpreted the post?
@IT-Expert - I think what he's saying is, in the past, we have referred to the computer as the actual hardware, etcetera which is physically in our house. It did all computing without having to "go for assistance outside the room." Then we started offloading specific processes to math coprocessors, video cards, etcetera but they all still resided at the same address, on the same piece of motherboard with no outside assistance. When one says "computer system" today, most non-technical people still only think of the physical box at the physical address as the "system."

In the future, with Alexa being a good example, the systems at my physical address will be "smart enough" to get started and then offload the remaining computing process(es) to a more powerful system at another physical address via "the cloud" (which is nothing more than a consumerized name for remarketing the Internet) then bring the answers back to my physical address. As this becomes "more publicly understood", saying "computer system" will simply mean "the stuff that gets me answers / results regardless of location."

Sorry for the long answer but that's what I heard when he posted that statement ... an evolution of understanding for the non-technical people using the "plug and play devices" like Alexa; plug it in, put it on Wifi and get me my stuff ...
