Eavesdropping With Amazon Alexa
Well, it was only a matter of time. Want to turn your Alexa into a spy-rig? Here's how it can be done, but be careful: Amazon already knows about it.
Amazon Echo (with its virtual assistant, Alexa) is the best-selling Intelligent Personal Assistant (IPA) ever, with over 31 million devices sold to date.
However, with its rise in popularity came one of the main concerns with IPAs: privacy — specifically, the fear of unknowingly being recorded.
So, naturally, as part of our research lab, we decided to toy with the idea of turning our Amazon Echo into a tapping device.
This quickly became a challenge: Amazon states that audio is streamed to the cloud only after the wake-up word ("Alexa") is detected. Therefore, the only option left was to try to turn the device into a recording device AFTER the wake-up word was detected.
Once the wake-up word is detected, Alexa launches the requested capability ("skill"). Skills are either built-in or installed from the Alexa Skills Store. We had to figure out how to build a "malicious" skill that looked harmless but, behind the scenes, would record and transcribe what the user said and send everything straight to the attacker.
Two challenges stood between us and our goal:
First, we needed a way to keep the recording session alive after the user received a response from the benign part of the skill, without any audible indication disclosing that we were still listening. This is not straightforward: normally, the Echo prompts between listening cycles, and otherwise the session ends after the response to protect users' privacy.
Second, we needed a way to accurately transcribe the voice received by the skill. Skills perform well when they are configured to accept a specific sentence format with placeholders ("slots") drawn from a closed list of values, such as colors, places, or movie names (e.g. "What is the weather in {City}"). Since we didn't want to limit ourselves to specific conversations, we had to find a way for the Echo to accept any text.
To accomplish this, we had to leverage — or rather abuse — three of Alexa's built-in features:
The first one was “shouldEndSession”:
This flag allows a session to stay alive for another cycle after Alexa reads the skill's text back as a response. The problem was that if Alexa read the text back, it would disclose the fact that we were still listening.
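For illustration, a skill's response can be sketched as a Python dict in the Alexa Skills Kit JSON response format (the helper name and handler wiring here are our own, not from the research):

```python
# Sketch of an Alexa Skills Kit JSON response, built as a plain dict.
# Setting "shouldEndSession" to False keeps the session open for
# another listening cycle after the response text is spoken.
def build_response(speech_text, keep_listening=True):
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "shouldEndSession": not keep_listening,
        },
    }

resp = build_response("The result is 42.")
print(resp["response"]["shouldEndSession"])  # False: session stays alive
```

The catch, as noted above, is that the response text is still read aloud before the next cycle begins.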
To overcome the “reading back” issue, we had to also use the “reprompt” feature:
Theoretically, “reprompts” suffer from the same “reading back” issue, but we found that Alexa’s API accepts “empty reprompts” such as:
In this case, a new listening cycle starts without letting the user know we are still listening.
And now to the third feature, which addresses our second challenge:
In order to listen to and transcribe arbitrary text, we used two tricks. First, we added a new slot type that captures any single word, not limited to a closed list of values. Second, to capture sentences of almost any length, we built a formatted string for each possible length.
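The second trick can be sketched as follows: generate one sample utterance per supported sentence length, each composed entirely of catch-all single-word slots (the slot names and the maximum length are illustrative assumptions, not values from the research):

```python
# Generate interaction-model sample utterances that capture sentences
# of (almost) any length: one pattern per length, where every slot is
# a catch-all single-word slot type.
def build_utterances(max_words):
    utterances = []
    for n in range(1, max_words + 1):
        # e.g. n=3 -> "{word1} {word2} {word3}"
        utterances.append(" ".join("{word%d}" % i for i in range(1, n + 1)))
    return utterances

for u in build_utterances(3):
    print(u)
# {word1}
# {word1} {word2}
# {word1} {word2} {word3}
```

A spoken sentence of n words then matches the n-slot pattern, and the skill receives each word as a slot value, which it can reassemble into a transcript.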
A demo can be seen here with three different scenarios. The first two use regular skills (in the example, asking for the time and then asking for the distance between Tel Aviv and New York). The third scenario shows a malicious skill, where the benign part is a calculator and the malicious part keeps listening, transcribing, and sending the attacker everything that is said. Note that the blue light on the device lights up while a session is alive.
To be fair, that shining blue light discloses that Alexa is listening. But then again, the whole point of an IPA is that, unlike a smartphone or tablet, you do not have to look at it to operate it. In fact, it is made to be placed in a corner, where users simply speak to it without actively looking in its direction. On top of that, with Alexa Voice Services (AVS), vendors are embedding Alexa capabilities into their own products, and those don't necessarily provide a visual indication when a session is running.
Status
The Checkmarx Research Lab disclosed this attack scenario to Amazon Lab126 and worked closely with their team to mitigate the risk. Some of the measures that were put in place are:
- Setting specific criteria to identify (and reject if necessary) eavesdropping skills during certification
- Detecting empty-reprompts and taking appropriate actions
- Detecting longer-than-usual sessions and taking appropriate actions
We’d like to thank the security team at Lab126 for the effective communication once the report got to them.
Published at DZone with permission of Maty Siman, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.