False Media Wake Suppression

False Media Wake Suppression refers to mechanisms that prevent Alexa-enabled devices from waking in response to speech from media sources that contains the word "Alexa". These media-induced wakes can come from a variety of sources, such as radio, TV, and online videos. Such "false wakes" can result in unintended audio streaming to the cloud and, ultimately, a poor customer experience and privacy breaches. There are several mechanisms in place to prevent them: some run in the cloud, and some run locally on the device.

Cloud Media Wake Suppression

The default line of defense against media-induced wakes is performed in the cloud, and every utterance from every Alexa device is subject to it. The wake word portion of each stream sent to Alexa services is compared against a large database of known media occurrences of the word "Alexa". If a media-based wake is detected, the cloud immediately sends a StopCapture directive to the device to shut down the stream and ignore any subsequent audio.
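The cloud-side check described above can be sketched roughly as follows. This is a minimal illustration and not the actual Alexa service logic: the function names, the use of an exact hash as the matcher, and the in-memory set standing in for the media database are all assumptions made for this example. The directive fields follow the general shape of an AVS directive and should be treated as illustrative.

```python
import hashlib
import uuid


def segment_key(wake_word_audio: bytes) -> str:
    """Hypothetical stand-in for a real acoustic matcher: an exact content hash."""
    return hashlib.sha256(wake_word_audio).hexdigest()


def check_wake_segment(wake_word_audio: bytes, known_media_keys: set):
    """Return a StopCapture-style directive if the segment matches known media.

    Returns None when no match is found, meaning the stream continues as normal.
    """
    if segment_key(wake_word_audio) in known_media_keys:
        return {
            "directive": {
                "header": {
                    # Illustrative header values (assumed for this sketch).
                    "namespace": "SpeechRecognizer",
                    "name": "StopCapture",
                    "messageId": str(uuid.uuid4()),
                },
                "payload": {},
            }
        }
    return None


# Usage: one known media occurrence in the (hypothetical) database.
tv_ad = b"tv-ad-saying-alexa"
media_db = {segment_key(tv_ad)}
directive = check_wake_segment(tv_ad, media_db)
```

A production matcher would need to tolerate noise, room acoustics, and playback distortion, which an exact hash cannot do. The hash is used here only to keep the control flow visible.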

This approach is effective because the cloud can store large databases of known media and run more complex detection algorithms. However, cloud-based methods have a drawback: the device itself still wakes. The blue ring (or other audio/visual indicator) is activated on the device, a small portion of audio is sent to the cloud, and there is a slight delay until the stream is terminated. To detect and prevent these media wakes more quickly and securely, the Wake Word engine has features that can provide similar detection performance on the device itself.

Device Media Wake Suppression

There are two methods of suppressing media-induced wakes on the device. Both methods can detect and suppress the media wake in real time as the wake word is processed, preventing any audio from streaming to the cloud and also preventing any audio/visual indication of audio streaming to the cloud. For these reasons, these device-side media wake rejection mechanisms provide the best customer experience.

Fingerprinting

Fingerprinting uses a local database of acoustic "fingerprints" to compare against the captured wake word in real time on the device. For more information, see the Fingerprinting section in this guide.
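As a rough illustration of the idea, the sketch below builds a toy fingerprint from frame-energy changes and compares it against a local list of known fingerprints using Hamming distance. The fingerprint design, the function names, the frame size, and the distance threshold are all assumptions chosen for this example. The actual Wake Word engine's fingerprint format and matching algorithm are not described here.

```python
def fingerprint(samples, frame_size=4):
    """Toy fingerprint: one bit per frame boundary, set when frame energy rises."""
    energies = [
        sum(s * s for s in samples[i:i + frame_size])
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]
    return [1 if later > earlier else 0
            for earlier, later in zip(energies, energies[1:])]


def hamming_fraction(fp_a, fp_b):
    """Fraction of differing bits; 1.0 if the fingerprints are not comparable."""
    if len(fp_a) != len(fp_b) or not fp_a:
        return 1.0
    return sum(a != b for a, b in zip(fp_a, fp_b)) / len(fp_a)


def is_media_wake(samples, local_db, max_distance=0.2):
    """True if the captured wake word is close to any fingerprint in the local database."""
    captured = fingerprint(samples)
    return any(hamming_fraction(captured, known) <= max_distance
               for known in local_db)


# Usage: a known media clip, a louder replay of it, and an unrelated capture.
media_clip = [0, 1, 0, 1, 2, 3, 2, 3, 1, 0, 1, 0, 4, 4, 4, 4, 0, 0, 0, 0, 5, 5, 5, 5]
local_db = [fingerprint(media_clip)]
louder_replay = [1.1 * s for s in media_clip]
live_speech = [5, 5, 5, 5, 4, 4, 4, 4, 3, 3, 3, 3, 2, 2, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0]
```

Because the bits encode only whether energy rises or falls between frames, a uniformly louder replay of the same clip yields the same fingerprint, while the unrelated capture does not match.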

Watermarking

Watermarking is similar to photo watermarking: it embeds an inaudible "key" into the media's audio. The device can detect this key in real time as well, which provides a further means of preventing these media-induced wakes. For more information, see the Watermarking section in this guide.
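One common way to embed and detect a key in audio is a spread-spectrum scheme, sketched below. This is not necessarily the technique Alexa uses. The function names, the integer key, the embedding strength, and the detection threshold are assumptions for illustration, and this toy version makes no attempt at psychoacoustic shaping, so it does not guarantee inaudibility.

```python
import random


def key_sequence(key, length):
    """Pseudo-random +/-1 sequence derived from an integer key (hypothetical scheme)."""
    rng = random.Random(key)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]


def embed(samples, key, strength=0.05):
    """Add a low-amplitude copy of the key sequence to the audio samples."""
    seq = key_sequence(key, len(samples))
    return [s + strength * c for s, c in zip(samples, seq)]


def detect(samples, key, threshold=0.025):
    """Correlate the audio with the key sequence; a high score suggests a watermark."""
    seq = key_sequence(key, len(samples))
    score = sum(s * c for s, c in zip(samples, seq)) / len(samples)
    return score > threshold


# Usage: watermark a silent host signal so the correlation is easy to follow.
host = [0.0] * 2000
marked = embed(host, key=1234)
```

With a silent host, the correlation score for the correct key equals the embedding strength and the score for unmarked audio is zero. Real program audio adds its own correlation noise, so a practical detector needs longer sequences or a more careful threshold.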