Watermarking Overview

Similar to a visual watermark, an audio watermark is a short audio signal embedded in a larger audio file that can be used to identify the host audio. The watermark is encoded into the host audio, and at a later time the audio can be decoded to determine whether a watermark is present. As part of the on-device wake word engine, the Amazon Wake Word Engine is capable of detecting these watermarks. If a watermark is present when the engine hears the word "Alexa", the wake word is suppressed.
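The embed-then-detect idea can be illustrated with a toy sketch. This is not the production algorithm; the pattern, embedding strength, and correlation threshold below are all invented for illustration. It embeds a known low-amplitude pseudorandom pattern into host audio and detects it by correlating the received signal against that pattern.

```python
import random

random.seed(7)

PATTERN_LEN = 2048
EPSILON = 0.05  # illustrative embedding strength (kept small for inaudibility)

# Hypothetical watermark pattern: a fixed +/-1 pseudorandom sequence
# shared by the embedder and the detector.
pattern = [random.choice((-1.0, 1.0)) for _ in range(PATTERN_LEN)]

def embed(host):
    """Add the low-amplitude watermark pattern to the host audio samples."""
    return [h + EPSILON * p for h, p in zip(host, pattern)]

def detect(signal, threshold=0.5):
    """Correlate against the known pattern; high correlation => watermarked."""
    score = sum(s * p for s, p in zip(signal, pattern)) / len(pattern)
    return score / EPSILON > threshold

# Stand-in "host audio": zero-mean noise samples.
host = [random.gauss(0.0, 0.2) for _ in range(PATTERN_LEN)]
print(detect(embed(host)))  # watermarked copy: True
print(detect(host))         # clean copy: False
```

A real detector must also survive playback through loudspeakers, room acoustics, and background noise, which is what makes the production design far harder than this sketch.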

Our audio watermarks are designed to be inaudible while remaining robust across different environments. This means the watermark is still detectable when watermarked advertisements are played on a variety of televisions, in different home living room acoustics, and over noisy backgrounds.

More information can be found in this Alexa Science blog post.

Audio watermarks (red squiggles) are embedded imperceptibly in a media signal (black). Each watermark consists of a repeating sequence of audio building blocks (colored shapes). A detector segments the watermark and aligns the segments to see if they match. Randomly inverting the building blocks prevents rhythmic patterns in the media signal from triggering the detector; the detector uses a binary key to restore the inverted blocks.
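The segment-align-invert scheme described above can be sketched in a few lines. This is a simplified illustration, not the real implementation: the block shape, block and key sizes, and threshold are invented, and each "building block" is reduced to a short +/-1 chip sequence. The detector restores the keyed inversions, averages the restored segments, and checks that they agree with each other; a merely rhythmic (repeating but un-keyed) signal fails this check.

```python
import random

random.seed(42)

BLOCK_LEN = 64
NUM_BLOCKS = 8

# One repeating "building block" (hypothetical chip sequence).
block = [random.choice((-1.0, 1.0)) for _ in range(BLOCK_LEN)]

# Example binary key: 1 means the block was inverted at embed time.
key = [0, 1, 1, 0, 1, 0, 0, 1]

def make_watermark():
    """Repeat the building block, inverting where the key says to."""
    out = []
    for bit in key:
        sign = -1.0 if bit else 1.0
        out.extend(sign * s for s in block)
    return out

def detect(signal, threshold=0.9):
    """Segment, undo the keyed inversions, and check the segments agree."""
    segments = []
    for i, bit in enumerate(key):
        seg = signal[i * BLOCK_LEN:(i + 1) * BLOCK_LEN]
        sign = -1.0 if bit else 1.0
        segments.append([sign * s for s in seg])
    # Average the restored segments and correlate each one against the
    # average; matching segments reinforce, mismatched segments cancel.
    avg = [sum(col) / NUM_BLOCKS for col in zip(*segments)]
    score = sum(
        sum(a * s for a, s in zip(avg, seg)) / BLOCK_LEN for seg in segments
    ) / NUM_BLOCKS
    return score > threshold

watermarked = make_watermark()
rhythmic = block * NUM_BLOCKS   # repeating pattern without keyed inversions
print(detect(watermarked))      # True: segments align once inversions undone
print(detect(rhythmic))         # False: the key's inversions break alignment
```

The last case shows why the random inversions matter: without them, any sufficiently rhythmic media signal could mimic a repeating watermark and trigger the detector.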

Dependencies

Note

DAVS is a prerequisite for watermarking, because new watermarks must be deployable rapidly to Alexa devices in the field.

Resource Requirements

Watermarking resource requirement estimates for armv7a:

Memory (KB)    CPU (MIPS)
70             32

Note

These are estimates only, based on a typical ARM Cortex A53 processor. Exact numbers will vary on other processors.

FAQ

See the Watermarking section of the FAQ.