
FAQ


This section addresses frequently asked questions related to the Amazon Wake Word Engine.

General Questions

What's the difference between V1 API and V2 API?

The V2 API lays the foundation for supporting media wake suppression and other features, and requires minimal integration effort. The V2 API is not backwards compatible with the V1 API, which will be deprecated in the future. libpryon_lite-PRL2000 uses the new PryonLite V2 API and is a replacement for libpryon_lite, which uses the V1 API.

There are many models under the models folder, which one do I use?

See Model Selection.

There are multiple binaries (libpryon_lite.a, libpryon_lite.so, amazon_ww_filesim) in the package, which one should I use?

See Release Contents.

Why is the wake word model performance worse on the device compared to results from simulation using filesim?

Differences in audio front ends can cause differences in model performance. Our models are trained using data with background noise, and some processing algorithms (e.g. noise suppression) ahead of the wake word engine may adversely affect the wake word and Automatic Speech Recognition (ASR) performance.

Under device playback, we see very high False Reject Rate (FRR), what can be done?

Although the same detection threshold value does not correspond to the same performance for all models, you can tweak the threshold to adjust the overall sensitivity of the engine based on the device/front end in use; see "How do I set the wake word detection threshold?" below. Client properties with model-specific threshold overrides for the device playback state can be a better solution, but as of today we do not have these override thresholds configured for all models.

How do I enable Low Latency?

The "lowLatency" mode is valid only for "U" class models. This configuration parameter reduces detection latency by 225 ms on average. It is disabled by default; to enable it, set PryonLiteDecoderConfig.lowLatency = true. The lower detection latency comes at the cost of less accurate wake word end indices.
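The configuration change can be sketched as follows. The struct here is a simplified stand-in for illustration only; the real PryonLiteDecoderConfig in the SDK has many more fields, and you would set this before decoder initialization:

```c
#include <stdbool.h>

/* Simplified stand-in for the SDK's PryonLiteDecoderConfig; the real
 * struct has many more fields. Only the lowLatency flag is shown. */
typedef struct {
    bool lowLatency;
} PryonLiteDecoderConfig;

/* Enable lowLatency before decoder initialization. Valid only for "U"
 * class models; trades wake word end-index accuracy for roughly 225 ms
 * lower average detection latency. */
static void enable_low_latency(PryonLiteDecoderConfig *config)
{
    config->lowLatency = true;
}
```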

How do I set the wake word detection threshold?

The Amazon PryonLite engine uses a detection threshold to trade off the False Accept (FA) rate against the False Reject Rate (FRR). The default is 500; 1 is most sensitive (lowest FRR, highest FA) and 1000 is most restrictive (lowest FA, highest FRR). If you have representative audio from the device’s AFE, we generally recommend testing in steps of 50 around 500 (500, 450, 550, 400, 600, 350, 650) to see which gives the best balance of FA/FRR. Most devices should land near 500 +/- 100-150. To set the detection threshold, update wakewordConfig.detectThreshold in main() or call PryonLiteWakeword_SetDetectionThreshold() any time after decoder initialization. Please refer to Detection Threshold.

V2 API - Examples are provided in api_sample_PRL2000.cpp (or for PRL1000, PRL5000):

```c
wakewordConfig.detectThreshold = 500; // default threshold
status = PryonLiteWakeword_SetDetectionThreshold(sHandle.ww, NULL, detectionThreshold);
```

V1 API – Examples are provided in api_sample.cpp:

```c
config.detectThreshold = 500; // default threshold
status = PryonLiteDecoder_SetDetectionThreshold(sDecoder, NULL, detectionThreshold);
```
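The sweep order recommended above (steps of 50 around 500) can also be generated programmatically when scripting threshold experiments. This helper is purely illustrative and not part of the PryonLite API:

```c
/* Generate the recommended threshold sweep order around a center value:
 * center first, then alternating below/above in fixed steps, e.g.
 * 500, 450, 550, 400, 600, 350, 650 for center=500, step=50. */
static int sweep_thresholds(int *out, int count, int center, int step)
{
    int n = 0;
    out[n++] = center;
    for (int i = 1; n < count; ++i) {
        out[n++] = center - i * step;     /* more sensitive side */
        if (n < count)
            out[n++] = center + i * step; /* more restrictive side */
    }
    return n; /* number of thresholds written */
}
```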

How do I interpret errors?

PryonLite library API functions return PryonLiteStatus, which has two fields: publicCode and internalCode. publicCode is of type enum PryonLiteError (defined in pryon_lite_error.h), and internalCode is an internal error code. For the public error codes, refer to pryon_lite_error.h in the package, under the folder for the architecture you are using (e.g. armv7a-linaro820). The internal error code carries additional diagnostic detail.
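A minimal sketch of logging both codes together, which is what support needs. The struct here is a simplified stand-in with the field names described above; the real definition lives in the SDK headers:

```c
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for the SDK's PryonLiteStatus; field names
 * publicCode/internalCode match the description above. */
typedef struct {
    int publicCode;   /* enum PryonLiteError, see pryon_lite_error.h */
    int internalCode; /* internal diagnostic code */
} PryonLiteStatus;

/* Format both codes into one log line; support tickets should always
 * include both. Returns the number of characters written. */
static int format_status(PryonLiteStatus s, char *buf, size_t n)
{
    return snprintf(buf, n, "PryonLite error: public=%d internal=%d",
                    s.publicCode, s.internalCode);
}
```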

Important

When submitting a support ticket related to errors, be sure to provide both error codes to help diagnose the problem as quickly as possible.

For example, the most common error when calling PryonLite_GetModelAttributes() is 8 (PRYON_LITE_ERROR_MODEL_INCOMPATIBLE). Most often the solution is to ensure that the engine and the model are compatible. The error can be caused by mixing models and engine binaries from different versions, or by using model binaries built for one architecture with another (e.g. using x86 model binaries with the aarch64-linaro541 architecture). See the question "There are many models under the models folder, which one do I use?"

The device wakes up to itself when the speech includes "Alexa". How can this be prevented?

This type of false positive is known as a "self-wake". Typically, we would expect an acoustic echo canceler (AEC) to remove most or all of the echo in the microphone signal and thereby reduce self-wakes. Please refer to Self-Wake Mitigation Overview.


Fingerprinting

How many commercials are supported?

Different versions of the list (small, medium, large) are available for download. Devices are programmed to ask for the right size based on their storage capacity.

How often are the lists updated?

The media list is compiled and updated on a weekly basis. The devices contact DAVS services periodically and check whether a new list is available. If there is a new list, the device will download it.

Do I need the engine with V2 API before I can use fingerprinting?

Yes. If you are using the PryonLite engine with the V1 API, you need to switch to the V2 API. The effort is minimal and should take about a week.


VAD

Does using VAD to gate Wakeword functionality increase the accuracy of the WW Engine?

No. Its purpose is power saving only. EnergyDetection specifically degrades the accuracy of the WW Engine very slightly, typically a 0.5-1.0% relative change in FAR/FRR, depending on the model being used.

Can I increase or decrease the sensitivity of VAD?

No. Currently the sensitivity is hard-coded inside the engine.


Cascade Mode

How does a customer make a request for models to run in cascade configuration?

Please reach out to your AVS Solution Architect to make a new package request. Cascade or stand-alone model(s) will be provided based on the information entered in the request.

What is the minimum CPU and memory required to run the first-stage detector?

Around 20 MIPS and 125 KB of memory are required to run the smallest ultra-low-power model.

Does VAD come built in with the first-stage detector?

Yes, it does, but it is disabled by default. The customer is also free to use their custom VAD instead if preferred.

Do I need to transmit Pre-Roll from the first stage to the second stage?

Yes; in fact, you will need to transfer more than 500 milliseconds of audio to downstream stages.

For devices that implement a 'cascade architecture' for low-power wake word detection, the final device-side wake word verification stage must meet pre-roll AVS device-to-cloud streaming requirements. So all upstream stages - in particular, the first-stage ultra-low-power wake word detector - must forward sufficient pre-roll for the final stage to be guaranteed to have the pre-roll it needs.

It is recommended that a cascade architecture be implemented such that the pre-roll from the first stage be variable via run-time configuration from a minimum of 500 milliseconds, to upwards of 700 milliseconds, for a final on-device wake word verification stage that applies only wake word verification. If the final verification stage also applies on-device media-induced wake suppression techniques like fingerprint matching, or watermark detection, those features may require upwards of two to three seconds of pre-roll. Contact the PryonLite team for the latest requirements specific to your device integration.
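The pre-roll sizing above reduces to simple arithmetic. A sketch, assuming 16 kHz 16-bit mono PCM (adjust the constants for your device's actual audio format):

```c
/* Sketch: sizing a first-stage pre-roll buffer.
 * Assumes 16 kHz, 16-bit mono PCM; these are assumptions, not
 * requirements stated by the SDK headers. */
#define SAMPLE_RATE_HZ   16000
#define BYTES_PER_SAMPLE 2 /* 16-bit PCM */

/* Bytes of audio needed to hold preroll_ms milliseconds of pre-roll. */
static unsigned long preroll_bytes(unsigned preroll_ms)
{
    return (unsigned long)preroll_ms * SAMPLE_RATE_HZ / 1000 * BYTES_PER_SAMPLE;
}
```

For example, the 500-700 ms range recommended above corresponds to 16000-22400 bytes per channel at this format.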


WWDI

Why should my device implement WWDI?

WakeWord Diagnostic Information (WWDI), also known as metadata, is used for monitoring wake word (WW) engine health in the field. WWDI includes WW engine/model versions, WW engine state, and WW detection threshold. Without this data we are blind when debugging customer issues related to WW detection, on-device fingerprinting, speaker ID, and other WW related features. WWDI also provides data for tracking of model performance in the field. For more information please see: Wake Word Docs WWDI

What are start and end indices for Cloud-Based Wake Word Verification?

Start index and end index are part of the SpeechRecognizer Recognize event HTTP request (see the Sample Message). The start index is the offset in the audio stream, in samples, at which the wake word starts, and it should be accurate to within 50 msec of wake word detection. We require 500 msec of pre-roll before the wake word, which at the 16 kHz stream sample rate results in a start index of 8000. During certification, the start index and end index are verified to ensure 500 msec of pre-roll is available in the audio stream. Please refer to Streaming Requirements for Cloud-based Wake Word Verification.
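The relationship between pre-roll duration and start index is straightforward; a sketch, assuming a 16 kHz stream (which is what makes 500 msec of pre-roll correspond to 8000 samples):

```c
/* Sketch: converting pre-roll duration to a start index in samples.
 * The 16 kHz rate is an assumption about the stream format. */
#define WW_SAMPLE_RATE_HZ 16000

/* Start index (in samples) for a wake word preceded by preroll_ms of audio. */
static unsigned long start_index_samples(unsigned preroll_ms)
{
    return (unsigned long)preroll_ms * WW_SAMPLE_RATE_HZ / 1000;
}
```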

What if I leave the WWDI part of recognizer event empty?

Leaving the WWDI part empty will lead to AVS certification failure. WWDI is used for understanding WWE performance and assessing the health of devices in the field. Not implementing WWDI, or modifying any content intended for WWDI, hampers diagnostic efforts. Implementing WWDI is required as specified in the license.txt in the WWE package.

Do I have to integrate WWDI even for the first stage wake word detector in a cascade architecture?

Currently, it is not mandatory for the first stage wake word detector to send its WWDI to the second stage wake word detector. However, it is still recommended to send the first stage WWDI to the second stage as it would enable smoother integration of upcoming features that require the first stage WWDI.


AVS certification

Why do I need to certify my device?

If your product is intended for commercial use, you must submit your product for certification. To help you meet Amazon standards and build the best possible Alexa integration, all devices must go through Amazon’s testing and certification process before receiving approval for launch. Please refer to Alexa Built-in Testing and Certification Process.

When do I need to re-certify my device?

Please refer to Launch and post-certification Over-the-Air (OTA) updates.

How do I certify the Alexa App on an OEM phone? (Certifying a Mobile OEM for Alexa App launch)

For a Mobile OEM to be able to launch with Alexa Mobile App, the OEM needs to be an AVS partner. For further details, please refer to Solution Providers and ODM Solution Providers.


AVS Device SDK

Where do I get the latest version of AVS device SDK?

Please refer to (link).

What is the advantage of using the AVS Device SDK rather than my own integration?

  • The SDK implements cloud communication, WakeWord Diagnostic Information (WWDI), Communications, Music, and other features, saving considerable development time (over 50 hours).
  • You benefit from maintenance releases.
  • You reduce the chance of introducing new bugs in a custom application.

For more information, please refer to AVS Device SDK.

Where do I get the wake word adapter?

For customers with access to the AVS Developer Portal, the wake word adapter is available here: AVS SDK Adapter for Wake Word Lite. Please note, you have to be added to the allow list in order to get access. Please reach out to the Amazon SAs for instructions.


File Simulators (filesims)

How do I use the File Simulators?

See the instructions in the File Simulators section.

When I try a file by itself, the wake word is detected, but when I try the same file as part of a list of files, the wake word is not detected, what is wrong?

The wake word engine uses adaptation logic to learn the characteristics of the acoustic environment. This improves detection quality when all audio files in the list are recorded in the same acoustic environment. However, if the audio clips are from different environments and do not contain at least 1 second of background noise before the wake word, detection of the wake words in subsequent files is affected by the changing characteristics. To avoid this problem, we suggest using the -c option, which causes the engine to clear its adaptation statistics after each file.