Advanced Speech SDK Controls

19 Dec 2023

Advanced Controls

The Vuzix Speech Recognition engine has advanced controls described here. These have been expanded since the initial SDK was released.

Enabling and Disabling Speech Recognition

The Vuzix Speech SDK listens for the wake word phrase "Hello Vuzix" whenever Vuzix Speech Recognition is enabled in the Settings menu (unless explicitly removed by an application).

When Speech Recognition is enabled, a hollow microphone icon will appear in the notification bar. When the Speech Recognition is disabled, the microphone icon in the notification bar is not present.

The Speech Command engine has global commands, such as "go home" and "start recording" that are processed in any application. The Speech Command engine also supports custom vocabulary that is processed by each individual application.

It is possible to write an application that relies on custom voice commands to perform essential tasks. In this scenario, it would be an unwanted burden to require the user to navigate to the system Settings menu to enable Speech Recognition before launching your application. Instead, Vuzix Speech Recognition may be enabled programmatically from within an application.

import com.vuzix.sdk.speechrecognitionservice.VuzixSpeechClient;
try {
	VuzixSpeechClient.EnableRecognizer(getApplicationContext(), true);
}
catch(NoClassDefFoundError e) {
	// This device does not implement the Vuzix Speech SDK
	// todo: Implement error recovery
}

This method is static. Passing the optional context parameter allows the proper user permissions to be applied, and is recommended for robustness.

The recognizer may be similarly disabled via code during times when false detection would impair the application behavior.

import com.vuzix.sdk.speechrecognitionservice.VuzixSpeechClient;
try {
	VuzixSpeechClient.EnableRecognizer(getApplicationContext(), false);
}
catch(NoClassDefFoundError e) {
	// This device does not implement the Vuzix Speech SDK
	// todo: Implement error recovery
}

However, programmatically disabling the Speech Recognition is strongly discouraged. If your application is force-stopped or crashes before re-enabling this, Speech Recognition will remain disabled for all applications even across reboots. A more robust approach is to use deleteAllPhrases() to prevent anything from being detected but only while your application is running.

Once Vuzix Speech Recognition is disabled, the microphone icon on the notification bar becomes grayed-out, and the phrase "Hello Vuzix" will no longer trigger speech recognition.

It is safe to set the Speech Recognition to the existing state, so there is no need to query the state before enabling or disabling Vuzix Speech Recognition. Simply specify the desired state. However, if you want to display the current enabled/disabled state you can query it using isRecognizerEnabled(). This value is not changed by the system while your application is active so the appropriate place for this query is your activity onResume().

boolean mSpeechEnabled;
@Override protected void onResume() {
	super.onResume();
	mSpeechEnabled = VuzixSpeechClient.isRecognizerEnabled(this);
	// todo: update status to user showing state of mSpeechEnabled
}

Triggering the Speech Recognizer

When Speech Recognition is enabled, the recognizer remains in a low-power mode listening only for the wake word phrase, "Hello Vuzix". This state is indicated by the microphone icon on the notification bar becoming an unfilled outline. Once the wake word phrase is heard, the recognizer wakes and becomes triggered. This state is indicated by the microphone icon on the notification bar becoming fully filled. While triggered, all audio data is scanned for all known phrases.

It is possible for an application to programmatically trigger the recognizer to wake and become active, rather than relying on the "Hello Vuzix" wake word phrase. This can be tied to a button press or a fragment opening.

import com.vuzix.sdk.speechrecognitionservice.VuzixSpeechClient;
try {
	VuzixSpeechClient.TriggerVoiceAudio(getApplicationContext(), true);
}
catch(NoClassDefFoundError e) {
	// This device does not implement the Vuzix Speech SDK 
	// TODO: Implement error recovery
}

The recognizer has a timeout that can be modified in the system Settings menu, or programmatically as described below. The active recognizer will return to idle mode after that duration has elapsed since the most recent phrase was recognized. This state is again indicated by the microphone icon on the notification bar returning to the unfilled outline, and the recognizer will only respond to the wake word phrase "Hello Vuzix."

Some workflows are best served by returning the active recognizer to idle at a specific time, for example while recording a voice memo. This prevents phrases such as "Go back" and "Go home" from being recognized and acted upon.

The recognizer engine may be programmatically un-triggered to idle state with the same method.

import com.vuzix.sdk.speechrecognitionservice.VuzixSpeechClient;
try {
	// Passing false returns the recognizer to the idle state
	VuzixSpeechClient.TriggerVoiceAudio(getApplicationContext(), false);
}
catch(NoClassDefFoundError e) {
	// This device does not implement the Vuzix Speech SDK
	// TODO: Implement error recovery
}

Trigger State Notification

Since the Speech Recognition engine may be triggered by the user speaking and may time out internally, applications that wish to control this behavior will likely need to know the state of the recognizer.

The same Speech Command Intent that broadcasts phrases also broadcasts state change updates. Simply check for the presence of the extra boolean RECOGNIZER_ACTIVE_BOOL_EXTRA.

boolean mSpeechTriggered;
@Override public void onReceive(Context context, Intent intent) {
	// Constant-first comparison avoids a NullPointerException if the action is null
	if (VuzixSpeechClient.ACTION_VOICE_COMMAND.equals(intent.getAction())) {
		Bundle extras = intent.getExtras();
		if (extras != null) {
			// We will determine what type of message this is based upon the extras provided
			if (extras.containsKey(VuzixSpeechClient.RECOGNIZER_ACTIVE_BOOL_EXTRA)) {
				// if we get a recognizer active bool extra, it means the recognizer was
				// activated or stopped
				mSpeechTriggered = extras.getBoolean(VuzixSpeechClient.RECOGNIZER_ACTIVE_BOOL_EXTRA, false);
				// todo: Implement behavior based upon the recognizer being changed to active or idle
			}
		}
	}
}

Since the state may also change while your application is not running, if you display the state using these notifications you should also query the current state in your onResume().

boolean mSpeechTriggered;
@Override protected void onResume() {
	super.onResume();
	mSpeechTriggered = VuzixSpeechClient.isRecognizerTriggered(this);
	// todo: Implement behavior based upon the recognizer being changed to active or idle
}

Startup Timing Concerns

It is possible for applications that automatically launch with the operating system to be initialized before the speech engine has come online. This is true for launcher applications, among others. Any speech queries or commands issued at startup will fail, and must be retried after the speech engine comes online. In such applications, you should surround initialization logic with a call such as:

if( VuzixSpeechClient.isRecognizerInitialized(this) ) {
	//todo perform your speech customizations here
}

Even if the initialization code cannot be run at startup, you should still register the broadcast receiver for the trigger state, as described in the preceding section. When the engine becomes initialized, it will send out an initial trigger state. The receipt of this trigger state can cause your application to retry the speech initialization. This allows you to create an application that starts before the speech engine and can interact with the speech engine as soon as it becomes available, without any unnecessary polling.
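
The retry flow described above can be sketched in plain Java as follows. This is a minimal sketch and not part of the Vuzix Speech SDK: the class DeferredSpeechSetup and its method tryInitialize() are hypothetical names, and the readiness check is passed in as a supplier so that an application could wrap VuzixSpeechClient.isRecognizerInitialized() in it.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper, not part of the Vuzix Speech SDK. */
final class DeferredSpeechSetup {
    // Assumed to wrap a readiness check such as isRecognizerInitialized()
    private final BooleanSupplier engineReady;
    // The application's own speech customizations
    private final Runnable setup;
    private boolean done;

    DeferredSpeechSetup(BooleanSupplier engineReady, Runnable setup) {
        this.engineReady = engineReady;
        this.setup = setup;
    }

    /**
     * Runs the setup once the engine reports ready.
     * Returns true if setup has run, now or on an earlier call.
     */
    boolean tryInitialize() {
        if (done) {
            return true;
        }
        if (!engineReady.getAsBoolean()) {
            return false; // engine not online yet; try again later
        }
        setup.run();
        done = true;
        return true;
    }
}
```

An application might call tryInitialize() once at startup and again from its broadcast receiver each time a trigger state arrives. The setup code then runs at most once, and no polling is needed.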

Canceling Repeating Characters

Certain commands like "Scroll up" and "Scroll down" initiate repeating key presses. This allows the user interface to continue to scroll in the selected direction. The repeating key presses stop when the engine detects any other phrase, such as "Select this". The default phrase "Stop" is recognized by the speech engine and has no behavior other than to terminate the scrolling.

You may wish to stop repeating key presses programmatically without requiring the user to say another phrase. This is useful when reaching the first or last item in a list. To do this, simply call StopRepeatingKeys().

try {
	// sc is an existing VuzixSpeechClient instance
	sc.StopRepeatingKeys();
}
catch(NoClassDefFoundError e) {
	// The ability to stop repeating keys was added in Speech SDK v1.6 which
	// was released on M400 v1.1.4. Earlier versions will not support this.
}
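
One way to decide when to make this call is a small boundary check on the list position. The sketch below is illustrative only: ScrollBoundary and shouldStopRepeating() are hypothetical names, not SDK members, and the zero-based position, item count, and scroll direction are assumed to come from your own list handling code.

```java
/** Hypothetical helper, not part of the Vuzix Speech SDK. */
final class ScrollBoundary {
    private ScrollBoundary() {}

    /**
     * Returns true when further repeated key presses in the current
     * direction cannot move the selection: the list is empty, the last
     * item is selected while scrolling down, or the first item is
     * selected while scrolling up.
     */
    static boolean shouldStopRepeating(int position, int itemCount, boolean scrollingDown) {
        if (itemCount <= 0) {
            return true;
        }
        return scrollingDown ? position >= itemCount - 1 : position <= 0;
    }
}
```

When the check returns true after a selection change, the application would call StopRepeatingKeys() as shown above.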

Getting and Setting the Recognizer Timeout Configuration

Beginning with SDK v1.91, you can query the maximum supported recognizer timeout.

int recognizedMaxTimeoutTime = sc.getRecognizerTimeoutMax();

Set the recognizer timeout with:

sc.setRecognizerTimeoutConfig(30); // timeout in seconds

Note: This change affects all applications until changed again programmatically, or from within the Settings menu of the device.

For the change to take effect, the timeout value must be within the supported range. The minimum value of zero indicates that the recognizer will never time out. Values from 1 through the maximum indicate how many seconds the recognizer will remain active. The maximum supported value can be queried with:

int recognizedMaxTimeoutTime = sc.getRecognizerTimeoutMax();
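
The range rule above can be checked before a value is applied. The following is a minimal sketch under stated assumptions: RecognizerTimeouts and its methods are hypothetical helpers, not SDK members, and the maximum is passed in as a plain integer that an application would obtain from getRecognizerTimeoutMax().

```java
/** Hypothetical helper, not part of the Vuzix Speech SDK. */
final class RecognizerTimeouts {
    private RecognizerTimeouts() {}

    /** True when seconds lies in the supported range 0..maxSeconds. */
    static boolean isSupported(int seconds, int maxSeconds) {
        return seconds >= 0 && seconds <= maxSeconds;
    }

    /** Zero is the special value meaning the recognizer never times out. */
    static boolean neverTimesOut(int seconds) {
        return seconds == 0;
    }
}
```

An application could call isSupported() with the queried maximum and only then pass the value to setRecognizerTimeoutConfig().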

Accessing the Global Speech Settings

Beginning with SDK v1.95 you can call startGlobalSpeechSettingsActivity to access the Speech recognition system settings.

Sample Project

A sample application for Android Studio demonstrating the Vuzix Speech SDK is available to download here.
