Speech service

The speech service provides an API for calling the device's speech-related capabilities, such as "speak", "listen", and "understand". SpeechManager is the access point to the speech service: it provides the service's main API and can be obtained through RobotContext.

SpeechManager speechManager = aRobotContext.getSystemService(SpeechManager.SERVICE);

Speech technology is commonly subdivided into Keyword Spotting (KWS), Text to Speech (TTS), Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), and Natural Language Processing (NLP). These terms correspond to speech service wake-up, text-to-speech, speech recognition, and natural language understanding and processing. The speech service is built on these technologies, so calling the SpeechManager API is in effect a call to these capabilities. The following sections introduce them one by one.

Speech service wake-up

When the user speaks a wake-up word, speech wake-up technology lets the device detect that the user is "calling it", so that it can start interacting with the user. This process is called "speech service wake-up". To be notified of wake-up events, register a listener as follows.

WakeUpListener wakeUpListener = new WakeUpListener() {
    @Override
    public void onWakingUp(WakeUp wakeUp /* [1] */) {
        // Runs when the device detects that the user has said the wake-up word
    }
};
speechManager.registerWakeUpListener(wakeUpListener);

[1] The WakeUp object describes the environment at the moment of speech service wake-up, including:

Attribute getter	Descriptions
WakeUp.when	Timestamp of the wake-up, in milliseconds.
WakeUp.voiceDirection	Direction of the sound source at the time of wake-up.

If you no longer need to be notified of speech service wake-up, unregister the listener as follows.

speechManager.unregisterWakeUpListener(wakeUpListener);

Text-to-speech

When you need the device to speak like a human, text-to-speech technology synthesizes text into a human voice and plays it. This is done with the following code.

promise /* [2] */ = speechManager.synthesize("Hello, what can I do for you?" /* [1] */)
    .progress(new ProgressCallback<SynthesisProgress>() {
        @Override
        public void onProgress(SynthesisProgress progress /* [3] */) {
            // Runs multiple times while synthesis is in progress
        }
    }).done(new DoneCallback<Void>() {
        @Override
        public void onDone(Void aVoid) {
            // Runs after "Hello, what can I do for you?" has been spoken
        }
    }).fail(new FailCallback<SynthesisException>() {
        @Override
        public void onFail(SynthesisException e) {
            // Runs if synthesis fails
        }
    });

[1] The text to synthesize. Multiple sentences separated by punctuation are supported.

[2] The returned asynchronous object represents the progress and result of the synthesis. Through it you can wait for or monitor the progress and result, and cancel the synthesis. See Promise for specific usage.

[3] The SynthesisProgress object passed to the asynchronous callback describes the progress of speech synthesis, including:

Attribute getter	Descriptions
SynthesisProgress.remainingTimeMillis	Remaining playback time, in milliseconds
SynthesisProgress.playProgress	Playback progress
SynthesisProgress.audioBytes	The audio data being played. Some platforms may not provide it.

By specifying options, you can adjust the behavior of speech synthesis.

SynthesisOption /* [1] */ option = new SynthesisOption.Builder("Hello, what can I do for you?").build();

[1] The SynthesisOption object is constructed through SynthesisOption.Builder, as described below:

Methods	Descriptions	Default value
Builder.constructor(inputText)	The text to synthesize is required at construction. A NULL object or a zero-length string is not allowed.
Builder.setSpeakingSpeed(speakingSpeed)	Set the speaking speed	0
Builder.setSpeakingVolume(speakingVolume)	Set the speaking volume	0
Builder.setSpeakingVoiceId(speakingVoiceId)	Set the speaker. For the speakers available on the system, see Speech settings.	Zero-length string

Constants	Descriptions
SynthesisOption.SPEAKING_VOLUME_MAX	The maximum volume available
SynthesisOption.SPEAKING_VOLUME_MIN	The minimum volume available
SynthesisOption.SPEAKING_SPEED_FAST	Fast speech speed
SynthesisOption.SPEAKING_SPEED_NORMAL	Normal speech speed
SynthesisOption.SPEAKING_SPEED_SLOW	Slow speech speed
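
The constraints in the tables above (the input text must be neither NULL nor empty, and the volume has a minimum and a maximum) can be checked before an option is built. The following is a minimal sketch and not part of the SDK: the class SynthesisInputs and its methods are hypothetical helpers, and the int volume type is an assumption, since the tables do not state parameter types. The bounds are passed in as arguments so that a caller could supply SynthesisOption.SPEAKING_VOLUME_MIN and SynthesisOption.SPEAKING_VOLUME_MAX.

```java
// Hypothetical helper, not part of the SDK.
public final class SynthesisInputs {
    private SynthesisInputs() {}

    /** Returns the text unchanged, or throws if it is null or empty. */
    public static String requireText(String inputText) {
        if (inputText == null || inputText.isEmpty()) {
            throw new IllegalArgumentException("inputText must be a non-empty string");
        }
        return inputText;
    }

    /** Clamps a requested volume into the range [min, max]. */
    public static int clampVolume(int requested, int min, int max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        return Math.max(min, Math.min(max, requested));
    }

    public static void main(String[] args) {
        // The bounds 0 and 100 are illustrative only; the real values
        // would come from the SDK constants.
        System.out.println(requireText("Hello, what can I do for you?"));
        System.out.println(clampVolume(150, 0, 100));
    }
}
```

The validated text and the clamped volume would then be passed to the Builder constructor and to setSpeakingVolume respectively.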

If you want to know the current synthesis status, use the following code.

boolean isSynthesizing = speechManager.isSynthesizing();

If true is returned, synthesis is in progress.

Speech recognition

When the user speaks to the device, speech recognition technology converts the speech into text or instructions. This is done with the following code.

RecognitionOption /* [1] */ option = new RecognitionOption.Builder(
                RecognitionOption.MODE_SINGLE).build();
promise /* [2] */ = speechManager.recognize(option)
    .progress(new ProgressCallback<RecognitionProgress>() {
        @Override
        public void onProgress(RecognitionProgress recognitionProgress /* [3] */) {
            // Runs multiple times while recognition is in progress
        }
    }).done(new DoneCallback<RecognitionResult>() {
        @Override
        public void onDone(RecognitionResult recognitionResult /* [4] */) {
            // Runs when recognition completes; receives the final recognition result
        }
    }).fail(new FailCallback<RecognitionException>() {
        @Override
        public void onFail(RecognitionException e) {
            // Runs if recognition fails
        }
    });

[1] The speech recognition options, which adjust the recognition behavior. They are built with RecognitionOption.Builder, as described below:

Methods	Descriptions	Default value
Builder.constructor(mode)	A recognition mode must be specified at construction.
Builder.setDistanceRange(distanceRange)	Recognition distance range	Near-field recognition
Builder.setTimeoutMillis(timeoutMillis)	Recognition timeout, in milliseconds	No timeout
Builder.setUnderstandingOption(understandingOption)	Natural language processing options	NULL
Builder.setExtension(extension)	Custom field	NULL

Constants	Descriptions
RecognitionOption.MODE_SINGLE	Single recognition: generally used to run speech recognition once after wake-up. Recognition stops once the result is obtained, and the device waits for the next wake-up.
RecognitionOption.MODE_CONTINUOUS	Continuous recognition: recognition runs without interruption after wake-up. Each result is returned as it is recognized, and recognition then continues.
RecognitionOption.DISTANCE_RANGE_NEAR_FIELD	Near-field recognition
RecognitionOption.DISTANCE_RANGE_FAR_FIELD	Far-field recognition
RecognitionOption.DISTANCE_RANGE_CLOSE_TALK	Close-talk (very near field) recognition

[2] The returned asynchronous object represents the progress and result of the recognition. Through it you can wait for or monitor the progress and result, and cancel the recognition. See Promise for specific usage.

[3] The RecognitionProgress object passed to the asynchronous callback describes the progress of speech recognition, including:

Attribute getter	Descriptions
RecognitionProgress.decibel	Sound level of the recognized speech, in decibels
RecognitionProgress.textResult	Text converted from the recognized speech
RecognitionProgress.understandingResult	The result of the recognized speech after natural language processing. For details, see Natural Language Processing.
RecognitionProgress.progress	Recognition progress

Constants	Descriptions
RecognitionResult.began	Start of recognition
RecognitionResult.ended	End of recognition
RecognitionProgress.PROGRESS_RECOGNIZING	Recognizing
RecognitionProgress.PROGRESS_RECOGNITION_TEXT_RESULT	Text recognized
RecognitionProgress.PROGRESS_UNDERSTANDING_RESULT	Natural language processing result produced
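
Because the progress callback runs several times with different stages, a handler usually branches on the stage before reading the other fields. The sketch below illustrates that dispatch under stated assumptions: RecognitionProgressTracker is a hypothetical class, the numeric stage values are stand-ins for the RecognitionProgress.PROGRESS_* constants, and the int type of the decibel value is assumed because the table does not give field types.

```java
// Hypothetical helper, not part of the SDK.
public final class RecognitionProgressTracker {
    // Stand-in stage values (assumptions). Real code would compare against
    // the RecognitionProgress.PROGRESS_* constants instead.
    public static final int PROGRESS_RECOGNIZING = 1;
    public static final int PROGRESS_RECOGNITION_TEXT_RESULT = 2;
    public static final int PROGRESS_UNDERSTANDING_RESULT = 3;

    private int lastDecibel;
    private String latestText = "";
    private boolean understandingReceived;

    /** Call from onProgress with values read from the progress object. */
    public void onProgress(int progress, int decibel, String textResult) {
        switch (progress) {
            case PROGRESS_RECOGNIZING:
                lastDecibel = decibel; // could drive a volume meter
                break;
            case PROGRESS_RECOGNITION_TEXT_RESULT:
                latestText = (textResult == null) ? "" : textResult;
                break;
            case PROGRESS_UNDERSTANDING_RESULT:
                understandingReceived = true;
                break;
            default:
                break; // other stages are ignored in this sketch
        }
    }

    public int lastDecibel() { return lastDecibel; }
    public String latestText() { return latestText; }
    public boolean understandingReceived() { return understandingReceived; }

    public static void main(String[] args) {
        RecognitionProgressTracker tracker = new RecognitionProgressTracker();
        tracker.onProgress(PROGRESS_RECOGNIZING, 42, null);
        tracker.onProgress(PROGRESS_RECOGNITION_TEXT_RESULT, 0, "How's the weather today?");
        System.out.println(tracker.lastDecibel() + " dB, text: " + tracker.latestText());
    }
}
```

Keeping the stage handling in one place means the onProgress callback itself only has to forward the values it receives.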

[4] The RecognitionResult object passed to the asynchronous callback describes the final result of speech recognition, including:

Attribute getter	Descriptions
RecognitionResult.text	The recognized text
RecognitionResult.withUnderstandingResult	Whether the recognition result has been processed by natural language processing
RecognitionResult.understandingResult	The result of natural language processing

For single recognition, if you don't want to build recognition options, you can call the method without parameters:

promise = speechManager.recognize();

If you want to know the current recognition status, use the following code.

boolean isRecognizing = speechManager.isRecognizing();

If true is returned, recognition is in progress.

Natural language processing

When you need the device to "understand" like a human, natural language processing technology analyzes the text of what the user said and works out its meaning, so that the device can respond to the user. This is done with the following code.

promise /* [2] */ = speechManager.understand("How's the weather today?" /* [1] */)
    .done(new DoneCallback<UnderstandingResult>() {
        @Override
        public void onDone(UnderstandingResult understandingResult  /* [3] */) {
            // Runs when natural language processing completes. At this point you can call the speech synthesis interface to reply to the user.
        }
    }).fail(new FailCallback<UnderstandingException>() {
        @Override
        public void onFail(UnderstandingException e) {
            // Runs if natural language processing fails
        }
    });

[1] The text to process. Multiple sentences separated by punctuation are supported.

[2] The returned asynchronous object represents the natural language processing. Through it the result can be obtained and the processing can be cancelled. See Promise for specific usage.

[3] The result of natural language processing, including:

Attribute getter	Descriptions
UnderstandingResult.sessionId	Session id. It remains unchanged across multiple rounds of a conversation.
UnderstandingResult.source	Natural language processing service provider
UnderstandingResult.inputText	The text to be understood
UnderstandingResult.language	System language
UnderstandingResult.version	Version of the natural language processing service provider's library
UnderstandingResult.sessionIncomplete	Whether the session is incomplete. false: the conversation is completed.
UnderstandingResult.contextList	Context parameters
UnderstandingResult.intent	Intent, see SpeechIntent for details
SpeechIntent.action	Intent
SpeechIntent.parameters	Word slots
SpeechIntent.score	Matching degree
UnderstandingResult.speechFulfillment	The information to broadcast after natural language processing, see SpeechFulfillment for details
SpeechFulfillment.type	Source of the recognition result
SpeechFulfillment.text	Text content, used when type = TYPE_TEXT
SpeechFulfillment.uri	URI, used when type = TYPE_URI
SpeechFulfillment.audio	Audio content, used when type = TYPE_AUDIO
SpeechFulfillment.audioType	Audio format
UnderstandingResult.fulfillmentList	Detailed results of natural language processing
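
The table suggests that the type field of SpeechFulfillment determines which of text, uri, or audio carries the content. A reply handler can therefore check the type before deciding whether there is text to pass to speech synthesis. The following is a minimal sketch under stated assumptions: FulfillmentReader and its nested Fulfillment class are hypothetical stand-ins for the SDK type, and the numeric values of the TYPE_* constants are invented for illustration.

```java
import java.util.Optional;

// Hypothetical helper, not part of the SDK.
public final class FulfillmentReader {
    // Stand-in type values (assumptions). The real constants are expected
    // to be defined on SpeechFulfillment.
    public static final int TYPE_TEXT = 0;
    public static final int TYPE_URI = 1;
    public static final int TYPE_AUDIO = 2;

    /** Minimal stand-in for the SpeechFulfillment fields used here. */
    public static final class Fulfillment {
        public final int type;
        public final String text;
        public final String uri;

        public Fulfillment(int type, String text, String uri) {
            this.type = type;
            this.text = text;
            this.uri = uri;
        }
    }

    private FulfillmentReader() {}

    /** Returns the text to synthesize, if the fulfillment carries text. */
    public static Optional<String> textToSpeak(Fulfillment f) {
        if (f == null || f.type != TYPE_TEXT || f.text == null || f.text.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(f.text);
    }

    public static void main(String[] args) {
        Fulfillment reply = new Fulfillment(TYPE_TEXT, "It is sunny today.", null);
        textToSpeak(reply).ifPresent(System.out::println);
    }
}
```

Returning an empty Optional for non-text fulfillments keeps the caller from passing NULL or an empty string to the synthesis interface, which the SynthesisOption builder does not allow.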

By specifying options, you can adjust the behavior of natural language processing.

UnderstandingOption /* [1] */ option = new UnderstandingOption.Builder(
    "How's the weather today?").build();

[1] The UnderstandingOption object is constructed through UnderstandingOption.Builder, as described below:

Methods	Descriptions	Default value
Builder.constructor()	Constructor without parameters	The input text is a zero-length string.
Builder.constructor(inputText)	The text to process with natural language processing must be specified at construction.
Builder.setSessionId(sessionId)	Session id	Zero-length string
Builder.setVersion(version)	Version of the natural language processing service provider's library	Zero-length string
Builder.setParameters(parameters)	Custom parameters	JsonObjectString.EMPTY_OBJECT
Builder.setContextList(contextList)	Context parameters	JsonArrayString.EMPTY_ARRAY
Builder.setTimeoutMillis(timeoutMillis)	Understanding timeout, in milliseconds	0: no timeout
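
Since the session id stays the same across the rounds of one conversation and sessionIncomplete reports whether the conversation has finished, a caller can carry the id from one result into the next request through setSessionId. The sketch below shows one way to track this. It is not part of the SDK: SessionTracker is a hypothetical class, and it assumes that passing the default zero-length session id starts a new conversation.

```java
// Hypothetical helper, not part of the SDK.
public final class SessionTracker {
    // A zero-length string matches the documented builder default.
    private String sessionId = "";

    /** Session id to pass to setSessionId for the next request. */
    public String nextSessionId() {
        return sessionId;
    }

    /** Records the outcome of one round of natural language processing. */
    public void onResult(String resultSessionId, boolean sessionIncomplete) {
        if (sessionIncomplete && resultSessionId != null) {
            sessionId = resultSessionId; // continue the same conversation
        } else {
            sessionId = ""; // conversation completed; start fresh next time
        }
    }

    public static void main(String[] args) {
        SessionTracker tracker = new SessionTracker();
        tracker.onResult("session-1", true);  // the service expects a follow-up
        System.out.println("next id: " + tracker.nextSessionId());
        tracker.onResult("session-1", false); // conversation completed
        System.out.println("id is empty: " + tracker.nextSessionId().isEmpty());
    }
}
```

In the done callback of understand, the values of UnderstandingResult.sessionId and UnderstandingResult.sessionIncomplete would be passed to onResult.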

Speech settings

If you need to get or set the device's speech configuration, you can use the SpeechSettings object.

SpeechSettings speechSettings = speechManager.speechSettings();

Methods	Descriptions
SpeechSettings.getConfiguration()	Get the speech configuration
SpeechSettings.setConfiguration(configuration)	Set the speech configuration
SpeechSettings.registerListener(listener)	Register a speech configuration change listener
SpeechSettings.unregisterListener(listener)	Unregister a speech configuration change listener

To set the speech-related configuration information, you can use the following code.

SpeechConfiguration /* [1] */ configuration = new SpeechConfiguration.Builder().build();
speechSettings.setConfiguration(configuration);

[1] The speech configuration is constructed through SpeechConfiguration.Builder, as described below:

Methods	Descriptions	Default value
Builder.constructor()	Constructor without parameters. The recognition mode is not specified.
Builder.setRecognitionMode(recognitionMode)	Recognition mode	0: unknown recognition mode
Builder.setSpeakingVoiceId(speakingVoiceId)	Speaker	NULL
Builder.setSpeakingSpeed(speakingSpeed)	Speaking speed	0
Builder.setSpeakingVolume(speakingVolume)	Volume	0

Constants	Descriptions
SpeechConfiguration.RECOGNITION_MODE_UNKNOWN	Unknown recognition mode
SpeechConfiguration.RECOGNITION_MODE_SINGLE	Single recognition mode
SpeechConfiguration.RECOGNITION_MODE_CONTINUOUS	Continuous recognition mode
SpeechConfiguration.SPEAKING_VOLUME_MAX	The maximum volume available
SpeechConfiguration.SPEAKING_VOLUME_MIN	The minimum volume available
SpeechConfiguration.SPEAKING_SPEED_FAST	Fast speech speed
SpeechConfiguration.SPEAKING_SPEED_NORMAL	Normal speech speed
SpeechConfiguration.SPEAKING_SPEED_SLOW	Slow speech speed

To obtain the speech configuration information, use the following code:

SpeechConfiguration configuration = speechSettings.getConfiguration();

To be notified of speech configuration changes, register a listener as follows:

SpeechConfigurationListener listener = new SpeechConfigurationListener() {
    @Override
    public void onConfigurationChanged(SpeechConfiguration speechConfiguration) {
        // Runs when the device's speech configuration changes
    }
};
speechSettings.registerListener(listener);

If you no longer need to be notified of speech configuration changes, unregister the listener as follows:

speechSettings.unregisterListener(listener);