Speech service

The speech service provides APIs for calling the device's speech-related capabilities, such as "speak", "listen", and "understand". SpeechManager is the access point to the speech service: it provides the service's main API and can be obtained through RobotContext.

SpeechManager speechManager = aRobotContext.getSystemService(SpeechManager.SERVICE);

In speech technology, terms such as Keyword Spotting (KWS), Text to Speech (TTS), Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), and Natural Language Processing (NLP) name its sub-fields: wake-up, text-to-speech, speech recognition, and natural language understanding and processing. The speech service is built on these technologies, so calling the SpeechManager API ultimately invokes them. The following sections introduce them one by one.

Speech service wake-up

When the user says a wake-up word to the device, speech wake-up technology lets the device perceive that the user is "calling" it, after which it can start interacting with the user. This process is called "speech service wake-up". To be notified of wake-ups, register a listener:

WakeUpListener wakeUpListener = new WakeUpListener() {
    @Override
    public void onWakingUp(WakeUp wakeUp /* [1] */) {
        // Runs when the user is perceived to have said the wake-up word
    }
};
speechManager.registerWakeUpListener(wakeUpListener);

[1] The WakeUp object describes the environment at the moment of wake-up, including:

Attribute getter | Description
WakeUp.when | Timestamp of the wake-up, in ms
WakeUp.voiceDirection | Direction of the sound source at wake-up

To stop receiving wake-up notifications, unregister the same listener:

speechManager.unregisterWakeUpListener(wakeUpListener);
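The SDK does not document how listeners are stored, but a register/unregister pair like this one usually follows the listener-registry pattern. The minimal sketch below uses hypothetical stand-ins (WakeUpEvent, WakeUpHandler and WakeUpRegistry; none of them are SDK classes) to show why unregisterWakeUpListener most likely needs the same listener instance that was registered, which is why the examples keep it in a variable.

```java
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for the SDK's WakeUp object; only the `when` timestamp (ms) is modeled.
final class WakeUpEvent {
    final long whenMillis;

    WakeUpEvent(long whenMillis) {
        this.whenMillis = whenMillis;
    }
}

// Hypothetical counterpart of WakeUpListener.onWakingUp(WakeUp); not an SDK type.
interface WakeUpHandler {
    void onWakingUp(WakeUpEvent wakeUp);
}

// Hypothetical registry illustrating register/unregister semantics; not the SDK's implementation.
final class WakeUpRegistry {
    // Copy-on-write, so a handler may unregister itself while an event is being delivered.
    private final CopyOnWriteArrayList<WakeUpHandler> handlers = new CopyOnWriteArrayList<>();

    void register(WakeUpHandler handler) {
        handlers.addIfAbsent(handler); // registering the same instance twice has no effect
    }

    // Removal uses equals(), which for lambdas and anonymous classes means identity,
    // so the instance passed to register() must be the one passed here.
    void unregister(WakeUpHandler handler) {
        handlers.remove(handler);
    }

    // Delivers one event to every registered handler and returns how many were notified.
    int dispatch(WakeUpEvent event) {
        int notified = 0;
        for (WakeUpHandler handler : handlers) {
            handler.onWakingUp(event);
            notified++;
        }
        return notified;
    }
}
```

Passing a newly created listener to the unregister call would remove nothing in this sketch, and the original listener would keep receiving events.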

Text-to-speech

When you need the device to speak like a human, text-to-speech technology synthesizes text into a human voice and plays it. Use the following code:

promise /* [2] */ = speechManager.synthesize("Hello, what can I do for you?" /* [1] */)
    .progress(new ProgressCallback<SynthesisProgress>() {
        @Override
        public void onProgress(SynthesisProgress progress /* [3] */) {
            // Called repeatedly as synthesis progresses
        }
    }).done(new DoneCallback<Void>() {
        @Override
        public void onDone(Void aVoid) {
            // Called after "Hello, what can I do for you?" has been spoken
        }
    }).fail(new FailCallback<SynthesisException>() {
        @Override
        public void onFail(SynthesisException e) {
            // Called if synthesis fails
        }
    });

[1] The text to synthesize. It may contain multiple sentences separated by punctuation.

[2] Returns an asynchronous object through which you can wait for or monitor the synthesis progress and result, or cancel synthesis. See Promise for details.

[3] The SynthesisProgress object passed to the progress callback describes the progress of speech synthesis, including:

Attribute getter | Description
SynthesisProgress.remainingTimeMillis | Remaining playback time, in ms
SynthesisProgress.playProgress | Playback progress
SynthesisProgress.audioBytes | Audio data being played; some platforms may not provide it
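Footnotes [2] and [3] describe a callback chain in which the progress callback may run many times before the operation settles with either done or fail. The sketch below models that contract with a hypothetical, single-threaded SketchPromise class. It is not the SDK's Promise (see Promise for the real API), and java.util.function.Consumer stands in for ProgressCallback, DoneCallback and FailCallback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, single-threaded stand-in for the object returned by synthesize();
// Consumer replaces the SDK's ProgressCallback/DoneCallback/FailCallback.
final class SketchPromise<D, F, P> {
    private final List<Consumer<P>> progressCallbacks = new ArrayList<>();
    private final List<Consumer<D>> doneCallbacks = new ArrayList<>();
    private final List<Consumer<F>> failCallbacks = new ArrayList<>();
    private boolean settled = false;

    // Each registration method returns this, which allows .progress(...).done(...).fail(...).
    SketchPromise<D, F, P> progress(Consumer<P> callback) {
        progressCallbacks.add(callback);
        return this;
    }

    SketchPromise<D, F, P> done(Consumer<D> callback) {
        doneCallbacks.add(callback);
        return this;
    }

    SketchPromise<D, F, P> fail(Consumer<F> callback) {
        failCallbacks.add(callback);
        return this;
    }

    // Producer side: progress may be reported any number of times before settling.
    void notifyProgress(P progress) {
        if (!settled) {
            progressCallbacks.forEach(cb -> cb.accept(progress));
        }
    }

    // Producer side: the promise settles once, with done or fail; later calls are ignored.
    // Simplification: callbacks attached after settling are never invoked in this sketch.
    void resolve(D result) {
        if (settled) return;
        settled = true;
        doneCallbacks.forEach(cb -> cb.accept(result));
    }

    void reject(F error) {
        if (settled) return;
        settled = true;
        failCallbacks.forEach(cb -> cb.accept(error));
    }
}
```

In the SDK the producer side is the speech service itself; application code only attaches callbacks, as in the synthesize example.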

By specifying options, you can adjust the behavior of speech synthesis.

SynthesisOption /* [1] */ option = new SynthesisOption.Builder("Hello, what can I do for you?").build();

[1] A SynthesisOption object is built with SynthesisOption.Builder, described below:

Method | Description | Default value
Builder.constructor(inputText) | The text to synthesize must be passed when constructing; null and zero-length strings are not allowed. | -
Builder.setSpeakingSpeed(speakingSpeed) | Sets the speaking speed | 0
Builder.setSpeakingVolume(speakingVolume) | Sets the speaking volume | 0
Builder.setSpeakingVoiceId(speakingVoiceId) | Sets the speaker; for the speakers available on the system, see Speech settings | Zero-length string

Constant | Description
SynthesisOption.SPEAKING_VOLUME_MAX | Maximum available volume
SynthesisOption.SPEAKING_VOLUME_MIN | Minimum available volume
SynthesisOption.SPEAKING_SPEED_FAST | Fast speaking speed
SynthesisOption.SPEAKING_SPEED_NORMAL | Normal speaking speed
SynthesisOption.SPEAKING_SPEED_SLOW | Slow speaking speed
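The builder rules in the table above (required non-empty text, defaults of 0 and a zero-length voice ID) can be illustrated with a hypothetical stand-in, SynthesisOptionSketch. The int parameter types and the IllegalArgumentException on empty text are assumptions; the SDK's actual types and failure behavior are not documented here.

```java
// Hypothetical stand-in for SynthesisOption, modeling only the documented rules:
// text is required and non-empty, speed and volume default to 0, voice ID defaults to "".
// The int types for speed and volume are assumptions.
final class SynthesisOptionSketch {
    final String inputText;
    final int speakingSpeed;
    final int speakingVolume;
    final String speakingVoiceId;

    private SynthesisOptionSketch(Builder builder) {
        this.inputText = builder.inputText;
        this.speakingSpeed = builder.speakingSpeed;
        this.speakingVolume = builder.speakingVolume;
        this.speakingVoiceId = builder.speakingVoiceId;
    }

    static final class Builder {
        private final String inputText;
        private int speakingSpeed = 0;
        private int speakingVolume = 0;
        private String speakingVoiceId = "";

        Builder(String inputText) {
            // How the real SDK rejects bad input is not documented; this sketch throws.
            if (inputText == null || inputText.isEmpty()) {
                throw new IllegalArgumentException("inputText must be a non-empty string");
            }
            this.inputText = inputText;
        }

        // Each setter returns the builder so calls can be chained before build().
        Builder setSpeakingSpeed(int speakingSpeed) {
            this.speakingSpeed = speakingSpeed;
            return this;
        }

        Builder setSpeakingVolume(int speakingVolume) {
            this.speakingVolume = speakingVolume;
            return this;
        }

        Builder setSpeakingVoiceId(String speakingVoiceId) {
            this.speakingVoiceId = speakingVoiceId;
            return this;
        }

        SynthesisOptionSketch build() {
            return new SynthesisOptionSketch(this);
        }
    }
}
```

Validating in the constructor means an invalid option can never be built, which matches the table's statement that null or zero-length text is not allowed.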

To check whether synthesis is in progress, use the following code:

boolean isSynthesizing = speechManager.isSynthesizing();

It returns true while synthesis is in progress.

Speech recognition

When the user speaks to the device, speech recognition technology converts the speech into text or instructions. Use the following code:

RecognitionOption /* [1] */ option = new RecognitionOption.Builder(
                RecognitionOption.MODE_SINGLE).build();
promise /* [2] */ = speechManager.recognize(option)
    .progress(new ProgressCallback<RecognitionProgress>() {
        @Override
        public void onProgress(RecognitionProgress recognitionProgress /* [3] */) {
            // Called repeatedly as recognition progresses
        }
    }).done(new DoneCallback<RecognitionResult>() {
        @Override
        public void onDone(RecognitionResult recognitionResult /* [4] */) {
            // Called with the final recognition result when recognition completes
        }
    }).fail(new FailCallback<RecognitionException>() {
        @Override
        public void onFail(RecognitionException e) {
            // Called if recognition fails
        }
    });

[1] Speech recognition options, used to adjust recognition behavior. They are built with RecognitionOption.Builder, described below:

Method | Description | Default value
Builder.constructor(mode) | A recognition mode must be specified when constructing. | -
Builder.setDistanceRange(distanceRange) | Recognition distance range (scenario) | Near-field recognition
Builder.setTimeoutMillis(timeoutMillis) | Recognition timeout, in ms | No timeout
Builder.setUnderstandingOption(understandingOption) | Natural language processing options | null
Builder.setExtension(extension) | Custom extension field | null

Constant | Description
RecognitionOption.MODE_SINGLE | Single recognition: typically used to run recognition once after wake-up. Recognition stops once a result is obtained and waits for the next wake-up.
RecognitionOption.MODE_CONTINUOUS | Continuous recognition: after wake-up, recognition runs without interruption; each result is returned as it is recognized, and recognition then continues.
RecognitionOption.DISTANCE_RANGE_NEAR_FIELD | Near-field recognition
RecognitionOption.DISTANCE_RANGE_FAR_FIELD | Far-field recognition
RecognitionOption.DISTANCE_RANGE_CLOSE_TALK | Close-talk (very near-field) recognition

[2] Returns an asynchronous object through which you can wait for or monitor the recognition progress and result, or cancel recognition. See Promise for details.

[3] The RecognitionProgress object passed to the progress callback describes the progress of speech recognition, including:

Attribute getter | Description
RecognitionProgress.decibel | Sound level of the recognized speech, in decibels
RecognitionProgress.textResult | Text converted from the recognized speech
RecognitionProgress.understandingResult | Result of natural language processing on the recognized speech; see Natural language processing
RecognitionProgress.progress | Recognition progress

Constant | Description
RecognitionResult.began | Start of recognition
RecognitionResult.ended | End of recognition
RecognitionProgress.PROGRESS_RECOGNIZING | Recognizing
RecognitionProgress.PROGRESS_RECOGNITION_TEXT_RESULT | Text recognized
RecognitionProgress.PROGRESS_UNDERSTANDING_RESULT | Natural language processing result

[4] The RecognitionResult object passed to the done callback describes the final result of speech recognition, including:

Attribute getter | Description
RecognitionResult.text | Recognized text
RecognitionResult.withUnderstandingResult | Whether the recognition result has been processed by natural language processing
RecognitionResult.understandingResult | Natural language processing result
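A progress callback typically branches on RecognitionProgress.progress to decide which field to read, and a done callback can check withUnderstandingResult before using understandingResult. The sketch below shows that branching with a hypothetical RecognitionProgressSketch class. Its constant values are placeholders, since the real values of the PROGRESS_* constants are not listed here, and the understanding result is simplified to a String.

```java
// Hypothetical stand-in for RecognitionProgress / RecognitionResult handling; not SDK code.
// The PROGRESS_* values are placeholders: the real constant values are not documented here.
final class RecognitionProgressSketch {
    static final int PROGRESS_RECOGNIZING = 1;
    static final int PROGRESS_RECOGNITION_TEXT_RESULT = 2;
    static final int PROGRESS_UNDERSTANDING_RESULT = 3;

    final int progress;
    final String textResult;
    final String understandingResult; // simplified to a String for this sketch

    RecognitionProgressSketch(int progress, String textResult, String understandingResult) {
        this.progress = progress;
        this.textResult = textResult;
        this.understandingResult = understandingResult;
    }

    // A progress callback might read only the field that the current state makes available.
    static String describe(RecognitionProgressSketch p) {
        switch (p.progress) {
            case PROGRESS_RECOGNIZING:
                return "listening";
            case PROGRESS_RECOGNITION_TEXT_RESULT:
                return "text: " + p.textResult;
            case PROGRESS_UNDERSTANDING_RESULT:
                return "understood: " + p.understandingResult;
            default:
                return "unknown progress " + p.progress;
        }
    }

    // A done callback might use the NLP result only when withUnderstandingResult is true.
    static String finalText(boolean withUnderstandingResult, String text, String understandingResult) {
        return withUnderstandingResult ? understandingResult : text;
    }
}
```

Keeping a default branch lets the callback tolerate progress states it does not handle.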

For single recognition, if you don't need custom options, you can call the parameterless overload instead:

promise = speechManager.recognize();

To check whether recognition is in progress, use the following code:

boolean isRecognizing = speechManager.isRecognizing();

It returns true while recognition is in progress.

Natural language processing

When you need the device to "understand" what the user says, natural language processing technology interprets the text (for example, its intent) so that the device can respond. Use the following code:

promise /* [2] */ = speechManager.understand("How's the weather today?" /* [1] */)
    .done(new DoneCallback<UnderstandingResult>() {
        @Override
        public void onDone(UnderstandingResult understandingResult  /* [3] */) {
            // Called when natural language processing completes; you can then call the speech synthesis API to reply to the user
        }
    }).fail(new FailCallback<UnderstandingException>() {
        @Override
        public void onFail(UnderstandingException e) {
            // Called if natural language processing fails
        }
    });

[1] The text to process. It may contain multiple sentences separated by punctuation.

[2] Returns an asynchronous object through which you can obtain the result or cancel processing. See Promise for details.

[3] The UnderstandingResult object passed to the done callback, including:

Attribute getter | Description
UnderstandingResult.sessionId | Session ID; it remains unchanged across multiple rounds of a conversation
UnderstandingResult.source | Natural language processing service provider
UnderstandingResult.inputText | The text that was processed
UnderstandingResult.language | System language
UnderstandingResult.version | Library version of the natural language processing service provider
UnderstandingResult.sessionIncomplete | false if the conversation is complete
UnderstandingResult.contextList | Context parameters
UnderstandingResult.intent | Intent; see SpeechIntent for details
  SpeechIntent.action | Intent action
  SpeechIntent.parameters | Slot values (word slots)
  SpeechIntent.score | Match score
UnderstandingResult.speechFulfillment | Content to broadcast after natural language processing; see SpeechFulfillment for details
  SpeechFulfillment.type | Type of the content: TYPE_TEXT, TYPE_URI, or TYPE_AUDIO
  SpeechFulfillment.text | Text content, when type = TYPE_TEXT
  SpeechFulfillment.uri | URI, when type = TYPE_URI
  SpeechFulfillment.audio | Audio data, when type = TYPE_AUDIO
  SpeechFulfillment.audioType | Audio format
UnderstandingResult.fulfillmentList | Detailed natural language processing results
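Because only the field matching SpeechFulfillment.type is meaningful, a client can dispatch on the type before reading text, uri or audio. The sketch below uses a hypothetical FulfillmentSketch class with placeholder TYPE_* values and simplified field types (String for the URI, byte[] for the audio); what to do with each type is the application's choice.

```java
// Hypothetical stand-in for SpeechFulfillment; not SDK code. The TYPE_* values are placeholders
// and the field types (String uri, byte[] audio, String audioType) are assumptions.
final class FulfillmentSketch {
    static final int TYPE_TEXT = 1;
    static final int TYPE_URI = 2;
    static final int TYPE_AUDIO = 3;

    final int type;
    final String text;
    final String uri;
    final byte[] audio;
    final String audioType;

    private FulfillmentSketch(int type, String text, String uri, byte[] audio, String audioType) {
        this.type = type;
        this.text = text;
        this.uri = uri;
        this.audio = audio;
        this.audioType = audioType;
    }

    // Factories that set only the field matching the type, as described in the table above.
    static FulfillmentSketch ofText(String text) {
        return new FulfillmentSketch(TYPE_TEXT, text, null, null, null);
    }

    static FulfillmentSketch ofUri(String uri) {
        return new FulfillmentSketch(TYPE_URI, null, uri, null, null);
    }

    static FulfillmentSketch ofAudio(byte[] audio, String audioType) {
        return new FulfillmentSketch(TYPE_AUDIO, null, null, audio, audioType);
    }

    // Decides how to present the content based on its type; reads only the matching field.
    static String plan(FulfillmentSketch f) {
        switch (f.type) {
            case TYPE_TEXT:
                return "synthesize: " + f.text; // e.g. hand the text to speech synthesis
            case TYPE_URI:
                return "resolve uri: " + f.uri;
            case TYPE_AUDIO:
                return "play " + f.audio.length + " bytes (" + f.audioType + ")";
            default:
                return "unsupported type " + f.type;
        }
    }
}
```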

By specifying options, you can adjust the behavior of natural language processing.

UnderstandingOption /* [1] */ option = new UnderstandingOption.Builder(
    "How's the weather today?").build();

[1] An UnderstandingOption object is built with UnderstandingOption.Builder, described below:

Method | Description | Default value
Builder.constructor() | Constructs without input text | Zero-length input text
Builder.constructor(inputText) | Constructs with the text to process | -
Builder.setSessionId(sessionId) | Session ID | Zero-length string
Builder.setVersion(version) | Library version of the natural language processing service provider | Zero-length string
Builder.setParameters(parameters) | Custom parameters | JsonObjectString.EMPTY_OBJECT
Builder.setContextList(contextList) | Context parameters | JsonArrayString.EMPTY_ARRAY
Builder.setTimeoutMillis(timeoutMillis) | Processing timeout, in ms | 0 (no timeout)
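Given the sessionId and sessionIncomplete fields of UnderstandingResult, a plausible pattern for multi-round conversations is to pass the returned session ID back through Builder.setSessionId on the next request while the conversation is incomplete. The sketch below models that loop. TurnResult and ConversationSketch are hypothetical; the understand function stands in for building an UnderstandingOption and running natural language processing with it (the exact overload that accepts an option is not shown in this section), and restarting with a zero-length session ID after a completed conversation is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical, simplified view of UnderstandingResult: only the multi-round fields.
final class TurnResult {
    final String sessionId;
    final boolean sessionIncomplete;

    TurnResult(String sessionId, boolean sessionIncomplete) {
        this.sessionId = sessionId;
        this.sessionIncomplete = sessionIncomplete;
    }
}

final class ConversationSketch {
    // Processes utterances in order and returns the session ID sent with each one.
    // `understand` is a hypothetical stand-in taking (sessionId, inputText), i.e. building an
    // UnderstandingOption with setSessionId(sessionId) and running natural language processing.
    static List<String> run(List<String> utterances,
                            BiFunction<String, String, TurnResult> understand) {
        List<String> sentSessionIds = new ArrayList<>();
        String sessionId = ""; // Builder default: zero-length string
        for (String inputText : utterances) {
            sentSessionIds.add(sessionId);
            TurnResult result = understand.apply(sessionId, inputText);
            // Keep the session while more rounds are expected; assumption: a zero-length
            // ID starts a fresh session once the conversation is complete.
            sessionId = result.sessionIncomplete ? result.sessionId : "";
        }
        return sentSessionIds;
    }
}
```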

Speech settings

If you need to get or set the device's speech configuration, you can use the SpeechSettings object.

SpeechSettings speechSettings = speechManager.speechSettings();

Method | Description
SpeechSettings.getConfiguration() | Gets the speech configuration
SpeechSettings.setConfiguration(configuration) | Sets the speech configuration
SpeechSettings.registerListener(listener) | Registers a listener for speech configuration changes
SpeechSettings.unregisterListener(listener) | Unregisters a speech configuration change listener

To set the speech configuration, use the following code:

SpeechConfiguration /* [1] */ configuration = new SpeechConfiguration.Builder().build();
speechSettings.setConfiguration(configuration);

[1] A SpeechConfiguration object is built with SpeechConfiguration.Builder, described below:

Method | Description | Default value
Builder.constructor() | Constructs without parameters; the recognition mode is left unspecified. | -
Builder.setRecognitionMode(recognitionMode) | Recognition mode | 0 (unknown recognition mode)
Builder.setSpeakingVoiceId(speakingVoiceId) | Speaker | null
Builder.setSpeakingSpeed(speakingSpeed) | Speaking speed | 0
Builder.setSpeakingVolume(speakingVolume) | Speaking volume | 0

Constant | Description
SpeechConfiguration.RECOGNITION_MODE_UNKNOWN | Unknown recognition mode
SpeechConfiguration.RECOGNITION_MODE_SINGLE | Single recognition mode
SpeechConfiguration.RECOGNITION_MODE_CONTINUOUS | Continuous recognition mode
SpeechConfiguration.SPEAKING_VOLUME_MAX | Maximum available volume
SpeechConfiguration.SPEAKING_VOLUME_MIN | Minimum available volume
SpeechConfiguration.SPEAKING_SPEED_FAST | Fast speaking speed
SpeechConfiguration.SPEAKING_SPEED_NORMAL | Normal speaking speed
SpeechConfiguration.SPEAKING_SPEED_SLOW | Slow speaking speed
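When a UI exposes volume as a 0 to 100 slider, the value has to be mapped into the range bounded by SPEAKING_VOLUME_MIN and SPEAKING_VOLUME_MAX. The helper below is a hypothetical sketch that assumes volume is a linear integer scale; the constants' actual values are not documented here, so they are passed in as parameters.

```java
// Hypothetical helper, not part of the SDK: maps a 0-100 slider value onto [min, max].
// In real code min/max would be SPEAKING_VOLUME_MIN / SPEAKING_VOLUME_MAX; treating volume
// as a linear int scale is an assumption.
final class VolumeMapping {
    static int percentToVolume(int percent, int min, int max) {
        if (min > max) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        // Clamp out-of-range slider values to 0..100.
        int clamped = Math.max(0, Math.min(100, percent));
        // Linear interpolation rounded to the nearest step; long arithmetic avoids int overflow.
        return (int) (min + Math.round(((long) max - min) * clamped / 100.0));
    }
}
```

Clamping first guarantees the result stays within [min, max], so the value can be passed to setSpeakingVolume without further checks.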

To obtain the speech configuration information, use the following code:

SpeechConfiguration configuration = speechSettings.getConfiguration();

To be notified when the speech configuration changes, use the following code:

SpeechConfigurationListener listener = new SpeechConfigurationListener() {
    @Override
    public void onConfigurationChanged(SpeechConfiguration speechConfiguration) {
        // Called when the device's speech configuration changes
    }
};
speechSettings.registerListener(listener);

To stop receiving these notifications, unregister the same listener:

speechSettings.unregisterListener(listener);
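onConfigurationChanged receives only the new configuration, so an app that wants to react to specific changes (for example, only a new speaker) can compare it with a cached copy. The sketch below uses a hypothetical ConfigSnapshot class holding the four builder fields; the field types are assumptions, and Objects.equals is used because the speaker defaults to null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified snapshot of a SpeechConfiguration; the field types are assumptions.
final class ConfigSnapshot {
    final int recognitionMode;
    final String speakingVoiceId; // may be null, the documented default
    final int speakingSpeed;
    final int speakingVolume;

    ConfigSnapshot(int recognitionMode, String speakingVoiceId, int speakingSpeed, int speakingVolume) {
        this.recognitionMode = recognitionMode;
        this.speakingVoiceId = speakingVoiceId;
        this.speakingSpeed = speakingSpeed;
        this.speakingVolume = speakingVolume;
    }
}

final class ConfigDiff {
    // Lists the fields that differ between the cached configuration and the new one
    // delivered to onConfigurationChanged, so the app can react only to relevant changes.
    static List<String> changedFields(ConfigSnapshot before, ConfigSnapshot after) {
        List<String> changed = new ArrayList<>();
        if (before.recognitionMode != after.recognitionMode) changed.add("recognitionMode");
        if (!Objects.equals(before.speakingVoiceId, after.speakingVoiceId)) changed.add("speakingVoiceId");
        if (before.speakingSpeed != after.speakingSpeed) changed.add("speakingSpeed");
        if (before.speakingVolume != after.speakingVolume) changed.add("speakingVolume");
        return changed;
    }
}
```

After handling a change, the listener would replace its cached snapshot with the new one so the next comparison starts from the current state.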