Speech recognition and synthesis

Bots that make and accept calls use automatic speech recognition and synthesis:

  • Automatic Speech Recognition (ASR) is the process of translating speech to text.
  • Text-To-Speech (TTS), or speech synthesis, is the process of generating speech from written text.

When creating a phone channel, you can do either of the following:

Then use the "a" tag or the $reactions.answer method to generate replies from the script.

Speech synthesis markup

To make the bot’s speech more expressive, you can use speech synthesis markup. JAICP supports Speech Synthesis Markup Language (SSML), which lets you adjust the tone, pronunciation, speed, and volume of the synthesized speech, among other properties. Learn more about SSML in Speech synthesis markup.
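As a rough illustration of what such markup looks like, the sketch below builds an SSML string from plain text. The speak, break, and prosody elements come from the W3C SSML specification; which of them a given TTS provider accepts is not stated here and should be checked in Speech synthesis markup. The helper names (escapeXml, prosody, pause, speak) are hypothetical and are not part of JAICP.

```javascript
// Escape characters that are special in XML so plain text cannot break the markup.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Wrap text in a <prosody> element to change speaking rate and volume.
// Both options are optional; omitted ones are simply not emitted.
function prosody(text, options) {
  var attrs = "";
  if (options && options.rate) {
    attrs += ' rate="' + escapeXml(options.rate) + '"';
  }
  if (options && options.volume) {
    attrs += ' volume="' + escapeXml(options.volume) + '"';
  }
  return "<prosody" + attrs + ">" + escapeXml(text) + "</prosody>";
}

// Insert a pause of the given duration, for example "500ms" or "1s".
function pause(time) {
  return '<break time="' + escapeXml(time) + '"/>';
}

// Join already-built fragments into a complete SSML document.
function speak(parts) {
  return "<speak>" + parts.join("") + "</speak>";
}

var ssml = speak([
  escapeXml("Your order is ready."),
  pause("500ms"),
  prosody("Please pick it up today.", { rate: "slow", volume: "loud" })
]);
```

The resulting string could then be passed to whatever reply mechanism the script uses, assuming the provider configured for the channel understands these elements.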

Speech synthesis with variables

If the bot needs to say context-dependent values during the dialog, you can use speech synthesis with variables. For more information, see Speech synthesis with variables.

Changing ASR and TTS settings from the script

The settings configured for the speech recognition and synthesis provider apply to all calls made through the phone channel. However, you can override them for each individual call if necessary: for example, you can switch the recognition language mid-conversation or change the voice in which the bot talks to a specific user.

To control the ASR and TTS settings from the script, use the $dialer built-in service methods:

Method                            Action
getAsrProvider, getTtsProvider    Get the ASR/TTS provider name.
getAsrConfig, getTtsConfig        Get the current ASR/TTS settings.
setAsrConfig, setTtsConfig        Override the ASR/TTS settings.
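The read-modify-write pattern these methods suggest can be sketched as follows. Only the six method names come from the methods listed above. The stub object stands in for the built-in $dialer service so that the example runs outside a bot script, and the config fields (lang, voice), the provider names, and the helper functions are hypothetical placeholders. The real settings object depends on the provider, and whether the real set methods merge or replace the settings is not stated here.

```javascript
// Stand-in for the built-in $dialer service (assumption: the real service
// exposes these six methods; their exact return values are not shown here).
function createDialerStub(asrConfig, ttsConfig) {
  var asr = Object.assign({}, asrConfig);
  var tts = Object.assign({}, ttsConfig);
  return {
    getAsrProvider: function () { return "example-asr"; }, // hypothetical name
    getTtsProvider: function () { return "example-tts"; }, // hypothetical name
    getAsrConfig: function () { return Object.assign({}, asr); },
    getTtsConfig: function () { return Object.assign({}, tts); },
    setAsrConfig: function (config) { asr = Object.assign({}, config); },
    setTtsConfig: function (config) { tts = Object.assign({}, config); }
  };
}

// Read the current ASR settings, change one field, and write them back.
// Passing the full object works whether the setter merges or replaces.
function switchRecognitionLanguage(dialer, lang) {
  var config = dialer.getAsrConfig();
  config.lang = lang; // hypothetical field name
  dialer.setAsrConfig(config);
  return config;
}

// Same pattern for the synthesis voice.
function switchVoice(dialer, voice) {
  var config = dialer.getTtsConfig();
  config.voice = voice; // hypothetical field name
  dialer.setTtsConfig(config);
  return config;
}

var $dialer = createDialerStub({ lang: "en-US" }, { voice: "voice-a" });
switchRecognitionLanguage($dialer, "de-DE");
switchVoice($dialer, "voice-b");
```

In a real script the stub would be dropped and the helpers called with the platform's own $dialer, after replacing the placeholder fields with the ones the configured provider actually uses.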