Model adaptation

Speechly Conformer RNN-T models can be optimized for your specific use case or domain using Text adaptation and Audio adaptation. Adapting a model to the vocabulary and speech of your domain can significantly increase its accuracy.

Overview

The model adaptation process looks like this:

You provide training data (text or audio)
The model is trained using this data
Upon successful training an adapted model is created

The application status changes during model adaptation and is visible in both Speechly Dashboard and Speechly CLI.

Text adaptation

Text adaptation is done by providing example utterances of things your users might say. This teaches the speech recognition system the vocabulary that is relevant in your application. For example, your application may require the use of uncommon words, obscure brand names or specialist jargon that must explicitly be taught to the speech recognition model.

Viewing text training data

Log in to Speechly Dashboard
Open the Application you wish to inspect
Go to the Training data tab
Find the Text adaptation section

When using file-based configuration, the text training data is specified using the templates option.

Writing example utterances

Example utterances should cover every feature of your application. In general, the more examples you provide, the better the model will be.

Since Speechly is a spoken language understanding system, it is important to use example utterances that reflect how users talk. An example utterance is probably good if it sounds natural when spoken aloud.

For example, a simple customer service application might have a configuration like this:

*_ hi this is john smith, what can i do for you today?
*_ hi, i have a question regarding my recent order, as it hasn't yet arrived
*_ i’m so sorry to hear that
*_ if you give me your full name and order number, i can check in on that order?
*_ ok, so my name is james bond and my order number is zero zero seven
...

Gotcha

Speechly expects every example utterance to have an intent assigned to it. As we are not interested in intent detection in this example, we are using *_ (an empty intent).
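If you already have a list of plain sentences, the empty-intent prefix can be added mechanically. The following is a minimal sketch, not part of any Speechly tooling: the function name `to_empty_intent_lines` is hypothetical, and lowercasing simply mirrors the example above rather than a stated requirement.

```python
def to_empty_intent_lines(utterances):
    """Prefix each non-empty utterance with the empty intent marker "*_".

    Hypothetical helper for illustration only. Whitespace is collapsed and
    text is lowercased to match the style of the example utterances.
    """
    lines = []
    for utterance in utterances:
        # Collapse runs of whitespace and trim the ends.
        text = " ".join(utterance.split()).lower()
        if text:  # skip blank entries
            lines.append(f"*_ {text}")
    return lines


if __name__ == "__main__":
    raw = [
        "Hi, I have a question regarding my recent order",
        "  OK, so my name is James Bond  ",
    ]
    for line in to_empty_intent_lines(raw):
        print(line)
```

The resulting lines can then be pasted into the text adaptation data.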

Leverage user data

As your application gets more usage, you can see how your users are talking to your application and use this information to improve and expand your text adaptation. This data is also valuable when used for evaluating the accuracy of your application.

To view your user data:

Log in to Speechly Dashboard
Open the Application you wish to inspect
Go to the User data tab

You can also use the utterances command, if using Speechly CLI.
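One way to use this data is to look for words that your users say but that never appear in your example utterances. The sketch below assumes the user transcripts have already been exported as plain strings; it does not call any Speechly API, and the function names `tokenize` and `uncovered_words` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def uncovered_words(example_utterances, user_transcripts):
    """Count words in user transcripts that no example utterance contains.

    Returns (word, count) pairs, most frequent first.
    """
    known = set()
    for line in example_utterances:
        words = line.split()
        # Drop a leading intent marker such as "*_" before tokenizing.
        if words and words[0].startswith("*"):
            words = words[1:]
        known.update(tokenize(" ".join(words)))

    counts = Counter(
        word
        for transcript in user_transcripts
        for word in tokenize(transcript)
        if word not in known
    )
    return counts.most_common()


if __name__ == "__main__":
    examples = ["*_ hi i have a question regarding my recent order"]
    transcripts = [
        "my order has a tracking number",
        "where is my tracking number",
    ]
    for word, count in uncovered_words(examples, transcripts):
        print(word, count)
```

Frequent uncovered words are candidates for new example utterances, written the way users actually phrased them.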

Natural Language Understanding

Speechly has built-in Natural Language Understanding (NLU) features and can detect intents and entities, among other things. Intent and entity detection is useful if you want to perform custom downstream actions in your application. By default, NLU features are disabled and Speechly operates in speech-to-text mode.

Audio adaptation

Audio adaptation is done by providing an audio adaptation package, which is a set of user audio files together with correct transcripts and annotations. This data is used during model training to teach our speech recognition system the vocabulary that is relevant in your application and how it is said.

In general, audio adaptation results in higher accuracy than text adaptation alone. We recommend using both audio and text adaptation to achieve the highest possible accuracy.
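To compare accuracy before and after adaptation, you need a number to compare. A common metric for speech recognition is word error rate (WER): the word-level edit distance between a correct transcript and the recognized text, divided by the number of words in the correct transcript. The sketch below is a generic implementation under that definition; this page does not state which metric Speechly itself reports, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Return the word error rate of hypothesis against reference.

    WER = (substitutions + deletions + insertions) / words in reference,
    computed with a word-level Levenshtein distance.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    correct = "my order number is zero zero seven"
    recognized = "my order number is zero seven"
    print(word_error_rate(correct, recognized))
```

Running the same set of correct transcripts against the output of each model gives comparable figures; a lower WER means higher accuracy.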

Enterprise feature

Audio adaptation and Annotation service are available on Enterprise plans.

Viewing audio training data

Log in to Speechly Dashboard
Open the Application you wish to edit
Go to the Training data tab
Find the Audio adaptation section

When using file-based configuration, the audio training data is specified using the adaptation_audio_package option.

Annotation service

Annotating audio files is a time-consuming task. That's why we offer an annotation service where our Natural Language Specialists make sure that whatever your users are saying gets transcribed and annotated correctly. You can use this curated data to further improve audio adaptation.
