Model adaptation

The off-the-shelf speech recognition models work well for basic speech-to-text usage. To improve accuracy further, adapt them to your specific domain.

Speechly offers two kinds of model adaptation: Text adaptation and Audio adaptation.

Text adaptation

Text adaptation is done by writing example utterances of things your users might say. This teaches our speech recognition system the vocabulary that is relevant in your application. An application may require the use of uncommon words (e.g. obscure brand names or specialist jargon) that must explicitly be taught to our speech recognition model.

To view your text adaptation data:

  1. Log in to Speechly Dashboard
  2. Open the Application you wish to inspect
  3. Go to the Training data tab
  4. Find the Text adaptation section

If you use file-based configuration, you can also view the text adaptation data in your config.yaml file.

Free feature

Text adaptation is available on all plans!

Writing example utterances

Example utterances should cover all the functionality of your application. In general, the more examples you provide, the better the model will be.

Since Speechly is a spoken language understanding system, it is important to use example utterances that reflect how users actually talk. An example utterance is probably good if it sounds natural when spoken aloud.

For example, a simple customer service application might have a configuration like this:

*_ hi this is john smith, what can i do for you today?
*_ hi, i have a question regarding my recent order, as it hasn't yet arrived
*_ i’m so sorry to hear that
*_ if you give me your full name and order number, i can check in on that order?
*_ ok, so my name is james bond and my order number is zero zero seven
...

Gotcha

Speechly expects every example utterance to have an intent assigned to it. As we are not interested in intent detection in this example, we use *_ (an empty intent).
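If you already have plain sentences (for instance, from support transcripts), a small script can turn them into empty-intent example utterances. The following is a minimal sketch, not an official Speechly tool: the function name `to_example_utterance` is hypothetical, and lowercasing is an assumption that simply mirrors the style of the examples above.

```python
import re


def to_example_utterance(text: str) -> str:
    """Format one sentence as an example utterance with the empty intent (*_)."""
    # Assumption: lowercase and collapse whitespace to match the examples above.
    normalized = re.sub(r"\s+", " ", text.strip().lower())
    return f"*_ {normalized}"


raw_sentences = [
    "Hi, I have a question regarding my recent order",
    "  My order number is zero zero seven ",
]

for line in map(to_example_utterance, raw_sentences):
    print(line)
```

Review the generated lines before adding them to your configuration, since they should still sound natural when spoken aloud.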

Leverage user data

As your application gets more usage, you can see how your users are talking to your application and use this information to improve and expand your text adaptation. This data is also valuable when used for evaluating the accuracy of your application.

To view your user data:

  1. Log in to Speechly Dashboard
  2. Open the Application you wish to inspect
  3. Go to the User data tab

If you use Speechly CLI, you can also use the utterances command.
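One common way to evaluate accuracy with user data is word error rate (WER): the number of word-level substitutions, insertions, and deletions needed to turn the recognized text into a manually corrected transcript, divided by the number of words in the corrected transcript. The sketch below is a generic illustration and is not part of Speechly; the function name `word_error_rate` and the sample transcripts are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the transcripts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra word in hypothesis)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the recognizer dropped one "zero".
reference = "my order number is zero zero seven"
hypothesis = "my order number is zero seven"
print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

Tracking this number before and after changing your text adaptation data gives a rough indication of whether the change helped.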

Natural Language Understanding

Speechly has Natural Language Understanding (NLU) features built in and can detect intents and entities, among other things. Intent and entity detection is useful if you want to perform custom downstream actions in your application. By default, NLU features are disabled and Speechly operates in speech-to-text mode.

Audio adaptation

Audio adaptation is the other type of model adaptation Speechly provides. In general, audio adaptation results in higher accuracy than text adaptation alone. We recommend using both audio and text adaptation to achieve the highest possible accuracy.

Audio adaptation is done by providing a set of user audio files together with correct transcripts and annotations. This data is used during model training to teach our speech recognition system the vocabulary that is relevant in your application and how it is said.
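Because every audio file needs a matching transcript, it can help to check your data for gaps before assembling it. The sketch below assumes a purely hypothetical local layout in which each recording `name.wav` has a transcript `name.txt`; this is not Speechly's adaptation package format, and `find_unpaired` is an illustrative name.

```python
from pathlib import PurePath


def find_unpaired(file_names):
    """Return (audio stems without a transcript, transcript stems without audio)."""
    # Assumed layout (hypothetical): name.wav is paired with name.txt.
    paths = [PurePath(name) for name in file_names]
    audio = {p.stem for p in paths if p.suffix.lower() == ".wav"}
    transcripts = {p.stem for p in paths if p.suffix.lower() == ".txt"}
    return sorted(audio - transcripts), sorted(transcripts - audio)


files = ["order_001.wav", "order_001.txt", "order_002.wav", "order_003.txt"]
missing_transcripts, missing_audio = find_unpaired(files)
print("Audio without transcript:", missing_transcripts)
print("Transcript without audio:", missing_audio)
```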

To specify the audio adaptation package:

  1. Log in to Speechly Dashboard
  2. Open the Application you wish to edit
  3. Go to the Training data tab
  4. Find the Audio adaptation section

If you use file-based configuration, you can also specify the adaptation package in your config.yaml file.

Annotation service

Annotating audio files is time-consuming. That is why we offer an annotation service in which our Natural Language Specialists make sure that whatever your users say is transcribed and annotated correctly. You can use this curated data to further improve audio adaptation.

Enterprise feature

Audio adaptation and the Annotation service are available on Enterprise plans.