Model adaptation
The off-the-shelf speech recognition models work well for basic speech-to-text usage. To improve accuracy further, adapt them to your specific domain.
Speechly offers two kinds of model adaptation: Text adaptation and Audio adaptation. Please note that model adaptation is only available for Speechly Conformer RNN-T models.
Text adaptation
Text adaptation is done by writing example utterances of things your users might say. This teaches our speech recognition system the vocabulary that is relevant in your application. An application may require the use of uncommon words (e.g. obscure brand names or specialist jargon) that must explicitly be taught to our speech recognition model.
To view your text adaptation data:
- Log in to Speechly Dashboard
- Open the Application you wish to inspect
- Go to the Training data tab
- Find the Text adaptation section
You can also view the text adaptation in your config.yaml file, if using file-based configuration.
Text adaptation is available on all plans!
Writing example utterances
Example utterances should cover all of your application's functionality. In general, the more examples you provide, the better the model will be.
Since Speechly is a spoken language understanding system, it is important to use example utterances that reflect how users talk. An example utterance is probably good if it sounds natural when spoken aloud.
For example, a simple customer service application might have a configuration like this:
*_ hi this is john smith, what can i do for you today?
*_ hi, i have a question regarding my recent order, as it hasn't yet arrived
*_ i’m so sorry to hear that
*_ if you give me your full name and order number, i can check in on that order?
*_ ok, so my name is james bond and my order number is zero zero seven
...
Speechly expects every example utterance to have an intent assigned to it. As we are not interested in intent detection in this example, we are using *_ (an empty intent).
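Because every utterance needs an intent, it can help to check a list of examples before uploading it. The following is a minimal Python sketch, not part of Speechly's tooling. It assumes each line has the shape "*<intent> <utterance>", as in the example above, with "_" as the empty intent. The function name find_missing_intents and the regular expression are hypothetical and only illustrate the idea.

```python
import re

# Assumed line shape: "*<intent> <utterance text>", where "_" is the empty intent.
# This pattern is an illustration, not Speechly's actual grammar.
INTENT_PREFIX = re.compile(r"^\*(\w+)\s+(\S.*)$")


def find_missing_intents(lines):
    """Return (line_number, text) pairs for non-blank lines without an intent prefix."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # blank lines are ignored
        if INTENT_PREFIX.match(text) is None:
            problems.append((number, text))
    return problems


examples = [
    "*_ hi this is john smith, what can i do for you today?",
    "i'm so sorry to hear that",
    "*_ ok, so my name is james bond and my order number is zero zero seven",
]

# The second line lacks the "*_" prefix, so it is the only one reported.
print(find_missing_intents(examples))
```

A check like this only catches a missing prefix. It does not tell you whether the utterances sound natural or cover your application well.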
Leverage user data
As your application gets more usage, you can see how your users are talking to your application and use this information to improve and expand your text adaptation. This data is also valuable when used for evaluating the accuracy of your application.
To view your user data:
- Log in to Speechly Dashboard
- Open the Application you wish to inspect
- Go to the User data tab
You can also use the utterances command, if using Speechly CLI.
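One common way to evaluate accuracy from user data is word error rate (WER): the number of word substitutions, insertions and deletions needed to turn the recognized text into the correct transcript, divided by the number of words in the correct transcript. The sketch below is a generic Python illustration and does not describe how Speechly measures accuracy internally. The function name word_error_rate and the lowercase, whitespace-based tokenization are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two texts, divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion: reference word missing from hypothesis
                current[j - 1] + 1,      # insertion: extra word in hypothesis
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


# Correct transcript versus what the recognizer produced (one "zero" dropped).
wer = word_error_rate(
    "my order number is zero zero seven",
    "my order number is zero seven",
)
print(f"WER: {wer:.3f}")
```

Computing this over the same set of utterances before and after an adaptation change gives a rough indication of whether the change helped.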
Natural Language Understanding
Speechly has built-in Natural Language Understanding (NLU) features and can detect intents and entities, among other things. Intent and entity detection is useful if you want to perform custom downstream actions in your application. By default, NLU features are disabled and Speechly operates in speech-to-text mode.
Audio adaptation
Audio adaptation is the other type of model adaptation Speechly provides. In general, audio adaptation results in higher accuracy than text adaptation alone. We recommend using both audio and text adaptation to achieve the highest possible accuracy.
Audio adaptation is done by providing a set of user audio files together with correct transcripts and annotations. This data is used during model training to teach our speech recognition system the vocabulary that is relevant in your application and how it is said.
To specify the audio adaptation package:
- Log in to Speechly Dashboard
- Open the Application you wish to edit
- Go to the Training data tab
- Find the Audio adaptation section
You can also specify the adaptation package in your config.yaml file, if using file-based configuration.
Annotation service
Annotating audio files is a time-consuming task. That's why we offer an annotation service where our Natural Language Specialists make sure that whatever your users are saying gets transcribed and annotated correctly. You can use this curated data to further improve audio adaptation.
Audio adaptation and Annotation service are available on Enterprise plans.