Every Speechly application needs a configuration for your specific use case.
The Speechly API takes an audio stream as input and returns a transcript of the user's speech, together with the identified intents and entities, to your application. The Speechly API achieves this by applying machine learning. However, training the machine learning models requires example utterances annotated with information specific to your application.
When “configuring” a Speechly application, you are essentially providing this training data. During deployment, this data is used both to adapt a speech-to-text model to the vocabulary present in the training examples, as well as to train NLU models for detecting application-specific intents and entities.
In general, the utterances need to be designed separately for each application. With Speechly, the configuration serves two equally important purposes:
1. Teaching our speech recognition system the vocabulary that is relevant to your application. An application may require uncommon words (e.g. obscure brand names or specialist jargon) that must be explicitly taught to our speech recognition model.
2. Defining the information (intents and entities) that should be extracted from users' utterances. It is difficult to provide ready-made configurations that would sufficiently suit a variety of use cases, because the set of intents and entities is tightly coupled with the workings of each specific application.
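For illustration, annotated example utterances like the following could serve both purposes at once. This is only a sketch: the intent names (`book`) and entity names (`departure`, `destination`) are hypothetical placeholders, not part of any ready-made configuration, and the exact annotation syntax is described in the Speechly configuration documentation.

```
*book Book a flight from [San Francisco](departure) to [New York](destination)
*book I want to fly from [London](departure) to [Helsinki](destination)
```

Here the leading `*book` marks the intent of the utterance, while the bracketed phrases mark entity values and their entity types. Examples like these simultaneously expose the speech recognition model to the relevant vocabulary (city names, booking phrases) and train the NLU models to extract the application-specific intents and entities.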
Last updated by Antti Ukkonen on April 27, 2021 at 17:31 +0300