Glossary of Speechly terminology with links to more detailed documentation. Also check out the more general Machine Learning and Speech Recognition Glossary on our blog!
Annotating means using the Speechly Annotation Language (SAL) to mark the intents and entities that should be extracted from users’ utterances. Also known as labelling.
Speechly offers an annotation service on Enterprise plans.
Audio adaptation uses pre-transcribed audio to teach our speech recognition model your domain-specific vocabulary and to tune it to your acoustic environment, which significantly improves recognition accuracy. Audio adaptation is available on Enterprise plans.
A Speechly application, or app, is the set of training data (including the SAL configuration) and settings that define how the Speechly SLU should process your users’ utterances. Note that in some complex situations, your software might make use of multiple Speechly applications.
Speechly Batch API is our gRPC application programming interface for asynchronous speech recognition. Batch API is available in cloud and on-premise deployments.
CLI (Command line interface)
Speechly CLI, or command line interface, lets you manage your applications, deploy new versions, download configurations, evaluate accuracy and more.
Speechly Dashboard is the place where you can manage your applications, deploy new versions, and preview the SLU in action.
Entities are “local snippets of information” in an utterance that describe details relevant to the user’s need. Entities are annotated with the [entity value](entity name) notation of SAL.
Entity data type
Entity Data Type specifies how the values of an entity should be post-processed.
By default, an entity value consists of the matching words as spoken. For example, an entity matching the words “three hundred” would have the entity value three hundred. It is often more useful to have the value post-processed into a structured form. With the Entity Data Type number, the entity value for “three hundred” would be 300.
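To make the idea of post-processing concrete, here is a minimal sketch of how a spoken number could be turned into a structured value. It is only an illustration: the function `spoken_to_number` and its word tables are hypothetical and are not part of any Speechly API, and the sketch handles plain English cardinal numbers only.

```python
# Hypothetical helper for illustration; not Speechly's implementation.
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def spoken_to_number(text: str) -> int:
    """Convert words such as 'three hundred' into an integer."""
    total = 0    # sum of completed groups (thousands, millions)
    current = 0  # value of the group currently being built
    for word in text.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            # A bare "hundred" is read as "one hundred".
            current = max(current, 1) * 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


print(spoken_to_number("three hundred"))            # 300
print(spoken_to_number("two thousand twenty-one"))  # 2021
```

The entity value as spoken stays a string, while the post-processed value is an integer that application code can use directly.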
An example utterance is an utterance with the relevant intent and entities annotated with SAL. Example utterances are used to teach our SLU which parts of an utterance are relevant for your use case.
A SAL configuration that contains expressions written in our SAL template notation cannot be used directly to train our SLU. Such expressions are first expanded into SAL example utterances that contain only intent and entity annotations.
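The expansion step can be pictured as generating every combination of the alternatives in a template. The sketch below does not parse real SAL template syntax. Instead it assumes a hypothetical list-of-slots representation, where each slot lists its alternative strings, and the function name `expand` is likewise made up for illustration.

```python
from itertools import product


def expand(template):
    """Expand a template into all concrete example utterances.

    `template` is a list of slots; each slot is a list of alternative
    strings. An empty string marks an optional part.
    """
    return [
        " ".join(part for part in combo if part)
        for combo in product(*template)
    ]


# Hypothetical template: one intent, two phrasings, two entity values.
template = [
    ["*turn_on"],
    ["turn on", "switch on"],
    ["the"],
    ["[light](device)", "[radio](device)"],
]

for utterance in expand(template):
    print(utterance)
# *turn_on turn on the [light](device)
# *turn_on turn on the [radio](device)
# *turn_on switch on the [light](device)
# *turn_on switch on the [radio](device)
```

Each output line contains only an intent and entity annotations, which is the form of example utterance described above.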
The intent of an utterance indicates what the user wants in general. The intent often corresponds to the primary verb of the utterance. Intents are annotated with the *intent_name syntax in SAL.
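As an illustration of the two notations, the following sketch reads an annotated example utterance and pulls out the intent (the word prefixed with `*`) and the entities (written as `[entity value](entity name)`). The function `parse_example` and the sample utterance are hypothetical, and the regular expression covers only this simple, non-nested form.

```python
import re

# Matches [entity value](entity name) and captures both parts.
ENTITY = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_example(utterance: str):
    """Return (intent, entities, plain_text) for an annotated utterance."""
    tokens = utterance.strip().split(maxsplit=1)
    if not tokens or not tokens[0].startswith("*"):
        raise ValueError("expected the utterance to start with an intent")
    intent = tokens[0][1:]
    body = tokens[1] if len(tokens) > 1 else ""
    # findall yields (value, name) pairs; store them as (name, value).
    entities = [(name, value) for value, name in ENTITY.findall(body)]
    # Replace each annotation with its spoken value to get the raw text.
    plain_text = ENTITY.sub(r"\1", body)
    return intent, entities, plain_text


intent, entities, text = parse_example(
    "*book book a flight to [new york](city)"
)
print(intent)    # book
print(entities)  # [('city', 'new york')]
print(text)      # book a flight to new york
```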
Speechly provides multiple off-the-shelf speech recognition models for different uses. Conformer RNN-T models can be further optimized to your specific use case with audio adaptation and text adaptation.
NLU (Natural language understanding)
NLU, or natural language understanding, is a subfield of natural language processing (NLP) that comprehends text input and transforms it into structured data. In our NLU, this data is structured as intents and entities.
SAL (Speechly annotation language)
SAL, or Speechly Annotation Language, is our domain-specific language for annotating users’ utterances with intents and entities. In addition to the annotation syntax, SAL provides a powerful template notation for generating artificial training data when real-world example utterances are not available.
SAL Configuration is the training data, written in SAL, that is used to teach our SLU which intents and entities are relevant to your use case.
SLU (Spoken language understanding)
Our SLU (spoken language understanding) solution extracts meaning from speech by transcribing the audio input into text and finding the intent and entities in it. SLU can be seen as a combination of speech recognition and natural language understanding (NLU).
Speech recognition is the capability of a computer system to decipher spoken words and phrases and transcribe them into text. Also known as automatic speech recognition (ASR) and speech-to-text (STT, S2T).
Speechly Streaming API is our gRPC application programming interface for synchronous, real-time spoken language understanding. Streaming API is available in cloud and on-premise deployments.
Text adaptation uses written example utterances to teach our speech recognition model your domain-specific vocabulary and sentence structure, improving accuracy. The SAL configuration is automatically used as the source for text adaptation.
VAD (Voice activity detection)
VAD, or voice activity detection, is a lightweight technique for detecting the presence of speech in audio. It is used to limit the actual speech recognition to only those parts of the audio that contain speech. This reduces overall resource consumption when the input contains silence or other non-speech parts. VAD is available on multiple clients for Speechly Cloud and Speechly On-device.
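The general idea can be sketched with a simple energy threshold: split the audio into short frames and flag the frames whose loudness exceeds a limit. This is a deliberately simplified stand-in, not a description of how Speechly's VAD works. The function names, the frame size of 160 samples, and the threshold of 0.02 are all assumptions chosen for the example.

```python
import math


def frame_rms(frame):
    """Root-mean-square level of one frame of samples in [-1.0, 1.0]."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def speech_frames(samples, frame_size=160, threshold=0.02):
    """Return one boolean per full frame: True if it is loud enough."""
    flags = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        flags.append(frame_rms(frame) > threshold)
    return flags


# Two frames of silence followed by two frames of a 440 Hz tone
# sampled at 16 kHz (the tone stands in for speech here).
silence = [0.0] * 320
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(320)]

print(speech_frames(silence + tone))  # [False, False, True, True]
```

Only the frames flagged True would be passed on to speech recognition, which is where the saving in resource consumption comes from.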