Glossary

Glossary of Speechly terminology with links to more detailed documentation. Also check out the more general Machine Learning and Speech Recognition Glossary on our blog!

Android client

The Android client for Speechly Cloud is available from the Maven Central Repository and also in our GitHub. A client with support for on-device SLU is available on Enterprise plans.

Annotating

Annotating means marking the intents and entities that should be extracted from users’ utterances with the Speechly Annotation Language (SAL). Also known as labelling.

Speechly offers an annotation service on Enterprise plans.

Audio adaptation

Audio adaptation uses pre-transcribed audio to teach our speech recognition model your domain-specific vocabulary and to tune it to your acoustic environment, significantly improving accuracy. Audio adaptation is available on Enterprise plans.

Application

A Speechly application, or app, is the set of training data (including the SAL configuration) and settings that defines how the Speechly SLU should process your users’ utterances. Note that in some complex situations, your software might make use of multiple Speechly applications.

Batch API

Speechly Batch API is our gRPC application programming interface for asynchronous speech recognition. Batch API is available in cloud and on-premise deployments.

Browser Client

A browser client for using Speechly Cloud in a web app is available from NPM and our GitHub.

CLI (Command line interface)

Speechly CLI, or command line interface, lets you manage your applications, deploy new versions, download configurations, evaluate accuracy and more.

Cloud

Speechly Cloud is a deployment option where Speechly SLU is run in our Cloud environment.

Dashboard

Speechly Dashboard is the place where you can manage your applications, deploy new versions, and preview the SLU in action.

Decoder API

Speechly Decoder API is the C application programming interface of our on-device SLU solution, available on Enterprise plans.

Entity

Entities are “local snippets of information” in an utterance that describe details relevant to the user’s needs. Entities are annotated with the SAL notation [entity value](entity name).
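For instance (the intent and entity names here are invented for illustration), an utterance with two annotated entities might look like this in SAL:

```
*book_flight I want to fly to [New York](destination) on [Friday](departure_date)
```

Here destination and departure_date are the entity names, while the bracketed words are the entity values as spoken.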

Entity data type

Entity Data Type specifies how the entity values of an entity should be post-processed.

By default, an entity value is the matching words as spoken: for example, an entity matching the words “three hundred” would have the entity value three hundred. It is often more useful to have the value post-processed into a structured form. With the Entity Data Type number, the entity value for “three hundred” would instead be 300.

Example utterance

An example utterance is an utterance with the relevant intent and entities annotated in SAL. Example utterances are used to teach our SLU which parts of an utterance are relevant for your use-case.

Expand

A SAL configuration containing SAL expressions that use our SAL template notation cannot be used directly to train our SLU. Such expressions are first expanded into SAL example utterances that contain only intent and entity annotations.
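As a sketch (the list syntax and all names here are illustrative; see the SAL documentation for the exact template notation), a template along these lines:

```
rooms = [living room | kitchen]
*turn_on turn on the $rooms(room) lights
```

could be expanded into plain example utterances like:

```
*turn_on turn on the [living room](room) lights
*turn_on turn on the [kitchen](room) lights
```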

Intent

The intent of an utterance indicates what the user wants in general. It often correlates with the primary verb of the utterance. Intents are annotated with the *intent_name syntax in SAL.
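For example (the intent name is invented for illustration), an utterance whose intent is to turn something on could be annotated as:

```
*turn_on turn on the radio
```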

iOS client

The iOS client for Speechly Cloud is available using Swift Package Manager and also in our GitHub. A client with support for on-device SLU is available on Enterprise plans.

Model

Speechly provides multiple off-the-shelf speech recognition models for different uses. These can be further optimized to your specific use case with audio adaptation and text adaptation.

NLU (Natural language understanding)

NLU, or natural language understanding, is a subtask of natural language processing (NLP) that comprehends text input and transforms it into structured data. In our NLU, this data is structured into intents and entities.

On-device

Speechly On-device is a deployment option where Speechly SLU is run on the end user’s device. This feature is available on our Enterprise plans.

On-premise

Speechly On-premise is a deployment option where Speechly SLU is run in your data center or private cloud. This feature is available on our Enterprise plans.

SAL (Speechly annotation language)

SAL, or Speechly Annotation Language, is our domain-specific language for annotating users’ utterances with intents and entities. In addition to the annotation syntax, SAL provides a powerful template notation for generating artificial training data when real-world example utterances are not available.

SAL Configuration

SAL Configuration is the training data, written in SAL, used to teach our SLU which intents and entities are relevant to your use-case.

SLU (Spoken language understanding)

Our SLU (spoken language understanding) solution extracts meaning from speech by transcribing the audio input into text and finding the intent and entities in it. SLU can be seen as combining speech recognition and natural language understanding (NLU).

Speech recognition

Speech recognition is the capability of a computer system to decipher spoken words and phrases and transcribe them into text. Also known as automatic speech recognition (ASR) and speech-to-text (STT, S2T).

Streaming API

Speechly Streaming API is our gRPC application programming interface for synchronous, real-time spoken language understanding. Streaming API is available in cloud and on-premise deployments.

Text adaptation

Text adaptation uses written example utterances to teach our speech recognition model the domain-specific vocabulary and sentence structure, improving the accuracy. The SAL configuration is automatically used as source for text adaptation.

Unity client

The Unity client for Speechly Cloud is available in our GitHub. On Enterprise plans, it is possible to extend the client with on-device support.

VAD (Voice activity detection)

VAD, or voice activity detection, is a lightweight technique for detecting the presence of speech in audio. We use it to limit the actual speech recognition to only those parts of the audio that contain speech. This reduces overall resource consumption when the input contains silence or other non-speech segments.

VAD is available in our browser client and on-device.