Speechly On-device
Transcribe live streaming audio in real time, accurately and cost-effectively, right on the user’s device.
Most speech recognition solutions are SaaS products. This requires sending large amounts of audio over the Internet to be processed in the cloud. However, for services that require real-time processing of tens of thousands of hours of audio per day, cloud-based solutions are often too expensive or cannot deliver transcripts in real time. These difficulties are easily overcome by deploying the speech recognition software directly on the user’s device.
Speechly on-device is at its core a C-library (Speechly Decoder) that runs on a variety of CPUs and operating systems. It uses the same proprietary speech recognition models trained on tens of thousands of hours of speech that also power Speechly cloud. It can be built against different machine learning frameworks, such as TensorFlow Lite, Core ML, and ONNX Runtime to provide optimum performance on different platforms.
Deploying Speechly on-device is available on Enterprise plans.
Overview
Deploying Speechly on-device is a bit different from deploying it in the cloud, but the core concepts are the same:
- Select the application you want to use, or create a new one.
- From the Models section, select a small model and deploy changes if necessary.
- Download the model bundle and the Speechly Decoder library and import them into your project.
- Use the Speechly Decoder API to transcribe live streaming audio in real time.
If you make changes to your training data, remember to deploy the changes, download the updated model bundle and import it into your project.
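The steps above can be sketched as the following integration loop (pseudocode; the actual Speechly Decoder API calls are platform-specific and documented with the library):

```text
load the model bundle from disk or app assets
create a decoder instance from the model
while capturing audio from the microphone:
    feed the next audio chunk to the decoder
    poll the decoder for newly transcribed words
    update the UI with the words
signal end of audio and read any remaining words
```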
- Supported CPUs: x86, x86_64, arm32, arm64
- Minimum CPU requirement: Depends on the platform; real-time decoding on modern arm64 SoCs consumes about 1–2 CPU cores.
- Supported operating systems: Android (including Oculus), iOS, Windows, Linux, macOS, BSD variants.
- Input audio: 1-channel, at least 16 kHz sample rate.
- Impact on binary size (non-mobile platforms): ~6 MB when linked statically, ~500 kB when using a dynamic library that delegates to e.g. TensorFlow Lite or Core ML.
- Model size: 70–140 MB, depending on accuracy requirements and available resources.
Select an appropriate model
Speechly offers models of different sizes and capabilities. For on-device use, only small models are supported.
Speechly Dashboard
- Go to Application → Overview → Model
- Select the small model you want to use
config.yaml
Add the following line to your config.yaml:
model: small-lowlatency-LATEST
Download the Speechly Decoder library
The Speechly Decoder library is available for Android, iOS, and Unity. For other platforms, such as native Windows applications, we can provide either a pre-compiled dynamic or a static library plus the required header files.
Integrating the library doesn't require expertise in speech recognition, but you must be able to capture real-time audio from the device microphone. You can download the library from Speechly Dashboard by going to Application → Integrate.
Download a model bundle
To use the Speechly Decoder library you need a model bundle. Bundles are available for three different machine learning frameworks: ONNX Runtime, TensorFlow Lite and Core ML.
Which model bundle you need depends on the platform you are developing on. Also, all model bundles have a predefined lifetime after which the Speechly Decoder library refuses to load the model.
Speechly Dashboard
- Go to Application → Overview → Model
- Click the version you want to download
Speechly CLI
Use the download command:
speechly download YOUR_APP_ID . --model coreml
# Available options are: ort, tflite, coreml and all
Get started with iOS
If you would like to try out Speechly on-device streaming transcription on iOS, there's an iOS example application you can run in the simulator or on an iOS device.
Before you start
Make sure you have created and deployed a Speechly application. You'll also need a Core ML model bundle and the SpeechlyDecoder.xcframework library. See above for instructions on how to download them.
Copy the example app
Copy the example app using degit:
npx degit speechly/speechly/examples/ios-decoder-example my-ios-app
cd my-ios-app
Add dependencies
Open Decoder.xcodeproj in Xcode and add both SpeechlyDecoder.xcframework and YOUR_MODEL_BUNDLE.coreml.bundle to the project by dragging and dropping them into the Frameworks folder.
Make sure Copy items if needed, Create groups and Add to targets are selected.
In Decoder/SpeechlyManager.swift, update the model bundle resource URL:
let bundle = Bundle.main.url(
forResource: "YOUR_MODEL_BUNDLE.coreml",
withExtension: "bundle"
)!
Run the app
Run the app and grant it microphone permissions when prompted.
Get started with Android
If you would like to try out Speechly on-device streaming transcription on Android, there's an Android example application you can run in the emulator or on an Android device.
Before you start
Make sure you have created and deployed a Speechly application. You'll also need a TensorFlow Lite model bundle and the SpeechlyDecoder.aar library. See above for instructions on how to download them.
Copy the example app
Copy the example app using degit:
npx degit speechly/speechly/examples/android-decoder-example my-android-app
cd my-android-app
Add dependencies
Put SpeechlyDecoder.aar in a directory that Gradle can find. For example, add a flatDir field to the repositories section in your settings.gradle:
dependencyResolutionManagement {
    repositories {
        flatDir {
            dirs '/path/to/decoder'
        }
    }
}
In the dependencies section of your build.gradle, add:
dependencies {
    implementation 'org.tensorflow:tensorflow-lite:2.9.0'
    implementation(name:'SpeechlyDecoder', ext:'aar')
}
If the file is packaged as part of the application, it may be good to ensure that it is not compressed when building the .apk by updating the android section in your build.gradle:
android {
    aaptOptions {
        noCompress 'bundle'
    }
}
Open DecoderTest in Android Studio and add YOUR_MODEL_BUNDLE.tflite.bundle to the project by dragging and dropping it into the build/src/main/assets folder.
In MainActivity.java, update the model bundle resource:
this.bundle = loadAssetToByteBuffer("YOUR_MODEL_BUNDLE.tflite.bundle");
Run the app
Run the app and grant it microphone permissions when prompted.
Get started with C
If you would like to try out Speechly on-device streaming transcription using plain C, there's a C example program that you can compile and run. The repository contains a README explaining how to use it and any prerequisites it may have.