Speechly On-device

Transcribe live streaming audio in real time, accurately and cost-effectively, right on the user's device.

Overview

Most speech recognition solutions are SaaS products, which requires sending large amounts of audio over the Internet for processing in the cloud. For services that process tens of thousands of hours of audio per day, however, cloud-based solutions are often too expensive or cannot deliver transcripts in real time. These difficulties are overcome by deploying the speech recognition software directly on the user's device.

Speechly On-device is, at its core, a C library called Speechly Decoder that runs on a variety of CPUs and operating systems. The Speechly Decoder library is available for Android, iOS, and Unity. For other platforms, such as native Windows applications, we can provide a pre-compiled dynamic or static library plus the required header files.

The library uses the same proprietary speech recognition models, trained on tens of thousands of hours of speech, that also power Speechly Cloud. It can be built against different machine learning frameworks, such as TensorFlow Lite, Core ML, and ONNX Runtime, to provide optimal performance on each platform.

Enterprise feature

On-device deployment is available on Enterprise plans.

Deployment

  1. Select the application you want to use, or create a new one.
  2. From the Models section, select a small model.
  3. Download the model bundle for your platform (visible below the model dropdown).
  4. Download the Speechly Decoder library from the Integrate tab.
  5. Import the model bundle and library into your project.
  6. Use the Speechly Decoder API to transcribe live streaming audio in real time.
Tip

If you make changes to your training data, remember to deploy the changes, download the updated model bundle, and import it into your project.

Technical specifications

  • Supported CPUs: x86, x86_64, arm32 and arm64.
  • Minimum CPU requirement: Depends on the platform; real-time decoding on modern arm64 SoCs consumes about 1–2 CPU cores.
  • Supported operating systems: Android (including Oculus), iOS, Windows, Linux, macOS, and BSD variants.
  • Input audio: 1-channel (mono), at least 16 kHz sample rate; see supported audio formats.
  • Impact on binary size (non-mobile platforms): ~6 MB when linked statically; ~500 KB when using a dynamic library backed by e.g. TensorFlow Lite or Core ML.
  • Model size: 70–140 MB, depending on accuracy requirements and available resources.

Examples

Check out Using Speechly On-device to get started with on-device transcription on iOS, Android, and in C.

API reference

Speechly Decoder API