Speechly On-premise

Transcribe large quantities of pre-recorded audio accurately and asynchronously in a customized installation, for example on-premise or in your private cloud.


Most speech recognition solutions are SaaS products, which means sending large amounts of audio over the Internet to be processed in the cloud. For services that need to process tens of thousands of hours of audio per day asynchronously, cloud-based solutions are often too expensive or raise privacy concerns. These difficulties can be overcome by deploying the speech recognition software in a customized installation.

Speechly On-premise is, at its core, an instance of the Speechly Batch API deployed in your own environment, for example on an on-premise server or in your private cloud. It uses the same proprietary speech recognition models, trained on tens of thousands of hours of speech, that also power Speechly Cloud.

Enterprise feature

Deploying Speechly on-premise is available on Enterprise plans.


Getting started involves three steps:

  1. Adapt a model to your specific domain in collaboration with the Speechly team. This step is optional; you can use an off-the-shelf model instead.
  2. Deploy an instance of the Speechly Batch API to your on-premise environment or private cloud. This step requires coordination and planning with the Speechly team.
  3. Use the Speechly Batch API instance to transcribe pre-recorded audio files asynchronously.

At the moment, on-premise deployment is not part of our self-service offering, so everything is done in close collaboration with the Speechly team.
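The transcription step follows a common submit-and-poll pattern: upload a pre-recorded audio file, receive a job id, then poll until the transcript is ready. The sketch below illustrates that flow only; the `/batch` and `/batch/{id}` endpoint paths, the `id`, `status`, and `transcript` response fields, and the `send` helper are all assumptions for illustration, not the actual Speechly Batch API surface.

```python
import time
from typing import Callable, Optional


def transcribe_batch(
    send: Callable[[str, str, Optional[bytes]], dict],
    base_url: str,
    audio: bytes,
    poll_interval: float = 5.0,
    max_polls: int = 100,
) -> str:
    """Submit audio for asynchronous transcription and poll until done.

    `send(method, url, body)` performs the HTTP call and returns the
    decoded JSON response; inject e.g. a `requests`-based implementation.
    Endpoint paths and field names here are illustrative assumptions.
    """
    # Submit the audio file; assume the server replies with a job id.
    job = send("POST", f"{base_url}/batch", audio)
    job_id = job["id"]

    # Poll the job until the transcript is ready.
    for _ in range(max_polls):
        status = send("GET", f"{base_url}/batch/{job_id}", None)
        if status["status"] == "done":
            return status["transcript"]
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} did not finish")
```

Injecting the HTTP transport keeps the polling logic independent of any particular client library, which also makes it easy to exercise against a stub in tests.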

Technical specifications

  • The Batch API deployment scales from a single 4-CPU AMD64 virtual machine to thousands of CPUs and GPUs. When scaling up, note that network bandwidth, memory, and disk space need to scale accordingly.
  • The default supported deployment target for the Batch API is Kubernetes, using kustomize-based manifest management. Any Docker environment can also be used, but autoscaling is supported only in Kubernetes deployments.
  • The deployment consists of one to six images, depending on the configuration and capabilities of the target system. This also determines the number of containers created and the resources they consume.
  • Monitoring and alerting can be managed with Prometheus. All Batch API components export metrics for scraping.
  • The Batch API components are currently based on Debian 11, Alpine Linux, and Ubuntu 22.04 images.
  • The API interface is the same as Speechly Cloud's, except that identity tokens are not needed.
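Because every Batch API component exports metrics for scraping, a Prometheus setup in a Kubernetes deployment can be a standard pod-discovery job like the sketch below. The job name and the `prometheus.io/scrape` annotation convention are assumptions for illustration, not values shipped with the product.

```yaml
scrape_configs:
  # Discover pods via the Kubernetes API and scrape any pod annotated
  # with prometheus.io/scrape: "true" (a common convention, assumed
  # here rather than prescribed by Speechly).
  - job_name: "speechly-batch-api"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```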


Check out the Batch API guide to learn how to transcribe pre-recorded audio files asynchronously.

API reference

Speechly Batch API