Speechly Decoder API for Android

Java API reference.


Passing null for any argument may result in undefined behaviour, including segmentation faults that crash the application. In particular, make sure that DecoderFactoryHandle and DecoderHandle instances are not null before calling any of the API functions.



Creates a new DecoderFactoryHandle instance from a ByteBuffer that contains the speech recognition model. Note that the isDirect() method must return true for the given buffer.

public static DecoderFactoryHandle DecoderFactory_CreateFromModelArchive(
java.nio.ByteBuffer buffer, // the model bundle bytes
long bufferSize) // the number of bytes in the buffer


A DecoderFactoryHandle that represents the Decoder Factory instance.


DecoderException if there is a problem with loading the given model buffer.
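
Because isDirect() must return true, model bytes that arrive as a byte array or an InputStream first have to be copied into a buffer obtained from ByteBuffer.allocateDirect. The following is a minimal sketch of that copy, using only java.nio and java.io. The class name ModelBuffers and its methods are hypothetical helpers, not part of the Speechly API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical helper, not part of the Speechly library.
final class ModelBuffers {
    private ModelBuffers() {}

    /** Reads the whole stream and copies its bytes into a direct ByteBuffer. */
    static ByteBuffer readDirect(InputStream in) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            bytes.write(chunk, 0, n);
        }
        return toDirect(bytes.toByteArray());
    }

    /** Copies a byte array into a newly allocated direct ByteBuffer. */
    static ByteBuffer toDirect(byte[] model) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(model.length);
        buffer.put(model);
        buffer.flip(); // position = 0, limit = model.length
        return buffer;
    }
}
```

The resulting buffer could then be passed as SpeechlyDecoder.DecoderFactory_CreateFromModelArchive(buffer, buffer.capacity()). It is probably wise to keep a reference to the buffer for as long as the factory exists.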


Creates a new DecoderHandle instance using a given DecoderFactoryHandle. The DecoderHandle is a fairly lightweight object; multiple instances share the same underlying speech recognition model.

public static DecoderHandle DecoderFactory_GetDecoder(
DecoderFactoryHandle factory, // A DecoderFactoryHandle instance
String device_id) // An id (UUID) to be used with this device


A DecoderHandle that represents the Decoder instance. The device_id must be a UUID-formatted string.


DecoderException if a decoder cannot be instantiated.
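
Since device_id must be a UUID-formatted string, one way to obtain a suitable value is java.util.UUID. The sketch below generates an id and checks that a stored string is in canonical UUID form. The class DeviceIds is a hypothetical helper, and whether the decoder requires a particular UUID version is not stated in this reference.

```java
import java.util.UUID;

// Hypothetical helper, not part of the Speechly library.
final class DeviceIds {
    private DeviceIds() {}

    /** Generates a random UUID in its canonical 36-character form. */
    static String newDeviceId() {
        return UUID.randomUUID().toString();
    }

    /** Returns true only if s is a UUID in canonical 8-4-4-4-12 hex form. */
    static boolean isCanonicalUuid(String s) {
        if (s == null || s.length() != 36) {
            return false;
        }
        try {
            // Round-tripping rejects the lenient inputs UUID.fromString accepts.
            return UUID.fromString(s).toString().equalsIgnoreCase(s);
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```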


Deletes a DecoderFactoryHandle instance and frees the models it uses. The ByteBuffers with the model data can be garbage collected after calling this method. Note that any DecoderHandle instances created from the given factory should each be destroyed separately before calling this method.

public static void DecoderFactory_Destroy(
DecoderFactoryHandle factory) // The DecoderFactoryHandle to be deleted


Returns the App Id that the currently loaded model bundle is associated with.

public static String DecoderFactory_GetAppId(
DecoderFactoryHandle factory) // A DecoderFactoryHandle that has a loaded model bundle


DecoderException if an error occurs.


Returns the Bundle Id that the currently loaded model bundle is associated with.

public static String DecoderFactory_GetBundleId(
DecoderFactoryHandle factory) // A DecoderFactoryHandle that has a loaded model bundle


DecoderException if an error occurs.


The decoder can indicate longer periods of silence in the resulting transcript by an @ symbol. By default this feature is disabled. It can be enabled by calling this function to set the length of the silence that will trigger a segment boundary.

public static void DecoderFactory_SetSegmentationDelay(
DecoderFactoryHandle factory, // a DecoderFactoryHandle instance
int milliseconds) // the inter-segment delay in ms

The setting applies to all DecoderHandle instances created from this factory after the call.

A segment boundary is indicated by the getWord() method of the corresponding CResultWord object returning the string @.

Note: The real granularity of the silence detection mechanism is not at the level of milliseconds. It has sub-second accuracy, but the resolution is closer to one tenth of a second.


DecoderException if an error occurs.
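
With segmentation enabled, the @ markers can be used to group the transcript into segments. The sketch below assumes the words have already been collected into a list of strings via getWord(). The class Segments is a hypothetical helper, not part of the Speechly API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Speechly library.
final class Segments {
    private Segments() {}

    /** Groups words into space-joined segments, splitting at each "@" marker. */
    static List<String> split(List<String> words) {
        List<String> segments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : words) {
            if ("@".equals(word)) {
                // Segment boundary: close the current segment if it has content.
                if (current.length() > 0) {
                    segments.add(current.toString());
                    current.setLength(0);
                }
            } else {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(word);
            }
        }
        if (current.length() > 0) {
            segments.add(current.toString());
        }
        return segments;
    }
}
```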


Sets a list of words that should be detected with higher accuracy by the decoder. This is an experimental feature! After calling this method on DecoderFactoryHandle factory, all subsequent DecoderHandle instances will use the given boost vocabulary.

public static void DecoderFactory_SetBoostVocabulary(
DecoderFactoryHandle factory, // a DecoderFactoryHandle instance
String vocabulary, // a newline (\n) separated list of words
float weight) // the strength with which to boost the words

The vocabulary must be a String with the desired words concatenated with the newline \n character in between the words. The weight parameter controls the strength of the biasing. Suitable values for weight are floats in the range [-10, 10]. Negative weight results in the decoder being less likely to output the given list of words.

Note: The decoder will try to bias its output towards the words in vocabulary, but it is not guaranteed that the words will be correctly recognized even if they appear on the list.


DecoderException if an error occurs.
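
The vocabulary string and weight can be prepared with plain Java before the call. The sketch below joins a word list with newlines and clamps the weight to the suggested range [-10, 10]. Clamping is a design choice made for this example, not documented behaviour of the decoder, and the class BoostVocabulary is a hypothetical helper.

```java
import java.util.List;

// Hypothetical helper, not part of the Speechly library.
final class BoostVocabulary {
    private BoostVocabulary() {}

    /** Joins the non-blank words with a newline between them. */
    static String join(List<String> words) {
        StringBuilder vocabulary = new StringBuilder();
        for (String word : words) {
            String trimmed = word.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank entries
            }
            if (vocabulary.length() > 0) {
                vocabulary.append('\n');
            }
            vocabulary.append(trimmed);
        }
        return vocabulary.toString();
    }

    /** Limits the weight to the range [-10, 10] suggested above. */
    static float clampWeight(float weight) {
        return Math.max(-10.0f, Math.min(10.0f, weight));
    }
}
```

The two results would then be passed as the vocabulary and weight arguments of DecoderFactory_SetBoostVocabulary.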




Deletes a DecoderHandle instance and frees all underlying resources.

public static void Decoder_Destroy(
DecoderHandle handle) // Handle to the Decoder to be destroyed.


Writes audio samples to a DecoderHandle instance. The audio must have a single channel, and the samples must be 32-bit floats normalized to [-1.0, 1.0]. The default sample rate is 16 kHz. This can be changed by calling SpeechlyDecoder.Decoder_SetInputSampleRate before feeding the samples.

public static void Decoder_WriteSamples(
DecoderHandle handle, // The DecoderHandle instance to use for transcription
float[] samples, // an array with samples (must not be null!)
long samples_size, // number of samples to read from the array
int end_of_input) // 1 if this is the last chunk of samples, 0 otherwise

Note that the samples array can be re-used with new samples or garbage collected after calling this method. The samples_size parameter specifies how many samples to read from the buffer and must be <= samples.length.

The end_of_input parameter is a 0/1 flag that indicates if this is the last chunk of samples for this audio stream or not. Note that failing to set end_of_input = 1 after the last audio chunk may cause SpeechlyDecoder.Decoder_WaitResults to block indefinitely.


DecoderException if an error occurs when feeding the audio.
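
Audio is often captured as signed 16-bit PCM rather than floats. The sketch below shows one common way to convert such samples to floats within [-1.0, 1.0] before handing them to Decoder_WriteSamples. Dividing by 32768 is a convention assumed here, and the class PcmConversion is a hypothetical helper.

```java
// Hypothetical helper, not part of the Speechly library.
final class PcmConversion {
    private PcmConversion() {}

    /** Converts signed 16-bit PCM samples to floats in [-1.0, 1.0). */
    static float[] pcm16ToFloat(short[] pcm) {
        float[] samples = new float[pcm.length];
        for (int i = 0; i < pcm.length; i++) {
            // Short.MIN_VALUE (-32768) maps to exactly -1.0f.
            samples[i] = pcm[i] / 32768.0f;
        }
        return samples;
    }
}
```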


Sets the sample rate for incoming audio. Natively the decoder works with 16 kHz audio. If your audio has a higher sample rate, use this function to tell the decoder the correct sample rate.

public static void Decoder_SetInputSampleRate(
DecoderHandle handle, // The DecoderHandle instance to set the sample rate on
int sample_rate) // The sample rate


DecoderException if an error occurs.
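
The sample rate also determines how many samples make up a chunk of a given duration when audio is fed in pieces. The sketch below computes a chunk size and feeds an array chunk by chunk, setting the end-of-input flag only on the last chunk. The SampleSink interface is a stand-in for SpeechlyDecoder.Decoder_WriteSamples so that the example runs without the library. All names are hypothetical.

```java
// Hypothetical helper, not part of the Speechly library.
final class ChunkedFeeder {
    private ChunkedFeeder() {}

    /** Stand-in for Decoder_WriteSamples(handle, samples, samples_size, end_of_input). */
    interface SampleSink {
        void write(float[] samples, long samplesSize, int endOfInput);
    }

    /** Number of samples in chunkMillis milliseconds of audio at sampleRate. */
    static int chunkSize(int sampleRate, int chunkMillis) {
        return (int) ((long) sampleRate * chunkMillis / 1000);
    }

    /**
     * Feeds audio to the sink in chunks and returns the number of chunks written.
     * Empty audio is not handled: no chunk is written and no end-of-input is sent.
     */
    static int feed(float[] audio, int chunkSize, SampleSink sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        float[] buffer = new float[chunkSize]; // re-used for every chunk
        int chunks = 0;
        for (int offset = 0; offset < audio.length; offset += chunkSize) {
            int n = Math.min(chunkSize, audio.length - offset);
            System.arraycopy(audio, offset, buffer, 0, n);
            boolean last = offset + n >= audio.length;
            sink.write(buffer, n, last ? 1 : 0);
            chunks++;
        }
        return chunks;
    }
}
```

In an application the sink would be a lambda that forwards its arguments to Decoder_WriteSamples for a given DecoderHandle.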


Reads a CResultWord that contains one word of transcript from a DecoderHandle instance. This method blocks until there is a new word available, or until the end of audio has been reached and the decoder is guaranteed to not return any more words.

public static CResultWord Decoder_WaitResults(
DecoderHandle handle) // The DecoderHandle instance to read transcript from


A CResultWord. End of stream is indicated by the returned word being equal to "".


DecoderException if an error occurs.
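
A typical consumer calls the blocking read in a loop until the empty word signals end of stream. The sketch below shows only the loop logic: the Supplier stands in for a call to Decoder_WaitResults followed by getWord(), so that the example runs without the library. The class TranscriptReader is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper, not part of the Speechly library.
final class TranscriptReader {
    private TranscriptReader() {}

    /**
     * Collects words until the empty string marks end of stream.
     * nextWord stands in for Decoder_WaitResults(handle).getWord(); real code
     * would also pass each CResultWord to CResultWord_Destroy after use.
     */
    static List<String> readAll(Supplier<String> nextWord) {
        List<String> words = new ArrayList<>();
        while (true) {
            String word = nextWord.get();
            if (word.isEmpty()) {
                break; // end of stream
            }
            words.add(word);
        }
        return words;
    }
}
```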


Reads a CResultWord that contains one word of transcript from a Decoder. This method returns immediately.

public static CResultWord Decoder_GetResults(
DecoderHandle handle) // The DecoderHandle instance to read transcript from


A CResultWord or null when no new words are available. End of stream is indicated by the returned word being equal to "".


DecoderException if an error occurs.
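
With the non-blocking variant there are three outcomes to distinguish: a word, null (nothing new yet) and the empty word (end of stream). The sketch below drains whatever is currently available and reports whether end of stream was reached. The Supplier stands in for Decoder_GetResults plus getWord(), returning null when no CResultWord is available. The class TranscriptPoller is hypothetical.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper, not part of the Speechly library.
final class TranscriptPoller {
    private TranscriptPoller() {}

    /**
     * Appends all currently available words to out without blocking.
     * Returns true if the end-of-stream marker ("") was seen, false if the
     * source merely has no new words right now (null).
     */
    static boolean drain(Supplier<String> poll, List<String> out) {
        String word;
        while ((word = poll.get()) != null) {
            if (word.isEmpty()) {
                return true; // end of stream
            }
            out.add(word);
        }
        return false; // nothing more for now, poll again later
    }
}
```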


Enables Voice Activity Detection (VAD) on the given DecoderHandle instance. The main use of VAD is to reduce the CPU load of the decoder, as it prevents the decoder from processing parts of the audio that are likely to contain only silence. The downside is that VAD may introduce errors in the resulting transcript: words may be missed if the VAD incorrectly prevents some parts of the audio from being processed.

public static void Decoder_EnableVAD(
DecoderHandle decoder, // the DecoderHandle instance on which to enable VAD
int enabled) // set to 1 to enable VAD, set to 0 to disable


DecoderException if an error occurs.


Get an integer parameter by parameter id.

public static int Decoder_GetParamI(
DecoderHandle decoder, // A DecoderHandle instance
long parameterId) // A parameter id constant

See Speechly Constants for valid parameter ids.


DecoderException if the given parameterId is not valid.


Set an integer parameter by parameter id.

public static void Decoder_SetParamI(
DecoderHandle decoder, // A DecoderHandle instance
long parameterId, // A parameter id constant
int parameterValue) // The value for the parameter

See Speechly Constants for valid parameter ids.


DecoderException if the given parameterId is not valid or if parameterValue is invalid for the given parameterId.


Get a float parameter by parameter id.

public static float Decoder_GetParamF(
DecoderHandle decoder, // A DecoderHandle instance
long parameterId) // A parameter id constant

See Speechly Constants for valid parameter ids.


DecoderException if the given parameterId is not valid.


Set a float parameter by parameter id.

public static void Decoder_SetParamF(
DecoderHandle decoder, // A DecoderHandle instance
long parameterId, // A parameter id constant
float parameterValue) // The value for the parameter

See Speechly Constants for valid parameter ids.


DecoderException if the given parameterId is not valid or if parameterValue is invalid for the given parameterId.


Returns the total number of samples processed by a given DecoderHandle instance since the previous time SpeechlyDecoder.Decoder_WriteSamples was called with end_of_input = 1.

public static int Decoder_GetNumSamples(
DecoderHandle decoder) // The DecoderHandle instance


Returns the total number of samples that are not part of silence regions processed by a given DecoderHandle instance since the previous time SpeechlyDecoder.Decoder_WriteSamples was called with end_of_input = 1.

public static int Decoder_GetNumVoiceSamples(
DecoderHandle decoder) // The DecoderHandle instance


Returns the total number of characters transcribed by a given DecoderHandle instance since the previous time SpeechlyDecoder.Decoder_WriteSamples was called with end_of_input = 1.

public static int Decoder_GetNumCharacters(
DecoderHandle decoder) // The DecoderHandle instance
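
The counters above can be turned into more readable figures such as audio duration and the share of audio that contained voice. The sketch below assumes the sample counts refer to audio at the sample rate passed in, which this reference does not specify. The class DecoderStats is a hypothetical helper.

```java
// Hypothetical helper, not part of the Speechly library.
final class DecoderStats {
    private DecoderStats() {}

    /** Duration in seconds of numSamples samples at the given sample rate. */
    static double seconds(int numSamples, int sampleRate) {
        return (double) numSamples / sampleRate;
    }

    /** Fraction of samples outside silence regions, or 0.0 if nothing was processed. */
    static double voiceFraction(int numVoiceSamples, int numSamples) {
        return numSamples == 0 ? 0.0 : (double) numVoiceSamples / numSamples;
    }
}
```

For example, seconds(Decoder_GetNumSamples(decoder), 16000) would give the processed duration for audio at the default sample rate under the assumption stated above.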


A utility method that returns a String representing the version of this Speechly decoder library.


A utility method that returns a long representing the build id of this Speechly decoder library.


Deallocates a CResultWord when it's no longer needed.

public static void CResultWord_Destroy(
CResultWord result_word) // The CResultWord instance to destroy

The CResultWord class

An instance of CResultWord represents a single word of transcript together with its begin and end timestamps. The timestamps are measured from the start of the audio stream. A CResultWord instance has the following self-explanatory methods:

  • public String getWord()
  • public long getStart_time()
  • public long getEnd_time()

End of stream is indicated by getWord() returning the empty string "".

Note that when a CResultWord instance is no longer needed, it must be disposed of by explicitly calling SpeechlyDecoder.CResultWord_Destroy.
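
Because disposal is explicit, it helps to make sure the destroy call runs even when the code using the object throws. The sketch below is one generic way to do this with try/finally. The class NativeScope and its method are hypothetical and use plain java.util.function types, so the example runs without the library.

```java
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical helper, not part of the Speechly library.
final class NativeScope {
    private NativeScope() {}

    /**
     * Applies body to the resource and always calls destroy afterwards,
     * whether body returns normally or throws.
     */
    static <T, R> R use(T resource, Function<T, R> body, Consumer<T> destroy) {
        try {
            return body.apply(resource);
        } finally {
            destroy.accept(resource);
        }
    }
}
```

With the decoder this might look like NativeScope.use(word, CResultWord::getWord, SpeechlyDecoder::CResultWord_Destroy), where word is a CResultWord obtained from Decoder_WaitResults.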

Speechly Constants

The decoder library uses a number of integer constants to specify parameters (in the GetParamI/SetParamI/GetParamF/SetParamF methods) as well as error codes. The constants appear as static integers in the SpeechlyDecoder class.

Error constants

The relevant error constants are given below.


The model bundle being loaded is targeted for a different machine learning backend than the one used by this library. By default, the Android SDK can only run models for TensorFlow Lite.


The model bundle is corrupted.


The model bundle contains a licence that has expired. All Speechly model bundles have a predefined lifetime after which the decoder library refuses to load the model.


The parameter Id given to SetParamI, SetParamF, GetParamI or GetParamF is invalid.


The parameter value given to SetParamI or SetParamF is invalid for the given parameterId.


There was an error related to memory allocation. Most likely this means you have run out of memory.


Something unexpected happened in the decoder.


No error happened. Note that the error code of a thrown DecoderException should always be something other than this, so there should never be a need to check for this value.

Parameter constants

These are valid values for the parameterId argument in the GetParamI/SetParamI/GetParamF/SetParamF methods. For the time being these are all related to internals of the Voice Activity Detection feature. In general it should not be necessary to set/get any of these, as the default values will provide good performance in most cases. These constants are subject to change, and will be more thoroughly documented in a future release. They are listed here merely for the sake of completeness.


VAD signal-to-noise energy ratio needed for a frame to be considered 'loud'.


VAD energy threshold. Frames below this level will not trigger activation. Range (-90.0f, 0.0f).


VAD background noise learning rate. Defined as the duration in which the background noise energy is moved halfway towards the current frame's energy. Range (0, 5000).


VAD 'loud' to 'silent' ratio in signal_search_frames to activate 'is_signal_detected'. Range (0.0f, 1.0f).


VAD 'loud' to 'silent' ratio in signal_search_frames to keep 'is_signal_detected' active. Only evaluated when the sustain period is over. Range (0.0f, 1.0f).


VAD duration to keep 'is_signal_detected' active. Renewed as long as VADActivation holds true. Range (0, 8000).


VAD number of past audio frames analyzed by the energy threshold VAD. Range (1, 32).