
Speechly Decoder API for Android

Java API reference.

Overview

In general, passing null for any of the arguments may result in undefined behaviour, including segmentation faults that crash the application. To be safe, make sure that the DecoderFactoryHandle and DecoderHandle instances in particular are not null before passing them to any of the API functions.
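A null handle can crash the process in native code instead of raising a Java exception, so one option is to check handles on the Java side first. This is a minimal sketch. `HandleGuards` and `requireHandle` are hypothetical names and are not part of the Speechly API.

```java
import java.util.Objects;

/** Hypothetical helper: fail fast in Java instead of crashing in native code. */
final class HandleGuards {
    private HandleGuards() {}

    /**
     * Returns the handle unchanged, or throws a NullPointerException naming the
     * handle if it is null. Intended to wrap handles before they are passed to
     * SpeechlyDecoder functions.
     */
    static <T> T requireHandle(T handle, String name) {
        return Objects.requireNonNull(handle, name + " must not be null");
    }
}
```

For example, `SpeechlyDecoder.Decoder_Destroy(HandleGuards.requireHandle(decoder, "decoder"))` turns a possible segmentation fault into a NullPointerException with a readable message.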

Functions

SpeechlyDecoder.DecoderFactory_CreateFromModelArchive

Creates a new DecoderFactoryHandle instance from a ByteBuffer that contains the speech recognition model. Note that the isDirect() method must return true for the given buffer.

public static DecoderFactoryHandle DecoderFactory_CreateFromModelArchive(
java.nio.ByteBuffer buffer, // the model bundle bytes
long bufferSize) // the number of bytes in the buffer

Returns

A DecoderFactoryHandle that represents the Decoder Factory instance.

Throws

DecoderException if there is a problem with loading the given model buffer.
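The buffer passed to DecoderFactory_CreateFromModelArchive must be direct. This sketch shows one way to get the bundle into such a buffer. It assumes the model bundle is available as a file or a byte array, and `ModelBuffers` is a hypothetical helper. On Android, `java.nio.file` requires API level 26 or later. On older versions, the bytes can be read from an `InputStream` instead.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for preparing a model bundle for the decoder. */
final class ModelBuffers {
    private ModelBuffers() {}

    /** Copies the bytes into a new direct buffer with position 0 and limit = length. */
    static ByteBuffer toDirectBuffer(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(bytes.length);
        buffer.put(bytes);
        buffer.flip(); // rewind so the whole bundle is readable from position 0
        return buffer;
    }

    /** Reads a model bundle file completely into a direct buffer. */
    static ByteBuffer loadModelArchive(Path path) throws IOException {
        return toDirectBuffer(Files.readAllBytes(path));
    }
}
```

The result would then be passed as `SpeechlyDecoder.DecoderFactory_CreateFromModelArchive(model, model.capacity())`. The reference only states that the buffer may be garbage collected after DecoderFactory_Destroy, so it seems prudent to keep a reference to `model` until that call.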

SpeechlyDecoder.DecoderFactory_GetDecoder

Creates a new DecoderHandle instance using a given DecoderFactoryHandle. A DecoderHandle is a fairly lightweight object: multiple instances share the same underlying speech recognition model.

public static DecoderHandle DecoderFactory_GetDecoder(
DecoderFactoryHandle factory, // A DecoderFactoryHandle instance
String device_id) // An id (UUID) to be used with this device

Returns

A DecoderHandle that represents the Decoder instance. The device_id must be a UUID-formatted string.

Throws

DecoderException if a decoder cannot be instantiated.
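The device_id must be UUID-formatted. This is a minimal sketch using `java.util.UUID`, and `DeviceIds` is a hypothetical helper. The format check uses a regular expression for the canonical 8-4-4-4-12 hex layout because `UUID.fromString` can be lenient about component lengths.

```java
import java.util.UUID;
import java.util.regex.Pattern;

/** Hypothetical helper for producing and checking device ids. */
final class DeviceIds {
    private DeviceIds() {}

    // Canonical textual UUID form: 8-4-4-4-12 hexadecimal digits.
    private static final Pattern UUID_PATTERN = Pattern.compile(
        "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    /** Generates a random (version 4) UUID string, e.g. for first app launch. */
    static String newDeviceId() {
        return UUID.randomUUID().toString();
    }

    /** True if id is non-null and in canonical UUID form. */
    static boolean isUuidFormatted(String id) {
        return id != null && UUID_PATTERN.matcher(id).matches();
    }
}
```

The parameter is described as an id for this device. An application would therefore likely generate it once and persist it (for example in SharedPreferences) rather than create a new one per decoder. The reference does not say which approach is expected.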

SpeechlyDecoder.DecoderFactory_Destroy

Deletes a DecoderFactoryHandle instance and frees the models it uses. The ByteBuffers with the model data can be garbage collected after calling this method. Note that any DecoderHandle instances created from the given factory should each be destroyed separately before calling this method.

public static void DecoderFactory_Destroy(
DecoderFactoryHandle factory) // The DecoderFactoryHandle to be deleted
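Decoders must be destroyed before the factory that created them. A small helper that takes the destroy calls as `Runnable`s makes this ordering hard to get wrong. The sketch below passes the calls in as lambdas so it compiles without the library, and `Teardown.destroyAll` is hypothetical.

```java
import java.util.List;

/** Hypothetical helper enforcing "decoders first, factory last". */
final class Teardown {
    private Teardown() {}

    /**
     * Runs every decoder destroyer in order, then the factory destroyer.
     * If a decoder destroyer throws, the factory is deliberately NOT destroyed,
     * so it never goes away while a decoder that uses its model remains.
     */
    static void destroyAll(List<Runnable> decoderDestroyers, Runnable factoryDestroyer) {
        for (Runnable destroyDecoder : decoderDestroyers) {
            destroyDecoder.run();
        }
        factoryDestroyer.run();
    }
}
```

In an app, the list would contain lambdas such as `() -> SpeechlyDecoder.Decoder_Destroy(decoder)`, and the last argument would be `() -> SpeechlyDecoder.DecoderFactory_Destroy(factory)`.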

SpeechlyDecoder.DecoderFactory_GetAppId

Returns the App Id that the currently loaded model bundle is associated with.

public static String DecoderFactory_GetAppId(
DecoderFactoryHandle factory) // A DecoderFactoryHandle that has a loaded model bundle

Throws

DecoderException if an error occurs.

SpeechlyDecoder.DecoderFactory_GetBundleId

Returns the Bundle Id that the currently loaded model bundle is associated with.

public static String DecoderFactory_GetBundleId(
DecoderFactoryHandle factory) // A DecoderFactoryHandle that has a loaded model bundle

Throws

DecoderException if an error occurs.

SpeechlyDecoder.DecoderFactory_SetSegmentationDelay

The decoder can indicate longer periods of silence in the resulting transcript by an @ symbol. By default this feature is disabled. It can be enabled by calling this function to set the length of the silence that will trigger a segment boundary.

public static void DecoderFactory_SetSegmentationDelay(
DecoderFactoryHandle factory, // a DecoderFactoryHandle instance
int milliseconds) // the inter-segment delay in ms

The setting applies to all DecoderHandle instances created from this factory after the call.

A segment boundary is indicated by the getWord() method of the corresponding CResultWord object returning the string @.

Note: The actual granularity of the silence detection mechanism is coarser than a millisecond. It has sub-second accuracy, but the resolution is closer to a tenth of a second.

Throws

DecoderException if an error occurs.
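Once segmentation is enabled, the transcript contains `@` entries between segments. This is a minimal sketch that groups the words returned by getWord() into segments, assuming the words have already been collected into a list. `Segments.split` is a hypothetical helper. Dropping empty segments (for example from consecutive boundaries) is a design choice, not documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that splits a word list at "@" segment boundaries. */
final class Segments {
    private Segments() {}

    /** The marker the decoder emits as a word at a segment boundary. */
    static final String BOUNDARY = "@";

    /** Returns the words grouped into segments; boundary markers and empty segments are dropped. */
    static List<List<String>> split(List<String> words) {
        List<List<String>> segments = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String word : words) {
            if (BOUNDARY.equals(word)) {
                if (!current.isEmpty()) {
                    segments.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(word);
            }
        }
        if (!current.isEmpty()) {
            segments.add(current); // trailing segment without a closing boundary
        }
        return segments;
    }
}
```

Given the roughly 0.1 s resolution noted above, delays passed to DecoderFactory_SetSegmentationDelay that differ by less than about 100 ms are unlikely to behave differently.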

SpeechlyDecoder.DecoderFactory_SetBoostVocabulary

Sets a list of words that should be detected with higher accuracy by the decoder. This is an experimental feature! After calling this method on DecoderFactoryHandle factory, all subsequent DecoderHandle instances will use the given boost vocabulary.

public static void DecoderFactory_SetBoostVocabulary(
DecoderFactoryHandle factory, // a DecoderFactoryHandle instance
String vocabulary, // a newline (\n) separated list of words
float weight) // the strength with which to boost the words

The vocabulary must be a String with the desired words concatenated with the newline \n character in between the words. The weight parameter controls the strength of the biasing. Suitable values for weight are floats in the range [-10, 10]. Negative weight results in the decoder being less likely to output the given list of words.

Note: The decoder will try to bias its output towards the words in vocabulary, but it is not guaranteed that the words will be correctly recognized even if they appear on the list.

Throws

DecoderException if an error occurs.
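The vocabulary argument is a single newline-separated string. This sketch builds it from a list and clamps the weight into the suggested [-10, 10] range. `BoostVocabulary` is a hypothetical helper. Rejecting empty words and words containing a newline is an assumption about what the decoder would accept.

```java
import java.util.List;

/** Hypothetical helper for preparing DecoderFactory_SetBoostVocabulary arguments. */
final class BoostVocabulary {
    private BoostVocabulary() {}

    static final float MIN_WEIGHT = -10f;
    static final float MAX_WEIGHT = 10f;

    /** Joins the words with '\n'; rejects empty words and words that contain a newline. */
    static String build(List<String> words) {
        for (String word : words) {
            if (word.isEmpty() || word.indexOf('\n') >= 0) {
                throw new IllegalArgumentException("invalid vocabulary word: \"" + word + "\"");
            }
        }
        return String.join("\n", words);
    }

    /** Clamps a weight into the documented suitable range [-10, 10]. */
    static float clampWeight(float weight) {
        return Math.max(MIN_WEIGHT, Math.min(MAX_WEIGHT, weight));
    }
}
```

Only DecoderHandle instances created after the call use the vocabulary. A call such as `SpeechlyDecoder.DecoderFactory_SetBoostVocabulary(factory, BoostVocabulary.build(words), BoostVocabulary.clampWeight(5f))` would therefore go before DecoderFactory_GetDecoder.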


SpeechlyDecoder.Decoder_Destroy

Deletes a DecoderHandle instance and frees all underlying resources.

public static void Decoder_Destroy(
DecoderHandle handle) // Handle to the Decoder to be destroyed.

SpeechlyDecoder.Decoder_WriteSamples

Writes audio samples to a DecoderHandle instance. The audio must have a single channel, and the samples must be 32-bit floats normalized to the range [-1.0, 1.0]. The default sample rate is 16kHz. This can be changed by calling SpeechlyDecoder.Decoder_SetInputSampleRate before feeding the samples.

public static void Decoder_WriteSamples(
DecoderHandle handle, // The DecoderHandle instance to use for transcription
float[] samples, // an array with samples (must not be null!)
long samples_size, // number of samples to read from the array
int end_of_input) // 1 if this is the last chunk of samples, 0 otherwise

Note that the samples array can be re-used with new samples or garbage collected after calling this method. The samples_size parameter specifies how many samples to read from the buffer and must be <= samples.length.

The end_of_input parameter is a 0/1 flag that indicates if this is the last chunk of samples for this audio stream or not. Note that failing to set end_of_input = 1 after the last audio chunk may cause SpeechlyDecoder.Decoder_WaitResults to block indefinitely.
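The end_of_input rule is easy to get wrong when audio is sent in pieces. This minimal sketch feeds an in-memory clip in fixed-size chunks through one reusable array, which the reference permits. It sets end_of_input = 1 only on the last chunk. `ChunkedWriter` and its `SampleSink` interface are hypothetical, and the sink's parameters mirror the last three arguments of Decoder_WriteSamples. Sending a zero-length final chunk for empty input is an assumption.

```java
/** Hypothetical helper that feeds audio in chunks with a correct end_of_input flag. */
final class ChunkedWriter {
    private ChunkedWriter() {}

    /** Mirrors (samples, samples_size, end_of_input) of Decoder_WriteSamples. */
    interface SampleSink {
        void write(float[] samples, long samplesSize, int endOfInput);
    }

    /** Writes audio in chunks of at most chunkSize samples; only the last chunk has endOfInput = 1. */
    static void writeInChunks(float[] audio, int chunkSize, SampleSink sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        float[] buffer = new float[chunkSize]; // reused for every chunk
        if (audio.length == 0) {
            // Assumption: an empty final chunk is an acceptable end-of-input signal.
            sink.write(buffer, 0, 1);
            return;
        }
        for (int offset = 0; offset < audio.length; offset += chunkSize) {
            int size = Math.min(chunkSize, audio.length - offset);
            System.arraycopy(audio, offset, buffer, 0, size);
            int endOfInput = (offset + size == audio.length) ? 1 : 0;
            sink.write(buffer, size, endOfInput);
        }
    }
}
```

In an app, the sink would be `(samples, size, end) -> SpeechlyDecoder.Decoder_WriteSamples(decoder, samples, size, end)`. If DecoderException is a checked exception, the lambda would need a try/catch.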

Throws

DecoderException if an error occurs when feeding the audio.
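Android's AudioRecord commonly delivers 16-bit signed PCM, while Decoder_WriteSamples expects floats normalized to [-1.0, 1.0]. This is a minimal conversion sketch. `Pcm.toFloat` is a hypothetical helper, and dividing by 32768 is one common normalization convention.

```java
/** Hypothetical helper converting 16-bit PCM to normalized floats. */
final class Pcm {
    private Pcm() {}

    /** Full-scale magnitude of 16-bit signed PCM. */
    private static final float SCALE = 32768f;

    /**
     * Converts the first length samples of pcm to floats in [-1.0, 1.0),
     * writing into out so the same float[] can be reused for every chunk.
     */
    static void toFloat(short[] pcm, int length, float[] out) {
        if (length < 0 || length > pcm.length || length > out.length) {
            throw new IllegalArgumentException("length out of bounds: " + length);
        }
        for (int i = 0; i < length; i++) {
            out[i] = pcm[i] / SCALE;
        }
    }
}
```

A capture loop would call `audioRecord.read(pcm, 0, pcm.length)`, convert the returned number of samples into a reused float[], and pass that count as samples_size.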

SpeechlyDecoder.Decoder_SetInputSampleRate

Sets the sample rate for incoming audio. Natively the decoder works with 16kHz audio. If your audio has a higher sample rate, please use this function to tell the decoder the correct sample rate of your audio.

public static void Decoder_SetInputSampleRate(
DecoderHandle handle, // The DecoderHandle instance to set the sample rate on
int sample_rate) // The sample rate

Throws

DecoderException if an error occurs.

SpeechlyDecoder.Decoder_WaitResults

Reads a CResultWord that contains one word of transcript from a DecoderHandle instance. This method blocks until there is a new word available, or until the end of audio has been reached and the decoder is guaranteed to not return any more words.

public static CResultWord Decoder_WaitResults(
DecoderHandle handle) // The DecoderHandle instance to read transcript from

Returns

A CResultWord. End of stream is indicated by its getWord() method returning the empty string "".

Throws

DecoderException if an error occurs.
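A typical consumer calls Decoder_WaitResults in a loop until the end-of-stream marker arrives and destroys every CResultWord along the way. In this minimal sketch, the decoder call sits behind a `Supplier<String>` so the code runs without the library. `TranscriptReader` is hypothetical. Because the method blocks, this loop belongs on a background thread rather than the Android main thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical blocking reader built on top of Decoder_WaitResults. */
final class TranscriptReader {
    private TranscriptReader() {}

    /**
     * Pulls words until the end-of-stream marker "" and returns them in order.
     * In an app, nextWord would wrap the native call, for example:
     *   () -> {
     *       CResultWord r = SpeechlyDecoder.Decoder_WaitResults(decoder);
     *       try { return r.getWord(); } finally { SpeechlyDecoder.CResultWord_Destroy(r); }
     *   }
     * so every CResultWord, including the final "" one, is destroyed.
     */
    static List<String> readAll(Supplier<String> nextWord) {
        List<String> words = new ArrayList<>();
        while (true) {
            String word = nextWord.get();
            if (word.isEmpty()) {
                return words; // end of stream
            }
            words.add(word);
        }
    }
}
```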

SpeechlyDecoder.Decoder_GetResults

Reads a CResultWord that contains one word of transcript from a DecoderHandle instance. Unlike Decoder_WaitResults, this method returns immediately.

public static CResultWord Decoder_GetResults(
DecoderHandle handle) // The DecoderHandle instance to read transcript from

Returns

A CResultWord, or null when no new words are available. End of stream is indicated by the returned CResultWord's getWord() method returning the empty string "".

Throws

DecoderException if an error occurs.
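With Decoder_GetResults, a null result means that no word is ready yet, and "" means end of stream. This sketch shows a non-blocking drain that could, for example, be called once after each audio chunk is written. The call sits behind a `Supplier<String>` that returns null, a word, or "". `TranscriptPoller` is hypothetical.

```java
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical non-blocking reader built on top of Decoder_GetResults. */
final class TranscriptPoller {
    private TranscriptPoller() {}

    /**
     * Appends every word that is available right now to out.
     * poll is expected to return null when nothing is ready (after checking the
     * CResultWord for null, copying getWord() and calling CResultWord_Destroy).
     * Returns true once the end-of-stream marker "" has been seen.
     */
    static boolean drainAvailable(Supplier<String> poll, List<String> out) {
        String word;
        while ((word = poll.get()) != null) {
            if (word.isEmpty()) {
                return true; // end of stream
            }
            out.add(word);
        }
        return false; // nothing more for now; try again later
    }
}
```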

SpeechlyDecoder.Decoder_EnableVAD

Enables Voice Activity Detection (VAD) on the given DecoderHandle instance. The main purpose of VAD is to reduce the CPU load of the decoder by preventing it from processing the parts of the audio that are likely to contain only silence. The downside is that VAD may introduce errors in the resulting transcript. In particular, words may be missed if the VAD incorrectly prevents parts of the audio from being processed.

public static void Decoder_EnableVAD(
DecoderHandle decoder, // the DecoderHandle instance on which to enable VAD
int enabled) // set to 1 to enable VAD, set to 0 to disable

Throws

DecoderException if an error occurs.

SpeechlyDecoder.Decoder_GetParamI

Get an integer parameter by parameter id.

public static int Decoder_GetParamI(
DecoderHandle decoder, // A DecoderHandle instance
long parameterId) // A parameter id constant

See Speechly Constants for valid parameter ids.

Throws

DecoderException if the given parameterId is not valid.

SpeechlyDecoder.Decoder_SetParamI

Set an integer parameter by parameter id.

public static void Decoder_SetParamI(
DecoderHandle decoder, // A DecoderHandle instance
long parameterId, // A parameter id constant
int parameterValue) // The value for the parameter

See Speechly Constants for valid parameter ids.

Throws

DecoderException if the given parameterId is not valid or if parameterValue is invalid for the given parameterId.

SpeechlyDecoder.Decoder_GetParamF

Get a float parameter by parameter id.

public static float Decoder_GetParamF(
DecoderHandle decoder, // A DecoderHandle instance
long parameterId) // A parameter id constant

See Speechly Constants for valid parameter ids.

Throws

DecoderException if the given parameterId is not valid.

SpeechlyDecoder.Decoder_SetParamF

Set a float parameter by parameter id.

public static void Decoder_SetParamF(
DecoderHandle decoder, // A DecoderHandle instance
long parameterId, // A parameter id constant
float parameterValue) // The value for the parameter

See Speechly Constants for valid parameter ids.

Throws

DecoderException if the given parameterId is not valid or if parameterValue is invalid for the given parameterId.

SpeechlyDecoder.Decoder_GetNumSamples

Returns the total number of samples processed by a given DecoderHandle instance since the previous time SpeechlyDecoder.Decoder_WriteSamples was called with end_of_input = 1.

public static int Decoder_GetNumSamples(
DecoderHandle decoder) // The DecoderHandle instance

SpeechlyDecoder.Decoder_GetNumVoiceSamples

Returns the total number of samples that are not part of silence regions processed by a given DecoderHandle instance since the previous time SpeechlyDecoder.Decoder_WriteSamples was called with end_of_input = 1.

public static int Decoder_GetNumVoiceSamples(
DecoderHandle decoder) // The DecoderHandle instance

SpeechlyDecoder.Decoder_GetNumCharacters

Returns the total number of characters transcribed by a given DecoderHandle instance since the previous time SpeechlyDecoder.Decoder_WriteSamples was called with end_of_input = 1.

public static int Decoder_GetNumCharacters(
DecoderHandle decoder) // The DecoderHandle instance
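The three counters can be combined into simple statistics. This is a minimal sketch, and `DecoderStats` is hypothetical. The reference does not state which sample rate the sample counts refer to. Passing the native 16000 Hz (or the rate set with Decoder_SetInputSampleRate) is an assumption to verify.

```java
/** Hypothetical helpers for interpreting the decoder's counters. */
final class DecoderStats {
    private DecoderStats() {}

    /** Duration in seconds of a sample count at the given sample rate. */
    static double seconds(long samples, int sampleRate) {
        if (sampleRate <= 0) {
            throw new IllegalArgumentException("sampleRate must be positive");
        }
        return (double) samples / sampleRate;
    }

    /** Fraction of processed samples outside silence regions; 0 if nothing was processed. */
    static double voiceFraction(long voiceSamples, long totalSamples) {
        return totalSamples == 0 ? 0.0 : (double) voiceSamples / totalSamples;
    }

    /** Transcribed characters per second of audio; 0 for empty audio. */
    static double charactersPerSecond(long characters, long samples, int sampleRate) {
        double duration = seconds(samples, sampleRate);
        return duration == 0.0 ? 0.0 : characters / duration;
    }
}
```

Typical inputs would be the return values of Decoder_GetNumSamples, Decoder_GetNumVoiceSamples and Decoder_GetNumCharacters, read after the stream that ended with end_of_input = 1.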

SpeechlyDecoder.SpeechlyDecoderVersion

A utility method that returns a String representing the version of this Speechly decoder library.

SpeechlyDecoder.SpeechlyDecoderBuild

A utility method that returns a long representing the build id of this Speechly decoder library.

SpeechlyDecoder.CResultWord_Destroy

Deallocates a CResultWord when it's no longer needed.

public static void CResultWord_Destroy(
CResultWord result_word) // The CResultWord instance to destroy

The CResultWord class

An instance of CResultWord represents a single word of transcript together with its begin and end timestamps, measured from the start of the audio stream. A CResultWord instance has the following self-explanatory methods:

  • public String getWord()
  • public long getStart_time()
  • public long getEnd_time()

End of stream is indicated by getWord() returning the empty string "".

Note that when a CResultWord instance is no longer needed, it must be disposed of by explicitly calling SpeechlyDecoder.CResultWord_Destroy.

Speechly Constants

The decoder library uses a number of integer constants to specify parameters (in the GetParamI/SetParamI/GetParamF/SetParamF methods) as well as error codes. The constants appear as static integers in the SpeechlyDecoder class.

Error constants

The relevant error constants are given below.

SpeechlyDecoder.SPEECHLY_ERROR_MISMATCH_IN_MODEL_ARCHITECTURE

The model bundle being loaded targets a different machine learning backend than the one used by this library. By default, the Android SDK can only run TensorFlow Lite models.

SpeechlyDecoder.SPEECHLY_ERROR_INVALID_MODEL

The model bundle is corrupted.

SpeechlyDecoder.SPEECHLY_ERROR_EXPIRED_MODEL

The model bundle contains a licence that has expired. All Speechly model bundles have a predefined lifetime after which the decoder library refuses to load the model.

SpeechlyDecoder.SPEECHLY_ERROR_UNEXPECTED_PARAMETER

The parameter Id given to SetParamI, SetParamF, GetParamI or GetParamF is invalid.

SpeechlyDecoder.SPEECHLY_ERROR_UNEXPECTED_PARAMETER_VALUE

The parameter value given to SetParamI or SetParamF is invalid for the given parameterId.

SpeechlyDecoder.SPEECHLY_ERROR_MEMORY_ERROR

There was an error related to memory allocation. Most likely this means you have run out of memory.

SpeechlyDecoder.SPEECHLY_ERROR_UNEXPECTED_ERROR

Something unexpected happened in the decoder.

SpeechlyDecoder.SPEECHLY_ERROR_NONE

No error occurred. The error code of a thrown DecoderException should always be something other than this, so there should never be a need to check for this value.

Parameter constants

These are valid values for the parameterId argument in the GetParamI/SetParamI/GetParamF/SetParamF methods. For the time being these are all related to internals of the Voice Activity Detection feature. In general it should not be necessary to set/get any of these, as the default values will provide good performance in most cases. These constants are subject to change, and will be more thoroughly documented in a future release. They are listed here merely for the sake of completeness.

SpeechlyDecoder.SPEECHLY_VAD_CONFIG_SIGNAL_TO_NOISE_DB_F

VAD signal-to-noise energy ratio needed for a frame to be considered 'loud'.

SpeechlyDecoder.SPEECHLY_VAD_CONFIG_NOISE_GATE_DB_F

VAD energy threshold; frames below it will not trigger activation. Range (-90.0f, 0.0f).

SpeechlyDecoder.SPEECHLY_VAD_CONFIG_NOISE_LEARN_HALFTIME_MS_I

VAD rate of background noise learning, defined as the duration over which the background noise energy moves halfway towards the current frame's energy. Range (0, 5000).

SpeechlyDecoder.SPEECHLY_VAD_CONFIG_SIGNAL_ACTIVATION_F

VAD 'loud' to 'silent' ratio in signal_search_frames needed to activate 'is_signal_detected'. Range (0.0f, 1.0f).

SpeechlyDecoder.SPEECHLY_VAD_CONFIG_SIGNAL_RELEASE_F

VAD 'loud' to 'silent' ratio in signal_search_frames needed to keep 'is_signal_detected' active. Only evaluated once the sustain period is over. Range (0.0f, 1.0f).

SpeechlyDecoder.SPEECHLY_VAD_CONFIG_SIGNAL_SUSTAIN_MS_I

VAD duration to keep 'is_signal_detected' active. Renewed as long as VADActivation holds true. Range (0, 8000).

SpeechlyDecoder.SPEECHLY_VAD_CONFIG_SIGNAL_SEARCH_FRAMES_I

VAD number of past audio frames analyzed by the energy-threshold VAD. Range (1, 32).
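Out-of-range values cause SetParamI and SetParamF to throw, so an app that exposes these settings could check them first. This sketch uses the ranges listed above, and `VadParamRanges` is hypothetical. It keys on the constant names rather than their numeric values, which are not documented here. It also treats the ranges as inclusive, which the reference does not state. Finally, it assumes that the `_I`/`_F` suffix indicates which setter applies.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical client-side range check for the VAD parameters. */
final class VadParamRanges {
    private VadParamRanges() {}

    // Ranges copied from the parameter descriptions; treated as inclusive (assumption).
    private static final Map<String, double[]> RANGES = new HashMap<>();
    static {
        RANGES.put("SPEECHLY_VAD_CONFIG_NOISE_GATE_DB_F", new double[] {-90.0, 0.0});
        RANGES.put("SPEECHLY_VAD_CONFIG_NOISE_LEARN_HALFTIME_MS_I", new double[] {0, 5000});
        RANGES.put("SPEECHLY_VAD_CONFIG_SIGNAL_ACTIVATION_F", new double[] {0.0, 1.0});
        RANGES.put("SPEECHLY_VAD_CONFIG_SIGNAL_RELEASE_F", new double[] {0.0, 1.0});
        RANGES.put("SPEECHLY_VAD_CONFIG_SIGNAL_SUSTAIN_MS_I", new double[] {0, 8000});
        RANGES.put("SPEECHLY_VAD_CONFIG_SIGNAL_SEARCH_FRAMES_I", new double[] {1, 32});
    }

    /** True if value lies within the listed range, or if no range is listed for the name. */
    static boolean isWithinListedRange(String constantName, double value) {
        double[] range = RANGES.get(constantName);
        return range == null || (value >= range[0] && value <= range[1]);
    }

    /** Assumption: names ending in "_F" go through SetParamF, the rest through SetParamI. */
    static boolean usesFloatSetter(String constantName) {
        return constantName.endsWith("_F");
    }
}
```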