Speechly Decoder API for Android
Java API reference.
Overview
In general, passing null to any of the arguments may result in undefined behaviour, including segmentation faults that will crash the application. To be safe, it is recommended to ensure that, in particular, the DecoderFactoryHandle and DecoderHandle instances are not null before calling any of the API functions.
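One way to follow this recommendation is to fail fast in Java before a handle reaches native code. The sketch below uses only the standard library. The class and method names (HandleGuards, requireHandle) are hypothetical and are not part of the Speechly API.

```java
import java.util.Objects;

final class HandleGuards {
    private HandleGuards() {}

    /**
     * Hypothetical helper: throws a NullPointerException with a descriptive
     * message if the handle is null, otherwise returns it unchanged so that
     * it can be passed straight into a SpeechlyDecoder.* call.
     */
    static <T> T requireHandle(T handle, String name) {
        return Objects.requireNonNull(handle, name + " must not be null");
    }
}
```

For example, SpeechlyDecoder.Decoder_EnableVAD(HandleGuards.requireHandle(decoder, "decoder"), 1) throws a Java exception with a clear message instead of risking a native crash.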
Functions
SpeechlyDecoder.DecoderFactory_CreateFromModelArchive
Creates a new DecoderFactoryHandle instance from a ByteBuffer that contains the speech recognition model. Note that the isDirect() method must return true for the given buffer.
public static DecoderFactoryHandle DecoderFactory_CreateFromModelArchive(
java.nio.ByteBuffer buffer, // the model bundle bytes
long bufferSize) // the number of bytes in the buffer
Returns
A DecoderFactoryHandle that represents the Decoder Factory instance.
Throws
DecoderException if there is a problem with loading the given model buffer.
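A minimal sketch for preparing the buffer, assuming the model bundle is available as a byte array or an InputStream (for example an Android asset; the asset name in the usage comment is hypothetical). ByteBuffer.allocateDirect always produces a buffer whose isDirect() returns true. Because DecoderFactory_Destroy is described as the point after which the model ByteBuffers can be garbage collected, the sketch assumes the caller keeps a reference to the buffer until then. The ModelBuffers class is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Usage (hypothetical asset name):
//   ByteBuffer model = ModelBuffers.readDirect(context.getAssets().open("model.bundle"));
//   DecoderFactoryHandle factory =
//       SpeechlyDecoder.DecoderFactory_CreateFromModelArchive(model, model.capacity());
final class ModelBuffers {
    private ModelBuffers() {}

    /** Copies the bytes into a new direct buffer with position 0 and limit = length. */
    static ByteBuffer toDirectBuffer(byte[] modelBytes) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(modelBytes.length);
        buffer.put(modelBytes);
        buffer.flip(); // rewind so the whole bundle is readable from position 0
        return buffer;
    }

    /** Reads an entire stream into a direct buffer. The stream is not closed here. */
    static ByteBuffer readDirect(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return toDirectBuffer(out.toByteArray());
    }
}
```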
SpeechlyDecoder.DecoderFactory_GetDecoder
Creates a new DecoderHandle instance using a given DecoderFactoryHandle. The DecoderHandle is a fairly lightweight object; multiple instances share the same underlying speech recognition model.
public static DecoderHandle DecoderFactory_GetDecoder(
DecoderFactoryHandle factory, // A DecoderFactoryHandle instance
String device_id) // An id (UUID) to be used with this device
Returns
A DecoderHandle that represents the Decoder instance. The device_id must be a UUID-formatted string.
Throws
DecoderException if a decoder cannot be instantiated.
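The device_id can be produced with java.util.UUID. The sketch below is an assumption about usage rather than documented behaviour: the reference does not say whether the ID should stay the same across app launches, so persisting it (for example in app preferences) is left to the caller. The DeviceIds helper is hypothetical. Its isUuid check also rejects short forms such as "1-1-1-1-1", which UUID.fromString would otherwise accept.

```java
import java.util.UUID;

final class DeviceIds {
    private DeviceIds() {}

    /** Returns a new random UUID in canonical 8-4-4-4-12 form. */
    static String newDeviceId() {
        return UUID.randomUUID().toString();
    }

    /** True if s is a UUID in canonical 36-character form (case-insensitive). */
    static boolean isUuid(String s) {
        if (s == null || s.length() != 36) {
            return false;
        }
        try {
            // Round-trip through UUID to reject lenient, non-canonical inputs.
            return UUID.fromString(s).toString().equalsIgnoreCase(s);
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

A decoder could then be created with SpeechlyDecoder.DecoderFactory_GetDecoder(factory, DeviceIds.newDeviceId()), or with a previously stored ID.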
SpeechlyDecoder.DecoderFactory_Destroy
Deletes a DecoderFactoryHandle instance and frees the models it uses. The ByteBuffers containing the model data can be garbage collected after calling this method. Note that any DecoderHandle instances created from the given factory should each be destroyed separately before calling this method.
public static void DecoderFactory_Destroy(
DecoderFactoryHandle factory) // The DecoderFactoryHandle to be deleted
SpeechlyDecoder.DecoderFactory_GetAppId
Returns the App Id that the currently loaded model bundle is associated with.
public static String DecoderFactory_GetAppId(
DecoderFactoryHandle factory) // A DecoderFactoryHandle that has a loaded model bundle
Throws
DecoderException if an error occurs.
SpeechlyDecoder.DecoderFactory_GetBundleId
Returns the Bundle Id that the currently loaded model bundle is associated with.
public static String DecoderFactory_GetBundleId(
DecoderFactoryHandle factory) // A DecoderFactoryHandle that has a loaded model bundle
Throws
DecoderException if an error occurs.
SpeechlyDecoder.DecoderFactory_SetSegmentationDelay
The decoder can indicate longer periods of silence in the resulting transcript with an @ symbol. By default, this feature is disabled. It can be enabled by calling this function to set the length of silence that will trigger a segment boundary.
public static void DecoderFactory_SetSegmentationDelay(
DecoderFactoryHandle factory, // a DecoderFactoryHandle instance
int milliseconds) // the inter-segment delay in ms
The setting applies to all DecoderHandle instances created from this factory after the call.
A segment boundary is indicated by the getWord() method of the corresponding CResultWord object returning the string @.
Note: The silence detection mechanism does not actually work at millisecond granularity. It has sub-second accuracy, but the resolution is closer to one tenth of a second.
Throws
DecoderException if an error occurs.
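Once a segmentation delay has been set on the factory (for example SpeechlyDecoder.DecoderFactory_SetSegmentationDelay(factory, 700) before creating the DecoderHandle, where 700 ms is only an illustrative value), the application has to split the word stream at the @ markers itself. A minimal sketch that works on the getWord() strings, so it does not depend on the CResultWord class. The Segments helper is hypothetical. It also skips the empty end-of-stream word.

```java
import java.util.ArrayList;
import java.util.List;

final class Segments {
    private Segments() {}

    /**
     * Groups transcript words (getWord() values of successive CResultWord
     * objects) into segments separated by "@". Empty segments, e.g. from two
     * consecutive boundaries, are dropped, and "" (end of stream) is ignored.
     */
    static List<String> split(List<String> words) {
        List<String> segments = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String w : words) {
            if (w.isEmpty()) {
                continue; // end-of-stream marker, not a word
            }
            if (w.equals("@")) {
                flush(current, segments);
            } else {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(w);
            }
        }
        flush(current, segments);
        return segments;
    }

    private static void flush(StringBuilder current, List<String> out) {
        if (current.length() > 0) {
            out.add(current.toString());
            current.setLength(0);
        }
    }
}
```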
SpeechlyDecoder.DecoderFactory_SetBoostVocabulary
Sets a list of words that the decoder should detect with higher accuracy. This is an experimental feature! After this method is called on a DecoderFactoryHandle factory, all subsequently created DecoderHandle instances will use the given boost vocabulary.
public static void DecoderFactory_SetBoostVocabulary(
DecoderFactoryHandle factory, // a DecoderFactoryHandle instance
String vocabulary, // a newline (\n) separated list of words
float weight) // the strength with which to boost the words
The vocabulary must be a String with the desired words concatenated with the newline character \n between them. The weight parameter controls the strength of the biasing. Suitable values for weight are floats in the range [-10, 10]. A negative weight makes the decoder less likely to output the given list of words.
Note: The decoder will try to bias its output towards the words in vocabulary, but it is not guaranteed that the words will be correctly recognized even if they appear on the list.
Throws
DecoderException if an error occurs.
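A small sketch for building the arguments of DecoderFactory_SetBoostVocabulary from a Java list. The helper names are hypothetical. Rejecting empty words and words that contain a newline is a defensive choice of this sketch, not a documented requirement; clamping the weight simply enforces the suggested range.

```java
import java.util.List;

final class BoostVocabulary {
    private BoostVocabulary() {}

    static final float MIN_WEIGHT = -10f;
    static final float MAX_WEIGHT = 10f;

    /** Joins the words with '\n' between them (no trailing newline). */
    static String join(List<String> words) {
        for (String w : words) {
            if (w.isEmpty() || w.indexOf('\n') >= 0) {
                throw new IllegalArgumentException("invalid word: \"" + w + "\"");
            }
        }
        return String.join("\n", words);
    }

    /** Clamps a weight into the suggested range [-10, 10]. */
    static float clampWeight(float weight) {
        return Math.max(MIN_WEIGHT, Math.min(MAX_WEIGHT, weight));
    }
}
```

For example, SpeechlyDecoder.DecoderFactory_SetBoostVocabulary(factory, BoostVocabulary.join(words), BoostVocabulary.clampWeight(5f)) would be called before creating the DecoderHandle instances that should use the vocabulary; the weight 5 is only illustrative.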
SpeechlyDecoder.Decoder_Destroy
Deletes a DecoderHandle instance and frees all underlying resources.
public static void Decoder_Destroy(
DecoderHandle handle) // Handle to the Decoder to be destroyed.
SpeechlyDecoder.Decoder_WriteSamples
Writes audio samples to a DecoderHandle instance. The audio must have a single channel, and the samples must be represented by 32-bit floats normalized within [-1.0, 1.0]. The default sample rate is 16 kHz. This can be adjusted by calling SpeechlyDecoder.Decoder_SetInputSampleRate before feeding the samples.
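On Android, microphone audio is commonly captured as 16-bit PCM (for example with android.media.AudioRecord), so it has to be converted to normalized single-channel floats before it is written. A minimal sketch, assuming signed 16-bit samples and, for stereo input, interleaved left/right frames. The class name is hypothetical. Dividing by 32768 maps the most negative sample to exactly -1.0 and keeps every result inside [-1.0, 1.0].

```java
final class PcmConversion {
    private PcmConversion() {}

    /** Converts the first count 16-bit mono samples to floats in [-1.0, 1.0). */
    static float[] toFloat(short[] pcm, int count) {
        float[] out = new float[count];
        for (int i = 0; i < count; i++) {
            out[i] = pcm[i] / 32768f;
        }
        return out;
    }

    /** Averages interleaved stereo frames (L, R, L, R, ...) into mono floats. */
    static float[] stereoToMono(short[] interleaved, int frames) {
        float[] out = new float[frames];
        for (int i = 0; i < frames; i++) {
            int sum = interleaved[2 * i] + interleaved[2 * i + 1]; // int avoids overflow
            out[i] = sum / 65536f; // (L + R) / 2 / 32768
        }
        return out;
    }
}
```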
public static void Decoder_WriteSamples(
DecoderHandle handle, // The DecoderHandle instance to use for transcription
float[] samples, // an array with samples (must not be null!)
long samples_size, // number of samples to read from the array
int end_of_input) // 1 if this is the last chunk of samples, 0 otherwise
Note that the samples array can be reused with new samples or garbage collected after calling this method. The samples_size parameter specifies how many samples to read from the array and must be <= samples.length.
The end_of_input parameter is a 0/1 flag that indicates whether this is the last chunk of samples for this audio stream. Note that failing to set end_of_input = 1 for the last audio chunk may cause SpeechlyDecoder.Decoder_WaitResults to block indefinitely.
Throws
DecoderException if an error occurs when feeding the audio.
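Because samples_size counts from the start of the array and the array may be reused after each call, a long recording can be fed through one reusable buffer. The sketch below sends the calls to a hypothetical SampleSink interface with the same parameter shape as Decoder_WriteSamples (minus the handle), so it runs without the native library; in real code the sink would be (s, n, end) -> SpeechlyDecoder.Decoder_WriteSamples(handle, s, n, end). Only the final call sets end_of_input to 1. Sending a zero-length final chunk for empty input is an assumption of the sketch.

```java
final class ChunkedWriter {
    private ChunkedWriter() {}

    /** Hypothetical sink mirroring Decoder_WriteSamples without the handle. */
    interface SampleSink {
        void write(float[] samples, long samplesSize, int endOfInput);
    }

    /**
     * Feeds audio to the sink in chunks of at most chunkSize samples through
     * one reusable buffer and returns the number of write calls made.
     * Empty input still produces one call with endOfInput = 1.
     */
    static int writeInChunks(float[] audio, int chunkSize, SampleSink sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        float[] buffer = new float[chunkSize];
        int offset = 0;
        int calls = 0;
        do {
            int n = Math.min(chunkSize, audio.length - offset);
            System.arraycopy(audio, offset, buffer, 0, n);
            offset += n;
            int endOfInput = offset >= audio.length ? 1 : 0;
            sink.write(buffer, n, endOfInput); // n <= buffer.length, as required
            calls++;
        } while (offset < audio.length);
        return calls;
    }
}
```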
SpeechlyDecoder.Decoder_SetInputSampleRate
Sets the sample rate of the incoming audio. Natively, the decoder works with 16 kHz audio. If your audio has a higher sample rate, use this function to tell the decoder the correct sample rate of your audio.
public static void Decoder_SetInputSampleRate(
DecoderHandle handle, // The DecoderHandle instance to set the sample rate on
int sample_rate) // The sample rate
Throws
DecoderException if an error occurs.
SpeechlyDecoder.Decoder_WaitResults
Reads a CResultWord that contains one word of transcript from a DecoderHandle instance. This method blocks until a new word is available, or until the end of the audio has been reached and the decoder is guaranteed not to return any more words.
public static CResultWord Decoder_WaitResults(
DecoderHandle handle) // The DecoderHandle instance to read transcript from
Returns
A CResultWord. End of stream is indicated by the returned word being equal to "".
Throws
DecoderException if an error occurs.
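A typical read loop for Decoder_WaitResults collects words until the empty end-of-stream word arrives, destroying each CResultWord after copying its text. Because the method blocks, the loop should run off the main (UI) thread. To keep the sketch runnable without the native library, Word is a hypothetical stand-in for CResultWord (with the real library it would be replaced by CResultWord), and the two functional arguments stand for Decoder_WaitResults and CResultWord_Destroy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

final class BlockingReader {
    private BlockingReader() {}

    /** Hypothetical stand-in for CResultWord; only getWord() is needed here. */
    interface Word {
        String getWord();
    }

    /**
     * Reads words until the end-of-stream word "" is returned.
     * next    stands for () -> SpeechlyDecoder.Decoder_WaitResults(handle)
     * destroy stands for SpeechlyDecoder.CResultWord_Destroy
     */
    static <W extends Word> List<String> readAll(Supplier<W> next, Consumer<W> destroy) {
        List<String> words = new ArrayList<>();
        while (true) {
            W word = next.get();          // blocks in the real API
            String text = word.getWord(); // copy the text out first
            destroy.accept(word);         // then free the word, including the end marker
            if (text.isEmpty()) {
                return words;             // end of stream reached
            }
            words.add(text);
        }
    }
}
```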
SpeechlyDecoder.Decoder_GetResults
Reads a CResultWord that contains one word of transcript from a DecoderHandle instance. This method returns immediately.
public static CResultWord Decoder_GetResults(
DecoderHandle handle) // The DecoderHandle instance to read transcript from
Returns
A CResultWord, or null when no new words are available. End of stream is indicated by the returned word being equal to "".
Throws
DecoderException if an error occurs.
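Decoder_GetResults distinguishes two cases that a polling loop must handle: null means that no word is ready yet, while "" means that the stream has ended. A minimal sketch of draining whatever is currently available, for example from a timer or after each Decoder_WriteSamples call. Word is again a hypothetical stand-in for CResultWord, and the poll and destroy arguments stand for Decoder_GetResults and CResultWord_Destroy.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

final class PollingReader {
    private PollingReader() {}

    /** Hypothetical stand-in for CResultWord. */
    interface Word {
        String getWord();
    }

    /**
     * Appends every word available right now to out.
     * Returns true once the end-of-stream word "" has been seen.
     * poll    stands for () -> SpeechlyDecoder.Decoder_GetResults(handle)
     * destroy stands for SpeechlyDecoder.CResultWord_Destroy
     */
    static <W extends Word> boolean drainAvailable(Supplier<W> poll, Consumer<W> destroy, List<String> out) {
        W word;
        while ((word = poll.get()) != null) { // null: nothing ready yet
            String text = word.getWord();
            destroy.accept(word);
            if (text.isEmpty()) {
                return true; // end of stream
            }
            out.add(text);
        }
        return false;
    }
}
```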
SpeechlyDecoder.Decoder_EnableVAD
Enables (or, with enabled = 0, disables) Voice Activity Detection (VAD) on the given DecoderHandle instance. The main use of this is to reduce the CPU load of the decoder, as it prevents the decoder from processing those parts of the audio that are likely to contain only silence. The downside of enabling VAD is that it may introduce some errors in the resulting transcript; in particular, some words may be missed if the VAD incorrectly prevents some parts of the audio from being processed.
public static void Decoder_EnableVAD(
DecoderHandle decoder, // the DecoderHandle instance on which to enable VAD
int enabled) // set to 1 to enable VAD, set to 0 to disable
Throws
DecoderException if an error occurs.
SpeechlyDecoder.Decoder_GetParamI
Get an integer parameter by parameter id.
public static int Decoder_GetParamI(
DecoderHandle decoder, // A DecoderHandle instance
long parameterId) // A parameter id constant
See Speechly Constants for valid parameter ids.
Throws
DecoderException if the given parameterId is not valid.
SpeechlyDecoder.Decoder_SetParamI
Set an integer parameter by parameter id.
public static void Decoder_SetParamI(
DecoderHandle decoder, // A DecoderHandle instance
long parameterId, // A parameter id constant
int parameterValue) // The value for the parameter
See Speechly Constants for valid parameter ids.
Throws
DecoderException if the given parameterId is not valid or if parameterValue is invalid for the given parameterId.
SpeechlyDecoder.Decoder_GetParamF
Get a float parameter by parameter id.
public static float Decoder_GetParamF(
DecoderHandle decoder, // A DecoderHandle instance
long parameterId) // A parameter id constant
See Speechly Constants for valid parameter ids.
Throws
DecoderException if the given parameterId is not valid.
SpeechlyDecoder.Decoder_SetParamF
Set a float parameter by parameter id.
public static void Decoder_SetParamF(
DecoderHandle decoder, // A DecoderHandle instance
long parameterId, // A parameter id constant
float parameterValue) // The value for the parameter
See Speechly Constants for valid parameter ids.
Throws
DecoderException if the given parameterId is not valid or if parameterValue is invalid for the given parameterId.
SpeechlyDecoder.Decoder_GetNumSamples
Returns the total number of samples processed by a given DecoderHandle instance since the previous time SpeechlyDecoder.Decoder_WriteSamples was called with end_of_input = 1.
public static int Decoder_GetNumSamples(
DecoderHandle decoder) // The DecoderHandle instance
SpeechlyDecoder.Decoder_GetNumVoiceSamples
Returns the total number of samples that are not part of silence regions processed by a given DecoderHandle instance since the previous time SpeechlyDecoder.Decoder_WriteSamples was called with end_of_input = 1.
public static int Decoder_GetNumVoiceSamples(
DecoderHandle decoder) // The DecoderHandle instance
SpeechlyDecoder.Decoder_GetNumCharacters
Returns the total number of characters transcribed by a given DecoderHandle instance since the previous time SpeechlyDecoder.Decoder_WriteSamples was called with end_of_input = 1.
public static int Decoder_GetNumCharacters(
DecoderHandle decoder) // The DecoderHandle instance
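These counters can be turned into simple usage statistics. The reference does not state at which sample rate the counts are expressed (the configured input rate or the decoder's native 16 kHz), so the sketch takes the rate as a parameter. The class and method names are hypothetical.

```java
final class DecoderStats {
    private DecoderStats() {}

    /** Seconds of audio represented by a sample count at the given sample rate. */
    static double seconds(long samples, int sampleRate) {
        return (double) samples / sampleRate;
    }

    /** Fraction of processed samples outside silence regions (0 if nothing was processed). */
    static double voiceFraction(long voiceSamples, long totalSamples) {
        return totalSamples == 0 ? 0.0 : (double) voiceSamples / totalSamples;
    }
}
```

For example, DecoderStats.voiceFraction(SpeechlyDecoder.Decoder_GetNumVoiceSamples(decoder), SpeechlyDecoder.Decoder_GetNumSamples(decoder)) gives the share of the stream that was not silence.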
SpeechlyDecoder.SpeechlyDecoderVersion
A utility method that returns a String representing the version of this Speechly decoder library.
SpeechlyDecoder.SpeechlyDecoderBuild
A utility method that returns a long representing the build id of this Speechly decoder library.
SpeechlyDecoder.CResultWord_Destroy
Deallocates a CResultWord when it is no longer needed.
public static void CResultWord_Destroy(
CResultWord result_word) // The CResultWord instance to destroy
The CResultWord class
An instance of CResultWord represents a single word of transcript together with its begin and end timestamps. The timestamps are measured from the start of the audio stream. A CResultWord instance has the following self-explanatory methods:
public String getWord()
public long getStart_time()
public long getEnd_time()
End of stream is indicated by getWord() returning the empty string "".
Note that when a CResultWord instance is no longer needed, it must be disposed of by explicitly calling SpeechlyDecoder.CResultWord_Destroy.
Speechly Constants
The decoder library uses a number of integer constants to specify parameters (in the GetParamI/SetParamI/GetParamF/SetParamF methods) as well as error codes. The constants appear as static integers in the SpeechlyDecoder class.
Error constants
The relevant error constants are given below.
SpeechlyDecoder.SPEECHLY_ERROR_MISMATCH_IN_MODEL_ARCHITECTURE
The model bundle being loaded is targeted at a different machine learning backend than the one used by this library. By default, the Android SDK can only run models for TensorFlow Lite.
SpeechlyDecoder.SPEECHLY_ERROR_INVALID_MODEL
The model bundle is corrupted.
SpeechlyDecoder.SPEECHLY_ERROR_EXPIRED_MODEL
The model bundle contains a licence that has expired. All Speechly model bundles have a predefined lifetime after which the decoder library refuses to load the model.
SpeechlyDecoder.SPEECHLY_ERROR_UNEXPECTED_PARAMETER
The parameter Id given to SetParamI, SetParamF, GetParamI or GetParamF is invalid.
SpeechlyDecoder.SPEECHLY_ERROR_UNEXPECTED_PARAMETER_VALUE
The parameter value given to SetParamI or SetParamF is invalid for the given parameterId.
SpeechlyDecoder.SPEECHLY_ERROR_MEMORY_ERROR
There was an error related to memory allocation. Most likely this means you have run out of memory.
SpeechlyDecoder.SPEECHLY_ERROR_UNEXPECTED_ERROR
Something unexpected happened in the decoder.
SpeechlyDecoder.SPEECHLY_ERROR_NONE
No error happened. Note that the error code of a thrown DecoderException should always be something other than this, so there should never be a need to check for this value.
Parameter constants
These are valid values for the parameterId argument in the GetParamI/SetParamI/GetParamF/SetParamF methods. For the time being, these are all related to the internals of the Voice Activity Detection feature. In general, it should not be necessary to set or get any of these, as the default values provide good performance in most cases. These constants are subject to change and will be documented more thoroughly in a future release. They are listed here merely for the sake of completeness.
SpeechlyDecoder.SPEECHLY_VAD_CONFIG_SIGNAL_TO_NOISE_DB_F
VAD signal-to-noise energy ratio needed for a frame to be 'loud'.
SpeechlyDecoder.SPEECHLY_VAD_CONFIG_NOISE_GATE_DB_F
VAD energy threshold; energy below this won't trigger activation. Range (-90.0f, 0.0f).
SpeechlyDecoder.SPEECHLY_VAD_CONFIG_NOISE_LEARN_HALFTIME_MS_I
VAD rate of background noise learning, defined as the duration in which the background noise energy is moved halfway towards the current frame's energy. Range (0, 5000).
SpeechlyDecoder.SPEECHLY_VAD_CONFIG_SIGNAL_ACTIVATION_F
VAD 'loud' to 'silent' ratio in signal_search_frames needed to activate 'is_signal_detected'. Range (0.0f, 1.0f).
SpeechlyDecoder.SPEECHLY_VAD_CONFIG_SIGNAL_RELEASE_F
VAD 'loud' to 'silent' ratio in signal_search_frames needed to keep 'is_signal_detected' active. Only evaluated when the sustain period is over. Range (0.0f, 1.0f).
SpeechlyDecoder.SPEECHLY_VAD_CONFIG_SIGNAL_SUSTAIN_MS_I
VAD duration to keep 'is_signal_detected' active. Renewed as long as VADActivation holds true. Range (0, 8000).
SpeechlyDecoder.SPEECHLY_VAD_CONFIG_SIGNAL_SEARCH_FRAMES_I
VAD number of past audio frames analyzed by the energy threshold VAD. Range (1, 32).