Speechly Batch API
The APIs used for asynchronous spoken language understanding.
speechly.identity.v2.IdentityAPI
Speechly Identity API is used for creating access tokens for the Speechly APIs.
Methods
name | request | response | description |
---|---|---|---|
Login | LoginRequest | LoginResponse | Performs a login for a specific Speechly application. Returns an access token which can be used to access the Speechly API. |
Messages
ApplicationScope
Used as the scope in LoginRequest when the access is for a single Speechly application.
Fields
name | type | description |
---|---|---|
app_id | string | Speechly application ID. The defined application can be accessed with the returned token. Required. |
config_id | string | Define a specific model configuration to use. Defaults to the application's latest configuration. |
LoginRequest
Top-level message sent by the client for the Login method.
Fields
name | type | description |
---|---|---|
device_id | string | A unique end-user device identifier. Must be a UUID. Required. |
application | ApplicationScope | Login scope application: use the given application context for all utterances. |
project | ProjectScope | Login scope project: define the target application per utterance. The target applications must be located in the same project. |
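The constraints on LoginRequest can be checked on the client before the call is made. The following is a minimal sketch using local stand-in structs rather than the generated protobuf types; the struct names, `validateLogin`, and the rule that exactly one scope must be set are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// Local stand-ins for the documented messages (hypothetical; real code
// would use the generated protobuf types instead).
type ApplicationScope struct {
	AppID    string
	ConfigID string // optional
}

type ProjectScope struct {
	ProjectID string
}

type LoginRequest struct {
	DeviceID    string
	Application *ApplicationScope // set for a single-application token
	Project     *ProjectScope     // set for a project-wide token
}

// uuidPattern matches the canonical 8-4-4-4-12 hexadecimal UUID layout.
var uuidPattern = regexp.MustCompile(
	`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)

// validateLogin checks the documented field requirements.
// Treating the two scopes as mutually exclusive is an assumption.
func validateLogin(req LoginRequest) error {
	if !uuidPattern.MatchString(req.DeviceID) {
		return errors.New("device_id must be a UUID")
	}
	switch {
	case req.Application != nil && req.Project != nil:
		return errors.New("set either the application or the project scope, not both")
	case req.Application != nil:
		if req.Application.AppID == "" {
			return errors.New("application.app_id is required")
		}
	case req.Project != nil:
		if req.Project.ProjectID == "" {
			return errors.New("project.project_id is required")
		}
	default:
		return errors.New("a scope (application or project) is required")
	}
	return nil
}

func main() {
	req := LoginRequest{
		DeviceID:    "123e4567-e89b-12d3-a456-426614174000",
		Application: &ApplicationScope{AppID: "my-app-id"}, // placeholder ID
	}
	fmt.Println(validateLogin(req))
}
```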
LoginResponse
Top-level message returned by the server for the Login method.
Fields
name | type | description |
---|---|---|
token | string | Access token which can be used for the Speechly API. The token is a JSON Web Token and includes all standard claims, as well as custom ones. The token expires, so check whether it has expired before using it. It is safe to cache the token for future use until its expiration date. |
valid_for_s | uint32 | Number of seconds for which the returned token is valid. |
expires_at_epoch | uint64 | Token expiration time in seconds after 1970-01-01 ("unix time"). |
expires_at | string | ISO-formatted UTC timestamp of the expiration time of the returned token. |
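Since the token may be cached until it expires, a client can compare expires_at_epoch with the current time before reusing it. A minimal sketch, assuming a local stand-in struct for LoginResponse; the helper name `tokenUsableAt` and the safety margin are illustrative choices, not part of the API.

```go
package main

import (
	"fmt"
	"time"
)

// LoginResponse is a local stand-in mirroring two of the documented fields.
type LoginResponse struct {
	Token          string
	ExpiresAtEpoch uint64 // seconds since 1970-01-01
}

// tokenUsableAt reports whether a cached token is still valid at nowEpoch,
// keeping marginSeconds in reserve so it does not expire mid-request.
func tokenUsableAt(resp LoginResponse, nowEpoch, marginSeconds uint64) bool {
	if resp.Token == "" {
		return false
	}
	return nowEpoch+marginSeconds < resp.ExpiresAtEpoch
}

func main() {
	cached := LoginResponse{
		Token:          "example-token", // placeholder, not a real JWT
		ExpiresAtEpoch: uint64(time.Now().Add(time.Hour).Unix()),
	}
	now := uint64(time.Now().Unix())
	if tokenUsableAt(cached, now, 30) {
		fmt.Println("reuse cached token")
	} else {
		fmt.Println("call Login again")
	}
}
```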
ProjectScope
Used as the scope in LoginRequest when access is required for every application in a Speechly project.
Fields
name | type | description |
---|---|---|
project_id | string | Speechly project ID. Every application in the same project is accessible with the same token. Required. |
speechly.slu.v1.BatchAPI
Run SLU operations on audio sources without actively waiting for the results.
Methods
name | request | response | description |
---|---|---|---|
ProcessAudio | ProcessAudioRequest stream | ProcessAudioResponse | Create a new background SLU operation for a single audio source. An audio source can be either audio chunks sent via repeated ProcessAudioRequest messages or the URI of a file reachable from the API. The response includes an id that is used to match the operation to the results. A reference identifier can also be set. The destination can be a webhook URL, in which case the results are posted there when they are ready. The payload is an instance of Operation. |
QueryStatus | QueryStatusRequest | QueryStatusResponse | Query the status of a given batch operation. If the ProcessAudioRequest did not define a results_uri as a destination, the results are returned in the QueryStatusResponse. |
speechly.slu.v1.SLU
Service that implements Speechly SLU (Spoken Language Understanding) API.
To use this service you MUST use an access token from Speechly Identity API.
The token MUST be passed in gRPC metadata with Authorization as the key and Bearer ACCESS_TOKEN as the value, e.g. in Go:
ctx := context.Background()
ctx = metadata.AppendToOutgoingContext(ctx, "Authorization", "Bearer "+accessToken)
stream, err := speechlySLUClient.Stream(ctx)
Methods
name | request | response | description |
---|---|---|---|
Stream | SLURequest stream | SLUResponse stream | Performs bidirectional streaming speech recognition: receive results while sending audio. First request MUST be an SLUConfig message with the configuration that describes the audio format being sent. This RPC can handle multiple logical audio segments with the use of SLUEvent_START and SLUEvent_STOP messages, which are used to indicate the beginning and the end of a segment. A typical call timeline will look like this: 1. Client starts the RPC. 2. Client sends SLUConfig message with audio configuration. 3. Client sends SLUEvent.START. 4. Client sends audio and receives responses from the server. 5. Client sends SLUEvent.STOP. 6. Client sends SLUEvent.START. 7. Client sends audio and receives responses from the server. 8. Client sends SLUEvent.STOP. 9. Client closes the stream and receives responses from the server until EOF is received. NB: the client does not have to wait until the server acknowledges the start / stop events, this is done asynchronously. The client can deduplicate responses based on the audio context ID, which will be present in every response message. |
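The call timeline above fixes the order of client-side messages: one configuration message, then a start, audio chunks and a stop for each segment. A minimal sketch of that ordering, assuming a simplified local `SLURequest` stand-in whose `Kind` field names the populated field; it does not open a real stream, and `buildStreamRequests` is a hypothetical helper.

```go
package main

import "fmt"

// SLURequest is a simplified local stand-in: Kind names which of the
// documented fields would be set on the real message.
type SLURequest struct {
	Kind  string // "config", "start", "audio" or "stop"
	Audio []byte // only set when Kind == "audio"
}

// buildStreamRequests returns the client-side message order for one RPC:
// a single config message, then start, audio chunks and stop per segment.
func buildStreamRequests(segments [][][]byte) []SLURequest {
	reqs := []SLURequest{{Kind: "config"}}
	for _, chunks := range segments {
		reqs = append(reqs, SLURequest{Kind: "start"})
		for _, chunk := range chunks {
			reqs = append(reqs, SLURequest{Kind: "audio", Audio: chunk})
		}
		reqs = append(reqs, SLURequest{Kind: "stop"})
	}
	return reqs
}

func main() {
	segments := [][][]byte{
		{{0x01, 0x02}, {0x03, 0x04}}, // first utterance, two chunks
		{{0x05, 0x06}},               // second utterance, one chunk
	}
	for i, r := range buildStreamRequests(segments) {
		fmt.Printf("%d: %s (%d bytes)\n", i+1, r.Kind, len(r.Audio))
	}
}
```

In a real client each element would be sent on the request stream while a separate goroutine reads responses, since the server acknowledges start and stop asynchronously.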
speechly.slu.v1.WLU
Service that implements Speechly WLU (Written Language Understanding).
To use this service you MUST use an access token from Speechly Identity API.
The token MUST be passed in gRPC metadata with Authorization as the key and Bearer ACCESS_TOKEN as the value, e.g. in Go:
ctx := context.Background()
ctx = metadata.AppendToOutgoingContext(ctx, "Authorization", "Bearer "+accessToken)
res, err := speechlyWLUClient.Text(ctx, req)
Methods
name | request | response | description |
---|---|---|---|
Text | WLURequest | WLUResponse | Performs recognition of a text with specified language. |
Texts | TextsRequest | TextsResponse | Performs recognition of a batch of texts with specified language. |
Messages
- AudioConfiguration
- Operation
- Option
- SLUConfig.Option
- SLUStart.Option
- ProcessAudioRequest
- ProcessAudioResponse
- QueryStatusRequest
- QueryStatusResponse
- RoundTripMeasurementRequest
- RoundTripMeasurementResponse
- SLUConfig
- SLUEntity
- SLUError
- SLUEvent
- SLUFinished
- SLUIntent
- SLURequest
- SLUResponse
- SLUSegmentEnd
- SLUStart
- SLUStarted
- SLUStop
- SLUTentativeEntities
- SLUTentativeTranscript
- SLUTranscript
- TextsRequest
- TextsResponse
- Transcript
- WLUEntity
- WLUIntent
- WLURequest
- WLUResponse
- WLUSegment
- WLUToken
AudioConfiguration
Describes the audio content of the batch operation.
Fields
name | type | description |
---|---|---|
encoding | Encoding | The encoding of the audio data sent in the stream. Required. |
channels | int32 | The number of channels in the input audio data. Required. |
sample_rate_hertz | int32 | Sample rate in Hertz of the audio data sent in the stream (e.g. 16000). Required. |
language_codes | string | The language(s) of the audio sent in the stream as a BCP-47 language tag (e.g. "en-US"). Defaults to the target application language. Optional. |
Operation
Describes a single batch operation.
Fields
name | type | description |
---|---|---|
id | string | The id of the operation. |
reference | string | The reference id of the operation, if given. |
status | Status | The current status of the operation. |
language_code | string | The language code of the detected language. |
app_id | string | The application context for the operation. |
device_id | string | The device or microphone id for the audio, if applicable. |
transcripts | Transcript | If the operation status is STATUS_DONE and the destination is not set, the results of the operation. |
Option
Option to change the default behaviour of the SLU.
Fields
name | type | description |
---|---|---|
key | string | The key of the option to be set. |
value | string | The values to set the option to. |
SLUConfig.Option
Option to change the default behaviour of the SLU.
Fields
name | type | description |
---|---|---|
key | string | The key of the option to be set. |
value | string | The values to set the option to. |
SLUStart.Option
Option to change the default behaviour of the SLU.
Fields
name | type | description |
---|---|---|
key | string | The key of the option to be set. |
value | string | The values to set the option to. |
ProcessAudioRequest
If sending a stream of ProcessAudioRequest messages, the first one must contain the AudioConfiguration for the audio data. The config is ignored in the following messages.
Fields
name | type | description |
---|---|---|
app_id | string | The processing context, Speechly application ID. Required. |
config | AudioConfiguration | Audio configuration. Required. |
audio | bytes | Raw audio data. |
uri | string | URI of audio data. |
results_uri | string | The results JSON will be posted to the given URI. If not given, the results must be fetched using QueryStatus. Optional. |
reference | string | Reference id for the operation. For example an identifier of the source system. Optional. |
options | Option | Additional operation specific options. Optional. |
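When audio is streamed in chunks, only the first ProcessAudioRequest needs to carry the configuration. A minimal sketch of splitting a buffer accordingly, using local stand-in structs; `chunkRequests` is a hypothetical helper, and repeating app_id on every message is an assumption (the reference only marks the field as required).

```go
package main

import "fmt"

// Local stand-ins mirroring a subset of the documented fields.
type AudioConfiguration struct {
	Channels        int32
	SampleRateHertz int32
}

type ProcessAudioRequest struct {
	AppID  string
	Config *AudioConfiguration // only set on the first message
	Audio  []byte
}

// chunkRequests splits raw audio into requests of at most chunkSize bytes.
// The first request carries the audio configuration; later ones omit it
// because the server ignores it there.
func chunkRequests(appID string, cfg AudioConfiguration, audio []byte, chunkSize int) []ProcessAudioRequest {
	if chunkSize <= 0 || len(audio) == 0 {
		return nil
	}
	var reqs []ProcessAudioRequest
	for start := 0; start < len(audio); start += chunkSize {
		end := start + chunkSize
		if end > len(audio) {
			end = len(audio)
		}
		req := ProcessAudioRequest{AppID: appID, Audio: audio[start:end]}
		if start == 0 {
			req.Config = &cfg
		}
		reqs = append(reqs, req)
	}
	return reqs
}

func main() {
	cfg := AudioConfiguration{Channels: 1, SampleRateHertz: 16000}
	audio := make([]byte, 10) // placeholder for real audio bytes
	for i, r := range chunkRequests("my-app-id", cfg, audio, 4) {
		fmt.Printf("message %d: %d bytes, has config: %t\n", i+1, len(r.Audio), r.Config != nil)
	}
}
```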
ProcessAudioResponse
Fields
name | type | description |
---|---|---|
operation | Operation | The details of the created operation. |
QueryStatusRequest
Query the status of an operation. Either id or reference must be given.
Fields
name | type | description |
---|---|---|
id | string | ID of an audio processing operation. |
reference | string | Reference ID of an operation. |
QueryStatusResponse
Fields
name | type | description |
---|---|---|
operation | Operation | The details of the audio processing operation. |
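When no results_uri is set, the client has to poll QueryStatus until the operation is done. A minimal sketch of such a loop, in which the `query` callback stands in for the actual RPC; only STATUS_DONE is taken from this reference, and the interim status name, the attempt budget and `pollUntilDone` itself are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Operation is a local stand-in with the fields this sketch needs.
type Operation struct {
	ID     string
	Status string // "STATUS_DONE" when results are ready
}

// pollUntilDone calls query until the operation reports STATUS_DONE or the
// attempt budget is spent. query stands in for a QueryStatus call.
func pollUntilDone(query func() (Operation, error), interval time.Duration, maxAttempts int) (Operation, error) {
	for attempt := 0; attempt < maxAttempts; attempt++ {
		op, err := query()
		if err != nil {
			return Operation{}, err
		}
		if op.Status == "STATUS_DONE" {
			return op, nil
		}
		time.Sleep(interval)
	}
	return Operation{}, errors.New("operation did not finish within the attempt budget")
}

func main() {
	calls := 0
	fakeQuery := func() (Operation, error) {
		calls++
		if calls < 3 {
			// Hypothetical interim status, used only for this demo.
			return Operation{ID: "op-1", Status: "STATUS_PENDING"}, nil
		}
		return Operation{ID: "op-1", Status: "STATUS_DONE"}, nil
	}
	op, err := pollUntilDone(fakeQuery, 10*time.Millisecond, 5)
	fmt.Println(op.Status, err, calls)
}
```

A production client would also stop early on a failure status and would typically back off between attempts; both are left out here to keep the sketch short.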
RoundTripMeasurementRequest
Network latency measurement request. Sent from the server to measure the time it takes for the client to receive a message and the server to receive the client's response. Also known as RTT.
Fields
name | type | description |
---|---|---|
id | int32 | Measurement id. Multiple measurements can be sent during one connection, so the response should contain the same id as in the request. |
RoundTripMeasurementResponse
Response sent from the client immediately after seeing the RoundTripMeasurementRequest.
Fields
name | type | description |
---|---|---|
id | int32 | id should match the request's id. |
SLUConfig
Describes the configuration of the audio sent by the client. Currently the API only supports single-channel Linear PCM with sample rate of 16 kHz.
Fields
name | type | description |
---|---|---|
encoding | Encoding | The encoding of the audio data sent in the stream. Required. |
channels | int32 | The number of channels in the input audio data. Required. |
sample_rate_hertz | int32 | Sample rate in Hertz of the audio data sent in the stream. Required. |
language_code | string | The language of the audio sent in the stream as a BCP-47 language tag (e.g. "en-US"). Defaults to the target application language. |
options | Option | Special options to change the default behaviour of the SLU for all logical audio segments. |
SLUEntity
Describes an SLU entity.
An entity is a specific object in the phrase that falls into some kind of category, e.g. in the SAL example *book book a [burger restaurant](restaurant_type) for [tomorrow](date), "burger restaurant" would be an entity of type restaurant_type and "tomorrow" would be an entity of type date.
An entity has start and end indices which map to the indices of words in SLUTranscript messages, e.g. in the example book a [burger restaurant](restaurant_type) for [tomorrow](date) it would be:
- Entity "burger restaurant" - start_position = 2, end_position = 4
- Entity "tomorrow" - start_position = 5, end_position = 6
The start index is inclusive, but the end index is exclusive, i.e. the interval is [start_position, end_position).
Fields
name | type | description |
---|---|---|
entity | string | The type of the entity, e.g. restaurant_type or date. |
value | string | The value of the entity, e.g. burger restaurant or tomorrow. |
start_position | int32 | The starting index of the entity in the phrase, maps to the index field in SLUTranscript. Inclusive. |
end_position | int32 | The finishing index of the entity in the phrase, maps to the index field in SLUTranscript. Exclusive. |
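The position fields can be used to recover the words an entity covers. A minimal sketch, assuming zero-based word indices and the half-open interval [start_position, end_position) given in the field descriptions; under that reading "burger restaurant" in "book a burger restaurant for tomorrow" occupies indices 2 and 3, so the sketch uses start_position = 2 and end_position = 4. The structs are local stand-ins and `entityWords` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// Local stand-ins mirroring a subset of the documented fields.
type SLUTranscript struct {
	Word  string
	Index int32
}

type SLUEntity struct {
	Entity        string
	StartPosition int32 // inclusive
	EndPosition   int32 // exclusive
}

// entityWords returns the words whose index lies in
// [StartPosition, EndPosition).
func entityWords(words []SLUTranscript, e SLUEntity) []string {
	var out []string
	for _, w := range words {
		if w.Index >= e.StartPosition && w.Index < e.EndPosition {
			out = append(out, w.Word)
		}
	}
	return out
}

func main() {
	phrase := strings.Fields("book a burger restaurant for tomorrow")
	words := make([]SLUTranscript, len(phrase))
	for i, w := range phrase {
		words[i] = SLUTranscript{Word: w, Index: int32(i)}
	}
	e := SLUEntity{Entity: "restaurant_type", StartPosition: 2, EndPosition: 4}
	fmt.Println(strings.Join(entityWords(words, e), " ")) // prints "burger restaurant"
}
```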
SLUError
Describes the error that happened when processing an audio context. DEPRECATED: Will not be returned. Any errors are returned as gRPC status codes with detail messages.
Fields
name | type | description |
---|---|---|
code | string | Error code (refer to documentation for specific codes). |
message | string | Error message. |
SLUEvent
Indicates the beginning and the end of a logical audio segment (audio context in Speechly terms).
Fields
name | type | description |
---|---|---|
event | Event | The event type being sent. Required. |
app_id | string | The appId for the utterance. Required in the START event if the authorization token is project based. The given application must be part of the project set in the token. Not required if the authorization token is application based. |
SLUFinished
Indicates that the API has stopped processing current audio context. It guarantees that no new messages for that context will be sent by the server.
Fields
name | type | description |
---|---|---|
error | SLUError | DEPRECATED. An error that occurred while processing the context, if any. |
SLUIntent
Describes an SLU intent. There can be only one intent per SLU segment.
Fields
name | type | description |
---|---|---|
intent | string | The value of the intent, as defined in SAL. |
SLURequest
Top-level message sent by the client for the Stream method.
Fields
name | type | description |
---|---|---|
config | SLUConfig | Describes the configuration of the audio sent by the client. MUST be the first message sent to the stream. |
event | SLUEvent | Indicates the beginning and the end of a logical audio segment (audio context in Speechly terms). A context MUST be preceded by a start event and concluded with a stop event, otherwise the server WILL terminate the stream with an error. DEPRECATED in favour of SLUStart and SLUStop. |
audio | bytes | Contains a chunk of the audio being streamed. |
rtt_response | RoundTripMeasurementResponse | Response to an RTT measurement request from server. Should be sent immediately after receiving the RoundTripMeasurementRequest in the stream. If ignored, no round trip measurements are made. |
start | SLUStart | Indicates the beginning of a logical audio segment (audio context in Speechly terms). A context MUST be preceded by an SLUStart (or the deprecated SLUEvent start event), otherwise the server WILL terminate the stream with an error. |
stop | SLUStop | Indicates the end of a logical audio segment (audio context in Speechly terms). A context MUST be concluded with an SLUStop (or the deprecated SLUEvent stop event), otherwise the server WILL terminate the stream with an error. |
SLUResponse
Top-level message sent by the server for the Stream method.
Fields
name | type | description |
---|---|---|
audio_context | string | The ID of the audio context that this response belongs to. |
segment_id | int32 | The ID of the SLU segment that this response belongs to. This will be 0 for SLUStarted and SLUFinished responses. |
transcript | SLUTranscript | Final SLU transcript. |
entity | SLUEntity | Final SLU entity. |
intent | SLUIntent | Final SLU intent. |
segment_end | SLUSegmentEnd | A special marker message that indicates that the segment with specified segment_id has been finalised and no new responses belonging to that segment will be sent. The client is expected to discard any tentative responses in this segment. |
tentative_transcript | SLUTentativeTranscript | Tentative SLU transcript. |
tentative_entities | SLUTentativeEntities | Tentative SLU entities. |
tentative_intent | SLUIntent | Tentative SLU intent. |
started | SLUStarted | A special marker message that indicates that the audio context with specified audio_context id has been started by the API and all audio data sent by the client will be processed in that context. This message is an asynchronous acknowledgement for client-side SLUEvent_START message. |
finished | SLUFinished | A special marker message that indicates that the audio context with specified audio_context id has been stopped by the API and no new responses for that context will be sent. The client is expected to discard any non-finalised segments. This message is an asynchronous acknowledgement for client-side SLUEvent_STOP message. |
rtt_request | RoundTripMeasurementRequest | Initiates a round trip network latency measurement. The response handler should respond to this message by sending a RoundTripMeasurementResponse in the request stream. The measurement is stored server side and used to minimise the latency in the future. |
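The response fields imply some client-side bookkeeping: tentative results are replaced on every update, segment_end finalises a segment, and finished discards whatever was never finalised. A minimal sketch of that state handling for a single audio context, assuming a simplified local `SLUResponse` stand-in whose `Kind` field names the populated field; the type and method names are hypothetical and entities and intents are left out.

```go
package main

import "fmt"

// SLUResponse is a simplified local stand-in for the documented message.
type SLUResponse struct {
	SegmentID int32
	Kind      string // "started", "transcript", "tentative_transcript", "segment_end", "finished"
	Text      string // a word for "transcript", a phrase for "tentative_transcript"
}

type segment struct {
	Words     []string // final words received so far
	Tentative string   // latest tentative transcript, replaced on each update
	Final     bool     // set once segment_end arrives
}

type contextState struct {
	Segments map[int32]*segment
}

func newContextState() *contextState {
	return &contextState{Segments: make(map[int32]*segment)}
}

// handle applies one response to the state of a single audio context.
func (c *contextState) handle(r SLUResponse) {
	switch r.Kind {
	case "started":
		return // acknowledgement only, nothing to store
	case "finished":
		// The context is over: drop segments that never got segment_end.
		for id, s := range c.Segments {
			if !s.Final {
				delete(c.Segments, id)
			}
		}
		return
	}
	s, ok := c.Segments[r.SegmentID]
	if !ok {
		s = &segment{}
		c.Segments[r.SegmentID] = s
	}
	if s.Final {
		return // late message for a finalised segment is discarded
	}
	switch r.Kind {
	case "transcript":
		s.Words = append(s.Words, r.Text)
	case "tentative_transcript":
		s.Tentative = r.Text
	case "segment_end":
		s.Final = true
		s.Tentative = "" // tentative data is discarded on finalisation
	}
}

func main() {
	responses := []SLUResponse{
		{Kind: "started"},
		{SegmentID: 1, Kind: "tentative_transcript", Text: "find me a tea"},
		{SegmentID: 1, Kind: "transcript", Text: "find"},
		{SegmentID: 1, Kind: "segment_end"},
		{SegmentID: 2, Kind: "tentative_transcript", Text: "and"},
		{Kind: "finished"},
	}
	state := newContextState()
	for _, r := range responses {
		state.handle(r)
	}
	for id, s := range state.Segments {
		fmt.Println(id, s.Words, s.Final)
	}
}
```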
SLUSegmentEnd
Indicates the end of the segment. Upon receiving this, the segment should be finalised and all future messages for that segment (if any) discarded.
Fields
name | type | description |
---|---|---|
SLUStart
Indicates the beginning of a logical audio segment (audio context in Speechly terms).
Fields
name | type | description |
---|---|---|
app_id | string | The appId for the utterance. Required if the authorization token is project based. The given application must be part of the project set in the token. Not required if the authorization token is application based. |
options | Option | Special options to change the default behaviour of the SLU for this audio segment. |
SLUStarted
Indicates that the API has started processing the portion of audio as new audio context. This does not guarantee that the server will not send any more messages for the previous audio context.
Fields
name | type | description |
---|---|---|
SLUStop
Indicates the end of a logical audio segment (audio context in Speechly terms).
Fields
name | type | description |
---|---|---|
SLUTentativeEntities
Describes tentative entities.
Fields
name | type | description |
---|---|---|
tentative_entities | SLUEntity | A list of entities, which must be treated as tentative. This is not an aggregate of all entities in the audio, but rather it ONLY contains entities that have not been finalised yet. e.g. if at the start there are two tentatively recognised entities - ["burger restaurant", "tomorrow"] but then the API marks "burger restaurant" as final and recognises a new tentative entity "for two", this will contain ["tomorrow", "for two"]. |
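Because each tentative list contains only the entities that are not yet final, a client should replace its tentative list on every message rather than merge into it. A minimal sketch of that bookkeeping, replaying the example from the description; `entityTracker` and its methods are hypothetical names, and entities are reduced to their string values for brevity.

```go
package main

import "fmt"

// entityTracker keeps final entities separately from the latest tentative list.
type entityTracker struct {
	Final     []string
	Tentative []string
}

// onFinal records an entity that the API has marked as final.
func (t *entityTracker) onFinal(value string) {
	t.Final = append(t.Final, value)
}

// onTentative replaces the tentative list with a copy of the newest one.
func (t *entityTracker) onTentative(values []string) {
	t.Tentative = append([]string(nil), values...)
}

func main() {
	var t entityTracker
	t.onTentative([]string{"burger restaurant", "tomorrow"})
	t.onFinal("burger restaurant")
	t.onTentative([]string{"tomorrow", "for two"})
	fmt.Println("final:", t.Final)
	fmt.Println("tentative:", t.Tentative)
}
```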
SLUTentativeTranscript
Describes a tentative transcript.
Tentative transcript is an interim recognition result, which may change over time, e.g. a phrase "find me a red t-shirt" can be tentatively recognised as "find me a tea", until the API processes the audio completely.
Fields
name | type | description |
---|---|---|
tentative_transcript | string | Aggregated tentative transcript from the beginning of the audio until current moment in time. Consecutive transcripts will have this value appended to, e.g. if in the first message it's "find me", in the next it may be "find me a t-shirt". |
tentative_words | SLUTranscript | A list of individual words which compose tentative_transcript. All words must be considered tentative. |
SLUTranscript
Describes an SLU transcript. A transcript is a speech-to-text element of the phrase, i.e. a word recognised from the audio.
Fields
name | type | description |
---|---|---|
word | string | The word recognised from the audio. |
index | int32 | The position of the word in the whole phrase, zero-based. |
start_time | int32 | The start time of the word in the audio, in milliseconds from the beginning of the audio. |
end_time | int32 | The end time of the word in the audio, in milliseconds from the beginning of the audio. |
TextsRequest
Top-level message sent by the client for the Texts method.
Fields
name | type | description |
---|---|---|
app_id | string | The target application for the texts request. Required. |
requests | WLURequest | List of WLURequest. Required. |
TextsResponse
Top-level message sent by the server for the Texts method.
Fields
name | type | description |
---|---|---|
responses | WLUResponse | List of WLUResponses. Required. |
Transcript
Describes an SLU transcript. A transcript is a speech-to-text element of the phrase, i.e. a word recognised from the audio.
Fields
name | type | description |
---|---|---|
word | string | The word recognised from the audio. |
index | int32 | The position of the word in the whole phrase, zero-based. |
start_time | int32 | The start time of the word in the audio, in milliseconds from the beginning of the audio. |
end_time | int32 | The end time of the word in the audio, in milliseconds from the beginning of the audio. |
WLUEntity
Describes a single entity in a segment.
An entity is a specific object in the phrase that falls into some kind of category,
e.g. in the SAL example *book book a [burger restaurant](restaurant_type) for [tomorrow](date), "burger restaurant" would be an entity of type restaurant_type and "tomorrow" would be an entity of type date.
An entity has start and end indices which map to the indices of words in WLUToken messages, e.g. in the example book a [burger restaurant](restaurant_type) for [tomorrow](date) it would be:
- Entity "burger restaurant" - start_position = 2, end_position = 4
- Entity "tomorrow" - start_position = 5, end_position = 6
The start index is inclusive, but the end index is exclusive, i.e. the interval is [start_position, end_position).
Fields
name | type | description |
---|---|---|
entity | string | The type of the entity, e.g. restaurant_type or date. |
value | string | The value of the entity, e.g. burger restaurant or tomorrow. |
start_position | int32 | The starting index of the entity in the phrase, maps to the index field in WLUToken. Inclusive. |
end_position | int32 | The finishing index of the entity in the phrase, maps to the index field in WLUToken. Exclusive. |
WLUIntent
Describes the intent of a segment. There can only be one intent per segment.
Fields
name | type | description |
---|---|---|
intent | string | The value of the intent, as defined in SAL. |
WLURequest
Top-level message sent by the client for the Text method.
Fields
name | type | description |
---|---|---|
language_code | string | The language of the text sent in the request as a BCP-47 language tag (e.g. "en-US"). Required. |
text | string | The text to recognise. Required. |
reference_time | Timestamp | The reference time for postprocessing. By default, the current date is used. Optional. |
WLUResponse
Top-level message sent by the server for the Text method.
Fields
name | type | description |
---|---|---|
segments | WLUSegment | A list of WLU segments. |
WLUSegment
Describes a WLU segment. A segment is a logical portion of text denoted by its intent, e.g. in a phrase "book me a flight and rent a car" there would be a segment for "book me a flight" and another for "rent a car".
Fields
name | type | description |
---|---|---|
text | string | The portion of text that contains this segment. |
tokens | WLUToken | The list of word tokens which are contained in this segment. |
entities | WLUEntity | The list of entities which are contained in this segment. |
intent | WLUIntent | The intent that defines this segment. |
annotated_text | string | The value of text annotated in SAL format. |
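To show how the fields of a WLUSegment relate to one another, the sketch below rebuilds a SAL-style annotation from the intent, tokens and entities. It is an illustration only: the exact formatting the server uses for annotated_text is not specified here, the layout is modelled on the SAL example in the WLUEntity description, and `annotate` plus the local structs are hypothetical. Tokens are assumed to be sorted by index and entity intervals to be half-open.

```go
package main

import (
	"fmt"
	"strings"
)

// Local stand-ins mirroring a subset of the documented fields.
type WLUToken struct {
	Word  string
	Index int32
}

type WLUEntity struct {
	Entity        string
	StartPosition int32 // inclusive
	EndPosition   int32 // exclusive
}

// annotate renders "*intent word [entity words](entity_type) ...".
func annotate(intent string, tokens []WLUToken, entities []WLUEntity) string {
	starts := make(map[int32]WLUEntity)
	for _, e := range entities {
		starts[e.StartPosition] = e
	}
	parts := []string{"*" + intent}
	for i := 0; i < len(tokens); {
		e, ok := starts[tokens[i].Index]
		if !ok || e.EndPosition <= e.StartPosition {
			parts = append(parts, tokens[i].Word) // plain word outside any entity
			i++
			continue
		}
		// Collect every token covered by the entity interval.
		var words []string
		for i < len(tokens) && tokens[i].Index < e.EndPosition {
			words = append(words, tokens[i].Word)
			i++
		}
		parts = append(parts, "["+strings.Join(words, " ")+"]("+e.Entity+")")
	}
	return strings.Join(parts, " ")
}

func main() {
	var tokens []WLUToken
	for i, w := range strings.Fields("book a burger restaurant for tomorrow") {
		tokens = append(tokens, WLUToken{Word: w, Index: int32(i)})
	}
	entities := []WLUEntity{
		{Entity: "restaurant_type", StartPosition: 2, EndPosition: 4},
		{Entity: "date", StartPosition: 5, EndPosition: 6},
	}
	fmt.Println(annotate("book", tokens, entities))
}
```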
WLUToken
Describes a single word token in a segment.
Fields
name | type | description |
---|---|---|
word | string | The value of the word. |
index | int32 | Position of the token in the text. |