API Reference

API reference for the Speechly API

Description

The Speechly SLU service provides spoken language understanding in a bidirectional stream. It takes the SLURequest stream as an input and outputs the SLUResponse stream in real-time.

A new stream is started by initializing the SLU engine with a config value.

The Speechly API requires that a user has an access token from the Identity service. The token must be included in the metadata as an Authorization key with value Bearer TOKEN_HERE.
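As a minimal sketch of the metadata format described above, the helper below builds the key/value pair as a list of tuples, which is the shape the Python gRPC client accepts for call metadata. The name `auth_metadata` is hypothetical and not part of the Speechly API; the lowercase key follows the Python example later in this reference.

```python
def auth_metadata(token):
    """Build gRPC call metadata carrying the access token.

    `auth_metadata` is a hypothetical helper; the token itself is
    obtained from the Identity service's Login call.
    """
    if not token:
        raise ValueError("an access token is required")
    return [("authorization", "Bearer {}".format(token))]


# TOKEN_HERE is a placeholder, not a real token.
metadata = auth_metadata("TOKEN_HERE")
```

The resulting list can be passed as the metadata argument when opening the SLU stream.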

When a new SLU.Stream is started, the client must first send the config value, which configures the SLU engine. Should it not be the first message sent, the stream closes with an error.

After the client sends the SLUEvent.START message to start the audio stream and begins sending audio data, the server responds with SLUResponse messages until the client ends the stream by sending SLUEvent.STOP.
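The message order described above (config first, then START, audio data, and finally STOP) can be sketched as a generator. Plain dicts stand in for SLURequest messages here, which is an assumption made to keep the sketch free of generated gRPC code; `request_sequence` is a hypothetical name.

```python
def request_sequence(config, audio_chunks):
    """Yield stream messages in the order the SLU service expects.

    Plain dicts stand in for SLURequest messages (an assumption for
    illustration only).
    """
    yield {"config": config}      # must be the first message on the stream
    yield {"event": "START"}      # opens the audio stream
    for chunk in audio_chunks:
        yield {"audio": chunk}    # raw audio bytes
    yield {"event": "STOP"}       # ends the audio stream


messages = list(request_sequence(
    {"channels": 1, "sample_rate_hertz": 16000},
    [b"\x00\x01", b"\x02\x03"],
))
```

With the generated classes, each dict would be replaced by the corresponding SLURequest, as in the Python example at the end of the SLU section.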

Stream requires config

If the first message to the SLU service is not config, the stream closes with an error. You’ll need a Speechly app ID for creating the configuration. You can get your app ID by signing up to the Speechly Dashboard.

Identity service

service Identity {
    rpc Login(LoginRequest) returns (LoginResponse) {}
}

The Identity service provides client login. When successful, it returns an access token to access the SLU and WLU services.

LoginRequest

message LoginRequest {
    string device_id = 1;
    string app_id = 2;
}

| Name | Type | Description |
| --- | --- | --- |
| device_id | string | A unique identifier for the end-user. |
| app_id | string | An application ID registered with Speechly. |

Example


const identity = new Speechly.v1.Identity(host, credentials);
identity.login({ appId, deviceId }, (err, response) => {
  if (err) {
    return reject(err);
  }
  const token = response.token;
});

LoginResponse

message LoginResponse {
    string token = 1; 
}

| Name | Type | Description |
| --- | --- | --- |
| token | string | An access token to be used for the SLU and WLU services. |

SLU service

service SLU {
    rpc Stream(stream SLURequest) returns (stream SLUResponse) {}
}

Example

const metadata = new grpc.Metadata();
metadata.add("Authorization", `Bearer ${token}`);
const client = new Speechly.v1.SLU(host, credentials);
const slu = client.Stream(metadata);

Messages

SLURequest

message SLURequest {
    oneof streaming_request {
        SLUConfig config = 1;
        SLUEvent event = 2;
        bytes audio = 3;
    }
}

| Name | Type | Description |
| --- | --- | --- |
| config | SLUConfig | Provides the configuration for initializing the SLU service. |
| event | SLUEvent | Either START or STOP to begin or end the stream. |
| audio | bytes | The actual audio stream. |

Example

slu.write({ config: { channels: 1, sampleRateHertz: 16000 } });
slu.write({ event: { event: "START" } });
const audio = new wav.Reader();
audio.on("data", audioData => {
  if (slu.writable) {
    slu.write({ audio: audioData });
  }
});
slu.on("data", data => {
  console.dir(data, { depth: null });
});
audio.on("end", () => {
  slu.write({ event: { event: "STOP" } });
  slu.end();
});
fs.createReadStream("../audio.wav").pipe(audio);

See the complete example.

SLUConfig

message SLUConfig {
    enum Encoding {
        LINEAR16 = 0;
    }
    Encoding encoding = 1;
    int32 channels = 2;
    int32 sample_rate_hertz = 3;
    string language_code = 4;
}

| Name | Type | Description |
| --- | --- | --- |
| encoding | Encoding | The audio encoding. LINEAR16 (value 0) is raw, linear 16-bit PCM audio. |
| channels | int32 | The number of channels in the audio stream. Must be at least 1. |
| sample_rate_hertz | int32 | The sampling rate of the audio stream. Must be at least 8000 Hz; 16000 Hz is preferred. |
| language_code | string | A valid language code, such as en_US. Must match one of the languages defined in the app ID configuration. |
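The constraints in the table can be checked on the client before opening a stream. The sketch below is one possible pre-flight check, not something the API provides: `check_slu_config` and `linear16_bytes_per_second` are hypothetical helpers. The byte-rate calculation relies only on LINEAR16 being 16-bit PCM, that is, two bytes per sample per channel.

```python
def check_slu_config(channels, sample_rate_hertz):
    """Return a list of problems with the given audio settings.

    Hypothetical client-side check mirroring the documented limits;
    an empty list means the settings satisfy them.
    """
    problems = []
    if channels < 1:
        problems.append("channels must be at least 1")
    if sample_rate_hertz < 8000:
        problems.append("sample_rate_hertz must be at least 8000")
    return problems


def linear16_bytes_per_second(channels, sample_rate_hertz):
    """Bytes of LINEAR16 audio per second (2 bytes per sample per channel)."""
    return channels * sample_rate_hertz * 2


problems = check_slu_config(channels=1, sample_rate_hertz=16000)
rate = linear16_bytes_per_second(channels=1, sample_rate_hertz=16000)
```

Knowing the byte rate helps when choosing how much audio to send in each SLURequest.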

SLUEvent

The SLUEvent is a control event sent by the client in the SLU.Stream RPC.

message SLUEvent {
    enum Event {
        START = 0; 
        STOP = 1; 
    }
    Event event = 1;
}

| Name | Type | Description |
| --- | --- | --- |
| event | Event | Either START or STOP to start and stop the audio stream. |

SLUResponse

message SLUResponse {
    string audio_context = 1;  
    int32 segment_id = 2;
    oneof streaming_response {
        SLUTranscript transcript = 3;
        SLUEntity entity = 4;
        SLUIntent intent = 5;
        SLUSegmentEnd segment_end = 6;

        SLUTentativeTranscript tentative_transcript = 7;
        SLUTentativeEntities tentative_entities = 8;
        SLUIntent tentative_intent = 9;

        SLUStarted started = 10;
        SLUFinished finished = 11;
    }
}

| Name | Type | Description |
| --- | --- | --- |
| audio_context | string | The identifier to match the server response to the audio context. |
| segment_id | int32 | The identifier to match the server response to the segment context. |
| streaming_response | oneof | One of nine possible streaming responses from the server. |
| transcript | SLUTranscript | The final transcript of an utterance, once the segment is finished. |
| entity | SLUEntity | The final entity of the segment. |
| intent | SLUIntent | The final intent of the segment. |
| segment_end | SLUSegmentEnd | Sent at the end of a segment. |
| tentative_transcript | SLUTentativeTranscript | The tentative transcript of the utterance. Sent continuously while a segment is being processed. Subject to change. |
| tentative_entities | SLUTentativeEntities | The tentative entities in the utterance. Sent continuously while a segment is being processed. Subject to change. |
| tentative_intent | SLUIntent | The tentative intent of a segment. Subject to change. |
| started | SLUStarted | Sent when the server is ready to receive the audio stream. |
| finished | SLUFinished | Sent when the server has closed the audio stream. |

The server sends these responses in the SLU.Stream RPC while it receives the audio stream. The stream always starts with SLUStarted and ends either with an error or with SLUFinished.

The actual responses are either tentative or final. The tentative responses are subject to change until finalized. They are discarded once the server has sent the final response.
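One way a client might track tentative and final results is sketched below. Each response is represented as a plain dict with `audio_context`, `segment_id`, `kind` and `payload` keys; this shape, the constant names and `collect_segments` are assumptions for illustration, not the generated SLUResponse class. The sketch also assumes that the first final result for a segment is the point at which tentative results are dropped.

```python
FINAL_KINDS = ("transcript", "entity", "intent")
TENTATIVE_KINDS = (
    "tentative_transcript",
    "tentative_entities",
    "tentative_intent",
)


def collect_segments(responses):
    """Group responses by (audio_context, segment_id).

    Responses are plain dicts (an assumed stand-in for SLUResponse).
    """
    segments = {}
    for response in responses:
        kind = response["kind"]
        if kind not in FINAL_KINDS + TENTATIVE_KINDS + ("segment_end",):
            continue  # started/finished are not tracked per segment here
        key = (response["audio_context"], response["segment_id"])
        segment = segments.setdefault(
            key, {"tentative": {}, "final": {}, "ended": False}
        )
        if kind in TENTATIVE_KINDS:
            # A newer tentative result replaces the earlier one of its kind.
            segment["tentative"][kind] = response["payload"]
        elif kind in FINAL_KINDS:
            # A final result supersedes the tentative ones.
            segment["tentative"].clear()
            segment["final"].setdefault(kind, []).append(response["payload"])
        else:
            segment["ended"] = True
    return segments


segments = collect_segments([
    {"audio_context": "ctx", "segment_id": 0,
     "kind": "tentative_transcript", "payload": "turn of"},
    {"audio_context": "ctx", "segment_id": 0,
     "kind": "tentative_transcript", "payload": "turn off the"},
    {"audio_context": "ctx", "segment_id": 0,
     "kind": "intent", "payload": "turn_off"},
    {"audio_context": "ctx", "segment_id": 0,
     "kind": "segment_end", "payload": None},
])
```

Keying on both `audio_context` and `segment_id` keeps results from different utterances and segments apart.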

SLUTentativeTranscript

message SLUTentativeTranscript {
    string tentative_transcript = 1;
    repeated SLUTranscript tentative_words = 2;
}

| Name | Type | Description |
| --- | --- | --- |
| tentative_transcript | string | The tentative transcript of the utterance. |
| tentative_words | repeated SLUTranscript | The array of words in the transcript. |

The message sent by the server for the tentative transcript of the voice data before the SLU stream is finished either by an error or the client sending SLUEvent.STOP.

Tentative results can change

The tentative results are subject to change until the SLU stream is finished.

SLUTentativeEntities

message SLUTentativeEntities {
    repeated SLUEntity tentative_entities = 1;
}

| Name | Type | Description |
| --- | --- | --- |
| tentative_entities | repeated SLUEntity | The array of entities in the transcript. |

The message sent by the server for the tentative entities in an utterance.

Tentative results can change

Tentative results are subject to change until the SLU stream is finished.

SLUEntity

message SLUEntity {
    string entity = 1;
    string value = 2;
    int32 start_position = 3;
    int32 end_position = 4;
}

| Name | Type | Description |
| --- | --- | --- |
| entity | string | An entity from the utterance. |
| value | string | The value of the entity. |
| start_position | int32 | The starting position of the entity in the transcript. Inclusive. |
| end_position | int32 | The ending position of the entity in the transcript. Exclusive. |

The message sent by the server for the final entity results.

SLUIntent

message SLUIntent {
    string intent = 1;
}

| Name | Type | Description |
| --- | --- | --- |
| intent | string | The intent of the segment. |

The message sent by the server for the final intent results.

SLUSegmentEnd

message SLUSegmentEnd {
}

The message sent by the server at the end of an SLU segment.

SLUStarted

message SLUStarted {
}

The message sent by the server when the audio context is initialized. The SLUResponse that carries SLUStarted includes the audio_context, which is used to match the rest of the response messages to that specific utterance.

While processing the audio, the server continuously sends tentative messages (SLUTentativeTranscript, SLUTentativeEntities, and the tentative SLUIntent).

SLUFinished

message SLUFinished {
    // If the audio context finished with an error, then this field
    // contains a value.
    SLUError error = 2;
}

The message sent by the server when the audio context is finished either by an error or the client sending SLUEvent.STOP. If the context is finished due to an error, the SLUError contains an error message.

SLUError

message SLUError {
    string code = 1; 
    string message = 2; 
}

| Name | Type | Description |
| --- | --- | --- |
| code | string | The error code. |
| message | string | A human-readable error message. |

Example code

JavaScript

const fs = require("fs");
const protoLoader = require("@grpc/proto-loader");
const grpc = require("grpc");
const wav = require("wav");

const appId = process.env.APP_ID;
if (appId === undefined) {
  throw new Error("APP_ID environment variable needs to be set");
}

let host = "api.speechgrinder.com";
let credentials = grpc.credentials.createSsl();

const SgGrpc = grpc.loadPackageDefinition(
  protoLoader.loadSync("../sg.proto", {
    keepCase: false,
    longs: String,
    enums: String,
    defaults: true,
    oneofs: true
  })
);

const login = (deviceId, appId) => {
  return new Promise((resolve, reject) => {
    const identity = new SgGrpc.speechgrinder.sgapi.v1.Identity(host, credentials);
    identity.login({ appId, deviceId }, (err, response) => {
      if (err) {
        return reject(err);
      }
      return resolve(response.token);
    });
  });
};

const start = slu => {
  const audio = new wav.Reader();

  slu.write({ config: { channels: 1, sampleRateHertz: 16000 } });
  slu.write({ event: { event: "START" } });

  audio.on("data", audioData => {
    if (slu.writable) {
      slu.write({ audio: audioData });
    }
  });
  slu.on("data", data => {
    console.dir(data, { depth: null });
  });
  audio.on("end", () => {
    slu.write({ event: { event: "STOP" } });
    slu.end();
  });
  fs.createReadStream("../audio.wav").pipe(audio);
};

Promise.resolve()
  .then(() => login("node-simple-test", appId))
  .catch(err => {
    console.error(err);
    process.exit(1); // exit with a non-zero status because login failed
  })
  .then(token => {
    const metadata = new grpc.Metadata();
    metadata.add("Authorization", `Bearer ${token}`);
    const client = new SgGrpc.speechgrinder.sgapi.v1.Slu(host, credentials);
    const slu = client.Stream(metadata);
    return start(slu);
  })
  .catch(err => {
    console.error(err);
  });

  

Python

import os
import wave
import uuid

import grpc

from speechly_pb2 import SLURequest, SLUConfig, SLUEvent, LoginRequest
from speechly_pb2_grpc import IdentityStub as IdentityService
from speechly_pb2_grpc import SLUStub as SLUService

chunk_size = 8000

def audio_iterator():
    yield SLURequest(config=SLUConfig(channels=1, sample_rate_hertz=16000))
    yield SLURequest(event=SLUEvent(event='START'))
    with wave.open('output.wav', mode='r') as audio_file:
        audio_bytes = audio_file.readframes(chunk_size)
        while audio_bytes:
            yield SLURequest(audio=audio_bytes)
            audio_bytes = audio_file.readframes(chunk_size)
    yield SLURequest(event=SLUEvent(event='STOP'))

with grpc.secure_channel('api.speechly.com', grpc.ssl_channel_credentials()) as channel:
    token = IdentityService(channel) \
        .Login(LoginRequest(device_id=str(uuid.uuid4()), app_id=os.environ['APP_ID'])) \
        .token

with grpc.secure_channel('api.speechly.com', grpc.ssl_channel_credentials()) as channel:
    slu = SLUService(channel)
    responses = slu.Stream(
        audio_iterator(),
        None,
        [('authorization', 'Bearer {}'.format(token))])
    for response in responses:
        print(response)

WLU service

The Speechly written language understanding service provides natural language understanding for written languages. It uses the same model as the SLU service; hence, it returns the same results for the same transcripts.

When a string is sent to the service, it returns with intents, entities, and transcripts for each segment as a response. The maximum size of a message is 16KB.
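A client may want to check the size limit before sending text. The sketch below is a rough pre-flight check under two assumptions: that "16KB" means 16 * 1024 bytes, and that measuring the UTF-8 size of the text alone is close enough (the encoded request also carries the language code and field framing, so the real message is slightly larger). `MAX_MESSAGE_BYTES` and `fits_wlu_limit` are hypothetical names.

```python
# Assumption: "16KB" is interpreted as 16 * 1024 bytes.
MAX_MESSAGE_BYTES = 16 * 1024


def fits_wlu_limit(text):
    """Return True if the UTF-8 encoded text is within the assumed limit.

    This measures only the text field, so it slightly underestimates
    the size of the full request message.
    """
    return len(text.encode("utf-8")) <= MAX_MESSAGE_BYTES


ok = fits_wlu_limit("book a table for two")
```

Counting encoded bytes rather than characters matters for non-ASCII text, where one character can occupy several bytes.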

service WLU { 
    rpc Text(WLURequest) returns (WLUResponse) {}
}

Messages

WLURequest

message WLURequest {
    string language_code = 1;
    string text = 2;
}

| Name | Type | Description |
| --- | --- | --- |
| language_code | string | A valid language for the model used in the application. |
| text | string | The text from which the intents, entities, and transcripts are extracted. |

WLUResponse

message WLUResponse {
    repeated WLUSegment segments = 1;
}

| Name | Type | Description |
| --- | --- | --- |
| segments | repeated WLUSegment | Any number of segments extracted from a string sent to WLURequest. |

WLUSegment

message WLUSegment {
    string text = 1;
    repeated WLUToken tokens = 2;
    repeated WLUEntity entities = 3;
    WLUIntent intent = 4;
}

| Name | Type | Description |
| --- | --- | --- |
| text | string | A segment extracted from WLURequest. |
| tokens | repeated WLUToken | Any number of words extracted from the segment. |
| entities | repeated WLUEntity | Any number of entities extracted from the segment. |
| intent | WLUIntent | The intent of the segment. |

WLUToken

message WLUToken {
    string word = 1;
    int32 index = 2;
}

| Name | Type | Description |
| --- | --- | --- |
| word | string | A word in a segment. |
| index | int32 | The position of the word in the segment. |

WLUEntity

message WLUEntity {
    string entity = 1;
    string value = 2;
    int32 start_position = 3;
    int32 end_position = 4;
}

| Name | Type | Description |
| --- | --- | --- |
| entity | string | An entity extracted from a segment. |
| value | string | The value of the entity. |
| start_position | int32 | The starting position of the word/words containing the entity and its value. Inclusive. |
| end_position | int32 | The ending position of the word/words containing the entity and its value. Exclusive. |
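The inclusive start and exclusive end can be illustrated by picking out the words an entity covers. This sketch assumes that start_position and end_position refer to the same word positions as WLUToken.index, which the reference does not state explicitly. Tokens are plain dicts standing in for WLUToken, and `entity_words` is a hypothetical helper.

```python
def entity_words(tokens, start_position, end_position):
    """Return the words whose index lies in [start_position, end_position).

    Assumes entity positions use the same numbering as WLUToken.index.
    """
    return [
        token["word"]
        for token in tokens
        if start_position <= token["index"] < end_position
    ]


tokens = [
    {"word": "book", "index": 0},
    {"word": "a", "index": 1},
    {"word": "table", "index": 2},
    {"word": "for", "index": 3},
    {"word": "two", "index": 4},
]
words = entity_words(tokens, start_position=4, end_position=5)
```

Because the end is exclusive, an entity spanning a single word has an end_position one greater than its start_position.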

WLUIntent

message WLUIntent {
    string intent = 1;
}

| Name | Type | Description |
| --- | --- | --- |
| intent | string | The intent of a segment. |

Last updated by ottomatias on November 20, 2020 at 15:48 +0200