Building a web app using Speechly Browser Client
Learn how to add voice features to a web app using Speechly Browser Client.
Getting started
This guide assumes you have basic knowledge of JavaScript app development. We'll create a simple HTML/JS web app and use Parcel as the bundler. Feel free to use your favorite bundler instead; this guide doesn't go deep into bundler configuration.
You'll also need a Speechly account and a Speechly application that's using a Conformer model. If you are new to Speechly, you can follow our quick start guide to get started.
Project setup
Before we get started, create a directory for your project. Then, create an HTML file and a JS file inside a src directory and add the content below to src/app.js and src/index.html, respectively.
mkdir src
touch src/index.html src/app.js
console.log('Hello world');
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>My Speechly App</title>
<script type="module" src="app.js"></script>
</head>
<body>
<h1>Hello world</h1>
</body>
</html>
Next, install Parcel.
npm install --save-dev parcel
# or
yarn add --dev parcel
Then, update package.json with a source entry and a start script so the app can be started.
{
"source": "src/index.html",
"scripts": {
"start": "parcel"
},
"devDependencies": {
"parcel": "^2.8.3"
}
}
Finally, start the development server.
npm start
Open localhost:1234 to see the application running.
Installation
Now that our project is set up, install the @speechly/browser-client package.
npm install @speechly/browser-client
# or
yarn add @speechly/browser-client
Then, import BrowserClient, create a new client instance and pass your App ID to it. Get your App ID from Speechly Dashboard or by using the Speechly CLI list command.
import { BrowserClient } from '@speechly/browser-client';
const client = new BrowserClient({
appId: 'YOUR-APP-ID',
logSegments: true,
debug: true,
})
The debug and logSegments properties might be helpful when developing, as they display changes in the client state as well as log the API output.
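Since these two properties are mainly useful during development, one option is to derive them from a single flag. The following is a minimal sketch, not part of the Speechly API: buildClientOptions and isDevelopment are hypothetical names, and only appId, logSegments and debug come from this guide. How you compute the flag depends on your bundler and build setup.

```javascript
// Hypothetical helper: builds the options object that is passed to
// `new BrowserClient(...)`. Only appId, logSegments and debug are taken
// from this guide; the helper itself is an illustration.
function buildClientOptions(appId, isDevelopment) {
  return {
    appId,
    // Log API output and client state changes only while developing.
    logSegments: isDevelopment,
    debug: isDevelopment,
  };
}

// Example: verbose logging for a development build.
const devOptions = buildClientOptions('YOUR-APP-ID', true);
console.log(devOptions);

// In the app, the result would be passed to the client constructor:
// const client = new BrowserClient(devOptions);
```

Passing false as the second argument yields the same App ID with both logging properties turned off.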
If you have debug enabled, you should see from the developer console that the client has connected to the API. Now you are ready to capture microphone audio!
Capture microphone audio
The easiest way to capture audio from the browser microphone is creating a button that toggles the microphone on and off.
First, import BrowserMicrophone and create a new microphone instance.
import { BrowserClient, BrowserMicrophone } from '@speechly/browser-client';
const microphone = new BrowserMicrophone();
const client = new BrowserClient({
appId: 'YOUR-APP-ID',
logSegments: true,
debug: true,
})
Next, create a button and a click handler for it where you attach the microphone to the client.
<button id="mic">Start microphone</button>
// ...
const micBtn = document.getElementById('mic');
const attachMicrophone = async () => {
if (microphone.mediaStream) return;
await microphone.initialize();
await client.attach(microphone.mediaStream);
};
const handleClick = async () => {
if (client.isActive()) {
await client.stop();
micBtn.innerText = 'Start microphone';
} else {
await attachMicrophone();
await client.start();
micBtn.innerText = 'Stop microphone';
}
};
micBtn.addEventListener('click', handleClick);
attachMicrophone is a helper function for attaching the microphone to the client. Browsers restrict programmatic access to the microphone, which is why it must be called from a user-initiated action such as a button click.
The start and stop methods are used for manually controlling audio processing. Used together with client.isActive, they give you a simple on/off microphone toggle button.
The first time you press the microphone button you will be prompted for microphone permissions. If you have logSegments enabled, you should see the API output in the developer console.
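Because start and stop are awaited, a second click can arrive while the first one is still being processed. The sketch below shows one way to ignore such clicks. It is an illustration only: createToggleHandler, onStateChange and demoClient are hypothetical names, and the only client methods assumed are isActive, start and stop as used in the click handler above.

```javascript
// Wraps the on/off toggle so that clicks arriving while a start/stop call is
// still pending are ignored instead of racing each other.
// `client` is any object with isActive(), start() and stop().
function createToggleHandler(client, onStateChange) {
  let busy = false;
  return async function handleToggle() {
    if (busy) return false; // a previous click is still being processed
    busy = true;
    try {
      if (client.isActive()) {
        await client.stop();
        onStateChange(false);
      } else {
        // In the app, attach the microphone here before starting.
        await client.start();
        onStateChange(true);
      }
      return true;
    } finally {
      busy = false;
    }
  };
}

// Stand-in client for demonstration only; in the app, pass the real client.
const demoClient = {
  active: false,
  isActive() { return this.active; },
  async start() { this.active = true; },
  async stop() { this.active = false; },
};

const demoToggle = createToggleHandler(demoClient, (isOn) => {
  console.log(isOn ? 'Stop microphone' : 'Start microphone');
});

demoToggle(); // starts the stand-in client
demoToggle(); // ignored: the first call has not finished yet
```

In the app, the returned function would be registered with micBtn.addEventListener('click', ...), and the onStateChange callback would update micBtn.innerText.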
React to API updates
A common pattern when working with Speechly Browser Client is to use the client.onSegmentChange method, which adds a listener for current segment change events.
First, create two elements for tentative and final transcripts.
<button id="mic">Start microphone</button>
<div id="transcripts"></div>
<p id="tentative"></p>
Next, in the onSegmentChange callback, create a transcript string from the segment.words array and render it to the appropriate elements.
//...
const transcripts = document.getElementById('transcripts');
const tentative = document.getElementById('tentative');
client.onSegmentChange((segment) => {
const text = segment.words.map((word) => word.value).join(' ');
tentative.innerHTML = `<em>${text}</em>`;
if (segment.isFinal) {
transcripts.innerHTML += `<p>${text}</p>`;
tentative.innerHTML = '';
}
});
segment is a structure that accumulates speech recognition (ASR) and natural language understanding (NLU) results. When segment.isFinal is false, the segment might be updated several times. When true, the segment won't be updated anymore and subsequent callbacks within the same audio context refer to the next segment.
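The tentative/final behavior described above can be sketched as a small pure function that is independent of the DOM. This is an illustration under stated assumptions, not part of the Speechly API: escapeHtml, segmentToText and applySegment are hypothetical helpers, and the segment objects are plain stand-ins that only carry the words and isFinal fields used in this guide. Skipping empty entries in words is a defensive choice here, and escaping is included because the callback writes the transcript to innerHTML.

```javascript
const HTML_ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };

// Escapes characters that have a special meaning in HTML.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Joins the word values of a segment into one string, skipping empty entries.
function segmentToText(segment) {
  return segment.words
    .filter(Boolean)
    .map((word) => word.value)
    .join(' ');
}

// Returns the next UI state: a final segment is appended to `finals` and
// clears `tentative`; a non-final segment only replaces `tentative`.
function applySegment(state, segment) {
  const text = segmentToText(segment);
  if (segment.isFinal) {
    return { finals: [...state.finals, text], tentative: '' };
  }
  return { finals: state.finals, tentative: text };
}

// Example with stand-in segments (not real API output).
let demoState = { finals: [], tentative: '' };
demoState = applySegment(demoState, { words: [{ value: 'hello' }], isFinal: false });
demoState = applySegment(demoState, {
  words: [{ value: 'hello' }, { value: 'world' }],
  isFinal: true,
});
console.log(demoState);
console.log(escapeHtml('<em>tom & jerry</em>'));
```

Inside client.onSegmentChange, the state returned by applySegment could then be rendered by passing each string through escapeHtml before assigning it to innerHTML, or by using textContent instead.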
Next steps
By now you should have a basic understanding of how to work with Speechly Browser Client and a working web app that's able to handle speech input and produce a transcript. Now you're ready to learn about some more advanced features of Speechly:
- Check out the example application on GitHub. It covers features such as voice activity detection, uploading audio files and how to deal with timestamps and NLU results.
- The source code for Speechly Demos is available in our GitHub repository. These are generally more complex applications.
- Also, be sure to check out the API reference documentation.