# Novo Speech Library

Use the `novo-speech` JavaScript library to interact with the NovoLearning Speech API from within your (Angular) application.
## Getting started

The Novo Speech Library contains a set of classes that help you set up sessions and perform interactions. The library can be used in any VanillaJS or Angular project.

To interact with the Novo Speech API from within the browser you need to obtain a `Client User Token`. This token should be generated by your backend using the `Publisher Secret Token` you received from NovoLearning.

Make sure to never publish your `Publisher Secret Token`, since it allows anyone to create and delete users on behalf of your publisher account.
## Backend

### Creating a Client User

Before you can use the library in your client application you'll have to create a `Client User`. In order to create this user you need your `Publisher ID` and `Publisher Secret Token`, which you received from NovoLearning.

Perform the following cURL request to create a new `Client User`:
```shell
curl -X POST 'https://gm.novo-learning.com/v0/publishers/${publisherId}/users' \
  -H 'Content-Type: application/json;charset=utf-8' \
  -H 'Authentication-Token: ${publisherSecretToken}' \
  --data-binary '{"username": "${username}", "password": "${password}"}'
```
> **Note:** never publish your `Publisher Secret Token`, since it allows others to sign up and delete user accounts.
### Generating a Client User Token

After signing up a user you can generate a `Client User Token`. This token is used by the library to make calls on behalf of your user.

Generate a token by performing the following request:
```shell
curl -X POST 'https://gm.novo-learning.com/v0/publishers/${publisherId}/login' \
  -H 'Content-Type: application/json;charset=utf-8' \
  --data-binary '{"username": "${username}", "password": "${password}"}'
```
The returned token allows your user to start ASR sessions. Provide this token to your frontend application.

By default a user is allowed to run a maximum of 5 ASR sessions at the same time, so make sure to generate a unique token per user.
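On the backend, the two cURL requests above could be wrapped in small helper functions. The sketch below only builds request descriptors (URL, headers, body) for you to pass to the HTTP client of your choice; the endpoint paths are taken from the cURL examples, while the helper names and the `RequestSpec` shape are assumptions of this sketch, not part of the library:

```typescript
const API = 'https://gm.novo-learning.com/v0';

interface RequestSpec {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Build the "create user" request from the cURL example above.
// The Publisher Secret Token must never leave your backend.
function createUserRequest(
  publisherId: string,
  publisherSecretToken: string,
  username: string,
  password: string
): RequestSpec {
  return {
    url: `${API}/publishers/${publisherId}/users`,
    headers: {
      'Content-Type': 'application/json;charset=utf-8',
      'Authentication-Token': publisherSecretToken,
    },
    body: JSON.stringify({ username, password }),
  };
}

// Build the login request that returns the Client User Token.
// Note: no secret token here — only the user's own credentials.
function loginRequest(
  publisherId: string,
  username: string,
  password: string
): RequestSpec {
  return {
    url: `${API}/publishers/${publisherId}/login`,
    headers: { 'Content-Type': 'application/json;charset=utf-8' },
    body: JSON.stringify({ username, password }),
  };
}
```

Issue both requests server-side and hand only the resulting token to the frontend.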
## Client

To get started, download and add the library to your project:

```shell
npm install @novo-learning/speech
```
### Integrate the client library

The library is written in VanillaJS and ships with an optional Angular module that takes care of the required initialization. If you use this library in an Angular application you can skip the following step; otherwise, continue reading to understand how the library should be initialized.
### Initializing in a non-Angular application

The main entrypoint for this library is the `NovoSpeechController`. There should only be one instance of the `NovoSpeechController` throughout your entire application (singleton).

Execute the following code to initialize a new instance of the `NovoSpeechController`:
```ts
const speech = new NovoSpeechController({
  /**
   * The Client User Token as obtained in the previous steps.
   *
   * Note that this can also be set via the `updateToken(token: string)`
   * method.
   */
  token?: string;

  /**
   * Full URL to the API endpoint.
   * Leave empty to make use of the default endpoint.
   */
  api?: string;

  /**
   * Force the recorder which should be used. By default
   * the library will pick the best available recorder.
   *
   * Options:
   * - RecorderApi.HTML5_WEBAUDIO
   * - RecorderApi.HTML5_MEDIARECORDER
   * - RecorderApi.CORDOVA
   */
  recorder?: RecorderApi;
});
```
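One way to honour the singleton requirement in a plain VanillaJS setup is to create the controller once in a shared module and hand the same instance to every caller. The sketch below is a generic lazy-singleton wrapper; the `getSpeech` name and the factory callback (a stand-in for `() => new NovoSpeechController({...})`) are assumptions of this sketch, not part of the library:

```typescript
// speech-singleton.ts — create the controller on first use and reuse it
// afterwards, so the whole application shares one instance.
let instance: unknown;

export function getSpeech<T>(create: () => T): T {
  if (instance === undefined) {
    // First call: run the factory (e.g. new NovoSpeechController({...})).
    instance = create();
  }
  // Every later call returns the same object.
  return instance as T;
}
```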
### Initializing in your Angular application

To get started using the Novo Speech API in your Angular application you have to include the `NovoSpeechModule` in your application's main module (usually `app.module.ts`):
```ts
@NgModule({
  declarations: [
    AppComponent
  ],
  imports: [
    NovoSpeechModule.forRoot({
      token: '<!-- your user token. Note that this can also be set via the `updateToken(token: string)` method. -->',
      api: '<!-- API endpoint to use. Leave empty to use the default endpoint -->',
      recorder: '<!-- RecorderApi to use. Leave empty to pick the best available -->'
    })
  ],
  bootstrap: [AppComponent]
})
export class AppModule { }
```
The module provides a singleton of the `NovoSpeechController` which you can use in any of your services or components:

```ts
constructor(private speech: NovoSpeechController) { }
```
## Establish a session

Before you can interact with the API you need to establish a (new) session. You can establish a session by invoking the `ensureSession(language: LanguageCode, sNodeId?: number)` method with your desired language:

```ts
const session = await speech.ensureSession('en');
```

> **Note:** only provide the `sNodeId` when you know what you're doing. By default it should be left empty.

> **Note:** sessions are reused when `ensureSession()` is invoked with the same language (and `sNodeId`).
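The reuse behaviour can be pictured as a cache keyed on language and `sNodeId`. The sketch below is an illustration of the `ensureSession()` contract, not the library's actual implementation; the `SessionCache` class and its factory callback are assumptions of this sketch:

```typescript
type SessionFactory = (language: string, sNodeId?: number) => Promise<object>;

// Illustration of the ensureSession() contract: the same language
// (and sNodeId) yields the same session, while a new combination
// starts a fresh one.
class SessionCache {
  private sessions = new Map<string, Promise<object>>();

  constructor(private start: SessionFactory) {}

  ensureSession(language: string, sNodeId?: number): Promise<object> {
    const key = `${language}:${sNodeId ?? ''}`;
    let session = this.sessions.get(key);
    if (!session) {
      // First call for this key: start a new session and remember it.
      session = this.start(language, sNodeId);
      this.sessions.set(key, session);
    }
    return session;
  }
}
```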
## Initialize a Grammar Interaction

Now that you've established your session it's time to initialize a Grammar Interaction. To initialize a `GrammarInteraction` instance you'll have to call one of the three available factory methods on the `NovoSpeechController`. Each method requires you to pass the active `SessionManager` as its first argument, followed by the GrammarInteraction type's specific parameters.

There are currently three supported GrammarInteraction types, which can be initialized by invoking one of the following methods:
### Multiple Choice Grammar

```ts
const grammar = await speech.getMultipleChoiceGrammar(session: SessionManager, options: string[], returnIntermediateResults = false): Promise<NovoSpeechGrammarInteraction>
```

Call this method to initialize a new MultipleChoice GrammarInteraction. This interaction type can be used to recognize whether the user said one of the given options.
### Forced Alignment Grammar

```ts
const grammar = await speech.getForcedAlignmentGrammar(session: SessionManager, text: string | string[], returnIntermediateResults = false): Promise<NovoSpeechGrammarInteraction>
```

Initializes a new Forced Alignment GrammarInteraction. This interaction assumes the learner will say the given text, and can be used to obtain detailed pronunciation information in the form of word- and phone-level confidence scores.
### Open Recording Grammar

```ts
const grammar = await speech.getOpenRecordingGrammar(session: SessionManager): Promise<NovoSpeechGrammarInteraction>
```

Open Recording GrammarInteractions can be used to create (and store) audio recordings. No actual speech recognition will take place.
## Interact with your GrammarInteraction

When you've initialized your `GrammarInteraction` instance it's time to start interacting with it. Each `GrammarInteraction` instance has two public Observable properties which emit (intermediate) results:

- `grammar.result$: Observable<AsrResult>`
- `grammar.intermediateResults$: Observable<AsrIntermediateResult>`

These observables emit results during your recording (`intermediateResults$`), or as soon as your recording stops (`result$`).

To start a recording, invoke the `grammar.record(useEOS?: boolean): void` method. This starts listening for the user to speak and stops the recording automatically as soon as the End-Of-Sentence (EOS) is detected.

You can also manually stop a recording by invoking `grammar.stop(skipRecognition?: boolean)`.

Assuming your session remains active, each GrammarInteraction instance can be reused any number of times to make subsequent recordings.
## Example

Putting it all together, an interaction component (Angular) could look something like this:
```ts
import { Component } from '@angular/core';
import { BehaviorSubject, combineLatest, from, Observable } from 'rxjs';
import { distinctUntilChanged, filter, map, shareReplay, switchMap } from 'rxjs/operators';
// The exact export names below are assumed to match the library's typings.
import {
  AsrIntermediateResult,
  AsrResult,
  AsrSessionState,
  isIntermediateAsrResult,
  NovoSpeechController,
  NovoSpeechGrammarInteraction,
  SessionManager
} from '@novo-learning/speech';

@Component({ ... })
export class InteractionComponent {
  /**
   * Holds the options for the current exercise. These
   * could, e.g., be updated by a textarea in your component.
   */
  readonly exercise$: BehaviorSubject<string[]> = new BehaviorSubject<string[]>([
    'this',
    'that'
  ]);

  /**
   * The current exercise language.
   */
  readonly language$: BehaviorSubject<string> = new BehaviorSubject<string>('en');

  /**
   * Create a new session for the active language and share it
   * among all subscribers.
   */
  readonly session$: Observable<SessionManager> = this.language$.pipe(
    distinctUntilChanged(),
    switchMap(lang => from(this.speech.ensureSession(lang))),
    shareReplay(1)
  );

  /**
   * Indicates if the current session is loading.
   */
  readonly loadingSession$: Observable<boolean> = this.session$.pipe(
    switchMap(s => s.state$),
    map(s => s === AsrSessionState.StartingSession)
  );

  /**
   * Create a GrammarInteraction instance for the provided exercise
   * and session.
   */
  readonly interaction$: Observable<NovoSpeechGrammarInteraction> = combineLatest([this.exercise$, this.session$]).pipe(
    switchMap(([exercise, session]) =>
      from(this.speech.getMultipleChoiceGrammar(session, exercise))
    ),
    shareReplay(1),
  );

  /**
   * Read intermediate results from the active interaction.
   */
  readonly intermediateResults$: Observable<AsrIntermediateResult> = this.interaction$.pipe(
    switchMap(i => i.intermediateResults$),
    filter((r): r is AsrIntermediateResult => isIntermediateAsrResult(r)),
  );

  /**
   * Read final results from the active interaction.
   */
  readonly asrResult$: Observable<AsrResult> = this.interaction$.pipe(
    switchMap(i => i.result$),
    filter((r): r is AsrResult => !isIntermediateAsrResult(r)),
  );

  constructor(readonly speech: NovoSpeechController) { }

  async toggleRecording(interactionSet: NovoSpeechGrammarInteraction) {
    try {
      if (!interactionSet.isActivated()) {
        await interactionSet.activate();
      }
      interactionSet.isRecording() ? interactionSet.stop() : interactionSet.record();
    } catch (err) {
      console.error(`Can't start recording`, err?.message || err);
    }
  }
}
```
Take a look at the `novo-speech-example-app` directory for a full working example.

> **Note:** you have to copy the `environment.copy.ts` file before you can start the demo application.
## Development

### Developing the library for local use inside a different repository

First, run `npm install` in the root of this repository. Then run from this directory:

```shell
npm run build
npm test
cd dist
yarn link
```

Now, in your other repository, run:

```shell
yarn link "@novo-learning/speech"
```

and your local copy of this repo will be used instead of the published version.