Krisp Audio SDK v6.0.0
krisp-audio-sdk-vad.hpp File Reference


Functions

KRISP_AUDIO_API KrispAudioSessionID krispAudioVadCreateSession (KrispAudioSamplingRate inputSampleRate, KrispAudioFrameDuration frameDuration, const char *modelName)
 Creates a Voice Activity Detection (VAD) session object.
 
KRISP_AUDIO_API int krispAudioVadCloseSession (KrispAudioSessionID pSession)
 Closes the given VAD session and releases all data tied to it.
 
KRISP_AUDIO_API float krispAudioVadFrameInt16 (KrispAudioSessionID pSession, const short *pFrameIn, unsigned int frameInSize)
 Processes the given frame and returns the VAD detection value. Works with 16-bit integer samples (short, int16) in the range [-2^15, 2^15-1].
 
KRISP_AUDIO_API float krispAudioVadFrameFloat (KrispAudioSessionID pSession, const float *pFrameIn, unsigned int frameInSize)
 Processes the given frame and returns the VAD detection value. Works with float samples normalized to the range [-1,1].
 

Function Documentation

◆ krispAudioVadCloseSession()

KRISP_AUDIO_API int krispAudioVadCloseSession ( KrispAudioSessionID  pSession)

Closes the given VAD session and releases all data tied to it.

Parameters
[in,out] pSession  Handle to the VAD session to be closed.
Return values
0 on success, a negative value on error.
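
Because every created session has to be closed exactly once, a small RAII guard can help. The following is a minimal sketch, not part of the SDK: the `ScopedSession` name is hypothetical, and the handle type and close function are passed in as parameters so the snippet compiles without the SDK header.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical RAII guard (not part of the Krisp SDK).
// Handle is the session handle type (for example KrispAudioSessionID);
// closer is the function that releases it (for example krispAudioVadCloseSession).
template <typename Handle>
class ScopedSession {
public:
    ScopedSession(Handle handle, std::function<int(Handle)> closer)
        : handle_(handle), closer_(std::move(closer)) {
        assert(closer_ && "closer must be callable");
    }

    // Non-copyable, so the session is closed exactly once.
    ScopedSession(const ScopedSession&) = delete;
    ScopedSession& operator=(const ScopedSession&) = delete;

    ~ScopedSession() {
        if (closer_) {
            closer_(handle_);  // the return code cannot be reported from a destructor
        }
    }

    // Access the raw handle to pass it to the frame-processing functions.
    Handle get() const { return handle_; }

private:
    Handle handle_;
    std::function<int(Handle)> closer_;
};
```

With the SDK header included, this could be used as `ScopedSession<KrispAudioSessionID> vad(krispAudioVadCreateSession(rate, duration, modelName), krispAudioVadCloseSession);`, where `rate`, `duration` and `modelName` are placeholders for the caller's own values. Code that needs the error code should call krispAudioVadCloseSession directly instead.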

◆ krispAudioVadCreateSession()

KRISP_AUDIO_API KrispAudioSessionID krispAudioVadCreateSession ( KrispAudioSamplingRate  inputSampleRate,
KrispAudioFrameDuration  frameDuration,
const char *  modelName 
)

Creates a Voice Activity Detection (VAD) session object.

Parameters
[in] inputSampleRate  Sampling frequency of the input data.
[in] frameDuration  Frame duration.
[in] modelName  Name of the model the session is tied to; all future frames of this session are processed with it. If modelName is nullptr, the SDK auto-detects the model based on inputSampleRate.
Attention
Always provide modelName explicitly to avoid ambiguity.
Returns
Handle of the created session.

◆ krispAudioVadFrameFloat()

KRISP_AUDIO_API float krispAudioVadFrameFloat ( KrispAudioSessionID  pSession,
const float *  pFrameIn,
unsigned int  frameInSize 
)

Processes the given frame and returns the VAD detection value. Works with float samples normalized to the range [-1,1].

Parameters
[in] pSession  The VAD session to which the frame belongs.
[in] pFrameIn  Pointer to the input frame: a contiguous buffer holding frameDuration * inputSampleRate / 1000 samples.
[in] frameInSize  Size of the buffer, which must equal frameDuration * inputSampleRate / 1000.
Returns
Value in the range [0,1]. The scale is adjusted so that 0.5 corresponds to the best F1 score on our test dataset (based on speech examples from the TIMIT core test dataset). The threshold should be tuned to fit the particular use case.
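
The frame size formula above can be sketched as a small helper, together with a loop that feeds a longer buffer to a per-frame scoring function. This is an illustration under assumptions: `frameSizeInSamples` and `scoreFrames` are hypothetical names, the helper takes plain integers (milliseconds and Hz) rather than the SDK's KrispAudioFrameDuration and KrispAudioSamplingRate types, and the scoring function is passed in as a callback so the snippet compiles without the SDK.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Samples per frame: frameDuration (ms) * inputSampleRate (Hz) / 1000.
// Plain integers are an assumption made for this sketch; mapping them to
// the SDK's enum types is left to the caller.
inline unsigned int frameSizeInSamples(unsigned int sampleRateHz,
                                       unsigned int frameDurationMs) {
    assert(sampleRateHz > 0 && frameDurationMs > 0);
    return frameDurationMs * sampleRateHz / 1000;
}

// Hypothetical helper: calls scoreFrame(pointer, frameSize) on every complete
// frame in `samples` and collects the returned scores. Trailing samples that
// do not fill a whole frame are skipped.
template <typename Sample, typename ScoreFn>
std::vector<float> scoreFrames(const std::vector<Sample>& samples,
                               unsigned int frameSize,
                               ScoreFn scoreFrame) {
    std::vector<float> scores;
    if (frameSize == 0) {
        return scores;
    }
    for (std::size_t offset = 0; offset + frameSize <= samples.size();
         offset += frameSize) {
        scores.push_back(scoreFrame(samples.data() + offset, frameSize));
    }
    return scores;
}
```

With an open session, the callback could be a lambda that forwards to krispAudioVadFrameFloat or krispAudioVadFrameInt16 with the session handle captured. The sample rates and durations used with the helper are illustrative; which combinations are supported depends on the SDK.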

◆ krispAudioVadFrameInt16()

KRISP_AUDIO_API float krispAudioVadFrameInt16 ( KrispAudioSessionID  pSession,
const short *  pFrameIn,
unsigned int  frameInSize 
)

Processes the given frame and returns the VAD detection value. Works with 16-bit integer samples (short, int16) in the range [-2^15, 2^15-1].

Parameters
[in] pSession  The VAD session to which the frame belongs.
[in] pFrameIn  Pointer to the input frame: a contiguous buffer holding frameDuration * inputSampleRate / 1000 samples.
[in] frameInSize  Size of the buffer, which must equal frameDuration * inputSampleRate / 1000.
Returns
Value in the range [0,1]. The scale is adjusted so that 0.5 corresponds to the best F1 score on our test dataset (based on speech examples from the TIMIT core test dataset). The threshold should be tuned to fit the particular use case.
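
Since the threshold is left to the application, one common way to turn per-frame scores into a speech/non-speech decision is a threshold with a short hangover, so that brief dips in the score do not cut off speech. The sketch below is one possible post-processing step, not something the SDK provides: the `SpeechGate` class, its parameters, and the hangover technique itself are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical post-processing of VAD scores (not part of the Krisp SDK).
// A frame counts as speech when its score reaches the threshold; after that,
// speech stays active for `hangoverFrames` further frames below the threshold.
class SpeechGate {
public:
    SpeechGate(float threshold, unsigned int hangoverFrames)
        : threshold_(threshold), hangoverFrames_(hangoverFrames) {
        assert(threshold >= 0.0f && threshold <= 1.0f);
    }

    // Feed one VAD score in [0,1]; returns true while speech is considered active.
    bool update(float score) {
        if (score >= threshold_) {
            remaining_ = hangoverFrames_;  // re-arm the hangover counter
            return true;
        }
        if (remaining_ > 0) {
            --remaining_;  // still inside the hangover window
            return true;
        }
        return false;
    }

private:
    float threshold_;
    unsigned int hangoverFrames_;
    unsigned int remaining_ = 0;
};
```

For example, `SpeechGate gate(0.5f, 2);` starts from the documented 0.5 operating point and keeps speech active for two frames after the score drops. Both values are starting points to tune, not recommendations from the SDK.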