◆ krispAudioVadCloseSession()
Closes the given VAD session and releases all data tied to it.
- Parameters
-
[in,out] | pSession | Handle to the VAD session to be closed |
- Return values
-
0 | Success. A negative value is returned on error. |
◆ krispAudioVadCreateSession()
Creates a Voice Activity Detection (VAD) session object.
- Parameters
-
[in] | inputSampleRate | Sampling frequency of the input data. |
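[in] | frameDuration | Frame duration. The frame-size formula used by the frame functions (frameDuration * inputSampleRate / 1000) implies a value in milliseconds. |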
[in] | modelName | The session is tied to this model and processes all subsequent frames with it. If modelName is nullptr, the SDK auto-detects the model based on inputSampleRate. |
- Attention
- Always provide modelName explicitly to avoid ambiguity
- Returns
- Handle to the created session.
◆ krispAudioVadFrameFloat()
Processes the given frame and returns the VAD detection value. Input samples are floats normalized to the range [-1, 1].
- Parameters
-
[in] | pSession | The VAD Session to which the frame belongs |
[in] | pFrameIn | Pointer to the input frame: a contiguous buffer with an overall size of frameDuration * inputSampleRate / 1000 |
[in] | frameInSize | Size of the input buffer, which must be frameDuration * inputSampleRate / 1000 |
- Returns
- Value in the range [0, 1]. The scale is calibrated so that 0.5 corresponds to the best F1 score on our test dataset (based on speech examples from the TIMIT core test set). Adjust the threshold to fit your particular use case.
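The frame-size formula and the threshold decision described above can be sketched in a few lines of C. This is only an illustration: vad_frame_size and vad_is_speech are hypothetical helpers, not SDK functions, and the sketch assumes that inputSampleRate is in Hz, frameDuration is in milliseconds, and the resulting size is counted in samples.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the SDK): size of one frame according to
   frameDuration * inputSampleRate / 1000.
   Assumes the rate is in Hz and the duration in milliseconds, so the result
   is a number of samples. */
size_t vad_frame_size(unsigned sample_rate_hz, unsigned frame_duration_ms)
{
    return (size_t)frame_duration_ms * sample_rate_hz / 1000u;
}

/* Hypothetical helper (not part of the SDK): map a VAD score in [0, 1] to a
   speech / non-speech decision. 0.5 is the documented calibration point;
   the threshold is a parameter because it should be tuned per use case. */
int vad_is_speech(float score, float threshold)
{
    return score >= threshold;
}
```

Under these assumptions, a 10 ms frame at 16000 Hz holds 160 samples. Raising the threshold above 0.5 trades missed speech for fewer false detections, and lowering it does the opposite.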
◆ krispAudioVadFrameInt16()
Processes the given frame and returns the VAD detection value. Input samples are shorts (int16) with values in the range [-2^15, 2^15-1].
- Parameters
-
[in] | pSession | The VAD Session to which the frame belongs |
[in] | pFrameIn | Pointer to the input frame: a contiguous buffer with an overall size of frameDuration * inputSampleRate / 1000 |
[in] | frameInSize | Size of the input buffer, which must be frameDuration * inputSampleRate / 1000 |
- Returns
- Value in the range [0, 1]. The scale is calibrated so that 0.5 corresponds to the best F1 score on our test dataset (based on speech examples from the TIMIT core test set). Adjust the threshold to fit your particular use case.
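When audio arrives as int16 PCM but the float variant is preferred, the samples can be rescaled first. The sketch below uses the common convention of dividing by 2^15. That convention is an assumption here, since this page does not say how the SDK relates the two formats, and pcm16_to_float is a hypothetical helper, not an SDK function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the SDK): convert n int16 PCM samples to
   floats by dividing by 32768 (2^15). The result lies in [-1, 1), which is
   within the normalized range expected by the float variant. */
void pcm16_to_float(const int16_t *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = (float)in[i] / 32768.0f;
    }
}
```

The output buffer must hold at least n floats. If the data is already int16, passing it directly to krispAudioVadFrameInt16() avoids the extra copy.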