

@matejmarinko-soniox (Contributor)

This PR addresses issues preventing the Soniox STT plugin from working correctly with LiveKit's turn_detection_mode="stt" and adds missing usage metrics.

Fixes #4034.

End of speech event

Previously, the plugin received <end> tokens from Soniox but did not forward them to the LiveKit agent. As a result, the agent hung or had to rely on timeouts to close turns. The plugin now emits SpeechEventType.END_OF_SPEECH whenever it receives an <end> token from Soniox.
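Here is a minimal sketch of the token handling described above. It assumes a Soniox response carries a list of token dicts with "text" and "is_final" keys, and that the endpoint marker arrives as a token whose text is exactly "<end>". SpeechEventType and SpeechEvent are local stand-ins, not imports from livekit.agents. Flushing buffered final text before signalling END_OF_SPEECH is an illustrative choice and is not necessarily what the merged code does.

```python
from dataclasses import dataclass
from enum import Enum, auto


class SpeechEventType(Enum):
    # Local stand-in for LiveKit's SpeechEventType (only the members used here).
    FINAL_TRANSCRIPT = auto()
    END_OF_SPEECH = auto()


@dataclass
class SpeechEvent:
    type: SpeechEventType
    text: str = ""


END_TOKEN = "<end>"  # assumed text of Soniox's endpoint token


def process_tokens(tokens, final_buffer):
    """Convert one response's tokens into speech events.

    Final tokens are appended to final_buffer. An <end> token flushes the
    buffer as a FINAL_TRANSCRIPT (if non-empty) and then emits END_OF_SPEECH.
    Non-final tokens are ignored in this sketch.
    """
    events = []
    for token in tokens:
        text = token.get("text", "")
        if text == END_TOKEN:
            if final_buffer:
                events.append(
                    SpeechEvent(SpeechEventType.FINAL_TRANSCRIPT, "".join(final_buffer))
                )
                final_buffer.clear()
            events.append(SpeechEvent(SpeechEventType.END_OF_SPEECH))
        elif token.get("is_final"):
            final_buffer.append(text)
    return events
```

Emitting END_OF_SPEECH only after the final text has been flushed lets an STT-driven turn detector commit the turn with the complete transcript already delivered.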

Usage metrics

The plugin now emits RecognitionUsage events. It calculates and reports audio duration from the total_audio_proc_ms field of the Soniox response.
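If total_audio_proc_ms is a running total, one way to report usage is to emit only the increase since the previous report. The sketch below assumes that reading of the field. RecognitionUsage and UsageTracker are hypothetical local classes, not the plugin's or LiveKit's actual types.

```python
from dataclasses import dataclass


@dataclass
class RecognitionUsage:
    # Hypothetical stand-in for a usage payload; real types may carry more fields.
    audio_duration: float  # seconds of audio processed since the last report


class UsageTracker:
    """Turn a cumulative total_audio_proc_ms value into per-report durations."""

    def __init__(self):
        self._last_ms = 0

    def update(self, response):
        total_ms = response.get("total_audio_proc_ms")
        # Skip responses without the field or without new audio processed.
        if total_ms is None or total_ms <= self._last_ms:
            return None
        delta_ms = total_ms - self._last_ms
        self._last_ms = total_ms
        return RecognitionUsage(audio_duration=delta_ms / 1000.0)
```

Reporting deltas avoids double-counting audio when usage events are summed downstream.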

Other changes

  • Updated the default model from stt-rt-preview to stt-rt-v3. The old name is already an alias for stt-rt-v3.
  • VAD refactor (breaking change): Removed the vad parameter from the constructor, along with the internal VAD task and the manual finalization triggered by VAD events. The plugin now relies entirely on Soniox's native server-side endpoint detection. This simplifies the architecture and aligns behavior with other STT plugins (e.g., Deepgram, AssemblyAI).
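With client-side VAD removed, turn boundaries depend on the endpoint detection requested in the session config. The sketch below builds such a config for the new default model. The field names (enable_endpoint_detection, audio_format, language_hints, and the rest) reflect our understanding of Soniox's real-time WebSocket API and should be checked against its docs. build_soniox_config is a hypothetical helper, not part of the plugin.

```python
import json

DEFAULT_MODEL = "stt-rt-v3"


def build_soniox_config(api_key, sample_rate=16000, num_channels=1,
                        model=DEFAULT_MODEL, language_hints=None):
    """Serialize a hypothetical Soniox real-time session config to JSON."""
    config = {
        "api_key": api_key,
        "model": model,
        "audio_format": "pcm_s16le",  # 16-bit little-endian PCM (assumed)
        "sample_rate": sample_rate,
        "num_channels": num_channels,
        # Ask the server to detect endpoints and emit <end> tokens,
        # replacing the removed client-side VAD finalization.
        "enable_endpoint_detection": True,
    }
    if language_hints:
        config["language_hints"] = list(language_hints)
    return json.dumps(config)
```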

@chenghao-mou self-assigned this on Dec 19, 2025.

@chenghao-mou (Member) left a comment


Thanks, LGTM. Tested locally with a key. This will likely be included in the release after the holidays.

@chenghao-mou merged commit cb9f182 into livekit:main on Dec 23, 2025.
8 of 9 checks passed


Linked issue: Bug: Soniox STT Plugin Missing END_OF_SPEECH Events and RecognitionUsage Metrics
