@sam-s10s
Contributor

What's Changed?

  • Migrated from Speechmatics Real-Time SDK to new Voice SDK
  • Improved performance of the STT engine
  • Improved handling of speaker start / end events
  • Better handling of background speakers when in focus mode

Deprecation Warning

  • The end_of_utterance_mode option has been replaced by turn_detection_mode to reduce ambiguity
  • Use TurnDetectionMode.EXTERNAL (default) for any VAD within LiveKit
  • Use TurnDetectionMode.ADAPTIVE or TurnDetectionMode.SMART_TURN to use the plugin VAD / turn detection
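
For anyone migrating, here is a minimal sketch of how a deprecation shim for the old argument could behave. The enum member names come from the list above; the string values, the `resolve_turn_detection_mode` helper, and the choice to ignore the old value rather than map it are assumptions for illustration, not the plugin's actual code.

```python
import warnings
from enum import Enum


class TurnDetectionMode(str, Enum):
    # Member names are from the PR description; the string values are assumed.
    EXTERNAL = "external"
    ADAPTIVE = "adaptive"
    SMART_TURN = "smart_turn"


def resolve_turn_detection_mode(turn_detection_mode=None, end_of_utterance_mode=None):
    """Hypothetical helper: warn about the old argument, return the new one.

    Assumption: the old value is not translated; when only the deprecated
    argument is passed, the default (EXTERNAL) is used.
    """
    if end_of_utterance_mode is not None:
        warnings.warn(
            "end_of_utterance_mode is deprecated; use turn_detection_mode",
            DeprecationWarning,
            stacklevel=2,
        )
    if turn_detection_mode is None:
        return TurnDetectionMode.EXTERNAL
    return turn_detection_mode


# New-style call: the plugin's own turn detection is selected explicitly.
mode = resolve_turn_detection_mode(turn_detection_mode=TurnDetectionMode.ADAPTIVE)
```

With this shape, existing callers keep working but see a `DeprecationWarning`, and new callers only ever touch `turn_detection_mode`.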

Contributor

@longcw left a comment

LGTM! Could you help resolve the conflicts with the main branch? Also, a few nits:

prefer_current_speaker=prefer_current_speaker,
focus_speakers=focus_speakers if is_given(focus_speakers) else [],
ignore_speakers=ignore_speakers if is_given(ignore_speakers) else [],
language=_set(language),
Contributor

Suggested change
- language=_set(language),
+ language=language,

transcription_config.language = language

# Prepare the config
self._config = self._prepare_config(language)
Contributor

stt.stream() will overwrite self._config. What happens if multiple streams are created from a single stt instance?
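
One way to avoid the shared state, as a rough sketch: build the config inside `stream()` and hand it to the stream, rather than assigning it to the instance. The `Config`, `Stream`, and `STT` classes below are simplified hypothetical stand-ins, not the plugin's real types.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Config:
    # Hypothetical stand-in for the prepared transcription config.
    language: str = "en"


class Stream:
    def __init__(self, config: Config) -> None:
        # Each stream owns its config; nothing is written back to the STT instance.
        self.config = config


class STT:
    def __init__(self, language: str = "en") -> None:
        self._base_config = Config(language=language)

    def stream(self, language: Optional[str] = None) -> Stream:
        # Derive a per-call config instead of overwriting self._config.
        config = replace(self._base_config, language=language) if language else self._base_config
        return Stream(config)


stt = STT(language="en")
first = stt.stream()
second = stt.stream(language="de")
```

Here `first` still sees `en` after `second` is created with `de`, because neither call mutates the `STT` instance.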

group: List of SpeechFragment objects.
logger.debug(f"{event} -> {message}")

async def _handle_partial_segment(self, message: dict[str, Any]) -> None:
Contributor

It seems all of these handlers, including _send_frames, are synchronous, so there is no need to declare them async.
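
To illustrate the point with a small self-contained sketch: a handler that never awaits can be a plain method and still be called from an async receive loop. The `Handler` class, the queue-based loop, and the `"text"` message key are hypothetical; only the `_handle_partial_segment` name and signature come from the diff.

```python
import asyncio
from typing import Any


class Handler:
    def __init__(self) -> None:
        self.partials: list[str] = []

    # Plain method: it never awaits anything, so it does not need to be a coroutine.
    def _handle_partial_segment(self, message: dict[str, Any]) -> None:
        self.partials.append(message["text"])  # "text" is an assumed key

    async def run(self, queue: asyncio.Queue) -> None:
        # Only the loop that waits on I/O needs to be async.
        while True:
            message = await queue.get()
            if message is None:  # sentinel used here to stop the loop
                break
            self._handle_partial_segment(message)  # direct call, no await


async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    handler = Handler()
    for item in ({"text": "hel"}, {"text": "hello"}, None):
        queue.put_nowait(item)
    await handler.run(queue)
    return handler.partials


result = asyncio.run(main())
```

Dropping `async` also removes the risk of a forgotten `await` silently skipping the handler body.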

"""Endpoint and turn detection handling mode.
How the STT engine handles the endpointing of speech. If using Pipecat's built-in endpointing,
then use `TurnDetectionMode.EXTERNAL`.
Member

nit: Pipecat -> LiveKit

language=_set(language),
output_locale=_set(output_locale),
domain=_set(domain),
turn_detection_mode=_set(turn_detection_mode),
Member

nit: This and focus_mode do not allow None values.
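
A sketch of the distinction, under the assumption that `_set` (whose definition is not shown in this thread) turns a "not given" sentinel into None: that works for optional fields such as language, but fields that must always hold a value would need a default instead. All names below other than `TurnDetectionMode` and its members are hypothetical.

```python
from enum import Enum
from typing import Any


class _NotGiven:
    """Sentinel type meaning "the caller did not pass this argument"."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


class TurnDetectionMode(str, Enum):
    # Member names are from the PR; the string values are assumed.
    EXTERNAL = "external"
    ADAPTIVE = "adaptive"
    SMART_TURN = "smart_turn"


def optional_or_none(value: Any) -> Any:
    # Suitable for fields where None is a legal "unset" value.
    return None if isinstance(value, _NotGiven) else value


def required_or_default(value: Any, default: Any) -> Any:
    # Suitable for fields that must never be None.
    if isinstance(value, _NotGiven):
        return default
    if value is None:
        raise ValueError("None is not a valid value for this option")
    return value


mode = required_or_default(NOT_GIVEN, TurnDetectionMode.EXTERNAL)
```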

  @property
  def model(self) -> str:
-     return "unknown"
+     return str(self._stt_options.turn_detection_mode) if self._stt_options else "UNKNOWN"
Member

This is usually reserved for STT model names like Whisper, so we really don't have to change it here.
