@princeaden1 princeaden1 commented Sep 23, 2025

feat: text-to-speech for AI responses

Add text-to-speech functionality to AI chat responses with e2e testing.

Changes

  • TTS Button: Play/stop button for AI messages with state management
  • Voice Support: Uses browser's Web Speech API with voice selection
  • UI States: Visual feedback with play/stop icons and tooltips
  • E2E Tests: Complete test coverage following existing patterns
audio.mov

Summary by cubic

Adds text-to-speech to assistant messages so users can listen to chat replies. Includes a play/stop control and end-to-end tests.

  • New Features
    • Play/stop button on assistant messages with icons and tooltips
    • Web Speech API integration with voice selection and support check
    • Text cleaning strips markdown, HTML, and Dyad tags before reading
    • E2E test mocks speechSynthesis and verifies state changes
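
A minimal sketch of what such a text-cleaning pass might look like. The function name `cleanTextForSpeech`, the order of the rules, and the assumption that Dyad tags are paired elements whose names start with `dyad-` are illustrative guesses, not taken from the PR's code.

```typescript
// Hypothetical helper: name and rules are assumptions, not the PR's implementation.
function cleanTextForSpeech(raw: string): string {
  return (
    raw
      // Drop fenced code blocks entirely; code is rarely useful when read aloud.
      .replace(/`{3}[\s\S]*?`{3}/g, " ")
      // Drop paired tags named "dyad-*" together with their body (assumed naming).
      .replace(/<(dyad-[\w-]+)\b[^>]*>[\s\S]*?<\/\1>/g, " ")
      // Strip remaining HTML-like tags but keep the text between them.
      .replace(/<\/?[a-zA-Z][^>]*>/g, "")
      // Markdown links and images: keep the label, drop the URL.
      .replace(/!?\[([^\]]*)\]\([^)]*\)/g, "$1")
      // Inline code: keep the content, drop the backticks.
      .replace(/`([^`]*)`/g, "$1")
      // Heading, blockquote, and bullet markers at the start of a line.
      .replace(/^[ \t]{0,3}(#{1,6}|>|[-*+])[ \t]+/gm, "")
      // Bold and italic markers written with asterisks or double underscores.
      .replace(/(\*\*|__|\*)(.+?)\1/g, "$2")
      // Collapse runs of whitespace so the utterance has no long gaps.
      .replace(/\s+/g, " ")
      .trim()
  );
}
```

Regex-based cleaning is only an approximation of markdown parsing; text such as `x<y and z>w` would lose its middle part, so a real implementation may prefer a proper parser.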

Comment on lines 181 to 197
const toggle = (
  text?: string,
  options?: { rate?: number; pitch?: number; volume?: number },
) => {
  speechSynthesis.cancel();
  if (isPlaying) {
    if (isPaused) {
      resume();
    } else {
      pause();
    }
  } else {
    if (text) {
      speak(text, options);
    }
  }
};

There's a logical issue in the toggle function where speechSynthesis.cancel() is called unconditionally at the start, which stops any ongoing speech. This prevents the pause/resume functionality from working correctly, as there's no speech left to pause or resume after cancellation.

Consider restructuring the function to only cancel speech when necessary:

const toggle = (
  text?: string,
  options?: { rate?: number; pitch?: number; volume?: number },
) => {
  if (isPlaying) {
    if (isPaused) {
      resume();
    } else {
      pause();
    }
  } else {
    // Only cancel and start new speech when not currently playing
    speechSynthesis.cancel();
    if (text) {
      speak(text, options);
    }
  }
};

This way, pause/resume will work as expected while still ensuring clean state when starting new speech.

Spotted by Diamond

Comment on lines 34 to 37
speak: (utterance: MockSpeechSynthesisUtterance) => {
  isPlaying = true;
  setTimeout(() => utterance.onstart?.(), 10);
},

The mock implementation of speechSynthesis.speak() fires the onstart callback but never fires onend. This simulates only half of the TTS lifecycle, so the test cannot verify that playback completes and that the state resets afterwards.

Consider adding another setTimeout to trigger utterance.onend() after a reasonable delay:

speak: (utterance: MockSpeechSynthesisUtterance) => {
  isPlaying = true;
  setTimeout(() => utterance.onstart?.(), 10);
  setTimeout(() => utterance.onend?.(), 100); // Add this to complete the lifecycle
},

This would allow the test to verify both the start and completion states of the TTS functionality.
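
One possible way to make that lifecycle easier to assert on is to inject the timer function, so a test can run the pending callbacks itself instead of waiting on real time. The sketch below is an assumption about how such a mock could be shaped; `createMockSpeechSynthesis`, `MockUtterance`, and `Scheduler` are hypothetical names, and the real Web Speech API may emit different events on cancel than this mock does.

```typescript
// Minimal utterance shape for the mock; the real SpeechSynthesisUtterance has more fields.
interface MockUtterance {
  text: string;
  onstart?: () => void;
  onend?: () => void;
}

// Same shape as setTimeout, injectable so tests can control time.
type Scheduler = (callback: () => void, delayMs: number) => unknown;

function createMockSpeechSynthesis(
  schedule: Scheduler = (callback, delayMs) => setTimeout(callback, delayMs),
) {
  let generation = 0; // bumped on every speak() and cancel() so stale callbacks are ignored
  const synth = {
    speaking: false,
    speak(utterance: MockUtterance): void {
      const mine = ++generation;
      synth.speaking = true;
      // Fire onstart shortly after speak(), as the original mock does.
      schedule(() => {
        if (mine === generation) utterance.onstart?.();
      }, 10);
      // Fire onend later to complete the lifecycle.
      schedule(() => {
        if (mine !== generation) return;
        synth.speaking = false;
        utterance.onend?.();
      }, 100);
    },
    // In this mock, cancel() simply silences any callbacks still pending.
    cancel(): void {
      generation++;
      synth.speaking = false;
    },
  };
  return synth;
}
```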

Spotted by Diamond

@cubic-dev-ai cubic-dev-ai bot left a comment

6 issues found across 3 files

Prompt for AI agents (all 6 issues)

Understand the root cause of the following 6 issues and fix them.


<file name="e2e-tests/text_to_speech.spec.ts">

<violation number="1" location="e2e-tests/text_to_speech.spec.ts:9">
Mock injection added after page load: addInitScript runs only on new document contexts, so the current page won't be mocked. Move addInitScript before navigation or reload after adding.</violation>
</file>

<file name="src/hooks/useTextToSpeech.ts">

<violation number="1" location="src/hooks/useTextToSpeech.ts:14">
Guard getVoices() to avoid ReferenceError when Web Speech API is unavailable.</violation>

<violation number="2" location="src/hooks/useTextToSpeech.ts:42">
Unconditional cancel in toggle() prevents proper pause/resume; remove the cancel here (speak() already cancels when starting).</violation>

<violation number="3" location="src/hooks/useTextToSpeech.ts:136">
Use ?? so a provided 0 pitch is not overridden.</violation>

<violation number="4" location="src/hooks/useTextToSpeech.ts:137">
Use ?? so volume 0 (mute) is honored.</violation>

<violation number="5" location="src/hooks/useTextToSpeech.ts:200">
Referencing window during render will crash in SSR. Guard with typeof window !== "undefined".</violation>
</file>

await po.importApp("minimal");

// Mock speechSynthesis API
await po.page.addInitScript(() => {
@cubic-dev-ai cubic-dev-ai bot Sep 23, 2025

Mock injection added after page load: addInitScript runs only on new document contexts, so the current page won't be mocked. Move addInitScript before navigation or reload after adding.

✅ Addressed in 298da44

useEffect(() => {
return () => {
if (utteranceRef.current) {
speechSynthesis.cancel();
@cubic-dev-ai cubic-dev-ai bot Sep 23, 2025

Unconditional cancel in toggle() prevents proper pause/resume; remove the cancel here (speak() already cancels when starting).

✅ Addressed in 298da44

};

// Check if TTS is supported
const isSupported = "speechSynthesis" in window;
@cubic-dev-ai cubic-dev-ai bot Sep 23, 2025

Referencing window during render will crash in SSR. Guard with typeof window !== "undefined".

Suggested change
- const isSupported = "speechSynthesis" in window;
+ const isSupported = typeof window !== "undefined" && "speechSynthesis" in window;

✅ Addressed in 298da44
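
The support check and the `getVoices()` guard could share one helper. The sketch below is a variant under stated assumptions: it looks the API up on `globalThis` (which is `window` in a browser) rather than using the `typeof window` check suggested above, and the names `getSpeechSynthesis`, `isTtsSupported`, `loadVoicesSafely`, and `VoiceSource` are hypothetical.

```typescript
// Structural type for the single method used here; avoids depending on DOM typings.
interface VoiceSource {
  getVoices(): unknown[];
}

// Returns the speech synthesis object if the given scope has one, else undefined.
// `scope` defaults to globalThis and is a parameter only to make testing easy.
function getSpeechSynthesis(scope: object = globalThis): VoiceSource | undefined {
  const candidate = (scope as { speechSynthesis?: VoiceSource }).speechSynthesis;
  return candidate && typeof candidate.getVoices === "function" ? candidate : undefined;
}

// Safe during server-side rendering: no bare reference to `window`.
function isTtsSupported(scope: object = globalThis): boolean {
  return getSpeechSynthesis(scope) !== undefined;
}

// Falls back to an empty list instead of throwing when the API is missing.
function loadVoicesSafely(scope: object = globalThis): unknown[] {
  return getSpeechSynthesis(scope)?.getVoices() ?? [];
}
```

Routing every access through one lookup means a missing API shows up as `undefined` or an empty list rather than as a ReferenceError at render time.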

}

utterance.rate = options?.rate || 1;
utterance.pitch = options?.pitch || 1;
@cubic-dev-ai cubic-dev-ai bot Sep 23, 2025

Use ?? so a provided 0 pitch is not overridden.

Suggested change
- utterance.pitch = options?.pitch || 1;
+ utterance.pitch = options?.pitch ?? 1;

// Load available voices
useEffect(() => {
const loadVoices = () => {
const availableVoices = speechSynthesis.getVoices();
@cubic-dev-ai cubic-dev-ai bot Sep 23, 2025

Guard getVoices() to avoid ReferenceError when Web Speech API is unavailable.

Suggested change
- const availableVoices = speechSynthesis.getVoices();
+ const availableVoices = typeof window !== "undefined" && "speechSynthesis" in window ? window.speechSynthesis.getVoices() : [];

✅ Addressed in 298da44


utterance.rate = options?.rate || 1;
utterance.pitch = options?.pitch || 1;
utterance.volume = options?.volume || 1;
@cubic-dev-ai cubic-dev-ai bot Sep 23, 2025

Use ?? so volume 0 (mute) is honored.

Suggested change
- utterance.volume = options?.volume || 1;
+ utterance.volume = options?.volume ?? 1;

✅ Addressed in 298da44
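
The difference between the two operators can be seen in isolation. This is a small illustration, not the hook's code; `SpeechOptions`, `resolveWithOr`, and `resolveWithNullish` are hypothetical names.

```typescript
interface SpeechOptions {
  rate?: number;
  pitch?: number;
  volume?: number;
}

// With ||, every falsy value (including 0) falls back to the default.
function resolveWithOr(options?: SpeechOptions) {
  return { pitch: options?.pitch || 1, volume: options?.volume || 1 };
}

// With ??, only null and undefined fall back, so an explicit 0 is kept.
function resolveWithNullish(options?: SpeechOptions) {
  return { pitch: options?.pitch ?? 1, volume: options?.volume ?? 1 };
}

// A caller asking for muted playback:
console.log(resolveWithOr({ volume: 0 }).volume); // 1, the mute request is lost
console.log(resolveWithNullish({ volume: 0 }).volume); // 0, the mute request is honored
```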

Comment on lines 207 to 208
<CircleStop className="h-4 w-4 text-green-500" />
<span className="sm:inline">Stop</span>

The "Stop" text appears to be inconsistently styled compared to the play state. When playing, the text shows <span className="sm:inline">Stop</span>, but this will only display on small screens and up. For consistency and better UX, consider making the "Stop" text always visible when the button is in the stop state, since it provides important context to users. Either remove the sm:inline class or ensure the parent span doesn't have a hidden class that might be affecting visibility on mobile devices.

Suggested change
- <CircleStop className="h-4 w-4 text-green-500" />
- <span className="sm:inline">Stop</span>
+ <CircleStop className="h-4 w-4 text-green-500" />
+ <span>Stop</span>

Spotted by Diamond

Comment on lines 143 to 146
utterance.onstart = () => {
setIsPlaying(true);
setIsPaused(false);
};

Race condition bug: The isPlaying state is set asynchronously in the onstart callback, but the ChatMessage component checks this state synchronously when deciding whether to start or stop playback. If a user clicks the TTS button rapidly before the onstart callback fires, the component will think speech isn't playing and call toggle() again instead of stop(), causing speech to restart rather than stop. This creates confusing UX where the button appears to be malfunctioning. Fix by either: 1) Setting isPlaying = true immediately in the speak() function before calling speechSynthesis.speak(), or 2) Using speechSynthesis.speaking property to check current speech state more reliably, or 3) Adding a 'starting' state to prevent double-clicks during the async startup period.
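
A rough sketch of the third option, combined with the first: the status changes synchronously on click, and a 'starting' state covers the gap before onstart fires. The `SpeechEngine` interface and `createTtsButtonController` are hypothetical stand-ins for the hook and the browser API, not code from the PR.

```typescript
type PlaybackStatus = "idle" | "starting" | "playing";

// Hypothetical engine interface standing in for the browser's speech synthesis.
interface SpeechEngine {
  speak(text: string, onStart: () => void, onEnd: () => void): void;
  cancel(): void;
}

function createTtsButtonController(engine: SpeechEngine) {
  let status: PlaybackStatus = "idle";
  let runId = 0; // identifies the current utterance so stale callbacks are ignored

  return {
    getStatus: (): PlaybackStatus => status,

    // Called on every click of the play/stop button.
    click(text: string): void {
      if (status !== "idle") {
        // Already speaking or about to speak: treat the click as "stop".
        runId++;
        engine.cancel();
        status = "idle";
        return;
      }
      // Record the pending start synchronously, before onStart can fire.
      const run = ++runId;
      status = "starting";
      engine.speak(
        text,
        () => {
          if (run === runId) status = "playing";
        },
        () => {
          if (run === runId) status = "idle";
        },
      );
    },
  };
}
```

With this shape, a second click that arrives before onstart stops playback instead of restarting it, because the decision no longer depends on state that is only set asynchronously.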

Spotted by Diamond

@wwwillchen
Collaborator

thanks @princeaden1. i pulled it down and it does work, but I'm not super sure about the use case - is it for accessibility or some other reason? in general - i'd recommend opening an issue for new feature requests to discuss the use case and see whether it's aligned with the project's direction. for example, right now it's reading the content in the think tags which can be quite verbose and somewhat confusing for users IMO.

@princeaden1
Collaborator Author

@wwwillchen Thanks for taking the time to review the PR 🙏. This PR is for both accessibility and to let devs listen to responses if they don't want to read through chats (similar to ChatGPT's feature). The think tags can also be removed from the spoken output. I'll also make sure to open an issue first so that future feature requests can be discussed beforehand.
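
As a rough sketch of how the think content could be dropped before speaking, assuming the reasoning is wrapped in literal `<think>...</think>` tags in the message text (the helper name `stripThinkBlocks` is hypothetical):

```typescript
// Assumes reasoning is delimited by <think> and </think>; not taken from the PR.
function stripThinkBlocks(message: string): string {
  return message
    // Remove complete <think>...</think> blocks, case-insensitively.
    .replace(/<think\b[^>]*>[\s\S]*?<\/think>/gi, "")
    // Remove an unterminated block, e.g. while a response is still streaming.
    .replace(/<think\b[^>]*>[\s\S]*$/i, "")
    .trim();
}
```

Running this before any other text cleaning would keep the spoken output limited to the visible answer.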
