# Text-to-speech capabilities

## ♻️ Current situation & Problem

Currently, SpeziChat provides speech-to-text capabilities but no integration with the text-to-speech capabilities of SpeziSpeech.

## ⚙️ Release Notes

- Add text-to-speech capabilities to the `ChatView` via the `.speak()` and `.speechToolbarButton()` view modifiers.

## 📚 Documentation

Added inline docs.

## ✅ Testing

Adjusted UI tests.

## 📝 Code of Conduct & Contributing Guidelines

By submitting this pull request, you agree to follow our [Code of Conduct](https://github.com/StanfordSpezi/.github/blob/main/CODE_OF_CONDUCT.md) and [Contributing Guidelines](https://github.com/StanfordSpezi/.github/blob/main/CONTRIBUTING.md):

- [x] I agree to follow the [Code of Conduct](https://github.com/StanfordSpezi/.github/blob/main/CODE_OF_CONDUCT.md) and [Contributing Guidelines](https://github.com/StanfordSpezi/.github/blob/main/CONTRIBUTING.md).
1 parent 8233230 · commit ad4c88a
Showing 14 changed files with 409 additions and 47 deletions.
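
For quick orientation before the diff: a minimal sketch of how the two new modifiers are intended to be combined, based on the inline documentation added in this commit (the view name, the seeded message, and the `import SpeziChat` line are illustrative assumptions, not part of the diff):

```swift
import SpeziChat
import SwiftUI

// Illustrative view name; assumes the `ChatView`, `Chat`, and `ChatEntity` types from SpeziChat
// together with the `.speak(_:muted:)` and `.speechToolbarButton(muted:)` modifiers added here.
struct SpeechChatExampleView: View {
    @State private var chat: Chat = [
        ChatEntity(role: .assistant, content: "**Assistant** Message!")
    ]
    @State private var muted = true

    var body: some View {
        // The NavigationStack is required; otherwise the toolbar button is not shown.
        NavigationStack {
            ChatView($chat)
                .speak(chat, muted: muted)             // speaks newly completed assistant messages
                .speechToolbarButton(muted: $muted)    // toolbar button that toggles the `muted` binding
        }
    }
}
```

Muting via the toolbar button stops any ongoing speech output, and moving the app to the background does the same, as implemented in `ChatViewSpeechModifier` below.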
@@ -0,0 +1,107 @@
//
// This source file is part of the Stanford Spezi open source project
//
// SPDX-FileCopyrightText: 2023 Stanford University and the project authors (see CONTRIBUTORS.md)
//
// SPDX-License-Identifier: MIT
//

import SwiftUI

/// The underlying `ViewModifier` of `View/speechToolbarButton(muted:)`.
private struct ChatViewSpeechButtonModifier: ViewModifier {
    @Binding var muted: Bool


    func body(content: Content) -> some View {
        content
            .toolbar {
                ToolbarItem(placement: .primaryAction) {
                    Button(action: {
                        muted.toggle()
                    }) {
                        if !muted {
                            Image(systemName: "speaker")
                                .accessibilityIdentifier("Speaker")
                                .accessibilityLabel(Text("Text to speech is enabled, press to disable text to speech.", bundle: .module))
                        } else {
                            Image(systemName: "speaker.slash")
                                .accessibilityIdentifier("Speaker strikethrough")
                                .accessibilityLabel(Text("Text to speech is disabled, press to enable text to speech.", bundle: .module))
                        }
                    }
                }
            }
    }
}

extension View {
    /// Adds a toolbar `Button` to mute or unmute text-to-speech capabilities.
    ///
    /// When attaching the ``speechToolbarButton(muted:)`` modifier to a `View` that resides within a SwiftUI `NavigationStack`,
    /// a `Button` is added to the toolbar that enables text-to-speech capabilities.
    /// The outside `View` can observe taps on that `Button` by passing a SwiftUI `Binding` as the `muted` parameter, directly tracking the state of the `Button` while also being able to modify it from the outside.
    ///
    /// - Warning: Ensure that the ``ChatView`` resides within a SwiftUI `NavigationStack`, otherwise the added toolbar `Button` won't be shown.
    ///
    /// ### Usage
    ///
    /// The code snippet below demonstrates a minimal example of adding a text-to-speech toolbar button that mutes or unmutes text-to-speech output generation.
    ///
    /// ```swift
    /// struct ChatTestView: View {
    ///     @State private var chat: Chat = [
    ///         ChatEntity(role: .assistant, content: "**Assistant** Message!")
    ///     ]
    ///     @State private var muted = true
    ///
    ///     var body: some View {
    ///         ChatView($chat)
    ///             .speak(chat, muted: muted)
    ///             .speechToolbarButton(muted: $muted)
    ///             .task {
    ///                 // Add new, completed `assistant` content to the `Chat` that is output via speech.
    ///                 // ...
    ///             }
    ///     }
    /// }
    /// ```
    ///
    /// - Parameters:
    ///   - muted: A SwiftUI `Binding` that indicates if the speech output is currently muted. The `Binding` enables the adjustment of the muted state by both the caller and the toolbar `Button`.
    public func speechToolbarButton(
        muted: Binding<Bool>
    ) -> some View {
        modifier(
            ChatViewSpeechButtonModifier(
                muted: muted
            )
        )
    }
}


#if DEBUG
#Preview {
    @State var chat: Chat = .init(
        [
            ChatEntity(role: .system, content: "System Message!"),
            ChatEntity(role: .system, content: "System Message (hidden)!"),
            ChatEntity(role: .user, content: "User Message!"),
            ChatEntity(role: .assistant, content: "Assistant Message!"),
            ChatEntity(role: .function(name: "test_function"), content: "Function Message!")
        ]
    )
    @State var muted = true


    return NavigationStack {
        ChatView($chat)
            .speak(chat, muted: muted)
            .speechToolbarButton(muted: $muted)
    }
}
#endif
@@ -0,0 +1,154 @@
//
// This source file is part of the Stanford Spezi open source project
//
// SPDX-FileCopyrightText: 2023 Stanford University and the project authors (see CONTRIBUTORS.md)
//
// SPDX-License-Identifier: MIT
//

import SpeziSpeechSynthesizer
import SwiftUI

/// The underlying `ViewModifier` of `View/speak(_:muted:)`.
private struct ChatViewSpeechModifier: ViewModifier {
    let chat: Chat
    let muted: Bool

    @Environment(\.scenePhase) private var scenePhase
    @State private var speechSynthesizer = SpeechSynthesizer()


    func body(content: Content) -> some View {
        content
            // Output speech when a new, complete assistant message is the last message.
            // Cancel speech output as soon as a new message with the user role arrives.
            .onChange(of: chat, initial: true) { _, _ in
                guard !muted,
                      let lastChatEntity = chat.last,
                      lastChatEntity.complete else {
                    return
                }

                if lastChatEntity.role == .assistant {
                    speechSynthesizer.speak(lastChatEntity.content)
                } else if lastChatEntity.role == .user {
                    speechSynthesizer.stop()
                }
            }
            // Cancel speech output when the mute button is tapped in the toolbar.
            .onChange(of: muted) { _, newValue in
                if newValue {
                    speechSynthesizer.stop()
                }
            }
            // Cancel speech output when the view disappears or moves to the background.
            .onChange(of: scenePhase) { _, newValue in
                switch newValue {
                case .background, .inactive: speechSynthesizer.stop()
                default: break
                }
            }
    }
}

extension View {
    /// Provides text-to-speech capabilities to the ``ChatView``.
    ///
    /// Attaching the modifier to a ``ChatView`` enables automatic speech output of the latest added ``ChatEntity/Role-swift.enum/assistant`` ``Chat`` message that is ``ChatEntity/complete``.
    /// The text-to-speech capability can be muted via a `Bool` flag in the ``speak(_:muted:)`` modifier.
    ///
    /// Note that only the latest ``ChatEntity/Role-swift.enum/assistant`` ``Chat`` message that is ``ChatEntity/complete`` is synthesized to natural language speech, as soon as it is persisted in the ``Chat``.
    /// The speech output is stopped immediately as soon as a ``ChatEntity/complete`` ``ChatEntity/Role-swift.enum/user`` message is added to the ``Chat``,
    /// the passed `muted` flag turns `true`, or the `View` becomes inactive or is moved to the background.
    ///
    /// ### Usage
    ///
    /// The code snippet below demonstrates a minimal example of text-to-speech capabilities. At first, the speech output is muted; only after ten seconds is the speech output of newly incoming ``Chat`` messages synthesized.
    ///
    /// ```swift
    /// struct ChatTestView: View {
    ///     @State private var chat: Chat = [
    ///         ChatEntity(role: .assistant, content: "**Assistant** Message!")
    ///     ]
    ///     @State private var muted = true
    ///
    ///     var body: some View {
    ///         ChatView($chat)
    ///             .speak(chat, muted: muted)
    ///             .task {
    ///                 try? await Task.sleep(for: .seconds(10))
    ///                 muted = false
    ///
    ///                 // Add new, completed `assistant` content to the `Chat` that is output via speech.
    ///                 // ...
    ///             }
    ///     }
    /// }
    /// ```
    ///
    /// - Parameters:
    ///   - chat: The ``Chat`` which should be used for generating the speech output.
    ///   - muted: Indicates if the speech output is currently muted, defaults to `false`.
    public func speak(
        _ chat: Chat,
        muted: Bool = false
    ) -> some View {
        modifier(
            ChatViewSpeechModifier(
                chat: chat,
                muted: muted
            )
        )
    }
}


#if DEBUG
#Preview("ChatView") {
    @State var chat: Chat = .init(
        [
            ChatEntity(role: .system, content: "System Message!"),
            ChatEntity(role: .system, content: "System Message (hidden)!"),
            ChatEntity(role: .user, content: "User Message!"),
            ChatEntity(role: .assistant, content: "Assistant Message!"),
            ChatEntity(role: .function(name: "test_function"), content: "Function Message!")
        ]
    )

    return NavigationStack {
        ChatView($chat)
    }
}

#Preview("ChatViewSpeechOutput") {
    @State var chat: Chat = .init(
        [
            ChatEntity(role: .assistant, content: "Assistant Message!")
        ]
    )
    @State var muted = false


    return NavigationStack {
        ChatView($chat)
            .speak(chat, muted: muted)
    }
}

#Preview("ChatViewSpeechOutputDisabled") {
    @State var chat: Chat = .init(
        [
            ChatEntity(role: .assistant, content: "Assistant Message!")
        ]
    )
    @State var muted = true


    return NavigationStack {
        ChatView($chat)
            .speak(chat, muted: muted)
    }
}
#endif