# Text-to-speech capabilities (#11)

## ♻️ Current situation & Problem
Currently, SpeziChat provides speech-to-text capabilities but no
integration with the text-to-speech capabilities of SpeziSpeech.


## ⚙️ Release Notes 
- Add text-to-speech capabilities to the `ChatView` via the `.speak()` and `.speechToolbarButton()` view modifiers (see the sketch below)
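
A minimal sketch of how the two new modifiers compose. The view name is illustrative, and it assumes, as the previews further below suggest, that `ChatEntity(role:content:)` produces a complete message:

```swift
import SpeziChat
import SwiftUI

struct TextToSpeechExample: View {
    @State private var chat: Chat = []
    @State private var muted = false

    var body: some View {
        // The NavigationStack is required, otherwise the toolbar button is not rendered.
        NavigationStack {
            ChatView($chat)
                .speak(chat, muted: muted)            // synthesize complete assistant messages
                .speechToolbarButton(muted: $muted)   // toolbar button toggling `muted`
                .task {
                    // Appending a complete assistant message triggers speech output (unless muted).
                    chat.append(ChatEntity(role: .assistant, content: "Hello from SpeziChat!"))
                }
        }
    }
}
```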


## 📚 Documentation
Added inline documentation


## ✅ Testing
Adjusted UI tests
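
For illustration, a hedged sketch of what such a UI test could look like. The test class, test name, and launch setup are hypothetical; the accessibility identifiers `Speaker` and `Speaker strikethrough` are taken from `ChatView+SpeechButton.swift` below:

```swift
import XCTest

final class ChatViewSpeechTests: XCTestCase {
    func testSpeechToolbarButtonToggles() throws {
        let app = XCUIApplication()
        app.launch()

        // While speech output is enabled, the toolbar button exposes the "Speaker" identifier ...
        let speaker = app.buttons["Speaker"]
        XCTAssertTrue(speaker.waitForExistence(timeout: 2))
        speaker.tap()

        // ... after tapping, the muted state is exposed as "Speaker strikethrough".
        XCTAssertTrue(app.buttons["Speaker strikethrough"].waitForExistence(timeout: 2))
    }
}
```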


## 📝 Code of Conduct & Contributing Guidelines 

By creating this pull request, you agree to follow our [Code of Conduct](https://github.com/StanfordSpezi/.github/blob/main/CODE_OF_CONDUCT.md) and [Contributing Guidelines](https://github.com/StanfordSpezi/.github/blob/main/CONTRIBUTING.md):
- [x] I agree to follow the [Code of Conduct](https://github.com/StanfordSpezi/.github/blob/main/CODE_OF_CONDUCT.md) and [Contributing Guidelines](https://github.com/StanfordSpezi/.github/blob/main/CONTRIBUTING.md).
philippzagar authored Feb 22, 2024
1 parent 8233230 commit ad4c88a
Showing 14 changed files with 409 additions and 47 deletions.
3 changes: 2 additions & 1 deletion Package.swift
@@ -27,7 +27,8 @@ let package = Package(
         .target(
             name: "SpeziChat",
             dependencies: [
-                .product(name: "SpeziSpeechRecognizer", package: "SpeziSpeech")
+                .product(name: "SpeziSpeechRecognizer", package: "SpeziSpeech"),
+                .product(name: "SpeziSpeechSynthesizer", package: "SpeziSpeech")
             ]
         ),
         .testTarget(
5 changes: 4 additions & 1 deletion README.md
@@ -55,7 +55,7 @@ These entries are mandatory for apps that utilize microphone and speech recognit
 ## Usage
 
 The underlying data model of [SpeziChat](https://swiftpackageindex.com/stanfordspezi/spezichat/documentation/spezichat) is a [`Chat`](https://swiftpackageindex.com/stanfordspezi/spezichat/documentation/spezichat/chat). It represents the content of a typical text-based chat between user and system(s). A `Chat` is nothing more than an ordered array of [`ChatEntity`](https://swiftpackageindex.com/stanfordspezi/spezichat/documentation/spezichat/chatentity)s which contain the content of the individual messages.
-A `ChatEntity` consists of a [`ChatEntity/Role`](https://swiftpackageindex.com/stanfordspezi/spezichat/documentation/spezichat/chatentity/role-swift.enum), a timestamp as well as an `String`-based content which can contain Markdown-formatted text.
+A `ChatEntity` consists of a [`ChatEntity/Role`](https://swiftpackageindex.com/stanfordspezi/spezichat/documentation/spezichat/chatentity/role-swift.enum), a timestamp, as well as a `String`-based content which can contain Markdown-formatted text. In addition, a flag indicates if the `ChatEntity` is complete and no further content will be added.
 
 > [!NOTE]
 > The [`ChatEntity`](https://swiftpackageindex.com/stanfordspezi/spezichat/documentation/spezichat/chatentity) is able to store Markdown-based content which in turn is rendered as styled text in the `ChatView`, `MessagesView`, and `MessageView`.
@@ -79,6 +79,9 @@ struct ChatTestView: View {
     }
 }
 ```
 
+> [!NOTE]
+> The `ChatView` provides speech-to-text (recognition) as well as text-to-speech (synthesis) accessibility capabilities out-of-the-box via the [`SpeziSpeech`](https://github.com/StanfordSpezi/SpeziSpeech) module, facilitating seamless interaction with the content of the `ChatView`.
+
 ### Messages View
 
 The [`MessagesView`](https://swiftpackageindex.com/stanfordspezi/spezichat/documentation/spezichat/messagesview) displays a `Chat` containing multiple `ChatEntity`s with different `ChatEntity/Role`s in a typical chat-like fashion.
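
For orientation, a minimal sketch of the data model described in the README hunk above, using only initializers that appear in this PR's previews (how the `complete` flag is set is not shown in this diff):

```swift
import SpeziChat

// A `Chat` is an ordered array of `ChatEntity`s.
let chat: Chat = [
    ChatEntity(role: .system, content: "You are a helpful assistant."),
    ChatEntity(role: .user, content: "Hello!"),
    // Markdown in `content` is rendered as styled text in the `ChatView`.
    ChatEntity(role: .assistant, content: "**Hi** there!")
]
// Each entity additionally carries a timestamp and a `complete` flag
// indicating that no further content will be added to the message.
```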
107 changes: 107 additions & 0 deletions Sources/SpeziChat/ChatView+SpeechButton.swift
@@ -0,0 +1,107 @@
//
// This source file is part of the Stanford Spezi open source project
//
// SPDX-FileCopyrightText: 2023 Stanford University and the project authors (see CONTRIBUTORS.md)
//
// SPDX-License-Identifier: MIT
//

import SwiftUI


/// The underlying `ViewModifier` of `View/speechToolbarButton(muted:)`.
private struct ChatViewSpeechButtonModifier: ViewModifier {
    @Binding var muted: Bool


    func body(content: Content) -> some View {
        content
            .toolbar {
                ToolbarItem(placement: .primaryAction) {
                    Button(action: {
                        muted.toggle()
                    }) {
                        if !muted {
                            Image(systemName: "speaker")
                                .accessibilityIdentifier("Speaker")
                                .accessibilityLabel(Text("Text to speech is enabled, press to disable text to speech.", bundle: .module))
                        } else {
                            Image(systemName: "speaker.slash")
                                .accessibilityIdentifier("Speaker strikethrough")
                                .accessibilityLabel(Text("Text to speech is disabled, press to enable text to speech.", bundle: .module))
                        }
                    }
                }
            }
    }
}


extension View {
    /// Adds a toolbar `Button` to mute or unmute text-to-speech capabilities.
    ///
    /// When attaching the ``speechToolbarButton(muted:)`` modifier to a `View` that resides within a SwiftUI `NavigationStack`,
    /// a `Button` is added to the toolbar that enables text-to-speech capabilities.
    /// The outside `View` is able to observe taps on that `Button` by passing in a SwiftUI `Binding` as the `muted` parameter, directly tracking the state of the `Button` while also being able to modify it from the outside.
    ///
    /// - Warning: Ensure that the ``ChatView`` resides within a SwiftUI `NavigationStack`, otherwise the added toolbar `Button` won't be shown.
    ///
    /// ### Usage
    ///
    /// The code snippet below demonstrates a minimal example of adding a text-to-speech toolbar button that mutes or unmutes text-to-speech output generation.
    ///
    /// ```swift
    /// struct ChatTestView: View {
    ///     @State private var chat: Chat = [
    ///         ChatEntity(role: .assistant, content: "**Assistant** Message!")
    ///     ]
    ///     @State private var muted = true
    ///
    ///     var body: some View {
    ///         ChatView($chat)
    ///             .speak(chat, muted: muted)
    ///             .speechToolbarButton(muted: $muted)
    ///             .task {
    ///                 // Add new completed `assistant` content to the `Chat` that is outputted via speech.
    ///                 // ...
    ///             }
    ///     }
    /// }
    /// ```
    ///
    /// - Parameters:
    ///   - muted: A SwiftUI `Binding` that indicates if the speech output is currently muted. The `Binding` enables the adjustment of the muted status by both the caller and the toolbar `Button`.
    public func speechToolbarButton(
        muted: Binding<Bool>
    ) -> some View {
        modifier(
            ChatViewSpeechButtonModifier(
                muted: muted
            )
        )
    }
}


#if DEBUG
#Preview {
    @State var chat: Chat = .init(
        [
            ChatEntity(role: .system, content: "System Message!"),
            ChatEntity(role: .system, content: "System Message (hidden)!"),
            ChatEntity(role: .user, content: "User Message!"),
            ChatEntity(role: .assistant, content: "Assistant Message!"),
            ChatEntity(role: .function(name: "test_function"), content: "Function Message!")
        ]
    )
    @State var muted = true


    return NavigationStack {
        ChatView($chat)
            .speak(chat, muted: muted)
            .speechToolbarButton(muted: $muted)
    }
}
#endif
154 changes: 154 additions & 0 deletions Sources/SpeziChat/ChatView+SpeechOutput.swift
@@ -0,0 +1,154 @@
//
// This source file is part of the Stanford Spezi open source project
//
// SPDX-FileCopyrightText: 2023 Stanford University and the project authors (see CONTRIBUTORS.md)
//
// SPDX-License-Identifier: MIT
//

import SpeziSpeechSynthesizer
import SwiftUI


/// The underlying `ViewModifier` of `View/speak(_:muted:)`.
private struct ChatViewSpeechModifier: ViewModifier {
    let chat: Chat
    let muted: Bool

    @Environment(\.scenePhase) private var scenePhase
    @State private var speechSynthesizer = SpeechSynthesizer()


    func body(content: Content) -> some View {
        content
            // Output speech when a new, complete assistant message is the last message
            // Cancel speech output as soon as a new message with the user role arrives
            .onChange(of: chat, initial: true) { _, _ in
                guard !muted,
                      let lastChatEntity = chat.last,
                      lastChatEntity.complete else {
                    return
                }

                if lastChatEntity.role == .assistant {
                    speechSynthesizer.speak(lastChatEntity.content)
                } else if lastChatEntity.role == .user {
                    speechSynthesizer.stop()
                }
            }
            // Cancel speech output when the mute button is tapped in the toolbar
            .onChange(of: muted) { _, newValue in
                if newValue {
                    speechSynthesizer.stop()
                }
            }
            // Cancel speech output when the view disappears
            .onChange(of: scenePhase) { _, newValue in
                switch newValue {
                case .background, .inactive: speechSynthesizer.stop()
                default: break
                }
            }
    }
}


extension View {
    /// Provides text-to-speech capabilities to the ``ChatView``.
    ///
    /// Attaching the modifier to a ``ChatView`` will enable the automatic speech output of the latest added ``ChatEntity/Role-swift.enum/assistant`` ``Chat`` message that is ``ChatEntity/complete``.
    /// The text-to-speech capability can be muted via a `Bool` flag in the ``speak(_:muted:)`` modifier.
    ///
    /// It is important to note that only the latest ``ChatEntity/Role-swift.enum/assistant`` and ``ChatEntity/complete`` ``Chat`` message will be synthesized to natural language speech, as soon as it is persisted in the ``Chat``.
    /// The speech output is immediately stopped as soon as a ``ChatEntity/complete`` ``ChatEntity/Role-swift.enum/user`` message is added to the ``Chat``,
    /// the passed `muted` `Binding` turns to `true`, or the `View` becomes inactive or is moved to the background.
    ///
    /// ### Usage
    ///
    /// The code snippet below demonstrates a minimal example of text-to-speech capabilities. At first, the speech output is muted; only after ten seconds will the speech output of newly incoming ``Chat`` messages be synthesized.
    ///
    /// ```swift
    /// struct ChatTestView: View {
    ///     @State private var chat: Chat = [
    ///         ChatEntity(role: .assistant, content: "**Assistant** Message!")
    ///     ]
    ///     @State private var muted = true
    ///
    ///     var body: some View {
    ///         ChatView($chat)
    ///             .speak(chat, muted: muted)
    ///             .task {
    ///                 try? await Task.sleep(for: .seconds(10))
    ///                 muted = false
    ///
    ///                 // Add new completed `assistant` content to the `Chat` that is outputted via speech.
    ///                 // ...
    ///             }
    ///     }
    /// }
    /// ```
    ///
    /// - Parameters:
    ///   - chat: The ``Chat`` which should be used for generating the speech output.
    ///   - muted: Indicates if the speech output is currently muted, defaults to `false`.
    public func speak(
        _ chat: Chat,
        muted: Bool = false
    ) -> some View {
        modifier(
            ChatViewSpeechModifier(
                chat: chat,
                muted: muted
            )
        )
    }
}


#if DEBUG
#Preview("ChatView") {
    @State var chat: Chat = .init(
        [
            ChatEntity(role: .system, content: "System Message!"),
            ChatEntity(role: .system, content: "System Message (hidden)!"),
            ChatEntity(role: .user, content: "User Message!"),
            ChatEntity(role: .assistant, content: "Assistant Message!"),
            ChatEntity(role: .function(name: "test_function"), content: "Function Message!")
        ]
    )

    return NavigationStack {
        ChatView($chat)
    }
}

#Preview("ChatViewSpeechOutput") {
    @State var chat: Chat = .init(
        [
            ChatEntity(role: .assistant, content: "Assistant Message!")
        ]
    )
    @State var muted = false


    return NavigationStack {
        ChatView($chat)
            .speak(chat, muted: muted)
    }
}

#Preview("ChatViewSpeechOutputDisabled") {
    @State var chat: Chat = .init(
        [
            ChatEntity(role: .assistant, content: "Assistant Message!")
        ]
    )
    @State var muted = true


    return NavigationStack {
        ChatView($chat)
            .speak(chat, muted: muted)
    }
}
#endif