Recognize and synthesize natural language speech.
The Spezi Speech component provides an easy and convenient way to recognize (speech-to-text) and synthesize (text-to-speech) natural language content, facilitating seamless interaction with an application. It builds on top of Apple's Speech and AVFoundation frameworks.
You need to add the Spezi Speech Swift package to your app in Xcode or as a dependency of your Swift package.
Important: If your application is not yet configured to use Spezi, follow the Spezi setup article to set up the core Spezi infrastructure.
The SpeechRecognizer and SpeechSynthesizer modules need to be registered in a Spezi-based application using the configuration in a SpeziAppDelegate:
```swift
class ExampleAppDelegate: SpeziAppDelegate {
    override var configuration: Configuration {
        Configuration {
            SpeechRecognizer()
            SpeechSynthesizer()
            // ...
        }
    }
}
```
Note: You can learn more about a Module in the Spezi documentation.
To ensure that your application has the necessary permissions for microphone access and speech recognition, follow the steps below to configure the target properties within your Xcode project:

- Open your project settings in Xcode by selecting PROJECT_NAME > TARGET_NAME > Info tab.
- Add two entries to the Custom iOS Target Properties (the Info.plist file) to provide descriptions for why your app requires these permissions:
  - Add a key named Privacy - Microphone Usage Description and provide a string value that describes why your application needs access to the microphone. This description will be displayed to the user when the app first requests microphone access.
  - Add another key named Privacy - Speech Recognition Usage Description with a string value that explains why your app requires the speech recognition capability. This will be presented to the user when the app first attempts to perform speech recognition.

These entries are mandatory for apps that utilize microphone and speech recognition features. Failing to provide them will result in your app being unable to access these features.
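In the raw Info.plist source, the two entries above correspond to the standard NSMicrophoneUsageDescription and NSSpeechRecognitionUsageDescription keys. A minimal sketch (the description strings are placeholders and should explain your app's actual use):

```xml
<key>NSMicrophoneUsageDescription</key>
<string>This app uses the microphone to transcribe your speech to text.</string>
<key>NSSpeechRecognitionUsageDescription</key>
<string>This app uses speech recognition to convert your voice input into text.</string>
```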
SpeechTestView provides a demonstration of the capabilities of Spezi Speech. It showcases the interaction with the SpeechRecognizer to provide speech-to-text capabilities and the SpeechSynthesizer to generate speech from text.
```swift
struct SpeechTestView: View {
    // Get the `SpeechRecognizer` and `SpeechSynthesizer` from the SwiftUI `Environment`.
    @Environment(SpeechRecognizer.self) private var speechRecognizer
    @Environment(SpeechSynthesizer.self) private var speechSynthesizer
    // The transcribed message from the user's voice input.
    @State private var message = ""

    var body: some View {
        VStack {
            // Button used to start and stop recording by triggering the `microphoneButtonPressed()` function.
            Button("Record") {
                microphoneButtonPressed()
            }
                .padding(.bottom)

            // Button used to start and stop playback of the transcribed message by triggering the `playbackButtonPressed()` function.
            Button("Playback") {
                playbackButtonPressed()
            }
                .padding(.bottom)

            Text(message)
        }
    }

    private func microphoneButtonPressed() {
        if speechRecognizer.isRecording {
            // If speech is currently being recognized, stop transcribing.
            speechRecognizer.stop()
        } else {
            // If the recognizer is idle, start a new recording.
            Task {
                do {
                    // The `speechRecognizer.start()` function returns an `AsyncThrowingStream` that yields the transcribed text.
                    for try await result in speechRecognizer.start() {
                        // Access the string-based representation of the transcription.
                        message = result.bestTranscription.formattedString
                    }
                }
            }
        }
    }

    private func playbackButtonPressed() {
        if speechSynthesizer.isSpeaking {
            // If speech is currently being synthesized, pause the playback.
            speechSynthesizer.pause()
        } else {
            // If the synthesizer is idle, start the text-to-speech functionality.
            speechSynthesizer.speak(message)
        }
    }
}
```
SpeziSpeech also supports selecting voices, including personal voices.
The following example shows how a user can be given a choice of voices in their current locale and how the selected voice can be used to synthesize speech.
```swift
struct SpeechVoiceSelectionExample: View {
    @Environment(SpeechSynthesizer.self) private var speechSynthesizer
    @State private var selectedVoiceIndex = 0
    @State private var message = ""

    var body: some View {
        VStack {
            TextField("Enter text to be spoken", text: $message)
                .textFieldStyle(RoundedBorderTextFieldStyle())
                .padding()

            Picker("Voice", selection: $selectedVoiceIndex) {
                ForEach(speechSynthesizer.voices.indices, id: \.self) { index in
                    Text(speechSynthesizer.voices[index].name)
                        .tag(index)
                }
            }
                .pickerStyle(.inline)
                .accessibilityIdentifier("voicePicker")
                .padding()

            Button("Speak") {
                speechSynthesizer.speak(
                    message,
                    voice: speechSynthesizer.voices[selectedVoiceIndex]
                )
            }
        }
            .padding()
    }
}
```
Personal voices are supported on iOS 17 and above. Users must first create a personal voice. Using personal voices also requires obtaining authorization from the user. To request access to any available personal voices, you can use the getPersonalVoices() method of the SpeechSynthesizer. Personal voices will then become available alongside system voices.
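To illustrate, the authorization request could be wired to a button before showing a voice picker. A minimal sketch, where the view and button are illustrative and the exact signature of getPersonalVoices() should be checked against the API documentation:

```swift
struct PersonalVoiceExample: View {
    @Environment(SpeechSynthesizer.self) private var speechSynthesizer

    var body: some View {
        Button("Enable Personal Voices") {
            if #available(iOS 17, *) {
                // Prompts the user for personal voice authorization; once granted,
                // personal voices appear in `speechSynthesizer.voices` alongside
                // the system voices and can be passed to `speak(_:voice:)`.
                speechSynthesizer.getPersonalVoices()
            }
        }
    }
}
```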
For more information, please refer to the API documentation.
Contributions to this project are welcome. Please make sure to read the contribution guidelines and the contributor covenant code of conduct first.
This project is licensed under the MIT License. See Licenses for more information.