Skip to content

Commit

Permalink
Create Audio API.md
Browse files Browse the repository at this point in the history
  • Loading branch information
ducanh-ho2296 authored Nov 18, 2024
1 parent 4983106 commit 898b4de
Showing 1 changed file with 176 additions and 0 deletions.
176 changes: 176 additions & 0 deletions docs/Model Serving/Audio API.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,176 @@
---
sidebar_position: 5
id: audio-api
title: Audio API Reference
tags:
- Audio API
- Transcription
- Translation
- OpenAI Integration

---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

## Introduction

T-System LLM Serving provides powerful audio APIs integrated with the OpenAI package, enabling seamless transcription and translation capabilities. These APIs support high-quality audio processing, offering features like automatic language detection, transcription in native languages, and translation into English. This guide demonstrates how to use the audio API with the OpenAI package.

:::info
To explore other API capabilities, such as Chat, Completion, or Vision models, refer to their respective documentation sections.
:::

## Environment Setup

To interact with the audio API, set up your environment variables as follows:

```bash
# Set API base URL
export API_BASE=https://llm-server.llmhub.t-systems.net

# Set your API key
export API_KEY=YOUR_LLMHUB_KEY
```

In Python, these environment variables will be automatically used by the `OpenAI` client.

## Model List

**Step 1: Install OpenAI package**
```bash
pip install openai
```

**Step 2: Initialize the OpenAI client for audio APIs**

<Tabs>
<TabItem value="py" label="Python" default>
```python showLineNumbers
import os
from openai import OpenAI

client = OpenAI(
api_key=os.getenv('API_KEY'),
base_url=os.getenv('API_BASE') + '/audio' # /audio endpoint for transcription and translation
)

models = client.models.list()
for model in models.data:
print(model.id)
```
</TabItem>
<TabItem value="curl" label="cURL">
```bash
curl -X GET "$API_BASE/audio/models" \
-H "Authorization: Bearer $API_KEY"
```
</TabItem>
</Tabs>

:::info
Example output:
```
whisper-1
```
`whisper-1` is the current supported model for audio processing.
:::

## Audio Transcription

The transcription API converts audio into text in the same language as the audio input. It automatically detects the language from the first 30 seconds of audio if the `language` parameter is not specified.

<Tabs>
<TabItem value="py" label="Python" default>
```python showLineNumbers
import time

audio_file_path = "/path/to/audio_file.mp3" # Path to your audio file

start = time.time()
with open(audio_file_path, "rb") as audio_file:
transcription = client.audio.transcriptions.create(
model="whisper-1",
file=audio_file,
language="de", # Optional: specify input language (e.g., German)
temperature=0.0
)
stop = time.time()

print("\nTime taken: ", stop - start)
print("Transcription:", transcription)
```
</TabItem>
<TabItem value="curl" label="cURL">
```bash
curl -X POST "$API_BASE/audio/transcriptions" \
-H "Authorization: Bearer $API_KEY" \
-F "model=whisper-1" \
-F "language=de" \
-F "temperature=0.0" \
-F "file=@/path/to/audio_file.mp3"
```
</TabItem>
</Tabs>

**Example Output:**
```
Time taken: 9.11 seconds
Transcription: {'text': 'Eine kleine Dickmadam zog sich eine Hose an. Die Hose krachte, Dickmadam lachte, zog sie wieder aus und du bist raus.', 'task': 'transcribe', 'duration': 10.58}
```

:::info
- **`language`**: Optional. Specify the language of the input audio. If omitted, the model will auto-detect the language based on the first 30 seconds of audio.
- **`temperature`**: Adjusts the creativity of transcription output (default: `0.0` for deterministic results).
:::

## Audio Translation

The translation API translates audio content into English and transcribes it simultaneously.

<Tabs>
<TabItem value="py" label="Python" default>
```python showLineNumbers
import time

audio_file_path = "/path/to/audio_file.mp3" # Path to your audio file

start = time.time()
with open(audio_file_path, "rb") as audio_file:
translation = client.audio.translations.create(
model="whisper-1",
file=audio_file,
temperature=1.0
)
stop = time.time()

print("\nTime taken: ", stop - start)
print("Translation:", translation)
```
</TabItem>
<TabItem value="curl" label="cURL">
```bash
curl -X POST "$API_BASE/audio/translations" \
-H "Authorization: Bearer $API_KEY" \
-F "model=whisper-1" \
-F "temperature=1.0" \
-F "file=@/path/to/audio_file.mp3"
```
</TabItem>
</Tabs>

**Example Output:**
```
Time taken: 8.66 seconds
Translation: {'text': "A little lady took her pants off. The lady laughed, took them off again, and you're out.", 'task': 'translate', 'duration': 10.58}
```

:::info
- **Translation Use Case**: The translation API is useful for converting non-English audio into English for accessibility and analysis.
- **`temperature`**: Higher values (e.g., `1.0`) encourage more varied outputs, while lower values (e.g., `0.0`) generate deterministic results.
:::

## Key Features of Audio APIs

1. **Auto Language Detection**: Simplifies processing by identifying the input language automatically.
2. **Customizable Output**: Adjust transcription and translation behavior with parameters like `language` and `temperature`.
3. **Efficient Processing**: Low latency for both transcription and translation tasks.

0 comments on commit 898b4de

Please sign in to comment.