This is a module of my AI-powered animatronics pipeline. It takes input text and generates an audio file along with timestamped markers for each viseme change. The visemes can be translated directly into jaw movements on the animatronics.
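For example, a downstream driver might map each viseme symbol to a jaw-openness value. This is a minimal sketch only; the JAW_OPENNESS values are illustrative guesses and not part of this module:

# Hypothetical viseme -> jaw-openness mapping (0.0 = closed, 1.0 = fully open).
# The symbols match the ones emitted in the results array below; the
# openness values are illustrative, not calibrated for any real servo.
JAW_OPENNESS = {
    "T": 0.2, "t": 0.2, "s": 0.15,
    "i": 0.4, "e": 0.5, "@": 0.6,
}

def jaw_position(symbol: str) -> float:
    """Return a jaw-openness value for a viseme symbol, defaulting to closed."""
    return JAW_OPENNESS.get(symbol, 0.0)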
Currently the whole thing is powered solely by ElevenLabs. You will need an account and an API token set in your environment as ELEVENLABS_TOKEN, e.g. export ELEVENLABS_TOKEN=<your-token>.
You can run the server with ./main.py api and make a request like this:
curl -L http://127.0.0.1:5000/generate \
-X POST \
-d voiceName="[ElevenVoices] American Female Teen" \
-d text="this is a test" \
-d name="test 1"
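The same request can be made from Python; here is a minimal sketch using the requests library, with the endpoint and form fields taken from the curl example above:

import requests

resp = requests.post(
    "http://127.0.0.1:5000/generate",
    data={
        "voiceName": "[ElevenVoices] American Female Teen",
        "text": "this is a test",
        "name": "test 1",
    },
)
resp.raise_for_status()
payload = resp.json()  # same JSON structure as the response shown below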
Response:
{
"audio": "<base64 encoded hex string of the mp3 audio>",
"audioLength": 1.1493877551020408,
"emitRatio": 0.7,
"mp3File": "/home/ken/projects/ai-skeletons/phone-generation/test 1.mp3",
"outputName": "test 1",
"prompt": "this is a test",
"results": [
["0.060", "T"],
["0.100", "i"],
["0.190", "s"],
["0.250", "i"],
["0.340", "s"],
["0.380", "@"],
["0.480", "t"],
["0.530", "e"],
["0.700", "s"],
["0.810", "t"]
],
"voiceID": "FxXx1SvSMrk96HmqFCUS",
"voiceName": "[ElevenVoices] American Female Teen"
}
The results field is an array of pairs, each containing a timestamp (in seconds) and the viseme symbol active from that point, based on this table.
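As a sketch of how the markers could drive an animatronic, the loop below walks the results array in real time and fires a callback at each marker. The set_jaw callback is a hypothetical stand-in for whatever moves your jaw servo:

import time

def play_visemes(results, set_jaw):
    """Step through [timestamp, viseme] pairs in real time.

    results: the "results" array from the response, e.g. [["0.060", "T"], ...]
    set_jaw: hypothetical callback that moves the jaw for a viseme symbol
    """
    start = time.monotonic()
    for timestamp, viseme in results:
        # Sleep until this marker's offset from the start of playback.
        delay = float(timestamp) - (time.monotonic() - start)
        if delay > 0:
            time.sleep(delay)
        set_jaw(viseme)

In practice you would start the mp3 playing at the same moment you call play_visemes so the jaw stays in sync with the audio.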
The audio can be decoded by reversing the encoding:
jq -r '.audio' test.json \
| xxd -r -p \
| base64 -d \
> test-1-decoded.mp3
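The same decoding in Python, assuming the response has been saved to test.json and mirroring the hex-then-base64 reversal above:

import base64
import json

with open("test.json") as f:
    payload = json.load(f)

# The audio field is hex-encoded base64 text, so reverse both layers:
# hex -> base64 text, then base64 -> raw mp3 bytes.
b64_text = bytes.fromhex(payload["audio"])
mp3_bytes = base64.b64decode(b64_text)

with open("test-1-decoded.mp3", "wb") as f:
    f.write(mp3_bytes)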
You can also use the tool for generation without standing up the API by running ./main.py generateFull. See the options here.