This page describes the YAML data structure of the multilinear model that model-builder outputs. The multilinear model currently represents two subspaces, one for speaker anatomy and one for tongue pose, which are linked by a core tensor.
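The file consists of a map with the entries Dimensions, CoreTensor, MeanWeights, and ShapeSpace.
The Dimensions entry stores the dimensions of the two subspaces and of the vertex mode: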
- OriginalSpeakerMode : the original dimension of the speaker subspace, which corresponds to the number of speakers in the sample database
- OriginalPhonemeMode : the original dimension of the tongue pose subspace, which corresponds to the number of different phonemes in the sample database
- TruncatedSpeakerMode and TruncatedPhonemeMode : the dimensions of the two subspaces after truncation was applied
- VertexMode : the number of vertices of the mesh structure multiplied by 3 (one x, y, and z coordinate per vertex)
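The relations between these values can be checked with a few lines of Python. This is only a sketch: `check_dimensions` is a hypothetical helper that is not part of model-builder, the dictionary mirrors the Dimensions entry with sample values, and the rule that a truncated dimension never exceeds the original one is an assumption derived from the meaning of truncation.

```python
def check_dimensions(dims):
    """Validate a Dimensions map and return the number of mesh vertices.

    Hypothetical helper for illustration; not part of model-builder.
    """
    # Assumption: truncation keeps or reduces the dimension of a subspace.
    if dims["TruncatedSpeakerMode"] > dims["OriginalSpeakerMode"]:
        raise ValueError("TruncatedSpeakerMode exceeds OriginalSpeakerMode")
    if dims["TruncatedPhonemeMode"] > dims["OriginalPhonemeMode"]:
        raise ValueError("TruncatedPhonemeMode exceeds OriginalPhonemeMode")
    # VertexMode counts three coordinates (x, y, z) per vertex.
    if dims["VertexMode"] % 3 != 0:
        raise ValueError("VertexMode is not a multiple of 3")
    return dims["VertexMode"] // 3


# Sample values, as they would appear after parsing the Dimensions entry.
dims = {
    "OriginalSpeakerMode": 9,
    "OriginalPhonemeMode": 7,
    "TruncatedSpeakerMode": 6,
    "TruncatedPhonemeMode": 5,
    "VertexMode": 9300,
}

vertex_count = check_dimensions(dims)
print(vertex_count)  # 3100
```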
The CoreTensor entry is a third-order tensor stored in serialized form. The entries are saved in binary form to reduce the storage requirements. The entry at (i, j, k) in the tensor is located at index i * TruncatedPhonemeMode * VertexMode + j * VertexMode + k.
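The index rule can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes the binary entries have already been decoded into a flat list of numbers; a small toy tensor is used so the result is easy to verify by hand.

```python
def flat_index(i, j, k, truncated_phoneme_mode, vertex_mode):
    """Map the tensor position (i, j, k) to its index in the serialized data."""
    return i * truncated_phoneme_mode * vertex_mode + j * vertex_mode + k


def tensor_entry(values, i, j, k, dims):
    """Look up entry (i, j, k) in a flat list of decoded tensor values."""
    index = flat_index(
        i, j, k, dims["TruncatedPhonemeMode"], dims["VertexMode"]
    )
    return values[index]


# Toy tensor with 2 x 3 x 6 entries; each value equals its own flat index.
toy_dims = {"TruncatedSpeakerMode": 2, "TruncatedPhonemeMode": 3, "VertexMode": 6}
toy_values = list(range(2 * 3 * 6))

# 1 * 3 * 6 + 2 * 6 + 4 = 34
print(tensor_entry(toy_values, 1, 2, 4, toy_dims))  # 34
```

With this layout, k varies fastest and i slowest, so all VertexMode entries that share the same (i, j) are stored contiguously.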
The MeanWeights entry stores the mean weights of the two subspaces:
- SpeakerMode : the mean coordinate of all speakers in the speaker subspace
- PhonemeMode : the mean coordinate of all phonemes in the phoneme subspace
The ShapeSpace entry provides information about the shape space:
- Origin : the vertex coordinates of the mean mesh in serialized form: [x0, y0, z0, x1, y1, z1, x2, y2, z2, ...].
- Faces : A list containing the faces of the meshes that are generated by the model. A face itself is a list of vertex indices.
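A minimal sketch of how these two entries fit together is shown below. The helper names are hypothetical, the sample data is a single made-up triangle, and the vertex indices are assumed to be zero-based.

```python
def unflatten_vertices(origin):
    """Group a serialized list [x0, y0, z0, x1, y1, z1, ...] into (x, y, z) tuples."""
    if len(origin) % 3 != 0:
        raise ValueError("Origin length is not a multiple of 3")
    return [tuple(origin[n:n + 3]) for n in range(0, len(origin), 3)]


def face_corners(vertices, face):
    """Return the coordinates of every vertex referenced by a face.

    Assumption: vertex indices in a face are zero-based.
    """
    return [vertices[index] for index in face]


# Made-up sample data: three vertices forming one triangle.
origin = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
faces = [[0, 1, 2]]

vertices = unflatten_vertices(origin)
triangle = face_corners(vertices, faces[0])
print(triangle)  # [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
```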
A file with this structure looks as follows:
Dimensions:
  OriginalSpeakerMode: 9
  OriginalPhonemeMode: 7
  TruncatedSpeakerMode: 6
  TruncatedPhonemeMode: 5
  VertexMode: 9300
CoreTensor: !!binary [....]
MeanWeights:
  SpeakerMode: [...]
  PhonemeMode: [...]
ShapeSpace:
  Origin: [1, 2, 3, 4, 5, ...]
  Faces: [[0, 1, 2], [2, 3, 4], ...]
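The `!!binary` tag marks base64-encoded data in YAML. The following sketch shows how such a payload could be turned into a flat list of numbers using only the Python standard library. The numeric layout of the entries is not specified here, so the default of 64-bit little-endian floats (`"<d"`) is purely an assumption, and `decode_core_tensor` is a hypothetical helper. A YAML parser that supports the binary tag may already perform the base64 step itself and return raw bytes.

```python
import base64
import struct


def decode_core_tensor(encoded, entry_format="<d"):
    """Decode a base64 payload into a flat list of numbers.

    Assumption: every entry uses the struct format `entry_format`
    (64-bit little-endian float by default). The real layout may differ.
    """
    raw = base64.b64decode(encoded)
    entry_size = struct.calcsize(entry_format)
    if len(raw) % entry_size != 0:
        raise ValueError("payload length does not match the entry size")
    # iter_unpack yields one-element tuples; keep only the value itself.
    return [value for (value,) in struct.iter_unpack(entry_format, raw)]


# Made-up payload with three entries, encoded the same way for the demo.
sample = base64.b64encode(struct.pack("<3d", 1.0, 2.5, -4.0)).decode("ascii")

values = decode_core_tensor(sample)
print(values)  # [1.0, 2.5, -4.0]
```

For a complete file, the length of the decoded list would be expected to equal TruncatedSpeakerMode * TruncatedPhonemeMode * VertexMode, which offers a quick sanity check on the assumed entry format.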