Multilinear Model JSON Format

Overview

This is the JSON data structure of the multilinear model that is output by model-builder. Currently, the multilinear model represents two subspaces, one for speaker anatomy and one for tongue pose, which are linked by a core tensor. This format is functionally identical to the YAML format. However, it stores everything as plain text, which increases the storage requirements.

The file consists of a map:

Dimensions

This entry stores the dimensions of the two subspaces and the size of the vertex data:

  • OriginalSpeakerMode : the original dimension of the speaker subspace, which corresponds to the number of speakers in the sample database used
  • OriginalPhonemeMode : the original dimension of the tongue pose subspace, which corresponds to the number of different phonemes in the sample database
  • TruncatedSpeakerMode and TruncatedPhonemeMode : the dimensions of the two subspaces after truncation was applied
  • VertexMode : the number of vertices of the mesh structure used, multiplied by 3 (one x, y and z coordinate per vertex).
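As a minimal sketch, the Dimensions entry can be mirrored in Python and checked for consistency. The class and method names are hypothetical and not part of model-builder. The check that truncation never enlarges a subspace is an assumption based on the word "truncated".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimensions:
    """Hypothetical mirror of the "Dimensions" entry."""
    original_speaker_mode: int
    original_phoneme_mode: int
    truncated_speaker_mode: int
    truncated_phoneme_mode: int
    vertex_mode: int

    @classmethod
    def from_json(cls, entry: dict) -> "Dimensions":
        # Map the JSON keys onto the dataclass fields.
        return cls(
            original_speaker_mode=entry["OriginalSpeakerMode"],
            original_phoneme_mode=entry["OriginalPhonemeMode"],
            truncated_speaker_mode=entry["TruncatedSpeakerMode"],
            truncated_phoneme_mode=entry["TruncatedPhonemeMode"],
            vertex_mode=entry["VertexMode"],
        )

    @property
    def vertex_count(self) -> int:
        # VertexMode holds x, y and z for every vertex.
        if self.vertex_mode % 3 != 0:
            raise ValueError("VertexMode must be a multiple of 3")
        return self.vertex_mode // 3

    def check(self) -> None:
        # Assumption: truncation can only keep or shrink a subspace.
        if not 0 < self.truncated_speaker_mode <= self.original_speaker_mode:
            raise ValueError("invalid TruncatedSpeakerMode")
        if not 0 < self.truncated_phoneme_mode <= self.original_phoneme_mode:
            raise ValueError("invalid TruncatedPhonemeMode")
```

For the example values below, VertexMode = 9300 corresponds to 3100 vertices.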

CoreTensor

This is a third-order tensor stored in serialized (flattened) form as a list of plain-text numbers. Its dimensions are TruncatedSpeakerMode × TruncatedPhonemeMode × VertexMode, and the entry at (i, j, k) is stored at index i * TruncatedPhonemeMode * VertexMode + j * VertexMode + k.
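The index formula describes ordinary row-major (C-order) flattening, where k varies fastest. The small Python sketch below implements it. The function names are hypothetical.

```python
def core_index(i: int, j: int, k: int,
               truncated_phoneme_mode: int, vertex_mode: int) -> int:
    """Flat position of entry (i, j, k) in the serialized CoreTensor list."""
    return i * truncated_phoneme_mode * vertex_mode + j * vertex_mode + k


def core_entry(core: list, i: int, j: int, k: int,
               truncated_speaker_mode: int, truncated_phoneme_mode: int,
               vertex_mode: int) -> float:
    """Read entry (i, j, k) after checking the list size and the bounds."""
    expected = truncated_speaker_mode * truncated_phoneme_mode * vertex_mode
    if len(core) != expected:
        raise ValueError(f"CoreTensor has {len(core)} entries, expected {expected}")
    if not (0 <= i < truncated_speaker_mode
            and 0 <= j < truncated_phoneme_mode
            and 0 <= k < vertex_mode):
        raise IndexError(f"({i}, {j}, {k}) is out of range")
    return core[core_index(i, j, k, truncated_phoneme_mode, vertex_mode)]
```

With the example dimensions (TruncatedPhonemeMode = 5, VertexMode = 9300), entry (1, 0, 0) sits at index 46500, and the list holds 6 × 5 × 9300 = 279000 numbers. NumPy's default C-order reshape to (TruncatedSpeakerMode, TruncatedPhonemeMode, VertexMode) produces the same layout.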

MeanWeights

  • SpeakerMode : The mean coordinate of all speakers in the speaker subspace
  • PhonemeMode : The mean coordinate of all phonemes in the phoneme subspace
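The core tensor links the two subspaces. A common way to use such a model is to contract the core tensor with a speaker weight vector s and a phoneme weight vector p (the mode-1 and mode-2 products), which yields a vector of VertexMode coordinates. This usage is an assumption, not taken from model-builder. The sketch below performs this contraction, for example with the mean weights. It assumes their lengths are TruncatedSpeakerMode and TruncatedPhonemeMode, and it makes no claim about how the result is combined with the ShapeSpace origin. The function name is hypothetical.

```python
def contract_core(core: list, speaker_weights: list,
                  phoneme_weights: list, vertex_mode: int) -> list:
    """Return sum over i, j of core[i, j, :] * s[i] * p[j] (length VertexMode)."""
    s_len = len(speaker_weights)
    p_len = len(phoneme_weights)
    if len(core) != s_len * p_len * vertex_mode:
        raise ValueError("weight lengths do not match the CoreTensor size")
    result = [0.0] * vertex_mode
    for i, s in enumerate(speaker_weights):
        for j, p in enumerate(phoneme_weights):
            weight = s * p
            if weight == 0.0:
                continue  # this (i, j) slice contributes nothing
            # Start of the slice core[i, j, :] in the row-major layout.
            base = (i * p_len + j) * vertex_mode
            for k in range(vertex_mode):
                result[k] += weight * core[base + k]
    return result
```

Usage, assuming a parsed file in model: contract_core(model["CoreTensor"], model["MeanWeights"]["SpeakerMode"], model["MeanWeights"]["PhonemeMode"], model["Dimensions"]["VertexMode"]).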

ShapeSpace

This map entry provides information about the shape space:

  • Origin : the vertex coordinates of the mean mesh in serialized form: [x0, y0, z0, x1, y1, z1, x2, y2, z2, ...].
  • Faces : A list containing the faces of the meshes that are generated by the model. A face itself is a list of vertex indices.
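The following sketch regroups Origin into (x, y, z) triples and writes the mean mesh as a Wavefront OBJ string. It assumes the face indices are zero-based, as the example's [0, 1, 2] suggests. OBJ itself numbers vertices from 1, hence the +1. The function names are hypothetical.

```python
def origin_to_vertices(origin: list) -> list:
    """Group [x0, y0, z0, x1, y1, z1, ...] into [(x0, y0, z0), (x1, y1, z1), ...]."""
    if len(origin) % 3 != 0:
        raise ValueError("Origin length is not a multiple of 3")
    return [tuple(origin[n:n + 3]) for n in range(0, len(origin), 3)]


def to_obj(origin: list, faces: list) -> str:
    """Serialize the mean mesh as Wavefront OBJ text."""
    vertices = origin_to_vertices(origin)
    lines = [f"v {x} {y} {z}" for x, y, z in vertices]
    for face in faces:
        # Assumption: the JSON face indices are zero-based.
        if any(not 0 <= idx < len(vertices) for idx in face):
            raise ValueError(f"face {face} references a missing vertex")
        # OBJ vertex indices start at 1.
        lines.append("f " + " ".join(str(idx + 1) for idx in face))
    return "\n".join(lines) + "\n"
```

For instance, to_obj(model["ShapeSpace"]["Origin"], model["ShapeSpace"]["Faces"]) returns text that can be saved as an .obj file and opened in a mesh viewer.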

Example

{

  "Dimensions" : {
    "OriginalSpeakerMode" : 9,
    "OriginalPhonemeMode" : 7,
    "TruncatedSpeakerMode" : 6,
    "TruncatedPhonemeMode" : 5,
    "VertexMode" : 9300
  },

  "CoreTensor": [
    1.234,
    2.984,
    ...
  ],


  "MeanWeights" : {
    "SpeakerMode" : [...],
    "PhonemeMode" : [...]
  },

  "ShapeSpace" : {
    "Origin": [1, 2, 3, 4, 5, ...],
    "Faces" : [[0, 1, 2], [2, 3, 4], ...]
  }

}
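In the example, "..." marks omitted values, so the example is not a complete file. In a real file, the sizes of the entries have to agree with Dimensions. The Python sketch below loads a model with the standard json module and checks those sizes. The function names are hypothetical. The MeanWeights length check and the zero-based face indices are assumptions.

```python
import json


def validate_model(model: dict) -> None:
    """Raise ValueError if the entry sizes disagree with "Dimensions"."""
    dims = model["Dimensions"]
    s = dims["TruncatedSpeakerMode"]
    p = dims["TruncatedPhonemeMode"]
    v = dims["VertexMode"]
    if v % 3 != 0:
        raise ValueError("VertexMode is not a multiple of 3")
    if len(model["CoreTensor"]) != s * p * v:
        raise ValueError("CoreTensor length does not match the dimensions")
    # Assumption: the mean weights live in the truncated subspaces.
    means = model["MeanWeights"]
    if len(means["SpeakerMode"]) != s or len(means["PhonemeMode"]) != p:
        raise ValueError("MeanWeights lengths do not match the truncated modes")
    shape = model["ShapeSpace"]
    if len(shape["Origin"]) != v:
        raise ValueError("Origin length does not match VertexMode")
    vertex_count = v // 3
    for face in shape["Faces"]:
        # Assumption: face indices are zero-based.
        if any(not 0 <= idx < vertex_count for idx in face):
            raise ValueError(f"face {face} references a missing vertex")


def parse_model(text: str) -> dict:
    """Parse JSON text and validate it."""
    model = json.loads(text)
    validate_model(model)
    return model


def load_model(path: str) -> dict:
    """Read a model file written by model-builder and validate it."""
    with open(path, encoding="utf-8") as f:
        return parse_model(f.read())
```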