fiwGAN (Featural InfoWaveGAN): Lexical Learning in Generative Adversarial Phonology

Paper: https://www.sciencedirect.com/science/article/pii/S0893608021001052

Implemented in fiwGAN.py. Featural InfoWaveGAN (fiwGAN) is an architecture for modeling lexical learning from raw acoustic inputs. It combines the Deep Convolutional GAN architecture for audio data (WaveGAN) with the categorical latent variables of the information-theoretic InfoGAN proposal. Unlike in InfoGAN, the latent code is binomially distributed and training is performed with sigmoid cross-entropy. Based on WaveGAN (Donahue et al. 2019) and InfoGAN (Chen et al. 2016), and partially on code by Rodionov (2018).
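The binary code and the sigmoid cross-entropy objective described above can be illustrated with a minimal sketch. It is not taken from the repository: the function names, the assumption that each bit is drawn independently with probability 0.5, and the toy logits are all hypothetical, and the code uses only the Python standard library rather than the training framework.

```python
import math
import random


def sample_binary_code(num_bits, rng):
    """Draw each featural bit independently (assumed Bernoulli with p = 0.5)."""
    return [rng.randint(0, 1) for _ in range(num_bits)]


def sigmoid_cross_entropy(logits, targets):
    """Mean sigmoid cross-entropy between logits and binary targets.

    Uses the numerically stable form max(x, 0) - x * t + log(1 + exp(-|x|)).
    """
    total = 0.0
    for x, t in zip(logits, targets):
        total += max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


rng = random.Random(0)
code = sample_binary_code(3, rng)

# Hypothetical classifier outputs: one that recovers the code confidently,
# and one that carries no information about it (all logits zero).
confident_logits = [10.0 if bit else -10.0 for bit in code]
uninformative_logits = [0.0] * len(code)

loss_confident = sigmoid_cross_entropy(confident_logits, code)
loss_uninformative = sigmoid_cross_entropy(uninformative_logits, code)
print(code, loss_confident, loss_uninformative)
```

A classifier that recovers every bit drives the loss toward zero, while an uninformative one sits at log 2 per bit, which is the pressure that pushes the Generator to make the code recoverable from the audio.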

ciwGAN (Categorical InfoWaveGAN)

Categorical InfoWaveGAN (ciwGAN) is an architecture for modeling lexical learning from raw acoustic inputs. It combines the Deep Convolutional GAN architecture for audio data (WaveGAN) with the categorical latent variables of the information-theoretic InfoGAN proposal.

Based on WaveGAN (Donahue et al. 2019) (https://github.com/chrisdonahue/wavegan) and the WGAN-GP implementation of InfoGAN by Sergey Rodionov (https://github.com/singnet/semantic-vision/blob/master/experiments/concept_learning/gans/info-wgan-gp/10_originfo_sepQ_v2_lr1e-3/train.py).

In addition to the Generator and the Discriminator networks, the architecture introduces a third network that learns to classify generated outputs and forces the Generator to encode lexical information in its latent space. Lexical and semantic encoding is represented with a set of categorical binary variables. The network is trained on five lexical items from TIMIT. It learns to generate the lexical items and encodes the identity of each item in the categorical variables of the latent space. When the categorical variables that encode lexical information are manipulated, the network outputs the five lexical items, suggesting that each lexical item is represented with a unique categorical code. Such a representation can serve as the basis for lexical and semantic learning from raw acoustic input.

After 19,244 steps of training on oily, water, rag, suit, and year from TIMIT, the network learns to output lexical items based on the latent code. The outputs below were generated with the following values of c:

[1, 0, 0, 0, 0]: suit
[0, 1, 0, 0, 0]: year
[0, 0, 1, 0, 0]: water
[0, 0, 0, 1, 0]: oily
[0, 0, 0, 0, 1]: rag
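The mapping above can be sketched as the construction of a Generator input: a one-hot code c followed by random noise. This is an illustration only, not the repository's code. The helper names, the total latent size of 100, and the uniform noise range [-1, 1] are assumptions made for the example.

```python
import random

# Code-to-word order as listed above.
LEXICAL_ITEMS = ["suit", "year", "water", "oily", "rag"]


def one_hot(index, num_categ):
    """Return a one-hot list of length num_categ with a 1.0 at position index."""
    return [1.0 if i == index else 0.0 for i in range(num_categ)]


def make_latent(index, num_categ=5, latent_dim=100, rng=None):
    """Concatenate the one-hot code with uniform noise (assumed range [-1, 1])."""
    rng = rng or random.Random()
    noise = [rng.uniform(-1.0, 1.0) for _ in range(latent_dim - num_categ)]
    return one_hot(index, num_categ) + noise


# One latent vector per lexical item; a trained Generator would map each
# of these to audio of the corresponding word.
latents = {
    word: make_latent(i, rng=random.Random(i))
    for i, word in enumerate(LEXICAL_ITEMS)
}
print(latents["water"][:5])
```

Holding the noise fixed and changing only the first five values is the manipulation described above: the code selects the word, while the remaining variables account for other variation in the output.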


To change the number of categorical latent variables, pass:

--num_categ n
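As a rough illustration of what n controls, the sketch below enumerates the codes available for a given number of latent code variables. It assumes that ciwGAN uses one-hot codes (n variables give n classes) and that fiwGAN uses binary codes (n variables give up to 2^n classes). The helper names are hypothetical and do not come from the repository.

```python
from itertools import product


def one_hot_codes(n):
    """All one-hot codes of length n (assumed ciwGAN-style coding)."""
    return [[1 if i == k else 0 for i in range(n)] for k in range(n)]


def binary_codes(n):
    """All binary codes of length n (assumed fiwGAN-style coding)."""
    return [list(bits) for bits in product([0, 1], repeat=n)]


n = 3
print(len(one_hot_codes(n)), "one-hot codes")
print(len(binary_codes(n)), "binary codes")
```

Under these assumptions, the same n covers more classes with binary coding than with one-hot coding, so the value passed to --num_categ should be chosen with the coding scheme and the number of lexical items in mind.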
