Diffusion models are a type of generative model whose approach differs from GANs, VAEs, and flow-based models. In this repository, I re-implement diffusion models from scratch to run a series of experiments:
- Diffusion model: training with the simple noise-prediction loss
- Inference with DDPM and DDIM samplers
- Conditioning the diffusion model on labels, images, or text
- Latent diffusion: mapping image space to latent space with a VAE
- Stable diffusion: latent diffusion + conditioning
- Classifier-free guidance
- Sketch2Image: using a sketch image as the condition
- Medical image segmentation: using a medical image as the condition
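As a rough illustration of the "simple loss" mentioned above, here is a minimal NumPy sketch of the DDPM forward process and the noise-prediction objective. The array shapes and the toy `predict_noise` callable are placeholders for illustration, not the repository's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear beta schedule and cumulative alpha products, as in DDPM
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def q_sample(x0, t, eps):
    """Sample x_t ~ q(x_t | x_0) in closed form."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def simple_loss(x0, t, predict_noise):
    """L_simple = E || eps - eps_theta(x_t, t) ||^2."""
    eps = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, t, eps)
    eps_pred = predict_noise(x_t, t)
    return np.mean((eps - eps_pred) ** 2)

# Toy "model" that predicts zero noise everywhere
x0 = rng.standard_normal((4, 32, 32))
loss = simple_loss(x0, t=500, predict_noise=lambda x, t: np.zeros_like(x))
```

With a zero predictor the loss is roughly the variance of the injected noise, i.e. close to 1; a trained network would drive it far lower.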
```bash
git clone https://github.com/huynhspm/Generative-Model
cd Generative-Model
conda create -n diffusion python=3.10
conda activate diffusion
pip install -r requirements.txt
```
Set `CUDA_VISIBLE_DEVICES` and `WANDB_API_KEY` before training:

```bash
export CUDA_VISIBLE_DEVICES=0
export WANDB_API_KEY=???
```
Choose from the available experiments in the `configs/experiment` folder, or create your own experiment to suit your task.

```bash
# for generation task
python src/train.py experiment=generation/diffusion/train/mnist trainer.devices=1

# for reconstruction task
python src/train.py experiment=reconstruction/vq_vae/celeba trainer.devices=1

# for segmentation task
python src/train.py experiment=segmentation/condition_diffusion/train/lidc trainer.devices=1
```
Set `CUDA_VISIBLE_DEVICES` and `WANDB_API_KEY` before evaluating:

```bash
export CUDA_VISIBLE_DEVICES=0
export WANDB_API_KEY=???
```

Choose from the available experiments in the `configs/experiment` folder, or create your own experiment to suit your task.

```bash
# for generation task
python src/eval.py experiment=generation/diffusion/eval/mnist trainer.devices=1

# for reconstruction task
...

# for segmentation task
python src/eval.py experiment=segmentation/condition_diffusion/eval/lidc trainer.devices=1
```
Generation task:

Segmentation task:
Attention:
- Self Attention
- Cross Attention
- Spatial Transformer

Backbone blocks:
- ResNet Block
- VGG Block
- DenseNet Block
- Inception Block
Condition embedders:
- Time
- Label: animal (dog, cat), digit (0, 1, ..., 9), gender (male, female)
- Image: Sketch2Image, segmentation
- Text: not implemented yet
Samplers:
- DDPM: Denoising Diffusion Probabilistic Models
- DDIM: Denoising Diffusion Implicit Models

Models:
- UNet: encoder, decoder
- Unconditional diffusion model
- Conditional diffusion model (label, image, text; the text embedder still needs to be implemented)
- Variational autoencoder: vanilla (works only for reconstruction), VQ-VAE
- Latent diffusion model
- Stable diffusion model
- Classifier-free guidance: not working yet
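To make the sampling pieces concrete, here is a minimal NumPy sketch of classifier-free guidance combined with a deterministic DDIM step. The guidance scale `w` and the random "predictions" are illustrative assumptions standing in for the network's conditional and unconditional outputs, not the repository's API:

```python
import numpy as np

# Same linear schedule as in DDPM training
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def cfg_noise(eps_uncond, eps_cond, w=3.0):
    """Classifier-free guidance: push the noise prediction toward the condition."""
    return eps_uncond + w * (eps_cond - eps_uncond)

def ddim_step(x_t, eps, t, t_prev):
    """Deterministic DDIM update (eta = 0): predict x0, then re-noise to t_prev."""
    ab_t, ab_prev = alpha_bars[t], alpha_bars[t_prev]
    x0_pred = (x_t - np.sqrt(1.0 - ab_t) * eps) / np.sqrt(ab_t)
    return np.sqrt(ab_prev) * x0_pred + np.sqrt(1.0 - ab_prev) * eps

rng = np.random.default_rng(0)
x_t = rng.standard_normal((1, 32, 32))
eps_u = rng.standard_normal(x_t.shape)   # toy unconditional prediction
eps_c = rng.standard_normal(x_t.shape)   # toy conditional prediction
eps = cfg_noise(eps_u, eps_c, w=3.0)
x_prev = ddim_step(x_t, eps, t=500, t_prev=450)
```

With `w = 0` the update reduces to unconditional sampling; DDPM sampling differs from the DDIM step above by adding fresh noise at each step instead of reusing the predicted noise.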
| Dataset | Image Size | FID (features=2048, DDIM -> DDPM) | Config |
|---|---|---|---|
| MNIST | 32x32 | 2.65 -> 0.89 | Train, Eval |
| Fashion-MNIST | 32x32 | 3.31 -> 2.42 | Train, Eval |
| CIFAR10 | 32x32 | 5.54 -> 3.58 | Train, Eval |
| Dataset | Image Size | FID (features=2048, DDIM -> DDPM) | Config |
|---|---|---|---|
| MNIST | 32x32 | 3.91 -> 1.16 | Train, Eval |
| Fashion-MNIST | 32x32 | 3.10 -> 2.15 | Train, Eval |
| CIFAR10 | 32x32 | 5.66 -> 3.37 | Train, Eval |
| Gender | 64x64 | 3. | Train, Eval |
| CelebA | 64x64 | 3. | Train, Eval |
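For reference, FID compares the mean and covariance of Inception features between real and generated images; `features=2048` refers to the 2048-dimensional pooling layer. A minimal NumPy/SciPy sketch of the formula itself, applied here to small random feature sets rather than actual Inception activations:

```python
import numpy as np
from scipy import linalg

def fid(feat_real, feat_fake):
    """Frechet distance between Gaussians fitted to two feature sets.

    FID = ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2 (C_r C_f)^{1/2})
    """
    mu_r, mu_f = feat_real.mean(axis=0), feat_fake.mean(axis=0)
    c_r = np.cov(feat_real, rowvar=False)
    c_f = np.cov(feat_fake, rowvar=False)
    covmean = linalg.sqrtm(c_r @ c_f)
    if np.iscomplexobj(covmean):  # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_f
    return diff @ diff + np.trace(c_r + c_f - 2.0 * covmean)

rng = np.random.default_rng(0)
a = rng.standard_normal((500, 8))          # stand-in for real features
b = rng.standard_normal((500, 8)) + 0.5    # shifted stand-in for generated features
```

Identical feature sets give an FID near zero, and the score grows as the two distributions drift apart, which is why the DDPM column (more sampling steps) tends to score lower than DDIM in the tables above.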
- Sketch2Image (Sketch, Fake, Real)