A clean and scalable template to kickstart your deep learning project
Click on the "Use this template" button above to initialize a new repository.
This template tries to be as generic as possible. You should be able to easily modify behavior in train.py in case you need some unconventional configuration wiring.
This is a work in progress. I'm currently figuring out the best workflow for a scalable experimentation process. Suggestions are always welcome!
- PyTorch Lightning provides great abstractions for well-structured ML code and advanced features like checkpointing, gradient accumulation, distributed training, etc.
- Hydra provides a convenient way to manage experiment configurations and advanced features like overriding any config parameter from the command line, scheduling execution of many runs, etc.
- Predefined Structure: clean and scalable so that work can easily be extended and replicated (see #Project Structure)
- Rapid Experimentation: thanks to automating the pipeline with config files and Hydra command-line superpowers
- Little Boilerplate: so the pipeline can be easily modified (see train.py)
- Main Configuration: main config file specifies default training configuration (see #Main Project Configuration)
- Experiment Configurations: stored in a separate folder, they can be composed out of smaller configs, override chosen parameters or define everything from scratch (see #Experiment Configuration)
- Experiment Tracking: many logging frameworks can be easily integrated! (see #Experiment Tracking)
- Logs: all logs (checkpoints, data from loggers, chosen hparams, etc.) are stored in a convenient folder structure imposed by Hydra (see #Logs)
- Smoke Tests: simple bash scripts running 1-2 epoch experiments to check if your model doesn't crash under different conditions (see tests)
- Hyperparameter Search: made easier with Hydra's built-in plugins like the Optuna Sweeper
- Workflow: comes down to 4 simple steps (see #Workflow)
- Warning: this template currently uses a development version of Hydra, which might be unstable (we are waiting for Hydra 1.1 to be released).
- Inspired by: PyTorchLightning/deep-learning-project-template, drivendata/cookiecutter-data-science, tchaton/lightning-hydra-seed, Erlemar/pytorch_tempest, ryul99/pytorch-project-template.
- To learn how to configure PyTorch with Hydra take a look at this detailed MNIST tutorial.
- Repositories useful for configuring PyTorch and PyTorch Lightning classes with Hydra: romesco/hydra-lightning, pytorch/hydra-torch.
- Suggestions are always welcome!
The directory structure of a new project looks like this:
├── configs                 <- Hydra configuration files
│   ├── trainer                 <- Configurations of Lightning trainers
│   ├── model                   <- Configurations of Lightning models
│   ├── datamodule              <- Configurations of Lightning datamodules
│   ├── callbacks               <- Configurations of Lightning callbacks
│   ├── logger                  <- Configurations of Lightning loggers
│   ├── experiment              <- Configurations of experiments
│   │
│   ├── config.yaml             <- Main project configuration file
│   └── config_optuna.yaml      <- Configuration of Optuna hyperparameter search
│
├── data                    <- Project data
│
├── logs                    <- Logs generated by Hydra and PyTorch Lightning loggers
│
├── notebooks               <- Jupyter notebooks
│
├── tests                   <- Tests of any kind
│   ├── quick_tests.sh          <- A couple of quick experiments to test if your model
│   │                              doesn't crash under different training conditions
│   └── ...
│
├── src
│   ├── architectures           <- PyTorch model architectures
│   ├── callbacks               <- PyTorch Lightning callbacks
│   ├── datamodules             <- PyTorch Lightning datamodules
│   ├── datasets                <- PyTorch datasets
│   ├── models                  <- PyTorch Lightning models
│   ├── transforms              <- Data transformations
│   ├── utils                   <- Utility scripts
│   ├── inference_example.py    <- Example of inference with trained model
│   └── template_utils.py       <- Some extra template utilities
│
├── train.py                <- Train model with chosen experiment configuration
│
├── .gitignore
├── LICENSE
├── README.md
├── conda_env_gpu.yaml      <- File for installing conda env for GPU
├── conda_env_cpu.yaml      <- File for installing conda env for CPU
├── requirements.txt        <- File for installing python dependencies
└── setup.py                <- File for installing project as a package
- Hydra superpowers
- Override any config parameter from the command line
- Easily switch between different loggers, callback sets, optimizers, etc. from the command line
- Sweep over hyperparameters from command line
- Automatic logging of run history
- Sweeper integrations for Optuna, Ray and others
- Optional callbacks for Weights&Biases (wandb_callbacks.py); a minimal sketch of one such callback follows this list
- To support reproducibility:
- UploadCodeToWandbAsArtifact
- UploadCheckpointsToWandbAsArtifact
- WatchModelWithWandb
- To provide examples of logging custom visualisations and metrics with callbacks:
- LogBestMetricScoresToWandb
- LogF1PrecisionRecallHeatmapToWandb
- LogConfusionMatrixToWandb
- Validating correctness of config with Hydra schemas (TODO)
- Method to pretty print configuration composed by Hydra at the start of the run, using the Rich library (template_utils.py)
- Method to log chosen parts of Hydra config to all loggers (template_utils.py)
- Example of hyperparameter search with Optuna sweeps (config_optuna.yaml)
- Example of hyperparameter search with Weights&Biases sweeps (TODO)
- Examples of simple bash scripts to check if your model doesn't crash under different training conditions (tests/)
- Example of inference with trained model (inference_example.py)
- Built-in requirements (requirements.txt)
- Built-in conda environment initialization (conda_env_gpu.yaml, conda_env_cpu.yaml)
- Built-in python package setup (setup.py)
- Example with MNIST classification (mnist_model.py, mnist_datamodule.py)
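To give a concrete picture of what such callbacks look like, here is a minimal sketch in the spirit of WatchModelWithWandb; the actual implementation in wandb_callbacks.py may differ in its details.

```python
# Minimal sketch of a Weights&Biases callback (illustrative, not the exact template code).
import pytorch_lightning as pl
from pytorch_lightning.loggers import WandbLogger


class WatchModelWithWandb(pl.Callback):
    """Make wandb watch the model at the start of training (logs gradients and weights)."""

    def __init__(self, log: str = "gradients", log_freq: int = 100):
        self.log = log
        self.log_freq = log_freq

    def on_train_start(self, trainer, pl_module):
        logger = trainer.logger
        if isinstance(logger, WandbLogger):
            logger.watch(pl_module, log=self.log, log_freq=self.log_freq)
```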
Location: configs/config.yaml
The main project config contains the default training configuration.
It determines how the config is composed when you simply execute python train.py.
# to execute run with default training configuration simply run:
# python train.py

# specify here default training configuration
defaults:
    - trainer: default_trainer.yaml
    - model: mnist_model.yaml
    - datamodule: mnist_datamodule.yaml
    - callbacks: default_callbacks.yaml  # set this to null if you don't want to use callbacks
    - logger: null  # set logger here or use command line (e.g. `python train.py logger=wandb`)

# path to original working directory (that `train.py` was executed from in command line)
# hydra hijacks working directory by changing it to the current log directory,
# so it's useful to have path to original working directory as a special variable
# read more here: https://hydra.cc/docs/next/tutorials/basic/running_your_app/working_directory
work_dir: ${hydra:runtime.cwd}

# path to folder with data
data_dir: ${work_dir}/data/

# pretty print config at the start of the run using Rich library
print_config: True

# output paths for hydra logs
hydra:
    run:
        dir: logs/runs/${now:%Y-%m-%d}/${now:%H-%M-%S}
    sweep:
        dir: logs/multiruns/${now:%Y-%m-%d_%H-%M-%S}
        subdir: ${hydra.job.num}
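For orientation, the sketch below shows roughly how a Hydra entry point like train.py can consume this composed config. The template's actual train.py does more (seeding, callback and logger setup, config printing), so treat this as a simplified illustration rather than the real file.

```python
# Simplified sketch of a Hydra entry point in the spirit of train.py.
import hydra
from omegaconf import DictConfig


@hydra.main(config_path="configs/", config_name="config")
def main(config: DictConfig):
    # build objects from the `_target_` fields of the composed config
    datamodule = hydra.utils.instantiate(config.datamodule)
    model = hydra.utils.instantiate(config.model)
    trainer = hydra.utils.instantiate(config.trainer)

    trainer.fit(model=model, datamodule=datamodule)


if __name__ == "__main__":
    main()
```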
Location: configs/experiment
You can store many experiment configurations in this folder.
Example experiment configuration:
# to execute this experiment run:
# python train.py +experiment=exp_example_simple
defaults:
    - override /trainer: default_trainer.yaml
    - override /model: mnist_model.yaml
    - override /datamodule: mnist_datamodule.yaml
    - override /callbacks: default_callbacks.yaml
    - override /logger: null

# all parameters below will be merged with parameters from default configurations set above
# this allows you to overwrite only specified parameters

seed: 12345

trainer:
    max_epochs: 10
    gradient_clip_val: 0.5

model:
    lr: 0.001
    lin1_size: 128
    lin2_size: 256
    lin3_size: 64

datamodule:
    batch_size: 64
    train_val_test_split: [55_000, 5_000, 10_000]
More advanced experiment configuration:
# to execute this experiment run:
# python train.py +experiment=exp_example_full
defaults:
    - override /trainer: null
    - override /model: null
    - override /datamodule: null
    - override /callbacks: null
    - override /logger: null

# we override default configurations with nulls to prevent them from loading at all
# instead we define all modules and their paths directly in this config,
# so everything is stored in one place for more readability

seed: 12345

trainer:
    _target_: pytorch_lightning.Trainer
    gpus: 0
    min_epochs: 1
    max_epochs: 10
    gradient_clip_val: 0.5

model:
    _target_: src.models.mnist_model.LitModelMNIST
    optimizer: adam
    lr: 0.001
    weight_decay: 0.00005
    architecture: SimpleDenseNet
    input_size: 784
    lin1_size: 256
    dropout1: 0.30
    lin2_size: 256
    dropout2: 0.25
    lin3_size: 128
    dropout3: 0.20
    output_size: 10

datamodule:
    _target_: src.datamodules.mnist_datamodule.MNISTDataModule
    data_dir: ${data_dir}
    batch_size: 64
    train_val_test_split: [55_000, 5_000, 10_000]
    num_workers: 0
    pin_memory: False

logger:
    wandb:
        tags: ["best_model", "uwu"]
        notes: "Description of this model."
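The `_target_` fields are what turn these config nodes into actual Python objects. Conceptually, instantiating the datamodule node above is roughly equivalent to the direct call below (a sketch; it assumes the class accepts exactly the keyword arguments listed in the config).

```python
# Rough equivalence between a `_target_` config node and a direct constructor call.
from src.datamodules.mnist_datamodule import MNISTDataModule

# hydra.utils.instantiate(config.datamodule) with the config above behaves roughly like:
datamodule = MNISTDataModule(
    data_dir="data/",  # resolved from ${data_dir}
    batch_size=64,
    train_val_test_split=[55_000, 5_000, 10_000],
    num_workers=0,
    pin_memory=False,
)
```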
- Write your PyTorch Lightning model (see mnist_model.py for an example; a minimal skeleton is also sketched below this list)
- Write your PyTorch Lightning datamodule (see mnist_datamodule.py for an example)
- Write your experiment config, containing paths to your model and datamodule (see configs/experiment for examples)
- Run training with chosen experiment config:
python train.py +experiment=experiment_name
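For reference, a minimal LightningModule that fits step 1 might look like the sketch below. This is a generic skeleton, not the template's mnist_model.py.

```python
# Generic LightningModule skeleton (illustrative only).
import torch
from torch import nn
import pytorch_lightning as pl


class LitModel(pl.LightningModule):
    def __init__(self, input_size: int = 784, output_size: int = 10, lr: float = 0.001):
        super().__init__()
        self.save_hyperparameters()  # exposes init args as self.hparams and logs them
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(input_size, output_size))
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, x):
        return self.net(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = self.criterion(self(x), y)
        self.log("train/loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)
```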
Hydra creates a new working directory for every executed run.
By default, logs have the following structure:
│
├── logs
│   ├── runs                            # Folder for logs generated from single runs
│   │   ├── 2021-02-15                      # Date of executing run
│   │   │   ├── 16-50-49                        # Hour of executing run
│   │   │   │   ├── .hydra                          # Hydra logs
│   │   │   │   ├── wandb                           # Weights&Biases logs
│   │   │   │   ├── checkpoints                     # Training checkpoints
│   │   │   │   └── ...                             # Any other thing saved during training
│   │   │   ├── ...
│   │   │   └── ...
│   │   ├── ...
│   │   └── ...
│   │
│   └── multiruns                       # Folder for logs generated from multiruns (sweeps)
│       ├── 2021-02-15_16-50-49             # Date and hour of executing sweep
│       │   ├── 0                               # Job number
│       │   │   ├── .hydra                          # Hydra logs
│       │   │   ├── wandb                           # Weights&Biases logs
│       │   │   ├── checkpoints                     # Training checkpoints
│       │   │   └── ...                             # Any other thing saved during training
│       │   ├── 1
│       │   ├── 2
│       │   └── ...
│       ├── ...
│       └── ...
│
You can change this structure by modifying paths in config.yaml.
PyTorch Lightning provides built-in loggers for Weights&Biases, Neptune, Comet, MLflow, TensorBoard and CSV. To use one of them, simply add its config to configs/logger and run:
python train.py logger=logger_config.yaml
You can use many of them at once (see configs/logger/many_loggers.yaml for example).
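At runtime, a multi-logger config simply results in a list of Lightning logger objects being handed to the Trainer, roughly like this (a sketch; the project name and save path are placeholders):

```python
# Sketch: what a composed multi-logger config boils down to at runtime.
from pytorch_lightning import Trainer
from pytorch_lightning.loggers import CSVLogger, WandbLogger

loggers = [
    WandbLogger(project="your_project_name"),
    CSVLogger(save_dir="logs/csv/"),
]
trainer = Trainer(logger=loggers)
```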
First, install dependencies:
# clone project
git clone https://github.com/YourGithubName/your-repo-name
cd your-repo-name
# optionally create conda environment
conda env create -f conda_env_gpu.yaml -n your_env_name
conda activate your_env_name
# install requirements
pip install -r requirements.txt
Next, you can train the model with the default configuration without logging:
python train.py
Or you can train the model with a chosen logger like Weights&Biases:
# set project and entity names in `configs/logger/wandb.yaml`
wandb:
    project: "your_project_name"
    entity: "your_wandb_team_name"
# train model with Weights&Biases
python train.py logger=wandb
Or you can train the model with a chosen experiment config:
# experiment configurations are placed in folder `configs/experiment/`
python train.py +experiment=exp_example_simple
To execute all experiments from the folder, run:
# execute all experiments from folder `configs/experiment/`
python train.py -m '+experiment=glob(*)'
You can override any parameter from the command line like this:
python train.py trainer.max_epochs=20 model.lr=0.0005
To train on GPU:
python train.py trainer.gpus=1
Attach a callback set to the run:
# callback sets configurations are placed in `configs/callbacks/`
python train.py callbacks=default_callbacks
Combining it all:
python train.py -m '+experiment=glob(*)' trainer.max_epochs=10 logger=wandb
To create a sweep over some hyperparameters, run:
# this will run 6 experiments one after the other,
# each with a different combination of batch_size and learning rate
python train.py -m datamodule.batch_size=32,64,128 model.lr=0.001,0.0005
To sweep with Optuna:
# this will run hyperparameter search defined in `configs/config_optuna.yaml`
python train.py -m --config-name config_optuna.yaml +experiment=exp_example_simple
Resume from checkpoint:
# checkpoint can be either path or URL
# path should be either absolute or prefixed with `${work_dir}/`
# use single quotes '' around the argument, otherwise the $ symbol breaks it
python train.py '+trainer.resume_from_checkpoint=${work_dir}/logs/runs/2021-02-28/16-50-49/checkpoints/last.ckpt'
Optionally, you can install the project as a package with setup.py:
# install from local files
pip install -e .
# or install from git repo
pip install git+https://github.com/YourGithubName/your-repo-name.git --upgrade
This way you can easily import any file into any other file, like so:
from src.models.mnist_model import LitModelMNIST
from src.datamodules.mnist_datamodule import MNISTDataModule
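If you are wondering what such a setup.py boils down to, a minimal version could look like the sketch below; the metadata values are placeholders and the actual file shipped with the template may differ.

```python
# Minimal setup.py sketch; names and versions here are placeholders.
from setuptools import find_packages, setup

setup(
    name="src",
    version="0.0.1",
    description="A PyTorch Lightning + Hydra project",
    packages=find_packages(),
)
```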