Multi-GPU Training with PyTorch and TensorFlow

About

This workshop provides demonstrations of multi-GPU training with PyTorch Distributed Data Parallel (DDP) and PyTorch Lightning. Multi-GPU training in TensorFlow is demonstrated using MirroredStrategy.

Setup

Make sure you can run Python on Adroit:

$ ssh <YourNetID>@adroit.princeton.edu  # VPN required if off-campus
$ git clone https://github.com/PrincetonUniversity/multi_gpu_training.git
$ cd multi_gpu_training
$ module load anaconda3/2021.11
(base) $ python --version
Python 3.9.7
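
As an optional follow-up to the version check above, the short script below reports the interpreter version and whether the frameworks named in this workshop can be imported. It is a minimal sketch, not part of the repository: the function name `check_environment`, the minimum version of 3.9 (taken from the output shown above), and the package names `torch` and `tensorflow` are assumptions made for illustration. The frameworks may well be reported as missing in the base environment until a dedicated environment is set up.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 9), packages=("torch", "tensorflow")):
    """Return a dict describing the interpreter and which packages are importable.

    min_version and packages are illustrative defaults (assumptions),
    not requirements stated by the workshop.
    """
    report = {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        "python_ok": sys.version_info[:2] >= tuple(min_version),
    }
    for name in packages:
        # find_spec returns None when a top-level package cannot be found
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Saved as, for example, check_env.py (a hypothetical file name), it can be run with `python check_env.py` after loading the anaconda3 module.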

Getting Help

If you encounter any difficulties with the material in this guide, please send an email to [email protected] or attend a help session.

Authorship

This guide was created by Jonathan Halverson and members of PICSciE and Research Computing.

Repository Contents

01_single_gpu
02_pytorch_ddp
03_pytorch_lightning
04_tensorflow
README.md
speedup_vs_gpus.png
