Optimizing a Deep Neural Network (DNN) training program

This repository contains material on how to profile and optimize a simple PyTorch MNIST training program using NVIDIA Nsight Systems and the PyTorch Profiler. It covers the following topics; a minimal profiling sketch follows the list.

  • NVIDIA Nsight Systems
  • PyTorch Profiler with TensorBoard Plugin
  • TensorBoard Visualization
  • Optimization Techniques
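
To give a sense of what the PyTorch Profiler with the TensorBoard plugin looks like in practice, here is a minimal sketch. The tiny model, random data, and the ./log/mnist output directory are illustrative placeholders, not the lab's actual MNIST code.

  # Minimal PyTorch Profiler + TensorBoard plugin sketch.
  # The model and data below are placeholders, not the lab's MNIST code.
  import torch
  import torch.nn as nn
  from torch.profiler import (ProfilerActivity, profile, schedule,
                              tensorboard_trace_handler)

  device = "cuda" if torch.cuda.is_available() else "cpu"
  model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device)
  optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
  criterion = nn.CrossEntropyLoss()

  # Profile CPU activity, and CUDA activity when a GPU is present.
  activities = [ProfilerActivity.CPU]
  if torch.cuda.is_available():
      activities.append(ProfilerActivity.CUDA)

  with profile(
      activities=activities,
      schedule=schedule(wait=1, warmup=1, active=3, repeat=1),
      on_trace_ready=tensorboard_trace_handler("./log/mnist"),
      record_shapes=True,
      profile_memory=True,
  ) as prof:
      for step in range(10):
          data = torch.randn(64, 1, 28, 28, device=device)     # dummy batch
          target = torch.randint(0, 10, (64,), device=device)  # dummy labels
          optimizer.zero_grad()
          loss = criterion(model(data), target)
          loss.backward()
          optimizer.step()
          prof.step()  # tell the profiler a training step has finished

Once the torch_tb_profiler plugin is installed (see the local-machine setup below), the resulting trace can be viewed with tensorboard --logdir=./log/mnist.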

Prerequisites

To run this tutorial you will need a machine with an NVIDIA GPU.

  • Install the latest Docker or Singularity.
  • To view the profiler output, download the latest version of NVIDIA Nsight Systems from here.
  • A Linux (Ubuntu) OS.

Running on containers

To start with, you will have to build a Docker or Singularity container.

Docker Container

To build a Docker container, run: sudo docker build --network=host -t <imagename>:<tagnumber> .

For instance: sudo docker build -t pytorch:1.0 .

The code labs are written as Jupyter notebooks, and a Dockerfile is provided to simplify deployment. To serve the Docker instance for a student, port 8888 must be exposed from the container. For instance, the following command exposes port 8888 inside the container as port 8888 on the lab machine:

sudo docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -it --rm --network=host -v ~/ai_profiler/workspace:/workspace pytorch:1.0 jupyter-lab --no-browser --allow-root --ip=0.0.0.0 --port=8888 --NotebookApp.token="" --notebook-dir=/workspace

The --gpus flag enables all NVIDIA GPUs during container runtime. The --rm flag cleans up any temporary images created while running the container. The -it flag makes it possible to kill the Jupyter server with ctrl-c.

The --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 flags allow sufficient memory allocation for running PyTorch within the Docker environment.

The jupyter-lab --no-browser --allow-root --ip=0.0.0.0 --port=8888 --NotebookApp.token="" --notebook-dir=/workspace command launches JupyterLab inside the container. The -v flag maps the working directory on your local machine (~/ai_profiler/workspace) to the /workspace directory inside the container.

Then, open Jupyter in your browser at http://localhost:8888 and start the lab by clicking on the start_here.ipynb notebook.

Singularity Container

To build the Singularity container, run: sudo singularity build --fakeroot <image_name>.simg Singularity

For example: singularity build --fakeroot pytorch.simg Singularity

Then, run the container: singularity run --nv --bind ~/ai_profiler/workspace:/workspace pytorch.simg jupyter-lab --no-browser --allow-root --ip=0.0.0.0 --port=8888 --NotebookApp.token="" --notebook-dir=/workspace

The --nv flag enables all NVIDIA GPUs during container runtime. The --bind flag maps the working directory on your local machine (~/ai_profiler/workspace) to the /workspace directory inside the container.

Then, open Jupyter in your browser at http://localhost:8888 and start the lab by clicking on the Start_Here.ipynb notebook.

Running on Local Machine

  • Install PyTorch here
  • Install essentials:
    pip3 install jupyterlab
    pip3 install ipywidgets
    pip3 install torch_tb_profiler
  • Install NVIDIA Nsight Systems version 2022.1.1 from here and add it to your PATH. Run nsys --version from the terminal to confirm you are using version 2022.1.1 or above (an example profiling command follows this list).
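
As a quick check after installing, a training script can be profiled with Nsight Systems from the command line, for example: nsys profile -t cuda,nvtx,osrt -o mnist_profile python3 <training_script>.py (the script name is a placeholder; the lab notebooks walk through the profiling workflow in detail).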

Tutorial Duration

The complete bootcamp material takes approximately 2 hours to work through.
