In this project, an inference framework using Tensor Cores and code generation is developed to show how kernels can be fused for a fully connected network with flexible hidden channels and an arbitrary activation function. The fused kernel is not precompiled; it is generated from the PyTorch model provided by the user. Experiments analyze how GEMM patterns, memory usage, and data flow affect performance.
- Title: Automatic Code Generation for Kernel Fusion
- Authors: Shi, Da
- Supervisor: Weiss, Sebastian
- Technical University of Munich
Required environment:
- NVIDIA RTX GPU, e.g. RTX 20xx or RTX 30xx (we use an RTX 2080 Ti)
- CUDA 11
- Python 3.8 or higher, see `environment.txt` for the required packages
Tested systems:
- Ubuntu 20.04, gcc 9.4.0, CUDA 11.5, Python 3.8, PyTorch 1.9.1
Source Codes:

```bash
conda create -n py38torch19 python=3.8
conda activate py38torch19
git clone --recursive https://github.com/DaShi-Git/masterThesis.git
cd masterThesis
pip install -r environment.txt
```
If `torch==1.9.1+cu111` cannot be found, `torch==1.9.1` is an alternative.
Source Codes:

```bash
cd project  # go to masterThesis/project
python setup.py install
```
This step compiles the binding function between Python and the CUDA launch instruction. The concrete kernel function is not compiled here: since the fused kernel is implemented in a `.cuh` file, it is compiled only after the model structure is known.
A new package called `matmul-cuda` will be installed into the current conda environment.
After installation, the user can call functions in the new package `matmul-cuda` by importing it in a Python file. Calling `matmul_cuda.evaluate_flexible_MLP(*params)` with the corresponding parameters runs inference on the provided model and input batches. A user interface is provided so that a model trained with the PyTorch framework can run faster inference on this project, as sketched below.
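A minimal sketch of such a call. The exact parameter list is defined by the binding in `binding_flexible_MLP.cpp`, so the argument names, shapes, and activation string below are illustrative assumptions rather than the authoritative signature:

```python
import torch
import matmul_cuda  # installed above by `python setup.py install`

# Illustrative only: the real signature of evaluate_flexible_MLP is
# defined in binding_flexible_MLP.cpp; these arguments are assumed.
inputs = torch.randn(32, 16, device="cuda", dtype=torch.float16)  # one input batch
weights = [torch.randn(48, 16, device="cuda", dtype=torch.float16),
           torch.randn(16, 48, device="cuda", dtype=torch.float16)]
params = (inputs, weights, "relu")  # model weights plus the activation

output = matmul_cuda.evaluate_flexible_MLP(*params)
```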
`designModel/train_model.py` designs a PyTorch fully-connected model and saves its structure and parameters in the `models` directory. The model checkpoint is large, so it is not included in the GitHub repository. An interface can load the model and feed it to the inference framework; see the experiments.
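As a rough sketch of the kind of model this framework targets (the actual definition lives in `designModel/train_model.py`; the layer sizes, activation, and checkpoint path here are illustrative assumptions):

```python
import torch
import torch.nn as nn

# A fully connected network with flexible hidden channels and a
# user-chosen activation, matching the class of models the framework fuses.
# Sizes and the save path are illustrative, not taken from train_model.py.
model = nn.Sequential(
    nn.Linear(16, 48), nn.ReLU(),
    nn.Linear(48, 48), nn.ReLU(),
    nn.Linear(48, 16),
)
# Pickling the whole module keeps both structure and parameters.
torch.save(model, "models/mlp_checkpoint.pth")
```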
To perform the evaluation, see the experiments in the `experiments` directory.
The binding function is `binding_flexible_MLP.cpp`. The kernel `MLPFlexible_shuffle.cuh` performs data shuffling between fragments. The `MLPFlexible_32batches.cuh` kernel and its batch-size variants handle different batch sizes with shared memory.
After the model structure is defined, a new CUDA kernel is generated by calling `matmul_cuda.evaluate_flexible_MLP(*params)`; the generated kernel is stored in `project/kernel_cache`.
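To confirm that code generation ran, one can inspect the cache directory after the first call (a trivial check, assuming the working directory is the repository root):

```python
import os

# The generated kernel source appears here after the first call
# to matmul_cuda.evaluate_flexible_MLP(*params).
print(os.listdir("project/kernel_cache"))
```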
Source Codes:

```bash
python experiments/evaluation/evaluation_flexible_MLP6_flex_hidden_channel.py
```
It reports the kernel run time, the correctness, and the activation function designed by the user. Note that if there are too many input samples, the correctness check takes a long time; prefer fewer input samples when the correctness check is enabled.
Different kernels can be evaluated by copying their implementations to `project/MLPFlexible.cuh`. For example, the implementation of the data-shuffling kernel is in `project/MLPFlexible_shuffle.cuh`; after copying it to `project/MLPFlexible.cuh`, the result can be evaluated by running `experiments/evaluation/evaluation_flexible_MLP6_flex_hidden_channel.py`, as in the sketch below.
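A small helper that automates this copy-and-run workflow (a convenience sketch, not part of the repository; it assumes it is run from the repository root):

```python
import shutil
import subprocess

# Swap in the data-shuffling kernel, then rerun the evaluation script.
shutil.copyfile("project/MLPFlexible_shuffle.cuh", "project/MLPFlexible.cuh")
subprocess.run(
    ["python",
     "experiments/evaluation/evaluation_flexible_MLP6_flex_hidden_channel.py"],
    check=True,
)
```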