This project explores the feature-level differences between DiT-based and U-Net-based diffusion models. We find that DiT-based diffusion models maintain consistent feature scales across layers, while U-Net-based models exhibit significant changes in feature scale and resolution from layer to layer.
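This contrast can be checked directly by logging per-block output shapes with forward hooks. The sketch below is a minimal illustration (assuming PyTorch): the toy modules are hypothetical stand-ins for DiT transformer blocks and U-Net encoder stages, not the actual models from diffusers.

```python
import torch
import torch.nn as nn

def layer_output_shapes(model, x):
    """Run one forward pass and record each child module's output shape."""
    shapes = []
    hooks = [
        m.register_forward_hook(lambda mod, inp, out: shapes.append(tuple(out.shape)))
        for m in model
    ]
    with torch.no_grad():
        model(x)
    for h in hooks:
        h.remove()
    return shapes

# DiT-like: every transformer block keeps the (batch, tokens, dim) shape.
dit_like = nn.Sequential(*[nn.Linear(64, 64) for _ in range(3)])

# U-Net-like encoder: strided convs halve the spatial resolution each stage.
unet_like = nn.Sequential(
    nn.Conv2d(3, 8, 3, stride=2, padding=1),
    nn.Conv2d(8, 16, 3, stride=2, padding=1),
    nn.Conv2d(16, 32, 3, stride=2, padding=1),
)

print(layer_output_shapes(dit_like, torch.randn(1, 256, 64)))
# constant shape at every block: (1, 256, 64)
print(layer_output_shapes(unet_like, torch.randn(1, 3, 32, 32)))
# shrinking shapes: (1, 8, 16, 16), (1, 16, 8, 8), (1, 32, 4, 4)
```

The same hook-based probe applied to real diffusers checkpoints is what motivates treating DiT features uniformly across depth, whereas U-Net features must be handled per-resolution.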
I'm working on accelerating DiT training and inference through feature- and token-level compression, so don't hesitate to contact me if you're interested in collaborating on high-impact open-source projects!
The project utilizes code from the following repositories:
If you use this project in your research, please cite the following:
@misc{guo2024dit,
author = {Qin Guo and Dongxu Yue},
title = {DiT-Visualization},
year = {2024},
howpublished = {\url{https://github.com/guoqincode/DiT-Visualization}},
note = {Exploring the differences between DiT-based and Unet-based diffusion models in feature aspects using code from diffusers, Plug-and-Play, and PixArt}
}