Smol Imagen

Unofficial implementation of Google's Imagen: "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding"

Imagen is a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding.

Setup

pip install git+https://github.com/jenkspt/smol-imagen.git
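To confirm the install succeeded before importing the model, a quick check along these lines may help. This is only a minimal sketch: the helper `is_installed` is hypothetical and not part of smol-imagen, and it assumes nothing beyond the import name `smol_imagen` shown under Usage.

```python
import importlib.util


def is_installed(package: str = "smol_imagen") -> bool:
    """Return True if a top-level package can be located on the current Python path.

    Hypothetical helper for illustration; not part of smol-imagen.
    """
    return importlib.util.find_spec(package) is not None


if is_installed():
    print("smol_imagen is installed")
else:
    print("smol_imagen not found; run the pip command from Setup")
```

The check only locates the package and does not import it, so it stays fast even if importing the model would be slow.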

Usage

from smol_imagen import ImagenCLIP64

model = ImagenCLIP64()

Acknowledgements

Thanks to the Google TPU Research Cloud (TRC) program for providing TPUs for training.

Citations

@misc{https://doi.org/10.48550/arxiv.2205.11487,
  doi = {10.48550/ARXIV.2205.11487},
  url = {https://arxiv.org/abs/2205.11487},
  author = {Saharia, Chitwan and Chan, William and Saxena, Saurabh and Li, Lala and Whang, Jay and Denton, Emily and Ghasemipour, Seyed Kamyar Seyed and Ayan, Burcu Karagol and Mahdavi, S. Sara and Lopes, Rapha Gontijo and Salimans, Tim and Ho, Jonathan and Fleet, David J and Norouzi, Mohammad},
  keywords = {Computer Vision and Pattern Recognition (cs.CV), Machine Learning (cs.LG), FOS: Computer and information sciences},
  title = {Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}
