Commit

Merge branch 'main' into fix-fast-tests-sayak
sayakpaul authored Dec 23, 2024
2 parents a2211b2 + ea1ba0b commit 490f0d1
Showing 153 changed files with 15,730 additions and 683 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/nightly_tests.yml
@@ -357,6 +357,8 @@ jobs:
        config:
          - backend: "bitsandbytes"
            test_location: "bnb"
          - backend: "gguf"
            test_location: "gguf"
    runs-on:
      group: aws-g6e-xlarge-plus
    container:
3 changes: 2 additions & 1 deletion .github/workflows/push_tests.yml
@@ -173,7 +173,8 @@ jobs:
      group: gcp-ct5lp-hightpu-8t
    container:
      image: diffusers/diffusers-flax-tpu
-      options: --shm-size "16gb" --ipc host --privileged ${{ vars.V5_LITEPOD_8_ENV}} -v /mnt/hf_cache:/mnt/hf_cache defaults:
+      options: --shm-size "16gb" --ipc host --privileged ${{ vars.V5_LITEPOD_8_ENV}} -v /mnt/hf_cache:/mnt/hf_cache
+    defaults:
      run:
        shell: bash
    steps:
2 changes: 1 addition & 1 deletion .github/workflows/push_tests_mps.yml
@@ -46,7 +46,7 @@ jobs:
        shell: arch -arch arm64 bash {0}
        run: |
          ${CONDA_RUN} python -m pip install --upgrade pip uv
-          ${CONDA_RUN} python -m uv pip install -e [quality,test]
+          ${CONDA_RUN} python -m uv pip install -e ".[quality,test]"
          ${CONDA_RUN} python -m uv pip install torch torchvision torchaudio
          ${CONDA_RUN} python -m uv pip install accelerate@git+https://github.com/huggingface/accelerate.git
          ${CONDA_RUN} python -m uv pip install transformers --upgrade
16 changes: 15 additions & 1 deletion docs/source/en/_toctree.yml
@@ -157,6 +157,10 @@
    title: Getting Started
  - local: quantization/bitsandbytes
    title: bitsandbytes
  - local: quantization/gguf
    title: gguf
  - local: quantization/torchao
    title: torchao
  title: Quantization Methods
- sections:
  - local: optimization/fp16
@@ -234,6 +238,8 @@
    title: Textual Inversion
  - local: api/loaders/unet
    title: UNet
  - local: api/loaders/transformer_sd3
    title: SD3Transformer2D
  - local: api/loaders/peft
    title: PEFT
  title: Loaders
@@ -270,6 +276,8 @@
    title: FluxTransformer2DModel
  - local: api/models/hunyuan_transformer2d
    title: HunyuanDiT2DModel
  - local: api/models/hunyuan_video_transformer_3d
    title: HunyuanVideoTransformer3DModel
  - local: api/models/latte_transformer3d
    title: LatteTransformer3DModel
  - local: api/models/lumina_nextdit2d
@@ -316,6 +324,8 @@
    title: AutoencoderKLAllegro
  - local: api/models/autoencoderkl_cogvideox
    title: AutoencoderKLCogVideoX
  - local: api/models/autoencoder_kl_hunyuan_video
    title: AutoencoderKLHunyuanVideo
  - local: api/models/autoencoderkl_ltx_video
    title: AutoencoderKLLTXVideo
  - local: api/models/autoencoderkl_mochi
@@ -392,8 +402,12 @@
    title: DiT
  - local: api/pipelines/flux
    title: Flux
  - local: api/pipelines/control_flux_inpaint
    title: FluxControlInpaint
  - local: api/pipelines/hunyuandit
    title: Hunyuan-DiT
  - local: api/pipelines/hunyuan_video
    title: HunyuanVideo
  - local: api/pipelines/i2vgenxl
    title: I2VGen-XL
  - local: api/pipelines/pix2pix
@@ -415,7 +429,7 @@
  - local: api/pipelines/ledits_pp
    title: LEDITS++
  - local: api/pipelines/ltx_video
-    title: LTX
+    title: LTXVideo
  - local: api/pipelines/lumina
    title: Lumina-T2X
  - local: api/pipelines/marigold
117 changes: 106 additions & 11 deletions docs/source/en/api/attnprocessor.md
@@ -15,40 +15,135 @@ specific language governing permissions and limitations under the License.
An attention processor is a class for applying different types of attention mechanisms.
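
Processors are swapped at the model level. As a minimal sketch (the checkpoint id below is an assumption for illustration, not part of this page), a model is loaded with `from_pretrained` and `set_attn_processor` replaces the processor on every attention layer:

```python
import torch
from diffusers import UNet2DConditionModel
from diffusers.models.attention_processor import AttnProcessor2_0

# Assumed checkpoint id, used only to make the sketch self-contained.
unet = UNet2DConditionModel.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", subfolder="unet", torch_dtype=torch.float16
)

# Swap every attention layer over to the PyTorch 2.0 scaled-dot-product variant.
unet.set_attn_processor(AttnProcessor2_0())
```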

## AttnProcessor

[[autodoc]] models.attention_processor.AttnProcessor

-## AttnProcessor2_0
[[autodoc]] models.attention_processor.AttnProcessor2_0

-## AttnAddedKVProcessor
[[autodoc]] models.attention_processor.AttnAddedKVProcessor

-## AttnAddedKVProcessor2_0
[[autodoc]] models.attention_processor.AttnAddedKVProcessor2_0

[[autodoc]] models.attention_processor.AttnProcessorNPU

[[autodoc]] models.attention_processor.FusedAttnProcessor2_0

## Allegro

[[autodoc]] models.attention_processor.AllegroAttnProcessor2_0

## AuraFlow

[[autodoc]] models.attention_processor.AuraFlowAttnProcessor2_0

[[autodoc]] models.attention_processor.FusedAuraFlowAttnProcessor2_0

## CogVideoX

[[autodoc]] models.attention_processor.CogVideoXAttnProcessor2_0

[[autodoc]] models.attention_processor.FusedCogVideoXAttnProcessor2_0

## CrossFrameAttnProcessor

[[autodoc]] pipelines.text_to_video_synthesis.pipeline_text_to_video_zero.CrossFrameAttnProcessor

-## CustomDiffusionAttnProcessor
+## Custom Diffusion

[[autodoc]] models.attention_processor.CustomDiffusionAttnProcessor

-## CustomDiffusionAttnProcessor2_0
[[autodoc]] models.attention_processor.CustomDiffusionAttnProcessor2_0

-## CustomDiffusionXFormersAttnProcessor
[[autodoc]] models.attention_processor.CustomDiffusionXFormersAttnProcessor

-## FusedAttnProcessor2_0
-[[autodoc]] models.attention_processor.FusedAttnProcessor2_0
## Flux

[[autodoc]] models.attention_processor.FluxAttnProcessor2_0

[[autodoc]] models.attention_processor.FusedFluxAttnProcessor2_0

[[autodoc]] models.attention_processor.FluxSingleAttnProcessor2_0

## Hunyuan

[[autodoc]] models.attention_processor.HunyuanAttnProcessor2_0

[[autodoc]] models.attention_processor.FusedHunyuanAttnProcessor2_0

[[autodoc]] models.attention_processor.PAGHunyuanAttnProcessor2_0

[[autodoc]] models.attention_processor.PAGCFGHunyuanAttnProcessor2_0

## IdentitySelfAttnProcessor2_0

[[autodoc]] models.attention_processor.PAGIdentitySelfAttnProcessor2_0

[[autodoc]] models.attention_processor.PAGCFGIdentitySelfAttnProcessor2_0

## IP-Adapter

[[autodoc]] models.attention_processor.IPAdapterAttnProcessor

[[autodoc]] models.attention_processor.IPAdapterAttnProcessor2_0

[[autodoc]] models.attention_processor.SD3IPAdapterJointAttnProcessor2_0

## JointAttnProcessor2_0

[[autodoc]] models.attention_processor.JointAttnProcessor2_0

[[autodoc]] models.attention_processor.PAGJointAttnProcessor2_0

[[autodoc]] models.attention_processor.PAGCFGJointAttnProcessor2_0

[[autodoc]] models.attention_processor.FusedJointAttnProcessor2_0

## LoRA

[[autodoc]] models.attention_processor.LoRAAttnProcessor

[[autodoc]] models.attention_processor.LoRAAttnProcessor2_0

[[autodoc]] models.attention_processor.LoRAAttnAddedKVProcessor

[[autodoc]] models.attention_processor.LoRAXFormersAttnProcessor

## Lumina-T2X

[[autodoc]] models.attention_processor.LuminaAttnProcessor2_0

## Mochi

[[autodoc]] models.attention_processor.MochiAttnProcessor2_0

[[autodoc]] models.attention_processor.MochiVaeAttnProcessor2_0

## Sana

[[autodoc]] models.attention_processor.SanaLinearAttnProcessor2_0

[[autodoc]] models.attention_processor.SanaMultiscaleAttnProcessor2_0

[[autodoc]] models.attention_processor.PAGCFGSanaLinearAttnProcessor2_0

[[autodoc]] models.attention_processor.PAGIdentitySanaLinearAttnProcessor2_0

## Stable Audio

[[autodoc]] models.attention_processor.StableAudioAttnProcessor2_0

## SlicedAttnProcessor

[[autodoc]] models.attention_processor.SlicedAttnProcessor

-## SlicedAttnAddedKVProcessor
[[autodoc]] models.attention_processor.SlicedAttnAddedKVProcessor

## XFormersAttnProcessor

[[autodoc]] models.attention_processor.XFormersAttnProcessor

-## AttnProcessorNPU
-[[autodoc]] models.attention_processor.AttnProcessorNPU
[[autodoc]] models.attention_processor.XFormersAttnAddedKVProcessor

## XLAFlashAttnProcessor2_0

[[autodoc]] models.attention_processor.XLAFlashAttnProcessor2_0
6 changes: 6 additions & 0 deletions docs/source/en/api/loaders/ip_adapter.md
@@ -24,6 +24,12 @@ Learn how to load an IP-Adapter checkpoint and image in the IP-Adapter [loading]

[[autodoc]] loaders.ip_adapter.IPAdapterMixin

## SD3IPAdapterMixin

[[autodoc]] loaders.ip_adapter.SD3IPAdapterMixin
- all
- is_ip_adapter_active
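
As a hedged sketch of how this mixin is typically exercised (both checkpoint ids below are assumptions for illustration):

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Assumed checkpoint ids; substitute the SD3 variant and IP-Adapter you actually use.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.float16
)
pipe.load_ip_adapter("InstantX/SD3.5-Large-IP-Adapter")
pipe.set_ip_adapter_scale(0.6)

# True once an adapter is loaded and its scale is non-zero.
print(pipe.is_ip_adapter_active)
```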

## IPAdapterMaskProcessor

[[autodoc]] image_processor.IPAdapterMaskProcessor
15 changes: 15 additions & 0 deletions docs/source/en/api/loaders/lora.md
@@ -17,6 +17,9 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi
- [`StableDiffusionLoraLoaderMixin`] provides functions for loading and unloading, fusing and unfusing, enabling and disabling, and otherwise managing LoRA weights. This class can be used with any model.
- [`StableDiffusionXLLoraLoaderMixin`] is a [Stable Diffusion (SDXL)](../../api/pipelines/stable_diffusion/stable_diffusion_xl) version of the [`StableDiffusionLoraLoaderMixin`] class for loading and saving LoRA weights. It can only be used with the SDXL model.
- [`SD3LoraLoaderMixin`] provides similar functions for [Stable Diffusion 3](https://huggingface.co/blog/sd3).
- [`FluxLoraLoaderMixin`] provides similar functions for [Flux](https://huggingface.co/docs/diffusers/main/en/api/pipelines/flux).
- [`CogVideoXLoraLoaderMixin`] provides similar functions for [CogVideoX](https://huggingface.co/docs/diffusers/main/en/api/pipelines/cogvideox).
- [`Mochi1LoraLoaderMixin`] provides similar functions for [Mochi](https://huggingface.co/docs/diffusers/main/en/api/pipelines/mochi).
- [`AmusedLoraLoaderMixin`] is for the [`AmusedPipeline`].
- [`LoraBaseMixin`] provides a base class with several utility methods to fuse, unfuse, and unload LoRAs, and more; a usage sketch follows this list.
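
As a hedged illustration of the shared entry point these mixins expose (the base checkpoint and LoRA ids are assumptions):

```python
import torch
from diffusers import FluxPipeline

# FluxPipeline inherits FluxLoraLoaderMixin, so the calls below look the same
# for the other pipelines listed above. Both ids here are illustrative.
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.load_lora_weights("some-user/some-flux-lora", adapter_name="example")

pipe.fuse_lora()    # optionally merge the LoRA into the base weights
pipe.unfuse_lora()  # and undo the merge again
```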

@@ -38,6 +41,18 @@ To learn more about how to load LoRA weights, see the [LoRA](../../using-diffuse

[[autodoc]] loaders.lora_pipeline.SD3LoraLoaderMixin

## FluxLoraLoaderMixin

[[autodoc]] loaders.lora_pipeline.FluxLoraLoaderMixin

## CogVideoXLoraLoaderMixin

[[autodoc]] loaders.lora_pipeline.CogVideoXLoraLoaderMixin

## Mochi1LoraLoaderMixin

[[autodoc]] loaders.lora_pipeline.Mochi1LoraLoaderMixin

## AmusedLoraLoaderMixin

[[autodoc]] loaders.lora_pipeline.AmusedLoraLoaderMixin
29 changes: 29 additions & 0 deletions docs/source/en/api/loaders/transformer_sd3.md
@@ -0,0 +1,29 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# SD3Transformer2D

This class is useful when *only* loading weights into a [`SD3Transformer2DModel`]. If you need to load weights into the text encoder, or into both a text encoder and the [`SD3Transformer2DModel`], use the [`SD3LoraLoaderMixin`](lora#diffusers.loaders.SD3LoraLoaderMixin) class instead.

The [`SD3Transformer2DLoadersMixin`] class currently only loads IP-Adapter weights, but will be used in the future to save weights and load LoRAs.

<Tip>

To learn more about how to load LoRA weights, see the [LoRA](../../using-diffusers/loading_adapters#lora) loading guide.

</Tip>
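
As a rough sketch of where the mixin sits (the repository id is an assumption, and `_load_ip_adapter_weights` is private API that is normally invoked for you by a pipeline-level [`~loaders.ip_adapter.SD3IPAdapterMixin.load_ip_adapter`] call):

```python
import torch
from diffusers import SD3Transformer2DModel

# Assumed repository id for illustration.
transformer = SD3Transformer2DModel.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers", subfolder="transformer", torch_dtype=torch.float16
)

# SD3Transformer2DModel inherits SD3Transformer2DLoadersMixin; the pipeline's
# load_ip_adapter() builds the IP-Adapter state dict and hands it to the
# private _load_ip_adapter_weights() hook on this mixin.
```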

## SD3Transformer2DLoadersMixin

[[autodoc]] loaders.transformer_sd3.SD3Transformer2DLoadersMixin
- all
- _load_ip_adapter_weights
2 changes: 2 additions & 0 deletions docs/source/en/api/models/autoencoder_dc.md
@@ -29,6 +29,8 @@ The following DCAE models are released and supported in Diffusers.
| [`mit-han-lab/dc-ae-f128c512-in-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-in-1.0-diffusers) | [`mit-han-lab/dc-ae-f128c512-in-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-in-1.0)
| [`mit-han-lab/dc-ae-f128c512-mix-1.0-diffusers`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-mix-1.0-diffusers) | [`mit-han-lab/dc-ae-f128c512-mix-1.0`](https://huggingface.co/mit-han-lab/dc-ae-f128c512-mix-1.0)

This model was contributed by [lawrence-cj](https://github.com/lawrence-cj).

Load a model in Diffusers format with [`~ModelMixin.from_pretrained`].

```python
import torch
from diffusers import AutoencoderDC

# Minimal reconstruction of the folded snippet; the checkpoint id is taken
# from the table above and is illustrative.
ae = AutoencoderDC.from_pretrained(
    "mit-han-lab/dc-ae-f128c512-in-1.0-diffusers", torch_dtype=torch.float32
).to("cuda")
```
32 changes: 32 additions & 0 deletions docs/source/en/api/models/autoencoder_kl_hunyuan_video.md
@@ -0,0 +1,32 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# AutoencoderKLHunyuanVideo

The 3D variational autoencoder (VAE) model with KL loss used in [HunyuanVideo](https://github.com/Tencent/HunyuanVideo/), which was introduced in [HunyuanVideo: A Systematic Framework For Large Video Generative Models](https://huggingface.co/papers/2412.03603) by Tencent.

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import AutoencoderKLHunyuanVideo

vae = AutoencoderKLHunyuanVideo.from_pretrained("hunyuanvideo-community/HunyuanVideo", subfolder="vae", torch_dtype=torch.float16)
```
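
Decoding long videos is memory-intensive. Assuming this class exposes the standard diffusers VAE memory toggles (an assumption, not confirmed by this page), peak memory can be bounded with:

```python
# Assumed API: the usual diffusers VAE memory toggles.
vae.enable_tiling()   # decode in spatial tiles to bound peak memory
vae.enable_slicing()  # process batch elements one at a time
```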

## AutoencoderKLHunyuanVideo

[[autodoc]] AutoencoderKLHunyuanVideo
- decode
- all

## DecoderOutput

[[autodoc]] models.autoencoders.vae.DecoderOutput
2 changes: 1 addition & 1 deletion docs/source/en/api/models/autoencoderkl_ltx_video.md
@@ -18,7 +18,7 @@ The model can be loaded with the following code snippet.
```python
import torch
from diffusers import AutoencoderKLLTXVideo

-vae = AutoencoderKLLTXVideo.from_pretrained("TODO/TODO", subfolder="vae", torch_dtype=torch.float32).to("cuda")
+vae = AutoencoderKLLTXVideo.from_pretrained("Lightricks/LTX-Video", subfolder="vae", torch_dtype=torch.float32).to("cuda")
```

## AutoencoderKLLTXVideo
30 changes: 30 additions & 0 deletions docs/source/en/api/models/hunyuan_video_transformer_3d.md
@@ -0,0 +1,30 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# HunyuanVideoTransformer3DModel

A Diffusion Transformer model for 3D video-like data, introduced in [HunyuanVideo: A Systematic Framework For Large Video Generative Models](https://huggingface.co/papers/2412.03603) by Tencent.

The model can be loaded with the following code snippet.

```python
import torch
from diffusers import HunyuanVideoTransformer3DModel

transformer = HunyuanVideoTransformer3DModel.from_pretrained("hunyuanvideo-community/HunyuanVideo", subfolder="transformer", torch_dtype=torch.bfloat16)
```
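
For context, a minimal end-to-end sketch with [`HunyuanVideoPipeline`] (the generation settings are illustrative, not tuned recommendations):

```python
import torch
from diffusers import HunyuanVideoPipeline
from diffusers.utils import export_to_video

pipe = HunyuanVideoPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo", torch_dtype=torch.float16
)
pipe.vae.enable_tiling()  # keeps VAE decoding within memory on consumer GPUs
pipe.to("cuda")

frames = pipe(prompt="A cat walks on the grass", num_frames=61, num_inference_steps=30).frames[0]
export_to_video(frames, "output.mp4", fps=15)
```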

## HunyuanVideoTransformer3DModel

[[autodoc]] HunyuanVideoTransformer3DModel

## Transformer2DModelOutput

[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
2 changes: 1 addition & 1 deletion docs/source/en/api/models/ltx_video_transformer3d.md
@@ -18,7 +18,7 @@ The model can be loaded with the following code snippet.
```python
import torch
from diffusers import LTXVideoTransformer3DModel

-transformer = LTXVideoTransformer3DModel.from_pretrained("TODO/TODO", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
+transformer = LTXVideoTransformer3DModel.from_pretrained("Lightricks/LTX-Video", subfolder="transformer", torch_dtype=torch.bfloat16).to("cuda")
```

## LTXVideoTransformer3DModel