xDiT is HunyuanVideo's official parallel inference engine. On H100 and H20 GPUs, xDiT reduces the generation time of 1280x720 videos from 31 minutes to 5 minutes, and 960x960 videos from 28 minutes to 6 minutes.
The H100 and H20 benchmarks were run with the official HunyuanVideo repository. The L20 benchmarks were run with the diffusers implementation, using this script together with flash-attn==2.7.2.post1 and CUDA 12.4.

Generation latency in seconds for the 720-pixel-high videos:

| GPU Type | 1 GPU | 2 GPUs | 4 GPUs | 8 GPUs |
|---|---|---|---|---|
| H100 | 1,904.08 | 925.04 | 514.08 | 337.58 |
| H20 | 6,639.17 | 3,400.55 | 1,762.86 | 940.97 |
| L20 | 6,043.88 | 3,271.44 | 2,080.05 | - |

Generation latency in seconds for the 960x960 videos:

| GPU Type | 1 GPU | 2 GPUs | 3 GPUs | 6 GPUs |
|---|---|---|---|---|
| H100 | 1,735.01 | 934.09 | 645.45 | 367.02 |
| H20 | 6,621.46 | 3,400.55 | 2,310.48 | 1,214.67 |
| L20 | 6,039.08 | 3,260.62 | 2,284.74 | - |
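
To make the scaling behavior in the two tables easier to compare, the sketch below derives speedup and parallel efficiency from the reported latencies. These derived metrics are not part of the original benchmarks. The names `TABLES` and `scaling` are hypothetical helpers introduced here for illustration, and the values are copied from the tables as seconds, with the missing L20 entries left out.

```python
# Latencies in seconds, copied from the two tables above.
# Inner dicts map GPU count -> latency. Missing L20 cells are omitted.
TABLES = {
    "first table": {
        "H100": {1: 1904.08, 2: 925.04, 4: 514.08, 8: 337.58},
        "H20": {1: 6639.17, 2: 3400.55, 4: 1762.86, 8: 940.97},
        "L20": {1: 6043.88, 2: 3271.44, 4: 2080.05},
    },
    "second table": {
        "H100": {1: 1735.01, 2: 934.09, 3: 645.45, 6: 367.02},
        "H20": {1: 6621.46, 2: 3400.55, 3: 2310.48, 6: 1214.67},
        "L20": {1: 6039.08, 2: 3260.62, 3: 2284.74},
    },
}


def scaling(latencies):
    """Return {gpu_count: (speedup, efficiency)} relative to the 1-GPU run.

    speedup    = single-GPU latency / multi-GPU latency
    efficiency = speedup / gpu_count (1.0 would be perfect linear scaling)
    """
    base = latencies[1]
    return {
        n: (base / t, base / t / n)
        for n, t in sorted(latencies.items())
    }


for table_name, gpus in TABLES.items():
    print(table_name)
    for gpu_type, latencies in gpus.items():
        for n, (speedup, efficiency) in scaling(latencies).items():
            print(
                f"  {gpu_type} x{n}: "
                f"speedup {speedup:.2f}, efficiency {efficiency:.0%}"
            )
```

Read this way, the H100 reaches a speedup of roughly 5.6x on 8 GPUs in the first table, while the H20 scales closer to linearly at roughly 7.1x on the same GPU count.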