Hi, I’d like to commend you all on this fantastic project—it's truly impressive. I have a few questions and would appreciate any guidance:
Could you provide some details regarding the computational cost of training? Specifically, how much data was used, what type of GPUs were utilized, and how long the training process took?
When following the Accelerate Configuration Example, I ran into an issue while training on a 2× H100 setup. The error was: `RuntimeError: mat1 and mat2 must have the same dtype, but got Half and BFloat16`.
To resolve this, I changed `dit.to(accelerator.device)` (line 108 in `train_flux_deepspeed_controlnet.py`) to `dit.to(accelerator.device, dtype=weight_dtype)`, after which training proceeded normally. I'm not entirely sure what caused the dtype mismatch; any insight into the root of the issue?
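For clarity, the change looks roughly like this (a sketch, not an exact patch; `weight_dtype` is whatever mixed-precision dtype the script already uses for the other models):

```python
# train_flux_deepspeed_controlnet.py, around line 108
# Original: the DiT is moved to the device without an explicit dtype, so one
# matmul operand ends up as fp16 (Half) while the other is bf16 (BFloat16).
# dit.to(accelerator.device)

# My change: cast the DiT to the same weight_dtype as the other models so the
# matmul operands agree.
dit.to(accelerator.device, dtype=weight_dtype)
```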
I'm training ControlNet on a small dataset of around 3,500 images. The loss stays in the 0.5-0.6 range even after 10k steps. Is this typical, or should I be concerned that something is off?
I really appreciate any help or advice you can offer. Thanks again for the amazing work you're doing!
Is there any new progress? I trained a pose ControlNet with 50,000 images, but at inference, even with strength set to 1, the control image has no guiding effect on the output. Can anyone help?
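For context, my understanding of how strength should behave is roughly the sketch below; the names are illustrative and not the actual x-flux code. The ControlNet produces per-block residuals that are scaled by strength and added to the DiT hidden states, so strength=1 should add them at full scale.

```python
from typing import Callable, Sequence
import torch

def run_dit_with_controlnet(
    blocks: Sequence[Callable[[torch.Tensor], torch.Tensor]],
    hidden: torch.Tensor,
    controlnet_residuals: Sequence[torch.Tensor],
    strength: float = 1.0,
) -> torch.Tensor:
    # Hypothetical sketch, not the project's actual API: each DiT block's
    # output gets the corresponding ControlNet residual added, scaled by
    # `strength`. strength=0 disables guidance; strength=1 adds the
    # residuals at full scale.
    for block, res in zip(blocks, controlnet_residuals):
        hidden = block(hidden)
        hidden = hidden + strength * res
    return hidden
```

If that is how it works here, my guess is that an output that stays unguided at strength=1 means the residuals themselves are close to zero (e.g. an undertrained ControlNet) rather than the strength being ignored, but I'd appreciate confirmation.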