Questions Regarding Training Costs, Dtype Error on H100, and ControlNet Loss Behavior #111

Open
zyyyz opened this issue Sep 17, 2024 · 2 comments

zyyyz commented Sep 17, 2024

Hi, I’d like to commend you all on this fantastic project—it's truly impressive. I have a few questions and would appreciate any guidance:

  1. Could you provide some details regarding the computational cost of training? Specifically, how much data was used, what type of GPUs were utilized, and how long the training process took?

  2. When following the Accelerate Configuration Example, I encountered an issue when training on a 2× H100 setup. The error message I received was:
    RuntimeError: mat1 and mat2 must have the same dtype, but got Half and BFloat16.
    To resolve this, I had to modify the line dit.to(accelerator.device) (line 108 in train_flux_deepspeed_controlnet.py) to dit.to(accelerator.device, dtype=weight_dtype), after which training proceeded normally. I'm not entirely sure what caused this discrepancy—any insight into the root of the issue?

  3. I'm training ControlNet on a small dataset of around 3,500 images. Throughout training, the loss seems to remain within the range of 0.5-0.6 after 10k steps. Is this behavior typical, or should I be concerned that something might be off?

I really appreciate any help or advice you can offer. Thanks again for the amazing work you're doing!
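
The dtype mismatch described in question 2 can be reproduced in miniature. The sketch below is not taken from the repository: it assumes a toy `nn.Linear` as a stand-in for `dit`, runs on CPU, and uses float32 weights against bfloat16 inputs rather than the Half/BFloat16 pair from the report. The helper name `forward_raises_dtype_error` is hypothetical. The mechanism should be the same, though: `.to(device)` moves the weights without changing their dtype, so any input already cast to `weight_dtype` meets weights of a different dtype in the first matmul.

```python
import torch
from torch import nn


def forward_raises_dtype_error(model: nn.Module, x: torch.Tensor) -> bool:
    """Return True if the forward pass fails with a RuntimeError."""
    try:
        model(x)
    except RuntimeError:
        return True
    return False


device = torch.device("cpu")   # assumption: CPU stands in for accelerator.device
weight_dtype = torch.bfloat16  # assumption: the dtype the training inputs are cast to

dit = nn.Linear(8, 4)          # toy stand-in for the real model; parameters are float32
x = torch.randn(2, 8, device=device, dtype=weight_dtype)

# Original call: moves the module but leaves its parameters in float32.
dit.to(device)
mismatch = forward_raises_dtype_error(dit, x)

# Workaround from the report: move the module and cast its parameters together.
dit.to(device, dtype=weight_dtype)
out = dit(x)
```

After the first call, `mismatch` is `True` because the bfloat16 input hits float32 weights; after the second, the forward pass succeeds and `out` has dtype `weight_dtype`. Which component ended up in which dtype in the actual training run likely depends on the `mixed_precision` setting in the Accelerate config and on the dtype the checkpoint was loaded in, neither of which is shown in this thread.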

bonlime commented Nov 6, 2024

@zyyyz have you been able to successfully train a model using the code from this repo?

@tianqyun111

Is there any new progress? I trained a pose ControlNet on 50,000 images, but at inference, even with the strength set to 1, the generated images show no guidance effect. Can anyone help?
