
Memory Overflow Issue During Fine-Tuning with New Surface Variable #51

ShileiCao opened this issue Nov 11, 2024 · 1 comment

@ShileiCao

Thank you for the impressive work on Aurora! I am currently fine-tuning Aurora on a new variable. After adding a surface variable, I run into an out-of-memory error on an A800 GPU (80 GB) during the backpropagation step. Could you please advise on how to resolve this issue?

Would using the Low-Rank Adaptation (LoRA) method mentioned in the paper be a recommended approach? Specifically, does this mean freezing the other parameters in the backbone and fine-tuning only the encoder, LoRA layers, and decoder?

Thank you very much for your assistance!

@wesselb
Contributor

wesselb commented Dec 3, 2024

Hey @ShileiCao! Thank you for your kind words. :)

Have you taken a look at this page from the documentation? It outlines how to configure activation checkpointing, which is necessary to keep memory usage under control. With that, you should be able to fine-tune on a new variable.
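For reference, a minimal setup following the fine-tuning documentation looks roughly like the sketch below. The checkpoint name is an example from the README, so adjust it to the model variant you are actually using:

```python
from aurora import Aurora

# Instantiate the model and load pretrained weights.
model = Aurora(use_lora=False)
model.load_checkpoint("microsoft/aurora", "aurora-0.25-pretrained.ckpt")

# Enable activation checkpointing: intermediate activations are recomputed
# during the backward pass instead of being stored, trading compute for memory.
model.configure_activation_checkpointing()
```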

LoRA will unfortunately not make much of a difference in memory usage, so I don't think it would help here. For Aurora, LoRA is mainly used to reduce overfitting during roll-out fine-tuning.
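If you still want to experiment with freezing the backbone, the usual PyTorch pattern is to toggle `requires_grad` by parameter name. This is a toy sketch — the module names below are stand-ins, not Aurora's actual attribute names — and note that freezing mainly shrinks gradient and optimizer-state memory; activations for the backbone are still stored, since they are needed to backpropagate into the encoder:

```python
from torch import nn

# Toy stand-in for an encoder/backbone/decoder split (hypothetical names).
model = nn.ModuleDict({
    "encoder": nn.Linear(8, 16),
    "backbone": nn.Sequential(nn.Linear(16, 16), nn.Linear(16, 16)),
    "decoder": nn.Linear(16, 4),
})

# Freeze every backbone parameter; encoder and decoder stay trainable.
for name, param in model.named_parameters():
    param.requires_grad = not name.startswith("backbone")

# Collect the top-level modules that still have trainable parameters.
trainable = sorted({n.split(".")[0]
                    for n, p in model.named_parameters() if p.requires_grad})
print(trainable)  # ['decoder', 'encoder']
```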
