
Memory Overflow Issue During Fine-Tuning with New Surface Variable #51

ShileiCao opened this issue Nov 11, 2024 · 1 comment

@ShileiCao

Thank you for the impressive work on Aurora! I am currently fine-tuning Aurora on a new variable. After adding a surface variable, I run into an out-of-memory error on an A800 GPU (80 GB) during the backpropagation step. Could you please advise on how to resolve this issue?

Would using the Low-Rank Adaptation (LoRA) method mentioned in the paper be a recommended approach? Specifically, does this mean freezing the other parameters in the backbone and fine-tuning only the encoder, LoRA layers, and decoder?

Thank you very much for your assistance!

@wesselb
Contributor

wesselb commented Dec 3, 2024

Hey @ShileiCao! Thank you for your kind words. :)

Have you taken a look at this page from the documentation? It outlines how to configure activation checkpointing, which is necessary to keep memory usage under control. With that, you should be able to fine-tune on a new variable.
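For reference, a minimal setup following the fine-tuning documentation looks roughly like the sketch below. The checkpoint name is an example from the README, so adjust it to the model variant you are actually using:

```python
from aurora import Aurora

# Instantiate the model and load pretrained weights.
model = Aurora(use_lora=False)
model.load_checkpoint("microsoft/aurora", "aurora-0.25-pretrained.ckpt")

# Enable activation checkpointing: intermediate activations are recomputed
# during the backward pass instead of being stored, trading compute for memory.
model.configure_activation_checkpointing()
```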

LoRA will unfortunately not make much of a difference in memory usage, so I don't think it would help here. For Aurora, LoRA is mainly used to reduce overfitting during roll-out fine-tuning.
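If you still want to experiment with freezing the backbone, the usual PyTorch pattern is to toggle `requires_grad` by parameter name. This is a toy sketch — the module names below are stand-ins, not Aurora's actual attribute names — and note that freezing mainly shrinks gradient and optimizer-state memory; activations for the backbone are still stored, since they are needed to backpropagate into the encoder:

```python
from torch import nn

# Toy stand-in for an encoder/backbone/decoder split (hypothetical names).
model = nn.ModuleDict({
    "encoder": nn.Linear(8, 16),
    "backbone": nn.Sequential(nn.Linear(16, 16), nn.Linear(16, 16)),
    "decoder": nn.Linear(16, 4),
})

# Freeze every backbone parameter; encoder and decoder stay trainable.
for name, param in model.named_parameters():
    param.requires_grad = not name.startswith("backbone")

# Collect the top-level modules that still have trainable parameters.
trainable = sorted({n.split(".")[0]
                    for n, p in model.named_parameters() if p.requires_grad})
print(trainable)  # ['decoder', 'encoder']
```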
