
Can't train jina-clip-v2 due to CUDA out of memory error #3113

Open
httplups opened this issue Dec 3, 2024 · 1 comment


httplups commented Dec 3, 2024

Hi guys,
I am trying to fine-tune the CLIP model jina-clip-v2 on my own image-sentence pairs.

I have an iterable dataset.

I am running on Colab with 40GB of GPU RAM.
The loaded model occupies only 2GB.

When I start training with my iterable dataset, GPU memory usage exceeds 40GB.

Things I tried:
- Using the truncated model with only 64-d embeddings
- FP16 precision
- Batch size of 1
- Gradient accumulation of 4

Even so, I still cannot train.
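Roughly, that setup looks like this (a minimal sketch assuming the SentenceTransformerTrainingArguments API from sentence-transformers v3; output_dir is a placeholder):

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="checkpoints",       # placeholder path
    per_device_train_batch_size=1,  # batch size of 1
    gradient_accumulation_steps=4,  # accumulate gradients over 4 steps
    fp16=True,                      # FP16 mixed precision
)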

Can anyone help?

Here is the link to the Jupyter notebook on Colab:

https://colab.research.google.com/drive/1sBhTSNSsZtTOli89t4HV4-T5kDkMfegx?usp=sharing

@tomaarsen
Collaborator

Hello!

Hmm, that is not great. I've noticed that per_device_train_batch_size in the TrainingArguments is commented out, so it's not actually training with a batch size of 1 (though I assume you did try training with that batch size before and commented it out once it didn't work).

Also - the truncate_dim is just post-processing, so the base model still runs at its normal hidden size, probably 1024.
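For example (a minimal sketch; truncate_dim only slices the returned embedding after the full forward pass, so it saves no activation memory):

from sentence_transformers import SentenceTransformer

# Only the output is trimmed to 64 dims; the full-width model still
# runs the forward pass, so memory usage matches the untruncated model.
model = SentenceTransformer("jinaai/jina-clip-v2", trust_remote_code=True, truncate_dim=64)
embedding = model.encode("a test sentence")
print(embedding.shape)  # (64,)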

One option is to load the model itself in fp16 immediately:

import torch

model = SentenceTransformer(model_name, device='cuda', trust_remote_code=True, model_kwargs={"torch_dtype": torch.float16, "device_map": "auto"})

Beyond that, your script looks totally normal. I'm a bit surprised at the very high memory usage.


I did some more digging:

  1. jina-clip seems to be automatically loaded in bf16, whereas we train in fp16. This should have resulted in an immediate crash. (A quick dtype check is sketched after this list.)
  2. Flash Attention seems to be required, but you don't have triton installed, which is also required for flash-attn (granted, maybe you installed it in a cell and then removed it), so you should have gotten an error there too.
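
A quick way to verify the load dtype (a sketch, assuming model is the SentenceTransformer instance from above):

import torch

# Check which dtype the checkpoint actually loaded in (e.g. torch.bfloat16).
print(next(model.parameters()).dtype)

# Cast explicitly so the weights match the precision you train in.
model = model.to(torch.float32)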

Another common issue with notebooks is that old variables can be kept in memory even after rerunning a cell.

I'm struggling to get this to train well with fp16; there are lots of complaints about Half precision, or about Half and Float not matching, etc.

Having said that, I'm able to train with full precision by casting the model to fp32 and then setting both fp16=False and bf16=False, with 1 batch size, on a 15GB T4:
[screenshots of the training run omitted]
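
Sketched out, that workaround looks roughly like this (same TrainingArguments names as above; output_dir is a placeholder):

import torch
from sentence_transformers import SentenceTransformerTrainingArguments

model = model.to(torch.float32)  # cast the bf16-loaded weights up to fp32

args = SentenceTransformerTrainingArguments(
    output_dir="checkpoints",       # placeholder path
    per_device_train_batch_size=1,
    fp16=False,                     # disable both mixed-precision modes
    bf16=False,
)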

If you can figure out how to train in fp16/bf16, then the memory usage would be much lower. My guess is that there was some issue with variables persisting.
Here's my slightly updated copy: https://colab.research.google.com/drive/1g-HsvDBvtOqDEuFOibe4ORtACKRy5AlD?usp=sharing

- Tom Aarsen
