
Can't train jina-clip-v2 due to CUDA out of memory error #3113

Open
httplups opened this issue Dec 3, 2024 · 1 comment


httplups commented Dec 3, 2024

Hi guys,
I am trying to fine-tune the CLIP model jina-clip-v2 on my own image-sentence pairs.

I have an iterable dataset.

I am running on Colab with 40GB of GPU RAM.
The loaded model occupies only 2GB.

When I start training with my iterable dataset, GPU memory usage exceeds 40GB.

Things I tried:
- Using the truncated model with only 64-d embeddings
- FP16 precision
- Batch size of 1
- Gradient accumulation of 4

Even so, I still cannot train.
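Roughly, that setup looks like this (a minimal sketch assuming the SentenceTransformerTrainingArguments API from sentence-transformers v3; output_dir is a placeholder):

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="checkpoints",       # placeholder path
    per_device_train_batch_size=1,  # batch size of 1
    gradient_accumulation_steps=4,  # accumulate gradients over 4 steps
    fp16=True,                      # FP16 mixed precision
)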

Can anyone help?

Here is the link to the Jupyter notebook on Colab:

https://colab.research.google.com/drive/1sBhTSNSsZtTOli89t4HV4-T5kDkMfegx?usp=sharing

@tomaarsen
Collaborator

Hello!

Hmm, that is not great. I've noticed that per_device_train_batch_size in the TrainingArguments is commented out, so it's not actually training with a batch size of 1 (though I assume you did try training with that batch size before and commented it out once it didn't work).

Also - the truncate_dim is just post-processing, so the base model still runs at its normal hidden size, probably 1024.
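For example (a minimal sketch; truncate_dim only slices the returned embedding after the full forward pass, so it saves no activation memory):

from sentence_transformers import SentenceTransformer

# Only the output is trimmed to 64 dims; the full-width model still
# runs the forward pass, so memory usage matches the untruncated model.
model = SentenceTransformer("jinaai/jina-clip-v2", trust_remote_code=True, truncate_dim=64)
embedding = model.encode("a test sentence")
print(embedding.shape)  # (64,)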

One option is to load the model itself in fp16 immediately:

import torch

model = SentenceTransformer(model_name, device='cuda', trust_remote_code=True, model_kwargs={"torch_dtype": torch.float16, "device_map": "auto"})

Beyond that, your script looks totally normal. I'm a bit surprised at the very high memory usage.


I did some more digging:

  1. jina-clip seems to be automatically loaded in bf16, whereas we train in fp16. This should have resulted in an immediate crash. (A quick dtype check is sketched after this list.)
  2. Flash Attention seems to be required, but you don't have triton installed, which is also required for flash-attn (granted, maybe you installed it in a cell and then removed it), so you should have gotten an error there too.
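
A quick way to verify the load dtype (a sketch, assuming model is the SentenceTransformer instance from above):

import torch

# Check which dtype the checkpoint actually loaded in (e.g. torch.bfloat16).
print(next(model.parameters()).dtype)

# Cast explicitly so the weights match the precision you train in.
model = model.to(torch.float32)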

Another common issue with notebooks is that old variables can be kept in memory even after rerunning a cell.

I'm struggling to get this to train well with fp16; there are lots of complaints about Half precision, or about Half and Float not matching, etc.

Having said that, I'm able to train with full precision by casting the model to fp32 and then setting both fp16=False and bf16=False, with 1 batch size, on a 15GB T4:
[screenshots of the training run omitted]
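
Sketched out, that workaround looks roughly like this (same TrainingArguments names as above; output_dir is a placeholder):

import torch
from sentence_transformers import SentenceTransformerTrainingArguments

model = model.to(torch.float32)  # cast the bf16-loaded weights up to fp32

args = SentenceTransformerTrainingArguments(
    output_dir="checkpoints",       # placeholder path
    per_device_train_batch_size=1,
    fp16=False,                     # disable both mixed-precision modes
    bf16=False,
)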

If you can figure out how to train in fp16/bf16, then the memory usage would be much lower. My guess is that there was some issue with variables persisting.
Here's my slightly updated copy: https://colab.research.google.com/drive/1g-HsvDBvtOqDEuFOibe4ORtACKRy5AlD?usp=sharing

- Tom Aarsen
