
An RTX 4080 can barely handle any data: training errors out past 10,000 samples #54

Open

iissy opened this issue Jul 24, 2024 · 6 comments
iissy commented Jul 24, 2024

I've already shrunk the config file:

```python
class T5ModelConfig:
    d_ff: int = 1024              # feed-forward layer dimension
    d_model: int = 512            # embedding dimension
    num_heads: int = 8            # number of attention heads; d_model // num_heads == d_kv
    d_kv: int = 64                # d_model // num_heads
    num_decoder_layers: int = 6   # number of Transformer decoder layers
    num_layers: int = 6           # number of Transformer encoder layers
```

The vocabulary is only 10,000 tokens. The Baidu Baike corpus has millions of records, but I can only train on a few thousand of them; anything more throws an error.
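For scale, here is a minimal sketch that rebuilds an equivalent model with the stock Hugging Face T5 classes and prints its parameter count (an assumption: the project may wrap T5 differently, so this only approximates the model above):

```python
from transformers import T5Config, T5ForConditionalGeneration

# Reduced dimensions from the config above, plus the 10,000-token vocabulary.
config = T5Config(
    vocab_size=10000,
    d_ff=1024,
    d_model=512,
    num_heads=8,
    d_kv=64,            # must stay equal to d_model // num_heads
    num_layers=6,
    num_decoder_layers=6,
)
model = T5ForConditionalGeneration(config)
print(f"{model.num_parameters() / 1e6:.1f}M parameters")
```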

My machine:

GPU: RTX 4080 (12 GB VRAM)
RAM: 32 GB
CPU: i9-14900HX (24 cores, 32 threads)

Is this hardware really not up to training the model?

iissy (Author) commented Jul 24, 2024

```
Traceback (most recent call last):
  File "I:\ChatLM-mini-Chinese\pre_train.py", line 140, in <module>
    pre_train(config)
  File "I:\ChatLM-mini-Chinese\pre_train.py", line 123, in pre_train
    trainer.train(
  File "C:\Users\pinbo\.conda\envs\chat\lib\site-packages\transformers\trainer.py", line 1932, in train
    return inner_training_loop(
  File "C:\Users\pinbo\.conda\envs\chat\lib\site-packages\accelerate\utils\memory.py", line 142, in decorator
    raise RuntimeError("No executable batch size found, reached zero.")
RuntimeError: No executable batch size found, reached zero.
0%|          | 8/1281856 [00:16<724:58:07, 2.04s/it]
```

iissy changed the title from "Training this model has very high hardware requirements" to "An RTX 4080 can barely handle any data: training errors out past 10,000 samples" on Jul 24, 2024
iissy (Author) commented Jul 24, 2024

Has anyone actually done a successful run with the 16 GB RAM and 4 GB VRAM the docs mention? Looking for an answer.

staxd commented Aug 2, 2024

> Has anyone actually done a successful run with the 16 GB RAM and 4 GB VRAM the docs mention? Looking for an answer.

With 3,000 samples, 24 GB of VRAM maxes out immediately.

iissy (Author) commented Aug 5, 2024

> Has anyone actually done a successful run with the 16 GB RAM and 4 GB VRAM the docs mention? Looking for an answer.
>
> With 3,000 samples, 24 GB of VRAM maxes out immediately.

After reading the docs, I trained with train.py and set batch_size_per_gpu to 1; memory usage really is much lower now (the change is sketched below).
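For reference, that change amounts to a one-field edit. The class name here is an assumption; only the batch_size_per_gpu field comes from the comment above:

```python
# Assumed shape of the project's training config; only `batch_size_per_gpu`
# is taken from the thread.
class TrainConfig:
    batch_size_per_gpu: int = 1   # one sample per step minimizes peak VRAM
```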

charent (Owner) commented Sep 22, 2024

It works on my side: the entire SFT run was done on a GPU with 16 GB of VRAM. If your VRAM usage looks abnormal, check the following (a sketch of points 2, 3, and 5 follows the list):

  1. Check the maximum text length in your dataset and look for unusually long samples; the input text and output text count toward the length together.
  2. Add torch.set_default_dtype(torch.bfloat16) to the main function so bf16 is the default dtype. The code's default is AMP mixed precision, i.e. model parameters and optimizer state in FP32 with gradients in bf16, and the optimizer state is what consumes the most VRAM.
  3. Don't use the AdamW optimizer; use Adafactor or Lion instead. Their hyperparameters differ from AdamW's (Lion, for instance, needs a lower learning rate); see the official documentation for details.
  4. For RuntimeError: No executable batch size found, reached zero., check your dataset for empty samples (an empty string counts as empty) and for over-long samples that were never truncated.
  5. Reduce your batch_size; to get the effect of a larger batch size, use gradient accumulation (gradient_accumulation).
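A minimal sketch of points 2, 3, and 5, assuming the stock Hugging Face Trainer/TrainingArguments API that the traceback above shows pre_train.py going through; all values are placeholders, not the project's defaults:

```python
import torch
from transformers import TrainingArguments

# Point 2: make bf16 the default dtype *before* the model is constructed,
# so parameters and optimizer state are stored in bf16 rather than FP32.
torch.set_default_dtype(torch.bfloat16)

# Points 3 and 5: swap AdamW for Adafactor and trade per-step batch size
# for gradient accumulation (effective batch size = 1 * 16 = 16 here).
training_args = TrainingArguments(
    output_dir="./model_save",          # placeholder path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    optim="adafactor",                  # built into transformers; Lion is not
    learning_rate=1e-4,                 # placeholder; Lion would need lower
)
```

Adafactor and Lion keep far less per-parameter state than AdamW, which is why they help when the optimizer state dominates VRAM.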

charent (Owner) commented Sep 22, 2024

For RuntimeError: No executable batch size found, reached zero, please refer to these two issues: issues/37 and issues/40
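Relatedly, here is a minimal sketch of the dataset check described in point 4 above; the prompt/response field names and the length cap are assumptions, not the project's actual schema:

```python
# Hypothetical schema: each sample is {"prompt": str, "response": str}.
# Drop empty samples (an empty string counts as empty) and truncate
# over-long ones so accelerate's batch-size search never reaches zero.
def clean_samples(samples: list[dict], max_len: int = 320) -> list[dict]:
    cleaned = []
    for s in samples:
        prompt = (s.get("prompt") or "").strip()
        response = (s.get("response") or "").strip()
        if not prompt or not response:
            continue
        if len(prompt) + len(response) > max_len:
            prompt = prompt[: max_len // 2]
            response = response[: max_len - len(prompt)]
        cleaned.append({"prompt": prompt, "response": response})
    return cleaned
```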
