Tokenizer training runs out of memory (OOM) with 60 GB of RAM #59

Open
musexiaoluo opened this issue Aug 28, 2024 · 1 comment

Comments

@musexiaoluo

Calling the train_my_huggingface_wiki_tokenizer method runs out of memory (OOM). Memory usage exceeded 60 GB.

[Attached image]

@charent
Owner

charent commented Sep 22, 2024

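Try training on Windows, so that the hard disk can be used as memory (virtual memory). Alternatively, you can randomly sample the dataset; there is no need to train the tokenizer on all of the data.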
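
The random-sampling suggestion could be sketched roughly as follows. This is only an illustration, not the project's own code: the helper names `reservoir_sample` and `iter_lines` are hypothetical, and it assumes the corpus is a UTF-8 text file with one sample per line. Reservoir sampling keeps only `k` lines in memory, so the full corpus never has to be loaded at once.

```python
import random
from typing import Iterable, Iterator, List, Optional


def reservoir_sample(lines: Iterable[str], k: int, seed: Optional[int] = None) -> List[str]:
    """Pick k lines uniformly at random from an iterable of unknown length.

    Hypothetical helper (Algorithm R): at most k lines are held in memory.
    """
    rng = random.Random(seed)
    sample: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k lines.
            sample.append(line)
        else:
            # Replace a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = line
    return sample


def iter_lines(path: str) -> Iterator[str]:
    """Stream non-empty lines from a text file (assumed format: one sample per line)."""
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield line
```

The sampled list could then be handed to the tokenizer trainer in place of the full corpus, for example through `Tokenizer.train_from_iterator` in the Hugging Face `tokenizers` library. Whether `train_my_huggingface_wiki_tokenizer` accepts an iterator like this is an assumption that would need checking against the repository.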
