Data overlap between the training set and the fine-tuning set #60

Open
AndyZZt opened this issue Nov 6, 2024 · 0 comments

AndyZZt commented Nov 6, 2024

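1. The naming of the Belle datasets is a bit unclear. The pretraining section of the README says it uses Belle's train_2M_CN.json dataset, the README also uses this dataset when processing the fine-tuning data, and the Belle repository only seems to contain train_2M_CN. However, the fine-tuning data-processing code refers to train_conv_2.json (utils/raw_data_process.py line 1107). Are these two definitely the same dataset?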

2. The README says pretraining used a Train_3.5M_CN.json dataset, but in the code this becomes '/data/raw_data/bell_open_source/train_0.8M_CN.json' (utils/raw_data_process.py line 505), and I cannot find where that file comes from.
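
For anyone who has both files locally, a rough way to check whether two dataset files hold the same records (or how much they overlap) might look like the sketch below. It assumes each file stores one JSON object per line, which may not match the actual files; `record_fingerprints` and `overlap_report` are hypothetical helper names and are not part of this repository.

```python
import hashlib
import json


def record_fingerprints(path):
    """Return a set of SHA-256 fingerprints, one per non-empty JSON line.

    Assumption: the file is in JSON Lines format (one object per line).
    """
    fingerprints = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            # Canonical serialization so key order and whitespace do not matter.
            canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
            fingerprints.add(hashlib.sha256(canonical.encode("utf-8")).hexdigest())
    return fingerprints


def overlap_report(path_a, path_b):
    """Count unique records in each file and how many appear in both."""
    a = record_fingerprints(path_a)
    b = record_fingerprints(path_b)
    return {
        "unique_a": len(a),
        "unique_b": len(b),
        "shared": len(a & b),
        "identical": a == b,
    }
```

Calling `overlap_report` on local copies of train_2M_CN.json and train_conv_2.json would show whether the two files contain exactly the same records. Exact-match hashing only detects identical records, so reformatted or lightly edited samples would not be counted as shared.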

Thanks for your help!
