
Great Job! Would the "Clean" training set (i.e., after removing the identified harmful sample) you used to re-train the VLLMs be released? #6

Open
pzs19 opened this issue Nov 11, 2024 · 1 comment

Comments

@pzs19

pzs19 commented Nov 11, 2024

No description provided.

@pzs19 pzs19 changed the title Great Job! Would the "Clean" training set (i.e., after removing the identified harmful sample) you used to re-train the VLLMs be open source? Great Job! Would the "Clean" training set (i.e., after removing the identified harmful sample) you used to re-train the VLLMs be released? Nov 11, 2024
@ys-zong
Owner

ys-zong commented Nov 13, 2024

Yes, I'll aim to release it soon. In the meantime, if you want to reproduce it yourself, you can use the LlamaGuard model to iterate over the whole dataset and filter out the samples labelled as "unsafe".
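The filtering step described above can be sketched as follows. This is a minimal illustration, not the repository's actual script: the safety judgment is abstracted behind a `classify` callable (the `filter_unsafe` helper and the `"text"` field name are assumptions), since running the real LlamaGuard model requires gated Hugging Face access and a GPU.

```python
def filter_unsafe(samples, classify):
    """Keep only the samples that a safety classifier does not label 'unsafe'.

    samples:  iterable of dicts, each with a "text" field (assumed schema).
    classify: callable mapping a text string to "safe" or "unsafe",
              e.g. a wrapper around a LlamaGuard model.
    """
    clean = []
    for sample in samples:
        verdict = classify(sample["text"])
        if verdict != "unsafe":
            clean.append(sample)
    return clean


# A real `classify` could be built with Hugging Face transformers roughly like
# this (untested sketch; model name and decoding details may need adjusting):
#
# from transformers import AutoTokenizer, AutoModelForCausalLM
# tok = AutoTokenizer.from_pretrained("meta-llama/LlamaGuard-7b")
# model = AutoModelForCausalLM.from_pretrained("meta-llama/LlamaGuard-7b")
#
# def classify(text):
#     chat = [{"role": "user", "content": text}]
#     input_ids = tok.apply_chat_template(chat, return_tensors="pt")
#     out = model.generate(input_ids, max_new_tokens=20)
#     verdict = tok.decode(out[0][input_ids.shape[-1]:],
#                          skip_special_tokens=True)
#     return "unsafe" if "unsafe" in verdict else "safe"
```

With a classifier plugged in, `filter_unsafe(dataset, classify)` returns the "clean" subset that can be used for re-training.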
