
Normalization issue with the packing loss #12

Open
Chandler-Bing opened this issue Aug 27, 2024 · 1 comment

Comments

@Chandler-Bing

Shouldn't the loss computation here be normalized?

loss = (loss * shift_weights).sum() -> loss = (loss * shift_weights).sum() / shift_weights.sum()

This would normalize the loss to per-token granularity. With the current form, the loss has a larger scale, and the backpropagated gradients are correspondingly larger. In the extreme case where every sample contains only one token, the loss for the batch would blow up.
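For context, a minimal sketch of the two forms being discussed (the tensor shapes, the `F.cross_entropy` setup, and the all-ones `shift_weights` are assumptions for illustration, not the repo's actual code):

```python
import torch
import torch.nn.functional as F

# Assumed setup: three samples of lengths 2, 3, and 1 packed into one sequence.
logits = torch.randn(6, 32000)          # (total_tokens, vocab_size)
labels = torch.randint(0, 32000, (6,))  # (total_tokens,)
loss = F.cross_entropy(logits, labels, reduction="none")  # per-token losses

shift_weights = torch.ones(6)  # if unnormalized: every token weighted 1.0

# Current form: grows with the number of tokens packed into the batch.
loss_sum = (loss * shift_weights).sum()

# Proposed form: divide by the total weight to get a per-token mean,
# whose scale is independent of how many tokens were packed together.
loss_mean = (loss * shift_weights).sum() / shift_weights.sum()
```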

@bys0318 (Member) commented Sep 5, 2024

The shift_weights here have already been normalized: each sample's weights sum to 1.
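A hedged sketch of what this per-sample normalization could look like (the `sample_lengths` list and the weight construction are illustrative, not the repo's actual code):

```python
import torch

sample_lengths = [2, 3, 1]  # tokens per packed sample (illustrative)
# Each token of a sample gets weight 1/len(sample), so each sample sums to 1.
shift_weights = torch.cat(
    [torch.full((n,), 1.0 / n) for n in sample_lengths]
)
# tensor([0.5000, 0.5000, 0.3333, 0.3333, 0.3333, 1.0000])

# The total weight equals the number of samples, not the number of tokens,
# so a 1-token sample contributes the same total weight (1.0) as a long one
# and (loss * shift_weights).sum() stays bounded per sample.
assert torch.isclose(
    shift_weights.sum(), torch.tensor(float(len(sample_lengths)))
)
```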
