
Normalization issue with the packing loss #12

Open
Chandler-Bing opened this issue Aug 27, 2024 · 1 comment

Comments

@Chandler-Bing

Shouldn't the loss computation here be normalized?

loss = (loss * shift_weights).sum() -> loss = (loss * shift_weights).sum() / shift_weights.sum()

This would normalize the loss to per-token granularity. With the current form, the loss has a larger scale, and the backpropagated gradients are correspondingly larger. In the extreme case where every sample contains only one token, the loss for the batch would blow up.
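For context, a minimal sketch of the two forms being discussed (the tensor shapes, the `F.cross_entropy` setup, and the all-ones `shift_weights` are assumptions for illustration, not the repo's actual code):

```python
import torch
import torch.nn.functional as F

# Assumed setup: three samples of lengths 2, 3, and 1 packed into one sequence.
logits = torch.randn(6, 32000)          # (total_tokens, vocab_size)
labels = torch.randint(0, 32000, (6,))  # (total_tokens,)
loss = F.cross_entropy(logits, labels, reduction="none")  # per-token losses

shift_weights = torch.ones(6)  # if unnormalized: every token weighted 1.0

# Current form: grows with the number of tokens packed into the batch.
loss_sum = (loss * shift_weights).sum()

# Proposed form: divide by the total weight to get a per-token mean,
# whose scale is independent of how many tokens were packed together.
loss_mean = (loss * shift_weights).sum() / shift_weights.sum()
```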

@bys0318 (Member) commented Sep 5, 2024

The shift_weights here have already been normalized: each sample's weights sum to 1.
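A hedged sketch of what this per-sample normalization could look like (the `sample_lengths` list and the weight construction are illustrative, not the repo's actual code):

```python
import torch

sample_lengths = [2, 3, 1]  # tokens per packed sample (illustrative)
# Each token of a sample gets weight 1/len(sample), so each sample sums to 1.
shift_weights = torch.cat(
    [torch.full((n,), 1.0 / n) for n in sample_lengths]
)
# tensor([0.5000, 0.5000, 0.3333, 0.3333, 0.3333, 1.0000])

# The total weight equals the number of samples, not the number of tokens,
# so a 1-token sample contributes the same total weight (1.0) as a long one
# and (loss * shift_weights).sum() stays bounded per sample.
assert torch.isclose(
    shift_weights.sum(), torch.tensor(float(len(sample_lengths)))
)
```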
