Shouldn't the loss computation here be normalized?

loss = (loss * shift_weights).sum()
->
loss = (loss * shift_weights).sum() / shift_weights.sum()

This normalizes the loss to per-token granularity. With the former approach, the scale of the loss is too large, and the backpropagated gradients are correspondingly large. In the extreme case where every sample contains only one token, the loss of the batch blows up.
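A minimal sketch of the two reductions being compared, assuming the tensors hold one cross-entropy value and one weight per token; the names and shapes are illustrative, not the repo's actual code:

```python
import torch

# Illustrative per-token quantities (hypothetical shapes, flattened over the batch).
per_token_loss = torch.rand(8)   # unreduced cross-entropy, one value per token
shift_weights = torch.rand(8)    # per-token weights

# Current reduction: scale grows with the total weight in the batch.
loss_sum = (per_token_loss * shift_weights).sum()

# Proposed reduction: weighted mean, normalized back to per-token scale.
loss_mean = (per_token_loss * shift_weights).sum() / shift_weights.sum()
```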
The shift_weights here are already normalized: the weights within each sample sum to 1.
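An illustrative sketch of that construction (assumed, not the repo's code): if each row of shift_weights is normalized to sum to 1, the weighted sum is a sum of per-sample mean losses, so it scales with the batch size rather than with the token count.

```python
import torch

# Hypothetical per-sample token mask; each row is one sample.
token_mask = torch.tensor([[1., 1., 1., 0.],    # sample 0: 3 valid tokens
                           [1., 1., 0., 0.]])   # sample 1: 2 valid tokens

# Normalize so the weights inside each sample sum to 1.
shift_weights = token_mask / token_mask.sum(dim=1, keepdim=True)

per_token_loss = torch.rand_like(shift_weights)

# Equals the sum of per-sample average losses; no extra division needed.
loss = (per_token_loss * shift_weights).sum()
```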