changed to proper Xavier initialization, existing implementation was … #1927
…resulting in a large negative bias, which was killing all gradients through the following relu. https://paperswithcode.com/method/xavier-initialization
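To illustrate the failure mode described above, here is a minimal standard-library sketch. It is not the code from this PR or from the upstream DLRM model. The helper names (`xavier_normal`, `dead_relu_fraction`), the layer sizes, and the bias value of -100 are all assumptions chosen for illustration. It samples weights with the usual Xavier/Glorot normal scaling, `std = sqrt(2 / (fan_in + fan_out))`, and then measures how many ReLU units receive a non-positive pre-activation (and therefore pass zero gradient) with a zero bias versus a large negative bias.

```python
import math
import random


def xavier_normal(fan_in, fan_out, rng):
    """Sample a fan_out x fan_in weight matrix with Xavier/Glorot normal scaling."""
    std = math.sqrt(2.0 / (fan_in + fan_out))
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def dead_relu_fraction(weights, bias, inputs):
    """Fraction of (sample, unit) pairs whose pre-activation is <= 0.

    ReLU passes zero gradient for those pairs, so a fraction of 1.0 means
    no gradient flows back through the layer at all.
    """
    dead = 0
    total = 0
    for x in inputs:
        for w_row, b in zip(weights, bias):
            pre = sum(w * xi for w, xi in zip(w_row, x)) + b
            dead += pre <= 0.0
            total += 1
    return dead / total


rng = random.Random(0)
fan_in, fan_out = 16, 8  # hypothetical layer sizes
W = xavier_normal(fan_in, fan_out, rng)
X = [[rng.gauss(0.0, 1.0) for _ in range(fan_in)] for _ in range(256)]

zero_bias = [0.0] * fan_out
large_negative_bias = [-100.0] * fan_out  # hypothetical value, for illustration only

frac_ok = dead_relu_fraction(W, zero_bias, X)
frac_dead = dead_relu_fraction(W, large_negative_bias, X)
print(f"dead fraction with zero bias: {frac_ok:.2f}")
print(f"dead fraction with large negative bias: {frac_dead:.2f}")
```

With a zero bias, roughly half of the pre-activations are positive, so gradients flow. With the large negative bias, every pre-activation is negative and the ReLU blocks all gradients, which matches the symptom the PR describes.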
Behavior before the fix: (screenshot not preserved)
Behavior after the fix: (screenshot not preserved)
@eknag Thanks for looking into this. The code comes from the upstream model: https://github.com/facebookresearch/dlrm/blob/main/dlrm_s_pytorch.py#L212 Could you please submit a PR to the facebookresearch/dlrm repository and see what feedback you get there? Thanks!
@aaronenyeshi Can you also help take a look?
@eknag Could you please also add the URL of the upstream PR (facebookresearch/dlrm#358) to the code? We would like to track all such upstream improvements in the code.
@xuzhao9 Added. Let me know if I should do anything else - I'm new to making contributions.
LGTM
@xuzhao9 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.