changed to proper Xavier initialization, existing implementation was … #1927

eknag · 2023-09-23T20:08:15Z

…resulting in a large negative bias, which was killing all gradients through the following relu. https://paperswithcode.com/method/xavier-initialization
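To illustrate the failure mode described above, here is a minimal, framework-free sketch (it is not the DLRM code). It draws Xavier/Glorot uniform weights and compares a zero bias with a large negative bias in front of a ReLU. The helper names `xavier_uniform` and `relu_layer`, the layer sizes, and the bias value of -10.0 are all assumptions chosen for illustration. The point is that when every pre-activation is negative, ReLU outputs zero and passes no gradient back.

```python
import math
import random


def xavier_uniform(fan_in, fan_out, rng):
    """Xavier/Glorot uniform weights: U(-a, a) with a = sqrt(6 / (fan_in + fan_out))."""
    bound = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)]
            for _ in range(fan_out)]


def relu_layer(x, weights, bias):
    """Compute y = relu(W x + b); also return which units have a nonzero ReLU gradient."""
    pre = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, bias)]
    out = [max(0.0, p) for p in pre]
    # ReLU passes gradient only where the pre-activation is positive.
    active = [p > 0.0 for p in pre]
    return out, active


rng = random.Random(0)
fan_in, fan_out = 16, 8  # hypothetical layer sizes
x = [rng.uniform(0.0, 1.0) for _ in range(fan_in)]
W = xavier_uniform(fan_in, fan_out, rng)

zero_bias = [0.0] * fan_out
# Hypothetical value, exaggerated so that every pre-activation is negative.
negative_bias = [-10.0] * fan_out

_, active_zero = relu_layer(x, W, zero_bias)
_, active_neg = relu_layer(x, W, negative_bias)

print("units passing gradient with zero bias:", sum(active_zero))
print("units passing gradient with negative bias:", sum(active_neg))
```

With these sizes each weight is at most 0.5 in magnitude and each input is at most 1, so the weighted sum can never exceed 8. A bias of -10 therefore leaves every unit inactive, which corresponds to the all-zero output and all-zero gradient reported for the original initialization.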

facebook-github-bot · 2023-09-23T20:08:19Z

Hi @eknag!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

eknag · 2023-09-23T20:13:28Z

Behavior before

 python run.py dlrm -d cuda -t train
Running train method from dlrm on cuda in eager mode with input batch size 2048 and precision fp32.
gen: tensor([[0.],
        [0.],
        [0.],
        ...,
        [0.],
        [0.],
        [0.]], device='cuda:0', grad_fn=<ReluBackward0>)
loss: 0.3434426486492157
derivative with respect to first linear layer: tensor([[0., 0., 0.,  ..., 0., 0., 0.],

eknag · 2023-09-23T20:14:26Z

Behavior after fix:

➜  benchmark git:(dev/eknag/fix_dlrm_bias_initialization) ✗ python run.py dlrm -d cuda -t train              
Running train method from dlrm on cuda in eager mode with input batch size 2048 and precision fp32.
gen: tensor([[0.0000],
        [0.0000],
        [0.0213],
        ...,
        [0.0000],
        [0.0231],
        [0.0000]], device='cuda:0', grad_fn=<ReluBackward0>)
loss: 0.3336436152458191
derivative with respect to first linear layer: tensor([[ 7.9563e-04,  7.3723e-04,  7.9948e-04,  ...,  7.8230e-04,

xuzhao9 · 2023-09-24T00:48:46Z

@eknag Thanks for looking into this. The code is from upstream model: https://github.com/facebookresearch/dlrm/blob/main/dlrm_s_pytorch.py#L212

Could you please submit a PR to the facebookresearch/dlrm repository and see what feedback you get there? Thanks!

eknag · 2023-09-25T18:43:08Z

See facebookresearch/dlrm#358

xuzhao9

@aaronenyeshi Can you also help take a look?

@eknag Could you please also add the URL of the upstream PR (facebookresearch/dlrm#358) to the code? We would like to track all such upstream improvements in the code.

eknag · 2023-09-27T17:30:12Z

@xuzhao9 Added. Let me know if I should do anything else - I'm new to making contributions.

xuzhao9

LGTM

facebook-github-bot · 2023-09-28T22:30:54Z

@xuzhao9 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2023-09-29T20:55:55Z

@xuzhao9 merged this pull request in 827f90b.

changed to proper Xavier initialization, existing implementation was resulting in a large negative bias, which was killing all gradients through the following relu. https://paperswithcode.com/method/xavier-initialization

6201a73

eknag added 2 commits September 23, 2023 13:21

call zero_grad in dlrm training loop to stop gradients from accumulating

e7566ec

Revert "call zero_grad in dlrm training loop to stop gradients from accumulating"

9f4b92d

This reverts commit e7566ec.

facebook-github-bot added the cla signed label Sep 23, 2023

eknag temporarily deployed to docker-s3-upload September 24, 2023 00:49 — with GitHub Actions Inactive

eknag mentioned this pull request Sep 25, 2023

switched bias initialization to initialize to zeros per standard Xavi… facebookresearch/dlrm#358

Open

xuzhao9 requested review from aaronenyeshi and xuzhao9 September 26, 2023 15:02

xuzhao9 reviewed Sep 26, 2023

added link to upstream PR

5d2bf0f

xuzhao9 approved these changes Sep 27, 2023

eknag temporarily deployed to docker-s3-upload September 28, 2023 18:40 — with GitHub Actions Inactive

eknag temporarily deployed to docker-s3-upload September 28, 2023 18:41 — with GitHub Actions Inactive

facebook-github-bot closed this in 827f90b Sep 29, 2023

facebook-github-bot added the Merged label Sep 29, 2023
