Softmax xent #33

Open
wants to merge 5 commits into
base: master
75 changes: 60 additions & 15 deletions README.md
@@ -386,13 +386,14 @@ Note | Source
6 | `tf.nn.ctc_loss` backprop | NS | NS | NS | TDO |
7 | Fused softmax/crossentropy:<br>`tf.nn.*_cross_entropy_with_logits`<br>backprop | NS | NS | NS | NS |

Note | Source | TF < 2.4 | NGC 20.03+ | TF 2.4 |
----:|:----------------------------------------------------------------------------------------------------------------------------------------|:----------|:-----------|:-------|
8 | `tf.image.resize` with `method=ResizeMethod.BILINEAR`<br>and `tf.keras.layers.UpSampling2D` with<br>`interpolation='bilinear'` backprop | NS | TDO | TDO |
9 | `tf.image.resize` with `method=ResizeMethod.NEAREST`<br>and `tf.keras.layers.UpSampling2D` with<br>`interpolation='nearest'` backprop | NS | NS | NS |
10 | `tf.math.segment_sum` and `tf.math.unsorted_segment_sum`<br>forward, and `tf.gather` and `tfa.image.dense_image_warp`<br>backprop | NS | NS | NS |
11 | `tf.image.crop_and_resize` backprop to `image` (on CPU<br>or GPU) and backprop to `boxes` | NS | NS | NS |
12 | `tf.sparse.sparse_dense_matmul` forward | NS | NS | NS |
Note | Source | TF < 2.4 | NGC 20.03+ | TF 2.4 |
----:|:--------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------|:-----------|:-------|
8 | `tf.image.resize` with `method=ResizeMethod.BILINEAR`<br>and `tf.keras.layers.UpSampling2D` with<br>`interpolation='bilinear'` backprop | NS | TDO | TDO |
9 | `tf.image.resize` with `method=ResizeMethod.NEAREST`<br>and `tf.keras.layers.UpSampling2D` with<br>`interpolation='nearest'` backprop | NS | NS | NS |
10 | `tf.math.segment_sum`, `tf.math.unsorted_segment_sum`,<br>and `tf.convert_to_tensor` forward.<br>And `tf.gather` and `tfa.image.dense_image_warp`<br>backprop | NS | NS | NS |
11 | `tf.image.crop_and_resize` backprop to `image` (on CPU<br>or GPU) and backprop to `boxes` | NS | NS | NS |
12 | `tf.sparse.sparse_dense_matmul` forward | NS | NS | NS |
13 | `tf.math.unsorted_segment_mean`,<br>`tf.math.unsorted_segment_prod`, and<br>`tf.math.unsorted_segment_sqrt_n` forward | NS | NS | NS |

##### Key to the Solutions Referenced Above

@@ -479,11 +480,36 @@ Note | Source
issues [#12](https://github.com/NVIDIA/framework-determinism/issues/12) and
[#24](https://github.com/NVIDIA/framework-determinism/issues/24))
10. Segment reduction ops `tf.math.segment_sum` and
`tf.math.unsorted_segment_sum` have nondeterministic forward operation on
GPU. Other ops that are dependent on these ops, including `tf.gather` and
`tfa.image.dense_image_warp` (both in backprop), therefore also operate
nondeterministically. See
[Issue 39751](https://github.com/tensorflow/tensorflow/issues/39751).
`tf.math.unsorted_segment_sum` can exhibit nondeterministic forward
operation when running on a GPU. `tf.convert_to_tensor`, when fed with
(sparse) `tf.IndexedSlices`, uses this potentially nondeterministic
segment sum functionality in its forward direction and therefore may
introduce truly random noise into its output when a slice index is
represented more than twice in its input (such as when reducing the word
embedding gradients from multiple instances of the same word in a sentence
or across a batch of sentences). `tf.gather` is often used to select word
embeddings from an embedding matrix in a model's forward direction and
`tf.gather`'s backprop generates sparse gradients conveyed as
`tf.IndexedSlices`. The reduction of the back-propagated sparse gradients
from `tf.gather` by `tf.convert_to_tensor` can therefore introduce truly
random noise into an embedding trainable variable. A lower-performance
work-around for this nondeterminism related to the use of `tf.gather` is
to use `tf.linalg.matmul` instead:

```
# input_embeds = tf.gather(embeddings, input_ids)
input_embeds = tf.dtypes.cast(
    tf.one_hot(input_ids, embeddings.shape[0]),
    embeddings.dtype) @ embeddings
```

Note that the backward (and forward) functionality of `tf.gather` itself
_is_ deterministic. The backprop for `tfa.image.dense_image_warp` may
introduce truly random noise because it also uses the nondeterministic
segment sum functionality. See
[Issue 39751](https://github.com/tensorflow/tensorflow/issues/39751). A
patch that will make the segment sum ops function deterministically is in
development.
11. Backprop to `image` on `tf.image.crop_and_resize` introduces
nondeterministic noise when running on either CPU or GPU. Backprop to
`boxes` introduces nondeterministic noise when running on GPU. See
@@ -493,6 +519,13 @@ Note | Source
12. The forward path of `tf.sparse.sparse_dense_matmul` introduces
nondeterminism for `tf.float32` and (allegedly) for `tf.float64`. See
TF [Issue 18037](https://github.com/tensorflow/tensorflow/issues/18037).
13. Based on initial work from [Lin Lan](https://github.com/llan-ml), we may
have ruled out nondeterminism in the `tf.math.segment_*` ops other than
`tf.math.segment_sum`, and in the `tf.math.unsorted_segment_*` ops other than
`tf.math.unsorted_segment_sum`, `tf.math.unsorted_segment_mean`,
`tf.math.unsorted_segment_prod`, and `tf.math.unsorted_segment_sqrt_n`; see
[issue 31](https://github.com/NVIDIA/framework-determinism/issues/31).
Also see note 10, above.
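
Note 10 says that noise appears only when a slice index is represented more than twice. A minimal sketch of why, assuming the root cause is the order in which floating-point values are accumulated: the pure-Python `segment_sum` helper below is hypothetical (it is not the TensorFlow op) and takes an explicit visiting order to stand in for the unordered accumulation on a GPU.

```python
def segment_sum(values, segment_ids, num_segments, order):
    """Sum values into segments, visiting the inputs in the given order.

    Hypothetical stand-in for a GPU segment sum; `order` models the
    accumulation order, which may vary from run to run on real hardware.
    """
    totals = [0.0] * num_segments
    for i in order:
        totals[segment_ids[i]] += values[i]
    return totals


# Three values reduced into one segment: the result depends on the order,
# because floating-point addition is not associative.
three = [0.1, 0.2, 0.3]
forward = segment_sum(three, [0, 0, 0], 1, order=[0, 1, 2])
backward = segment_sum(three, [0, 0, 0], 1, order=[2, 1, 0])
print(forward[0] == backward[0])

# Two values reduced into one segment: the order does not matter,
# because floating-point addition is commutative.
two = [0.1, 0.2]
pair_forward = segment_sum(two, [0, 0], 1, order=[0, 1])
pair_backward = segment_sum(two, [0, 0], 1, order=[1, 0])
print(pair_forward[0] == pair_backward[0])
```

With exactly two contributions to a segment the sum is the same in either order, which is consistent with the "more than twice" wording in note 10.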

#### Other Possible GPU-Specific Sources of Non-Determinism

@@ -558,7 +591,7 @@ This section catalogs relevant links.

### TensorFlow Issues

GitHiub issues in the TensorFlow project:
GitHub issues in the TensorFlow project:

Number | Title | Date Opened | Status |
--------------------------------------------------------------:|:-----------------------------------------------------------------------------------------|:------------|:-------|
@@ -590,7 +623,8 @@ GitHub issues in dependent or related projects:
### TensorFlow Pull Requests

The following pull requests (and some individual commits) are those in the
TensorFlow GitHub repo that are directly related to this project. As we have
TensorFlow GitHub repo (`github.com/tensorflow/tensorflow`) that are directly
related to this project. As we have
[discovered](scripts/README.md#find-tensorflow-commits), 1.8% of all commits
seem to reference, or have some relationship with, "determinism" or
"deterministic". As of 2020-01-30, that was 1,391 commits.
@@ -618,7 +652,8 @@ ID | Title
[38089](https://github.com/tensorflow/tensorflow/pull/38089) | Add reminder to test deterministic cuDNN CTC loss | closed | | |
[38509](https://github.com/tensorflow/tensorflow/pull/38509) | List deterministic op func bug fixes in v2.2<br>release notes | merged | 2020-04-15 | 2.2 |
[39243](https://github.com/tensorflow/tensorflow/pull/39243) | GPU-deterministic tf.image.resize (bilinear) | merged | 2020-09-22 | 2.4 |

[44717](https://github.com/tensorflow/tensorflow/pull/44717) | Add to rel notes: deterministic tf.image.resize (bilinear) | merged | 2020-11-13 | 2.4 |

Notes:
1. These are individual commits.

@@ -628,6 +663,15 @@ Notes:
[1004]: https://github.com/tensorflow/tensorflow/commit/8b7a3db0b6e09415b5640be4986fb4d7c6e5209a
[1005]: https://github.com/tensorflow/tensorflow/commit/9e096debc4a0909deb69970f38bee7b77e5e5f7d

### Other TensorFlow Organization Pull Requests

These are relevant pull requests against repositories in
`github.com/tensorflow` other than `github.com/tensorflow/tensorflow`.

Repository | Number | Title | Date Opened | Status |
:-----------|---------------------------------------------------------:|:----------------------------------------------------------------------|:------------|:-------|
community | [346](https://github.com/tensorflow/community/pull/346) | RFC: Enhancing determinism in TF | 2021-01-19 | Open |

### PyTorch Pull Requests

ID | Title | Status | Date Merged | Version |
@@ -685,6 +729,7 @@ Andrew Kerr,
Xiang Bo Kong,
Nicolas Koumchatzky,
Jorge Albericio Latorre,
Lin Lan,
Simon Layton,
Ned Letcher,
Jose Alvarez Lopez,
2 changes: 1 addition & 1 deletion fwd9m/tensorflow/__init__.py
@@ -19,4 +19,4 @@

# What follows is the public API for fwd9m.tensorflow
from .enable_determinism import _enable_determinism as enable_determinism
from .patch import _patch as patch # deprecated
from .patch import _patch as patch # deprecated
37 changes: 20 additions & 17 deletions fwd9m/tensorflow/enable_determinism.py
@@ -23,46 +23,49 @@

import tensorflow as tf

from .patch import _patch_bias_add
from .patch import _patch_unsorted_segment_sum
from .patch import _patch_segment_sum
from ..utils import _Version as Version
from ..version import __version__ as package_version
# By calling the deprecated patch API here, we continue to test its effect
# without having to test it explicitly. Note that this form of import
# necessarily breaks the Google Python Style Guide rule to import packages
# and modules only (and not individual functions).
from ..tensorflow import patch as patch_bias_add
from . import patch_segment_sum
from . import patch_unsorted_segment_sum
from . import patch_softmax_xent
from . import patch_sparse_softmax_xent
from .. import utils
from .. import version

def _enable_determinism(seed=None):
"""Provides a best-effort recipe to increase framework determinism when
running on GPUs.

Call this method either before or after explicitly importing TensorFlow,
but always before constructing any graphs.

This function cannot address all possible sources of non-determinism. Please
This function cannot address all possible sources of non-determinism. Please
see further instructions at https://github.com/NVIDIA/framework-determinism
to understand how to use it in a larger deterministic context.

Arguments:
seed: <fill in>

Returns: None
"""
tf_vers = Version(tf.version.VERSION)
tf_vers = utils._Version(tf.version.VERSION)
ngc_tf_container_version_string = os.environ.get('NVIDIA_TENSORFLOW_VERSION')
if ngc_tf_container_version_string:
in_ngc_cont = True
ngc_vers = Version(ngc_tf_container_version_string)
ngc_vers = utils._Version(ngc_tf_container_version_string)
else:
in_ngc_cont = False
if not in_ngc_cont and tf_vers.between('1.14', '2.0'):
os.environ['TF_CUDNN_DETERMINISTIC'] = '1'
_patch_bias_add()
patch_bias_add(_silent=True)
if in_ngc_cont and ngc_vers.at_least('19.06') or tf_vers.at_least('2.1'):
os.environ['TF_DETERMINISTIC_OPS'] = '1'
if in_ngc_cont and ngc_vers.at_least('19.06') or tf_vers.at_least('1.14'):
_patch_unsorted_segment_sum()
_patch_segment_sum()
# Apply the fused softmax/cross-entropy patch here
patch_segment_sum._patch_segment_sum()
patch_unsorted_segment_sum._patch_unsorted_segment_sum()
patch_softmax_xent._patch_softmax_xent()
patch_sparse_softmax_xent._patch_sparse_softmax_xent()
pass
# TODO: Add other recipe items (e.g. seed)
print("%s (version %s) has been applied to TensorFlow "
"version %s" % (__name__, package_version,
"version %s" % (__name__, version.__version__,
tf_vers.original_version_string))
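
The version gating in `_enable_determinism` can be hard to follow in diff form, so here is a minimal, self-contained sketch of the decision logic on the new side of the hunk. It does not import TensorFlow or `fwd9m`. The names `parse_major_minor` and `plan_recipe` are hypothetical, the returned strings are labels rather than real calls, and treating `between('1.14', '2.0')` as inclusive at both ends is an assumption about `utils._Version`, whose implementation is not shown here.

```python
import re


def parse_major_minor(version_string):
    """Return (major, minor) as integers from a string such as '2.4.1'."""
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version_string)
    return int(match.group(1)), int(match.group(2))


def plan_recipe(tf_version, ngc_version=None):
    """List the steps the recipe would apply (hypothetical helper).

    ngc_version models the NVIDIA_TENSORFLOW_VERSION environment variable;
    None means the code is not running in an NGC container.
    """
    tf_vers = parse_major_minor(tf_version)
    in_ngc_cont = ngc_version is not None
    # `and` binds tighter than `or`, so the original conditions read as
    # (in_ngc_cont and ngc >= 19.06) or (tf >= ...).
    ngc_ok = in_ngc_cont and parse_major_minor(ngc_version) >= (19, 6)

    steps = []
    # Assumption: between() is inclusive at both ends.
    if not in_ngc_cont and (1, 14) <= tf_vers <= (2, 0):
        steps += ["TF_CUDNN_DETERMINISTIC=1", "patch_bias_add"]
    if ngc_ok or tf_vers >= (2, 1):
        steps.append("TF_DETERMINISTIC_OPS=1")
    if ngc_ok or tf_vers >= (1, 14):
        steps += [
            "patch_segment_sum",
            "patch_unsorted_segment_sum",
            "patch_softmax_xent",
            "patch_sparse_softmax_xent",
        ]
    return steps


for tf_version, ngc_version in [("1.13.1", None), ("2.0.0", None),
                                ("2.4.1", None), ("1.15.2", "20.03")]:
    print(tf_version, ngc_version, plan_recipe(tf_version, ngc_version))
```

Under these assumptions, stock TensorFlow 2.1 and later takes the `TF_DETERMINISTIC_OPS` path plus all four patches, while stock 1.14 through 2.0 falls back to `TF_CUDNN_DETERMINISTIC` with the bias-add patch.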