NVIDIA · wenscarl · Nov 11, 2020 · Dec 8, 2020 · Feb 17, 2021 · Feb 18, 2021
diff --git a/README.md b/README.md
@@ -386,13 +386,14 @@ Note | Source
    6 | `tf.nn.ctc_loss` backprop                                                     | NS                     | NS                     | NS         | TDO        |
    7 | Fused sofmax/crossentropy:<br>`tf.nn.*_cross_entropy_with_logits`<br>backprop | NS                     | NS                     | NS         | NS         |
 
-Note | Source                                                                                                                                  | TF < 2.4  | NGC 20.03+ | TF 2.4 |
-----:|:----------------------------------------------------------------------------------------------------------------------------------------|:----------|:-----------|:-------|
-   8 | `tf.image.resize` with `method=ResizeMethod.BILINEAR`<br>and `tf.keras.layers.UpSampling2D` with<br>`interpolation='bilinear'` backprop | NS        | TDO        | TDO    |
-   9 | `tf.image.resize` with `method=ResizeMethod.NEAREST`<br>and `tf.keras.layers.UpSampling2D` with<br>`interpolation='nearest'` backprop   | NS        | NS         | NS     |
-  10 | `tf.math.segment_sum` and `tf.math.unsorted_segment_sum`<br>forward, and `tf.gather` and `tfa.image.dense_image_warp`<br>backprop       | NS        | NS         | NS     |
-  11 | `tf.image.crop_and_resize` backprop to `image` (on CPU<br>or GPU) and backprop to `boxes`                                               | NS        | NS         | NS     |
-  12 | `tf.sparse.sparse_dense_matmul` forward                                                                                                 | NS        | NS         | NS     |
+Note | Source                                                                                                                                                        | TF < 2.4  | NGC 20.03+ | TF 2.4 |
+----:|:--------------------------------------------------------------------------------------------------------------------------------------------------------------|:----------|:-----------|:-------|
+   8 | `tf.image.resize` with `method=ResizeMethod.BILINEAR`<br>and `tf.keras.layers.UpSampling2D` with<br>`interpolation='bilinear'` backprop                       | NS        | TDO        | TDO    |
+   9 | `tf.image.resize` with `method=ResizeMethod.NEAREST`<br>and `tf.keras.layers.UpSampling2D` with<br>`interpolation='nearest'` backprop                         | NS        | NS         | NS     |
+  10 | `tf.math.segment_sum`, `tf.math.unsorted_segment_sum`,<br>and `tf.convert_to_tensor` forward.<br>And `tf.gather` and `tfa.image.dense_image_warp`<br>backprop | NS        | NS         | NS     |
+  11 | `tf.image.crop_and_resize` backprop to `image` (on CPU<br>or GPU) and backprop to `boxes`                                                                     | NS        | NS         | NS     |
+  12 | `tf.sparse.sparse_dense_matmul` forward                                                                                                                       | NS        | NS         | NS     |
+  13 | `tf.math.unsorted_segment_mean`,<br>`tf.math.unsorted_segment_prod`, and <br>`tf.math.unsorted_segment_sqrt` forward                                          | NS        | NS         | NS     |
 
 ##### Key to the Solutions Referenced Above
 
@@ -479,11 +480,36 @@ Note | Source
      issues [#12](https://github.com/NVIDIA/framework-determinism/issues/12) and
      [#24](https://github.com/NVIDIA/framework-determinism/issues/24))
   10. Segment reduction ops `tf.math.segment_sum` and
-     `tf.math.unsorted_segment_sum` have nondeterministic forward operation on
-     GPU. Other ops that are dependent on these ops, including `tf.gather` and
-     `tfa.image.dense_image_warp` (both in backprop), therefore also operate
-     nondeterministically. See
-     [Issue 39751](https://github.com/tensorflow/tensorflow/issues/39751).
+      `tf.math.unsorted_segment_sum` can exhibit nondeterministic forward
+      operation when running on a GPU. `tf.convert_to_tensor`, when fed with
+      (sparse) `tf.IndexedSlices`, uses this potentially nondeterminitic
+      segment sum functionality in its forward direction and therefore may
+      introduce truly random noise into its output when a slice index is
+      represented more than twice in its input (such as when reducing the word
+      embedding gradients from multiple instances of the same word in a sentence
+      or across a batch of sentences). `tf.gather` is often used to select word
+      embeddings from an embedding matrix in a model's forward direction and
+      `tf.gather`'s backprop generates sparse gradients conveyed as
+      `tf.IndexedSlices`. The reduction of the back-propagated sparse gradients
+      from `tf.gather` by `tf.convert_to_tensor` can therefore introduce truly
+      random noise into an embedding trainable variable. A lower-performance
+      work-around for this nondeterminism related to the use of `tf.gather` is
+      to use `tf.linalg.matmul` instead:
+
+      ```
+      # inputs_embeds = tf.gather(embeddings, input_ids)
+      input_embeds = tf.dtypes.cast(
+          tf.one_hot(input_ids, embeddings.shape[0]),
+          embeddings.dtype) @ embeddings
+      ```
+
+      Note that the backward (and forward) functionality of `tf.gather` itself
+      _is_ deterministic. The backprop for `tfa.image.dense_image_warp` may
+      introduce truly random noise because it also uses the nondeterministic
+      segment sum functionality. See
+      [Issue 39751](https://github.com/tensorflow/tensorflow/issues/39751). A
+      patch that will make the segment sum ops function deterministically is in
+      development.
   11. Backprop to `image` on `tf.image.crop_and_resize` introduces
       nondeterministic noise when running on either CPU or GPU. Backprop to
       `boxes` introduces nondeterministic noise when running on GPU. See
@@ -493,6 +519,13 @@ Note | Source
   12. The forward path of `tf.sparse.sparse_dense_matmul` introduces
       nondeterminism for `tf.float32` and (allegedly) for `tf.float64`. See
       TF [Issue 18037](https://github.com/tensorflow/tensorflow/issues/18037).
+  13. Based on initial work from [Lin Lan](https://github.com/llan-ml), we may
+      have have ruled-out nondeterminism in other `tf.math.segment_*` ops beyond
+      `tf.math.segment_sum` and in other `tf.math_unsorted_segment_*` ops beyond
+      `tf.math.unsorted_segment_sum`, `tf.math.unsorted_segment_mean`,
+      `tf.math.unsorted_segment_prod`, and `tf.math_unsorted_segment_sqrt`; see
+      [issue 31](https://github.com/NVIDIA/framework-determinism/issues/31).
+      Also see note 10, above.
 
 #### Other Possible GPU-Specific Sources of Non-Determinism
 
@@ -558,7 +591,7 @@ This section catalogs relevant links.
 
 ### TensorFlow Issues
 
-GitHiub issues in the TensorFlow project:
+GitHub issues in the TensorFlow project:
 
 Number                                                         | Title                                                                                    | Date Opened | Status |
 --------------------------------------------------------------:|:-----------------------------------------------------------------------------------------|:------------|:-------|
@@ -590,7 +623,8 @@ GitHub issues in dependent or related projects:
 ### TensorFlow Pull Requests
 
 The following pull requests (and some inidividual commits) are those in the
-TensorFlow GitHub repo that are directly related to this project. As we have
+TensorFlow GitHub repo (`github.com/tensorflow/tensorflow`) that are directly
+related to this project. As we have
 [discovered](scripts/README.md#find-tensorflow-commits), 1.8% of all commits
 seem to reference, or have some relationship with, "determinism" or
 "deterministic". As of 2020-01-30, that was 1,391 commits.
@@ -618,7 +652,8 @@ ID                                                           | Title
 [38089](https://github.com/tensorflow/tensorflow/pull/38089) | Add reminder to test deterministic cuDNN CTC loss                | closed |             |         |
 [38509](https://github.com/tensorflow/tensorflow/pull/38509) | List deterministic op func bug fixes in v2.2<br>release notes    | merged | 2020-04-15  | 2.2     |
 [39243](https://github.com/tensorflow/tensorflow/pull/39243) | GPU-deterministic tf.image.resize (bilinear)                     | merged | 2020-09-22  | 2.4     |
-
+[44717](https://github.com/tensorflow/tensorflow/pull/44717) | Add to rel notes: deterministic tf.image.resize (bilinear)       | merged | 2020-11-13  | 2.4     |
+
 Notes:
   1. These are individual commits.
 
@@ -628,6 +663,15 @@ Notes:
 [1004]: https://github.com/tensorflow/tensorflow/commit/8b7a3db0b6e09415b5640be4986fb4d7c6e5209a
 [1005]: https://github.com/tensorflow/tensorflow/commit/9e096debc4a0909deb69970f38bee7b77e5e5f7d
 
+### Other TensorFlow Organization Pull Requests
+
+These are relevant pull requests against repositories in
+`github.com/tensorflow` other than `github.com/tensorflow/tensorflow`
+
+ Repository | Number                                                   | Title                                                                 | Date Opened | Status |
+:-----------|---------------------------------------------------------:|:----------------------------------------------------------------------|:------------|:-------|
+ community  | [346](https://github.com/tensorflow/community/pull/346)  | RFC: Enhancing determinism in TF                                      | 2021-01-19  | Open   |
+
 ### PyTorch Pull Requests
 
 ID                                                     | Title                                                         | Status | Date Merged | Version |
@@ -685,6 +729,7 @@ Andrew Kerr,
 Xiang Bo Kong,
 Nicolas Koumchatzky,
 Jorge Albericio Latorre,
+Lin Lan,
 Simon Layton,
 Ned Letcher,
 Jose Alvarez Lopez,

diff --git a/fwd9m/tensorflow/__init__.py b/fwd9m/tensorflow/__init__.py
@@ -19,4 +19,4 @@
 
 # What follows is the public API for fwd9m.tensorflow
 from .enable_determinism import _enable_determinism as enable_determinism
-from .patch import _patch as patch # deprecated
+from .patch import _patch as patch # deprecated
diff --git a/fwd9m/tensorflow/enable_determinism.py b/fwd9m/tensorflow/enable_determinism.py
@@ -23,46 +23,49 @@
 
 import tensorflow as tf
 
-from .patch import _patch_bias_add
-from .patch import _patch_unsorted_segment_sum
-from .patch import _patch_segment_sum
-from ..utils import _Version as Version
-from ..version import __version__ as package_version
+# By calling the deprecated patch API here, we continue to test its effect
+# without having to test it explicitly. Note that this form of import
+# necessarily breaks the Google Python Style Guide rule to import packages
+# and modules only (and not individual functions).
+from ..tensorflow import patch as patch_bias_add
+from . import patch_segment_sum
+from . import patch_unsorted_segment_sum
+from . import patch_softmax_xent
+from . import patch_sparse_softmax_xent
+from .. import utils
+from .. import version
 
 def _enable_determinism(seed=None):
   """Provides a best-effort recipe to increase framework determinism when
     running on GPUs.
-
     Call this method either before or after explicitly importing TensorFlow,
     but always before constructing any graphs.
-
-    This function cannot address all possible sources of non-determinism. Please 
+    This function cannot address all possible sources of non-determinism. Please
     see further instructions at https://github.com/NVIDIA/framework-determinism
     to understand how to use it in a larger deterministic context.
-
     Arguments:
       seed: <fill in>
-
     Returns: None
   """
-  tf_vers = Version(tf.version.VERSION)
+  tf_vers = utils._Version(tf.version.VERSION)
   ngc_tf_container_version_string = os.environ.get('NVIDIA_TENSORFLOW_VERSION')
   if ngc_tf_container_version_string:
     in_ngc_cont = True
-    ngc_vers = Version(ngc_tf_container_version_string)
+    ngc_vers = utils._Version(ngc_tf_container_version_string)
   else:
     in_ngc_cont = False
   if not in_ngc_cont and tf_vers.between('1.14', '2.0'):
     os.environ['TF_CUDNN_DETERMINISTIC'] = '1'
-    _patch_bias_add()
+    patch_bias_add(_silent=True)
   if in_ngc_cont and ngc_vers.at_least('19.06') or tf_vers.at_least('2.1'):
     os.environ['TF_DETERMINISTIC_OPS'] = '1'
   if in_ngc_cont and ngc_vers.at_least('19.06') or tf_vers.at_least('1.14'):
-    _patch_unsorted_segment_sum()
-    _patch_segment_sum()
-    # Apply the fused softmax/cross-entropy patch here
+    patch_segment_sum._patch_segment_sum()
+    patch_unsorted_segment_sum._patch_unsorted_segment_sum()
+    patch_softmax_xent._patch_softmax_xent()
+    patch_sparse_softmax_xent._patch_sparse_softmax_xent()
     pass
   # TODO: Add other recipe items (e.g. seed)
   print("%s (version %s) has been applied to TensorFlow "
-        "version %s" % (__name__, package_version,
+        "version %s" % (__name__, version.__version__,
                         tf_vers.original_version_string))