-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extended Debugger reductions and fixed some bugs #523
base: master
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## master #523 +/- ##
==========================================
- Coverage 77.60% 71.24% -6.37%
==========================================
Files 127 117 -10
Lines 11111 10614 -497
==========================================
- Hits 8623 7562 -1061
- Misses 2488 3052 +564
Continue to review full report at Codecov.
|
if subfolder == None: | ||
subfolder = self.mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use subfolder = os.path.join(self.mode)
?
It makes the intentions of this line much clearer to the reader.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
subfolder is just a string and not a filepath.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
os.path.join
returns an object of type str
. If I understand this part correctly, the subfolder variable contains the path to sub directory for tensorboard data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no subfolder is just the name of the reduction. each reduction will be its own subfolder in the tensorboard directory.
if subfolder == None: | ||
subfolder = self.mode | ||
|
||
if subfolder in self.tb_writers: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should rename the name of this map. Maybe something more explicit like self.tb_writer_to_dir_map
?
smdebug/core/hook.py
Outdated
@@ -660,16 +663,32 @@ def export_collections(self): | |||
collection_file_name = f"{self.worker}_collections.json" | |||
self.collection_manager.export(self.out_dir, collection_file_name) | |||
|
|||
def _get_reduction_tensor_name(self, tensor_name, reduction_name, abs): | |||
return get_reduction_tensor_name(tensor_name, reduction_name, abs, remove_colon_index=True) | |||
def _get_reduction_tensor_name(self, tensor_name, reduction_name, abs, collection_name=""): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When do we accept an empty string collection name?
Should we use the DEFAULT
collection key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I extended this part, which is needed for Tensorboard to group visualizations by Debugger collections. To keep it consistent with previous code, the collection name will be empty per default.
reduction_name = "abs_" + reduction_name | ||
tb_writer = self._maybe_get_tb_writer(subfolder=reduction_name) | ||
if tb_writer: | ||
scalar = self._make_numpy_array(tensor_data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the value of scalar
if tb_writer = None
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per default Debugger writes reductions like normal tensors into debug-output folder and users can retrieve the data via the smdebug API. I extended this part, so that reductions are also written in Tensorboard format (in case user provided a Tensorboard configuration)
if hasattr(torch.Tensor, reduction_name): | ||
f = getattr(torch.Tensor, reduction_name) | ||
op = f(tensor_data.float()) | ||
if reduction_name == "isnan" or reduction_name == "isinf": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we manage reduction_name values with Enums?
Description of changes:
Extended smdebug's reductions to check for nan- and inf-values and to compute quantiles for PT tensors. Tensors are now also written out in Tensorboard format such that users can display all reductions for a specific tensor within the same visualization and visualizations will be grouped by Debugger collections.
Here is an example visualization:
Style and formatting:
I have run
pre-commit install && pre-commit run --all-files
to ensure that auto-formatting happens with every commit.Issue number, if available
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.