-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use state sampler stub to defer metrics updates when DoFn#process is executed in subprocess. #32600
base: master
Are you sure you want to change the base?
Conversation
83968c6
to
576f7f9
Compare
576f7f9
to
7990130
Compare
7990130
to
03da33d
Compare
|
||
class StateSamplerInterface(ABC): | ||
@abstractmethod | ||
def update_metric(self, typed_metric_name, value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks strange to a reader unfamiliar with this change why we are abstracting only one particular method of StateSampler. Furthermore, we are reasoning about the logic how to accumulate the metrics in the stub, which might go out-of sync.
I wonder whether we can replace the statesampler reference in the subprocess with a unittest.mock.MagicMock, and then replay the calls returned by the_mock.mock_calls() on the real reference without introducing another class. As long as the mock doesn't need to return any values, this should work. @robertwb do you think this would be too hacky? Are there some concerns if the replay happens only after the DoFn.process() call finishes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an interesting idea, however I'd rather avoid dependencies on test classes like MagicMock. (It may also have issues with the pickling required here). I do think if we're going to introduce an interface here, we should list all the relevant methods and have the stub explicitly raise an error for the unsupported ones to clarify that it's a partial implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robertwb do we need to apply the interface onto the cython implementation as well in this case? Does cython support inheriting from a python ABC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed with @robertwb offline - cdef Cython classes cannot inherit from python classes, there is a limited support for multiple inheritance but comes with a potential performance hit, and this is performance sensitive code; we can leave the cython codepath alone and add a comment about the existence of the StateSampler interfaces defined in python codebase but not use them in Cythonized codepath.
|
||
class StateSamplerInterface(ABC): | ||
@abstractmethod | ||
def update_metric(self, typed_metric_name, value): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an interesting idea, however I'd rather avoid dependencies on test classes like MagicMock. (It may also have issues with the pickling required here). I do think if we're going to introduce an interface here, we should list all the relevant methods and have the stub explicitly raise an error for the unsupported ones to clarify that it's a partial implementation.
Any updates on this? |
I will be able to pick this up again early next week. Is the consensus to go with the interface approach but list all the relevant method/raise an error if unimplemented? |
sgtm |
d437e86
to
70b8066
Compare
70b8066
to
18a1414
Compare
I've updated with a more complete interface. Can you please take a look @robertwb ? |
Assigning reviewers. If you would like to opt out of this review, comment R: @damccorm for label python. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
Reminder, please take a look at this pr: @damccorm |
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment R: @liferoad for label python. Available commands:
|
Reminder, please take a look at this pr: @liferoad |
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment R: @damccorm for label python. Available commands:
|
Overall lgtm, PTAL at lint issues. Thanks!
|
Reminder, please take a look at this pr: @damccorm |
waiting on author |
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123
), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>
instead.CHANGES.md
with noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.