
Enh/error on timeout #683

Open: wants to merge 4 commits into base: master


PGijsbers
Collaborator

Closes #681

If a process "completes" only because it exceeded the activity timeout, it should raise an error rather than silently continue.


codecov-commenter commented Dec 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.70%. Comparing base (b719142) to head (6bb1085).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #683      +/-   ##
==========================================
+ Coverage   68.15%   68.70%   +0.54%     
==========================================
  Files          54       55       +1     
  Lines        6730     6749      +19     
==========================================
+ Hits         4587     4637      +50     
+ Misses       2143     2112      -31     


retcode = process.poll()
if retcode is None:
Collaborator Author

As far as I can tell, this is a reliable way to tell that the process is still running. If it is still running at this point, I think the only possible reason is that the communicate function returned early, which should only happen on an activity timeout.
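
To illustrate the check described in this comment, here is a minimal sketch rather than the code from this PR. It relies on `subprocess.Popen.poll()` returning `None` while the child has not terminated. The `StaleProcessError` class defined here, the helper `wait_or_raise`, and the use of `process.wait(timeout=...)` as a stand-in for the project's communicate function are all assumptions made for the example.

```python
import subprocess
import sys


class StaleProcessError(TimeoutError):
    """Hypothetical stand-in for the error type named in this PR."""


def wait_or_raise(process: subprocess.Popen, activity_timeout: float) -> int:
    """Return the exit code, or raise if the process is still running."""
    try:
        # Stand-in for a communicate function that may return early
        # when the process shows no activity for too long.
        process.wait(timeout=activity_timeout)
    except subprocess.TimeoutExpired:
        pass

    retcode = process.poll()  # None means the process has not terminated
    if retcode is None:
        process.kill()
        process.wait()  # reap the killed child
        raise StaleProcessError(
            f"Process was still running after {activity_timeout}s."
        )
    return retcode


def demo() -> tuple[int, bool]:
    # A child that exits immediately: poll() yields its exit code.
    quick = subprocess.Popen([sys.executable, "-c", "pass"])
    quick_code = wait_or_raise(quick, activity_timeout=30)

    # A child that outlives the timeout: poll() yields None, so we raise.
    slow = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    try:
        wait_or_raise(slow, activity_timeout=0.5)
        raised = False
    except StaleProcessError:
        raised = True
    return quick_code, raised
```

Calling `demo()` exercises both paths: the quick child returns its exit code, and the slow child triggers the error instead of letting the caller continue.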

Comment on lines +213 to +215
# if a pipe is not ready it could be timeout or it could be end of process
# so at this point we do not know. Only after the communicate function is over do we know.
# i.e., if the process is still running it does not have a retcode.
Collaborator Author

I would have preferred to raise the activity timeout error from the function that uses the timeout. Unfortunately, at that stage we cannot yet tell whether the error should be raised.

res = bench.run(args.task, args.fold)
try:
    bench.setup(amlb.SetupMode[args.setup])
except StaleProcessError as e:
Collaborator Author

I'll move to exception notes instead and/or revise this structure. I did it this way to give the user a clear final message about what went wrong and how to solve it. More generally, I want to make errors easier to parse: some of the issues that get opened are completely solvable from the traceback, but users can't or don't try to parse it.

@PGijsbers PGijsbers requested a review from shchur December 22, 2024 13:05
Successfully merging this pull request may close these issues.

Program continues after setup fails