This repository has been archived by the owner on Feb 8, 2023. It is now read-only.

[BUG] Pymars fails to exit console normally after python console and script execution when supervisor and worker start locally #5

Closed
ChengjieLi28 opened this issue Aug 22, 2022 · 5 comments · May be fixed by #8
Labels
bug Something isn't working mod: deploy

Comments

@ChengjieLi28

Describe the bug
After starting a supervisor process and a worker process locally, Python does not exit normally once pymars code has been executed in the Python console or in a standalone Python script. I have to press Ctrl+C to get back to the command line.
In IPython, however, pymars exits normally.

To Reproduce
To help us reproduce this bug, please provide the information below:

  1. Your Python version: 3.9.12
  2. The version of Mars you use: 0.9.0, the latest version of pymars, installed with pip.
  3. Versions of crucial packages, such as numpy, scipy and pandas: whatever pymars pulls in.
  4. Full stack of the error: No logs in the supervisor or worker process, and nothing is printed to the console when pressing Ctrl+C.
  5. The situation: See the screenshots:
  • Run in the python console:
    image
  • Run in a script:
    image
  6. Minimized code to reproduce the error.
  • First, start the supervisor process and the worker process locally.
mars-supervisor -H 0.0.0.0 -p 9002 -w 9001
mars-worker -H 0.0.0.0 -p 9003 -s 127.0.0.1:9002
  • Second, run the following code in the Python console.
import mars
import mars.tensor as mt

s = mars.new_session('http://127.0.0.1:9001')

a = mt.random.rand(4, 4, chunk_size=2)
b = mt.inner(a, a)
b.execute()  # submit tensor to cluster
  • Finally, call exit() to try to leave the Python console.
@aresnow1 aresnow1 added bug Something isn't working mod: deploy labels Aug 23, 2022
@qianduoduo0904

exit seems to simply raise SystemExit (see https://stackoverflow.com/a/19747562 ); perhaps we do not handle this exception properly?
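
For readers following along, here is a minimal sketch of that behavior. It uses sys.exit rather than the console's exit() helper, on the understanding that both end up raising SystemExit; the function name run_and_catch is made up for this illustration and has nothing to do with Mars.

```python
import sys


def run_and_catch():
    """Call sys.exit and show that it surfaces as an ordinary exception."""
    try:
        sys.exit(3)  # raises SystemExit(3) instead of killing the process outright
    except SystemExit as exc:
        # Any code that catches SystemExit (or BaseException) can intercept the exit.
        return exc.code


code = run_and_catch()
print("caught exit code:", code)
```

Because SystemExit is a regular exception until it reaches the top of the main thread, the interpreter still goes through its normal shutdown sequence afterwards, which is where a stuck background thread could hold things up.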

@aresnow1

aresnow1 commented Aug 23, 2022

I reproduced this issue and found that the decref thread hangs when the console exits. After I commented out these two lines, it works: https://github.com/xprobe/mars/blob/fc798e3cd3d4e8874c91b1e22ae6919f952e24d0/mars/core/entity/executable.py#L70-L71 Any idea how to fix it properly, @qianduoduo0904?
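
As a rough illustration of the kind of background thread being discussed, the sketch below shows a queue-fed worker that is stopped with a sentinel and a bounded join. BackgroundRunner and all of its methods are hypothetical stand-ins written for this example; this is not Mars's DecrefRunner and makes no claim about what the two linked lines do.

```python
import queue
import threading


class BackgroundRunner:
    """Hypothetical queue-fed worker thread (not Mars's actual implementation)."""

    _STOP = object()  # sentinel telling the worker loop to finish

    def __init__(self):
        self._queue = queue.Queue()
        self._thread = None
        self.processed = []

    def start(self):
        # daemon=True means the interpreter does not wait for this thread at exit.
        self._thread = threading.Thread(
            target=self._run, name="BackgroundRunner", daemon=True
        )
        self._thread.start()

    def _run(self):
        while True:
            item = self._queue.get()
            if item is self._STOP:
                break
            self.processed.append(item)

    def put(self, item):
        if self._thread is None:
            self.start()
        self._queue.put(item)

    def stop(self, timeout=5.0):
        """Ask the worker to finish; return True if it stopped within the timeout."""
        if self._thread is None:
            return True
        self._queue.put(self._STOP)
        self._thread.join(timeout)  # bounded wait so shutdown cannot block forever
        return not self._thread.is_alive()


runner = BackgroundRunner()
runner.put("a")
runner.put("b")
stopped = runner.stop()
print(stopped, runner.processed)
```

The bounded join is the design point here: if the worker is blocked on something that never completes, stop() returns after the timeout instead of hanging the console.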

@qianduoduo0904

qianduoduo0904 commented Aug 23, 2022

@aresnow1

see: https://docs.python.org/3/library/atexit.html

Note: The functions registered via this module are not called when the program is killed by a signal not handled by Python, when a Python fatal internal error is detected, or when os._exit() is called.

According to the note, atexit functions are not executed when os._exit is called, so we may need another way to trigger DecrefRunner.stop when exit is called.
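
The quoted note can be checked with a small experiment. The sketch below starts two child interpreters: one leaves through sys.exit, the other through os._exit. The child script and the helper name run_child are made up for this illustration, and nothing here touches Mars itself.

```python
import subprocess
import sys
import textwrap

# Child program: registers an atexit hook, then exits one of two ways.
CHILD = textwrap.dedent(
    """
    import atexit, os, sys
    atexit.register(lambda: print("cleanup ran", flush=True))
    if sys.argv[1] == "hard":
        os._exit(0)   # immediate exit: atexit hooks are skipped
    sys.exit(0)       # normal exit: atexit hooks run
    """
)


def run_child(mode):
    """Run the child script in a fresh interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, mode],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.stdout.strip()


normal_output = run_child("normal")
hard_output = run_child("hard")
print(repr(normal_output), repr(hard_output))
```

If the cleanup message only appears for the normal exit, any stop logic that relies solely on atexit will be skipped whenever the process leaves through os._exit or an unhandled signal.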

@aresnow1

I cannot reproduce this on current master. If other people can confirm, we can close this issue.

@ChengjieLi28
Author

This issue has now been fixed. Closing it.

3 participants