bugfix: solve the problem that the worker process cannot exit in time #70

Open
wants to merge 2 commits into
base: master

weeweetan

If other modules have registered timers with long intervals and set cancelable to 1, then after a reload the condition ngx_event_timer_rbtree.root != ngx_event_timer_rbtree.sentinel stays true for as long as those timers are pending, and the poll event has to be added to the timer every time. I think there is no need to wait for the timer tree to clear. When the worker exits, there should be no more events registered.
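
To make the difference between the two conditions concrete, here is a minimal sketch in C. It is a simplified model, not the nginx source: toy_timer_t, toy_tree_nonempty and toy_must_wait are hypothetical names, and a plain array stands in for the timer rbtree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a timer entry; not an nginx type. */
typedef struct {
    unsigned long  expires_ms;   /* expiry time in milliseconds */
    unsigned       cancelable;   /* 1: may be dropped when the worker exits */
} toy_timer_t;

/* Models "root != sentinel": true whenever any timer is queued at all. */
static int
toy_tree_nonempty(const toy_timer_t *timers, size_t n)
{
    assert(timers != NULL || n == 0);
    (void) timers;

    return n != 0;
}

/* Models a cancelable-aware check: true only if a non-cancelable timer remains. */
static int
toy_must_wait(const toy_timer_t *timers, size_t n)
{
    size_t  i;

    assert(timers != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (!timers[i].cancelable) {
            return 1;
        }
    }

    return 0;
}
```

With a single one-hour cancelable timer queued, toy_tree_nonempty() reports a non-empty tree while toy_must_wait() reports that nothing blocks the exit, which is the situation described above.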

@WalterRan
Contributor

Thanks for your contribution. We will confirm this issue as soon as possible.

@WalterRan
Contributor

Hi weeweetan,

One test scenario fails with your patch applied: an Nginx reload while new connections are being created.

Here are my testing steps:

  1. Start Nginx
  2. Start a continuous handshake test with the command ab -n 1000000000 -c 100 https://127.0.0.1/
  3. Trigger an Nginx reload with the command ./nginx -c pwd/nginx.conf -s reload
    The workers cannot exit during the reload and remain in the state: nginx: worker process is shutting down

@WalterRan
Contributor

This happens because, when the timers are checked in the function ngx_event_no_timers_left, the handshake event still has the cancelable flag set to 0, so the worker cannot exit:
(gdb) p ev->cancelable
$1 = 0
(gdb) p *ev
$2 = {data = 0x7faa07a01110, write = 0, async = 1, accept = 0, instance = 0, active = 1, disabled = 0, ready = 0, oneshot = 0, complete = 0, eof = 0, error = 0, timedout = 1, timer_set = 1,
delayed = 0, deferred_accept = 0, pending_eof = 0, posted = 0, closed = 0, channel = 0, resolver = 0, cancelable = 0, available = 0,
handler = 0x55bdb9d885f0 <ngx_ssl_handshake_async_handler>, saved_handler = 0x0, index = 3503345872, log = 0x55bdbbc6ce30,
timer = {key = 2523268241, left = 0x55bdba0211e0 <ngx_event_timer_sentinel>, right = 0x7faa05b836c0, parent = 0x7faa05b83790, color = 0 '\000', data = 0 '\000'}, queue = {prev = 0x0, next = 0x0}}
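
The exit decision implied by this dump can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual nginx code: toy_event_t keeps only a few of the flags shown in the dump, and toy_worker_can_exit is a hypothetical helper that treats any event with timer_set = 1 and cancelable = 0 as blocking the exit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of the flags visible in the gdb dump; not nginx's ngx_event_t. */
typedef struct {
    unsigned  active:1;
    unsigned  timedout:1;
    unsigned  timer_set:1;
    unsigned  cancelable:1;
} toy_event_t;

/*
 * Returns 1 if an exiting worker may leave, 0 if it has to keep running.
 * Only events with timer_set count as pending timers; cancelable ones are ignored.
 */
static int
toy_worker_can_exit(int exiting, const toy_event_t *events, size_t n)
{
    size_t  i;

    assert(events != NULL || n == 0);

    if (!exiting) {
        return 0;
    }

    for (i = 0; i < n; i++) {
        if (events[i].timer_set && !events[i].cancelable) {
            return 0;   /* e.g. an event like the handshake event in the dump */
        }
    }

    return 1;
}
```

An event initialized as { 1, 1, 1, 0 } (active, timedout, timer_set, cancelable) mirrors the flags in the dump, and in this model it keeps the worker from exiting.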

@weeweetan
Author

Hello, does the worker never leave the shutting down status? I tested it here, and after the reload it exited completely in about 1 minute. According to the gdb information you provided, the shutting down worker is still processing connections. I think this is expected, because a reload exits gracefully and the worker waits for all request processing to complete.

@WalterRan
Contributor

Hi, I tested this case because this issue was raised several years ago, which is why we added the code.
It is not guaranteed to happen on every run, but it is very likely to happen over multiple tests.
In my test the original worker cannot exit, because the cancelable flag on the handshake event is 0.
Do you mean the worker exits within 1 minute? That's really strange.


2 participants