Interrupting npm install child process due to timeouts leaves orphaned npm processes #16197
Comments
In short, the problem appears to be that when an I observed that We also need to be aware that this could potentially impact different managers in different ways. |
does it happen on our oss docker images or on ws images? |
It was observed on a WS which was built using buildpack v2. When the problem occurred we could observe that |
working on a reproduction |
minimal reproduction: https://github.com/nabeelys/renovate-16197-orphaned-processes 2 consecutive runs resulting in 2
|
We should probably use https://azimi.me/2014/12/31/kill-child_process-node-js.html solution in renovate |
We are exploring multiple options, the main problem is |
What if we narrow the scope to environments using buildpack? Or perhaps we could expand wider to even say containerized environments with Renovate as the exclusive task? Because when we run within a container, and we're the only application, the process list is pretty short. At the end of each child process we could essentially just check for "processes which shouldn't be there" and kill them all (e.g. PIDs above a certain number). |
We don't need to narrow the scope to specific environments. the general approach is replacing:
with:
after all that being said, since switching to |
Switching from exec to spawn is a pretty big change. We have considered it before so that we can stream logs as we go, and found some challenges. Why not just keep it as it is, and when we catch an err we kill all child processes such as the npm causing the problem? |
first I'm going to test the fix using In addition, using |
I thought the problem was that the child process we create gets successfully killed, but all its children are not. So then your challenge is identifying which are its children. |
If we use detach, we can kill the whole process group. Pretty easy. We just need to follow the blog post linked above. |
@viceice ..and also switch from exec to spawn, and implement our own timeout? |
yes |
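The approach agreed on above (spawn the command detached, run our own timer, and signal the whole process group) can be sketched roughly as follows. This is a minimal illustration, not Renovate's actual code: `runWithGroupTimeout` and `RunResult` are hypothetical names, output handling is omitted, and the negative-PID form of `process.kill` only applies on POSIX systems.

```typescript
import { spawn } from "node:child_process";

// Hypothetical result shape, for illustration only.
interface RunResult {
  code: number | null;
  signal: string | null;
  timedOut: boolean;
}

// Hypothetical helper: runs `cmd` in a shell and, on timeout, signals the
// whole process group so grandchildren (e.g. npm) do not stay behind.
export function runWithGroupTimeout(
  cmd: string,
  timeoutMs: number
): Promise<RunResult> {
  return new Promise((resolve, reject) => {
    // detached: true makes the child the leader of a new process group (POSIX).
    const child = spawn(cmd, { shell: true, detached: true, stdio: "ignore" });
    let timedOut = false;

    const timer = setTimeout(() => {
      timedOut = true;
      try {
        // A negative PID targets the process group instead of a single process.
        if (child.pid !== undefined) {
          process.kill(-child.pid, "SIGTERM");
        }
      } catch {
        // The group may already have exited; nothing left to signal.
      }
    }, timeoutMs);

    child.on("error", (err) => {
      clearTimeout(timer);
      reject(err);
    });
    child.on("exit", (code, signal) => {
      clearTimeout(timer);
      resolve({ code, signal, timedOut });
    });
  });
}
```

A caller might use it as `await runWithGroupTimeout("npm install", 15 * 60 * 1000)` and treat `timedOut === true` as the error case.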
renovate calls
After the timeout is reached PID 406 is killed, but the sub-tree remains (this is a behavior related to
if we for example change the shell to bash (using
|
Should we start with simply adding Switching to
|
bash can only be used on Linux, we should use the default on Windows/macOS |
but setting shell seems a good idea |
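The platform rule from the two comments above could look something like this. It is only a sketch: `pickShell` is a hypothetical helper, and the `/bin/bash` path is an assumption about where bash lives in the image. Returning `undefined` lets Node fall back to its default shell.

```typescript
// Hypothetical helper: choose the value for the `shell` option of
// child_process.exec. Only Linux gets bash; every other platform keeps
// Node's default by returning undefined.
export function pickShell(
  platform: string = process.platform
): string | undefined {
  // Assumption: bash is installed at /bin/bash in the Linux image.
  return platform === "linux" ? "/bin/bash" : undefined;
}
```

The result can be passed straight through, e.g. `exec(cmd, { shell: pickShell() }, callback)`.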
Assigning to @Gabriel-Ladzaretti , already have the POC code implemented, will open PR as soon as he finishes. |
Correction to:
|
#16414 (comment) contains a POC for steps 1, 2 & 3 from #16197 (comment), while the PR itself is a refactor only. |
Let's try. Can it be controllable via config? e.g. env variable? RENOVATE_X_ for experimental |
shouldn't be a problem, env variable only or a full config option? |
Env variable opt in to begin with |
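An environment-variable opt-in along these lines might be checked as in the sketch below. The thread only gives the `RENOVATE_X_` prefix, so the variable name in the usage note is a placeholder, and `isExperimentalEnabled` is a hypothetical helper rather than an existing Renovate function.

```typescript
// Hypothetical helper: treat "true" or "1" (case-insensitive) as opted in.
// Anything else, including an unset variable, keeps the old behaviour.
export function isExperimentalEnabled(
  name: string,
  env: Record<string, string | undefined> = process.env
): boolean {
  const value = (env[name] ?? "").trim().toLowerCase();
  return value === "true" || value === "1";
}
```

For example, `isExperimentalEnabled("RENOVATE_X_EXAMPLE_FLAG")`, where `RENOVATE_X_EXAMPLE_FLAG` is a made-up name used only for illustration.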
How are you running Renovate?
Self-hosted
If you're self-hosting Renovate, tell us what version of Renovate you run.
32.94.0
Please select which platform you are using if self-hosting.
github.com
If you're self-hosting Renovate, tell us what version of the platform you run.
No response
Was this something which used to work for you, and then stopped?
I never saw this working
Describe the bug
Renovate has a default executionTimeout of 15, i.e. 15 minutes. When binarySource=install this means that the Node child_process interrupts the exec call after 15 minutes if it hasn't returned gracefully, and we log a rawExec err. This seems to work, but we found that in a normal Renovate buildpack environment (i.e. Ubuntu base image, running as a Docker container) the underlying npm process remains. Worse, it does not eventually finish. Instead it uses up CPU forever until the image is restarted. Over time, multiple orphaned npm processes can accumulate within the same container until eventually CPU hits maximum.
Relevant debug logs
Relevant logging was done using system commands like ps and top; I do not have a copy of those to share.
Have you created a minimal reproduction repository?
No reproduction repository
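For context, the timeout behaviour described in the bug report can be sketched with Node's exec and its timeout option. This is an illustration under assumptions, not Renovate's rawExec: `execWithTimeout` is a hypothetical wrapper. When the timeout fires, Node sends the kill signal (SIGTERM by default) to the process it spawned, which is the shell, and processes started by that shell are not signalled.

```typescript
import { exec } from "node:child_process";

// Hypothetical wrapper around exec with a timeout in milliseconds.
// On timeout, only the directly spawned shell receives SIGTERM; anything
// the shell started (such as npm) is not signalled by Node.
export function execWithTimeout(
  cmd: string,
  timeoutMs: number
): Promise<string> {
  return new Promise((resolve, reject) => {
    exec(cmd, { timeout: timeoutMs }, (err, stdout) => {
      if (err) {
        // err.killed is true when Node terminated the child, e.g. on timeout.
        reject(err);
        return;
      }
      resolve(stdout);
    });
  });
}
```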