jdk11u Alpine linux build failure: gpg keyserver timeout #3518
Comments
PR created here to extend the timeout. If this failure is a simple reaction to a higher server load causing our requests to take too long, this could give us the necessary tolerance for that slowness.
Hmm we're getting quite a bit of variance in the workspace prep phase in those build jobs overall:
@adamfarley Can you take a look at https://ci.adoptium.net/job/build-scripts-pr-tester/job/build-test/job/jobs/job/jdk8u/job/jdk8u-alpine-linux-x64-temurin/139/console which was a test on one of my PRs. This run did happen with your timeout PR fix in it (you can see the output from
Wondering if this is a firewall issue? Can a different port be used?
The man page for gpg says that "The keyserver hkp://keys.gnupg.net uses round robin DNS to give a different keyserver each time you use it." So perhaps we could rerun the gpg command in the event of a timeout, using that gnupg keyserver so that we don't retry against the same, overburdened(?) keyserver. @sxa - What do you think?
OK, I have added some code to run the gpg command up to 10 times in the event of a failure, using the hkp keyserver I mentioned in the previous comment, so the build doesn't fail if we hit an overburdened keyserver. PR: #3544
Have you seen anything suggesting that it is due to an overburdened keyserver and not our machines i.e. does it happen when the load of the build machine is low instead of very high as in my earlier comment? We should have more data points now that we've added the
A firewall should reject the connection immediately instead of timing out, so I would suggest that is unlikely, especially if it's limited to our Alpine environments, which are showing variations in timing.
Doesn't look like it. Current data points:
So the failure happens on high and low uptimes.
Not sure. Will take a look. Update: Haven't found any failures on other platforms, even the ones that run on the same machine (x64 Linux). Maybe the failures are related to the Alpine container differences (networking, different gpg versions, etc.)? I'm also starting some "weekly pipeline" grinders with flawed javaopts (so they die if they get to the config step), so that we can try to brute-force a reproduction while using debug options in gpg and the surrounding code. Grinder link
OK that's useful - doesn't seem load-related then ... But it also doesn't make sense that it would be just down to that platform - and we build others in docker containers ... And that includes Alpine on aarch64 now.
I've modified the new PR to rerun the command up to 10 times, with intervals between attempts. The previously attempted fixes were switching to the hkp://keys.gnupg.net keyserver (which didn't work on all platforms for some reason) and using an "array of keyservers". Both have been rejected because they change our security profile, and because the SKS network appears to be facing imminent retirement.
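For anyone following along, the "rerun up to 10 times with intervals" idea can be sketched roughly as below. This is a minimal illustration in Python and not the code from the PR: the function name `run_with_retries`, its parameters, and the 10-second default interval are all assumptions made for the example, and the keyserver and key ID in the usage comment are placeholders.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=10, interval_seconds=10.0,
                     run=subprocess.run, sleep=time.sleep):
    """Run cmd until it exits 0 or the attempts are used up.

    Returns the 1-based number of the attempt that succeeded and raises
    RuntimeError if every attempt fails. `run` and `sleep` are injectable
    so the retry logic can be exercised without touching the network.
    """
    for attempt in range(1, attempts + 1):
        result = run(cmd)
        if result.returncode == 0:
            return attempt
        print(f"attempt {attempt}/{attempts} failed "
              f"with exit code {result.returncode}")
        # Wait between attempts, but not after the final failure.
        if attempt < attempts:
            sleep(interval_seconds)
    raise RuntimeError(f"command failed after {attempts} attempts: {cmd}")


# Hypothetical usage (placeholder keyserver and key ID):
# run_with_retries(["gpg", "--keyserver", "hkp://keyserver.example.org",
#                   "--recv-keys", "<KEY_ID>"])
```

Note that this keeps the same keyserver on every attempt, so it doesn't change the security profile the way the rejected approaches did; it only adds tolerance for a transient timeout.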
Also, one theory is that this is an Alpine-specific issue which is down to timeout settings in their networking setup, as gpg may simply be trying to parse a return code given to us by the OS function it's calling. Plus, it seems to be exactly 70 seconds between an execution and a timeout, and I'm not seeing anything in the gpg setup that happens after 70 seconds.
Also also, I ran a job a few days back to try and reproduce the bug by rerunning the recv-keys command over and over again. It didn't work, sadly. It ran for 8 hours without any errors, and was then killed because the job had run for too long.
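A reproduction loop along these lines could also record how long each run takes, which would help confirm whether failures really do land at a fixed 70 seconds. The sketch below is an assumption about how that might look, not the job that was actually run; `time_repeated_runs` and `failure_durations` are hypothetical helper names.

```python
import subprocess
import time


def time_repeated_runs(cmd, runs, run=subprocess.run, clock=time.monotonic):
    """Run cmd `runs` times and return a list of (exit_code, elapsed_seconds)."""
    samples = []
    for _ in range(runs):
        start = clock()
        result = run(cmd)
        samples.append((result.returncode, clock() - start))
    return samples


def failure_durations(samples):
    """Return the elapsed times of failed runs only.

    If these cluster tightly around one value, that points at a fixed
    timeout somewhere in the stack rather than a slow server.
    """
    return [elapsed for code, elapsed in samples if code != 0]
```

Passing the same gpg argument list used in the build to `time_repeated_runs` and then inspecting `failure_durations` would show whether the failures share a common duration.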
https://ci.adoptium.net/job/build-scripts/job/jobs/job/jdk11u/job/jdk11u-alpine-linux-x64-temurin/277/console