Corretto 21 hard crash within Docker container on Apple M4 / macOS 15.2 Processors (aarch64) #85

Open
dylanneild opened this issue Dec 11, 2024 · 15 comments

Comments

@dylanneild

dylanneild commented Dec 11, 2024

Describe the bug

Cannot launch Corretto 21 "java" process within a Docker container running on an Apple M4 processor, running macOS 15.2.

To Reproduce

  1. $ docker run --rm -ti ubuntu:24.04 bash
  2. # apt -y update
  3. # apt install -y wget
  4. # wget -P / https://corretto.aws/downloads/latest/amazon-corretto-21-aarch64-linux-jdk.deb
  5. # apt-get install -y /amazon-corretto-21-aarch64-linux-jdk.deb
  6. # java

Output as follows:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGILL (0x4) at pc=0x0000ffff97d3fc5c, pid=2847, tid=2848
#
# JRE version:  (21.0.5+11) (build )
# Java VM: OpenJDK 64-Bit Server VM (21.0.5+11-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
# Problematic frame:
# j  java.lang.System.registerNatives()V+0 java.base@21.0.5
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# //hs_err_pid2847.log
[0.021s][warning][os] Loading hsdis library failed
#
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
Aborted

Note that identical behaviour occurs with Corretto 23; so something appears to be completely broken on macOS 15.2/Docker/aarch64 right now.

Expected behavior

Fairly clear; running java in any capacity should work.

Platform information

OS: macOS 15.2 running Docker 
Version 21.0.5+11-LTS

Additional context

Log data attached:

hs_err_pid2866.log.zip

@dylanneild
Author

Note that there are no issues running Corretto 21 (or 8, 11, 17, etc) on macOS 15.2 natively (not in the Docker/Linux/aarch64 environment).

$ java -version
openjdk version "21.0.5" 2024-10-15 LTS
OpenJDK Runtime Environment Corretto-21.0.5.11.1 (build 21.0.5+11-LTS)
OpenJDK 64-Bit Server VM Corretto-21.0.5.11.1 (build 21.0.5+11-LTS, mixed mode, sharing)

$ uname -a
Darwin x.local 24.2.0 Darwin Kernel Version 24.2.0: Thu Dec  5 18:26:03 PST 2024; root:xnu-11215.61.4~2/RELEASE_ARM64_T6041 arm64

So this may be a Docker kernel issue?

@benty-amzn
Contributor

benty-amzn commented Dec 11, 2024

Hi, thanks for contacting us about this. Based on the crash log, I suspect this is https://bugs.openjdk.org/browse/JDK-8345296.

Do you have a consistent reproducer for this issue? If so, could you test with a nightly build of tip from https://downloads.corretto.aws/#/downloads?build=nightly&version=tip to see if the issue is resolved?

@benty-amzn benty-amzn reopened this Dec 11, 2024
@dylanneild
Author

Hi @benty-amzn - per the description, the consistent reproducer is just $ java; causes an immediate crash. Zero functionality is working.

Will test a nightly now.

@benty-amzn
Contributor

benty-amzn commented Dec 11, 2024

It seems either macOS 15.2 or the M4 processor is a key component here; it fails to reproduce for me on 15.1 with an M3.
Separately, it doesn't look like we have any new enough builds for generic_linux aarch64 which contain the fix, due to https://bugs.openjdk.org/browse/JDK-8343751.

I'll see if I can find another way to test this.

@dylanneild
Author

dylanneild commented Dec 11, 2024

@benty-amzn Latest nightly from your link on Alpine (latest):

# bin/java -version
OpenJDK 64-Bit Server VM warning: Unable to get SVE vector length on this system. Disabling SVE. Specify -XX:UseSVE=0 to shun this warning.
openjdk version "25" 2024-12-11
OpenJDK Runtime Environment Corretto-25.0.0.1.1 (build 25+1-Nightly)
OpenJDK 64-Bit Server VM Corretto-25.0.0.1.1 (build 25+1-Nightly, mixed mode, sharing)

So seems to start without issue.

For completeness, here's the current 21-LTS on Alpine:

~/amazon-corretto-21.0.5.11.1-alpine-linux-aarch64/bin # ./java
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGILL (0x4) at pc=0x0000ffff94267c5c, pid=134, tid=135
#
# JRE version:  (21.0.5+11) (build )
# Java VM: OpenJDK 64-Bit Server VM (21.0.5+11-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
# Problematic frame:
# j  java.lang.System.registerNatives()V+0 java.base@21.0.5
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /root/amazon-corretto-21.0.5.11.1-alpine-linux-aarch64/bin/hs_err_pid134.log
[0.017s][warning][os] Loading hsdis library failed
#
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
Aborted

So, identical issue to running on Debian/Ubuntu, etc.

Conclusion: Nightly works, current stables do not (including the LTS downloads, which is problematic).

# cat /etc/os-release
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.21.0
PRETTY_NAME="Alpine Linux v3.21"
HOME_URL="https://alpinelinux.org/"
BUG_REPORT_URL="https://gitlab.alpinelinux.org/alpine/aports/-/issues"

# uname -a
Linux e72b578b67c7 6.10.14-linuxkit #1 SMP Thu Oct 24 19:28:55 UTC 2024 aarch64 Linux
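
The crash reports in this thread differ only in pc, pid, and tid, so comparing them comes down to the signal and the problematic frame. A rough sketch of that comparison, assuming the hs_err header layout shown above; `crash_signature` is a hypothetical helper name, not part of any JDK tooling:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an hs_err-style log on stdin and print
# "<signal> | <problematic frame>", ignoring pc/pid/tid, which vary per run.
crash_signature() {
  awk '
    /^# +SIG[A-Z]+ /        { sig = $2 }            # e.g. "#  SIGILL (0x4) at pc=..."
    /^# Problematic frame:/ { want = 1; next }      # the frame is on the next line
    want                    { sub(/^# +/, ""); frame = $0; want = 0 }
    END                     { print sig " | " frame }
  '
}
```

Usage would be along the lines of `crash_signature < hs_err_pid134.log`; two logs that print the same line are likely the same crash.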

@dylanneild
Author

> It seems either macOS 15.2 or the M4 processor is a key component here; it fails to reproduce for me on 15.1 with an M3. Separately, it doesn't look like we have any new enough builds for generic_linux aarch64 which contain the fix, due to https://bugs.openjdk.org/browse/JDK-8343751.

I had these Docker images running on the same test machine on 15.1 with an M4 Pro this morning.

Looks like 15.2 is the issue.

@benty-amzn
Contributor

Resolved by #84 if this is the same bug.

I'm working on getting macos 15.2 installed for testing at the moment.
Could you confirm if the issue still reproduces with -XX:UseSVE=0?

@benty-amzn
Contributor

This issue does not reproduce for me with M3 hardware and macOS 15.2.

Could you check if it reproduces with a fastdebug build of Corretto 21 from https://downloads.corretto.aws/#/downloads?build=nightly&version=21 and attach the hs_err?

@acradu

acradu commented Dec 12, 2024

Hello, I could not reproduce on my M4 Pro with macOS 15.1.1, but after upgrading to 15.2 I can reproduce (in Docker only; native seems OK) with both the release and fastdebug versions (docker_hs_err.zip).

Host:

$ uname -a
Darwin Minithor.local 24.2.0 Darwin Kernel Version 24.2.0: Fri Dec  6 19:03:40 PST 2024; root:xnu-11215.61.5~2/RELEASE_ARM64_T6041 arm64

Container:

# uname -a
Linux 4f5ac2a15cca 6.10.14-linuxkit #1 SMP Thu Oct 24 19:28:55 UTC 2024 aarch64 aarch64 aarch64 GNU/Linux

Edit:
Seems to work with SVE disabled:

# java -XX:UseSVE=0 -version
openjdk version "21.0.5" 2024-10-15 LTS
OpenJDK Runtime Environment Corretto-21.0.5.11.1 (build 21.0.5+11-LTS)
OpenJDK 64-Bit Server VM Corretto-21.0.5.11.1 (build 21.0.5+11-LTS, mixed mode, sharing)

@dylanneild
Author

@benty-amzn @acradu Seeing the same thing. The fastdebug fails. All versions work fine in native macOS / aarch64; it's just Docker/Linux on the M4 that fails.

I can also confirm that 15.2 on my partner's M3 Pro works fine. Seems like an M4 / 15.2 / Docker problem that didn't affect M4 / 15.1.x / Docker.

I'll report back on -XX:UseSVE=0 for launching my test workloads, but it sounds like patching that into my gradlew and launch scripts is a temporary fix until a code level fix gets potentially back-ported from the 23 nightly.

Certainly a better approach than running in AMD64 / Rosetta 2 emulation mode.

@benty-amzn
Contributor

We've now been able to reproduce on M4 hardware running macOS 15.2. We confirmed that the issue did not reproduce with 15.1.1 before the update, and does after. It does not manifest on M1 or M3 hardware; we have not been able to test M2.

The issue appears to impact OpenJDK distributions of versions 21 and 23, and is resolved by JDK-8345296 / #84, or by the workaround described on the JBS issue: -XX:UseSVE=0.

@dylanneild
Author

Awesome that it can be reproduced and that #84 is the fix.

For confirmation:

# java -XX:UseSVE=0 --version
openjdk 21.0.5 2024-10-15 LTS
OpenJDK Runtime Environment Corretto-21.0.5.11.1 (build 21.0.5+11-LTS)
OpenJDK 64-Bit Server VM Corretto-21.0.5.11.1 (build 21.0.5+11-LTS, mixed mode, sharing)

# uname -a
Linux 404d95a8a26b 6.10.14-linuxkit #1 SMP Thu Oct 24 19:28:55 UTC 2024 aarch64 GNU/Linux

From:

$ uname -a
Darwin xxx-M4.local 24.2.0 Darwin Kernel Version 24.2.0: Fri Dec  6 19:03:40 PST 2024; root:xnu-11215.61.5~2/RELEASE_ARM64_T6041 arm64

I can adapt my Docker staging scripts to add -XX:UseSVE=0 for now and test. I can't imagine SVE having any major hit on my production code, performance wise, though how it's used in the JVM itself is well outside my sphere of knowledge.

@dylanneild
Author

Confirmed: my current workaround is adding -XX:UseSVE=0 to my gradlew invocation for in-container compilation, and then also to the JDK invocation that runs my workload. My code is presently running fine with these changes on M4/15.2/Docker.
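
For anyone who would rather not patch each launch script, one option is the `JAVA_TOOL_OPTIONS` environment variable, which the JVM reads at startup (it prints a "Picked up JAVA_TOOL_OPTIONS" notice when it does). A minimal sketch, assuming a bash shell inside the container; the `add_jvm_flag` helper is a hypothetical name used only for illustration, and whether every tool in a given image honours the variable is not something this thread confirms:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a flag to JAVA_TOOL_OPTIONS unless it is already there,
# so sourcing this file more than once does not duplicate the flag.
add_jvm_flag() {
  local flag="$1"
  case " ${JAVA_TOOL_OPTIONS:-} " in
    *" ${flag} "*) ;;  # already present, nothing to do
    *) JAVA_TOOL_OPTIONS="${JAVA_TOOL_OPTIONS:+${JAVA_TOOL_OPTIONS} }${flag}" ;;
  esac
  export JAVA_TOOL_OPTIONS
}

# Workaround flag from this thread (JDK-8345296).
add_jvm_flag "-XX:UseSVE=0"
echo "JAVA_TOOL_OPTIONS=${JAVA_TOOL_OPTIONS}"
```

Sourcing this before running `./gradlew` or `java` should apply the flag to JVMs started from that shell, since child processes inherit the environment.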

@syscl

syscl commented Dec 20, 2024

Hi, we see this on macOS 15.2 with M4* hardware; M3 and M2 do not have this issue. It happens on JDK/JRE 18-23; JDK 17 does not have this issue:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGILL (0x4) at pc=0x0000ffffa633fc5c, pid=9, tid=10
#
# JRE version:  (21.0.5+11) (build )
# Java VM: OpenJDK 64-Bit Server VM (21.0.5+11-LTS, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64)
# Problematic frame:
# j  java.lang.System.registerNatives()V+0 java.base@21.0.5
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /ingestion/hs_err_pid9.log
[0.075s][warning][os] Loading hsdis library failed
#
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
Aborted

@Benji41

Benji41 commented Dec 27, 2024

On macOS 15.2 (24C101), M4 processor, Docker Desktop 4.37.1 (178610), and gradle:latest, I'm able to reproduce it too (I know it's a different image), and -XX:UseSVE=0 didn't help.

Zhykos added a commit to Zhykos/cool-tools that referenced this issue Dec 28, 2024
UserAPI is ok + Workaround for M4, Java 21, MacOS 15.2 bug: corretto/corretto-21#85