Segmentation fault on linux after upgrading from .NET 8 to .NET 9 #110835

Open
karmeli87 opened this issue Dec 19, 2024 · 6 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI


@karmeli87

Description

We recently upgraded from .NET 8 to .NET 9 and started experiencing crashes in our tests, with messages such as:

Crashing thread 100588 signal 11 (000b)
xyz@xyz:~$ sudo dmesg --human | grep seg
[Dec 6 10:13] .NET Tiered Com[635942]: segfault at 94 ip 00007f5be24aa1cb sp 00007f5be2dfd120 error 4 in libclrjit.so[7f5be2299000+30d000] likely on CPU 6 (core 8, socket 0)
[Dec 6 10:27] .NET Tiered Com[639858]: segfault at 94 ip 0000702a85eaa1cb sp 00006fe941ffd120 error 4 in libclrjit.so[702a85c99000+30d000] likely on CPU 0 (core 0, socket 0)
[Dec12 16:05] .NET Tiered Com[2869188]: segfault at 94 ip 00007454c50aa1cb sp 00007454c59fd120 error 4 in libclrjit.so[7454c4e99000+30d000] likely on CPU 19 (core 9, socket 0)
[Dec12 16:26] .NET Tiered Com[2876089]: segfault at 94 ip 0000735a0e2aa1cb sp 00007318c3dfd120 error 4 in libclrjit.so[735a0e099000+30d000] likely on CPU 23 (core 14, socket 0)
[Dec12 16:40] .NET Tiered Com[2889344]: segfault at 94 ip 0000711fa66aa1cb sp 0000711f2dbfd120 error 4 in libclrjit.so[711fa6499000+30d000] likely on CPU 5 (core 6, socket 0)
[Dec12 16:43] .NET Tiered Com[2890410]: segfault at 94 ip 00007338f68aa1cb sp 00007338f71fd120 error 4 in libclrjit.so[7338f6699000+30d000] likely on CPU 16 (core 5, socket 0)
[Dec13 05:34] .NET Tiered Com[3137587]: segfault at 94 ip 00007082b2eaa1cb sp 0000704091ffd120 error 4 in libclrjit.so[7082b2c99000+30d000] likely on CPU 1 (core 1, socket 0)
[Dec15 09:09] .NET Tiered Com[3770736]: segfault at 94 ip 000075f78baaa1cb sp 000075f70edfd120 error 4 in libclrjit.so[75f78b899000+30d000] likely on CPU 14 (core 2, socket 0)

We gathered a dump that was analyzed by the MS team, and they believe it is a JIT bug (#110769 (comment)).

Here is the call stack at the crash:
# Child-SP          RetAddr               Call Site
00 00007626`123eece0 00007626`12a72727     libc_so+0x1107e3
01 00007626`123eed10 00007626`12a73bdb     libcoreclr!PROCCreateCrashDump+0x287 [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 2309] 
02 00007626`123eed70 00007626`12a46bfe     libcoreclr!PROCCreateCrashDumpIfEnabled+0xc9b [/__w/1/s/src/coreclr/pal/src/thread/process.cpp @ 2526] 
03 00007626`123eedf0 00007626`12a460a5     libcoreclr!invoke_previous_action+0x10e [/__w/1/s/src/coreclr/pal/src/exception/signal.cpp @ 430] 
04 00007626`123eee30 00007626`12c45320     libcoreclr!sigsegv_handler+0x1d5 [/__w/1/s/src/coreclr/pal/src/exception/signal.cpp @ 678] 
05 00007626`123efac0 00007626`0d2aa1cb     libc_so+0x45320
06 (Inline Function) --------`--------     libclrjit!FlowGraphDominatorTree::IntersectDom+0x205 [/__w/1/s/src/coreclr/jit/flowgraph.cpp @ 6116] 
07 00007626`0dbfd120 00007626`0d2a8a14     libclrjit!FlowGraphDominatorTree::Build+0x2db [/__w/1/s/src/coreclr/jit/jit.h @ 6251] 
08 00007626`0dbfd1c0 00007626`0d0c6e15     libclrjit!Compiler::optSetBlockWeights+0x44 [/__w/1/s/src/coreclr/jit/optimizer.cpp @ 67] 
09 (Inline Function) --------`--------     libclrjit!Phase::Run+0x17 [/__w/1/s/src/coreclr/jit/phase.cpp @ 61] 
0a (Inline Function) --------`--------     libclrjit!DoPhase+0x5d [/__w/1/s/src/coreclr/jit/inline.cpp @ 143] 
0b (Inline Function) --------`--------     libclrjit!Compiler::compCompile+0x3efb [/__w/1/s/src/coreclr/jit/compiler.cpp @ 4984] 
0c (Inline Function) --------`--------     libclrjit!Compiler::compCompileHelper+0x4710 [/__w/1/s/src/coreclr/jit/compiler.cpp @ 7396] 
0d (Inline Function) --------`--------     libclrjit!Compiler::compCompile::$_0::operator()+0x4710 [/__w/1/s/src/coreclr/jit/compiler.cpp @ 6533] 
0e (Inline Function) --------`--------     libclrjit!Compiler::compCompile+0x48c3 [/__w/1/s/src/coreclr/jit/compiler.cpp @ 6552] 
0f (Inline Function) --------`--------     libclrjit!jitNativeCode::$_0::operator()::{lambda(jitNativeCode(CORINFO_METHOD_STRUCT_ *, CORINFO_MODULE_STRUCT_ *, ICorJitInfo *, CORINFO_METHOD_INFO *, void **, unsigned int *, JitFlags *, void *)::$_0::operator()(jitNativeCode(CORINFO_METHOD_STRUCT_ *, CORINFO_MODULE_STRUCT_ *, ICorJitInfo *, CORINFO_METHOD_INFO *, void **, unsigned int *, JitFlags *, void *)::__JITParam *)::__JITParam *)#1}::operator()+0x4e49 [/__w/1/s/src/coreclr/jit/compiler.cpp @ 8036] 
10 (Inline Function) --------`--------     libclrjit!jitNativeCode::$_0::operator()+0x4e65 [/__w/1/s/src/coreclr/jit/compiler.cpp @ 8060] 
11 00007626`0dbfd260 00007626`0d0c1d94     libclrjit!jitNativeCode+0x5045 [/__w/1/s/src/coreclr/jit/compiler.cpp @ 8062] 
12 00007626`0dbff1d0 00007626`12666f86     libclrjit!CILJit::compileMethod+0x84 [/__w/1/s/src/coreclr/jit/ee_il_dll.cpp @ 291] 
13 00007626`0dbff260 00007626`1266717a     libcoreclr!invokeCompileMethodHelper+0xd6 [/__w/1/s/src/coreclr/vm/jitinterface.cpp @ 12477] 
14 00007626`0dbff2c0 00007626`12667d07     libcoreclr!invokeCompileMethod+0x9a [/__w/1/s/src/coreclr/vm/jitinterface.cpp @ 12537] 
15 00007626`0dbff340 00007626`126a261a     libcoreclr!UnsafeJitFunction+0x917 [/__w/1/s/src/coreclr/vm/jitinterface.cpp @ 12982] 
16 00007626`0dbff700 00007626`126a1efe     libcoreclr!MethodDesc::JitCompileCodeLocked+0xfa [/__w/1/s/src/coreclr/vm/prestub.cpp @ 938] 
17 00007626`0dbff7d0 00007626`126a1665     libcoreclr!MethodDesc::JitCompileCodeLockedEventWrapper+0x3be [/__w/1/s/src/coreclr/vm/prestub.cpp @ 818] 
18 00007626`0dbff8d0 00007626`126a102e     libcoreclr!MethodDesc::JitCompileCode+0x255 [/__w/1/s/src/coreclr/vm/prestub.cpp @ 706] 
19 00007626`0dbff990 00007626`126d38e2     libcoreclr!MethodDesc::PrepareILBasedCode+0x2ae [/__w/1/s/src/coreclr/vm/prestub.cpp @ 439] 
1a 00007626`0dbffa20 00007626`126d2ca0     libcoreclr!TieredCompilationManager::CompileCodeVersion+0x102 [/__w/1/s/src/coreclr/vm/tieredcompilation.cpp @ 964] 
1b (Inline Function) --------`--------     libcoreclr!TieredCompilationManager::OptimizeMethod+0x11 [/__w/1/s/src/coreclr/vm/tieredcompilation.cpp @ 935] 
1c 00007626`0dbffb10 00007626`126d2325     libcoreclr!TieredCompilationManager::DoBackgroundWork+0x270 [/crossrootfs/x64/usr/lib/gcc/x86_64-linux-gnu/5/../../../../include/c++/5/type_traits @ 820] 
1d 00007626`0dbffc20 00007626`126d21a8     libcoreclr!TieredCompilationManager::BackgroundWorkerStart+0xf5 [/__w/1/s/src/coreclr/vm/tieredcompilation.cpp @ 533] 
1e 00007626`0dbffc80 00007626`126cea48     libcoreclr!TieredCompilationManager::BackgroundWorkerBootstrapper1+0x68 [/crossrootfs/x64/usr/lib/gcc/x86_64-linux-gnu/5/../../../../include/c++/5/type_traits @ 483] 
1f (Inline Function) --------`--------     libcoreclr!ManagedThreadBase_DispatchInner+0x2 [/__w/1/s/src/coreclr/vm/threads.cpp @ 7110] 
20 (Inline Function) --------`--------     libcoreclr!ManagedThreadBase_DispatchMiddle+0x51 [/__w/1/s/src/coreclr/vm/threads.h @ 7154] 
21 (Inline Function) --------`--------     libcoreclr!<unnamed-class>::operator()+0x51 [/__w/1/s/src/coreclr/vm/threads.h @ 7312] 
22 (Inline Function) --------`--------     libcoreclr!<unnamed-class>::operator()+0xcc [/__w/1/s/src/coreclr/vm/threads.h @ 7314] 
23 00007626`0dbffcc0 00007626`126ceefd     libcoreclr!ManagedThreadBase_DispatchOuter+0x158 [/__w/1/s/src/coreclr/vm/threads.h @ 7338] 
24 (Inline Function) --------`--------     libcoreclr!ManagedThreadBase_FullTransition+0x18 [/__w/1/s/src/coreclr/vm/threads.cpp @ 7358] 
25 00007626`0dbffdd0 00007626`126d20d0     libcoreclr!ManagedThreadBase::KickOff+0x2d [/__w/1/s/src/coreclr/vm/threads.cpp @ 7394] 
26 00007626`0dbffe00 00007626`12a7533e     libcoreclr!TieredCompilationManager::BackgroundWorkerBootstrapper0+0x20 [/crossrootfs/x64/usr/lib/gcc/x86_64-linux-gnu/5/../../../../include/c++/5/type_traits @ 465] 
27 00007626`0dbffe20 00007626`12c9ca94     libcoreclr!CorUnix::CPalThread::ThreadEntry+0x1fe [/crossrootfs/x64/usr/include/x86_64-linux-gnu/sys/types.h @ 1747] 
28 00007626`0dbffed0 00007626`12d29c3c     libc_so+0x9ca94
29 00007626`0dbfff80 ffffffff`ffffffff     libc_so+0x129c3

It crashes at the last line of the code listed below because finger2 is NULL:

BasicBlock* FlowGraphDominatorTree::IntersectDom(BasicBlock* finger1, BasicBlock* finger2)
{
    assert((finger1 != nullptr) && (finger2 != nullptr));

    while (finger1 != finger2)
    {
        while (finger1->bbPostorderNum < finger2->bbPostorderNum)
        {
            finger1 = finger1->bbIDom;
            assert(finger1 != nullptr);
        }
        while (finger2->bbPostorderNum < finger1->bbPostorderNum)
        {
            finger2 = finger2->bbIDom;

I have gathered another dump with the following environment variables set:

DOTNET_JitEnableInductionVariableOpts=0

DOTNET_StressLog=1
DOTNET_LogLevel=7
DOTNET_LogFacility=80103
DOTNET_StressLogSize=2000000
DOTNET_TotalStressLogSize=40000000

DOTNET_JitEnableInductionVariableOpts=0 is the workaround suggested for the already known issue #109981 (#109981 (comment)), which we applied to rule out hitting that issue; the remaining variables enable the runtime stress log.

Reproduction Steps

There is no minimal repro yet, but the crash happens consistently when we run our test suite on Linux.

Expected behavior

The process should not crash.

Actual behavior

The process crashes with a segmentation fault in libclrjit.so.

Regression?

Yes. It was working fine on .NET 8.

Known Workarounds

No response

Configuration

No response

Other information

No response

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 19, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Dec 19, 2024
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@jakobbotsch
Member

Thanks for the report! It is most likely still the same underlying issue as #109981, just manifesting in a different spot/under different conditions. Sadly there is no simple workaround in this case given the stack trace you shared, beyond disabling optimizations for the affected method.
Do you have the ability to test your application with a nightly build of .NET 10? (If so, you can get it here: https://github.com/dotnet/sdk/blob/main/documentation/package-table.md)

It might also be possible for me to build a .NET 9 version of the JIT that includes the fix, which you would then be able to test with by replacing libclrjit.so in your application folder with that version. Let me know if you're interested in that (and if so, what Linux version you are deploying on).

@karmeli87
Author

karmeli87 commented Dec 19, 2024

Thanks for looking into that :-)

If you can provide us with a custom build of .NET 9, that would be easiest for us.

We are testing on Ubuntu 24.04.1 LTS

@jkotas
Member

jkotas commented Dec 19, 2024

You may be able to workaround this by disabling TieredPGO (set DOTNET_TieredPGO=0 environment variable or <TieredPGO>false</TieredPGO> property in your .csproj file).

https://learn.microsoft.com/en-us/dotnet/core/runtime-config/compilation#profile-guided-optimization

@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Dec 19, 2024
@JulieLeeMSFT JulieLeeMSFT added this to the 10.0.0 milestone Dec 19, 2024
@jakobbotsch
Member

@karmeli87 I have attached a libclrjit.so that includes the fix for #109981 to your issue at https://developercommunity.visualstudio.com/t/Access-Violation-Exception-after-upgradi/10814326#T-N10815576. Can you please try your app with this JIT and see if it helps?

@karmeli87
Author

Thanks! We will try it and let you know whether it helps.

I can confirm that setting DOTNET_TieredPGO=0 seems to alleviate the constant segmentation faults we had.

4 participants