Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safer stackallocs #110864

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open

Safer stackallocs #110864

wants to merge 9 commits into from

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Dec 20, 2024

First batch of changes around various stackalloc expressions in BCL.

The rules are:

  1. Make sure unbound stackallocs have upper (and lower!) bound checks. Add (uint) casts to handle negative/overflows as well (I've checked all places that it's only applied to int, no long, etc.). In many places it's redundant, but it shouldn't affect the performance and reduce number of false-positives reported by automated tooling.
    Also, add Debug.Assert to make it more clear.
    2) For patterns like len > CONST : stackalloc T[CONST] : new T[len] we change stackalloc T[CONST] to stackalloc T[len] if that CONST is >= 1024 bytes in order to consume less stack. It shouldn't hurt performance since our libs are compiled with SkipLocalsInit (although, a small overhead still there, so we leave it as is for small const sizes).
    UPD: although, maybe we should do it only when it's saved to Span to make sure nobody relies on some minimal size..
  2. Flag suspiciously large allocations
  3. Change unmanaged pointers to Spans on the left side of the stackalloc expression (this PR doesn't do it much).

Closes #110843

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Dec 20, 2024
…k-struct

# Conflicts:
#	src/libraries/System.Private.CoreLib/src/System/Globalization/CalendarData.Browser.cs
@vcsjones
Copy link
Member

change stackalloc T[CONST] to stackalloc T[len] if that CONST is >= 512 bytes in order to consume less stack.

I don't know how much of a win it is to use less stack,but in general I think it is preferable to keep a const in for the stack size. It makes it much easier to reason about "how much is going to be stack allocated" during code review and static analysis.

cc @GrabYourPitchforks Since we were just chatting about this.

@EgorBo
Copy link
Member Author

EgorBo commented Dec 20, 2024

change stackalloc T[CONST] to stackalloc T[len] if that CONST is >= 512 bytes in order to consume less stack.

I don't know how much of a win it is to use less stack,but in general I think it is preferable to keep a const in for the stack size. It makes it much easier to reason about "how much is going to be stack allocated" during code review and static analysis.

cc @GrabYourPitchforks Since we were just chatting about this.

I don't have a strong preference here, but in some cases it's possible that e.g. len is 4, but we still unconditionally allocate 1024 bytes for it

@EgorBo
Copy link
Member Author

EgorBo commented Dec 20, 2024

Related discussion: #97895

@vcsjones
Copy link
Member

vcsjones commented Dec 20, 2024

#110843 is still not fully fixed. It changes a potential stack overflow in to an access violation. This computation is done unchecked:

uint CounterSetInfoSize = (uint)sizeof(Interop.PerfCounter.PerfCounterSetInfoStruct)
+ (uint)_idToCounter.Count * (uint)sizeof(Interop.PerfCounter.PerfCounterInfoStruct);

Since it is unsigned and unchecked, can overflow to a small positive value

When we try to access memory here:

CounterInfo = (Interop.PerfCounter.PerfCounterInfoStruct*)(CounterSetBuffer + CounterSetInfoUsed);

The address will be invalid.

Can be reproduced on this PR with

[Fact]
public static void BigNumberOfCounters()
{
    CounterSet counterSet = new(Guid.NewGuid(), Guid.NewGuid(), CounterSetInstanceType.Single);

    for (int i = 0; i <= 134_217_726; i++)
    {
        counterSet.AddCounter(i, CounterType.ElapsedTime);
    }

    counterSet.CreateCounterSetInstance("potato");
}

@EgorBo
Copy link
Member Author

EgorBo commented Dec 20, 2024

#110843 is still not fully fixed. It changes a potential stack overflow in to an access violation. This computation is done unchecked:

Thanks. Just wrapping the compution into checked context should be enough? There are many patterns where stackalloc size is computed like that, presumably, we should checked them all?

@vcsjones
Copy link
Member

Just wrapping the computation into checked context should be enough?

Seems reasonable.

There are many patterns where stackalloc size is computed like that, presumably, we should checked them all?

I don't see a problem with it. Many will probably be unnecessary, but I don't think it introduces any overhead that is problematic.

@jkotas
Copy link
Member

jkotas commented Dec 20, 2024

For patterns like len > CONST : stackalloc T[CONST] : new T[len] we change stackalloc T[CONST] to stackalloc T[len] if that CONST is >= 1024 bytes in order to consume less stack.

Hardcoded ad-hoc policies like this are never going be "right". The proper fix would be an API like #52065 so that one can just ask for a scratch buffer and leave the choice of strategy to the system.

@tannergooding
Copy link
Member

I don't have a strong preference here, but in some cases it's possible that e.g. len is 4, but we still unconditionally allocate 1024 bytes for it

I think this is a case where we should be ensuring our code is correctly tuned.

If we know the vast majority of cases will be <= n bytes then that's a much better threshold to use than the default max threshold (which is generally 1024, which comes from the typical threshold used by native malloca implementations). If we have effectively two thresholds to handle (most are small, some are slightly larger), then I think that doing 2 explicit stackalloc thresholds are better (if (size <= 64) { stackalloc byte[64]; } else if (size <= 1024) { stackalloc byte[1024]; } else { RentOrAllocateNewArray; }).

Dynamic stackalloc is also "expensive", it introduces a loop that has to probe pages. All our usages will be a single page probe, which means that will be mispredicted by the CPU in the default scenario, incurring an approx 20 cycle penalty. Since stackalloc should not be getting used in known recursive functions or directly in loops, it is unlikely the predictor will ever get that trained correctly and this will be a "permanent" cost. Dynamic stackalloc also means the JIT cannot easily optimize around it, like it currently does with some small fixed allocation sizes.

I also don't think the JIT implicitly doing things to the primitive stackalloc pattern is goodness, it makes code that is likely being used in lowlevel/tuned scenarios less predictable, meaning it can't be used for it's intended scenario. I'd rather instead see us invest in actually exposing our own StackallocOrRent helper that can then do such optimizations itself, giving users the option of convenience vs control.


Notably dynamic length stackallocs are often considered dangerous as well. The pattern we're using here is safe (relatively speaking), but as the pattern gets copied around and used its also easy for people to misunderstand and do the wrong thing. Since we're trying to reduce unsafe code, I think being explicit and adding documentation around these areas and why certain thresholds are used/correct is a better investment; as is pushing for the helper intrinsic that generally simplifies the pattern and abstracts away the "right" sizes to use so that it can be correctly tuned, potentially participate in PGO, etc.

@EgorBo
Copy link
Member Author

EgorBo commented Dec 20, 2024

Reverted that part. Added checked everywhere where length is calculated near the stackalloc.
Happy to replace all places with a helper call, but seems like it won't be there soon.

@@ -158,7 +158,7 @@ internal static unsafe bool EvpAeadCipherFinalEx(
notNullOutput = (stackalloc byte[1]).Slice(1);
}

fixed (byte* pOutput = &MemoryMarshal.GetReference(notNullOutput))
fixed (byte* pOutput = notNullOutput)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change is introducing a bug. It looks like the code used MemoryMarshal.GetReference to avoid normalization of empty spans to null. You would have to also delete .Slice(1) above to make this change work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

@teo-tsirpanis teo-tsirpanis added area-Meta and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Dec 20, 2024
@EgorBo EgorBo marked this pull request as ready for review December 20, 2024 21:13
@EgorBo
Copy link
Member Author

EgorBo commented Dec 20, 2024

This is ready for review. Here is the code analyzer I used to find all unbound stackallocs: https://github.com/EgorBo/UnsafeCodeAnalyzer/blob/main/src/Analyzer/UnboundStackallocAnalyzer.cs (reports 263 places in Libraries and Corelib excluding tests)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CounterSet.CreateCounterSetInstance can stack overflow with excessive counters
6 participants