Reduce stack usage by boxing `File` in `Dist`, `CachePolicy` and large futures #1004

konstin · 2024-01-19T09:36:41Z

This is #947 again, but this time merging into main instead of into the downstack branch. Sorry for the noise.

Windows has a default stack size of 1MB, which often makes puffin fail with stack overflows. This PR reduces stack usage with three changes:

Boxing `File` in `Dist`, reducing its size from 496 bytes to 240 bytes.
Boxing the largest futures.
Boxing `CachePolicy`.
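
To illustrate the first change, here is a minimal sketch with hypothetical types (`LargeFile`, `InlineDist` and `BoxedDist` are stand-ins, not puffin's actual definitions). An enum is as large as its largest variant, so boxing a big field replaces it with an 8-byte pointer on the stack, at the cost of one heap allocation per value.

```rust
use std::mem::size_of;

// Hypothetical stand-in for a large metadata struct such as `File`;
// puffin's real field layout is not shown in this PR.
#[allow(dead_code)]
struct LargeFile {
    filename: String,
    url: String,
    hashes: [u8; 256],
}

// Inline payload: every value of the enum reserves room for the largest variant.
#[allow(dead_code)]
enum InlineDist {
    Registry { name: String, file: LargeFile },
    Path { name: String },
}

// Boxed payload: the large struct moves to the heap, only a pointer stays inline.
#[allow(dead_code)]
enum BoxedDist {
    Registry { name: String, file: Box<LargeFile> },
    Path { name: String },
}

fn main() {
    println!("inline: {} bytes", size_of::<InlineDist>());
    println!("boxed:  {} bytes", size_of::<BoxedDist>());
}
```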

Method

Debugging was done on Linux, using #941 to limit the stack size to 1MB. I ran the command below:

RUSTFLAGS=-Zprint-type-sizes cargo +nightly build -p puffin-cli -j 1 > type-sizes.txt && top-type-sizes -w -s -h 10 < type-sizes.txt > sizes.txt

The main drawback is that top-type-sizes doesn't say what the `__awaitee` is, so you have to manually look up a future with a matching size.
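
A minimal sketch of why awaiting a large future inflates its caller, and how boxing helps. `inner`, `outer_inline` and `outer_boxed` are hypothetical functions: an async fn's state embeds the full state of the future it is currently awaiting (what shows up as the `__awaitee` in the -Zprint-type-sizes output), while `Box::pin` leaves only a pointer in the caller's state.

```rust
use std::future::Future;
use std::mem::size_of_val;
use std::pin::Pin;

// Hypothetical inner future that keeps a 4 KiB buffer alive across an await point,
// so the buffer becomes part of the future's state.
async fn inner() -> u8 {
    let buf = [1u8; 4096];
    std::future::ready(()).await;
    // `black_box` keeps the optimizer from discarding the buffer.
    std::hint::black_box(&buf)[0]
}

// Awaiting `inner()` directly embeds its whole state in `outer_inline`'s state.
async fn outer_inline() -> u8 {
    inner().await
}

// Boxing the inner future stores only a (fat) pointer in `outer_boxed`'s state.
async fn outer_boxed() -> u8 {
    let fut: Pin<Box<dyn Future<Output = u8>>> = Box::pin(inner());
    fut.await
}

fn main() {
    // Futures are lazy: creating them runs no code, so they can be measured directly.
    println!("inline: {} bytes", size_of_val(&outer_inline()));
    println!("boxed:  {} bytes", size_of_val(&outer_boxed()));
}
```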

When the brotli feature of reqwest is active, many brotli types show up. Toggling this feature, however, seems to have no effect. I assume they are false positives, since the brotli crate has elaborate control over allocation. The sizes are therefore shown with the feature off.

Results

The largest future shrinks from 12208 B to 6416 B, and the largest type (`PrioritizedDistribution`, see also #948) from 17448 B to 9264 B. Full diff: https://gist.github.com/konstin/62635c0d12110a616a1b2bfcde21304f

For the second commit, I iteratively boxed the largest future until the tests passed. Then, with an 800KB stack limit, I looked through the backtrace of a failing test and added some more boxing.
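
The PR uses #941 to cap the stack size, and its mechanism isn't shown here. As an assumed alternative for reproducing a limit like the 800KB one above, the standard library can run a workload on a thread with an explicit stack size via `std::thread::Builder::stack_size`. `deep_work` is a hypothetical stand-in for a test body.

```rust
use std::thread;

// Hypothetical stand-in for a stack-hungry code path (e.g. a test body).
fn deep_work() -> usize {
    // A 64 KiB local array lives on the current thread's stack.
    let buf = [7u8; 64 * 1024];
    std::hint::black_box(&buf).iter().map(|&b| b as usize).sum()
}

fn main() {
    // Run the workload on a thread with an 800 KB stack.
    let handle = thread::Builder::new()
        .name("limited-stack".into())
        .stack_size(800 * 1024)
        .spawn(deep_work)
        .expect("failed to spawn thread");
    // A stack overflow would abort the whole process rather than return here.
    let total = handle.join().expect("thread panicked");
    println!("sum = {total}");
}
```

For `cargo test`, setting the `RUST_MIN_STACK` environment variable has a similar effect on the threads the test harness spawns, without changing any code.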

Quick benchmarking showed no significant difference:

$ hyperfine --warmup 2 "target/profiling/main-dev resolve meine_stadt_transparent" "target/profiling/puffin-dev resolve meine_stadt_transparent" 
Benchmark 1: target/profiling/main-dev resolve meine_stadt_transparent
  Time (mean ± σ):      49.2 ms ±   3.0 ms    [User: 39.8 ms, System: 24.0 ms]
  Range (min … max):    46.6 ms …  63.0 ms    55 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: target/profiling/puffin-dev resolve meine_stadt_transparent
  Time (mean ± σ):      47.4 ms ±   3.2 ms    [User: 41.3 ms, System: 20.6 ms]
  Range (min … max):    44.6 ms …  60.5 ms    62 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  target/profiling/puffin-dev resolve meine_stadt_transparent ran
    1.04 ± 0.09 times faster than target/profiling/main-dev resolve meine_stadt_transparent
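
Reading the summary: assuming hyperfine combines the relative standard deviations of the two means in the usual way for a quotient, the reported 1.04 ± 0.09 follows from the means above. Since the interval includes 1, the difference is within noise.

$$
r = \frac{49.2\ \text{ms}}{47.4\ \text{ms}} \approx 1.04, \qquad
\sigma_r \approx r\sqrt{\left(\frac{3.0}{49.2}\right)^2 + \left(\frac{3.2}{47.4}\right)^2} \approx 1.04 \times 0.091 \approx 0.09
$$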


konstin added 4 commits January 19, 2024 10:35

Shrink Dist from 496 bytes to 240 by boxing File

20bf945

`Dist` stood out when profiling stack sizes with top-type-sizes. Here, we trade one heap allocation per `Dist` for a more reasonable stack size.

Box large futures to reduce stack sizes

046fc8e

Enforce only upper bound on Dist size

e070754
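
The commit title suggests a size guard on `Dist`. A minimal sketch, assuming a compile-time assertion (the actual check in puffin may well be a unit test instead): only an upper bound is checked, so the build breaks if the type grows, but not if it ends up smaller on some target. The `Dist` below is a hypothetical stand-in.

```rust
use std::mem::size_of;

// Hypothetical stand-in for `Dist`; the real type is defined elsewhere in puffin.
#[allow(dead_code)]
enum Dist {
    Built { name: String, file: Box<[u8; 512]> },
    Source { name: String, url: String },
}

// Compile-time check: the build fails if `Dist` grows past 240 bytes.
// Asserting only an upper bound tolerates layouts that come out smaller.
const _: () = assert!(size_of::<Dist>() <= 240);

fn main() {
    println!("Dist is {} bytes", size_of::<Dist>());
}
```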

Box cached client callbacks and CachePolicy

860da87

Looking through the stack trace of `allowed_transitive_url_dependency` with an 800KB stack, I found those two (the cached client callbacks and `CachePolicy`) to be the major offenders. These changes make `allowed_transitive_url_dependency` pass with an 800KB stack.
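
The shape of the cached client callbacks isn't shown in this PR. As a hypothetical sketch: a type that stores a closure by value is at least as large as everything the closure captures, while storing it as `Box<dyn FnOnce(..)>` leaves only a two-word pointer inline. `Response`, `InlineCallback` and `BoxedCallback` are illustrative names, not puffin's API.

```rust
use std::mem::size_of_val;

// Hypothetical response type passed to the callback.
struct Response {
    body: Vec<u8>,
}

// Stores the callback inline: the struct grows with whatever the closure captures.
#[allow(dead_code)]
struct InlineCallback<F: FnOnce(Response) -> usize> {
    callback: F,
}

// Stores the callback boxed: the struct holds only a fat pointer.
struct BoxedCallback {
    callback: Box<dyn FnOnce(Response) -> usize>,
}

fn main() {
    // Both closures capture a 2 KiB table by value (arrays of u8 are Copy).
    let table = [3u8; 2048];
    let inline = InlineCallback { callback: move |r: Response| r.body.len() + table[0] as usize };
    let boxed = BoxedCallback { callback: Box::new(move |r: Response| r.body.len() + table[0] as usize) };
    println!("inline: {} bytes", size_of_val(&inline));
    println!("boxed:  {} bytes", size_of_val(&boxed));
    // Boxed `FnOnce` closures can be called directly.
    let n = (boxed.callback)(Response { body: vec![0; 5] });
    println!("callback returned {n}");
}
```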

konstin added the windows label (Specific to the Windows platform) Jan 19, 2024

konstin enabled auto-merge (squash) January 19, 2024 09:37

konstin merged commit 47fc90d into main Jan 19, 2024
3 checks passed

konstin deleted the konsti/reduce-stack-usage branch January 19, 2024 09:38

konstin mentioned this pull request Jan 19, 2024

Revert "Reduce stack usage by boxing File in Dist, CachePolicy and large futures" #1003

Merged
