Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Box PrioritizedDistribution #948

Merged
merged 6 commits into from
Jan 19, 2024
Merged

Box PrioritizedDistribution #948

merged 6 commits into from
Jan 19, 2024

Conversation

konstin
Copy link
Member

@konstin konstin commented Jan 17, 2024

On top of #947, we can also box PrioritizedDistribution.

In a simple benchmark, this seems to slightly improve performance when comparing only this commit to main, even though the benchmark is too noisy to establish significance:

$ hyperfine --warmup 30 --runs 300 "target/profiling/main-dev resolve meine_stadt_transparent" "target/profiling/puffin-dev resolve meine_stadt_transparent"
  Benchmark 1: target/profiling/main-dev resolve meine_stadt_transparent
    Time (mean ± σ):      83.6 ms ±   2.0 ms    [User: 77.7 ms, System: 20.0 ms]
    Range (min … max):    81.4 ms …  98.2 ms    300 runs

    Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

  Benchmark 2: target/profiling/puffin-dev resolve meine_stadt_transparent
    Time (mean ± σ):      80.8 ms ±   2.2 ms    [User: 75.4 ms, System: 19.5 ms]
    Range (min … max):    78.6 ms …  98.6 ms    300 runs

    Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

  Summary
    target/profiling/puffin-dev resolve meine_stadt_transparent ran
      1.03 ± 0.04 times faster than target/profiling/main-dev resolve meine_stadt_transparent

The effect on type sizes however is considerable (downstack PR vs. this PR):

--- branch.txt  2024-01-17 14:26:01.826085176 +0100
+++ boxed-prioritized-dist.txt  2024-01-17 14:25:57.101900963 +0100
@@ -1,19 +1,3 @@
-9264 alloc::collections::btree::node::InternalNode<pep440_rs::version::Version, distribution_types::PrioritizedDistribution> align=8
-   9168 data
-     96 edges
-
-9264 alloc::collections::btree::node::InternalNode<pep440_rs::Version, distribution_types::PrioritizedDistribution> align=8
-   9168 data
-     96 edges
-
-9168 alloc::collections::btree::node::LeafNode<pep440_rs::version::Version, distribution_types::PrioritizedDistribution> align=8
-   9064 vals
-     88 keys
-
-9168 alloc::collections::btree::node::LeafNode<pep440_rs::Version, distribution_types::PrioritizedDistribution> align=8
-   9064 vals
-     88 keys
-
 8992 tokio::sync::mpsc::block::Block<hyper::client::dispatch::Envelope<http::request::Request<reqwest::async_impl::body::ImplStream>, http::response::Response<hyper::body::body::Body>>> align=8
    8960 values
      32 header
@@ -74,10 +58,23 @@
          40 __tracing_attr_span
      64 variant Unresumed, Returned, Panicked

+5648 {async fn body@crates/puffin-client/src/registry_client.rs:224:5: 224:30} align=8
+   5647 variant Suspend0
+       5576 __awaitee align=8
+         40 __tracing_attr_span

Copy link
Member

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about putting the Box inside of PrioritizedDistribution? Then consumers don't need to care about it.

@BurntSushi
Copy link
Member

Oh, the other concern I had here was on the boxed futures. It seems like it might be easy for those boxed() calls to disappear accidentally. Maybe just a quick comment on each one to say // reduces stack size or something?

Also, how did you pick the futures to box?

@konstin
Copy link
Member Author

konstin commented Jan 17, 2024

What about putting the Box inside of PrioritizedDistribution? Then consumers don't need to care about it.

Good question, my thinking was that this still allows using the plain PrioritizedDistribution if you don't want the Box without enforcing the allocation?

Also, how did you pick the futures to box?

I went from the largest future to their callees until i arrived at one that was large not due to it's callee but due to variant/data and then boxed this one at all of its callsites. It's not an exact methodology, but it seems to have worked. I'm still having trouble understanding the numbers from top-type-sizes and the exact construction of future sizes.

I had also considered splitting some of the large methods into helper methods for easier debugging and maybe more boxing spots but wanted to avoid the churn.

@BurntSushi
Copy link
Member

Good question, my thinking was that this still allows using the plain PrioritizedDistribution if you don't want the Box without enforcing the allocation?

Right. But I don't think it matters. A PrioritizedDistribution is a huge type with a whole mess of things inside of it, including many allocations. Adding one more is what I think of as "marginal." As in, it's unlikely to make a meaningful difference given its definition today.

I went from the largest future to their callees until i arrived at one that was large not due to it's callee but due to variant/data and then boxed this one at all of its callsites. It's not an exact methodology, but it seems to have worked. I'm still having trouble understanding the numbers from top-type-sizes and the exact construction of future sizes.

Gotya. Seems sensible to me I think. It does seem a little tricky. (And sorry, I added the comment about boxed futures on the wrong PR.)

@konstin konstin force-pushed the konsti/box-prioritized-dist branch 2 times, most recently from 4936fd8 to 9741e2a Compare January 17, 2024 14:21
@konstin konstin force-pushed the konsti/reduce-stack-usage branch from 78f1eba to f5f291d Compare January 18, 2024 11:13
@konstin konstin force-pushed the konsti/box-prioritized-dist branch from 9741e2a to 8191a31 Compare January 18, 2024 11:13
@konstin konstin force-pushed the konsti/reduce-stack-usage branch from f5f291d to 4399843 Compare January 18, 2024 11:46
@konstin konstin force-pushed the konsti/box-prioritized-dist branch from 8191a31 to a0b7a34 Compare January 18, 2024 11:46
@konstin konstin force-pushed the konsti/reduce-stack-usage branch from 4399843 to a61a2ab Compare January 18, 2024 14:28
@konstin konstin force-pushed the konsti/box-prioritized-dist branch from a0b7a34 to e416b1a Compare January 18, 2024 14:28
@konstin konstin force-pushed the konsti/reduce-stack-usage branch from a61a2ab to dbc6bfa Compare January 18, 2024 16:30
@konstin konstin force-pushed the konsti/box-prioritized-dist branch from e416b1a to a1c0f33 Compare January 18, 2024 16:30
`Dist` was standing out when profiling stack sizes with top-type-sizes. Here, we trade an allocation per `Dist` for a more reasonable stack size.
Looking through the stack trace of `allowed_transitive_url_dependency` with a 800KB stack, i found those two to be the major offenders. These changes make `allowed_transitive_url_dependency` pass with a 800KB stack.
@konstin konstin force-pushed the konsti/reduce-stack-usage branch from dbc6bfa to 5cf1bc3 Compare January 19, 2024 09:28
Base automatically changed from konsti/reduce-stack-usage to konsti/set-explicit-thread-size January 19, 2024 09:29
konstin added a commit that referenced this pull request Jan 19, 2024
…e futures (#947)

Windows has a default stack size of 1MB, which makes puffin often fail
with stack overflows. The PR reduces stack size by three changes:

* Boxing `File` in `Dist`, reducing the size from 496 to 240.
* Boxing the largest futures.
* Boxing `CachePolicy`

## Method

Debugging happened on linux using
#941 to limit the stack size to
1MB. Used ran the command below.

```
RUSTFLAGS=-Zprint-type-sizes cargo +nightly build -p puffin-cli -j 1 > type-sizes.txt && top-type-sizes -w -s -h 10 < type-sizes.txt > sizes.txt
```

The main drawback is top-type-sizes not saying what the `__awaitee` is,
so it requires manually looking up with a future with matching size.

When the `brotli` features on `reqwest` is active, a lot of brotli types
show up. Toggling this feature however seems to have no effect. I assume
they are false positives since the `brotli` crate has elaborate control
about allocation. The sizes are therefore shown with the feature off.

## Results

The largest future goes from 12208B to 6416B, the largest type
(`PrioritizedDistribution`, see also #948) from 17448B to 9264B. Full
diff: https://gist.github.com/konstin/62635c0d12110a616a1b2bfcde21304f

For the second commit, i iteratively boxed the largest file until the
tests passed, then with an 800KB stack limit looked through the
backtrace of a failing test and added some more boxing.

Quick benchmarking showed no difference:

```console
$ hyperfine --warmup 2 "target/profiling/main-dev resolve meine_stadt_transparent" "target/profiling/puffin-dev resolve meine_stadt_transparent" 
Benchmark 1: target/profiling/main-dev resolve meine_stadt_transparent
  Time (mean ± σ):      49.2 ms ±   3.0 ms    [User: 39.8 ms, System: 24.0 ms]
  Range (min … max):    46.6 ms …  63.0 ms    55 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: target/profiling/puffin-dev resolve meine_stadt_transparent
  Time (mean ± σ):      47.4 ms ±   3.2 ms    [User: 41.3 ms, System: 20.6 ms]
  Range (min … max):    44.6 ms …  60.5 ms    62 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  target/profiling/puffin-dev resolve meine_stadt_transparent ran
    1.04 ± 0.09 times faster than target/profiling/main-dev resolve meine_stadt_transparent
```
@konstin konstin force-pushed the konsti/box-prioritized-dist branch from a1c0f33 to 22a6599 Compare January 19, 2024 09:29
Base automatically changed from konsti/set-explicit-thread-size to main January 19, 2024 09:34
konstin added a commit that referenced this pull request Jan 19, 2024
…e futures (#1004)

This is #947 again but this time
merging into main instead of downstack, sorry for the noise.

---

Windows has a default stack size of 1MB, which makes puffin often fail
with stack overflows. The PR reduces stack size by three changes:

* Boxing `File` in `Dist`, reducing the size from 496 to 240.
* Boxing the largest futures.
* Boxing `CachePolicy`

## Method

Debugging happened on linux using
#941 to limit the stack size to
1MB. Used ran the command below.

```
RUSTFLAGS=-Zprint-type-sizes cargo +nightly build -p puffin-cli -j 1 > type-sizes.txt && top-type-sizes -w -s -h 10 < type-sizes.txt > sizes.txt
```

The main drawback is top-type-sizes not saying what the `__awaitee` is,
so it requires manually looking up with a future with matching size.

When the `brotli` features on `reqwest` is active, a lot of brotli types
show up. Toggling this feature however seems to have no effect. I assume
they are false positives since the `brotli` crate has elaborate control
about allocation. The sizes are therefore shown with the feature off.

## Results

The largest future goes from 12208B to 6416B, the largest type
(`PrioritizedDistribution`, see also #948) from 17448B to 9264B. Full
diff: https://gist.github.com/konstin/62635c0d12110a616a1b2bfcde21304f

For the second commit, i iteratively boxed the largest file until the
tests passed, then with an 800KB stack limit looked through the
backtrace of a failing test and added some more boxing.

Quick benchmarking showed no difference:

```console
$ hyperfine --warmup 2 "target/profiling/main-dev resolve meine_stadt_transparent" "target/profiling/puffin-dev resolve meine_stadt_transparent" 
Benchmark 1: target/profiling/main-dev resolve meine_stadt_transparent
  Time (mean ± σ):      49.2 ms ±   3.0 ms    [User: 39.8 ms, System: 24.0 ms]
  Range (min … max):    46.6 ms …  63.0 ms    55 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: target/profiling/puffin-dev resolve meine_stadt_transparent
  Time (mean ± σ):      47.4 ms ±   3.2 ms    [User: 41.3 ms, System: 20.6 ms]
  Range (min … max):    44.6 ms …  60.5 ms    62 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Summary
  target/profiling/puffin-dev resolve meine_stadt_transparent ran
    1.04 ± 0.09 times faster than target/profiling/main-dev resolve meine_stadt_transparent
```
@konstin konstin merged commit cd2fb6f into main Jan 19, 2024
3 checks passed
@konstin konstin deleted the konsti/box-prioritized-dist branch January 19, 2024 09:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants