Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve folds over bits for IntSet #1079

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

meooow25
Copy link
Contributor

@meooow25 meooow25 commented Dec 19, 2024

  • Make foldr and foldl short-circuit instead of lazily accumulating thunks.
  • Switch to a non-empty style to avoid unnecessary comparisons. This also helps GHC with arity-analysis (somehow), which greatly improves performance of CPS-style foldr and foldl.
  • Change the bitwise operations used from bitmask = bm .&. -bm; bi = ctz bitmask; bm' = bm `xor` bitmask to bi = ctz bm; bm' = bm .&. (bm-1) which is slightly faster.

Benchmarks with GHC 9.10.1

Name                          Time - - - - - - - -    Allocated - - - - -
                                   A       B     %         A       B     %
dense.foldMap_elem            5.6 μs  5.3 μs   -4%      0 B     0 B
dense.foldMap_traverseSum     6.5 μs  4.5 μs  -31%    1.0 KB  1.0 KB   +0%
dense.foldl'_sum              5.6 μs  4.3 μs  -22%      0 B     0 B
dense.foldl_cpsOneShotSum      19 μs  5.7 μs  -70%    194 KB  2.5 KB  -98%
dense.foldl_cpsSum             20 μs  6.2 μs  -69%    162 KB  5.0 KB  -96%
dense.foldl_elem               12 μs  5.7 μs  -53%    162 KB  2.0 KB  -98%
dense.foldl_traverseSum        20 μs  5.9 μs  -70%    162 KB  5.0 KB  -96%
dense.foldr'_sum              5.8 μs  5.5 μs   -6%      0 B     0 B
dense.foldr_cpsOneShotSum      20 μs  5.6 μs  -72%    194 KB  2.5 KB  -98%
dense.foldr_cpsSum             21 μs  5.9 μs  -71%    162 KB  5.0 KB  -96%
dense.foldr_elem               12 μs  4.8 μs  -60%    162 KB  2.0 KB  -98%
dense.foldr_traverseSum        21 μs  4.9 μs  -76%    162 KB  5.0 KB  -96%
sparse.foldMap_elem            19 μs   18 μs   -5%      0 B     0 B
sparse.foldMap_traverseSum     21 μs   20 μs   -5%     64 KB   64 KB   +0%
sparse.foldl'_sum              16 μs   14 μs  -14%      0 B     0 B
sparse.foldl_cpsOneShotSum     58 μs   38 μs  -34%    319 KB  160 KB  -49%
sparse.foldl_cpsSum            58 μs   60 μs   +4%    287 KB  320 KB  +11%
sparse.foldl_elem              27 μs   29 μs   +6%    287 KB  128 KB  -55%
sparse.foldl_traverseSum       58 μs   55 μs   -5%    287 KB  320 KB  +11%
sparse.foldr'_sum              27 μs   24 μs  -10%      0 B     0 B
sparse.foldr_cpsOneShotSum     66 μs   29 μs  -55%    319 KB  160 KB  -49%
sparse.foldr_cpsSum            65 μs   50 μs  -22%    287 KB  320 KB  +11%
sparse.foldr_elem              37 μs   20 μs  -47%    287 KB  128 KB  -55%
sparse.foldr_traverseSum       65 μs   44 μs  -32%    287 KB  320 KB  +11%

Fixes #666

@meooow25
Copy link
Contributor Author

Related GHC issue for a curious behavior I observed: GHC#25578 (also mentioned in the code).

* Make foldr and foldl short-circuit instead of lazily accumulating
  thunks.
* Switch to a non-empty style to avoid unnecessary comparisons. This
  also helps GHC with arity-analysis (somehow), which greatly improves
  performance of CPS-style foldr and foldl.
* Change the bitwise operations used from
  bitmask = bm .&. -bm; bi = ctz bitmask; bm' = bm `xor` bitmask
  to
  bi = ctz bm; bm' = bm .&. (bm-1)
  which is slightly faster.
@meooow25
Copy link
Contributor Author

Okay, I'm pretty happy with this. Can't think of anything else that might be improved here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

IntSet foldl and foldr smell funny
1 participant