Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roaring: implement ToDense and FromDense #408

Merged
merged 11 commits into from
Dec 18, 2023
Merged

Conversation

tsenart
Copy link
Contributor

@tsenart tsenart commented Dec 16, 2023

This commit introduces ToDense and FromDense as methods in Bitmap. They make conversion to and from dense bitmaps like https://github.com/bits-and-blooms/bitset or https://github.com/kelindar/bitmap performant and simple.

name                                     time/op
FromDense/bitmap-4097-10                   1.62µs ± 0%
FromDense/run-4098-10                      1.25µs ± 0%
FromDense/array-4096-10                    1.60µs ± 0%
FromDense/bitmaps-and-runs-300000-10       3.76µs ± 0%
WriteDenseTo/bitmap-4097-10                17.1ns ± 0%
WriteDenseTo/run-4098-10                   31.2ns ± 0%
WriteDenseTo/array-4096-10                 8.69µs ± 0%
WriteDenseTo/bitmaps-and-runs-300000-10    1.53µs ± 0%

name                                     speed
FromDense/bitmap-4097-10                 5.70GB/s ± 0%
FromDense/run-4098-10                     827MB/s ± 0%
FromDense/array-4096-10                  5.76GB/s ± 0%
FromDense/bitmaps-and-runs-300000-10     14.9GB/s ± 0%
WriteDenseTo/bitmap-4097-10               538GB/s ± 0%
WriteDenseTo/run-4098-10                 33.0GB/s ± 0%
WriteDenseTo/array-4096-10               1.06GB/s ± 0%
WriteDenseTo/bitmaps-and-runs-300000-10  36.7GB/s ± 0%

name                                     alloc/op
FromDense/bitmap-4097-10                   8.22kB ± 0%
FromDense/run-4098-10                      8.22kB ± 0%
FromDense/array-4096-10                    8.22kB ± 0%
FromDense/bitmaps-and-runs-300000-10       8.35kB ± 0%
WriteDenseTo/bitmap-4097-10                 0.00B     
WriteDenseTo/run-4098-10                    0.00B     
WriteDenseTo/array-4096-10                  0.00B     
WriteDenseTo/bitmaps-and-runs-300000-10     0.00B     

name                                     allocs/op
FromDense/bitmap-4097-10                     2.00 ± 0%
FromDense/run-4098-10                        2.00 ± 0%
FromDense/array-4096-10                      2.00 ± 0%
FromDense/bitmaps-and-runs-300000-10         6.00 ± 0%
WriteDenseTo/bitmap-4097-10                  0.00     
WriteDenseTo/run-4098-10                     0.00     
WriteDenseTo/array-4096-10                   0.00     
WriteDenseTo/bitmaps-and-runs-300000-10      0.00     

This commit introduces ToDense and WriteDenseTo as methods in Bitmap.
They make conversion to dense bitmaps like https://github.com/bits-and-blooms/bitset
or https://github.com/kelindar/bitmap performant and simple.

```
name                                     time/op
WriteDenseTo/bitmap-4097-10                 135ns ± 2%
WriteDenseTo/run-4097-10                   40.2ns ± 0%
WriteDenseTo/array-4096-10                 8.74µs ± 0%
WriteDenseTo/bitmaps-and-runs-300000-10    2.29µs ± 0%

name                                     speed
WriteDenseTo/bitmap-4097-10              68.2GB/s ± 2%
WriteDenseTo/run-4097-10                 25.7GB/s ± 0%
WriteDenseTo/array-4096-10               1.05GB/s ± 0%
WriteDenseTo/bitmaps-and-runs-300000-10  24.6GB/s ± 0%

name                                     alloc/op
WriteDenseTo/bitmap-4097-10                 0.00B
WriteDenseTo/run-4097-10                    0.00B
WriteDenseTo/array-4096-10                  0.00B
WriteDenseTo/bitmaps-and-runs-300000-10     0.00B

name                                     allocs/op
WriteDenseTo/bitmap-4097-10                  0.00
WriteDenseTo/run-4097-10                     0.00
WriteDenseTo/array-4096-10                   0.00
WriteDenseTo/bitmaps-and-runs-300000-10      0.00
```
```
name                                     time/op
WriteDenseTo/bitmap-4097-10                17.0ns ± 0%
WriteDenseTo/run-4097-10                   29.4ns ± 0%
WriteDenseTo/array-4096-10                 8.65µs ± 0%
WriteDenseTo/bitmaps-and-runs-300000-10    1.52µs ± 0%

name                                     speed
WriteDenseTo/bitmap-4097-10               542GB/s ± 0%
WriteDenseTo/run-4097-10                 35.1GB/s ± 0%
WriteDenseTo/array-4096-10               1.07GB/s ± 0%
WriteDenseTo/bitmaps-and-runs-300000-10  36.9GB/s ± 0%

name                                     alloc/op
WriteDenseTo/bitmap-4097-10                 0.00B
WriteDenseTo/run-4097-10                    0.00B
WriteDenseTo/array-4096-10                  0.00B
WriteDenseTo/bitmaps-and-runs-300000-10     0.00B

name                                     allocs/op
WriteDenseTo/bitmap-4097-10                  0.00
WriteDenseTo/run-4097-10                     0.00
WriteDenseTo/array-4096-10                   0.00
WriteDenseTo/bitmaps-and-runs-300000-10      0.00
```
@tsenart
Copy link
Contributor Author

tsenart commented Dec 17, 2023

Might add a FromDense method too.

@tsenart tsenart changed the title roaring: implement ToDense and WriteDenseTo roaring: implement ToDense and FromDense Dec 17, 2023
roaring.go Show resolved Hide resolved
@lemire lemire self-requested a review December 17, 2023 19:19
Copy link
Member

@lemire lemire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. The FromDense function is wrong/unsafe, but it is easy to fix. You can use your current code, but then call toEfficientContainer() which should map the invalid bitmap to a valid container. You also need to make sure to actually copy the slice if we keep a bitmap container because otherwise, it could lead to unexpected behavior.

```
name \ time/op                        before         with-efficient-container  no-to-efficient-container
FromDense/bitmap-4097-10                1.45µs ± 1%               2.15µs ± 0%                1.47µs ± 2%
FromDense/run-4098-10                   1.04µs ± 4%               3.00µs ± 0%                1.04µs ± 2%
FromDense/array-4096-10                 1.45µs ± 1%               6.47µs ± 8%                5.46µs ± 1%
FromDense/bitmaps-and-runs-300000-10    3.53µs ± 0%              13.53µs ± 1%                3.57µs ± 1%

name \ speed                          before         with-efficient-container  no-to-efficient-container
FromDense/bitmap-4097-10              6.35GB/s ± 1%             4.28GB/s ± 0%              6.27GB/s ± 2%
FromDense/run-4098-10                  989MB/s ± 4%              345MB/s ± 0%               990MB/s ± 2%
FromDense/array-4096-10               6.36GB/s ± 1%             1.43GB/s ± 8%              1.69GB/s ± 1%
FromDense/bitmaps-and-runs-300000-10  15.9GB/s ± 0%              4.2GB/s ± 1%              15.8GB/s ± 1%

name \ alloc/op                       before         with-efficient-container  no-to-efficient-container
FromDense/bitmap-4097-10                8.22kB ± 0%               8.22kB ± 0%                8.22kB ± 0%
FromDense/run-4098-10                   8.22kB ± 0%               8.26kB ± 0%                8.22kB ± 0%
FromDense/array-4096-10                 8.22kB ± 0%              16.44kB ± 0%               16.41kB ± 0%
FromDense/bitmaps-and-runs-300000-10    8.35kB ± 0%               8.49kB ± 0%                8.35kB ± 0%

name \ allocs/op                      before         with-efficient-container  no-to-efficient-container
FromDense/bitmap-4097-10                  2.00 ± 0%                 2.00 ± 0%                  2.00 ± 0%
FromDense/run-4098-10                     2.00 ± 0%                 4.00 ± 0%                  2.00 ± 0%
FromDense/array-4096-10                   2.00 ± 0%                 4.00 ± 0%                  3.00 ± 0%
FromDense/bitmaps-and-runs-300000-10      6.00 ± 0%                16.00 ± 0%                  6.00 ± 0%
```
@lemire
Copy link
Member

lemire commented Dec 17, 2023

I assumed that passing the copy on write argument as true would allow us to not copy. Can you clarify what that argument is for if not for that?

It depends. You may want copy-on-write or not. The first problem here is that you keep a reference to the original slice. So suppose someone takes a regular bitmap, creates a roaring bitmap with it. Suppose that the roaring bitmap is smaller (due to compression). That's good, right? Well. Except that you are still holding a reference to the original uncompressed slice, so it does not go out of scope and the memory usage remains. The second problem is that if you keep the original bitset around, and modify it, this will affect the roaring bitmap, but the roaring bitmap is unaware of the changes that occur through the bitset.

One way out of that is to make copy-on-write an option. If you plan on keeping the uncompressed bitmap around in any case, and you know that you want ever modify it, then avoiding the copy is good... but the user should be clear on their responsibility.

You definitively do not want to copy the slice if you are going to convert it to an array or a run container.

@tsenart
Copy link
Contributor Author

tsenart commented Dec 17, 2023

Thanks for elaborating. To share some context, for our use case, we're converting from immutable bitsets to (throw-away) roaring bitmaps for compressed serialization, so copying isn't necessary and a performance hit that isn't warranted.

I understand other use cases might differ, so I think the best path forward is as you suggested: an argument that lets the user control that.

@tsenart tsenart requested a review from lemire December 17, 2023 21:48
@lemire
Copy link
Member

lemire commented Dec 18, 2023

That's good. Merging.

@lemire lemire merged commit cceddf2 into RoaringBitmap:master Dec 18, 2023
4 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants