-
Notifications
You must be signed in to change notification settings - Fork 233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roaring: implement ToDense and FromDense #408
Conversation
This commit introduces ToDense and WriteDenseTo as methods in Bitmap. They make conversion to dense bitmaps like https://github.com/bits-and-blooms/bitset or https://github.com/kelindar/bitmap performant and simple. ``` name time/op WriteDenseTo/bitmap-4097-10 135ns ± 2% WriteDenseTo/run-4097-10 40.2ns ± 0% WriteDenseTo/array-4096-10 8.74µs ± 0% WriteDenseTo/bitmaps-and-runs-300000-10 2.29µs ± 0% name speed WriteDenseTo/bitmap-4097-10 68.2GB/s ± 2% WriteDenseTo/run-4097-10 25.7GB/s ± 0% WriteDenseTo/array-4096-10 1.05GB/s ± 0% WriteDenseTo/bitmaps-and-runs-300000-10 24.6GB/s ± 0% name alloc/op WriteDenseTo/bitmap-4097-10 0.00B WriteDenseTo/run-4097-10 0.00B WriteDenseTo/array-4096-10 0.00B WriteDenseTo/bitmaps-and-runs-300000-10 0.00B name allocs/op WriteDenseTo/bitmap-4097-10 0.00 WriteDenseTo/run-4097-10 0.00 WriteDenseTo/array-4096-10 0.00 WriteDenseTo/bitmaps-and-runs-300000-10 0.00 ```
``` name time/op WriteDenseTo/bitmap-4097-10 17.0ns ± 0% WriteDenseTo/run-4097-10 29.4ns ± 0% WriteDenseTo/array-4096-10 8.65µs ± 0% WriteDenseTo/bitmaps-and-runs-300000-10 1.52µs ± 0% name speed WriteDenseTo/bitmap-4097-10 542GB/s ± 0% WriteDenseTo/run-4097-10 35.1GB/s ± 0% WriteDenseTo/array-4096-10 1.07GB/s ± 0% WriteDenseTo/bitmaps-and-runs-300000-10 36.9GB/s ± 0% name alloc/op WriteDenseTo/bitmap-4097-10 0.00B WriteDenseTo/run-4097-10 0.00B WriteDenseTo/array-4096-10 0.00B WriteDenseTo/bitmaps-and-runs-300000-10 0.00B name allocs/op WriteDenseTo/bitmap-4097-10 0.00 WriteDenseTo/run-4097-10 0.00 WriteDenseTo/array-4096-10 0.00 WriteDenseTo/bitmaps-and-runs-300000-10 0.00 ```
Might add a FromDense method too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great. The FromDense function is wrong/unsafe, but it is easy to fix. You can use your current code, but then call toEfficientContainer()
which should map the invalid bitmap to a valid container. You also need to make sure to actually copy the slice if we keep a bitmap container because otherwise, it could lead to unexpected behavior.
``` name \ time/op before with-efficient-container no-to-efficient-container FromDense/bitmap-4097-10 1.45µs ± 1% 2.15µs ± 0% 1.47µs ± 2% FromDense/run-4098-10 1.04µs ± 4% 3.00µs ± 0% 1.04µs ± 2% FromDense/array-4096-10 1.45µs ± 1% 6.47µs ± 8% 5.46µs ± 1% FromDense/bitmaps-and-runs-300000-10 3.53µs ± 0% 13.53µs ± 1% 3.57µs ± 1% name \ speed before with-efficient-container no-to-efficient-container FromDense/bitmap-4097-10 6.35GB/s ± 1% 4.28GB/s ± 0% 6.27GB/s ± 2% FromDense/run-4098-10 989MB/s ± 4% 345MB/s ± 0% 990MB/s ± 2% FromDense/array-4096-10 6.36GB/s ± 1% 1.43GB/s ± 8% 1.69GB/s ± 1% FromDense/bitmaps-and-runs-300000-10 15.9GB/s ± 0% 4.2GB/s ± 1% 15.8GB/s ± 1% name \ alloc/op before with-efficient-container no-to-efficient-container FromDense/bitmap-4097-10 8.22kB ± 0% 8.22kB ± 0% 8.22kB ± 0% FromDense/run-4098-10 8.22kB ± 0% 8.26kB ± 0% 8.22kB ± 0% FromDense/array-4096-10 8.22kB ± 0% 16.44kB ± 0% 16.41kB ± 0% FromDense/bitmaps-and-runs-300000-10 8.35kB ± 0% 8.49kB ± 0% 8.35kB ± 0% name \ allocs/op before with-efficient-container no-to-efficient-container FromDense/bitmap-4097-10 2.00 ± 0% 2.00 ± 0% 2.00 ± 0% FromDense/run-4098-10 2.00 ± 0% 4.00 ± 0% 2.00 ± 0% FromDense/array-4096-10 2.00 ± 0% 4.00 ± 0% 3.00 ± 0% FromDense/bitmaps-and-runs-300000-10 6.00 ± 0% 16.00 ± 0% 6.00 ± 0% ```
It depends. You may want copy-on-write or not. The first problem here is that you keep a reference to the original slice. So suppose someone takes a regular bitmap, creates a roaring bitmap with it. Suppose that the roaring bitmap is smaller (due to compression). That's good, right? Well. Except that you are still holding a reference to the original uncompressed slice, so it does not go out of scope and the memory usage remains. The second problem is that if you keep the original bitset around, and modify it, this will affect the roaring bitmap, but the roaring bitmap is unaware of the changes that occur through the bitset. One way out of that is to make copy-on-write an option. If you plan on keeping the uncompressed bitmap around in any case, and you know that you want ever modify it, then avoiding the copy is good... but the user should be clear on their responsibility. You definitively do not want to copy the slice if you are going to convert it to an array or a run container. |
Thanks for elaborating. To share some context, for our use case, we're converting from immutable bitsets to (throw-away) roaring bitmaps for compressed serialization, so copying isn't necessary and a performance hit that isn't warranted. I understand other use cases might differ, so I think the best path forward is as you suggested: an argument that lets the user control that. |
That's good. Merging. |
This commit introduces ToDense and FromDense as methods in Bitmap. They make conversion to and from dense bitmaps like https://github.com/bits-and-blooms/bitset or https://github.com/kelindar/bitmap performant and simple.