You tried to apply Simple16 to an incompatible set of integers #51

pps83 · 2019-06-08T05:54:09Z

https://github.com/lemire/FastPFor/blob/71d54a9793245ae90e69c86a425d4ee1ee6543d8/headers/simple16.h#L377

so, there is that exception, but no explanation at all what Simple16 considers incompatible.

Also, many codecs are plainly broken, read/write random memory or crash. These are buggy/broken: simple16, simple8b_rle, simple9, simple9_rle, vsencoding, simdfastpfor256, fastpfor128, fastpfor256, simdfastpfor128 (the last for read random memory here and there).

lemire · 2019-06-08T14:03:50Z

This should be better documented, but it should be clear from the code:

if ((*in >> 28) > 0) {
        std::cerr << "Input's out of range: " << *in << std::endl;
        throw std::runtime_error(
            "You tried to apply Simple16 to an incompatible set of integers.");
 }

Integers are expected to be in [0, 2^28).

lemire · 2019-06-08T14:05:34Z

Also, many codecs are plainly broken, read/write random memory or crash. These are buggy/broken: simple16, simple8b_rle, simple9, simple9_rle, vsencoding, simdfastpfor256, fastpfor128, fastpfor256, simdfastpfor128 (the last for read random memory here and there).

Can you open issues with each bug you have found with a reproducing test case? Ideally, please consider issue pull requests to fix the issues you have found.

lemire · 2019-06-08T14:09:58Z

Fixed as per...
19c9e6e

Regarding other issues you allude to, please open separate issues and provide reproducible test cases and/or factual analysis to describe the problem.

pps83 · 2019-06-08T19:46:07Z

If all these simpleNN codecs fail because the inpunt numbers have different range, then this part is cleared.
Regarding simdfastpfor256, fastpfor128, fastpfor256, simdfastpfor128 (these seem to be high quality compressors), I only get errors in these if 1) I enable 32-bit build and, if 2) I redefine __restrict__ to __restrict on VS (which is supported), and if 3) I run some code to compress large arrays with Dr.Memory (it's based on DynamoRIO).
Also many other codecs in my case required way more than 1KB extra space for them not to access random memory.

lemire · 2019-06-08T20:25:32Z

Sure... I trust that the issues you found are real. I am just inviting you to open distinct issues... (otherwise it is hard to keep track of what is what).

pps83 · 2019-06-08T20:42:28Z

I tried to debug and it's not conclusive, I didn't see memory access issues. Without __restrict I didn't get any reported issues though, so hard to even report anything as __restrict isn't used on VS anyways.

pps83 · 2019-06-09T05:41:01Z

Just wanted to let you know that results that I got from FastPFor overcame all my expectations. I was aiming mainly at good compression ratio with very fast decompression speed, and overall from my testing I got best results from simdfastpfor128 (along with simdsimplepfor and simdfastpfor256 that have very close results). Also in my tests on my data simdbinarypacking is roughly 20% faster at decompression speeds, but compresses a bit worse than simdfastpfor128. Are there any "hidden" requirements on the data that simdfastpfor128 expects? Like data is multiples of 8 x uint32_t etc?..

Overall, awesome!
I made a few VS-related fixes and ported some codecs that were disabled for VS: #52

lemire · 2019-06-09T15:00:42Z

@pps83

Well, it is not a hidden requirement: simdfastpfor128 works over blocks of 128 integers. Unless you you have a multiple of 128 values, you should not call simdfastpfor128 directly.

lemire · 2019-06-09T15:01:54Z

@pps83

If you go through the factory call, then you should be fine...

IntegerCODEC &codec = *CODECFactory::getFromName("simdfastpfor128");

lemire · 2019-06-09T15:02:27Z

... because the factory wraps the codec.

pps83 · 2019-06-10T01:41:30Z

Yes, I use it through the wrapper (but not through the codec factory, as I don't want to pull in all other codecs into my binary because of that map). I use FastPForLib::CompositeCodec<FastPForLib::SIMDFastPFor<4>, FastPForLib::VariableByte>, and it seems that there are "hidden" requirements: if the input buffer isn't aligned, then I get asserts: assert(!needPaddingTo128Bits(in));, and if I comment it out, I get outputs that differ in length.

When I diff them, I get almost identical memory contents, except there is one zero inserted and second number differ by 1:

So, the contents compressed with simdfastpfor128 already contains size as part of the compressed data?

This is the test that I'm trying to use: https://gist.github.com/pps83/1fbeed8ec599861075d7b7de7fe1ed9f

In my case I encode blocks of 2731 ints, then encode and store these blocks. When I read them back I decode them. So far, encoding fails because it seems that not only that it expects alignegment, but also produces different output depending on input alignment.

lemire · 2019-06-10T02:46:49Z

it seems that there are "hidden" requirements

You are correct. There are leftover implicit alignment restrictions. We removed them in follow-up work, but they were never lifted in this library... see...

#53

Pull requests are invited to solve this issue.

When I read them back I decode them. So far, encoding fails because it seems that not only that it expects alignegment, but also produces different output depending on input alignment.

They are definitively sensitive to alignment. In fact, it is even documented in the code that you are responsible to maintain alignment prior to decoding.

Obviously, this is not desirable and should be fixed. Please see the issue I just created.

lemire closed this as completed Jun 8, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

You tried to apply Simple16 to an incompatible set of integers #51

You tried to apply Simple16 to an incompatible set of integers #51

pps83 commented Jun 8, 2019 •

edited

Loading

lemire commented Jun 8, 2019

lemire commented Jun 8, 2019

lemire commented Jun 8, 2019

pps83 commented Jun 8, 2019

lemire commented Jun 8, 2019

pps83 commented Jun 8, 2019

pps83 commented Jun 9, 2019 •

edited

Loading

lemire commented Jun 9, 2019

lemire commented Jun 9, 2019

lemire commented Jun 9, 2019

pps83 commented Jun 10, 2019 •

edited

Loading

lemire commented Jun 10, 2019

You tried to apply Simple16 to an incompatible set of integers #51

You tried to apply Simple16 to an incompatible set of integers #51

Comments

pps83 commented Jun 8, 2019 • edited Loading

lemire commented Jun 8, 2019

lemire commented Jun 8, 2019

lemire commented Jun 8, 2019

pps83 commented Jun 8, 2019

lemire commented Jun 8, 2019

pps83 commented Jun 8, 2019

pps83 commented Jun 9, 2019 • edited Loading

lemire commented Jun 9, 2019

lemire commented Jun 9, 2019

lemire commented Jun 9, 2019

pps83 commented Jun 10, 2019 • edited Loading

lemire commented Jun 10, 2019

pps83 commented Jun 8, 2019 •

edited

Loading

pps83 commented Jun 9, 2019 •

edited

Loading

pps83 commented Jun 10, 2019 •

edited

Loading