Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Input length in Simple8b and Simple16 codecs #69

Open
gustingonzalez opened this issue Apr 21, 2020 · 3 comments
Open

Input length in Simple8b and Simple16 codecs #69

gustingonzalez opened this issue Apr 21, 2020 · 3 comments

Comments

@gustingonzalez
Copy link

gustingonzalez commented Apr 21, 2020

At the end of decoding in the specified codecs, some checks are performed to verify that the pointer of the input data is less than the estimated data to read:
https://github.com/lemire/FastPFor/blob/224fe919760c073566334dde6f510769aff41beb/headers/simple8b.h#L640
https://github.com/lemire/FastPFor/blob/224fe919760c073566334dde6f510769aff41beb/headers/simple16.h#L748

Although this makes sense in debug mode to check consistency, the nvalue parameter is enough to decode. So, if this check does not perform another task than the mentioned, (1) maybe the len parameter could be removed; or (2) the assertions could be modified to allow len = 0, which is useful when the data len is not known , e.g.:

assert(len == 0 || in64 <= finalin64)

@lemire
Copy link
Member

lemire commented Apr 21, 2020

(1) maybe the len parameter could be removed;

The assert is meant to check something that ought to be true. Why remove it?

Note that you can remove asserts at compile-time.

the assertions could be modified to allow len = 0

Can you elaborate? Are you saying that you could have len == 0 and in64 > finalin64 as a valid result?

@gustingonzalez
Copy link
Author

Consider the Simple8b codec. On decoding, the finalin variable is computed using endin as:
https://github.com/lemire/FastPFor/blob/224fe919760c073566334dde6f510769aff41beb/headers/simple8b.h#L511

On the other hand, endin variable is computed as:
https://github.com/lemire/FastPFor/blob/224fe919760c073566334dde6f510769aff41beb/headers/simple8b.h#L495

Anyway, these variables are not involved in the decoding process, that is performed by using only the nvalue parameter. This means that the decode will works regardless of the value of endin and finalin variables.

More specifically, if we don't know the input lenght, but we know the nvalue, we could specify the len as 0. So, the decoding process will be correct anyway. However, the endin variable would have been computed at first as:
endin = in + len => endin = in + 0 => endin = in

Since endin is equal to the in pointer in its initial state, and at the end of the decoding the in pointer has a greater value, the involved assert will fails.

@lemire
Copy link
Member

lemire commented Apr 22, 2020

@gustingonzalez Please submit a PR. In your PR, document that "len = 0" means "len is unknown". I will merge it and you will be credited for the change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants