Although the code and paper suggest that 64-bit hashes are being used, Java's Object.hashCode() method only returns 32 bits. The good news is that the bug in #19 has no effect, since the upper 16 bits are always 0 (or perhaps all 1s, depending on sign extension effects).
The bad news is that because bits 32-47 are all zero (or perhaps evenly divided between all zero and all one), I suspect all (or at least half) of the documents will end up clustered together, making for a very expensive O(n^2) comparison.
You can probably ignore PR #20 for now. It'll get subsumed into the larger rework necessary.
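To make the sign-extension point concrete, here is a minimal sketch of what happens when a 32-bit hashCode() value is widened to a 64-bit long and then split into four 16-bit words. The class and method names (`HashWidthDemo`, `widen`, `word`) are hypothetical and are not taken from the project's code. It only illustrates standard Java int-to-long conversion.

```java
public class HashWidthDemo {
    // Widen a 32-bit hash to 64 bits the way an implicit int-to-long
    // conversion does in Java: the sign bit is copied into the upper 32 bits.
    static long widen(int hash) {
        return hash;
    }

    // Extract the 16-bit word at the given index (0 = lowest, 3 = highest).
    static int word(long hash, int index) {
        return (int) ((hash >>> (16 * index)) & 0xFFFF);
    }

    public static void main(String[] args) {
        int[] samples = {0x12345678, 0x87654321}; // one non-negative, one negative int
        for (int sample : samples) {
            long wide = widen(sample);
            System.out.println(String.format("%016x", wide)
                    + " word2=" + Integer.toHexString(word(wide, 2))
                    + " word3=" + Integer.toHexString(word(wide, 3)));
        }
    }
}
```

Under this assumption, words 2 and 3 carry no information beyond the sign of the original 32-bit value: both are 0x0000 for non-negative hashes and 0xFFFF for negative ones.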
Oops, ignore the part about word 2 being all zero/one. It'll actually be the same as word 0 because the 32-bit hashcode gets shifted through twice to test "all 64" bits, so the upper 32 bits will be duplicates of the lower 32 bits.
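The duplication described above follows from how Java shifts an int: only the low five bits of the shift distance are used, so shifting by 32 to 63 behaves like shifting by 0 to 31. The sketch below assumes a bit-testing loop that runs an index from 0 to 63 over an int hash. The loop and the names `ShiftWrapDemo` and `bitsSeen` are hypothetical stand-ins, not the project's actual code.

```java
public class ShiftWrapDemo {
    // Rebuild the 64-bit pattern that a 0..63 bit-testing loop would observe
    // when the value being shifted is a 32-bit int. For an int operand Java
    // masks the shift distance with 31, so bit i and bit i + 32 always agree.
    static long bitsSeen(int hash) {
        long seen = 0L;
        for (int i = 0; i < 64; i++) {
            if (((hash >> i) & 1) == 1) {
                seen |= 1L << i; // long shift: distance is masked with 63 instead
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(String.format("%016x", bitsSeen(0x12345678)));
        System.out.println(String.format("%016x", bitsSeen(0x87654321)));
    }
}
```

If the real loop has this shape, the upper 32 bits are an exact copy of the lower 32, which makes word 2 identical to word 0 and word 3 identical to word 1.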