Optimize ReadCacheLookup, 6.7x speed-up #517
base: main
Conversation
```rust
fn write_u64(&mut self, _i: u64) {
    panic!("This hasher only takes bytes");
}
```
I don't know if this is necessary. If you want a standalone hasher, you could steal the one I tried a couple of years back from richardkiss/clvm_rs@50f6c94#diff-b696b12532e34a5d93b4bf96927a9049a2ad3d7c04bfde786611f6a560622e2f; I don't think it'd be much slower, if at all.
This "hasher" depends on the input already having hash properties. If anyone, for some reason, puts a `u64` into it, that assumption likely no longer holds. I think it's preferable to panic in that case.

That said, the one risk I see with this is that there's no salt, so an attacker can grind hashes that cause worst-case performance of the hash set/map.
Possible solution: don't always take the first eight bytes, but randomly select an offset between 0 and 24 to start taking the bytes from. True, that only makes it 24 times as hard. But we could choose, say, five random offsets between 0 and 24 and XOR the words together. Then it's 24**5 times as hard.
One attack I can see: create a legit spend with so many hash collisions that the hash set/map degrades to O(n^2) performance. If we only have a 1/24 chance of that happening, it won't be long before the spend gets included in a block by someone who chose a failing offset, and then the spend is gone forever.

That's really the only attack I can think of.
Looks good. I'd be curious to know whether the `u8` -> `bool` change does anything, but it seems harmless at worst.

There are still some worst-case scenarios where this isn't fast enough to serialize a full block. I'm working on a more ambitious PR to speed this up further.
A material portion of the speed-up in this patch comes from not using a salt in the hasher, which is a bit surprising; I suppose we initialize new hashers often enough for it to matter.
This is a major hot-spot when serializing with back references. This patch speeds it up by:

- using `IdentityHash`, to avoid hashing the tree-hashes again
- using `BitVec` instead of `Vec<u8>` to store paths (which only contain 0 and 1 anyway)

Compared to `main`, the relevant benchmarks indicate a time reduction of 80-90% when serializing CLVM with back references.

One concern is that not mixing any random state into the hasher opens it up to hash-collision attacks. I'm not sure whether it's worth addressing, or what the best way would be. Maybe add a random value to the `IdentityHash` object, and add it to the hash value.