Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incremental serialization #514

Merged
merged 2 commits into from
Dec 21, 2024
Merged

Incremental serialization #514

merged 2 commits into from
Dec 21, 2024

Conversation

arvidn
Copy link
Contributor

@arvidn arvidn commented Dec 17, 2024

this PR is best reviewed one commit at a time.

This introduces a new way of serializing compressed CLVM incrementally. This relies on a new feature in ObjectCache, to provide a "stop_token". This is the first commit. The second commit introduces a Serializer object that can be fed CLVM structures, incrementally, via its add() function.

Use case

This is intended to be used for serializing block generators in compressed form until we fill the block cost. Compression introduces a challenge in predicting the size of the generator, and hence the byte cost. The Serializer in this PR allows us to add one spend bundle at a time until we exceed the cost limit, and then undo adding the last spend bundle.

ObjectCache

The object cache maps NodePtr to serialized length and tree hash for every node we serialize. This let us look up duplicate sub-trees and reference them rather than repeating them in the serialized form. With the incremental serialization, we need a way to indicate that we can't compute the tree hash (or serialized length) because the sub tree contains the sentinel (called stop_token in the context of ObjectCache).

We simply return None in this case, as if the node is not in the cache. The serializer also need to support some nodes not having a tree hash (or serialized length).

Serializer

The optional sentinel NodePtr indicates where the serialization should suspend and return. The serialization is resumed by calling add() again, with the subtree that goes in the location of the sentinel in the previous call.

The call to add() can be undone by using the UndoState returned by add() and call restore() on the Serializer.

Limited compression

Since we can't compute tree hashes of structures where parts of it were added incrementally, there are some (contrived) cases where the one-pass serialization does a better job. However, I don't believe this would ever happen in a block generator (only if we had identical spends repeated multiple times). There is a test case covering this.

fuzzers

The incremental serialization has a fuzzer that:

  • generates a random CLVM tree (based on the fuzzer input)
  • for each node in the tree:
    • split the tree into two parts, one where the node is replaced by the sentinel, and one of the subtree where the sentinel was placed
    • serialize the two parts with add(), in two steps
    • ensure the serialization is valid and matches the original tree

@arvidn arvidn requested review from richardkiss and Rigidity and removed request for richardkiss December 17, 2024 12:50
Copy link

coveralls-official bot commented Dec 17, 2024

Pull Request Test Coverage Report for Build 12393985045

Details

  • 286 of 291 (98.28%) changed or added relevant lines in 4 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.2%) to 93.606%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/serde/incremental.rs 211 216 97.69%
Totals Coverage Status
Change from base Build 12357082827: 0.2%
Covered Lines: 5841
Relevant Lines: 6240

💛 - Coveralls

@arvidn arvidn requested a review from richardkiss December 17, 2024 13:04
@arvidn arvidn force-pushed the incremental-serialization branch 2 times, most recently from c1888b7 to 42583b1 Compare December 18, 2024 00:55
@arvidn arvidn marked this pull request as draft December 18, 2024 02:21
@arvidn arvidn force-pushed the incremental-serialization branch from 42583b1 to 4e968aa Compare December 18, 2024 13:29
@arvidn arvidn marked this pull request as ready for review December 18, 2024 13:47
Copy link
Contributor

@richardkiss richardkiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I completely understand this but even though it's consensus critical, it's pretty easy to deploy in a way where we can verify it's doing as we expected (ie. decompress and compare).

@arvidn arvidn merged commit 110d69c into main Dec 21, 2024
30 checks passed
@arvidn arvidn deleted the incremental-serialization branch December 21, 2024 21:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants