Incremental serialization #514

arvidn · 2024-12-17T12:50:32Z

this PR is best reviewed one commit at a time.

This introduces a new way of serializing compressed CLVM incrementally. This relies on a new feature in ObjectCache, to provide a "stop_token". This is the first commit. The second commit introduces a Serializer object that can be fed CLVM structures, incrementally, via its add() function.

Use case

This is intended to be used for serializing block generators in compressed form until we fill the block cost. Compression introduces a challenge in predicting the size of the generator, and hence the byte cost. The Serializer in this PR allows us to add one spend bundle at a time until we exceed the cost limit, and then undo adding the last spend bundle.

ObjectCache

The object cache maps NodePtr to serialized length and tree hash for every node we serialize. This let us look up duplicate sub-trees and reference them rather than repeating them in the serialized form. With the incremental serialization, we need a way to indicate that we can't compute the tree hash (or serialized length) because the sub tree contains the sentinel (called stop_token in the context of ObjectCache).

We simply return None in this case, as if the node is not in the cache. The serializer also need to support some nodes not having a tree hash (or serialized length).

Serializer

The optional sentinel NodePtr indicates where the serialization should suspend and return. The serialization is resumed by calling add() again, with the subtree that goes in the location of the sentinel in the previous call.

The call to add() can be undone by using the UndoState returned by add() and call restore() on the Serializer.

Limited compression

Since we can't compute tree hashes of structures where parts of it were added incrementally, there are some (contrived) cases where the one-pass serialization does a better job. However, I don't believe this would ever happen in a block generator (only if we had identical spends repeated multiple times). There is a test case covering this.

fuzzers

The incremental serialization has a fuzzer that:

generates a random CLVM tree (based on the fuzzer input)
for each node in the tree:
- split the tree into two parts, one where the node is replaced by the sentinel, and one of the subtree where the sentinel was placed
- serialize the two parts with add(), in two steps
- ensure the serialization is valid and matches the original tree

…uting a value for any tree containing it

coveralls-official · 2024-12-17T12:59:18Z

Pull Request Test Coverage Report for Build 12393985045

Details

286 of 291 (98.28%) changed or added relevant lines in 4 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.2%) to 93.606%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/serde/incremental.rs	211	216	97.69%

Totals
Change from base Build 12357082827:	0.2%
Covered Lines:	5841
Relevant Lines:	6240

💛 - Coveralls

fuzz/fuzz_targets/incremental_serializer.rs

… back-refs

richardkiss

Not sure I completely understand this but even though it's consensus critical, it's pretty easy to deploy in a way where we can verify it's doing as we expected (ie. decompress and compare).

extend ObjectCache to support an optional stop_token, to prevent comp…

9e2bd10

…uting a value for any tree containing it

arvidn requested review from richardkiss and Rigidity and removed request for richardkiss December 17, 2024 12:50

arvidn requested a review from richardkiss December 17, 2024 13:04

arvidn commented Dec 17, 2024

View reviewed changes

fuzz/fuzz_targets/incremental_serializer.rs Show resolved Hide resolved

arvidn force-pushed the incremental-serialization branch 2 times, most recently from c1888b7 to 42583b1 Compare December 18, 2024 00:55

arvidn marked this pull request as draft December 18, 2024 02:21

Serializer, for serializing incrementally built CLVM structures, with…

4e968aa

… back-refs

arvidn force-pushed the incremental-serialization branch from 42583b1 to 4e968aa Compare December 18, 2024 13:29

arvidn marked this pull request as ready for review December 18, 2024 13:47

richardkiss approved these changes Dec 20, 2024

View reviewed changes

arvidn merged commit 110d69c into main Dec 21, 2024
30 checks passed

arvidn deleted the incremental-serialization branch December 21, 2024 21:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incremental serialization #514

Incremental serialization #514

arvidn commented Dec 17, 2024 •

edited

Loading

coveralls-official bot commented Dec 17, 2024 •

edited

Loading

richardkiss left a comment

Incremental serialization #514

Incremental serialization #514

Conversation

arvidn commented Dec 17, 2024 • edited Loading

Use case

ObjectCache

Serializer

Limited compression

fuzzers

coveralls-official bot commented Dec 17, 2024 • edited Loading

Pull Request Test Coverage Report for Build 12393985045

Details

💛 - Coveralls

richardkiss left a comment

Choose a reason for hiding this comment

arvidn commented Dec 17, 2024 •

edited

Loading

coveralls-official bot commented Dec 17, 2024 •

edited

Loading