Skip to content

Commit

Permalink
change(state): Write non-finalized blocks to the state in a separate …
Browse files Browse the repository at this point in the history
…thread, to avoid network and RPC hangs (#5257)

* Add a new block commit task and channels, that don't do anything yet

* Add last_block_hash_sent to the state service, to avoid database accesses

* Update last_block_hash_sent regardless of commit errors

* Rename a field to StateService.max_queued_finalized_height

* Commit finalized blocks to the state in a separate task

* Check for panics in the block write task

* Wait for the block commit task in tests, and check for errors

* Always run a proptest that sleeps once

* Add extra debugging to state shutdowns

* Work around a RocksDB shutdown bug

* Close the finalized block channel when we're finished with it

* Only reset state queue once per error

* Update some TODOs

* Add a module doc comment

* Drop channels and check for closed channels in the block commit task

* Close state channels and tasks on drop

* Remove some duplicate fields across StateService and ReadStateService

* Try tweaking the shutdown steps

* Update and clarify some comments

* Clarify another comment

* Don't try to cancel RocksDB background work on drop

* Fix up some comments

* Remove some duplicate code

* Remove redundant workarounds for shutdown issues

* Remode a redundant channel close in the block commit task

* Remove a mistaken `!force` shutdown condition

* Remove duplicate force-shutdown code and explain it better

* Improve RPC error logging

* Wait for chain tip updates in the RPC tests

* Wait 2 seconds for chain tip updates before skipping them

* Remove an unnecessary block_in_place()

* Fix some test error messages that were changed by earlier fixes

* Expand some comments, fix typos

Co-authored-by: Marek <[email protected]>

* Actually drop children of failed blocks

* Explain why we drop descendants of failed blocks

* Clarify a comment

* Wait for chain tip updates in a failing test on macOS

* Clean duplicate finalized blocks when the non-finalized state activates

* Send an error when receiving a duplicate finalized block

* Update checkpoint block behaviour, document its consensus rule

* Wait for chain tip changes in inbound_block_height_lookahead_limit test

* Wait for the genesis block to commit in the fake peer set mempool tests

* Disable unreliable mempool verification check in the send transaction test

* Appease rustfmt

* Use clear_finalized_block_queue() everywhere that blocks are dropped

* Document how Finalized and NonFinalized clones are different

* sends non-finalized blocks to the block write task

* passes ZebraDb to commit_new_chain, commit_block, and no_duplicates_in_finalized_chain instead of FinalizedState

* Update zebra-state/src/service/write.rs

Co-authored-by: teor <[email protected]>

* updates comments, renames send_process_queued, other minor cleanup

* update assert_block_can_be_validated comment

* removes `mem` field from StateService

* removes `disk` field from StateService and updates block_iter to use `ZebraDb` instead of the finalized state

* updates tests that use the disk to use read_service.db instead

* moves best_tip to a read fn and returns finalized & non-finalized states from setup instead of the state service

* changes `contextual_validity` to get the network from the finalized_state instead of another param

* swaps out StateService with FinalizedState and NonFinalizedState in tests

* adds NotReadyToBeCommitted error and returns it from validate_and_commit when a blocks parent hash is not in any chain

* removes NonFinalizedWriteCmd and calls, moves update_latest_channels above rsp_tx.send

* makes parent_errors_map an indexmap

* clears non-finalized block queue when the receiver is dropped and when the StateService is being dropped

* sends non-finalized blocks to the block write task

* passes ZebraDb to commit_new_chain, commit_block, and no_duplicates_in_finalized_chain instead of FinalizedState

* updates comments, renames send_process_queued, other minor cleanup

* Update zebra-state/src/service/write.rs

Co-authored-by: teor <[email protected]>

* update assert_block_can_be_validated comment

* removes `mem` field from StateService

* removes `disk` field from StateService and updates block_iter to use `ZebraDb` instead of the finalized state

* updates tests that use the disk to use read_service.db instead

* moves best_tip to a read fn and returns finalized & non-finalized states from setup instead of the state service

* changes `contextual_validity` to get the network from the finalized_state instead of another param

* swaps out StateService with FinalizedState and NonFinalizedState in tests

* adds NotReadyToBeCommitted error and returns it from validate_and_commit when a blocks parent hash is not in any chain

* removes NonFinalizedWriteCmd and calls, moves update_latest_channels above rsp_tx.send

* makes parent_errors_map an indexmap

* clears non-finalized block queue when the receiver is dropped and when the StateService is being dropped

* removes duplicate field definitions on StateService that were a result of a bad merge

* update NotReadyToBeCommitted error message

* Appear rustfmt

* Fix doc links

* Rename a function to initial_contextual_validity()

* Do error tasks on Err, and success tasks on Ok

* Simplify parent_error_map truncation

* Rewrite best_tip() to use tip()

* Rename latest_mem() to latest_non_finalized_state()

```sh
fastmod latest_mem latest_non_finalized_state zebra*
cargo fmt --all
```

* Simplify latest_non_finalized_state() using a new WatchReceiver API

* Expand some error messages

* Send the result after updating the channels, and document why

* wait for chain_tip_update before cancelling download in mempool_cancel_mined

* adds `sent_non_finalized_block_hashes` field to StateService

* adds batched sent_hash insertions and checks sent hashes in queue_and_commit_non_finalized before adding a block to the queue

* check that the `curr_buf` in SentHashes is not empty before pushing it to the `sent_bufs`

* Apply suggestions from code review

Co-authored-by: teor <[email protected]>

* Fix rustfmt

* Check for finalized block heights using zs_contains()

* adds known_utxos field to SentHashes

* updates comment on SentHashes.add method

* Apply suggestions from code review

Co-authored-by: teor <[email protected]>

* return early when there's a duplicate hash in QueuedBlocks.queue instead of panicking

* Make finalized UTXOs near the final checkpoint available for full block verification

* Replace a checkpoint height literal with the actual config

* Update mainnet and testnet checkpoints - 7 October 2022

* Fix some state service init arguments

* Allow more lookahead in the downloader, but less lookahead in the syncer

* Add the latest config to the tests, and fix the latest config check

* Increase the number of finalized blocks checked for non-finalized block UTXO spends

* fix(log): reduce verbose logs for block commits (#5348)

* Remove some verbose block write channel logs

* Only warn about tracing endpoint if the address is actually set

* Use CloneError instead of formatting a non-cloneable error

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Increase block verify timeout

* Work around a known block timeout bug by using a shorter timeout

Co-authored-by: teor <[email protected]>
Co-authored-by: Marek <[email protected]>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
  • Loading branch information
4 people authored Oct 11, 2022
1 parent 036a982 commit a28350e
Show file tree
Hide file tree
Showing 35 changed files with 1,230 additions and 639 deletions.
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

27 changes: 18 additions & 9 deletions zebra-consensus/src/chain.rs
Original file line number Diff line number Diff line change
Expand Up @@ -235,15 +235,7 @@ where
let transaction = Buffer::new(BoxService::new(transaction), VERIFIER_BUFFER_BOUND);

// block verification

let list = CheckpointList::new(network);

let max_checkpoint_height = if config.checkpoint_sync {
list.max_height()
} else {
list.min_height_in_range(network.mandatory_checkpoint_height()..)
.expect("hardcoded checkpoint list extends past canopy activation")
};
let (list, max_checkpoint_height) = init_checkpoint_list(config, network);

let tip = match state_service
.ready()
Expand Down Expand Up @@ -275,3 +267,20 @@ where
max_checkpoint_height,
)
}

/// Parses the checkpoint list for `network` and `config`.
/// Returns the checkpoint list and maximum checkpoint height.
pub fn init_checkpoint_list(config: Config, network: Network) -> (CheckpointList, Height) {
// TODO: Zebra parses the checkpoint list twice at startup.
// Instead, cache the checkpoint list for each `network`.
let list = CheckpointList::new(network);

let max_checkpoint_height = if config.checkpoint_sync {
list.max_height()
} else {
list.min_height_in_range(network.mandatory_checkpoint_height()..)
.expect("hardcoded checkpoint list extends past canopy activation")
};

(list, max_checkpoint_height)
}
1 change: 1 addition & 0 deletions zebra-state/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ dirs = "4.0.0"
displaydoc = "0.2.3"
futures = "0.3.24"
hex = "0.4.3"
indexmap = "1.9.1"
itertools = "0.10.5"
lazy_static = "1.4.0"
metrics = "0.20.1"
Expand Down
4 changes: 4 additions & 0 deletions zebra-state/src/error.rs
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,10 @@ pub struct CommitBlockError(#[from] ValidateContextError);
#[non_exhaustive]
#[allow(missing_docs)]
pub enum ValidateContextError {
#[error("block parent not found in any chain")]
#[non_exhaustive]
NotReadyToBeCommitted,

#[error("block height {candidate_height:?} is lower than the current finalized height {finalized_tip_height:?}")]
#[non_exhaustive]
OrphanedBlock {
Expand Down
Loading

0 comments on commit a28350e

Please sign in to comment.