
Block verification checks for pre-finalized/conflicting-with-finalized blocks are flaky #6606

Open
michaelsproul opened this issue Nov 25, 2024 · 2 comments
Labels: bug, consensus, database

Description

Similar to other database crash-safety bugs, this one relates to a bad assumption about on-disk state after an unclean shutdown. Specifically, the pre_finalization_cache assumes that a block present on disk but not in fork choice must be a finalized block that has been pruned from fork choice:

// 2. Check on disk.
if self.store.get_blinded_block(&block_root)?.is_some() {
    cache.block_roots.put(block_root, ());
    return Ok(true);
}

This is false in the case where:

  1. A block B arrives and is processed & written to disk.
  2. The node is uncleanly killed (SIGKILL, OOM, power outage) before the fork choice data structure containing B is written to disk.

The result is that after a restart, B is on disk but not in fork choice. This is OK from a DB-invariant point of view, because the invariant we maintain is (intentionally) one-directional:

block in fork_choice --> block in database

The pre-finalization cache makes the additional, unjustified assumption that the block can only be on disk because it was fully imported and then pruned from fork choice.

A similar assumption is made here:

// If fork choice does *not* consider the parent to be a descendant of the finalized block,
// then there are two more cases:
//
// 1. We have the parent stored in our database. Because fork-choice has confirmed the
//    parent is *not* in our post-finalization DAG, all other blocks must be either
//    pre-finalization or conflicting with finalization.
// 2. The parent is unknown to us, we probably want to download it since it might actually
//    descend from the finalized root.
if chain
    .store
    .block_exists(&block.parent_root())
    .map_err(|e| BlockError::BeaconChainError(e.into()))?
{
    Err(BlockError::NotFinalizedDescendant {
        block_parent_root: block.parent_root(),
    })
} else {
It's not true that the parent must be pre-finalization or conflicting with finalization just because it is present on disk. As above, the parent could have been fully imported and written to disk, then forgotten by fork choice after an unclean shutdown.

Version

Lighthouse v5.3.0

Steps to resolve

Rather than making assumptions about blocks based on their presence on disk, we could use a weaker and more defensive check to catch blocks that are definitely finalized or conflicting with finalization.

One such check for the pre-finalization cache would be:

  • If the block_root at the block's slot on the canonical chain equals the block root of the block, then it is finalized.

This should usually involve just a single database lookup in the freezer database, although it could be slow during periods of non-finality, when we must iterate backwards across multiple beacon states. This check has the downside of not catching blocks from side chains which have been pruned due to finality. Let's enumerate all the cases so we don't miss any.

Case A: block is canonical and finalized

This is the "easy" case. We know that a block is finalized if its block root matches the block root from the canonical chain at that slot. We can reject these blocks if they are received on gossip or RPC, because we have already imported them.
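As a minimal sketch of this check (all types and names here are hypothetical simplifications, not the real Lighthouse APIs; the actual lookup would go through the freezer database's block root storage):

use std::collections::HashMap;

// Hypothetical stand-ins for Lighthouse's real types.
type Hash256 = [u8; 32];
type Slot = u64;

struct Store {
    // slot -> canonical block root, as recoverable from the freezer DB.
    canonical_roots: HashMap<Slot, Hash256>,
    finalized_slot: Slot,
}

impl Store {
    /// Canonical block root at `slot`, if that slot is finalized.
    fn canonical_root_at_slot(&self, slot: Slot) -> Option<Hash256> {
        if slot <= self.finalized_slot {
            self.canonical_roots.get(&slot).copied()
        } else {
            None
        }
    }

    /// Case A: a block is known-finalized iff its root matches the
    /// canonical chain at its slot. Mere presence on disk proves nothing.
    fn is_canonical_finalized(&self, block_root: Hash256, slot: Slot) -> bool {
        self.canonical_root_at_slot(slot) == Some(block_root)
    }
}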

Case B: parent is canonical and finalized

If a block's parent is a canonical block from the finalized chain (as in case A), and is not the finalized block itself (i.e. the most recently finalized block), then the block is in conflict with finalization and should be rejected. This case could be checked in a similar way to (A) by using the canonical block roots iterator. However, it is potentially inefficient: we do not know how much older than the block the parent might be, so we may have to iterate backwards through an unbounded number of canonical block roots.
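Continuing the hypothetical sketch from case A, and assuming the parent's slot is already known (e.g. because the parent is on disk), the check might look as follows; without the slot we would have to scan the canonical roots backwards, which is exactly the inefficiency noted above:

impl Store {
    /// Case B: a block conflicts with finalization if its parent is a
    /// canonical finalized block *other than* the most recently finalized
    /// block itself.
    fn parent_conflicts_with_finalization(
        &self,
        parent_root: Hash256,
        parent_slot: Slot,
        finalized_root: Hash256,
    ) -> bool {
        parent_root != finalized_root
            && self.is_canonical_finalized(parent_root, parent_slot)
    }
}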

Case C: multi-block sidechain starting prior to finalization

Case B is a special case of a sidechain starting from a pre-finalization block. In the case of B we have some chance of checking quickly because we can look up the parent. However if the parent is not known because the sidechain consisted of multiple blocks (and the parent has since been pruned), then there's no check we can do unless we download the sidechain again and reject it (from sync -- see alternative 1), or remember all rejected & pruned block roots (see alternative 2).

Alternative 1: use sync

It feels like we're duplicating some logic from sync here. Perhaps we could offload this to sync and let it blacklist old side chains that fail to process.

Alternative 2: remember block roots of pruned/rejected blocks on disk and in memory

A different approach would be to store the block roots of all pruned & rejected blocks on disk. This would allow us to quickly reject any block which has previously been rejected, or which builds on a rejected block, without having to do complicated inference based on the canonical chain.
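As a rough sketch of this alternative (again with hypothetical names, and an append-only file standing in for whatever on-disk representation the store would actually use):

use std::collections::HashSet;
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::PathBuf;

type Hash256 = [u8; 32];

/// Append-only record of pruned/rejected block roots, mirrored in memory.
struct RejectedRoots {
    in_memory: HashSet<Hash256>,
    path: PathBuf,
}

impl RejectedRoots {
    /// Rebuild the in-memory set from disk on startup (missing file = empty set).
    fn load(path: PathBuf) -> Self {
        let bytes = std::fs::read(&path).unwrap_or_default();
        let in_memory = bytes
            .chunks_exact(32)
            .map(|chunk| {
                let mut root = [0u8; 32];
                root.copy_from_slice(chunk);
                root
            })
            .collect();
        Self { in_memory, path }
    }

    /// Record a pruned/rejected root, fsyncing before updating the in-memory
    /// set so that an unclean shutdown cannot lose the record.
    fn insert(&mut self, root: Hash256) -> io::Result<()> {
        let mut file = OpenOptions::new().create(true).append(true).open(&self.path)?;
        file.write_all(&root)?;
        file.sync_all()?;
        self.in_memory.insert(root);
        Ok(())
    }

    /// Reject any block that was previously rejected, or that builds on a
    /// rejected block.
    fn should_reject(&self, block_root: &Hash256, parent_root: &Hash256) -> bool {
        self.in_memory.contains(block_root) || self.in_memory.contains(parent_root)
    }
}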

@michaelsproul

cc @dapplion this may interest you

@michaelsproul

thanks to @pawanjay176 for flagging this bug too!
