Block verification checks for pre-finalized/conflicting-with-finalized blocks are flaky #6606

michaelsproul · 2024-11-25T01:13:44Z

Description

Similar to other database crash safety bugs, this one relates to a bad assumption about state after an unclean shutdown. Specifically the pre_finalization_cache assumes that a block present on disk but not in fork choice must be a finalized block that has been pruned from fork choice:

lighthouse/beacon_node/beacon_chain/src/pre_finalization_cache.rs

Lines 75 to 79 in 6329042

    
           // 2. Check on disk. 
        
           if self.store.get_blinded_block(&block_root)?.is_some() { 
        
               cache.block_roots.put(block_root, ()); 
        
               return Ok(true); 
        
           }

This is false in the case where:

A block B arrives and is processed & written to disk.
The node is uncleanly killed (sigkill, OOM, power outage) before the fork choice data structure containing B is written to disk.

The result is that after a restart B is on disk but not in fork choice. This is OK from a DB invariant PoV, because the invariant we maintain is (intentionally) one directional:

block in fork_choice --> block in database

The pre-finalization cache is making an additional bad assumption that the block must only be on disk because it was fully imported and then pruned from fork choice.

A similar assumption is made here:

lighthouse/beacon_node/beacon_chain/src/block_verification.rs

Lines 1765 to 1781 in 6329042

    
           // If fork choice does *not* consider the parent to be a descendant of the finalized block, 
        
           // then there are two more cases: 
        
           // 
        
           // 1. We have the parent stored in our database. Because fork-choice has confirmed the 
        
           //    parent is *not* in our post-finalization DAG, all other blocks must be either 
        
           //    pre-finalization or conflicting with finalization. 
        
           // 2. The parent is unknown to us, we probably want to download it since it might actually 
        
           //    descend from the finalized root. 
        
           if chain 
        
               .store 
        
               .block_exists(&block.parent_root()) 
        
               .map_err(|e| BlockError::BeaconChainError(e.into()))? 
        
           { 
        
               Err(BlockError::NotFinalizedDescendant { 
        
                   block_parent_root: block.parent_root(), 
        
               }) 
        
           } else {

It's not true that the parent must be pre-finalization or conflicting with finalization if it is present on disk. It could be the same as above where the parent was imported fully and written to disk, but forgotten from fork choice.

Version

Lighthouse v5.3.0

Steps to resolve

Rather than making assumptions about blocks based on their presence on disk, we could use a weaker and more defensive check to catch blocks that are definitely finalized or conflicting with finalization.

One such check for the pre-finalization cache would be:

If the block_root at the block's slot on the canonical chain equals the block root of the block, then it is finalized.

This should just involve a single database lookup from the freezer database in most cases, although it could be slow during periods of non-finality when we must iterate backwards across multiple beacon states. This check has the downside of not catching blocks from side chains which have been pruned due to finality. Let's enumerate all the cases so we don't miss any.

Case A: block is canonical and finalized

This is the "easy" case. We know that a block is finalized if its block root matches the block root from the canonical chain at that slot. We can reject these blocks if they are received on gossip or RPC, because we have already imported them.

Case B: parent is canonical and finalized

If a block's parent is a canonical block from the finalized chain (like case A), and is not the finalized block i.e. the most recently finalized block, then it is in conflict with finalization and should be rejected. This case could be checked in a similar way to (A) by using the canonical block roots iterator. However it is potentially inefficient, because we do not know how much older

Case C: multi-block sidechain starting prior to finalization

Case B is a special case of a sidechain starting from a pre-finalization block. In the case of B we have some chance of checking quickly because we can look up the parent. However if the parent is not known because the sidechain consisted of multiple blocks (and the parent has since been pruned), then there's no check we can do unless we download the sidechain again and reject it (from sync -- see alternative 1), or remember all rejected & pruned block roots (see alternative 2).

Alternative 1: use `sync`

It feels like we're duplicating some logic from sync here. Perhaps we could offload this to sync and let it blacklist old side chains which fail to process.

Alternative 2: remember block roots of pruned/rejected blocks on disk and in memory

A different approach would be to store the block roots of all pruned & rejected blocks on disk. This would allow us to quickly reject any block which has previously been rejected, or builds on a rejected block, without having do complicated inference based on the canonical chain.

The text was updated successfully, but these errors were encountered:

michaelsproul · 2024-11-25T01:19:24Z

cc @dapplion this may interest you

michaelsproul · 2024-11-25T01:20:13Z

thanks to @pawanjay176 for flagging this bug too!

michaelsproul added bug Something isn't working consensus An issue/PR that touches consensus code, such as state_processing or block verification. database labels Nov 25, 2024

michaelsproul changed the title ~~Block verification checks for p~~ Block verification checks for pre-finalized/conflicting-with-finalized blocks are flaky Nov 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Block verification checks for pre-finalized/conflicting-with-finalized blocks are flaky #6606

Block verification checks for pre-finalized/conflicting-with-finalized blocks are flaky #6606

michaelsproul commented Nov 25, 2024

michaelsproul commented Nov 25, 2024

michaelsproul commented Nov 25, 2024

Block verification checks for pre-finalized/conflicting-with-finalized blocks are flaky #6606

Block verification checks for pre-finalized/conflicting-with-finalized blocks are flaky #6606

Comments

michaelsproul commented Nov 25, 2024

Description

Version

Steps to resolve

Case A: block is canonical and finalized

Case B: parent is canonical and finalized

Case C: multi-block sidechain starting prior to finalization

Alternative 1: use sync

Alternative 2: remember block roots of pruned/rejected blocks on disk and in memory

michaelsproul commented Nov 25, 2024

michaelsproul commented Nov 25, 2024

Alternative 1: use `sync`