Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(stage-extensions): New abstraction for non-core stages and tables #4325

Closed
wants to merge 15 commits into from

Conversation

perama-v
Copy link

@perama-v perama-v commented Aug 23, 2023

Description

Progress toward

Changes made

Adds a mechanism to add new stages and tables with minimal footprint on existing crates.
As any new stage is likely to be used via a custom endpoint, changes are introduced as extensions to the existing MyRethCliExt.

Composed of:

  • New methods to the RethNodeCommandConfig trait.
    • One for adding a custom stage. Modified the pipeline builder to accept &mut rather than mut
    • One for adding additional tables.
    • An associated type for the new tables
pub trait RethNodeCommandConfig: fmt::Debug {
    type TableExt: TableMetadata + 'static;
    /*snip*/
    fn add_custom_stage<'a, DB>(
        &self,
        pipeline_builder: &'a mut PipelineBuilder<DB>,
    ) -> eyre::Result<&'a mut PipelineBuilder<DB>>
    where
        DB: Database;

    fn get_custom_tables(&self) -> Option<Vec<Self::TableExt>>;
  • Modification of the RethNodeCommandConfig method extend_rpc_modules():
    • This method is passed the database via .start_servers(blockchain_db.clone(), .... and is used via provider traits.
    • A new provider trait is added: DatabaseProvider (crates/storage/provider/src/traits/database.rs). The intuition is: if already passing the database, can one grant generic database read access via this provider trait. This includes all future methods and core/non-core tables.
  • Modifications to the table-creation macros to permit re-use for non-core tables.
    • Includes the name of the tables enum and the count of its variants.
    • Grouping of the existing tables! methods under a TableMetadata trait so that non-core tables can be passed generically via that trait.
// examples/additional-stage/src/table.rs
tables!(NonCoreTable, 1, [(MyTable, TableType::Table)]);

The main changes to ./bin/reth/src/node.mod.rs are as follows:

Stages: After adding core Reth stages, the builder is passed to add a new stage(s).

self.ext.add_custom_stage(&mut builder)?; // new
let pipeline = builder.build(db, self.chain.clone());

Tables: Before the database is created, new non-core tables are fetched and passed for creation after core table creation.

let optional_tables = self.ext.get_custom_tables(); // new
let db = Arc::new(init_db(&db_path, self.db.log_level, optional_tables)?);

Discussion

Presented for conceptual feedback. In the main tracking issue, it was raised that some new abstractions are likely required to enable additional indices. This PR is an attempt at such an abstraction.

The implementation is incomplete, and the intention is to see if the idea is a good direction before proceeding more. Perhaps there are obvious flaws or preferred alternative directions?

Apologies for touching so many files: many are just involved on the way to get the number of optional new tables to the set_max_dbs(max_tables) function.

Copy link
Member

@gakonst gakonst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice start. Need to think about this a little more, definitely directionally supportive of the custom stages, and thinking about what the best way to integrate would be.

Maybe one idea for you is to try to configure the abstraction via the CLI like https://github.com/jacobkaufmann/evangelion/blob/x-reth-builder/src/main.rs#L57, so we can experiment with how one can define custom stages OUTSIDE the main node/mode.rs, when they'd import most of the stuff for the node and want to just add an extra stage or two.

Comment on lines -76 to +77
inner_env.set_max_dbs(Tables::ALL.len());
inner_env.set_max_dbs(max_tables);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! Nice! @shekhirin @onbjerg @rkrasiuk we've talked about this before, lifts the limit from having a comp-time defined set of stages

@perama-v
Copy link
Author

Started to play around with the RethNodeCommandConfig trait to see where it could be used. Removed the stage-extensions crate and instead created a new ./examples stage-extensions binary. Still a work in progress.

@gakonst
Copy link
Member

gakonst commented Aug 30, 2023

@onbjerg if you have some time - could you please give a vibecheck on the overall design here?

Comment on lines +155 to +158
pub struct MyNamespace<'a, DB: Database> {
/// A reference to the reth database. Used by MyNamespace to construct responses.
database: Arc<DatabaseProviderRO<'a, &'a DB>>,
}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Current uncertainty around my approach here. The goal is to provide any method with the ability to access the database, as shown with my_method(). Have not yet worked out the exact lifetimes mechanism have the database as a member here. Perhaps a different design is possible

@perama-v
Copy link
Author

@onbjerg if you have some time - could you please give a vibecheck on the overall design here?

I've just updated the summary describing the design

@onbjerg
Copy link
Member

onbjerg commented Aug 31, 2023

Thanks for taking a stab at this! I think there are a few unresolved things with this approach

  1. This approach will not allow you to add custom stages in between 2 default ones, which might be desireable in some cases. This might not be too big of a deal at first
  2. This approach only allows you to extend historical sync - the live sync portion ("blockchain tree") will not be extended at all, meaning a custom stage that indexes e.g. address interactions will only work as long as the pipeline is used - as soon as reth is syncing the tip, nothing new would be indexed

Do you have some ideas for point 2?

@perama-v
Copy link
Author

perama-v commented Aug 31, 2023

  1. This approach only allows you to extend historical sync - the live sync portion ("blockchain tree") will not be extended at all, meaning a custom stage that indexes e.g. address interactions will only work as long as the pipeline is used - as soon as reth is syncing the tip, nothing new would be indexed

Ok that is interesting.

From the docs (docs/crates/stages.md):

Then, the Pipeline::run function is called, which starts the pipeline, 
executing all of the stages continuously in an infinite loop. 
This process syncs the chain, keeping everything up to date with the chain tip.

I took this to mean that the stages restarted, looping as new blocks arrive:

## initial sync
A (genesis to tip) -> B (genesis to tip) -> C (genesis to tip) ... 
## maintenance
- Block `n`: A -> B -> C ...,
- Block `n + 1`: A -> B -> C ...,
- Block `n + 2`: A -> B -> C ...,

I'll look more closely to understand better how it actually works

@gakonst
Copy link
Member

gakonst commented Aug 31, 2023

PTAL at the BlockchainTree, it is responsible for the tip/live-sync, whereas the pipeline is more like a "batch load" process for the history.

@perama-v
Copy link
Author

perama-v commented Sep 1, 2023

Do you have some ideas for point 2?

cefec0a

Here would be one approach I could see:

  • Trigger extra stages at the end of the blockchain-tree AppendableChain::validate_and_execute()
  • Get data from stages and represent as a user-defined struct
  • Tack that struct inside PostState
  • Call a user-defined db write function on the data when PostState calls db write.

@perama-v
Copy link
Author

  • Tack that struct inside PostState

Hit an impasse with this approach as introducing generics in that struct was infeasible.

Have played with a few other approaches. Nothing compelling identified yet

@github-actions github-actions bot added the S-stale This issue/PR is stale and will close with no further activity label Oct 12, 2023
@rkrasiuk rkrasiuk added M-prevent-stale Prevents old inactive issues/PRs from being closed due to inactivity and removed S-stale This issue/PR is stale and will close with no further activity labels Oct 19, 2023
@gakonst gakonst closed this Jan 9, 2024
@gakonst
Copy link
Member

gakonst commented Jan 9, 2024

Closing this as stale, we'll re-incorporate these ideas as we venture into custom stages with Reth CLI Extensions and the Declarative Node Builder API that @mattsse is spearheading.

@gakonst
Copy link
Member

gakonst commented Jan 9, 2024

Appreciate you takign the tiem on these @perama-v, it's effort that we'll take into account seriously as I think this is a good idea

@perama-v
Copy link
Author

perama-v commented Jan 9, 2024

Thanks for the high level of engagement and openness in my exploration of this. I'm happy for it to close.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
M-prevent-stale Prevents old inactive issues/PRs from being closed due to inactivity
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants