Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

STR-809 Fix Bridge Client Bug and add functional Test #564

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

voidash
Copy link
Contributor

@voidash voidash commented Dec 26, 2024

Description

The nature of websocket client is that it requires restart once the connection drops. Due to this bridge-client stops if the connection to strata-client drops which is not the desired and must be handled. This PR fixes the bug by using managed connection pool calleddeadpool.

Note

The functional test deliberately contains duplicated code which can be fixed after STR-734 is merged. STR-734 contains BridgeTestBase class that will be utilized by this functional test after rebase. I opened this PR to discuss the bug fix part

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature/Enhancement (non-breaking change which adds functionality or enhances an existing one)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactor
  • New or updated tests
  • Dependency Update

Notes to Reviewers

Checklist

  • I have performed a self-review of my code.
  • I have commented my code where necessary.
  • I have updated the documentation if needed.
  • My changes do not introduce new warnings.
  • I have added tests that prove my changes are effective or that my feature works.
  • New and existing tests pass with my changes.

Related Issues

STR-809

@voidash voidash requested review from a team as code owners December 26, 2024 15:27
Copy link
Contributor

@delbonis delbonis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way errors are reasoned about here needs to be improved, it's written assuming they can't really happen.

Comment on lines 37 to +38
/// The RPC client to connect to the RPC full node.
pub l2_rpc_client: L2Client,
pub l2_rpc_client_pool: Pool<WsClientManager>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps make a type alias like type WsClientPool = Pool<WsClientManager>; that can be re-exported from where WsClientManager is defined instead of having to make downstream consumers import the Pool type.

info!(scope=?scope, "getting messages from the L2 Client");
let received_payloads = self
.l2_rpc_client
info!(scope = ?scope, "getting messages from the L2 Client");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can just be ?scope.

Comment on lines +77 to +78
Err(_) => WsClientState::NotWorking,
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we try to create a new instance and it starts off as being NotWorking that should be an error right? When do we update the state from Working to NotWorking?

Comment on lines +124 to +156
WsClientState::NotWorking => Err(ClientError::Transport(BoxError::from(
"Client is Not Working".to_string(),
))),
}
}

/// Sends a method call request.
async fn request<R, Params>(&self, method: &str, params: Params) -> Result<R, ClientError>
where
R: DeserializeOwned,
Params: ToRpcParams + Send,
{
match &self.0 {
WsClientState::Working(inner) => inner.request(method, params).await,
WsClientState::NotWorking => Err(ClientError::Transport(BoxError::from(
"Client is Not Working".to_string(),
))),
}
}

/// Sends a batch request.
async fn batch_request<'a, R>(
&self,
batch: BatchRequestBuilder<'a>,
) -> Result<BatchResponse<'a, R>, ClientError>
where
R: DeserializeOwned + fmt::Debug + 'a,
{
match &self.0 {
WsClientState::Working(inner) => inner.batch_request(batch).await,
WsClientState::NotWorking => Err(ClientError::Transport(BoxError::from(
"Client is Not Working".to_string(),
))),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a cleanliness thing you could have a function like make_notworking_err to dedup constructing this complex error instance.

Comment on lines +19 to +28
# Local constants
# D BTC
DEPOSIT_AMOUNT = DEFAULT_ROLLUP_PARAMS["deposit_amount"]
# Gas for the withdrawal transaction
WITHDRAWAL_GAS_FEE = 22_000 # technically is 21_000
# Ethereum Private Key
# NOTE: don't use this private key in production
ETH_PRIVATE_KEY = "0x0000000000000000000000000000000000000000000000000000000000000001"
# BTC Operator's fee for withdrawal
OPERATOR_FEE = DEFAULT_ROLLUP_PARAMS["operator_fee"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spaces between each of these after each assignment before the next comment to make them easier to read.

Comment on lines +125 to +129
let l2_rpc_client = self
.l2_rpc_client_pool
.get()
.await
.map_err(|err| ExecError::WsPool(err))?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe make an accessor on the type that wraps this repeated pattern so you can just do something like self.get_ready_rpc_client().await?.

Comment on lines +39 to +44
if let Ok(RpcBridgeDuties {
duties,
start_index,
stop_index,
} = self.poll_duties().await?;

let mut handles = JoinSet::new();
for duty in duties {
let exec_handler = self.exec_handler.clone();
let bridge_duty_ops = self.bridge_duty_db_ops.clone();
let broadcaster = self.broadcaster.clone();
handles.spawn(async move {
process_duty(exec_handler, bridge_duty_ops, broadcaster, &duty).await
});
}
}) = self.poll_duties().await
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing this to an if let like this is a minor change but it makes the structure much different. This should reason about the error conditions more explicitly.

I don't even see the error case being handled here, we need to do that. From what I can tell, if this errors immediately (like because the connection is broken) then we just spin in a hot loop eating a CPU core until we don't encounter the error anymore.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants