-
Notifications
You must be signed in to change notification settings - Fork 972
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial support for parallel Soroban phase XDR. #4364
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an initial review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, parallel application of Soroban txs is exciting! I did an initial preliminary review. My main question is around scope and motivation:
- It's a bit hard to imagine how the refactored tx set is going to be used, because ledger close logic isn't wired up yet. As a result, I can't tell if the access/usage patterns are right in the context of multi-threading. Is there a reason to introduce apply-order changes before parallel application?
- (related) can you clarify the scope of this PR? if it's just refactoring tx set to be more general, then it'd be easier to make TxSetFrame more generalized internally, but continue exposing old API to make it easy for consumers. Changes to application order should probably be an isolated protocol change. This should help ship this work incrementally.
- There are some changes that I think should be factored out into a separate PR (surge pricing, test cleanup). There is no reason to block those changes on tx set refactor.
@@ -154,11 +179,10 @@ TxSetUtils::buildAccountTxQueues(TxSetTransactions const& txs) | |||
return queues; | |||
} | |||
|
|||
TxSetTransactions | |||
TxSetUtils::getInvalidTxList(TxSetTransactions const& txs, Application& app, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a comment here that one transaction per account is expected (or potentially pass AccountID->TxFrame map)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not the expectation here (yet), but probably that should be changed. I'll work on this in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the cleanup we only support nomination of 1 tx/account, so I don't think the comment is necessary in this particular place.
src/herder/TxSetFrame.cpp
Outdated
auto const& txSetV1 = txSet.v1TxSet(); | ||
// There was no protocol with 1 phase, so checking for 2 phases only | ||
if (txSetV1.phases.size() != static_cast<size_t>(TxSetPhase::PHASE_COUNT)) | ||
// There was no protocol with 1 phase, so checking for 2 perPhaseTxs only |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
phases
was a bit more clear than perPhaseTxs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's just find-and-replace gone too far, sorry.
FWIW concerning the variable naming, perPhaseTxs
is semantically different from phases
. Specifically we have transactions that may be included into a phase and we also have the 'phase' itself (TxSetPhaseFrame
) and phase might have additional internal structure, additional data etc.
src/herder/TxSetFrame.h
Outdated
|
||
Iterator(std::variant<TxFrameList, TxStageFrameList> const& txs, | ||
size_t stageIndex, size_t txIndex); | ||
std::variant<TxFrameList, TxStageFrameList> const& mTxs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need a variant here? Legacy main-thread application is just a special case of multi-thread parallel application (i.e. it's a single stage).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are not quite the same conceptually, but I think you're right and we could just generalize the implementation (it helps that the phase frame is immutable and we shouldn't need to make the runtime assertions about the phase structure). I'll give it a shot after taking out the surge pricing changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to not use variant.
src/herder/TxSetFrame.cpp
Outdated
} | ||
|
||
bool | ||
validateNonParallelPhaseXDRStructure(TransactionPhase const& phase) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit on naming: we're introducing "parallel" concepts when there's no parallelization yet, which is kind of confusing for the reader. Can we make terminology implementation-agnostic? Really the concept we want here is conflict-free.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how to name that TBH. I guess I could just say 'v0' phase, would that be more readable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
v0 is fine here, but my comment was more about the naming throughout the code. Might be nice to replace all mentions of parallel with conflict-free, and also add a comment that there isn't any parallelization yet to make it clear for the readers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it's not quite conflict-free. I'm fine with using something like v0 or maybe 'sequential' for the existing phases, but I'm not sure if there is a better term than 'parallel' for the new phase, because it is designed to be applicable in parallel (unlike the existing phase). I'm not sure we need to obfuscate that fact just because there is no followup work merged yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've used 'sequential' for v0 phase. Still not sure if there is a better way to name the parallel phase.
@@ -519,15 +519,15 @@ closeLedgerOn(Application& app, uint32 ledgerSeq, TimePoint closeTime, | |||
} | |||
else | |||
{ | |||
TxSetTransactions classic; | |||
TxSetTransactions soroban; | |||
TxFrameList classic; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding XDR changes:
- I feel uneasy about
ledgerMaxParallelThreads
: it's baking an implementation detail into the protocol (there's already enough complexity in config settings). I'm not certain how we plan to enforce this: a validator can bypass this check. Also, there are other ways to fall off the network even with this requirement satisfied (e.g. using CPUs that aren't powerful enough, etc). I think node quality shouldn't be enforced via protocol like this, but rather via qsets/social pressure. - Where is ParallelTxExecutionStage defined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel uneasy aboutledgerMaxParallelThreads: it's baking an implementation detail into the protocol
It is not an implementation detail, but a part of the protocol. We have to define the lowest number of threads that we should be sure that every tier 1 validator can afford running in parallel, otherwise there is really not much point to all the parallelization work.
We indeed can't (and don't need to) enforce the number of threads to use for applying the tx set (less can definitely be used and more threads could be used depending on tx set composition, e.g. to speedup catchup). This is just the number of 'logical' threads in the tx set. Transactions in different 'logical' threads may not have data dependencies on each other and no logical thread can exceed the ledger-wide resource limit.
FWIW the actual config plumbing will be in a separate PR; this should be an XDR-only change for now.
Where is ParallelTxExecutionStage defined?
All the txset-related changes are defined in next branch of stellar-ledger.h: https://github.com/stellar/stellar-xdr/blob/619d33e14013712415e5218ca7f5fadb5524c948/Stellar-ledger.x#L210
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm still pretty confused about this parameter.
It is not an implementation detail, but a part of the protocol. We have to define the lowest number of threads that we should be sure that every tier 1 validator can afford running in parallel, otherwise there is really not much point to all the parallelization work.
What I mean is that we're adding something that is an implementation detail to the protocol. There are ways Tier1 can lose sync due to wrong hardware today. This is why we have recommended SKUs and validators collectively decide what hardware makes sense for them.
We indeed can't (and don't need to) enforce the number of threads to use for applying the tx set (less can definitely be used and more threads could be used depending on tx set composition, e.g. to speedup catchup). This is just the number of 'logical' threads in the tx set. Transactions in different 'logical' threads may not have data dependencies on each other and no logical thread can exceed the ledger-wide resource limit,
So it this config about hardware or tx set properties? You mention above that the config is needed to make sure nodes can support parallelization and not fall out of sync, but also the config isn't really enforced and it's only supposed to indicate the number of conflict-free lanes in a tx set (and the tx set itself can be applied with more or less threads anyway)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I mean is that we're adding something that is an implementation detail to the protocol.
This is as much of an implementation detail as a transaction count limit or per-ledger instruction limit for Soroban. Of course, tier 1 validators can lose sync if they're under the necessary spec, but that doesn't mean that we can apply arbitrarily large tx sets.
This is why we have recommended SKUs and validators collectively decide what hardware makes sense for them.
Right, that's why this is a config setting - network votes on the minimum necessary parallelization support (similarly to how it can vote on the operation count/instruction limits).
So it this config about hardware or tx set properties?
The config defines the number of logical threads in tx set. We could name it in a more obfuscated fashion, but I don't see much value in that, because in the end the value still maps to the minimal number of physical threads a validator must support.
but also the config isn't really enforced
It will be enforced - this PR is already quite large and config plumbing is not a part of it, as well as building tx sets for nomination.
the tx set itself can be applied with more or less threads anyway)
Sure, but the default implementation that we expect validators to use is to spawn as many threads as we have in the tx set and these threads should be able to run on the physical cores. Again, in theory a validator can sleep for 100 milliseconds after applying every transaction, but we do expect every reasonable validator to try and apply transactions as fast as possible. Threads are similar - slow hardware will still be able to apply the tx sets, but it will be lagging behind the network and there is really no way around that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is as much of an implementation detail as a transaction count limit or per-ledger instruction limit for Soroban. Of course, tier 1 validators can lose sync if they're under the necessary spec, but that doesn't mean that we can apply arbitrarily large tx sets.
Sorry, I don't agree with this at all. ledger-wide limits are about bounding the amount of work needed to produce a single ledger, and preventing things like inserting an infinite loop into your contract. Number of threads is a performance tuning knob. It's not any different from any other performance knobs we have (do overlay in the background, do evictions faster, etc), and can cause nodes to be faster or slower. So the question is why should it be in the protocol?
It will be enforced - this PR is already quite large and config plumbing is not a part of it, as well as building tx sets for nomination.
How do we plan to enforce this? I don't quite understand how validators are going to prove that they meet the hardware requirements. I'm also not sure what happens if there's a protocol vote and some validators don't satisfy the new requirement (but that validator is still in everyone's quorum)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ledger-wide limits are about bounding the amount of work needed to produce a single ledger
Exactly, the thread limit has the same purpose of bounding the amount of work per-ledger on a reasonable validator. Currently 'reasonable' means 'reasonably modern CPU', with parallelization we obviously have to require a certain number of CPU cores as well (otherwise there is really no point in doing that).
Number of threads is a performance tuning knob.
It is not a performance tuning knob, it's a ledger capacity tuning knob that works in the same way as ledger-wide instruction limit.
So the question is why should it be in the protocol?
Because we need to be able to bound the ledger capacity at the protocol level. The point of parallelization is not to improve the performance by an unknown factor, it is about increasing the ledger instruction capacity without (significantly) changing the apply (and thus ledger close) time.
How do we plan to enforce this?
This configuration setting regulates the tx set validation - a valid tx set has to have no more than that many logical threads of independent transactions.
I don't quite understand how validators are going to prove that they meet the hardware requirements.
They don't need to prove that per-se, but we'll need to figure out the lowest common value that everyone thinks is reasonable. That value will need to be accounted for when setting the minimal hardware requirements for a validator. Again, there is no point in any parallelization work if we expect majority of the network to run on a single CPU. But we don't think that's the case, do we?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the confusion is around why this parameter is about bounding work. Tuning the number of threads means tuning how work is executed, but not changing the amount of it. So depending on this number, you can execute a block faster or slower. For tx set construction, I suppose the limit is about bounding total parallel work threads? In practice, this doesn't really mean much, does it? We can't enforce it on validators. And there are all the weird scenarios I was asking about earlier where the network changes the parameter, and now some Tier1 nodes aren't configured "correctly" (another scenario is a tier1 node being rebuilt on a different instance type transparently, for example).
Again, there is no point in any parallelization work if we expect majority of the network to run on a single CPU
The way I think about this is that of course the network will eventually get to a place where most nodes are running the right hardware to take full advantage of parallelization (otherwise, they get removed from quorum or lose sync). It just feels wrong (and not really enforceable) to have these things in the protocol.
Also, these types of things should be discussed and debated at avenues like the protocol meeting, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the confusion is around why this parameter is about bounding work.
Yeah, the parameter is not bounding work directly, but it's bounding the amount of modelled instructions per ledger given a machine with at least that many physical cores. From the protocol standpoint it's definition is 'tx set should have at most that many data-independent logical threads', i.e. a value of N guarantees that a machine with N cores can always execute a tx set in such a way that every core executes at most ledgerMaxInstructions
modelled instructions. It does not (and can not) guarantee the real apply time, but the expectation is that given a machine with sufficient number of cores the apply time will always be within a low-variance factor of the 'modelled' instruction count. Machines with lower number of cores may sometimes have a factor that is 2+ times larger than the modelled time, which likely will result in 2+ times longer apply time than in machines with sufficient number of cores.
In order for parallelization to work properly, majority of nodes will need to have at least a few (2-3) physical cores more than this setting specifies. If the network votes for a value that is too high, the ledger close time will go up and likely will result in re-vote for a lower value, which is why it would make sense to reach consensus on the minimum requirement for a number of validator cores offline before voting for this.
Also, these types of things should be discussed and debated at avenues like the protocol meeting, right?
Indeed, but this is an experiment/prototype. If it is successful, we'll present the CAP as well as the initial results. If not, we'll keep iterating on this. We'll still have plenty of time for protocol discussions as this is not planned for protocol 22.
src/herder/test/HerderTests.cpp
Outdated
@@ -647,10 +647,6 @@ testTxSetWithFeeBumps(uint32 protocolVersion) | |||
|
|||
TEST_CASE("txset", "[herder][txset]") | |||
{ | |||
SECTION("protocol 13") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just checking: this is fine to remove because the test is only testing block construction (so none of this logic is used in the replay)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we shouldn't need to create tx sets for protocol 13 anymore, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can't create new protocol 13 tx sets, but we need to be careful as to not accidentally remove replay test coverage (replay also uses TxSetFrame and the whole ledgerClose flow). That being said, I agree all tests covering surge pricing and block construction can be removed because replay just uses ready tx sets directly from the archives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra tests should have been removed in the cleanup PR, I only removed one helper here that is no longer used.
// to determine a stable index of every transaction in the phase, even if | ||
// the phase is parallel and can have certain transaction applied in | ||
// arbitrary order. | ||
class Iterator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels a bit premature: ledger close isn't yet hooked up to do anything in parallel, so it's hard to tell how this is going to be used and whether any synchronization needs to be taken into account. I'm curious about the motivation and scope of these changes, given that they seem to be preceding actual parallel application changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changeset in general has to precede the actual application changes, because in order to apply a tx set one needs to have an appropriate schedule. Moreover, the purpose of iterator is to actually hide the internal structure of the phase for the scenarios where we don't care about the parallel execution at all (e.g. when we just want to check if every tx in the phase is valid etc.).
(I should mention that this is my first time tapping into soroban parallelization work, so please bear with me if I'm missing context) |
This PR actually has to go first, because we can't properly wire up the parallel application logic if we don't have a proper schedule to use. FWIW there is no multi-threading here at all; the only meaningful interface change is that we'll expose the 'parallel' application schedule from the respective phase (instead of a transaction sequence). I probably mentioned this elsewhere, but to be clear we have to build a tx set in a proper fashion in order to be able to apply it in parallel (specifically we need to run the transactions with data dependencies in the same thread or different stages).
The idea here is to provide support for a new phase type in the TxSetFrame; I've tried to limit the API changes as much as possible and basically the main change is that instead of exposing all the transactions at once we expose the phases separately (that just required an additional loop in a few places). I realize that some stuff could be moved to separate PRs; I'll at least move all the surge-pricing related stuff. That said, I'm not sure the interface changes would make much sense without at least the stub for supporting the new XDR (otherwise it's hard to tell what's the motivation of the changes and whether they look reasonable). I could try separating that as well, but I'm not sure it will help reviewing the design decisions. |
f38272f
to
fcf19f2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've rebased and updated the PR. There is no rush to merge it into the minor release, so I suppose it'll need to wait more.
@@ -154,11 +179,10 @@ TxSetUtils::buildAccountTxQueues(TxSetTransactions const& txs) | |||
return queues; | |||
} | |||
|
|||
TxSetTransactions | |||
TxSetUtils::getInvalidTxList(TxSetTransactions const& txs, Application& app, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the cleanup we only support nomination of 1 tx/account, so I don't think the comment is necessary in this particular place.
src/herder/TxSetFrame.h
Outdated
|
||
Iterator(std::variant<TxFrameList, TxStageFrameList> const& txs, | ||
size_t stageIndex, size_t txIndex); | ||
std::variant<TxFrameList, TxStageFrameList> const& mTxs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to not use variant.
src/herder/TxSetFrame.cpp
Outdated
} | ||
|
||
bool | ||
validateNonParallelPhaseXDRStructure(TransactionPhase const& phase) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've used 'sequential' for v0 phase. Still not sure if there is a better way to name the parallel phase.
src/herder/test/HerderTests.cpp
Outdated
@@ -647,10 +647,6 @@ testTxSetWithFeeBumps(uint32 protocolVersion) | |||
|
|||
TEST_CASE("txset", "[herder][txset]") | |||
{ | |||
SECTION("protocol 13") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra tests should have been removed in the cleanup PR, I only removed one helper here that is no longer used.
ac3b7db
to
93c9541
Compare
8bd3d36
to
b930f32
Compare
This PR should be rebased on top of the most recent changes, including the tx set nomination tests that should add some confidence in the refactoring. |
8ce0064
to
6afbc17
Compare
6afbc17
to
ba8470e
Compare
Do you have a testcase that pushes a vnext valid parallel tx set in v22? |
Parallel tx sets are not supported in v22, but if you meant v23, there are several tests covering it: |
Once we deploy a version with v23 xdr, but before the v23 protocol upgrade, it'll be possible to receive a parallel tx set over the wire that we should toss right? That's the scenario I'm referring to. |
Yeah, good catch, I've added a test that verifies the phase format<->protocol version correspondence. |
77e922b
to
053aa57
Compare
I've tried to minimize the scope of the changes; specifically this doesn't contain any actual logic for the parallel execution (such as data dependency validation and building parallel stages). However, there is still some refactoring that needed to happen in order to support new, more complex tx sets.
053aa57
to
7ec454d
Compare
… as we never reference the renamed fields in code.
Description
I've tried to minimize the scope of the changes; specifically this doesn't contain any actual logic for the parallel execution (such as data dependency validation and building parallel stages). However, there is still some refactoring that needed to happen in order to support new, more complex tx sets.
I've also removed some legacy surge pricing logic and the corresponding tests - we should no longer need the surge pricing logic that covers more than one source account per tx set.
Checklist
clang-format
v8.0.0 (viamake format
or the Visual Studio extension)