test kernel GC code for large-cardinality operations #8402
Comments
It really depends. If this GC processing occurs during a chain upgrade, and we can arrange for it to run to completion during the upgrade block (unlimited run policy), I think it may actually be preferable to do it then instead of spreading it over multiple later blocks.
I agree it's a schema change, but we can perform schema migrations on upgrades such that the new kernel code only needs to deal with the new schema (only the migration code needs to be aware of the old schema).
I don't quite follow what requires special backwards compatibility. The way I see it, there are two levels of changes:
I actually do not understand why we have this higher prioritization. The way I see it, …
Yeah, just doing more+smaller syscalls is easy.
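For concreteness, here is a minimal sketch of the "just do more+smaller syscalls" idea; the chunk size and the exact batch-style dropImports signature are assumptions for illustration, not the current liveslots code.

```js
// Illustrative only: split one enormous drop into several bounded syscalls.
const DROP_CHUNK_SIZE = 1000; // assumed tuning knob, not a real constant

function dropImportsInChunks(syscall, vrefs) {
  for (let i = 0; i < vrefs.length; i += DROP_CHUNK_SIZE) {
    // several small syscalls instead of one with 100k vrefs
    syscall.dropImports(vrefs.slice(i, i + DROP_CHUNK_SIZE));
  }
}
```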
I think you're right, and it's not necessary. I remember wanting to get GC out of the way promptly, and thus draining the GC-action queue before looking at the run-queue. But I also remember being defensive against the block ending (due to the runPolicy) and device inputs getting a chance to inject messages (changing refcounts) while there were still items on the GC-action queue. So each GC-action is examined to see whether it is still relevant: if the queue says to tell vat …
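A rough sketch of the relevance check described above; the helper names (hasClistEntry, getObjectRefCounts) and the action shape are illustrative, not the actual kernel-keeper API.

```js
// Sketch: before delivering a queued GC action, re-check that it still
// applies, since refcounts may have changed while it sat on the queue.
function isStillRelevant(kernelKeeper, action) {
  const { vatID, type, kref } = action;
  // if the vat no longer has a c-list entry for this object, the action is stale
  if (!kernelKeeper.hasClistEntry(vatID, kref)) {
    return false;
  }
  const { reachable } = kernelKeeper.getObjectRefCounts(kref);
  // a queued dropExports is obsolete if something re-imported the object
  if (type === 'dropExports' && reachable > 0) {
    return false;
  }
  return true;
}
```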
I'm not sure I want our hypothetical scheduler to be obligated to drain a vat's outbound queue before allowing the deliveries of more inbound queue items. But the deeper concern I had is the "ships in the night" problem, and I think there are changes we might make which could introduce this problem. I'd have to think about it more carefully. I know the comms protocol has this, and I remember discussing solutions with Mark and Dean that included incref/decref messages and counters on both sides. In the swingset context, the concern would be a vat which announces a drop at the same moment the kernel is re-introducing the same reference.

Basically I'm just reminding myself that it could be tricky, and to think about it further if we consider deeper changes.
The "process outbound queue items before processing the next inbound queue item" is a current design consequence of the 2 queues per vat approach (#5025), motivated in part by this problem of "ships in the night". dropImport and re-introduction is not the only case, but also promise forwarding and pipelining (which are not currently a concern for liveslots vats, but there is no reason to keep this limitation long term). I think we need to solve this problem for all these cases, not just for GC related messages. A simple way to prevent this problem right now is by disallowing any concurrent crossing at the vat/kernel boundary. It's definitely something we need to figure out by the time we tackle parallel execution (#6447). Luckily the problem is simpler than comms because the boundary between kernel / vat can be made half duplex without much penalty, even when vats live in a separate process.
Yes, that is likely the correct approach to removing the vat-queue processing-order restriction. The way I see it, the per-vat queues are a part of the "vat", but not a part of liveslots. That means we need some vat logic outside the xsnap worker that is able to remove …
I wrote a quick test of kernel behavior in the face of large GC syscalls, and it appears to take linear time. A few things are implemented better than I feared:
My test of just the kernel-side uses a fake vat (no liveslots) that calls the GC syscalls directly.
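A sketch of what that kernel-side test looks like, under assumed test helpers (injectSyscall and the vref naming are illustrative, not existing SwingSet test utilities):

```js
// Drop n imports in one syscall from a liveslots-free fake vat, and time
// how long the kernel takes to process the resulting GC actions.
async function timeLargeDrop(kernel, vatID, n) {
  const vrefs = [];
  for (let i = 1; i <= n; i += 1) {
    vrefs.push(`o-${i}`); // vat-side import identifiers
  }
  const start = Date.now();
  kernel.injectSyscall(vatID, ['dropImports', vrefs]); // assumed test-only hook
  await kernel.run(); // drain the GC action queue
  return Date.now() - start; // compare across n to look for superlinearity
}
```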
… however, when doing a ghost-replay mainnet simulation of deleting v29-ATOM-USD_price_feed (which, in run-5, participates in 50k #8401 cycles), the process took over 30 minutes (the drop itself took 7 min, and the subsequent …
When I performed the same test with run-9 (in which there are 79k cycles), the kernel process crashed (OOM) after four hours. I don't know exactly what went wrong, but the BOYD syscalls should have scaled linearly (so maybe 4M syscalls), and joining the 358MB transcript entry might have blown past some memory limit, or triggered some superlinear allocation/string-copying behavior in V8. Even if we find+fix the time/OOM problem, we're still talking about a massive transcript entry. So I think that puts a practical limit on the number of syscalls that any single crank can make, just to keep the resulting transcript entry at a manageable size (unless/until we change how transcript entries are stored).

That means we either need to limit the number of vrefs we tell a vat about before asking it to BOYD (rate-limiting in the kernel), or ask liveslots to limit the number of vrefs it processes during each BOYD (rate-limiting in the vat, maybe expanding the scope of #8417 to include …).
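For illustration, a sketch of the vat-side option (rate-limiting inside liveslots); the budget value and the batch dropImports shape are assumptions, not what liveslots does today:

```js
// Process at most `budget` pending drops per bringOutYourDead delivery,
// leaving the remainder for a later BOYD.
function makeRateLimitedGC(syscall, budget = 1000) {
  const pendingDrops = new Set();
  return {
    noteDroppable(vref) {
      pendingDrops.add(vref);
    },
    bringOutYourDead() {
      const batch = [...pendingDrops].slice(0, budget);
      batch.forEach(vref => pendingDrops.delete(vref));
      if (batch.length > 0) {
        syscall.dropImports(batch);
      }
      return pendingDrops.size; // how much work remains for the next BOYD
    },
  };
}
```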
I built a test that uses a non-liveslots local-worker vat, which just does a large number of trivial syscalls. The runtime is loosely proportional to the number of syscalls, but:
In the output below, we emit …
So 1.12M syscalls took one second to serialize, and wrote a 58MB item to the transcript store.
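That measurement can be reproduced roughly like this; the entry shape below is a stand-in, not the real transcript format:

```js
// Build a transcript-entry-like object with many recorded syscalls and
// time its JSON serialization, reporting the resulting item size.
function measureTranscriptEntry(numSyscalls) {
  const entry = {
    d: ['bringOutYourDead'],
    syscalls: Array.from({ length: numSyscalls }, (_, i) => ({
      s: ['vatstoreGet', `key-${i}`],
      r: ['ok', null],
    })),
  };
  const start = Date.now();
  const serialized = JSON.stringify(entry);
  return { seconds: (Date.now() - start) / 1000, megabytes: serialized.length / 1e6 };
}
```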
I'd rather the kernel kept that list. In the vat it would be stored in the heap (what about upgrades?), and it wouldn't be actionable until another BOYD delivery from the kernel anyway.
I built another test to emulate the #8401 cycles: it makes two vats, each with a … The setup involves temporaries (RAM pillars) that need to be collected before we move to the termination phase, so I do a BOYD at the end of setup. Setup runs a lot faster when I create the cycles in batches (sending an array of handles/holders instead of a single one). Batches of 1k seemed to work well; I noticed a slight slowdown at 2k. The setup phase is quite linear, as long as I force a BOYD after each batch (otherwise the temporaries accumulate, the final BOYD is very large, and it runs into the same accumulate-many-syscalls problem that the termination phase triggers).

In the teardown phase, I terminate the left vat and force a BOYD. The kernel drops all the terminated vat's imports, which propagate to the right vat as … I observed the beginnings of superlinearity in the terminate phase between 32k and 64k cycles (2.2x instead of 2.0x), and clear problems at 128k (30x instead of 2.0x). In this table, all times are in seconds, and …
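A sketch of the batched setup described above; makeHandles, holdAll, and forceBOYD are stand-ins for the actual test-vat methods, not existing code:

```js
import { E } from '@endo/far';

// Create cross-vat reference cycles in batches, forcing a BOYD after each
// batch so setup temporaries (RAM pillars) don't accumulate.
async function setupCycles(leftRoot, rightRoot, total, batchSize, forceBOYD) {
  for (let done = 0; done < total; done += batchSize) {
    const count = Math.min(batchSize, total - done);
    const handles = await E(leftRoot).makeHandles(count); // left half of each cycle
    await E(rightRoot).holdAll(handles); // right vat completes the cycles
    await forceBOYD(); // collect this batch's temporaries before continuing
  }
}
```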
To make this a more useful test, I'd like it to fail earlier. The actual Zoe vat has a number of other objects referenced by the cycle (but not supporting it), which will increase the number of syscalls it does during the BOYD. My next step will be to add this "ballast" and see if I can get a test+count that can be set up in a reasonable time but which both completes in a tolerable time and shows clear signs of superlinearity, so we can confidently use it to validate our code fix (#8417).
I think I've done enough testing here. The kernel-side transcript limitation means that we need to avoid large cranks (lots of syscalls), and the earliest way to accomplish that is to delete vats slowly (#8928), for which most of the code has just landed (just the cosmic-swingset integration part is left). That's not a complete fix, we still need defenses against vats which do too much, but we can't survive deleting the large price-feed vats (that resulted from #8400 / #8401) without slow deletion. So I'm going to close this ticket now, I think it's done its job.
What is the Problem Being Solved?
Some of the remediations for #8400 and #8401 might cause a vat to drop/retire 100k or more vrefs in a single crank. We'll try to avoid that, but the kernel should be prepared to cope with GC operations on that scale.
When I implemented the kernel GC system, I expected most vats would release a few dozen objects at a time, so I'm worried about things like:
- does the syscall.dropImport/etc implementation have any quadratic behavior in processing the vrefs?
- the GC action queue used to be processed with a JSON.parse(); pop(); JSON.stringify() pattern, which is quadratic, and I don't know if I ever fixed that

In addition to fixing the algorithmic complexity, we should implement some rate-limiting, and interleave GC work with "real"/productive vat deliveries. The current implementation seeks to drain the GC queue before touching the run-queue. GC actions are not limited by a meter, but I think their deliveries will still count against the runPolicy, so we might experience a series of full blocks that still perform no deliveries because all the computrons are spent on GC. (This might not be a bad thing if it only took a minute, but I'm sure folks would be upset if the chain took an hour to get through the backlog.)

Description of the Design
The main task is to implement a unit test that uses fakeGC to build a vat with e.g. 10k imports, and then drop them all in a single syscall. The test should finish in a reasonable amount of time. The test should include some instrumentation (perhaps count the number of kvStore accesses) and assert some kind of limit. We should manually change the 10k to other values and measure usage, graph it, and convince ourselves that it's linear, at least over the range of sizes we care about.
Then we should identify and fix any problems that the test reveals.
If we find that it's too hard to achieve that, we should document a practical upper limit that vats are subject to, and consider adding defenses at the kernel-side syscall handlers to reject/ignore attempts to exceed them.
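A sketch of that unit test, assuming hypothetical helpers (makeFakeVatKernel, importObjects, kvStoreStats) and an AVA-style runner; it is meant to show the shape of the measurement, not a working test:

```js
import test from 'ava';

// Drop n imports in a single syscall and check that kvStore reads grow
// roughly linearly with n (helper names are hypothetical).
test('large dropImports stays roughly linear', async t => {
  const perVrefReads = [];
  for (const n of [1000, 5000, 10000]) {
    const { kernel, vat, kvStoreStats } = await makeFakeVatKernel();
    const vrefs = await vat.importObjects(n); // seed the vat with n imports
    kvStoreStats.reset();
    vat.syscall.dropImports(vrefs); // drop them all in one syscall
    await kernel.run();
    perVrefReads.push(kvStoreStats.gets / n);
  }
  // linear processing means the per-vref cost should stay roughly flat
  t.true(perVrefReads[2] < perVrefReads[0] * 1.5);
});
```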
Security Considerations
Vats must not be able to impact the kernel by dropping a lot of objects at the same time.
Scaling Considerations
Kernel operations should scale well with the number of objects being dropped or retired.
Test Plan
The final PR should include a test that runs in a reasonable amount of time, which exercises a sizable GC event, and asserts that some measurable consequence (e.g. number of kvStore reads) is adequately small.
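One way to get a "number of kvStore reads" measurement is to wrap the kvStore before handing it to the kernel; where exactly the wrapper would be installed is left as an assumption:

```js
// Wrap a kvStore-like object so tests can count get/set/delete traffic.
function makeCountingKVStore(kvStore) {
  const stats = { gets: 0, sets: 0, deletes: 0 };
  const wrapped = {
    ...kvStore,
    get(key) {
      stats.gets += 1;
      return kvStore.get(key);
    },
    set(key, value) {
      stats.sets += 1;
      return kvStore.set(key, value);
    },
    delete(key) {
      stats.deletes += 1;
      return kvStore.delete(key);
    },
  };
  const reset = () => {
    stats.gets = 0;
    stats.sets = 0;
    stats.deletes = 0;
  };
  return { wrapped, stats, reset };
}
```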
Upgrade Considerations
One outcome might be that we change the way we store the GC action queue, which amounts to a kvStore schema change. The kernel needs to tolerate multiple versions of the kvStore, both the old and new approaches. We must have unit tests to verify this, which will probably involve manually editing a blank kernelStorage to create the old-style keys.

It might also create deeper changes in the GC relationship between vats and the kernel, such as deferring syscall.dropImport consequences until later. This would be a significant change to the protocol (e.g. pipelining drops, which needs protection against "ships crossing in the night" desync problems), and the new kernel would need to handle both versions (e.g. be backwards-compatible with the old liveslots, as is running in existing vats).
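To make the schema-migration idea concrete, here is a sketch of an upgrade-time conversion; the key names ('gcActions', 'gcActions.N') are made up and may not match the real kvStore layout:

```js
// Convert an old-style single JSON-blob GC action queue into per-item
// keys; only this migration code needs to know about the old schema.
function migrateGCActionQueue(kvStore) {
  const oldBlob = kvStore.get('gcActions'); // hypothetical old-style key
  if (oldBlob === undefined) {
    return; // nothing to do: already on the new schema
  }
  const actions = JSON.parse(oldBlob);
  actions.forEach((action, i) => {
    kvStore.set(`gcActions.${i}`, JSON.stringify(action)); // hypothetical new-style keys
  });
  kvStore.set('gcActions.length', `${actions.length}`);
  kvStore.delete('gcActions');
}
```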