-
Notifications
You must be signed in to change notification settings - Fork 214
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
price-feed vats hold old QuotePayments in recovery set, causing storage leak #8400
Comments
@warner wrote:
|
Note to self: to measure the size of this problem, use
|
refs: #8740 closes: #8918 refs: #8400 ## Description Add a new Auction instance in A3P, so #8757 can make use of it. Also provides upgrade proposals which can be be applied to MainNet and other chains. ### Security Considerations N/A ### Scaling Considerations This is largely in service of #8400, which reports that priceFeed vats are accumulating garbage. This PR gives a new auction which can rely on new priceFeeds. The existing auction is not upgradeable and its pricefeeds can't be updated. ### Documentation Considerations No user-visible changes to behavior. ### Testing Considerations Tested in A3P ### Upgrade Considerations Auctions are not upgradeable, so we have to replace them and update their clients.
closes: #8049 closes: #8740 closes: #8868 closes: #8918 closes: #8981 closes: #8079 refs: #8400 closes: #8735 closes: #7873 closes: #8726 closes: #7954 closes: #8757 closes: #8728 closes: #8789 ## Description Upgrade **VaultFactory** in A3P, relying on the new PriceFeeds, and auctions. The actual upgrade waits for the priceFeeds to start supplying before doing the upgrade, so there won't be any gap in priceUpdates. When the upgrade is finished, we also update the auctioneerKit and Auction instance in the bootstrap environment. This PR demonstrates that VaultFactory can be upgraded even though governance is not persistent (#8123). ### Security Considerations N/A ### Scaling Considerations This is largely in service of #8400, which reports that priceFeed vats are accumulating garbage. This PR switches to new priceFeeds, which won't have that problem, though cleaning up the existing vats is a task for the future. ### Documentation Considerations No changes to user-visible behavior. ### Testing Considerations A3P tests that verify that vaultFactory has been upgraded, that a new Auctioneer is running and is receiving prices. Verify that when prices drop, assets are sold via the auction, the bidder gets the proceeds, and the vaults are liquidated or reconstituted appropriately. ### Upgrade Considerations Upgrade all the vats!
refs: #8400 refs: #8498 closes: #9371 ## Description #8400 reported that the priceFeed vats hold onto old quotes in their recoverySet, preventing them from being collected. #9283 applied the fixes to master. These fixes address the growth in priceFeed vats and Zoe, but scaledPriceAuthorities were still growing. This resolves that problem by upgrading scaledPriceAuthority contracts to not use their recoverySets. <details> <summary><b>Expand</b> for performance graphs</summary> <img width="809" alt="image" src="https://github.com/Agoric/agoric-sdk/assets/13372636/889bb283-89c8-434f-8a67-efa56d0382ad"> Kernel allocation is in blue, and the scale is on the left. It varies from 48862 to 49743, with a small amount of long-term growth.The other active vats (v9=Zoe, v29=ATOM-USD_priceFeed, v43=wallet, 72=ATOM-USD_priceFeed, 74=auctioneer) use the scale on the right, with Zoe varying tightly around 3600, and the others low and stable. scaledPriceAuthority-ATOM doesn't have enough variation to be worth graphing. </details> ### Security Considerations Upgrade existing contracts. No new vulnerabilities. ### Scaling Considerations This addresses the largest known category of growth on the chain. ### Documentation Considerations Add some documentation on creating proposals. ### Testing Considerations Tested in A3P. We should exercise all the clients of priceFeeds in our test environments. ### Upgrade Considerations This PR includes a proposal that will upgrade all vats with `scaledPriceAuthority` in their label. That should work on or test chains as well as MainNet. These changes should be included in the next release.
@warner to make another ticket describing the planned remediation. We will close this ticket once the sub tickets are landed on 16. |
refs: #9886 ## Description Some tests in DevNet indicate that the Oracles need the roundID in vstorage to be reset so they can sync. ### Security Considerations Not a security issue. ### Scaling Considerations Not directly a scaling concern. (see #8400). ### Documentation Considerations not user visible. It would be nice if the protocol between oracles and fluxAggregators were documented somewhere ### Testing Considerations updated the a3p tests to increment the roundId before the upgrade and show that this upgrade resets it. ### Upgrade Considerations This change is to the fluxAggregator which is already being started afresh in this proposal.
closes: #9370 ## Description The priceFeed proposal has been tested as an independent coreEval, but the current plan calls for it to be included in Upgrade 18. This adds it to `upgrade.go` and moves the tests to `n:upgrade-next`. ### Security Considerations No additional security implications. ### Scaling Considerations Addresses pre-existing scaling concerns (see #8400), and adds no new scaling issues. ### Documentation Considerations None ### Testing Considerations An existing test is moved to a new location. The current version seems to tickle #10292, so tests may not pass at this point. ### Upgrade Considerations Targeted for upgrade 18.
Describe the bug
I spent a few days investigating storage usage on mainnet, and learned that about 60% of our kvstore entries are the result of a bug in the price feed contracts, and this count grows by about 10 rows for every new price quote created.
External oracles submit messages (signed txns) into the chain with claims about the e.g. ATOM/USD price at a given time. These get executed as contract operations (create an invitation with args that include the price claim, make an offer, get a seat, etc). The price feed contract then publishes a price quote to subscribers, such as an aggregator that averages/etc quotes from multiple sources to drive Vault creation or liquidation.
While the authority to produce a quote is limited by virtue of the wallets which have access to the price feed's objects, the quote objects themselves might be delivered to arbitrary subscribers (over untrusted paths), and those subscribers want to verify that the quote object they received is genuine. The current implementation uses ERTP
Payment
objects to represent the price quotes, and subscribers can callE(priceQuoteIssuer).getAmountOf(allegedQuote)
to confirm their authenticity.generating a quote object:
agoric-sdk/packages/zoe/src/contracts/priceAggregator.js
Lines 103 to 109 in 92b6cd7
verifying one:
agoric-sdk/packages/zoe/src/contractSupport/priceAuthorityTransform.js
Lines 90 to 92 in 92b6cd7
However, subscribers like the
scaledPriceAuthority
stop there: the quote payment is dropped, without being deposited or burned.This interacts badly with a safety feature of ERTP: the Recovery Set. To guard against assets being lost, each time you call
payment = await E(purse).withdraw()
, the payment is added to the Purse's "recovery set": a strong SetStore. If you send the Payment to a correspondent, but they fail to deposit it, you can callgetRecoverySet()
to access all these Payments, and then deposit them back into a Purse. Or, you can callrecoverAll()
to sweep them back into the original Purse. Brand new tokens, created byE(mint).mintPayment()
, are held in a specialmintRecoveryPurse
, available from the IssuerKit. All Payments are removed from their Recovery Set as soon as they are deposited in some other Purse, or burned.Recovery normally isn't necessary, because well-written code will deposit a Payment as soon as it receives it, and only some sort of error would prevent that from happening. However, quotes-as-payments aren't assets to be hoarded: the code only uses ERTP Payments for their authenticity-verification properties, not their exclusive-transfer properties. So it is not surprising that the price contracts fail to
deposit()
orburn()
them.The consequence is that the Payment remains in the recovery set forever.
The contract and ERTP code uses a variety of SetStores, WeakMapStores, and Exos to express this logic. Swingset (specifically liveslots) supports those with "virtual collections" and "virtual objects", all of which bottoms out in a per-vat key-value table named the "vatstore". This holds rows to represent collection entries, ordinals to enable consistent iteration through collections, reference counts, weak-key reference-graph edges, export statuses, and finally the virtual-object state itself.
The particular data structures used by ERTP means that each Payment costs about 10 vatstore entries. In the following diagram, each circle is a vatstore entry, of various types.
And this is an example of one particular (still-extant) payment, where each gray box is a separate vatstore entry:
Since each vatstore entry is really stored in the kernel's
kvStore
(a table inside the "SwingStore" SQLite DB), they cost 10 kvStore entries each. And because state-sync integrity requires us to mirror all kvStore items into the IAVL database, they also cost 10 IAVL entries each. We've been creating quotes once every five minutes since the Vaults feature was released last June, and by now we have about 120k quote payments kept alive in v29 (the ATOM-USD price-feed vat). A similar problem is keeping 90k payments alive in v46 (scaledPriceAuthority). As we add new denominations in the future, their price feed vats will have the same issue. I estimate this currently costs 440MB of space (combining both swingstore and IAVL), growing at 5MB/day.Additional Costs
v46 (scaledPriceAuthority) has 90k quote payments in this state, however it has also exported about 30k of them to v7-board. That vat is still holding weak references (c-list entries in the "recognizable" state, as opposed to "reachable"). This causes additional kvstore entries for each one: two for the export-side c-list, two in the kernel object table, two in the import-side clist, and some amount of RAM in v7-board.
We believe that, at some point, v46 received the payment and sent it to
E(board.readOnlyMarshaller).toCapData()
. The serialization process submits the received quote-payment Presence object topassStyleOf
, which would have concluded that the Presence waspassable
, and added the object to it's WeakMap cache. This WeakMap is virtualized by liveslots (to prevent userspace from sensing GC, as usual), andVirtualObjectAwareWeakMap
remembers a real Map keyed by vref strings. So v7-board will spend some (small) amount of RAM on each such object. These WeakMap references cause the c-list to remain in the "recognizable" state: they do not represent a strong ("reachable") reference.Both the c-list/kvstore entries and the RAM costs will go away if/when the quote payment objects are released.
We've been considering allowing the Endo/
passStyleOf
code to use a real WeakMap instead of the virtualized one. This requires the permission of liveslots (otherwise arbitrary user code could do it, violating our GC-sensing rules), so coordination between Swingset and Endo is ongoing. If implemented, this would remove the separatepassStyleOf
cache and would probably allow v9-board to avoid spending either RAM or c-list imports on the object.Potential Fixes
To flatten this growth rate, at today's Zoe meeting we discussed one short-term fix named "burn after reading": clients should burn their quote payment objects after verifying them. For example, the v29 price-feed quotes are received by v46-scaledPriceAuthority, where
priceAuthorityTransform.js
, could do:We would need similar changes in all other places that receive quote payments. In particular, the scaled/aggregated quotes produced by v46-scaledPriceAuthority are delivered to v48-vaultFactory, where something needs its own
burn()
call.Another approach would be to have the producing code rotate through recovery purses every hour or so:
.withdraw()
a new one from that purse and send the second payment to the clientmintRecoveryPurse
, which is closed over bymintPayment()
and cannot be changed or destroyed short of a vat upgrade which first modifiesbaggage
to remove the referenceIn the longer term, we should find a lighter-weight means to verify the authenticity of price quotes. ERTP Payments offer both verification (through
issuer.getAmountOf()
) and exclusive transfer (throughpurse.deposit(payment)
), but this use case does not require exclusive transfer. The ocap community provides other tools for mere verification, like the Branding pattern. The "issuer" replacement can retain a WeakMap of quote objects, and offer an API to performvalidQuotes.has(allegedQuote)
.Remediation
In addition to fixing new price quotes, we also need to dispose of the hundreds of thousands of existing ones. One idea from today's meeting:
mintRecoveryPurse
, fetch its recovery set, iterate through a few dozen payments, andburn
them allAnother approach would be to have an upgraded producer vat immediately perform
mintRecoveryPurse.recoverAll()
(perhaps assuming that no quote payments are outstanding at the moment of upgrade, or accepting one quote getting invalidated before reading). However we need to be cognizant of the size of the recovery set: this would cause v29 to immediately delete about a million items from its vatstore, requiring maybe ten million syscalls, all in a single crank. Who knows how long that might take. In addition, that would remove a million rows from the IAVL tree, and we know that IAVL pruning takes a long time when there has been a lot of churn, so each validator might experience a significant (vote-missing) delay some arbitrary number of blocks after the upgrade (depending upon their individual pruning configuration).Note that different approaches require upgrades to different vats, and other bugs make some vats are more practical to upgrade than others. In particular, we'd like to fix the price-feed code before making a lot of new deployments, to avoid adding more load to the system, but eventually we need to fix (and also remediate) the existing vats.
cc @Chris-Hibbert @erights
The text was updated successfully, but these errors were encountered: