I run RocksDB 9.7.4 on BlueFS backed by remote cloud disks. After disconnecting and reconnecting the disks, I could not open the DB because the size of an SST file is wrong:
Corruption: file is too short (3582 bytes) to be an sstableblue/000105.sst
I then inspected the MANIFEST file and found that it already records the SST as 3582 bytes, while the file's actual size is 0 bytes. It seems that RocksDB writes the MANIFEST file BEFORE it writes the SST file.
This never happened during the several months I previously ran RocksDB 6.11.4.
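A quick way to see how many files are affected is to compare the sizes recorded in the MANIFEST with what is actually on disk. The following is only a minimal sketch: `find_short_ssts` and the `expected_sizes` mapping are hypothetical names, the sizes are assumed to have been read out of the MANIFEST by hand, and the SST files are assumed to be reachable through a regular directory path (which is not the case for BlueFS directly, so they would have to be exported first).

```python
import os


def find_short_ssts(db_dir, expected_sizes):
    """Return (name, expected, actual) for each SST smaller than recorded.

    expected_sizes maps a file name to the size the MANIFEST records for it
    (hypothetical input, collected manually). actual is None if the file
    does not exist at all.
    """
    mismatches = []
    for name, expected in sorted(expected_sizes.items()):
        path = os.path.join(db_dir, name)
        actual = os.path.getsize(path) if os.path.exists(path) else None
        if actual is None or actual < expected:
            mismatches.append((name, expected, actual))
    return mismatches
```

For the case above, calling it with `{"000105.sst": 3582}` on a directory holding the empty file would report that one entry with an actual size of 0.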
It seems that the RocksDB writes the MANIFEST file BEFORE it writes the SST file.
Hi @NeverMore2744, for flush and compaction, RocksDB always writes the SST files before adding them to the MANIFEST. I suspect this is more likely a file system or storage issue.
Hi @cbi42, thank you for your comment. That makes sense to me. I wonder whether this is related to the write options used for SSTs and the MANIFEST. For example, the DB might write to the SST asynchronously (or the write might look synchronous while the storage handles it asynchronously), so the SST write returns immediately and the DB then updates the MANIFEST.
My RocksDB options should be enabling direct I/O, i.e., use_direct_io_for_flush_and_compaction=true. Could it be that this does not guarantee synchronous writes, since "direct" is different from "synchronous"?
@NeverMore2744 Sorry for the late reply. We don't do asynchronous writes to SST files during flush or compaction, and we finish writing the SST file and sync it before we update the MANIFEST.
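To make the ordering described above concrete, here is a minimal sketch of the general "write the data file, sync it, then record it" pattern using plain POSIX calls from Python. It is not RocksDB's code: `write_table_then_record`, the `MANIFEST-sketch` file name, and the one-line record format are all hypothetical, and the directory fsync in step 2 is a common POSIX practice included here as an assumption.

```python
import os


def write_table_then_record(db_dir, name, payload, manifest_name="MANIFEST-sketch"):
    """Write a data file durably, then append its name and size to a log."""
    # 1. Write the whole data file and ask the OS to flush it to storage.
    table_path = os.path.join(db_dir, name)
    fd = os.open(table_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(payload)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)
    finally:
        os.close(fd)

    # 2. Sync the directory so the new file's entry is flushed as well
    #    (assumption: POSIX platform where a directory can be opened).
    dir_fd = os.open(db_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)

    # 3. Only now record the file and its size in the manifest-like log.
    record = f"{name} {len(payload)}\n".encode()
    manifest_path = os.path.join(db_dir, manifest_name)
    fd = os.open(manifest_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, record)
        os.fsync(fd)
    finally:
        os.close(fd)
    return len(payload)
```

With this ordering, a log entry that points at a 0-byte file would mean the storage layer acknowledged the sync in step 1 without actually persisting the data, which is consistent with the file system or storage suspicion above. Note also that direct I/O only bypasses the page cache; on its own it does not make a write durable, so a sync is still needed.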