    Do not track obsolete WALs in MANIFEST even if they are synced (#7725) · 07030c6f
    Committed by Cheng Chang
    Summary:
    Consider the case:
    1. All column families are flushed, so all WALs become obsolete, but no WAL is removed from disk yet because the removal is asynchronous. A VersionEdit is written to MANIFEST indicating that WALs before a certain WAL number are obsolete; let's say this number is 3;
    2. `SyncWAL` is called, so all the on-disk WALs are synced, and if `track_and_verify_wal_in_manifest=true`, the synced WALs are tracked in MANIFEST; let's say the WAL numbers are 1 and 2;
    3. DB crashes;
    4. During DB recovery, when replaying MANIFEST, we first see that WALs with number < 3 are obsolete, then we see that WALs 1 and 2 are synced, so according to the current implementation of `WalSet`, the `WalSet` will be recovered to include WALs 1 and 2;
    5. WALs 1 and 2 are asynchronously deleted from disk, then the WAL verification algorithm fails with `Corruption: missing WAL`.
    
    The above case is reproduced in a new unit test `DBBasicTestTrackWal::DoNotTrackObsoleteWal`.
    
    The fix is to maintain the upper bound of the obsolete WAL numbers: any WAL with a number less than the maintained number is considered obsolete, so it shouldn't be tracked even if it is later synced. The number is maintained in `WalSet`.
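
A minimal sketch of the idea, not the actual `WalSet` code: the class `SimpleWalSet`, its methods, the member `min_wal_number_to_keep_`, and the helper `ReplayCrashScenario` are hypothetical names used only for illustration. It replays the MANIFEST order from the case above (obsolete bound 3 first, then sync records for WALs 1 and 2) and shows that the late sync records are ignored once the bound is remembered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical, simplified stand-in for a WAL-tracking set.
// This is an illustration of the technique, not RocksDB's WalSet.
class SimpleWalSet {
 public:
  // Record that a WAL was synced. The record is ignored when the WAL number
  // is below the remembered obsolete bound. Returns true if the WAL is
  // tracked after the call.
  bool AddSyncedWal(uint64_t wal_number) {
    if (wal_number < min_wal_number_to_keep_) {
      return false;  // Obsolete: do not track, even though it was synced.
    }
    tracked_.insert(wal_number);
    return true;
  }

  // Record that every WAL with number < bound is obsolete. The bound only
  // moves forward, and already-tracked WALs below it are dropped.
  void DeleteWalsBefore(uint64_t bound) {
    if (bound > min_wal_number_to_keep_) {
      min_wal_number_to_keep_ = bound;
    }
    tracked_.erase(tracked_.begin(),
                   tracked_.lower_bound(min_wal_number_to_keep_));
  }

  bool IsTracked(uint64_t wal_number) const {
    return tracked_.count(wal_number) > 0;
  }

  std::size_t Size() const { return tracked_.size(); }

 private:
  std::set<uint64_t> tracked_;
  uint64_t min_wal_number_to_keep_ = 0;  // Upper bound of obsolete WALs.
};

// Replays the records in the order described in the case above and returns
// how many WALs are tracked afterwards.
std::size_t ReplayCrashScenario() {
  SimpleWalSet wals;
  wals.DeleteWalsBefore(3);  // Step 1: WALs before 3 are obsolete.
  wals.AddSyncedWal(1);      // Step 2: late sync records for WALs 1 and 2.
  wals.AddSyncedWal(2);
  assert(!wals.IsTracked(1) && !wals.IsTracked(2));
  return wals.Size();
}
```

Without the remembered bound, the two sync records would re-add WALs 1 and 2 to the set, and verification would later fail once the files are deleted from disk.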
    
    Pull Request resolved: https://github.com/facebook/rocksdb/pull/7725
    
    Test Plan:
    1. A new unit test `DBBasicTestTrackWal::DoNotTrackObsoleteWal` is added.
    2. Run `make crash_test` on a devserver.
    
    Reviewed By: riversand963
    
    Differential Revision: D25238914
    
    Pulled By: cheng-chang
    
    fbshipit-source-id: f5dccd57c3d89f19565ec5731f2d42f06d272b72
db_impl_open.cc 67.8 KB