    Define WAL related classes to be used in VersionEdit and VersionSet (#7164) · cd48ecaa
    Committed by Cheng Chang
    Summary:
    `WalAddition` and `WalDeletion` are defined in `wal_version.h` and used in `VersionEdit`.
    `WalAddition` represents the event of creating a new WAL (no size, just the log number) or closing a WAL (with size).
    `WalDeletion` represents the event of deleting or archiving a WAL; it means the WAL is no longer alive (it won't be replayed during recovery).
    
    `WalSet` is the set of alive WALs kept in `VersionSet`.
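    
    As a rough illustration of how these three pieces relate, here is a minimal sketch. The field names, method names (`AddWal`, `DeleteWal`, `IsAlive`, `Get`, `Size`), and the use of a `closed` flag are assumptions made for the example; they are not the definitions in `wal_version.h`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical, simplified stand-ins for illustration only; these are not
// the definitions from wal_version.h.
using WalNumber = uint64_t;

// A WAL was created (closed == false) or closed (closed == true, size known).
struct WalAddition {
  WalNumber number;
  bool closed;
  uint64_t size_bytes;  // meaningful only when closed is true
};

// A WAL was deleted or archived and must not be replayed during recovery.
struct WalDeletion {
  WalNumber number;
};

// The set of alive WALs, keyed by log number.
class WalSet {
 public:
  // Records creation, or overwrites the entry when the same WAL is closed.
  void AddWal(const WalAddition& wal) { wals_[wal.number] = wal; }

  // Removes the WAL from the alive set; returns false if it was not alive.
  bool DeleteWal(const WalDeletion& wal) {
    return wals_.erase(wal.number) > 0;
  }

  bool IsAlive(WalNumber number) const { return wals_.count(number) > 0; }

  // Precondition: the WAL is alive.
  const WalAddition& Get(WalNumber number) const {
    assert(IsAlive(number));
    return wals_.at(number);
  }

  size_t Size() const { return wals_.size(); }

 private:
  std::map<WalNumber, WalAddition> wals_;
};
```

    In this sketch a WAL goes through at most three events: `AddWal` on creation, `AddWal` again on close (now with a size), and `DeleteWal` once it is deleted or archived.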
    
    1. Why use `WalDeletion` instead of relying on `MinLogNumber` to identify outdated WALs?
    
    On recovery, we can compute `MinLogNumber()` based on the log numbers kept in MANIFEST; any log with a number less than `MinLogNumber()` can be ignored. So it seems that we don't need to persist `WalDeletion` to MANIFEST, since we can ignore the WALs based on `MinLogNumber()`.
    
    But `MinLogNumber()` is only a lower bound; it does not guarantee that every log numbered from `MinLogNumber()` upward still exists. This is because of a corner case: when a column family is empty and never flushed, its log number is set to the largest log number, but that value is not persisted to MANIFEST. For example, suppose there are 2 column families. When the DB is created, the first WAL has log number 1, so log number 1 is persisted to MANIFEST for both column families. Then CF 0 stays empty and is never flushed, while CF 1 is updated and flushed, so a new WAL with log number 2 is created and persisted to MANIFEST for CF 1. CF 0's log number in MANIFEST is still 1. So on recovery, `MinLogNumber()` is 1, but since log 1 only contains data for CF 1, and CF 1 has been flushed, log 1 might already have been deleted from disk.
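    
    The corner case above can be modeled in a few lines. This is only a sketch of the reasoning: `RecoveredState`, `CornerCase`, and `ExpectedButAbsent` are hypothetical names invented for the example, and the `MinLogNumber` function here is a toy minimum over per-CF log numbers, not the actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical model of what recovery can see; names are illustrative.
struct RecoveredState {
  std::vector<uint64_t> cf_log_numbers;  // per-CF log number read from MANIFEST
  std::set<uint64_t> wals_on_disk;       // WAL files actually present
};

// Toy version of the lower bound: the smallest per-CF log number.
uint64_t MinLogNumber(const RecoveredState& s) {
  assert(!s.cf_log_numbers.empty());
  return *std::min_element(s.cf_log_numbers.begin(), s.cf_log_numbers.end());
}

// Builds the scenario from the text: CF 0 never flushed (still log 1 in
// MANIFEST), CF 1 flushed (log 2), and log 1 already removed from disk.
RecoveredState CornerCase() {
  RecoveredState s;
  s.cf_log_numbers = {1, 2};
  s.wals_on_disk = {2};
  return s;
}

// True if `log` passes the MinLogNumber filter but is not on disk. Without
// a persisted deletion record, recovery cannot tell whether such a WAL was
// deleted on purpose or is unexpectedly missing.
bool ExpectedButAbsent(const RecoveredState& s, uint64_t log) {
  return log >= MinLogNumber(s) && s.wals_on_disk.count(log) == 0;
}
```

    Here `ExpectedButAbsent(CornerCase(), 1)` is true: log 1 is not below the lower bound, yet it is legitimately gone.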
    
    We could make `MinLogNumber()` exactly the minimum log number that must exist by persisting the most recent log number for empty column families that are not flushed. But if there are N such column families, then every time a new WAL is created, we need to add N records to MANIFEST.
    
    In the current design, a record is persisted to MANIFEST only when a WAL is created, closed, or deleted/archived, so the number of WAL-related records is bounded by 3x the number of WALs.
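    
    To make the comparison concrete, the two record counts can be written as small functions. The function names are made up for this sketch; the formulas are just the arithmetic stated above (at most 3 records per WAL, versus N records per newly created WAL).

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on WAL-related MANIFEST records in the current design:
// at most one each for creation, closing, and deletion/archival.
uint64_t MaxWalRecordsCurrentDesign(uint64_t num_wals) {
  return 3 * num_wals;
}

// Records the alternative would add: one per empty, unflushed column
// family every time a new WAL is created.
uint64_t CfRecordsAlternativeDesign(uint64_t num_wals,
                                    uint64_t num_unflushed_cfs) {
  return num_wals * num_unflushed_cfs;
}
```

    For example, with 100 WALs and 50 empty, unflushed column families, the current design writes at most 300 WAL-related records, while the alternative would write 5000.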
    
    2. Why keep `WalSet` in `VersionSet` instead of applying the `VersionEdit`s to `VersionStorageInfo`?
    
    `VersionEdit`s were originally designed to track the addition and deletion of SST files. SST files belong to column families: each column family has a list of `Version`s, and each `Version` keeps its set of active SST files in `VersionStorageInfo`.
    
    But WALs are a DB-level concept; they are not bound to specific column families. So logically it does not make sense to store WALs in a column family's `Version`s.
    Also, the purpose of a `Version` is to keep references to SST / blob files so that they are not deleted until no version references them. But a WAL is deleted regardless of version references.
    So we keep the WALs in `VersionSet` for the purpose of writing out a snapshot of the DB state when creating new MANIFESTs.
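    
    The ownership described above might be pictured like this. Everything in the sketch is a hypothetical simplification: the structs carry only the fields needed for the illustration, `WalSnapshotRecords` is an invented helper, and the `AddWal log=... size=...` text is a made-up format, not the MANIFEST encoding.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model; the real classes carry far more
// state than shown here.
struct Version {
  std::set<uint64_t> sst_files;  // files this version keeps referenced
};

struct ColumnFamily {
  std::vector<Version> versions;  // per-column-family history
};

struct VersionSet {
  std::map<std::string, ColumnFamily> column_families;
  // DB-wide alive WALs (log number -> size in bytes), owned by no Version.
  std::map<uint64_t, uint64_t> alive_wals;

  // One line per alive WAL, standing in for what would be written into a
  // fresh MANIFEST snapshot. The text format is invented for this sketch.
  std::vector<std::string> WalSnapshotRecords() const {
    std::vector<std::string> records;
    for (const auto& wal : alive_wals) {
      records.push_back("AddWal log=" + std::to_string(wal.first) +
                        " size=" + std::to_string(wal.second));
    }
    return records;
  }
};
```

    The point of the layout is that the WAL map sits next to the column families rather than inside any `Version`, so dropping or adding a `Version` never affects which WALs are considered alive.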
    
    Pull Request resolved: https://github.com/facebook/rocksdb/pull/7164
    
    Test Plan:
    make version_edit_test && ./version_edit_test
    make wal_edit_test && ./wal_edit_test
    
    Reviewed By: ltamasi
    
    Differential Revision: D22677936
    
    Pulled By: cheng-chang
    
    fbshipit-source-id: 5a3b6890140e572ffd79eb37e6e4c3c32361a859