    Auto recovery from out of space errors (#4164) · a27fce40
    Anand Ananthabhotla committed
    Summary:
    This commit implements automatic recovery from a Status::NoSpace() error
    during background operations such as write callback, flush and
    compaction. The broad design is as follows -
    1. Compaction errors are treated as soft errors and don't put the
    database in read-only mode. A compaction is delayed until enough free
    disk space is available to accommodate the compaction outputs, whose
    size is estimated from the input size. This means that users can
    continue to write, and we rely on the WriteController to delay or stop
    writes if the compaction debt becomes too high due to a persistent low
    disk space condition.
    2. Errors during write callback and flush are treated as hard errors,
    i.e. the database is put in read-only mode and goes back to read-write
    only after certain recovery actions are taken.
    3. Both types of recovery rely on the SstFileManagerImpl to poll for
    sufficient disk space. We assume that there is a 1-1 mapping between an
    SFM and the underlying OS storage container. For cases where multiple
    DBs are hosted on a single storage container, the user is expected to
    allocate a single SFM instance and use the same one for all the DBs. If
    no SFM is specified by the user, DBImpl::Open() will allocate one, but
    this will be one per DB and each DB will recover independently. The
    recovery implemented by SFM is as follows -
      a) On the first occurrence of an out of space error during compaction,
      subsequent compactions will be delayed until the disk free space check
      indicates enough available space. The required space is computed as
      the sum of input sizes.
      b) The free space check requirement will be removed once the amount of
      free space is greater than the size reserved by in-progress
      compactions when the first error occurred.
      c) If the out of space error is a hard error, a background thread in
      SFM will poll for sufficient headroom before triggering the recovery
      of the database and putting it back in read-write mode. The headroom
      is calculated as the sum of the write_buffer_size of all the DB
      instances associated with the SFM.
    4. EventListener callbacks will be called at the start and completion of
    automatic recovery. Users can disable the auto recovery in the start
    callback, and later initiate it manually by calling DB::Resume().
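
    The compaction-side behavior in items 1, 3a and 3b can be illustrated
    with a small C++ model. This is a hypothetical sketch, not RocksDB's
    SstFileManagerImpl: the class CompactionSpaceGate and its methods are
    invented for illustration, free disk space is passed in by the caller
    instead of being queried from the OS, and it assumes a new compaction's
    input size is added to the space already reserved by in-progress
    compactions when checking for room.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the soft-error (compaction) path described above.
// Not RocksDB's SstFileManagerImpl; free space is supplied by the caller.
class CompactionSpaceGate {
 public:
  // Called when a compaction fails with an out-of-space error. Only the
  // first error arms the check and snapshots the space reserved right then.
  void OnNoSpaceError() {
    if (!check_enabled_) {
      check_enabled_ = true;
      reserved_at_first_error_ = in_progress_reserved_;
    }
  }

  // Asks whether a compaction with the given total input size may start.
  // Output size is estimated as the input size; on success that amount is
  // reserved until the compaction finishes.
  bool TryStartCompaction(uint64_t input_size, uint64_t free_space) {
    if (check_enabled_ && in_progress_reserved_ + input_size > free_space) {
      return false;  // delay: not enough room for the estimated outputs
    }
    in_progress_reserved_ += input_size;
    return true;
  }

  // Called when a compaction finishes, successfully or not.
  void OnCompactionFinished(uint64_t input_size) {
    assert(input_size <= in_progress_reserved_);
    in_progress_reserved_ -= input_size;
  }

  // Periodic check: once free space exceeds what was reserved when the
  // first error occurred, stop gating compactions.
  void MaybeClearCheck(uint64_t free_space) {
    if (check_enabled_ && free_space > reserved_at_first_error_) {
      check_enabled_ = false;
    }
  }

  bool check_enabled() const { return check_enabled_; }

 private:
  bool check_enabled_ = false;
  uint64_t in_progress_reserved_ = 0;
  uint64_t reserved_at_first_error_ = 0;
};
```

    Space is reserved even while the check is disarmed, so the snapshot
    taken at the first error reflects the compactions that were actually
    running at that moment.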
    
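    The hard-error path in items 3c and 4 can be sketched the same way.
    Again this is hypothetical: RequiredHeadroom, PollAndRecover and
    RecoveryResult are invented names, the on_begin callback only mimics
    the role of the EventListener start callback, resume stands in for
    whatever recovery actions the DB performs, and the SFM background thread
    is replaced by a bounded synchronous loop so the logic is easy to
    follow and test.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical sketch of the hard-error path; not RocksDB's implementation.
// Headroom = sum of write_buffer_size over every DB sharing the SFM.
uint64_t RequiredHeadroom(const std::vector<uint64_t>& write_buffer_sizes) {
  uint64_t total = 0;
  for (uint64_t size : write_buffer_sizes) total += size;
  return total;
}

enum class RecoveryResult { kRecovered, kCancelled, kGaveUp };

// Calls on_begin first; setting *auto_recovery = false cancels automatic
// recovery (the DB would stay read-only until a manual resume). Otherwise
// polls free space up to max_polls times and calls resume() once the
// headroom is available. A real poller would sleep between checks.
RecoveryResult PollAndRecover(const std::function<uint64_t()>& free_space,
                              uint64_t headroom, int max_polls,
                              const std::function<void(bool*)>& on_begin,
                              const std::function<void()>& resume) {
  bool auto_recovery = true;
  on_begin(&auto_recovery);
  if (!auto_recovery) return RecoveryResult::kCancelled;
  for (int i = 0; i < max_polls; ++i) {
    if (free_space() >= headroom) {
      resume();  // placeholder for the DB's recovery actions
      return RecoveryResult::kRecovered;
    }
  }
  return RecoveryResult::kGaveUp;
}
```

    Because the cancellation check runs before any polling, a listener that
    opts out leaves the database in read-only mode, and the user can recover
    later with DB::Resume() as described in item 4.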
    Todo:
    1. More extensive testing
    2. Add disk full condition to db_stress (follow-on PR)
    Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
    
    Differential Revision: D9846378
    
    Pulled By: anand1976
    
    fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a