  1. 24 Aug 2018, 1 commit
  2. 04 May 2018, 1 commit
    • Skip deleted WALs during recovery · d5954929
      Siying Dong committed
      Summary:
      This patch records the min log number to keep in the manifest when flushing SST files, so that WALs older than that number are ignored during recovery. This avoids scenarios where there is a gap in the sequence of WAL files fed to the recovery procedure. Such a gap can occur, for example, through out-of-order WAL deletion. It can cause problems in 2PC recovery, where the prepare and commit entries are placed in two separate WALs: a gap in the WALs could result in the WAL containing the commit entry not being processed, breaking the 2PC recovery logic.
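
      A minimal sketch of the recovery-side filtering, assuming the manifest yields a single min-log-to-keep value and that WALs are identified by their file numbers. `WalsToReplay` is a hypothetical helper for illustration, not a RocksDB function:

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstdint>
      #include <vector>

      // Hypothetical helper (not a RocksDB API): given the WAL numbers found on
      // disk and the min log number to keep read from the manifest, return the
      // WALs that recovery should replay, oldest first. Older WALs are skipped
      // even if they still exist, e.g. because deletion happened out of order.
      std::vector<uint64_t> WalsToReplay(std::vector<uint64_t> wal_numbers,
                                         uint64_t min_log_number_to_keep) {
        std::sort(wal_numbers.begin(), wal_numbers.end());
        std::vector<uint64_t> to_replay;
        for (uint64_t number : wal_numbers) {
          if (number >= min_log_number_to_keep) {
            to_replay.push_back(number);
          }
        }
        return to_replay;
      }
      ```

      For example, if WAL 5 was deleted before WAL 3 and the manifest records 7 as the min log number to keep, `WalsToReplay({3, 7, 9}, 7)` returns {7, 9}, so the stale WAL 3 is never replayed in front of the missing WAL 5.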
      
      Before this commit, in the 2PC case, we determined which log number to keep in FindObsoleteFiles(). We looked at the earliest logs with outstanding prepare entries, or with prepare entries whose respective commit or abort entries are in the memtable. With this commit, the same calculation is done while we apply the SST flush. Just before installing the flushed file, we precompute the earliest log file to keep after the flush finishes, using the same logic (but skipping the memtables just flushed), and record this information in the manifest entry for the newly flushed SST file. This precomputed value is also kept in memory and is later used to determine whether a log file can be deleted. The value is unlikely to change until the next flush, because the commit entry will stay in the memtable. (In WritePrepared, we could remove the older log files as soon as all prepared entries are committed. That is not done yet. Even if we did it, the only thing we would lose with this new approach is earlier log deletion between two flushes, which is not guaranteed to happen anyway, because the obsolete-file clean-up function only runs after a flush or compaction.)
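
      The precomputation can be sketched as a minimum over the sources listed above. This is a simplified model under assumed inputs: `MemTableSummary`, its fields, the `min_unflushed_log` parameter and `PrecomputeMinLogToKeep` are hypothetical names, not RocksDB code:

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstdint>
      #include <vector>

      // Hypothetical summary of a live memtable: the oldest WAL holding a
      // prepare section whose commit/abort marker sits in this memtable
      // (0 means none).
      struct MemTableSummary {
        bool being_flushed;  // part of the flush about to be installed
        uint64_t min_prep_log_referenced;
      };

      // Earliest WAL that must survive once the flush is installed.
      // min_unflushed_log: oldest WAL still holding unflushed data (assumed input).
      // outstanding_prepare_logs: WALs with prepares not yet committed/aborted.
      uint64_t PrecomputeMinLogToKeep(
          uint64_t min_unflushed_log,
          const std::vector<uint64_t>& outstanding_prepare_logs,
          const std::vector<MemTableSummary>& memtables) {
        uint64_t min_log = min_unflushed_log;
        for (uint64_t log : outstanding_prepare_logs) {
          min_log = std::min(min_log, log);
        }
        for (const MemTableSummary& m : memtables) {
          // Memtables being flushed are skipped: once flushed, their committed
          // data is persisted in the new SST file, so the prepare logs they
          // reference no longer need to be kept on their behalf.
          if (!m.being_flushed && m.min_prep_log_referenced != 0) {
            min_log = std::min(min_log, m.min_prep_log_referenced);
          }
        }
        return min_log;
      }
      ```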
      
      This min log number to keep is stored in the manifest using a safely ignorable custom field of the AddFile entry, to guarantee that a DB generated by a newer release can be opened by previous releases no older than 4.2.
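
      To illustrate why a safely ignorable field keeps older releases working, where the earlier, reverted attempt's dummy AddFile entry did not, here is a sketch of tag-based field decoding. The tag values, the `kMustUnderstandBit` mask and the helper names are assumptions for illustration, not RocksDB's actual manifest encoding:

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <map>
      #include <set>
      #include <string>
      #include <utility>
      #include <vector>

      // Assumed convention: an unknown tag with this bit set means "refuse to
      // open if you cannot interpret me"; unknown tags without it may be skipped.
      constexpr uint32_t kMustUnderstandBit = 1u << 6;

      struct DecodedFields {
        bool ok = true;                          // false: reader must refuse to open
        std::map<uint32_t, std::string> values;  // fields this reader understood
      };

      DecodedFields DecodeCustomFields(
          const std::vector<std::pair<uint32_t, std::string>>& fields,
          const std::set<uint32_t>& known_tags) {
        DecodedFields result;
        for (const auto& field : fields) {
          uint32_t tag = field.first;
          if (known_tags.count(tag) > 0) {
            result.values[tag] = field.second;
          } else if ((tag & kMustUnderstandBit) != 0) {
            result.ok = false;
            return result;
          }
          // Otherwise unknown but safely ignorable (e.g. a min-log-to-keep hint).
        }
        return result;
      }
      ```

      Under this convention, a reader that predates the new field still opens the DB and simply ignores the hint.
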
      Closes https://github.com/facebook/rocksdb/pull/3765
      
      Differential Revision: D7747618
      
      Pulled By: siying
      
      fbshipit-source-id: d00c92105b4f83852e9754a1b70d6b64cb590729
  3. 24 Apr 2018, 1 commit
    • Revert "Skip deleted WALs during recovery" · d5afa737
      Siying Dong committed
      Summary:
      This reverts commit 73f21a7b.
      
      It breaks compatibility. When a DB is created with a build that includes this change, opening the DB and reading the data with an older build fails with this error:
      
      "Corruption: Can't access /000000.sst: IO error: while stat a file for size: /tmp/xxxx/000000.sst: No such file or directory"
      
      This is because the dummy AddFile4 entry generated by the new code is treated as a real entry by an older build. The older build thinks there is a real file with number 0, but no such file exists.
      Closes https://github.com/facebook/rocksdb/pull/3762
      
      Differential Revision: D7730035
      
      Pulled By: siying
      
      fbshipit-source-id: f2051859eff20ef1837575ecb1e1bb96b3751e77
  4. 31 Mar 2018, 1 commit
    • Skip deleted WALs during recovery · 73f21a7b
      Maysam Yabandeh committed
      Summary:
      This patch records the deleted WAL numbers in the manifest so that those WALs, and any WAL older than them, are ignored during recovery. This avoids scenarios where there is a gap in the sequence of WAL files fed to the recovery procedure. Such a gap can occur, for example, through out-of-order WAL deletion. It can cause problems in 2PC recovery, where the prepare and commit entries are placed in two separate WALs: a gap in the WALs could result in the WAL containing the commit entry not being processed, breaking the 2PC recovery logic.
      Closes https://github.com/facebook/rocksdb/pull/3488
      
      Differential Revision: D6967893
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 13119feb155a08ab6d4909f437c7a750480dc8a1
  5. 01 Dec 2017, 1 commit
    • WritePrepared Txn: PreReleaseCallback · 18dcf7f9
      Maysam Yabandeh committed
      Summary:
      Add PreReleaseCallback, which is called at the end of WriteImpl but before the sequence number is published. WritePreparedTxn uses the callback to (i) update the commit map and (ii) update the last published sequence number in the second write queue. It also ensures that all commits go to the second queue.
      These changes ensure that the commit map is updated before the sequence number is published and used by reading snapshots. With two write queues, snapshots use the sequence number published by the second queue. With one write queue (the default), snapshots use the last sequence number in the memtable, which also indicates the last published sequence number.
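
      The ordering guarantee can be modeled with a toy writer: the callback runs after the write is applied but before the sequence number is published, so any snapshot that can observe a commit also finds it in the commit map. `ToyWriter` and its methods are hypothetical and assume a single writer thread; this is not RocksDB's actual PreReleaseCallback interface:

      ```cpp
      #include <atomic>
      #include <cassert>
      #include <cstdint>
      #include <functional>

      // Hypothetical toy writer (single writer thread assumed).
      class ToyWriter {
       public:
        using PreReleaseCallback = std::function<void(uint64_t seq)>;

        uint64_t Write(const PreReleaseCallback& callback) {
          uint64_t seq = ++last_allocated_;  // sequence number for this write
          // ... the WAL and memtable writes would happen here ...
          if (callback) {
            callback(seq);  // e.g. add the commit to the commit map
          }
          // Publish last: readers only take snapshots at or below this value.
          last_published_.store(seq, std::memory_order_release);
          return seq;
        }

        uint64_t GetSnapshot() const {
          return last_published_.load(std::memory_order_acquire);
        }

       private:
        uint64_t last_allocated_ = 0;
        std::atomic<uint64_t> last_published_{0};
      };
      ```
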
      Closes https://github.com/facebook/rocksdb/pull/3205
      
      Differential Revision: D6438959
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: f8b6c434e94bc5f5ab9cb696879d4c23e2577ab9
  6. 11 Nov 2017, 1 commit
  7. 23 Sep 2017, 1 commit
    • Add test kPointInTimeRecoveryCFConsistency · 1d6700f9
      Zhongyi Xie committed
      Summary:
      Context/problem:
      
      - CFs may be flushed at different times
      - A WAL can only be deleted after all CFs have flushed beyond the end of that WAL.
      - Point-in-time recovery might stop upon reaching the first corruption.
      - Some CFs may have already flushed beyond that point, while others haven't. We should fail the Open() instead of proceeding with inconsistent CFs.
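
      The consistency check this test exercises can be sketched as follows. `CfState` and `CfsConsistentAfterPointInTimeRecovery` are hypothetical; the sketch assumes each CF records a log number below which all of its WAL data has been flushed, and that recovery reports the WAL in which it hit corruption:

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <string>
      #include <vector>

      struct CfState {
        std::string name;
        uint64_t log_number;  // data in WALs older than this is already in SSTs
      };

      // Returns false if Open() should fail: some CF has flushed data from beyond
      // the corruption point, so it is ahead of CFs that can only be recovered up
      // to that point.
      bool CfsConsistentAfterPointInTimeRecovery(const std::vector<CfState>& cfs,
                                                 uint64_t corrupted_log_number) {
        for (const CfState& cf : cfs) {
          if (cf.log_number > corrupted_log_number) {
            return false;
          }
        }
        return true;
      }
      ```
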
      Closes https://github.com/facebook/rocksdb/pull/2900
      
      Differential Revision: D5863281
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 180dbaf83d96c804cff49b3c406312a4ae61313e
  8. 16 Sep 2017, 1 commit
    • Use the default copy constructor in Options · c57050b7
      Maysam Yabandeh committed
      Summary:
      Our current implementation of the (semi-)copy constructors of DBOptions and ColumnFamilyOptions appears to intend a value-by-value copy, which is what the default copy constructor does anyway. Moreover, not using the default copy constructor risks forgetting to copy newly added options.

      For example, allow_2pc appears to have been left out of the copy constructor, which caused one of the unit tests not to see its effect.
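
      A toy illustration of the failure mode described above. `ToyOptionsManual` and `ToyOptionsDefault` are hypothetical stand-ins, not the real option classes:

      ```cpp
      #include <cassert>

      // Hand-written copy constructor written before allow_2pc was added: the new
      // field silently falls back to its default instead of being copied.
      struct ToyOptionsManual {
        int max_open_files = -1;
        bool allow_2pc = false;  // added later, never wired into the copy below

        ToyOptionsManual() = default;
        ToyOptionsManual(const ToyOptionsManual& other)
            : max_open_files(other.max_open_files) {}
      };

      // Relying on the implicit copy constructor copies every member, including
      // members added in the future.
      struct ToyOptionsDefault {
        int max_open_files = -1;
        bool allow_2pc = false;
      };
      ```
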
      Closes https://github.com/facebook/rocksdb/pull/2888
      
      Differential Revision: D5846368
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 1ee92a2aeae93886754b7bc039c3411ea2458683
  9. 16 Jul 2017, 1 commit
  10. 25 Jun 2017, 1 commit
    • Optimize for serial commits in 2PC · 499ebb3a
      Maysam Yabandeh committed
      Summary:
      Throughput: 46k tps in our sysbench settings (details to be filled in later)
      
      The idea is to have the simplest change that gives us a reasonable boost
      in 2PC throughput.
      
      Major design changes:
      1. The WAL file's internal buffer is not flushed after each write. Instead,
      it is flushed before critical operations (WAL copy via the file system) or
      when MySQL calls FlushWAL. Flushing the WAL buffer is also protected by
      mutex_.
      2. Use two sequence numbers: last seq and last seq for write. Last seq
      is the last sequence number visible to reads. Last seq for write is the
      next sequence number to be used for writing to the WAL/memtable. This
      allows a memtable write to proceed in parallel with WAL writes.
      3. BatchGroup is not used for writes. This means we can have parallel
      writers, which changes a major assumption in the code base. To
      accommodate that: (i) allow only one WriteImpl that intends to write to
      the memtable, via mem_mutex_, which is fine since in 2PC almost all
      memtable writes come via the group commit phase, which is serial anyway;
      (ii) make all parts of the code base that assumed they were the only
      writer (via EnterUnbatched) also acquire mem_mutex_; (iii) protect stat
      updates with stat_mutex_.
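
      A minimal model of the two counters in item 2, assuming memtable writes are published in sequence order (the design above serializes them through mem_mutex_). `TwoSequenceCounters` and its methods are hypothetical; only the roles of last seq and last seq for write come from the text. In this model the counter for writes holds the last number handed out, so the next write starts at that value + 1:

      ```cpp
      #include <atomic>
      #include <cassert>
      #include <cstdint>

      class TwoSequenceCounters {
       public:
        // Reserves `count` sequence numbers for a WAL write and returns the first.
        // Concurrent WAL writers each receive a disjoint range.
        uint64_t AllocateForWrite(uint64_t count) {
          return last_seq_for_write_.fetch_add(count) + 1;
        }

        // Called once the memtable write ending at `last` is done; makes it
        // visible to reads. Assumes calls arrive in sequence order.
        void Publish(uint64_t last) {
          assert(last <= last_seq_for_write_.load());
          last_seq_.store(last);
        }

        uint64_t LastVisible() const { return last_seq_.load(); }
        uint64_t LastAllocated() const { return last_seq_for_write_.load(); }

       private:
        std::atomic<uint64_t> last_seq_{0};            // visible to reads
        std::atomic<uint64_t> last_seq_for_write_{0};  // handed out to writers
      };
      ```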
      
      Note: the first commit has the approach figured out but is not clean.
      I am submitting the PR anyway to get early feedback on the approach. If
      we are OK with the approach, I will go ahead with these updates:
      0) Rebase onto Yi's pipelining changes.
      1) Batching is currently disabled by default to keep it consistent with
      all unit tests. I will make this optional via a config.
      2) A couple of unit tests are disabled. They need to be updated to take
      the serial commit of 2PC into account.
      3) Replacing BatchGroup with mem_mutex_ got a bit ugly, as it requires
      releasing mutex_ beforehand (the same way EnterUnbatched does). This
      needs to be cleaned up.
      Closes https://github.com/facebook/rocksdb/pull/2345
      
      Differential Revision: D5210732
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 78653bd95a35cd1e831e555e0e57bdfd695355a4
  11. 28 Apr 2017, 1 commit
  12. 06 Apr 2017, 1 commit
  13. 01 Mar 2017, 1 commit
  14. 07 Feb 2017, 1 commit
    • Windows thread · 0a4cdde5
      Dmitri Smirnov committed
      Summary:
      Introduce new methods into the public thread pool interface:
      - Allow submission of std::function objects, as they allow greater flexibility.
      - Add Joining methods to the implementation to join scheduled and submitted jobs, with
        an option to cancel jobs that have not started executing.
      - Remove the ugly `#ifdefs` between the pthread and std implementations and make them uniform.
      - Introduce pimpl for a drop-in replacement of the implementation.
      - Introduce a rocksdb::port::Thread typedef as a replacement for std::thread. On POSIX, Thread defaults to std::thread, as before.
      - Implement WindowsThread, which allocates memory in a more controllable manner than the Windows std::thread, with a replaceable implementation.
      - There should be no functionality changes.
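
      A minimal sketch of the first two items: submitting std::function jobs, and joining with an option to drop jobs that have not started. `ToyThreadPool` is hypothetical; it is neither RocksDB's ThreadPool interface nor the WindowsThread implementation:

      ```cpp
      #include <cassert>
      #include <condition_variable>
      #include <deque>
      #include <functional>
      #include <mutex>
      #include <thread>
      #include <utility>
      #include <vector>

      class ToyThreadPool {
       public:
        explicit ToyThreadPool(int num_threads) {
          for (int i = 0; i < num_threads; ++i) {
            workers_.emplace_back([this] { WorkerLoop(); });
          }
        }
        ~ToyThreadPool() { Join(/*cancel_pending=*/false); }

        // Any callable wrapped in std::function can be submitted.
        void Submit(std::function<void()> job) {
          {
            std::lock_guard<std::mutex> lock(mu_);
            queue_.push_back(std::move(job));
          }
          cv_.notify_one();
        }

        // Stops the pool and joins all workers. With cancel_pending, queued jobs
        // that have not started are dropped; otherwise they run before exit.
        void Join(bool cancel_pending) {
          {
            std::lock_guard<std::mutex> lock(mu_);
            if (cancel_pending) queue_.clear();
            exiting_ = true;
          }
          cv_.notify_all();
          for (std::thread& t : workers_) {
            if (t.joinable()) t.join();
          }
          workers_.clear();
        }

       private:
        void WorkerLoop() {
          for (;;) {
            std::function<void()> job;
            {
              std::unique_lock<std::mutex> lock(mu_);
              cv_.wait(lock, [this] { return exiting_ || !queue_.empty(); });
              if (queue_.empty()) return;  // exiting and nothing left to run
              job = std::move(queue_.front());
              queue_.pop_front();
            }
            job();  // run outside the lock
          }
        }

        std::mutex mu_;
        std::condition_variable cv_;
        std::deque<std::function<void()>> queue_;
        std::vector<std::thread> workers_;
        bool exiting_ = false;
      };
      ```
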
      Closes https://github.com/facebook/rocksdb/pull/1823
      
      Differential Revision: D4492902
      
      Pulled By: siying
      
      fbshipit-source-id: c74cb11
  15. 11 Nov 2016, 1 commit
    • Fix 2PC Recovery SeqId Miscount · 1ca5f6d1
      Reid Horuff committed
      Summary:
      Originally, sequence IDs during recovery were calculated from the first sequence ID found in the first recovered log. The working sequence ID was then incremented from that value for every insertion that took place. This was faulty because log files could be missing, or some inserts could have skipped the WAL. The current recovery scheme takes the sequence number from the batch currently being recovered and uses MemTableInserter to count how many inserts actually take place. This works for 2PC batches as well as for scenarios where some logs are missing or some inserts skipped the WAL.
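
      The difference between the two schemes can be sketched with a hypothetical `RecoveredBatch` record holding the sequence number from the batch header and the number of inserts actually applied. Both functions are illustrations, not RocksDB code:

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <vector>

      struct RecoveredBatch {
        uint64_t sequence;  // sequence number stored in the batch header
        uint64_t inserts;   // entries actually applied while replaying the batch
      };

      // Old, faulty scheme: trust only the first batch and count forward.
      uint64_t LastSeqByCounting(const std::vector<RecoveredBatch>& batches) {
        if (batches.empty()) return 0;
        uint64_t seq = batches.front().sequence;
        for (const RecoveredBatch& b : batches) seq += b.inserts;
        return seq - 1;
      }

      // Per-batch scheme: restart from each batch's own header sequence.
      uint64_t LastSeqPerBatch(const std::vector<RecoveredBatch>& batches) {
        uint64_t last = 0;
        for (const RecoveredBatch& b : batches) {
          if (b.inserts > 0) last = b.sequence + b.inserts - 1;
        }
        return last;
      }
      ```

      With batches {1, 2} and {10, 3}, where sequence numbers 3 through 9 were consumed by writes that skipped the WAL, counting forward yields 5 while the per-batch scheme yields the correct 12.
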
      Closes https://github.com/facebook/rocksdb/pull/1486
      
      Differential Revision: D4156064
      
      Pulled By: reidHoruff
      
      fbshipit-source-id: a6da8d9
  16. 15 Oct 2016, 1 commit
    • Handle WAL deletion when using avoid_flush_during_recovery · f4705401
      Andrew Kryczka committed
      Summary:
      Previously, WAL files that were not flushed during recovery (because of
      avoid_flush_during_recovery) were never considered for deletion, because
      alive_log_files_ was only populated when log files were created. This diff
      also populates alive_log_files_ with existing log files that are not flushed
      during recovery, so that FindObsoleteFiles() can find them later.
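
      The bookkeeping issue can be sketched as follows, assuming only WALs tracked in an alive list are ever candidates for deletion. `ToyWalTracker` and its methods are hypothetical; only alive_log_files_ and FindObsoleteFiles() are named in the original:

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <deque>
      #include <vector>

      // Hypothetical bookkeeping, loosely modeled on alive_log_files_: only WALs
      // tracked here are ever considered for deletion.
      struct ToyWalTracker {
        std::deque<uint64_t> alive_logs;  // kept in ascending order

        void OnLogCreated(uint64_t number) { alive_logs.push_back(number); }

        // The fix: also track WALs that recovery replayed without flushing.
        // Called before any new WAL is created, so the order stays ascending.
        void OnRecoveredWithoutFlush(const std::vector<uint64_t>& numbers) {
          for (uint64_t number : numbers) alive_logs.push_back(number);
        }

        // Pops and returns WALs older than min_log_number (every CF has flushed
        // past them); these are the deletion candidates.
        std::vector<uint64_t> FindObsolete(uint64_t min_log_number) {
          std::vector<uint64_t> obsolete;
          while (!alive_logs.empty() && alive_logs.front() < min_log_number) {
            obsolete.push_back(alive_logs.front());
            alive_logs.pop_front();
          }
          return obsolete;
        }
      };
      ```

      Without the `OnRecoveredWithoutFlush` step, WALs 3 and 5 in the test scenario would never enter the list and so would never be returned as obsolete.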
      
      Depends on D64053.
      
      Test Plan: new unit test; verified that it fails before this change and passes after
      
      Reviewers: sdong, IslamAbdelRahman, yiwu
      
      Reviewed By: yiwu
      
      Subscribers: leveldb, dhruba, andrewkr
      
      Differential Revision: https://reviews.facebook.net/D64059
  17. 13 Oct 2016, 1 commit
  18. 08 Oct 2016, 1 commit
    • Add facility to write only a portion of WriteBatch to WAL · 2c1f9529
      Reid Horuff committed
      Summary:
      When constructing a write batch, a client may now call MarkWalTerminationPoint() on that batch. Batch operations added after this call are not written to the WAL but are still inserted into the memtable. This facility is used to remove one of the three WriteImpl calls in 2PC transactions, which yields a ~1% performance improvement.
      
      ```
      RocksDB - unoptimized 2pc, sync_binlog=1, disable_2pc=off
      INFO 2016-08-31 14:30:38,814 [main]: REQUEST PHASE COMPLETED. 75000000 requests done in 2619 seconds. Requests/second = 28628
      
      RocksDB - optimized 2pc , sync_binlog=1, disable_2pc=off
      INFO 2016-08-31 16:26:59,442 [main]: REQUEST PHASE COMPLETED. 75000000 requests done in 2581 seconds. Requests/second = 29054
      ```
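
      A toy model of the termination point: operations recorded before the mark form the WAL payload, while every operation is applied to the memtable. `ToyWriteBatch` is hypothetical and stores operations as plain strings; it does not reflect rocksdb::WriteBatch's internal encoding:

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <string>
      #include <vector>

      class ToyWriteBatch {
       public:
        void Put(const std::string& op) { ops_.push_back(op); }

        // Operations added after this call go to the memtable only.
        void MarkWalTerminationPoint() {
          wal_end_ = ops_.size();
          has_mark_ = true;
        }

        // The prefix of the batch that is written to the WAL.
        std::vector<std::string> WalOps() const {
          std::size_t end = has_mark_ ? wal_end_ : ops_.size();
          return std::vector<std::string>(
              ops_.begin(), ops_.begin() + static_cast<std::ptrdiff_t>(end));
        }

        // Every operation is inserted into the memtable.
        const std::vector<std::string>& MemtableOps() const { return ops_; }

       private:
        std::vector<std::string> ops_;
        std::size_t wal_end_ = 0;
        bool has_mark_ = false;
      };
      ```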
      
      Test Plan: Two unit tests added.
      
      Reviewers: sdong, yiwu, IslamAbdelRahman
      
      Reviewed By: yiwu
      
      Subscribers: hermanlee4, dhruba, andrewkr
      
      Differential Revision: https://reviews.facebook.net/D64599
  19. 24 Sep 2016, 2 commits
    • Split DBOptions into ImmutableDBOptions and MutableDBOptions · 9ed928e7
      Yi Wu committed
      Summary: Use ImmutableDBOptions/MutableDBOptions internally and DBOptions only for user-facing APIs. MutableDBOptions is little more than a placeholder for now. I'll start moving options to MutableDBOptions in the following diffs.
      
      Test Plan:
        make all check
      
      Reviewers: yhchiang, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D64065
    • Recover same sequence id from WAL (#1350) · 4bc8c88e
      yiwu-arbug committed
      Summary:
      Revert the behavior of not reading the sequence ID from the WAL and instead incrementing it as we replay the log. We still keep that behavior for 2PC for now but will fix it later.

      This change fixes GitHub issue 1339, where some writes come with the WAL disabled and we may recover records with the wrong sequence ID.
      
      Test Plan: Added unit test.
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D64275
  20. 20 Sep 2016, 1 commit
  21. 16 Sep 2016, 3 commits
    • Fix DBWALTest.RecoveryWithLogDataForSomeCFs with mac · 40cfa3e0
      Yi Wu committed
      Summary: std::array appears to be unavailable on Mac with Clang, so use a raw array instead.

      Test Plan: run ./db_wal_test on Mac.
      
      Reviewers: andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D64005
    • Fix recovery for WALs without data for all CFs · 06b4785f
      Andrew Kryczka committed
      Summary:
      If one or more CFs had no data in the WAL, the log number used by
      FindObsoleteFiles() was not updated. We need to treat this case the same as
      if the data for that WAL had been flushed.
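
      A sketch of that rule, assuming each CF tracks a log number below which it no longer needs WAL data, and that the oldest WAL any CF still needs bounds deletion. `AdvanceCfsWithoutData` and `MinLogNumberToKeep` are hypothetical helpers:

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstdint>
      #include <limits>
      #include <map>
      #include <set>
      #include <string>

      // After replaying WAL `log`, a CF that had no data in it is advanced
      // exactly as if that WAL's data had been flushed for it.
      void AdvanceCfsWithoutData(std::map<std::string, uint64_t>* cf_log_number,
                                 const std::set<std::string>& cfs_with_data,
                                 uint64_t log) {
        for (auto& entry : *cf_log_number) {
          if (cfs_with_data.count(entry.first) == 0 && entry.second <= log) {
            entry.second = log + 1;
          }
        }
      }

      // The oldest WAL any CF still needs; WALs older than this are obsolete.
      uint64_t MinLogNumberToKeep(
          const std::map<std::string, uint64_t>& cf_log_number) {
        uint64_t min_log = std::numeric_limits<uint64_t>::max();
        for (const auto& entry : cf_log_number) {
          min_log = std::min(min_log, entry.second);
        }
        return min_log;
      }
      ```

      Without the advance, a CF with no data in WAL 5 would keep log number 5 and pin that WAL even after every CF holding its data had flushed.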
      
      Test Plan: new unit test
      
      Reviewers: IslamAbdelRahman, yiwu, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D63963
    • Fix GetSortedWalFiles when log recycling enabled · d7242ff4
      Andrew Kryczka committed
      Summary:
      Previously, the sequence number was mistakenly passed in the argument
      where the log number should go. This caused the reader to assume the old WAL
      format was in use, which is incompatible with the WAL recycling format.
      
      Test Plan:
      new unit test, verified it fails before this change and passes
      afterwards.
      
      Reviewers: yiwu, lightmark, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D63987
  22. 06 Jul 2016, 1 commit
  23. 25 Jun 2016, 1 commit
  24. 14 Jun 2016, 1 commit
    • add option to not flush memtable on open() · bc8af90e
      Yi Wu committed
      Summary:
      Add an option to not flush the memtable on open().
      When the option is enabled, existing log files are not deleted, because the log numbers are not updated in the MANIFEST.
      We still flush if we need to (e.g. the memtable becomes full in the middle of recovery). In that case we also flush the final memtable.
      If wal_recovery_mode = kPointInTimeRecovery, do not halt immediately after encountering corruption. Instead, check whether the sequence ID of the next log file is last_log_sequence + 1. If so, continue recovery.
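
      A sketch of the relaxed point-in-time rule, using a hypothetical `ToyWal` summary of each log (first sequence number, last sequence number read before corruption or end of file, and whether corruption was hit). It assumes every WAL is non-empty and listed oldest first; `ReplayPointInTime` is not a RocksDB function:

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <vector>

      struct ToyWal {
        uint64_t first_seq;      // sequence number of the first record
        uint64_t last_good_seq;  // last sequence read before corruption or EOF
        bool corrupted;          // replay of this WAL stopped at corruption
      };

      // Returns the last sequence number recovered. After a corruption, replay
      // continues into the next WAL only if it starts at last_good_seq + 1,
      // i.e. nothing between the two WALs was lost.
      uint64_t ReplayPointInTime(const std::vector<ToyWal>& wals) {
        uint64_t last = 0;
        bool after_corruption = false;
        for (const ToyWal& wal : wals) {
          if (after_corruption && wal.first_seq != last + 1) break;
          last = wal.last_good_seq;
          after_corruption = wal.corrupted;
        }
        return last;
      }
      ```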
      
      Test Plan: See unit test.
      
      Reviewers: dhruba, horuff, sdong
      
      Reviewed By: sdong
      
      Subscribers: benj, yhchiang, andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D57813
  25. 20 Apr 2016, 1 commit
  26. 19 Apr 2016, 1 commit
    • Split db_test.cc · 792762c4
      Yi Wu committed
      Summary: Split db_test.cc into several files and move several helper functions into DBTestBase.
      
      Test Plan: make check
      
      Reviewers: sdong, yhchiang, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: dhruba, andrewkr, kradhakrishnan, yhchiang, leveldb, sdong
      
      Differential Revision: https://reviews.facebook.net/D56715
  27. 10 Feb 2016, 1 commit
  28. 23 Oct 2015, 1 commit
  29. 13 Oct 2015, 1 commit
  30. 06 Aug 2015, 1 commit
    • Add two unit tests for SyncWAL() · 7ccd1c80
      sdong committed
      Summary:
      Add two unit tests for SyncWAL(). One makes sure SyncWAL() doesn't block writes in another thread. The other makes sure SyncWAL() doesn't wait for ongoing writes to finish before executing.

      Create a new test file, db_wal_test, and move two WAL-related tests from db_test into it.
      
      Test Plan: Run the new tests
      
      Reviewers: IslamAbdelRahman, rven, kradhakrishnan, kolmike, tnovak, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D43605