1. 14 Oct, 2015 2 commits
    • I
      Make db_test_util compile under ROCKSDB_LITE · f55d3009
      Islam AbdelRahman committed
      Summary: db_test_util is used in multiple test files but it doesn't compile under ROCKSDB_LITE
      
      Test Plan:
      make check
      make static_lib
      OPT=-DROCKSDB_LITE make db_wal_test
      
      Reviewers: igor, yhchiang, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D48579
      f55d3009
    • S
      Separate InternalIterator from Iterator · 35ad531b
      sdong committed
      Summary:
      Separate a new class InternalIterator from class Iterator, for use when the look-up is done internally, which also means it operates on keys with sequence ID and type.
      
      This change will enable potential future optimizations but for now InternalIterator's functions are still the same as Iterator's.
      At the same time, separate the cleanup function into its own class and let both InternalIterator and Iterator inherit from it.
      
      Test Plan: Run all existing tests.
      
      Reviewers: igor, yhchiang, anthony, kradhakrishnan, IslamAbdelRahman, rven
      
      Reviewed By: rven
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D48549
      35ad531b
  2. 13 Oct, 2015 1 commit
  3. 10 Oct, 2015 2 commits
    • A
      Passing table properties to compaction callback · 3d07b815
      Alexey Maykov committed
      Summary: It would be nice to have access to table properties in compaction callbacks. In the MyRocks project, this will make it possible to update optimizer statistics online.
      
      Test Plan: Ran the unit test. Ran MyRocks with the new way of collecting stats.
      
      Reviewers: igor, rven, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D48267
      3d07b815
    • S
      Pass column family ID to table property collector · 776bd8d5
      sdong committed
      Summary: Pass column family ID through TablePropertiesCollectorFactory::CreateTablePropertiesCollector() so that users can identify which column family this file is for and handle it differently.
      
      Test Plan: Add unit test scenarios in tests related to table properties collectors to verify the information passed in is correct.
      
      Reviewers: rven, yhchiang, anthony, kradhakrishnan, igor, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: yoshinorim, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D48411
      776bd8d5
  4. 07 Oct, 2015 1 commit
    • D
      Support for LevelDB SST with .ldb suffix · 02675026
      dyniusz committed
      Summary:
      Handle SST files with either the ".sst" or the ".ldb" suffix.
      This enables users to migrate from leveldb to rocksdb.
      
      Test Plan:
              Added unit test with DB operating on SSTs with names schema.
              See db/dc_test.cc:SSTsWithLdbSuffixHandling for details
      
      Reviewers: yhchiang, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D48003
      02675026
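
      The suffix handling described in this commit can be sketched as follows. This is only an illustration, not RocksDB's actual file-name parser: the helper name `ParseTableFileNumber` and its signature are hypothetical, and it assumes table files are named `<digits>.sst` or `<digits>.ldb`.

      ```cpp
      #include <cstddef>
      #include <cstdint>
      #include <string>

      // Hypothetical helper (not RocksDB's real parser). Returns true and stores
      // the file number if `name` looks like "<digits>.sst" or "<digits>.ldb".
      bool ParseTableFileNumber(const std::string& name, uint64_t* number) {
        const std::size_t dot = name.rfind('.');
        if (dot == std::string::npos || dot == 0) return false;

        // Accept both the RocksDB suffix and the LevelDB suffix.
        const std::string suffix = name.substr(dot + 1);
        if (suffix != "sst" && suffix != "ldb") return false;

        // Everything before the dot must be decimal digits.
        // Overflow of very long digit strings is not checked in this sketch.
        uint64_t value = 0;
        for (std::size_t i = 0; i < dot; ++i) {
          const char c = name[i];
          if (c < '0' || c > '9') return false;
          value = value * 10 + static_cast<uint64_t>(c - '0');
        }
        *number = value;
        return true;
      }
      ```

      With this sketch, `000123.sst` and `000007.ldb` both parse as table files, while names such as `000007.log` are rejected.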
  5. 03 Oct, 2015 1 commit
  6. 24 Sep, 2015 1 commit
    • I
      Add experimental DB::AddFile() to plug sst files into empty DB · f03b5c98
      Islam AbdelRahman committed
      Summary:
      This is an initial version of the bulk load feature.
      
      This diff allows us to create sst files and then bulk load them later. Right now the restrictions for loading an sst file are:
      (1) Memtables are empty
      (2) Added sst files have sequence number = 0, and existing values in database have sequence number = 0
      (3) Added sst files values are not overlapping
      
      Test Plan: unit testing
      
      Reviewers: igor, ott, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb, ott, dhruba
      
      Differential Revision: https://reviews.facebook.net/D39081
      f03b5c98
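
      Restriction (3) can be pictured with a small check. The sketch below assumes that "not overlapping" refers to the key ranges of the added files and that keys compare bytewise; `KeyRange` and `HasOverlap` are hypothetical names for illustration, not part of the DB::AddFile() API.

      ```cpp
      #include <algorithm>
      #include <cstddef>
      #include <string>
      #include <vector>

      // Hypothetical description of one file's key range (smallest <= largest).
      struct KeyRange {
        std::string smallest;
        std::string largest;
      };

      // Returns true if any two ranges share at least one key.
      // Takes the vector by value so the caller's order is left untouched.
      bool HasOverlap(std::vector<KeyRange> files) {
        std::sort(files.begin(), files.end(),
                  [](const KeyRange& a, const KeyRange& b) {
                    return a.smallest < b.smallest;
                  });
        // After sorting, an overlap exists exactly when some range starts at or
        // before the largest end key seen so far.
        for (std::size_t i = 1; i < files.size(); ++i) {
          if (files[i].smallest <= files[i - 1].largest) return true;
          if (files[i].largest < files[i - 1].largest) {
            files[i].largest = files[i - 1].largest;
          }
        }
        return false;
      }
      ```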
  7. 23 Sep, 2015 1 commit
  8. 18 Sep, 2015 2 commits
    • A
      Support for SingleDelete() · 014fd55a
      Andres Noetzli committed
      Summary:
      This patch fixes #7460559. It introduces SingleDelete as a new database
      operation. This operation can be used to delete keys that were never
      overwritten (no put following another put of the same key). If an overwritten
      key is single deleted the behavior is undefined. Single deletion of a
      non-existent key has no effect but multiple consecutive single deletions are
      not allowed (see limitations).
      
      In contrast to the conventional Delete() operation, the deletion entry is
      removed along with the value when the two are lined up in a compaction. Note:
      The semantics are similar to @igor's prototype that allowed this
      behavior at the granularity of a column family (
      https://reviews.facebook.net/D42093 ). This new patch, however, is more
      aggressive when it comes to removing tombstones: It removes the SingleDelete
      together with the value whenever there is no snapshot between them while the
      older patch only did this when the sequence number of the deletion was older
      than the earliest snapshot.
      
      Most of the complex additions are in the Compaction Iterator, all other changes
      should be relatively straightforward. The patch also includes basic support for
      single deletions in db_stress and db_bench.
      
      Limitations:
      - Not compatible with cuckoo hash tables
      - Single deletions cannot be used in combination with merges and normal
        deletions on the same key (other keys are not affected by this)
      - Consecutive single deletions are currently not allowed (an older version of
        this patch supported this so it could be resurrected if needed)
      
      Test Plan: make all check
      
      Reviewers: yhchiang, sdong, rven, anthony, yoshinorim, igor
      
      Reviewed By: igor
      
      Subscribers: maykov, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43179
      014fd55a
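
      The removal rule in this summary (drop the SingleDelete together with the value when no snapshot lies between them) can be illustrated with a toy model. This is a heavily simplified sketch and not the Compaction Iterator: it handles the versions of a single key only, the type and function names are hypothetical, and it assumes a snapshot at sequence number s can see every entry with sequence number <= s.

      ```cpp
      #include <cstddef>
      #include <cstdint>
      #include <vector>

      enum class EntryType { kPut, kSingleDelete };

      // One version of a single user key (hypothetical, simplified).
      struct Entry {
        uint64_t seq;
        EntryType type;
      };

      // True if some snapshot s satisfies lo <= s < hi, i.e. it can see the entry
      // written at `lo` but not the entry written at `hi`.
      bool SnapshotBetween(const std::vector<uint64_t>& snapshots, uint64_t lo,
                           uint64_t hi) {
        for (uint64_t s : snapshots) {
          if (s >= lo && s < hi) return true;
        }
        return false;
      }

      // `newest_first` holds the versions of one key, newest sequence number first.
      std::vector<Entry> CompactOneKey(const std::vector<Entry>& newest_first,
                                       const std::vector<uint64_t>& snapshots) {
        std::vector<Entry> out;
        for (std::size_t i = 0; i < newest_first.size(); ++i) {
          const Entry& cur = newest_first[i];
          const bool has_next = i + 1 < newest_first.size();
          if (cur.type == EntryType::kSingleDelete && has_next &&
              newest_first[i + 1].type == EntryType::kPut &&
              !SnapshotBetween(snapshots, newest_first[i + 1].seq, cur.seq)) {
            ++i;  // drop both the SingleDelete and the Put it cancels
            continue;
          }
          out.push_back(cur);
        }
        return out;
      }
      ```

      In this model a SingleDelete at sequence 10 over a Put at sequence 5 disappears entirely unless a snapshot with a sequence number from 5 to 9 still needs to read the Put.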
    • V
      Do not flag error if file to be deleted does not exist · 51e1c112
      Venkatesh Radhakrishnan committed
      Summary:
      Some users have observed errors in the log file when
      the log file or sst file is already deleted.
      
      Test Plan:
      Make sure that the errors do not appear for already deleted
      files.
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: anthony, kradhakrishnan, yhchiang, rven, igor, IslamAbdelRahman, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D47115
      51e1c112
  9. 17 Sep, 2015 3 commits
  10. 15 Sep, 2015 1 commit
    • S
      DBImpl::FindObsoleteFiles() shouldn't release mutex between getting... · f3170b6f
      sdong committed
      DBImpl::FindObsoleteFiles() shouldn't release mutex between getting min_pending_output and scanning files
      
      Summary:
      Releasing the mutex between getting min_pending_output and scanning files may cause min_pending_output to be max while some non-final files are found in file scanning, which ends up deleting the wrong files.
      As a recent regression, mutex can be released while waiting for log sync. We move it to after file scanning.
      
      Test Plan: Run all existing tests. Don't think it is easy to write a unit test. Maybe we should find a way to assert lock not released so that we can have some test verification for similar cases.
      
      Reviewers: igor, anthony, IslamAbdelRahman, kradhakrishnan, yhchiang, kolmike, rven
      
      Reviewed By: rven
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D46899
      f3170b6f
  11. 11 Sep, 2015 2 commits
    • K
      Relaxing consistency detection to include errors while inserting to memtable as WAL recovery error. · 11266440
      krad committed
      Summary: The current code considers data to be consistent if the record
      checksum passes. We do have customer issues where the record checksum passed but
      the data was incomprehensible. There is no way to get out of this error case
      since all WAL recovery modes will consider this error as unrelated to WAL.
      
      Relaxing the definition and including errors while inserting to memtable as WAL
      errors and handling them as per the recovery level.
      
      Test Plan: Used a customer dump to verify the fix for different levels. The db
      opens for kSkipAnyCorruptedRecords and kPointInTimeRecovery, but fails for
      kAbsoluteConsistency and kTolerateCorruptedTailRecords.
      
      Reviewers: sdon igor
      
      CC: leveldb@
      
      Task ID: #7918721
      
      Blame Rev:
      11266440
    • I
      Set max_open_files based on ulimit · ac9bcb55
      Igor Canadi committed
      Summary: We should never set max_open_files to be bigger than the system's ulimit. Otherwise we will get "Too many open files" errors. See an example in this Travis run: https://travis-ci.org/facebook/rocksdb/jobs/79591566
      
      Test Plan:
      make check
      
      I will also verify that max_max_open_files is reasonable.
      
      Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D46551
      ac9bcb55
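
      The idea of capping max_open_files at the process limit can be sketched with the POSIX `getrlimit` call. This is a minimal illustration under stated assumptions, not the code from this commit: both helper names are hypothetical, and negative requests are simply passed through unchanged here, which may differ from what RocksDB does.

      ```cpp
      #include <sys/resource.h>  // POSIX getrlimit, RLIMIT_NOFILE

      // Hypothetical helper: cap `requested` at `soft_limit`.
      // Negative values are returned unchanged in this sketch (assumption).
      int ClampMaxOpenFiles(int requested, unsigned long long soft_limit) {
        if (requested < 0) return requested;
        if (static_cast<unsigned long long>(requested) > soft_limit) {
          // Safe cast: soft_limit is smaller than a value that fit in an int.
          return static_cast<int>(soft_limit);
        }
        return requested;
      }

      // Hypothetical helper: cap `requested` at the soft limit on open file
      // descriptors for the current process. If the limit cannot be read or is
      // unlimited, the request is returned as-is.
      int ClampToProcessLimit(int requested) {
        struct rlimit lim;
        if (getrlimit(RLIMIT_NOFILE, &lim) != 0 || lim.rlim_cur == RLIM_INFINITY) {
          return requested;
        }
        return ClampMaxOpenFiles(requested,
                                 static_cast<unsigned long long>(lim.rlim_cur));
      }
      ```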
  12. 03 Sep, 2015 1 commit
    • A
      Unified maps with Comparator for sorting, other cleanup · 3c9cef1e
      Andres Noetzli committed
      Summary:
      This diff is a collection of cleanups that were initially part of D43179.
      Additionally it adds a unified way of defining key-value maps that use a
      Comparator for sorting (this was previously implemented in four different
      places).
      
      Test Plan: make clean check all
      
      Reviewers: rven, anthony, yhchiang, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D45993
      3c9cef1e
  13. 29 Aug, 2015 1 commit
    • A
      Fix deadlock in WAL sync · effd9dd1
      Andres Noetzli committed
      Summary:
      MarkLogsSynced() was doing `logs_.erase(it++);`. The standard says:
      
      ```
      all iterators and references are invalidated, unless the erased members are at an end (front or back) of the deque (in which case only iterators and references to the erased members are invalidated)
      ```
      
      Because `it` is an iterator to the first element of the container, it is
      invalidated, only one iteration is executed and `log.getting_synced = false;`
      is not being done, so `while (logs_.front().getting_synced)` in `WriteImpl()`
      is not terminating.
      
      Test Plan: make db_bench && ./db_bench --benchmarks=fillsync
      
      Reviewers: igor, rven, IslamAbdelRahman, anthony, kradhakrishnan, yhchiang, sdong, tnovak
      
      Reviewed By: tnovak
      
      Subscribers: kolmike, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D45807
      effd9dd1
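
      The usual way to erase from a `std::deque` while iterating is to continue from the iterator that `erase()` returns. The sketch below shows that idiom on a simplified stand-in for the log list; `LogEntry` and `MarkSyncedUpTo` are hypothetical names and this is not the actual DBImpl code.

      ```cpp
      #include <cstdint>
      #include <deque>

      // Simplified stand-in for one WAL entry (hypothetical).
      struct LogEntry {
        uint64_t number;
        bool getting_synced;
      };

      // Drops fully synced logs with number <= up_to, but always keeps at least
      // one log in the deque and clears its getting_synced flag instead.
      void MarkSyncedUpTo(std::deque<LogEntry>& logs, uint64_t up_to) {
        for (auto it = logs.begin(); it != logs.end() && it->number <= up_to;) {
          if (logs.size() > 1) {
            // erase() returns a valid iterator to the element after the erased
            // one, so no invalidated iterator is ever dereferenced.
            it = logs.erase(it);
          } else {
            it->getting_synced = false;
            ++it;
          }
        }
      }
      ```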
  14. 27 Aug, 2015 1 commit
    • I
      ReadaheadRandomAccessFile -- userspace readahead · 5f4166c9
      Igor Canadi committed
      Summary:
      ReadaheadRandomAccessFile acts as a transparent layer on top of RandomAccessFile. When a Read() request is issued, it issues a much bigger request to the OS and caches the result. When a new request comes in and we already have the data cached, it doesn't have to issue any requests to the OS.
      
      We add ReadaheadRandomAccessFile layer only when file is read during compactions.
      
      D45105 was incorrectly closed by Phabricator because I committed it to a separate branch (not master), so I'm resubmitting the diff.
      
      Test Plan: make check
      
      Reviewers: MarkCallaghan, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D45123
      5f4166c9
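
      The readahead layer described in this summary can be sketched with an in-memory stand-in for the underlying file. Both classes below are hypothetical illustrations and do not reproduce the RandomAccessFile interface: `FakeFile` counts how many reads reach the "OS", and `ReadaheadReader` serves a request from its cached buffer whenever the requested range is fully covered.

      ```cpp
      #include <algorithm>
      #include <cstddef>
      #include <string>
      #include <utility>

      // Stand-in for the underlying file: in-memory bytes plus a read counter.
      class FakeFile {
       public:
        explicit FakeFile(std::string data) : data_(std::move(data)) {}

        std::string Read(std::size_t offset, std::size_t n) {
          ++reads_;
          if (offset >= data_.size()) return std::string();
          return data_.substr(offset, n);  // short read at end of file
        }

        int reads() const { return reads_; }

       private:
        std::string data_;
        int reads_ = 0;
      };

      // Transparent wrapper that issues larger reads and caches the result.
      class ReadaheadReader {
       public:
        ReadaheadReader(FakeFile* file, std::size_t readahead_size)
            : file_(file), readahead_size_(readahead_size) {}

        std::string Read(std::size_t offset, std::size_t n) {
          const bool cached = offset >= buf_offset_ &&
                              offset + n <= buf_offset_ + buf_.size();
          if (!cached) {
            // Ask for at least readahead_size_ bytes and remember where they start.
            buf_ = file_->Read(offset, std::max(n, readahead_size_));
            buf_offset_ = offset;
          }
          const std::size_t start = offset - buf_offset_;
          if (start >= buf_.size()) return std::string();
          return buf_.substr(start, n);
        }

       private:
        FakeFile* file_;
        std::size_t readahead_size_;
        std::string buf_;
        std::size_t buf_offset_ = 0;
      };
      ```

      Sequential small reads, as issued during a compaction, mostly hit the cached buffer in this model, so far fewer requests reach the underlying file.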
  15. 26 Aug, 2015 1 commit
    • A
      Fix compact_files_example · 09d982f9
      Andres Notzli committed
      Summary:
      See task #7983654. The example was triggering an assert in compaction job
      because the compaction was not marked as manual. With this patch,
      CompactionPicker::FormCompaction() marks compactions as manual. This patch
      also fixes a couple of typos, adds optimistic_transaction_example to
      .gitignore and librocksdb as a dependency for examples. Adding librocksdb as
      a dependency makes sure that the examples are built with the latest changes
      in librocksdb.
      
      Test Plan: make clean && cd examples && make all && ./compact_files_example
      
      Reviewers: rven, sdong, anthony, igor, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D45117
      09d982f9
  16. 25 Aug, 2015 2 commits
    • A
      Fixing race condition in DBTest.DynamicMemtableOptions · 20508329
      Andres Noetzli committed
      Summary:
      This patch fixes a race condition in DBTest.DynamicMemtableOptions. In rare cases,
      it was possible that the main thread would fill up both memtables before the flush
      job acquired its work. Then, the flush job was flushing both memtables together,
      producing only one L0 file while the test expected two. Now, the test waits for
      flushes to finish earlier, to make sure that the memtables are flushed in separate
      flush jobs.
      
      Test Plan:
      Insert "usleep(10000);" after "IOSTATS_SET_THREAD_POOL_ID(Env::Priority::HIGH);" in BGWorkFlush()
      to make the issue more likely. Then test with:
      make db_test && time while ./db_test --gtest_filter=*DynamicMemtableOptions; do true; done
      
      Reviewers: rven, sdong, yhchiang, anthony, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D45429
      20508329
    • I
      Smarter purging during flush · 4ab26c5a
      Igor Canadi committed
      Summary:
      Currently, we only purge duplicate keys and deletions during flush if `earliest_seqno_in_memtable <= newest_snapshot`. This means that the newest snapshot happened before we first created the memtable. This is almost never true for MyRocks and MongoRocks.
      
      This patch makes purging during flush able to understand snapshots. The main logic is copied from compaction_job.cc, although the logic over there is much more complicated and extensive. However, we should try to merge the common functionality at some point.
      
      I need this patch to implement no_overwrite_i_promise functionality for flush. We'll also need this to support SingleDelete() during Flush(). @yoshinorim requested the feature.
      
      Test Plan:
      make check
      I had to adjust some unit tests to understand this new behavior
      
      Reviewers: yhchiang, yoshinorim, anthony, sdong, noetzli
      
      Reviewed By: noetzli
      
      Subscribers: yoshinorim, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D42087
      4ab26c5a
  17. 20 Aug, 2015 1 commit
  18. 19 Aug, 2015 1 commit
    • A
      [Parallel L0-L1 Compaction Prep]: Giving Subcompactions Their Own State · f0da6977
      Ari Ekmekji committed
      Summary:
      In preparation for running multiple threads at the same time during
      a compaction job, this patch assigns each subcompaction its own state
      (instead of sharing the one global CompactionState). Each subcompaction then
      uses this state to update its statistics, keep track of its snapshots, etc.
      during the course of execution. Then at the end of all the executions the
      statistics are aggregated across the subcompactions so that the final result
      is the same as if only one larger compaction had run.
      
      Test Plan: ./db_test  ./db_compaction_test  ./compaction_job_test
      
      Reviewers: sdong, anthony, igor, noetzli, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: MarkCallaghan, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43239
      f0da6977
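
      The aggregation step described in this summary amounts to summing per-subcompaction counters once every subcompaction has finished. The sketch below uses a hypothetical `SubcompactionStats` struct with made-up field names; the real state holds considerably more than these three counters.

      ```cpp
      #include <cstdint>
      #include <vector>

      // Hypothetical per-subcompaction counters (field names are illustrative).
      struct SubcompactionStats {
        uint64_t num_input_records;
        uint64_t num_output_records;
        uint64_t bytes_written;
      };

      // Sums the counters of all subcompactions so the result matches what a
      // single larger compaction would have reported.
      SubcompactionStats AggregateStats(
          const std::vector<SubcompactionStats>& parts) {
        SubcompactionStats total = {0, 0, 0};
        for (const SubcompactionStats& s : parts) {
          total.num_input_records += s.num_input_records;
          total.num_output_records += s.num_output_records;
          total.bytes_written += s.bytes_written;
        }
        return total;
      }
      ```

      Because each subcompaction writes only to its own struct while running, no locking is needed until this final single-threaded sum.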
  19. 15 Aug, 2015 2 commits
    • S
      Measure file read latency histogram per level · 72613657
      sdong committed
      Summary: In internal stats, remember read latency histogram, if statistics is enabled. It can be retrieved from DB::GetProperty() with "rocksdb.dbstats" property, if it is enabled.
      
      Test Plan: Manually run db_bench, print out "rocksdb.dbstats" by hand, and make sure it prints out as expected
      
      Reviewers: igor, IslamAbdelRahman, rven, kradhakrishnan, anthony, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: MarkCallaghan, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D44193
      72613657
    • N
      reduce db mutex contention for write batch groups · b7198c3a
      Nathan Bronson committed
      Summary:
      This diff allows a Writer to join the next write batch group
      without acquiring any locks. Waiting is performed via a per-Writer mutex,
      so all of the non-leader writers never need to acquire the db mutex.
      It is now possible to join a write batch group after the leader has been
      chosen but before the batch has been constructed. This diff doesn't
      increase parallelism, but reduces synchronization overheads.
      
      For some CPU-bound workloads (no WAL, RAM-sized working set) this can
      substantially reduce contention on the db mutex in a multi-threaded
      environment.  With T=8 N=500000 in a CPU-bound scenario (see the test
      plan) this is good for a 33% perf win.  Not all scenarios see such a
      win, but none show a loss.  This code is slightly faster even for the
      single-threaded case (about 2% for the CPU-bound scenario below).
      
      Test Plan:
      1. unit tests
      2. COMPILE_WITH_TSAN=1 make check
      3. stress high-contention scenarios with db_bench -benchmarks=fillrandom -threads=$T -batch_size=1 -memtablerep=skip_list -value_size=0 --num=$N -level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999 -disable_auto_compactions --max_write_buffer_number=8 -max_background_flushes=8 --disable_wal --write_buffer_size=160000000
      
      Reviewers: sdong, igor, rven, ljin, yhchiang
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D43887
      b7198c3a
  20. 14 Aug, 2015 1 commit
    • S
      Add options.compaction_measure_io_stats to print write I/O stats in compactions · 603b6da8
      sdong committed
      Summary:
      Add options.compaction_measure_io_stats to print out / pass to listener accumulated time spent on write calls. Example outputs in info logs:
      
      2015/08/12-16:27:59.463944 7fd428bff700 (Original Log Time 2015/08/12-16:27:59.463922) EVENT_LOG_v1 {"time_micros": 1439422079463897, "job": 6, "event": "compaction_finished", "output_level": 1, "num_output_files": 4, "total_output_size": 6900525, "num_input_records": 111483, "num_output_records": 106877, "file_write_nanos": 15663206, "file_range_sync_nanos": 649588, "file_fsync_nanos": 349614797, "file_prepare_write_nanos": 1505812, "lsm_state": [2, 4, 0, 0, 0, 0, 0]}
      
      Add two more counters in iostats_context.
      
      Also add a parameter to db_bench.
      
      Test Plan: Add a unit test. Also manually verify LOG outputs in db_bench
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D44115
      603b6da8
  21. 12 Aug, 2015 2 commits
    • A
      Transaction error statuses · 0db807ec
      agiardullo committed
      Summary:
      Based on feedback from spetrunia, we should better differentiate error statuses for transaction failures.
      
      https://github.com/MySQLOnRocksDB/mysql-5.6/issues/86#issuecomment-124605954
      
      Test Plan: unit tests
      
      Reviewers: rven, kradhakrishnan, spetrunia, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43323
      0db807ec
    • A
      Pessimistic Transactions · c2f2cb02
      agiardullo committed
      Summary:
      Initial implementation of Pessimistic Transactions.  This diff contains the api changes discussed in D38913.  This diff is pretty large, so let me know if people would prefer to meet up to discuss it.
      
      MyRocks folks:  please take a look at the API in include/rocksdb/utilities/transaction[_db].h and let me know if you have any issues.
      
      Also, you'll notice a couple of TODOs in the implementation of RollbackToSavePoint().  After chatting with Siying, I'm going to send out a separate diff for an alternate implementation of this feature that implements the rollback inside of WriteBatch/WriteBatchWithIndex.  We can then decide which route is preferable.
      
      Next, I'm planning on doing some perf testing and then integrating this diff into MongoRocks for further testing.
      
      Test Plan: Unit tests, db_bench parallel testing.
      
      Reviewers: igor, rven, sdong, yhchiang, yoshinorim
      
      Reviewed By: sdong
      
      Subscribers: hermanlee4, maykov, spetrunia, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D40869
      c2f2cb02
  22. 07 Aug, 2015 1 commit
    • S
      Avoid type unique_ptr in LogWriterNumber::writer for Windows build break · 6a4aaadc
      sdong committed
      Summary:
      Visual Studio complains about deque<LogWriterNumber> because LogWriterNumber is non-copyable for its unique_ptr member writer. Move away from it, and do an explicit free.
      It is less safe but I can't think of a better way to unblock it.
      
      Test Plan: valgrind check test
      
      Reviewers: anthony, IslamAbdelRahman, kolmike, rven, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D43647
      6a4aaadc
  23. 05 Aug, 2015 2 commits
    • M
      [wal changes 3/3] method in DB to sync WAL without blocking writers · e06cf1a0
      Mike Kolupaev committed
      Summary:
      Subj. We really need this feature.
      
      Previous diff D40899 has most of the changes to make this possible, this diff just adds the method.
      
      Test Plan: `make check`, the new test fails without this diff; ran with ASAN, TSAN and valgrind.
      
      Reviewers: igor, rven, IslamAbdelRahman, anthony, kradhakrishnan, tnovak, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: MarkCallaghan, maykov, hermanlee4, yoshinorim, tnovak, dhruba
      
      Differential Revision: https://reviews.facebook.net/D40905
      e06cf1a0
    • I
      Support delete rate limiting · c45a57b4
      Islam AbdelRahman committed
      Summary:
      Introduce DeleteScheduler, which allows enforcing a rate limit on file deletion.
      Instead of deleting files immediately, files are moved to a trash directory and deleted in a background thread that applies a sleep penalty between deletes if needed.
      
      I have updated PurgeObsoleteFiles and PurgeObsoleteWALFiles to use the delete_scheduler instead of env_->DeleteFile
      
      Test Plan:
      added delete_scheduler_test
      existing unit tests
      
      Reviewers: kradhakrishnan, anthony, rven, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D43221
      c45a57b4
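
      One way to picture the sleep penalty is to compare how long the deletions so far should have taken at the configured rate with how much time has actually passed. The formula and the function name `DeletePenaltyMicros` below are assumptions made for illustration; the summary does not state how DeleteScheduler computes its penalty.

      ```cpp
      #include <cstdint>

      // Hypothetical penalty calculation: how long the background thread should
      // sleep so that total_deleted_bytes stays within rate_bytes_per_sec.
      // A rate of 0 is treated as "no limit" in this sketch (assumption).
      uint64_t DeletePenaltyMicros(uint64_t total_deleted_bytes,
                                   uint64_t rate_bytes_per_sec,
                                   uint64_t elapsed_micros) {
        if (rate_bytes_per_sec == 0) return 0;

        // Time the deletions should have taken at the target rate. Whole seconds
        // and the remainder are converted separately to limit overflow risk.
        const uint64_t whole_secs = total_deleted_bytes / rate_bytes_per_sec;
        const uint64_t rest_bytes = total_deleted_bytes % rate_bytes_per_sec;
        const uint64_t target_micros =
            whole_secs * 1000000 + rest_bytes * 1000000 / rate_bytes_per_sec;

        // Sleep only if deletion is running ahead of the allowed rate.
        return target_micros > elapsed_micros ? target_micros - elapsed_micros : 0;
      }
      ```

      For example, after deleting 2 MiB at a limit of 1 MiB per second with only half a second elapsed, this model asks for a 1.5 second pause before the next delete.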
  24. 28 Jul, 2015 1 commit
  25. 22 Jul, 2015 1 commit
    • M
      [wal changes 2/3] write with sync=true syncs previous unsynced wals to prevent illegal data loss · fe09a6da
      Mike Kolupaev committed
      Summary:
      I'll just copy internal task summary here:
      
      "
      This sequence will cause data loss in the middle after a sync write:
      
      non-sync write key 1
      flush triggered, not yet scheduled
      sync write key 2
      system crash
      
      After rebooting, users might see key 2 but not key 1, which violates the API of sync write.
      
      This can be reproduced using unit test FaultInjectionTest::DISABLED_WriteOptionSyncTest.
      
      One way to fix it is, for a sync write, if there are outstanding unsynced log files, we need to sync them too.
      "
      
      This diff should be considered together with the next diff D40905; in isolation this fix probably could be a little simpler.
      
      Test Plan: `make check`; added a test for that (DBTest.SyncingPreviousLogs) before noticing FaultInjectionTest.WriteOptionSyncTest (keeping both since mine asserts a bit more); both tests fail without this diff; for D40905 stacked on top of this diff, ran tests with ASAN, TSAN and valgrind
      
      Reviewers: rven, yhchiang, IslamAbdelRahman, anthony, kradhakrishnan, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D40899
      fe09a6da
  26. 21 Jul, 2015 1 commit
    • A
      Improved FileExists API · 06429408
      agiardullo committed
      Summary: Add new CheckFileExists method.  Considered changing the FileExists api but didn't want to break anyone's builds.
      
      Test Plan: unit tests
      
      Reviewers: yhchiang, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D42003
      06429408
  27. 18 Jul, 2015 3 commits
    • S
      Move rate_limiter, write buffering, most perf context instrumentation and most... · 6e9fbeb2
      sdong committed
      Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
      
      Summary: We want to keep Env a thin layer for better portability. Less platform-dependent code should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and move rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env implementations in the future.
      
      Test Plan: Run all existing unit tests.
      
      Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
      
      Reviewed By: igor
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D42321
      6e9fbeb2
    • I
      Don't let flushes preempt compactions · 35ca5936
      Igor Canadi committed
      Summary:
      When we first started, max_background_flushes was 0 by default and compaction thread was executing flushes (since there was no flush thread). Then, we switched the default max_background_flushes to 1. However, we still support the case where there is no flush thread and flushes are done in compaction. This is making our code a bit more complicated. By not supporting this use-case we can make our code simpler.
      
      We have a special case that when you set max_background_flushes to 0, we
      schedule the flush to execute on the compaction thread.
      
      Test Plan: make check (there might be some unit tests that depend on this behavior)
      
      Reviewers: IslamAbdelRahman, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D41931
      35ca5936
    • I
      Deprecate CompactionFilterV2 · a96fcd09
      Igor Canadi committed
      Summary: It has been around for a while and it looks like it never found any uses in the wild. It's also complicating our compaction_job code quite a bit. We're deprecating it in 3.13, but will put it back in 3.14 if we actually find users that need this feature.
      
      Test Plan: make check
      
      Reviewers: noetzli, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D42405
      a96fcd09
  28. 17 Jul, 2015 1 commit
    • S
      Fix data loss after DB recovery by not allowing flush/compaction to be scheduled until DB opened · 6c0c8dee
      sdong committed
      Summary:
      A previous run may leave some SST files with higher file numbers than the manifest indicates.
      Compaction or flush may start to run while DB::Open() is still going on. SST file garbage collection may then interleave with compaction or flush and overwrite files generated by compactions or flushes after they are generated. This might cause data loss. This possibility of interleaving was recently introduced.
      Fix it by not allowing compaction or flush to be scheduled before DB::Open() finishes.
      
      Test Plan: Add a unit test. This verification has a chance to fail without the fix but doesn't fail with the fix.
      
      Reviewers: kradhakrishnan, anthony, yhchiang, IslamAbdelRahman, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D42399
      6c0c8dee