1. 11 Nov, 2015 1 commit
    • Y
      Enable RocksDB to persist Options file. · e114f0ab
      Yueh-Hsuan Chiang committed
      Summary:
      This patch allows rocksdb to persist options into a file on
      DB::Open, SetOptions, and Create / Drop ColumnFamily.
      Options files are created under the same directory as the rocksdb
      instance.
      
      In addition, this patch adds a fail_if_missing_options_file option to DBOptions
      that makes these calls return a non-OK status when RocksDB is not able to
      persist options properly.
      
        // If true, then DB::Open / CreateColumnFamily / DropColumnFamily
        // / SetOptions will fail if options file is not detected or properly
        // persisted.
        //
        // DEFAULT: false
        bool fail_if_missing_options_file;
      
      Options file names are formatted as OPTIONS-<number>, and RocksDB
      will always keep the latest two options files.
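
The retention rule above can be sketched as follows. This is a minimal illustration, not RocksDB's implementation: `ParseOptionsFileNumber` and `ObsoleteOptionsFiles` are hypothetical names, and the sketch assumes that a larger `<number>` means a newer options file.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: extracts <number> from a name shaped like
// "OPTIONS-<number>". Returns false for any other name.
// Overflow of very long numbers is not handled in this sketch.
bool ParseOptionsFileNumber(const std::string& name, uint64_t* number) {
  assert(number != nullptr);
  const std::string prefix = "OPTIONS-";
  if (name.size() <= prefix.size() ||
      name.compare(0, prefix.size(), prefix) != 0) {
    return false;
  }
  uint64_t value = 0;
  for (size_t i = prefix.size(); i < name.size(); ++i) {
    if (name[i] < '0' || name[i] > '9') return false;
    value = value * 10 + static_cast<uint64_t>(name[i] - '0');
  }
  *number = value;
  return true;
}

// Returns the options files that are older than the `keep` newest ones,
// i.e. the candidates for deletion. Other directory entries are ignored.
// Assumption: a larger number means a newer file.
std::vector<std::string> ObsoleteOptionsFiles(
    const std::vector<std::string>& dir_entries, size_t keep = 2) {
  using Numbered = std::pair<uint64_t, std::string>;
  std::vector<Numbered> found;
  for (const std::string& name : dir_entries) {
    uint64_t number = 0;
    if (ParseOptionsFileNumber(name, &number)) {
      found.emplace_back(number, name);
    }
  }
  // Sort newest (largest number) first.
  std::sort(found.begin(), found.end(),
            [](const Numbered& a, const Numbered& b) {
              return a.first > b.first;
            });
  std::vector<std::string> obsolete;
  for (size_t i = keep; i < found.size(); ++i) {
    obsolete.push_back(found[i].second);
  }
  return obsolete;
}
```

Given a directory listing that contains three options files, only the one with the smallest number is reported as obsolete; non-options files such as CURRENT are never touched.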
      
      Test Plan:
      Add options_file_test.
      
      options_test
      column_family_test
      
      Reviewers: igor, IslamAbdelRahman, sdong, anthony
      
      Reviewed By: anthony
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D48285
  2. 10 Nov, 2015 1 commit
    • N
      Switch to thread-local random for skiplist · b81b4309
      Nathan Bronson committed
      Summary:
      Using a TLS random instance for skiplist makes it smaller
      (useful for hash_skiplist_rep) and prepares skiplist for concurrent
      adds.  This diff also modifies the branching factor math to avoid an
      unnecessary division.
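
A rough sketch of both ideas, under stated assumptions: `HeightPicker` is a hypothetical class, `std::mt19937` stands in for RocksDB's own Random, and "avoiding the division" is read here as comparing each random draw against a precomputed threshold instead of computing `draw % branching` for every level.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical stand-in for the height choice of a skip list node.
class HeightPicker {
 public:
  HeightPicker(int max_height, uint32_t branching) : max_height_(max_height) {
    assert(max_height >= 1);
    assert(branching >= 2);
    // Divide once here: a 32-bit draw is below this threshold with
    // probability 1 / branching.
    threshold_ = static_cast<uint32_t>((uint64_t{1} << 32) / branching);
  }

  int RandomHeight() const {
    // One generator per thread, shared by all pickers on that thread,
    // so the picker itself carries no random state.
    thread_local std::mt19937 rng(std::random_device{}());
    int height = 1;
    // Grow with probability 1 / branching per level; only a comparison
    // is needed per draw.
    while (height < max_height_ && rng() < threshold_) {
      ++height;
    }
    return height;
  }

 private:
  int max_height_;
  uint32_t threshold_;
};
```

Because the generator is thread-local, changing how or when it is seeded changes the sequence of heights, which is the source of the test brittleness described below.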
      
      This diff has the effect of changing the sequence of skip list node
      height choices made by tests, so it has the potential to cause unit
      test failures for tests that implicitly rely on the exact structure
      of the skip list.  Tests that try to exactly trigger a compaction are
      likely suspects for this problem (these tests have always been brittle to
      changes in the skiplist details).  I've minimized this risk by reseeding
      the main thread's Random at the beginning of each test, increasing the
      universal compaction size_ratio limit from 101% to 105% for some tests,
      and verifying that the tests pass many times.
      
      Test Plan: for i in `seq 0 9`; do make check; done
      
      Reviewers: sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D50439
  3. 23 Oct, 2015 1 commit
  4. 19 Oct, 2015 1 commit
  5. 16 Oct, 2015 1 commit
  6. 14 Oct, 2015 2 commits
    • I
      Make db_test_util compile under ROCKSDB_LITE · f55d3009
      Islam AbdelRahman committed
      Summary: db_test_util is used in multiple test files but it doesn't compile under ROCKSDB_LITE
      
      Test Plan:
      make check
      make static_lib
      OPT=-DROCKSDB_LITE make db_wal_test
      
      Reviewers: igor, yhchiang, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D48579
    • S
      Separate InternalIterator from Iterator · 35ad531b
      sdong committed
      Summary:
      Separate a new class, InternalIterator, from class Iterator for look-ups that are done internally, which means it operates on keys with sequence ID and type.
      
      This change enables potential future optimizations, but for now InternalIterator's functions are still the same as Iterator's.
      At the same time, move the cleanup function into a separate class and let both InternalIterator and Iterator inherit from it.
      
      Test Plan: Run all existing tests.
      
      Reviewers: igor, yhchiang, anthony, kradhakrishnan, IslamAbdelRahman, rven
      
      Reviewed By: rven
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D48549
  7. 13 Oct, 2015 2 commits
  8. 07 Oct, 2015 1 commit
    • D
      Support for LevelDB SST with .ldb suffix · 02675026
      dyniusz committed
      Summary:
      Handle SST files with both ".sst" and ".ldb" suffixes.
      This enables users to migrate from leveldb to rocksdb.
      
      Test Plan:
      Added a unit test with a DB operating on SSTs that use both naming schemes.
      See db/db_test.cc:SSTsWithLdbSuffixHandling for details
      
      Reviewers: yhchiang, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D48003
  9. 24 Sep, 2015 1 commit
    • S
      PlainTableReader to support non-mmap mode · df34aea3
      sdong committed
      Summary:
      PlainTableReader currently only allows mmap mode. Add support for non-mmap mode for more flexibility.
      Refactor the code to move all data-reading logic into PlainTableKeyDecoder, and consolidate the reads into the Read() and ReadVarint32() calls. Implement these calls separately for the mmap and non-mmap cases. For non-mmap mode, make copies of keys in several places where the buffer needs to move after the keys are read.
      
      Test Plan: Add the non-mmap mode to plain_table_db_test. Run it under valgrind too.
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D47187
  10. 18 Sep, 2015 1 commit
    • A
      Support for SingleDelete() · 014fd55a
      Andres Noetzli committed
      Summary:
      This patch fixes #7460559. It introduces SingleDelete as a new database
      operation. This operation can be used to delete keys that were never
      overwritten (no put following another put of the same key). If an overwritten
      key is single deleted the behavior is undefined. Single deletion of a
      non-existent key has no effect but multiple consecutive single deletions are
      not allowed (see limitations).
      
      In contrast to the conventional Delete() operation, the deletion entry is
      removed along with the value when the two are lined up in a compaction. Note:
      The semantics are similar to @igor's prototype that allowed this
      behavior at the granularity of a column family (
      https://reviews.facebook.net/D42093 ). This new patch, however, is more
      aggressive when it comes to removing tombstones: It removes the SingleDelete
      together with the value whenever there is no snapshot between them while the
      older patch only did this when the sequence number of the deletion was older
      than the earliest snapshot.
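
The removal rule described above can be modelled roughly as follows. This is a simplified sketch for the entries of a single key, not the actual Compaction Iterator: `Entry`, `SnapshotBetween` and `CompactKey` are hypothetical names, and a snapshot taken at sequence number s is assumed to see every entry with a sequence number less than or equal to s.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

enum class EntryType { kPut, kSingleDelete };

// One version of a key as seen by a compaction (hypothetical model).
struct Entry {
  uint64_t seq;
  EntryType type;
  std::string value;
};

// True if some snapshot sees the older entry but not the newer one.
bool SnapshotBetween(const std::vector<uint64_t>& snapshots,
                     uint64_t older_seq, uint64_t newer_seq) {
  for (uint64_t s : snapshots) {
    if (s >= older_seq && s < newer_seq) return true;
  }
  return false;
}

// Compacts the versions of one key, given newest first. A SingleDelete
// directly followed by a Put is dropped together with that Put unless a
// snapshot sits between them. Everything else is kept unchanged.
std::vector<Entry> CompactKey(const std::vector<Entry>& newest_first,
                              const std::vector<uint64_t>& snapshots) {
  std::vector<Entry> out;
  size_t i = 0;
  while (i < newest_first.size()) {
    const Entry& cur = newest_first[i];
    const bool has_next = i + 1 < newest_first.size();
    // The input must be ordered by strictly decreasing sequence number.
    assert(!has_next || newest_first[i + 1].seq < cur.seq);
    if (cur.type == EntryType::kSingleDelete && has_next &&
        newest_first[i + 1].type == EntryType::kPut &&
        !SnapshotBetween(snapshots, newest_first[i + 1].seq, cur.seq)) {
      i += 2;  // Drop the tombstone and the value it deletes.
      continue;
    }
    out.push_back(cur);
    ++i;
  }
  return out;
}
```

In this model a snapshot that is newer than the SingleDelete does not prevent the removal, which matches the more aggressive behavior described above; only a snapshot that still needs the old value keeps both entries alive.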
      
      Most of the complex additions are in the Compaction Iterator, all other changes
      should be relatively straightforward. The patch also includes basic support for
      single deletions in db_stress and db_bench.
      
      Limitations:
      - Not compatible with cuckoo hash tables
      - Single deletions cannot be used in combination with merges and normal
        deletions on the same key (other keys are not affected by this)
      - Consecutive single deletions are currently not allowed (an older version of
        this patch supported this, so it could be resurrected if needed)
      
      Test Plan: make all check
      
      Reviewers: yhchiang, sdong, rven, anthony, yoshinorim, igor
      
      Reviewed By: igor
      
      Subscribers: maykov, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43179
  11. 09 Sep, 2015 1 commit
    • A
      Added Equal method to Comparator interface · 6bdc484f
      Andres Noetzli committed
      Summary:
      In some cases, equality comparisons can be done more efficiently than three-way
      comparisons. There are quite a few places in the code where we only care about
      equality. This patch adds an Equal() method that defaults to using the
      Compare() method.
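
The shape of such an interface might look like the sketch below. It is only an illustration: `std::string` stands in for RocksDB's Slice, and `BytewiseComparator`, `ReverseComparator` and `SameKey` are written here as hypothetical examples rather than copied from the patch.

```cpp
#include <cassert>
#include <string>

// Sketch of a comparator interface with an equality shortcut.
class Comparator {
 public:
  virtual ~Comparator() = default;

  // Three-way comparison: negative, zero, or positive.
  virtual int Compare(const std::string& a, const std::string& b) const = 0;

  // Defaults to the three-way comparison, so existing comparators keep
  // working without changes.
  virtual bool Equal(const std::string& a, const std::string& b) const {
    return Compare(a, b) == 0;
  }
};

// Overrides Equal() with a direct equality check, which can stop early
// (for example when the lengths differ) instead of ordering the keys.
class BytewiseComparator : public Comparator {
 public:
  int Compare(const std::string& a, const std::string& b) const override {
    return a.compare(b);
  }
  bool Equal(const std::string& a, const std::string& b) const override {
    return a == b;
  }
};

// Only implements Compare(); Equal() falls back to the default.
class ReverseComparator : public Comparator {
 public:
  int Compare(const std::string& a, const std::string& b) const override {
    return b.compare(a);
  }
};

// Caller that only cares about equality. The assert documents the
// contract that an overridden Equal() must agree with Compare().
bool SameKey(const Comparator& cmp, const std::string& a,
             const std::string& b) {
  assert(cmp.Equal(a, b) == (cmp.Compare(a, b) == 0));
  return cmp.Equal(a, b);
}
```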
      
      Test Plan: make clean all check
      
      Reviewers: rven, anthony, yhchiang, igor, sdong
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D46233
  12. 01 Sep, 2015 2 commits
    • A
      Add Subcompactions to Universal Compaction Unit Tests · 8b689546
      Ari Ekmekji committed
      Summary:
      Now that the approach to parallelizing L0-L1 level-based
      compactions by breaking the compaction job into subcompactions is
      being extended to apply to universal compactions as well, the unit
      tests need to account for this and run the universal compaction
      tests with subcompactions both enabled and disabled.
      
      Test Plan: make all && make check
      
      Reviewers: sdong, igor, noetzli, anthony, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D45657
    • S
      Arena usage to be calculated using malloc_usable_size() · 3d78eb66
      sdong committed
      Summary: malloc_usable_size() gives a better estimate of memory usage. It is already used to calculate block cache memory usage. Use it in arena too.
      
      Test Plan: Run all unit tests
      
      Reviewers: anthony, kradhakrishnan, rven, IslamAbdelRahman, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D43317
  13. 27 Aug, 2015 1 commit
    • I
      ReadaheadRandomAccessFile -- userspace readahead · 5f4166c9
      Igor Canadi committed
      Summary:
      ReadaheadRandomAccessFile acts as a transparent layer on top of RandomAccessFile. When a Read() request is issued, it issues a much bigger request to the OS and caches the result. When a new request comes in and we already have the data cached, it doesn't have to issue any requests to the OS.
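
The caching behavior described above can be sketched like this. It is a toy model under stated assumptions: `InMemoryFile` and `ReadaheadFile` are hypothetical classes, the "OS" is an in-memory string that counts how often it is asked for data, and the real class works on RandomAccessFile with a different Read() signature.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Stand-in for the underlying file: serves reads from memory and counts
// how many requests reach it.
class InMemoryFile {
 public:
  explicit InMemoryFile(std::string data) : data_(std::move(data)) {}

  std::string Read(size_t offset, size_t n) {
    ++reads_;
    if (offset >= data_.size()) return std::string();
    return data_.substr(offset, n);
  }

  int reads() const { return reads_; }

 private:
  std::string data_;
  int reads_ = 0;
};

// Transparent layer that turns small reads into larger ones and serves
// later requests from the cached bytes when possible.
class ReadaheadFile {
 public:
  ReadaheadFile(InMemoryFile* file, size_t readahead_size)
      : file_(file), readahead_size_(readahead_size) {
    assert(file != nullptr);
    assert(readahead_size > 0);
  }

  std::string Read(size_t offset, size_t n) {
    // Cache hit: the requested range lies entirely inside the buffer.
    if (offset >= buffer_offset_ &&
        offset + n <= buffer_offset_ + buffer_.size()) {
      return buffer_.substr(offset - buffer_offset_, n);
    }
    // Cache miss: ask the underlying file for a bigger chunk and keep it.
    buffer_ = file_->Read(offset, std::max(n, readahead_size_));
    buffer_offset_ = offset;
    return buffer_.substr(0, n);
  }

 private:
  InMemoryFile* file_;
  size_t readahead_size_;
  std::string buffer_;
  size_t buffer_offset_ = 0;
};
```

With a readahead size of 8, reading 3 bytes at offset 0 and then 4 bytes at offset 3 reaches the underlying file only once; a read that runs past the cached range triggers a second request.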
      
      We add the ReadaheadRandomAccessFile layer only when a file is read during compaction.
      
      D45105 was incorrectly closed by Phabricator because I committed it to a separate branch (not master), so I'm resubmitting the diff.
      
      Test Plan: make check
      
      Reviewers: MarkCallaghan, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D45123
  14. 26 Aug, 2015 1 commit
    • A
      Fix compact_files_example · 09d982f9
      Andres Notzli committed
      Summary:
      See task #7983654. The example was triggering an assert in compaction job
      because the compaction was not marked as manual. With this patch,
      CompactionPicker::FormCompaction() marks compactions as manual. This patch
      also fixes a couple of typos, adds optimistic_transaction_example to
      .gitignore and librocksdb as a dependency for examples. Adding librocksdb as
      a dependency makes sure that the examples are built with the latest changes
      in librocksdb.
      
      Test Plan: make clean && cd examples && make all && ./compact_files_example
      
      Reviewers: rven, sdong, anthony, igor, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D45117
  15. 22 Aug, 2015 1 commit
    • A
      Changed 'num_subcompactions' to the more accurate 'max_subcompactions' · b6def58f
      Ari Ekmekji committed
      Summary:
      Up until this point we had DbOptions.num_subcompactions, but
      it is semantically more correct to call this max_subcompactions since
      we will schedule *up to* DbOptions.max_subcompactions smaller compactions
      at a time during a compaction job.
      
      I also added a --subcompactions option to db_bench
      
      Test Plan: make all   make check
      
      Reviewers: sdong, igor, anthony, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D45069
  16. 21 Aug, 2015 1 commit
    • S
      Add options.new_table_reader_for_compaction_inputs · 9130873a
      sdong committed
      Summary: Currently compaction inputs share the same file descriptor and table reader as other foreground threads. This makes fadvise behave less predictably. Add options.new_table_reader_for_compaction_inputs to force the creation of a new file descriptor and a new table reader for compaction inputs.
      
      Test Plan: Add the option.
      
      Reviewers: rven, anthony, kradhakrishnan, IslamAbdelRahman, igor, yhchiang
      
      Reviewed By: igor
      
      Subscribers: igor, MarkCallaghan, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D43311
  17. 05 Aug, 2015 2 commits
    • A
      Update Tests To Enable Subcompactions · 5dc3e688
      Ari Ekmekji committed
      Summary:
      Updated DBTest, DBCompactionTest, and CompactionJobStatsTest
      to run compaction-related tests once with subcompactions enabled and
      once disabled using the TEST_P test type in the Google Test suite.
      
      Test Plan: ./db_test  ./db_compaction_test  ./compaction_job_stats_test
      
      Reviewers: sdong, igor, anthony, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43443
    • Y
      Add DBOptions::skip_stats_update_on_db_open · 14d0bfa4
      Yueh-Hsuan Chiang committed
      Summary:
      UpdateAccumulatedStats() is used to optimize compaction decisions,
      especially when the number of deletion entries is high, but this function
      can slow down DB::Open, especially in disk environments.
      
      This patch adds DBOptions::skip_stats_update_on_db_open, which skips
      UpdateAccumulatedStats() during DB::Open() when it's set to true.
      
      Test Plan: Add DBCompactionTest.SkipStatsUpdateTest
      
      Reviewers: igor, anthony, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: tnovak, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D42843
  18. 04 Aug, 2015 1 commit
    • A
      Parallelize L0-L1 Compaction: Restructure Compaction Job · 40c64434
      Ari Ekmekji committed
      Summary:
      As of now compactions involving files from Level 0 and Level 1 are single
      threaded because the files in L0, although sorted, are not range partitioned like
      the other levels. This means that during L0-L1 compaction each file from L1
      needs to be merged with potentially all the files from L0.
      
      This attempt to parallelize the L0-L1 compaction assigns a thread and a
      corresponding iterator to each L1 file that then considers only the key range
      found in that L1 file and only the L0 files that have those keys (and only the
      specific portion of those L0 files in which those keys are found). In this way
      the overlap is minimized and potentially eliminated between different iterators
      focusing on the same files.
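
The partitioning idea above can be illustrated with a small planning function. This is a sketch, not the compaction job itself: `FileRange`, `Overlaps` and `PlanSubcompactions` are hypothetical names, keys are plain strings, and keys that fall into gaps between L1 files are not handled here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Inclusive key range covered by one file (hypothetical model).
struct FileRange {
  std::string smallest;
  std::string largest;
};

bool Overlaps(const FileRange& a, const FileRange& b) {
  return a.smallest <= b.largest && b.smallest <= a.largest;
}

// For each L1 file, lists the indexes of the L0 files whose key range
// overlaps it. Each inner list is the input set for one smaller job that
// is restricted to that L1 file's key range.
std::vector<std::vector<size_t>> PlanSubcompactions(
    const std::vector<FileRange>& l0, const std::vector<FileRange>& l1) {
  std::vector<std::vector<size_t>> plan(l1.size());
  for (size_t i = 0; i < l1.size(); ++i) {
    // L1 files are expected to be sorted and range partitioned.
    assert(i == 0 || l1[i - 1].largest < l1[i].smallest);
    for (size_t j = 0; j < l0.size(); ++j) {
      if (Overlaps(l0[j], l1[i])) {
        plan[i].push_back(j);
      }
    }
  }
  return plan;
}
```

An L0 file that spans several L1 files shows up in several jobs, but each job only reads the portion of it that falls inside its own L1 key range.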
      
      The first step is to restructure the compaction logic to break L0-L1 compactions
      into multiple, smaller, sequential compactions. Eventually each of these smaller
      jobs will be run simultaneously. Areas to pay extra attention to are
      
        # Correct aggregation of compaction job statistics across multiple threads
        # Proper opening/closing of output files (make sure each thread's is unique)
        # Keys that span multiple L1 files
        # Skewed distributions of keys within L0 files
      
      Test Plan: Make and run db_test (newer version has separate compaction tests) and compaction_job_stats_test
      
      Reviewers: igor, noetzli, anthony, sdong, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: MarkCallaghan, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D42699
  19. 22 Jul, 2015 1 commit
    • S
      Tests to avoid using TMPDIR directly · 85ac6553
      sdong committed
      Summary: Directly using TMPDIR can cause problems when running tests with the parallel option. Fix them.
      
      Test Plan: Run all tests in parallel
      
      Reviewers: kradhakrishnan, yhchiang, IslamAbdelRahman, anthony
      
      Reviewed By: anthony
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D42807
  20. 18 Jul, 2015 1 commit
    • I
      Don't let flushes preempt compactions · 35ca5936
      Igor Canadi committed
      Summary:
      When we first started, max_background_flushes was 0 by default and compaction thread was executing flushes (since there was no flush thread). Then, we switched the default max_background_flushes to 1. However, we still support the case where there is no flush thread and flushes are done in compaction. This is making our code a bit more complicated. By not supporting this use-case we can make our code simpler.
      
      We have a special case that when you set max_background_flushes to 0, we
      schedule the flush to execute on the compaction thread.
      
      Test Plan: make check (there might be some unit tests that depend on this behavior)
      
      Reviewers: IslamAbdelRahman, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D41931
  21. 16 Jul, 2015 1 commit
    • P
      Fixing delete files in Trivial move of universal compaction · beb19ad0
      Poornima Chozhiyath Raman committed
      Summary:
      Trivial move in universal compaction was failing when trying to move files from levels other than 0.
      This was because DeleteFile, during a trivial move, only deleted files from level 0, which caused the same file to be duplicated in different levels.
      This is fixed by passing the right level as an argument in the call to DeleteFile when doing a trivial move.
      
      Test Plan: ./db_test ran successfully with the new test cases.
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D42135
  22. 15 Jul, 2015 2 commits
  23. 14 Jul, 2015 2 commits
    • I
      Deprecate purge_redundant_kvs_while_flush · a9c51095
      Igor Canadi committed
      Summary: This option guards a feature implemented two and a half years ago: D8991. The feature was enabled by default back then and has been running without issues. There is no reason why any client would turn this feature off. I found no reference in fbcode.
      
      Test Plan: none
      
      Reviewers: sdong, yhchiang, anthony, dhruba
      
      Reviewed By: dhruba
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D42063
    • Y
      Move reusable part of db_test.cc to util/db_test_util.h · 625467a0
      Yueh-Hsuan Chiang committed
      Summary:
      Move reusable part of db_test.cc to util/db_test_util.h.
      This makes it easier to partition db_test.cc into
      multiple smaller test files.
      
      Also, fixed many old lint errors in db_test.
      
      Test Plan: db_test
      
      Reviewers: igor, anthony, IslamAbdelRahman, sdong, kradhakrishnan
      
      Reviewed By: sdong, kradhakrishnan
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D41973