1. 12 Aug, 2015 5 commits
    • A
      Pessimistic Transactions · c2f2cb02
      agiardullo committed
      Summary:
      Initial implementation of Pessimistic Transactions.  This diff contains the API changes discussed in D38913.  This diff is pretty large, so let me know if people would prefer to meet up to discuss it.
      
      MyRocks folks:  please take a look at the API in include/rocksdb/utilities/transaction[_db].h and let me know if you have any issues.
      
      Also, you'll notice a couple of TODOs in the implementation of RollbackToSavePoint().  After chatting with Siying, I'm going to send out a separate diff for an alternate implementation of this feature that implements the rollback inside of WriteBatch/WriteBatchWithIndex.  We can then decide which route is preferable.
      
      Next, I'm planning on doing some perf testing and then integrating this diff into MongoRocks for further testing.
      
      Test Plan: Unit tests, db_bench parallel testing.
      
      Reviewers: igor, rven, sdong, yhchiang, yoshinorim
      
      Reviewed By: sdong
      
      Subscribers: hermanlee4, maykov, spetrunia, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D40869
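
Illustrative sketch only: the summary above does not describe the locking scheme, so this assumes the usual meaning of "pessimistic" concurrency control, where a transaction locks a key before writing it and a conflicting writer is refused. `ToyLockTable` and its methods are hypothetical names, not the API in include/rocksdb/utilities/transaction[_db].h.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy lock table; NOT the RocksDB TransactionDB API.
class ToyLockTable {
 public:
  // Tries to lock `key` for transaction `txn_id`.
  // Succeeds if the key is unlocked or already held by the same transaction.
  bool TryLock(const std::string& key, int txn_id) {
    auto it = owner_.find(key);
    if (it == owner_.end()) {
      owner_[key] = txn_id;
      return true;
    }
    return it->second == txn_id;
  }

  // Releases every lock held by `txn_id` (on commit or rollback).
  void UnlockAll(int txn_id) {
    for (auto it = owner_.begin(); it != owner_.end();) {
      if (it->second == txn_id) {
        it = owner_.erase(it);
      } else {
        ++it;
      }
    }
  }

 private:
  std::map<std::string, int> owner_;  // key -> id of the owning transaction
};
```

In this sketch a second transaction's `TryLock` on a held key returns false immediately; a real implementation might instead wait with a timeout.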
    • I
      Use manual_compaction for compaction_job_test · c2868cbc
      Islam AbdelRahman committed
      Summary:
      Under certain conditions (e.g. with compression disabled), the compactions created in compaction_job_test satisfy the trivial_move conditions.
      This causes problems because we assert that we don't run a compaction if it's a trivial move:
      https://github.com/facebook/rocksdb/blob/master/db/compaction_job.cc#L144-L147
      
      For example, when we disable compression, the compactions become valid trivial moves and the assert fails:
      https://ci-builds.fb.com/view/rocksdb/job/rocksdb_no_compression/180/console
      
      Test Plan: compaction_job_test
      
      Reviewers: sdong, yhchiang, noetzli, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D43983
    • I
      Parallelize LoadTableHandlers · cee1e8a0
      Islam AbdelRahman committed
      Summary: Add a new option that allows LoadTableHandlers to use multiple threads to load files on DB Open and Recover.
      
      Test Plan:
      make check -j64
      COMPILE_WITH_TSAN=1 make check -j64
      DISABLE_JEMALLOC=1 make all valgrind_check -j64 (still running)
      
      Reviewers: yhchiang, anthony, rven, kradhakrishnan, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D43755
    • A
      Removing duplicate code in db_bench/db_stress, fixing typos · 4249f159
      Andres Notzli committed
      Summary:
      While working on single delete support for db_bench, I realized that
      db_bench/db_stress contain a bunch of duplicate code related to
      compression and found some typos. This patch removes duplicate code,
      typos and a redundant #ifndef in internal_stats.cc.
      
      Test Plan: make db_stress && make db_bench && ./db_bench --benchmarks=compress,uncompress
      
      Reviewers: yhchiang, sdong, rven, anthony, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43965
    • N
      reduce comparisons by skiplist · 1ae27113
      Nathan Bronson committed
      Summary:
      Key comparison is the single largest CPU user for CPU-bound
      workloads. This diff reduces the number of comparisons in two ways.
      
      The first is that it moves predecessor array gathering from
      FindGreaterOrEqual to FindLessThan, so that FindGreaterOrEqual can
      return immediately if compare_ returns 0.  As part of this change I
      moved the sequential insertion optimization into Insert, to remove the
      undocumented (and smelly) requirement that prev must be equal to prev_
      if it is non-null.
      
      The second optimization is that all of the search functions skip calling
      compare_ when moving to a lower level that has the same Next pointer.
      With a branching factor of 4 we would expect this to happen 1/4 of
      the time.
      
      On a single-threaded CPU-bound workload (-benchmarks=fillrandom -threads=1
      -batch_size=1 -memtablerep=skip_list -value_size=0 --num=1600000
      -level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999
      -disable_auto_compactions --max_write_buffer_number=8
      -max_background_flushes=8 --disable_wal --write_buffer_size=160000000)
      on my dev server this is good for a 7% perf win.
      
      Test Plan: unit tests
      
      Reviewers: rven, ljin, yhchiang, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D43233
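
Illustrative sketch only (not the RocksDB skiplist): the second optimization described above can be shown on a toy list. When the search drops to a lower level and the next pointer is the same node that already compared as "not less than the target", the comparison is skipped. `Node`, `CountingLess` and the `skip_known` flag are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy skiplist node: next[i] is the successor at level i.
struct Node {
  int key;
  std::vector<Node*> next;
  Node(int k, int height) : key(k), next(height, nullptr) {}
};

// Hypothetical comparator that counts its calls so the saving is observable.
struct CountingLess {
  std::size_t calls = 0;
  bool operator()(int a, int b) {
    ++calls;
    return a < b;
  }
};

// Returns the first node with key >= target, or nullptr if there is none.
// With skip_known == true, a node that already stopped the search on a
// higher level is not compared again after moving down a level.
Node* FindGreaterOrEqual(Node* head, int target, bool skip_known,
                         CountingLess& less) {
  Node* x = head;
  Node* last_bigger = nullptr;
  int level = static_cast<int>(head->next.size()) - 1;
  while (true) {
    Node* next = x->next[level];
    bool next_is_after = (next == nullptr) ||
                         (skip_known && next == last_bigger) ||
                         !less(next->key, target);
    if (!next_is_after) {
      x = next;  // next->key < target: keep moving right on this level
    } else if (level == 0) {
      return next;
    } else {
      last_bigger = next;  // remember the node that stopped us
      --level;
    }
  }
}
```

Calling the function twice on the same hand-built list, once with `skip_known` set and once without, and comparing `CountingLess::calls` shows how many comparisons the shortcut avoids for a given lookup.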
  2. 11 Aug, 2015 1 commit
  3. 08 Aug, 2015 1 commit
    • A
      Better CompactionJob testing · 68f93435
      Andres Notzli committed
      Summary:
      Changed compaction_job_test to support better/more thorough
      tests and added two tests. Also changed MockFileContents
      to order using InternalKeyComparator.
      
      Test Plan: make compaction_job_test && ./compaction_job_test; make all && make check
      
      Reviewers: sdong, rven, igor, yhchiang, anthony
      
      Reviewed By: anthony
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D42837
  4. 07 Aug, 2015 3 commits
    • A
      simple ManagedSnapshot wrapper · 16ea1c7d
      agiardullo committed
      Summary: Implemented this simple wrapper for something else I was working on.  Seemed like it makes sense to expose it instead of burying it in some random code.
      
      Test Plan: added test
      
      Reviewers: rven, kradhakrishnan, sdong, yhchiang
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43293
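
Illustrative sketch only: the summary just calls this a "simple wrapper", so the example assumes an RAII-style guard that acquires a snapshot in its constructor and releases it in its destructor. `ToyDB` and `ToyManagedSnapshot` are hypothetical stand-ins, not the RocksDB classes.

```cpp
#include <cassert>

// Hypothetical stand-in for a database that hands out snapshots.
class ToyDB {
 public:
  int GetSnapshot() {
    ++live_;
    return next_id_++;
  }
  void ReleaseSnapshot(int /*snapshot_id*/) { --live_; }
  int live_snapshots() const { return live_; }

 private:
  int live_ = 0;
  int next_id_ = 1;
};

// RAII guard: the snapshot is released when the guard goes out of scope,
// even if the enclosing function returns early.
class ToyManagedSnapshot {
 public:
  explicit ToyManagedSnapshot(ToyDB* db)
      : db_(db), snapshot_(db->GetSnapshot()) {}
  ~ToyManagedSnapshot() { db_->ReleaseSnapshot(snapshot_); }

  // Copying would release the same snapshot twice, so it is disabled.
  ToyManagedSnapshot(const ToyManagedSnapshot&) = delete;
  ToyManagedSnapshot& operator=(const ToyManagedSnapshot&) = delete;

  int snapshot() const { return snapshot_; }

 private:
  ToyDB* db_;
  int snapshot_;
};
```

The point of such a wrapper is that callers cannot forget the matching release call.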
    • S
      Avoid type unique_ptr in LogWriterNumber::writer for Windows build break · 6a4aaadc
      sdong committed
      Summary:
      Visual Studio complains about deque<LogWriterNumber> because LogWriterNumber is non-copyable due to its unique_ptr member writer. Move away from unique_ptr and free the writer explicitly.
      It is less safe but I can't think of a better way to unblock it.
      
      Test Plan: valgrind check test
      
      Reviewers: anthony, IslamAbdelRahman, kolmike, rven, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D43647
    • A
      Fixing endless loop if seeking to end of key with seq num 0 · d7314ba7
      Andres Noetzli committed
      Summary:
      When seeking to the last occurrence of a key with sequence number 0, db_iter
      ends up in an endless loop because it seeks to type kValueTypeForSeek
      which is larger than kTypeDeletion/kTypeValue. Added test case that triggers
      the behavior.
      
      Test Plan: make clean all check
      
      Reviewers: igor, rven, anthony, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43653
  5. 06 Aug, 2015 7 commits
    • I
      Make DeleteScheduler tests more reliable · 29b028b0
      Islam AbdelRahman committed
      Summary: Update DeleteScheduler tests so that they verify the penalties used for waiting instead of measuring the time spent, which is not reliable.
      
      Test Plan:
      make -j64 delete_scheduler_test && ./delete_scheduler_test
      COMPILE_WITH_TSAN=1 make -j64 delete_scheduler_test && ./delete_scheduler_test
      COMPILE_WITH_ASAN=1 make -j64 delete_scheduler_test && ./delete_scheduler_test
      
      make -j64 db_test && ./db_test --gtest_filter="DBTest.RateLimitedDelete:DBTest.DeleteSchedulerMultipleDBPaths"
      COMPILE_WITH_TSAN=1 make -j64 db_test && ./db_test --gtest_filter="DBTest.RateLimitedDelete:DBTest.DeleteSchedulerMultipleDBPaths"
      COMPILE_WITH_ASAN=1 make -j64 db_test && ./db_test --gtest_filter="DBTest.RateLimitedDelete:DBTest.DeleteSchedulerMultipleDBPaths"
      
      Reviewers: yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D43635
    • P
      Fix build failure · 7d364d0d
      Poornima Chozhiyath Raman committed
      Summary: fix the build failure
      
      Test Plan: make all
      
      Reviewers: sdong, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43623
    • P
      Add function 'GetInfoLogList()' · 960d936e
      Poornima Chozhiyath Raman committed
      Summary: The list of info log files of a db can be obtained using the new function.
      
      Test Plan: New test in db_test.cc passed.
      
      Reviewers: yhchiang, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: IslamAbdelRahman, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D41715
    • S
      Add two unit tests for SyncWAL() · 7ccd1c80
      sdong committed
      Summary:
      Add two unit tests for SyncWAL(). One makes sure SyncWAL() doesn't block writes in the other thread. The other makes sure SyncWAL() doesn't wait for ongoing writes to finish before being executed.
      
      Create a new test file db_wal_test and move two WAL related tests from db_test to here.
      
      Test Plan: Run the new tests
      
      Reviewers: IslamAbdelRahman, rven, kradhakrishnan, kolmike, tnovak, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D43605
    • S
      Add statistic histogram "rocksdb.sst.read.micros" · 3ae386ea
      sdong committed
      Summary: Measure the read latency histogram and put it in statistics. Compaction inputs are excluded from it when possible (unfortunately this is usually not possible, as we usually take the table reader from the table cache).
      
      Test Plan:
      Run db_bench and it shows the stats, like:
      
      rocksdb.sst.read.micros statistics Percentiles :=> 50 : 1.238522 95 : 2.529740 99 : 3.912180
      
      Reviewers: kradhakrishnan, rven, anthony, IslamAbdelRahman, MarkCallaghan, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D43275
    • I
      Enable DBTest.FlushSchedule under TSAN · 9aec75fb
      Islam AbdelRahman committed
      Summary: This patch fixes the false positive of DBTest.FlushSchedule under TSAN, so we don't need to disable this test.
      
      Test Plan: COMPILE_WITH_TSAN=1 make -j64 db_test && ./db_test --gtest_filter="DBTest.FlushSchedule"
      
      Reviewers: yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D43599
    • S
      Fix misplaced position for reversing iterator direction while current key is a merge · 8e01bd11
      sdong committed
      Summary:
      While iterating forward, if the current key is a merge, the internal iterator is positioned at the next key. If Prev() is called at that point, an extra Prev() is needed to recover the location.
      This is the second attempt at this fix after reverting ec70fea4. This time the fix is narrowed to the case where the current key is a merge key, and it avoids the reseeking logic for max_iterating skipping.
      
      Test Plan: enable the two disabled tests and make sure they pass
      
      Reviewers: rven, IslamAbdelRahman, kradhakrishnan, tnovak, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D43557
  6. 05 Aug, 2015 6 commits
    • A
      Removing duplicate code · c4650710
      Andres Notzli committed
      Summary:
      While working on https://reviews.facebook.net/D43179 , I found
      duplicate code in the tests. This patch removes it.
      
      Test Plan: make clean all check
      
      Reviewers: igor, sdong, rven, anthony, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43263
    • M
      [wal changes 3/3] method in DB to sync WAL without blocking writers · e06cf1a0
      Mike Kolupaev committed
      Summary:
      Subj. We really need this feature.
      
      Previous diff D40899 has most of the changes to make this possible, this diff just adds the method.
      
      Test Plan: `make check`, the new test fails without this diff; ran with ASAN, TSAN and valgrind.
      
      Reviewers: igor, rven, IslamAbdelRahman, anthony, kradhakrishnan, tnovak, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: MarkCallaghan, maykov, hermanlee4, yoshinorim, tnovak, dhruba
      
      Differential Revision: https://reviews.facebook.net/D40905
    • A
      Update Tests To Enable Subcompactions · 5dc3e688
      Ari Ekmekji committed
      Summary:
      Updated DBTest, DBCompactionTest and CompactionJobStatsTest
      to run compaction-related tests once with subcompactions enabled and
      once disabled using the TEST_P test type in the Google Test suite.
      
      Test Plan: ./db_test  ./db_compaction_test  ./compaction_job_stats_test
      
      Reviewers: sdong, igor, anthony, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43443
    • I
      Support delete rate limiting · c45a57b4
      Islam AbdelRahman committed
      Summary:
      Introduce DeleteScheduler, which allows enforcing a rate limit on file deletion.
      Instead of being deleted immediately, files are moved to a trash directory and deleted in a background thread that applies a sleep penalty between deletes if needed.
      
      I have updated PurgeObsoleteFiles and PurgeObsoleteWALFiles to use the delete_scheduler instead of env_->DeleteFile
      
      Test Plan:
      added delete_scheduler_test
      existing unit tests
      
      Reviewers: kradhakrishnan, anthony, rven, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D43221
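
Illustrative sketch only: one way a sleep penalty could be derived from a byte-per-second limit. The summary does not give the formula, so the calculation below (total bytes deleted so far divided by the allowed rate) is an assumption, and `CumulativePenaltiesMicros` is a hypothetical helper, not part of DeleteScheduler.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// For each file in deletion order, returns the total time in microseconds
// that should have elapsed once that file is deleted, so that the overall
// deletion rate stays at or below rate_bytes_per_sec.
std::vector<uint64_t> CumulativePenaltiesMicros(
    const std::vector<uint64_t>& file_sizes, uint64_t rate_bytes_per_sec) {
  assert(rate_bytes_per_sec > 0);
  const uint64_t kMicrosPerSecond = 1000000;
  std::vector<uint64_t> penalties;
  uint64_t total_deleted_bytes = 0;
  for (uint64_t size : file_sizes) {
    total_deleted_bytes += size;
    penalties.push_back(total_deleted_bytes * kMicrosPerSecond /
                        rate_bytes_per_sec);
  }
  return penalties;
}
```

A background thread would sleep until the returned deadline before moving on to the next file in the trash directory. Checking these computed values, rather than wall-clock time, is also easier to test deterministically.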
    • Y
      Make DBCompactionTest.SkipStatsUpdateTest more stable. · 241bb2ae
      Yueh-Hsuan Chiang committed
      Summary:
      Make DBCompactionTest.SkipStatsUpdateTest more stable by
      removing a flaky and unnecessary assertion on the size of the db,
      as simply checking the random file open count suffices.
      
      Test Plan: db_compaction_test
      
      Reviewers: igor, anthony, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43533
    • Y
      Add DBOptions::skip_stats_update_on_db_open · 14d0bfa4
      Yueh-Hsuan Chiang committed
      Summary:
      UpdateAccumulatedStats() is used to optimize compaction decisions,
      especially when the number of deletion entries is high, but this function
      can slow down DB::Open(), especially in disk environments.
      
      This patch adds DBOptions::skip_stats_update_on_db_open, which skips
      UpdateAccumulatedStats() during DB::Open() when it is set to true.
      
      Test Plan: Add DBCompactionTest.SkipStatsUpdateTest
      
      Reviewers: igor, anthony, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: tnovak, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D42843
  7. 04 Aug, 2015 3 commits
    • V
      Fix CompactFiles by adding all necessary files · 20b244fc
      Venkatesh Radhakrishnan committed
      Summary:
      The compact files API had a bug where some overlapping files
      are not added. These are files which overlap with files which were
      added to the compaction input files, but not to the original set of
      input files. This happens only when there are more than two levels
      involved in the compaction. An example will illustrate this better.
      
      Level 2 has 1 input file 1.sst which spans [20,30].
      
      Level 3 has added file  2.sst which spans [10,25]
      
      Level 4 has file 3.sst which spans [35,40] and
              input file 4.sst which spans [46,50].
      
      The existing code would not add 3.sst to the set of input_files because
      it only becomes an overlapping file in level 4 and it wasn't one in
      level 3.
      
      When installing the results of the compaction, 3.sst would overlap with
      output file from the compact files and result in the assertion in
      version_set.cc:1130
      
        // Must not overlap
        assert(level <= 0 || level_files->empty() ||
               internal_comparator_->Compare(
                   (*level_files)[level_files->size() - 1]->largest, f->smallest) <
                   0);
      
      This change now adds overlapping files from the current level to the set
      of input files also so that we don't hit the assertion above.
      
      Test Plan:
      d=/tmp/j; rm -rf $d; seq 1000 | parallel --gnu --eta
      'd=/tmp/j/d-{}; mkdir -p $d; TEST_TMPDIR=$d ./db_compaction_test
      --gtest_filter=*CompactilesOnLevel* --gtest_also_run_disabled_tests >&
      '$d'/log-{}'
      
      Reviewers: igor, yhchiang, sdong
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43437
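
Illustrative sketch only: the example from the summary replayed on a toy model. For each level, the key range is first widened with the requested input files on that level, and then every file on that level overlapping the range is added. `ToyFile` and `ExpandInputs` are hypothetical names, the map is assumed to hold only the levels taking part in the compaction, and the real CompactFiles code may differ in detail.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

struct ToyFile {
  std::string name;
  int smallest;
  int largest;
};

// levels: level number -> files on that level (ascending level order).
// inputs: names of the files the caller asked to compact.
std::set<std::string> ExpandInputs(
    const std::map<int, std::vector<ToyFile>>& levels,
    const std::set<std::string>& inputs) {
  std::set<std::string> result = inputs;
  bool have_range = false;
  int lo = 0, hi = 0;
  auto widen = [&](const ToyFile& f) {
    lo = have_range ? std::min(lo, f.smallest) : f.smallest;
    hi = have_range ? std::max(hi, f.largest) : f.largest;
    have_range = true;
  };
  for (const auto& entry : levels) {
    const std::vector<ToyFile>& files = entry.second;
    // Step 1: include the requested inputs of the current level in the range.
    for (const ToyFile& f : files) {
      if (inputs.count(f.name) > 0) widen(f);
    }
    if (!have_range) continue;  // still above the first input level
    // Step 2: add overlapping files on this level until nothing changes.
    bool changed = true;
    while (changed) {
      changed = false;
      for (const ToyFile& f : files) {
        bool overlaps = f.smallest <= hi && f.largest >= lo;
        if (overlaps && result.insert(f.name).second) {
          widen(f);
          changed = true;
        }
      }
    }
  }
  return result;
}
```

With the files from the summary and inputs 1.sst and 4.sst, step 1 on level 4 stretches the range to cover 4.sst, which is what lets 3.sst be picked up in step 2.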
    • V
      Make SuggestCompactRangeNoTwoLevel0Compactions deterministic · 87df6295
      Venkatesh Radhakrishnan committed
      Summary:
      Made SuggestCompactRangeNoTwoLevel0Compactions deterministic by forcing
      a flush after generating a file and waiting for compaction at the end.
      
      Test Plan: Run SuggestCompactRangeNoTwoLevel0Compactions
      
      Reviewers: yhchiang, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43449
    • A
      Parallelize L0-L1 Compaction: Restructure Compaction Job · 40c64434
      Ari Ekmekji committed
      Summary:
      As of now compactions involving files from Level 0 and Level 1 are single
      threaded because the files in L0, although sorted, are not range partitioned like
      the other levels. This means that during L0-L1 compaction each file from L1
      needs to be merged with potentially all the files from L0.
      
      This attempt to parallelize the L0-L1 compaction assigns a thread and a
      corresponding iterator to each L1 file that then considers only the key range
      found in that L1 file and only the L0 files that have those keys (and only the
      specific portion of those L0 files in which those keys are found). In this way
      the overlap is minimized and potentially eliminated between different iterators
      focusing on the same files.
      
      The first step is to restructure the compaction logic to break L0-L1 compactions
      into multiple, smaller, sequential compactions. Eventually each of these smaller
      jobs will be run simultaneously. Areas to pay extra attention to are
      
        # Correct aggregation of compaction job statistics across multiple threads
        # Proper opening/closing of output files (make sure each thread's is unique)
        # Keys that span multiple L1 files
        # Skewed distributions of keys within L0 files
      
      Test Plan: Make and run db_test (newer version has separate compaction tests) and compaction_job_stats_test
      
      Reviewers: igor, noetzli, anthony, sdong, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: MarkCallaghan, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D42699
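
Illustrative sketch only: the partitioning idea from the summary, reduced to key ranges. Each L1 file becomes one sub-job, and the sub-job lists only the L0 files whose key range overlaps that L1 file. `KeyRange` and `AssignL0ToL1` are hypothetical names, and keys that fall between L1 files are ignored here for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct KeyRange {
  int smallest;
  int largest;
};

// Returns one entry per L1 file: the indices of the L0 files that overlap it.
// Each entry could then be handed to its own thread and iterator.
std::vector<std::vector<std::size_t>> AssignL0ToL1(
    const std::vector<KeyRange>& l0, const std::vector<KeyRange>& l1) {
  std::vector<std::vector<std::size_t>> jobs(l1.size());
  for (std::size_t j = 0; j < l1.size(); ++j) {
    for (std::size_t i = 0; i < l0.size(); ++i) {
      bool overlaps = l0[i].smallest <= l1[j].largest &&
                      l0[i].largest >= l1[j].smallest;
      if (overlaps) jobs[j].push_back(i);
    }
  }
  return jobs;
}
```

An L0 file that spans the whole key space appears in every sub-job, which is why the summary calls out skewed key distributions as an area to watch.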
  8. 31 Jul, 2015 1 commit
    • A
      Fixing dead code in table_properties_collector_test · 193dc977
      Andres Notzli committed
      Summary:
      There was a bug in table_properties_collector_test that this patch
      is fixing: `!backward_mode && !test_int_tbl_prop_collector` in
      TestCustomizedTablePropertiesCollector was never true, so the code
      in the if-block never got executed. The reason is that the
      CustomizedTablePropertiesCollector test was skipping tests with
      `!backward_mode_ && !encode_as_internal`. The reason for skipping
      the tests is unknown.
      
      Test Plan: make table_properties_collector_test && ./table_properties_collector_test
      
      Reviewers: rven, igor, yhchiang, anthony, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43281
  9. 30 Jul, 2015 1 commit
    • A
      WriteBatch Save Points · 8161bdb5
      agiardullo committed
      Summary:
      Support RollbackToSavePoint() in WriteBatch and WriteBatchWithIndex.  Support for partial transaction rollback is needed for MyRocks.
      
      An alternate implementation of Transaction::RollbackToSavePoint() exists in D40869.  However, the other implementation is messier because it is implemented outside of WriteBatch.  This implementation is much cleaner and also exposes a potentially useful feature to WriteBatch.
      
      Test Plan: Added unit tests
      
      Reviewers: IslamAbdelRahman, kradhakrishnan, maykov, yoshinorim, hermanlee4, spetrunia, sdong, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D42723
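
Illustrative sketch only: how a save point can be implemented inside a batch of writes. The batch remembers its size at each save point, and a rollback truncates back to the most recent one. `ToyBatch` and `SetSavePoint` are names assumed for this example; only RollbackToSavePoint() is named in the summary, and the real WriteBatch stores serialized records rather than strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

class ToyBatch {
 public:
  void Put(const std::string& key, const std::string& value) {
    ops_.push_back(key + "=" + value);
  }

  // Records the current end of the batch as a rollback target.
  void SetSavePoint() { save_points_.push_back(ops_.size()); }

  // Drops every operation added since the most recent save point and
  // consumes that save point. Returns false if no save point exists.
  bool RollbackToSavePoint() {
    if (save_points_.empty()) return false;
    ops_.resize(save_points_.back());
    save_points_.pop_back();
    return true;
  }

  std::size_t Count() const { return ops_.size(); }

 private:
  std::vector<std::string> ops_;
  std::vector<std::size_t> save_points_;  // stack of batch sizes
};
```

Because save points form a stack, nested partial rollbacks fall out naturally: each rollback undoes only the work since the innermost save point.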
  10. 29 Jul, 2015 2 commits
  11. 28 Jul, 2015 1 commit
  12. 23 Jul, 2015 2 commits
  13. 22 Jul, 2015 3 commits
    • M
      [wal changes 2/3] write with sync=true syncs previous unsynced wals to prevent illegal data loss · fe09a6da
      Mike Kolupaev committed
      Summary:
      I'll just copy internal task summary here:
      
      "
      This sequence will cause data loss in the middle after a sync write:
      
      non-sync write key 1
      flush triggered, not yet scheduled
      sync write key 2
      system crash
      
      After rebooting, users might see key 2 but not key 1, which violates the API of sync write.
      
      This can be reproduced using unit test FaultInjectionTest::DISABLED_WriteOptionSyncTest.
      
      One way to fix it is that, for a sync write, if there are outstanding unsynced log files, we need to sync them too.
      "
      
      This diff should be considered together with the next diff D40905; in isolation this fix probably could be a little simpler.
      
      Test Plan: `make check`; added a test for that (DBTest.SyncingPreviousLogs) before noticing FaultInjectionTest.WriteOptionSyncTest (keeping both since mine asserts a bit more); both tests fail without this diff; for D40905 stacked on top of this diff, ran tests with ASAN, TSAN and valgrind
      
      Reviewers: rven, yhchiang, IslamAbdelRahman, anthony, kradhakrishnan, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D40899
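
Illustrative sketch only: the failure sequence from the quoted task summary replayed on a toy write-ahead log. `ToyWal`, `SwitchLog` and `Recover` are hypothetical names. The model assumes that a triggered flush makes later writes go to a new log file and that only synced entries survive a crash.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

struct ToyLog {
  std::vector<std::string> keys;
  std::size_t synced_count = 0;  // number of leading entries that are durable
};

class ToyWal {
 public:
  // sync_previous_logs == true models the fixed behavior.
  explicit ToyWal(bool sync_previous_logs)
      : sync_previous_logs_(sync_previous_logs), logs_(1) {}

  void Write(const std::string& key, bool sync) {
    logs_.back().keys.push_back(key);
    if (!sync) return;
    if (sync_previous_logs_) {
      // Fixed behavior: a sync write also syncs older, unsynced log files.
      for (ToyLog& log : logs_) log.synced_count = log.keys.size();
    } else {
      // Old behavior: only the current log file is synced.
      logs_.back().synced_count = logs_.back().keys.size();
    }
  }

  // A flush was triggered: subsequent writes go to a new log file.
  void SwitchLog() { logs_.emplace_back(); }

  // Keys visible after a crash: only the synced prefix of each log.
  std::set<std::string> Recover() const {
    std::set<std::string> survivors;
    for (const ToyLog& log : logs_) {
      for (std::size_t i = 0; i < log.synced_count; ++i) {
        survivors.insert(log.keys[i]);
      }
    }
    return survivors;
  }

 private:
  bool sync_previous_logs_;
  std::vector<ToyLog> logs_;
};
```

Replaying "non-sync write key 1, switch log, sync write key 2, crash" recovers only key 2 with the old behavior and both keys with the fixed behavior.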
    • A
      Report live data size estimate · 06aebca5
      Andres Notzli committed
      Summary:
      Fixes T6548822. Added a new function for estimating the size of the live data
      as proposed in the task. The value can be accessed through the property
      rocksdb.estimate-live-data-size.
      
      Test Plan:
      There are two unit tests in version_set_test and a simple test in db_test.
      make version_set_test && ./version_set_test;
      make db_test && ./db_test gtest_filter=GetProperty
      
      Reviewers: rven, igor, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D41493
    • S
      Fix nondeterministic failure of DBTest.GetPropertiesOfAllTablesTest · 02b635fa
      sdong committed
      Summary: DBTest.GetPropertiesOfAllTablesTest generates four files and expects four files there, but an L0->L1 compaction can be triggered and compact them into one single file. Fix it by raising the level 0 file number compaction trigger.
      
      Test Plan: Run it many times and see it never fails.
      
      Reviewers: kradhakrishnan, IslamAbdelRahman, yhchiang, anthony
      
      Reviewed By: anthony
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D42789
  14. 21 Jul, 2015 4 commits