1. 23 12月, 2015 1 次提交
  2. 22 12月, 2015 1 次提交
    • A
      add call to install superversion and schedule work in enableautocompactions · 33e09c0e
      Alex Yang 提交于
      Summary:
      This patch fixes https://github.com/facebook/mysql-5.6/issues/121
      
      There is a recent change in rocksdb to disable auto compactions on startup: https://reviews.facebook.net/D51147. However, there is a small timing window where a column family needs to be compacted and schedules a compaction, but the scheduled compaction fails when it checks the disable_auto_compactions setting. The expectation is once the application is ready, it will call EnableAutoCompactions() to allow new compactions to go through. However, if the Column family is stalled because L0 is full, and no writes can go through, it is possible the column family may never have a new compaction request get scheduled. EnableAutoCompaction() should probably schedule an new flush and compaction event when it resets disable_auto_compaction.
      
      Using InstallSuperVersionAndScheduleWork, we call SchedulePendingFlush,
      SchedulePendingCompaction, as well as MaybeScheduleFlushOrcompaction on all the
      column families to avoid the situation above.
      
      This is still a first pass for feedback.
      Could also just call SchedePendingFlush and SchedulePendingCompaction directly.
      
      Test Plan:
      Run on Asan build
      cd _build-5.6-ASan/ && ./mysql-test/mtr --mem --big --testcase-timeout=36000 --suite-timeout=12000 --parallel=16 --suite=rocksdb,rocksdb_rpl,rocksdb_sys_vars --mysqld=--default-storage-engine=rocksdb --mysqld=--skip-innodb --mysqld=--default-tmp-storage-engine=MyISAM --mysqld=--rocksdb rocksdb_rpl.rpl_rocksdb_stress_crash --repeat=1000
      
      Ensure that it no longer hangs during the test.
      
      Reviewers: hermanlee4, yhchiang, anthony
      
      Reviewed By: anthony
      
      Subscribers: leveldb, yhchiang, dhruba
      
      Differential Revision: https://reviews.facebook.net/D51747
      33e09c0e
  3. 18 12月, 2015 1 次提交
  4. 17 12月, 2015 1 次提交
    • I
      Introduce ReadOptions::pin_data (support zero copy for keys) · aececc20
      Islam AbdelRahman 提交于
      Summary:
      This patch update the Iterator API to introduce new functions that allow users to keep the Slices returned by key() valid as long as the Iterator is not deleted
      
      ReadOptions::pin_data : If true keep loaded blocks in memory as long as the iterator is not deleted
      Iterator::IsKeyPinned() : If true, this mean that the Slice returned by key() is valid as long as the iterator is not deleted
      
      Also add a new option BlockBasedTableOptions::use_delta_encoding to allow users to disable delta_encoding if needed.
      
      Benchmark results (using https://phabricator.fb.com/P20083553)
      
      ```
      // $ du -h /home/tec/local/normal.4K.Snappy/db10077
      // 6.1G    /home/tec/local/normal.4K.Snappy/db10077
      
      // $ du -h /home/tec/local/zero.8K.LZ4/db10077
      // 6.4G    /home/tec/local/zero.8K.LZ4/db10077
      
      // Benchmarks for shard db10077
      // _build/opt/rocks/benchmark/rocks_copy_benchmark \
      //      --normal_db_path="/home/tec/local/normal.4K.Snappy/db10077" \
      //      --zero_db_path="/home/tec/local/zero.8K.LZ4/db10077"
      
      // First run
      // ============================================================================
      // rocks/benchmark/RocksCopyBenchmark.cpp          relative  time/iter  iters/s
      // ============================================================================
      // BM_StringCopy                                                 1.73s  576.97m
      // BM_StringPiece                                   103.74%      1.67s  598.55m
      // ============================================================================
      // Match rate : 1000000 / 1000000
      
      // Second run
      // ============================================================================
      // rocks/benchmark/RocksCopyBenchmark.cpp          relative  time/iter  iters/s
      // ============================================================================
      // BM_StringCopy                                              611.99ms     1.63
      // BM_StringPiece                                   203.76%   300.35ms     3.33
      // ============================================================================
      // Match rate : 1000000 / 1000000
      ```
      
      Test Plan: Unit tests
      
      Reviewers: sdong, igor, anthony, yhchiang, rven
      
      Reviewed By: rven
      
      Subscribers: dhruba, lovro, adsharma
      
      Differential Revision: https://reviews.facebook.net/D48999
      aececc20
  5. 16 12月, 2015 1 次提交
    • G
      Fix minor bugs in delete operator, snprintf, and size_t usage · 97265f5f
      Gunnar Kudrjavets 提交于
      Summary:
      List of changes:
      
      1) Fix the snprintf() usage in cases where wrong variable was used to determine the output buffer size.
      
      2) Remove unnecessary checks before calling delete operator.
      
      3) Increase code correctness by using size_t type when getting vector's size.
      
      4) Unify the coding style by removing namespace::std usage at the top of the file to confirm to the majority usage.
      
      5) Fix various lint errors pointed out by 'arc lint'.
      
      Test Plan:
      Code review and build:
      
      git diff
      make clean
      make -j 32 commit-prereq
      arc lint
      
      Reviewers: kradhakrishnan, sdong, rven, anthony, yhchiang, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D51849
      97265f5f
  6. 15 12月, 2015 1 次提交
    • V
      Running manual compactions in parallel with other automatic or manual... · 030215bf
      Venkatesh Radhakrishnan 提交于
      Running manual compactions in parallel with other automatic or manual compactions in restricted cases
      
      Summary:
      This diff provides a framework for doing manual
      compactions in parallel with other compactions. We now have a deque of manual compactions. We also pass manual compactions as an argument from RunManualCompactions down to
      BackgroundCompactions, so that RunManualCompactions can be reentrant.
      Parallelism is controlled by the two routines
      ConflictingManualCompaction to allow/disallow new parallel/manual
      compactions based on already existing ManualCompactions. In this diff, by default manual compactions still have to run exclusive of other compactions. However, by setting the compaction option, exclusive_manual_compaction to false, it is possible to run other compactions in parallel with a manual compaction. However, we are still restricted to one manual compaction per column family at a time. All of these restrictions will be relaxed in future diffs.
      I will be adding more tests later.
      
      Test Plan: Rocksdb regression + new tests + valgrind
      
      Reviewers: igor, anthony, IslamAbdelRahman, kradhakrishnan, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: yoshinorim, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D47973
      030215bf
  7. 12 12月, 2015 2 次提交
    • D
      Enable MS compiler warning c4244. · 236fe21c
      Dmitri Smirnov 提交于
        Mostly due to the fact that there are differences in sizes of int,long
        on 64 bit systems vs GNU.
      236fe21c
    • A
      Use SST files for Transaction conflict detection · 3bfd3d39
      agiardullo 提交于
      Summary:
      Currently, transactions can fail even if there is no actual write conflict.  This is due to relying on only the memtables to check for write-conflicts.  Users have to tune memtable settings to try to avoid this, but it's hard to figure out exactly how to tune these settings.
      
      With this diff, TransactionDB will use both memtables and SST files to determine if there are any write conflicts.  This relies on the fact that BlockBasedTable stores sequence numbers for all writes that happen after any open snapshot.  Also, D50295 is needed to prevent SingleDelete from disappearing writes (the TODOs in this test code will be fixed once the other diff is approved and merged).
      
      Note that Optimistic transactions will still rely on tuning memtable settings as we do not want to read from SST while on the write thread.  Also, memtable settings can still be used to reduce how often TransactionDB needs to read SST files.
      
      Test Plan: unit tests, db bench
      
      Reviewers: rven, yhchiang, kradhakrishnan, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb, yoshinorim
      
      Differential Revision: https://reviews.facebook.net/D50475
      3bfd3d39
  8. 11 12月, 2015 1 次提交
    • A
      Change SingleDelete to support conflict checking · 9e446290
      agiardullo 提交于
      Summary: For Transactions, we want to start using the SST files to do write conflict checking.  To do this, we need to make sure that compaction never removes all writes if an earlier snapshot exists.  So I had to change the way we process SingleDeletes to sometimes leave a SingleDelete behind when we encounter a Put followed by a SingleDelete.  See the comments in this diff for a more detailed explanation.
      
      Test Plan: added more unit tests
      
      Reviewers: rven, igor, kradhakrishnan, IslamAbdelRahman, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D50295
      9e446290
  9. 09 12月, 2015 3 次提交
  10. 08 12月, 2015 3 次提交
    • A
      Support marking snapshots for write-conflict checking · ec704aaf
      agiardullo 提交于
      Summary:
      D50475 enables using SST files for transaction write-conflict checking.  In order for this to work, we need to make sure not to compact out SingleDeletes when there is an earlier transaction snapshot(D50295).  If there is a long-held snapshot, this could reduce the benefit of the SingleDelete optimization.
      
      This diff allows Transactions to mark snapshots as being used for write-conflict checking.  Then, during compaction, we will be able to optimize SingleDeletes better in the future.
      
      This diff adds a flag to SnapshotImpl which is used by Transactions.  This diff also passes the earliest write-conflict snapshot's sequence number to CompactionIterator.  This diff does not actually change Compaction (after this diff is pushed, D50295 will be able to use this information).
      
      Test Plan: no behavior change, ran existing tests
      
      Reviewers: rven, kradhakrishnan, yhchiang, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D51183
      ec704aaf
    • S
      Revert "Fix a race condition in persisting options" · f307036b
      sdong 提交于
      This reverts commit 2fa3ed51. It breaks RocksDB lite build
      f307036b
    • Y
      Fix a race condition in persisting options · 2fa3ed51
      Yueh-Hsuan Chiang 提交于
      Summary:
      This patch fix a race condition in persisting options which will cause a crash when:
      
      * Thread A obtain cf options and start to persist options based on that cf options.
      * Thread B kicks in and finish DropColumnFamily and delete cf_handle.
      * Thread A wakes up and tries to finish the persisting options and crashes.
      
      Test Plan: Add a test in column_family_test that can reproduce the crash
      
      Reviewers: anthony, IslamAbdelRahman, rven, kradhakrishnan, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D51609
      2fa3ed51
  11. 04 12月, 2015 1 次提交
    • A
      added public api to schedule flush/compaction, code to prevent race with db::open · e8180f99
      Alex Yang 提交于
      Summary:
      Fixes T8781168.
      
      Added a new function EnableAutoCompactions in db.h to be publicly
      avialable.  This allows compaction to be re-enabled after disabling it via
      SetOptions
      
      Refactored code to set the dbptr earlier on in TransactionDB::Open and DB::Open
      Temporarily disable auto_compaction in TransactionDB::Open until dbptr is set to
      prevent race condition.
      
      Test Plan:
      Ran make all check
      
      verified fix on myrocks side:
      was able to reproduce the seg fault with
      ../tools/mysqltest.sh --mem --force rocksdb.drop_table
      
      method was to manually sleep the thread after DB::Open but before TransactionDB ptr was
      assigned in transaction_db_impl.cc:
        DB::Open(db_options, dbname, column_families_copy, handles, &db);
        clock_t goal = (60000 * 10) + clock();
        while (goal > clock());
        ...dbptr(aka rdb) gets assigned below
      
      verified my changes fixed the issue.
      
      Also added unit test 'ToggleAutoCompaction' in transaction_test.cc
      
      Reviewers: hermanlee4, anthony
      
      Reviewed By: anthony
      
      Subscribers: alex, dhruba
      
      Differential Revision: https://reviews.facebook.net/D51147
      e8180f99
  12. 01 12月, 2015 1 次提交
    • S
      DB to only flush the column family with the largest memtable while... · db320b1b
      sdong 提交于
      DB to only flush the column family with the largest memtable while option.db_write_buffer_size is hit
      
      Summary: When option.db_write_buffer_size is hit, we currently flush all column families. Move to flush the column family with the largest active memt table instead. In this way, we can avoid too many small files in some cases.
      
      Test Plan: Modify test DBTest.SharedWriteBuffer to work with the updated behavior
      
      Reviewers: kradhakrishnan, yhchiang, rven, anthony, IslamAbdelRahman, igor
      
      Reviewed By: igor
      
      Subscribers: march, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D51291
      db320b1b
  13. 17 11月, 2015 1 次提交
  14. 14 11月, 2015 1 次提交
  15. 13 11月, 2015 1 次提交
    • N
      Don't merge WriteBatch-es if WAL is disabled · 6ce42dd0
      Nathan Bronson 提交于
      Summary:
      There's no need for WriteImpl to flatten the write batch group
      into a single WriteBatch if the WAL is disabled.  This diff moves the
      flattening into the WAL step, and skips flattening entirely if it isn't
      needed.  It's good for about 5% speedup on a multi-threaded workload
      with no WAL.
      
      This diff also adds clarifying comments about the chance for partial
      failure of WriteBatchInternal::InsertInto, and always sets bg_error_ if
      the memtable state diverges from the logged state or if a WriteBatch
      succeeds only partially.
      
      Benchmark for speedup:
        db_bench -benchmarks=fillrandom -threads=16 -batch_size=1 -memtablerep=skip_list -value_size=0 --num=200000 -level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999 -disable_auto_compactions --max_write_buffer_number=8 -max_background_flushes=8 --disable_wal --write_buffer_size=160000000
      
      Test Plan: asserts + make check
      
      Reviewers: sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D50583
      6ce42dd0
  16. 11 11月, 2015 1 次提交
    • Y
      Enable RocksDB to persist Options file. · e114f0ab
      Yueh-Hsuan Chiang 提交于
      Summary:
      This patch allows rocksdb to persist options into a file on
      DB::Open, SetOptions, and Create / Drop ColumnFamily.
      Options files are created under the same directory as the rocksdb
      instance.
      
      In addition, this patch also adds a fail_if_missing_options_file in DBOptions
      that makes any function call return non-ok status when it is not able to
      persist options properly.
      
        // If true, then DB::Open / CreateColumnFamily / DropColumnFamily
        // / SetOptions will fail if options file is not detected or properly
        // persisted.
        //
        // DEFAULT: false
        bool fail_if_missing_options_file;
      
      Options file names are formatted as OPTIONS-<number>, and RocksDB
      will always keep the latest two options files.
      
      Test Plan:
      Add options_file_test.
      
      options_test
      column_family_test
      
      Reviewers: igor, IslamAbdelRahman, sdong, anthony
      
      Reviewed By: anthony
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D48285
      e114f0ab
  17. 06 11月, 2015 1 次提交
    • V
      Prefix-based iterating only shows keys in prefix · 9d50afc3
      Venkatesh Radhakrishnan 提交于
      Summary:
      MyRocks testing found an issue that while iterating over keys
      that are outside the prefix, sometimes wrong results were seen for keys
      outside the prefix. We now tighten the range of keys seen with a new
      read option called prefix_seen_at_start. This remembers the starting
      prefix and then compares it on a Next for equality of prefix. If they
      are from a different prefix, it sets valid to false.
      
      Test Plan: PrefixTest.PrefixValid
      
      Reviewers: IslamAbdelRahman, sdong, yhchiang, anthony
      
      Reviewed By: anthony
      
      Subscribers: spetrunia, hermanlee4, yoshinorim, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D50211
      9d50afc3
  18. 04 11月, 2015 1 次提交
  19. 31 10月, 2015 1 次提交
    • I
      Update DB::AddFile() to have less restrictions · ff4499e2
      Islam AbdelRahman 提交于
      Summary:
      Update DB::AddFile() restrictions to be
        - Key range in loaded table file don't overlap with existing keys or tombstones in DB.
        - No other writes happen during AddFile call.
      
      The updated AddFile() will verify that the file key range don't overlap with any keys or tombstones in the DB, and then add the file to L0
      
      Test Plan: unit tests
      
      Reviewers: igor, rven, anthony, kradhakrishnan, sdong
      
      Reviewed By: sdong
      
      Subscribers: adsharma, ameyag, dhruba
      
      Differential Revision: https://reviews.facebook.net/D49233
      ff4499e2
  20. 30 10月, 2015 2 次提交
    • I
      Clean and expose CreateLoggerFromOptions · 2872e0c8
      Islam AbdelRahman 提交于
      Summary:
      CreateLoggerFromOptions have some parameters like  db_log_dir and env, these parameters are redundant since they already exist in DBOptions
      
      this patch remove the redundant parameters and expose CreateLoggerFromOptions to users
      
      Test Plan: make check
      
      Reviewers: igor, anthony, yhchiang, rven, kradhakrishnan, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, hermanlee4
      
      Differential Revision: https://reviews.facebook.net/D49713
      2872e0c8
    • S
      "make format" in some recent commits · 296c3a1f
      sdong 提交于
      Summary: Run "make format" for some recent commits.
      
      Test Plan: Build and run tests
      
      Reviewers: IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D49707
      296c3a1f
  21. 29 10月, 2015 1 次提交
  22. 27 10月, 2015 1 次提交
  23. 20 10月, 2015 2 次提交
  24. 19 10月, 2015 5 次提交
  25. 18 10月, 2015 1 次提交
  26. 17 10月, 2015 3 次提交
  27. 14 10月, 2015 1 次提交