1. 11 May 2016 (1 commit)
    • [rocksdb] Memtable Log Referencing and Prepared Batch Recovery · 1b8a2e8f
      Committed by Reid Horuff
      Summary:
      This diff is built on top of the WriteBatch modification https://reviews.facebook.net/D54093 and adds the functionality to the rocksdb core necessary to support two-phase commit (2PC).
      
      Modification of DBImpl::WriteImpl():
      - added two arguments: uint64_t* log_used = nullptr, uint64_t log_ref = 0;
      - *log_used is an output argument which returns the log number which the incoming batch was inserted into, or 0 if no WAL insert took place.
      - log_ref is a supplied log number which all memtables inserted into will reference after the batch insert takes place. This number will be reported by FindMinPrepLogReferencedByMemTable() until all memtables inserted into have flushed.
      
      - Recovery/writepath is now aware of prepared batches and commit and rollback markers.
      
      Test Plan: There are currently no tests in this diff. All testing of this functionality takes place in the Transaction layer/diff, but I will add some tests.
      
      Reviewers: IslamAbdelRahman, sdong
      
      Subscribers: leveldb, santoshb, andrewkr, vasilep, dhruba, hermanlee4
      
      Differential Revision: https://reviews.facebook.net/D56919
      1b8a2e8f
  2. 10 May 2016 (1 commit)
    • Add bottommost_compression option · 4b317234
      Committed by Islam AbdelRahman
      Summary:
      Add a new option that can be used to set a specific compression algorithm for bottommost level.
      This option will only affect levels larger than base level.
      
      I have also updated CompactionJobInfo to include the compression algorithm used in compaction
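
      As a config-fragment sketch (the exact compression enum values available depend on your RocksDB version), the new option sits alongside the regular compression setting:

```cpp
rocksdb::Options options;
// Faster codec for levels whose data is still rewritten frequently.
options.compression = rocksdb::kLZ4Compression;
// Heavier codec for the bottommost level, where data is cold and
// compaction traffic is lowest, trading CPU for a better ratio.
options.bottommost_compression = rocksdb::kZlibCompression;
```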
      
      Test Plan:
      added new unittest
      existing unittests
      
      Reviewers: andrewkr, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: lightmark, andrewkr, dhruba, yoshinorim
      
      Differential Revision: https://reviews.facebook.net/D57669
      4b317234
  3. 30 Apr 2016 (1 commit)
    • Added EventListener::OnTableFileCreationStarted() callback · a92049e3
      Committed by Yi Wu
      Summary: Added EventListener::OnTableFileCreationStarted(). EventListener::OnTableFileCreated() will also be called in the failure case; users can check the creation status via TableFileCreationInfo::status.
      
      Test Plan: unit test.
      
      Reviewers: dhruba, yhchiang, ott, sdong
      
      Reviewed By: sdong
      
      Subscribers: sdong, kradhakrishnan, IslamAbdelRahman, andrewkr, yhchiang, leveldb, ott, dhruba
      
      Differential Revision: https://reviews.facebook.net/D56337
      a92049e3
  4. 29 Apr 2016 (1 commit)
    • Not enable jemalloc status printing if USE_CLANG=1 · 992a8f83
      Committed by sdong
      Summary: A warning is printed when including jemalloc.h with USE_CLANG=1. Disable jemalloc status printing in that case.
      
      Test Plan: Run db_bench built with and without USE_CLANG=1. Make sure both build, and that jemalloc status is printed out in the case where USE_CLANG is not set.
      
      Reviewers: andrewkr, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D57399
      992a8f83
  5. 28 Apr 2016 (2 commits)
    • Fix build on machines without jemalloc · 0850bc51
      Committed by Islam AbdelRahman
      Summary: It looks like we mistakenly enabled JEMALLOC even when it's not available on the machine; that's why Travis is failing.
      
      Test Plan:
      check on my devserver
      check on my mac
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D57345
      0850bc51
    • Print memory allocation counters · 1c80dfab
      Committed by Sergey Makarenko
      Summary:
      Introduced an option to dump malloc statistics using a new option flag.
      Added a new command line option to the db_bench tool to enable this
      functionality. Also extended the build to support environments with and
      without jemalloc.
      
      Test Plan:
      1) Built rocksdb using `make`. Launched `./db_bench
         --benchmarks=fillrandom --dump_malloc_stats=true --num=10000000` and
         verified that the jemalloc dump is present in the LOG file.
      2) Built rocksdb using `DISABLE_JEMALLOC=1 make db_bench -j32`, ran the
         same db_bench command, and found the following message in the LOG
         file: "Please compile with jemalloc to enable malloc dump".
      3) Also built rocksdb using `make` on MacOS to verify behavior in a
         non-FB environment.
      To debug the build configuration change, temporarily changed
      AM_DEFAULT_VERBOSITY = 1 in the Makefile to see compiler and build tool
      output. For case 1) -DROCKSDB_JEMALLOC was present on the compiler
      command line. For both 2) and 3) this flag was not present.
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D57321
      1c80dfab
  6. 27 Apr 2016 (1 commit)
    • CompactedDB should not be used if there is outstanding WAL files · ac0e54b4
      Committed by sdong
      Summary: CompactedDB skips the memtable, so we shouldn't use a compacted DB if there are outstanding WAL files.
      
      Test Plan: Changed the options.max_open_files = -1 perf context test so that it would create a compacted DB, which we shouldn't do.
      
      Reviewers: yhchiang, kradhakrishnan, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D57057
      ac0e54b4
  7. 26 Apr 2016 (1 commit)
  8. 16 Apr 2016 (1 commit)
  9. 09 Apr 2016 (1 commit)
    • Make sure that if use_mmap_reads is on use_os_buffer is also on · 2448f803
      Committed by Jay Edgar
      Summary: The code assumes that if use_mmap_reads is on then use_os_buffer is also on. This makes sense: by using memory-mapped files for reading, you are expecting the OS to cache what it needs. Add code to make sure the user does not turn off use_os_buffer when they turn on use_mmap_reads.
      
      Test Plan: New test: DBTest.MMapAndBufferOptions
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D56397
      2448f803
  10. 07 Apr 2016 (1 commit)
    • Embed column family name in SST file · 2391ef72
      Committed by Andrew Kryczka
      Summary:
      Added the column family name to the properties block. This property
      is omitted only if the property is unavailable, such as when RepairDB()
      writes SST files.
      
      In a follow-up diff, I will change RepairDB to use this new property to
      decide which column family an existing SST file belongs to. If this
      property is missing, it will add the file to the "unknown" column family
      (same as its existing behavior).
      
      Test Plan:
      New unit test:
      
        $ ./db_table_properties_test --gtest_filter=DBTablePropertiesTest.GetColumnFamilyNameProperty
      
      Reviewers: IslamAbdelRahman, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D55605
      2391ef72
  11. 02 Apr 2016 (1 commit)
    • Adding pin_l0_filter_and_index_blocks_in_cache feature and related fixes. · 9b519875
      Committed by Marton Trencseni
      Summary:
      When a block based table file is opened, if prefetch_index_and_filter is true, it will prefetch the index and filter blocks, putting them into the block cache.
      What this feature adds: when an L0 block based table file is opened, if pin_l0_filter_and_index_blocks_in_cache is true in the options (and prefetch_index_and_filter is true), then the filter and index blocks aren't released back to the block cache at the end of BlockBasedTableReader::Open(). Instead the table reader takes ownership of them, pinning them, i.e. the LRU cache will never push them out. Meanwhile, further accesses in the table reader will not hit the block cache, thus avoiding lock contention.
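
      A config-fragment sketch of enabling the feature (the option names match the public API; pinning only takes effect when index/filter blocks go through the block cache in the first place):

```cpp
rocksdb::BlockBasedTableOptions table_opts;
// Index and filter blocks are kept in the block cache (and prefetched
// when a table file is opened).
table_opts.cache_index_and_filter_blocks = true;
// For L0 files, keep those blocks pinned for the lifetime of the table
// reader instead of letting the LRU evict them.
table_opts.pin_l0_filter_and_index_blocks_in_cache = true;

rocksdb::Options options;
options.table_factory.reset(rocksdb::NewBlockBasedTableFactory(table_opts));
```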
      
      Test Plan:
      'export TEST_TMPDIR=/dev/shm/ && DISABLE_JEMALLOC=1 OPT=-g make all valgrind_check -j32' is OK.
      I didn't run the Java tests; I don't have Java set up on my devserver.
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D56133
      9b519875
  12. 31 Mar 2016 (1 commit)
  13. 23 Mar 2016 (1 commit)
  14. 19 Mar 2016 (1 commit)
  15. 12 Mar 2016 (1 commit)
    • fix: handle_fatal_signal (sig=6) in std::vector<std::string,... · e8e6cf01
      Committed by Baris Yazici
      fix: handle_fatal_signal (sig=6) in std::vector<std::string, std::allocator<std::string> >::_M_range_check | c++/4.8.2/bits/stl_vector.h:794 #174
      
      Summary:
      Fix for https://github.com/facebook/mysql-5.6/issues/174
      
      When there are no old files to purge, the vector.at(i) call was crashing:
      
      if (old_info_log_file_count != 0 &&
            old_info_log_file_count >= db_options_.keep_log_file_num) {
          std::sort(old_info_log_files.begin(), old_info_log_files.end());
          size_t end = old_info_log_file_count - db_options_.keep_log_file_num;
          for (unsigned int i = 0; i <= end; i++) {
            std::string& to_delete = old_info_log_files.at(i);
      
      Added a check that old_info_log_file_count is non-zero.
      
      Test Plan: run existing tests
      
      Reviewers: gunnarku, vasilep, sdong, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: andrewkr, webscalesql-eng, dhruba
      
      Differential Revision: https://reviews.facebook.net/D55245
      e8e6cf01
  16. 11 Mar 2016 (2 commits)
    • Cleanup stale manifests outside of full purge · d9620239
      Committed by Andrew Kryczka
      Summary:
      - Keep track of obsolete manifests in VersionSet
      - Updated FindObsoleteFiles() to put obsolete manifests in the JobContext for later use by PurgeObsoleteFiles()
      - Added test case that verifies a stale manifest is deleted by a non-full purge
      
      Test Plan:
        $ ./backupable_db_test --gtest_filter=BackupableDBTest.ChangeManifestDuringBackupCreation
      
      Reviewers: IslamAbdelRahman, yoshinorim, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D55269
      d9620239
    • Update compaction score right after CompactFiles forms a compaction · 765597fa
      Committed by Yueh-Hsuan Chiang
      Summary:
      This is a follow-up patch of https://reviews.facebook.net/D54891.
      As the information about files being compacted is also used
      when making compaction decisions, it is necessary to update the
      compaction score when a compaction plan has been made but not yet
      executed.
      
      This patch adds a missing call to update the compaction score in
      CompactFiles().
      
      Test Plan: compact_files_test
      
      Reviewers: sdong, IslamAbdelRahman, kradhakrishnan, yiwu, andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D55227
      765597fa
  17. 05 Mar 2016 (1 commit)
    • Fix a bug where flush does not happen when a manual compaction is running · a7d4eb2f
      Committed by Yueh-Hsuan Chiang
      Summary:
      Currently, when rocksdb tries to run a manual compaction to refit data into a level,
      there's a ReFitLevel() process that requires that no bg work is currently running.
      When RocksDB plans to ReFitLevel(), it will do the following:
      
       1. pause scheduling new bg work.
       2. wait until all bg work finished
       3. do the ReFitLevel()
       4. unpause scheduling new bg work.
      
      However, as it pauses scheduling new bg work in step 1 and waits for all bg work
      to finish in step 2, RocksDB will stop flushing until all bg work is done (which
      could take a long time).
      
      This patch fixes the issue by changing the way ReFitLevel() pauses the background work:
      
      1. pause scheduling compaction.
      2. wait until all bg work finished.
      3. pause scheduling flush
      4. do ReFitLevel()
      5. unpause both flush and compaction.
      
      The major difference is that we only pause scheduling compaction in step 1 and wait
      for all bg work to finish in step 2. This prevents flushes from being blocked for a
      long time. There is a very rare case where ReFitLevel() might starve in step 2,
      but it is unlikely, as flushes typically finish very fast.
      
      Test Plan: existing test.
      
      Reviewers: anthony, IslamAbdelRahman, kradhakrishnan, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D55029
      a7d4eb2f
  18. 04 Mar 2016 (1 commit)
  19. 03 Mar 2016 (1 commit)
    • Add Iterator Property rocksdb.iterator.version_number · e79ad9e1
      Committed by sdong
      Summary: We want to provide a way to detect whether an iterator is stale and needs to be recreated. Add an iterator property that returns the version number.
      
      Test Plan: Add two unit tests for it.
      
      Reviewers: IslamAbdelRahman, yhchiang, anthony, kradhakrishnan, andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D54921
      e79ad9e1
  20. 02 Mar 2016 (1 commit)
    • Fix DB::AddFile() issue when PurgeObsoleteFiles() is called · 6743135e
      Committed by Islam AbdelRahman
      Summary:
      In some situations the DB will scan all existing files in the DB path and delete the ones that are obsolete.
      If this happens while an external sst file is being added, the file could be deleted while we are adding it.
      This diff fixes the issue.
      
      Test Plan:
      unit test to reproduce the bug
      existing unit tests
      
      Reviewers: sdong, yhchiang, andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D54627
      6743135e
  21. 24 Feb 2016 (1 commit)
    • Fix assert failure when DBImpl::SyncWAL() conflicts with log rolling · 38201b35
      Committed by sdong
      Summary: DBImpl::SyncWAL() releases the db mutex before calling DBImpl::MarkLogsSynced(), while inside DBImpl::MarkLogsSynced() we assert there is zero or one outstanding log file. However, a memtable switch can happen in between, causing two or more outstanding logs there and failing the assert. The diff adds a unit test that repros the issue and fixes the assert so that the unit test passes.
      
      Test Plan: Run the new tests.
      
      Reviewers: anthony, kolmike, yhchiang, IslamAbdelRahman, kradhakrishnan, andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D54621
      38201b35
  22. 23 Feb 2016 (1 commit)
    • Fixed CompactFiles() spuriously failing or corrupting DB · eef63ef8
      Committed by Mike Kolupaev
      Summary:
      We started getting two kinds of crashes since we started using `DB::CompactFiles()`:
      (1) `CompactFiles()` fails saying something like "/data/logdevice/4440/shard12/012302.sst: No such file or directory", and presumably makes DB read-only,
      (2) DB fails to open saying "Corruption: Can't access /267000.sst: IO error: /data/logdevice/4440/shard1/267000.sst: No such file or directory".
      
      AFAICT, both can be explained by background thread deleting compaction output as "obsolete" while it's being written, before it's committed to manifest. If it ends up committed to the manifest, we get (2); if compaction notices the disappearance and fails, we get (1). The internal tasks t10068021 and t10134177 have some details about the investigation that led to this.
      
      Test Plan: `make -j check`; the new test fails to reopen the DB without the fix
      
      Reviewers: yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, sdong
      
      Differential Revision: https://reviews.facebook.net/D54561
      eef63ef8
  23. 18 Feb 2016 (1 commit)
  24. 17 Feb 2016 (2 commits)
    • Fix WriteImpl empty batch hanging issue · 5bcf952a
      Committed by reid horuff
      Summary: There is an issue in DBImpl::WriteImpl where, if an empty write batch comes in with sync=true, the logs will be marked as synced yet the sync never actually happens because there is no data in the write batch. This causes the next incoming batch to hang while waiting for the logs to complete syncing. This fix syncs logs even if the write batch is empty.
      
      Test Plan: DoubleEmptyBatch unit test in transaction_test.
      
      Reviewers: yoshinorim, hermanlee4, sdong, ngbronson, anthony
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D54057
      5bcf952a
    • Fixed a segfault when compaction fails · 44371501
      Committed by Mike Kolupaev
      Summary: We've hit it today.
      
      Test Plan: `make -j check`; didn't reproduce the issue
      
      Reviewers: yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D54219
      44371501
  25. 10 Feb 2016 (2 commits)
  26. 06 Feb 2016 (1 commit)
    • Improve perf of Pessimistic Transaction expirations (and optimistic transactions) · 6f71d3b6
      Committed by reid horuff
      Summary:
      copy from task 8196669:
      
      1) Optimistic transactions do not support batching writes from different threads.
      2) Pessimistic transactions do not support batching writes if an expiration time is set.
      
      In these 2 cases, we currently do not do any write batching in DBImpl::WriteImpl() because there is a WriteCallback that could decide at the last minute to abort the write.  But we could support batching write operations with callbacks if we make sure to process the callbacks correctly.
      
      To do this, we would first need to modify write_thread.cc to stop preventing writes with callbacks from being batched together.  Then we would need to change DBImpl::WriteImpl() to call all WriteCallback's in a batch, only write the batches that succeed, and correctly set the state of each batch's WriteThread::Writer.
      
      Test Plan: Added test WriteWithCallbackTest to write_callback_test.cc which creates multiple client threads and verifies that writes are batched and executed properly.
      
      Reviewers: hermanlee4, anthony, ngbronson
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D52863
      6f71d3b6
  27. 03 Feb 2016 (1 commit)
    • Eliminate duplicated property constants · 284aa613
      Committed by Andrew Kryczka
      Summary:
      Before this diff, there were duplicated constants to refer to properties (user-
      facing API had strings and InternalStats had an enum). I noticed these were
      inconsistent in terms of which constants are provided, names of constants, and
      documentation of constants. Overall it seemed annoying/error-prone to maintain
      these duplicated constants.
      
      So, this diff gets rid of InternalStats's constants and replaces them with a map
      keyed on the user-facing constant. The value in that map contains a function
      pointer to get the property value, so we don't need to do string matching while
      holding db->mutex_. This approach has a side benefit of making many small
      handler functions rather than a giant switch-statement.
      
      Test Plan: db_properties_test passes, running "make commit-prereq -j32"
      
      Reviewers: sdong, yhchiang, kradhakrishnan, IslamAbdelRahman, rven, anthony
      
      Reviewed By: anthony
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D53253
      284aa613
  28. 02 Feb 2016 (1 commit)
  29. 01 Feb 2016 (1 commit)
  30. 30 Jan 2016 (1 commit)
    • Add options.base_background_compactions as a number of compaction threads for low compaction debt · 3b2a1ddd
      Committed by Venkatesh Radhakrishnan
      Summary:
      If options.base_background_compactions is given, we try to schedule no more compactions than this number; only when L0 files increase to a certain number, or pending compaction bytes exceed a certain threshold, do we schedule compactions based on options.max_background_compactions.
      
      The watermarks are calculated based on slowdown thresholds.
      
      Test Plan:
      Add new test cases in column_family_test.
      Adding more unit tests.
      
      Reviewers: IslamAbdelRahman, yhchiang, kradhakrishnan, rven, anthony
      
      Reviewed By: anthony
      
      Subscribers: leveldb, dhruba, yoshinorim
      
      Differential Revision: https://reviews.facebook.net/D53409
      3b2a1ddd
  31. 29 Jan 2016 (1 commit)
  32. 27 Jan 2016 (3 commits)
  33. 26 Jan 2016 (1 commit)
  34. 19 Jan 2016 (1 commit)