1. 08 Jul 2015 (6 commits)
  2. 07 Jul 2015 (2 commits)
    • Added tests for ExpandWhileOverlapping() · 58d7ab3c
      Andres Notzli committed
      Summary:
      This patch adds three test cases for ExpandWhileOverlapping()
      to the compaction_picker_test test suite.
      ExpandWhileOverlapping() only has an effect if the comparison
      function for the internal keys allows for overlapping user
      keys in different SST files on the same level. Thus, this
      patch adds a comparator based on sequence numbers to
      compaction_picker_test for the new test cases.
      
      Test Plan:
      - make compaction_picker_test && ./compaction_picker_test
        -> All tests pass
      - Replace body of ExpandWhileOverlapping() with `return true`
        -> Compile and run ./compaction_picker_test as before
        -> New tests fail
      
      Reviewers: sdong, yhchiang, rven, anthony, IslamAbdelRahman, kradhakrishnan, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D41277
    • Fix compaction_job_test · 155ce60d
      Igor Canadi committed
      Summary:
      Two issues:
      * the input keys to the compaction don't include sequence numbers.
      * the sequence number is set to max(seq_num), but it should be set to max(seq_num)+1, because the condition there is strictly greater-than (i.e. we only zero out a key's sequence number if the DB's sequence number is strictly greater than the key's sequence number): https://github.com/facebook/rocksdb/blob/master/db/compaction_job.cc#L830
      
      Test Plan: make compaction_job_test && ./compaction_job_test
      
      Reviewers: sdong, lovro
      
      Reviewed By: lovro
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D41247
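
      A tiny illustrative snippet (C++; names hypothetical, not the actual RocksDB code) of the strictly-greater condition the commit above (155ce60d) relies on: a key's sequence number may only be zeroed out when the visible sequence number is strictly greater than the key's, which is why the test must use max(seq_num) + 1 rather than max(seq_num).

        #include <algorithm>
        #include <cassert>
        #include <cstdint>
        #include <vector>

        // Hypothetical helper mirroring the strictly-greater check.
        bool CanZeroOutSeq(uint64_t visible_seq, uint64_t key_seq) {
          return visible_seq > key_seq;  // strictly greater, not >=
        }

        int main() {
          std::vector<uint64_t> key_seqs = {7, 11, 42};
          uint64_t max_seq = *std::max_element(key_seqs.begin(), key_seqs.end());
          assert(!CanZeroOutSeq(max_seq, 42));     // max(seq_num) alone is not enough
          assert(CanZeroOutSeq(max_seq + 1, 42));  // max(seq_num) + 1 qualifies
          return 0;
        }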
  3. 06 Jul 2015 (1 commit)
    • Replace std::priority_queue in MergingIterator with custom heap · b6655a67
      lovro committed
      Summary:
      While profiling compaction in our service I noticed a lot of CPU (~15% of compaction) being spent in MergingIterator and key comparison.  Looking at the code I found MergingIterator was (understandably) using std::priority_queue for the multiway merge.
      
      Keys in our dataset include sequence numbers that increase with time.  Adjacent keys in an L0 file are very likely to be adjacent in the full database.  Consequently, compaction will often pick a chunk of rows from the same L0 file before switching to another one.  It would be great to avoid the O(log K) operation per row while compacting.
      
      This diff replaces std::priority_queue with a custom binary heap implementation.  It has a "replace top" operation that is cheap when the new top is the same as the old one (i.e. the priority of the top entry is decreased but it still stays on top).
      
      Test Plan:
      make check
      
      To test the effect on performance, I generated databases with data patterns that mimic what I describe in the summary (rows have a mostly increasing sequence number).  I see a 10-15% CPU decrease for compaction (and a matching throughput improvement on tmpfs).  The exact improvement depends on the number of L0 files and the amount of locality.  Performance on randomly distributed keys seems on par with the old code.
      
      Reviewers: kailiu, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: yoshinorim, dhruba, tnovak
      
      Differential Revision: https://reviews.facebook.net/D29133
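
      A minimal, self-contained sketch of the "replace top" idea described in b6655a67 above. This is not the RocksDB MergingIterator heap; it only illustrates why replacing the top is cheap when the new element still belongs on top (the sift-down exits without any swaps) and costs O(log n) otherwise.

        #include <cassert>
        #include <cstddef>
        #include <cstdio>
        #include <functional>
        #include <utility>
        #include <vector>

        template <typename T, typename Compare = std::less<T>>
        class ReplaceTopHeap {
         public:
          void push(const T& v) {
            data_.push_back(v);
            sift_up(data_.size() - 1);
          }
          const T& top() const { return data_.front(); }
          bool empty() const { return data_.empty(); }

          // Replace the top element in place. If the replacement still belongs on
          // top, the sift-down loop exits immediately (the cheap common case during
          // compaction); otherwise it does the usual O(log n) work.
          void replace_top(const T& v) {
            data_.front() = v;
            sift_down(0);
          }

         private:
          void sift_up(std::size_t i) {
            while (i > 0) {
              std::size_t parent = (i - 1) / 2;
              if (!cmp_(data_[i], data_[parent])) break;
              std::swap(data_[i], data_[parent]);
              i = parent;
            }
          }
          void sift_down(std::size_t i) {
            const std::size_t n = data_.size();
            while (true) {
              std::size_t l = 2 * i + 1, r = 2 * i + 2, best = i;
              if (l < n && cmp_(data_[l], data_[best])) best = l;
              if (r < n && cmp_(data_[r], data_[best])) best = r;
              if (best == i) break;
              std::swap(data_[i], data_[best]);
              i = best;
            }
          }
          std::vector<T> data_;
          Compare cmp_;
        };

        int main() {
          ReplaceTopHeap<int> heap;  // min-heap of ints for demonstration
          heap.push(5);
          heap.push(9);
          heap.push(7);
          assert(heap.top() == 5);
          heap.replace_top(6);   // 6 is still the minimum: no swaps performed
          assert(heap.top() == 6);
          heap.replace_top(10);  // now a real sift-down happens
          assert(heap.top() == 7);
          std::printf("top after replacements: %d\n", heap.top());
          return 0;
        }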
  4. 03 Jul 2015 (6 commits)
    • Introduce InfoLogLevel::HEADER_LEVEL · 35cd75c3
      Ari Ekmekji committed
      Summary:
       Introduced a new category in the enum InfoLogLevel in env.h.
       Modified Log() in env.cc to use Header()
       when the InfoLogLevel is HEADER_LEVEL.
       Updated tests in auto_roll_logger_test to ensure
       the header is handled properly in these cases.
      
      Test Plan: Augment existing tests in auto_roll_logger_test
      
      Reviewers: igor, sdong, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D41067
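
      An illustrative sketch of the dispatch described in 35cd75c3 above. The enum values and function names below are simplified stand-ins, not the exact env.h API: the point is only that a message logged at HEADER_LEVEL is routed to a header-writing path instead of the ordinary leveled path.

        #include <cstdio>
        #include <string>

        enum InfoLogLevel {
          DEBUG_LEVEL, INFO_LEVEL, WARN_LEVEL, ERROR_LEVEL, FATAL_LEVEL,
          HEADER_LEVEL  // the new category introduced by the commit above
        };

        void WriteHeader(const std::string& msg) {
          std::printf("[HEADER] %s\n", msg.c_str());
        }

        void WriteLeveled(InfoLogLevel level, const std::string& msg) {
          std::printf("[level %d] %s\n", static_cast<int>(level), msg.c_str());
        }

        // Simplified Log(): header-level messages take the header path.
        void Log(InfoLogLevel level, const std::string& msg) {
          if (level == HEADER_LEVEL) {
            WriteHeader(msg);
          } else {
            WriteLeveled(level, msg);
          }
        }

        int main() {
          Log(INFO_LEVEL, "ordinary informational message");
          Log(HEADER_LEVEL, "version and option banner");
          return 0;
        }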
    • Fixed endless loop in DBIter::FindPrevUserKey() · acee2b08
      Yueh-Hsuan Chiang committed
      Summary: Fixed endless loop in DBIter::FindPrevUserKey()
      
      Test Plan: ./db_stress --test_batches_snapshots=1 --threads=32 --write_buffer_size=4194304 --destroy_db_initially=0 --reopen=20 --readpercent=45 --prefixpercent=5 --writepercent=35 --delpercent=5 --iterpercent=10 --db=/tmp/rocksdb_crashtest_KdCI5F --max_key=100000000 --mmap_read=0 --block_size=16384 --cache_size=1048576 --open_files=500000 --verify_checksum=1 --sync=0 --progress_reports=0 --disable_wal=0 --disable_data_sync=1 --target_file_size_base=2097152 --target_file_size_multiplier=2 --max_write_buffer_number=3 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --filter_deletes=0 --memtablerep=prefix_hash --prefix_size=7 --ops_per_thread=200 --kill_random_test=97
      
      Reviewers: tnovak, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D41085
    • [wal changes 1/3] fixed unbounded wal growth in some workloads · 218487d8
      Mike Kolupaev committed
      Summary:
      This fixes the following scenario we've hit:
       - we reached max_total_wal_size, created a new WAL, and scheduled flushing of all memtables corresponding to the old one,
       - before the last of these flushes started, its column family was dropped; the last background flush call was a no-op and no one removed the old WAL from alive_logs_,
       - hours passed and no flushes happened even though lots of data was written; data is written to different column families, compactions are disabled, and old column families are dropped before their memtables grow big enough to trigger a flush; the old WAL still sits in alive_logs_, preventing the max_total_wal_size limit from kicking in,
       - a few more hours pass and we run out of disk space because of one huge .log file.
      
      Test Plan: `make check`; backported the new test, checked that it fails without this diff
      
      Reviewers: igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D40893
    • Fix unity build by removing anonymous namespace · e70115e7
      Aaron Feldman committed
      Summary: see title
      
      Test Plan: run 'make unity'
      
      Reviewers: igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D41079
    • Prepare 3.12 · 4159f5b8
      agiardullo committed
      Summary: About to cut release
      
      Test Plan: none
      
      Reviewers: igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D41061
    • Multithreaded backup and restore in BackupEngineImpl · a69bc91e
      Aaron Feldman committed
      Summary:
      Add a new field: BackupableDBOptions.max_background_copies.
      CreateNewBackup() and RestoreDBFromBackup() will use this number of threads to perform copies.
      If there is a backup rate limit, then max_background_copies must be 1.
      Update backupable_db_test.cc to test multi-threaded backup and restore.
      Update backupable_db_test.cc to test backups when the backup environment is not the same as the database environment.
      
      Test Plan:
      Run ./backupable_db_test
      Run valgrind ./backupable_db_test
      Run with TSAN and ASAN
      
      Reviewers: yhchiang, rven, anthony, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: yhchiang, anthony, sdong, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D40725
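
      A self-contained sketch (not the BackupEngineImpl code) of the general technique behind max_background_copies described in a69bc91e above: a bounded pool of worker threads drains a shared queue of file-copy jobs so backup and restore I/O can proceed in parallel. The file paths are hypothetical.

        #include <atomic>
        #include <cstddef>
        #include <cstdio>
        #include <fstream>
        #include <string>
        #include <thread>
        #include <vector>

        struct CopyJob {
          std::string src;
          std::string dst;
        };

        // Copy every job using at most max_background_copies worker threads.
        void CopyFiles(const std::vector<CopyJob>& jobs, int max_background_copies) {
          std::atomic<std::size_t> next{0};
          auto worker = [&]() {
            for (std::size_t i = next.fetch_add(1); i < jobs.size();
                 i = next.fetch_add(1)) {
              std::ifstream in(jobs[i].src, std::ios::binary);
              std::ofstream out(jobs[i].dst, std::ios::binary);
              out << in.rdbuf();  // byte-for-byte copy of one file
            }
          };
          std::vector<std::thread> threads;
          for (int t = 0; t < max_background_copies; ++t) threads.emplace_back(worker);
          for (auto& th : threads) th.join();
        }

        int main() {
          // Hypothetical paths, for illustration only.
          std::vector<CopyJob> jobs = {{"db/000001.sst", "backup/000001.sst"},
                                       {"db/000002.sst", "backup/000002.sst"}};
          CopyFiles(jobs, /*max_background_copies=*/2);
          std::puts("all copy jobs finished");
          return 0;
        }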
  5. 02 Jul 2015 (2 commits)
    • [RocksJava] Fixed test failures · 03d433ee
      Yueh-Hsuan Chiang committed
      Summary:
      The option bottommost_level_compaction was introduced recently.
      This option breaks the Java API behavior. To prevent the library
      from doing so, we set that option to a fixed value in Java.
      
      In the future we are going to remove that portion and replace the
      hardcoded options with a more flexible mechanism.
      
      Fixed bug introduced by the WriteBatchWithIndex patch
      
      Recently icanadi changed the behavior of WriteBatchWithIndex.
      See commit: 821cff11
      
      This commit fixes the problems introduced by the above-mentioned commit.
      
      Test Plan:
      make rocksdbjava
      make jtest
      
      Reviewers: adamretter, ankgup87, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: igor, dhruba
      
      Differential Revision: https://reviews.facebook.net/D40647
    • [RocksJava] Fix test failure of compactRangeToLevel · c00948d5
      Yueh-Hsuan Chiang committed
      Summary:
      Rewrite Java tests compactRangeToLevel and compactRangeToLevelColumnFamily
      to make them more deterministic and robust.
      
      Test Plan:
      make rocksdbjava
      make jtest
      
      Reviewers: anthony, fyrz, adamretter, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D40941
  6. 01 Jul 2015 (3 commits)
    • Allocate LevelFileIteratorState and LevelFileNumIterator from DB iterator's arena · 05e28319
      sdong committed
      Summary: Try to allocate LevelFileIteratorState and LevelFileNumIterator from DB iterator's arena, instead of calling malloc and free.
      
      Test Plan: valgrind check
      
      Reviewers: rven, yhchiang, anthony, kradhakrishnan, igor
      
      Reviewed By: igor
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D40929
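
      A minimal sketch of the pattern behind 05e28319 above, using a toy bump allocator rather than the real RocksDB Arena: iterator state objects are constructed inside an arena with placement new, so they are released together with the arena instead of through per-object malloc/free.

        #include <cstddef>
        #include <cstdio>
        #include <new>
        #include <vector>

        // Toy bump allocator; a real arena also handles alignment and growth.
        class SimpleArena {
         public:
          explicit SimpleArena(std::size_t capacity) : buf_(capacity), used_(0) {}
          void* Allocate(std::size_t bytes) {
            if (used_ + bytes > buf_.size()) return nullptr;
            void* p = buf_.data() + used_;
            used_ += bytes;
            return p;
          }

         private:
          std::vector<char> buf_;
          std::size_t used_;
        };

        // Stand-in for something like LevelFileIteratorState.
        struct IterState {
          int level;
          explicit IterState(int l) : level(l) {}
        };

        int main() {
          SimpleArena arena(1024);
          // Placement new: construct the object inside the arena's memory.
          void* mem = arena.Allocate(sizeof(IterState));
          IterState* state = new (mem) IterState(3);
          std::printf("state->level = %d\n", state->level);
          state->~IterState();  // explicit destructor call; memory is reclaimed with the arena
          return 0;
        }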
    • Add rpath option to production builds for 4.8.1 toolchain · 436ed904
      Igor Canadi committed
      Summary: Copy change from D37533 to gcc 4.8.1 config
      
      Test Plan: make db_bench, `ldd db_bench`, try running it
      
      Reviewers: MarkCallaghan, anthony
      
      Reviewed By: anthony
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D40845
    • Increasing timeout for drop writes. · b0f1927d
      krad committed
      Summary: We have a race in the way the test works. We avoided the race by adding a
      wait on the counter. I thought 1s was an eternity, but that is not true in some
      scenarios. Increasing the timeout to 10s and adding warnings.
      
      Also, adding nosleep to avoid the case where the wakeup thread is waiting behind
      the sleeping thread for scheduling.
      
      Test Plan: Run make check
      
      Reviewers: siying, igorcanadi
      
      CC: leveldb@
      
      Task ID: #7312624
      
      Blame Rev:
  7. 30 Jun 2015 (5 commits)
    • Fix a comparison in DBIter::FindPrevUserKey() · ec70fea4
      Tomislav Novak committed
      Summary:
      When seek target is a merge key (`kTypeMerge`), `DBIter::FindNextUserEntry()`
      advances the underlying iterator _past_ the current key (`saved_key_`); see
      `MergeValuesNewToOld()`. However, `FindPrevUserKey()` assumes that `iter_`
      points to an entry with the same user key as `saved_key_`. As a result,
      `it->Seek(key) && it->Prev()` can cause the iterator to be positioned at the
      _next_, instead of the previous, entry (new test, written by @lovro, reproduces
      the bug).
      
      This diff changes `FindPrevUserKey()` to also skip keys that are _greater_ than
      `saved_key_`.
      
      Test Plan: db_test
      
      Reviewers: igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb, dhruba, lovro
      
      Differential Revision: https://reviews.facebook.net/D40791
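
      A deliberately simplified, self-contained illustration of the fix in ec70fea4 above. This is not the DBIter code, and it ignores sequence numbers and entry types: the point is only that, when walking backwards, entries whose user key compares greater than saved_key_ are skipped as well, not only entries equal to it, so the iterator ends up on the previous user key even if it started out past saved_key_.

        #include <cstdio>
        #include <string>
        #include <vector>

        // Toy stand-in for the underlying internal iterator (iter_).
        struct ToyIter {
          const std::vector<std::string>* keys;
          int pos;
          bool Valid() const { return pos >= 0 && pos < static_cast<int>(keys->size()); }
          const std::string& key() const { return (*keys)[pos]; }
          void Prev() { --pos; }
        };

        // Step backwards while the current key is equal to OR greater than saved_key.
        // Skipping only equal keys (the old behavior) would leave the iterator on a
        // larger key if it had already been advanced past saved_key.
        void FindPrevUserKey(ToyIter* it, const std::string& saved_key) {
          while (it->Valid() && it->key() >= saved_key) {
            it->Prev();
          }
        }

        int main() {
          std::vector<std::string> keys = {"a", "b", "c", "d"};
          ToyIter it{&keys, 3};   // iterator was advanced past "b" and sits at "d"
          FindPrevUserKey(&it, "b");  // saved_key_ is "b"
          std::printf("previous user key: %s\n", it.Valid() ? it.key().c_str() : "(none)");
          return 0;
        }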
    • Make column_family_test runnable in ROCKSDB_LITE · 501591c4
      Yueh-Hsuan Chiang committed
      Summary: Make column_family_test runnable in ROCKSDB_LITE.
      
      Test Plan: column_family_test
      
      Reviewers: sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D40251
    • Merge branch 'master' of github.com:facebook/rocksdb · 91cb82f3
      krad committed
    • set -e in fb_compile_mongo.sh · 09f5a4b4
      Igor Canadi committed
      Summary: Based on @anthony's feedback, we want to fail early if our static linking fails.
      
      Test Plan: none
      
      Reviewers: anthony
      
      Reviewed By: anthony
      
      Subscribers: dhruba, anthony, leveldb
      
      Differential Revision: https://reviews.facebook.net/D40839
    • Fix race in unit test. · 6199cba9
      krad committed
      Summary: Avoid falling victim to a race condition.
      
      Test Plan: Run the unit test
      
      Reviewers: sdong, igor
      
      CC: leveldb@
      
      Task ID: #7312624
      
      Blame Rev:
  8. 27 Jun 2015 (4 commits)
    • Use malloc_usable_size() for accounting block cache size · 0a019d74
      Igor Canadi committed
      Summary:
      Currently, when we insert something into the block cache, we say that the block cache capacity decreased by the size of the block. However, the size of the block might be less than the actual memory used by this object. For example, a 4.5KB block will actually use 8KB of memory. So even if we configure the block cache to 10GB, our actual memory usage of the block cache will be 20GB!
      
      This problem showed up a lot in testing and just recently also showed up in MongoRocks production where we were using 30GB more memory than expected.
      
      This diff will fix the problem. Instead of counting the block size, we will count memory used by the block. That way, a block cache configured to be 10GB will actually use only 10GB of memory.
      
      I'm using a non-portable function and I couldn't find info on its portability on Google. However, it seems to work on Linux, which will cover the majority of our use cases.
      
      Test Plan:
      1. fill up mongo instance with 80GB of data
      2. restart mongo with block cache size configured to 10GB
      3. do a table scan in mongo
      4. memory usage before the diff: 12GB. memory usage after the diff: 10.5GB
      
      Reviewers: sdong, MarkCallaghan, rven, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D40635
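
      A small Linux/glibc-only illustration of the accounting gap described in 0a019d74 above: malloc_usable_size() reports the memory actually backing an allocation, which can exceed the requested size, so charging the requested size under-counts real usage. The exact overhead depends on the allocator (the 4.5KB-to-8KB example in the commit is presumably a consequence of the allocator's size classes).

        #include <cstdio>
        #include <cstdlib>
        #include <malloc.h>  // malloc_usable_size() -- non-portable, as the commit notes

        int main() {
          const size_t requested = 4608;  // 4.5KB, mirroring the commit's example
          void* block = std::malloc(requested);
          if (block == nullptr) return 1;
          std::printf("requested: %zu bytes, usable: %zu bytes\n",
                      requested, malloc_usable_size(block));
          // A cache that charges `requested` instead of the usable size believes it
          // is under capacity while the process actually consumes more memory.
          std::free(block);
          return 0;
        }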
    • Call merge operators with empty values · 4cbc4e6f
      Igor Canadi committed
      Summary: It's not really nice to call the user's API with garbage data in new_value. This diff makes sure that new_value is empty before calling the merge operator.
      
      Test Plan: Added assert to Merge operator in merge_test
      
      Reviewers: sdong, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D40773
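
      A self-contained sketch of the guarantee described in 4cbc4e6f above, using a hypothetical merge-operator interface rather than the real rocksdb::MergeOperator signature: the caller clears the output buffer before invoking the operator, so user code never sees leftover bytes.

        #include <cassert>
        #include <string>
        #include <vector>

        // Hypothetical, simplified merge operator for illustration only.
        struct ToyMergeOperator {
          bool FullMerge(const std::string* existing_value,
                         const std::vector<std::string>& operands,
                         std::string* new_value) const {
            assert(new_value->empty());  // guaranteed by the caller below
            if (existing_value != nullptr) *new_value = *existing_value;
            for (const auto& op : operands) new_value->append(op);
            return true;
          }
        };

        bool ApplyMerge(const ToyMergeOperator& mop, const std::string* existing,
                        const std::vector<std::string>& operands, std::string* result) {
          result->clear();  // the fix: never hand the operator a buffer with garbage in it
          return mop.FullMerge(existing, operands, result);
        }

        int main() {
          ToyMergeOperator mop;
          std::string result = "stale bytes from an earlier call";
          std::vector<std::string> operands = {"a", "b"};
          bool ok = ApplyMerge(mop, nullptr, operands, &result);
          assert(ok && result == "ab");
          return 0;
        }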
    • Fix mac compile · 619167ee
      Igor Canadi committed
      Summary: as title
      
      Test Plan: make check
      
      Reviewers: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D40785
    • Improve fb_compile_mongo.sh · 472e64d3
      Igor Canadi committed
      Summary: If we create a new temp directory for each build, scons will recompile everything because we have different parameters. Instead, let's set up a constant path to our static lib. That way we won't have to recompile.
      
      Test Plan: Run fb_compile_mongo.sh twice -- second time it didn't recompile everything
      
      Reviewers: MarkCallaghan, anthony
      
      Reviewed By: anthony
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D40707
  9. 26 Jun 2015 (2 commits)
  10. 25 Jun 2015 (3 commits)
    • Reproducible MongoRocks compile with FB toolchain · dadc4297
      Igor Canadi committed
      Summary:
      Added a script that will compile MongoRocks with the same flags as RocksDB binary. On FB infra, we can now do:
      
        cd ~/rocksdb; make static_lib
        cd ~/mongo; ~/rocksdb/build_tools/fb_compile_mongo.sh
      
      No need to upgrade the g++ on the devbox (like Aaron and I did) or maintain a separate script to compile (like Mark did).
      
      fb_compile_mongo.sh gets the settings from fbcode_config.sh, so it also makes it easier to upgrade the environment one day.
      
      Test Plan: Compiled mongod with new script. Also, ldd output looks good: https://phabricator.fb.com/P19891602
      
      Reviewers: AaronFeldman, MarkCallaghan, anthony
      
      Reviewed By: anthony
      
      Subscribers: anthony, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D40659
    • Make stringappend_test runnable in ROCKSDB_LITE · 62a8fd15
      Yueh-Hsuan Chiang committed
      Summary: Make stringappend_test runnable in ROCKSDB_LITE
      
      Test Plan: stringappend_test
      
      Reviewers: sdong, rven, anthony, kradhakrishnan, IslamAbdelRahman, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D40593
    • Improve the comment for BYTES_READ in statistics. · 48da7a9c
      Yueh-Hsuan Chiang committed
      Summary:
      BYTES_READ only counts the number of logical bytes read through
      the DB::Get() function.  It neither includes all logical bytes read
      nor indicates the actual number of bytes read by IO.
      
      This patch improves the comment for BYTES_READ.
      
      Test Plan: Only change comment.
      
      Reviewers: sdong, rven, anthony, kradhakrishnan, IslamAbdelRahman, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D40599
  11. 24 Jun 2015 (5 commits)
  12. 23 Jun 2015 (1 commit)
    • Introduce WAL recovery consistency levels · de85e4ca
      krad committed
      Summary:
      The "one size fits all" approach with WAL recovery will only introduce inconvenience for our varied clients as we go forward. The current recovery is a bit heuristic. We introduce the following levels of consistency while replaying the WAL.
      
      1. RecoverAfterRestart (kTolerateCorruptedTailRecords)
      
      This mimics the current recovery mode.
      
      2. RecoverAfterCleanShutdown (kAbsoluteConsistency)
      
      This is ideal for unit tests and cases where the store is shut down cleanly. We tolerate no corruption or incomplete writes.
      
      3. RecoverPointInTime (kPointInTimeRecovery)
      
      This is ideal when using devices with a controller cache or file systems that can lose data on restart. We recover up to the point where there is no corruption or incomplete write.
      
      4. RecoverAfterDisaster (kSkipAnyCorruptRecord)
      
      This is the ideal mode for recovering data. We tolerate corruption and incomplete writes, and we hop over the sections that we cannot make sense of, salvaging as many records as possible.
      
      Test Plan:
      (1) Run added unit test to cover all levels.
      (2) Run make check.
      
      Reviewers: leveldb, sdong, igor
      
      Subscribers: yoshinorim, dhruba
      
      Differential Revision: https://reviews.facebook.net/D38487
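
      An illustrative sketch of how code might branch on the four levels introduced in de85e4ca above. The enum constants are taken from the commit text; the exact type and option names in the shipped API may differ, and the decision helper is hypothetical.

        #include <cstdio>

        enum class WALRecoveryMode {
          kTolerateCorruptedTailRecords,  // 1. RecoverAfterRestart (current behavior)
          kAbsoluteConsistency,           // 2. RecoverAfterCleanShutdown
          kPointInTimeRecovery,           // 3. RecoverPointInTime
          kSkipAnyCorruptRecord           // 4. RecoverAfterDisaster
        };

        // Hypothetical helper: decide what to do when a corrupt WAL record is found.
        enum class Action { kFail, kStopReplayHere, kSkipRecord };

        Action OnCorruptRecord(WALRecoveryMode mode, bool at_tail_of_log) {
          switch (mode) {
            case WALRecoveryMode::kAbsoluteConsistency:
              return Action::kFail;  // no corruption or incomplete write is tolerated
            case WALRecoveryMode::kTolerateCorruptedTailRecords:
              return at_tail_of_log ? Action::kStopReplayHere : Action::kFail;
            case WALRecoveryMode::kPointInTimeRecovery:
              return Action::kStopReplayHere;  // keep everything up to the first problem
            case WALRecoveryMode::kSkipAnyCorruptRecord:
              return Action::kSkipRecord;  // hop over bad sections, salvage what we can
          }
          return Action::kFail;
        }

        int main() {
          Action a = OnCorruptRecord(WALRecoveryMode::kPointInTimeRecovery, false);
          std::printf("action = %d\n", static_cast<int>(a));
          return 0;
        }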