1. 02 Apr 2016, 7 commits
    • No need to limit to 20 files in UpdateAccumulatedStats() if options.max_open_files=-1 · cc87075d
      Authored by Aaron Gao
      Summary:
      There is a hardcoded constraint in our statistics collection that prevents reading properties from more than 20 SST files. This means our statistics will be very inaccurate for databases with more than 20 files, since the additional files are simply ignored. The purpose of constraining the number of files used is to bound the I/O performed during statistics collection, since these statistics need to be recomputed every time the database is reopened.
      
      However, this constraint doesn't take into account the case where the option max_open_files is -1. In that case, all the file metadata has already been read, so MaybeInitializeFileMetaData() won't incur any I/O cost. So this diff gets rid of the 20-file constraint when max_open_files == -1.
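      A minimal sketch of the changed sampling rule (self-contained stand-ins for illustration; `FileStats` and `EstimateNumKeys` are hypothetical names, not the verbatim RocksDB code):

      ```
      #include <cstddef>
      #include <cstdint>
      #include <vector>

      // Hypothetical stand-in for a file's cached table properties.
      struct FileStats {
        uint64_t num_entries = 0;
      };

      // Sketch: the 20-file cap on property sampling only applies when
      // max_open_files != -1. With max_open_files == -1, every table file
      // is already open, so sampling all files costs no extra I/O.
      uint64_t EstimateNumKeys(const std::vector<FileStats>& files,
                               int max_open_files) {
        const size_t kMaxFilesToSample = 20;  // the old hardcoded bound
        uint64_t sampled_entries = 0;
        size_t sampled_files = 0;
        for (const FileStats& f : files) {
          if (max_open_files != -1 && sampled_files >= kMaxFilesToSample) {
            break;  // bound startup I/O when files must be opened on demand
          }
          sampled_entries += f.num_entries;
          ++sampled_files;
        }
        if (sampled_files == 0) {
          return 0;
        }
        // Extrapolate the sampled average to all files, as in the test plan:
        // (10*1 + 10*2) / 20 * 30 = 45 when capped; 50 when every file counts.
        return sampled_entries * files.size() / sampled_files;
      }
      ```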
      
      Test Plan:
      Added a unit test, "ValidateSampleNumber", in db/db_properties_test.cc.
      We generate 20 files with 2 rows each and 10 files with 1 row each.
      If max_open_files != -1, `rocksdb.estimate-num-keys` should be (10*1 + 10*2)/20 * 30 = 45. Otherwise, it should be the ground truth, 50.
      
      Reviewers: andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D56253
    • Eliminate std::deque initialization while iterating over merge operands · 8a1a603f
      Authored by Islam AbdelRahman
      Summary:
      This patch is similar to D52563. When we iterate over a DB with merge operands, we keep creating a std::deque to store the operands; optimize this by reusing the merge_operands_ data member.
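      A minimal sketch of the idea (an illustrative class, assuming merge_operands_ is the reused container; not the verbatim RocksDB iterator code):

      ```
      #include <deque>
      #include <string>

      // Keep one deque as a data member and clear() it between keys,
      // instead of constructing a fresh std::deque for every key that
      // has merge operands.
      class MergeOperandsHolderSketch {
       public:
        void StartNewKey() {
          merge_operands_.clear();  // reuse the member instead of rebuilding it
        }
        void AddOperand(const std::string& operand) {
          merge_operands_.push_back(operand);
        }
        const std::deque<std::string>& operands() const {
          return merge_operands_;
        }

       private:
        std::deque<std::string> merge_operands_;  // reused across keys
      };
      ```

      Clearing the member between keys removes the per-key construction and destruction of the container from the hot read path, and in practice lets the deque reuse much of its allocation.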
      
      Before the patch
      
      ```
      ./db_bench --benchmarks="mergerandom,readseq,readseq,readseq,readseq" --db="/dev/shm/bench_merge_memcpy_on_the_fly/" --merge_operator="put" --merge_keys=10000 --num=10000
      
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      mergerandom  :       3.757 micros/op 266141 ops/sec;   29.4 MB/s ( updates:10000)
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.413 micros/op 2423538 ops/sec;  268.1 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.451 micros/op 2219071 ops/sec;  245.5 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.420 micros/op 2382039 ops/sec;  263.5 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.408 micros/op 2452017 ops/sec;  271.3 MB/s
      
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      mergerandom  :       3.947 micros/op 253376 ops/sec;   28.0 MB/s ( updates:10000)
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.441 micros/op 2266473 ops/sec;  250.7 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.471 micros/op 2122033 ops/sec;  234.8 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.440 micros/op 2271407 ops/sec;  251.3 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.429 micros/op 2331471 ops/sec;  257.9 MB/s
      ```
      
      With the patch
      
      ```
      ./db_bench --benchmarks="mergerandom,readseq,readseq,readseq,readseq" --db="/dev/shm/bench_merge_memcpy_on_the_fly/" --merge_operator="put" --merge_keys=10000 --num=10000
      
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      mergerandom  :       4.080 micros/op 245092 ops/sec;   27.1 MB/s ( updates:10000)
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.308 micros/op 3241843 ops/sec;  358.6 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.312 micros/op 3200408 ops/sec;  354.0 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.332 micros/op 3013962 ops/sec;  333.4 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.300 micros/op 3328017 ops/sec;  368.2 MB/s
      
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      mergerandom  :       3.973 micros/op 251705 ops/sec;   27.8 MB/s ( updates:10000)
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.320 micros/op 3123752 ops/sec;  345.6 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.335 micros/op 2986641 ops/sec;  330.4 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.339 micros/op 2950047 ops/sec;  326.4 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.319 micros/op 3131565 ops/sec;  346.4 MB/s
      ```
      
      Test Plan: make check -j64
      
      Reviewers: yhchiang, andrewkr, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D56031
    • WriteBatchWithIndex micro optimization · f38540b1
      Authored by Islam AbdelRahman
      Summary:
        - Put key offset and key size in WriteBatchIndexEntry
        - Use vector for comparators in WriteBatchEntryComparator
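      A sketch of the first change (field names are guesses inferred from the summary, not copied from the RocksDB source): storing the key's offset and size in the index entry lets the comparator slice the key straight out of the write batch's buffer instead of re-parsing the batch entry.

      ```
      #include <cstddef>
      #include <cstdint>
      #include <string>

      // Illustrative index entry recording where its key lives inside
      // the WriteBatch's underlying buffer.
      struct WriteBatchIndexEntrySketch {
        size_t offset;           // offset of the whole entry in the write batch
        uint32_t column_family;  // column family of the entry
        size_t key_offset;       // offset of the key within the batch buffer
        size_t key_size;         // size of the key in bytes
      };

      // The comparator can then extract the key directly, with no parsing:
      inline std::string KeyOf(const std::string& batch_data,
                               const WriteBatchIndexEntrySketch& e) {
        return batch_data.substr(e.key_offset, e.key_size);
      }
      ```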
      
      I used a slightly modified version of @yoshinorim's code to benchmark:
      https://gist.github.com/IslamAbdelRahman/b120f4fba8d6ff7d58d2

      For Put, I create a transaction that puts 1,000,000 keys and measure the time spent, without committing.
      For GetForUpdate, I read the keys that were added in the Put transaction.
      
      Original time:
      
      ```
       rm -rf /dev/shm/rocksdb-example/
       ./txn_bench put 1000000
       1000000 OK Ops | took      3.679 seconds
       ./txn_bench get_for_update 1000000
       1000000 OK Ops | took      3.940 seconds
      ```
      
      New time:
      
      ```
       rm -rf /dev/shm/rocksdb-example/
       ./txn_bench put 1000000
       1000000 OK Ops | took      2.727 seconds
       ./txn_bench get_for_update 1000000
       1000000 OK Ops | took      3.880 seconds
      ```
      
      It looks like there is no significant improvement in GetForUpdate(), but we can see a ~30% improvement in Put().
      
      Test Plan: unittests
      
      Reviewers: yhchiang, anthony, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, yoshinorim
      
      Differential Revision: https://reviews.facebook.net/D55539
    • Merge pull request #1053 from adamretter/benchmark-java-comparator · 20065406
      Authored by Adam Retter
      Benchmark Java comparator vs C++ comparator
    • Stderr info logger · f2c43a4a
      Authored by Andrew Kryczka
      Summary:
      Adapted a stderr logger from the option tests. Moved it to a separate
      header so we can reuse it, e.g., from ldb subcommands for faster debugging. This
      is especially useful to make errors/warnings more visible when running
      "ldb repair", which involves potential data loss.
      
      Test Plan:
      ran options_test and "ldb repair"
      
        $ ./ldb repair --db=./tmp/
        [WARN] **** Repaired rocksdb ./tmp/; recovered 1 files; 588bytes. Some data may have been lost. ****
        OK
      
      Reviewers: IslamAbdelRahman, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D56151
    • Rocksdb backup can store optional application specific metadata · b55e2165
      Authored by Uddipta Maity
      Summary:
      RocksDB's backup engine maintains metadata about backups in separate files, but
      there was no way to add extra application-specific data to it. This diff adds
      support for that.
      In some use cases, applications decide whether to restore a backup based on
      such metadata. This lets them make that decision cheaply, without inspecting
      the backup contents first.
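      A hedged usage sketch; it assumes the diff adds a CreateNewBackupWithMetadata() method and an app_metadata field on BackupInfo, which is what the summary and the sample meta file below suggest:

      ```
      #include <string>
      #include <vector>

      #include "rocksdb/db.h"
      #include "rocksdb/utilities/backupable_db.h"

      // Store an application-defined tag with a backup, then use it later
      // to pick a restore candidate without touching the backed-up data.
      void BackupWithTag(rocksdb::DB* db, rocksdb::BackupEngine* engine,
                         const std::string& tag) {
        engine->CreateNewBackupWithMetadata(db, tag);

        std::vector<rocksdb::BackupInfo> infos;
        engine->GetBackupInfo(&infos);
        for (const auto& info : infos) {
          if (info.app_metadata == tag) {
            // Candidate found; restore only when the tag matches.
          }
        }
      }
      ```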
      
      Test Plan:
      Added a unit test. Existing ones pass.

      Sample meta file for the BinaryMetadata test:
      
      ```
      
      1459454043
      0
      metadata 6162630A64656600676869
      2
      private/1/MANIFEST-000001 crc32 1184723444
      private/1/CURRENT crc32 3505765120
      
      ```
      
      Reviewers: sdong, ldemailly, andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: andrewkr, dhruba, ldemailly
      
      Differential Revision: https://reviews.facebook.net/D56007
    • Adding pin_l0_filter_and_index_blocks_in_cache feature and related fixes. · 9b519875
      Authored by Marton Trencseni
      Summary:
      When a block based table file is opened, if prefetch_index_and_filter is true, it will prefetch the index and filter blocks, putting them into the block cache.
      What this feature adds: when an L0 block based table file is opened, if pin_l0_filter_and_index_blocks_in_cache is true in the options (and prefetch_index_and_filter is true), then the filter and index blocks aren't released back to the block cache at the end of BlockBasedTableReader::Open(). Instead, the table reader takes ownership of them, pinning them, i.e. the LRU cache will never push them out. Subsequent accesses go through the table reader rather than the block cache, avoiding lock contention.
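      A sketch of turning the feature on. The pin_l0_filter_and_index_blocks_in_cache flag comes from the title; using cache_index_and_filter_blocks as the prefetch-enabling switch is an assumption on my part:

      ```
      #include "rocksdb/options.h"
      #include "rocksdb/table.h"

      rocksdb::Options MakePinnedL0Options() {
        rocksdb::BlockBasedTableOptions table_options;
        // Index/filter blocks go through the block cache...
        table_options.cache_index_and_filter_blocks = true;
        // ...but for L0 files they stay pinned, never evicted by the LRU.
        table_options.pin_l0_filter_and_index_blocks_in_cache = true;

        rocksdb::Options options;
        options.table_factory.reset(
            rocksdb::NewBlockBasedTableFactory(table_options));
        return options;
      }
      ```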
      
      Test Plan:
      'export TEST_TMPDIR=/dev/shm/ && DISABLE_JEMALLOC=1 OPT=-g make all valgrind_check -j32' is OK.
      I didn't run the Java tests; I don't have Java set up on my devserver.
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D56133
  2. 01 Apr 2016, 3 commits
    • Change some RocksDB default options · 2feafa3d
      Authored by sdong
      Summary: Change some RocksDB default options to make them friendlier to server workloads.
      
      Test Plan: Run all existing tests
      
      Reviewers: yhchiang, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: sumeet, muthu, benj, MarkCallaghan, igor, leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D55941
    • Fixed compile warnings in posix_logger.h and coding.h · a558830f
      Authored by Yueh-Hsuan Chiang
      Summary:
      Fixed the following compile warnings:
      
      ```
      /Users/yhchiang/rocksdb/util/posix_logger.h:32:11: error: unused variable 'kDebugLogChunkSize' [-Werror,-Wunused-const-variable]
      const int kDebugLogChunkSize = 128 * 1024;
                ^
      /Users/yhchiang/rocksdb/util/coding.h:24:20: error: unused variable 'kMaxVarint32Length' [-Werror,-Wunused-const-variable]
      const unsigned int kMaxVarint32Length = 5;
                         ^
      2 errors generated.
      ```
      
      Test Plan: make clean rocksdb
      
      Reviewers: igor, sdong, anthony, IslamAbdelRahman, rven, kradhakrishnan, adamretter
      
      Reviewed By: adamretter
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D56223
    • Merge pull request #980 from adamretter/java-arm · 51c9464d
      Authored by Yueh-Hsuan Chiang
      ARM for the Java API
  3. 31 Mar 2016, 7 commits
  4. 30 Mar 2016, 1 commit
  5. 26 Mar 2016, 2 commits
  6. 25 Mar 2016, 2 commits
    • Correct a typo in a comment · ad2fdaa8
      Authored by Yueh-Hsuan Chiang
      Summary: Correct a typo in a comment
      
      Test Plan: No code change.
      
      Reviewers: sdong, kradhakrishnan, IslamAbdelRahman
      
      Reviewed By: kradhakrishnan, IslamAbdelRahman
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D55803
    • Fix data race issue when sub-compaction is used in CompactionJob · be9816b3
      Authored by Yueh-Hsuan Chiang
      Summary:
      When subcompaction is used, all subcompactions share the same Compaction
      pointer in CompactionJob, while each subcompaction keeps its own mutable
      stats in SubcompactionState. However, some mutable state is still
      stored in the shared Compaction object.
      
      This patch makes three changes:
      
      1. Make the shared Compaction pointer const so that it can never be modified
         during the compaction.
      2. Move necessary states from Compaction to SubcompactionState.
      3. Make Compaction's member functions const when they do not modify
         its internal state.
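      A sketch of the sharing pattern after the change, as described in the list above (simplified stand-ins, not the actual RocksDB classes):

      ```
      #include <cstdint>
      #include <vector>

      struct CompactionSketch {
        // Only const member functions: shared state is read-only during the run.
        int num_input_files() const { return inputs_; }
        int inputs_ = 4;
      };

      struct SubcompactionStateSketch {
        const CompactionSketch* compaction = nullptr;  // shared, immutable view
        uint64_t num_output_records = 0;  // mutable stats stay per-subcompaction
      };

      void RunSubcompactions(const CompactionSketch& c) {
        std::vector<SubcompactionStateSketch> subs(2);
        for (auto& s : subs) {
          s.compaction = &c;       // every subcompaction reads the same const object
          s.num_output_records++;  // writes touch only per-subcompaction state
        }
      }
      ```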
      
      Test Plan: rocksdb and MyRocks test
      
      Reviewers: sdong, kradhakrishnan, andrewkr, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: andrewkr, dhruba, yoshinorim, gunnarku, leveldb
      
      Differential Revision: https://reviews.facebook.net/D55923
  7. 24 Mar 2016, 4 commits
    • Merge pull request #1050 from yuslepukhin/support_db_test2 · e3802531
      Authored by Islam AbdelRahman
      Add support for db_test2 for dev and CI runs
    • Add support for db_test2 for dev and CI runs · e7cc49cb
      Authored by Dmitri Smirnov
    • Add comments to perf_context skip counters · 3996770d
      Authored by sdong
      Summary: Document the skipped counters in perf context more clearly.
      
      Test Plan: Comment only.
      
      Reviewers: IslamAbdelRahman, yhchiang, MarkCallaghan
      
      Reviewed By: MarkCallaghan
      
      Subscribers: leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D55833
    • Make WritableFileWrapper not screw up preallocation · 4e85b747
      Authored by Mike Kolupaev
      Summary:
      Without this diff, this is what happens to compaction output file if it's a subclass of WritableFileWrapper:
      - during compaction, all `PrepareWrite()` calls update `last_preallocated_block_` of the `WritableFileWrapper` itself, not of `target_`, since `PrepareWrite()` is not virtual,
      - `PrepareWrite()` calls `Allocate()`, which is virtual; it does `fallocate()` on `target_`,
      - after writing data, `target_->Close()` calls `GetPreallocationStatus()` of `target_`; it returns `last_preallocated_block_` of `target_`, which is zero because it was never touched before,
      - `target_->Close()` doesn't call `ftruncate()`; file remains big.
      
      This diff fixes it in a straightforward way, by making the methods virtual. `WritableFileWrapper` ends up having the useless fields `last_preallocated_block_` and `preallocation_block_size_`. I think ideally the preallocation logic should be outside `WritableFile`, the same way as `log_writer.h` and `file_reader_writer.h` moved some non-platform-specific logic out of Env, but that's probably not worth the effort now.
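      A sketch of the bug and fix (simplified stand-ins for WritableFile / WritableFileWrapper, not the verbatim RocksDB classes):

      ```
      #include <cstddef>

      class WritableFileSketch {
       public:
        virtual ~WritableFileSketch() = default;
        // Before the fix these were non-virtual, so a wrapper updated its
        // own last_preallocated_block_ instead of the target's.
        virtual void SetPreallocationBlockSize(size_t size) {
          preallocation_block_size_ = size;
        }
        virtual void GetPreallocationStatus(size_t* block_size,
                                            size_t* last_allocated_block) {
          *block_size = preallocation_block_size_;
          *last_allocated_block = last_preallocated_block_;
        }

       protected:
        size_t preallocation_block_size_ = 0;
        size_t last_preallocated_block_ = 0;
      };

      class WritableFileWrapperSketch : public WritableFileSketch {
       public:
        explicit WritableFileWrapperSketch(WritableFileSketch* t) : target_(t) {}
        // With virtual methods, the wrapper forwards to the wrapped file,
        // keeping all preallocation bookkeeping in one place. The inherited
        // fields above go unused, as the summary notes.
        void SetPreallocationBlockSize(size_t size) override {
          target_->SetPreallocationBlockSize(size);
        }
        void GetPreallocationStatus(size_t* block_size,
                                    size_t* last_allocated_block) override {
          target_->GetPreallocationStatus(block_size, last_allocated_block);
        }

       private:
        WritableFileSketch* target_;
      };
      ```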
      
      Test Plan: `make -j check`; I'm going to deploy it on our test tier and see if it fixes the space reclamation problem there.
      
      Reviewers: yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, sdong
      
      Differential Revision: https://reviews.facebook.net/D54681
  8. 23 Mar 2016, 6 commits
  9. 22 Mar 2016, 3 commits
  10. 19 Mar 2016, 5 commits
    • Fix failing Java unit test. · d7ae42b0
      Authored by Marton Trencseni
      Test Plan: sent diff to sdong, passes :)
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D55749
    • Add test for Snapshot 0 · fbbb8a61
      Authored by agiardullo
      Summary:
      I ran into this assert when stress testing transactions. It's pretty easy to repro.

      Changing VersionSet::last_sequence_ to start at 1 seems pretty straightforward. We would just need to change the 4 callers of SetLastSequence(), including recovery code. I'd make this change myself, but I do not have enough time to test changes to recovery code paths this week, so I'm checking in this test case (disabled) for a future fix.
      
      Test Plan: n/a
      
      Reviewers: yhchiang, kradhakrishnan, andrewkr, anthony, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D55311
    • Add unit tests for RepairDB · e182f03c
      Authored by Andrew Kryczka
      Summary:
      Basic test cases:
      
      - Manifest is lost or corrupt
      - Manifest refers to too many or too few SST files
      - SST file is corrupt
      - Unflushed data is present when RepairDB is called
      
      Depends on D55065 for its CreateFile() function in file_utils
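      A sketch of the repair path these tests exercise, using the public rocksdb::RepairDB() entry point (the wrapper function here is illustrative):

      ```
      #include <string>

      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      // RepairDB() rebuilds the MANIFEST from whatever SST files and WAL
      // data it can salvage, covering the lost/corrupt-manifest and
      // unflushed-data cases listed above.
      rocksdb::Status RepairThenOpen(const std::string& dbname,
                                     rocksdb::DB** db) {
        rocksdb::Options options;
        rocksdb::Status s = rocksdb::RepairDB(dbname, options);
        if (!s.ok()) {
          return s;
        }
        return rocksdb::DB::Open(options, dbname, db);
      }
      ```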
      
      Test Plan: Ran the tests.
      
      Reviewers: IslamAbdelRahman, yhchiang, yoshinorim, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D55485
    • travis build fixes · 7d371863
      Authored by Praveen Rao
    • Merge pull request #1042 from SherlockNoMad/HistFix · fbea4dc6
      Authored by Karthikeyan Radhakrishnan
      Fix in HistogramWindowingImpl