1. 14 Sep 2019 (1 commit)
  2. 13 Sep 2019 (3 commits)
    • Add insert hints for each writebatch (#5728) · 1a928c22
      Committed by Lingjing You
      Summary:
      Add an insert hint for each WriteBatch so that hints can be used with concurrent memtable writes, and add a write option to enable it.
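
The effect of an insert hint can be illustrated with a self-contained analogy: std::map::emplace_hint plays the same role for an ordered container that a memtable insert hint plays for a WriteBatch. This is a sketch of the idea, not RocksDB's actual API.

```cpp
#include <map>
#include <string>

// Sequential inserts into an ordered container: passing a position hint lets
// the container skip the O(log n) search when keys arrive in order, much as
// a memtable insert hint skips repeated skiplist searches. Illustrative only.
std::map<int, std::string> FillSequentialWithHint(int n) {
  std::map<int, std::string> m;
  for (int i = 0; i < n; ++i) {
    // For strictly ascending keys, end() is always the correct hint, making
    // each insert amortized O(1) instead of O(log n).
    m.emplace_hint(m.end(), i, "v" + std::to_string(i));
  }
  return m;
}
```

This is why the fillseq gains below grow with batch size: the hint pays off most when many ordered inserts share one batch.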
      
      Bench result (qps):
      
      `./db_bench --benchmarks=fillseq -allow_concurrent_memtable_write=true -num=4000000 -batch-size=1 -threads=1 -db=/data3/ylj/tmp -write_buffer_size=536870912 -num_column_families=4`
      
      master:
      
      | batch size \ thread num | 1       | 2       | 4       | 8       |
      | ----------------------- | ------- | ------- | ------- | ------- |
      | 1                       | 387883  | 220790  | 308294  | 490998  |
      | 10                      | 1397208 | 978911  | 1275684 | 1733395 |
      | 100                     | 2045414 | 1589927 | 1798782 | 2681039 |
      | 1000                    | 2228038 | 1698252 | 1839877 | 2863490 |
      
      fillseq with writebatch hint:
      
      | batch size \ thread num | 1       | 2       | 4       | 8       |
      | ----------------------- | ------- | ------- | ------- | ------- |
      | 1                       | 286005  | 223570  | 300024  | 466981  |
      | 10                      | 970374  | 813308  | 1399299 | 1753588 |
      | 100                     | 1962768 | 1983023 | 2676577 | 3086426 |
      | 1000                    | 2195853 | 2676782 | 3231048 | 3638143 |
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5728
      
      Differential Revision: D17297240
      
      fbshipit-source-id: b053590a6d77871f1ef2f911a7bd013b3899b26c
    • arm64 crc prefetch optimise (#5773) · a378a4c2
      Committed by HouBingjian
      Summary:
      Prefetch the data of the following block to avoid cache misses while computing the CRC32 checksum.

      Performance was tested on a Kunpeng 920 server (ARMv8, 64 cores @ 2.6 GHz):
      ./db_bench --benchmarks=crc32c --block_size=500000000
      before optimization: 587313.500 micros/op 1 ops/sec;  811.9 MB/s (500000000 per op)
      after optimization : 289248.500 micros/op 3 ops/sec; 1648.5 MB/s (500000000 per op)
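
The technique can be sketched in a self-contained form: prefetch the next block while checksumming the current one, so memory loads overlap with ALU work. __builtin_prefetch is the GCC/Clang builtin; the plain byte sum below stands in for the real CRC32 loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Prefetch the next block before processing the current one, so it is
// already in cache when the ALU work on the current block finishes.
uint64_t ChecksumWithPrefetch(const std::vector<uint8_t>& data,
                              size_t block_size) {
  uint64_t sum = 0;
  for (size_t off = 0; off < data.size(); off += block_size) {
#if defined(__GNUC__)
    if (off + block_size < data.size()) {
      // Hints: read-only access (0), high temporal locality (3).
      __builtin_prefetch(data.data() + off + block_size, 0, 3);
    }
#endif
    const size_t end = std::min(data.size(), off + block_size);
    for (size_t i = off; i < end; ++i) sum += data[i];
  }
  return sum;
}
```

The prefetch is a pure hint: results are identical with or without it, only the memory stalls change.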
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5773
      
      Differential Revision: D17347339
      
      fbshipit-source-id: bfcd74f0f0eb4b322b959be68019ddcaae1e3341
    • Temporarily disable hash index in stress tests (#5792) · d35ffd56
      Committed by Levi Tamasi
      Summary:
      PR https://github.com/facebook/rocksdb/issues/4020 implicitly enabled the hash index as well in stress/crash
      tests, resulting in assertion failures in Block. This patch disables
      the hash index until we can pinpoint the root cause of these issues.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5792
      
      Test Plan:
      Ran tools/db_crashtest.py and made sure it only uses index types 0 and 2
      (binary search and partitioned index).
      
      Differential Revision: D17346777
      
      Pulled By: ltamasi
      
      fbshipit-source-id: b4318f37f1fda3ee1bbff4ef2c2f556ca9e6b551
  3. 12 Sep 2019 (7 commits)
    • Fix RocksDB bug in block_cache_trace_analyzer.cc on Windows (#5786) · e8c2e68b
      Committed by Adam Retter
      Summary:
      This is required to compile on Windows with Visual Studio 2015.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5786
      
      Differential Revision: D17335994
      
      fbshipit-source-id: 8f9568310bc6f697e312b5e24ad465e9084f0011
    • Option to make write group size configurable (#5759) · d05c0fe4
      Committed by Ronak Sisodia
      Summary:
      The maximum batch size that can be written to the WAL was determined statically: if the leader's write is smaller than 128 KB, the group size limit is the leader's write size plus 128 KB; otherwise the limit is 1 MB. Both limits were hard-coded; this change adds an option to make the write group size configurable.
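
The static sizing rule described above can be sketched as a small function; the names and constants are illustrative, not RocksDB's actual identifiers.

```cpp
#include <cstdint>

// Sketch of the static write-group sizing rule quoted above.
constexpr uint64_t kLeaderSlack = 128 * 1024;     // 128 KB
constexpr uint64_t kMaxGroupBytes = 1024 * 1024;  // 1 MB

uint64_t MaxWriteGroupSize(uint64_t leader_write_size) {
  if (leader_write_size < kLeaderSlack) {
    // Small leader: let followers add up to a fixed 128 KB on top.
    return leader_write_size + kLeaderSlack;
  }
  return kMaxGroupBytes;  // Large leader: cap the whole group at 1 MB.
}
```

The PR replaces these fixed constants with a user-settable limit.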
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5759
      
      Differential Revision: D17329298
      
      fbshipit-source-id: a3d910629d8d8ca84ea39ad89c2b2d284571ded5
    • Use delete to disable automatic generated methods. (#5009) · 9eb3e1f7
      Committed by Shylock Hg
      Summary:
      Use `= delete` to disable automatically generated methods instead of declaring them private, and group the constructors together for clarity. This modification caused unused-field warnings, so an unused attribute was added to suppress them.
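
A minimal sketch of the idiom: deleting the copy operations at the declaration site replaces the old private-and-unimplemented pattern and turns misuse into a clear compile-time error.

```cpp
#include <type_traits>

// Deleting special members documents intent where the class is declared;
// misuse now fails at compile time with a clear diagnostic instead of an
// obscure linker error from the private-and-unimplemented idiom.
class NonCopyable {
 public:
  NonCopyable() = default;
  NonCopyable(const NonCopyable&) = delete;
  NonCopyable& operator=(const NonCopyable&) = delete;
};

static_assert(!std::is_copy_constructible<NonCopyable>::value,
              "copying is disabled");
static_assert(!std::is_copy_assignable<NonCopyable>::value,
              "assignment is disabled");
```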
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5009
      
      Differential Revision: D17288733
      
      fbshipit-source-id: 8a767ce096f185f1db01bd28fc88fef1cdd921f3
    • record the timestamp on first configure (#4799) · fcda80fc
      Committed by Wilfried Goesgens
      Summary:
      With this change, cmake no longer re-generates the build timestamp on subsequent builds, which previously caused needless rebuilds of the library.

      This improves compile-time turnaround if you include rocksdb as a compilable library: before this change, the timestamp .cc file was re-generated on every build, so the rocksdb library was re-compiled and re-linked even though nothing in the source had actually changed. The original timestamp is now recorded in `CMakeCache.txt` and remains there until you flush the cache.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4799
      
      Differential Revision: D17290040
      
      fbshipit-source-id: 28357fef3422693c9c19e88fa2873c8db0f662ed
    • Support partitioned index and filters in stress/crash tests (#4020) · dd2a35f1
      Committed by Andrew Kryczka
      Summary:
      - In `db_stress`, support choosing index type and whether to enable filter partitioning, and randomly set those options in crash test
      - When partitioned filter is enabled by crash test, force partitioned index to also be enabled since it's a prerequisite
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4020
      
      Test Plan:
      currently this is blocked on fixing the bug that crash test caught:
      
      ```
      $ TEST_TMPDIR=/data/compaction_bench python ./tools/db_crashtest.py blackbox --simple --interval=10 --max_key=10000000
      ...
      Verification failed for column family 0 key 937501: Value not found: NotFound:
      Crash-recovery verification failed :(
      ```
      
      Differential Revision: D8508683
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 0337e5d0558bcef26b1f3699f47265a2c1e99629
    • Avoid clock_gettime on pre-10.12 macOS versions (#5570) · 20dd828c
      Committed by Andrew Kryczka
      Summary:
      On older macOS like 10.10 we saw the following compiler error:
      
      ```
      /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/env/env_posix.cc:845:19:
      error: use of undeclared identifier 'CLOCK_THREAD_CPUTIME_ID'
          clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
                        ^
      ```
      
      According to mac's `man clock_gettime`: "These functions first appeared in Mac
      OSX 10.12". So we should not try to compile it on earlier versions.
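
A hedged sketch of such an availability guard, using Apple's standard availability macro; the pre-10.12 fallback shown here is illustrative, not necessarily what RocksDB chose.

```cpp
#include <cstdint>
#include <ctime>

// CLOCK_THREAD_CPUTIME_ID exists on Linux but only on macOS 10.12+, so
// older macOS takes a coarser clock() fallback (illustrative choice).
uint64_t ThreadCpuNanos() {
#if defined(__APPLE__) && \
    (!defined(__MAC_OS_X_VERSION_MIN_REQUIRED) || \
     __MAC_OS_X_VERSION_MIN_REQUIRED < 101200)
  // Pre-10.12 macOS: CLOCK_THREAD_CPUTIME_ID is not declared.
  return static_cast<uint64_t>(clock()) * (1000000000ULL / CLOCKS_PER_SEC);
#else
  struct timespec ts;
  clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
  return static_cast<uint64_t>(ts.tv_sec) * 1000000000ULL + ts.tv_nsec;
#endif
}
```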
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5570
      
      Test Plan:
      verified it compiles now on 10.10. Also did some investigation to
      ensure it does not cause regression on macOS 10.12+, although I do not
      have access to such an environment to really test.
      
      Differential Revision: D17322629
      
      Pulled By: riversand963
      
      fbshipit-source-id: e0a412223854f826b4d83e6d15c3739ff4620d7d
    • test size was wrong in 'fillbatch' benchmark (#5198) · c85c87a7
      Committed by tongyingrui
      Summary:
      For the fillbatch benchmark, numEntries should be [num_] rather than [num_ / 1000], because numEntries is the total number of entries we want to test.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5198
      
      Differential Revision: D17274664
      
      Pulled By: anand1976
      
      fbshipit-source-id: f96e952babdbac63fb99d14e1254d478a10437be
  4. 11 Sep 2019 (3 commits)
  5. 10 Sep 2019 (4 commits)
  6. 07 Sep 2019 (4 commits)
  7. 06 Sep 2019 (9 commits)
    • Fix WriteBatchWithIndex with MergeOperator bug (#5577) · 533e4770
      Committed by 奏之章
      Summary:
      ```
      TEST_F(WriteBatchWithIndexTest, TestGetFromBatchAndDBMerge3) {
        DB* db;
        Options options;
      
        options.create_if_missing = true;
        std::string dbname = test::PerThreadDBPath("write_batch_with_index_test");
      
        options.merge_operator = MergeOperators::CreateFromStringId("stringappend");
      
        DestroyDB(dbname, options);
        Status s = DB::Open(options, dbname, &db);
        assert(s.ok());
      
        ReadOptions read_options;
        WriteOptions write_options;
        FlushOptions flush_options;
        std::string value;
      
        WriteBatchWithIndex batch;
      
        ASSERT_OK(db->Put(write_options, "A", "1"));
        ASSERT_OK(db->Flush(flush_options, db->DefaultColumnFamily()));
        ASSERT_OK(batch.Merge("A", "2"));
      
        ASSERT_OK(batch.GetFromBatchAndDB(db, read_options, "A", &value));
        ASSERT_EQ(value, "1,2");
      
        delete db;
        DestroyDB(dbname, options);
      }
      ```
      Fix ASSERT in batch.GetFromBatchAndDB()
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5577
      
      Differential Revision: D16379847
      
      fbshipit-source-id: b1320e24ec8e71350c525083cc0a16180a63f752
    • Fixed FALLOC_FL_KEEP_SIZE undefined (#5614) · cfc20019
      Committed by Richard He
      Summary:
      Fix the `FALLOC_FL_KEEP_SIZE` undeclared error in `io_posix.cc` during a Vagrant build on CentOS, as per issue https://github.com/facebook/rocksdb/issues/5599
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5614
      
      Differential Revision: D17217960
      
      fbshipit-source-id: ef736c51b16833107fd9ccc7917ed1def2a8d02c
    • Initialized pinned_pos_ and pinned_seq_pos_ in FragmentedRangeTombstoneIterator (#5720) · eae9f040
      Committed by Jeffrey Xiao
      Summary:
      These uninitialized member variables could cause a key to not be pinned when it should be, causing erroneous behavior. For example, ingesting a file with range deletion tombstones yields an "external file have corrupted keys" error on macOS.
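
Why default member initializers matter here can be shown with a small sketch; the field names mirror the commit, but the class itself is illustrative.

```cpp
#include <cstdint>

// Default member initializers guarantee a well-defined "not pinned" state.
// Without them, pinned_pos_ would hold indeterminate garbage after
// construction, and IsKeyPinned() could randomly return true.
struct TombstoneIterSketch {
  static constexpr int64_t kInvalidPos = -1;
  int64_t pinned_pos_ = kInvalidPos;      // previously left uninitialized
  int64_t pinned_seq_pos_ = kInvalidPos;  // previously left uninitialized
  bool IsKeyPinned() const { return pinned_pos_ != kInvalidPos; }
};
```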
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5720
      
      Differential Revision: D17217673
      
      fbshipit-source-id: cd7df7ce3ad9cf69c841c4d3dc6fd144eff9e212
    • Fix EncryptedEnv assert (#5735) · 83b99192
      Committed by Yi Wu
      Summary:
      Fixes https://github.com/facebook/rocksdb/issues/5734. From reading the code, the assert does not quite make sense to me, since `dataSize` and `fileOffset` have no correlation. But my knowledge of `EncryptedEnv` is very limited.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5735
      
      Test Plan:
      run `ENCRYPTED_ENV=1 ./db_encryption_test`
       Signed-off-by: Yi Wu <yiwu@pingcap.com>
      
      Differential Revision: D17133849
      
      fbshipit-source-id: bb7262d308e5b2503c400b180edc252668df0ef0
    • remove unused #include to fix musl libc build (#5583) · 43a5cdb5
      Committed by Andrew Kryczka
      Summary:
      The `#include "core_local.h"` was pulling in libgcc's `posix_memalign()`
      declaration. That declaration specifies `throw()` whereas musl libc's
      declaration does not. This was leading to the following compiler error
      when using musl libc:
      
      ```
      In file included from /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/port/jemalloc_helper.h:26:0,
                       from /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/util/jemalloc_nodump_allocator.h:11,
                       from /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/util/jemalloc_nodump_allocator.cc:6:
      /go/native/x86_64-unknown-linux-musl/jemalloc/include/jemalloc/jemalloc.h:63:29: error: declaration of 'int posix_memalign(void**, size_t, size_t) throw ()' has a different exception specifier
       #  define je_posix_memalign posix_memalign
                                   ^
      /go/native/x86_64-unknown-linux-musl/jemalloc/include/jemalloc/jemalloc.h:63:29: note: from previous declaration 'int posix_memalign(void**, size_t, size_t)'
       #  define je_posix_memalign posix_memalign
                                   ^
      /go/native/x86_64-unknown-linux-musl/jemalloc/include/jemalloc/jemalloc.h:202:38: note: in expansion of macro 'je_posix_memalign'
       JEMALLOC_EXPORT int JEMALLOC_NOTHROW je_posix_memalign(void **memptr,
                                            ^~~~~~~~~~~~~~~~~
      make[4]: *** [CMakeFiles/rocksdb.dir/util/jemalloc_nodump_allocator.cc.o] Error 1
      ```
      
      Since `#include "core_local.h"` is not actually used, we can just remove
      it. I verified that this fixes the build.
      
      There was a related PR here (https://github.com/facebook/rocksdb/issues/2188), although the problem description is
      slightly different.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5583
      
      Differential Revision: D16343227
      
      fbshipit-source-id: 0386bc2b5fd55b2c3b5fba19382014efa52e44f8
    • bloom test check fail on arm (#5745) · ac97e693
      Committed by HouBingjian
      Summary:
      FullFilterBitsBuilder::CalculateSpace uses CACHE_LINE_SIZE, which is 64 bytes on x86 but 128 bytes on ARM64,
      so FullBloomTest.FullVaryingLengths failed when bloom_test ran on an ARM64 server.
      The assert can be fixed by changing the hard-coded 128 to CACHE_LINE_SIZE * 2, as merged:
      ASSERT_LE(FilterSize(), (size_t)((length * 10 / 8) + CACHE_LINE_SIZE * 2 + 5)) << length;

      Running bloom_test before the fix:
      /root/rocksdb-master/util/bloom_test.cc:281: Failure
      Expected: (FilterSize()) <= ((size_t)((length * 10 / 8) + 128 + 5)), actual: 389 vs 383
      200
      [  FAILED  ] FullBloomTest.FullVaryingLengths (32 ms)
      [----------] 4 tests from FullBloomTest (32 ms total)
      
      [----------] Global test environment tear-down
      [==========] 7 tests from 2 test cases ran. (116 ms total)
      [  PASSED  ] 6 tests.
      [  FAILED  ] 1 test, listed below:
      [  FAILED  ] FullBloomTest.FullVaryingLengths
      
      After the fix:
      Filters: 37 good, 0 mediocre
      [       OK ] FullBloomTest.FullVaryingLengths (90 ms)
      [----------] 4 tests from FullBloomTest (90 ms total)
      
      [----------] Global test environment tear-down
      [==========] 7 tests from 2 test cases ran. (174 ms total)
      [  PASSED  ] 7 tests.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5745
      
      Differential Revision: D17076047
      
      fbshipit-source-id: e7beb5d55d4855fceb2b84bc8119a6b0759de635
    • Faster new DynamicBloom implementation (for memtable) (#5762) · b55b2f45
      Committed by Peter Dillinger
      Summary:
      Since DynamicBloom is now only used in-memory, we're free to
      change it without schema compatibility issues. The new implementation
      is drawn from (with manifest permission)
      https://github.com/pdillinger/wormhashing/blob/303542a767437f56d8b66cea6ebecaac0e6a61e9/bloom_simulation_tests/foo.cc#L613
      
      This has several speed advantages over the prior implementation:
      * Uses fastrange instead of %
      * Minimum logic to determine first (and all) probed memory addresses
      * (Major) Two probes per 64-bit memory fetch/write.
      * Very fast and effective (murmur-like) hash expansion/re-mixing. (At
      least on recent CPUs, integer multiplication is very cheap.)
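
The bullets above can be sketched in a self-contained form: fastrange word selection, two bits set per probe within one 64-bit word, and a cheap odd-constant multiplication to re-mix the hash between probes. Constants and structure are illustrative, not the actual DynamicBloom code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Cache-local Bloom filter sketch: each key touches exactly one 64-bit word.
class TwoProbeBloomSketch {
 public:
  explicit TwoProbeBloomSketch(uint32_t num_words) : words_(num_words, 0) {}

  void Add(uint32_t hash) { words_[WordIndex(hash)] |= ProbeMask(hash); }

  bool MayContain(uint32_t hash) const {
    const uint64_t mask = ProbeMask(hash);
    return (words_[WordIndex(hash)] & mask) == mask;
  }

 private:
  size_t WordIndex(uint32_t hash) const {
    // fastrange: maps a 32-bit hash uniformly onto [0, words_.size())
    // with a multiply and shift instead of a slow % operation.
    return static_cast<size_t>((uint64_t{hash} * words_.size()) >> 32);
  }

  static uint64_t ProbeMask(uint32_t hash) {
    uint64_t h = hash;
    uint64_t mask = 0;
    for (int probe = 0; probe < 3; ++probe) {  // 3 probes -> up to 6 bits
      // Two bits per probe, both landing in the same 64-bit word.
      mask |= (uint64_t{1} << (h & 63)) | (uint64_t{1} << ((h >> 6) & 63));
      h *= 0x9e3779b97f4a7c13ULL;  // murmur-like re-mix (odd constant)
    }
    return mask;
  }

  std::vector<uint64_t> words_;
};

// No false negatives by construction: every added key must be reported.
bool BloomSelfTest() {
  TwoProbeBloomSketch bloom(1024);
  for (uint32_t k = 0; k < 1000; ++k) bloom.Add(k * 2654435761u);
  for (uint32_t k = 0; k < 1000; ++k) {
    if (!bloom.MayContain(k * 2654435761u)) return false;
  }
  return true;
}
```

Note how the whole add or query touches a single cache line, which is the source of the speedup the commit measures.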
      
      While a Bloom filter with 512-bit cache locality has about a 1.15x FP
      rate penalty (e.g. 0.84% to 0.97%), further restricting to two probes
      per 64 bits incurs an additional 1.12x FP rate penalty (e.g. 0.97% to
      1.09%). Nevertheless, the unit tests show no "mediocre" FP rate samples,
      unlike the old implementation with more erratic FP rates.
      
      Especially for the memtable, we expect speed to outweigh somewhat higher
      FP rates. For example, a negative table query would have to be 1000x
      slower than a BF query to justify doubling BF query time to shave 10% off
      FP rate (working assumption around 1% FP rate). While that seems likely
      for SSTs, my data suggests a speed factor of roughly 50x for the memtable
      (vs. BF; ~1.5% lower write throughput when enabling memtable Bloom
      filter, after this change).  Thus, it's probably not worth even 5% more
      time in the Bloom filter to shave off 1/10th of the Bloom FP rate, or 0.1%
      in absolute terms, and it's probably at least 20% slower to recoup that
      much FP rate from this new implementation. Because of this, we do not see
      a need for a 'locality' option that affects the MemTable Bloom filter
      and have decoupled the MemTable Bloom filter from Options::bloom_locality.
      
      Note that just 3% more memory to the Bloom filter (10.3 bits per key vs.
      just 10) is able to make up for the ~12% FP rate drop in the new
      implementation:
      
      # Nearly "ideal" FP-wise but reasonably fast cache-local implementation
      [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_WORM64_FROM32_any.out 10000000 6 10 $RANDOM 100000000
      ./foo_gcc_IMPL_CACHE_WORM64_FROM32_any.out time: 3.29372 sampled_fp_rate: 0.00985956 ...
      
      # Close match to this new implementation
      [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out 10000000 6 10.3 $RANDOM 100000000
      ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out time: 2.10072 sampled_fp_rate: 0.00985655 ...
      
      # Old locality=1 implementation
      [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_ROCKSDB_DYNAMIC_any.out 10000000 6 10 $RANDOM 100000000
      ./foo_gcc_IMPL_CACHE_ROCKSDB_DYNAMIC_any.out time: 3.95472 sampled_fp_rate: 0.00988943 ...
      
      Also note the dramatic speed improvement vs. alternatives.
      
      --
      
      Performance unit test: DynamicBloomTest.concurrent_with_perf is updated
      to report more precise timing data. (Measure running time of each
      thread, not just longest running thread, etc.) Results averaged over
      various sizes enabled with --enable_perf and 20 runs each; old dynamic
      bloom refers to locality=1, the faster of the old:
      
      old dynamic bloom, avg add latency = 65.6468
      new dynamic bloom, avg add latency = 44.3809
      old dynamic bloom, avg query latency = 50.6485
      new dynamic bloom, avg query latency = 43.2186
      old avg parallel add latency = 41.678
      new avg parallel add latency = 24.5238
      old avg parallel hit latency = 14.6322
      new avg parallel hit latency = 12.3939
      old avg parallel miss latency = 16.7289
      new avg parallel miss latency = 12.2134
      
      Tested on a dedicated 64-bit production machine at Facebook. Significant
      improvement all around.
      
      Despite now using std::atomic<uint64_t>, quick before-and-after test on
      a 32-bit machine (Intel Atom N270, released 2008) shows no regression in
      performance, in some cases modest improvement.
      
      --
      
      Performance integration test (synthetic): with DEBUG_LEVEL=0, used
      TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=fillrandom,readmissing,readrandom,stats --num=2000000
      and optionally with -memtable_whole_key_filtering -memtable_bloom_size_ratio=0.01
      300 runs each configuration.
      
      Write throughput change by enabling memtable bloom:
      Old locality=0: -3.06%
      Old locality=1: -2.37%
      New:            -1.50%
      conclusion -> seems to substantially close the gap
      
      Readmissing throughput change by enabling memtable bloom:
      Old locality=0: +34.47%
      Old locality=1: +34.80%
      New:            +33.25%
      conclusion -> maybe a small new penalty from FP rate
      
      Readrandom throughput change by enabling memtable bloom:
      Old locality=0: +31.54%
      Old locality=1: +31.13%
      New:            +30.60%
      conclusion -> maybe also from FP rate (after memtable flush)
      
      --
      
      Another conclusion we can draw from this new implementation is that the
      existing 32-bit hash function is not inherently crippling the Bloom
      filter speed or accuracy, below about 5 million keys. For speed, the
      implementation is essentially the same whether starting with 32-bits or
      64-bits of hash; it just determines whether the first multiplication
      after fastrange is a pseudorandom expansion or needed re-mix. Note that
      this multiplication can occur while memory is fetching.
      
      For accuracy, in a standard configuration, you need about 5 million
      keys before you have about a 1.1x FP penalty due to using a
      32-bit hash vs. 64-bit:
      
      [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out $((5 * 1000 * 1000 * 10)) 6 10 $RANDOM 100000000
      ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out time: 2.52069 sampled_fp_rate: 0.0118267 ...
      [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_any.out $((5 * 1000 * 1000 * 10)) 6 10 $RANDOM 100000000
      ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_any.out time: 2.43871 sampled_fp_rate: 0.0109059
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5762
      
      Differential Revision: D17214194
      
      Pulled By: pdillinger
      
      fbshipit-source-id: ad9da031772e985fd6b62a0e1db8e81892520595
    • use c++17's try_emplace if available (#5696) · 19e8c9b6
      Committed by jsteemann
      Summary:
      This avoids rehashing the key in TrackKey() in case the key is not already
      in the map of tracked keys, which will happen at least once per key used in a
      transaction.
      
      Additionally fix two typos.
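
A minimal sketch of the pattern: try_emplace hashes and looks the key up once, constructing the value only when the key is absent. `TrackKey` and `KeyInfo` here are simplified stand-ins for the transaction bookkeeping, not the actual RocksDB code.

```cpp
#include <string>
#include <unordered_map>

struct KeyInfo {
  int num_reads = 0;
};

// One hash/lookup whether or not the key already exists; the older
// find-then-insert pattern hashes the key twice when the key is new.
void TrackKey(std::unordered_map<std::string, KeyInfo>& tracked,
              const std::string& key) {
  auto result = tracked.try_emplace(key);  // C++17
  result.first->second.num_reads++;
}

// Helper for exercising TrackKey n times on the same key.
int ReadsAfter(int n) {
  std::unordered_map<std::string, KeyInfo> tracked;
  for (int i = 0; i < n; ++i) TrackKey(tracked, "A");
  return tracked.at("A").num_reads;
}
```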
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5696
      
      Differential Revision: D17210178
      
      Pulled By: lth
      
      fbshipit-source-id: 7e2c28e9e505c1d1c1535d435250cf2b191a6fdf
    • Copy/split PlainTableBloomV1 from DynamicBloom (refactor) (#5767) · 20dec140
      Committed by Peter Dillinger
      Summary:
      DynamicBloom was being used both for memory-only and for on-disk filters, as part of the PlainTable format. To set up enhancements to the memtable Bloom filter, this splits the code into two copies and removes unused features from each copy. Adds test PlainTableDBTest.BloomSchema to ensure no accidental change to that format.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5767
      
      Differential Revision: D17206963
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 6cce8d55305ed0df051b4c58bdc98c8ad81d0553
  8. 05 Sep 2019 (4 commits)
    • fix checking the '-march' flag (#5766) · 3f2723a8
      Committed by ENDOH takanao
      Summary:
      Hi guys,
      
      I got errors on the ARM machine.
      
      before:
      
      ```console
      $ make static_lib
      ...
      g++: error: unrecognized argument in option '-march=armv8-a+crc+crypto'
      g++: note: valid arguments to '-march=' are: armv2 armv2a armv3 armv3m armv4 armv4t armv5 armv5e armv5t armv5te armv6 armv6-m armv6j armv6k armv6kz armv6s-m armv6t2 armv6z armv6zk armv7 armv7-a armv7-m armv7-r armv7e-m armv7ve armv8-a armv8-a+crc armv8.1-a armv8.1-a+crc iwmmxt iwmmxt2 native
      ```
      
      Thanks!
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5766
      
      Differential Revision: D17191117
      
      fbshipit-source-id: 7a61e3a2a4a06f37faeb8429bd7314da54ec5868
    • Add a unit test to detect infinite loops with reseek optimizations (#5727) · f9fb9f14
      Committed by Maysam Yabandeh
      Summary:
      Iterators reseek to the target key after iterating over max_sequential_skip_in_iterations invalid values. The logic is susceptible to an infinite loop bug, which has been present with WritePrepared Transactions up until 6.2 release. Although the bug is not present on master, the patch adds a unit test to prevent it from resurfacing again.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5727
      
      Differential Revision: D16952759
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: d0d973dddc8dfabd5a794931232aa4c862c74f51
    • Adding DB::GetCurrentWalFile() API as a repliction/backup helper (#5765) · 229e6fbe
      Committed by Affan Dar
      Summary:
      Adding a lightweight API to get the last live WAL file name and size. Meant to be used as a helper for backup/restore tooling in a larger ecosystem such as MySQL with a MyRocks storage engine.
      
      Specifically within MySQL's backup/restore mechanism, this call can be made with a write lock on the mysql db to get a transactionally consistent snapshot of the current WAL file position along with other non-rocksdb log/data files.
      
      Without this, the alternative would be to take the aforementioned lock, scan the WAL dir for all files, find the last file and note its exact size as the rocksdb 'checkpoint'.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5765
      
      Differential Revision: D17172717
      
      Pulled By: affandar
      
      fbshipit-source-id: f2fabafd4c0e6fc45f126670c8c88a9f84cb8a37
    • Replace named comparator struct with lambda (#5768) · 38b17ecd
      Committed by Yanqin Jin
      Summary:
      Tiny code mod: replace a named comparator struct with an anonymous lambda.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5768
      
      Differential Revision: D17185141
      
      Pulled By: riversand963
      
      fbshipit-source-id: fabe367649931c33a39ad035dc707d2efc3ad5fc
  9. 04 Sep 2019 (1 commit)
  10. 03 Sep 2019 (1 commit)
    • Persistent globally unique DB ID in manifest (#5725) · 979fbdc6
      Committed by Vijay Nadimpalli
      Summary:
      Each DB has a globally unique ID. A DB can be physically copied around, or backed up and restored, and users should be able to identify the same DB. This unique ID is currently stored as plain text in the IDENTITY file under the DB directory. This approach introduces at least two problems: (1) the file is not checksummed; (2) the source of truth for a DB is the manifest file, which can be copied separately from the IDENTITY file, causing the DB ID to be wrong.
      The goal of this PR is to solve this problem by moving the DB ID into the manifest. To begin with, we write to both the identity file and the manifest. Writing to the manifest is controlled via the flag write_dbid_to_manifest in Options, which defaults to false.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5725
      
      Test Plan: Added unit tests.
      
      Differential Revision: D16963840
      
      Pulled By: vjnadimpalli
      
      fbshipit-source-id: 8a86a4c8c82c716003c40fd6b9d2d758030d92e9
  11. 31 Aug 2019 (2 commits)
    • Fix a bug in file ingestion (#5760) · 44eca41a
      Committed by Yanqin Jin
      Summary:
      Before this PR, when the number of column families involved in a file ingestion exceeded 2, a bug in the looping logic prevented the correct file number from being assigned to each ingestion job.
      Also skip deleting non-existing hard links during cleanup-after-failure.
      
      Test plan (devserver)
      ```
      $COMPILE_WITH_ASAN=1 make all
      $./external_sst_file_test --gtest_filter=ExternalSSTFileTest/ExternalSSTFileTest.IngestFilesIntoMultipleColumnFamilies_*/*
      $make check
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5760
      
      Differential Revision: D17142982
      
      Pulled By: riversand963
      
      fbshipit-source-id: 06c1847a4e7a402647bcf28d124e70f2a0f9daf6
    • Fix assertion failure in FIFO compaction with TTL (#5754) · 672befea
      Committed by Yanqin Jin
      Summary:
      Before this PR, the following sequence of events can cause assertion failure as shown below.
      Stack trace (partial):
      ```
      (gdb) bt
      2  0x00007f59b350ad15 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x9f8390 "mark_as_compacted ? !inputs_[i][j]->being_compacted : inputs_[i][j]->being_compacted", file=file@entry=0x9e347c "db/compaction/compaction.cc", line=line@entry=395, function=function@entry=0xa21ec0 <rocksdb::Compaction::MarkFilesBeingCompacted(bool)::__PRETTY_FUNCTION__> "void rocksdb::Compaction::MarkFilesBeingCompacted(bool)") at assert.c:92
      3  0x00007f59b350adc3 in __GI___assert_fail (assertion=assertion@entry=0x9f8390 "mark_as_compacted ? !inputs_[i][j]->being_compacted : inputs_[i][j]->being_compacted", file=file@entry=0x9e347c "db/compaction/compaction.cc", line=line@entry=395, function=function@entry=0xa21ec0 <rocksdb::Compaction::MarkFilesBeingCompacted(bool)::__PRETTY_FUNCTION__> "void rocksdb::Compaction::MarkFilesBeingCompacted(bool)") at assert.c:101
      4  0x0000000000492ccd in rocksdb::Compaction::MarkFilesBeingCompacted (this=<optimized out>, mark_as_compacted=<optimized out>) at db/compaction/compaction.cc:394
      5  0x000000000049467a in rocksdb::Compaction::Compaction (this=0x7f59af013000, vstorage=0x7f581af53030, _immutable_cf_options=..., _mutable_cf_options=..., _inputs=..., _output_level=<optimized out>, _target_file_size=0, _max_compaction_bytes=0, _output_path_id=0, _compression=<incomplete type>, _compression_opts=..., _max_subcompactions=0, _grandparents=..., _manual_compaction=false, _score=4, _deletion_compaction=true, _compaction_reason=rocksdb::CompactionReason::kFIFOTtl) at db/compaction/compaction.cc:241
      6  0x00000000004af9bc in rocksdb::FIFOCompactionPicker::PickTTLCompaction (this=0x7f59b31a6900, cf_name=..., mutable_cf_options=..., vstorage=0x7f581af53030, log_buffer=log_buffer@entry=0x7f59b1bfa930) at db/compaction/compaction_picker_fifo.cc:101
      7  0x00000000004b0771 in rocksdb::FIFOCompactionPicker::PickCompaction (this=0x7f59b31a6900, cf_name=..., mutable_cf_options=..., vstorage=0x7f581af53030, log_buffer=0x7f59b1bfa930) at db/compaction/compaction_picker_fifo.cc:201
      8  0x00000000004838cc in rocksdb::ColumnFamilyData::PickCompaction (this=this@entry=0x7f59b31b3700, mutable_options=..., log_buffer=log_buffer@entry=0x7f59b1bfa930) at db/column_family.cc:933
      9  0x00000000004f3645 in rocksdb::DBImpl::BackgroundCompaction (this=this@entry=0x7f59b3176000, made_progress=made_progress@entry=0x7f59b1bfa6bf, job_context=job_context@entry=0x7f59b1bfa760, log_buffer=log_buffer@entry=0x7f59b1bfa930, prepicked_compaction=prepicked_compaction@entry=0x0, thread_pri=rocksdb::Env::LOW) at db/db_impl/db_impl_compaction_flush.cc:2541
      10 0x00000000004f5e2a in rocksdb::DBImpl::BackgroundCallCompaction (this=this@entry=0x7f59b3176000, prepicked_compaction=prepicked_compaction@entry=0x0, bg_thread_pri=bg_thread_pri@entry=rocksdb::Env::LOW) at db/db_impl/db_impl_compaction_flush.cc:2312
      11 0x00000000004f648e in rocksdb::DBImpl::BGWorkCompaction (arg=<optimized out>) at db/db_impl/db_impl_compaction_flush.cc:2087
      ```
      This can be caused by the following sequence of events.
      ```
      Time
      |      thr          bg_compact_thr1                     bg_compact_thr2
      |      write
      |      flush
      |                   mark all l0 as being compacted
      |      write
      |      flush
      |                   add cf to queue again
      |                                                       mark all l0 as being
      |                                                       compacted, fail the
      |                                                       assertion
      V
      ```
      Test plan (on devserver)
      Since bg_compact_thr1 and bg_compact_thr2 are two threads executing the same
      code, it is difficult to use sync point dependency to
      coordinate their execution. Therefore, I choose to use db_stress.
      ```
      $TEST_TMPDIR=/dev/shm/rocksdb ./db_stress --periodic_compaction_seconds=1 --max_background_compactions=20 --format_version=2 --memtablerep=skip_list --max_write_buffer_number=3 --cache_index_and_filter_blocks=1 --reopen=20 --recycle_log_file_num=0 --acquire_snapshot_one_in=10000 --delpercent=4 --log2_keys_per_lock=22 --compaction_ttl=1 --block_size=16384 --use_multiget=1 --compact_files_one_in=1000000 --target_file_size_multiplier=2 --clear_column_family_one_in=0 --max_bytes_for_level_base=10485760 --use_full_merge_v1=1 --target_file_size_base=2097152 --checkpoint_one_in=1000000 --mmap_read=0 --compression_type=zstd --writepercent=35 --readpercent=45 --subcompactions=4 --use_merge=0 --write_buffer_size=4194304 --test_batches_snapshots=0 --db=/dev/shm/rocksdb/rocksdb_crashtest_whitebox --use_direct_reads=0 --compact_range_one_in=1000000 --open_files=-1 --destroy_db_initially=0 --progress_reports=0 --compression_zstd_max_train_bytes=0 --snapshot_hold_ops=100000 --enable_pipelined_write=0 --nooverwritepercent=1 --compression_max_dict_bytes=0 --max_key=1000000 --prefixpercent=5 --flush_one_in=1000000 --ops_per_thread=40000 --index_block_restart_interval=7 --cache_size=1048576 --compaction_style=2 --verify_checksum=1 --delrangepercent=1 --use_direct_io_for_flush_and_compaction=0
      ```
      This should see no assertion failure.
      Last but not least,
      ```
      $COMPILE_WITH_ASAN=1 make -j32 all
      $make check
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5754
      
      Differential Revision: D17109791
      
      Pulled By: riversand963
      
      fbshipit-source-id: 25fc46101235add158554e096540b72c324be078
  12. 30 Aug 2019 (1 commit)
    • Adopt Contributor Covenant · 9a449865
      Committed by Paul O'Shannessy
      Summary:
      In order to foster healthy open source communities, we're adopting the
      [Contributor Covenant](https://www.contributor-covenant.org/). It has been
      built by open source community members and represents a shared understanding of
      what is expected from a healthy community.
      
      Reviewed By: josephsavona, danobi, rdzhabarov
      
      Differential Revision: D17104640
      
      fbshipit-source-id: d210000de686c5f0d97d602b50472d5869bc6a49