1. 10 Sep, 2019 · 3 commits
  2. 07 Sep, 2019 · 4 commits
  3. 06 Sep, 2019 · 9 commits
    • Fix WriteBatchWithIndex with MergeOperator bug (#5577) · 533e4770
      奏之章 committed
      Summary:
      ```
      TEST_F(WriteBatchWithIndexTest, TestGetFromBatchAndDBMerge3) {
        DB* db;
        Options options;
      
        options.create_if_missing = true;
        std::string dbname = test::PerThreadDBPath("write_batch_with_index_test");
      
        options.merge_operator = MergeOperators::CreateFromStringId("stringappend");
      
        DestroyDB(dbname, options);
        Status s = DB::Open(options, dbname, &db);
        assert(s.ok());
      
        ReadOptions read_options;
        WriteOptions write_options;
        FlushOptions flush_options;
        std::string value;
      
        WriteBatchWithIndex batch;
      
        ASSERT_OK(db->Put(write_options, "A", "1"));
        ASSERT_OK(db->Flush(flush_options, db->DefaultColumnFamily()));
        ASSERT_OK(batch.Merge("A", "2"));
      
        ASSERT_OK(batch.GetFromBatchAndDB(db, read_options, "A", &value));
        ASSERT_EQ(value, "1,2");
      
        delete db;
        DestroyDB(dbname, options);
      }
      ```
      Fix ASSERT in batch.GetFromBatchAndDB()
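      To make the expected value concrete: with the `stringappend` merge operator, `GetFromBatchAndDB()` should return the DB's base value with the batch's merge operands appended, comma-delimited. The helper below is a hypothetical, self-contained model of that semantics (the name `StringAppendFullMerge` is an assumption; it is not RocksDB's `MergeOperator` interface or the code changed by this fix):

      ```cpp
      #include <string>
      #include <vector>

      // Hypothetical model of a "stringappend"-style full merge: start from the
      // existing value (if any) and append each operand, oldest first, separated
      // by `delim`.
      std::string StringAppendFullMerge(const std::string* existing_value,
                                        const std::vector<std::string>& operands,
                                        char delim = ',') {
        std::string result;
        bool empty = true;
        if (existing_value != nullptr) {
          result = *existing_value;  // e.g. "1", read from the flushed DB
          empty = false;
        }
        for (const std::string& op : operands) {
          if (!empty) result.push_back(delim);
          result += op;  // e.g. "2", the operand buffered in the batch
          empty = false;
        }
        return result;
      }
      ```

      With the test's data, the base value `"1"` from the flushed DB and the batch operand `"2"` combine to `"1,2"`, which is what the `ASSERT_EQ` expects.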
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5577
      
      Differential Revision: D16379847
      
      fbshipit-source-id: b1320e24ec8e71350c525083cc0a16180a63f752
    • Fixed FALLOC_FL_KEEP_SIZE undefined (#5614) · cfc20019
      Richard He committed
      Summary:
      Fix the `error: ‘FALLOC_FL_KEEP_SIZE’ undeclared` compile error in `io_posix.cc` during the Vagrant build on CentOS, as reported in issue https://github.com/facebook/rocksdb/issues/5599
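      One defensive pattern for this class of build error is to pull in `<linux/falloc.h>` explicitly and only use the flag when it is defined. The sketch below is an assumption-labeled illustration, not the change made in `io_posix.cc`; `PreallocateKeepSize` and the usage helper are hypothetical names. `FALLOC_FL_KEEP_SIZE` tells `fallocate()` to reserve space without changing the size reported by `stat()`.

      ```cpp
      #ifndef _GNU_SOURCE
      #define _GNU_SOURCE  // fallocate() is a GNU extension in glibc
      #endif
      #include <fcntl.h>
      #include <sys/stat.h>
      #include <sys/types.h>
      #include <cerrno>
      #include <cstdio>

      #if defined(__linux__)
      // Some older toolchains only expose the FALLOC_FL_* flags via this header.
      #include <linux/falloc.h>
      #endif

      // Hypothetical helper: reserve [offset, offset + len) without changing the
      // file size. Returns 0 on success, or -1 with errno set (ENOSYS when the
      // flag was not available at compile time).
      int PreallocateKeepSize(int fd, off_t offset, off_t len) {
      #if defined(__linux__) && defined(FALLOC_FL_KEEP_SIZE)
        return fallocate(fd, FALLOC_FL_KEEP_SIZE, offset, len);
      #else
        (void)fd;
        (void)offset;
        (void)len;
        errno = ENOSYS;
        return -1;
      #endif
      }

      // Usage: preallocating a fresh temp file must leave its size at 0, whether
      // or not the filesystem actually supports fallocate().
      bool KeepSizeLeavesSizeUnchanged() {
        std::FILE* f = std::tmpfile();
        if (f == nullptr) return false;
        int fd = fileno(f);
        PreallocateKeepSize(fd, 0, 4096);
        struct stat st;
        bool ok = fstat(fd, &st) == 0 && st.st_size == 0;
        std::fclose(f);
        return ok;
      }
      ```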
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5614
      
      Differential Revision: D17217960
      
      fbshipit-source-id: ef736c51b16833107fd9ccc7917ed1def2a8d02c
    • Initialized pinned_pos_ and pinned_seq_pos_ in FragmentedRangeTombstoneIterator (#5720) · eae9f040
      Jeffrey Xiao committed
      Summary:
      These uninitialized member variables can cause a key not to be pinned when it should be, leading to erroneous behavior. For example, ingesting a file with range deletion tombstones can yield an "external file have corrupted keys" error on a Mac.
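      Reading an indeterminate value is undefined behavior, so a "was this pinned?" check can pass on one platform and fail on another, consistent with the Mac-only symptom above. A minimal sketch of the general fix pattern (default member initializers), using a hypothetical `ToyTombstoneIterator` whose pinned positions are modeled as indices with a sentinel; the real members and their types may differ:

      ```cpp
      #include <cstddef>

      // Hypothetical stand-in for an iterator that can "pin" a position.
      class ToyTombstoneIterator {
       public:
        static constexpr size_t kNotPinned = static_cast<size_t>(-1);

        bool IsPinned() const { return pinned_pos_ != kNotPinned; }
        void Pin(size_t pos, size_t seq_pos) {
          pinned_pos_ = pos;
          pinned_seq_pos_ = seq_pos;
        }
        size_t pinned_seq_pos() const { return pinned_seq_pos_; }

       private:
        // Default member initializers: every constructor starts from a known
        // state, so IsPinned() never reads an indeterminate value.
        size_t pinned_pos_ = kNotPinned;
        size_t pinned_seq_pos_ = kNotPinned;
      };
      ```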
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5720
      
      Differential Revision: D17217673
      
      fbshipit-source-id: cd7df7ce3ad9cf69c841c4d3dc6fd144eff9e212
    • Fix EncryptedEnv assert (#5735) · 83b99192
      Yi Wu committed
      Summary:
      Fixes https://github.com/facebook/rocksdb/issues/5734. Reading the code, the assert doesn't quite make sense to me, since `dataSize` and `fileOffset` have no correlation. But my knowledge about `EncryptedEnv` is very limited.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5735
      
      Test Plan:
      run `ENCRYPTED_ENV=1 ./db_encryption_test`
      Signed-off-by: Yi Wu <yiwu@pingcap.com>
      
      Differential Revision: D17133849
      
      fbshipit-source-id: bb7262d308e5b2503c400b180edc252668df0ef0
    • remove unused #include to fix musl libc build (#5583) · 43a5cdb5
      Andrew Kryczka committed
      Summary:
      The `#include "core_local.h"` was pulling in libgcc's `posix_memalign()`
      declaration. That declaration specifies `throw()` whereas musl libc's
      declaration does not. This was leading to the following compiler error
      when using musl libc:
      
      ```
      In file included from /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/port/jemalloc_helper.h:26:0,
                       from /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/util/jemalloc_nodump_allocator.h:11,
                       from /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/util/jemalloc_nodump_allocator.cc:6:
      /go/native/x86_64-unknown-linux-musl/jemalloc/include/jemalloc/jemalloc.h:63:29: error: declaration of 'int posix_memalign(void**, size_t, size_t) throw ()' has a different exception specifier
       #  define je_posix_memalign posix_memalign
                                   ^
      /go/native/x86_64-unknown-linux-musl/jemalloc/include/jemalloc/jemalloc.h:63:29: note: from previous declaration 'int posix_memalign(void**, size_t, size_t)'
       #  define je_posix_memalign posix_memalign
                                   ^
      /go/native/x86_64-unknown-linux-musl/jemalloc/include/jemalloc/jemalloc.h:202:38: note: in expansion of macro 'je_posix_memalign'
       JEMALLOC_EXPORT int JEMALLOC_NOTHROW je_posix_memalign(void **memptr,
                                            ^~~~~~~~~~~~~~~~~
      make[4]: *** [CMakeFiles/rocksdb.dir/util/jemalloc_nodump_allocator.cc.o] Error 1
      ```
      
      Since `#include "core_local.h"` is not actually used, we can just remove
      it. I verified that fixes the build.
      
      There was a related PR here (https://github.com/facebook/rocksdb/issues/2188), although the problem description is
      slightly different.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5583
      
      Differential Revision: D16343227
      
      fbshipit-source-id: 0386bc2b5fd55b2c3b5fba19382014efa52e44f8
    • bloom test check fail on arm (#5745) · ac97e693
      HouBingjian committed
      Summary:
      FullFilterBitsBuilder::CalculateSpace uses CACHE_LINE_SIZE, which is 64 on x86 but 128 on ARM64,
      so bloom_test.FullVaryingLengths failed when run on an ARM64 server.
      The assert is fixed by changing the hard-coded 128 to CACHE_LINE_SIZE*2, as merged:
      ASSERT_LE(FilterSize(), (size_t)((length * 10 / 8) + CACHE_LINE_SIZE * 2 + 5)) << length;
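      Plugging the failing case into the assertion's bound shows why the fix works. The `200` printed in the failure log is `length`, and the filter size reported on ARM64 is 389. The sketch below (helper names are hypothetical) evaluates both versions of the bound at compile time:

      ```cpp
      #include <cstddef>

      // New bound from the merged assertion, parameterized on the cache line size.
      constexpr size_t FilterSizeBound(size_t length, size_t cache_line_size) {
        return (length * 10 / 8) + cache_line_size * 2 + 5;
      }

      // Old bound with the hard-coded 128 (equal to 2 * 64, i.e. x86-only).
      constexpr size_t OldFilterSizeBound(size_t length) {
        return (length * 10 / 8) + 128 + 5;
      }

      // length = 200: 200 * 10 / 8 = 250
      static_assert(OldFilterSizeBound(200) == 383, "old bound: 250 + 128 + 5");
      static_assert(FilterSizeBound(200, 64) == 383, "x86: unchanged");
      static_assert(FilterSizeBound(200, 128) == 511, "ARM64: 250 + 256 + 5");
      ```

      On x86 the new bound equals the old one; on ARM64 it grows to 511, so a filter size of 389 passes instead of failing against 383.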
      
      Running bloom_test:
      before fix:
      /root/rocksdb-master/util/bloom_test.cc:281: Failure
      Expected: (FilterSize()) <= ((size_t)((length * 10 / 8) + 128 + 5)), actual: 389 vs 383
      200
      [  FAILED  ] FullBloomTest.FullVaryingLengths (32 ms)
      [----------] 4 tests from FullBloomTest (32 ms total)
      
      [----------] Global test environment tear-down
      [==========] 7 tests from 2 test cases ran. (116 ms total)
      [  PASSED  ] 6 tests.
      [  FAILED  ] 1 test, listed below:
      [  FAILED  ] FullBloomTest.FullVaryingLengths
      
      after fix:
      Filters: 37 good, 0 mediocre
      [       OK ] FullBloomTest.FullVaryingLengths (90 ms)
      [----------] 4 tests from FullBloomTest (90 ms total)
      
      [----------] Global test environment tear-down
      [==========] 7 tests from 2 test cases ran. (174 ms total)
      [  PASSED  ] 7 tests.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5745
      
      Differential Revision: D17076047
      
      fbshipit-source-id: e7beb5d55d4855fceb2b84bc8119a6b0759de635
    • Faster new DynamicBloom implementation (for memtable) (#5762) · b55b2f45
      Peter Dillinger committed
      Summary:
      Since DynamicBloom is now only used in-memory, we're free to
      change it without schema compatibility issues. The new implementation
      is drawn from (with manifest permission)
      https://github.com/pdillinger/wormhashing/blob/303542a767437f56d8b66cea6ebecaac0e6a61e9/bloom_simulation_tests/foo.cc#L613
      
      This has several speed advantages over the prior implementation:
      * Uses fastrange instead of %
      * Minimum logic to determine first (and all) probed memory addresses
      * (Major) Two probes per 64-bit memory fetch/write.
      * Very fast and effective (murmur-like) hash expansion/re-mixing. (At
      least on recent CPUs, integer multiplication is very cheap.)
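      The bullets above can be made concrete with a toy sketch. It is not RocksDB's `DynamicBloom`: the names (`FastRange32`, `ToyCacheLocalBloom`), the multiplier constant, and the rotate-by-12 re-mix are assumptions chosen for illustration, and it is single-threaded (plain `uint64_t` instead of `std::atomic<uint64_t>`). It shows fastrange replacing `%`, all probes for a key confined to one 512-bit block (8 x 64-bit words), and two probe bits set per 64-bit word touched.

      ```cpp
      #include <cstdint>
      #include <vector>

      // fastrange: map a 32-bit hash onto [0, n) with a multiply and a shift
      // instead of `hash % n`.
      inline uint32_t FastRange32(uint32_t hash, uint32_t n) {
        return static_cast<uint32_t>((static_cast<uint64_t>(hash) * n) >> 32);
      }

      class ToyCacheLocalBloom {
       public:
        // num_double_probes must be in [1, 8]; the word count is rounded up to a
        // multiple of 8 so that every 8-word block lies fully inside words_.
        ToyCacheLocalBloom(uint32_t num_words, int num_double_probes)
            : words_(((num_words == 0 ? 1 : num_words) + 7) / 8 * 8),
              num_double_probes_(num_double_probes) {}

        void Add(uint32_t h32) {
          uint64_t h = kMul * h32;  // cheap 32 -> 64 bit expansion
          uint32_t a = FastRange32(h32, static_cast<uint32_t>(words_.size()));
          for (int i = 0; i < num_double_probes_; ++i) {
            words_[a ^ i] |= TwoBitMask(h);  // a ^ i stays in a's 8-word block
            h = (h >> 12) | (h << 52);       // rotate to expose fresh bits
          }
        }

        bool MayContain(uint32_t h32) const {
          uint64_t h = kMul * h32;
          uint32_t a = FastRange32(h32, static_cast<uint32_t>(words_.size()));
          for (int i = 0; i < num_double_probes_; ++i) {
            uint64_t mask = TwoBitMask(h);
            if ((words_[a ^ i] & mask) != mask) return false;
            h = (h >> 12) | (h << 52);
          }
          return true;
        }

       private:
        static constexpr uint64_t kMul = 0x9e3779b97f4a7c13ULL;
        // Two probe bits from one 64-bit word: bits [0,6) and [6,12) of h.
        static uint64_t TwoBitMask(uint64_t h) {
          return (uint64_t{1} << (h & 63)) | (uint64_t{1} << ((h >> 6) & 63));
        }
        std::vector<uint64_t> words_;
        int num_double_probes_;
      };
      ```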
      
      While a Bloom filter with 512-bit cache locality has about a 1.15x FP
      rate penalty (e.g. 0.84% to 0.97%), further restricting to two probes
      per 64 bits incurs an additional 1.12x FP rate penalty (e.g. 0.97% to
      1.09%). Nevertheless, the unit tests show no "mediocre" FP rate samples,
      unlike the old implementation with more erratic FP rates.
      
      Especially for the memtable, we expect speed to outweigh somewhat higher
      FP rates. For example, a negative table query would have to be 1000x
      slower than a BF query to justify doubling BF query time to shave 10% off
      FP rate (working assumption around 1% FP rate). While that seems likely
      for SSTs, my data suggests a speed factor of roughly 50x for the memtable
      (vs. BF; ~1.5% lower write throughput when enabling memtable Bloom
      filter, after this change).  Thus, it's probably not worth even 5% more
      time in the Bloom filter to shave off 1/10th of the Bloom FP rate, or 0.1%
      in absolute terms, and it's probably at least 20% slower to recoup that
      much FP rate from this new implementation. Because of this, we do not see
      a need for a 'locality' option that affects the MemTable Bloom filter
      and have decoupled the MemTable Bloom filter from Options::bloom_locality.
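      Under the working assumptions stated above, the break-even arithmetic can be written out. Let $t$ be the Bloom filter query time, $k\,t$ the cost of the negative table query that a false positive triggers, and $p \approx 1\%$ the FP rate. Shaving 10% off the FP rate avoids a further $0.1\,p$ of those queries:

      $$
      \underbrace{0.1\,p \cdot k\,t}_{\text{saved}} = 0.001\,k\,t
      \;\ge\; \underbrace{t}_{\text{doubled BF time}}
      \iff k \ge 1000,
      \qquad
      k \approx 50 \;\Rightarrow\; 0.001 \cdot 50\,t = 0.05\,t .
      $$

      So with the measured memtable factor of roughly 50x, shaving 10% off the FP rate is worth at most about 5% extra Bloom filter time.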
      
      Note that giving the Bloom filter just 3% more memory (10.3 bits per key vs.
      10) is enough to make up for the ~12% higher FP rate of the new
      implementation:
      
      [] # Nearly "ideal" FP-wise but reasonably fast cache-local implementation
      [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_WORM64_FROM32_any.out 10000000 6 10 $RANDOM 100000000
      ./foo_gcc_IMPL_CACHE_WORM64_FROM32_any.out time: 3.29372 sampled_fp_rate: 0.00985956 ...
      
      [] # Close match to this new implementation
      [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out 10000000 6 10.3 $RANDOM 100000000
      ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out time: 2.10072 sampled_fp_rate: 0.00985655 ...
      
      [] # Old locality=1 implementation
      [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_ROCKSDB_DYNAMIC_any.out 10000000 6 10 $RANDOM 100000000
      ./foo_gcc_IMPL_CACHE_ROCKSDB_DYNAMIC_any.out time: 3.95472 sampled_fp_rate: 0.00988943 ...
      
      Also note the dramatic speed improvement vs. alternatives.
      
      --
      
      Performance unit test: DynamicBloomTest.concurrent_with_perf is updated
      to report more precise timing data. (Measure running time of each
      thread, not just longest running thread, etc.) Results averaged over
      various sizes enabled with --enable_perf and 20 runs each; old dynamic
      bloom refers to locality=1, the faster of the old:
      
      old dynamic bloom, avg add latency = 65.6468
      new dynamic bloom, avg add latency = 44.3809
      old dynamic bloom, avg query latency = 50.6485
      new dynamic bloom, avg query latency = 43.2186
      old avg parallel add latency = 41.678
      new avg parallel add latency = 24.5238
      old avg parallel hit latency = 14.6322
      new avg parallel hit latency = 12.3939
      old avg parallel miss latency = 16.7289
      new avg parallel miss latency = 12.2134
      
      Tested on a dedicated 64-bit production machine at Facebook. Significant
      improvement all around.
      
      Despite now using std::atomic<uint64_t>, quick before-and-after test on
      a 32-bit machine (Intel Atom N270, released 2008) shows no regression in
      performance, in some cases modest improvement.
      
      --
      
      Performance integration test (synthetic): with DEBUG_LEVEL=0, used
      TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=fillrandom,readmissing,readrandom,stats --num=2000000
      and optionally with -memtable_whole_key_filtering -memtable_bloom_size_ratio=0.01
      300 runs each configuration.
      
      Write throughput change by enabling memtable bloom:
      Old locality=0: -3.06%
      Old locality=1: -2.37%
      New:            -1.50%
      conclusion -> seems to substantially close the gap
      
      Readmissing throughput change by enabling memtable bloom:
      Old locality=0: +34.47%
      Old locality=1: +34.80%
      New:            +33.25%
      conclusion -> maybe a small new penalty from FP rate
      
      Readrandom throughput change by enabling memtable bloom:
      Old locality=0: +31.54%
      Old locality=1: +31.13%
      New:            +30.60%
      conclusion -> maybe also from FP rate (after memtable flush)
      
      --
      
      Another conclusion we can draw from this new implementation is that the
      existing 32-bit hash function is not inherently crippling the Bloom
      filter speed or accuracy, below about 5 million keys. For speed, the
      implementation is essentially the same whether starting with 32-bits or
      64-bits of hash; it just determines whether the first multiplication
      after fastrange is a pseudorandom expansion or needed re-mix. Note that
      this multiplication can occur while memory is fetching.
      
      For accuracy, in a standard configuration, you need about 5 million
      keys before you have about a 1.1x FP penalty due to using a
      32-bit hash vs. 64-bit:
      
      [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out $((5 * 1000 * 1000 * 10)) 6 10 $RANDOM 100000000
      ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out time: 2.52069 sampled_fp_rate: 0.0118267 ...
      [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_any.out $((5 * 1000 * 1000 * 10)) 6 10 $RANDOM 100000000
      ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_any.out time: 2.43871 sampled_fp_rate: 0.0109059
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5762
      
      Differential Revision: D17214194
      
      Pulled By: pdillinger
      
      fbshipit-source-id: ad9da031772e985fd6b62a0e1db8e81892520595
    • use c++17's try_emplace if available (#5696) · 19e8c9b6
      jsteemann committed
      Summary:
      This avoids rehashing the key in TrackKey() in case the key is not already
      in the map of tracked keys, which will happen at least once per key used in a
      transaction.
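      A minimal sketch of the pattern, assuming a hypothetical `TrackedKeyInfo` value type and `TrackKeySketch` function (not the transaction code itself). With C++17, `try_emplace` looks up and inserts in one call, so a new key is hashed once; the pre-C++17 fallback of `find` followed by `emplace` hashes it a second time on a miss.

      ```cpp
      #include <cstdint>
      #include <string>
      #include <unordered_map>

      // Hypothetical per-key tracking record.
      struct TrackedKeyInfo {
        explicit TrackedKeyInfo(uint64_t s) : seq(s) {}
        uint64_t seq;             // earliest sequence number seen for the key
        uint32_t num_writes = 0;  // how many times the key was tracked
      };

      using TrackedKeys = std::unordered_map<std::string, TrackedKeyInfo>;

      void TrackKeySketch(TrackedKeys* keys, const std::string& key, uint64_t seq) {
      #if __cplusplus >= 201703L
        // One lookup: constructs TrackedKeyInfo(seq) only if `key` is absent.
        auto it = keys->try_emplace(key, seq).first;
      #else
        // Fallback: on a miss, emplace() hashes the key a second time.
        auto it = keys->find(key);
        if (it == keys->end()) {
          it = keys->emplace(key, TrackedKeyInfo(seq)).first;
        }
      #endif
        it->second.num_writes++;
        if (seq < it->second.seq) it->second.seq = seq;
      }
      ```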
      
      Additionally fix two typos.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5696
      
      Differential Revision: D17210178
      
      Pulled By: lth
      
      fbshipit-source-id: 7e2c28e9e505c1d1c1535d435250cf2b191a6fdf
    • Copy/split PlainTableBloomV1 from DynamicBloom (refactor) (#5767) · 20dec140
      Peter Dillinger committed
      Summary:
      DynamicBloom was being used both for memory-only and for on-disk filters, as part of the PlainTable format. To set up enhancements to the memtable Bloom filter, this splits the code into two copies and removes unused features from each copy. Adds test PlainTableDBTest.BloomSchema to ensure no accidental change to that format.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5767
      
      Differential Revision: D17206963
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 6cce8d55305ed0df051b4c58bdc98c8ad81d0553
  4. 05 Sep, 2019 · 4 commits
    • fix checking the '-march' flag (#5766) · 3f2723a8
      ENDOH takanao committed
      Summary:
      Hi! guys,
      
      I got errors on the ARM machine.
      
      before:
      
      ```console
      $ make static_lib
      ...
      g++: error: unrecognized argument in option '-march=armv8-a+crc+crypto'
      g++: note: valid arguments to '-march=' are: armv2 armv2a armv3 armv3m armv4 armv4t armv5 armv5e armv5t armv5te armv6 armv6-m armv6j armv6k armv6kz armv6s-m armv6t2 armv6z armv6zk armv7 armv7-a armv7-m armv7-r armv7e-m armv7ve armv8-a armv8-a+crc armv8.1-a armv8.1-a+crc iwmmxt iwmmxt2 native
      ```
      
      Thanks!
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5766
      
      Differential Revision: D17191117
      
      fbshipit-source-id: 7a61e3a2a4a06f37faeb8429bd7314da54ec5868
    • Add a unit test to detect infinite loops with reseek optimizations (#5727) · f9fb9f14
      Maysam Yabandeh committed
      Summary:
      Iterators reseek to the target key after iterating over max_sequential_skip_in_iterations invalid values. This logic is susceptible to an infinite-loop bug, which was present with WritePrepared Transactions up until the 6.2 release. Although the bug is not present on master, this patch adds a unit test to keep it from resurfacing.
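      A toy model of the reseek logic, under simplifying assumptions: a sorted in-memory vector stands in for internal keys, and `Entry` / `FindVisible` are hypothetical names, not RocksDB's `DBIter`. Invisible versions are skipped one at a time, but after `max_skip` skips the iterator jumps straight to the first version visible to the snapshot. The forward-progress guard marks where an infinite loop could come from: a reseek that lands at or before the current position.

      ```cpp
      #include <algorithm>
      #include <cstddef>
      #include <cstdint>
      #include <string>
      #include <vector>

      // Hypothetical entry; entries are sorted by key ascending, then seq descending.
      struct Entry {
        std::string key;
        uint64_t seq;
      };

      bool EntryLess(const Entry& a, const Entry& b) {
        if (a.key != b.key) return a.key < b.key;
        return a.seq > b.seq;  // newer versions sort first
      }

      // Returns the index of the first entry at or after `pos` with seq <= snapshot,
      // or entries.size() if none. Counts reseeks in *reseeks.
      size_t FindVisible(const std::vector<Entry>& entries, size_t pos,
                         uint64_t snapshot, size_t max_skip, size_t* reseeks) {
        size_t skipped = 0;
        while (pos < entries.size()) {
          if (entries[pos].seq <= snapshot) return pos;
          if (++skipped > max_skip) {
            // Reseek to (key, snapshot): the newest version the snapshot can see.
            Entry target{entries[pos].key, snapshot};
            size_t next = static_cast<size_t>(
                std::lower_bound(entries.begin(), entries.end(), target, EntryLess) -
                entries.begin());
            pos = std::max(next, pos + 1);  // guard: always make forward progress
            skipped = 0;
            ++*reseeks;
            continue;
          }
          ++pos;
        }
        return pos;
      }
      ```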
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5727
      
      Differential Revision: D16952759
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: d0d973dddc8dfabd5a794931232aa4c862c74f51
    • Adding DB::GetCurrentWalFile() API as a replication/backup helper (#5765) · 229e6fbe
      Affan Dar committed
      Summary:
      Adding a lightweight API to get the name and size of the last live WAL file. It is meant to be used as a helper for backup/restore tooling in a larger ecosystem, such as MySQL with the MyRocks storage engine.
      
      Specifically within MySQL's backup/restore mechanism, this call can be made with a write lock on the mysql db to get a transactionally consistent snapshot of the current WAL file position along with other non-rocksdb log/data files.
      
      Without this, the alternative would be to take the aforementioned lock, scan the WAL dir for all files, find the last file and note its exact size as the rocksdb 'checkpoint'.
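      For comparison, the manual workaround described above (scan the WAL directory, take the last file, note its exact size) might look like the sketch below. It assumes POSIX directory APIs and that WAL files are named `<number>.log` (for example `000007.log`); `FindLatestWalFile` and the demo are hypothetical helpers, not RocksDB APIs. The scan only yields a consistent position while the write lock is held, since the last file and its size change as soon as writes resume.

      ```cpp
      #include <dirent.h>
      #include <sys/stat.h>
      #include <cstdint>
      #include <cstdio>
      #include <cstdlib>
      #include <string>

      // Hypothetical helper: find the WAL file with the highest log number in
      // `wal_dir` and report its name and current size.
      bool FindLatestWalFile(const std::string& wal_dir, std::string* name,
                             uint64_t* size_bytes) {
        DIR* dir = opendir(wal_dir.c_str());
        if (dir == nullptr) return false;
        const std::string suffix = ".log";
        uint64_t best = 0;
        bool found = false;
        while (dirent* entry = readdir(dir)) {
          const std::string fname = entry->d_name;
          if (fname.size() <= suffix.size() ||
              fname.compare(fname.size() - suffix.size(), suffix.size(), suffix) != 0) {
            continue;  // not a WAL file (MANIFEST, CURRENT, "." ...)
          }
          const std::string stem = fname.substr(0, fname.size() - suffix.size());
          if (stem.find_first_not_of("0123456789") != std::string::npos) continue;
          const uint64_t number = std::strtoull(stem.c_str(), nullptr, 10);
          if (!found || number > best) {
            best = number;
            *name = fname;
            found = true;
          }
        }
        closedir(dir);
        if (!found) return false;
        struct stat st;
        if (stat((wal_dir + "/" + *name).c_str(), &st) != 0) return false;
        *size_bytes = static_cast<uint64_t>(st.st_size);  // exact size at scan time
        return true;
      }

      // Usage demo on a throwaway directory; removes its files afterwards.
      bool DemoFindLatestWalFile() {
        char tmpl[] = "/tmp/wal_scan_XXXXXX";
        if (mkdtemp(tmpl) == nullptr) return false;
        const std::string dir = tmpl;
        const char* files[][2] = {{"000003.log", "0123456789"},  // older, 10 bytes
                                  {"000007.log", "abcd"},        // newest, 4 bytes
                                  {"MANIFEST-000001", "m"}};     // ignored
        for (const auto& f : files) {
          std::FILE* fp = std::fopen((dir + "/" + f[0]).c_str(), "w");
          if (fp == nullptr) return false;
          std::fputs(f[1], fp);
          std::fclose(fp);
        }
        std::string name;
        uint64_t size = 0;
        const bool ok = FindLatestWalFile(dir, &name, &size) &&
                        name == "000007.log" && size == 4;
        for (const auto& f : files) std::remove((dir + "/" + f[0]).c_str());
        std::remove(dir.c_str());  // POSIX remove() deletes an empty directory
        return ok;
      }
      ```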
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5765
      
      Differential Revision: D17172717
      
      Pulled By: affandar
      
      fbshipit-source-id: f2fabafd4c0e6fc45f126670c8c88a9f84cb8a37
    • Replace named comparator struct with lambda (#5768) · 38b17ecd
      Yanqin Jin committed
      Summary:
      Tiny code mod: replace a named comparator struct with anonymous lambda.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5768
      
      Differential Revision: D17185141
      
      Pulled By: riversand963
      
      fbshipit-source-id: fabe367649931c33a39ad035dc707d2efc3ad5fc
  5. 04 Sep, 2019 · 1 commit
  6. 03 Sep, 2019 · 1 commit
    • Persistent globally unique DB ID in manifest (#5725) · 979fbdc6
      Vijay Nadimpalli committed
      Summary:
      Each DB has a globally unique ID. A DB can be physically copied around, or backed up and restored, and users should be able to identify it as the same DB. This unique ID is currently stored as plain text in the file IDENTITY under the DB directory. This approach introduces at least two problems: (1) the file is not checksummed; (2) the source of truth of a DB is the manifest file, which can be copied separately from the IDENTITY file, causing the DB ID to be wrong.
      The goal of this PR is to solve this problem by moving the DB ID to the manifest. To begin with, we will write it to both the IDENTITY file and the manifest. Writing to the manifest is controlled by the flag write_dbid_to_manifest in Options, which defaults to false.
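      Opting in is a single option change. A minimal configuration sketch; the helper name `MakeOptionsWithDbIdInManifest` is hypothetical, while `write_dbid_to_manifest` is the flag named above:

      ```cpp
      #include "rocksdb/options.h"

      rocksdb::Options MakeOptionsWithDbIdInManifest() {
        rocksdb::Options options;
        options.create_if_missing = true;
        // Also record the DB ID in the MANIFEST (the IDENTITY file is still written).
        options.write_dbid_to_manifest = true;  // defaults to false
        return options;
      }
      ```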
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5725
      
      Test Plan: Added unit tests.
      
      Differential Revision: D16963840
      
      Pulled By: vjnadimpalli
      
      fbshipit-source-id: 8a86a4c8c82c716003c40fd6b9d2d758030d92e9
  7. 31 Aug, 2019 · 2 commits
    • Fix a bug in file ingestion (#5760) · 44eca41a
      Yanqin Jin committed
      Summary:
      Before this PR, when the number of column families involved in a file ingestion exceeded 2, a bug in the looping logic prevented the correct file number from being assigned to each ingestion job.
      Also skip deleting non-existing hard links during cleanup-after-failure.
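      The PR text does not show the loop itself. A minimal sketch of the invariant it should maintain, assuming a hypothetical `AssignFirstFileNumbers` helper (not the RocksDB code, and the actual bug may have looked different): reserve one contiguous range of file numbers, then give each job its own sub-range by advancing past that job's files on every iteration, however many column families are involved.

      ```cpp
      #include <cstddef>
      #include <cstdint>
      #include <vector>

      // job_file_counts[i] is the number of files ingestion job i will write.
      // Returns the first file number assigned to each job.
      std::vector<uint64_t> AssignFirstFileNumbers(
          const std::vector<size_t>& job_file_counts, uint64_t next_file_number) {
        std::vector<uint64_t> first_numbers;
        first_numbers.reserve(job_file_counts.size());
        for (size_t count : job_file_counts) {  // exactly one iteration per job
          first_numbers.push_back(next_file_number);
          next_file_number += count;  // advance past this job's files
        }
        return first_numbers;
      }
      ```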
      
      Test plan (devserver)
      ```
      $COMPILE_WITH_ASAN=1 make all
      $./external_sst_file_test --gtest_filter=ExternalSSTFileTest/ExternalSSTFileTest.IngestFilesIntoMultipleColumnFamilies_*/*
      $make check
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5760
      
      Differential Revision: D17142982
      
      Pulled By: riversand963
      
      fbshipit-source-id: 06c1847a4e7a402647bcf28d124e70f2a0f9daf6
    • Fix assertion failure in FIFO compaction with TTL (#5754) · 672befea
      Yanqin Jin committed
      Summary:
      Before this PR, the following sequence of events could cause the assertion failure shown below.
      Stack trace (partial):
      ```
      (gdb) bt
      2  0x00007f59b350ad15 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x9f8390 "mark_as_compacted ? !inputs_[i][j]->being_compacted : inputs_[i][j]->being_compacted", file=file@entry=0x9e347c "db/compaction/compaction.cc", line=line@entry=395, function=function@entry=0xa21ec0 <rocksdb::Compaction::MarkFilesBeingCompacted(bool)::__PRETTY_FUNCTION__> "void rocksdb::Compaction::MarkFilesBeingCompacted(bool)") at assert.c:92
      3  0x00007f59b350adc3 in __GI___assert_fail (assertion=assertion@entry=0x9f8390 "mark_as_compacted ? !inputs_[i][j]->being_compacted : inputs_[i][j]->being_compacted", file=file@entry=0x9e347c "db/compaction/compaction.cc", line=line@entry=395, function=function@entry=0xa21ec0 <rocksdb::Compaction::MarkFilesBeingCompacted(bool)::__PRETTY_FUNCTION__> "void rocksdb::Compaction::MarkFilesBeingCompacted(bool)") at assert.c:101
      4  0x0000000000492ccd in rocksdb::Compaction::MarkFilesBeingCompacted (this=<optimized out>, mark_as_compacted=<optimized out>) at db/compaction/compaction.cc:394
      5  0x000000000049467a in rocksdb::Compaction::Compaction (this=0x7f59af013000, vstorage=0x7f581af53030, _immutable_cf_options=..., _mutable_cf_options=..., _inputs=..., _output_level=<optimized out>, _target_file_size=0, _max_compaction_bytes=0, _output_path_id=0, _compression=<incomplete type>, _compression_opts=..., _max_subcompactions=0, _grandparents=..., _manual_compaction=false, _score=4, _deletion_compaction=true, _compaction_reason=rocksdb::CompactionReason::kFIFOTtl) at db/compaction/compaction.cc:241
      6  0x00000000004af9bc in rocksdb::FIFOCompactionPicker::PickTTLCompaction (this=0x7f59b31a6900, cf_name=..., mutable_cf_options=..., vstorage=0x7f581af53030, log_buffer=log_buffer@entry=0x7f59b1bfa930) at db/compaction/compaction_picker_fifo.cc:101
      7  0x00000000004b0771 in rocksdb::FIFOCompactionPicker::PickCompaction (this=0x7f59b31a6900, cf_name=..., mutable_cf_options=..., vstorage=0x7f581af53030, log_buffer=0x7f59b1bfa930) at db/compaction/compaction_picker_fifo.cc:201
      8  0x00000000004838cc in rocksdb::ColumnFamilyData::PickCompaction (this=this@entry=0x7f59b31b3700, mutable_options=..., log_buffer=log_buffer@entry=0x7f59b1bfa930) at db/column_family.cc:933
      9  0x00000000004f3645 in rocksdb::DBImpl::BackgroundCompaction (this=this@entry=0x7f59b3176000, made_progress=made_progress@entry=0x7f59b1bfa6bf, job_context=job_context@entry=0x7f59b1bfa760, log_buffer=log_buffer@entry=0x7f59b1bfa930, prepicked_compaction=prepicked_compaction@entry=0x0, thread_pri=rocksdb::Env::LOW) at db/db_impl/db_impl_compaction_flush.cc:2541
      10 0x00000000004f5e2a in rocksdb::DBImpl::BackgroundCallCompaction (this=this@entry=0x7f59b3176000, prepicked_compaction=prepicked_compaction@entry=0x0, bg_thread_pri=bg_thread_pri@entry=rocksdb::Env::LOW) at db/db_impl/db_impl_compaction_flush.cc:2312
      11 0x00000000004f648e in rocksdb::DBImpl::BGWorkCompaction (arg=<optimized out>) at db/db_impl/db_impl_compaction_flush.cc:2087
      ```
      This can be caused by the following sequence of events.
      ```
      Time
      |      thr          bg_compact_thr1                     bg_compact_thr2
      |      write
      |      flush
      |                   mark all l0 as being compacted
      |      write
      |      flush
      |                   add cf to queue again
      |                                                       mark all l0 as being
      |                                                       compacted, fail the
      |                                                       assertion
      V
      ```
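      The assertion in `MarkFilesBeingCompacted()` encodes an invariant: a file must not be handed to two compactions at once. A minimal sketch of a picker that upholds it, assuming a hypothetical `ToyFile` record and `PickTtlInputs` function, and assuming the check-and-mark step runs under a single lock. It is not the actual patch: files already claimed by `bg_compact_thr1` are skipped, so `bg_compact_thr2` finds nothing left to claim instead of tripping the assert.

      ```cpp
      #include <vector>

      // Hypothetical stand-in for SST file metadata.
      struct ToyFile {
        int number;
        bool being_compacted;
      };

      // Claim every L0 file not already claimed by another compaction. Assumes
      // the caller holds the lock protecting `being_compacted`.
      std::vector<ToyFile*> PickTtlInputs(std::vector<ToyFile>* level0) {
        std::vector<ToyFile*> inputs;
        for (ToyFile& f : *level0) {
          if (f.being_compacted) continue;  // claimed earlier, e.g. by bg_compact_thr1
          inputs.push_back(&f);
        }
        for (ToyFile* f : inputs) f->being_compacted = true;  // none were set
        return inputs;
      }
      ```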
      Test plan (on devserver)
      Since bg_compact_thr1 and bg_compact_thr2 are two threads executing the same
      code, it is difficult to use sync point dependency to
      coordinate their execution. Therefore, I chose to use db_stress.
      ```
      $TEST_TMPDIR=/dev/shm/rocksdb ./db_stress --periodic_compaction_seconds=1 --max_background_compactions=20 --format_version=2 --memtablerep=skip_list --max_write_buffer_number=3 --cache_index_and_filter_blocks=1 --reopen=20 --recycle_log_file_num=0 --acquire_snapshot_one_in=10000 --delpercent=4 --log2_keys_per_lock=22 --compaction_ttl=1 --block_size=16384 --use_multiget=1 --compact_files_one_in=1000000 --target_file_size_multiplier=2 --clear_column_family_one_in=0 --max_bytes_for_level_base=10485760 --use_full_merge_v1=1 --target_file_size_base=2097152 --checkpoint_one_in=1000000 --mmap_read=0 --compression_type=zstd --writepercent=35 --readpercent=45 --subcompactions=4 --use_merge=0 --write_buffer_size=4194304 --test_batches_snapshots=0 --db=/dev/shm/rocksdb/rocksdb_crashtest_whitebox --use_direct_reads=0 --compact_range_one_in=1000000 --open_files=-1 --destroy_db_initially=0 --progress_reports=0 --compression_zstd_max_train_bytes=0 --snapshot_hold_ops=100000 --enable_pipelined_write=0 --nooverwritepercent=1 --compression_max_dict_bytes=0 --max_key=1000000 --prefixpercent=5 --flush_one_in=1000000 --ops_per_thread=40000 --index_block_restart_interval=7 --cache_size=1048576 --compaction_style=2 --verify_checksum=1 --delrangepercent=1 --use_direct_io_for_flush_and_compaction=0
      ```
      This should see no assertion failure.
      Last but not least,
      ```
      $COMPILE_WITH_ASAN=1 make -j32 all
      $make check
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5754
      
      Differential Revision: D17109791
      
      Pulled By: riversand963
      
      fbshipit-source-id: 25fc46101235add158554e096540b72c324be078
  8. 30 Aug, 2019 · 5 commits
  9. 29 Aug, 2019 · 1 commit
    • Support row cache with batched MultiGet (#5706) · e1057033
      anand76 committed
      Summary:
      This PR adds support for row cache in ```rocksdb::TableCache::MultiGet```.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5706
      
      Test Plan:
      1. Unit tests in db_basic_test
      2. db_bench results with batch size of 2 (```Get``` is faster than ```MultiGet``` for single key) -
      Get -
      readrandom   :       3.935 micros/op 254116 ops/sec;   28.1 MB/s (22870998 of 22870999 found)
      MultiGet -
      multireadrandom :       3.743 micros/op 267190 ops/sec; (24047998 of 24047998 found)
      
      Command used -
      TEST_TMPDIR=/dev/shm/multiget numactl -C 10  ./db_bench -use_existing_db=true -use_existing_keys=false -benchmarks="readtorowcache,[read|multiread]random" -write_buffer_size=16777216 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -row_cache_size=4194304000 -batch_size=2 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=131072
      
      Differential Revision: D17086297
      
      Pulled By: anand1976
      
      fbshipit-source-id: 85784378da913e05f1baf31ec1b4e7c9345e7f57
  10. 28 Aug, 2019 · 2 commits
  11. 27 Aug, 2019 · 3 commits
  12. 24 Aug, 2019 · 2 commits
    • Refactor trimming logic for immutable memtables (#5022) · 2f41ecfe
      Zhongyi Xie committed
      Summary:
      MyRocks currently sets `max_write_buffer_number_to_maintain` in order to maintain enough history for transaction conflict checking. The effectiveness of this approach depends on the size of memtables. When memtables are small, it may not keep enough history; when memtables are large, this may consume too much memory.
      We are proposing a new way to configure memtable list history: by limiting the memory usage of immutable memtables. The new option is `max_write_buffer_size_to_maintain` and it will take precedence over the old `max_write_buffer_number_to_maintain` if they are both set to non-zero values. The new option accounts for the total memory usage of flushed immutable memtables and mutable memtable. When the total usage exceeds the limit, RocksDB may start dropping immutable memtables (which is also called trimming history), starting from the oldest one.
      The semantics of the old option actually work as both an upper bound and a lower bound. History trimming will start if the number of immutable memtables exceeds the limit, but it will never go below (limit-1) due to history trimming.
      In order to mimic this behavior with the new option, history trimming will stop if dropping the next immutable memtable would cause the total memory usage to go below the size limit. For example, assume the size limit is set to 64MB and there are 3 immutable memtables with sizes of 20, 30, and 30MB. Although the total memory usage is 80MB > 64MB, dropping the oldest memtable would reduce the memory usage to 60MB < 64MB, so in this case no memtable will be dropped.
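      A minimal model of that stopping rule, assuming sizes are listed oldest first and, as in the example, only the listed immutable memtables count toward the total (the real option also counts the mutable memtable). `TrimHistory` is a hypothetical name, not RocksDB's implementation:

      ```cpp
      #include <cstddef>
      #include <cstdint>
      #include <deque>

      // Drop the oldest memtable only while the total exceeds the limit AND the
      // remaining total would still be at or above the limit afterwards.
      size_t TrimHistory(std::deque<uint64_t>* sizes_oldest_first, uint64_t limit) {
        uint64_t total = 0;
        for (uint64_t s : *sizes_oldest_first) total += s;
        size_t dropped = 0;
        while (!sizes_oldest_first->empty() && total > limit &&
               total - sizes_oldest_first->front() >= limit) {
          total -= sizes_oldest_first->front();
          sizes_oldest_first->pop_front();
          ++dropped;
        }
        return dropped;
      }
      ```

      For the example above, `{20, 30, 30}` with a limit of 64 drops nothing, since removing the 20MB memtable would leave 60MB.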
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5022
      
      Differential Revision: D14394062
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 60457a509c6af89d0993f988c9b5c2aa9e45f5c5
    • crc32c_arm64 performance optimization (#5675) · 26293c89
      DaiZhiwei committed
      Summary:
      Crc32c parallel computation coding optimization:
      Macro unrolling removes the "for" loop, which helps reduce branch misses on the arm64 microarchitecture.
      Each 1024 bytes are divided into three parts: 8 (head) + 1008 (6 * 7 * 3 * 8) + 8 (tail).
      Macro unrolling turns the 42 loop iterations into 6 CRC32C7X24BYTES,
      with each CRC32C7X24BYTES containing 7 CRC32C24BYTES.
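      The byte accounting above, written out. Reading the factor $3 \times 8$ as three interleaved 8-byte CRC steps per 24-byte unit is an assumption based on the macro names, not something the PR states:

      $$
      \begin{aligned}
      1024 &= \underbrace{8}_{\text{head}} + \underbrace{1008}_{\text{body}} + \underbrace{8}_{\text{tail}},\\
      1008 &= 6 \times 7 \times 3 \times 8 = 6 \times (7 \times 24) = 42 \times 24,\\
      42 &= 6 \ \texttt{CRC32C7X24BYTES} \times 7 \ \texttt{CRC32C24BYTES}.
      \end{aligned}
      $$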
      
      1. crc32c_test
      [==========] Running 4 tests from 1 test case.
      [----------] Global test environment set-up.
      [----------] 4 tests from CRC
      [ RUN      ] CRC.StandardResults
      [       OK ] CRC.StandardResults (1 ms)
      [ RUN      ] CRC.Values
      [       OK ] CRC.Values (0 ms)
      [ RUN      ] CRC.Extend
      [       OK ] CRC.Extend (0 ms)
      [ RUN      ] CRC.Mask
      [       OK ] CRC.Mask (0 ms)
      [----------] 4 tests from CRC (1 ms total)
      
      [----------] Global test environment tear-down
      [==========] 4 tests from 1 test case ran. (1 ms total)
      [  PASSED  ] 4 tests.
      
      2. db_bench --benchmarks="crc32c"
      crc32c : 0.218 micros/op 4595390 ops/sec; 17950.7 MB/s (4096 per op)
      
      3. crc32c_test case repeated 60000 times
      perf stat -e branch-miss -- ./crc32c_test
      before optimization:
      739,426,504      branch-miss
      after optimization:
      1,128,572      branch-miss
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5675
      
      Differential Revision: D16989210
      
      fbshipit-source-id: 7204e6069bb6ed066d49c2d1b3ac385065a98557
  13. 23 Aug, 2019 · 3 commits
    • Revert to storing UncompressionDicts in the cache (#5645) · df8c307d
      Levi Tamasi committed
      Summary:
      PR https://github.com/facebook/rocksdb/issues/5584 decoupled the uncompression dictionary object from the underlying block data; however, this defeats the purpose of the digested ZSTD dictionary, since the whole point
      of the digest is to create it once and reuse it over and over again. This patch goes back to
      storing the uncompression dictionary itself in the cache (which should be now safe to do,
      since it no longer includes a Statistics pointer), while preserving the rest of the refactoring.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5645
      
      Test Plan: make asan_check
      
      Differential Revision: D16551864
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 2a7e2d34bb16e70e3c816506d5afe1d842057800
    • Atomic Flush Crash Test also covers the case that WAL is enabled. (#5729) · d8a27d93
      sdong committed
      Summary:
      AtomicFlushStressTest is a powerful test, but right now we only run it for atomic_flush=true + disable_wal=true. We further extend it to the case where atomic_flush=false + disable_wal=false. All the workload generation and validation can stay the same.
      The atomic flush crash test is also changed to switch between the two test scenarios. This makes the name "atomic flush crash test" out of sync with what it really does. We leave it as is to avoid trouble with the continuous test set-up.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5729
      
      Test Plan: Run "CRASH_TEST_KILL_ODD=188 TEST_TMPDIR=/dev/shm/ USE_CLANG=1 make whitebox_crash_test_with_atomic_flush", observe the settings used and see it passed.
      
      Differential Revision: D16969791
      
      fbshipit-source-id: 56e37487000ae631e31b0100acd7bdc441c04163
    • Fix local includes · 202942b2
      Patrick Pei committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5722
      
      Differential Revision: D16908380
      
      fbshipit-source-id: 6a0e3cb2730b08d6012d3d7f31c937f01c399846