1. 02 6月, 2018 5 次提交
    • Z
      Fix test for rocksdb_lite: hide incompatible option kDirectIO · 50d7ac0e
      Zhongyi Xie 提交于
      Summary:
      Previous commit https://github.com/facebook/rocksdb/pull/3935 unhide a few test options which includes kDirectIO. However it's not supported by RocksDB lite. Need to hide this option from the lite build.
      Closes https://github.com/facebook/rocksdb/pull/3943
      
      Differential Revision: D8242757
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 1edfad3a5d01a46bfb7eedee765981ebe02c500a
      50d7ac0e
    • A
      Copy Get() result when file reads use mmap · fea2b1df
      Andrew Kryczka 提交于
      Summary:
      For iterator reads, a `SuperVersion` is pinned to preserve a snapshot of SST files, and `Block`s are pinned to allow `key()` and `value()` to return pointers directly into a RocksDB memory region. This works for both non-mmap reads, where the block owns the memory region, and mmap reads, where the file owns the memory region.
      
      For point reads with `PinnableSlice`, only the `Block` object is pinned. This works for non-mmap reads because the block owns the memory region, so even if the file is deleted after compaction, the memory region survives. However, for mmap reads, file deletion causes the memory region to which the `PinnableSlice` refers to be unmapped.   The result is usually a segfault upon accessing the `PinnableSlice`, although sometimes it returned wrong results (I repro'd this a bunch of times with `db_stress`).
      
      This PR copies the value into the `PinnableSlice` when it comes from mmap'd memory. We can tell whether the `Block` owns its memory using `Block::cachable()`, which is unset when reads do not use the provided buffer as is the case with mmap file reads. When that is false we ensure the result of `Get()` is copied.
      
      This feels like a short-term solution as ideally we'd have the `PinnableSlice` pin the mmap'd memory so we can do zero-copy reads. It seemed hard so I chose this approach to fix correctness in the meantime.
      Closes https://github.com/facebook/rocksdb/pull/3881
      
      Differential Revision: D8076288
      
      Pulled By: ajkr
      
      fbshipit-source-id: 31d78ec010198723522323dbc6ea325122a46b08
      fea2b1df
    • A
      Configure direct I/O statically in db_stress · 88c3ee2d
      Andrew Kryczka 提交于
      Summary:
      Previously `db_stress` attempted to configure direct I/O dynamically in `SetOptions()` which had multiple problems (ummm must've never been tested):
      
      - It's a DB option so SetDBOptions should've been called instead
      - It's not a dynamic option so even SetDBOptions would fail
      - It required enabling SyncPoint to mask O_DIRECT since it had no way to detect whether the DB directory was in tmpfs or not. This required locking that consumed ~80% of db_stress CPU.
      
      In this PR I delete the broken dynamic config and instead configure it statically, only enabling it if the DB directory truly supports O_DIRECT.
      Closes https://github.com/facebook/rocksdb/pull/3939
      
      Differential Revision: D8238120
      
      Pulled By: ajkr
      
      fbshipit-source-id: 60bb2deebe6c9b54a3f788079261715b4a229279
      88c3ee2d
    • M
      Extend existing unit tests to run with WriteUnprepared as well · 01e3c30d
      Manuel Ung 提交于
      Summary:
      As titled.
      
      I have not extended the Compatibility tests because the new WAL markers are still unimplemented.
      Closes https://github.com/facebook/rocksdb/pull/3941
      
      Differential Revision: D8238394
      
      Pulled By: lth
      
      fbshipit-source-id: 980e3d44837bbf2cfa64047f9738f559dfac4b1d
      01e3c30d
    • S
      add c api rocksdb_sstfilewriter_file_size · 89b37081
      straw 提交于
      Summary: Closes https://github.com/facebook/rocksdb/pull/3922
      
      Differential Revision: D8208528
      
      Pulled By: ajkr
      
      fbshipit-source-id: d384fe53cf526f2aadc7b79a423ce36dbd3ff224
      89b37081
  2. 01 6月, 2018 6 次提交
  3. 31 5月, 2018 2 次提交
    • J
      Compile error in db bench tool · 727eb881
      Jacquin Mininger 提交于
      Summary:
      Small format error below causes build to fail. I believe that this :
      ```
      fprintf(stderr, "num reads to do %lu\n", reads_);
      ```
      Can be changed to this:
      ```
      fprintf(stderr, "num reads to do %" PRIu64 "\n", reads_);
      ```
      Successful build
      ```
        CC       utilities/blob_db/blob_dump_tool.o
        AR       librocksdb_debug.a
      ar: creating archive librocksdb_debug.a
      /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: file: librocksdb_debug.a(rocks_lua_compaction_filter.o) has no symbols
        CC       tools/db_bench.o
        CC       tools/db_bench_tool.o
      tools/db_bench_tool.cc:4532:46: error: format specifies type 'unsigned long' but the argument has type 'int64_t' (aka 'long long') [-Werror,-Wformat]
          fprintf(stderr, "num reads to do %lu\n", reads_);
                                           ~~~     ^~~~~~
                                           %lld
      1 error generated.
      make: *** [tools/db_bench_tool.o] Error 1
      ```
      
      ```
      $ cd rocksdb
      $ make all
      
      $ g++ --version
      Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
      Apple LLVM version 9.1.0 (clang-902.0.39.1)
      Target: x86_64-apple-darwin17.5.0
      Thread model: posix
      InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
      ```
      Closes https://github.com/facebook/rocksdb/pull/3909
      
      Differential Revision: D8215710
      
      Pulled By: siying
      
      fbshipit-source-id: 15e49fb02a818fec846e9f9b2a50e372b6b67751
      727eb881
    • S
      Remove tests from ROCKSDB_VALGRIND_RUN · 4dd80deb
      Siying Dong 提交于
      Summary:
      In order to make valgrind check test to pass in a day, remove some tests that run prohibitively slow under valgrind.
      Closes https://github.com/facebook/rocksdb/pull/3924
      
      Differential Revision: D8210184
      
      Pulled By: siying
      
      fbshipit-source-id: 5b06fb08f3cf57571d422d05a0dbddc9f9376f7a
      4dd80deb
  4. 30 5月, 2018 4 次提交
    • A
      Delete triggered compaction for universal style · a736255d
      Anand Ananthabhotla 提交于
      Summary:
      This is still WIP, but I'm hoping for early feedback on the overall approach.
      
      This patch implements deletion triggered compaction, which till now only
      worked for leveled, for universal style. SST files are marked for
      compaction by the CompactOnDeletionCollertor table property. This is
      expected to be used when free disk space is low and the user wants to
      reclaim space by deleting a bunch of keys. The deletions are expected to
      be dense. In such a situation, we want to avoid a full compaction due to
      its space overhead.
      
      The strategy used in this case is similar to leveled. We pick one file
      from the set of files marked for compaction. We then expand the inputs
      to a clean cut on the same level, and then pick overlapping files from
      the next non-mepty level. Picking files from the next level can cause
      the key range to expand, and we opportunistically expand inputs in the
      source level to include files wholly in this key range.
      
      The main side effect of this is that it breaks the property of no time
      range overlap between levels. This shouldn't break any functionality.
      Closes https://github.com/facebook/rocksdb/pull/3860
      
      Differential Revision: D8124397
      
      Pulled By: anand1976
      
      fbshipit-source-id: bfa2a9dd6817930e991b35d3a8e7e61304ed3dcf
      a736255d
    • Y
      Fix LRUCache missing null check on destruct · 724855c7
      Yi Wu 提交于
      Summary:
      Fix LRUCache missing null check on destruct. The check is needed if LRUCache::DisownData is called.
      Closes https://github.com/facebook/rocksdb/pull/3920
      
      Differential Revision: D8191631
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: d5014f6e49b51692c18a25fb55ece935f5a023c4
      724855c7
    • Y
      Fix compilation error when OPT="-DROCKSDB_LITE". · cf826de3
      Yanqin Jin 提交于
      Summary: Closes https://github.com/facebook/rocksdb/pull/3917
      
      Differential Revision: D8187733
      
      Pulled By: riversand963
      
      fbshipit-source-id: e4aa179cd0791ca77167e357f99de9afd4aef910
      cf826de3
    • M
      Check for rep_->table_properties being nullptr · 03cda531
      Maysam Yabandeh 提交于
      Summary:
      The very old sst formats do not have table_properties and rep_->table_properties is thus nullptr. The recent patch in https://github.com/facebook/rocksdb/pull/3894 does not check for nullptr and hence makes it backward incompatible. This patch adds the check.
      Closes https://github.com/facebook/rocksdb/pull/3918
      
      Differential Revision: D8188638
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: b1d986665ecf0b4d1c442adfa8a193b97707d47b
      03cda531
  5. 29 5月, 2018 1 次提交
  6. 26 5月, 2018 5 次提交
    • M
      Exclude seq from index keys · 402b7aa0
      Maysam Yabandeh 提交于
      Summary:
      Index blocks have the same format as data blocks. The keys therefore similarly to the keys in the data blocks are internal keys, which means that in addition to the user key it also has 8 bytes that encodes sequence number and value type. This extra 8 bytes however is not necessary in index blocks since the index keys act as an separator between two data blocks. The only exception is when the last key of a block and the first key of the next block share the same user key, in which the sequence number is required to act as a separator.
      The patch excludes the sequence from index keys only if the above special case does not happen for any of the index keys. It then records that in the property block. The reader looks at the property block to see if it should expect sequence numbers in the keys of the index block.s
      Closes https://github.com/facebook/rocksdb/pull/3894
      
      Differential Revision: D8118775
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 915479f028b5799ca91671d67455ecdefbd873bd
      402b7aa0
    • N
      Check status when reading HashIndexPrefixesMetadataBlock · 8c3bf080
      Nathan VanBenschoten 提交于
      Summary:
      This was missed in a refactor of `ReadBlockContents` (2f1a3a4d).
      Closes https://github.com/facebook/rocksdb/pull/3906
      
      Differential Revision: D8172648
      
      Pulled By: ajkr
      
      fbshipit-source-id: 27e453b19795fea974bfed4721105be6f3a12090
      8c3bf080
    • A
      Fix an issue with unnecessary capture in lambda expressions · 45434178
      Adam Retter 提交于
      Summary:
      Closes https://github.com/facebook/rocksdb/issues/3900
      Replaces https://github.com/facebook/rocksdb/pull/3901
      
      I needed this to build v5.12.4 on Mac OS X (10.13.3).
      Closes https://github.com/facebook/rocksdb/pull/3904
      
      Differential Revision: D8169357
      
      Pulled By: sagar0
      
      fbshipit-source-id: 85faac42168796e7def9250d0c221a9a03b84476
      45434178
    • Y
      Fix segfault caused by object premature destruction · aa53579d
      Yanqin Jin 提交于
      Summary:
      Please refer to earlier discussion in [issue 3609](https://github.com/facebook/rocksdb/issues/3609).
      There was also an alternative fix in [PR 3888](https://github.com/facebook/rocksdb/pull/3888), but the proposed solution requires complex change.
      
      To summarize the cause of the problem. Upon creation of a column family, a `BlockBasedTableFactory` object is `new`ed and encapsulated by a `std::shared_ptr`. Since there is no other `std::shared_ptr` pointing to this `BlockBasedTableFactory`, when the column family is dropped, the `ColumnFamilyData` is `delete`d, causing the destructor of `std::shared_ptr`. Since there is no other `std::shared_ptr`, the underlying memory is also freed.
      Later when the db exits, it releases all the table readers, including the table readers that have been operating on the dropped column family. This needs to access the `table_options` owned by `BlockBasedTableFactory` that has already been deleted. Therefore, a segfault is raised.
      Previous workaround is to purge all obsolete files upon `ColumnFamilyData` destruction, which leads to a force release of table readers of the dropped column family. However this does not work when the user disables file deletion.
      
      Our solution in this PR is making a copy of `table_options` in `BlockBasedTable::Rep`. This solution increases memory copy and usage, but is much simpler.
      
      Test plan
      ```
      $ make -j16
      $ ./column_family_test --gtest_filter=ColumnFamilyTest.CreateDropAndDestroy:ColumnFamilyTest.CreateDropAndDestroyWithoutFileDeletion
      ```
      
      Expected behavior:
      All tests should pass.
      Closes https://github.com/facebook/rocksdb/pull/3898
      
      Differential Revision: D8149421
      
      Pulled By: riversand963
      
      fbshipit-source-id: eaecc2e064057ef607fbdd4cc275874f866c3438
      aa53579d
    • Fix Fadvise on closed file when reads use mmap · 6e08916e
      奏之章 提交于
      Summary:
      ```PosixMmapReadableFile::fd_``` is closed after created, but needs to remain open for the lifetime of `PosixMmapReadableFile` since it is used whenever `InvalidateCache` is called.
      Closes https://github.com/facebook/rocksdb/pull/2764
      
      Differential Revision: D8152515
      
      Pulled By: ajkr
      
      fbshipit-source-id: b738a6a55ba4e392f9b0f374ff396a1e61c64f65
      6e08916e
  7. 25 5月, 2018 4 次提交
  8. 24 5月, 2018 3 次提交
  9. 23 5月, 2018 4 次提交
  10. 22 5月, 2018 6 次提交
    • S
      PersistRocksDBOptions() to use WritableFileWriter · 3db1ada3
      Siying Dong 提交于
      Summary:
      By using WritableFileWriter rather than WritableFile directly, we can buffer multiple Append() calls to one write() file system call, which will be expensive to underlying Env without its own write buffering.
      Closes https://github.com/facebook/rocksdb/pull/3882
      
      Differential Revision: D8080673
      
      Pulled By: siying
      
      fbshipit-source-id: e0db900cb3c178166aa738f3985db65e3ae2cf1b
      3db1ada3
    • Z
      Move prefix_extractor to MutableCFOptions · c3ebc758
      Zhongyi Xie 提交于
      Summary:
      Currently it is not possible to change bloom filter config without restart the db, which is causing a lot of operational complexity for users.
      This PR aims to make it possible to dynamically change bloom filter config.
      Closes https://github.com/facebook/rocksdb/pull/3601
      
      Differential Revision: D7253114
      
      Pulled By: miasantreble
      
      fbshipit-source-id: f22595437d3e0b86c95918c484502de2ceca120c
      c3ebc758
    • Y
      Update ColumnFamilyTest for multi-CF verification · 263ef52b
      Yanqin Jin 提交于
      Summary:
      Change `keys_` from `set<string>` to `vector<set<string>>` so that each column
      family's keys are stored in one set.
      
      ajkr When you have a chance, can you PTAL? Thanks!
      Closes https://github.com/facebook/rocksdb/pull/3871
      
      Differential Revision: D8056447
      
      Pulled By: riversand963
      
      fbshipit-source-id: 650d0f9cad02b1bc005fc329ad76edbf053e6386
      263ef52b
    • A
      Print histogram count and sum in statistics string · 508a09fd
      Andrew Kryczka 提交于
      Summary:
      Previously it only printed percentiles, even though our histogram keeps track of count and sum (and more). There have been many times we want to know more than the percentiles. For example, we currently want sum of "rocksdb.compression.times.nanos" and sum of "rocksdb.decompression.times.nanos", which would allow us to know the relative cost of compression vs decompression.
      
      This PR adds count and sum to the string printed by `StatisticsImpl::ToString`. This is a bit risky as there are definitely parsers assuming the old format. I will mention it in HISTORY.md and hope for the best...
      Closes https://github.com/facebook/rocksdb/pull/3863
      
      Differential Revision: D8038831
      
      Pulled By: ajkr
      
      fbshipit-source-id: 0465b72e4b0cbf18ef965f4efe402601d16d5b5c
      508a09fd
    • A
      Assert keys/values pinned by range deletion meta-block iterators · 7b655214
      Andrew Kryczka 提交于
      Summary:
      `RangeDelAggregator` holds the pointers returned by `BlockIter::key()` and `BlockIter::value()` so requires the data to which they point is pinned. `BlockIter::key()` points into block memory and is guaranteed to be pinned if and only if prefix encoding is disabled (or, equivalently, restart interval is set to one). I think `BlockIter::value()` is always pinned. Added an assert for these and removed the wrong TODO about increasing restart interval, which would enable key prefix encoding and break the assertion.
      Closes https://github.com/facebook/rocksdb/pull/3875
      
      Differential Revision: D8063667
      
      Pulled By: ajkr
      
      fbshipit-source-id: 60b5ebcc0cdd610dd6aad9e74a23378793672c41
      7b655214
    • A
      Add missing test files to src.mk · e410501e
      Andrew Kryczka 提交于
      Summary:
      We only generate the header dependency (".cc.d") files for files mentioned in "src.mk". When we don't generate them, changes to header dependencies do not cause `make` to recompile the dependent ".o". Then it takes a while for developers (or maybe just me) to realize `make clean` is necessary.
      Closes https://github.com/facebook/rocksdb/pull/3876
      
      Differential Revision: D8065389
      
      Pulled By: ajkr
      
      fbshipit-source-id: 0f62eee7bcab15b0215791564e6ab3775d46996b
      e410501e