1. 20 Dec, 2017 1 commit
    • Y
      Port 3 way SSE4.2 crc32c implementation from Folly · f54d7f5f
      yingsu00 committed
      Summary:
      **# Summary**
      
      RocksDB uses SSE crc32 intrinsics to calculate crc32 values, but it does so in a single-way fashion (not pipelined on a single CPU core). Intel's whitepaper () published an algorithm that uses 3-way pipelining for the crc32 intrinsics and then uses the pclmulqdq intrinsic to combine the values. Because pclmulqdq has overhead of its own, this algorithm shows perf gains on buffers larger than 216 bytes, which makes RocksDB a perfect user, since most of the buffers RocksDB calls crc32c on are over 4KB. Initial db_bench runs show a tremendous CPU gain.
      
      This change uses the 3-way SSE algorithm by default. The old SSE algorithm is now behind a compile-time flag, NO_THREEWAY_CRC32C. If the user compiles the code with NO_THREEWAY_CRC32C=1, the old SSE crc32c algorithm is used. If the server does not have SSE4.2 at run time, the slow (non-SSE) path is used.
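
      The selection order described above can be sketched roughly as follows. This is an illustration only, not the RocksDB code: `Crc32cPortable`, `Crc32cImpl`, and `ChooseImpl` are hypothetical names, the bitwise routine merely stands in for the non-SSE slow path, and the sketch assumes the build turns `NO_THREEWAY_CRC32C=1` into a preprocessor definition of the same name.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>

      // Portable bitwise CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
      // Stand-in for the non-SSE fallback; much slower than the hardware paths.
      inline uint32_t Crc32cPortable(const void* data, size_t n) {
        assert(data != nullptr || n == 0);
        const unsigned char* p = static_cast<const unsigned char*>(data);
        uint32_t crc = 0xFFFFFFFFu;
        for (size_t i = 0; i < n; ++i) {
          crc ^= p[i];
          for (int bit = 0; bit < 8; ++bit) {
            // Shift right; fold in the polynomial when the low bit was set.
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
          }
        }
        return crc ^ 0xFFFFFFFFu;
      }

      // Hypothetical labels for the three code paths named in the commit message.
      enum class Crc32cImpl { kPortable, kSse42, kSse42ThreeWay };

      // Hypothetical selector: the run-time CPU check comes first, then the
      // compile-time opt-out decides between the old and the 3-way SSE paths.
      inline Crc32cImpl ChooseImpl(bool cpu_has_sse42) {
        if (!cpu_has_sse42) return Crc32cImpl::kPortable;
      #ifdef NO_THREEWAY_CRC32C
        return Crc32cImpl::kSse42;
      #else
        return Crc32cImpl::kSse42ThreeWay;
      #endif
      }
      ```

      The standard CRC32C check value for the nine bytes "123456789" is 0xE3069283, which is a convenient sanity test for any of the three paths.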
      
      **# Performance Test Results**
      We ran the FillRandom and ReadRandom benchmarks in db_bench. ReadRandom is the point of interest here since it calculates the CRC32 for the in-mem buffers. We did 3 runs for each algorithm.
      
      Before this change, CRC32 computation took about 11.5% of total CPU cost; with the new 3-way algorithm it drops to around 4.5%. Overall throughput also improved from 25.53MB/s to 27.63MB/s.
      
      1) ReadRandom in db_bench overall metrics
      
          PER RUN
          Algorithm  | Run | micros/op | ops/sec | Throughput (MB/s)
          3-way      | 1   | 4.143     | 241387  | 26.7
          3-way      | 2   | 3.775     | 264872  | 29.3
          3-way      | 3   | 4.116     | 242929  | 26.9
          FastCrc32c | 1   | 4.037     | 247727  | 27.4
          FastCrc32c | 2   | 4.648     | 215166  | 23.8
          FastCrc32c | 3   | 4.352     | 229799  | 25.4

          AVG
          Algorithm  | Average of micros/op | Average of ops/sec | Average of Throughput (MB/s)
          3-way      | 4.01                 | 249,729            | 27.63
          FastCrc32c | 4.35                 | 230,897            | 25.53
      
      2) Crc32c computation CPU cost (inclusive samples percentage)

          PER RUN
          Implementation | Run | TotalSamples  | Crc32c percentage
          3-way          | 1   | 4,572,250,000 | 4.37%
          3-way          | 2   | 3,779,250,000 | 4.62%
          3-way          | 3   | 4,129,500,000 | 4.48%
          FastCrc32c     | 1   | 4,663,500,000 | 11.24%
          FastCrc32c     | 2   | 4,047,500,000 | 12.34%
          FastCrc32c     | 3   | 4,366,750,000 | 11.68%
      
      **# Test Plan**
          make -j64 corruption_test && ./corruption_test
          (by default this uses the 3-way SSE algorithm)

          NO_THREEWAY_CRC32C=1 make -j64 corruption_test && ./corruption_test

          make clean && DEBUG_LEVEL=0 make -j64 db_bench
          make clean && DEBUG_LEVEL=0 NO_THREEWAY_CRC32C=1 make -j64 db_bench
      Closes https://github.com/facebook/rocksdb/pull/3173
      
      Differential Revision: D6330882
      
      Pulled By: yingsu00
      
      fbshipit-source-id: 8ec3d89719533b63b536a736663ca6f0dd4482e9
      f54d7f5f
  2. 12 Dec, 2017 1 commit
    • S
      Refactor ReadBlockContents() · 2f1a3a4d
      Siying Dong committed
      Summary:
      Divide ReadBlockContents() into multiple sub-functions, keeping the input and intermediate data in a new class, BlockFetcher.
      I hope that in general it makes the code easier to maintain.
      Another motivation is to clearly separate the logic before file reading from the logic after file reading. The refactor will help us evaluate how we can make I/O async in the future.
      Closes https://github.com/facebook/rocksdb/pull/3244
      
      Differential Revision: D6520983
      
      Pulled By: siying
      
      fbshipit-source-id: 338d90bc0338472d46be7a7682028dc9114b12e9
      2f1a3a4d
  3. 02 Dec, 2017 1 commit
    • A
      gflags in cmake on linux · 57056bb6
      Andrew Kryczka committed
      Summary:
      We should use gflags if it is available; otherwise the tools builds never work. Thanks to #3212, we can set -DGFLAGS=1 and it'll be independent of the namespace with which gflags was compiled.
      Closes https://github.com/facebook/rocksdb/pull/3214
      
      Differential Revision: D6462214
      
      Pulled By: ajkr
      
      fbshipit-source-id: db4e5f1b905322e3119554a9d01b57532c499384
      57056bb6
  4. 01 Dec, 2017 1 commit
  5. 14 Nov, 2017 1 commit
  6. 03 Nov, 2017 1 commit
  7. 28 Oct, 2017 1 commit
  8. 20 Oct, 2017 1 commit
  9. 13 Oct, 2017 1 commit
  10. 12 Oct, 2017 1 commit
  11. 07 Oct, 2017 1 commit
    • Y
      WritePrepared Txn: Compaction/Flush · d1b74b0c
      Yi Wu committed
      Summary:
      Update Compaction/Flush to support WritePreparedTxnDB: Add SnapshotChecker, a proxy for querying WritePreparedTxnDB::IsInSnapshot. Pass SnapshotChecker to DBImpl on WritePreparedTxnDB open. CompactionIterator uses it to check whether a key has been committed and whether it is visible to a snapshot. In CompactionIterator:
      * check if key has been committed. If not, output uncommitted keys AS-IS.
      * use SnapshotChecker to check if key is visible to a snapshot when in need.
      * do not output key with seq = 0 if the key is not committed.
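
      A rough sketch of the kind of check the list above relies on. Everything here is hypothetical and heavily simplified: `ToySnapshotChecker`, its commit map, and `MayZeroSequence` are illustrative names, and the visibility rule (committed at or before the snapshot's sequence number) is an assumption about the intended semantics rather than the actual WritePreparedTxnDB logic.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <unordered_map>

      using SequenceNumber = uint64_t;

      // Hypothetical, simplified stand-in for a snapshot checker.
      class ToySnapshotChecker {
       public:
        // Record that data written at prepare_seq was committed at commit_seq.
        void AddCommitted(SequenceNumber prepare_seq, SequenceNumber commit_seq) {
          assert(prepare_seq <= commit_seq);
          commit_map_[prepare_seq] = commit_seq;
        }

        // True if the entry written at seq has a recorded commit.
        bool IsCommitted(SequenceNumber seq) const {
          return commit_map_.count(seq) != 0;
        }

        // Assumed rule: visible only if committed at or before the snapshot.
        bool IsInSnapshot(SequenceNumber seq, SequenceNumber snapshot_seq) const {
          auto it = commit_map_.find(seq);
          return it != commit_map_.end() && it->second <= snapshot_seq;
        }

       private:
        std::unordered_map<SequenceNumber, SequenceNumber> commit_map_;
      };

      // Third rule in the list: only committed keys may be output with seq = 0.
      inline bool MayZeroSequence(const ToySnapshotChecker& checker,
                                  SequenceNumber seq) {
        return checker.IsCommitted(seq);
      }
      ```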
      Closes https://github.com/facebook/rocksdb/pull/2926
      
      Differential Revision: D5902907
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 945e037fdf0aa652dc5ba0ad879461040baa0320
      d1b74b0c
  12. 04 Oct, 2017 1 commit
    • Y
      Add ValueType::kTypeBlobIndex · d1cab2b6
      Yi Wu committed
      Summary:
      Add the kTypeBlobIndex value type, which will be used by blob db only, to insert a (key, blob_offset) KV pair. The purpose is to:
      1. Make it possible to open an existing rocksdb instance as blob db. Existing values will be of kTypeValue type, while values inserted by blob db will be of kTypeBlobIndex type.
      2. Make rocksdb able to detect whether the db contains values written by blob db, and return an error if so.
      3. Make it possible to have blob db optionally store a value in an SST file (with kTypeValue type) or as a blob value (with kTypeBlobIndex type).
      
      The root db (DBImpl) basically pretends that kTypeBlobIndex entries are normal values on write. On Get, if is_blob is provided, it returns whether the value read is of kTypeBlobIndex type; if is_blob is not provided, it returns Status::NotSupported(). On scan, an allow_blob flag is passed; if the flag is true, whether the value is of kTypeBlobIndex type is returned via iter->IsBlob().
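
      The Get behavior can be pictured with a small sketch. The types and the function below (`ToyValueType`, `ToyStatus`, `ToyEntry`, `ToyGet`) are hypothetical stand-ins, not the RocksDB API, and the sketch assumes NotSupported is returned only when a blob index is actually read without an `is_blob` out-parameter.

      ```cpp
      #include <cassert>
      #include <string>

      // Hypothetical stand-ins for the value type and the status code.
      enum class ToyValueType { kValue, kBlobIndex };
      enum class ToyStatus { kOk, kNotSupported };

      struct ToyEntry {
        ToyValueType type;
        std::string value;
      };

      // is_blob may be nullptr, meaning the caller cannot handle blob indexes.
      inline ToyStatus ToyGet(const ToyEntry& entry, std::string* value,
                              bool* is_blob) {
        assert(value != nullptr);
        const bool blob = (entry.type == ToyValueType::kBlobIndex);
        // A plain caller must never be handed a blob index as if it were data.
        if (blob && is_blob == nullptr) return ToyStatus::kNotSupported;
        if (is_blob != nullptr) *is_blob = blob;
        *value = entry.value;
        return ToyStatus::kOk;
      }
      ```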
      
      Changes on blob db side will be in a separate patch.
      Closes https://github.com/facebook/rocksdb/pull/2886
      
      Differential Revision: D5838431
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 3c5306c62bc13bb11abc03422ec5cbcea1203cca
      d1cab2b6
  13. 20 Sep, 2017 2 commits
    • Y
      Update cmake_minimum_required to 2.8.12. · 8ae81684
      Yao Zongyou committed
      Summary:
      Hello,
      
      current master branch declares cmake_minimum_required (VERSION 2.8.11)
      but cmake gives the following error:
      
      [  6%] CMake Error at CMakeLists.txt:658 (install):
        install TARGETS given unknown argument "INCLUDES".
      
      CMake Error at src/CMakeLists.txt:658 (install): install TARGETS given unknown argument "INCLUDES".
      
      because this argument is not supported on CMake versions prior to 2.8.12.
      Closes https://github.com/facebook/rocksdb/pull/2904
      
      Differential Revision: D5863430
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 0f7230e080add472ad4b87836b3104ea0b971a38
      8ae81684
    • O
      Fix MinGW build · 34ebadf9
      Orgad Shaneh committed
      Summary:
      snprintf is defined as _snprintf, which doesn't exist in the std
      namespace.
      Closes https://github.com/facebook/rocksdb/pull/2298
      
      Differential Revision: D5070457
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 6e1659ac3e86170653b174578da5a8ed16812cbb
      34ebadf9
  14. 13 Sep, 2017 1 commit
    • B
      Use cmake TIMESTAMP function · 82860bd5
      Bernhard M. Wiedemann committed
      Summary:
      because it is not only platform independent
      but also allows overriding the build date.
      This helps make ceph builds reproducible (ceph includes a fork of rocksdb in a submodule).
      
      Also adds UTC flag, to be independent of timezone.
      
      Requires cmake-2.8.11+ from 2013
      Closes https://github.com/facebook/rocksdb/pull/2848
      
      Differential Revision: D5820189
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: e3e8c1550e10e238c173f6c5d9ba15f71ad3ce28
      82860bd5
  15. 01 Sep, 2017 1 commit
  16. 29 Aug, 2017 2 commits
  17. 22 Aug, 2017 1 commit
  18. 14 Aug, 2017 2 commits
    • N
      properly set C[XX]FLAGS during CMake configure-time checks · 279296f4
      Nikhil Benesch committed
      Summary:
      Some compilers require `-std=c++11` for the `cstdint` header to be available. We already have logic to add `-std=c++11` to `CXXFLAGS` when the compiler is not MSVC; simply reorder CMakeLists.txt so that logic happens before the calls to `CHECK_CXX_SOURCE_COMPILES`.
      
      Additionally add a missing `set(CMAKE_REQUIRED_FLAGS, ...)` before a call to `CHECK_C_SOURCE_COMPILES`.
      Closes https://github.com/facebook/rocksdb/pull/2535
      
      Differential Revision: D5384244
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 2dbae4297c5d8ab4636e08b1457ffb2d3e37aef4
      279296f4
    • J
      cmake: support more compression type · 185ade4c
      Jay committed
      Summary:
      This pr enables linking all the supported compression libraries via cmake.
      Closes https://github.com/facebook/rocksdb/pull/2552
      
      Differential Revision: D5620607
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: b6949181f305bfdf04a98f898c92fd0caba0c45a
      185ade4c
  19. 08 Aug, 2017 1 commit
    • M
      Refactor PessimisticTransaction · bdc056f8
      Maysam Yabandeh committed
      Summary:
      This patch splits Commit and Prepare into lock-related logic and db-write-related logic. It moves the lock-related logic to PessimisticTransaction to be reused by all child classes and moves the existing implementation of the db-write-related logic to PrepareInternal, CommitSingleInternal, and CommitInternal in WriteCommittedTxnImpl.
      Closes https://github.com/facebook/rocksdb/pull/2691
      
      Differential Revision: D5569464
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: d1b8698e69801a4126c7bc211745d05c636f5325
      bdc056f8
  20. 06 Aug, 2017 1 commit
  21. 03 Aug, 2017 1 commit
    • M
      Refactor TransactionImpl · c3d5c4d3
      Maysam Yabandeh committed
      Summary:
      This patch refactors TransactionImpl by separating the logic for pessimistic concurrency control from the implementation of how to write the data to rocksdb. The existing implementation is named WriteCommittedTxnImpl as it writes committed data to the db. A template named WritePreparedTxnImpl is also added, which will be completed later to provide an alternative implementation.
      Closes https://github.com/facebook/rocksdb/pull/2676
      
      Differential Revision: D5549998
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 16298e86b43ca4849324c1f35c731913c6d17bec
      c3d5c4d3
  22. 02 Aug, 2017 1 commit
    • Y
      Dump Blob DB options to info log · 1900771b
      Yi Wu committed
      Summary:
      * Dump blob db options to info log
      * Remove BlobDBOptionsImpl to disallow dynamic cast *BlobDBOptions into *BlobDBOptionsImpl. Move options there to be constants or into BlobDBOptions. The dynamic cast is broken after #2645
      * Change some of the default options
      * Remove blob_db_options.min_blob_size, which is unimplemented. Will implement it soon.
      Closes https://github.com/facebook/rocksdb/pull/2671
      
      Differential Revision: D5529912
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: dcd58ca981db5bcc7f123b65a0d6f6ae0dc703c7
      1900771b
  23. 29 Jul, 2017 1 commit
    • I
      CacheActivityLogger, component to log cache activity into a file · 50a96913
      Islam AbdelRahman committed
      Summary:
      Simple component that will add a new entry in a log file every time we lookup/insert a key in SimCache.
      API:
      ```
      SimCache::StartActivityLogging(<file_name>, <env>, <optional_max_size>)
      SimCache::StopActivityLogging()
      ```
      
      Sending for review; still need to add more comments.

      I was thinking about a better approach, but I ended up deciding to use a mutex to synchronize the writes to the file, since this feature should not be heavily used and is only meant to collect info that will be analyzed offline. I think it's okay to hold the mutex every time we look up/add to the SimCache.
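
      The mutex-per-write approach can be sketched as below. `ToyActivityLogger`, its method names, and the line format are all hypothetical; the sketch writes to a generic `std::ostream` rather than a RocksDB file abstraction and only illustrates serializing log writes behind one lock.

      ```cpp
      #include <cassert>
      #include <mutex>
      #include <ostream>
      #include <string>

      // Hypothetical logger: every cache lookup/insert appends one line.
      class ToyActivityLogger {
       public:
        explicit ToyActivityLogger(std::ostream* out) : out_(out) {
          assert(out != nullptr);
        }

        void ReportLookup(const std::string& key) { Write("LOOKUP", key); }
        void ReportAdd(const std::string& key) { Write("ADD", key); }

       private:
        // One mutex guards the stream, so concurrent callers never interleave
        // partial lines. Simple, at the cost of contention on a hot cache.
        void Write(const char* op, const std::string& key) {
          std::lock_guard<std::mutex> guard(mu_);
          (*out_) << op << ' ' << key << '\n';
        }

        std::mutex mu_;
        std::ostream* out_;
      };
      ```

      Holding the lock for each entry keeps the log well formed without any buffering scheme, which matches the stated trade-off: acceptable for an offline-analysis feature, not for a hot path.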
      Closes https://github.com/facebook/rocksdb/pull/2295
      
      Differential Revision: D5063826
      
      Pulled By: IslamAbdelRahman
      
      fbshipit-source-id: f3b5daed8b201987c9a071146ddd5c5740a2dd8c
      50a96913
  24. 28 Jul, 2017 1 commit
    • Y
      Blob DB TTL extractor · 6083bc79
      Yi Wu committed
      Summary:
      Introducing blob_db::TTLExtractor to replace extract_ttl_fn. The TTL
      extractor can be used to extract TTL from keys inserted with Put or
      WriteBatch. Changes over the existing extract_ttl_fn are:
      * If the value is changed, it will be returned via std::string* (rather than Slice*). With Slice*, the new value has to be part of the existing value. With std::string*, the limitation is removed.
      * It can optionally return TTL or expiration.
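
      A loose sketch of what such an extractor could look like. The interface below is hypothetical (it uses `std::string` throughout and is not the blob_db::TTLExtractor signature), and the "payload|ttl" value encoding is invented purely for illustration.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <string>

      // Hypothetical interface modeled on the description above.
      class ToyTTLExtractor {
       public:
        virtual ~ToyTTLExtractor() = default;
        // Returns true if a TTL was found. When the stored value should differ
        // from the input, it is written to *new_value and *value_changed is set.
        virtual bool ExtractTTL(const std::string& key, const std::string& value,
                                uint64_t* ttl, std::string* new_value,
                                bool* value_changed) = 0;
      };

      // Example extractor for values shaped like "<payload>|<decimal ttl>".
      class SuffixTTLExtractor : public ToyTTLExtractor {
       public:
        bool ExtractTTL(const std::string& /*key*/, const std::string& value,
                        uint64_t* ttl, std::string* new_value,
                        bool* value_changed) override {
          assert(ttl != nullptr && new_value != nullptr && value_changed != nullptr);
          *value_changed = false;
          const size_t pos = value.rfind('|');
          if (pos == std::string::npos || pos + 1 == value.size()) return false;
          uint64_t parsed = 0;
          for (size_t i = pos + 1; i < value.size(); ++i) {
            if (value[i] < '0' || value[i] > '9') return false;
            parsed = parsed * 10 + static_cast<uint64_t>(value[i] - '0');
          }
          *ttl = parsed;
          // The output string owns its bytes, so it need not alias the input.
          new_value->assign(value, 0, pos);
          *value_changed = true;
          return true;
        }
      };
      ```

      In this toy case the new value happens to be a prefix of the old one, but because the result is an owned string the extractor could just as well return something that is not a sub-range of the input.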
      
      Other changes in this PR:
      * replace `std::chrono::system_clock` with `Env::NowMicros` so that I can mock time in tests.
      * add several TTL tests.
      * other minor naming change.
      Closes https://github.com/facebook/rocksdb/pull/2659
      
      Differential Revision: D5512627
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 0dfcb00d74d060b8534c6130c808e4d5d0a54440
      6083bc79
  25. 22 Jul, 2017 1 commit
    • P
      Cassandra compaction filter for purge expired columns and rows · 534c255c
      Pengchao Wang committed
      Summary:
      Major changes in this PR:
      * Implement CassandraCompactionFilter to remove expired columns and rows (if all columns have expired)
      * Move cassandra related code from utilities/merge_operators/cassandra to utilities/cassandra/*
      * Switch from unique_ptr to shared_ptr<> for Column membership management in RowValue. Since columns do have multiple owners in the Merge and GC processes, using shared_ptr helps make RowValue immutable.
      * Rename cassandra_merge_test to cassandra_functional_test and add two TTL compaction related tests there.
      Closes https://github.com/facebook/rocksdb/pull/2588
      
      Differential Revision: D5430010
      
      Pulled By: wpc
      
      fbshipit-source-id: 9566c21e06de17491d486a68c70f52d501f27687
      534c255c
  26. 18 Jul, 2017 1 commit
    • A
      Revert cmake -DNDEBUG for non-MSVC · 7ac184c6
      Andrew Kryczka committed
      Summary:
      Unfortunately we can't use -DNDEBUG yet since we don't properly exclude the test libraries/executables from the non-debug builds on non-MSVC platforms. Previously this was failing on Linux for every build type except `CMAKE_BUILD_TYPE=Debug`.
      
      Reverts a48a62d5
      Closes https://github.com/facebook/rocksdb/pull/2595
      
      Differential Revision: D5436182
      
      Pulled By: ajkr
      
      fbshipit-source-id: 062f07cc9ce06a073b66054722b27bac1890dca3
      7ac184c6
  27. 11 Jul, 2017 1 commit
    • G
      Fix undefined behavior in Hash · 8f927e5f
      Giuseppe Ottaviano committed
      Summary:
      Instead of ignoring UBSan checks, fix the negative shifts in
      Hash(). Also add test to make sure the hash values are stable over
      time. The values were computed before this change, so the test also
      verifies the correctness of the change.
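
      For context, a minimal sketch of one way to remove a negative left shift without changing the resulting bits. `ShiftSignedByte` is a hypothetical helper, and whether the patch uses this exact cast is an assumption; the point is only that converting to an unsigned type before shifting is well defined.

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Left-shifting a negative signed value is undefined behavior before C++20.
      // Converting to uint32_t first is well defined (modular conversion, which
      // matches two's-complement sign extension), and so is the unsigned shift.
      inline uint32_t ShiftSignedByte(signed char c, unsigned shift) {
        assert(shift < 32);
        return static_cast<uint32_t>(c) << shift;
      }
      ```

      For example, `ShiftSignedByte(-1, 16)` yields 0xFFFF0000, the same bit pattern a two's-complement machine would typically have produced from the undefined signed shift, which is why previously computed hash values can stay stable.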
      Closes https://github.com/facebook/rocksdb/pull/2546
      
      Differential Revision: D5386369
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 6de4b44461a544d6222cc5d72d8cda2c0373d17e
      8f927e5f
  28. 28 Jun, 2017 1 commit
    • Y
      Fix TARGETS file tests list · 982cec22
      Yi Wu committed
      Summary:
      1. The buckifier script assumes each test "foo" comes with a .cc file of the same name (i.e. foo.cc). Update cassandra tests to follow this pattern so that the buckifier script can recognize them.
      2. add blob_db_test
      Closes https://github.com/facebook/rocksdb/pull/2506
      
      Differential Revision: D5331517
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 86f3eba471fc621186ab44cbd073b6162cde8e57
      982cec22
  29. 27 Jun, 2017 1 commit
    • E
      Encryption at rest support · 51778612
      Ewout Prangsma committed
      Summary:
      This PR adds support for encrypting data stored by RocksDB when written to disk.
      
      It adds an `EncryptedEnv` override of the `Env` class with matching overrides for sequential and random access files.
      The encryption itself is done through a configurable `EncryptionProvider`. This class is asked to create a `BlockAccessCipherStream` for a file. This is where the actual encryption/decryption is done.
      Currently there is a Counter mode implementation of `BlockAccessCipherStream` with a `ROT13` block cipher (NOTE the `ROT13` is for demo purposes only!!).
      
      The Counter operation mode uses an initial counter & random initialization vector (IV).
      Both are created randomly for each file and stored in a 4K (default size) block that is prefixed to that file. The `EncryptedEnv` implementation is such that clients of the `Env` class do not see this prefix (neither in the data nor in the file size).
      The largest part of the prefix block is also encrypted, and there is room left for implementation specific settings/values/keys in there.
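
      The counter-mode idea can be illustrated with a toy sketch. Nothing below is the `BlockAccessCipherStream` API: `ToyKeystreamByte` and `ToyCtrXor` are hypothetical names, the 8-byte block size is arbitrary, and the mixing function is an insecure placeholder for a real block cipher. It only shows that the keystream depends on (key, IV, counter + block index), so any file offset can be processed independently and the same operation both encrypts and decrypts.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <string>

      // Toy keystream byte for a given file offset. NOT secure.
      inline uint8_t ToyKeystreamByte(uint64_t key, uint64_t iv,
                                      uint64_t initial_counter, uint64_t offset) {
        const uint64_t kBlockSize = 8;  // arbitrary toy block size
        const uint64_t block = initial_counter + offset / kBlockSize;
        // Placeholder "block cipher": mix key, IV and block counter.
        uint64_t x = (key ^ iv) + block * 0x9E3779B97F4A7C15ull;
        x ^= x >> 29;
        x *= 0xBF58476D1CE4E5B9ull;
        x ^= x >> 32;
        // Pick the byte of the 8-byte keystream block for this offset.
        return static_cast<uint8_t>(x >> (8 * (offset % kBlockSize)));
      }

      // XOR data in place with the keystream starting at file_offset.
      // Calling it twice with the same parameters restores the original bytes.
      inline void ToyCtrXor(uint64_t key, uint64_t iv, uint64_t initial_counter,
                            uint64_t file_offset, std::string* data) {
        assert(data != nullptr);
        for (size_t i = 0; i < data->size(); ++i) {
          const uint8_t plain = static_cast<uint8_t>((*data)[i]);
          const uint8_t ks =
              ToyKeystreamByte(key, iv, initial_counter, file_offset + i);
          (*data)[i] = static_cast<char>(plain ^ ks);
        }
      }
      ```

      Because the keystream is a function of the offset, a random-access read at offset N can decrypt just the bytes it fetched, which is what makes this mode a natural fit for random access files.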
      
      To test the encryption, the `DBTestBase` class has been extended to consider a new environment variable called `ENCRYPTED_ENV`. If set, the test will set up an encrypted instance of the `Env` class to use for all tests.
      Typically you would run it like this:
      
      ```
      ENCRYPTED_ENV=1 make check_some
      ```
      
      There is also an added test that checks that some data inserted into the database is or is not "visible" on disk. With `ENCRYPTED_ENV` active it must not find plain text strings, with `ENCRYPTED_ENV` unset, it must find the plain text strings.
      Closes https://github.com/facebook/rocksdb/pull/2424
      
      Differential Revision: D5322178
      
      Pulled By: sdwilsh
      
      fbshipit-source-id: 253b0a9c2c498cc98f580df7f2623cbf7678a27f
      51778612
  30. 17 Jun, 2017 1 commit
  31. 03 Jun, 2017 1 commit
    • S
      Improve write buffer manager (and allow the size to be tracked in block cache) · 95b0e89b
      Siying Dong committed
      Summary:
      Improve write buffer manager in several ways:
      1. Size is tracked when an arena block is allocated, rather than on every allocation, so that it better tracks actual memory usage and the tracking overhead is slightly lower.
      2. We start to trigger memtable flush when 7/8 of the memory cap is reached, instead of 100%, which makes 100% much harder to hit.
      3. Allow a cache object to be passed into the buffer manager so that the size allocated by memtables can be charged there. This helps users have one single memory cap across block cache and memtable.
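
      The first two points can be sketched as follows. `ToyWriteBufferManager` and its methods are hypothetical names, and the sketch tracks a single counter; it only illustrates block-granularity accounting and a flush trigger at 7/8 of the cap, not the actual WriteBufferManager logic.

      ```cpp
      #include <cassert>
      #include <cstddef>

      // Hypothetical, simplified memory accounting for memtables.
      class ToyWriteBufferManager {
       public:
        explicit ToyWriteBufferManager(size_t buffer_size)
            : buffer_size_(buffer_size),
              // 7/8 of the cap, written so the arithmetic cannot overflow.
              flush_trigger_(buffer_size - buffer_size / 8) {
          assert(buffer_size > 0);
        }

        // Called once per arena block, not once per small allocation.
        void ReserveBlock(size_t bytes) { used_ += bytes; }

        void FreeBlock(size_t bytes) {
          assert(bytes <= used_);
          used_ -= bytes;
        }

        // Flushing starts before the hard cap so the cap itself is rarely hit.
        bool ShouldFlush() const { return used_ >= flush_trigger_; }

        size_t buffer_size() const { return buffer_size_; }
        size_t memory_usage() const { return used_; }

       private:
        size_t buffer_size_;
        size_t flush_trigger_;
        size_t used_ = 0;
      };
      ```

      With a cap of 800 bytes, for instance, the trigger sits at 700 bytes: usage of 600 does not request a flush, while usage of 700 does.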
      Closes https://github.com/facebook/rocksdb/pull/2350
      
      Differential Revision: D5110648
      
      Pulled By: siying
      
      fbshipit-source-id: b4238113094bf22574001e446b5d88523ba00017
      95b0e89b
  32. 02 Jun, 2017 1 commit
    • M
      Retire memenv https://github.com/facebook/rocksdb/pull/2082 · 5a9b4d74
      Maysam Yabandeh committed
      Summary:
      This is a manual commit of this PR:
      Retire InMemoryEnv in favor of MockEnv #2082
      With MockEnv doing the same yet being more mature, InMemoryEnv is redundant.
      
      Reviewed By: IslamAbdelRahman
      
      Differential Revision: D5162323
      
      fbshipit-source-id: 59fd0082a891dc99cc531e4da9d68bf891eae3f5
      5a9b4d74
  33. 01 Jun, 2017 2 commits
    • T
      db: avoid `#include`ing malloc and jemalloc simultaneously · 0dc3040d
      Tamir Duberstein committed
      Summary:
      This fixes a compilation failure on Linux when the system libc is not
      glibc. jemalloc's configure script incorrectly assumes that glibc is
      always used on Linux systems, producing glibc-style signatures; when
      the system libc is e.g. musl, the following error is observed:
      
      ```
        [  0%] Building CXX object CMakeFiles/rocksdb.dir/db/db_impl.cc.o
        In file included from /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb.src/table/block.h:19:0,
                         from /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb.src/db/db_impl.cc:77:
        /x-tools/x86_64-unknown-linux-musl/x86_64-unknown-linux-musl/sysroot/usr/include/malloc.h:19:8: error: declaration of 'size_t malloc_usable_size(void*)' has a different exception specifier
         size_t malloc_usable_size(void *);
                ^~~~~~~~~~~~~~~~~~
        In file included from /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb.src/db/db_impl.cc:20:0:
        /go/native/x86_64-unknown-linux-musl/jemalloc/include/jemalloc/jemalloc.h:78:33: note: from previous declaration 'size_t malloc_usable_size(void*) throw ()'
         #  define je_malloc_usable_size malloc_usable_size
                                         ^
        /go/native/x86_64-unknown-linux-musl/jemalloc/include/jemalloc/jemalloc.h:239:41: note: in expansion of macro 'je_malloc_usable_size'
         JEMALLOC_EXPORT size_t JEMALLOC_NOTHROW je_malloc_usable_size(
                                                 ^~~~~~~~~~~~~~~~~~~~~
        CMakeFiles/rocksdb.dir/build.make:350: recipe for target 'CMakeFiles/rocksdb.dir/db/db_impl.cc.o' failed
      ```
      
      This works around the issue by rearranging the sources such that
      jemalloc's headers are never in the same scope as the system's malloc
      header. The jemalloc issue has been reported as well, see:
      https://github.com/jemalloc/jemalloc/issues/778.
      
      cc tschottdorf
      Closes https://github.com/facebook/rocksdb/pull/2188
      
      Differential Revision: D5163048
      
      Pulled By: siying
      
      fbshipit-source-id: c553125458892def175c1be5682b0330d80b2a0d
      0dc3040d
    • Y
      Fixing blob db sequence number handling · ad19eb86
      Yi Wu committed
      Summary:
      Blob db relies on the base db returning the sequence number through the write batch after DB::Write(). However, after recent changes to the write path, DB::Write() no longer returns the sequence number in some cases. Fix it by having WriteBatchInternal::InsertInto() always encode the sequence number into the write batch.
      
      Stacking on #2375.
      Closes https://github.com/facebook/rocksdb/pull/2385
      
      Differential Revision: D5148358
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 8bda0aa07b9334ed03ed381548b39d167dc20c33
      ad19eb86
  34. 31 May, 2017 2 commits
  35. 17 May, 2017 1 commit