1. 07 Aug 2019, 2 commits
    • New API to get all merge operands for a Key (#5604) · d150e014
      Vijay Nadimpalli committed
      Summary:
      This is a new API added to db.h to allow fetching all merge operands associated with a key. The main motivation for this API is to support performance-sensitive use cases where doing a full online merge is not necessary. Example use cases:
      1. Update subset of columns and read subset of columns -
      Imagine a SQL table where a row is encoded as a K/V pair (as is done in MyRocks). If there are many columns and users update only one of them, we can use the merge operator to reduce write amplification. And if users read only one or two columns in the read query, this feature can avoid a full merge of the whole row and save some CPU.
      2. Updating very few attributes in a value which is a JSON-like document -
      Updating one attribute can be done efficiently using the merge operator, while reading back one attribute is more efficient if we don't need to do a full merge.
      ----------------------------------------------------------------------------------------------------
      API:
      Status GetMergeOperands(
            const ReadOptions& options, ColumnFamilyHandle* column_family,
            const Slice& key, PinnableSlice* merge_operands,
            GetMergeOperandsOptions* get_merge_operands_options,
            int* number_of_operands)
      
      Example usage:
      int size = 100;
      int number_of_operands = 0;
      std::vector<PinnableSlice> values(size);
      GetMergeOperandsOptions merge_operands_info;
      merge_operands_info.expected_max_number_of_operands = size;
      db_->GetMergeOperands(ReadOptions(), db_->DefaultColumnFamily(), "k1", values.data(), &merge_operands_info, &number_of_operands);
      
      Description:
      Returns all the merge operands corresponding to the key. If the number of merge operands in the DB is greater than merge_operands_options.expected_max_number_of_operands, no merge operands are returned and the status is Incomplete. Merge operands are returned in the order of insertion.
      merge_operands: points to an array of at least merge_operands_options.expected_max_number_of_operands PinnableSlices, which the caller is responsible for allocating. If the returned status is Incomplete, number_of_operands will contain the total number of merge operands found in the DB for the key.
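      A minimal sketch of consuming the result, including the Incomplete case; the resize-and-retry loop below is an assumed caller-side pattern, not part of the API:
      ```cpp
      #include <vector>
      #include "rocksdb/db.h"

      // Fetch all operands for "k1"; retry once if the initial guess for
      // expected_max_number_of_operands was too small.
      std::vector<rocksdb::PinnableSlice> FetchOperands(rocksdb::DB* db) {
        int guess = 100;
        std::vector<rocksdb::PinnableSlice> operands(guess);
        rocksdb::GetMergeOperandsOptions opts;
        opts.expected_max_number_of_operands = guess;
        int count = 0;
        rocksdb::Status s = db->GetMergeOperands(
            rocksdb::ReadOptions(), db->DefaultColumnFamily(), "k1",
            operands.data(), &opts, &count);
        if (s.IsIncomplete()) {
          // count now holds the total number of operands found in the DB.
          operands = std::vector<rocksdb::PinnableSlice>(count);
          opts.expected_max_number_of_operands = count;
          s = db->GetMergeOperands(rocksdb::ReadOptions(),
                                   db->DefaultColumnFamily(), "k1",
                                   operands.data(), &opts, &count);
        }
        operands.resize(count);  // keep only the populated slots
        return operands;
      }
      ```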
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5604
      
      Test Plan:
      Added unit test and perf test in db_bench that can be run using the command:
      ./db_bench -benchmarks=getmergeoperands --merge_operator=sortlist
      
      Differential Revision: D16657366
      
      Pulled By: vjnadimpalli
      
      fbshipit-source-id: 0faadd752351745224ee12d4ae9ef3cb529951bf
    • Correct the default write buffer size of java doc (#5670) · 4f98b43b
      Yun Tang committed
      Summary:
      The actual default write buffer size in `rocksdb/include/rocksdb/options.h` is 64 MB; we should correct this value in the Java doc.
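      For reference, a minimal C++ check of the default (the 64 MB figure comes from options.h, as stated above):
      ```cpp
      #include <cassert>
      #include "rocksdb/options.h"

      int main() {
        rocksdb::Options options;
        // include/rocksdb/options.h defaults write_buffer_size to 64 MB.
        assert(options.write_buffer_size == 64u << 20);
        return 0;
      }
      ```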
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5670
      
      Differential Revision: D16668815
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: cc3a981c9f1c2cd4a8392b0ed5f1fd0a2d729afb
  2. 06 Aug 2019, 5 commits
    • cmake: cmake related cleanups (#5662) · cc9fa7fc
      Kefu Chai committed
      Summary:
      - cmake: use the builtin FindBzip2.cmake from CMake
      - cmake: require CMake v3.5.1
      - cmake: add imported target for 3rd party libraries
      - cmake: extract ReadVersion.cmake out and refactor it
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5662
      
      Differential Revision: D16660974
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 681594910e74253251fe14ad0befc41a4d0f4fd4
    • Block cache analyzer: python script to plot graphs (#5673) · f4a616eb
      haoyuhuang committed
      Summary:
      This PR updates the Python script to plot graphs for the stats output from the block cache analyzer.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5673
      
      Test Plan: Manually run the script to generate graphs.
      
      Differential Revision: D16657145
      
      Pulled By: HaoyuHuang
      
      fbshipit-source-id: fd510b5fd4307835f9a986fac545734dbe003d28
    • Fix make target 'all' and 'check' (#5672) · b1a02ffe
      Yanqin Jin committed
      Summary:
      If a test is one of the parallel tests, then it should also be one of the 'tests'.
      Otherwise, `make all` won't build the binaries. For example,
      ```
      $COMPILE_WITH_ASAN=1 make -j32 all
      ```
      Then if you do
      ```
      $make check
      ```
      The second command will compile and build db_bloom_test
      and file_reader_writer_test **without** `COMPILE_WITH_ASAN=1`, causing the
      command to fail.
      
      Test plan (on devserver):
      ```
      $make -j32 all
      ```
      Verify all binaries are built so that `make check` won't have to compile
      anything.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5672
      
      Differential Revision: D16655834
      
      Pulled By: riversand963
      
      fbshipit-source-id: 050131412b5313496f85ae3deeeeb8d28af75746
    • WritePrepared: fix Get without snapshot (#5664) · 208556ee
      Maysam Yabandeh committed
      Summary:
      If read_options.snapshot is not set, ::Get takes the last sequence number after acquiring a super-version and uses that as the snapshot sequence number. Theoretically max_evicted_seq_ could advance past this sequence number. This could lead ::IsInSnapshot, which is invoked by the ReadCallback, to notice the absence of the snapshot. In this case, the ReadCallback should have passed a non-null snap_released so that it could be set by ::IsInSnapshot. The patch does that, and adds a unit test to verify it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5664
      
      Differential Revision: D16614033
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 06fb3fd4aacd75806ed1a1acec7961f5d02486f2
    • Disable ReadYourOwnWriteStress when run under Valgrind (#5671) · e579e32e
      Maysam Yabandeh committed
      Summary:
      It sometimes times out when run under Valgrind, taking around 20 minutes. The patch skips the test under Valgrind.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5671
      
      Differential Revision: D16652382
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 0f6f4f76d37337d56226b689e01b14523dd07aae
  3. 03 Aug 2019, 1 commit
    • Change buckifier to support parameterized dependencies (#5648) · 30edf187
      Yanqin Jin committed
      Summary:
      Users may desire to specify extra dependencies via buck. This PR allows users to pass additional dependencies as a JSON object so that the buckifier script can generate a TARGETS file with the desired extra dependencies.
      
      Test plan (on dev server)
      ```
      $python buckifier/buckify_rocksdb.py '{"fake": {"extra_deps": [":test_dep", "//fakes/module:mock1"], "extra_compiler_flags": ["-DROCKSDB_LITE", "-Os"]}}'
      Generating TARGETS
      Extra dependencies:
      {'': {'extra_compiler_flags': [], 'extra_deps': []}, 'test_dep1': {'extra_compiler_flags': ['-O2', '-DROCKSDB_LITE'], 'extra_deps': [':fake', '//dep1/mock']}}
      Generated TARGETS Summary:
      - 5 libs
      - 0 binarys
      - 296 tests
      ```
      Verify the TARGETS file.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5648
      
      Differential Revision: D16565043
      
      Pulled By: riversand963
      
      fbshipit-source-id: a6ef02274174fcf159692d7b846e828454d01e89
  4. 02 Aug 2019, 1 commit
    • Fix duplicated file names in PurgeObsoleteFiles (#5603) · d1c9ede1
      Zhongyi Xie committed
      Summary:
      Currently in `DBImpl::PurgeObsoleteFiles`, the list of candidate files is created through a combination of calling LogFileName using `log_delete_files` and `full_scan_candidate_files`.
      
      In full_scan_candidate_files, the filenames look like this:
      {file_name = "074715.log", file_path = "/txlogs/3306"},
      but LogFileName produces filenames with a prepended slash:
      {file_name = "/074715.log", file_path = "/txlogs/3306"}.
      
      This confuses the dedup step here: https://github.com/facebook/rocksdb/blob/bb4178066dc4f18b9b7f1d371e641db027b3edbe/db/db_impl/db_impl_files.cc#L339-L345
      
      Because duplicates still exist, DeleteFile is called on the same file twice, and hits an error on the second try. Error message: Failed to mark /txlogs/3302/764418.log as trash.
      
      The root cause is the use of `kDumbDbName` when generating file names; it creates file names like /074715.log. This PR removes the use of `kDumbDbName` and creates paths without the leading '/' when dbname can be ignored.
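      A standalone sketch (not the RocksDB code) of why the leading slash defeats the sort-and-unique dedup step linked above:
      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <string>
      #include <vector>

      int main() {
        // The same WAL shows up once via full_scan_candidate_files and once
        // via LogFileName; the stray '/' makes the strings compare unequal.
        std::vector<std::string> candidates = {"074715.log", "/074715.log"};
        std::sort(candidates.begin(), candidates.end());
        candidates.erase(std::unique(candidates.begin(), candidates.end()),
                         candidates.end());
        assert(candidates.size() == 2);  // both survive; DeleteFile runs twice
        return 0;
      }
      ```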
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5603
      
      Test Plan: make check
      
      Differential Revision: D16413203
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 6ba8288382c55f7d5e3892d722fc94b57d2e4491
  5. 01 Aug 2019, 3 commits
    • Test the various configurations in parallel in MergeOperatorPinningTest (#5659) · 1dfc5eaa
      Levi Tamasi committed
      Summary:
      MergeOperatorPinningTest.Randomized frequently times out under TSAN
      because it tests ~40 option configurations sequentially in a loop. The
      patch parallelizes the tests of the various configurations to make the
      test complete faster.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5659
      
      Test Plan: Tested using buck test mode/dev-tsan ...
      
      Differential Revision: D16587518
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 65bd25c0ad9a23587fed5592e69c1a0097fa27f6
    • WriteUnPrepared: savepoint support (#5627) · f622ca2c
      Manuel Ung committed
      Summary:
      Add savepoint support when the current transaction has flushed unprepared batches.
      
      Rolling back to a savepoint is similar to rolling back a transaction. It requires the set of keys that have changed since the savepoint, re-reading those keys at the snapshot taken at that savepoint, and then restoring the old keys by writing out another unprepared batch.
      
      For this strategy to work, though, we must be capable of reading keys at a savepoint. This does not work if keys were written out using the same sequence number before and after a savepoint. Therefore, when we flush out unprepared batches, we must split the batch by savepoint if any savepoints exist.
      
      E.g., if we have the following:
      ```
      Put(A)
      Put(B)
      Put(C)
      SetSavePoint()
      Put(D)
      Put(E)
      SetSavePoint()
      Put(F)
      ```
      
      Then we will write out 3 separate unprepared batches:
      ```
      Put(A) 1
      Put(B) 1
      Put(C) 1
      Put(D) 2
      Put(E) 2
      Put(F) 3
      ```
      
      This is so that when we roll back to, e.g., the first savepoint, we can just read keys at snapshot_seq = 1.
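      A minimal sketch of the client-side savepoint calls this batching serves (standard TransactionDB API; status checks elided):
      ```cpp
      #include "rocksdb/utilities/transaction.h"
      #include "rocksdb/utilities/transaction_db.h"

      // With per-savepoint batches, rolling back to a savepoint only needs
      // reads at the snapshot sequence recorded when it was set.
      void SavepointDemo(rocksdb::TransactionDB* txn_db) {
        rocksdb::Transaction* txn =
            txn_db->BeginTransaction(rocksdb::WriteOptions());
        txn->Put("A", "a");
        txn->Put("B", "b");
        txn->Put("C", "c");
        txn->SetSavePoint();         // unprepared batch boundary (seq 1 | 2)
        txn->Put("D", "d");
        txn->Put("E", "e");
        txn->SetSavePoint();         // unprepared batch boundary (seq 2 | 3)
        txn->Put("F", "f");
        txn->RollbackToSavePoint();  // undoes Put(F)
        txn->RollbackToSavePoint();  // undoes Put(D), Put(E)
        txn->Commit();               // commits A, B, C
        delete txn;
      }
      ```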
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5627
      
      Differential Revision: D16584130
      
      Pulled By: lth
      
      fbshipit-source-id: 6d100dd548fb20c4b76661bd0f8a2647e64477fa
    • WriteUnPrepared: use WriteUnpreparedTxnReadCallback for ValidateSnapshot (#5657) · d599135a
      Manuel Ung committed
      Summary:
      In DeferSnapshotSavePointTest, writes were failing with snapshot validation error because the key with the latest sequence number was an unprepared key from the current transaction.
      
      Fix this by passing down the correct read callback.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5657
      
      Differential Revision: D16582466
      
      Pulled By: lth
      
      fbshipit-source-id: 11645dac0e7c1374d917ef5fdf757d13c1d1108d
  6. 31 Jul 2019, 5 commits
  7. 30 Jul 2019, 2 commits
    • WriteUnPrepared: Use WriteUnpreparedTxnReadCallback for MultiGet (#5634) · 399f4778
      Manuel Ung committed
      Summary:
      The `TransactionTest.MultiGetBatchedTest` tests were failing with unprepared batches because we were not using the correct callbacks. Override MultiGet to pass down the correct ReadCallback. A similar problem is also fixed in WritePrepared.
      
      This PR also fixes an issue similar to (https://github.com/facebook/rocksdb/pull/5147), but for MultiGet instead of Get.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5634
      
      Differential Revision: D16552674
      
      Pulled By: lth
      
      fbshipit-source-id: 736eaf8e919c6b13d5f5655b1c0d36b57ad04804
    • Cache simulator: Optimize hybrid row-block cache. (#5616) · e648c1d9
      haoyuhuang committed
      Summary:
      This PR optimizes the hybrid row-block cache simulator. If a Get request hits the cache, we treat all its future accesses as hits.
      
      Consider a Get request (no snapshot) that accesses multiple files, e.g., file1, file2, file3. We construct the row key as "fdnumber_key_0". Before this PR, if the request hit the cache when searching the key in file1, we continued to process its accesses in file2 and file3, which is unnecessary.
      
      With this PR, if "file1_key_0" is in the cache, we treat all future accesses of this Get request as hits.
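      A minimal sketch of the row-key construction described above (the helper name is mine):
      ```cpp
      #include <cstdint>
      #include <string>

      // Compose the simulator's row key from the SST file number, the user
      // key, and the Get index, e.g. (1, "key", 0) -> "1_key_0".
      std::string MakeRowKey(uint64_t fd_number, const std::string& user_key,
                             uint64_t get_index) {
        return std::to_string(fd_number) + "_" + user_key + "_" +
               std::to_string(get_index);
      }
      ```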
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5616
      
      Differential Revision: D16453187
      
      Pulled By: HaoyuHuang
      
      fbshipit-source-id: 56f3169cc322322305baaf5543226a0824fae19f
  8. 27 Jul 2019, 7 commits
    • Use int64_t instead of ssize_t (#5638) · 80d7067c
      Manuel Ung committed
      Summary:
      The ssize_t type was introduced in https://github.com/facebook/rocksdb/pull/5633, but it is a POSIX-specific type.
      
      I just need a signed type to represent a number of bytes, so use int64_t instead. We have a typedef from SSIZE_T for Windows, but we never include "port/port.h" in our public header files.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5638
      
      Differential Revision: D16526269
      
      Pulled By: lth
      
      fbshipit-source-id: 8d3a5c41003951b74b29bc5f1d949b2b22da0cee
    • Reduce the number of random iterations in compact_on_deletion_collector_test (#5635) · 3f89af1c
      Levi Tamasi committed
      Summary:
      This test frequently times out under TSAN; reducing the number of random
      iterations to make it complete faster.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5635
      
      Test Plan: buck test mode/dev-tsan internal_repo_rocksdb/repo:compact_on_deletion_collector_test
      
      Differential Revision: D16523505
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 6a69909bce9d204c891150fcb3d536547b3253d0
    • Block cache simulator: Add pysim to simulate caches using reinforcement learning. (#5610) · 70c7302f
      haoyuhuang committed
      Summary:
      This PR implements cache eviction using reinforcement learning. It includes two implementations:
      1. An implementation of Thompson Sampling for the Bernoulli Bandit [1].
      2. An implementation of LinUCB with disjoint linear models [2].
      
      The idea is that a cache uses multiple eviction policies, e.g., MRU, LRU, and LFU. The cache learns which eviction policy is the best and uses it upon a cache miss.
      Thompson Sampling is contextless and does not include any features.
      LinUCB includes features such as level, block type, caller, column family id to decide which eviction policy to use.
      
      [1] Daniel J. Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, and Zheng Wen. 2018. A Tutorial on Thompson Sampling. Found. Trends Mach. Learn. 11, 1 (July 2018), 1-96. DOI: https://doi.org/10.1561/2200000070
      [2] Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. 2010. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th international conference on World wide web (WWW '10). ACM, New York, NY, USA, 661-670. DOI=http://dx.doi.org/10.1145/1772690.1772758
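      A minimal, self-contained C++ sketch of the Thompson Sampling policy selector described above (the pysim code itself is Python; arm indices stand for eviction policies):
      ```cpp
      #include <cstdint>
      #include <random>
      #include <vector>

      // Each arm (eviction policy) keeps a Beta(hits + 1, misses + 1)
      // posterior over its hit rate; pick the arm with the highest sample.
      class ThompsonSampler {
       public:
        explicit ThompsonSampler(size_t num_arms)
            : hits_(num_arms, 0), misses_(num_arms, 0), rng_(0) {}

        size_t SelectArm() {
          size_t best = 0;
          double best_sample = -1.0;
          for (size_t i = 0; i < hits_.size(); ++i) {
            // Sample Beta(a, b) as X / (X + Y), X ~ Gamma(a), Y ~ Gamma(b).
            std::gamma_distribution<double> ga(hits_[i] + 1.0, 1.0);
            std::gamma_distribution<double> gb(misses_[i] + 1.0, 1.0);
            double x = ga(rng_);
            double y = gb(rng_);
            double sample = x / (x + y);
            if (sample > best_sample) {
              best_sample = sample;
              best = i;
            }
          }
          return best;
        }

        void Update(size_t arm, bool cache_hit) {
          if (cache_hit) {
            ++hits_[arm];
          } else {
            ++misses_[arm];
          }
        }

       private:
        std::vector<uint64_t> hits_;
        std::vector<uint64_t> misses_;
        std::mt19937 rng_;
      };
      ```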
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5610
      
      Differential Revision: D16435067
      
      Pulled By: HaoyuHuang
      
      fbshipit-source-id: 6549239ae14115c01cb1e70548af9e46d8dc21bb
    • WriteUnPrepared: Add new variable write_batch_flush_threshold (#5633) · 41df7348
      Manuel Ung committed
      Summary:
      Instead of reusing `TransactionOptions::max_write_batch_size` for determining when to flush a write batch for write unprepared, add a new variable called `write_batch_flush_threshold` for this use case instead.
      
      Also add `TransactionDBOptions::default_write_batch_flush_threshold` which sets the default value if `TransactionOptions::write_batch_flush_threshold` is unspecified.
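      A minimal sketch of setting the new knobs, assuming the field names exactly as introduced by this PR:
      ```cpp
      #include "rocksdb/utilities/transaction_db.h"

      // Flush unprepared batches every 16 KB by default; one transaction
      // overrides the default with its own threshold.
      void ConfigureFlushThresholds(rocksdb::TransactionDBOptions* db_opts,
                                    rocksdb::TransactionOptions* txn_opts) {
        db_opts->default_write_batch_flush_threshold = 16 * 1024;
        txn_opts->write_batch_flush_threshold = 64 * 1024;  // per-txn override
      }
      ```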
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5633
      
      Differential Revision: D16520364
      
      Pulled By: lth
      
      fbshipit-source-id: d75ae5a2141ce7708982d5069dc3f0b58d250e8c
    • Parallelize db_bloom_filter_test (#5632) · 3617287e
      Levi Tamasi committed
      Summary:
      This test frequently times out under TSAN; parallelizing it should fix
      this issue.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5632
      
      Test Plan:
      make check
      buck test mode/dev-tsan internal_repo_rocksdb/repo:db_bloom_filter_test
      
      Differential Revision: D16519399
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 66e05a644d6f79c6d544255ffcf6de195d2d62fe
    • Fix PopSavePoint to merge info into the previous savepoint (#5628) · 230b909d
      Manuel Ung committed
      Summary:
      Transaction::RollbackToSavePoint undoes the modifications made since the savepoint began, and also unlocks the corresponding keys, which are tracked in the last SavePoint. Currently ::PopSavePoint simply discards these tracked keys, leaving them locked in the lock manager. This breaks a subsequent ::RollbackToSavePoint, as it loses track of such keys and thus cannot unlock them. The patch fixes ::PopSavePoint by passing on the tracked key information to the previous SavePoint.
      Fixes https://github.com/facebook/rocksdb/issues/5618
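      A minimal sketch of the sequence that previously leaked a lock (standard Transaction API; status checks elided):
      ```cpp
      #include "rocksdb/utilities/transaction.h"
      #include "rocksdb/utilities/transaction_db.h"

      // Before this fix, the keys tracked by the popped savepoint were
      // discarded, so the final rollback could not unlock "b".
      void PopSavePointDemo(rocksdb::TransactionDB* txn_db) {
        rocksdb::Transaction* txn =
            txn_db->BeginTransaction(rocksdb::WriteOptions());
        txn->SetSavePoint();         // savepoint 1
        txn->Put("a", "1");
        txn->SetSavePoint();         // savepoint 2 tracks "b"
        txn->Put("b", "2");
        txn->PopSavePoint();         // must merge "b" into savepoint 1
        txn->RollbackToSavePoint();  // should unlock both "a" and "b"
        txn->Commit();
        delete txn;
      }
      ```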
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5628
      
      Differential Revision: D16505325
      
      Pulled By: lth
      
      fbshipit-source-id: 2bc3b30963ab4d36d996d1f66543c93abf358980
    • Fix target 'clean' to include parallel test binaries (#5629) · 74782cec
      Yanqin Jin committed
      Summary:
      The current `clean` target in the Makefile does not remove parallel test
      binaries. Fix this.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5629
      
      Test Plan:
      (on devserver)
      Take file_reader_writer_test for instance.
      ```
      $make -j32 file_reader_writer_test
      $make clean
      ```
      Verify that the binary file 'file_reader_writer_test' is deleted by `make clean`.
      
      Differential Revision: D16513176
      
      Pulled By: riversand963
      
      fbshipit-source-id: 70acb9f56c928a494964121b86aacc0090f31ff6
  9. 26 Jul 2019, 3 commits
    • Added SizeApproximationOptions to DB::GetApproximateSizes (#5626) · 9625a2bc
      Eli Pozniansky committed
      Summary:
      The new DB::GetApproximateSizes overload takes a SizeApproximationOptions argument, which allows adding more options/knobs to the DB::GetApproximateSizes call (beyond just the include_flags).
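      A minimal usage sketch; treat the option fields shown as assumptions about the new knobs rather than an exhaustive list (note RocksDB spells the memtable field as written below):
      ```cpp
      #include <cstdint>
      #include "rocksdb/db.h"

      uint64_t ApproxSize(rocksdb::DB* db, const rocksdb::Range& range) {
        rocksdb::SizeApproximationOptions opts;
        opts.include_memtabtles = true;  // sic: the field name in options.h
        opts.include_files = true;
        uint64_t size = 0;
        db->GetApproximateSizes(opts, db->DefaultColumnFamily(), &range, 1,
                                &size);
        return size;
      }
      ```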
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5626
      
      Differential Revision: D16496913
      
      Pulled By: elipoz
      
      fbshipit-source-id: ee8c6c182330a285fa056ecfc3905a592b451720
    • Avoid user key copying for Get/Put/Write with user-timestamp (#5502) · ae152ee6
      Yanqin Jin committed
      Summary:
      In the previous https://github.com/facebook/rocksdb/issues/5079, we added user-specified timestamps to `DB::Get()` and `DB::Put()`. The limitation is that these two functions may cause extra memory allocation and key copying. The reason is that `WriteBatch` does not allocate extra memory for timestamps because it is not aware of the timestamp size, and we did not provide an API to assign/update the timestamp of each key within a `WriteBatch`.
      We address these issues in this PR by doing the following.
      1. Add a `timestamp_size_` to `WriteBatch` so that `WriteBatch` can take timestamps into account when calling `WriteBatch::Put`, `WriteBatch::Delete`, etc.
      2. Add APIs `WriteBatch::AssignTimestamp` and `WriteBatch::AssignTimestamps` so that applications can assign/update timestamps for each key in a `WriteBatch` (see the sketch after this list).
      3. Avoid key copying in `GetImpl` by adding a new constructor to `LookupKey`.
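      A hypothetical sketch of items 1 and 2; the constructor and AssignTimestamp signatures are assumptions for illustration, not confirmed by this PR text:
      ```cpp
      #include <string>
      #include "rocksdb/db.h"
      #include "rocksdb/write_batch.h"

      void TimestampedWrite(rocksdb::DB* db) {
        const size_t kTsSize = 8;  // assumed fixed-size 64-bit timestamp
        // Assumed third constructor argument reserving timestamp space.
        rocksdb::WriteBatch batch(/*reserved_bytes=*/0, /*max_bytes=*/0,
                                  /*ts_sz=*/kTsSize);
        batch.Put("key", "value");
        // Assign the timestamp to every key in the batch before writing.
        std::string ts(kTsSize, '\0');
        batch.AssignTimestamp(ts);
        db->Write(rocksdb::WriteOptions(), &batch);
      }
      ```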
      
      Test plan (on devserver):
      ```
      $make clean && COMPILE_WITH_ASAN=1 make -j32 all
      $./db_basic_test --gtest_filter=Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/*
      $make check
      ```
      If the API extension looks good, I will add more unit tests.
      
      Some simple benchmark using db_bench.
      ```
      $rm -rf /dev/shm/dbbench/* && TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillseq,readrandom -num=1000000
      $rm -rf /dev/shm/dbbench/* && TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -num=1000000 -disable_wal=true
      ```
      Master is at a78503bd.
      ```
      |        | readrandom | fillrandom |
      | master | 15.53 MB/s | 25.97 MB/s |
      | PR5502 | 16.70 MB/s | 25.80 MB/s |
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5502
      
      Differential Revision: D16340894
      
      Pulled By: riversand963
      
      fbshipit-source-id: 51132cf792be07d1efc3ac33f5768c4ee2608bb8
    • rocksdb: build on macosx · 0d16fad5
      Chad Austin committed
      Summary:
      Make rocksdb build on macOS:
      1) Reorganize OS-specific flags and deps in rocksdb/src/TARGETS.
      2) Sandbox fbcode Apple platform builds from the repo root include path (which conflicts
          with the layout of rocksdb headers).
      3) Fix dep-translation for bzip2.
      
      Reviewed By: andrewjcg
      
      Differential Revision: D15125826
      
      fbshipit-source-id: 8e143c689b88b5727e54881a5e80500f879a320b
  10. 25 Jul 2019, 4 commits
    • Declare snapshot refresh incompatible with delete range (#5625) · d9dc6b46
      Maysam Yabandeh committed
      Summary:
      The ::snap_refresh_nanos option is incompatible with the DeleteRange feature. Currently the code relies on range_del_agg.IsEmpty() to disable it if there are range delete tombstones. However, ::IsEmpty does not guarantee that there are no RangeDelete tombstones in the SST files. The patch declares the two features incompatible in inline comments until we later figure out how to properly detect the presence of RangeDelete tombstones in compaction inputs.
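      A minimal sketch of the interim guidance implied above, assuming the option name as written (0 disables snapshot refreshing):
      ```cpp
      #include "rocksdb/options.h"

      // Until the incompatibility is resolved, DBs that use DeleteRange
      // should keep snapshot refreshing disabled.
      void DisableSnapRefresh(rocksdb::ColumnFamilyOptions* cf_opts) {
        cf_opts->snap_refresh_nanos = 0;
      }
      ```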
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5625
      
      Differential Revision: D16468218
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: bd7beca278bc7e1db75e7ee4522d05a3a6ca86f4
    • Auto Roll Logger to add some extra checking to avoid segfault. (#5623) · 7260347f
      sdong committed
      Summary:
      AutoRollLogger sets GetStatus() to be non-OK if the log file fails to be created, and logger_ is set to null. It is left to the caller to check the status before calling functions of this class. There is no harm in adding another null check on logger_ before using it, so that if users misuse the logger, they don't get a segfault.
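      A minimal caller-side sketch, assuming the usual CreateLoggerFromOptions entry point:
      ```cpp
      #include <memory>
      #include <string>
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"

      std::shared_ptr<rocksdb::Logger> MakeLogger(
          const std::string& dbname, const rocksdb::Options& options) {
        std::shared_ptr<rocksdb::Logger> logger;
        rocksdb::Status s =
            rocksdb::CreateLoggerFromOptions(dbname, options, &logger);
        if (!s.ok()) {
          return nullptr;  // e.g. the log directory could not be created
        }
        return logger;
      }
      ```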
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5623
      
      Test Plan: Run all existing tests.
      
      Differential Revision: D16466251
      
      fbshipit-source-id: 262b885eec28bf741d91e9191c3cb5ff964e1bce
    • Fix regression bug of Auto rolling logger when handling failures (#5622) · 5daa426a
      sdong committed
      Summary:
      The auto roll logger fails to handle file creation errors in the correct way, which may expose users to a segfault condition. Fix it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5622
      
      Test Plan: Add a unit test on creating file under a non-existing directory. The test fails without the fix.
      
      Differential Revision: D16460853
      
      fbshipit-source-id: e96da4bef4f16db171ea04a11b2ec5a9448ddbde
    • Simplify WriteUnpreparedTxnReadCallback and fix some comments (#5621) · 66b524a9
      Manuel Ung committed
      Summary:
      Simplify WriteUnpreparedTxnReadCallback so we just have one function, `CalcMaxVisibleSeq`. Also, there's no need for the read callback to hold onto the transaction any more, so just hold the set of unprep_seqs, reducing the amount of indirection in `IsVisibleFullCheck`.
      
      Also, some comments about using the transaction snapshot were out of date, so remove them.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5621
      
      Differential Revision: D16459883
      
      Pulled By: lth
      
      fbshipit-source-id: cd581323fd18982e817d99af57b6eaba59e599bb
  11. 24 Jul 2019, 6 commits
    • Fix wrong info log printing for num_range_deletions (#5617) · f5b951f7
      sdong committed
      Summary:
      num_range_deletions printing is wrong in this log line:
      
      2019/07/18-12:59:15.309271 7f869f9ff700 EVENT_LOG_v1 {"time_micros": 1563479955309228, "cf_name": "5", "job": 955, "event": "table_file_creation", "file_number": 34579, "file_size": 2239842, "table_properties": {"data_size": 1988792, "index_size": 3067, "index_partitions": 0, "top_level_index_size": 0, "index_key_is_user_key": 0, "index_value_is_delta_encoded": 1, "filter_size": 170821, "raw_key_size": 1951792, "raw_average_key_size": 16, "raw_value_size": 1731720, "raw_average_value_size": 14, "num_data_blocks": 199, "num_entries": 121987, "num_deletions": 15184, "num_merge_operands": 86512, "num_range_deletions": 86512, "format_version": 0, "fixed_key_len": 0, "filter_policy": "rocksdb.BuiltinBloomFilter", "column_family_name": "5", "column_family_id": 5, "comparator": "leveldb.BytewiseComparator", "merge_operator": "PutOperator", "prefix_extractor_name": "rocksdb.FixedPrefix.7", "property_collectors": "[]", "compression": "ZSTD", "compression_options": "window_bits=-14; level=32767; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; ", "creation_time": 1563479951, "oldest_key_time": 0, "file_creation_time": 1563479954}}
      
      It actually prints the "num_merge_operands" value. Fix it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5617
      
      Test Plan: Just build.
      
      Differential Revision: D16453110
      
      fbshipit-source-id: fc1024b3cd5650312ed47a1379f0d2cf8b2d8a8f
    • The ObjectRegistry class replaces the Registrar and NewCustomObjects.… (#5293) · cfcf045a
      Mark Rambacher committed
      Summary:
      The ObjectRegistry class replaces the Registrar and NewCustomObjects.  Objects are registered with the registry by Type (the class must implement the static const char *Type() method).
      
      This change is necessary for a few reasons:
      - By having a class (rather than static template instances), the class can be passed between compilation units, meaning that objects could be registered and shared from a dynamic library with an executable.
      - By having a class with instances, different units could have different objects registered.  This could be useful if, for example, one Option allowed for a dynamic library and one did not.
      
      When combined with some other PRs (being able to load shared libraries, a Configurable interface to configure objects to/from string), this code will allow objects in external shared libraries to be added to a RocksDB image at run-time, rather than requiring every new extension to be built into the main library and called explicitly by every program.
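      A hypothetical sketch of the Type-based registration pattern; only the static Type() requirement comes from this PR, and the registration call itself is an assumption shown as a comment:
      ```cpp
      #include "rocksdb/env.h"

      // A registrable class must expose a static Type() naming its kind.
      class MyCustomEnv : public rocksdb::EnvWrapper {
       public:
        explicit MyCustomEnv(rocksdb::Env* base) : rocksdb::EnvWrapper(base) {}
        static const char* Type() { return "Env"; }
      };

      // Hypothetical registration (method name and factory signature are
      // assumptions, for illustration only):
      // registry->Register<rocksdb::Env>("my_custom_env", [](...) {
      //   return new MyCustomEnv(rocksdb::Env::Default());
      // });
      ```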
      
      Test plan (on riversand963's  devserver)
      ```
      $COMPILE_WITH_ASAN=1 make -j32 all && sleep 1 && make check
      ```
      All tests pass.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5293
      
      Differential Revision: D16363396
      
      Pulled By: riversand963
      
      fbshipit-source-id: fbe4acb615bfc11103eef40a0b288845791c0180
    • Move the uncompression dictionary object out of the block cache (#5584) · 092f4170
      Levi Tamasi committed
      Summary:
      RocksDB has historically stored uncompression dictionary objects in the block
      cache, as opposed to storing just the block contents. This necessitated
      evicting the object upon table close. With the new code, only the raw blocks
      are stored in the cache, eliminating the need for eviction.
      
      In addition, the patch makes the following improvements:
      
      1) Compression dictionary blocks are now prefetched/pinned similarly to
      index/filter blocks.
      2) A copy operation got eliminated when the uncompression dictionary is
      retrieved.
      3) Errors related to retrieving the uncompression dictionary are propagated as
      opposed to silently ignored.
      
      Note: the patch temporarily breaks the compression dictionary eviction stats.
      They will be fixed in a separate phase.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5584
      
      Test Plan: make asan_check
      
      Differential Revision: D16344151
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 2962b295f5b19628f9da88a3fcebbce5a5017a7b
    • Improve CPU Efficiency of ApproximateSize (part 1) (#5613) · 6b7fcc0d
      Eli Pozniansky committed
      Summary:
      1. Avoid creating an iterator just to call BlockBasedTable::ApproximateOffsetOf(); instead, call into it directly.
      2. Optimize BlockBasedTable::ApproximateOffsetOf() to keep the index block iterator on the stack.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5613
      
      Differential Revision: D16442660
      
      Pulled By: elipoz
      
      fbshipit-source-id: 9320be3e918c139b10e758cbbb684706d172e516
    • ldb sometimes specify a string-append merge operator (#5607) · 3782accf
      sdong committed
      Summary:
      Right now, ldb cannot scan a DB with merge operands using the default ldb. There is no harm in providing a general merge operator so that it can at least print out something.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5607
      
      Test Plan: Run ldb against a DB with merge operands and see the outputs.
      
      Differential Revision: D16442634
      
      fbshipit-source-id: c66c414ec07f219cfc6e6ec2cc14c783ee95df54
    • Parallelize file_reader_writer_test in order to reduce timeouts · 112702ac
      anand76 committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5608
      
      Test Plan:
      make check
      buck test mode/dev-tsan internal_repo_rocksdb/repo:file_reader_writer_test -- --run-disabled
      
      Differential Revision: D16441796
      
      Pulled By: anand1976
      
      fbshipit-source-id: afbb88a9fcb1c0ba22215118767e8eab3d1d6a4a
  12. 23 Jul 2019, 1 commit
    • WriteUnPrepared: improve read your own write functionality (#5573) · eae83274
      Manuel Ung committed
      Summary:
      There are a number of fixes in this PR (with most bugs found via the added stress tests):
      1. Re-enable reseek optimization. This was initially disabled to avoid infinite loops in https://github.com/facebook/rocksdb/pull/3955 but this can be resolved by remembering not to reseek after a reseek has already been done. This problem only affects forward iteration in `DBIter::FindNextUserEntryInternal`, as we already disable reseeking in `DBIter::FindValueForCurrentKeyUsingSeek`.
      2. Verify that ReadOption.snapshot can be safely used for iterator creation. Some snapshots would not give correct results because snapshot validation would not be enforced, breaking some assumptions in Prev() iteration.
      3. In the non-snapshot Get() case, reads done at `LastPublishedSequence` may not be enough, because unprepared sequence numbers are not published. Use `std::max(published_seq, max_visible_seq)` to do lookups instead.
      4. Add stress test to test reading own writes.
      5. Minor bug in the allow_concurrent_memtable_write case where we forgot to pass in batch_per_txn_.
      6. Minor performance optimization in `CalcMaxUnpreparedSequenceNumber` by assigning by reference instead of by value.
      7. Add some more comments everywhere.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5573
      
      Differential Revision: D16276089
      
      Pulled By: lth
      
      fbshipit-source-id: 18029c944eb427a90a87dee76ac1b23f37ec1ccb