1. Jul 27, 2019 (7 commits)
• Use int64_t instead of ssize_t (#5638) · 80d7067c
  Manuel Ung authored
      Summary:
The ssize_t type was introduced in https://github.com/facebook/rocksdb/pull/5633, but it appears to be a POSIX-specific type.

I just need a signed type to represent a number of bytes, so use int64_t instead. We have a typedef from SSIZE_T for Windows, but we don't seem to ever include "port/port.h" in our public header files.
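For illustration only (not code from the patch), the portability point in miniature:

```
#include <cstdint>

// ssize_t comes from the POSIX header <sys/types.h> and is not defined by
// MSVC; a fixed-width signed type from <cstdint> is portable everywhere.
// Hypothetical helper: the result may legitimately be negative.
int64_t RemainingBytes(int64_t file_size, int64_t offset) {
  return file_size - offset;  // int64_t keeps the sign on all platforms
}
```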
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5638
      
      Differential Revision: D16526269
      
      Pulled By: lth
      
      fbshipit-source-id: 8d3a5c41003951b74b29bc5f1d949b2b22da0cee
• Reduce the number of random iterations in compact_on_deletion_collector_test (#5635) · 3f89af1c
  Levi Tamasi authored
      Summary:
This test frequently times out under TSAN; reduce the number of random iterations so that it completes faster.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5635
      
      Test Plan: buck test mode/dev-tsan internal_repo_rocksdb/repo:compact_on_deletion_collector_test
      
      Differential Revision: D16523505
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 6a69909bce9d204c891150fcb3d536547b3253d0
• Block cache simulator: Add pysim to simulate caches using reinforcement learning. (#5610) · 70c7302f
  haoyuhuang authored
      Summary:
      This PR implements cache eviction using reinforcement learning. It includes two implementations:
      1. An implementation of Thompson Sampling for the Bernoulli Bandit [1].
      2. An implementation of LinUCB with disjoint linear models [2].
      
      The idea is that a cache uses multiple eviction policies, e.g., MRU, LRU, and LFU. The cache learns which eviction policy is the best and uses it upon a cache miss.
      Thompson Sampling is contextless and does not include any features.
LinUCB includes features such as level, block type, caller, and column family ID to decide which eviction policy to use.
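Not part of the patch (the PR's simulator is implemented in Python); a minimal C++ sketch of the Thompson Sampling selector described above, with hypothetical names:

```
#include <cstddef>
#include <random>
#include <vector>

// Thompson Sampling over eviction policies with a Bernoulli reward
// (1 on cache hit, 0 on miss). Each policy keeps a Beta(alpha, beta)
// posterior over its hit rate.
class ThompsonSamplingSelector {
 public:
  explicit ThompsonSamplingSelector(size_t num_policies)
      : alpha_(num_policies, 1.0), beta_(num_policies, 1.0) {}

  // Sample a Beta(alpha, beta) estimate per policy and pick the best one.
  size_t SelectPolicy() {
    size_t best = 0;
    double best_sample = -1.0;
    for (size_t i = 0; i < alpha_.size(); i++) {
      double x = std::gamma_distribution<double>(alpha_[i], 1.0)(rng_);
      double y = std::gamma_distribution<double>(beta_[i], 1.0)(rng_);
      double sample = x / (x + y);  // Beta sample via two Gamma draws
      if (sample > best_sample) {
        best_sample = sample;
        best = i;
      }
    }
    return best;
  }

  // Update the posterior of the chosen policy with the observed reward.
  void Observe(size_t policy, bool cache_hit) {
    if (cache_hit) {
      alpha_[policy] += 1.0;
    } else {
      beta_[policy] += 1.0;
    }
  }

 private:
  std::vector<double> alpha_, beta_;
  std::mt19937 rng_{42};
};
```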
      
      [1] Daniel J. Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, and Zheng Wen. 2018. A Tutorial on Thompson Sampling. Found. Trends Mach. Learn. 11, 1 (July 2018), 1-96. DOI: https://doi.org/10.1561/2200000070
      [2] Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. 2010. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th international conference on World wide web (WWW '10). ACM, New York, NY, USA, 661-670. DOI=http://dx.doi.org/10.1145/1772690.1772758
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5610
      
      Differential Revision: D16435067
      
      Pulled By: HaoyuHuang
      
      fbshipit-source-id: 6549239ae14115c01cb1e70548af9e46d8dc21bb
• WriteUnPrepared: Add new variable write_batch_flush_threshold (#5633) · 41df7348
  Manuel Ung authored
      Summary:
Instead of reusing `TransactionOptions::max_write_batch_size` for determining when to flush a write batch for write unprepared, add a new variable called `write_batch_flush_threshold` for this use case.

Also add `TransactionDBOptions::default_write_batch_flush_threshold`, which sets the default value if `TransactionOptions::write_batch_flush_threshold` is unspecified.
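A hedged usage sketch based on the option names in the summary; the chosen values and the exact header are assumptions:

```
#include "rocksdb/utilities/transaction_db.h"

// Sketch only: the option names come from this PR's summary; the values
// here are illustrative assumptions.
void ConfigureFlushThresholds(rocksdb::TransactionDBOptions* txn_db_options,
                              rocksdb::TransactionOptions* txn_options) {
  // DB-wide default, used when a transaction does not set its own threshold.
  txn_db_options->default_write_batch_flush_threshold = 64 * 1024;
  // Per-transaction override: flush the write batch once it exceeds 16 KB.
  txn_options->write_batch_flush_threshold = 16 * 1024;
}
```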
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5633
      
      Differential Revision: D16520364
      
      Pulled By: lth
      
      fbshipit-source-id: d75ae5a2141ce7708982d5069dc3f0b58d250e8c
• Parallelize db_bloom_filter_test (#5632) · 3617287e
  Levi Tamasi authored
      Summary:
      This test frequently times out under TSAN; parallelizing it should fix
      this issue.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5632
      
      Test Plan:
      make check
      buck test mode/dev-tsan internal_repo_rocksdb/repo:db_bloom_filter_test
      
      Differential Revision: D16519399
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 66e05a644d6f79c6d544255ffcf6de195d2d62fe
• Fix PopSavePoint to merge info into the previous savepoint (#5628) · 230b909d
  Manuel Ung authored
      Summary:
Transaction::RollbackToSavePoint undoes the modifications made since the savepoint began, and also unlocks the corresponding keys, which are tracked in the last SavePoint. Currently, ::PopSavePoint simply discards these tracked keys, leaving them locked in the lock manager. This breaks subsequent ::RollbackToSavePoint behavior, as the transaction loses track of such keys and thus cannot unlock them. The patch fixes ::PopSavePoint by passing the tracked key information on to the previous SavePoint.
      Fixes https://github.com/facebook/rocksdb/issues/5618
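A sketch of the scenario the patch fixes, using the existing Transaction savepoint APIs (status checks omitted for brevity):

```
#include "rocksdb/utilities/transaction.h"

void SavePointScenario(rocksdb::Transaction* txn) {
  txn->SetSavePoint();         // savepoint 1
  txn->Put("a", "va");         // "a" locked and tracked by savepoint 1
  txn->SetSavePoint();         // savepoint 2
  txn->Put("b", "vb");         // "b" locked and tracked by savepoint 2
  txn->PopSavePoint();         // with the fix: "b"'s tracked-key info is
                               // merged into savepoint 1 instead of discarded
  txn->RollbackToSavePoint();  // undoes both puts and unlocks "a" and "b"
}
```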
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5628
      
      Differential Revision: D16505325
      
      Pulled By: lth
      
      fbshipit-source-id: 2bc3b30963ab4d36d996d1f66543c93abf358980
• Fix target 'clean' to include parallel test binaries (#5629) · 74782cec
  Yanqin Jin authored
      Summary:
The current `clean` target in the Makefile does not remove parallel test binaries. Fix this.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5629
      
      Test Plan:
      (on devserver)
      Take file_reader_writer_test for instance.
      ```
      $make -j32 file_reader_writer_test
      $make clean
      ```
Verify that the binary file 'file_reader_writer_test' is deleted by `make clean`.
      
      Differential Revision: D16513176
      
      Pulled By: riversand963
      
      fbshipit-source-id: 70acb9f56c928a494964121b86aacc0090f31ff6
2. Jul 26, 2019 (3 commits)
• Added SizeApproximationOptions to DB::GetApproximateSizes (#5626) · 9625a2bc
  Eli Pozniansky authored
      Summary:
Add a new DB::GetApproximateSizes overload with a SizeApproximationOptions argument, which allows adding more options/knobs to the DB::GetApproximateSizes call (beyond just the include_flags).
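A hedged sketch of the extended call; the option field names shown beyond the include_flags mentioned above are assumptions:

```
#include "rocksdb/db.h"

// Hedged sketch of the new overload; the exact fields on
// SizeApproximationOptions are assumptions.
uint64_t ApproxRangeSize(rocksdb::DB* db, const rocksdb::Slice& start,
                         const rocksdb::Slice& end) {
  rocksdb::SizeApproximationOptions approx_options;
  approx_options.include_files = true;      // assumed knob
  approx_options.include_memtables = true;  // assumed knob
  rocksdb::Range range(start, end);
  uint64_t size = 0;
  db->GetApproximateSizes(approx_options, db->DefaultColumnFamily(), &range,
                          1 /* number of ranges */, &size);
  return size;
}
```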
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5626
      
      Differential Revision: D16496913
      
      Pulled By: elipoz
      
      fbshipit-source-id: ee8c6c182330a285fa056ecfc3905a592b451720
• Avoid user key copying for Get/Put/Write with user-timestamp (#5502) · ae152ee6
  Yanqin Jin authored
      Summary:
In the earlier https://github.com/facebook/rocksdb/issues/5079, we added user-specified timestamps to `DB::Get()` and `DB::Put()`. The limitation is that these two functions may cause extra memory allocations and key copies. The reason is that `WriteBatch` does not allocate extra memory for timestamps, because it is not aware of the timestamp size, and we did not provide an API to assign/update the timestamp of each key within a `WriteBatch`.
      We address these issues in this PR by doing the following.
      1. Add a `timestamp_size_` to `WriteBatch` so that `WriteBatch` can take timestamps into account when calling `WriteBatch::Put`, `WriteBatch::Delete`, etc.
      2. Add APIs `WriteBatch::AssignTimestamp` and `WriteBatch::AssignTimestamps` so that application can assign/update timestamps for each key in a `WriteBatch`.
3. Avoid key copy in `GetImpl` by adding a new constructor to `LookupKey`.
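A hedged sketch of the new flow; the three-argument constructor shape and the inline timestamp encoding are assumptions beyond what is stated above:

```
#include <cstdint>
#include <string>

#include "rocksdb/db.h"
#include "rocksdb/write_batch.h"

// Hedged sketch: reserve space for 8-byte timestamps, then stamp the whole
// batch once before writing. The constructor shape is an assumption.
rocksdb::Status WriteWithTimestamp(rocksdb::DB* db,
                                   const rocksdb::WriteOptions& wo) {
  rocksdb::WriteBatch batch(0 /* reserved_bytes */, 0 /* max_bytes */,
                            8 /* timestamp size */);
  batch.Put("k1", "v1");  // space for an 8-byte timestamp is reserved per key
  batch.Put("k2", "v2");
  uint64_t ts_val = 100;  // naive fixed-width encoding, for illustration only
  std::string ts(reinterpret_cast<const char*>(&ts_val), sizeof(ts_val));
  batch.AssignTimestamp(ts);  // stamps every key in the batch
  return db->Write(wo, &batch);
}
```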
      
      Test plan (on devserver):
      ```
      $make clean && COMPILE_WITH_ASAN=1 make -j32 all
      $./db_basic_test --gtest_filter=Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/*
      $make check
      ```
      If the API extension looks good, I will add more unit tests.
      
      Some simple benchmark using db_bench.
      ```
      $rm -rf /dev/shm/dbbench/* && TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillseq,readrandom -num=1000000
      $rm -rf /dev/shm/dbbench/* && TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -num=1000000 -disable_wal=true
      ```
      Master is at a78503bd.
      ```
      |        | readrandom | fillrandom |
      | master | 15.53 MB/s | 25.97 MB/s |
      | PR5502 | 16.70 MB/s | 25.80 MB/s |
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5502
      
      Differential Revision: D16340894
      
      Pulled By: riversand963
      
      fbshipit-source-id: 51132cf792be07d1efc3ac33f5768c4ee2608bb8
• rocksdb: build on macosx · 0d16fad5
  Chad Austin authored
      Summary:
Make RocksDB build on macOS:
      1) Reorganize OS-specific flags and deps in rocksdb/src/TARGETS
      2) Sandbox fbcode apple platform builds from repo root include path (which conflicts
          with layout of rocksdb headers).
      3) Fix dep-translation for bzip2.
      
      Reviewed By: andrewjcg
      
      Differential Revision: D15125826
      
      fbshipit-source-id: 8e143c689b88b5727e54881a5e80500f879a320b
3. Jul 25, 2019 (4 commits)
• Declare snapshot refresh incompatible with delete range (#5625) · d9dc6b46
  Maysam Yabandeh authored
      Summary:
The ::snap_refresh_nanos option is incompatible with the DeleteRange feature. Currently the code relies on range_del_agg.IsEmpty() to disable it if there are range delete tombstones. However, ::IsEmpty does not guarantee that there are no RangeDelete tombstones in the SST files. The patch declares the two features incompatible in inline comments until we later figure out how to properly detect the presence of RangeDelete tombstones in compaction inputs.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5625
      
      Differential Revision: D16468218
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: bd7beca278bc7e1db75e7ee4522d05a3a6ca86f4
• Auto Roll Logger to add some extra checking to avoid segfault. (#5623) · 7260347f
  sdong authored
      Summary:
AutoRollLogger sets GetStatus() to non-OK if the log file fails to be created, and logger_ is set to null. It is left to the caller to check the status before calling functions of this class. There is no harm in adding another null check on logger_ before using it, so that if users misuse the logger they don't get a segfault.
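A minimal sketch of the defensive pattern described above; this is a hypothetical simplification, not the actual AutoRollLogger code:

```
#include <cstdarg>
#include <memory>

#include "rocksdb/env.h"

// Hypothetical simplification: drop the message instead of dereferencing a
// null logger_ after a failed log file creation.
class GuardedLogger : public rocksdb::Logger {
 public:
  void Logv(const char* format, va_list ap) override {
    if (logger_ == nullptr) {
      return;  // creation failed earlier; GetStatus() is already non-OK
    }
    logger_->Logv(format, ap);
  }

 private:
  std::shared_ptr<rocksdb::Logger> logger_;  // null if creation failed
};
```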
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5623
      
      Test Plan: Run all existing tests.
      
      Differential Revision: D16466251
      
      fbshipit-source-id: 262b885eec28bf741d91e9191c3cb5ff964e1bce
• Fix regression bug of Auto rolling logger when handling failures (#5622) · 5daa426a
  sdong authored
      Summary:
The auto roll logger fails to handle file creation errors correctly, which may expose users to a segfault. Fix it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5622
      
      Test Plan: Add a unit test on creating file under a non-existing directory. The test fails without the fix.
      
      Differential Revision: D16460853
      
      fbshipit-source-id: e96da4bef4f16db171ea04a11b2ec5a9448ddbde
• Simplify WriteUnpreparedTxnReadCallback and fix some comments (#5621) · 66b524a9
  Manuel Ung authored
      Summary:
Simplify WriteUnpreparedTxnReadCallback so we just have one function, `CalcMaxVisibleSeq`. Also, there's no need for the read callback to hold onto the transaction any more, so just hold the set of unprep_seqs, reducing the amount of indirection in `IsVisibleFullCheck`.
      
      Also, some comments about using transaction snapshot were out of date, so remove them.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5621
      
      Differential Revision: D16459883
      
      Pulled By: lth
      
      fbshipit-source-id: cd581323fd18982e817d99af57b6eaba59e599bb
4. Jul 24, 2019 (6 commits)
• Fix wrong info log printing for num_range_deletions (#5617) · f5b951f7
  sdong authored
      Summary:
      num_range_deletions printing is wrong in this log line:
      
      2019/07/18-12:59:15.309271 7f869f9ff700 EVENT_LOG_v1 {"time_micros": 1563479955309228, "cf_name": "5", "job": 955, "event": "table_file_creation", "file_number": 34579, "file_size": 2239842, "table_properties": {"data_size": 1988792, "index_size": 3067, "index_partitions": 0, "top_level_index_size": 0, "index_key_is_user_key": 0, "index_value_is_delta_encoded": 1, "filter_size": 170821, "raw_key_size": 1951792, "raw_average_key_size": 16, "raw_value_size": 1731720, "raw_average_value_size": 14, "num_data_blocks": 199, "num_entries": 121987, "num_deletions": 15184, "num_merge_operands": 86512, "num_range_deletions": 86512, "format_version": 0, "fixed_key_len": 0, "filter_policy": "rocksdb.BuiltinBloomFilter", "column_family_name": "5", "column_family_id": 5, "comparator": "leveldb.BytewiseComparator", "merge_operator": "PutOperator", "prefix_extractor_name": "rocksdb.FixedPrefix.7", "property_collectors": "[]", "compression": "ZSTD", "compression_options": "window_bits=-14; level=32767; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; ", "creation_time": 1563479951, "oldest_key_time": 0, "file_creation_time": 1563479954}}
      
It actually prints the "num_merge_operands" value. Fix it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5617
      
      Test Plan: Just build.
      
      Differential Revision: D16453110
      
      fbshipit-source-id: fc1024b3cd5650312ed47a1379f0d2cf8b2d8a8f
• The ObjectRegistry class replaces the Registrar and NewCustomObjects.… (#5293) · cfcf045a
  Mark Rambacher authored
      Summary:
      The ObjectRegistry class replaces the Registrar and NewCustomObjects.  Objects are registered with the registry by Type (the class must implement the static const char *Type() method).
      
      This change is necessary for a few reasons:
      - By having a class (rather than static template instances), the class can be passed between compilation units, meaning that objects could be registered and shared from a dynamic library with an executable.
      - By having a class with instances, different units could have different objects registered.  This could be useful if, for example, one Option allowed for a dynamic library and one did not.
      
      When combined with some other PRs (being able to load shared libraries, a Configurable interface to configure objects to/from string), this code will allow objects in external shared libraries to be added to a RocksDB image at run-time, rather than requiring every new extension to be built into the main library and called explicitly by every program.
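As an illustration, a minimal sketch of the registration pattern described above; this is not the actual RocksDB ObjectRegistry API, and all names here are hypothetical:

```
#include <functional>
#include <map>
#include <memory>
#include <string>

// Objects register factories keyed by their static Type() string plus an
// instance name; because this is an ordinary class instance, it can be
// shared across compilation units (e.g., with a dynamic library).
class ObjectRegistryLike {
 public:
  template <typename T>
  void Register(const std::string& name, std::function<T*()> factory) {
    factories_[std::string(T::Type()) + ":" + name] = [factory]() -> void* {
      return factory();
    };
  }

  template <typename T>
  T* NewObject(const std::string& name) {
    auto it = factories_.find(std::string(T::Type()) + ":" + name);
    return it == factories_.end() ? nullptr : static_cast<T*>(it->second());
  }

 private:
  std::map<std::string, std::function<void*()>> factories_;
};
```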
      
      Test plan (on riversand963's  devserver)
      ```
      $COMPILE_WITH_ASAN=1 make -j32 all && sleep 1 && make check
      ```
      All tests pass.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5293
      
      Differential Revision: D16363396
      
      Pulled By: riversand963
      
      fbshipit-source-id: fbe4acb615bfc11103eef40a0b288845791c0180
• Move the uncompression dictionary object out of the block cache (#5584) · 092f4170
  Levi Tamasi authored
      Summary:
      RocksDB has historically stored uncompression dictionary objects in the block
cache as opposed to storing just the block contents. This necessitated
      evicting the object upon table close. With the new code, only the raw blocks
      are stored in the cache, eliminating the need for eviction.
      
      In addition, the patch makes the following improvements:
      
      1) Compression dictionary blocks are now prefetched/pinned similarly to
      index/filter blocks.
      2) A copy operation got eliminated when the uncompression dictionary is
      retrieved.
      3) Errors related to retrieving the uncompression dictionary are propagated as
      opposed to silently ignored.
      
Note: the patch temporarily breaks the compression dictionary eviction stats.
      They will be fixed in a separate phase.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5584
      
      Test Plan: make asan_check
      
      Differential Revision: D16344151
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 2962b295f5b19628f9da88a3fcebbce5a5017a7b
• Improve CPU Efficiency of ApproximateSize (part 1) (#5613) · 6b7fcc0d
  Eli Pozniansky authored
      Summary:
      1. Avoid creating the iterator in order to call BlockBasedTable::ApproximateOffsetOf(). Instead, directly call into it.
2. Optimize BlockBasedTable::ApproximateOffsetOf() to keep the index block iterator on the stack.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5613
      
      Differential Revision: D16442660
      
      Pulled By: elipoz
      
      fbshipit-source-id: 9320be3e918c139b10e758cbbb684706d172e516
• ldb sometimes specify a string-append merge operator (#5607) · 3782accf
  sdong authored
      Summary:
Right now, ldb cannot scan a DB with merge operands using the default settings. There is no harm in specifying a general merge operator so that it can at least print out something.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5607
      
      Test Plan: Run ldb against a DB with merge operands and see the outputs.
      
      Differential Revision: D16442634
      
      fbshipit-source-id: c66c414ec07f219cfc6e6ec2cc14c783ee95df54
• Parallelize file_reader_writer_test in order to reduce timeouts · 112702ac
  anand76 authored
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5608
      
      Test Plan:
      make check
      buck test mode/dev-tsan internal_repo_rocksdb/repo:file_reader_writer_test -- --run-disabled
      
      Differential Revision: D16441796
      
      Pulled By: anand1976
      
      fbshipit-source-id: afbb88a9fcb1c0ba22215118767e8eab3d1d6a4a
5. Jul 23, 2019 (5 commits)
• WriteUnPrepared: improve read your own write functionality (#5573) · eae83274
  Manuel Ung authored
      Summary:
      There are a number of fixes in this PR (with most bugs found via the added stress tests):
      1. Re-enable reseek optimization. This was initially disabled to avoid infinite loops in https://github.com/facebook/rocksdb/pull/3955 but this can be resolved by remembering not to reseek after a reseek has already been done. This problem only affects forward iteration in `DBIter::FindNextUserEntryInternal`, as we already disable reseeking in `DBIter::FindValueForCurrentKeyUsingSeek`.
2. Verify that ReadOption.snapshot can be safely used for iterator creation. Some snapshots would not give correct results because snapshot validation would not be enforced, breaking some assumptions in Prev() iteration.
      3. In the non-snapshot Get() case, reads done at `LastPublishedSequence` may not be enough, because unprepared sequence numbers are not published. Use `std::max(published_seq, max_visible_seq)` to do lookups instead.
      4. Add stress test to test reading own writes.
      5. Minor bug in the allow_concurrent_memtable_write case where we forgot to pass in batch_per_txn_.
      6. Minor performance optimization in `CalcMaxUnpreparedSequenceNumber` by assigning by reference instead of value.
      7. Add some more comments everywhere.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5573
      
      Differential Revision: D16276089
      
      Pulled By: lth
      
      fbshipit-source-id: 18029c944eb427a90a87dee76ac1b23f37ec1ccb
• Disable refresh snapshot feature by default (#5606) · 327c4807
  Maysam Yabandeh authored
      Summary:
      There are concerns about the correctness of this patch. Disabling by default until the concerns are resolved.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5606
      
      Differential Revision: D16428064
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: a89280f0ea85796c9c9dfbfd9a8e91dad9b000b3
• row_cache to share entry for recent snapshots (#5600) · 66b5613d
  sdong authored
      Summary:
Right now, users cannot take advantage of the row cache unless no snapshot is used, or Get() is repeated for the same snapshots. This limits the usefulness of the row cache.
This change eliminates this restriction in some cases: if the snapshot used is newer than the largest sequence number in the file, and no write callback function is registered, the same row cache key is used as if no snapshot were given. We still need the callback-function restriction for now because the callback function may filter out different keys for different snapshots even if the snapshots are new.
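A schematic sketch of the rule described above; the function itself is hypothetical, not code from the patch:

```
#include "rocksdb/types.h"

// The no-snapshot row cache key can be reused when the snapshot is newer
// than everything in the file and no write callback is registered.
bool CanReuseNoSnapshotRowCacheKey(rocksdb::SequenceNumber snapshot_seq,
                                   rocksdb::SequenceNumber largest_seq_in_file,
                                   bool has_write_callback) {
  // A snapshot newer than every entry in the file sees exactly what a
  // no-snapshot read sees; a registered write callback may still filter
  // keys per snapshot, so it disables the sharing.
  return snapshot_seq >= largest_seq_in_file && !has_write_callback;
}
```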
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5600
      
      Test Plan: Add a unit test.
      
      Differential Revision: D16386616
      
      fbshipit-source-id: 6b7d214bd215d191b03ccf55926ad4b703ec2e53
• Block cache analyzer: Compute correlation of features and human readable trace file. (#5596) · 37784700
  haoyuhuang authored
      Summary:
      - Compute correlation between a few features and predictions, e.g., number of accesses since the last access vs number of accesses till the next access on a block.
- Output a human-readable trace file so Python can consume it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5596
      
      Test Plan: make clean && USE_CLANG=1 make check -j32
      
      Differential Revision: D16373200
      
      Pulled By: HaoyuHuang
      
      fbshipit-source-id: c848d26bc2e9210461f317d7dbee42d55be5a0cc
• Temporarily disable snapshot list refresh for atomic flush stress test (#5581) · a78503bd
  Yanqin Jin authored
      Summary:
The atomic flush test started to fail after https://github.com/facebook/rocksdb/issues/5099. Then https://github.com/facebook/rocksdb/issues/5278 provided a fix, after which the same error occurred much less frequently. However, it still occurs occasionally. Not sure what the root cause is. This PR disables the snapshot list refresh feature, and we should keep an eye on the failure in the future.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5581
      
      Differential Revision: D16295985
      
      Pulled By: riversand963
      
      fbshipit-source-id: c9e62e65133c52c21b07097de359632ca62571e4
6. Jul 20, 2019 (4 commits)
7. Jul 19, 2019 (2 commits)
8. Jul 18, 2019 (5 commits)
• Fix LITE mode build failure · ec2b996b
  anand76 authored
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5588
      
      Test Plan: make LITE=1 all check
      
      Differential Revision: D16354543
      
      Pulled By: anand1976
      
      fbshipit-source-id: 327a171439e183ac3a5e5057c511d6bca445e97d
• Fix for ReadaheadSequentialFile crash in ldb_cmd_test (#5586) · 9f5cfb8e
  Eli Pozniansky authored
      Summary:
Fix a corner-case crash when no data was read from the file but the status was still OK.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5586
      
      Differential Revision: D16348117
      
      Pulled By: elipoz
      
      fbshipit-source-id: f97973308024f020d8be79ca3c56466b84d80656
• Block access tracing: Trace referenced key for Get on non-data blocks. (#5548) · 8a008d41
  haoyuhuang authored
      Summary:
      This PR traces the referenced key for Get for all types of blocks. This is useful when evaluating hybrid row-block caches.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5548
      
      Test Plan: make clean && USE_CLANG=1 make check -j32
      
      Differential Revision: D16157979
      
      Pulled By: HaoyuHuang
      
      fbshipit-source-id: f6327411c9deb74e35e22a35f66cdbae09ab9d87
• Export Import sst files (#5495) · 22ce4624
  Venki Pallipadi authored
      Summary:
      Refresh of the earlier change here - https://github.com/facebook/rocksdb/issues/5135
      
      This is a review request for code change needed for - https://github.com/facebook/rocksdb/issues/3469
      "Add support for taking snapshot of a column family and creating column family from a given CF snapshot"
      
      We have an implementation for this that we have been testing internally. We have two new APIs that together provide this functionality.
      
      (1) ExportColumnFamily() - This API is modelled after CreateCheckpoint() as below.
```
// Exports all live SST files of a specified Column Family onto export_dir,
// returning SST files information in metadata.
// - SST files will be created as hard links when the directory specified
//   is in the same partition as the db directory, copied otherwise.
// - export_dir should not already exist and will be created by this API.
// - Always triggers a flush.
virtual Status ExportColumnFamily(ColumnFamilyHandle* handle,
                                  const std::string& export_dir,
                                  ExportImportFilesMetaData** metadata);
```
      
Internally, the API will call DisableFileDeletions() and GetColumnFamilyMetaData(), parse through the metadata, create links/copies of all the SST files, call EnableFileDeletions(), and complete the call by returning the list of file metadata.
      
      (2) CreateColumnFamilyWithImport() - This API is modeled after IngestExternalFile(), but invoked only during a CF creation as below.
```
// CreateColumnFamilyWithImport() will create a new column family with
// column_family_name and import external SST files specified in metadata into
// this column family.
// (1) External SST files can be created using SstFileWriter.
// (2) External SST files can be exported from a particular column family in
//     an existing DB.
// Option in import_options specifies whether the external files are copied or
// moved (default is copy). When option specifies copy, managing files at
// external_file_path is caller's responsibility. When option specifies a
// move, the call ensures that the specified files at external_file_path are
// deleted on successful return and files are not modified on any error
// return.
// On error return, column family handle returned will be nullptr.
// ColumnFamily will be present on successful return and will not be present
// on error return. ColumnFamily may be present on any crash during this call.
virtual Status CreateColumnFamilyWithImport(
    const ColumnFamilyOptions& options, const std::string& column_family_name,
    const ImportColumnFamilyOptions& import_options,
    const ExportImportFilesMetaData& metadata,
    ColumnFamilyHandle** handle);
```
      
Internally, this API creates a new CF, parses all the SST files, and adds them to the specified column family, at the same level and with the same sequence number as in the metadata. It also performs safety checks with respect to overlaps between the SST files being imported.
      
If the incoming sequence number is higher than the current local sequence number, the local sequence number is updated to reflect this.
      
Note that, as the SST files are being moved across column families, the column family name in the SST file will no longer match the actual column family in the destination DB. The API does not modify the column family name or ID in the SST files being imported.
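Taken together, a hedged end-to-end sketch of the round trip; which object hosts ExportColumnFamily (the summary models it after CreateCheckpoint()) and the ImportColumnFamilyOptions fields are assumptions:

```
#include "rocksdb/db.h"
#include "rocksdb/utilities/checkpoint.h"

// Hedged sketch; error handling abbreviated, and the host object of
// ExportColumnFamily plus the ImportColumnFamilyOptions fields are assumed.
rocksdb::Status ExportThenImport(rocksdb::Checkpoint* checkpoint,
                                 rocksdb::ColumnFamilyHandle* source_cf,
                                 rocksdb::DB* dest_db) {
  rocksdb::ExportImportFilesMetaData* metadata = nullptr;
  rocksdb::Status s =
      checkpoint->ExportColumnFamily(source_cf, "/tmp/cf_export", &metadata);
  if (!s.ok()) return s;

  rocksdb::ImportColumnFamilyOptions import_options;
  import_options.move_files = false;  // copy rather than move (assumed field)
  rocksdb::ColumnFamilyHandle* imported_cf = nullptr;
  return dest_db->CreateColumnFamilyWithImport(
      rocksdb::ColumnFamilyOptions(), "imported_cf", import_options,
      *metadata, &imported_cf);
}
```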
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5495
      
      Differential Revision: D16018881
      
      fbshipit-source-id: 9ae2251025d5916d35a9fc4ea4d6707f6be16ff9
• Arm64 CRC32 parallel computation optimization for RocksDB (#5494) · a3c1832e
  Yuqi Gu authored
      Summary:
      Crc32c Parallel computation optimization:
      Algorithm comes from Intel whitepaper: [crc-iscsi-polynomial-crc32-instruction-paper](https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/crc-iscsi-polynomial-crc32-instruction-paper.pdf)
The input data is divided into three equal-sized blocks:
- Three parallel blocks (crc0, crc1, crc2) per 1024 bytes
- One block: 42 (BLK_LENGTH) * 8 (step length: crc32c_u64) bytes
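A structural sketch of the split described above; crc32c_u64 stands in for the hardware CRC32C instruction, and the final recombination by carry-less multiplication (per the whitepaper) is omitted:

```
#include <cstdint>
#include <cstring>

uint64_t crc32c_u64(uint64_t crc, uint64_t data);  // placeholder declaration

// Processes 3 * 336 = 1008 bytes as three independent CRC streams.
uint64_t Crc32cThreeWay(const uint8_t* p, uint64_t crc_in) {
  uint64_t crc0 = crc_in, crc1 = 0, crc2 = 0;
  for (int i = 0; i < 42; i++) {  // BLK_LENGTH = 42 steps of 8 bytes each
    uint64_t d0, d1, d2;
    std::memcpy(&d0, p + 0 * 336 + 8 * i, 8);  // block 0 (336 = 42 * 8)
    std::memcpy(&d1, p + 1 * 336 + 8 * i, 8);  // block 1
    std::memcpy(&d2, p + 2 * 336 + 8 * i, 8);  // block 2
    crc0 = crc32c_u64(crc0, d0);  // three independent dependency chains
    crc1 = crc32c_u64(crc1, d1);  // keep the CPU's CRC pipeline busy
    crc2 = crc32c_u64(crc2, d2);
  }
  // The partial CRCs must be merged with carry-less multiplies (omitted);
  // the XOR below is a placeholder, not the real combine step.
  return crc0 ^ crc1 ^ crc2;
}
```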
      
      1. crc32c_test:
      ```
      [==========] Running 4 tests from 1 test case.
      [----------] Global test environment set-up.
      [----------] 4 tests from CRC
      [ RUN      ] CRC.StandardResults
      [       OK ] CRC.StandardResults (1 ms)
      [ RUN      ] CRC.Values
      [       OK ] CRC.Values (0 ms)
      [ RUN      ] CRC.Extend
      [       OK ] CRC.Extend (0 ms)
      [ RUN      ] CRC.Mask
      [       OK ] CRC.Mask (0 ms)
      [----------] 4 tests from CRC (1 ms total)
      
      [----------] Global test environment tear-down
      [==========] 4 tests from 1 test case ran. (1 ms total)
      [  PASSED  ] 4 tests.
      ```
      
      2. RocksDB benchmark: db_bench --benchmarks="crc32c"
      
      ```
      Linear Arm crc32c:
        crc32c: 1.005 micros/op 995133 ops/sec; 3887.2 MB/s (4096 per op)
      ```
      
      ```
      Parallel optimization with Armv8 crypto extension:
        crc32c: 0.419 micros/op 2385078 ops/sec; 9316.7 MB/s (4096 per op)
      ```
      
      It gets ~2.4x speedup compared to linear Arm crc32c instructions.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5494
      
      Differential Revision: D16340806
      
      fbshipit-source-id: 95dae9a5b646fd20a8303671d82f17b2e162e945
9. Jul 17, 2019 (4 commits)