1. 05 Sep 2019, 1 commit
    • fix checking the '-march' flag (#5766) · 3f2723a8
      Authored by ENDOH takanao
      Summary:
      Hi guys,
      
      I got build errors on an ARM machine.
      
      Before the fix:
      
      ```console
      $ make static_lib
      ...
      g++: error: unrecognized argument in option '-march=armv8-a+crc+crypto'
      g++: note: valid arguments to '-march=' are: armv2 armv2a armv3 armv3m armv4 armv4t armv5 armv5e armv5t armv5te armv6 armv6-m armv6j armv6k armv6kz armv6s-m armv6t2 armv6z armv6zk armv7 armv7-a armv7-m armv7-r armv7e-m armv7ve armv8-a armv8-a+crc armv8.1-a armv8.1-a+crc iwmmxt iwmmxt2 native
      ```
      
      Thanks!
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5766
      
      Differential Revision: D17191117
      
      fbshipit-source-id: 7a61e3a2a4a06f37faeb8429bd7314da54ec5868
  2. 23 Aug 2019, 1 commit
    • Atomic Flush Crash Test also covers the case that WAL is enabled. (#5729) · d8a27d93
      Authored by sdong
      Summary:
      AtomicFlushStressTest is a powerful test, but right now we only run it for atomic_flush=true + disable_wal=true. We further extend it to the case where atomic_flush=false + disable_wal=false. All the workload generation and validation can stay the same.
      The atomic flush crash test is also changed to switch between the two test scenarios. This makes the name "atomic flush crash test" out of sync with what it really does, but we leave it as it is to avoid trouble with the continuous test set-up.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5729
      
      Test Plan: Run "CRASH_TEST_KILL_ODD=188 TEST_TMPDIR=/dev/shm/ USE_CLANG=1 make whitebox_crash_test_with_atomic_flush", observe the settings used and see it passed.
      
      Differential Revision: D16969791
      
      fbshipit-source-id: 56e37487000ae631e31b0100acd7bdc441c04163
  3. 17 Aug 2019, 2 commits
  4. 15 Aug 2019, 1 commit
    • Fix TSAN failures in DistributedMutex tests (#5684) · 77273d41
      Authored by Aaryaman Sagar
      Summary:
      TSAN was not able to correctly instrument atomic bts and btr instructions, so
      when TSAN is enabled those are implemented with std::atomic::fetch_or and
      std::atomic::fetch_and instead. Also disable tests that fail on TSAN with false
      negatives (we know these are false negatives because this other verifiably
      correct program fails with the same TSAN error <link>).
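      As an illustration of the emulation (a minimal sketch with hypothetical helper names, not the PR's actual code):
      ```
      #include <atomic>
      #include <cstdint>
      
      // Emulate atomic "bit test and set" (bts) and "bit test and reset" (btr)
      // with portable read-modify-write operations that TSAN can instrument.
      // Both return the previous value of the bit.
      inline bool atomic_bts(std::atomic<std::uint64_t>& word, unsigned bit) {
        const auto mask = std::uint64_t{1} << bit;
        return word.fetch_or(mask, std::memory_order_acq_rel) & mask;
      }
      
      inline bool atomic_btr(std::atomic<std::uint64_t>& word, unsigned bit) {
        const auto mask = std::uint64_t{1} << bit;
        return word.fetch_and(~mask, std::memory_order_acq_rel) & mask;
      }
      ```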
      
      ```
      make clean
      TEST_TMPDIR=/dev/shm/rocksdb OPT=-g COMPILE_WITH_TSAN=1 make J=1 -j56 folly_synchronization_distributed_mutex_test
      ```
      
      This is the code that fails with the same false negative under TSAN:
      ```
      namespace {
      class ExceptionWithConstructionTrack : public std::exception {
       public:
        explicit ExceptionWithConstructionTrack(int id)
            : id_{folly::to<std::string>(id)}, constructionTrack_{id} {}
      
        const char* what() const noexcept override {
          return id_.c_str();
        }
      
       private:
        std::string id_;
        TestConstruction constructionTrack_;
      };
      
      template <typename Storage, typename Atomic>
      void transferCurrentException(Storage& storage, Atomic& produced) {
        assert(std::current_exception());
        new (&storage) std::exception_ptr(std::current_exception());
        produced->store(true, std::memory_order_release);
      }
      
      void concurrentExceptionPropagationStress(
          int numThreads,
          std::chrono::milliseconds milliseconds) {
        auto&& stop = std::atomic<bool>{false};
        auto&& exceptions = std::vector<std::aligned_storage<48, 8>::type>{};
        auto&& produced = std::vector<std::unique_ptr<std::atomic<bool>>>{};
        auto&& consumed = std::vector<std::unique_ptr<std::atomic<bool>>>{};
        auto&& consumers = std::vector<std::thread>{};
        for (auto i = 0; i < numThreads; ++i) {
          produced.emplace_back(new std::atomic<bool>{false});
          consumed.emplace_back(new std::atomic<bool>{false});
          exceptions.push_back({});
        }
      
        auto producer = std::thread{[&]() {
          auto counter = std::vector<int>(numThreads, 0);
          for (auto i = 0; true; i = ((i + 1) % numThreads)) {
            try {
              throw ExceptionWithConstructionTrack{counter.at(i)++};
            } catch (...) {
              transferCurrentException(exceptions.at(i), produced.at(i));
            }
      
            while (!consumed.at(i)->load(std::memory_order_acquire)) {
              if (stop.load(std::memory_order_acquire)) {
                return;
              }
            }
      
            consumed.at(i)->store(false, std::memory_order_release);
          }
        }};
      
        for (auto i = 0; i < numThreads; ++i) {
          consumers.emplace_back([&, i]() {
            auto counter = 0;
            while (true) {
              while (!produced.at(i)->load(std::memory_order_acquire)) {
                if (stop.load(std::memory_order_acquire)) {
                  return;
                }
              }
              produced.at(i)->store(false, std::memory_order_release);
      
              try {
                auto storage = &exceptions.at(i);
                auto exc = folly::launder(
                  reinterpret_cast<std::exception_ptr*>(storage));
                auto copy = std::move(*exc);
                exc->std::exception_ptr::~exception_ptr();
                std::rethrow_exception(std::move(copy));
              } catch (std::exception& exc) {
                auto value = std::stoi(exc.what());
                EXPECT_EQ(value, counter++);
              }
      
              consumed.at(i)->store(true, std::memory_order_release);
            }
          });
        }
      
        std::this_thread::sleep_for(milliseconds);
        stop.store(true);
        producer.join();
        for (auto& thread : consumers) {
          thread.join();
        }
      }
      } // namespace
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5684
      
      Differential Revision: D16746077
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 8af88dcf9161c05daec1a76290f577918638f79d
  5. 08 Aug 2019, 1 commit
    • Port folly/synchronization/DistributedMutex to rocksdb (#5642) · 38b03c84
      Authored by Aaryaman Sagar
      Summary:
      This ports `folly::DistributedMutex` into RocksDB. The PR includes everything else needed to compile and use DistributedMutex as a component within folly. Most files are unchanged except for some portability stuff and includes.
      
      For now, I've put this under `rocksdb/third-party`, but if there is a better folder to put this under, let me know. I also am not sure how or where to put unit tests for third-party stuff like this. It seems like gtest is included already, but I need to link with it from another third-party folder.
      
      This PR also includes some other common components from folly:
      
      - folly/Optional
      - folly/ScopeGuard (In particular `SCOPE_EXIT`)
      - folly/synchronization/ParkingLot (A portable futex-like interface)
      - folly/synchronization/AtomicNotification (The standard C++ interface for futexes)
      - folly/Indestructible (For singletons that don't get destroyed without allocations)
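      As a usage illustration, here is a minimal sketch; the include path and the proxy-style API (lock() returns a state proxy that is handed back to unlock()) follow the folly version and are assumptions about the ported code:
      ```
      #include <thread>
      #include <vector>
      
      #include "third-party/folly/folly/synchronization/DistributedMutex.h"
      
      int main() {
        folly::DistributedMutex mutex;
        int counter = 0;
      
        std::vector<std::thread> threads;
        for (int i = 0; i < 4; ++i) {
          threads.emplace_back([&]() {
            for (int j = 0; j < 1000; ++j) {
              // lock() returns a proxy that must be passed back to unlock().
              auto state = mutex.lock();
              ++counter;
              mutex.unlock(std::move(state));
            }
          });
        }
        for (auto& t : threads) {
          t.join();
        }
        return counter == 4000 ? 0 : 1;
      }
      ```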
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5642
      
      Differential Revision: D16544439
      
      fbshipit-source-id: 179b98b5dcddc3075926d31a30f92fd064245731
  6. 07 Aug 2019, 1 commit
    • New API to get all merge operands for a Key (#5604) · d150e014
      Authored by Vijay Nadimpalli
      Summary:
      This is a new API added to db.h to allow fetching all merge operands associated with a key. The main motivation for this API is to support use cases where doing a full online merge is unnecessary and performance sensitive. Example use cases:
      1. Updating a subset of columns and reading a subset of columns -
      Imagine a SQL table where a row is encoded as a K/V pair (as is done in MyRocks). If there are many columns and users only updated one of them, we can use the merge operator to reduce write amplification. While users only read one or two columns in the read query, this feature can avoid a full merge of the whole row and save some CPU.
      2. Updating very few attributes in a value which is a JSON-like document -
      Updating one attribute can be done efficiently using the merge operator, while reading back one attribute can be done more efficiently if we don't need to do a full merge.
      ----------------------------------------------------------------------------------------------------
      API:
      ```
      Status GetMergeOperands(
            const ReadOptions& options, ColumnFamilyHandle* column_family,
            const Slice& key, PinnableSlice* merge_operands,
            GetMergeOperandsOptions* get_merge_operands_options,
            int* number_of_operands)
      ```
      
      Example usage:
      ```
      int size = 100;
      int number_of_operands = 0;
      std::vector<PinnableSlice> values(size);
      GetMergeOperandsOptions merge_operands_info;
      merge_operands_info.expected_max_number_of_operands = size;
      db_->GetMergeOperands(ReadOptions(), db_->DefaultColumnFamily(), "k1",
                            values.data(), &merge_operands_info,
                            &number_of_operands);
      ```
      
      Description:
      Returns all the merge operands corresponding to the key. If the number of merge operands in the DB is greater than merge_operands_options.expected_max_number_of_operands, no merge operands are returned and the status is Incomplete. Merge operands are returned in the order of insertion.
      merge_operands: points to an array of at least merge_operands_options.expected_max_number_of_operands entries; the caller is responsible for allocating it. If the status returned is Incomplete, then number_of_operands will contain the total number of merge operands found in the DB for the key.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5604
      
      Test Plan:
      Added unit test and perf test in db_bench that can be run using the command:
      ./db_bench -benchmarks=getmergeoperands --merge_operator=sortlist
      
      Differential Revision: D16657366
      
      Pulled By: vjnadimpalli
      
      fbshipit-source-id: 0faadd752351745224ee12d4ae9ef3cb529951bf
  7. 06 Aug 2019, 1 commit
    • Fix make target 'all' and 'check' (#5672) · b1a02ffe
      Authored by Yanqin Jin
      Summary:
      If a test is one of the parallel tests, then it should also be one of the 'tests'.
      Otherwise, `make all` won't build the binaries. For example,
      ```
      $COMPILE_WITH_ASAN=1 make -j32 all
      ```
      Then if you do
      ```
      $make check
      ```
      The second command will compile and build db_bloom_test
      and file_reader_writer_test **without** `COMPILE_WITH_ASAN=1`, causing the
      command to fail.
      
      Test plan (on devserver):
      ```
      $make -j32 all
      ```
      Verify all binaries are built so that `make check` won't have to compile
      anything.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5672
      
      Differential Revision: D16655834
      
      Pulled By: riversand963
      
      fbshipit-source-id: 050131412b5313496f85ae3deeeeb8d28af75746
  8. 27 Jul 2019, 3 commits
    • Block cache simulator: Add pysim to simulate caches using reinforcement learning. (#5610) · 70c7302f
      Authored by haoyuhuang
      Summary:
      This PR implements cache eviction using reinforcement learning. It includes two implementations:
      1. An implementation of Thompson Sampling for the Bernoulli Bandit [1].
      2. An implementation of LinUCB with disjoint linear models [2].
      
      The idea is that a cache uses multiple eviction policies, e.g., MRU, LRU, and LFU. The cache learns which eviction policy is the best and uses it upon a cache miss.
      Thompson Sampling is contextless and does not include any features.
      LinUCB includes features such as level, block type, caller, column family id to decide which eviction policy to use.
      
      [1] Daniel J. Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, and Zheng Wen. 2018. A Tutorial on Thompson Sampling. Found. Trends Mach. Learn. 11, 1 (July 2018), 1-96. DOI: https://doi.org/10.1561/2200000070
      [2] Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. 2010. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th international conference on World wide web (WWW '10). ACM, New York, NY, USA, 661-670. DOI=http://dx.doi.org/10.1145/1772690.1772758
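      For intuition, here is a minimal sketch of Thompson Sampling over a set of eviction policies (illustrative only, not the pysim implementation from this PR): each policy is a Bernoulli arm whose reward is a simulated cache hit.
      ```
      #include <cstddef>
      #include <random>
      #include <vector>
      
      // Thompson Sampling for the Bernoulli bandit: each eviction policy is an
      // arm; reward 1 means the chosen policy's simulated cache hit.
      class ThompsonSampler {
       public:
        explicit ThompsonSampler(size_t num_policies)
            : alpha_(num_policies, 1.0), beta_(num_policies, 1.0) {}
      
        // Draw a hit-rate estimate from each arm's Beta posterior and pick the
        // largest.  Beta(a, b) is sampled as X / (X + Y) with X ~ Gamma(a, 1)
        // and Y ~ Gamma(b, 1).
        size_t SelectPolicy(std::mt19937& rng) {
          size_t best = 0;
          double best_sample = -1.0;
          for (size_t i = 0; i < alpha_.size(); ++i) {
            std::gamma_distribution<double> ga(alpha_[i], 1.0);
            std::gamma_distribution<double> gb(beta_[i], 1.0);
            const double x = ga(rng);
            const double y = gb(rng);
            const double sample = x / (x + y);
            if (sample > best_sample) {
              best_sample = sample;
              best = i;
            }
          }
          return best;
        }
      
        // Update the chosen arm's posterior with the observed reward.
        void Update(size_t policy, bool hit) {
          if (hit) {
            alpha_[policy] += 1.0;
          } else {
            beta_[policy] += 1.0;
          }
        }
      
       private:
        std::vector<double> alpha_;
        std::vector<double> beta_;
      };
      ```
      On each cache miss, SelectPolicy picks the policy that decides the eviction, and Update records whether that choice later paid off.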
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5610
      
      Differential Revision: D16435067
      
      Pulled By: HaoyuHuang
      
      fbshipit-source-id: 6549239ae14115c01cb1e70548af9e46d8dc21bb
    • Parallelize db_bloom_filter_test (#5632) · 3617287e
      Authored by Levi Tamasi
      Summary:
      This test frequently times out under TSAN; parallelizing it should fix
      this issue.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5632
      
      Test Plan:
      make check
      buck test mode/dev-tsan internal_repo_rocksdb/repo:db_bloom_filter_test
      
      Differential Revision: D16519399
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 66e05a644d6f79c6d544255ffcf6de195d2d62fe
    • Fix target 'clean' to include parallel test binaries (#5629) · 74782cec
      Authored by Yanqin Jin
      Summary:
      The current `clean` target in the Makefile does not remove the parallel test
      binaries. Fix this.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5629
      
      Test Plan:
      (on devserver)
      Take file_reader_writer_test for instance.
      ```
      $make -j32 file_reader_writer_test
      $make clean
      ```
      Verify that the binary file 'file_reader_writer_test' is deleted by `make clean`.
      
      Differential Revision: D16513176
      
      Pulled By: riversand963
      
      fbshipit-source-id: 70acb9f56c928a494964121b86aacc0090f31ff6
  9. 24 Jul 2019, 1 commit
  10. 18 Jul 2019, 2 commits
    • Export Import sst files (#5495) · 22ce4624
      Authored by Venki Pallipadi
      Summary:
      Refresh of the earlier change here - https://github.com/facebook/rocksdb/issues/5135
      
      This is a review request for code change needed for - https://github.com/facebook/rocksdb/issues/3469
      "Add support for taking snapshot of a column family and creating column family from a given CF snapshot"
      
      We have an implementation for this that we have been testing internally. We have two new APIs that together provide this functionality.
      
      (1) ExportColumnFamily() - This API is modeled after CreateCheckpoint(), as below.
      ```
      // Exports all live SST files of a specified Column Family onto export_dir,
      // returning SST files information in metadata.
      // - SST files will be created as hard links when the directory specified
      //   is in the same partition as the db directory, copied otherwise.
      // - export_dir should not already exist and will be created by this API.
      // - Always triggers a flush.
      virtual Status ExportColumnFamily(ColumnFamilyHandle* handle,
                                        const std::string& export_dir,
                                        ExportImportFilesMetaData** metadata);
      ```
      
      Internally, the API will call DisableFileDeletions(), call GetColumnFamilyMetaData(), parse through
      the metadata creating links/copies of all the SST files, call EnableFileDeletions(), and complete the
      call by returning the list of file metadata.
      
      (2) CreateColumnFamilyWithImport() - This API is modeled after IngestExternalFile(), but invoked only during CF creation, as below.
      ```
      // CreateColumnFamilyWithImport() will create a new column family with
      // column_family_name and import external SST files specified in metadata into
      // this column family.
      // (1) External SST files can be created using SstFileWriter.
      // (2) External SST files can be exported from a particular column family in
      //     an existing DB.
      // Option in import_options specifies whether the external files are copied or
      // moved (default is copy). When option specifies copy, managing files at
      // external_file_path is caller's responsibility. When option specifies a
      // move, the call ensures that the specified files at external_file_path are
      // deleted on successful return and files are not modified on any error
      // return.
      // On error return, column family handle returned will be nullptr.
      // ColumnFamily will be present on successful return and will not be present
      // on error return. ColumnFamily may be present on any crash during this call.
      virtual Status CreateColumnFamilyWithImport(
          const ColumnFamilyOptions& options, const std::string& column_family_name,
          const ImportColumnFamilyOptions& import_options,
          const ExportImportFilesMetaData& metadata,
          ColumnFamilyHandle** handle);
      ```
      
      Internally, this API creates a new CF, parses all the SST files, and adds them to the specified column family, at the same level and with the same sequence numbers as in the metadata. It also performs safety checks with respect to overlaps between the SST files being imported.
      
      If the incoming sequence number is higher than the current local sequence number, the local
      sequence number is updated to reflect this.
      
      Note: as the SST files are being moved across column families, the column family name in the SST
      files will no longer match the actual column family on the destination DB. The API does not modify
      the column family name or id in the SST files being imported.
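      An end-to-end usage sketch, under the assumption that ExportColumnFamily lands on the Checkpoint utility (as its CreateCheckpoint() modelling suggests); the export path and CF name are illustrative, and error handling is abbreviated:
      ```
      #include <string>
      
      #include "rocksdb/db.h"
      #include "rocksdb/metadata.h"
      #include "rocksdb/options.h"
      #include "rocksdb/utilities/checkpoint.h"
      
      // Export a column family from a source DB, then import it into a
      // destination DB as a new column family.
      rocksdb::Status ExportThenImport(rocksdb::DB* src_db,
                                       rocksdb::ColumnFamilyHandle* src_cf,
                                       rocksdb::DB* dst_db) {
        rocksdb::Checkpoint* checkpoint = nullptr;
        rocksdb::Status s = rocksdb::Checkpoint::Create(src_db, &checkpoint);
        if (!s.ok()) return s;
      
        rocksdb::ExportImportFilesMetaData* metadata = nullptr;
        s = checkpoint->ExportColumnFamily(src_cf, "/tmp/cf_export", &metadata);
        delete checkpoint;
        if (!s.ok()) return s;
      
        rocksdb::ImportColumnFamilyOptions import_options;
        import_options.move_files = false;  // copy; exported files stay in place
      
        // On success, new_cf is owned by the caller (release it with
        // DestroyColumnFamilyHandle when done).
        rocksdb::ColumnFamilyHandle* new_cf = nullptr;
        s = dst_db->CreateColumnFamilyWithImport(rocksdb::ColumnFamilyOptions(),
                                                 "imported_cf", import_options,
                                                 *metadata, &new_cf);
        delete metadata;
        return s;
      }
      ```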
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5495
      
      Differential Revision: D16018881
      
      fbshipit-source-id: 9ae2251025d5916d35a9fc4ea4d6707f6be16ff9
    • Arm64 CRC32 parallel computation optimization for RocksDB (#5494) · a3c1832e
      Authored by Yuqi Gu
      Summary:
      Crc32c parallel computation optimization:
      The algorithm comes from the Intel whitepaper: [crc-iscsi-polynomial-crc32-instruction-paper](https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/crc-iscsi-polynomial-crc32-instruction-paper.pdf)
      Input data is divided into three equal-sized blocks:
      Three parallel blocks (crc0, crc1, crc2) for 1024 bytes
      One block: 42 (BLK_LENGTH) * 8 (step length: crc32c_u64) bytes
      
      1. crc32c_test:
      ```
      [==========] Running 4 tests from 1 test case.
      [----------] Global test environment set-up.
      [----------] 4 tests from CRC
      [ RUN      ] CRC.StandardResults
      [       OK ] CRC.StandardResults (1 ms)
      [ RUN      ] CRC.Values
      [       OK ] CRC.Values (0 ms)
      [ RUN      ] CRC.Extend
      [       OK ] CRC.Extend (0 ms)
      [ RUN      ] CRC.Mask
      [       OK ] CRC.Mask (0 ms)
      [----------] 4 tests from CRC (1 ms total)
      
      [----------] Global test environment tear-down
      [==========] 4 tests from 1 test case ran. (1 ms total)
      [  PASSED  ] 4 tests.
      ```
      
      2. RocksDB benchmark: db_bench --benchmarks="crc32c"
      
      ```
      Linear Arm crc32c:
        crc32c: 1.005 micros/op 995133 ops/sec; 3887.2 MB/s (4096 per op)
      ```
      
      ```
      Parallel optimization with Armv8 crypto extension:
        crc32c: 0.419 micros/op 2385078 ops/sec; 9316.7 MB/s (4096 per op)
      ```
      
      It gets ~2.4x speedup compared to linear Arm crc32c instructions.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5494
      
      Differential Revision: D16340806
      
      fbshipit-source-id: 95dae9a5b646fd20a8303671d82f17b2e162e945
  11. 12 Jul 2019, 1 commit
  12. 10 Jul 2019, 1 commit
  13. 19 Jun 2019, 1 commit
  14. 18 Jun 2019, 2 commits
    • Support computing miss ratio curves using sim_cache. (#5449) · 2d1dd5bc
      Authored by haoyuhuang
      Summary:
      This PR adds a BlockCacheTraceSimulator that reports the miss ratios given different cache configurations. A cache configuration contains "cache_name,num_shard_bits,cache_capacities", for example "lru, 1, 1K, 2K, 4M, 4G".
      
      When we replay the trace, we also perform lookups and inserts on the simulated caches.
      In the end, it reports the miss ratio for each tuple <cache_name, num_shard_bits, cache_capacity> in an output file.
      
      This PR also adds a main source file, block_cache_trace_analyzer, so that we can run the analyzer from the command line.
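      Conceptually, the simulator replays every trace record against each configured cache and tracks hits and misses. A minimal sketch of the per-configuration bookkeeping (an illustrative class, not the PR's implementation; capacity is counted in blocks here rather than bytes):
      ```
      #include <cstdint>
      #include <list>
      #include <unordered_map>
      
      // Miss-ratio counter over a simulated LRU cache keyed by block id.
      class SimulatedLRU {
       public:
        explicit SimulatedLRU(size_t capacity) : capacity_(capacity) {}
      
        // Returns true on hit; on miss, inserts the block, evicting the LRU one.
        bool Access(uint64_t block_id) {
          auto it = index_.find(block_id);
          if (it != index_.end()) {
            lru_.splice(lru_.begin(), lru_, it->second);  // move to front
            ++hits_;
            return true;
          }
          ++misses_;
          lru_.push_front(block_id);
          index_[block_id] = lru_.begin();
          if (lru_.size() > capacity_) {
            index_.erase(lru_.back());
            lru_.pop_back();
          }
          return false;
        }
      
        double MissRatio() const {
          const uint64_t total = hits_ + misses_;
          return total == 0 ? 0.0 : static_cast<double>(misses_) / total;
        }
      
       private:
        size_t capacity_;
        uint64_t hits_ = 0;
        uint64_t misses_ = 0;
        std::list<uint64_t> lru_;
        std::unordered_map<uint64_t, std::list<uint64_t>::iterator> index_;
      };
      ```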
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5449
      
      Test Plan:
      Added tests for block_cache_trace_analyzer.
      COMPILE_WITH_ASAN=1 make check -j32.
      
      Differential Revision: D15797073
      
      Pulled By: HaoyuHuang
      
      fbshipit-source-id: aef0c5c2e7938f3e8b6a10d4a6a50e6928ecf408
    • Persistent Stats: persist stats history to disk (#5046) · 671d15cb
      Authored by Zhongyi Xie
      Summary:
      This PR continues the work in https://github.com/facebook/rocksdb/pull/4748 and https://github.com/facebook/rocksdb/pull/4535 by adding a new DBOption `persist_stats_to_disk`, which instructs RocksDB to persist stats history to RocksDB itself. When statistics is enabled and both options `stats_persist_period_sec` and `persist_stats_to_disk` are set, RocksDB will periodically write stats to a built-in column family in the following form: key -> (timestamp in microseconds)#(stats name), value -> stats value. The existing API `GetStatsHistory` will detect the current value of `persist_stats_to_disk` and either read from the in-memory data structure or from the hidden column family on disk.
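      A minimal sketch of wiring up the options described above; the period, DB path, and the exact iterator accessors are assumptions here:
      ```
      #include <cstdint>
      #include <memory>
      
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"
      #include "rocksdb/statistics.h"
      #include "rocksdb/stats_history.h"
      
      int main() {
        rocksdb::Options options;
        options.create_if_missing = true;
        options.statistics = rocksdb::CreateDBStatistics();
        options.stats_persist_period_sec = 600;  // snapshot stats every 10 min
        options.persist_stats_to_disk = true;    // use the hidden column family
      
        rocksdb::DB* db = nullptr;
        rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/db_stats", &db);
        if (!s.ok()) return 1;
      
        // ... workload runs; stats snapshots accumulate ...
      
        // Read back the persisted history for a time window.
        std::unique_ptr<rocksdb::StatsHistoryIterator> it;
        s = db->GetStatsHistory(0 /* start_time */, UINT64_MAX /* end_time */, &it);
        if (s.ok()) {
          for (; it->Valid(); it->Next()) {
            // it->GetStatsMap() maps stats name -> value at it->GetStatsTime().
          }
        }
        delete db;
        return 0;
      }
      ```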
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5046
      
      Differential Revision: D15863138
      
      Pulled By: miasantreble
      
      fbshipit-source-id: bb82abdb3f2ca581aa42531734ac799f113e931b
  15. 14 Jun 2019, 1 commit
  16. 12 Jun 2019, 1 commit
  17. 07 Jun 2019, 1 commit
  18. 01 Jun 2019, 2 commits
  19. 31 May 2019, 3 commits
  20. 30 May 2019, 1 commit
  21. 03 May 2019, 1 commit
  22. 01 May 2019, 1 commit
  23. 27 Mar 2019, 1 commit
    • Support for single-primary, multi-secondary instances (#4899) · 9358178e
      Authored by Yanqin Jin
      Summary:
      This PR allows RocksDB to run in single-primary, multi-secondary process mode.
      The writer is a regular RocksDB instance (e.g. a `DBImpl`) playing the role of the primary.
      Multiple `DBImplSecondary` processes (secondaries) share the same set of SST files, MANIFEST, and WAL files with the primary. Secondaries tail the MANIFEST of the primary and apply updates to their own in-memory state of the file system, e.g. `VersionStorageInfo`.
      
      This PR has several components:
      1. (Originally in #4745). Add a `PathNotFound` subcode to `IOError` to denote the failure when a secondary tries to open a file which has been deleted by the primary.
      
      2. (Similar to #4602). Add `FragmentBufferedReader` to handle a partially-read, trailing record at the end of a log, from which future reads can continue.
      
      3. (Originally in #4710 and #4820). Add implementation of the secondary, i.e. `DBImplSecondary`.
      3.1 Tail the primary's MANIFEST during recovery.
      3.2 Tail the primary's MANIFEST during normal processing by calling `ReadAndApply`.
      3.3 Tailing WAL will be in a future PR.
      
      4. Add an example in 'examples/multi_processes_example.cc' to demonstrate the usage of secondary RocksDB instance in a multi-process setting. Instructions to run the example can be found at the beginning of the source code.
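      A minimal sketch of running a secondary against an existing primary, assuming the `DB::OpenAsSecondary` / `TryCatchUpWithPrimary` entry points this line of work added; both paths and the key are illustrative:
      ```
      #include <string>
      
      #include "rocksdb/db.h"
      #include "rocksdb/options.h"
      
      int main() {
        rocksdb::Options options;
        options.max_open_files = -1;  // secondary mode requires this
      
        // The secondary gets its own private directory for its info log etc.
        rocksdb::DB* db = nullptr;
        rocksdb::Status s = rocksdb::DB::OpenAsSecondary(
            options, "/tmp/primary_db", "/tmp/secondary_path", &db);
        if (!s.ok()) return 1;
      
        // Tail the primary's MANIFEST to pick up new flushes/compactions.
        s = db->TryCatchUpWithPrimary();
      
        std::string value;
        s = db->Get(rocksdb::ReadOptions(), "some_key", &value);
      
        delete db;
        return 0;
      }
      ```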
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4899
      
      Differential Revision: D14510945
      
      Pulled By: riversand963
      
      fbshipit-source-id: 4ac1c5693e6012ad23f7b4b42d3c374fecbe8886
  24. 23 Feb 2019, 1 commit
  25. 20 Feb 2019, 1 commit
  26. 14 Feb 2019, 1 commit
  27. 29 Jan 2019, 1 commit
    • Change the command to invoke parallel tests (#4922) · 95604d13
      Authored by Yanqin Jin
      Summary:
      We used to call `printf $(t_run)` and later feed the result to GNU parallel in the recipe of target `check_0`. However, this approach is problematic when the length of $(t_run) exceeds the
      maximum command-line length and the `printf` command cannot be executed. Instead, we use `find -print` to avoid generating an overly long command.
      
      **This PR is actually the last commit of #4916. Prefer to merge this PR separately.**
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4922
      
      Differential Revision: D13845883
      
      Pulled By: riversand963
      
      fbshipit-source-id: b56de7f7af43337c6ec89b931de843c9667cb679
  28. 25 Jan 2019, 1 commit
  29. 24 Jan 2019, 1 commit
  30. 12 Jan 2019, 1 commit
  31. 11 Jan 2019, 1 commit
  32. 20 Dec 2018, 1 commit