1. 10 Jul 2019 (1 commit)
  2. 01 Jul 2019 (1 commit)
    • MultiGet parallel IO (#5464) · 7259e28d
      Committed by anand76
      Summary:
      Enhancement to MultiGet batching to read the data blocks required for keys in a batch in parallel from disk. It uses the Env::MultiRead() API to read multiple blocks, reducing latency.
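      A minimal sketch of what the batched read looks like from the caller's side. `ReadBlocksInParallel` and the (offset, length) list are illustrative; only `ReadRequest` and `RandomAccessFile::MultiRead` come from RocksDB's `env.h`, with field names assumed from the version this PR targets:
      ```cpp
      // Hedged sketch: batch several data-block reads into one MultiRead
      // call instead of issuing one synchronous Read() per block.
      // ReadBlocksInParallel is a made-up helper; ReadRequest/MultiRead
      // are assumed from rocksdb/env.h as of this PR.
      #include <cstdint>
      #include <string>
      #include <utility>
      #include <vector>
      #include "rocksdb/env.h"

      void ReadBlocksInParallel(
          rocksdb::RandomAccessFile* file,
          const std::vector<std::pair<uint64_t, size_t>>& blocks,  // (offset, len)
          std::vector<std::string>* bufs) {
        std::vector<rocksdb::ReadRequest> reqs(blocks.size());
        bufs->resize(blocks.size());
        for (size_t i = 0; i < blocks.size(); ++i) {
          (*bufs)[i].resize(blocks[i].second);
          reqs[i].offset = blocks[i].first;
          reqs[i].len = blocks[i].second;
          reqs[i].scratch = &(*bufs)[i][0];
        }
        // One call covers all blocks; an Env that implements MultiRead with
        // parallel IO services them concurrently, so the batch pays roughly
        // one IO latency instead of one per block.
        file->MultiRead(reqs.data(), reqs.size());
        // reqs[i].status and reqs[i].result carry each block's outcome.
      }
      ```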
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5464
      
      Test Plan:
      1. make check
      2. make asan_check
      3. make asan_crash
      
      Differential Revision: D15911771
      
      Pulled By: anand1976
      
      fbshipit-source-id: 605036b9af0f90ca0020dc87c3a86b4da6e83394
  3. 20 Jun 2019 (1 commit)
  4. 31 May 2019 (2 commits)
  5. 25 May 2019 (1 commit)
  6. 30 Jan 2019 (1 commit)
  7. 17 Jan 2019 (1 commit)
  8. 10 Nov 2018 (1 commit)
    • Update all unique/shared_ptr instances to be qualified with namespace std (#4638) · dc352807
      Committed by Sagar Vemuri
      Summary:
      Ran the following commands to recursively change all the files under RocksDB:
      ```
      find . -type f -name "*.cc" -exec sed -i 's/ unique_ptr/ std::unique_ptr/g' {} +
      find . -type f -name "*.cc" -exec sed -i 's/<unique_ptr/<std::unique_ptr/g' {} +
      find . -type f -name "*.cc" -exec sed -i 's/ shared_ptr/ std::shared_ptr/g' {} +
      find . -type f -name "*.cc" -exec sed -i 's/<shared_ptr/<std::shared_ptr/g' {} +
      ```
      Running `make format` updated some formatting on the files touched.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4638
      
      Differential Revision: D12934992
      
      Pulled By: sagar0
      
      fbshipit-source-id: 45a15d23c230cdd64c08f9c0243e5183934338a8
  9. 16 Oct 2018 (1 commit)
  10. 13 Oct 2018 (1 commit)
    • Add listener to sample file io (#3933) · 729a617b
      Committed by Yanqin Jin
      Summary:
      We would like to collect file-system-level statistics, including file name, offset, length, return code, latency, etc., which requires adding callbacks to intercept file IO function calls while RocksDB is running.
      To collect these statistics, users can inherit from the class `EventListener`, as in `TestFileOperationListener`. Note that `TestFileOperationListener::ShouldBeNotifiedOnFileIO()` returns true.
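      A minimal sketch of such a listener, loosely modeled on `TestFileOperationListener`; the counters are illustrative, and `FileOperationInfo`'s exact fields are assumed from the listener API this PR adds:
      ```cpp
      // Hedged sketch of a file-IO sampling listener; counters are
      // illustrative, FileOperationInfo fields assumed from this PR.
      #include <atomic>
      #include <cstdint>
      #include <iostream>
      #include "rocksdb/listener.h"

      class FileIoSampler : public rocksdb::EventListener {
       public:
        // Opt in to per-operation file IO notifications.
        bool ShouldBeNotifiedOnFileIO() override { return true; }

        void OnFileReadFinish(const rocksdb::FileOperationInfo& info) override {
          reads_.fetch_add(1, std::memory_order_relaxed);
          if (!info.status.ok()) {  // sample failures with their location
            std::cerr << "read failed: " << info.path << " offset=" << info.offset
                      << " len=" << info.length << "\n";
          }
        }

        void OnFileWriteFinish(const rocksdb::FileOperationInfo& /*info*/) override {
          writes_.fetch_add(1, std::memory_order_relaxed);
        }

       private:
        std::atomic<uint64_t> reads_{0};
        std::atomic<uint64_t> writes_{0};
      };
      ```
      Registering an instance in `DBOptions::listeners` then samples every read and write the DB performs.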
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/3933
      
      Differential Revision: D10219571
      
      Pulled By: riversand963
      
      fbshipit-source-id: 7acc577a2d31097766a27adb6f78eaf8b1e8ff15
  11. 24 Aug 2018 (1 commit)
  12. 14 Aug 2018 (1 commit)
    • RocksDB Trace Analyzer (#4091) · 999d955e
      Committed by Zhichao Cao
      Summary:
      A framework for trace analysis in RocksDB
      
      After collecting a trace with the tool from [PR #3837](https://github.com/facebook/rocksdb/pull/3837), users can use the Trace Analyzer to interpret, analyze, and characterize the collected workload.
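      For context, a minimal sketch of producing such a trace file with the tracing API from PR #3837; `CollectTrace` is a made-up wrapper around the `NewFileTraceWriter`/`StartTrace`/`EndTrace` calls:
      ```cpp
      // Hedged sketch: wrap the tracing API to capture a workload into a
      // file the analyzer can consume. CollectTrace is a made-up helper.
      #include <memory>
      #include <string>
      #include <utility>
      #include "rocksdb/db.h"
      #include "rocksdb/trace_reader_writer.h"

      rocksdb::Status CollectTrace(rocksdb::DB* db, const std::string& path) {
        std::unique_ptr<rocksdb::TraceWriter> writer;
        rocksdb::Status s = rocksdb::NewFileTraceWriter(
            rocksdb::Env::Default(), rocksdb::EnvOptions(), path, &writer);
        if (!s.ok()) return s;
        s = db->StartTrace(rocksdb::TraceOptions(), std::move(writer));
        if (!s.ok()) return s;
        // ... run the workload to be captured ...
        return db->EndTrace();  // flush and close the trace file
      }
      ```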
      **Input:**
      1. Trace file
      2. Whole key space file
      
      **Statistics:**
      1. Access count of each operation (Get, Put, Delete, SingleDelete, DeleteRange, Merge) in each column family.
      2. Key hotness (access count) of each key
      3. Key space separation based on a given prefix
      4. Key size distribution
      5. Value size distribution, if applicable
      6. Top K accessed keys
      7. QPS statistics, including the average QPS and peak QPS
      8. Top K accessed prefixes
      9. Query correlation analysis: output the number of occurrences of X after Y and the corresponding average time intervals
      
      **Output:**
      1. Key access heat map (either in the accessed key space or the whole key space)
      2. Trace sequence file (interprets the raw trace file into a line-based text file for future use)
      3. Time series (the key space ID and its access time)
      4. Key access count distribution
      5. Key size distribution
      6. Value size distribution (in each interval)
      7. Whole key space separation by the prefix
      8. Accessed key space separation by the prefix
      9. QPS of each operation and each column family
      10. Top K QPS and their accessed prefix ranges
      
      **Test:**
      1. Added unit tests for analyzing Get, Put, Delete, SingleDelete, DeleteRange, and Merge
      2. Generated a trace and analyzed it
      
      **Implemented but not tested (due to the limitation of trace_replay):**
      1. Iterator analysis, supporting Seek() and SeekForPrev()
      2. Analyzing the number of keys found by Get
      
      **Future Work:**
      1.  Support execution-time analysis of each request
      2.  Support analysis of cache hits and block reads for Get
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4091
      
      Differential Revision: D9256157
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: f0ceacb7eedbc43a3eee6e85b76087d7832a8fe6
  13. 21 Jul 2018 (1 commit)
    • BlockBasedTableReader: automatically adjust tail prefetch size (#4156) · 8425c8bd
      Committed by Siying Dong
      Summary:
      Right now we use one hard-coded prefetch size to prefetch data from the tail of SST files. However, this may waste reads for some use cases while being inefficient for others.
      Introduce a way to adjust this prefetch size by tracking the sizes used in the 32 most recent reads, and picking a value with which the wasted read is less than 10%.
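      A minimal sketch of one way to implement that heuristic, assuming a fixed window of 32 samples; `TailPrefetchSizer` is illustrative, not the PR's actual implementation:
      ```cpp
      // Hedged sketch of the heuristic: remember the tail bytes actually
      // used by the last 32 table opens, and suggest the largest prefetch
      // size whose wasted bytes would stay under ~10%. Illustrative only.
      #include <algorithm>
      #include <array>
      #include <cstddef>

      class TailPrefetchSizer {
       public:
        void Record(size_t used_bytes) {
          history_[next_ % history_.size()] = used_bytes;
          ++next_;
        }

        size_t Suggest() const {
          size_t n = std::min(next_, history_.size());
          if (n == 0) return 4 * 1024;  // no data yet: small default
          std::array<size_t, 32> sorted = history_;
          std::sort(sorted.begin(), sorted.begin() + n);
          size_t best = sorted[0];
          for (size_t i = 0; i < n; ++i) {
            // If we prefetched `candidate` bytes for each recent file,
            // waste = prefetched - used, summed over the window.
            size_t candidate = sorted[i];
            size_t total = candidate * n, used = 0;
            for (size_t j = 0; j < n; ++j) used += std::min(sorted[j], candidate);
            if (total - used <= total / 10) best = candidate;  // <=10% waste
          }
          return best;
        }

       private:
        std::array<size_t, 32> history_{};
        size_t next_ = 0;
      };
      ```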
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4156
      
      Differential Revision: D8916847
      
      Pulled By: siying
      
      fbshipit-source-id: 8413f9eb3987e0033ed0bd910f83fc2eeaaf5758
  14. 12 Jul 2018 (1 commit)
  15. 22 Jun 2018 (1 commit)
    • Improve direct IO range scan performance with readahead (#3884) · 7103559f
      Committed by Sagar Vemuri
      Summary:
      This PR extends the improvements in #3282 to also work when using Direct IO.
      We see **4.5X performance improvement** in seekrandom benchmark doing long range scans, when using direct reads, on flash.
      
      **Description:**
      This change improves the performance of iterators doing long range scans (e.g. big/full index or table scans in MyRocks) by using readahead: on each disk IO it prefetches additional data and stores it in a local buffer. This prefetching is automatically enabled once more than 2 IOs for the same table file are observed during iteration. The readahead size starts at 8KB and is exponentially increased on each additional sequential IO, up to a maximum of 256KB. This helps cut down the number of IOs needed to complete the range scan.
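      A minimal sketch of the ramp-up just described (enable after 2 IOs, start at 8KB, double up to 256KB); the class and its wiring are illustrative:
      ```cpp
      // Hedged sketch of the ramp-up: readahead turns on after 2 IOs on a
      // table file, starts at 8KB, and doubles per sequential IO up to 256KB.
      #include <algorithm>
      #include <cstddef>

      class ReadaheadPolicy {
       public:
        // Called once per sequential read on the same table file; returns
        // how many bytes to prefetch for this IO (0 = readahead still off).
        size_t Next() {
          if (++num_sequential_io_ <= 2) return 0;  // enable after 2 IOs
          size_t n = readahead_size_;
          readahead_size_ = std::min<size_t>(n * 2, 256 * 1024);  // cap at 256KB
          return n;
        }

       private:
        size_t num_sequential_io_ = 0;
        size_t readahead_size_ = 8 * 1024;  // starting readahead size
      };
      ```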
      
      **Implementation Details:**
      - Used `FilePrefetchBuffer` as the underlying buffer to store the readahead data. `FilePrefetchBuffer` can now take file_reader, readahead_size and max_readahead_size as input to the constructor, and automatically do readahead.
      - `FilePrefetchBuffer::TryReadFromCache` can now call `FilePrefetchBuffer::Prefetch` if readahead is enabled.
      - `AlignedBuffer` (which is the underlying store for `FilePrefetchBuffer`) now takes a few additional args in `AlignedBuffer::AllocateNewBuffer` to allow copying data from the old buffer.
      - Made sure not to re-read from the device partial chunks of data that were already available in the buffer.
      - Fixed a couple of cases where `AlignedBuffer::cursize_` was not being properly kept up-to-date.
      
      **Constraints:**
      - Similar to #3282, this is currently enabled only when ReadOptions.readahead_size = 0 (which is the default value).
      - Since the prefetched data is stored in a temporary buffer allocated on the heap, this could increase memory usage if you have many iterators doing long range scans simultaneously.
      - Enabled only for user reads, and disabled for compactions. Compaction reads are controlled by the options `use_direct_io_for_flush_and_compaction` and `compaction_readahead_size`, and the current feature takes precautions not to mess with them.
      
      **Benchmarks:**
      I used the same benchmark as used in #3282.
      Data fill:
      ```
      TEST_TMPDIR=/data/users/$USER/benchmarks/iter ./db_bench -benchmarks=fillrandom -num=1000000000 -compression_type="none" -level_compaction_dynamic_level_bytes
      ```
      
      Do a long range scan: seekrandom with a large number of nexts
      ```
      TEST_TMPDIR=/data/users/$USER/benchmarks/iter ./db_bench -benchmarks=seekrandom -use_direct_reads -duration=60 -num=1000000000 -use_existing_db -seek_nexts=10000 -statistics -histogram
      ```
      
      ```
      Before:
      seekrandom   :   37939.906 micros/op 26 ops/sec;   29.2 MB/s (1636 of 1999 found)
      With this change:
      seekrandom   :   8527.720 micros/op 117 ops/sec;  129.7 MB/s (6530 of 7999 found)
      ```
      ~4.5X perf improvement, averaged over 3 runs.
      Closes https://github.com/facebook/rocksdb/pull/3884
      
      Differential Revision: D8082143
      
      Pulled By: sagar0
      
      fbshipit-source-id: 4d7a8561cbac03478663713df4d31ad2620253bb
  16. 21 Jun 2018 (1 commit)
  17. 15 May 2018 (1 commit)
  18. 27 Mar 2018 (1 commit)
  19. 04 Nov 2017 (1 commit)
    • util: Fix coverity issues · 4c8f3364
      Committed by Prashant D
      Summary:
      util/concurrent_arena.h:
      CID 1396145 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
      2. uninit_member: Non-static class member free_begin_ is not initialized in this constructor nor in any functions that it calls.
       94    Shard() : allocated_and_unused_(0) {}
      
      util/dynamic_bloom.cc:
      	1. Condition hash_func == NULL, taking true branch.
      
      CID 1322821 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
      3. uninit_member: Non-static class member data_ is not initialized in this constructor nor in any functions that it calls.
      47      hash_func_(hash_func == nullptr ? &BloomHash : hash_func) {}
      48
      
      util/file_reader_writer.h:
      204 private:
      205  AlignedBuffer buffer_;
         	member_not_init_in_gen_ctor: The compiler-generated constructor for this class does not initialize buffer_offset_.
      206  uint64_t buffer_offset_;
      
      CID 1418246 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
      member_not_init_in_gen_ctor: The compiler-generated constructor for this class does not initialize buffer_len_.
      207  size_t buffer_len_;
      208};
      
      util/thread_local.cc:
      341#endif
      
      CID 1322795 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
      3. uninit_member: Non-static class member pthread_key_ is not initialized in this constructor nor in any functions that it calls.
      342}
      
      40struct ThreadData {
         	2. uninit_member: Non-static class member next is not initialized in this constructor nor in any functions that it calls.
      
      CID 1400668 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
      4. uninit_member: Non-static class member prev is not initialized in this constructor nor in any functions that it calls.
       41  explicit ThreadData(ThreadLocalPtr::StaticMeta* _inst) : entries(), inst(_inst) {}
       42  std::vector<Entry> entries;
         	1. member_decl: Class member declaration for next.
       43  ThreadData* next;
         	3. member_decl: Class member declaration for prev.
       44  ThreadData* prev;
       45  ThreadLocalPtr::StaticMeta* inst;
       46};
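      The fix pattern for all of the above is the same: give every flagged member an explicit initializer. A hedged sketch using the `ThreadData` case, with `ThreadLocalPtr::StaticMeta` reduced to an opaque pointer:
      ```cpp
      // Hedged sketch of the fix: initialize next/prev in the constructor
      // so no code path can observe them uninitialized. Mirrors the
      // ThreadData snippet above; StaticMeta is elided to void* here.
      #include <vector>

      struct Entry { void* ptr = nullptr; };

      struct ThreadData {
        explicit ThreadData(void* _inst)
            : entries(), next(nullptr), prev(nullptr), inst(_inst) {}
        std::vector<Entry> entries;
        ThreadData* next;   // previously left uninitialized
        ThreadData* prev;   // previously left uninitialized
        void* inst;
      };
      ```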
      Closes https://github.com/facebook/rocksdb/pull/3123
      
      Differential Revision: D6233566
      
      Pulled By: sagar0
      
      fbshipit-source-id: aa2068790ea69787a0035c0db39d59b0c25108db
  20. 01 Nov 2017 (1 commit)
  21. 12 Aug 2017 (1 commit)
    • Support prefetch last 512KB with direct I/O in block based file reader · 666a005f
      Committed by Siying Dong
      Summary:
      Right now, if direct I/O is enabled, prefetching the last 512KB cannot be applied, except for compaction inputs or when readahead is enabled for iterators. This can create a lot of I/O in HDD cases. To solve the problem, the last 512KB is prefetched in the block-based table reader if direct I/O is enabled. The prefetched buffer is passed in together with the random access file reader, so that we try to read from the buffer before reading from the file. This can be extended in the future to support flexible user iterator readahead too.
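      A minimal sketch of the buffer-first read path just described; `TailBuffer` and `ReadWithTail` are illustrative stand-ins for the prefetch buffer plumbing, not RocksDB's actual classes:
      ```cpp
      // Hedged sketch of the read path: serve reads from the prefetched
      // tail buffer when they fall inside it, else go to the file. The
      // types are illustrative stand-ins, not RocksDB's internal classes.
      #include <cstdint>
      #include <cstring>
      #include <string>
      #include "rocksdb/env.h"

      struct TailBuffer {
        uint64_t offset = 0;  // file offset where the buffered tail starts
        std::string data;     // the prefetched tail bytes

        bool TryRead(uint64_t off, size_t n, char* scratch) const {
          if (off < offset || off + n > offset + data.size()) return false;
          memcpy(scratch, data.data() + (off - offset), n);
          return true;
        }
      };

      rocksdb::Status ReadWithTail(rocksdb::RandomAccessFile* file,
                                   const TailBuffer& tail, uint64_t off,
                                   size_t n, rocksdb::Slice* result,
                                   char* scratch) {
        if (tail.TryRead(off, n, scratch)) {  // buffer hit: no IO needed
          *result = rocksdb::Slice(scratch, n);
          return rocksdb::Status::OK();
        }
        return file->Read(off, n, result, scratch);  // miss: read the file
      }
      ```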
      Closes https://github.com/facebook/rocksdb/pull/2708
      
      Differential Revision: D5593091
      
      Pulled By: siying
      
      fbshipit-source-id: ee36ff6d8af11c312a2622272b21957a7b5c81e7
  22. 16 Jul 2017 (1 commit)
  23. 29 Jun 2017 (1 commit)
    • Improve Status message for block checksum mismatches · 397ab111
      Committed by Mike Kolupaev
      Summary:
      We've got some DBs where iterators return Status with message "Corruption: block checksum mismatch" all the time. That's not very informative. It would be much easier to investigate if the error message contained the file name - then we would know e.g. how old the corrupted file is, which would be very useful for finding the root cause. This PR adds file name, offset and other stuff to some block corruption-related status messages.
      
      It doesn't improve all the error messages, just a few that were easy to improve. I'm mostly interested in "block checksum mismatch" and "Bad table magic number" since they're the only corruption errors that I've ever seen in the wild.
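      A minimal sketch of the enriched message, as an illustrative helper rather than the exact call site:
      ```cpp
      // Hedged sketch: the same Corruption status, now carrying the file
      // name and offset. Illustrative helper, not the exact call site.
      #include <cinttypes>
      #include <cstdio>
      #include <string>
      #include "rocksdb/status.h"

      rocksdb::Status ChecksumMismatch(uint32_t expected, uint32_t actual,
                                       const std::string& file_name,
                                       uint64_t offset) {
        char msg[256];
        snprintf(msg, sizeof(msg),
                 "block checksum mismatch: expected %" PRIu32 ", got %" PRIu32
                 " in %s offset %" PRIu64,
                 expected, actual, file_name.c_str(), offset);
        return rocksdb::Status::Corruption(msg);
      }
      ```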
      Closes https://github.com/facebook/rocksdb/pull/2507
      
      Differential Revision: D5345702
      
      Pulled By: al13n321
      
      fbshipit-source-id: fc8023d43f1935ad927cef1b9c55481ab3cb1339
  24. 14 Jun 2017 (1 commit)
  25. 13 Jun 2017 (1 commit)
    • Make direct I/O write use incremental buffer · 0175d58c
      Committed by Siying Dong
      Summary:
      Currently for direct I/O, the large maximum buffer is always allocated. This is wasteful if users flush the data in much smaller chunks. This diff fixes that by changing how the incremental buffer works: when we enlarge the buffer, we copy the existing data in the buffer to the enlarged buffer rather than flushing the buffer first. This ensures that no extra I/O is introduced by buffer enlargement.
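      A minimal sketch of the enlarge-with-copy behavior, with `AlignedBuffer` reduced to a plain struct; names and layout are illustrative:
      ```cpp
      // Hedged sketch of enlarge-with-copy: move pending bytes into the
      // bigger aligned allocation instead of flushing them first, so the
      // enlargement itself costs no IO. Simplified stand-in for
      // AlignedBuffer::AllocateNewBuffer.
      #include <cstddef>
      #include <cstdlib>
      #include <cstring>

      struct GrowableAlignedBuffer {
        char* buf = nullptr;
        size_t capacity = 0;
        size_t size = 0;  // pending, unflushed bytes

        // new_capacity must be a multiple of alignment (direct IO requires
        // aligned lengths anyway).
        void Enlarge(size_t new_capacity, size_t alignment) {
          char* nb = static_cast<char*>(std::aligned_alloc(alignment, new_capacity));
          if (size > 0) std::memcpy(nb, buf, size);  // carry pending data over
          std::free(buf);
          buf = nb;
          capacity = new_capacity;
        }
      };
      ```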
      Closes https://github.com/facebook/rocksdb/pull/2403
      
      Differential Revision: D5178403
      
      Pulled By: siying
      
      fbshipit-source-id: a8fe1e7304bdb8cab2973340022fe80ff83449fd
  26. 12 Jun 2017 (1 commit)
  27. 11 May 2017 (1 commit)
  28. 28 Apr 2017 (1 commit)
  29. 27 Apr 2017 (1 commit)
  30. 15 Apr 2017 (1 commit)
  31. 15 Mar 2017 (1 commit)
  32. 03 Mar 2017 (1 commit)
    • Statistic for how often rate limiter is drained · 7c80a6d7
      Committed by Andrew Kryczka
      Summary:
      This is the metric I plan to use for adaptive rate limiting. The statistics are updated only if the rate limiter is drained by flush or compaction. I believe (but am not certain) that this is the normal case.
      
      The Statistics object is passed in RateLimiter::Request() to avoid requiring changes to client code, which would've been necessary if we passed it in the RateLimiter constructor.
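      A minimal sketch of a call site, assuming the three-argument `Request()` overload this PR describes:
      ```cpp
      // Hedged sketch: pass Statistics at the call site so the limiter can
      // bump its drain tickers without a constructor change.
      #include <cstdint>
      #include "rocksdb/env.h"
      #include "rocksdb/rate_limiter.h"
      #include "rocksdb/statistics.h"

      void ChargeWrite(rocksdb::RateLimiter* limiter, int64_t bytes,
                       rocksdb::Statistics* stats) {
        // Blocks until `bytes` of quota is available; `stats`, if non-null,
        // lets the limiter record how often callers drain it.
        limiter->Request(bytes, rocksdb::Env::IO_LOW, stats);
      }
      ```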
      Closes https://github.com/facebook/rocksdb/pull/1946
      
      Differential Revision: D4646489
      
      Pulled By: ajkr
      
      fbshipit-source-id: d8e0161
  33. 17 Feb 2017 (1 commit)
  34. 14 Jan 2017 (1 commit)
  35. 12 Jan 2017 (1 commit)
    • direct reads refactor · dc2584ee
      Committed by Aaron Gao
      Summary:
      Direct I/O reads refactoring:
      remove unnecessary classes and unify the interfaces.
      Tested with db_bench.

      More changes are needed for options and per-file ON/OFF control.
      Since disabled is the default, it should be fine for now.
      Closes https://github.com/facebook/rocksdb/pull/1636
      
      Differential Revision: D4307189
      
      Pulled By: lightmark
      
      fbshipit-source-id: 6991e22
  36. 23 Dec 2016 (1 commit)
    • direct io write support · 972f96b3
      Committed by Aaron Gao
      Summary:
      RocksDB direct I/O support
      
      ```
      [gzh@dev11575.prn2 ~/rocksdb] ./db_bench -benchmarks=fillseq --num=1000000
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      RocksDB:    version 5.0
      Date:       Wed Nov 23 13:17:43 2016
      CPU:        40 * Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
      CPUCache:   25600 KB
      Keys:       16 bytes each
      Values:     100 bytes each (50 bytes after compression)
      Entries:    1000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    110.6 MB (estimated)
      FileSize:   62.9 MB (estimated)
      Write rate: 0 bytes/second
      Compression: Snappy
      Memtablerep: skip_list
      Perf Level: 1
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      ------------------------------------------------
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      DB path: [/tmp/rocksdbtest-112628/dbbench]
      fillseq      :       4.393 micros/op 227639 ops/sec;   25.2 MB/s
      
      [gzh@dev11575.prn2 ~/roc
      ```
      Closes https://github.com/facebook/rocksdb/pull/1564
      
      Differential Revision: D4241093
      
      Pulled By: lightmark
      
      fbshipit-source-id: 98c29e3
  37. 17 Dec 2016 (1 commit)
  38. 06 Aug 2016 (1 commit)
  39. 10 Feb 2016 (1 commit)