1. 25 Aug 2018 (5 commits)
  2. 24 Aug 2018 (7 commits)
    • Fix clang build of db_stress (#4312) · e7bb8e9b
      Committed by Andrew Kryczka
      Summary:
      Blame: #4307
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4312
      
      Differential Revision: D9494093
      
      Pulled By: ajkr
      
      fbshipit-source-id: eb6be2675c08b9ab508378d45110eb0fcf260a42
    • Digest ZSTD compression dictionary once per SST file (#4251) · 6c40806e
      Committed by Andrew Kryczka
      Summary:
      In RocksDB, for a given SST file, all data blocks are compressed with the same dictionary. When we compress a block using the dictionary's raw bytes, the compression library first has to digest the dictionary to get it into a usable form. This digestion work is redundant and ideally should be done once per file.
      
      ZSTD offers APIs for the caller to create and reuse a digested dictionary object (`ZSTD_CDict`). In this PR, we call `ZSTD_createCDict` once per file to digest the raw bytes. Then we use `ZSTD_compress_usingCDict` to compress each data block using the pre-digested dictionary. Once the file is created, `ZSTD_freeCDict` releases the resources held by the digested dictionary.
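      As a rough illustration of this API flow (a standalone sketch, not RocksDB's internal code; the function and buffer names are hypothetical):
      ```
      #include <zstd.h>

      #include <string>
      #include <vector>

      // Digest the raw dictionary once per SST file, then reuse it for
      // every data block instead of re-digesting per compression call.
      std::string CompressBlocksWithSharedDict(
          const std::string& dict, const std::vector<std::string>& blocks,
          int level) {
        ZSTD_CDict* cdict = ZSTD_createCDict(dict.data(), dict.size(), level);
        ZSTD_CCtx* cctx = ZSTD_createCCtx();
        std::string out;
        for (const auto& block : blocks) {
          std::string buf(ZSTD_compressBound(block.size()), '\0');
          size_t n = ZSTD_compress_usingCDict(cctx, &buf[0], buf.size(),
                                              block.data(), block.size(), cdict);
          if (!ZSTD_isError(n)) {
            out.append(buf.data(), n);
          }
        }
        ZSTD_freeCCtx(cctx);
        ZSTD_freeCDict(cdict);  // release once the whole file is done
        return out;
      }
      ```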
      
      There are a couple other changes included in this PR:
      
      - Changed the parameter object for (un)compression functions from `CompressionContext`/`UncompressionContext` to `CompressionInfo`/`UncompressionInfo`. This avoids the previous pattern, where `CompressionContext`/`UncompressionContext` had to be mutated before calling a (un)compression function depending on whether a dictionary should be used. I felt that mutation was error-prone, so I eliminated it.
      - Added support for digested uncompression dictionaries (`ZSTD_DDict`) as well. However, this PR does not support reusing them across uncompression calls for the same file. That work is deferred to a later PR when we will store the `ZSTD_DDict` objects in block cache.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4251
      
      Differential Revision: D9257078
      
      Pulled By: ajkr
      
      fbshipit-source-id: 21b8cb6bbdd48e459f1c62343780ab66c0a64438
    • Invoke OnTableFileCreated for empty SSTs (#4307) · ee234e83
      Committed by Andrew Kryczka
      Summary:
      The API comment on `OnTableFileCreationStarted` (https://github.com/facebook/rocksdb/blob/b6280d01f9f9c4305c536dfb804775fce3956280/include/rocksdb/listener.h#L331-L333) led users to believe a call to `OnTableFileCreationStarted` will always be matched with a call to `OnTableFileCreated`. However, we were skipping the `OnTableFileCreated` call in one case: no error happens but also no file is generated since there's no data.
      
      This PR adds the call to `OnTableFileCreated` for that case. The filename will be "(nil)" and the size will be zero.
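      For context, a listener relying on the restored pairing invariant might look like this minimal sketch (the counting logic is illustrative, not from the PR):
      ```
      #include <rocksdb/listener.h>

      #include <atomic>

      // Every OnTableFileCreationStarted() is now matched by an
      // OnTableFileCreated(), even when no data was written; in that case
      // the reported file name is "(nil)" and the size is zero.
      class CreationPairListener : public rocksdb::EventListener {
       public:
        void OnTableFileCreationStarted(
            const rocksdb::TableFileCreationBriefInfo& /*info*/) override {
          started_.fetch_add(1);
        }
        void OnTableFileCreated(
            const rocksdb::TableFileCreationInfo& info) override {
          finished_.fetch_add(1);
          // For empty outputs: info.file_path == "(nil)", info.file_size == 0.
        }

       private:
        std::atomic<int> started_{0};
        std::atomic<int> finished_{0};
      };
      ```
      Such a listener is registered via `Options::listeners` as usual.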
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4307
      
      Differential Revision: D9485201
      
      Pulled By: ajkr
      
      fbshipit-source-id: 2f077ec7913f128487aae2624c69a50762394df6
    • Add the unit test of Iterator to trace_analyzer_test (#4282) · cf7150ac
      Committed by zhichao-cao
      Summary:
      Add a unit test of the Iterator (Seek and SeekForPrev) to trace_analyzer_test. The output files produced by analyzing the trace file are checked to make sure the analysis results are correct.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4282
      
      Differential Revision: D9436758
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 88d471c9a69e07382d9c6a45eba72773b171e7c2
    • Adding a method for memtable class for memtable getting flushed. (#4304) · ad789e4e
      Committed by Gauresh Rane
      Summary:
      Memtables are selected for flushing by the flush job. Currently we
      have a listener that is invoked when the memtables of a column family
      are flushed, but that notification does not indicate which memtable
      was flushed. If clients want to know whether particular data in a
      memtable has been retired, there is no straightforward way to find out.
      This method helps users who implement a memtablerep factory and extend
      the memtablerep interface to learn when the data in a memtable has
      been retired.
      Another option we tried was to rely on the memtable destructor, which
      runs after the flush, to mark that the data was persisted. That always
      works, but there can be huge delays between the flush actually
      happening and the memtable getting destroyed, so anyone waiting for
      data to persist would have to wait that much longer.
      Implementations of this method are expected to return quickly, since
      the call blocks RocksDB.
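      A minimal sketch of a rep using the new hook (assuming the hook is the `MarkFlushed()` method on `MemTableRep`, per this PR; the delegating wrapper and the flag are illustrative):
      ```
      #include <rocksdb/memtablerep.h>

      #include <atomic>
      #include <memory>

      // Wraps another MemTableRep and records when its data was flushed.
      class FlushAwareRep : public rocksdb::MemTableRep {
       public:
        FlushAwareRep(rocksdb::Allocator* allocator,
                      std::unique_ptr<rocksdb::MemTableRep> base)
            : rocksdb::MemTableRep(allocator), base_(std::move(base)) {}

        void Insert(KeyHandle handle) override { base_->Insert(handle); }
        bool Contains(const char* key) const override {
          return base_->Contains(key);
        }
        size_t ApproximateMemoryUsage() override {
          return base_->ApproximateMemoryUsage();
        }
        Iterator* GetIterator(rocksdb::Arena* arena) override {
          return base_->GetIterator(arena);
        }

        // New hook: the memtable's data has been persisted by a flush.
        // Must return quickly, since the call blocks RocksDB.
        void MarkFlushed() override { flushed_.store(true); }

        bool flushed() const { return flushed_.load(); }

       private:
        std::unique_ptr<rocksdb::MemTableRep> base_;
        std::atomic<bool> flushed_{false};
      };
      ```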
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4304
      
      Reviewed By: riversand963
      
      Differential Revision: D9472312
      
      Pulled By: gdrane
      
      fbshipit-source-id: 8e693308dee749586af3a4c5d4fcf1fa5276ea4d
    • DataBlockHashIndex: avoiding expensive iiter->Next when handling hash kNoEntry (#4296) · da40d452
      Committed by Fenggang Wu
      Summary:
      When the HashIndex lookup returns `kNoEntry`, we previously invalidated the
      `biter` by setting `current_=restarts_`, so that the search could continue to
      the next block in case the result resides there.
      
      There is one problem: when searching for a missing key, if the search finds a
      `kNoEntry` and continues to the next block, there is also a non-trivial chance
      that the HashIndex returns `kNoEntry` there too, so the expensive index
      iterator `Next()` runs several times for nothing.
      
      The solution: if the hash table returns `kNoEntry`, `SeekForGetImpl()` just searches the last restart interval for the key. It stops at the first key larger than the seek key, or at the end of the block, and each case is handled correctly.
      
      Microbenchmark script:
      ```
      TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=fillseq,readtocache,readmissing \
                --cache_size=20000000000  --use_data_block_hash_index={true|false}
      ```
      
      `readmissing` performance (lower is better):
      ```
      binary:                      3.6098 micros/op
      hash (before applying diff): 4.1048 micros/op
      hash (after  applying diff): 3.3502 micros/op
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4296
      
      Differential Revision: D9419159
      
      Pulled By: fgwu
      
      fbshipit-source-id: 21e3eedcccbc47a249aa8eb4bf405c9def0b8a05
    • Add path to WritableFileWriter. (#4039) · bb5dcea9
      Committed by Yanqin Jin
      Summary:
      We want to sample the file I/O issued by RocksDB and report the function calls. This requires us to include the file paths; otherwise it is hard to tell what has been going on.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4039
      
      Differential Revision: D8670178
      
      Pulled By: riversand963
      
      fbshipit-source-id: 97ee806d1c583a2983e28e213ee764dc6ac28f7a
  3. 23 Aug 2018 (2 commits)
  4. 22 Aug 2018 (4 commits)
  5. 21 Aug 2018 (5 commits)
  6. 18 Aug 2018 (3 commits)
    • adds missing PopSavePoint method to Transaction (#4256) · 90f74494
      Committed by jsteemann
      Summary:
      Transaction has had methods to deal with SavePoints already, but was
      missing the PopSavePoint method provided by WriteBatch and
      WriteBatchWithIndex.
      This PR adds PopSavePoint to Transaction as well. Having the method at
      the Transaction level too is useful for applications that repeatedly
      execute a sequence of operations that normally succeed but occasionally
      need to be rolled back. Using SavePoints here is sensible, but since the
      operations normally succeed, the application may pile up a lot of
      useless SavePoints inside a Transaction, leading to slightly increased
      memory usage for managing the unneeded SavePoints.
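      A minimal usage sketch (the key and value are placeholders; error handling elided):
      ```
      #include <rocksdb/utilities/transaction.h>

      // Run one normally-successful step inside a transaction without
      // letting SavePoints pile up on the common path.
      void RunStep(rocksdb::Transaction* txn) {
        txn->SetSavePoint();
        rocksdb::Status s = txn->Put("key", "value");  // placeholder operation
        if (s.ok()) {
          // Common path: discard the now-unneeded SavePoint (new in this PR).
          txn->PopSavePoint();
        } else {
          // Rare path: undo everything done since SetSavePoint().
          txn->RollbackToSavePoint();
        }
      }
      ```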
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4256
      
      Differential Revision: D9326932
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 53a0af18a6c7e87feff8a56f1f3eab9df7f371d6
    • Add CompactRangeOptions for Java (#4220) · c7cf981a
      Committed by Christian Esken
      Summary:
      Closes https://github.com/facebook/rocksdb/issues/4195
      
      CompactRangeOptions are available in the CPP API, but not in the Java API. This PR adds CompactRangeOptions to the Java API and adds an overloaded compactRange() method. See https://github.com/facebook/rocksdb/issues/4195 for the original discussion.
      
      This change supports all fields of CompactRangeOptions, including the required enum converters in the JNI portal.
      
      Significant changes:
      - Make CompactRangeOptions available in compactRange() for Java.
      - Deprecate the other compactRange() methods that take individual option params, as in the CPP code.
      - Migrate rocksdb_compactrange_helper() to CompactRangeOptions.
      - Add Java unit tests for CompactRangeOptions.
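      For reference, the CPP-side options that the new Java class mirrors are used roughly like this (a sketch; `db` is assumed to be an open `rocksdb::DB*`):
      ```
      #include <rocksdb/db.h>
      #include <rocksdb/options.h>

      // Compact the full key range with explicit options instead of the
      // deprecated per-parameter overloads.
      rocksdb::Status CompactAll(rocksdb::DB* db) {
        rocksdb::CompactRangeOptions opts;
        opts.exclusive_manual_compaction = true;
        opts.change_level = false;
        opts.bottommost_level_compaction =
            rocksdb::BottommostLevelCompaction::kIfHaveCompactionFilter;
        // nullptr begin/end means "the whole key range".
        return db->CompactRange(opts, nullptr, nullptr);
      }
      ```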
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4220
      
      Differential Revision: D9380007
      
      Pulled By: sagar0
      
      fbshipit-source-id: 6af6c334f221427f1997b33fb24c3986b092fed6
    • #3865 followup: fix performance degradation introduced by switching order of operands (#4284) · fa4de6e3
      Committed by Andrey Zagrebin
      Summary:
      Followup for #4266. There is one more place in **get_context.cc** where **MergeOperator::ShouldMerge** should be called with a reversed list of operands (newest first).
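      For context, `MergeOperator::ShouldMerge` (added in #3865) lets an operator tell the read path to stop gathering operands early; its contract is that operands are passed newest-first. A minimal sketch of an operator that only ever needs the newest operand (illustrative, not from this PR):
      ```
      #include <rocksdb/merge_operator.h>

      #include <vector>

      // Sketch: "latest value wins" operator that can cut lookups short.
      class LatestWinsOperator : public rocksdb::MergeOperator {
       public:
        bool FullMergeV2(const MergeOperationInput& merge_in,
                         MergeOperationOutput* merge_out) const override {
          // operand_list is oldest-to-newest here; the newest wins.
          merge_out->new_value = merge_in.operand_list.back().ToString();
          return true;
        }
        bool ShouldMerge(
            const std::vector<rocksdb::Slice>& operands) const override {
          // Operands must arrive newest-first, which is what this PR fixes
          // in get_context.cc; one operand is enough for this operator.
          return operands.size() >= 1;
        }
        const char* Name() const override { return "LatestWinsOperator"; }
      };
      ```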
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4284
      
      Differential Revision: D9380008
      
      Pulled By: sagar0
      
      fbshipit-source-id: 70ec26e607e5b88465e1acbdcd6c6171bd76b9f2
  7. 17 Aug 2018 (4 commits)
  8. 16 Aug 2018 (1 commit)
    • Improve point-lookup performance using a data block hash index (#4174) · 19ec44fd
      Committed by Fenggang Wu
      Summary:
      Add hash index support to data blocks, which helps reduce the CPU utilization of point-lookup operations. This feature is backward compatible with data blocks created without the hash index. It is disabled by default unless `BlockBasedTableOptions::data_block_index_type` is set to `kDataBlockBinaryAndHash`.
      
      The DB size will be larger with the hash index option, as a hash table is added at the end of each data block. If the hash utilization ratio is 1:1, the space overhead is one byte per key. The hash table utilization ratio is adjustable via `BlockBasedTableOptions::data_block_hash_table_util_ratio`. A lower utilization ratio improves point-lookup efficiency further, but takes more space too.
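      Enabling the feature looks roughly like this (the 0.75 ratio is just an example value):
      ```
      #include <rocksdb/options.h>
      #include <rocksdb/table.h>

      rocksdb::Options MakeOptions() {
        rocksdb::BlockBasedTableOptions table_opts;
        // Build a hash index inside each data block, in addition to the
        // default binary-search (restart point) index.
        table_opts.data_block_index_type =
            rocksdb::BlockBasedTableOptions::kDataBlockBinaryAndHash;
        table_opts.data_block_hash_table_util_ratio = 0.75;  // keys : buckets

        rocksdb::Options options;
        options.table_factory.reset(
            rocksdb::NewBlockBasedTableFactory(table_opts));
        return options;
      }
      ```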
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4174
      
      Differential Revision: D8965914
      
      Pulled By: fgwu
      
      fbshipit-source-id: 1c6bae5d1fc39c80282d8890a72e9e67bc247198
  9. 15 Aug 2018 (4 commits)
  10. 14 Aug 2018 (4 commits)
    • c-api: add some missing options · d916a110
      Committed by Huachao Huang
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4267
      
      Differential Revision: D9309505
      
      Pulled By: anand1976
      
      fbshipit-source-id: eb9fee8037f4ff24dc1cdd5cc5ef41c231a03e1f
    • Add a unit test to verify iterators release data blocks after using them (#4170) · f3d91a0b
      Committed by Siying Dong
      Summary:
      Add a unit test to check that iterators release data blocks after they have moved away from them. Verify the same for compaction input iterators.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4170
      
      Differential Revision: D8962513
      
      Pulled By: siying
      
      fbshipit-source-id: 05a5b604d7d29887fb488f2cda7286f554a14407
    • RocksDB Trace Analyzer (#4091) · 999d955e
      Committed by Zhichao Cao
      Summary:
      A trace analysis framework for RocksDB.
      
      After collecting a trace with the tool from [PR #3837](https://github.com/facebook/rocksdb/pull/3837), users can use the Trace Analyzer to interpret, analyze, and characterize the collected workload.
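      For context, collecting a trace with the PR #3837 API looks along these lines (a sketch; the file path is a placeholder):
      ```
      #include <rocksdb/db.h>
      #include <rocksdb/env.h>
      #include <rocksdb/trace_reader_writer.h>

      #include <memory>
      #include <utility>

      // Record a workload into a trace file for later analysis.
      rocksdb::Status RecordTrace(rocksdb::DB* db, rocksdb::Env* env) {
        std::unique_ptr<rocksdb::TraceWriter> trace_writer;
        rocksdb::Status s = rocksdb::NewFileTraceWriter(
            env, rocksdb::EnvOptions(), "/tmp/rocksdb_trace", &trace_writer);
        if (!s.ok()) return s;
        s = db->StartTrace(rocksdb::TraceOptions(), std::move(trace_writer));
        if (!s.ok()) return s;
        // ... run the workload to be captured ...
        return db->EndTrace();
      }
      ```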
      **Input:**
      1. trace file
      2. Whole key space file
      
      **Statistics:**
      1. Access count of each operation (Get, Put, Delete, SingleDelete, DeleteRange, Merge) in each column family.
      2. Key hotness (access count) of each key
      3. Key space separation based on given prefix
      4. Key size distribution
      5. Value size distribution, if applicable
      6. Top K accessed keys
      7. QPS statistics including the average QPS and peak QPS
      8. Top K accessed prefix
      9. Query correlation analysis: output the number of occurrences of X after Y and the corresponding average time intervals
      
      **Output:**
      1. Key access heat map (either in the accessed key space or the whole key space)
      2. Trace sequence file (interprets the raw trace file into a line-based text file for future use)
      3. Time series (the key space ID and its access time)
      4. Key access count distribution
      5. Key size distribution
      6. Value size distribution (in each interval)
      7. Whole key space separation by the prefix
      8. Accessed key space separation by the prefix
      9. QPS of each operation and each column family
      10. Top K QPS and their accessed prefix range
      
      **Test:**
      1. Added unit tests analyzing Get, Put, Delete, SingleDelete, DeleteRange, and Merge
      2. Generated a trace and analyzed it
      
      **Implemented but not tested (due to the limitation of trace_replay):**
      1. Analyzing Iterator, supporting Seek() and SeekForPrev()
      2. Analyzing the number of Key found by Get
      
      **Future Work:**
      1. Support execution time analysis of each request
      2. Support cache-hit and block-read analysis for Get
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4091
      
      Differential Revision: D9256157
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: f0ceacb7eedbc43a3eee6e85b76087d7832a8fe6
    • Remove an assertion about file size (#4268) · 1b1d2643
      Committed by Yanqin Jin
      Summary:
      Due to 4ea56b1b, we should also remove the assertion in the stress
      test. This removal can be temporary, and we can add it back once we figure
      out the reason for the 0-byte SSTs.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4268
      
      Differential Revision: D9297186
      
      Pulled By: riversand963
      
      fbshipit-source-id: cebba9a68f42e815f8cf24471176d2cfdf962f63
  11. 12 Aug 2018 (1 commit)
    • Revert changes in PR #4003 (#4263) · 4ea56b1b
      Committed by Anand Ananthabhotla
      Summary:
      Revert this change. Not generating the OnTableFileCreated() notification for a 0-byte SST on flush breaks the assumption that every OnTableFileCreationStarted() notification is followed by a corresponding OnTableFileCreated().
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4263
      
      Differential Revision: D9285623
      
      Pulled By: anand1976
      
      fbshipit-source-id: 808c3dcd498b4b4f4ed4be947a29a24b2296aa8d