1. Aug 29, 2018 (1 commit)
    • A
      Sync CURRENT file during checkpoint (#4322) · 42733637
      Authored by Andrew Kryczka
      Summary: The CURRENT file created during checkpoint was never `fsync`ed or `fdatasync`ed after creation. This PR fixes that.
      
      Differential Revision: D9525939
      
      Pulled By: ajkr
      
      fbshipit-source-id: a505483644026ee3f501cfc0dcbe74832165b2e3
      42733637
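      The fix follows the standard durability pattern for small metadata files: write, `fsync` the file, then `fsync` the parent directory so the new directory entry itself is durable. A minimal Python sketch of that general pattern (not RocksDB's actual checkpoint code; names are illustrative):

      ```python
      import os

      def write_durably(path: str, data: bytes) -> None:
          """Write a small file and make it durable: fsync the file itself,
          then fsync its parent directory so the new entry survives a crash."""
          fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
          try:
              os.write(fd, data)
              os.fsync(fd)  # flush file contents to stable storage
          finally:
              os.close(fd)
          dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
          try:
              os.fsync(dir_fd)  # persist the directory entry itself (POSIX)
          finally:
              os.close(dir_fd)
      ```

      Skipping either `fsync` call leaves a window where a crash can lose the file or its directory entry, which is exactly the hole this commit closes for CURRENT.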
  2. Aug 28, 2018 (4 commits)
    • Y
      BlobDB: Avoid returning garbage value on key not found (#4321) · 38ad3c9f
      Authored by Yi Wu
      Summary:
      When reading an expired key via the `Get(..., std::string* value)` API, BlobDB first reads the index entry and decodes the expiration from it. Although BlobDB resets the PinnableSlice in this case, the index entry has already been stored in the user-provided string `value`, so a garbage value is returned even though the status is NotFound. Fix it by using a separate PinnableSlice to read the index entry.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4321
      
      Differential Revision: D9519042
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: f054c951a1fa98265228be94f931904ed7056677
      38ad3c9f
    • J
      cmake: allow opting out debug runtime (#4317) · 6ed7f146
      Authored by Jay Lee
      Summary:
      Projects built with the debug profile don't always link against the debug
      runtime. Allow opting out of the debug runtime so that RocksDB gets along
      well with other projects.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4317
      
      Differential Revision: D9518038
      
      Pulled By: sagar0
      
      fbshipit-source-id: 384901a0d12b8de20759756e8a19b4888a27c399
      6ed7f146
    • Y
      BlobDB: Implement DisableFileDeletions (#4314) · a6d3de4e
      Authored by Yi Wu
      Summary:
      `DB::DisableFileDeletions` and `DB::EnableFileDeletions` let applications stop RocksDB's background jobs from deleting files while they are doing replication. This PR implements these methods for BlobDB. `DeleteObsoleteFiles` now checks `disable_file_deletions_` before starting, and holds `delete_file_mutex_` the whole time it is running. `DisableFileDeletions` waits on `delete_file_mutex_` for any running `DeleteObsoleteFiles` job and sets the `disable_file_deletions_` flag.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4314
      
      Differential Revision: D9501373
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 81064c1228f1724eff46da22b50ff765b16292cd
      a6d3de4e
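      The coordination described above (a flag checked before the job starts, plus a mutex held for the job's whole duration) can be sketched with threading primitives. Names mirror the commit message but the class is illustrative, not BlobDB's actual implementation:

      ```python
      import threading

      class FileDeleter:
          def __init__(self):
              self._delete_file_mutex = threading.Lock()
              self._disable_file_deletions = 0  # counter supports nested disables

          def delete_obsolete_files(self) -> bool:
              # Hold the mutex for the whole job; bail out if deletions are disabled.
              with self._delete_file_mutex:
                  if self._disable_file_deletions > 0:
                      return False  # skipped: deletions currently disabled
                  # ... delete obsolete blob files here ...
                  return True

          def disable_file_deletions(self) -> None:
              # Acquiring the mutex waits for any running deletion job to finish.
              with self._delete_file_mutex:
                  self._disable_file_deletions += 1

          def enable_file_deletions(self) -> None:
              with self._delete_file_mutex:
                  if self._disable_file_deletions > 0:
                      self._disable_file_deletions -= 1
      ```

      Because `disable_file_deletions` must take the same mutex the deletion job holds, it cannot return while a deletion is mid-flight, which is the guarantee replication needs.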
    • S
      Download bzip2 packages from Internet Archive (#4306) · 2f871bc8
      Authored by Sagar Vemuri
      Summary:
      Since bzip.org is no longer maintained, download the bzip2 packages from a snapshot taken by the Internet Archive, until we figure out a more credible source.
      
      Fixes issue: #4305
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4306
      
      Differential Revision: D9514868
      
      Pulled By: sagar0
      
      fbshipit-source-id: 57c6a141a62e652f94377efc7ca9916b458e68d5
      2f871bc8
  3. Aug 25, 2018 (5 commits)
  4. Aug 24, 2018 (7 commits)
    • A
      Fix clang build of db_stress (#4312) · e7bb8e9b
      Authored by Andrew Kryczka
      Summary:
      Blame: #4307
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4312
      
      Differential Revision: D9494093
      
      Pulled By: ajkr
      
      fbshipit-source-id: eb6be2675c08b9ab508378d45110eb0fcf260a42
      e7bb8e9b
    • A
      Digest ZSTD compression dictionary once per SST file (#4251) · 6c40806e
      Authored by Andrew Kryczka
      Summary:
      In RocksDB, for a given SST file, all data blocks are compressed with the same dictionary. When we compress a block using the dictionary's raw bytes, the compression library first has to digest the dictionary to get it into a usable form. This digestion work is redundant and ideally should be done once per file.
      
      ZSTD offers APIs for the caller to create and reuse a digested dictionary object (`ZSTD_CDict`). In this PR, we call `ZSTD_createCDict` once per file to digest the raw bytes. Then we use `ZSTD_compress_usingCDict` to compress each data block using the pre-digested dictionary. Once the file is created, `ZSTD_freeCDict` releases the resources held by the digested dictionary.
      
      There are a couple other changes included in this PR:
      
      - Changed the parameter object for (un)compression functions from `CompressionContext`/`UncompressionContext` to `CompressionInfo`/`UncompressionInfo`. This avoids the previous pattern, where `CompressionContext`/`UncompressionContext` had to be mutated before calling a (un)compression function depending on whether a dictionary should be used. That mutation was error-prone, so I eliminated it.
      - Added support for digested uncompression dictionaries (`ZSTD_DDict`) as well. However, this PR does not support reusing them across uncompression calls for the same file. That work is deferred to a later PR when we will store the `ZSTD_DDict` objects in block cache.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4251
      
      Differential Revision: D9257078
      
      Pulled By: ajkr
      
      fbshipit-source-id: 21b8cb6bbdd48e459f1c62343780ab66c0a64438
      6c40806e
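      The per-block shared dictionary can be illustrated with the standard library's zlib preset dictionaries. Note this is only an analogue: zlib re-processes the `zdict` bytes for each compressor, whereas the PR's point is that ZSTD's `ZSTD_CDict` lets that digestion happen once per file. Dictionary bytes and sample blocks below are made up:

      ```python
      import zlib

      # One dictionary shared by every data block in the "file".
      dictionary = b"user_id=; timestamp=; event=click; event=view; " * 8

      def compress_block(block: bytes, zdict: bytes) -> bytes:
          c = zlib.compressobj(level=9, zdict=zdict)
          return c.compress(block) + c.flush()

      def decompress_block(data: bytes, zdict: bytes) -> bytes:
          d = zlib.decompressobj(zdict=zdict)
          return d.decompress(data) + d.flush()

      blocks = [b"user_id=42; timestamp=100; event=click;",
                b"user_id=43; timestamp=101; event=view;"]
      for block in blocks:
          comp = compress_block(block, dictionary)
          # Roundtrip with the shared dictionary recovers the block exactly.
          assert decompress_block(comp, dictionary) == block
      ```

      Small blocks full of substrings that also appear in the dictionary compress much better with it than alone; the PR keeps that benefit while removing the repeated dictionary-digestion cost.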
    • A
      Invoke OnTableFileCreated for empty SSTs (#4307) · ee234e83
      Authored by Andrew Kryczka
      Summary:
      The API comment on `OnTableFileCreationStarted` (https://github.com/facebook/rocksdb/blob/b6280d01f9f9c4305c536dfb804775fce3956280/include/rocksdb/listener.h#L331-L333) led users to believe a call to `OnTableFileCreationStarted` will always be matched with a call to `OnTableFileCreated`. However, we were skipping the `OnTableFileCreated` call in one case: no error happens but also no file is generated since there's no data.
      
      This PR adds the call to `OnTableFileCreated` for that case. The filename will be "(nil)" and the size will be zero.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4307
      
      Differential Revision: D9485201
      
      Pulled By: ajkr
      
      fbshipit-source-id: 2f077ec7913f128487aae2624c69a50762394df6
      ee234e83
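      The invariant the fix restores, that every `OnTableFileCreationStarted` is matched by an `OnTableFileCreated` even when no file is produced, can be sketched as follows. The listener class, flush helper, and file path are illustrative, not RocksDB's actual types:

      ```python
      class Listener:
          def __init__(self):
              self.started = 0
              self.created = []  # (file_path, file_size) per finished creation

          def on_table_file_creation_started(self):
              self.started += 1

          def on_table_file_created(self, file_path: str, file_size: int):
              self.created.append((file_path, file_size))

      def flush_memtable(listener: Listener, entries: list) -> None:
          listener.on_table_file_creation_started()
          if entries:
              # ... write the SST, then report its real path and size ...
              listener.on_table_file_created("/db/000123.sst", len(entries))
          else:
              # No error but also no data, so no file is generated. Previously
              # the "created" callback was skipped here; the fix still reports,
              # with a "(nil)" name and zero size, so callbacks stay paired.
              listener.on_table_file_created("(nil)", 0)
      ```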
    • Z
      Add the unit test of Iterator to trace_analyzer_test (#4282) · cf7150ac
      Authored by zhichao-cao
      Summary:
      Add the unit test of Iterator (Seek and SeekForPrev) to trace_analyzer_test. The output files after analyzing the trace file are checked to make sure that analyzing results are correct.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4282
      
      Differential Revision: D9436758
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 88d471c9a69e07382d9c6a45eba72773b171e7c2
      cf7150ac
    • G
      Adding a method for memtable class for memtable getting flushed. (#4304) · ad789e4e
      Authored by Gauresh Rane
      Summary:
      Memtables are selected for flushing by the flush job. Currently we have
      a listener that is invoked when the memtables for a column family are
      flushed, but it does not indicate which memtable was flushed. If clients
      want to know whether particular data in a memtable was retired, there is
      no straightforward way to find out.
      This method helps users who implement a memtablerep factory and extend
      the memtablerep interface to learn when the data in a memtable has been
      retired.
      Another option that was tried was to rely on the memtable destructor
      being called after flush to mark that the data was persisted. This works,
      but sometimes there can be huge delays between the actual flush and the
      memtable being destroyed, so anyone waiting for data to persist would
      have to wait that much longer.
      Implementations of this method are expected to return quickly, as it
      blocks RocksDB.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4304
      
      Reviewed By: riversand963
      
      Differential Revision: D9472312
      
      Pulled By: gdrane
      
      fbshipit-source-id: 8e693308dee749586af3a4c5d4fcf1fa5276ea4d
      ad789e4e
    • F
      DataBlockHashIndex: avoiding expensive iiter->Next when handling hash kNoEntry (#4296) · da40d452
      Authored by Fenggang Wu
      Summary:
      When a HashIndex lookup returned `kNoEntry`, we previously invalidated
      `biter` by setting `current_=restarts_`, so that the search could continue
      to the next block in case the result resided there.

      There is one problem: when searching for a missing key, if the search
      finds `kNoEntry` and continues to the next block, there is also a
      non-trivial chance that the HashIndex returns `kNoEntry` again, and the
      expensive index iterator `Next()` happens several times for nothing.

      The solution: if the hash table returns `kNoEntry`, `SeekForGetImpl()` just searches the last restart interval for the key. It stops at the first key larger than the seek key, or at the end of the block, and each case is handled correctly.
      
      Microbenchmark script:
      ```
      TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=fillseq,readtocache,readmissing \
                --cache_size=20000000000  --use_data_block_hash_index={true|false}
      ```
      
      `readmissing` performance (lower is better):
      ```
      binary:                      3.6098 micros/op
      hash (before applying diff): 4.1048 micros/op
      hash (after  applying diff): 3.3502 micros/op
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4296
      
      Differential Revision: D9419159
      
      Pulled By: fgwu
      
      fbshipit-source-id: 21e3eedcccbc47a249aa8eb4bf405c9def0b8a05
      da40d452
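      A toy sketch of the new fallback: on a hash-table miss, scan only the block's last restart interval instead of invalidating the iterator and paying for `iiter->Next()`. The flat key list and dict-based hash index here are illustrative stand-ins for the real block layout:

      ```python
      NO_ENTRY = object()  # stands in for kNoEntry

      def seek_in_block(keys, hash_index, restarts, seek_key):
          """keys: the sorted keys of one data block.
          restarts: start positions of the block's restart intervals.
          hash_index: maps key -> position; misses stand in for kNoEntry."""
          pos = hash_index.get(seek_key, NO_ENTRY)
          if pos is not NO_ENTRY:
              return pos
          # Hash says "no entry": rather than giving up and advancing the index
          # iterator, scan just the last restart interval. Stop at the first
          # key >= seek_key, or run off the end of the block.
          for i in range(restarts[-1], len(keys)):
              if keys[i] >= seek_key:
                  return i       # caller checks whether it is an exact match
          return len(keys)       # past the end: key may be in the next block
      ```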
    • Y
      Add path to WritableFileWriter. (#4039) · bb5dcea9
      Authored by Yanqin Jin
      Summary:
      We want to sample the file I/O issued by RocksDB and report the function calls. This requires us to include the file paths otherwise it's hard to tell what has been going on.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4039
      
      Differential Revision: D8670178
      
      Pulled By: riversand963
      
      fbshipit-source-id: 97ee806d1c583a2983e28e213ee764dc6ac28f7a
      bb5dcea9
  5. Aug 23, 2018 (2 commits)
  6. Aug 22, 2018 (4 commits)
  7. Aug 21, 2018 (5 commits)
  8. Aug 18, 2018 (3 commits)
    • J
      adds missing PopSavePoint method to Transaction (#4256) · 90f74494
      Authored by jsteemann
      Summary:
      Transaction has had methods to deal with SavePoints already, but was
      missing the PopSavePoint method provided by WriteBatch and
      WriteBatchWithIndex.
      This PR adds PopSavePoint to Transaction as well. Having the method
      at the Transaction level too is useful for applications that repeatedly
      execute a sequence of operations that normally succeed but infrequently
      need to be rolled back. Using SavePoints here is sensible, but since
      operations normally succeed, the application may pile up a lot of
      useless SavePoints inside a Transaction, leading to slightly increased
      memory usage for managing the unneeded SavePoints.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4256
      
      Differential Revision: D9326932
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 53a0af18a6c7e87feff8a56f1f3eab9df7f371d6
      90f74494
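      The semantics being added, popping the most recent savepoint without rolling back so unneeded savepoints stop accumulating, can be sketched with a write-batch-like class. This is an illustration of the concept, not RocksDB's implementation:

      ```python
      class WriteBatch:
          def __init__(self):
              self.ops = []          # applied operations, in order
              self.save_points = []  # stack: length of ops at SetSavePoint time

          def put(self, key, value):
              self.ops.append((key, value))

          def set_save_point(self):
              self.save_points.append(len(self.ops))

          def rollback_to_save_point(self):
              # Undo everything written after the most recent savepoint.
              self.ops = self.ops[: self.save_points.pop()]

          def pop_save_point(self):
              # Discard the most recent savepoint WITHOUT undoing anything,
              # releasing the bookkeeping for it (the method this PR adds).
              self.save_points.pop()
      ```

      An application that sets a savepoint per operation and pops it on success keeps the savepoint stack empty instead of letting it grow with every successful operation.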
    • C
      Add CompactRangeOptions for Java (#4220) · c7cf981a
      Authored by Christian Esken
      Summary:
      Closes https://github.com/facebook/rocksdb/issues/4195
      
      CompactRangeOptions are available in the C++ API, but not in the Java API. This PR adds CompactRangeOptions to the Java API and adds an overloaded compactRange() method. See https://github.com/facebook/rocksdb/issues/4195 for the original discussion.
      
      This change supports all fields of CompactRangeOptions, including the required enum converters in the JNI portal.
      
      Significant changes:
      - Make CompactRangeOptions available in the compactRange() for Java.
      - Deprecate other compactRange() methods that have individual option params, like in the CPP code.
      - Migrate rocksdb_compactrange_helper() to CompactRangeOptions.
      - Add Java unit tests for CompactRangeOptions.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4220
      
      Differential Revision: D9380007
      
      Pulled By: sagar0
      
      fbshipit-source-id: 6af6c334f221427f1997b33fb24c3986b092fed6
      c7cf981a
    • A
      #3865 followup for fix performance degression introduced by switching order of operands (#4284) · fa4de6e3
      Authored by Andrey Zagrebin
      Summary:
      Followup for #4266. There is one more place in **get_context.cc** where **MergeOperator::ShouldMerge** should be called with a reversed list of operands.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4284
      
      Differential Revision: D9380008
      
      Pulled By: sagar0
      
      fbshipit-source-id: 70ec26e607e5b88465e1acbdcd6c6171bd76b9f2
      fa4de6e3
  9. Aug 17, 2018 (4 commits)
  10. Aug 16, 2018 (1 commit)
    • F
      Improve point-lookup performance using a data block hash index (#4174) · 19ec44fd
      Authored by Fenggang Wu
      Summary:
      Add hash index support to data blocks, which helps reduce the CPU utilization of point-lookup operations. This feature is backward compatible with data blocks created without the hash index. It is disabled by default unless `BlockBasedTableOptions::data_block_index_type` is set to `kDataBlockBinaryAndHash`.
      
      The DB size will be bigger with the hash index option, as a hash table is added at the end of each data block. If the hash utilization ratio is 1:1, the space overhead is one byte per key. The hash table utilization ratio is adjustable via `BlockBasedTableOptions::data_block_hash_table_util_ratio`. A lower utilization ratio improves point-lookup efficiency further, but takes more space too.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4174
      
      Differential Revision: D8965914
      
      Pulled By: fgwu
      
      fbshipit-source-id: 1c6bae5d1fc39c80282d8890a72e9e67bc247198
      19ec44fd
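      The space trade-off above can be made concrete: with `util_ratio` keys per bucket on average and one byte per bucket, a block holding N keys adds roughly N / util_ratio bytes of hash table. A toy model of that arithmetic (illustrative, not the actual on-disk format):

      ```python
      import math

      def hash_index_size_bytes(num_keys: int, util_ratio: float) -> int:
          """Approximate size of the hash table appended to a data block:
          one byte per bucket, where util_ratio = keys per bucket.
          util_ratio of 1.0 means one byte of overhead per key; a lower
          ratio means more (emptier) buckets: fewer collisions, more space."""
          return math.ceil(num_keys / util_ratio)
      ```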
  11. Aug 15, 2018 (4 commits)