1. 08 Sep, 2018 (1 commit)
  2. 07 Sep, 2018 (3 commits)
  3. 06 Sep, 2018 (2 commits)
  4. 01 Sep, 2018 (2 commits)
    • A
      Reduce empty SST creation/deletion in compaction (#4336) · 1a88c437
      Andrew Kryczka authored
      Summary:
      This is a followup to #4311. Checking `!RangeDelAggregator::IsEmpty()` before opening a dedicated range tombstone SST did not properly prevent empty SSTs from being generated. That's because it relies on `CollapsedRangeDelMap::Size`, which had an underflow bug when the map was empty. This PR fixes that underflow bug.
      
      Also fixed an uninitialized variable in db_stress.
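      The underflow described above is a classic unsigned-subtraction pitfall. A minimal sketch of the failure mode, under the assumption that `Size()` subtracts one from an unsigned entry count (the real `CollapsedRangeDelMap` details differ):

      ```cpp
      #include <cassert>
      #include <cstddef>

      // Illustrative only, not the actual CollapsedRangeDelMap code:
      // if Size() is computed as unsigned "entries - 1" (the last entry
      // acting as a sentinel), an empty map wraps to SIZE_MAX and the
      // IsEmpty() check based on it believes the map is non-empty.
      size_t BuggySize(size_t num_entries) {
        return num_entries - 1;  // underflows when num_entries == 0
      }

      size_t FixedSize(size_t num_entries) {
        return num_entries == 0 ? 0 : num_entries - 1;
      }

      int main() {
        assert(BuggySize(0) > 0);   // wrapped to SIZE_MAX: looks non-empty
        assert(FixedSize(0) == 0);  // empty map correctly reports size 0
        return 0;
      }
      ```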
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4336
      
      Differential Revision: D9600080
      
      Pulled By: ajkr
      
      fbshipit-source-id: bc6980ca79d2cd01b825ebc9dbccd51c1a70cfc7
      1a88c437
    • Y
      BlobDB: GetLiveFiles and GetLiveFilesMetadata return relative path (#4326) · 462ed70d
      Yi Wu authored
      Summary:
      `GetLiveFiles` and `GetLiveFilesMetadata` should return paths relative to the db path.
      
      How to return a relative path when `path_relative` is false is a separate issue; `DBImpl::GetLiveFiles` also doesn't handle that case when there are multiple `db_paths`.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4326
      
      Differential Revision: D9545904
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 6762d879fcb561df2b612e6fdfb4a6b51db03f5d
      462ed70d
  5. 31 Aug, 2018 (2 commits)
    • Z
      Rename DecodeCFAndKey to resolve naming conflict in unity test (#4323) · 1cf17ba5
      Zhongyi Xie authored
      Summary:
      Currently the unity test is failing because both trace_replay.cc and trace_analyzer_tool.cc define `DecodeCFAndKey` in an anonymous namespace. That would normally be fine, but the unity build concatenates all source files together, so the two definitions now conflict.
      Another issue with trace_analyzer_tool.cc is that it uses some utility functions from ldb_cmd which are not included in the Makefile for unity_test; I chose to update TESTHARNESS to include LIBOBJECTS. Feel free to comment if there is a less intrusive way to solve this.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4323
      
      Differential Revision: D9599170
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 38765b11f8e7de92b43c63bdcf43ea914abdc029
      1cf17ba5
    • Y
      BlobDB: Improve info log (#4324) · 3e801e5e
      Yi Wu authored
      Summary:
      Improve BlobDB info logs.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4324
      
      Differential Revision: D9545074
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 678ab8820a78758fee451be3b123b0680c1081df
      3e801e5e
  6. 30 Aug, 2018 (4 commits)
    • S
      Remove trace_analyzer_tool from LIB_SOURCES (#4331) · f46dd5cb
      Sagar Vemuri authored
      Summary:
      trace_analyzer_tool should only be in ANALYZER_LIB_SOURCES and not in LIB_SOURCES.
      This fixes java_test travis build failures seen in jtest.
      Blame: a6d3de4e
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4331
      
      Differential Revision: D9560377
      
      Pulled By: sagar0
      
      fbshipit-source-id: 6b9636201a920b56ee0f61e367fee5d3dca692b0
      f46dd5cb
    • W
      use atomic O_CLOEXEC when available (#4328) · d00e5de7
      Wez Furlong authored
      Summary:
      In our application we spawn helper child processes concurrently with
      opening rocksdb.  In one situation I observed that the child process had inherited
      the rocksdb lock file as well as directory handles to the rocksdb storage location.
      
      The code in env_posix takes care to set CLOEXEC but doesn't use `O_CLOEXEC` at the
      time that the files are opened which means that there is a window of opportunity
      to leak the descriptors across a fork/exec boundary.
      
      This diff introduces a helper that can conditionally set the `O_CLOEXEC` bit for
      the open call using the same logic as that in the existing helper for setting
      that flag post-open.
      
      I've preserved the post-open logic for systems that don't have `O_CLOEXEC`.
      
      I've introduced setting `O_CLOEXEC` for what appears to be a number of temporary
      or transient files and directory handles; I suspect that none of the files
      opened by Rocks are intended to be inherited by a forked child process.
      
      In one case, `fopen` is used to open a file.  I've added the use of the glibc-specific `e`
      mode to turn on `O_CLOEXEC` for this case.  While this doesn't cover all posix systems,
      it is an improvement for our common deployment system.
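      A minimal sketch of the atomic-vs-post-open pattern described above, assuming a POSIX system (illustrative, not the actual env_posix helper):

      ```cpp
      #include <cassert>
      #include <fcntl.h>
      #include <unistd.h>

      // Add O_CLOEXEC atomically at open() time when the platform defines
      // it; otherwise fall back to the racy post-open fcntl() path, which
      // leaves a window where a concurrent fork/exec can inherit the fd.
      int OpenCloexec(const char* path, int flags) {
      #ifdef O_CLOEXEC
        return open(path, flags | O_CLOEXEC, 0644);
      #else
        int fd = open(path, flags, 0644);
        if (fd >= 0) {
          fcntl(fd, F_SETFD, fcntl(fd, F_GETFD) | FD_CLOEXEC);
        }
        return fd;
      #endif
      }

      int main() {
        int fd = OpenCloexec("/dev/null", O_RDONLY);
        assert(fd >= 0);
        // The descriptor carries close-on-exec, so it won't leak to children.
        assert((fcntl(fd, F_GETFD) & FD_CLOEXEC) != 0);
        close(fd);
        return 0;
      }
      ```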
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4328
      
      Reviewed By: ajkr
      
      Differential Revision: D9553046
      
      Pulled By: wez
      
      fbshipit-source-id: acdb89f7a85ca649b22fe3c3bd76f82142bec2bf
      d00e5de7
    • M
      Avoiding write stall caused by manual flushes (#4297) · 927f2749
      Mikhail Antonov authored
      Summary:
      At the moment it is possible to cause a write stall by calling flush (either manually via DB::Flush(), or from the Backup Engine directly calling FlushMemTable()) while a background flush is already happening.
      
      One way to fix it: in DBImpl::CompactRange() we already check for a possible stall and delay the flush if needed before actually calling FlushMemTable(). We can simply move this delay logic into a separate method and call it from FlushMemTable().
      
      This is a draft patch for a first look; it still needs test/SyncPoint updates and most certainly an allow_write_stall option in FlushOptions.
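      The "delay before flushing" idea can be sketched as a small guard that waits until one more flush would not push the unflushed-memtable count over the stall threshold. All names and the threshold below are illustrative, not RocksDB's actual code:

      ```cpp
      #include <cassert>
      #include <condition_variable>
      #include <mutex>

      // Illustrative sketch: the stall-avoidance wait that CompactRange()
      // already performs, pulled into a helper FlushMemTable() could share.
      class StallGuard {
       public:
        // Block the caller until scheduling a flush would not stall writes.
        void WaitUntilFlushWouldNotStallWrites() {
          std::unique_lock<std::mutex> lk(mu_);
          cv_.wait(lk, [&] { return unflushed_memtables_ < kStallThreshold; });
        }
        void OnMemTableSealed() {
          std::lock_guard<std::mutex> lk(mu_);
          ++unflushed_memtables_;
        }
        void OnFlushFinished() {
          {
            std::lock_guard<std::mutex> lk(mu_);
            --unflushed_memtables_;
          }
          cv_.notify_all();  // wake writers waiting to flush without stalling
        }
        int unflushed() {
          std::lock_guard<std::mutex> lk(mu_);
          return unflushed_memtables_;
        }

       private:
        static constexpr int kStallThreshold = 4;  // made-up threshold
        std::mutex mu_;
        std::condition_variable cv_;
        int unflushed_memtables_ = 0;
      };

      int main() {
        StallGuard g;
        g.OnMemTableSealed();
        g.WaitUntilFlushWouldNotStallWrites();  // 1 < 4: returns immediately
        g.OnFlushFinished();
        assert(g.unflushed() == 0);
        return 0;
      }
      ```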
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4297
      
      Differential Revision: D9420705
      
      Pulled By: mikhail-antonov
      
      fbshipit-source-id: f81d206b55e1d7b39e4dc64242fdfbceeea03fcc
      927f2749
    • F
      data block hash index blog post · 5f63a89b
      Fenggang Wu authored
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4309
      
      Differential Revision: D9557843
      
      Pulled By: sagar0
      
      fbshipit-source-id: 190e4ccedfaeaacd96d945610de843f97c307540
      5f63a89b
  7. 29 Aug, 2018 (2 commits)
    • P
      Grab straggler files to explicitly import AutoHeaders · a876995e
      Philip Jameson authored
      Summary: There were a few files that were missed when AutoHeaders were moved to their own file. Add explicit loads
      
      Reviewed By: yfeldblum
      
      Differential Revision: D9499942
      
      fbshipit-source-id: 942bf3a683b8961e1b6244136f6337477dcc45af
      a876995e
    • A
      Sync CURRENT file during checkpoint (#4322) · 42733637
      Andrew Kryczka authored
      Summary: For the CURRENT file created during checkpoint, we were forgetting to `fsync` or `fdatasync` it after its creation. This PR fixes that.
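      In POSIX terms the fix is the standard write-then-fsync sequence; a minimal sketch (illustrative, not the checkpoint code itself; full durability of a newly created file also requires fsyncing its containing directory):

      ```cpp
      #include <cassert>
      #include <cstdio>
      #include <fcntl.h>
      #include <string>
      #include <unistd.h>

      // Write a small metadata file and flush it to stable storage.
      bool WriteFileDurably(const std::string& path, const std::string& contents) {
        int fd = open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) return false;
        bool ok = write(fd, contents.data(), contents.size()) ==
                  static_cast<ssize_t>(contents.size());
        ok = ok && fsync(fd) == 0;  // the step the checkpoint code was missing
        close(fd);
        return ok;
      }

      int main() {
        // Hypothetical path and contents, mimicking a CURRENT file.
        assert(WriteFileDurably("/tmp/rocksdb_current_example", "MANIFEST-000001\n"));
        remove("/tmp/rocksdb_current_example");
        return 0;
      }
      ```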
      
      Differential Revision: D9525939
      
      Pulled By: ajkr
      
      fbshipit-source-id: a505483644026ee3f501cfc0dcbe74832165b2e3
      42733637
  8. 28 Aug, 2018 (4 commits)
    • Y
      BlobDB: Avoid returning garbage value on key not found (#4321) · 38ad3c9f
      Yi Wu authored
      Summary:
      When reading an expired key via the `Get(..., std::string* value)` API, BlobDB first reads the index entry and decodes the expiration from it. Although BlobDB resets the PinnableSlice, the index entry is stored in the user-provided string `value`, so a garbage value is returned to the caller despite the status being NotFound. Fix it by using a different PinnableSlice to read the index entry.
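      The bug pattern, stripped of BlobDB specifics: decoding intermediate data into the caller's output buffer means a not-found result can still leave stale bytes in `*value`. A generic sketch with illustrative names (not BlobDB's code):

      ```cpp
      #include <cassert>
      #include <string>

      // Decode the index entry into a separate scratch buffer, never into
      // the caller's *value, so NotFound leaves *value untouched.
      bool GetExpiredKey(std::string* value) {
        const std::string index_entry = "blob_file:7|ttl:0";  // stand-in entry
        std::string scratch = index_entry;  // decode into scratch, NOT *value
        const bool expired = scratch.find("ttl:0") != std::string::npos;
        if (expired) {
          return false;  // NotFound: *value is left untouched
        }
        *value = "actual blob value";
        return true;
      }

      int main() {
        std::string value;
        assert(!GetExpiredKey(&value));  // key expired: NotFound
        assert(value.empty());           // no garbage leaked to the caller
        return 0;
      }
      ```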
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4321
      
      Differential Revision: D9519042
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: f054c951a1fa98265228be94f931904ed7056677
      38ad3c9f
    • J
      cmake: allow opting out debug runtime (#4317) · 6ed7f146
      Jay Lee authored
      Summary:
      Projects built with the debug profile don't always link against the
      debug runtime. Allow opting out of the debug runtime so that RocksDB
      gets along well with other projects.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4317
      
      Differential Revision: D9518038
      
      Pulled By: sagar0
      
      fbshipit-source-id: 384901a0d12b8de20759756e8a19b4888a27c399
      6ed7f146
    • Y
      BlobDB: Implement DisableFileDeletions (#4314) · a6d3de4e
      Yi Wu authored
      Summary:
      `DB::DisableFileDeletions` and `DB::EnableFileDeletions` let applications stop RocksDB background jobs from deleting files while they are doing replication. Implement these methods for BlobDB. `DeleteObsoleteFiles` now needs to check `disable_file_deletions_` before starting, and holds `delete_file_mutex_` the whole time it is running. `DisableFileDeletions` waits on `delete_file_mutex_` for any running `DeleteObsoleteFiles` job and then sets the `disable_file_deletions_` flag.
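      The synchronization described above reduces to a mutex-plus-flag pattern; a compact sketch with illustrative names (not BlobDB's actual members):

      ```cpp
      #include <cassert>
      #include <mutex>

      // DeleteObsoleteFiles() holds the mutex for its whole pass and checks
      // the flag first; DisableFileDeletions() takes the same mutex, so it
      // naturally waits out any in-flight deletion pass before setting it.
      class FileDeleter {
       public:
        void DisableFileDeletions() {
          std::lock_guard<std::mutex> lk(delete_file_mutex_);
          disable_file_deletions_ = true;  // blocks until a running pass ends
        }
        void EnableFileDeletions() {
          std::lock_guard<std::mutex> lk(delete_file_mutex_);
          disable_file_deletions_ = false;
        }
        int DeleteObsoleteFiles() {
          std::lock_guard<std::mutex> lk(delete_file_mutex_);  // whole pass
          if (disable_file_deletions_) return 0;
          return 3;  // pretend three obsolete files were deleted
        }

       private:
        std::mutex delete_file_mutex_;
        bool disable_file_deletions_ = false;
      };

      int main() {
        FileDeleter d;
        assert(d.DeleteObsoleteFiles() == 3);
        d.DisableFileDeletions();
        assert(d.DeleteObsoleteFiles() == 0);  // no deletions while replicating
        d.EnableFileDeletions();
        assert(d.DeleteObsoleteFiles() == 3);
        return 0;
      }
      ```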
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4314
      
      Differential Revision: D9501373
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 81064c1228f1724eff46da22b50ff765b16292cd
      a6d3de4e
    • S
      Download bzip2 packages from Internet Archive (#4306) · 2f871bc8
      Sagar Vemuri authored
      Summary:
      Since bzip.org is no longer maintained, download the bzip2 packages from a snapshot taken by the Internet Archive until we find a more authoritative source.
      
      Fixes issue: #4305
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4306
      
      Differential Revision: D9514868
      
      Pulled By: sagar0
      
      fbshipit-source-id: 57c6a141a62e652f94377efc7ca9916b458e68d5
      2f871bc8
  9. 25 Aug, 2018 (5 commits)
  10. 24 Aug, 2018 (7 commits)
    • A
      Fix clang build of db_stress (#4312) · e7bb8e9b
      Andrew Kryczka authored
      Summary:
      Blame: #4307
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4312
      
      Differential Revision: D9494093
      
      Pulled By: ajkr
      
      fbshipit-source-id: eb6be2675c08b9ab508378d45110eb0fcf260a42
      e7bb8e9b
    • A
      Digest ZSTD compression dictionary once per SST file (#4251) · 6c40806e
      Andrew Kryczka authored
      Summary:
      In RocksDB, for a given SST file, all data blocks are compressed with the same dictionary. When we compress a block using the dictionary's raw bytes, the compression library first has to digest the dictionary to get it into a usable form. This digestion work is redundant and ideally should be done once per file.
      
      ZSTD offers APIs for the caller to create and reuse a digested dictionary object (`ZSTD_CDict`). In this PR, we call `ZSTD_createCDict` once per file to digest the raw bytes. Then we use `ZSTD_compress_usingCDict` to compress each data block using the pre-digested dictionary. Once the file is created, `ZSTD_freeCDict` releases the resources held by the digested dictionary.
      
      There are a couple other changes included in this PR:
      
      - Changed the parameter object for (un)compression functions from `CompressionContext`/`UncompressionContext` to `CompressionInfo`/`UncompressionInfo`. This avoids the previous pattern, where `CompressionContext`/`UncompressionContext` had to be mutated before calling a (un)compression function depending on whether a dictionary should be used. I felt that mutation was error-prone, so I eliminated it.
      - Added support for digested uncompression dictionaries (`ZSTD_DDict`) as well. However, this PR does not support reusing them across uncompression calls for the same file. That work is deferred to a later PR when we will store the `ZSTD_DDict` objects in block cache.
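      The amortization pattern itself can be shown without libzstd; the stand-ins below mirror the `ZSTD_createCDict` / `ZSTD_compress_usingCDict` flow described above, with a toy "digest" and toy "compression" in place of the real calls:

      ```cpp
      #include <cassert>
      #include <string>

      static int g_digest_calls = 0;  // counts the expensive digestion step

      struct DigestedDict {
        std::string prepared;  // stand-in for the digested dictionary state
      };

      // Stand-in for ZSTD_createCDict: run the costly digestion once per file.
      DigestedDict DigestOnce(const std::string& raw_dict) {
        ++g_digest_calls;
        return DigestedDict{raw_dict};
      }

      // Stand-in for ZSTD_compress_usingCDict: reuse the digested form, so
      // compressing a block carries no per-block digestion cost.
      size_t CompressBlock(const std::string& block, const DigestedDict& dict) {
        return block.size() - (block == dict.prepared ? 1 : 0);
      }

      int main() {
        DigestedDict dict = DigestOnce("shared-prefix");
        for (int i = 0; i < 100; ++i) {
          CompressBlock("shared-prefix", dict);  // 100 blocks, one digestion
        }
        assert(g_digest_calls == 1);  // digested exactly once for the file
        return 0;
      }
      ```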
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4251
      
      Differential Revision: D9257078
      
      Pulled By: ajkr
      
      fbshipit-source-id: 21b8cb6bbdd48e459f1c62343780ab66c0a64438
      6c40806e
    • A
      Invoke OnTableFileCreated for empty SSTs (#4307) · ee234e83
      Andrew Kryczka authored
      Summary:
      The API comment on `OnTableFileCreationStarted` (https://github.com/facebook/rocksdb/blob/b6280d01f9f9c4305c536dfb804775fce3956280/include/rocksdb/listener.h#L331-L333) led users to believe a call to `OnTableFileCreationStarted` will always be matched with a call to `OnTableFileCreated`. However, we were skipping the `OnTableFileCreated` call in one case: no error happens but also no file is generated since there's no data.
      
      This PR adds the call to `OnTableFileCreated` for that case. The filename will be "(nil)" and the size will be zero.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4307
      
      Differential Revision: D9485201
      
      Pulled By: ajkr
      
      fbshipit-source-id: 2f077ec7913f128487aae2624c69a50762394df6
      ee234e83
    • Z
      Add the unit test of Iterator to trace_analyzer_test (#4282) · cf7150ac
      zhichao-cao authored
      Summary:
      Add the unit test of Iterator (Seek and SeekForPrev) to trace_analyzer_test. The output files after analyzing the trace file are checked to make sure that analyzing results are correct.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4282
      
      Differential Revision: D9436758
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 88d471c9a69e07382d9c6a45eba72773b171e7c2
      cf7150ac
    • G
      Adding a method for memtable class for memtable getting flushed. (#4304) · ad789e4e
      Gauresh Rane authored
      Summary:
      Memtables are selected for flushing by the flush job. Currently we
      have a listener which is invoked when memtables for a column family are
      flushed, but the notification does not indicate which memtable was
      flushed. If clients want to know whether particular data in a memtable
      has been retired, there is no straightforward way to find out. This
      method helps users who implement a memtablerep factory, and extend the
      memtablerep interface, to know when the data in a memtable has been
      retired.
      Another option we tried was to rely on the memtable destructor being
      called after the flush to mark that the data was persisted. That works,
      but there can sometimes be huge delays between the flush actually
      happening and the memtable being destroyed, so anyone waiting for the
      data to persist would have to wait that much longer.
      Implementations of this method are expected to return quickly, as the
      call blocks RocksDB.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4304
      
      Reviewed By: riversand963
      
      Differential Revision: D9472312
      
      Pulled By: gdrane
      
      fbshipit-source-id: 8e693308dee749586af3a4c5d4fcf1fa5276ea4d
      ad789e4e
    • F
      DataBlockHashIndex: avoiding expensive iiter->Next when handling hash kNoEntry (#4296) · da40d452
      Fenggang Wu authored
      Summary:
      When returning `kNoEntry` from a HashIndex lookup, we previously invalidated the
      `biter` by setting `current_=restarts_`, so that the search could continue to the
      next block in case the result resides there.
      
      There is one problem: when searching for a missing key, if the search
      finds `kNoEntry` and continues to the next block, there is also a
      non-trivial chance that the HashIndex returns `kNoEntry` again, and the
      expensive index iterator `Next()` happens several times for nothing.
      
      The solution: if the hash table returns `kNoEntry`, `SeekForGetImpl()` just searches the last restart interval for the key. It stops at the first key larger than the seek_key, or at the end of the block, and each case is handled correctly.
      
      Microbenchmark script:
      ```
      TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=fillseq,readtocache,readmissing \
                --cache_size=20000000000  --use_data_block_hash_index={true|false}
      ```
      
      `readmissing` performance (lower is better):
      ```
      binary:                      3.6098 micros/op
      hash (before applying diff): 4.1048 micros/op
      hash (after  applying diff): 3.3502 micros/op
      ```
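      The fallback scan described above can be sketched over a plain sorted key array (illustrative layout, not RocksDB's block format):

      ```cpp
      #include <cassert>
      #include <string>
      #include <vector>

      // On a hash-index kNoEntry, scan only the last restart interval of
      // the block instead of invalidating and paying for an index-iterator
      // Next(). Stop at the first key >= seek_key or at the block's end.
      int SeekInLastRestartInterval(const std::vector<std::string>& keys,
                                    size_t last_restart,
                                    const std::string& seek_key) {
        for (size_t i = last_restart; i < keys.size(); ++i) {
          if (keys[i] >= seek_key) return static_cast<int>(i);
        }
        return -1;  // past the end of the block: key is not in this block
      }

      int main() {
        std::vector<std::string> keys = {"apple", "banana", "cherry", "fig"};
        // Suppose the last restart interval starts at index 2.
        assert(SeekInLastRestartInterval(keys, 2, "cherry") == 2);  // exact hit
        assert(SeekInLastRestartInterval(keys, 2, "date") == 3);    // first larger
        assert(SeekInLastRestartInterval(keys, 2, "grape") == -1);  // not here
        return 0;
      }
      ```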
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4296
      
      Differential Revision: D9419159
      
      Pulled By: fgwu
      
      fbshipit-source-id: 21e3eedcccbc47a249aa8eb4bf405c9def0b8a05
      da40d452
    • Y
      Add path to WritableFileWriter. (#4039) · bb5dcea9
      Yanqin Jin authored
      Summary:
      We want to sample the file I/O issued by RocksDB and report the function calls. This requires us to include the file paths; otherwise it's hard to tell what has been going on.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4039
      
      Differential Revision: D8670178
      
      Pulled By: riversand963
      
      fbshipit-source-id: 97ee806d1c583a2983e28e213ee764dc6ac28f7a
      bb5dcea9
  11. 23 Aug, 2018 (2 commits)
  12. 22 Aug, 2018 (4 commits)
  13. 21 Aug, 2018 (2 commits)
    • F
      DataBlockHashIndex: Remove the division from EstimateSize() (#4293) · 6d37fdb3
      Fenggang Wu authored
      Summary:
      `BlockBasedTableBuilder::Add()` eventually calls
      `DataBlockHashIndexBuilder::EstimateSize()`. The previous implementation
      divided `num_keys` by `util_ratio_` to get an estimated `num_buckets`.
      Such a division is expensive, as it happens in every
      `BlockBasedTableBuilder::Add()`.
      
      This diff estimates `num_buckets` by double addition instead of double
      division. Specifically, in each `Add()` we add `bucket_per_key_`
      (the inverse of `util_ratio_`) to the current `estimated_num_buckets_`.
      The cost is one extra field, `estimated_num_buckets_`, stored
      in DataBlockHashIndexBuilder.
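      The division-to-addition trade can be sketched in a few lines; the names below mirror the commit's description, not the exact RocksDB fields:

      ```cpp
      #include <cassert>
      #include <cstddef>

      // Precompute bucket_per_key_ = 1 / util_ratio_ once, then accumulate
      // it with one addition per Add() instead of dividing every time.
      struct HashIndexSizeEstimator {
        double bucket_per_key_;
        double estimated_num_buckets_ = 0.0;  // the one extra field
        explicit HashIndexSizeEstimator(double util_ratio)
            : bucket_per_key_(1.0 / util_ratio) {}
        void Add() { estimated_num_buckets_ += bucket_per_key_; }
        size_t EstimateNumBuckets() const {
          return static_cast<size_t>(estimated_num_buckets_);
        }
      };

      int main() {
        HashIndexSizeEstimator est(0.5);  // util_ratio = 0.5 keys per bucket
        for (int i = 0; i < 50; ++i) est.Add();
        // Same result as the old per-Add division: 50 keys / 0.5 = 100.
        assert(est.EstimateNumBuckets() == 100);
        return 0;
      }
      ```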
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4293
      
      Differential Revision: D9412472
      
      Pulled By: fgwu
      
      fbshipit-source-id: 2925c8509a401e7bd3c1ab1d9e9c7244755c277a
      6d37fdb3
    • Y
      BlobDB: Fix expired file not being evicted (#4294) · 7188bd34
      Yi Wu authored
      Summary:
      Fix expired files not being evicted from the DB. We have a background task (previously called `CheckSeqFiles`; I renamed it to `EvictExpiredFiles`) to scan and remove expired files, but it only closed the files without marking them as expired.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4294
      
      Differential Revision: D9415984
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: eff7bf0331c52a7ccdb02318602bff7f64f3ef3d
      7188bd34