1. 13 Jan, 2017: 1 commit
  2. 12 Jan, 2017: 2 commits
    • direct reads refactor · dc2584ee
      Aaron Gao committed
      Summary:
      Direct IO reads refactoring.
      Removed unnecessary classes and unified the interfaces.
      Tested with db_bench.
      
      More changes are needed for the options and for turning direct IO on/off for different files.
      Since it is disabled by default, this should be fine for now.
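      Direct IO reads differ from buffered reads mainly in their alignment rules: with O_DIRECT, the file offset, the length and the buffer address generally have to be multiples of the device's logical block size. The sketch below shows the offset arithmetic such a read path needs. The function names and the 4096-byte alignment are assumptions for illustration, not the interfaces this commit introduced.

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Hypothetical helpers (not RocksDB's API). `alignment` is assumed
      // to be a power of two, e.g. a 4096-byte logical block size.
      inline uint64_t AlignDown(uint64_t x, uint64_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        return x & ~(alignment - 1);
      }

      inline uint64_t AlignUp(uint64_t x, uint64_t alignment) {
        return AlignDown(x + alignment - 1, alignment);
      }

      // The aligned window that must actually be read to serve a user
      // request for [offset, offset + n).
      struct AlignedRead {
        uint64_t file_offset;  // aligned start of the direct read
        uint64_t length;       // aligned number of bytes to read
        uint64_t skip;         // user data starts this far into the buffer
      };

      inline AlignedRead PlanDirectRead(uint64_t offset, uint64_t n,
                                        uint64_t alignment) {
        const uint64_t start = AlignDown(offset, alignment);
        const uint64_t end = AlignUp(offset + n, alignment);
        return AlignedRead{start, end - start, offset - start};
      }
      ```

      For example, a 100-byte read at offset 5000 becomes one aligned 4096-byte read at offset 4096, and the user's bytes start 904 bytes into the buffer.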
      Closes https://github.com/facebook/rocksdb/pull/1636
      
      Differential Revision: D4307189
      
      Pulled By: lightmark
      
      fbshipit-source-id: 6991e22
    • Guarding extra fallocate call with TRAVIS because it's not working properly on travis · 62384ebe
      Anirban Rahut committed
      Summary:
      
      There is some old code in PosixWritableFile::Close() which
      truncates the file to the measured size and then does an extra fallocate
      with KEEP_SIZE. It is commented as a failsafe because in some
      cases ftruncate doesn't do the right job (I don't know of an instance of
      this, btw). An fallocate with KEEP_SIZE should not increase
      the file size. However, on the Travis worker, which runs in Docker (likely on AUFS),
      this is not working. Comments on the web show that the AUFS
      author initially had not implemented fallocate and added it later,
      so it is not clear how good that implementation is.
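      The sequence being guarded can be sketched as follows (Linux-only; the function names are hypothetical, not PosixWritableFile's code). FALLOC_FL_KEEP_SIZE may reserve blocks past the end of the file but must not change st_size, so on a well-behaved filesystem the reported size stays at the truncated length.

      ```cpp
      #include <cassert>
      #include <cstdlib>     // mkstemp
      #include <fcntl.h>     // fallocate, FALLOC_FL_KEEP_SIZE (glibc, _GNU_SOURCE;
                             // g++ on Linux defines _GNU_SOURCE by default)
      #include <sys/stat.h>  // fstat
      #include <unistd.h>    // ftruncate, write, close, unlink

      // Truncate to `size`, then issue the "failsafe" KEEP_SIZE fallocate,
      // and report the resulting st_size (-1 on error).
      long long TruncateThenKeepSize(int fd, off_t size, off_t prealloc_bytes) {
        assert(fd >= 0 && size >= 0 && prealloc_bytes > 0);
        if (ftruncate(fd, size) != 0) return -1;
        // Errors (e.g. EOPNOTSUPP where fallocate is unsupported) are
        // ignored here because the call is only a failsafe.
        (void)fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, prealloc_bytes);
        struct stat st;
        if (fstat(fd, &st) != 0) return -1;
        return static_cast<long long>(st.st_size);
      }

      // Usage: write 10 bytes, truncate to 4, preallocate 1 MiB past EOF.
      // With a correct KEEP_SIZE implementation this returns 4.
      long long DemoKeepSize() {
        char path[] = "/tmp/keep_size_demoXXXXXX";
        int fd = mkstemp(path);
        if (fd < 0) return -1;
        long long result = -1;
        if (write(fd, "0123456789", 10) == 10) {
          result = TruncateThenKeepSize(fd, 4, 1 << 20);
        }
        close(fd);
        unlink(path);
        return result;
      }
      ```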
      Closes https://github.com/facebook/rocksdb/pull/1765
      
      Differential Revision: D4401340
      
      Pulled By: anirbanr-fb
      
      fbshipit-source-id: e2d8100
  3. 11 Jan, 2017: 1 commit
    • Allow incrementing refcount on cache handles · fe395fb6
      Andrew Kryczka committed
      Summary:
      Previously the only way to increment a handle's refcount was to invoke Lookup(), which (1) did a hash table lookup to get the cache handle and (2) incremented that handle's refcount. For a future DeleteRange optimization, I added a function, Ref(), for when the caller already has a cache handle and only needs to do (2).
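      The difference between the two paths can be illustrated with a toy refcounted cache. ToyCache and its methods are hypothetical and deliberately simplified (no eviction, no sharding, no locking); they are not RocksDB's Cache interface.

      ```cpp
      #include <cassert>
      #include <memory>
      #include <string>
      #include <unordered_map>

      class ToyCache {
       public:
        struct Handle {
          std::string value;
          int refs = 0;
        };

        // Inserts or overwrites `key`; the returned handle carries one reference.
        Handle* Insert(const std::string& key, const std::string& value) {
          std::unique_ptr<Handle>& slot = table_[key];
          if (!slot) slot = std::make_unique<Handle>();
          slot->value = value;
          ++slot->refs;
          return slot.get();
        }

        // Step (1): hash table lookup; step (2): increment the refcount.
        Handle* Lookup(const std::string& key) {
          auto it = table_.find(key);
          if (it == table_.end()) return nullptr;
          ++it->second->refs;
          return it->second.get();
        }

        // Step (2) only: the caller already holds `h`, so no hash probe.
        void Ref(Handle* h) { ++h->refs; }

        void Release(Handle* h) {
          assert(h->refs > 0);
          --h->refs;  // a real cache would free or evict at zero; the toy does not
        }

       private:
        std::unordered_map<std::string, std::unique_ptr<Handle>> table_;
      };
      ```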
      Closes https://github.com/facebook/rocksdb/pull/1761
      
      Differential Revision: D4397114
      
      Pulled By: ajkr
      
      fbshipit-source-id: 9addbe5
  4. 10 Jan, 2017: 1 commit
    • Fix Windows environment issues · 3c233ca4
      Dmitri Smirnov committed
      Summary:
      Enable direct IO in WritableFileImpl::Append,
      with the offset being the current length of the file.
      Enable the UniqueID tests on Windows; disable the others but
      still let them compile. The UniqueID tests are valuable for
      detecting failures on different filesystems and on the upcoming
      ReFS.
      Clear the output in WinEnv GetChildren. This differs from the
      previous strategy of not touching the output on failure.
      Make sure DBTest.OpenWhenOpen works with the Windows error message.
      Closes https://github.com/facebook/rocksdb/pull/1746
      
      Differential Revision: D4385681
      
      Pulled By: IslamAbdelRahman
      
      fbshipit-source-id: c07b702
  5. 09 Jan, 2017: 2 commits
    • Revert "PinnableSlice" · d0ba8ec8
      Maysam Yabandeh committed
      Summary:
      This reverts commit 54d94e9c.
      
      The pull request was landed by mistake.
      Closes https://github.com/facebook/rocksdb/pull/1755
      
      Differential Revision: D4391678
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 36d5149
    • PinnableSlice · 54d94e9c
      Maysam Yabandeh committed
      Summary:
      Currently, point lookup values are copied to a string provided by the user.
      This incurs an extra memcpy cost. This patch allows doing point lookups
      via a PinnableSlice, which pins the source memory (instead of
      copying its content) and releases it after the content is consumed
      by the user. The old Get(string) API is translated to the new API
      underneath.
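      A minimal sketch of the pin-instead-of-copy idea, assuming a hypothetical ToyPinnableSlice class (not RocksDB's PinnableSlice API): the slice points at memory owned elsewhere and runs a release callback, for example unpinning a cached block, once the caller is done with it.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <functional>
      #include <string>
      #include <utility>

      class ToyPinnableSlice {
       public:
        ToyPinnableSlice() = default;
        ToyPinnableSlice(const ToyPinnableSlice&) = delete;
        ToyPinnableSlice& operator=(const ToyPinnableSlice&) = delete;
        ~ToyPinnableSlice() { Reset(); }

        // Zero-copy path: keep the source bytes pinned until Reset().
        void Pin(const char* data, size_t size, std::function<void()> release) {
          assert(data != nullptr || size == 0);
          Reset();
          data_ = data;
          size_ = size;
          release_ = std::move(release);
        }

        // Copying path, e.g. for a merge result, which this commit does not pin.
        void CopyFrom(const std::string& s) {
          Reset();
          self_ = s;
          data_ = self_.data();
          size_ = self_.size();
        }

        // Runs the release callback (if any) and forgets the pinned bytes.
        void Reset() {
          if (release_) release_();
          release_ = nullptr;
          data_ = nullptr;
          size_ = 0;
        }

        const char* data() const { return data_; }
        size_t size() const { return size_; }
        std::string ToString() const {
          return data_ ? std::string(data_, size_) : std::string();
        }

       private:
        const char* data_ = nullptr;
        size_t size_ = 0;
        std::string self_;
        std::function<void()> release_;
      };
      ```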
      
      Here is a summary of the improvements:
      1. 100-byte values: 1.8% for regular values, 1.2% for merge values
      2. 1k-byte values: 11.5% for regular values, 7.5% for merge values
      3. 10k-byte values: 26% for regular values, 29.9% for merge values
      
      The improvement for merge could be larger if we extended this approach to
      pin the merge output and delay the full merge operation until the user
      actually needs it. We have left that for future work.
      
      PS:
      We sometimes observe a small decrease in performance when switching from
      t5452014 to this patch while still using the old Get(string) API. The difference
      is small and could be noise. More importantly, it is safely
      cancelled
      Closes https://github.com/facebook/rocksdb/pull/1732
      
      Differential Revision: D4374613
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: a077f1a
  6. 07 Jan, 2017: 1 commit
  7. 04 Jan, 2017: 1 commit
  8. 29 Dec, 2016: 1 commit
    • Always fsync the file after file copying · 17a4b75c
      Siying Dong committed
      Summary:
      File copying happens when creating checkpoints and when bulkloading files from a different FS partition. We should fsync the files when copying them to guarantee durability. As a side effect, the dirty pages in the file system buffers won't grow too large.
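      A minimal POSIX sketch of copy-then-fsync. CopyFileAndSync is a hypothetical name, not RocksDB's file-copy helper, and error handling is simplified.

      ```cpp
      #include <cassert>
      #include <cerrno>
      #include <fcntl.h>
      #include <unistd.h>

      bool CopyFileAndSync(const char* src_path, const char* dst_path) {
        assert(src_path != nullptr && dst_path != nullptr);
        int src = open(src_path, O_RDONLY);
        if (src < 0) return false;
        int dst = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (dst < 0) {
          close(src);
          return false;
        }
        char buf[4096];
        bool ok = true;
        while (ok) {
          ssize_t n = read(src, buf, sizeof(buf));
          if (n < 0 && errno == EINTR) continue;
          if (n <= 0) {
            ok = (n == 0);  // 0 is EOF, negative is a read error
            break;
          }
          for (ssize_t off = 0; off < n;) {
            ssize_t w = write(dst, buf + off, static_cast<size_t>(n - off));
            if (w < 0 && errno == EINTR) continue;
            if (w < 0) {
              ok = false;
              break;
            }
            off += w;
          }
        }
        // Flush the copied data to stable storage before reporting success.
        // (Making a newly created file's directory entry durable also needs
        // an fsync of the parent directory, which this sketch omits.)
        if (ok && fsync(dst) != 0) ok = false;
        if (close(dst) != 0) ok = false;
        close(src);
        return ok;
      }
      ```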
      Closes https://github.com/facebook/rocksdb/pull/1728
      
      Differential Revision: D4371083
      
      Pulled By: siying
      
      fbshipit-source-id: 579e14c
  9. 23 Dec, 2016: 2 commits
    • Print cache options to info log · ab48c165
      Yi Wu committed
      Summary:
      Improve cache options logging to info log.
      Also print the value of
      cache_index_and_filter_blocks_with_high_priority.
      Closes https://github.com/facebook/rocksdb/pull/1709
      
      Differential Revision: D4358776
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 8f030a0
    • direct io write support · 972f96b3
      Aaron Gao committed
      Summary:
      rocksdb direct io support
      
      ```
      [gzh@dev11575.prn2 ~/rocksdb] ./db_bench -benchmarks=fillseq --num=1000000
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      RocksDB:    version 5.0
      Date:       Wed Nov 23 13:17:43 2016
      CPU:        40 * Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
      CPUCache:   25600 KB
      Keys:       16 bytes each
      Values:     100 bytes each (50 bytes after compression)
      Entries:    1000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    110.6 MB (estimated)
      FileSize:   62.9 MB (estimated)
      Write rate: 0 bytes/second
      Compression: Snappy
      Memtablerep: skip_list
      Perf Level: 1
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      ------------------------------------------------
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      DB path: [/tmp/rocksdbtest-112628/dbbench]
      fillseq      :       4.393 micros/op 227639 ops/sec;   25.2 MB/s
      
      [gzh@dev11575.prn2 ~/roc
      ```
      Closes https://github.com/facebook/rocksdb/pull/1564
      
      Differential Revision: D4241093
      
      Pulled By: lightmark
      
      fbshipit-source-id: 98c29e3
  10. 22 Dec, 2016: 1 commit
  11. 17 Dec, 2016: 2 commits
  12. 15 Dec, 2016: 3 commits
  13. 14 Dec, 2016: 3 commits
    • util/logging.cc: buffer of insufficient size (gcc-7 -Werror=format-length) · e097222e
      Daniel Black committed
      Summary:
      util/logging.cc:100:13: error: output may be truncated before the last format character [-Werror=format-length=]
       std::string NumberToHumanString(int64_t num) {
                   ^~~~~~~~~~~~~~~~~~~
      util/logging.cc:106:59: note: format output between 3 and 19 bytes into a destination of size 16
           snprintf(buf, sizeof(buf), "%" PRIi64 "K", num / 1000);
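      The note gives the worst case gcc-7 assumed: `num / 1000` printed as up to 17 characters (for example -9223372036854775), plus `K` and the terminating NUL, is 19 bytes, which does not fit in a 16-byte buffer. Below is a sketch of a formatter in the same style with a larger buffer; the thresholds and suffixes are assumptions, not necessarily what util/logging.cc does.

      ```cpp
      #include <cassert>
      #include <cinttypes>
      #include <cstdint>
      #include <cstdio>
      #include <string>

      std::string NumberToHumanStringSketch(int64_t num) {
        char buf[32];  // well above the 19 bytes gcc-7 computed as the worst case
        // Magnitude as unsigned, so negating INT64_MIN is not undefined behavior.
        const uint64_t absnum =
            num < 0 ? 0 - static_cast<uint64_t>(num) : static_cast<uint64_t>(num);
        int n;
        if (absnum < 10000) {
          n = std::snprintf(buf, sizeof(buf), "%" PRIi64, num);
        } else if (absnum < 10000000) {
          n = std::snprintf(buf, sizeof(buf), "%" PRIi64 "K", num / 1000);
        } else if (absnum < 10000000000ULL) {
          n = std::snprintf(buf, sizeof(buf), "%" PRIi64 "M", num / 1000000);
        } else {
          n = std::snprintf(buf, sizeof(buf), "%" PRIi64 "G", num / 1000000000);
        }
        assert(n > 0 && static_cast<size_t>(n) < sizeof(buf));  // nothing truncated
        return std::string(buf);
      }
      ```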
      Closes https://github.com/facebook/rocksdb/pull/1653
      
      Differential Revision: D4318687
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 3a5c931
    • Gcc 7 error expansion to defined · bfbcec23
      Daniel Black committed
      Summary:
      sorry if these gcc-7/clang-4 cleanups are getting tedious.
      Closes https://github.com/facebook/rocksdb/pull/1658
      
      Differential Revision: D4318792
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 8e85891
    • util/histogram.cc: HistogramStat::toString buffer insufficient · c3e5ee71
      Daniel Black committed
      Summary:
      Increased buffer size to 1650.
      
      util/histogram.cc: In member function 'std::__cxx11::string rocksdb::HistogramStat::ToString() const':
      util/histogram.cc:189:13: error: '%.2f' directive output truncated writing between 4 and 313 bytes into a region of size 0 [-Werror=format-length=]
       std::string HistogramStat::ToString() const {
                   ^~~~~~~~~~~~~
      util/histogram.cc:205:30: note: format output between 69 and 1614 bytes into a destination of size 200
                  Percentile(99.99));
                                    ^
      cc1plus: all warnings being treated as errors
      Makefile:1521: recipe for target 'util/histogram.o' failed
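      Enlarging the buffer fixes this particular format string, where a single `%.2f` can produce up to 313 bytes. A different technique, not the one this commit uses, avoids guessing sizes altogether: ask vsnprintf how long the output will be, then allocate exactly that. FormatToString is a hypothetical helper.

      ```cpp
      #include <cassert>
      #include <cstdarg>
      #include <cstdio>
      #include <string>
      #include <vector>

      std::string FormatToString(const char* fmt, ...) {
        assert(fmt != nullptr);
        va_list args;
        va_start(args, fmt);
        va_list measure;
        va_copy(measure, args);  // vsnprintf consumes a va_list, so measure a copy
        // With a null buffer and size 0, vsnprintf returns the length needed.
        const int needed = std::vsnprintf(nullptr, 0, fmt, measure);
        va_end(measure);
        std::string out;
        if (needed > 0) {
          std::vector<char> buf(static_cast<size_t>(needed) + 1);  // +1 for NUL
          std::vsnprintf(buf.data(), buf.size(), fmt, args);
          out.assign(buf.data(), static_cast<size_t>(needed));
        }
        va_end(args);
        return out;
      }
      ```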
      Closes https://github.com/facebook/rocksdb/pull/1660
      
      Differential Revision: D4318820
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 45ae6ea
  14. 13 Dec, 2016: 1 commit
    • Return finer-granularity status from Env::GetChildren* · f0c509e2
      Andrew Kryczka committed
      Summary:
      It'd be nice to use the error status type to distinguish
      between user error and system error. For example, GetChildren can fail
      listing a backup directory's contents either because a bad path was provided
      (user error) or because an operation failed, e.g., a remote storage service
      call failed (system error). In the former case, we want to continue and treat
      the backup directory as empty; in the latter case, we want to immediately
      propagate the error to the caller.
      
      This diff uses NotFound to indicate user error and IOError to indicate
      system error. Previously IOError indicated both.
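      The policy can be sketched with POSIX calls. The Code enum stands in for rocksdb::Status, and which errno values count as a bad path is an assumption; this is not the Env implementation.

      ```cpp
      #include <cassert>
      #include <cerrno>
      #include <cstddef>
      #include <dirent.h>
      #include <string>
      #include <vector>

      // Hypothetical status codes standing in for rocksdb::Status.
      enum class Code { kOk, kNotFound, kIOError };

      // Bad path (assumed: ENOENT or ENOTDIR) -> NotFound; anything else -> IOError.
      Code ListChildren(const std::string& dir, std::vector<std::string>* result) {
        assert(result != nullptr);
        result->clear();
        DIR* d = opendir(dir.c_str());
        if (d == nullptr) {
          return (errno == ENOENT || errno == ENOTDIR) ? Code::kNotFound
                                                       : Code::kIOError;
        }
        int read_errno = 0;
        for (;;) {
          errno = 0;  // readdir() returns nullptr both at the end and on error
          dirent* entry = readdir(d);
          if (entry == nullptr) {
            read_errno = errno;
            break;
          }
          result->push_back(entry->d_name);
        }
        closedir(d);
        return read_errno == 0 ? Code::kOk : Code::kIOError;
      }

      // Caller policy from the commit message: a missing backup directory
      // counts as empty, while system errors propagate immediately.
      Code CountBackupEntries(const std::string& dir, size_t* count) {
        std::vector<std::string> children;
        Code c = ListChildren(dir, &children);
        if (c == Code::kNotFound) {
          *count = 0;
          return Code::kOk;
        }
        if (c == Code::kOk) *count = children.size();
        return c;
      }
      ```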
      Closes https://github.com/facebook/rocksdb/pull/1644
      
      Differential Revision: D4312157
      
      Pulled By: ajkr
      
      fbshipit-source-id: 51b4f24
  15. 06 Dec, 2016: 2 commits
    • Implement non-exclusive locks · 2005c88a
      Manuel Ung committed
      Summary:
      This is an implementation of non-exclusive locks for pessimistic transactions. It is relatively simple and does not prevent starvation (i.e., a request for exclusive access may never be granted if there are always threads holding shared access). It is done by changing `KeyLockInfo` to hold a set of transaction ids instead of just one, and adding a flag specifying whether the lock is currently held with exclusive access or not.
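      A sketch of the lock state described above, under hypothetical names (KeyLockInfoSketch, TryLock and Unlock are not RocksDB's API). Shared requests are compatible with each other; an exclusive request is granted only when the key is free or held solely by the requester. There is no queueing, which is why starvation is possible.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <set>

      using TxnId = uint64_t;

      struct KeyLockInfoSketch {
        std::set<TxnId> holders;  // all transactions currently holding the lock
        bool exclusive = false;   // true if held with exclusive access

        // Returns true if `txn` now holds the lock in the requested mode.
        bool TryLock(TxnId txn, bool want_exclusive) {
          if (holders.empty()) {
            holders.insert(txn);
            exclusive = want_exclusive;
            return true;
          }
          if (!exclusive && !want_exclusive) {  // shared + shared: compatible
            holders.insert(txn);
            return true;
          }
          // Re-entrant request by the sole holder: allow, upgrading if asked.
          if (holders.size() == 1 && *holders.begin() == txn) {
            exclusive = exclusive || want_exclusive;
            return true;
          }
          return false;  // the caller would have to wait
        }

        void Unlock(TxnId txn) {
          assert(holders.count(txn) == 1);
          holders.erase(txn);
          if (holders.empty()) exclusive = false;
        }
      };
      ```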
      
      Some implementation notes:
      - Some lock diagnostic functions had to be updated to return a set of transaction ids for a given lock, e.g. `GetWaitingTxn` and `GetLockStatusData`.
      - Deadlock detection is a bit more complicated since a transaction can now wait on multiple other transactions. A BFS is done in this case, and deadlock detection depth is now just a limit on the number of transactions we visit.
      - Expirable transactions do not work efficiently with shared locks at the moment, but that's okay for now.
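      With shared locks, a transaction in the wait-for graph can have several outgoing edges, so cycle detection becomes a graph search. The following BFS sketch caps the number of transactions visited, mirroring the note that the detection depth is now a visit limit; the names are hypothetical, and treating "limit reached" as "no deadlock found" is an assumption of this sketch.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <queue>
      #include <unordered_map>
      #include <unordered_set>
      #include <vector>

      using TxnId = uint64_t;
      // Each waiting transaction maps to every transaction it waits on.
      using WaitForGraph = std::unordered_map<TxnId, std::vector<TxnId>>;

      // Would `start` waiting on its current holders close a cycle?
      bool WouldDeadlock(const WaitForGraph& g, TxnId start, size_t max_visits) {
        assert(max_visits > 0);
        auto edges = g.find(start);
        if (edges == g.end()) return false;
        std::queue<TxnId> frontier;
        for (TxnId t : edges->second) frontier.push(t);
        std::unordered_set<TxnId> seen;
        size_t visits = 0;
        while (!frontier.empty() && visits < max_visits) {
          TxnId cur = frontier.front();
          frontier.pop();
          if (cur == start) return true;           // reached the requester again
          if (!seen.insert(cur).second) continue;  // already expanded
          ++visits;
          auto it = g.find(cur);
          if (it == g.end()) continue;             // `cur` is not waiting
          for (TxnId next : it->second) frontier.push(next);
        }
        return false;
      }
      ```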
      Closes https://github.com/facebook/rocksdb/pull/1573
      
      Differential Revision: D4239097
      
      Pulled By: lth
      
      fbshipit-source-id: da7c074
    • Made delete_obsolete_files_period_micros option dynamic · 9053fe2a
      Anton Safonov committed
      Summary:
      Made the delete_obsolete_files_period_micros option dynamic. It can be updated using DB::SetDBOptions().
      Closes https://github.com/facebook/rocksdb/pull/1595
      
      Differential Revision: D4246569
      
      Pulled By: tonek
      
      fbshipit-source-id: d23f560
  16. 02 Dec, 2016: 2 commits
    • Cache heap::downheap() root comparison (optimize heap cmp call) · 4a21b140
      Islam AbdelRahman committed
      Summary:
      Reduce the number of comparisons in the heap by caching which child node in the first level is smaller (left_child or right_child),
      so that next time we can compare directly against the smaller child.
      
      I see that the total number of calls to the comparator drops significantly when using this optimization.
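      The idea can be sketched on a min-heap of ints. This is an illustrative sketch under simplified assumptions, not RocksDB's BinaryHeap: when downheap(0) leaves the root value in place, the root's children are unchanged, so the smaller one is remembered and the next replace_top() compares against it directly. Any other change to the heap invalidates the cache.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <vector>

      class CachedRootMinHeap {
       public:
        void push(int v) {
          data_.push_back(v);
          upheap(data_.size() - 1);
        }
        int top() const {
          assert(!data_.empty());
          return data_[0];
        }
        // E.g. a merging iterator advancing its current child iterator.
        void replace_top(int v) {
          assert(!data_.empty());
          data_[0] = v;
          downheap(0);
        }
        void pop() {
          assert(!data_.empty());
          data_[0] = data_.back();
          data_.pop_back();
          reset_cache();
          if (!data_.empty()) downheap(0);
        }
        bool empty() const { return data_.empty(); }
        size_t comparisons() const { return comparisons_; }

       private:
        static constexpr size_t kNone = static_cast<size_t>(-1);

        bool less(int a, int b) {
          ++comparisons_;
          return a < b;
        }
        void reset_cache() { root_child_cache_ = kNone; }

        void upheap(size_t i) {
          int v = data_[i];
          while (i > 0) {
            size_t parent = (i - 1) / 2;
            if (!less(v, data_[parent])) break;
            data_[i] = data_[parent];
            i = parent;
          }
          data_[i] = v;
          reset_cache();
        }

        void downheap(size_t i) {
          int v = data_[i];
          size_t picked = kNone;
          while (true) {
            size_t left = 2 * i + 1;
            if (left >= data_.size()) break;
            size_t right = left + 1;
            if (i == 0 && root_child_cache_ != kNone) {
              picked = root_child_cache_;  // cached: skip left/right comparison
            } else {
              picked = left;
              if (right < data_.size() && less(data_[right], data_[left])) {
                picked = right;
              }
            }
            if (!less(data_[picked], v)) break;  // v <= smaller child: stop
            reset_cache();                       // a child moves up
            data_[i] = data_[picked];
            i = picked;
          }
          if (i == 0) {
            root_child_cache_ = picked;  // root's children were not touched
          } else {
            reset_cache();
          }
          data_[i] = v;
        }

        std::vector<int> data_;
        size_t comparisons_ = 0;
        size_t root_child_cache_ = kNone;
      };
      ```

      In this sketch, the first replace_top() that keeps the root in place costs two comparisons, and each following one costs one, which matches the roughly halved comparison count in the benchmark output.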
      
      Before caching (~2M key comparisons when iterating over the DB)
      ```
      $ DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq" --db="/dev/shm/heap_opt" --use_existing_db --disable_auto_compactions --cache_size=1000000000  --perf_level=2
      readseq      :       0.338 micros/op 2959201 ops/sec;  327.4 MB/s user_key_comparison_count = 2000008
      ```
      After caching (~1M key comparisons when iterating over the DB)
      ```
      $ DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq" --db="/dev/shm/heap_opt" --use_existing_db --disable_auto_compactions --cache_size=1000000000 --perf_level=2
      readseq      :       0.309 micros/op 3236801 ops/sec;  358.1 MB/s user_key_comparison_count = 1000011
      ```
      
      It also improves
      Closes https://github.com/facebook/rocksdb/pull/1600
      
      Differential Revision: D4256027
      
      Pulled By: IslamAbdelRahman
      
      fbshipit-source-id: 76fcc66
    • Fix travis (compile for clang < 3.9) · e39d0808
      Islam AbdelRahman committed
      Summary:
      Travis fails because it uses clang 3.6, which doesn't recognize
      `__attribute__((__no_sanitize__("undefined")))`
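      One way to keep such an attribute from breaking older compilers is to emit it only when the compiler reports support for it. The macro name below is hypothetical; the commit message does not show the actual guard.

      ```cpp
      #include <cassert>

      // The nested #if matters: on a compiler without __has_attribute, the
      // expression `__has_attribute(no_sanitize)` cannot be parsed at all.
      #if defined(__has_attribute)
      #  if __has_attribute(no_sanitize)
      #    define NO_SANITIZE_UNDEFINED __attribute__((__no_sanitize__("undefined")))
      #  endif
      #endif
      #ifndef NO_SANITIZE_UNDEFINED
      #  define NO_SANITIZE_UNDEFINED  // attribute unknown: expand to nothing
      #endif

      // Example use on a function UBSan would otherwise instrument.
      NO_SANITIZE_UNDEFINED unsigned ShiftLeft(unsigned v, unsigned n) {
        assert(n < 32);  // assumes a 32-bit unsigned
        return v << n;
      }
      ```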
      Closes https://github.com/facebook/rocksdb/pull/1601
      
      Differential Revision: D4257175
      
      Pulled By: IslamAbdelRahman
      
      fbshipit-source-id: fb4d1ab
  17. 29 Nov, 2016: 3 commits
  18. 24 Nov, 2016: 1 commit
    • Improve Write Stalling System · cd7c4143
      Siying Dong committed
      Summary:
      The current write stalling system lacks positive feedback when the restricted rate is already too low, so users sometimes get stuck at a very low slowdown value. With this diff, we add positive feedback (increasing the slowdown value) when we recover from the slowdown state back to normal. To keep the positive feedback from pushing the slowdown value too high, we also issue negative feedback every time we are close to the stop condition. Experiments show that it is easier to reach a relative balance than before.
      
      Also increase the default level0_stop_writes_trigger from 24 to 32. Since the default level0_slowdown_writes_trigger is 20, a stop trigger of 24 leaves only four files of buffer in which to slow down writes. To avoid stopping within four files when 20 files have already accumulated, the slowdown value must be very low, which is almost the same as stopping. It also doesn't give the slowdown value enough time to converge. Increasing it to 32 smooths out the system.
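      A toy model of the feedback loop, for illustration only: the 1.25 and 0.75 multipliers, the rate bounds and the two-file "near stop" margin are hypothetical, and only the trigger defaults (slowdown at 20 files, stop at 32) come from this commit.

      ```cpp
      #include <algorithm>
      #include <cassert>

      enum class WriteState { kNormal, kDelayed, kStopped };

      struct ToyWriteController {
        int slowdown_trigger = 20;  // level0_slowdown_writes_trigger default
        int stop_trigger = 32;      // new level0_stop_writes_trigger default
        int near_stop_margin = 2;   // hypothetical "close to stop" window
        double rate = 1000.0;       // delayed write rate, arbitrary units
        double min_rate = 1.0, max_rate = 16000.0;
        WriteState last = WriteState::kNormal;

        WriteState Update(int l0_files) {
          assert(l0_files >= 0);
          WriteState s = l0_files >= stop_trigger       ? WriteState::kStopped
                         : l0_files >= slowdown_trigger ? WriteState::kDelayed
                                                        : WriteState::kNormal;
          if (s == WriteState::kNormal && last == WriteState::kDelayed) {
            // Positive feedback: we recovered, so allow a faster rate next time.
            rate = std::min(rate * 1.25, max_rate);
          } else if (s == WriteState::kDelayed &&
                     l0_files >= stop_trigger - near_stop_margin) {
            // Negative feedback: close to stopping, so slow down further.
            rate = std::max(rate * 0.75, min_rate);
          }
          last = s;
          return s;
        }
      };
      ```

      With the old stop trigger of 24, the window between slowing down and stopping is 24 - 20 = 4 files; at 32 it is 12 files, which gives the rate more room to converge.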
      Closes https://github.com/facebook/rocksdb/pull/1562
      
      Differential Revision: D4218519
      
      Pulled By: siying
      
      fbshipit-source-id: 95e4088
  19. 22 Nov, 2016: 3 commits
  20. 21 Nov, 2016: 1 commit
    • Fix deadlock when calling getMergedHistogram · a0deec96
      Changli Gao committed
      Summary:
      When calling StatisticsImpl::HistogramInfo::getMergedHistogram(), if
      there is a dying thread calling
      ThreadLocalPtr::StaticMeta::OnThreadExit() to merge its thread values into
      HistogramInfo, a deadlock can occur, because the former tries to acquire
      merge_lock and then ThreadMeta::mutex_, while the latter tries to acquire
      ThreadMeta::mutex_ and then merge_lock. In short, the locking order isn't
      the same.
      
      This patch addresses the issue by releasing merge_lock before folding
      thread values.
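      The fix can be modeled with two std::mutex objects. ToyHistogramInfo and its members are hypothetical stand-ins for HistogramInfo's merge_lock and ThreadMeta::mutex_; the point is that the reader never waits for the second lock while holding merge_lock, so the two code paths can no longer wait on each other.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <mutex>
      #include <unordered_map>

      class ToyHistogramInfo {
       public:
        // A live thread records into its own slot.
        void Record(int thread_id, uint64_t v) {
          std::lock_guard<std::mutex> meta(meta_mutex_);
          thread_values_[thread_id] += v;
        }

        // Thread exit: lock order is meta_mutex_, then merge_lock_.
        void OnThreadExit(int thread_id) {
          std::lock_guard<std::mutex> meta(meta_mutex_);
          auto it = thread_values_.find(thread_id);
          if (it == thread_values_.end()) return;
          {
            std::lock_guard<std::mutex> merge(merge_lock_);
            merged_ += it->second;
          }
          thread_values_.erase(it);
        }

        // Fixed reader: copy the merged total, release merge_lock_, and only
        // then fold live thread values under meta_mutex_. A value that moves
        // between the two steps can be missed by this snapshot; the toy
        // accepts that and treats the result as approximate.
        uint64_t GetMerged() {
          uint64_t result = 0;
          {
            std::lock_guard<std::mutex> merge(merge_lock_);
            result = merged_;
          }
          std::lock_guard<std::mutex> meta(meta_mutex_);
          for (const auto& kv : thread_values_) result += kv.second;
          return result;
        }

       private:
        std::mutex merge_lock_;
        std::mutex meta_mutex_;
        uint64_t merged_ = 0;
        std::unordered_map<int, uint64_t> thread_values_;
      };
      ```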
      Closes https://github.com/facebook/rocksdb/pull/1552
      
      Differential Revision: D4211942
      
      Pulled By: ajkr
      
      fbshipit-source-id: ef89bcb
  21. 20 Nov, 2016: 1 commit
    • Use more efficient hash map for deadlock detection · e63350e7
      Manuel Ung committed
      Summary:
      Currently, deadlock cycles are held in std::unordered_map. The problem with it is that it allocates/deallocates memory on every insertion/deletion. This limits throughput since we're doing this expensive operation while holding a global mutex. Fix this by using a vector which caches memory instead.
      
      In the deadlock stress test, this change increased throughput from 39k txns/s to 49k txns/s. The effect is more noticeable in MyRocks.
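      A sketch of the allocation-reuse idea, assuming a small vector-backed map (ReusableTxnMap is hypothetical, not the structure in the commit). std::vector::clear() keeps the capacity, so after warm-up repeated checks under the global mutex do not allocate, whereas a node-based std::unordered_map typically allocates on each insertion.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <utility>
      #include <vector>

      using TxnId = uint64_t;

      class ReusableTxnMap {
       public:
        void Clear() { entries_.clear(); }  // drops entries, keeps capacity

        // Returns false if `key` is already present.
        bool Insert(TxnId key, TxnId value) {
          if (Find(key) != nullptr) return false;
          entries_.emplace_back(key, value);
          return true;
        }

        // Linear scan: acceptable when the number of visited txns is capped.
        const TxnId* Find(TxnId key) const {
          for (const auto& e : entries_) {
            if (e.first == key) return &e.second;
          }
          return nullptr;
        }

        size_t capacity() const { return entries_.capacity(); }

       private:
        std::vector<std::pair<TxnId, TxnId>> entries_;
      };
      ```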
      Closes https://github.com/facebook/rocksdb/pull/1545
      
      Differential Revision: D4205662
      
      Pulled By: lth
      
      fbshipit-source-id: ff990e4
  22. 19 Nov, 2016: 2 commits
  23. 17 Nov, 2016: 1 commit
  24. 16 Nov, 2016: 1 commit
  25. 15 Nov, 2016: 1 commit