1. 04 Apr, 2017 (1 commit)
  2. 31 Mar, 2017 (1 commit)
    • Option to fail a request as incomplete when skipping too many internal keys · c6d04f2e
      Sagar Vemuri committed
      Summary:
      Operations like Seek/Next/Prev sometimes take too long to complete when there are many internal keys to skip. This change adds an option, max_skippable_internal_keys, that sets a threshold on the maximum number of internal keys that can be skipped. It addresses cases where failing a request (as incomplete) is much better than waiting a considerable time for the request to complete.
      
      This feature (failing an iterator seek request as incomplete) is disabled by default, when max_skippable_internal_keys = 0. It is enabled only when max_skippable_internal_keys > 0.
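      A minimal, self-contained sketch of the threshold idea follows. It is not RocksDB's iterator code: `InternalEntry`, `SeekStatus`, `SeekResult`, and `SeekWithSkipLimit` are hypothetical names, internal keys are modeled as a sorted `std::map` in which tombstones are the entries that must be skipped, and a limit of 0 disables the check as described above.

      ```cpp
      #include <cstddef>
      #include <map>
      #include <string>

      // Hypothetical outcome of a seek.
      enum class SeekStatus { kFound, kNotFound, kIncomplete };

      struct SeekResult {
        SeekStatus status;
        std::string key;  // meaningful only when status == kFound
      };

      // Simplified internal entry: a live value or a tombstone to skip over.
      struct InternalEntry {
        bool is_tombstone;
        std::string value;
      };

      // Seek to the first live key >= target, skipping tombstones.
      // max_skippable == 0 means "no limit", mirroring the option's default.
      SeekResult SeekWithSkipLimit(const std::map<std::string, InternalEntry>& entries,
                                   const std::string& target,
                                   std::size_t max_skippable) {
        std::size_t skipped = 0;
        for (auto it = entries.lower_bound(target); it != entries.end(); ++it) {
          if (!it->second.is_tombstone) {
            return {SeekStatus::kFound, it->first};
          }
          ++skipped;
          // Give up once more keys than allowed have been skipped.
          if (max_skippable > 0 && skipped > max_skippable) {
            return {SeekStatus::kIncomplete, ""};
          }
        }
        return {SeekStatus::kNotFound, ""};
      }
      ```

      With a limit of 1, a seek that has to pass two tombstones returns `kIncomplete` instead of continuing; with the limit at 0 it keeps skipping until it finds a live key.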
      
      This feature is based on the discussion in PR https://github.com/facebook/rocksdb/pull/1084.
      Closes https://github.com/facebook/rocksdb/pull/2000
      
      Differential Revision: D4753223
      
      Pulled By: sagar0
      
      fbshipit-source-id: 1c973f7
  3. 23 Mar, 2017 (1 commit)
  4. 22 Mar, 2017 (1 commit)
  5. 21 Mar, 2017 (1 commit)
  6. 17 Mar, 2017 (1 commit)
    • Break stalls when no bg work is happening · d52f334c
      Islam AbdelRahman committed
      Summary:
      Currently, a stall keeps sleeping even when there are no flushes or compactions to wait for. This change breaks the stall when nothing is being flushed or compacted.
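      The control flow can be sketched in a few lines of C++. This is an illustrative assumption, not the code from the PR: `StallController`, `SetBackgroundWork`, and `DelayWrite` are hypothetical, and a single counter of running flush/compaction jobs stands in for RocksDB's real background-work bookkeeping. The writer waits on a condition variable with a deadline and stops waiting as soon as that counter is zero.

      ```cpp
      #include <chrono>
      #include <condition_variable>
      #include <mutex>

      // Hypothetical stall controller (not RocksDB's WriteController).
      class StallController {
       public:
        // Called by the background scheduler whenever jobs start or finish.
        void SetBackgroundWork(int running_jobs) {
          std::lock_guard<std::mutex> lock(mu_);
          running_bg_jobs_ = running_jobs;
          cv_.notify_all();
        }

        // Sleeps for up to `delay`. Returns false if the wait ended with no
        // flush or compaction running (nothing left that could relieve the
        // stall), true if the full delay elapsed while work was still running.
        bool DelayWrite(std::chrono::milliseconds delay) {
          std::unique_lock<std::mutex> lock(mu_);
          const auto deadline = std::chrono::steady_clock::now() + delay;
          // The predicate overload of wait_until returns the predicate's value
          // at the moment it stops waiting.
          const bool no_bg_work = cv_.wait_until(
              lock, deadline, [this] { return running_bg_jobs_ == 0; });
          return !no_bg_work;
        }

       private:
        std::mutex mu_;
        std::condition_variable cv_;
        int running_bg_jobs_ = 0;
      };
      ```

      With no background jobs registered, `DelayWrite` returns immediately instead of sleeping for the whole delay, which is the behavior change the summary describes.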
      
      db_bench command used
      ```
      # fillrandom
      # memtable size = 10MB
      # value size = 1 MB
      # num = 1000
      # use /dev/shm
      ./db_bench --benchmarks="fillrandom,stats" --value_size=1048576 --write_buffer_size=10485760 --num=1000 --delayed_write_rate=XXXXX  --db="/dev/shm/new_stall" | grep "Cumulative stall"
      ```
      
      ```
      Current results
      
      # delayed_write_rate = 1000 Kb/sec
      Cumulative stall: 00:00:9.031 H:M:S
      
      # delayed_write_rate = 200 Kb/sec
      Cumulative stall: 00:00:22.314 H:M:S
      
      # delayed_write_rate = 100 Kb/sec
      Cumulative stall: 00:00:42.784 H:M:S
      
      # delayed_write_rate = 50 Kb/sec
      Cumulative stall: 00:01:23.785 H:M:S
      
      # delayed_write_rate = 25 Kb/sec
      Cumulative stall: 00:02:45.702 H:M:S
      ```
      
      ```
      New results
      
      # delayed_write_rate = 1000 Kb/sec
      Cumulative stall: 00:00:9.017 H:M:S
      
      # delayed_write_rate = 200 Kb/sec
      Cumulative stall: 00
      ```
      Closes https://github.com/facebook/rocksdb/pull/1884
      
      Differential Revision: D4585439
      
      Pulled By: IslamAbdelRahman
      
      fbshipit-source-id: aed2198
  7. 16 Mar, 2017 (1 commit)
    • Add macros to include file name and line number during Logging · e1916368
      Islam AbdelRahman committed
      Summary:
      Current logging:
      ```
      2017/03/14-14:20:30.393432 7fedde9f5700 (Original Log Time 2017/03/14-14:20:30.393414) [default] Level summary: base level 1 max bytes base 268435456 files[1 0 0 0 0 0 0] max score 0.25
      2017/03/14-14:20:30.393438 7fedde9f5700 [JOB 2] Try to delete WAL files size 61417909, prev total WAL file size 73820858, number of live WAL files 2.
      2017/03/14-14:20:30.393464 7fedde9f5700 [DEBUG] [JOB 2] Delete /dev/shm/old_logging//MANIFEST-000001 type=3 #1 -- OK
      2017/03/14-14:20:30.393472 7fedde9f5700 [DEBUG] [JOB 2] Delete /dev/shm/old_logging//000003.log type=0 #3 -- OK
      2017/03/14-14:20:31.427103 7fedd49f1700 [default] New memtable created with log file: #9. Immutable memtables: 0.
      2017/03/14-14:20:31.427179 7fedde9f5700 [JOB 3] Syncing log #6
      2017/03/14-14:20:31.427190 7fedde9f5700 (Original Log Time 2017/03/14-14:20:31.427170) Calling FlushMemTableToOutputFile with column family [default], flush slots available 1, compaction slots allowed 1, compaction slots scheduled 1
      2017/03/14-14:20:31.
      ```
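      The reason to use macros rather than a plain function is that `__FILE__` and `__LINE__` expand at the call site. A minimal sketch follows; `FormatLogLine` and `LOG_WITH_LOCATION` are hypothetical names, not the macros this commit added, and the sketch returns the formatted line instead of writing it to a log file so that it stays self-contained.

      ```cpp
      #include <cstdarg>
      #include <cstdio>
      #include <string>

      // Hypothetical formatter: prefixes a printf-style message with "[file:line] ".
      inline std::string FormatLogLine(const char* file, int line, const char* fmt, ...) {
        char msg[512];
        va_list ap;
        va_start(ap, fmt);
        std::vsnprintf(msg, sizeof(msg), fmt, ap);  // truncates safely if too long
        va_end(ap);
        char out[1024];
        std::snprintf(out, sizeof(out), "[%s:%d] %s", file, line, msg);
        return std::string(out);
      }

      // Because this is a macro, __FILE__ and __LINE__ refer to the caller's
      // location, not to the line where FormatLogLine is defined.
      #define LOG_WITH_LOCATION(...) FormatLogLine(__FILE__, __LINE__, __VA_ARGS__)
      ```

      For example, `LOG_WITH_LOCATION("Syncing log #%d", 6)` yields a line that starts with the calling file name and line number, followed by `Syncing log #6`.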
      Closes https://github.com/facebook/rocksdb/pull/1990
      
      Differential Revision: D4708695
      
      Pulled By: IslamAbdelRahman
      
      fbshipit-source-id: cb8968f
  8. 14 Mar, 2017 (1 commit)
    • Pinnableslice (2nd attempt) · 11526252
      Maysam Yabandeh committed
      Summary:
      PinnableSlice
      
          Summary:
          Currently the point lookup values are copied to a string provided by the
          user. This incurs an extra memcpy cost. This patch allows doing point lookups
          via a PinnableSlice, which pins the source memory location (instead of
          copying its content) and releases it after the user has consumed
          the content. The old API of Get(string) is translated to the new API
          underneath.

          Here is the summary of improvements:

          value 100 byte: 1.8% regular, 1.2% merge values
          value 1k byte: 11.5% regular, 7.5% merge values
          value 10k byte: 26% regular, 29.9% merge values

          The improvement for merge could be larger if we extend this approach to
          pin the merge output and delay the full merge operation until the user
          actually needs it. We have left that for future work.
      
          PS:
          Sometimes we observe a small decrease in performance when switching from
          t5452014 to this patch but with the old Get(string) API. The d
      Closes https://github.com/facebook/rocksdb/pull/1756
      
      Differential Revision: D4391738
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 6f3edd3
  9. 08 Mar, 2017 (2 commits)
    • Add a memtable-only iterator · 97edc72d
      Sagar Vemuri committed
      Summary:
      This PR adds a way to iterate over all the keys that are only in memtables.
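      Conceptually, a memtable-only scan reads the active memtable and the immutable memtables and never opens an SST file. The sketch below assumes simplified memtables (a `std::map` with no sequence numbers or deletions) and a hypothetical `ScanMemtablesOnly` helper, and it materializes the result into a vector for brevity; a real iterator would merge the sources lazily.

      ```cpp
      #include <map>
      #include <string>
      #include <utility>
      #include <vector>

      // Simplified memtable: no sequence numbers, no tombstones.
      using MemTable = std::map<std::string, std::string>;

      // Hypothetical helper: visit the active memtable first, then immutable
      // memtables from newest to oldest. The first table that contains a key
      // wins, so newer values shadow older ones. SST files are never consulted.
      std::vector<std::pair<std::string, std::string>> ScanMemtablesOnly(
          const std::vector<const MemTable*>& newest_first) {
        std::map<std::string, std::string> merged;  // keeps keys sorted
        for (const MemTable* mem : newest_first) {
          for (const auto& [key, value] : *mem) {
            merged.emplace(key, value);  // emplace never overwrites a newer entry
          }
        }
        return std::vector<std::pair<std::string, std::string>>(merged.begin(),
                                                                merged.end());
      }
      ```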
      Closes https://github.com/facebook/rocksdb/pull/1953
      
      Differential Revision: D4663500
      
      Pulled By: sagar0
      
      fbshipit-source-id: 144e177
    • fix db_sst_test flakiness · 72202962
      Leonidas Galanis committed
      Summary:
      db_sst_test had been occasionally flaky in the following way: reached_max_space_on_compaction can, in very rare cases, be 0. This happens when the limit on maximum allowable space set using SetMaxAllowedSpaceUsage is hit during flush for all test DB sizes (1, 2, 4, 8, and 10 MB). The fix clears the error returned when the space limit is reached during flush. This ensures that the compaction callback is always called. The runtime increases slightly because the 1 MB loop writes more data and hits the limit during multiple flushes until compaction is scheduled.
      Closes https://github.com/facebook/rocksdb/pull/1861
      
      Differential Revision: D4557396
      
      Pulled By: lgalanis
      
      fbshipit-source-id: ff778d1
  10. 07 Mar, 2017 (1 commit)
    • Set logs as getting flushed before releasing lock, race condition fix · 58b12dfe
      Reid Horuff committed
      Summary:
      Relating to #1903:
      
      In MaybeFlushColumnFamilies() we want to modify the 'getting_flushed' flag before releasing the db mutex when SwitchMemtable() is called.
      
      The following two sequences must each be atomic in MaybeFlushColumnFamilies():
      - getting_flushed is false on oldest log
      - we determine that all CFs can be flushed to successfully release oldest log
      - we set getting_flushed = true on the oldest log.
      -------
      - getting_flushed is false on oldest log
      - we determine that all CFs can NOT be flushed to successfully release oldest log
      - we set unable_to_flush_oldest_log_ = true on the oldest log.
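      The essence of the fix is that the check and the update happen under the same mutex, before the mutex is released for slower work. A minimal sketch, assuming hypothetical `FlushScheduler` and `OldestLog` types (only the `getting_flushed` flag comes from the text above):

      ```cpp
      #include <mutex>

      // Hypothetical stand-in for the oldest WAL's bookkeeping.
      struct OldestLog {
        bool getting_flushed = false;
      };

      class FlushScheduler {
       public:
        // Returns true if this caller claimed the oldest log and should flush it.
        bool TryClaimOldestLog() {
          std::unique_lock<std::mutex> lock(db_mutex_);
          if (oldest_log_.getting_flushed) {
            return false;  // another thread has already claimed it
          }
          oldest_log_.getting_flushed = true;  // set BEFORE releasing the mutex
          lock.unlock();
          // Slow work (switching memtables, scheduling flushes) would run here
          // without the mutex; the flag already prevents a second claim.
          return true;
        }

        // Called once the flush that released the oldest log has finished.
        void OnOldestLogFlushed() {
          std::lock_guard<std::mutex> lock(db_mutex_);
          oldest_log_.getting_flushed = false;
        }

       private:
        std::mutex db_mutex_;
        OldestLog oldest_log_;
      };
      ```

      If the flag were set only after the mutex was released and re-acquired, two threads could both observe `getting_flushed == false` and both start flushing the same log.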
      
      #### In the 2pc case:
      
      T1 enters function but is unable to flush all CFs to release log
      T1 sets unable_to_flush_oldest_log_ = true
      T1 begins flushing all CFs possible
      
      T2 enters function but is unable to flush all CFs to release log
      T2 sees that unable_to_flush_oldest_log_ has been set, so it exits
      
      T3 enters function and will be able to flush all CFs to release oldest log
      T3 sets getting_flushed = true on oldes
      Closes https://github.com/facebook/rocksdb/pull/1909
      
      Differential Revision: D4646235
      
      Pulled By: reidHoruff
      
      fbshipit-source-id: c8d0447
  11. 03 Mar, 2017 (1 commit)
  12. 01 Mar, 2017 (1 commit)
  13. 28 Feb, 2017 (1 commit)
  14. 24 Feb, 2017 (1 commit)
  15. 22 Feb, 2017 (1 commit)
    • Fix interference between max_total_wal_size and db_write_buffer_size checks · 18eeb7b9
      Mike Kolupaev committed
      Summary:
      This is a trivial fix for OOMs we saw a few days ago in logdevice.

      RocksDB got into the following state:
      (1) Write throughput is too high for flushes to keep up. Compactions are out of the picture - automatic compactions are disabled, and for manual compactions we don't care that much if they fall behind. We write to many CFs, with only a few L0 sst files in each, so compactions are not needed most of the time.
      (2) total_log_size_ is consistently greater than GetMaxTotalWalSize(). It doesn't get smaller since flushes are falling ever further behind.
      (3) Total size of memtables is way above db_write_buffer_size and keeps growing. But the write_buffer_manager_->ShouldFlush() is not checked because (2) prevents it (for no good reason, afaict; this is what this commit fixes).
      (4) Every call to WriteImpl() hits the MaybeFlushColumnFamilies() path. This keeps flushing the memtables one by one in order of increasing log file number.
      (5) No write stalling trigger is hit. We rely on max_write_buffer_number
      Closes https://github.com/facebook/rocksdb/pull/1893
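      Point (3) of the summary can be illustrated with a small decision function. This is a hypothetical model, not the actual `WriteImpl()` change: the struct fields stand in for `total_log_size_`, `GetMaxTotalWalSize()`, and `write_buffer_manager_->ShouldFlush()`, and the two functions contrast an `else if` chain, in which the WAL-size check hides the memory check, with independent checks.

      ```cpp
      // Hypothetical snapshot of the values consulted before a write.
      struct FlushTriggers {
        unsigned long long total_log_size;      // stands in for total_log_size_
        unsigned long long max_total_wal_size;  // stands in for GetMaxTotalWalSize()
        bool write_buffer_should_flush;         // stands in for ShouldFlush()
      };

      struct FlushDecision {
        bool flush_for_wal_size = false;
        bool flush_for_memtable_budget = false;
      };

      // Shape of the problem: while the WAL stays over its limit, the
      // else-branch (the memtable memory check) is never reached.
      FlushDecision DecideWithElseIf(const FlushTriggers& t) {
        FlushDecision d;
        if (t.total_log_size > t.max_total_wal_size) {
          d.flush_for_wal_size = true;
        } else if (t.write_buffer_should_flush) {
          d.flush_for_memtable_budget = true;
        }
        return d;
      }

      // Independent checks: memory pressure is noticed regardless of WAL size.
      FlushDecision DecideIndependently(const FlushTriggers& t) {
        FlushDecision d;
        d.flush_for_wal_size = t.total_log_size > t.max_total_wal_size;
        d.flush_for_memtable_budget = t.write_buffer_should_flush;
        return d;
      }
      ```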
      
      Differential Revision: D4593590
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: af79c5f
  16. 18 Feb, 2017 (2 commits)
  17. 14 Feb, 2017 (2 commits)
    • Make DBImpl::has_unpersisted_data_ atomic · c2247dc1
      Yi Wu committed
      Summary:
      It seems to me that `has_unpersisted_data_` is read by the read thread and
      written by the write thread concurrently without synchronization, so this
      change makes it atomic.

      I updated the logic not because I saw any problem with it, but because it
      felt confusing.
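      A minimal sketch of the pattern, assuming a hypothetical `UnpersistedDataTracker` wrapper with made-up method names: `std::atomic<bool>` makes the concurrent read and write well-defined, and relaxed ordering is used on the assumption that the flag is only a hint and does not guard any other data.

      ```cpp
      #include <atomic>

      // Hypothetical wrapper around a has_unpersisted_data_-style flag.
      class UnpersistedDataTracker {
       public:
        // Write path: new data exists that is not yet persisted.
        void MarkUnpersisted() {
          has_unpersisted_data_.store(true, std::memory_order_relaxed);
        }
        // Persistence path: everything written so far has been persisted.
        void MarkPersisted() {
          has_unpersisted_data_.store(false, std::memory_order_relaxed);
        }
        // Read path: safe to call concurrently with the writers above.
        bool HasUnpersistedData() const {
          return has_unpersisted_data_.load(std::memory_order_relaxed);
        }

       private:
        std::atomic<bool> has_unpersisted_data_{false};
      };
      ```

      If the flag ever had to publish other state to readers, release/acquire ordering would be needed instead of relaxed.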
      Closes https://github.com/facebook/rocksdb/pull/1869
      
      Differential Revision: D4555837
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: eff2ab8
    • Remove disableDataSync option · eb912a92
      Sagar Vemuri committed
      Summary:
      Remove the disableDataSync option and the similarly named disable_data_sync option.
      This is being done to simplify options, and also because the performance gains of this feature can be achieved by other methods.
      Closes https://github.com/facebook/rocksdb/pull/1859
      
      Differential Revision: D4541292
      
      Pulled By: sagar0
      
      fbshipit-source-id: 5b3a6ca
  18. 07 Feb, 2017 (1 commit)
  19. 04 Feb, 2017 (1 commit)
  20. 03 Feb, 2017 (1 commit)
  21. 27 Jan, 2017 (1 commit)
  22. 26 Jan, 2017 (2 commits)
  23. 25 Jan, 2017 (1 commit)
  24. 21 Jan, 2017 (3 commits)
  25. 20 Jan, 2017 (2 commits)
  26. 12 Jan, 2017 (1 commit)
    • Abort compactions more reliably when closing DB · d18dd2c4
      Mike Kolupaev committed
      Summary:
      DB shutdown aborts running compactions by setting an atomic shutting_down=true that CompactionJob periodically checks. Without this PR, the flag is checked only before processing each _output_ value, so if the compaction filter filters everything out, the compaction is uninterruptible. This PR adds checks for shutting_down on every _input_ value (in CompactionIterator and MergeHelper).
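      The difference between checking per output and per input can be shown with a toy loop. `RunCompaction` and `CompactionResult` are hypothetical, records are plain strings, and `keep` plays the role of the compaction filter. Because the shutdown check runs once per input record, the loop stops even when the filter drops everything and no output is ever produced.

      ```cpp
      #include <atomic>
      #include <cstddef>
      #include <functional>
      #include <string>
      #include <vector>

      struct CompactionResult {
        std::vector<std::string> outputs;
        std::size_t inputs_processed = 0;
        bool aborted = false;
      };

      // Hypothetical compaction loop that checks for shutdown on every input.
      CompactionResult RunCompaction(const std::vector<std::string>& inputs,
                                     const std::function<bool(const std::string&)>& keep,
                                     const std::atomic<bool>& shutting_down) {
        CompactionResult result;
        for (const std::string& record : inputs) {
          // Checked before each input, not only when an output is emitted.
          if (shutting_down.load(std::memory_order_acquire)) {
            result.aborted = true;
            break;
          }
          ++result.inputs_processed;
          if (keep(record)) {
            result.outputs.push_back(record);
          }
        }
        return result;
      }
      ```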
      
      There's also some minor code cleanup along the way.
      Closes https://github.com/facebook/rocksdb/pull/1639
      
      Differential Revision: D4306571
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: f050890
  27. 09 Jan, 2017 (2 commits)
    • Revert "PinnableSlice" · d0ba8ec8
      Maysam Yabandeh committed
      Summary:
      This reverts commit 54d94e9c.
      
      The pull request was landed by mistake.
      Closes https://github.com/facebook/rocksdb/pull/1755
      
      Differential Revision: D4391678
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 36d5149
    • PinnableSlice · 54d94e9c
      Maysam Yabandeh committed
      Summary:
      Currently the point lookup values are copied to a string provided by the user.
      This incurs an extra memcpy cost. This patch allows doing point lookups
      via a PinnableSlice, which pins the source memory location (instead of
      copying its content) and releases it after the user has consumed
      the content. The old API of Get(string) is translated to the new API
      underneath.
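      The pinning idea can be sketched with a small class. This is a hypothetical illustration, not RocksDB's `PinnableSlice` API: `PinnedValue`, `Pin`, `CopyFrom`, and `Reset` are made-up names. A pinned value points into memory owned elsewhere (for example a cached block) and runs a release callback when the caller is done with it, while the copy path mimics the old `Get(string)` behavior.

      ```cpp
      #include <functional>
      #include <string>
      #include <string_view>
      #include <utility>

      // Hypothetical pin-or-copy value holder.
      class PinnedValue {
       public:
        PinnedValue() = default;
        PinnedValue(const PinnedValue&) = delete;
        PinnedValue& operator=(const PinnedValue&) = delete;
        ~PinnedValue() { Reset(); }

        // Zero-copy path: reference external bytes and remember how to unpin them.
        void Pin(std::string_view data, std::function<void()> release) {
          Reset();
          view_ = data;
          release_ = std::move(release);
        }

        // Copy path: own a private copy of the bytes (one extra memcpy).
        void CopyFrom(std::string_view data) {
          Reset();
          owned_.assign(data.data(), data.size());
          view_ = owned_;
        }

        std::string_view data() const { return view_; }
        bool IsPinned() const { return static_cast<bool>(release_); }

        // Unpin (or drop the copy) once the caller has consumed the value.
        void Reset() {
          if (release_) {
            auto release = std::move(release_);
            release_ = nullptr;
            release();
          }
          owned_.clear();
          view_ = {};
        }

       private:
        std::string_view view_;
        std::string owned_;
        std::function<void()> release_;
      };
      ```

      The release callback is what lets the owner of the source memory (for example, a block cache entry) know when it may be evicted or freed.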
      
      Here is the summary of improvements:
      1. value 100 byte: 1.8% regular, 1.2% merge values
      2. value 1k byte: 11.5% regular, 7.5% merge values
      3. value 10k byte: 26% regular, 29.9% merge values

      The improvement for merge could be larger if we extend this approach to
      pin the merge output and delay the full merge operation until the user
      actually needs it. We have left that for future work.
      
      PS:
      Sometimes we observe a small decrease in performance when switching from
      t5452014 to this patch but with the old Get(string) API. The difference
      is small and could be noise. More importantly, it is safely
      cancelled
      Closes https://github.com/facebook/rocksdb/pull/1732
      
      Differential Revision: D4374613
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: a077f1a
  28. 29 Dec, 2016 (1 commit)
  29. 23 Dec, 2016 (1 commit)
    • direct io write support · 972f96b3
      Aaron Gao committed
      Summary:
      RocksDB direct I/O support
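      Direct I/O bypasses the page cache, and on Linux `O_DIRECT` transfers generally require the buffer address, the transfer length, and the file offset to be multiples of the device's logical block size. The helpers below sketch that bookkeeping; the names and the fixed 4096-byte alignment are assumptions for illustration, not values taken from this commit.

      ```cpp
      #include <cstddef>
      #include <cstdint>
      #include <cstdlib>

      // Assumed alignment; real code should query the device's logical block size.
      constexpr std::size_t kDirectIoAlignment = 4096;

      // Round n up to the next multiple of align (align must be non-zero).
      constexpr std::size_t RoundUp(std::size_t n, std::size_t align) {
        return (n + align - 1) / align * align;
      }

      inline bool IsAligned(const void* p, std::size_t align) {
        return reinterpret_cast<std::uintptr_t>(p) % align == 0;
      }

      // Allocate a write buffer whose address and size both satisfy the alignment.
      // std::aligned_alloc requires the size to be a multiple of the alignment,
      // which RoundUp guarantees. Release the buffer with std::free.
      inline void* AllocateDirectIoBuffer(std::size_t bytes) {
        return std::aligned_alloc(kDirectIoAlignment, RoundUp(bytes, kDirectIoAlignment));
      }
      ```

      A 100-byte record, for instance, would be written from a 4096-byte aligned buffer, with the writer tracking how much of the final block is padding.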
      
      ```
      [gzh@dev11575.prn2 ~/rocksdb] ./db_bench -benchmarks=fillseq --num=1000000
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      RocksDB:    version 5.0
      Date:       Wed Nov 23 13:17:43 2016
      CPU:        40 * Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
      CPUCache:   25600 KB
      Keys:       16 bytes each
      Values:     100 bytes each (50 bytes after compression)
      Entries:    1000000
      Prefix:    0 bytes
      Keys per prefix:    0
      RawSize:    110.6 MB (estimated)
      FileSize:   62.9 MB (estimated)
      Write rate: 0 bytes/second
      Compression: Snappy
      Memtablerep: skip_list
      Perf Level: 1
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      ------------------------------------------------
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      DB path: [/tmp/rocksdbtest-112628/dbbench]
      fillseq      :       4.393 micros/op 227639 ops/sec;   25.2 MB/s
      
      [gzh@dev11575.prn2 ~/roc
      ```
      Closes https://github.com/facebook/rocksdb/pull/1564
      
      Differential Revision: D4241093
      
      Pulled By: lightmark
      
      fbshipit-source-id: 98c29e3
  30. 22 Dec, 2016 (1 commit)
  31. 14 Dec, 2016 (2 commits)