1. 24 Mar 2017, 1 commit
    • Add C API functions (and tests) for WriteBatchWithIndex · 41ccae6d
      Warren Falk committed
      Summary:
      I've added functions to the C API to support WriteBatchWithIndex as requested in #1833.
      
      I've also added unit tests to c_test.
      
      I've implemented the WriteBatchWithIndex variant of every function available for the regular WriteBatch, and added functions unique to WriteBatchWithIndex.
      
      For now, the following is omitted:
        1. The ability to create WriteBatchWithIndex's custom batch-only iterator, as I'm not sure what its purpose is. It should be possible to add it later if anyone wants it.
        2. The ability to create the batch with a fallback comparator, since it appears to be unnecessary.  I believe the column family comparator will be used for this, meaning those using a custom comparator can just use the column family variations.
      Closes https://github.com/facebook/rocksdb/pull/1985
      
      Differential Revision: D4760039
      
      Pulled By: siying
      
      fbshipit-source-id: 393227e
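
      A minimal C++ sketch of the read-your-own-writes behaviour that WriteBatchWithIndex provides, under stated assumptions: `IndexedBatchSketch` is a hypothetical class, a `std::map` stands in for the database, and only the latest update per key is kept. It is not the RocksDB implementation or its C API.

      ```cpp
      #include <map>
      #include <string>

      // Hypothetical model of a write batch whose pending updates are indexed by
      // key, so reads can see them before the batch is written. Only the latest
      // update per key is kept (an assumption of this sketch).
      class IndexedBatchSketch {
       public:
        void Put(const std::string& key, const std::string& value) {
          index_[key] = Entry{false, value};
        }
        void Delete(const std::string& key) { index_[key] = Entry{true, ""}; }

        // Consult the batch first, then fall back to the database.
        bool GetFromBatchAndDb(const std::map<std::string, std::string>& db,
                               const std::string& key, std::string* value) const {
          auto it = index_.find(key);
          if (it != index_.end()) {
            if (it->second.deleted) return false;  // pending delete hides the DB value
            *value = it->second.value;
            return true;
          }
          auto db_it = db.find(key);
          if (db_it == db.end()) return false;
          *value = db_it->second;
          return true;
        }

        // Apply every pending update to the database, as writing the batch would.
        void WriteTo(std::map<std::string, std::string>* db) const {
          for (const auto& kv : index_) {
            if (kv.second.deleted) {
              db->erase(kv.first);
            } else {
              (*db)[kv.first] = kv.second.value;
            }
          }
        }

       private:
        struct Entry {
          bool deleted;
          std::string value;
        };
        std::map<std::string, Entry> index_;
      };
      ```

      Until `WriteTo()` runs, the database map is untouched, yet `GetFromBatchAndDb()` already reflects the batch's puts and deletes.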
  2. 23 Mar 2017, 5 commits
  3. 22 Mar 2017, 1 commit
  4. 21 Mar 2017, 1 commit
  5. 17 Mar 2017, 2 commits
    • Break stalls when no bg work is happening · d52f334c
      Islam AbdelRahman committed
      Summary:
      The current stall keeps sleeping even when there are no flushes or compactions to wait for. I changed the logic to break the stall if we are not flushing or compacting.
      
      db_bench command used:
      ```
      # fillrandom
      # memtable size = 10MB
      # value size = 1 MB
      # num = 1000
      # use /dev/shm
      ./db_bench --benchmarks="fillrandom,stats" --value_size=1048576 --write_buffer_size=10485760 --num=1000 --delayed_write_rate=XXXXX  --db="/dev/shm/new_stall" | grep "Cumulative stall"
      ```
      
      ```
      Current results
      
      # delayed_write_rate = 1000 Kb/sec
      Cumulative stall: 00:00:9.031 H:M:S
      
      # delayed_write_rate = 200 Kb/sec
      Cumulative stall: 00:00:22.314 H:M:S
      
      # delayed_write_rate = 100 Kb/sec
      Cumulative stall: 00:00:42.784 H:M:S
      
      # delayed_write_rate = 50 Kb/sec
      Cumulative stall: 00:01:23.785 H:M:S
      
      # delayed_write_rate = 25 Kb/sec
      Cumulative stall: 00:02:45.702 H:M:S
      ```
      
      ```
      New results
      
      # delayed_write_rate = 1000 Kb/sec
      Cumulative stall: 00:00:9.017 H:M:S
      
      # delayed_write_rate = 200 Kb/sec
      Cumulative stall: 00…
      ```
      Closes https://github.com/facebook/rocksdb/pull/1884
      
      Differential Revision: D4585439
      
      Pulled By: IslamAbdelRahman
      
      fbshipit-source-id: aed2198
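
      The change can be sketched as a delay loop that checks for pending background work before each sleep slice. This is a hypothetical model (the function name, the slice size and both callbacks are assumptions), not the code in RocksDB's write path.

      ```cpp
      #include <algorithm>
      #include <cstdint>
      #include <functional>

      // Hypothetical sketch of a write stall that ends early when there is no
      // background work left to wait for. `bg_work_pending` reports whether any
      // flush or compaction is scheduled or running; `sleep_for_us` performs (or,
      // in a test, simulates) the sleep. Returns the total time spent stalling.
      uint64_t StallWithEarlyExit(uint64_t delay_us, uint64_t slice_us,
                                  const std::function<bool()>& bg_work_pending,
                                  const std::function<void(uint64_t)>& sleep_for_us) {
        if (slice_us == 0) slice_us = delay_us;  // avoid spinning with 0-length sleeps
        uint64_t stalled_us = 0;
        while (stalled_us < delay_us) {
          // If nothing is flushing or compacting, sleeping longer cannot relieve
          // the pressure, so end the stall instead of serving the full delay.
          if (!bg_work_pending()) break;
          uint64_t step = std::min(slice_us, delay_us - stalled_us);
          sleep_for_us(step);
          stalled_us += step;
        }
        return stalled_us;
      }
      ```

      Sleeping in slices is a design choice of this sketch: it bounds how long a writer keeps waiting after the last background job finishes.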
    • Support SstFileManager::SetDeleteRateBytesPerSecond() · 995618a8
      Islam AbdelRahman committed
      Summary:
      Update the DeleteScheduler component to support changing the delete rate at runtime by introducing
      SstFileManager::SetDeleteRateBytesPerSecond().
      Closes https://github.com/facebook/rocksdb/pull/1994
      
      Differential Revision: D4719906
      
      Pulled By: IslamAbdelRahman
      
      fbshipit-source-id: e6b8d9e
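
      A minimal sketch of a delete pacer whose rate can be changed while it is in use. `DeletePacerSketch` and its methods are hypothetical, and treating a rate of 0 or less as "no pacing" is an assumption of this sketch; it is not the DeleteScheduler implementation.

      ```cpp
      #include <atomic>
      #include <cstdint>

      // Hypothetical delete pacer. The rate is stored atomically so a setter can
      // run concurrently with the background thread that deletes files.
      class DeletePacerSketch {
       public:
        explicit DeletePacerSketch(int64_t bytes_per_sec) : rate_(bytes_per_sec) {}

        void SetRateBytesPerSecond(int64_t bytes_per_sec) { rate_.store(bytes_per_sec); }
        int64_t GetRateBytesPerSecond() const { return rate_.load(); }

        // How long to wait after deleting a file of `file_bytes` so the average
        // deletion rate stays at the configured limit. A rate <= 0 disables pacing.
        uint64_t WaitMicrosAfterDeleting(uint64_t file_bytes) const {
          int64_t rate = rate_.load();  // read once so one computation sees one rate
          if (rate <= 0) return 0;
          return file_bytes * 1000000ULL / static_cast<uint64_t>(rate);
        }

       private:
        std::atomic<int64_t> rate_;
      };
      ```

      With a rate of 1024 bytes/s, deleting a 4096-byte file implies a 4-second wait; raising the rate to 4096 bytes/s at runtime shortens the next wait to 1 second.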
  6. 16 Mar 2017, 1 commit
    • Add macros to include file name and line number during Logging · e1916368
      Islam AbdelRahman committed
      Summary:
      Current logging:
      ```
      2017/03/14-14:20:30.393432 7fedde9f5700 (Original Log Time 2017/03/14-14:20:30.393414) [default] Level summary: base level 1 max bytes base 268435456 files[1 0 0 0 0 0 0] max score 0.25
      2017/03/14-14:20:30.393438 7fedde9f5700 [JOB 2] Try to delete WAL files size 61417909, prev total WAL file size 73820858, number of live WAL files 2.
      2017/03/14-14:20:30.393464 7fedde9f5700 [DEBUG] [JOB 2] Delete /dev/shm/old_logging//MANIFEST-000001 type=3 #1 -- OK
      2017/03/14-14:20:30.393472 7fedde9f5700 [DEBUG] [JOB 2] Delete /dev/shm/old_logging//000003.log type=0 #3 -- OK
      2017/03/14-14:20:31.427103 7fedd49f1700 [default] New memtable created with log file: #9. Immutable memtables: 0.
      2017/03/14-14:20:31.427179 7fedde9f5700 [JOB 3] Syncing log #6
      2017/03/14-14:20:31.427190 7fedde9f5700 (Original Log Time 2017/03/14-14:20:31.427170) Calling FlushMemTableToOutputFile with column family [default], flush slots available 1, compaction slots allowed 1, compaction slots scheduled 1
      2017/03/14-14:20:31.…
      ```
      Closes https://github.com/facebook/rocksdb/pull/1990
      
      Differential Revision: D4708695
      
      Pulled By: IslamAbdelRahman
      
      fbshipit-source-id: cb8968f
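
      One way to attach the call site to every log line is a variadic macro that forwards `__FILE__` and `__LINE__`. The sketch below is hypothetical (`LOG_AT`, `FormatWithLocation` and the `[file:line]` prefix are assumptions, not RocksDB's logging macros), and it returns the formatted line instead of writing to an info log so it is easy to test.

      ```cpp
      #include <cstdarg>
      #include <cstdio>
      #include <cstring>
      #include <string>

      // Strip directories so "db/db_impl.cc" is logged as "db_impl.cc".
      inline const char* ShortFileName(const char* path) {
        const char* slash = std::strrchr(path, '/');
        return slash != nullptr ? slash + 1 : path;
      }

      // printf-style formatting with a "[file:line] " prefix.
      inline std::string FormatWithLocation(const char* file, int line,
                                            const char* fmt, ...) {
        char body[512];
        std::va_list ap;
        va_start(ap, fmt);
        std::vsnprintf(body, sizeof(body), fmt, ap);
        va_end(ap);
        char prefix[128];
        std::snprintf(prefix, sizeof(prefix), "[%s:%d] ", ShortFileName(file), line);
        return std::string(prefix) + body;
      }

      // Because this is a macro, __FILE__ and __LINE__ expand at the caller.
      #define LOG_AT(...) FormatWithLocation(__FILE__, __LINE__, __VA_ARGS__)
      ```

      A call such as `LOG_AT("[JOB %d] Syncing log #%d", 3, 6)` yields a line like `[db_impl.cc:123] [JOB 3] Syncing log #6`, where the file and line are those of the call site.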
  7. 14 Mar 2017, 4 commits
    • Pinnableslice (2nd attempt) · 11526252
      Maysam Yabandeh committed
      Summary:
      PinnableSlice
      
          Summary:
          Currently, point-lookup values are copied into a string provided by the
          user, which incurs an extra memcpy. This patch allows point lookups
          via a PinnableSlice, which pins the source memory (instead of
          copying its content) and releases it after the content is consumed
          by the user. The old Get(string) API is translated to the new API
          underneath.
      
          Here is a summary of the improvements:
      
          100-byte values: 1.8% regular, 1.2% merge values
          1k-byte values: 11.5% regular, 7.5% merge values
          10k-byte values: 26% regular, 29.9% merge values
          The improvement for merge could be larger if we extended this approach to
          pin the merge output and delay the full merge operation until the user
          actually needs it. We have left that for future work.
      
          PS:
          Sometimes we observe a small decrease in performance when switching from
          t5452014 to this patch but with the old Get(string) API. The d…
      Closes https://github.com/facebook/rocksdb/pull/1756
      
      Differential Revision: D4391738
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 6f3edd3
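
      The difference between pinning and copying can be modelled with a small value holder that either references memory owned elsewhere (released through a callback) or keeps a private copy. `PinnedValueSketch` is hypothetical and does not reproduce PinnableSlice's actual interface.

      ```cpp
      #include <cstddef>
      #include <functional>
      #include <string>
      #include <utility>

      // Hypothetical "pinnable" value: zero-copy reference or private copy.
      class PinnedValueSketch {
       public:
        PinnedValueSketch() = default;
        PinnedValueSketch(const PinnedValueSketch&) = delete;
        PinnedValueSketch& operator=(const PinnedValueSketch&) = delete;
        ~PinnedValueSketch() { Reset(); }

        // Zero-copy path: reference `size` bytes at `data` and remember how to
        // unpin them (for example, dropping a reference on a cached block).
        void Pin(const char* data, std::size_t size, std::function<void()> release) {
          Reset();
          data_ = data;
          size_ = size;
          release_ = std::move(release);
        }

        // Copy path: take a private copy of the bytes (one extra memcpy).
        void CopyFrom(const char* data, std::size_t size) {
          std::string copy(data, size);  // copy first in case `data` aliases us
          Reset();
          self_.swap(copy);
          data_ = self_.data();
          size_ = self_.size();
        }

        // Unpin (or drop the private copy) once the caller has consumed the value.
        void Reset() {
          if (release_) {
            release_();
            release_ = nullptr;
          }
          self_.clear();
          data_ = nullptr;
          size_ = 0;
        }

        const char* data() const { return data_; }
        std::size_t size() const { return size_; }
        bool pinned() const { return static_cast<bool>(release_); }

       private:
        const char* data_ = nullptr;
        std::size_t size_ = 0;
        std::string self_;
        std::function<void()> release_;
      };
      ```

      The pinned path hands back the source address itself, which is why larger values benefit more: the saved memcpy grows with value size.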
    • Add a new SstFileWriter constructor without explicit comparator · 1ffbdfd9
      Sagar Vemuri committed
      Summary:
      The comparator parameter in the SstFileWriter constructor is redundant, as it already exists as a field in options. So the current SstFileWriter constructor should be deprecated in favor of a new one that does not take a comparator.
      Note that the JNI/Java APIs have not been touched yet.
      Closes https://github.com/facebook/rocksdb/pull/1978
      
      Differential Revision: D4685629
      
      Pulled By: sagar0
      
      fbshipit-source-id: 372ce96
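
      The design point (take the comparator from the options rather than from a separate argument) can be sketched with hypothetical stand-in types. `ComparatorSketch`, `OptionsSketch` and `SstWriterSketch` are assumptions, not RocksDB classes, and the ordering check is a property of this sketch only.

      ```cpp
      #include <string>

      // Hypothetical stand-ins for a comparator and an options struct.
      struct ComparatorSketch {
        virtual ~ComparatorSketch() = default;
        virtual int Compare(const std::string& a, const std::string& b) const {
          return a.compare(b);
        }
      };

      struct OptionsSketch {
        const ComparatorSketch* comparator = nullptr;
      };

      class SstWriterSketch {
       public:
        // Options-only constructor: the comparator comes from the one place that
        // already defines it, so the writer cannot disagree with the options.
        explicit SstWriterSketch(const OptionsSketch& options)
            : comparator_(options.comparator) {}

        // In this sketch, keys must be added in strictly increasing order
        // according to the options' comparator.
        bool KeyInOrder(const std::string& prev, const std::string& next) const {
          return comparator_->Compare(prev, next) < 0;
        }

        const ComparatorSketch* comparator() const { return comparator_; }

       private:
        const ComparatorSketch* comparator_;
      };
      ```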
    • Add ability to search for key prefix in sst_dump tool · ebd5639b
      Reid Horuff committed
      Summary:
      Add the flag --prefix to the sst_dump tool
      This flag is similar to, and mutually exclusive with, the --from flag.
      
      --prefix=0x00FF will return all rows prefixed with 0x00FF.
      The --to flag may also be specified and will work as expected.
      
      These changes were used to help debug the power-cycle corruption issue and were tested by scanning through a udb.
      Closes https://github.com/facebook/rocksdb/pull/1984
      
      Differential Revision: D4691814
      
      Pulled By: reidHoruff
      
      fbshipit-source-id: 027f261
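
      A sketch of how a prefix scan over sorted keys can be bounded: seek to the prefix, stop at the first key that no longer starts with it, and optionally stop earlier at an upper bound. `DecodeHexKey` and `ScanPrefix` are hypothetical helpers, a `std::map` stands in for the sorted SST contents, and treating `to` as an exclusive bound is an assumption of this sketch rather than documented sst_dump behaviour.

      ```cpp
      #include <cstddef>
      #include <map>
      #include <string>
      #include <vector>

      // Value of one hex digit, or -1 if `c` is not a hex digit.
      int HexDigit(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
      }

      // Decode a "0x"-prefixed hex string such as "0x00FF" into raw bytes.
      bool DecodeHexKey(const std::string& in, std::string* out) {
        if (in.size() < 2 || in[0] != '0' || (in[1] != 'x' && in[1] != 'X')) return false;
        if ((in.size() - 2) % 2 != 0) return false;
        out->clear();
        for (std::size_t i = 2; i < in.size(); i += 2) {
          int hi = HexDigit(in[i]);
          int lo = HexDigit(in[i + 1]);
          if (hi < 0 || lo < 0) return false;
          out->push_back(static_cast<char>(hi * 16 + lo));
        }
        return true;
      }

      // Collect values whose keys start with `prefix`, stopping at the optional
      // exclusive upper bound `to`. Keys sharing a prefix are contiguous in sorted
      // order, so the scan can stop at the first non-matching key.
      std::vector<std::string> ScanPrefix(const std::map<std::string, std::string>& table,
                                          const std::string& prefix, const std::string* to) {
        std::vector<std::string> values;
        for (auto it = table.lower_bound(prefix);
             it != table.end() && it->first.compare(0, prefix.size(), prefix) == 0; ++it) {
          if (to != nullptr && it->first >= *to) break;
          values.push_back(it->second);
        }
        return values;
      }
      ```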
    • Fix some bugs in MockEnv · e6725e8c
      Maysam Yabandeh committed
      Summary:
      Fix some bugs in MockEnv so that it can actually be used.
      Closes https://github.com/facebook/rocksdb/pull/1914
      
      Differential Revision: D4609923
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: ca25735
  8. 09 Mar 2017, 4 commits
  9. 08 Mar 2017, 2 commits
    • Add a memtable-only iterator · 97edc72d
      Sagar Vemuri committed
      Summary:
      This PR adds a way to iterate over all the keys that are in the memtables only.
      Closes https://github.com/facebook/rocksdb/pull/1953
      
      Differential Revision: D4663500
      
      Pulled By: sagar0
      
      fbshipit-source-id: 144e177
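
      What a memtable-only view contains, compared with a full view, can be modelled by merging sources newest first and simply leaving out the flushed data. `DbModel`, `ViewScope` and `MergedView` are hypothetical; a real iterator merges lazily and handles deletions, which this sketch omits.

      ```cpp
      #include <map>
      #include <string>
      #include <vector>

      // Hypothetical model of a DB's data: memtables ordered newest first
      // (index 0 is the active memtable), followed by data already flushed to SST
      // files. Deletions are ignored to keep the sketch short.
      struct DbModel {
        std::vector<std::map<std::string, std::string>> memtables;
        std::map<std::string, std::string> sst;
      };

      enum class ViewScope { kAllData, kMemtablesOnly };

      // Build the key/value view an iterator would walk. map::insert skips keys
      // that are already present, so visiting sources newest first lets the newest
      // version of each key win.
      std::map<std::string, std::string> MergedView(const DbModel& db, ViewScope scope) {
        std::map<std::string, std::string> view;
        for (const auto& mem : db.memtables) view.insert(mem.begin(), mem.end());
        if (scope == ViewScope::kAllData) view.insert(db.sst.begin(), db.sst.end());
        return view;
      }
      ```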
    • fix db_sst_test flakiness · 72202962
      Leonidas Galanis committed
      Summary:
      db_sst_test had been occasionally flaky in the following way: reached_max_space_on_compaction can, in very rare cases, be 0. This happens when the limit on maximum allowable space set using SetMaxAllowedSpaceUsage is hit during flush for all test DB sizes (1, 2, 4, 8 and 10 MB). The fix clears the error returned when the space limit is reached during flush, which ensures that the compaction callback is always called. The runtime increases slightly because the 1 MB loop writes more data and hits the limit during multiple flushes until compaction is scheduled.
      Closes https://github.com/facebook/rocksdb/pull/1861
      
      Differential Revision: D4557396
      
      Pulled By: lgalanis
      
      fbshipit-source-id: ff778d1
  10. 07 Mar 2017, 1 commit
    • Set logs as getting flushed before releasing lock, race condition fix · 58b12dfe
      Reid Horuff committed
      Summary:
      Relating to #1903:
      
      In MaybeFlushColumnFamilies() we want to modify the 'getting_flushed' flag before releasing the db mutex when SwitchMemtable() is called.
      
      Each of the following two sequences needs to be atomic in MaybeFlushColumnFamilies():
      - getting_flushed is false on oldest log
      - we determine that all CFs can be flushed to successfully release oldest log
      - we set getting_flushed = true on the oldest log.
      -------
      - getting_flushed is false on oldest log
      - we determine that all CFs can NOT be flushed to successfully release oldest log
      - we set unable_to_flush_oldest_log_ = true on the oldest log.
      
      #### In the 2pc case:
      
      T1 enters function but is unable to flush all CFs to release log
      T1 sets unable_to_flush_oldest_log_ = true
      T1 begins flushing all CFs possible
      
      T2 enters function but is unable to flush all CFs to release log
      T2 sees that unable_to_flush_oldest_log_ has been set, so it exits
      
      T3 enters function and will be able to flush all CFs to release oldest log
      T3 sets getting_flushed = true on oldes…
      Closes https://github.com/facebook/rocksdb/pull/1909
      
      Differential Revision: D4646235
      
      Pulled By: reidHoruff
      
      fbshipit-source-id: c8d0447
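
      The T1/T2/T3 scenario can be modelled as a check-and-set that happens entirely under one mutex, so no thread acts on a stale view of the flags. `OldestLogFlushSketch` and its `Decision` values are hypothetical; they model the decision described above, not the code of MaybeFlushColumnFamilies().

      ```cpp
      #include <mutex>

      // Hypothetical model: both the checks and the flag updates happen while
      // holding `mu_`, so the flags are set before the lock is released.
      class OldestLogFlushSketch {
       public:
        enum class Decision { kFlushAllToReleaseLog, kFlushWhatIsPossible, kNothingToDo };

        // `can_release_oldest_log` says whether flushing every column family
        // would free the oldest log.
        Decision Decide(bool can_release_oldest_log) {
          std::lock_guard<std::mutex> guard(mu_);
          if (oldest_log_getting_flushed_) return Decision::kNothingToDo;
          if (can_release_oldest_log) {
            oldest_log_getting_flushed_ = true;  // set before the lock is released
            return Decision::kFlushAllToReleaseLog;
          }
          if (unable_to_flush_oldest_log_) return Decision::kNothingToDo;
          unable_to_flush_oldest_log_ = true;
          return Decision::kFlushWhatIsPossible;
        }

       private:
        std::mutex mu_;
        bool oldest_log_getting_flushed_ = false;
        bool unable_to_flush_oldest_log_ = false;
      };
      ```

      Calling `Decide(false)`, `Decide(false)`, `Decide(true)` in turn reproduces T1 (flush what is possible), T2 (exit) and T3 (flush all to release the log); any later caller then sees the log already being flushed.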
  11. 06 Mar 2017, 1 commit
    • Fix a bug in tests in options operator= · 534581a3
      Maysam Yabandeh committed
      Summary:
      Note: Using the default operator= is unsafe for Options, since it destructs the shared_ptr members in
      the same order as their creation, in contrast to the destructor, which
      destructs them in the opposite order of creation. One particular problem is
      that the cache destructor might invoke callback functions that use Options
      members such as statistics. To work around this problem, we manually call the
      destructor of table_factory, which eventually clears the block cache.
      Closes https://github.com/facebook/rocksdb/pull/1950
      
      Differential Revision: D4655473
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 6c4bbff
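
      The ordering problem comes from a C++ language rule: an implicitly defined copy-assignment operator assigns members in declaration order, while a destructor destroys them in reverse declaration order. The hypothetical `OptionsLike` struct below (not the real Options) records when each old `shared_ptr` target is released.

      ```cpp
      #include <memory>
      #include <string>
      #include <utility>
      #include <vector>

      // Records the order in which tracked objects are destroyed.
      std::vector<std::string> g_events;

      struct Tracked {
        explicit Tracked(std::string n) : name(std::move(n)) {}
        ~Tracked() { g_events.push_back(name); }
        std::string name;
      };

      // Hypothetical stand-in for Options. `statistics` is declared before `cache`,
      // so the destructor releases `cache` first and `statistics` last, which is
      // the safe order if the cache's teardown still uses statistics.
      struct OptionsLike {
        std::shared_ptr<Tracked> statistics;
        std::shared_ptr<Tracked> cache;
      };
      // The implicit copy assignment, however, assigns `statistics` first, so the
      // old statistics object is released while the old cache is still alive.
      ```

      After `a = b`, the recorded order is the old statistics object, then the old cache; when `a` is later destroyed, the order flips to cache first, then statistics.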
  12. 04 Mar 2017, 1 commit
  13. 03 Mar 2017, 3 commits
  14. 01 Mar 2017, 3 commits
  15. 28 Feb 2017, 2 commits
  16. 24 Feb 2017, 4 commits
  17. 23 Feb 2017, 2 commits
  18. 22 Feb 2017, 2 commits
    • Fix interference between max_total_wal_size and db_write_buffer_size checks · 18eeb7b9
      Mike Kolupaev committed
      Summary:
      This is a trivial fix for OOMs we saw a few days ago in logdevice.
      
      RocksDB gets into the following state:
      (1) Write throughput is too high for flushes to keep up. Compactions are out of the picture: automatic compactions are disabled, and for manual compactions we don't care much if they fall behind. We write to many CFs, with only a few L0 SST files in each, so compactions are not needed most of the time.
      (2) total_log_size_ is consistently greater than GetMaxTotalWalSize(). It doesn't get smaller, since flushes are falling ever further behind.
      (3) The total size of memtables is way above db_write_buffer_size and keeps growing. But write_buffer_manager_->ShouldFlush() is not checked, because (2) prevents it (for no good reason, afaict; this is what this commit fixes).
      (4) Every call to WriteImpl() hits the MaybeFlushColumnFamilies() path. This keeps flushing the memtables one by one in order of increasing log file number.
      (5) No write stalling trigger is hit. We rely on max_write_buffer_number…
      Closes https://github.com/facebook/rocksdb/pull/1893
      
      Differential Revision: D4593590
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: af79c5f
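
      The control-flow issue in (2) and (3) can be modelled as an else-if chain in which the first trigger masks the second. The types and functions below are hypothetical and only model the decision; they are not the code in WriteImpl(), and the actual patch may be structured differently.

      ```cpp
      // Hypothetical model of the two flush triggers checked on the write path.
      struct FlushTriggers {
        bool wal_over_limit;        // total_log_size_ > GetMaxTotalWalSize()
        bool memtables_over_limit;  // write_buffer_manager_->ShouldFlush()
      };

      enum FlushAction : unsigned {
        kNoFlush = 0,
        kFlushForWalLimit = 1u << 0,
        kFlushForMemtableLimit = 1u << 1,
      };

      // Chained: while the WAL limit is exceeded, the memtable limit is never
      // looked at, which is state (2) masking state (3).
      unsigned ChainedCheck(const FlushTriggers& t) {
        if (t.wal_over_limit) return kFlushForWalLimit;
        if (t.memtables_over_limit) return kFlushForMemtableLimit;
        return kNoFlush;
      }

      // Independent: each trigger is evaluated on its own.
      unsigned IndependentCheck(const FlushTriggers& t) {
        unsigned actions = kNoFlush;
        if (t.wal_over_limit) actions |= kFlushForWalLimit;
        if (t.memtables_over_limit) actions |= kFlushForMemtableLimit;
        return actions;
      }
      ```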
    • level compaction expansion · 2a0f3d0d
      Aaron Gao committed
      Summary:
      Reimplement the compaction expansion on the lower level.
      
      Consider the following case:
      input level file: 1[B E] 2[F G] 3[H I] 4 [J M]
      output level file: 5[A C] 6[D K] 7[L O]
      
      If we initially pick file 2, now we will compact file 2 and 6. But we can safely compact 2, 3 and 6 without expanding the output level.
      
      The previous code is messy and wrong.
      
      In this diff, I first determine the input range [a, b] and the output range [c, d],
      then compute the range [e, f] = [min(a, c), max(b, d)] and put all eligible clean-cut files within [e, f] into this compaction.
      
      **Note: clean-cut means the files don't have the same user key on the boundaries of some files that are not chosen in this compaction**.
      Closes https://github.com/facebook/rocksdb/pull/1760
      
      Differential Revision: D4395564
      
      Pulled By: lightmark
      
      fbshipit-source-id: 2dc2c5c
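
      The range computation can be sketched for a single picked file. `FileRange` and `ExpandInputs` are hypothetical; the sketch assumes sorted, non-overlapping files in each level and that no user key straddles a file boundary (every file is a clean cut), so it skips the boundary checks the real code must perform.

      ```cpp
      #include <algorithm>
      #include <string>
      #include <vector>

      // A file's key range; single-letter keys keep the example readable.
      struct FileRange {
        int id;
        std::string smallest;
        std::string largest;
      };

      static bool Overlaps(const FileRange& f, const std::string& lo, const std::string& hi) {
        return !(f.largest < lo || f.smallest > hi);
      }

      // Start from the picked file's range [a, b], widen it by every output-level
      // file that overlaps [a, b] (their union is [c, d]), giving
      // [e, f] = [min(a, c), max(b, d)]. Return every input-level file that lies
      // entirely inside [e, f].
      std::vector<int> ExpandInputs(const std::vector<FileRange>& input_level,
                                    const std::vector<FileRange>& output_level,
                                    const FileRange& picked) {
        const std::string a = picked.smallest, b = picked.largest;
        std::string e = a, f = b;
        for (const auto& out : output_level) {
          if (Overlaps(out, a, b)) {
            e = std::min(e, out.smallest);
            f = std::max(f, out.largest);
          }
        }
        std::vector<int> chosen;
        for (const auto& in : input_level) {
          if (in.smallest >= e && in.largest <= f) chosen.push_back(in.id);
        }
        return chosen;
      }
      ```

      With the files from the example above, picking file 2 gives [e, f] = [D, K], so files 2 and 3 are compacted with file 6 only. Under these assumptions, [e, f] grows only to cover output files that already overlap [a, b], so the added input files do not pull in more output-level files.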