1. 14 Feb 2017 (1 commit)
    • Make DBImpl::has_unpersisted_data_ atomic · c2247dc1
      Committed by Yi Wu
      Summary:
      It seems to me `has_unpersisted_data_` is read by the read thread and
      written by the write thread concurrently without synchronization. Make
      it an atomic.
      
      I updated the logic not because I saw any problem with it, but because
      it just felt confusing.
      Closes https://github.com/facebook/rocksdb/pull/1869
      
      Differential Revision: D4555837
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: eff2ab8
      c2247dc1
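      Below is a minimal sketch of the pattern this commit describes, not the actual DBImpl code: the flag becomes a std::atomic<bool>. The class and accessor names are illustrative, and the relaxed memory ordering is an assumption.

      ```cpp
      #include <atomic>

      // Illustrative only: a flag written by the write path and read by the read
      // path becomes a std::atomic<bool> so concurrent access is well defined.
      class UnpersistedDataFlag {
       public:
        void MarkUnpersisted() {
          has_unpersisted_data_.store(true, std::memory_order_relaxed);
        }
        void MarkPersisted() {
          has_unpersisted_data_.store(false, std::memory_order_relaxed);
        }
        bool HasUnpersistedData() const {
          return has_unpersisted_data_.load(std::memory_order_relaxed);
        }

       private:
        std::atomic<bool> has_unpersisted_data_{false};
      };
      ```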
  2. 07 Feb 2017 (1 commit)
  3. 25 Jan 2017 (1 commit)
  4. 21 Jan 2017 (1 commit)
  5. 20 Jan 2017 (1 commit)
    • Fix for 2PC causing WAL to grow too large · 5cf176ca
      Committed by Reid Horuff
      Summary:
      Consider the following single column family scenario:
      prepare in log A
      commit in log B
      *WAL is too large, flush all CFs to release log A*
      *CFA is on log B, so we do not see that CFA depends on log A, and no flush is requested*
      
      To fix this we must also consider the log containing the prepare section when determining what log a CF is dependent on.
      Closes https://github.com/facebook/rocksdb/pull/1768
      
      Differential Revision: D4403265
      
      Pulled By: reidHoruff
      
      fbshipit-source-id: ce800ff
      5cf176ca
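      A rough sketch of the idea behind the fix, not the actual RocksDB code (the function and parameter names are invented for illustration): the oldest WAL a column family still needs is the older of the log holding its unflushed data and the log holding any outstanding prepare section.

      ```cpp
      #include <algorithm>
      #include <cstdint>

      // Sketch: 0 means "no outstanding prepare sections" here.
      uint64_t MinWalRequiredForCF(uint64_t min_log_with_unflushed_data,
                                   uint64_t min_log_with_outstanding_prepare) {
        if (min_log_with_outstanding_prepare == 0) {
          return min_log_with_unflushed_data;
        }
        // Without taking the prepare log into account, the scenario above keeps
        // log A alive indefinitely because no CF appears to depend on it.
        return std::min(min_log_with_unflushed_data,
                        min_log_with_outstanding_prepare);
      }
      ```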
  6. 09 Jan 2017 (2 commits)
    • Revert "PinnableSlice" · d0ba8ec8
      Committed by Maysam Yabandeh
      Summary:
      This reverts commit 54d94e9c.
      
      The pull request was landed by mistake.
      Closes https://github.com/facebook/rocksdb/pull/1755
      
      Differential Revision: D4391678
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 36d5149
      d0ba8ec8
    • PinnableSlice · 54d94e9c
      Committed by Maysam Yabandeh
      Summary:
      Currently, point lookup values are copied to a string provided by the user.
      This incurs an extra memcpy cost. This patch allows doing point lookups
      via a PinnableSlice, which pins the source memory location (instead of
      copying its content) and releases it after the content is consumed by
      the user. The old Get(string) API is translated to the new API underneath.
      
      Summary of the improvements:
      1. 100-byte values: 1.8% regular, 1.2% merge values
      2. 1 KB values: 11.5% regular, 7.5% merge values
      3. 10 KB values: 26% regular, 29.9% merge values
      
      The improvement for merge could be larger if we extend this approach to
      pin the merge output and delay the full merge operation until the user
      actually needs it. We have left that for future work.
      
      PS:
      Sometimes we observe a small decrease in performance when switching from
      t5452014 to this patch while still using the old Get(string) API. The
      difference is small and could be noise; more importantly, it is safely
      cancelled out.
      Closes https://github.com/facebook/rocksdb/pull/1732
      
      Differential Revision: D4374613
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: a077f1a
      54d94e9c
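      A small usage sketch of the lookup path this patch adds; the surrounding helper is illustrative, while the Get overload taking a PinnableSlice* is the API introduced here.

      ```cpp
      #include "rocksdb/db.h"

      // Point lookup without copying the value into a std::string: the value stays
      // pinned (e.g. in the block cache) until the PinnableSlice is reset/destroyed.
      rocksdb::Status GetPinned(rocksdb::DB* db, const rocksdb::Slice& key) {
        rocksdb::PinnableSlice value;
        rocksdb::Status s =
            db->Get(rocksdb::ReadOptions(), db->DefaultColumnFamily(), key, &value);
        if (s.ok()) {
          // Consume value.data() / value.size() here; no extra memcpy happened.
        }
        value.Reset();  // release the pinned memory
        return s;
      }
      ```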
  7. 29 Dec 2016 (1 commit)
  8. 07 Dec 2016 (1 commit)
  9. 06 Dec 2016 (1 commit)
  10. 22 Nov 2016 (1 commit)
  11. 15 Nov 2016 (1 commit)
  12. 12 Nov 2016 (1 commit)
  13. 05 Nov 2016 (1 commit)
    • DeleteRange user iterator support · 9e7cf346
      Committed by Andrew Kryczka
      Summary:
      Note: reviewed in https://reviews.facebook.net/D65115
      
      - DBIter maintains a range tombstone accumulator. We don't clean up obsolete tombstones yet, so if the user seeks back and forth, the same tombstones would be added to the accumulator multiple times.
      - DBImpl::NewInternalIterator() (used to make DBIter's underlying iterator) adds memtable/L0 range tombstones; L1+ range tombstones are added on-demand during NewSecondaryIterator() (see D62205)
      - DBIter uses ShouldDelete() when advancing to check whether keys are covered by range tombstones
      Closes https://github.com/facebook/rocksdb/pull/1464
      
      Differential Revision: D4131753
      
      Pulled By: ajkr
      
      fbshipit-source-id: be86559
      9e7cf346
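      A usage sketch of how a user iterator interacts with range tombstones (the keys are hypothetical; assumes the DeleteRange API as it ships in rocksdb/db.h):

      ```cpp
      #include <memory>
      #include "rocksdb/db.h"

      // Keys in ["b", "d") are covered by a range tombstone, so the iterator below
      // skips them (DBIter consults its accumulated tombstones via ShouldDelete()).
      void RangeDeleteAndScan(rocksdb::DB* db) {
        rocksdb::WriteOptions wo;
        db->Put(wo, "a", "1");
        db->Put(wo, "b", "2");
        db->Put(wo, "c", "3");
        db->DeleteRange(wo, db->DefaultColumnFamily(), "b", "d");

        std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(rocksdb::ReadOptions()));
        for (it->SeekToFirst(); it->Valid(); it->Next()) {
          // Only "a" is returned; "b" and "c" are hidden by the range tombstone.
        }
      }
      ```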
  14. 22 Oct 2016 (1 commit)
  15. 21 Oct 2016 (1 commit)
    • Support IngestExternalFile (remove AddFile restrictions) · 869ae5d7
      Committed by Islam AbdelRahman
      Summary:
      Changes in the diff
      
      API changes:
      - Introduce IngestExternalFile to replace AddFile (I think this makes the API clearer)
      - Introduce IngestExternalFileOptions (this struct encapsulates the options for ingesting the external file)
      - Deprecate the AddFile() API
      
      Logic changes:
      - If our file overlaps with the memtable, we will flush the memtable
      - We will find the first level in the LSM tree whose keys overlap our file's key range
      - We will find the lowest level in the LSM tree above the level found in the previous step that our file can fit in, and ingest our file there
      - We will assign a global sequence number to our new file
      - Remove the AddFile restrictions by using global sequence numbers
      
      Other changes:
      - Refactor all AddFile logic to be encapsulated in ExternalSstFileIngestionJob
      
      Test Plan:
      unit tests (still need to add more)
      addfile_stress (https://reviews.facebook.net/D65037)
      
      Reviewers: yiwu, andrewkr, lightmark, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: jkedgar, hcz, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D65061
      869ae5d7
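      A sketch of the new ingestion flow; the exact SstFileWriter constructor and write method have shifted across RocksDB versions, so treat those details as assumptions.

      ```cpp
      #include <string>
      #include "rocksdb/db.h"
      #include "rocksdb/env.h"
      #include "rocksdb/sst_file_writer.h"

      // Build an external SST file and ingest it; a global sequence number is
      // assigned on ingestion, which is what removes the old AddFile restrictions.
      rocksdb::Status BuildAndIngest(rocksdb::DB* db, const rocksdb::Options& options,
                                     const std::string& file_path) {
        rocksdb::SstFileWriter writer(rocksdb::EnvOptions(), options,
                                      options.comparator);
        rocksdb::Status s = writer.Open(file_path);
        if (!s.ok()) return s;
        s = writer.Add("ingested_key", "ingested_value");  // keys must be sorted
        if (!s.ok()) return s;
        s = writer.Finish();
        if (!s.ok()) return s;

        rocksdb::IngestExternalFileOptions ifo;  // e.g. ifo.move_file = true;
        return db->IngestExternalFile({file_path}, ifo);
      }
      ```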
  16. 19 Oct 2016 (1 commit)
  17. 15 Oct 2016 (1 commit)
  18. 14 Oct 2016 (1 commit)
    • Fix compaction conflict with running compaction · 5691a1d8
      Committed by Islam AbdelRahman
      Summary:
      Issue scenario:
      (1) We have 3 files in L1 and we issue a compaction that will compact them into 1 file in L2
      (2) While compaction (1) is running, we flush a file into L0 and trigger another compaction that decides to move this file to L1 and then move it again to L2 (this file doesn't overlap with any other files)
      (3) Compaction (1) finishes and installs the file it generated in L2, but this file overlaps with the file we generated in (2), so we break LSM consistency
      
      It looks like this issue can be triggered by using non-exclusive manual compaction or AddFile()
      
      Test Plan: unit tests
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: hermanlee4, jkedgar, andrewkr, dhruba, yoshinorim
      
      Differential Revision: https://reviews.facebook.net/D64947
      5691a1d8
  19. 29 Sep 2016 (1 commit)
    • Fix conflict between AddFile() and CompactRange() · 87dfc1d2
      Committed by Islam AbdelRahman
      Summary:
      Fix the conflict bug between AddFile() and CompactRange() by
      - Making sure that no AddFile calls are running when asking CompactionPicker to pick a compaction for a manual compaction
      - If AddFile() runs after we pick the compaction for the manual compaction, it will be aware of it, since we add the manual compaction to running_compactions_ after picking it
      
      This solves these 2 scenarios:
      - If AddFile() is running, we wait for it to finish before we pick a compaction for the manual compaction
      - If we already picked a manual compaction and then AddFile() started ... we ensure that it never ingests a file into a level that will overlap with the manual compaction
      
      Test Plan: unit tests
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, yoshinorim, jkedgar, dhruba
      
      Differential Revision: https://reviews.facebook.net/D64449
      87dfc1d2
  20. 27 Sep 2016 (1 commit)
    • Fix AddFile() conflict with compaction output [WaitForAddFile()] · 5c64fb67
      Committed by Islam AbdelRahman
      Summary:
      Since AddFile unlocks/relocks the mutex inside LogAndApply(), we need to ensure that during this period other compactions cannot run, since such compactions are not aware of the file we are ingesting and could create a compaction that overlaps with this file
      
      This diff adds
      - A WaitForAddFile() call that ensures that no AddFile() calls are being processed right now
      - Calls to `WaitForAddFile()` in 3 locations
      -- When doing manual compaction
      -- When starting automatic compaction
      -- When doing CompactFiles()
      
      Test Plan: unit test
      
      Reviewers: lightmark, yiwu, andrewkr, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, yoshinorim, jkedgar, dhruba
      
      Differential Revision: https://reviews.facebook.net/D64383
      5c64fb67
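      Purely illustrative sketch of the coordination pattern described above; the class and member names are invented and are not the real DBImpl state. The point is that compaction pickers wait until no file ingestion is in flight.

      ```cpp
      #include <condition_variable>
      #include <mutex>

      // Sketch: compactions call WaitForAddFile() before picking inputs so they
      // never race with a file ingestion that is between mutex unlock/relock.
      class IngestGate {
        std::mutex mu_;
        std::condition_variable cv_;
        int num_running_addfile_ = 0;

       public:
        void EnterAddFile() {
          std::lock_guard<std::mutex> l(mu_);
          ++num_running_addfile_;
        }
        void ExitAddFile() {
          std::lock_guard<std::mutex> l(mu_);
          --num_running_addfile_;
          cv_.notify_all();
        }
        // Called before picking a manual/automatic compaction or CompactFiles().
        void WaitForAddFile() {
          std::unique_lock<std::mutex> l(mu_);
          cv_.wait(l, [this] { return num_running_addfile_ == 0; });
        }
      };
      ```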
  21. 24 Sep 2016 (2 commits)
    • Split DBOptions into ImmutableDBOptions and MutableDBOptions · 9ed928e7
      Committed by Yi Wu
      Summary: Use ImmutableDBOptions/MutableDBOptions internally and DBOptions only for user-facing APIs. MutableDBOptions is barely more than a placeholder for now; I'll start moving options into MutableDBOptions in the following diffs.
      
      Test Plan:
        make all check
      
      Reviewers: yhchiang, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D64065
      9ed928e7
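      A rough sketch of the shape of the split; the field placement below is purely illustrative (the real structs live in RocksDB's options code and carry many more members). Options that cannot change after Open() go to the immutable struct, while options that may later be changed at runtime go to the mutable one.

      ```cpp
      #include <cstdint>
      #include <string>

      // Illustrative field placement only, not the real structs.
      struct ImmutableDBOptionsSketch {   // fixed for the lifetime of the DB
        bool create_if_missing = false;
        std::string db_log_dir;
        bool use_fsync = false;
      };

      struct MutableDBOptionsSketch {     // changeable at runtime
        uint64_t max_total_wal_size = 0;
        uint64_t delayed_write_rate = 16 * 1024 * 1024;
      };
      ```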
    • Recover same sequence id from WAL (#1350) · 4bc8c88e
      Committed by yiwu-arbug
      Summary:
      Revert the behavior where we don't read the sequence id from the WAL but instead increase it as we replay the log. We still keep that behavior for 2PC for now but will fix it later.
      
      This change fixes github issue 1339, where some writes come with the WAL disabled and we may recover records with the wrong sequence id.
      
      Test Plan: Added unit test.
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D64275
      4bc8c88e
  22. 20 Sep 2016 (2 commits)
    • DBImpl::GetWalPreallocateBlockSize() should return size_t · d78a4401
      Committed by sdong
      Summary: WritableFile::SetPreallocationBlockSize() requires its parameter as size_t, and the options used in DBImpl::GetWalPreallocateBlockSize() are all size_t. DBImpl::GetWalPreallocateBlockSize() should therefore return size_t to avoid a build break if size_t is not uint64_t.
      
      Test Plan: Run existing tests.
      
      Reviewers: andrewkr, IslamAbdelRahman, yiwu
      
      Reviewed By: yiwu
      
      Subscribers: leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D64137
      d78a4401
    • Consider more factors when determining preallocation size of WAL files · b666f854
      Committed by sdong
      Summary: Currently the WAL file preallocation size is 1.1 * write_buffer_size. This, however, will be overestimated if options.db_write_buffer_size or options.max_total_wal_size is set and is much smaller.
      
      Test Plan: Add a unit test.
      
      Reviewers: andrewkr, yiwu
      
      Reviewed By: yiwu
      
      Subscribers: leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D63957
      b666f854
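      A sketch of the capping logic described in the summary, not the exact DBImpl::GetWalPreallocateBlockSize() implementation; the 1.1x factor and the caps simply mirror the description above.

      ```cpp
      #include <algorithm>
      #include <cstddef>
      #include <cstdint>

      // Start from ~1.1 * write_buffer_size, then cap by db_write_buffer_size and
      // max_total_wal_size when those options are set (0 means "unset" here).
      size_t WalPreallocateBlockSize(uint64_t write_buffer_size,
                                     uint64_t db_write_buffer_size,
                                     uint64_t max_total_wal_size) {
        uint64_t bsize = write_buffer_size + write_buffer_size / 10;
        if (db_write_buffer_size > 0) {
          bsize = std::min(bsize, db_write_buffer_size);
        }
        if (max_total_wal_size > 0) {
          bsize = std::min(bsize, max_total_wal_size);
        }
        return static_cast<size_t>(bsize);
      }
      ```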
  23. 15 Sep 2016 (1 commit)
  24. 09 Sep 2016 (1 commit)
  25. 30 Aug 2016 (1 commit)
    • support Prev() in prefix seek mode · 2482d5fb
      Committed by Aaron Gao
      Summary: As the title says, make sure Prev() works as expected together with Next() when the current iter->key() is within the range of the same prefix in prefix seek mode
      
      Test Plan: make all check -j64 (add prefix_test with PrefixSeekModePrev test case)
      
      Reviewers: andrewkr, sdong, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: yoshinorim, andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D61419
      2482d5fb
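      A usage sketch; the keys and prefix length are hypothetical, and it assumes the DB was opened with options.prefix_extractor set (e.g. NewFixedPrefixTransform(5)).

      ```cpp
      #include <memory>
      #include "rocksdb/db.h"

      // In prefix seek mode, Next() and Prev() should now round-trip correctly
      // while the iterator stays within the prefix of the Seek() target.
      void PrefixPrevExample(rocksdb::DB* db) {
        rocksdb::ReadOptions ro;
        ro.prefix_same_as_start = true;  // keep the iterator inside the seeked prefix
        std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(ro));
        it->Seek("user1_0005");
        if (it->Valid()) it->Next();  // advance within the prefix
        if (it->Valid()) it->Prev();  // should land back on the previous key
      }
      ```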
  26. 20 Aug 2016 (1 commit)
    • TableBuilder / TableReader support for range deletion · 78837f5d
      Committed by Wanning Jiang
      Summary:
      1. Range deletion tombstone structure
      2. Modify Add() in table_builder to make it usable for adding range deletion tombstones
      3. Expose a NewTombstoneIterator() API in table_reader
      
      Test Plan: table_test.cc (BlockBasedTableBuilder::Add() now only accepts InternalKey. I make table_test pass only InternalKey to BlockBasedTableBuilder. Also test writing/reading range deletion tombstones in table_test.)
      
      Reviewers: sdong, IslamAbdelRahman, lightmark, andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D61473
      78837f5d
  27. 11 Aug 2016 (1 commit)
    • read_options.background_purge_on_iterator_cleanup to cover forward iterator and log file closing too · 56dd0341
      Committed by sdong
      
      Summary: With read_options.background_purge_on_iterator_cleanup=true, file deletion and closing can still happen in the forward iterator or when closing a WAL file. Cover those cases too.
      
      Test Plan: I am adding unit tests.
      
      Reviewers: andrewkr, IslamAbdelRahman, yiwu
      
      Reviewed By: yiwu
      
      Subscribers: leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D61503
      56dd0341
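      A usage sketch of the read option referenced above (the iteration body is illustrative):

      ```cpp
      #include <memory>
      #include "rocksdb/db.h"

      // With this option set, obsolete-file deletion triggered when the iterator is
      // destroyed is handed to a background job instead of blocking this thread.
      void ScanWithBackgroundPurge(rocksdb::DB* db) {
        rocksdb::ReadOptions ro;
        ro.background_purge_on_iterator_cleanup = true;
        std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(ro));
        for (it->SeekToFirst(); it->Valid(); it->Next()) {
          // consume it->key() / it->value()
        }
        it.reset();  // any file purge now happens on a background thread
      }
      ```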
  28. 10 Aug 2016 (1 commit)
  29. 03 Aug 2016 (1 commit)
    • Ignore write stall triggers when auto-compaction is disabled · ee027fc1
      Committed by Yi Wu
      Summary:
      My understanding is that the purpose of write stall triggers is to wait for auto-compaction to catch up. Without auto-compaction, we don't need to stall writes.
      
      Also, with this diff, flush/compaction conditions are recalculated on any dynamic option change. Previously the conditions were recalculated only when write stall options were changed.
      
      Test Plan: See the new test. Removed two tests that are no longer valid.
      
      Reviewers: IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D61437
      ee027fc1
  30. 22 Jul 2016 (1 commit)
    • Need to make sure log file synced before flushing memtable of one column family · d5a51d4d
      Committed by sdong
      Summary: Multiput atomicity is broken across multiple column families if we don't sync the WAL before flushing one column family. The WAL file may contain a write batch with writes to a key in the CF being flushed and a key in another CF. If we don't sync the WAL before flushing and the machine crashes after the flush, the write batch will only be partially recovered, and the data in the other CFs is lost.
      
      Test Plan: Add a new unit test which will fail without the diff.
      
      Reviewers: yhchiang, IslamAbdelRahman, igor, yiwu
      
      Reviewed By: yiwu
      
      Subscribers: yiwu, leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D60915
      d5a51d4d
  31. 13 Jul 2016 (1 commit)
    • Fix deadlock when trying to update options while writes are stalled · 6ea41f85
      Committed by Yi Wu
      Summary:
      When writes stall because auto compaction is disabled or the stop-writes trigger is reached,
      the user may change these two options to unblock writes. Unfortunately we had an issue where
      the write thread would block the attempt to persist the options, thus creating a deadlock.
      This diff fixes the issue and adds two test cases to detect such deadlocks.
      
      Test Plan:
      Run unit tests.
      
      Also, revert db_impl.cc to master (but don't revert `DBImpl::BackgroundCompaction:Finish` sync point) and run db_options_test. Both tests should hit deadlock.
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D60627
      6ea41f85
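      The unblocking path that used to deadlock, sketched with DB::SetOptions (the option values are examples):

      ```cpp
      #include "rocksdb/db.h"

      // A user thread re-enables auto compaction (and/or raises the stop-writes
      // trigger) while writes are stalled; persisting the new options must not
      // depend on the stalled write thread.
      rocksdb::Status UnblockWrites(rocksdb::DB* db,
                                    rocksdb::ColumnFamilyHandle* cf) {
        return db->SetOptions(cf, {{"disable_auto_compactions", "false"},
                                   {"level0_stop_writes_trigger", "40"}});
      }
      ```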
  32. 12 Jul 2016 (1 commit)
    • update DB::AddFile to ingest list of sst files · 8e6b38d8
      Committed by Aaron Gao
      Summary:
      The DB::AddFile(std::string file_path) API allows users to ingest an SST file created using SstFileWriter.
      We want to update this interface to accept a list of files to be ingested: DB::AddFile(std::vector<std::string> file_path_list).
      
      Test Plan:
      Add test case `AddExternalSstFileList` in `DBSSTTest` to make sure:
      1. the files' key ranges do not overlap with each other
      2. each file's key range does not overlap with the DB key range
      3. no snapshots are held
      
      Reviewers: andrewkr, sdong, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D58587
      8e6b38d8
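      A sketch of the batch call added here. AddFile() was later deprecated in favor of IngestExternalFile(), so this reflects the API of that era; the helper simply restates the preconditions from the test plan above.

      ```cpp
      #include <string>
      #include <vector>
      #include "rocksdb/db.h"

      // Ingest several SstFileWriter-produced files in one call. Preconditions per
      // the test plan: the files' key ranges must not overlap each other, must not
      // overlap the DB's key range, and no snapshots may be held.
      rocksdb::Status AddFileList(rocksdb::DB* db,
                                  const std::vector<std::string>& files) {
        return db->AddFile(files);  // DB::AddFile(std::vector<std::string>) overload
      }
      ```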
  33. 07 Jul 2016 (1 commit)
    • Add More Logging to track total_log_size · a00bf1b3
      Committed by sdong
      Summary: We saw instances where total_log_size is off from the real value, but I'm not able to reproduce it. Add more logging to help debug it when it happens again.
      
      Test Plan: Run the unit test and see the logging.
      
      Reviewers: andrewkr, yhchiang, igor, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D60081
      a00bf1b3
  34. 06 Jul 2016 (1 commit)
  35. 25 Jun 2016 (2 commits)
    • fix simple typos (#1183) · 4f2b0946
      Committed by charsyam
      4f2b0946
    • Refactor to use VersionSet [CF + RepairDB part 1/3] · 343507af
      Committed by Andrew Kryczka
      Summary:
      To support column families, it is easiest to use VersionSet to manage
      our column families (if we don't have Versions then ColumnFamilyData always
      behaves as a dummy column family). This diff only refactors the existing repair
      logic to use VersionSet; the next two parts will add support for multiple
      column families.
      
      Test Plan:
        $ ./repair_test
      
      Reviewers: yhchiang, IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D59775
      343507af
  36. 22 Jun 2016 (1 commit)
    • Add a read option to enable background purge when cleaning up iterators · c4e19b77
      Committed by omegaga
      Summary:
      Add a read option `background_purge_on_iterator_cleanup` to avoid deleting files in the foreground when destroying iterators.
      Instead, a job is scheduled on the high-priority queue and executed in a separate background thread.
      
      Test Plan: Add a variant of PurgeObsoleteFileTest. Turn on background purge option in the new test, and use sleeping task to ensure files are deleted in background.
      
      Reviewers: IslamAbdelRahman, sdong
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D59499
      c4e19b77