1. 27 Jul 2018 (2 commits)
  2. 26 Jul 2018 (1 commit)
  3. 24 Jul 2018 (1 commit)
    • WriteUnPrepared: Implement unprepared batches for transactions (#4104) · ea212e53
      Committed by Manuel Ung
      Summary:
      This adds support for writing unprepared batches based on the size defined in `TransactionOptions::max_write_batch_size`. This is done by overriding the methods that modify data (Put/Delete/SingleDelete/Merge) and first checking whether the write batch size has exceeded the threshold. If so, the write batch is written to the DB as an unprepared batch.
      
      Support for Commit/Rollback of unprepared batches is added as well. This is done by simply extending the WritePrepared Commit/Rollback logic to take care of all unprep_seq numbers, either when updating the prepare heap or when adding to the commit map. For updating the commit map, this logic lives inside `WriteUnpreparedCommitEntryPreReleaseCallback`.
      
      A test change was also made to have transactions unregister themselves when committing without prepare. This is because with write unprepared, there may already be unprepared entries (which act similarly to prepared entries) when a commit is done without prepare.
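      The size-triggered flush described above can be sketched with a minimal stdlib-only simulation (the `UnpreparedBatchSim` type and its size accounting are hypothetical stand-ins, not the actual RocksDB implementation):

      ```cpp
      #include <cassert>
      #include <string>
      #include <utility>
      #include <vector>

      // Hypothetical sketch: a batch that spills to the DB as an "unprepared"
      // batch whenever its accumulated size crosses max_write_batch_size.
      struct UnpreparedBatchSim {
        size_t max_write_batch_size;
        std::vector<std::pair<std::string, std::string>> pending;
        size_t pending_bytes = 0;
        size_t unprepared_batches_written = 0;

        void Put(const std::string& k, const std::string& v) {
          MaybeFlush();  // check the threshold before modifying data
          pending.emplace_back(k, v);
          pending_bytes += k.size() + v.size();
        }

        void MaybeFlush() {
          if (pending_bytes >= max_write_batch_size) {
            // In RocksDB this would write the batch as an unprepared batch
            // and record its unprep_seq number for later Commit/Rollback.
            ++unprepared_batches_written;
            pending.clear();
            pending_bytes = 0;
          }
        }
      };

      int main() {
        UnpreparedBatchSim b{/*max_write_batch_size=*/16};
        b.Put("k1", "valuevalue");  // 12 bytes pending, under threshold
        b.Put("k2", "valuevalue");  // 24 bytes pending after this Put
        b.Put("k3", "v");           // threshold check fires first: one flush
        assert(b.unprepared_batches_written == 1);
        return 0;
      }
      ```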
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4104
      
      Differential Revision: D8785717
      
      Pulled By: lth
      
      fbshipit-source-id: c02006e281ec1ce00f628e2a7beec0ee73096a91
  4. 21 Jul 2018 (3 commits)
  5. 20 Jul 2018 (2 commits)
    • Fix a bug in MANIFEST group commit (#4157) · 2736752b
      Committed by Yanqin Jin
      Summary:
      PR #3944 introduced group commit of `VersionEdit`s in the MANIFEST. The
      implementation had a bug: when updating the log file number of each column
      family, we must consider only `VersionEdit`s that operate on the same column
      family. Otherwise, a column family may accidentally set its log file number
      higher than the actual value, indicating that log files with smaller file
      numbers can be ignored, thus causing some updates to be lost.
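      The essence of the fix can be illustrated with a stdlib-only sketch (the `EditSim` struct and `ApplyGroup` function are illustrative, not the actual `VersionEdit` layout): each column family's log number must be derived only from the edits targeting that family.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <map>
      #include <vector>

      struct EditSim {
        uint32_t cf_id;
        uint64_t log_number;  // WALs below this number are ignorable for cf_id
      };

      // Correct: take the max log number per column family, considering only
      // edits that operate on that same family. Mixing families could raise a
      // family's log number past its true value and silently drop updates.
      std::map<uint32_t, uint64_t> ApplyGroup(const std::vector<EditSim>& group) {
        std::map<uint32_t, uint64_t> cf_log_number;
        for (const EditSim& e : group) {
          uint64_t& ln = cf_log_number[e.cf_id];
          if (e.log_number > ln) ln = e.log_number;
        }
        return cf_log_number;
      }

      int main() {
        // cf 0 flushed up to WAL 7; cf 1 only up to WAL 5.
        auto result = ApplyGroup({{0, 7}, {1, 5}});
        assert(result[0] == 7);
        assert(result[1] == 5);  // must NOT be bumped to 7 by cf 0's edit
        return 0;
      }
      ```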
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4157
      
      Differential Revision: D8916650
      
      Pulled By: riversand963
      
      fbshipit-source-id: 8f456cf688f17bf35ad87b38e30e899aa162f201
    • Return new operator for Status allocations for Windows (#4128) · 78ab11cd
      Committed by Dmitri Smirnov
      Summary: Windows requires new/delete for memory allocations to be overridden. Refactored to be less intrusive.
      
      Differential Revision: D8878047
      
      Pulled By: siying
      
      fbshipit-source-id: 35f2b5fec2f88ea48c9be926539c6469060aab36
  6. 19 Jul 2018 (1 commit)
  7. 18 Jul 2018 (4 commits)
    • DBSSTTest.DeleteSchedulerMultipleDBPaths data race (#4146) · 37e0fdc8
      Committed by Siying Dong
      Summary:
      Fix a minor data race in DBSSTTest.DeleteSchedulerMultipleDBPaths reported by TSAN
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4146
      
      Differential Revision: D8880945
      
      Pulled By: siying
      
      fbshipit-source-id: 25c632f685757735c59ad4ff26b2f346a443a446
    • Fix write get stuck when pipelined write is enabled (#4143) · d538ebdf
      Committed by Yi Wu
      Summary:
      Fix an issue where, when pipelined write is enabled, writers can get stuck indefinitely and never finish the write. It can be shown with the following example: assume there are 4 writers W1, W2, W3, W4 (W1 is the first, W4 is the last).
      
      T1: all writers pending in WAL writer queue:
      WAL writer queue: W1, W2, W3, W4
      memtable writer queue: empty
      
      T2. W1 finishes its WAL write and moves to the memtable writer queue:
      WAL writer queue: W2, W3, W4
      memtable writer queue: W1
      
      T3. W2 and W3 finish their WAL writes as a batch group. W2 enters ExitAsBatchGroupLeader and moves the group to the memtable writer queue, but has not yet woken up the next leader.
      WAL writer queue: W4
      memtable writer queue: W1, W2, W3
      
      T4. W1, W2, W3 finish the memtable write as a batch group. Note that W2 is still inside the previous ExitAsBatchGroupLeader call, although W1 has already done the memtable write on W2's behalf.
      WAL writer queue: W4
      memtable writer queue: empty
      
      T5. The thread corresponding to W3 creates another writer W3' at the same address as W3.
      WAL writer queue: W4, W3'
      memtable writer queue: empty
      
      T6. W2 continues with ExitAsBatchGroupLeader. Because the address of W3' is the same as that of W3, the last writer in its group, it thinks there are no pending writers, so it resets newest_writer_ to null, emptying the queue. W4 and W3' are dropped from the queue and will never be woken up.
      
      The issue has existed since pipelined write was introduced in 5.5.0.
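      The core hazard, identifying a writer by its address after the underlying storage may have been reused, can be reproduced deterministically with placement new (a stdlib-only illustration with a hypothetical `WriterSim` type, not RocksDB code):

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <new>

      struct WriterSim {
        uint64_t id;
      };

      int main() {
        alignas(WriterSim) unsigned char storage[sizeof(WriterSim)];

        WriterSim* w3 = new (storage) WriterSim{3};
        WriterSim* last_in_group = w3;  // leader remembers the group's last writer
        w3->~WriterSim();               // W3's thread returns and its writer dies

        // The same thread re-enters with a new writer that reuses the storage,
        // just like W3' in the scenario above.
        WriterSim* w3_prime = new (storage) WriterSim{4};

        // Comparing by address cannot distinguish W3 from W3', so the leader
        // would wrongly conclude no new writers were queued after its group.
        assert(static_cast<void*>(w3_prime) == static_cast<void*>(last_in_group));
        assert(w3_prime->id != 3);  // yet it is a different logical writer
        w3_prime->~WriterSim();
        return 0;
      }
      ```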
      
      Closes #3704
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4143
      
      Differential Revision: D8871599
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 3502674e51066a954a0660257e24ac588f815e2a
    • Remove managed iterator · ddc07b40
      Committed by Siying Dong
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4124
      
      Differential Revision: D8829910
      
      Pulled By: siying
      
      fbshipit-source-id: f3e952ccf3a631071a5d77c48e327046f8abb560
    • Pending output file number should be released after bulkload failure (#4145) · 995fcf75
      Committed by Siying Dong
      Summary:
      If a bulkload fails due to an input error, the pending output file number wasn't released. This bug can cause all future files with numbers larger than the current one to never be deleted, even after they are compacted. This commit fixes the bug.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4145
      
      Differential Revision: D8877900
      
      Pulled By: siying
      
      fbshipit-source-id: 080be92a23d43305ca1e13fe1c06eb4cd0b01466
  8. 17 Jul 2018 (2 commits)
  9. 14 Jul 2018 (7 commits)
    • Support range deletion tombstones in IngestExternalFile SSTs (#3778) · ef7815b8
      Committed by Nathan VanBenschoten
      Summary:
      Fixes #3391.
      
      This change adds a `DeleteRange` method to `SstFileWriter` and adds
      support for ingesting SSTs with range deletion tombstones. This is
      important for applications that need to atomically ingest SSTs while
      clearing out any existing keys in a given key range.
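      The atomic "clear then ingest" effect can be pictured with a stdlib map standing in for the key space (the half-open `[start, end)` convention below matches RocksDB's DeleteRange semantics; the map itself is just an illustration, not the actual API):

      ```cpp
      #include <cassert>
      #include <map>
      #include <string>

      int main() {
        // Existing keys in the DB's key space.
        std::map<std::string, std::string> db = {
            {"a", "old"}, {"b", "old"}, {"x", "keep"}};

        // An ingested SST carrying a range tombstone over ["a", "c") shadows
        // every existing key in that half-open range...
        db.erase(db.lower_bound("a"), db.lower_bound("c"));
        // ...while the SST's own keys land in the same atomic ingestion.
        db["a"] = "new";

        assert(db.at("a") == "new");
        assert(db.count("b") == 0);    // cleared by the tombstone
        assert(db.at("x") == "keep");  // outside the range, untouched
        return 0;
      }
      ```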
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/3778
      
      Differential Revision: D8821836
      
      Pulled By: anand1976
      
      fbshipit-source-id: ca7786c1947ff129afa703dab011d524c7883844
    • Relax VersionStorageInfo::GetOverlappingInputs check (#4050) · 90fc4069
      Committed by Peter Mattis
      Summary:
      Do not consider the range tombstone sentinel key as causing 2 adjacent
      sstables in a level to overlap. When a range tombstone's end key is the
      largest key in an sstable, the sstable's end key is set to a "sentinel"
      value: the smallest key in the next sstable with a sequence
      number of kMaxSequenceNumber. This "sentinel" is guaranteed not to
      overlap in internal-key space with the next sstable. Unfortunately,
      GetOverlappingFiles uses user keys to determine overlap and was thus
      considering 2 adjacent sstables in a level to overlap if they were
      separated by this sentinel key. This in turn would cause compactions to
      be larger than necessary.
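      The distinction between user-key overlap and internal-key overlap can be sketched as follows (a stdlib-only model; the descending-by-sequence ordering mirrors RocksDB's internal-key comparator, while `InternalKeySim` and the file boundaries are illustrative):

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <limits>
      #include <string>

      struct InternalKeySim {
        std::string user_key;
        uint64_t seq;  // higher sequence numbers sort FIRST within a user key
      };

      bool InternalLess(const InternalKeySim& a, const InternalKeySim& b) {
        if (a.user_key != b.user_key) return a.user_key < b.user_key;
        return a.seq > b.seq;  // descending by sequence number
      }

      int main() {
        const uint64_t kMaxSequenceNumber =
            std::numeric_limits<uint64_t>::max();

        // File 1 ends with the range-tombstone sentinel at user key "m";
        // file 2 begins at the same user key with a real sequence number.
        InternalKeySim file1_largest{"m", kMaxSequenceNumber};
        InternalKeySim file2_smallest{"m", 100};

        // User-key comparison sees identical keys, i.e. apparent overlap...
        assert(file1_largest.user_key == file2_smallest.user_key);
        // ...but in internal-key space the sentinel strictly precedes file 2,
        // so the two files do not actually overlap.
        assert(InternalLess(file1_largest, file2_smallest));
        return 0;
      }
      ```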
      
      Note that this conflicts with
      https://github.com/facebook/rocksdb/pull/2769 and causes
      `DBRangeDelTest.CompactionTreatsSplitInputLevelDeletionAtomically` to
      fail.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4050
      
      Differential Revision: D8844423
      
      Pulled By: ajkr
      
      fbshipit-source-id: df3f9f1db8f4cff2bff77376b98b83c2ae1d155b
    • Reduce execution time of IngestFileWithGlobalSeqnoRandomized (#4131) · 21171615
      Committed by Yanqin Jin
      Summary:
      Make `ExternalSSTFileTest.IngestFileWithGlobalSeqnoRandomized` run faster.
      
      `make format`
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4131
      
      Differential Revision: D8839952
      
      Pulled By: riversand963
      
      fbshipit-source-id: 4a7e842fde1cde4dc902e928a1cf511322578521
    • Per-thread unique test db names (#4135) · 8581a93a
      Committed by Maysam Yabandeh
      Summary:
      The patch makes sure that two parallel test threads will operate on different db paths. This enables using open source tools such as gtest-parallel to run the tests of a file in parallel.
      Example: `~/gtest-parallel/gtest-parallel ./table_test`
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4135
      
      Differential Revision: D8846653
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 799bad1abb260e3d346bcb680d2ae207a852ba84
    • Converted db/merge_test.cc to use gtest (#4114) · 8527012b
      Committed by Fosco Marotto
      Summary:
      Picked up a task to convert this to use the gtest framework.  It can't be this simple, can it?
      
      It works, but should all the std::cout be removed?
      
      ```
      [$] ~/git/rocksdb [gft !]: ./merge_test
      [==========] Running 2 tests from 1 test case.
      [----------] Global test environment set-up.
      [----------] 2 tests from MergeTest
      [ RUN      ] MergeTest.MergeDbTest
      Test read-modify-write counters...
      a: 3
      1
      2
      a: 3
      b: 1225
      3
      Compaction started ...
      Compaction ended
      a: 3
      b: 1225
      Test merge-based counters...
      a: 3
      1
      2
      a: 3
      b: 1225
      3
      Test merge in memtable...
      a: 3
      1
      2
      a: 3
      b: 1225
      3
      Test Partial-Merge
      Test merge-operator not set after reopen
      [       OK ] MergeTest.MergeDbTest (93 ms)
      [ RUN      ] MergeTest.MergeDbTtlTest
      Opening database with TTL
      Test read-modify-write counters...
      a: 3
      1
      2
      a: 3
      b: 1225
      3
      Compaction started ...
      Compaction ended
      a: 3
      b: 1225
      Test merge-based counters...
      a: 3
      1
      2
      a: 3
      b: 1225
      3
      Test merge in memtable...
      Opening database with TTL
      a: 3
      1
      2
      a: 3
      b: 1225
      3
      Test Partial-Merge
      Opening database with TTL
      Opening database with TTL
      Opening database with TTL
      Opening database with TTL
      Test merge-operator not set after reopen
      [       OK ] MergeTest.MergeDbTtlTest (97 ms)
      [----------] 2 tests from MergeTest (190 ms total)
      
      [----------] Global test environment tear-down
      [==========] 2 tests from 1 test case ran. (190 ms total)
      [  PASSED  ] 2 tests.
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4114
      
      Differential Revision: D8822886
      
      Pulled By: gfosco
      
      fbshipit-source-id: c299d008e883c3bb911d2b357a2e9e4423f8e91a
    • Re-enable kUniversalSubcompactions option_config (#4125) · e3eba52a
      Committed by Anand Ananthabhotla
      Summary:
      1. Move kUniversalSubcompactions up before kEnd in db_test_util.h, so
      tests that cycle through all the option_configs include this
      2. Skip kUniversalSubcompactions wherever kUniversalCompaction and
      kUniversalCompactionMultilevel are skipped
      
      Related to #3935
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4125
      
      Differential Revision: D8828637
      
      Pulled By: anand1976
      
      fbshipit-source-id: 650dee15fd27d85281cf9bb4ca8ab460e04cac6f
    • Add GCC 8 to Travis (#3433) · 7bee48bd
      Committed by Tamir Duberstein
      Summary:
      - Avoid `strdup` to use jemalloc on Windows
      - Use `size_t` for consistency
      - Add GCC 8 to Travis
      - Add CMAKE_BUILD_TYPE=Release to Travis
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/3433
      
      Differential Revision: D6837948
      
      Pulled By: sagar0
      
      fbshipit-source-id: b8543c3a4da9cd07ee9a33f9f4623188e233261f
  10. 13 Jul 2018 (3 commits)
    • Reduce execution time of a test. (#4127) · 90ebf1a2
      Committed by Yanqin Jin
      Summary:
      Reduce the number of key ranges in `ExternalSSTFileTest.OverlappingRanges` so
      that the test completes in shorter time to avoid timeouts.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4127
      
      Differential Revision: D8827851
      
      Pulled By: riversand963
      
      fbshipit-source-id: a16387b0cc92a7c872b1c50f0cfbadc463afc9db
    • Reduce #iterations to shorten execution time. (#4123) · dbeaa0d3
      Committed by Yanqin Jin
      Summary:
      Reduce #iterations from 5000 to 1000 so that
      `ExternalSSTFileTest.CompactDuringAddFileRandom` can finish faster.
      On the one hand, 5000 iterations do not seem to improve the quality of the unit
      test in comparison with 1000. On the other hand, long-running tests should belong in stress tests.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4123
      
      Differential Revision: D8822514
      
      Pulled By: riversand963
      
      fbshipit-source-id: 0f439b8d5ccd9a4aed84638f8bac16382de17245
    • Range deletion performance improvements + cleanup (#4014) · 5f3088d5
      Committed by Nikhil Benesch
      Summary:
      This fixes the same performance issue that #3992 fixes but with much more invasive cleanup.
      
      I'm more excited about this PR because it paves the way for fixing another problem we uncovered at Cockroach where range deletion tombstones can cause massive compactions. For example, suppose L4 contains deletions from [a, c) and [x, z) and no other keys, and L5 is entirely empty. L6, however, is full of data. When compacting L4 -> L5, we'll end up with one file that spans, massively, from [a, z). When we go to compact L5 -> L6, we'll have to rewrite all of L6! If, instead of range deletions in L4, we had keys a, b, x, y, and z, RocksDB would have been smart enough to create two files in L5: one for a and b and another for x, y, and z.
      
      With the changes in this PR, it will be possible to adjust the compaction logic to split tombstones/start new output files when they would span too many files in the grandparent level.
      
      ajkr please take a look when you have a minute!
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4014
      
      Differential Revision: D8773253
      
      Pulled By: ajkr
      
      fbshipit-source-id: ec62fa85f648fdebe1380b83ed997f9baec35677
  11. 12 Jul 2018 (2 commits)
    • Test range deletions with more configurations (#4021) · 5cd8240b
      Committed by Nikhil Benesch
      Summary:
      Run the basic range deletion tests against the standard set of
      configurations. This testing exposed that files with hash indexes and
      partitioned indexes were not handling the case where the file contained
      only range deletions--i.e., where the index was empty.
      
      Additionally, this files a TODO about the fact that range deletions are broken
      when allow_mmap_reads = true is set.
      
      /cc ajkr nvanbenschoten
      
      Best viewed with ?w=1: https://github.com/facebook/rocksdb/pull/4021/files?w=1
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4021
      
      Differential Revision: D8811860
      
      Pulled By: ajkr
      
      fbshipit-source-id: 3cc07e6d6210a2a00b932866481b3d5c59775343
    • SetOptions Backup Race Condition (#4108) · 331cb636
      Committed by Yanqin Jin
      Summary:
      Prior to this PR, there was a race condition between `DBImpl::SetOptions` and `BackupEngine::CreateNewBackup`, as illustrated below.
      ```
      Time                  thread 1                           thread 2
        |   CreateNewBackup -> GetLiveFiles
        |                                         SetOptions -> RenameTempFileToOptionsFile
        |                                         SetOptions -> RenameTempFileToOptionsFile
        |                                         SetOptions -> RenameTempFileToOptionsFile // unlink oldest OPTIONS file
        |   copy the oldest OPTIONS // IO error!
        V
      ```
      The proposed fix is to check the value of `DBImpl::disable_obsolete_files_deletion_` before calling `DeleteObsoleteOptionsFiles`.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4108
      
      Differential Revision: D8796360
      
      Pulled By: riversand963
      
      fbshipit-source-id: 02045317f793ea4c7d4400a5bf333b8502fa3e82
  12. 07 Jul 2018 (2 commits)
    • WriteUnPrepared: Add support for recovering WriteUnprepared transactions (#4078) · b9846370
      Committed by Manuel Ung
      Summary:
      This adds support for recovering WriteUnprepared transactions through the following changes:
      - The information in `RecoveredTransaction` is extended so that it can reference multiple batches.
      - `MarkBeginPrepare` is extended with a bool indicating whether it is an unprepared begin, and this is passed down to `InsertRecoveredTransaction` to indicate whether the current transaction is prepared or not.
      - `WriteUnpreparedTxnDB::Initialize` is overridden so that it will rollback unprepared transactions from the recovered transactions. This can be done without updating the prepare heap/commit map, because this is before the DB has finished initializing, and after writing the rollback batch, those data structures should not contain information about the rolled back transaction anyway.
      
      Commit/Rollback of live transactions is still unimplemented and will come later.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4078
      
      Differential Revision: D8703382
      
      Pulled By: lth
      
      fbshipit-source-id: 7e0aada6c23bd39299f1f20d6c060492e0e6b60a
    • Fix a map lookup that may throw exception. (#4098) · db7ae0a4
      Committed by Yanqin Jin
      Summary:
      `std::map::at(key)` throws std::out_of_range if the key does not exist. The current
      code does not handle this. Although this case is unlikely, I feel it's safer to
      use `std::map::find`.
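      The difference in a self-contained form:

      ```cpp
      #include <cassert>
      #include <map>
      #include <stdexcept>
      #include <string>

      int main() {
        std::map<std::string, int> m = {{"present", 1}};

        // std::map::at throws std::out_of_range for a missing key.
        bool threw = false;
        try {
          m.at("missing");
        } catch (const std::out_of_range&) {
          threw = true;
        }
        assert(threw);

        // std::map::find lets the caller handle the miss explicitly.
        auto it = m.find("missing");
        if (it == m.end()) {
          // handle the absent key without an exception
        }
        assert(it == m.end());
        assert(m.find("present")->second == 1);
        return 0;
      }
      ```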
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4098
      
      Differential Revision: D8753865
      
      Pulled By: riversand963
      
      fbshipit-source-id: 9a9ba43badb0fb5e0d24cd87903931fd12f3f8ec
  13. 06 Jul 2018 (1 commit)
  14. 29 Jun 2018 (4 commits)
    • fix clang analyzer warnings (#4072) · b3efb1cb
      Committed by Zhongyi Xie
      Summary:
      clang analyze is giving the following warnings:
      > db/compaction_job.cc:1178:16: warning: Called C++ object pointer is null
          } else if (meta->smallest.size() > 0) {
                     ^~~~~~~~~~~~~~~~~~~~~
      db/compaction_job.cc:1201:33: warning: Access to field 'marked_for_compaction' results in a dereference of a null pointer (loaded from variable 'meta')
          meta->marked_for_compaction = sub_compact->builder->NeedCompact();
          ~~~~
      db/version_set.cc:2770:26: warning: Called C++ object pointer is null
              uint32_t cf_id = last_writer->cfd->GetID();
                               ^~~~~~~~~~~~~~~~~~~~~~~~~
      Closes https://github.com/facebook/rocksdb/pull/4072
      
      Differential Revision: D8685852
      
      Pulled By: miasantreble
      
      fbshipit-source-id: b0e2fd9dfc1cbba2317723e09886384b9b1c9085
    • WriteUnPrepared: Add new WAL marker kTypeBeginUnprepareXID (#4069) · 8ad63a4b
      Committed by Manuel Ung
      Summary:
      This adds a new WAL marker of type kTypeBeginUnprepareXID.
      
      Also, DBImpl now contains a field called batch_per_txn (meaning one WriteBatch per transaction, or possibly multiple WriteBatches). This would also indicate that this DB is using WriteUnprepared policy.
      
      Recovery code would be able to make use of this extra field on DBImpl in a separate diff. For now, it is just used to determine whether the WAL is compatible or not.
      Closes https://github.com/facebook/rocksdb/pull/4069
      
      Differential Revision: D8675099
      
      Pulled By: lth
      
      fbshipit-source-id: ca27cae1738e46d65f2bb92860fc759deb874749
    • Allow DB resume after background errors (#3997) · 52d4c9b7
      Committed by Anand Ananthabhotla
      Summary:
      Currently, if RocksDB encounters errors during a write operation (user requested or BG operations), it sets DBImpl::bg_error_ and fails subsequent writes. This PR allows the DB to be resumed for certain classes of errors. It consists of 3 parts -
      1. Introduce Status::Severity in rocksdb::Status to indicate whether a given error can be recovered from or not
      2. Refactor the error handling code so that setting bg_error_ and deciding on severity is in one place
      3. Provide an API for the user to clear the error and resume the DB instance
      
      This whole change is broken up into multiple PRs. Initially, we only allow clearing the error for Status::NoSpace() errors during background flush/compaction. Subsequent PRs will expand this to include more errors and foreground operations such as Put(), and implement a polling mechanism for out-of-space errors.
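      The three parts can be pictured with a stdlib-only sketch (the severity values and the recovery rule are modeled on the PR description; the `DbSim` class and function names are illustrative, not the actual RocksDB API):

      ```cpp
      #include <cassert>

      enum class SeveritySim { kNone, kSoftError, kHardError, kFatalError };

      struct DbSim {
        SeveritySim bg_error = SeveritySim::kNone;

        // Part 2: one place decides how severe a background error is.
        void OnBackgroundError(bool no_space_during_bg_flush_or_compaction) {
          bg_error = no_space_during_bg_flush_or_compaction
                         ? SeveritySim::kSoftError   // recoverable for now
                         : SeveritySim::kHardError;  // not yet recoverable
        }

        bool WritesAllowed() const { return bg_error == SeveritySim::kNone; }

        // Part 3: user-driven resume clears only recoverable errors.
        bool Resume() {
          if (bg_error == SeveritySim::kSoftError) {
            bg_error = SeveritySim::kNone;
            return true;
          }
          return false;
        }
      };

      int main() {
        DbSim db;
        db.OnBackgroundError(/*no_space_during_bg_flush_or_compaction=*/true);
        assert(!db.WritesAllowed());  // writes fail once bg_error_ is set
        assert(db.Resume());          // NoSpace during BG work is recoverable
        assert(db.WritesAllowed());
        return 0;
      }
      ```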
      Closes https://github.com/facebook/rocksdb/pull/3997
      
      Differential Revision: D8653831
      
      Pulled By: anand1976
      
      fbshipit-source-id: 6dc835c76122443a7668497c0226b4f072bc6afd
    • Support group commits of version edits (#3944) · 26d67e35
      Committed by Yanqin Jin
      Summary:
      This PR supports the group commit of multiple version edit entries corresponding to different column families. Column family drop/creation still cannot be grouped. This PR is a subset of [PR 3752](https://github.com/facebook/rocksdb/pull/3752).
      Closes https://github.com/facebook/rocksdb/pull/3944
      
      Differential Revision: D8432536
      
      Pulled By: riversand963
      
      fbshipit-source-id: 8f11bd05193b6c0d9272d82e44b676abfac113cb
  15. 28 Jun 2018 (5 commits)