1. 01 6月, 2019 1 次提交
  2. 31 5月, 2019 2 次提交
  3. 09 2月, 2019 1 次提交
  4. 03 1月, 2019 1 次提交
  5. 14 12月, 2018 1 次提交
    • Y
      Improve flushing multiple column families (#4708) · 4fce44fc
      Yanqin Jin 提交于
      Summary:
      If one column family is dropped, we should simply skip it and continue to flush
      other active ones.
      Currently we use Status::ShutdownInProgress to notify caller of column families
      being dropped. In the future, we should consider using a different Status code.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4708
      
      Differential Revision: D13378954
      
      Pulled By: riversand963
      
      fbshipit-source-id: 42f248cdf2d32d4c0f677cd39012694b8f1328ca
      4fce44fc
  6. 21 8月, 2018 1 次提交
  7. 10 8月, 2018 1 次提交
    • M
      Index value delta encoding (#3983) · caf0f53a
      Maysam Yabandeh 提交于
      Summary:
      Given that index value is a BlockHandle, which is basically an <offset, size> pair we can apply delta encoding on the values. The first value at each index restart interval encoded the full BlockHandle but the rest encode only the size. Refer to IndexBlockIter::DecodeCurrentValue for the detail of the encoding. This reduces the index size which helps using the  block cache more efficiently. The feature is enabled with using format_version 4.
      
      The feature comes with a bit of cpu overhead which should be paid back by the higher cache hits due to smaller index block size.
      Results with sysbench read-only using 4k blocks and using 16 index restart interval:
      Format 2:
      19585   rocksdb read-only range=100
      Format 3:
      19569   rocksdb read-only range=100
      Format 4:
      19352   rocksdb read-only range=100
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/3983
      
      Differential Revision: D8361343
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: f882ee082322acac32b0072e2bdbb0b5f854e651
      caf0f53a
  8. 28 7月, 2018 1 次提交
    • Y
      Remove random writes from SST file ingestion (#4172) · 54de5684
      Yanqin Jin 提交于
      Summary:
      RocksDB used to store global_seqno in external SST files written by
      SstFileWriter. During file ingestion, RocksDB uses `pwrite` to update the
      `global_seqno`. Since random write is not supported in some non-POSIX compliant
      file systems, external SST file ingestion is not supported on these file
      systems. To address this limitation, we no longer update `global_seqno` during
      file ingestion. Later RocksDB uses the MANIFEST and other information in table
      properties to deduce global seqno for externally-ingested SST files.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4172
      
      Differential Revision: D8961465
      
      Pulled By: riversand963
      
      fbshipit-source-id: 4382ec85270a96be5bc0cf33758ca2b167b05071
      54de5684
  9. 24 5月, 2018 1 次提交
  10. 04 5月, 2018 1 次提交
    • S
      Skip deleted WALs during recovery · d5954929
      Siying Dong 提交于
      Summary:
      This patch record min log number to keep to the manifest while flushing SST files to ignore them and any WAL older than them during recovery. This is to avoid scenarios when we have a gap between the WAL files are fed to the recovery procedure. The gap could happen by for example out-of-order WAL deletion. Such gap could cause problems in 2PC recovery where the prepared and commit entry are placed into two separate WAL and gap in the WALs could result into not processing the WAL with the commit entry and hence breaking the 2PC recovery logic.
      
      Before the commit, for 2PC case, we determined which log number to keep in FindObsoleteFiles(). We looked at the earliest logs with outstanding prepare entries, or prepare entries whose respective commit or abort are in memtable. With the commit, the same calculation is done while we apply the SST flush. Just before installing the flush file, we precompute the earliest log file to keep after the flush finishes using the same logic (but skipping the memtables just flushed), record this information to the manifest entry for this new flushed SST file. This pre-computed value is also remembered in memory, and will later be used to determine whether a log file can be deleted. This value is unlikely to change until next flush because the commit entry will stay in memtable. (In WritePrepared, we could have removed the older log files as soon as all prepared entries are committed. It's not yet done anyway. Even if we do it, the only thing we loss with this new approach is earlier log deletion between two flushes, which does not guarantee to happen anyway because the obsolete file clean-up function is only executed after flush or compaction)
      
      This min log number to keep is stored in the manifest using the safely-ignore customized field of AddFile entry, in order to guarantee that the DB generated using newer release can be opened by previous releases no older than 4.2.
      Closes https://github.com/facebook/rocksdb/pull/3765
      
      Differential Revision: D7747618
      
      Pulled By: siying
      
      fbshipit-source-id: d00c92105b4f83852e9754a1b70d6b64cb590729
      d5954929
  11. 24 4月, 2018 1 次提交
    • S
      Revert "Skip deleted WALs during recovery" · d5afa737
      Siying Dong 提交于
      Summary:
      This reverts commit 73f21a7b.
      
      It breaks compatibility. When created a DB using a build with this new change, opening the DB and reading the data will fail with this error:
      
      "Corruption: Can't access /000000.sst: IO error: while stat a file for size: /tmp/xxxx/000000.sst: No such file or directory"
      
      This is because the dummy AddFile4 entry generated by the new code will be treated as a real entry by an older build. The older build will think there is a real file with number 0, but there isn't such a file.
      Closes https://github.com/facebook/rocksdb/pull/3762
      
      Differential Revision: D7730035
      
      Pulled By: siying
      
      fbshipit-source-id: f2051859eff20ef1837575ecb1e1bb96b3751e77
      d5afa737
  12. 31 3月, 2018 1 次提交
    • M
      Skip deleted WALs during recovery · 73f21a7b
      Maysam Yabandeh 提交于
      Summary:
      This patch record the deleted WAL numbers in the manifest to ignore them and any WAL older than them during recovery. This is to avoid scenarios when we have a gap between the WAL files are fed to the recovery procedure. The gap could happen by for example out-of-order WAL deletion. Such gap could cause problems in 2PC recovery where the prepared and commit entry are placed into two separate WAL and gap in the WALs could result into not processing the WAL with the commit entry and hence breaking the 2PC recovery logic.
      Closes https://github.com/facebook/rocksdb/pull/3488
      
      Differential Revision: D6967893
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 13119feb155a08ab6d4909f437c7a750480dc8a1
      73f21a7b
  13. 06 3月, 2018 1 次提交
  14. 23 2月, 2018 2 次提交
  15. 22 7月, 2017 2 次提交
  16. 16 7月, 2017 1 次提交
  17. 28 4月, 2017 1 次提交
  18. 07 4月, 2017 1 次提交
    • S
      Move various string utility functions into string_util · 343b59d6
      Sagar Vemuri 提交于
      Summary:
      This is an effort to club all string related utility functions into one common place, in string_util, so that it is easier for everyone to know what string processing functions are available. Right now they seem to be spread out across multiple modules, like logging and options_helper.
      
      Check the sub-commits for easier reviewing.
      Closes https://github.com/facebook/rocksdb/pull/2094
      
      Differential Revision: D4837730
      
      Pulled By: sagar0
      
      fbshipit-source-id: 344278a
      343b59d6
  19. 13 7月, 2016 1 次提交
    • J
      Miscellaneous performance improvements · efd013d6
      Jay Edgar 提交于
      Summary:
      I was investigating performance issues in the SstFileWriter and found all of the following:
      
      - The SstFileWriter::Add() function created a local InternalKey every time it was called generating a allocation and free each time.  Changed to have an InternalKey member variable that can be reset with the new InternalKey::Set() function.
      - In SstFileWriter::Add() the smallest_key and largest_key values were assigned the result of a ToString() call, but it is simpler to just assign them directly from the user's key.
      - The Slice class had no move constructor so each time one was returned from a function a new one had to be allocated, the old data copied to the new, and the old one was freed.  I added the move constructor which also required a copy constructor and assignment operator.
      - The BlockBuilder::CurrentSizeEstimate() function calculates the current estimate size, but was being called 2 or 3 times for each key added.  I changed the class to maintain a running estimate (equal to the original calculation) so that the function can return an already calculated value.
      - The code in BlockBuilder::Add() that calculated the shared bytes between the last key and the new key duplicated what Slice::difference_offset does, so I replaced it with the standard function.
      - BlockBuilder::Add() had code to copy just the changed portion into the last key value (and asserted that it now matched the new key).  It is more efficient just to copy the whole new key over.
      - Moved this same code up into the 'if (use_delta_encoding_)' since the last key value is only needed when delta encoding is on.
      - FlushBlockBySizePolicy::BlockAlmostFull calculated a standard deviation value each time it was called, but this information would only change if block_size of block_size_deviation changed, so I created a member variable to hold the value to avoid the calculation each time.
      - Each PutVarint??() function has a buffer and calls std::string::append().  Two or three calls in a row could share a buffer and a single call to std::string::append().
      
      Some of these will be helpful outside of the SstFileWriter.  I'm not 100% the addition of the move constructor is appropriate as I wonder why this wasn't done before - maybe because of compiler compatibility?  I tried it on gcc 4.8 and 4.9.
      
      Test Plan: The changes should not affect the results so the existing tests should all still work and no new tests were added.  The value of the changes was seen by manually testing the SstFileWriter class through MyRocks and adding timing code to identify problem areas.
      
      Reviewers: sdong, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D59607
      efd013d6
  20. 10 2月, 2016 1 次提交
  21. 09 10月, 2015 1 次提交
    • S
      New Manifest format to allow customized fields in NewFile. · b77eb16a
      sdong 提交于
      Summary: With this commit, we add a new format in manifest when adding a new file. Now path ID and need-compaction hint are first two customized fields.
      
      Test Plan: Add a test case in version_edit_test to verify the encoding and decoding logic. Add a unit test in db_test to verify need compaction is persistent after DB restarting.
      
      Reviewers: kradhakrishnan, anthony, IslamAbdelRahman, yhchiang, rven, igor
      
      Reviewed By: igor
      
      Subscribers: javigon, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D48123
      b77eb16a
  22. 18 7月, 2015 1 次提交
    • A
      Added JSON manifest dump option to ldb command · 74c755c5
      Ari Ekmekji 提交于
      Summary:
      Added a new flag --json to the ldb manifest_dump command
      that prints out the version edits as JSON objects for easier
      reading and parsing of information.
      
      Test Plan:
      **Sample usage: **
      ```
      ./ldb manifest_dump --json --path=path/to/manifest/file
      ```
      
      **Sample output:**
      ```
      {"EditNumber": 0, "Comparator": "leveldb.BytewiseComparator", "ColumnFamily": 0}
      {"EditNumber": 1, "LogNumber": 0, "ColumnFamily": 0}
      {"EditNumber": 2, "LogNumber": 4, "PrevLogNumber": 0, "NextFileNumber": 7, "LastSeq": 35356, "AddedFiles": [{"Level": 0, "FileNumber": 5, "FileSize": 1949284, "SmallestIKey": "'", "LargestIKey": "'"}], "ColumnFamily": 0}
      ...
      {"EditNumber": 13, "PrevLogNumber": 0, "NextFileNumber": 36, "LastSeq": 290994, "DeletedFiles": [{"Level": 0, "FileNumber": 17}, {"Level": 0, "FileNumber": 20}, {"Level": 0, "FileNumber": 22}, {"Level": 0, "FileNumber": 24}, {"Level": 1, "FileNumber": 13}, {"Level": 1, "FileNumber": 14}, {"Level": 1, "FileNumber": 15}, {"Level": 1, "FileNumber": 18}], "AddedFiles": [{"Level": 1, "FileNumber": 25, "FileSize": 2114340, "SmallestIKey": "'", "LargestIKey": "'"}, {"Level": 1, "FileNumber": 26, "FileSize": 2115213, "SmallestIKey": "'", "LargestIKey": "'"}, {"Level": 1, "FileNumber": 27, "FileSize": 2114807, "SmallestIKey": "'", "LargestIKey": "'"}, {"Level": 1, "FileNumber": 30, "FileSize": 2115271, "SmallestIKey": "'", "LargestIKey": "'"}, {"Level": 1, "FileNumber": 31, "FileSize": 2115165, "SmallestIKey": "'", "LargestIKey": "'"}, {"Level": 1, "FileNumber": 32, "FileSize": 2114683, "SmallestIKey": "'", "LargestIKey": "'"}, {"Level": 1, "FileNumber": 35, "FileSize": 1757512, "SmallestIKey": "'", "LargestIKey": "'"}], "ColumnFamily": 0}
      ...
      ```
      
      Reviewers: sdong, anthony, yhchiang, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D41727
      74c755c5
  23. 01 11月, 2014 1 次提交
    • I
      Turn on -Wshadow · 9f7fc3ac
      Igor Canadi 提交于
      Summary:
      ...and fix all the errors :)
      
      Jim suggested turning on -Wshadow because it helped him fix number of critical bugs in fbcode. I think it's a good idea to be -Wshadow clean.
      
      Test Plan: compiles
      
      Reviewers: yhchiang, rven, sdong, ljin
      
      Reviewed By: ljin
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D27711
      9f7fc3ac
  24. 30 10月, 2014 2 次提交
  25. 29 10月, 2014 1 次提交
    • Y
      Fix the bug where compaction does not fail when RocksDB can't create a new file. · 3772a3d0
      Yueh-Hsuan Chiang 提交于
      Summary:
      This diff has two fixes.
      
      1. Fix the bug where compaction does not fail when RocksDB can't create a new file.
      2. When NewWritableFiles() fails in OpenCompactionOutputFiles(), previously such fail-to-created file will be still be included as a compaction output.  This patch also fixes this bug.
      3. Allow VersionEdit::EncodeTo() to return Status and add basic check.
      
      Test Plan:
      ./version_edit_test
      export ROCKSDB_TESTS=FileCreationRandomFailure
      ./db_test
      
      Reviewers: ljin, sdong, nkg-, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D25581
      3772a3d0
  26. 24 10月, 2014 1 次提交
  27. 03 7月, 2014 1 次提交
  28. 17 6月, 2014 1 次提交
    • S
      Refactor: group metadata needed to open an SST file to a separate copyable struct · cadc1adf
      sdong 提交于
      Summary:
      We added multiple fields to FileMetaData recently and are planning to add more.
      This refactoring separate the minimum information for accessing the file. This object is copyable (FileMetaData is not copyable since the ref counter). I hope this refactoring can enable further improvements:
      
      (1) use it to design a more efficient data structure to speed up read queries.
      (2) in the future, when we add information of storage level, we can easily do the encoding, instead of enlarge this structure, which might expand memory work set for file meta data.
      
      The definition is same as current EncodedFileMetaData used in two level iterator, so now the logic in two level iterator is easier to understand.
      
      Test Plan: make all check
      
      Reviewers: haobo, igor, ljin
      
      Reviewed By: ljin
      
      Subscribers: leveldb, dhruba, yhchiang
      
      Differential Revision: https://reviews.facebook.net/D18933
      cadc1adf
  29. 08 5月, 2014 1 次提交
    • I
      [fix] SIGSEGV when VersionEdit in MANIFEST is corrupted · 768d424d
      Igor Canadi 提交于
      Summary:
      This was reported by our customers in task #4295529.
      
      Cause:
      * MANIFEST file contains a VersionEdit, which contains file entries whose 'smallest' and 'largest' internal keys are empty. String with zero characters. Root cause of corruption was not investigated. We should report corruption when this happens. However, we currently SIGSEGV.
      
      Here's what happens:
      * VersionEdit encodes zero-strings happily and stores them in smallest and largest InternalKeys. InternalKey::Encode() does assert when `rep_.empty()`, but we don't assert in production environemnts. Also, we should never assert as a result of DB corruption.
      * As part of our ConsistencyCheck, we call GetLiveFilesMetaData()
      * GetLiveFilesMetadata() calls `file->largest.user_key().ToString()`
      * user_key() function does: 1. assert(size > 8) (ooops, no assert), 2. returns `Slice(internal_key.data(), internal_key.size() - 8)`
      * since `internal_key.size()` is unsigned int, this call translates to `Slice(whatever, 1298471928561892576182756)`. Bazinga.
      
      Fix:
      * VersionEdit checks if InternalKey is valid in `VersionEdit::GetInternalKey()`. If it's invalid, returns corruption.
      
      Lessons learned:
      * Always keep in mind that even if you `assert()`, production code will continue execution even if assert fails.
      * Never `assert` based on DB corruption. Assert only if the code should guarantee that assert can't fail.
      
      Test Plan: dumped offending manifest. Before: assert. Now: corruption
      
      Reviewers: dhruba, haobo, sdong
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D18507
      768d424d
  30. 01 4月, 2014 1 次提交
  31. 19 3月, 2014 1 次提交
  32. 12 3月, 2014 2 次提交
  33. 06 3月, 2014 1 次提交
    • I
      [CF] Dont reuse dropped column family IDs · 9625acbf
      Igor Canadi 提交于
      Summary:
      Column family IDs should be unique, even if column family is dropped. To achieve this, we save max column family in manifest.
      
      Note that the diff is still not ready. I'm only using differential to move the patch to my Mac machine.
      
      Test Plan: added a test to column_family_test
      
      Reviewers: dhruba, haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16581
      9625acbf
  34. 30 1月, 2014 1 次提交
  35. 17 1月, 2014 1 次提交
    • I
      Remove compaction pointers · 6d6fb709
      Igor Canadi 提交于
      Summary: The only thing we do with compaction pointers is set them to some values, we never actually read them. I don't know what we used them for, but it doesn't look like we use them anymore.
      
      Test Plan: make check
      
      Reviewers: dhruba, haobo, kailiu, sdong
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15225
      6d6fb709