1. 01 Jul 2017 (3 commits)
  2. 30 Jun 2017 (4 commits)
    • Regression test for empty dedicated range deletion file · d310e0f3
      Committed by Andrew Kryczka
      Summary:
      Issue: #2478
      Fix: #2503
      
      The bug happened when all of these conditions were satisfied:
      
      - A subcompaction generates no keys
      - `RangeDelAggregator::ShouldAddTombstones()` returns true because there's at least one non-obsoleted range deletion in its map
      - None of the non-obsolete tombstones overlap with the subcompaction key-range
      
      Under those conditions, we were creating a dedicated file for range deletions which was left empty, thus causing an error in VersionEdit.
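      
      A minimal repro sketch of those conditions (illustrative only: the path and setup here are assumptions, and the actual regression test lives in the RocksDB test suite):
      
      ```cpp
      #include <cassert>
      #include "rocksdb/db.h"
      
      int main() {
        rocksdb::DB* db;
        rocksdb::Options options;
        options.create_if_missing = true;
        rocksdb::Status s =
            rocksdb::DB::Open(options, "/tmp/range_del_repro", &db);
        assert(s.ok());
      
        // A range tombstone with no live keys beneath it: the compaction emits
        // no keys, yet a non-obsolete tombstone sits in the aggregator's map.
        s = db->DeleteRange(rocksdb::WriteOptions(), db->DefaultColumnFamily(),
                            "a", "b");
        assert(s.ok());
        s = db->Flush(rocksdb::FlushOptions());
        assert(s.ok());
      
        // Before the #2503 fix, this could produce an empty dedicated
        // range-deletion file and fail in VersionEdit.
        s = db->CompactRange(rocksdb::CompactRangeOptions(), nullptr, nullptr);
        assert(s.ok());
        delete db;
        return 0;
      }
      ```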
      
      I verified this test case fails before the #2503 fix and passes after.
      Closes https://github.com/facebook/rocksdb/pull/2521
      
      Differential Revision: D5352568
      
      Pulled By: ajkr
      
      fbshipit-source-id: f619cae39984ce9bb9b7a4e7a9ac0f2bb2ce43e9
    • Add a fetch_add variation to AddDBStats · e9f91a51
      Committed by Maysam Yabandeh
      Summary:
      AddDBStats is implemented as two steps, a load and a store, which is more efficient than fetch_add but is not thread-safe. Currently we have to protect concurrent access to AddDBStats with a mutex, which is less efficient than fetch_add.
      
      This patch adds the option to use fetch_add in AddDBStats (a sketch of both strategies follows the results below). The results for my 2PC benchmark on sysbench are:
      - vanilla: 68618 tps
      - removing mutex on AddDBStats (unsafe): 69767 tps
      - fetch_add for all AddDBStats: 69200 tps
      - fetch_add only for concurrently accessed AddDBStats (this patch): 69579 tps
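      
      A simplified sketch of the two strategies (illustrative only; the real InternalStats code differs):
      
      ```cpp
      #include <atomic>
      #include <cstdint>
      
      std::atomic<uint64_t> db_stat{0};
      
      // Load-then-store: two cheap steps, but updates can be lost if two
      // threads interleave, so concurrent callers must hold a mutex.
      void AddDBStatsUnsafe(uint64_t value) {
        uint64_t v = db_stat.load(std::memory_order_relaxed);
        db_stat.store(v + value, std::memory_order_relaxed);
      }
      
      // fetch_add: one atomic read-modify-write, safe under concurrency and
      // cheaper than taking a mutex around the load/store pair.
      void AddDBStatsConcurrent(uint64_t value) {
        db_stat.fetch_add(value, std::memory_order_relaxed);
      }
      ```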
      Closes https://github.com/facebook/rocksdb/pull/2505
      
      Differential Revision: D5330656
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: af64d7bee135b0e86b4fac323a4f9d9113eaa383
    • skip generating empty sst · c1b375e9
      Committed by zhangjinpeng1987
      Summary:
      When a compaction job outputs nothing, there is no need to generate an empty SST file, which would cause `VersionEdit::EncodeTo` to fail.
      ref https://github.com/facebook/rocksdb/issues/2478
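      
      A sketch of the shape of the fix (hypothetical helper names; the real change is in the compaction job's output-installation path):
      
      ```cpp
      #include <cstdint>
      
      // Hypothetical summary of a compaction output file.
      struct CompactionOutput {
        uint64_t num_entries;
        uint64_t file_size;
      };
      
      // Install the output into the VersionEdit only if it is non-empty; an
      // empty .sst would make VersionEdit::EncodeTo fail, so delete it instead.
      bool ShouldInstallOutput(const CompactionOutput& out) {
        return out.num_entries > 0;
      }
      ```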
      Closes https://github.com/facebook/rocksdb/pull/2503
      
      Differential Revision: D5350799
      
      Pulled By: ajkr
      
      fbshipit-source-id: df0b4fcf3507fe1c3c435208b762e75478e00143
    • fix format compatible test · 67b417d6
      Committed by Yi Wu
      Summary:
      The comma "," is not a valid separator for bash arrays; bash arrays are whitespace-separated (e.g. `arr=(a b c)`, not `arr=(a,b,c)`).
      Closes https://github.com/facebook/rocksdb/pull/2516
      
      Differential Revision: D5348101
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 8f0afdac368e21076eb7366b7df7dbaaf158cf96
  3. 29 Jun 2017 (4 commits)
    • Bug fix: Fast CRC Support printing is not honest · afbef651
      Committed by Siying Dong
      Summary:
      11c5d474 introduced a bug where IsFastCrc32Supported() returns the wrong result. Fix it. Also fix some FB-internal scripts.
      Closes https://github.com/facebook/rocksdb/pull/2513
      
      Differential Revision: D5343802
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 057dc7ae3b262fe951413d1190ce60afc788cc05
    • Improve Status message for block checksum mismatches · 397ab111
      Committed by Mike Kolupaev
      Summary:
      We've got some DBs where iterators return a Status with the message "Corruption: block checksum mismatch" all the time. That's not very informative. It would be much easier to investigate if the error message contained the file name; then we would know, e.g., how old the corrupted file is, which would be very useful for finding the root cause. This PR adds the file name, offset and other details to some block-corruption-related status messages.
      
      It doesn't improve all the error messages, just a few that were easy to improve. I'm mostly interested in "block checksum mismatch" and "Bad table magic number" since they're the only corruption errors that I've ever seen in the wild.
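      
      A sketch of the idea (illustrative helper; the real change threads this context through the block-reading code):
      
      ```cpp
      #include <cstdint>
      #include <string>
      #include "rocksdb/status.h"
      
      // Build a corruption Status that names the file and offset, so the error
      // is actionable instead of a bare "block checksum mismatch".
      rocksdb::Status BlockChecksumMismatch(const std::string& file_name,
                                            uint64_t offset, uint32_t expected,
                                            uint32_t actual) {
        std::string msg = "block checksum mismatch: expected " +
                          std::to_string(expected) + ", got " +
                          std::to_string(actual) + " in " + file_name +
                          " at offset " + std::to_string(offset);
        return rocksdb::Status::Corruption(msg);
      }
      ```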
      Closes https://github.com/facebook/rocksdb/pull/2507
      
      Differential Revision: D5345702
      
      Pulled By: al13n321
      
      fbshipit-source-id: fc8023d43f1935ad927cef1b9c55481ab3cb1339
    • Make "make analyze" happy · 18c63af6
      Committed by Siying Dong
      Summary:
      "make analyze" is reporting some errors. It's complicated to look but it seems to me that they are all false positive. Anyway, I think cleaning them up is a good idea. Some of the changes are hacky but I don't know a better way.
      Closes https://github.com/facebook/rocksdb/pull/2508
      
      Differential Revision: D5341710
      
      Pulled By: siying
      
      fbshipit-source-id: 6070e430e0e41a080ef441e05e8ec827d45efab6
    • Fix the reported asan issues · 01534db2
      Committed by Maysam Yabandeh
      Summary:
      This is to resolve the ASAN complaints. In the meanwhile I am working on clarifying/revisiting the sync rules.
      Closes https://github.com/facebook/rocksdb/pull/2510
      
      Differential Revision: D5338660
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: ce6f6e0826d43a2c0bfa4328a00c78f73cd6498a
  4. 28 Jun 2017 (5 commits)
    • FIFO Compaction with TTL · 1cd45cd1
      Committed by Sagar Vemuri
      Summary:
      Introducing FIFO compactions with TTL.
      
      FIFO compaction is based on size only, which makes it tricky to enable in production since use cases can have organic growth. A user requested an option to drop files based on their creation time instead of the total size.
      
      To address that request:
      - Added a new TTL option to FIFO compaction options.
      - Updated FIFO compaction score to take TTL into consideration.
      - Added a new table property, creation_time, to keep track of when the SST file is created.
      - creation_time is set as follows:
        - On Flush: Set to the time of flush.
        - On Compaction: Set to the max creation_time of all the files involved in the compaction.
        - On Repair and Recovery: Set to the time of repair/recovery.
        - Old files created prior to this code change will have a creation_time of 0.
      - FIFO compaction with TTL is enabled when ttl > 0. All files older than ttl will be deleted during compaction. i.e. `if (file.creation_time < (current_time - ttl)) then delete(file)`. This will enable cases where you might want to delete all files older than, say, 1 day.
      - FIFO compaction will fall back to the prior way of deleting files based on size if:
        - the creation_time of all files involved in compaction is 0.
        - the total size (of all SST files combined) does not drop below `compaction_options_fifo.max_table_files_size` even if the files older than ttl are deleted.
      
      This feature is not supported if max_open_files != -1 or with table formats other than Block-based.
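      
      A configuration sketch based on the description above (field names follow this PR; verify them against your RocksDB version):
      
      ```cpp
      #include "rocksdb/options.h"
      
      rocksdb::Options MakeFifoTtlOptions() {
        rocksdb::Options options;
        options.create_if_missing = true;
        options.compaction_style = rocksdb::kCompactionStyleFIFO;
        // Size-based fallback: keep total SST size under 100 MB.
        options.compaction_options_fifo.max_table_files_size = 100 * 1024 * 1024;
        // TTL in seconds: drop files older than one day during compaction.
        options.compaction_options_fifo.ttl = 60 * 60 * 24;
        // Per the description, TTL-based FIFO requires max_open_files == -1.
        options.max_open_files = -1;
        return options;
      }
      ```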
      
      **Test Plan:**
      Added tests.
      
      **Benchmark results:**
      Base: FIFO with max size: 100MB ::
      ```
      svemuri@dev15905 ~/rocksdb (fifo-compaction) $ TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=readwhilewriting --num=5000000 --threads=16 --compaction_style=2 --fifo_compaction_max_table_files_size_mb=100
      
      readwhilewriting :       1.924 micros/op 519858 ops/sec;   13.6 MB/s (1176277 of 5000000 found)
      ```
      
      With TTL (a low one for testing) ::
      ```
      svemuri@dev15905 ~/rocksdb (fifo-compaction) $ TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=readwhilewriting --num=5000000 --threads=16 --compaction_style=2 --fifo_compaction_max_table_files_size_mb=100 --fifo_compaction_ttl=20
      
      readwhilewriting :       1.902 micros/op 525817 ops/sec;   13.7 MB/s (1185057 of 5000000 found)
      ```
      Example Log lines:
      ```
      2017/06/26-15:17:24.609249 7fd5a45ff700 (Original Log Time 2017/06/26-15:17:24.609177) [db/compaction_picker.cc:1471] [default] FIFO compaction: picking file 40 with creation time 1498515423 for deletion
      2017/06/26-15:17:24.609255 7fd5a45ff700 (Original Log Time 2017/06/26-15:17:24.609234) [db/db_impl_compaction_flush.cc:1541] [default] Deleted 1 files
      ...
      2017/06/26-15:17:25.553185 7fd5a61a5800 [DEBUG] [db/db_impl_files.cc:309] [JOB 0] Delete /dev/shm/dbbench/000040.sst type=2 #40 -- OK
      2017/06/26-15:17:25.553205 7fd5a61a5800 EVENT_LOG_v1 {"time_micros": 1498515445553199, "job": 0, "event": "table_file_deletion", "file_number": 40}
      ```
      
      SST Files remaining in the dbbench dir, after db_bench execution completed:
      ```
      svemuri@dev15905 ~/rocksdb (fifo-compaction)  $ ls -l /dev/shm//dbbench/*.sst
      -rw-r--r--. 1 svemuri users 30749887 Jun 26 15:17 /dev/shm//dbbench/000042.sst
      -rw-r--r--. 1 svemuri users 30768779 Jun 26 15:17 /dev/shm//dbbench/000044.sst
      -rw-r--r--. 1 svemuri users 30757481 Jun 26 15:17 /dev/shm//dbbench/000046.sst
      ```
      Closes https://github.com/facebook/rocksdb/pull/2480
      
      Differential Revision: D5305116
      
      Pulled By: sagar0
      
      fbshipit-source-id: 3e5cfcf5dd07ed2211b5b37492eb235b45139174
    • Fix TARGETS file tests list · 982cec22
      Committed by Yi Wu
      Summary:
      1. The buckifier script assumes each test "foo" comes with a .cc file of the same name (i.e. foo.cc). Update the cassandra tests to follow this pattern so that the buckifier script can recognize them.
      2. Add blob_db_test.
      Closes https://github.com/facebook/rocksdb/pull/2506
      
      Differential Revision: D5331517
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 86f3eba471fc621186ab44cbd073b6162cde8e57
    • allow numa >= 2.0.8 · b49b3710
      Committed by Yi Wu
      Summary:
      Allow numa >= 2.0.8 in the buck TARGETS file.
      Closes https://github.com/facebook/rocksdb/pull/2504
      
      Differential Revision: D5330550
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 8ffb6167b4ad913877eac16a20a91023b31f8d41
    • CLANG Tidy · e517bfa2
      Committed by Siying Dong
      Summary: Closes https://github.com/facebook/rocksdb/pull/2502
      
      Differential Revision: D5326498
      
      Pulled By: siying
      
      fbshipit-source-id: 2f0ac6dc6ca5ddb23cecf67a278c086e52646714
    • update compatible test · dc3d2e4d
      Committed by Yi Wu
      Summary:
      Update the compatibility test to include the 5.5 and 5.6 branches.
      Closes https://github.com/facebook/rocksdb/pull/2501
      
      Differential Revision: D5325220
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 5f5271491e6dd2d7b2cf73a7142f38a571553bc4
  5. 27 Jun 2017 (9 commits)
  6. 25 Jun 2017 (2 commits)
    • Update rename of ParanoidCheck · 8e6345d2
      Committed by Maysam Yabandeh
      Summary: Closes https://github.com/facebook/rocksdb/pull/2494
      
      Differential Revision: D5317902
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 097330292180816b3d0c9f4cbbdb6f68f0180200
    • Optimize for serial commits in 2PC · 499ebb3a
      Committed by Maysam Yabandeh
      Summary:
      Throughput: 46k tps in our sysbench settings (filling the details later)
      
      The idea is to have the simplest change that gives us a reasonable boost
      in 2PC throughput.
      
      Major design changes:
      1. The WAL file internal buffer is not flushed after each write. Instead
      it is flushed before critical operations (WAL copy via fs) or when
      FlushWAL is called by MySQL. Flushing the WAL buffer is also protected
      via mutex_.
      2. Use two sequence numbers: last seq, and last seq for write. Last seq
      is the last visible sequence number for reads. Last seq for write is the
      next sequence number that should be used to write to WAL/memtable. This
      allows a memtable write to proceed in parallel with WAL writes (see the
      sketch after this list).
      3. BatchGroup is not used for writes. This means that we can have
      parallel writers, which changes a major assumption in the code base. To
      accommodate that: i) allow only one WriteImpl that intends to write to
      the memtable via mem_mutex_, which is fine since in 2PC almost all
      memtable writes come via the group commit phase, which is serial anyway;
      ii) make all the parts of the code base that assumed to be the only
      writer (via EnterUnbatched) also acquire mem_mutex_; iii) protect stat
      updates via a stat_mutex_.
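      
      A conceptual sketch of design change 2 (illustrative member names, not the actual DBImpl fields):
      
      ```cpp
      #include <atomic>
      #include <cstdint>
      
      // Allocation of sequence numbers is decoupled from visibility, so a
      // memtable write can run in parallel with WAL writes for other batches.
      std::atomic<uint64_t> last_seq{0};            // last seq visible to reads
      std::atomic<uint64_t> last_seq_for_write{0};  // next seq handed to writers
      
      // Reserve `count` sequence numbers for a write to WAL/memtable.
      uint64_t ReserveSequence(uint64_t count) {
        return last_seq_for_write.fetch_add(count, std::memory_order_relaxed) + 1;
      }
      
      // After the write lands, publish visibility by advancing last_seq.
      void PublishSequence(uint64_t seq) {
        last_seq.store(seq, std::memory_order_release);
      }
      ```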
      
      Note: the first commit has the approach figured out but is not clean.
      Submitting the PR anyway to get early feedback on the approach. If
      we are ok with the approach I will go ahead with these updates:
      0) Rebase with Yi's pipelining changes
      1) Currently batching is disabled by default to make sure that it will be
      consistent with all unit tests. Will make this optional via a config.
      2) A couple of unit tests are disabled. They need to be updated with the
      serial commit of 2PC taken into account.
      3) Replacing BatchGroup with mem_mutex_ got a bit ugly as it requires
      releasing mutex_ beforehand (the same way EnterUnbatched does). This
      needs to be cleaned up.
      Closes https://github.com/facebook/rocksdb/pull/2345
      
      Differential Revision: D5210732
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 78653bd95a35cd1e831e555e0e57bdfd695355a4
  7. 24 Jun 2017 (2 commits)
  8. 23 Jun 2017 (3 commits)
    • Introduce OnBackgroundError callback · 71f5bcb7
      Committed by Andrew Kryczka
      Summary:
      Some users want to prevent rocksdb from entering read-only mode in certain error cases. This diff gives them a callback, `OnBackgroundError`, that they can use to achieve it.
      
      - Call `OnBackgroundError` every time we consider setting `bg_error_`; use its result to assign `bg_error_` but not to change the function's return status.
      - Classify calls using `BackgroundErrorReason` to give the callback some info about where the error happened.
      - Rename `ParanoidCheck` to something more specific so we can provide a clear `BackgroundErrorReason`.
      - Add unit tests for the most common cases, flush and compaction errors (an example listener follows this list).
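      
      A sketch of a listener using the new callback (the override follows the public EventListener API; treat the exact signature as something to verify against your headers):
      
      ```cpp
      #include "rocksdb/listener.h"
      
      // Keep the DB out of read-only mode for compaction I/O errors by
      // clearing *bg_error before it is assigned to bg_error_.
      class KeepRunningListener : public rocksdb::EventListener {
       public:
        void OnBackgroundError(rocksdb::BackgroundErrorReason reason,
                               rocksdb::Status* bg_error) override {
          if (reason == rocksdb::BackgroundErrorReason::kCompaction &&
              bg_error->IsIOError()) {
            *bg_error = rocksdb::Status::OK();
          }
        }
      };
      ```
      
      Register it via `options.listeners.push_back(std::make_shared<KeepRunningListener>());`, the standard hook for EventListener instances.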
      Closes https://github.com/facebook/rocksdb/pull/2477
      
      Differential Revision: D5300190
      
      Pulled By: ajkr
      
      fbshipit-source-id: a0ea4564249719b83428e3f4c6ca2c49e366e9b3
    • Downgrade option sanity check level for prefix_extractor · 88cd2d96
      Committed by Siying Dong
      Summary:
      With c7004840, it's safe to open a DB with a different prefix extractor, so it's safe to skip the prefix extractor check.
      Closes https://github.com/facebook/rocksdb/pull/2474
      
      Differential Revision: D5294700
      
      Pulled By: siying
      
      fbshipit-source-id: eeb500da795eecb29b8c9c56a14cfd4afda12ecc
    • Fix Data Race Between CreateColumnFamily() and GetAggregatedIntProperty() · 6837a176
      Committed by Siying Dong
      Summary:
      CreateColumnFamily() releases the DB mutex between adding the column family to the set and installing its super version (in order to write the option file), so if a user calls GetAggregatedIntProperty() in between, the super version will be null and the process will crash. Fix it by skipping column families that do not have a super version installed.
      
      Maybe we should also fix the problem of releasing the lock while reading the option file, but that is more risky, so I'm doing a quick and safer fix; we can investigate further later.
      Closes https://github.com/facebook/rocksdb/pull/2475
      
      Differential Revision: D5298053
      
      Pulled By: siying
      
      fbshipit-source-id: 4b3c8f91c60400b163fcc6cda8a0c77723be0ef6
  9. 22 Jun 2017 (1 commit)
  10. 21 Jun 2017 (3 commits)
  11. 20 Jun 2017 (1 commit)
  12. 19 Jun 2017 (1 commit)
  13. 17 Jun 2017 (1 commit)
  14. 16 Jun 2017 (1 commit)