1. 07 Mar, 2018: 3 commits
    • Windows cumulative patch · c364eb42
      Dmitri Smirnov committed
      Summary:
      This patch addressed several issues.
        Portability fixes, including db_test std::thread -> port::Thread Cc: @
        and %z -> ROCKSDB portable macro. Cc: maysamyabandeh
      
        Implement Env::AreFilesSame
      
        Make the implementation of the unique file number more robust.
      
        Get rid of the C runtime and go directly to the Windows API when dealing
        with file primitives.
      
        Implement GetSectorSize() and align unbuffered reads on that value if
        available.
      
        Adjust the Windows Logger for the new interface and implement CloseImpl(). Cc: anand1976
      
        Fix a test-running script issue where the $status variable had the wrong
        scope, so failures were swallowed and not reported.
      
        DestroyDB() creates a logger and opens a LOG file in the directory
        being cleaned up. This holds a lock on the folder and prevents the
        cleanup, which fails one of the checkpoint tests. We observe the same in
        production. This change closes the log file.
      
        Fix the DBTest2.ReadAmpBitmapLiveInCacheAfterDBClose failure where the
        test attempts to open a directory with NewRandomAccessFile, which does
        not work on Windows.
        Fix DBTest.SoftLimit, which depends on thread timing. CC: yiwu-arbug
      Closes https://github.com/facebook/rocksdb/pull/3552
      
      Differential Revision: D7156304
      
      Pulled By: siying
      
      fbshipit-source-id: 43db0a757f1dfceffeb2b7988043156639173f5b
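      A minimal sketch of the sector alignment mentioned above, assuming the sector size is a power of two (as 512- and 4096-byte sectors are). Unbuffered Windows reads (FILE_FLAG_NO_BUFFERING) must start at a sector-aligned offset and cover a whole number of sectors, so a logical read is widened to the enclosing aligned range. The names `AlignDown`, `AlignUp`, `AlignedRange` and `AlignRead` are hypothetical, not RocksDB's.

      ```cpp
      #include <cassert>
      #include <cstdint>

      // Round an offset down to a sector boundary.
      // Assumes sector_size is a non-zero power of two.
      inline uint64_t AlignDown(uint64_t offset, uint64_t sector_size) {
        assert(sector_size != 0 && (sector_size & (sector_size - 1)) == 0);
        return offset & ~(sector_size - 1);
      }

      // Round an offset up to the next sector boundary (unchanged if aligned).
      inline uint64_t AlignUp(uint64_t offset, uint64_t sector_size) {
        return AlignDown(offset + sector_size - 1, sector_size);
      }

      struct AlignedRange {
        uint64_t offset;
        uint64_t length;
      };

      // Widen the logical read [offset, offset + n) to the sector-aligned
      // range an unbuffered read would actually have to cover.
      inline AlignedRange AlignRead(uint64_t offset, uint64_t n,
                                    uint64_t sector_size) {
        uint64_t begin = AlignDown(offset, sector_size);
        uint64_t end = AlignUp(offset + n, sector_size);
        return AlignedRange{begin, end - begin};
      }
      ```

      For example, a 100-byte read at offset 1000 with 512-byte sectors becomes a 1024-byte read at offset 512; the caller then copies the requested bytes out of the larger, aligned buffer.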
    • Blob DB: Improve FIFO eviction · b864bc9b
      Yi Wu committed
      Summary:
      Improve blob db FIFO eviction with the following changes:
      * Change blob_dir_size to max_db_size. Take SST file size into account when computing DB size.
      * FIFO now only takes into account live SST files and live blob files. It is normal for disk usage to exceed max_db_size, because obsolete SST files and blob files may be pending deletion.
      * FIFO eviction now also evicts TTL blob files that are still open. It does not evict non-TTL blob files.
      * If FIFO is triggered, it passes an expiration and the current sequence number to the compaction filter. The compaction filter then filters out inlined keys with an earlier expiration and a smaller sequence number, the so-called LSM FIFO.
      * The compaction filter also filters out blob indexes whose corresponding blob file is gone.
      * Add an event listener that listens to compaction/flush events and updates the SST file size.
      * Implement DB::Close() to make sure the base db, as well as the event listener and compaction filter, are destructed before blob db.
      * Add more blob db statistics around FIFO.
      * Fix some locking issues when accessing a blob file.
      Closes https://github.com/facebook/rocksdb/pull/3556
      
      Differential Revision: D7139328
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: ea5edb07b33dfceacb2682f4789bea61de28bbfa
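      The eviction rules listed above can be sketched as follows, under stated assumptions: `BlobFileInfo`, `PickFilesToEvict` and `ShouldEvictInlinedKey` are hypothetical names, live SST bytes arrive as one number, and TTL files are evicted in order of expiration. It illustrates the summary's rules (only live files count toward the cap, only TTL blob files are candidates, inlined keys are dropped when both their expiration and sequence number fall below the cutoff), not blob db's actual code.

      ```cpp
      #include <algorithm>
      #include <cstdint>
      #include <vector>

      // Hypothetical view of a live blob file.
      struct BlobFileInfo {
        uint64_t file_number;
        uint64_t size;
        bool has_ttl;
        uint64_t expiration;  // meaningful only when has_ttl is true
      };

      // Evict TTL blob files, earliest expiration first, until live SST bytes
      // plus live blob bytes fit within max_db_size. Non-TTL files are never
      // chosen, so the estimate may still exceed the cap afterwards.
      std::vector<uint64_t> PickFilesToEvict(std::vector<BlobFileInfo> files,
                                             uint64_t live_sst_size,
                                             uint64_t max_db_size) {
        uint64_t total = live_sst_size;
        for (const BlobFileInfo& f : files) total += f.size;

        std::sort(files.begin(), files.end(),
                  [](const BlobFileInfo& a, const BlobFileInfo& b) {
                    return a.expiration < b.expiration;
                  });

        std::vector<uint64_t> evicted;
        for (const BlobFileInfo& f : files) {
          if (total <= max_db_size) break;
          if (!f.has_ttl) continue;
          evicted.push_back(f.file_number);
          total -= f.size;
        }
        return evicted;
      }

      // Compaction-filter side: drop an inlined key only if it expires before
      // the cutoff expiration and was written before the cutoff sequence.
      bool ShouldEvictInlinedKey(uint64_t key_expiration, uint64_t key_seq,
                                 uint64_t cutoff_expiration, uint64_t cutoff_seq) {
        return key_expiration < cutoff_expiration && key_seq < cutoff_seq;
      }
      ```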
    • Added bytes XOR merge operator · 0a2354ca
      Pooya Shareghi committed
      Summary:
      Closes https://github.com/facebook/rocksdb/pull/575
      
      I fixed the merge conflicts etc.
      Closes https://github.com/facebook/rocksdb/pull/3065
      
      Differential Revision: D7128233
      
      Pulled By: sagar0
      
      fbshipit-source-id: 2c23a48c9f0432c290b0cd16a12fb691bb37820c
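      The summary does not spell out the operator's semantics, so the following is only a sketch of a byte-wise XOR merge, assuming that when the two values differ in length the tail of the longer one is kept unchanged (equivalent to zero-padding the shorter one) and that a missing existing value simply yields the operand. `BytesXor` is a hypothetical free function, not the RocksDB `MergeOperator` interface.

      ```cpp
      #include <algorithm>
      #include <string>

      // XOR `operand` into `existing` byte by byte. Bytes past the end of the
      // shorter input are copied from the longer one (zero-padding semantics).
      // A null `existing` (no prior value) yields the operand itself.
      std::string BytesXor(const std::string* existing, const std::string& operand) {
        if (existing == nullptr) return operand;
        const std::string& a = *existing;
        const std::string& longer = a.size() >= operand.size() ? a : operand;
        const size_t min_size = std::min(a.size(), operand.size());

        std::string result;
        result.reserve(longer.size());
        for (size_t i = 0; i < min_size; ++i) {
          result.push_back(static_cast<char>(a[i] ^ operand[i]));
        }
        result.append(longer, min_size, std::string::npos);  // untouched tail
        return result;
      }
      ```

      Because XOR is associative and commutative, operands can be combined pairwise in any grouping, which is what makes combining merge operands safe.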
  2. 06 Mar, 2018: 6 commits
  3. 03 Mar, 2018: 4 commits
  4. 02 Mar, 2018: 2 commits
    • Fix a leak in prepared_section_completed_ · d060421c
      Maysam Yabandeh committed
      Summary:
      The zeroed entries were not removed from the prepared_section_completed_ map. This patch adds a unit test to show the problem and fixes it by refactoring the code. The new code is also more efficient, since i) it uses two separate mutexes to avoid contention between commit and prepare threads, and ii) it uses a sorted vector to maintain unique log entries with prepare, which avoids a very large heap with many duplicate entries.
      Closes https://github.com/facebook/rocksdb/pull/3545
      
      Differential Revision: D7106071
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: b3ae17cb6cd37ef10b6b35e0086c15c758768a48
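      The two ideas in the fix can be illustrated in isolation. This sketch assumes a `std::map` of per-log counters and a plain `std::vector` of log numbers, and omits the separate prepare/commit mutexes; `DecrementAndErase` and `InsertUniqueSorted` are hypothetical helpers, not the names used in the patch.

      ```cpp
      #include <algorithm>
      #include <cstdint>
      #include <map>
      #include <vector>

      // Decrement a per-log counter and erase the entry once it reaches zero,
      // so finished logs do not linger in the map (the leak described above).
      void DecrementAndErase(std::map<uint64_t, int64_t>* counts, uint64_t log) {
        auto it = counts->find(log);
        if (it == counts->end()) return;
        if (--it->second == 0) counts->erase(it);
      }

      // Keep one entry per log in a sorted vector instead of pushing
      // duplicates onto a heap.
      void InsertUniqueSorted(std::vector<uint64_t>* logs, uint64_t log) {
        auto it = std::lower_bound(logs->begin(), logs->end(), log);
        if (it == logs->end() || *it != log) logs->insert(it, log);
      }
      ```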
    • Add "rocksdb.live-sst-files-size" DB property · bf937cf1
      Yi Wu committed
      Summary:
      Add the "rocksdb.live-sst-files-size" DB property, which only includes files of the latest version. The existing "rocksdb.total-sst-files-size" includes files from all versions and thus includes files that are obsolete but not yet deleted. I'm going to use this new property to cap blob db SST + blob file size.
      Closes https://github.com/facebook/rocksdb/pull/3548
      
      Differential Revision: D7116939
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: c6a52e45ce0f24ef78708156e1a923c1dd6bc79a
  5. 01 Mar, 2018: 1 commit
  6. 28 Feb, 2018: 2 commits
    • skip CompactRange flush based on memtable contents · 3ae00472
      Andrew Kryczka committed
      Summary:
      CompactRange has a call to Flush because we guarantee that, at the time it's called, all existing keys in the range will be pushed through the user's compaction filter. However, previously the flush was done blindly, so it'd happen even if the memtable does not contain keys in the range specified by the user. This caused unnecessarily many L0 files to be created, leading to write stalls in some cases. This PR checks the memtable's contents, and decides to flush only if it overlaps with `CompactRange`'s range.
      
      - Move the memtable overlap check logic from `ExternalSstFileIngestionJob` to `ColumnFamilyData::RangesOverlapWithMemtables`
      - Reuse the above logic in `CompactRange` and skip flushing if no overlap
      Closes https://github.com/facebook/rocksdb/pull/3520
      
      Differential Revision: D7018897
      
      Pulled By: ajkr
      
      fbshipit-source-id: a3c6b1cfae56687b49dd89ccac7c948e53545934
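      The overlap test that decides whether to flush can be sketched with a sorted map standing in for the memtable. This is an illustration only: `RangeOverlapsMemtable` is hypothetical, keys are compared bytewise as `std::string`s, a null bound means unbounded, and both ends are treated as inclusive.

      ```cpp
      #include <map>
      #include <string>

      // True if any key in `memtable` lies within [*begin, *end].
      // A null begin/end means the range is unbounded on that side.
      bool RangeOverlapsMemtable(const std::map<std::string, std::string>& memtable,
                                 const std::string* begin, const std::string* end) {
        // Smallest key that is >= begin (or the first key if unbounded).
        auto it = begin != nullptr ? memtable.lower_bound(*begin) : memtable.begin();
        if (it == memtable.end()) return false;  // every key is below begin
        return end == nullptr || it->first <= *end;
      }
      ```

      A caller would then flush only when this returns true, instead of flushing unconditionally before the compaction.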
    • Update comments in DB::Close() · c287c098
      Siying Dong committed
      Summary: Closes https://github.com/facebook/rocksdb/pull/3543
      
      Differential Revision: D7093251
      
      Pulled By: siying
      
      fbshipit-source-id: 4066b82c95ecb65866c5842d68ab13ab9f85d567
  7. 27 Feb, 2018: 3 commits
    • Adding CentOS 7 Vagrantfile & build script · d6336563
      Istvan Szukacs committed
      Summary:
      I have updated the Vagrantfile to add an entry for CentOS 7. I also created a simple build script, which is quite similar to the one in Beringei.
      
      How to test:
      ```
      vagrant up centos7
      ```
      Todo:
      
      Implement -j X for the build.
      Closes https://github.com/facebook/rocksdb/pull/3530
      
      Differential Revision: D7090739
      
      Pulled By: ajkr
      
      fbshipit-source-id: 9f9eda5b507568993543d08de7ce168dfc12282e
    • DB:Open should fail on tmpfs when use_direct_reads=true · ad05cbb1
      Zhongyi Xie committed
      Summary:
      Before:
      
      > $ TEST_TMPDIR=/dev/shm ./db_bench -use_direct_reads=true -benchmarks=readrandomwriterandom -num=10000000 -reads=100000 -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 -max_background_jobs=12 -readwritepercent=50 -key_size=16 -value_size=48 -threads=32
      DB path: [/dev/shm/dbbench]
      put error: IO error: While open a file for random read: /dev/shm/dbbench/000007.sst: Invalid argument
      put error: IO error: While open a file for random read: /dev/shm/dbbench/000007.sst: Invalid argument
      put error: IO error: While open a file for random read: /dev/shm/dbbench/000007.sst: Invalid argument
      put error: IO error: While open a file for random read: /dev/shm/dbbench/000007.sst: Invalid argument
      put error: IO error: While open a file for random read: /dev/shm/dbbench/000007.sst: Invalid argument
      put error: IO error: While open a file for random read: /dev/shm/dbbench/000007.sst: Invalid argument
      put error: IO error: While open a file for random read: /dev/shm/dbbench/000007.sst: Invalid argument
      put error: IO error: While open a file for random read: /dev/shm/dbbench/000007.sst: Invalid argument
      put error: IO error: While open a file for random read: /dev/shm/dbbench/000007.sst: Invalid argument
      db_bench: tpp.c:84: __pthread_tpp_change_priority: Assertion `new_prio == -1 || (new_prio >= fifo_min_prio && new_prio <= fifo_max_prio)' failed.
      put error: IO error: While open a file for random read: /dev/shm/dbbench/000007.sst: Invalid argument
      put error: IO error: While open a file for random read: /dev/shm/dbbench/000007.sst: Invalid argument
      
      After:
      > TEST_TMPDIR=/dev/shm ./db_bench -use_direct_reads=true -benchmarks=readrandomwriterandom -num=10000000 -reads=100000 -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 -max_background_jobs=12 -readwritepercent=50 -key_size=16 -value_size=48 -threads=32
      Initializing RocksDB Options from the specified file
      Initializing RocksDB Options from command-line flags
      open error: Not implemented: Direct I/O is not supported by the specified DB.
      Closes https://github.com/facebook/rocksdb/pull/3539
      
      Differential Revision: D7082658
      
      Pulled By: miasantreble
      
      fbshipit-source-id: f9d9c6ec3b5e9e049cab52154940ee101ba4d342
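      One way to detect this condition up front is to probe the directory with an `O_DIRECT` open, since filesystems without direct I/O support (tmpfs on many Linux kernels) reject such opens with `EINVAL`. This is a hedged, Linux-oriented sketch of the idea, not RocksDB's actual check; `ProbeDirectIO` and the scratch file name are hypothetical.

      ```cpp
      #include <fcntl.h>
      #include <unistd.h>

      #include <string>

      // Returns true if a scratch file in `dir` can be opened with O_DIRECT.
      // Any failure (missing directory, EINVAL from the filesystem, ...) is
      // reported as "unsupported".
      bool ProbeDirectIO(const std::string& dir) {
      #ifdef O_DIRECT
        const std::string path = dir + "/.direct_io_probe";  // hypothetical name
        int fd = open(path.c_str(), O_CREAT | O_RDWR | O_DIRECT, 0644);
        // The file may have been created even if O_DIRECT was rejected, so
        // always try to remove it; unlink() on a missing file fails harmlessly.
        unlink(path.c_str());
        if (fd < 0) return false;
        close(fd);
        return true;
      #else
        (void)dir;
        return false;  // no O_DIRECT on this platform (macOS uses fcntl F_NOCACHE)
      #endif
      }
      ```

      With a check like this at open time, the DB can fail early with a "not supported" status instead of failing later on the first SST read.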
    • Fix a memory leak in WindowsThread · 7eb292da
      Dmitri Smirnov committed
      Summary:
      _endthreadex does not return, so destructors for objects on the stack
      do not run. This creates a memory leak.
      We remove the calls, since _endthreadex is called automatically when the
      threadproc returns, i.e. when the thread exits.
      Closes https://github.com/facebook/rocksdb/pull/3542
      
      Differential Revision: D7088713
      
      Pulled By: ajkr
      
      fbshipit-source-id: 749ecafc6a9572f587f76e516547e07734349a54
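      The effect can be shown with portable C++, using `std::thread` in place of `_beginthreadex` (an assumption made so the example runs anywhere). When the thread procedure simply returns, its stack is unwound and local destructors run; Microsoft's documentation notes that calling `_endthreadex` explicitly skips pending C++ destructors, which is the leak described above. `Tracker`, `ThreadProc` and `RunOnce` are hypothetical names.

      ```cpp
      #include <atomic>
      #include <thread>

      std::atomic<int> g_destroyed{0};

      // RAII object whose destructor records that it ran.
      struct Tracker {
        ~Tracker() { g_destroyed.fetch_add(1); }
      };

      // The thread procedure just returns, so the stack unwinds and ~Tracker
      // runs. Calling a non-returning exit routine here instead would skip it.
      void ThreadProc() {
        Tracker t;
        // ... thread work ...
      }

      // Runs one thread to completion and reports how many destructors ran.
      int RunOnce() {
        std::thread th(ThreadProc);
        th.join();
        return g_destroyed.load();
      }
      ```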
  8. 24 Feb, 2018: 2 commits
  9. 23 Feb, 2018: 5 commits
  10. 22 Feb, 2018: 2 commits
    • BackupEngine gluster-friendly file naming convention · b0929776
      Andrew Kryczka committed
      Summary:
      Use the rsync tempfile naming convention in our `BackupEngine`. The temp file follows the format, `.<filename>.<suffix>`, which is later renamed to `<filename>`. We fix `tmp` as the `<suffix>` as we don't need to use random bytes for now. The benefit is gluster treats this tempfile naming convention specially and applies hashing only to `<filename>`, so the file won't need to be linked or moved when it's renamed. Our gluster team suggested this will make things operationally easier.
      Closes https://github.com/facebook/rocksdb/pull/3463
      
      Differential Revision: D6893333
      
      Pulled By: ajkr
      
      fbshipit-source-id: fd7622978f4b2487fce33cde40dd3124f16bcaa8
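      The naming convention amounts to a pair of path helpers. A minimal sketch, assuming `/` as the path separator and the fixed `tmp` suffix from the summary; `TempPath` and `FinalPath` are hypothetical names, and the rename step itself is left out.

      ```cpp
      #include <string>

      // rsync-style temporary name: "<dir>/.<filename>.tmp".
      std::string TempPath(const std::string& dir, const std::string& filename) {
        return dir + "/." + filename + ".tmp";
      }

      // Name the temporary file is renamed to once the copy completes.
      std::string FinalPath(const std::string& dir, const std::string& filename) {
        return dir + "/" + filename;
      }
      ```

      Since gluster hashes only `<filename>` for names of this shape, the temporary and final names land on the same brick, so the rename does not move data.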
    • WritePrepared Txn: fix non-emptied PreparedHeap bug · 828211e9
      Maysam Yabandeh committed
      Summary:
      Under a certain access sequence, a bug prevented PreparedHeap from being emptied. This resulted in performance issues when the heap content was moved to old_prepared_ after max_evicted_seq_ advanced past the orphan prepared sequence numbers. The patch fixes the bug and adds more unit tests. It also adds more logging when the unlikely scenarios are encountered.
      Closes https://github.com/facebook/rocksdb/pull/3526
      
      Differential Revision: D7038486
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: f1e40bea558f67b03d2a29131fcb8734c65fce97
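      For context, a heap of prepared sequence numbers that must also support removing arbitrary entries can be built with deferred (lazy) erasure: values erased from the middle are parked in a second min-heap and discarded when they reach the top. The sketch below assumes that design and unique sequence numbers; `LazyMinHeap` is hypothetical, not the actual `PreparedHeap` code. The drain loop is the step where a missed case would leave stale entries behind and keep the heap from emptying.

      ```cpp
      #include <cstdint>
      #include <functional>
      #include <queue>
      #include <vector>

      class LazyMinHeap {
       public:
        void push(uint64_t v) { heap_.push(v); }
        bool empty() const { return heap_.empty(); }
        uint64_t top() const { return heap_.top(); }

        // Erase v: pop it directly if it is the minimum, else defer it.
        void erase(uint64_t v) {
          if (heap_.empty() || v < heap_.top()) return;  // already gone
          if (v == heap_.top()) {
            heap_.pop();
          } else {
            erased_.push(v);
          }
          Drain();
        }

       private:
        // Apply deferred erasures that have surfaced at the top.
        void Drain() {
          while (!heap_.empty() && !erased_.empty() &&
                 heap_.top() >= erased_.top()) {
            if (heap_.top() == erased_.top()) heap_.pop();
            erased_.pop();  // matched, or refers to an entry already removed
          }
        }

        using MinHeap = std::priority_queue<uint64_t, std::vector<uint64_t>,
                                            std::greater<uint64_t>>;
        MinHeap heap_;
        MinHeap erased_;
      };
      ```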
  11. 21 Feb, 2018: 4 commits
    • Add rocksdb.iterator.internal-key property · 8ada876d
      Sagar Vemuri committed
      Summary:
      Added a new iterator property: `rocksdb.iterator.internal-key` to get the internal-key (converted to user key) at which the iterator stopped.
      Closes https://github.com/facebook/rocksdb/pull/3525
      
      Differential Revision: D7033694
      
      Pulled By: sagar0
      
      fbshipit-source-id: d51e6c00f5e9d766c6276ef79774b81c6c5216f8
    • save redundant key lookup in map of locked keys · e9c31ab1
      jsteemann committed
      Summary:
      In case it is found that a key is already marked as locked in a
      stripe's map of locked keys, it is not necessary to look it up
      again using `std::unordered_map<std::string, ...>::at(size_t)`.
      
      Instead, we can use the already found position using the iterator
      produced by the previous `find` operation. Reusing the iterator
      will avoid having to hash the key again and do additional "random"
      memory lookups in the map of keys (though the data will very
      likely already be in the cache here due to the previous
      find operation).
      Closes https://github.com/facebook/rocksdb/pull/3505
      
      Differential Revision: D7036446
      
      Pulled By: sagar0
      
      fbshipit-source-id: cced51547b2bd2d49394f6bc8c5896f09fa80f68
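      A minimal sketch of the pattern, assuming a simplified lock record; `LockInfo` and `TryAcquireShared` are hypothetical and much simpler than the real lock manager. The iterator returned by `find()` already refers to the entry, so no second hash-and-probe through `at()` is needed.

      ```cpp
      #include <string>
      #include <unordered_map>

      // Hypothetical per-key lock record.
      struct LockInfo {
        bool exclusive;
        int holders;
      };

      bool TryAcquireShared(std::unordered_map<std::string, LockInfo>* keys,
                            const std::string& key) {
        auto it = keys->find(key);  // the only hash lookup on the hit path
        if (it == keys->end()) {
          keys->emplace(key, LockInfo{false, 1});
          return true;
        }
        LockInfo& info = it->second;  // reuse the iterator instead of at(key)
        if (info.exclusive) return false;
        ++info.holders;
        return true;
      }
      ```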
    • fix handling of empty string as checkpoint directory · 1960e73e
      Andrew Kryczka committed
      Summary:
      - made `CreateCheckpoint` properly return `InvalidArgument` when called with an empty directory. Previously it triggered an assertion failure due to a bug in the logic.
      - made `ldb` set empty `checkpoint_dir` if that's what the user specifies, so that we can use it to properly test `CreateCheckpoint` in the future.
      
      Differential Revision: D6874562
      
      fbshipit-source-id: dcc1bd41768261d9338987fa7711444289707ed7
    • fix shift UBSAN error in col_buf_encoder.cc · 5263da63
      Igor Sugak committed
      Summary:
      Add a static cast so that the left shift is performed on an unsigned type.
      
      make ubsan_check
      Closes https://github.com/facebook/rocksdb/pull/3517
      
      Reviewed By: sagar0
      
      Differential Revision: D7016044
      
      Pulled By: igorsugak
      
      fbshipit-source-id: baf72f6197edd8f7220d010b15a23d6de6a72c49
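      A small illustration of why the cast helps: before C++20, left-shifting a negative signed integer is undefined behavior, which is what UBSAN reports, while shifting an unsigned integer is always defined (bits shifted out are discarded). The function name is hypothetical; the actual expression in col_buf_encoder.cc is not shown in the summary.

      ```cpp
      #include <cstdint>

      // `value << shift` on a negative int64_t is UB before C++20; converting
      // to uint64_t first makes the result well defined (modulo 2^64).
      // Precondition: shift < 64.
      uint64_t ShiftLeftUnsigned(int64_t value, unsigned shift) {
        return static_cast<uint64_t>(value) << shift;
      }
      ```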
  12. 17 Feb, 2018: 3 commits
    • Fix build with USE_RTTI=0 · ab446dc2
      Po-Chuan Hsieh committed
      Summary:
      utilities/column_aware_encoding_util.cc:61:23: error: cannot use dynamic_cast with -fno-rtti
        table_reader_.reset(dynamic_cast<BlockBasedTable*>(table_reader.release()));
                            ^
      1 error generated.
      
      It was added as a [local patch](https://svnweb.freebsd.org/ports/head/databases/rocksdb/files/patch-utilities-column_aware_encoding_util.cc) on FreeBSD since RocksDB 5.8.
      It also fixes #2707.
      Closes https://github.com/facebook/rocksdb/pull/3514
      
      Differential Revision: D7005571
      
      Pulled By: siying
      
      fbshipit-source-id: 351a9055d21d0accdd7a932e8e7bfcd3c8e22068
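      One common RTTI-free replacement is `static_cast`, which compiles under `-fno-rtti` but performs no runtime type check, so it is only correct when the caller already knows the object's dynamic type. The summary does not show which replacement the patch used; the two structs below are minimal stand-ins, not RocksDB's `TableReader` and `BlockBasedTable`.

      ```cpp
      #include <memory>
      #include <utility>

      // Minimal stand-ins for the real classes.
      struct TableReader {
        virtual ~TableReader() = default;
      };
      struct BlockBasedTable : TableReader {
        int id = 42;
      };

      // Transfer ownership to a derived-type pointer without dynamic_cast.
      // Only valid if `reader` really points to a BlockBasedTable.
      std::unique_ptr<BlockBasedTable> Downcast(std::unique_ptr<TableReader> reader) {
        return std::unique_ptr<BlockBasedTable>(
            static_cast<BlockBasedTable*>(reader.release()));
      }
      ```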
    • WritePrepared Txn: optimizations for sysbench update_noindex · c178da05
      Maysam Yabandeh committed
      Summary:
      These are optimizations that we applied to improve sysbench's update_noindex performance.
      1. Make use of the LIKELY compiler hint
      2. Move std::atomic to the subclass
      3. Make use of skip_prepared in non-2pc transactions.
      Closes https://github.com/facebook/rocksdb/pull/3512
      
      Differential Revision: D7000075
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 1ab8292584df1f6305a4992973fb1b7933632181
    • Fix deadlock in ColumnFamilyData::InstallSuperVersion() · 97307d88
      Mike Kolupaev committed
      Summary:
      Deadlock: a memtable flush holds DB::mutex_ and calls ThreadLocalPtr::Scrape(), which locks ThreadLocalPtr mutex; meanwhile, a thread exit handler locks ThreadLocalPtr mutex and calls SuperVersionUnrefHandle, which tries to lock DB::mutex_.
      
      This deadlock is hit all the time on our workload. It blocks our release.
      
      In general, the problem is that ThreadLocalPtr takes an arbitrary callback and calls it while holding a lock on a global mutex. The same global mutex is (at least in some cases) locked by almost all ThreadLocalPtr methods, on any instance of ThreadLocalPtr. So, there'll be a deadlock if the callback tries to do anything to any instance of ThreadLocalPtr, or waits for another thread to do so.
      
      So, probably the only safe way to use ThreadLocalPtr callbacks is to do only simple and lock-free things in them.
      
      This PR fixes the deadlock by making sure that local_sv_ never holds the last reference to a SuperVersion, and therefore SuperVersionUnrefHandle never has to do any nontrivial cleanup.
      
      I also searched for other uses of ThreadLocalPtr to see if they may have similar bugs. There's only one other use, in transaction_lock_mgr.cc, and it looks fine.
      Closes https://github.com/facebook/rocksdb/pull/3510
      
      Reviewed By: sagar0
      
      Differential Revision: D7005346
      
      Pulled By: al13n321
      
      fbshipit-source-id: 37575591b84f07a891d6659e87e784660fde815f
  13. 16 Feb, 2018: 3 commits
    • fix advance reservation of arena block addresses · 0454f781
      Andrew Kryczka committed
      Summary:
      Calling `std::vector::reserve()` causes memory to be reallocated and then data to be moved. It was called prior to adding every block. This reallocation could be done a huge amount of times, e.g., for users with large index blocks.
      
      Instead, we can simply use `std::vector::emplace_back()` in such a way that preserves the no-memory-leak guarantee, while letting the vector decide when to reallocate space. Now I see reallocation/moving happen O(logN) times, rather than O(N) times, where N is the final size of vector.
      Closes https://github.com/facebook/rocksdb/pull/3508
      
      Differential Revision: D6994228
      
      Pulled By: ajkr
      
      fbshipit-source-id: ab7c11e13ff37c8c6c8249be7a79566a4068cd27
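      The difference is easy to measure with a plain `std::vector`. In this sketch (hypothetical helpers, not the arena code), reserving one extra slot before every append typically reallocates on every append, while `emplace_back` alone grows capacity geometrically. `AllocateBlock` shows one leak-safe ordering, assuming raw `new[]` blocks owned by the vector: the slot is added before the block is allocated, so if either step throws, no allocation is left unowned.

      ```cpp
      #include <cstddef>
      #include <vector>

      // Count how often the vector's buffer changes while appending n items.
      size_t CountReallocations(size_t n, bool reserve_each_time) {
        std::vector<char*> blocks;
        size_t reallocations = 0;
        for (size_t i = 0; i < n; ++i) {
          const size_t before = blocks.capacity();
          if (reserve_each_time) blocks.reserve(blocks.size() + 1);  // old pattern
          blocks.emplace_back(nullptr);
          if (blocks.capacity() != before) ++reallocations;
        }
        return reallocations;
      }

      // Grow the vector first (may throw, nothing allocated yet), then allocate.
      // If new[] throws, the null slot is harmless: delete[] nullptr is a no-op.
      // The owner of `blocks` must delete[] every element.
      char* AllocateBlock(std::vector<char*>* blocks, size_t size) {
        blocks->emplace_back(nullptr);
        char* block = new char[size];
        blocks->back() = block;
        return block;
      }
      ```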
    • Legocastle job to report lite build binary size to scuba · 989d1231
      Yi Wu committed
      Summary:
      Add a legocastle job to continuously build the last 10 commits every 4 hours and report lite build binary size to scuba.
      Closes https://github.com/facebook/rocksdb/pull/3511
      
      Differential Revision: D7001730
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 7c8ca87c46d663c786a0d32be69ebbe7b19a5eb9
    • Unbreak MemTableRep API change · 8eb1d445
      Maysam Yabandeh committed
      Summary:
      The MemTableRep API was broken by this commit: 813719e9
      This patch reverts the changes and instead adds InsertKey (etc.) overloads to extend the MemTableRep API without breaking the existing classes that inherit from it.
      Closes https://github.com/facebook/rocksdb/pull/3513
      
      Differential Revision: D7004134
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: e568d91fe1e17dd76c0c1f6c7dd51a18633b1c4f
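      The compatibility technique can be sketched as adding a virtual overload with a default body instead of changing an existing pure-virtual signature. `Rep`, `CountingRep` and `KeyHandle` here are hypothetical, trimmed-down stand-ins, not RocksDB's `MemTableRep` declarations.

      ```cpp
      using KeyHandle = void*;

      class Rep {
       public:
        virtual ~Rep() = default;
        // Original API: existing subclasses already override this.
        virtual void Insert(KeyHandle handle) = 0;
        // Extension with a default body, so subclasses written against the
        // old API keep compiling. The default forwards and reports success.
        virtual bool InsertKey(KeyHandle handle) {
          Insert(handle);
          return true;
        }
      };

      // A "legacy" subclass that only knows about Insert().
      class CountingRep : public Rep {
       public:
        void Insert(KeyHandle) override { ++count; }
        int count = 0;
      };
      ```

      Callers can move to `InsertKey()` immediately, while implementations override it only when they have something better to return.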