1. 05 Apr, 2019 2 commits
    • Fix many bugs in log statement arguments (#5089) · c06c4c01
      Adam Simpkins committed
      Summary:
      Annotate all of the logging functions to inform the compiler that these
      use printf-style formatting arguments.  This allows the compiler to emit
      warnings if the format arguments are incorrect.
      
      This also fixes many problems reported now that format string checking
      is enabled.  Many of these are simply mix-ups in the argument type (e.g.,
      int vs uint64_t), but in several cases the wrong number of arguments
      was being passed in, which can cause the code to crash.
      
      The primary motivation for this was to fix the log message in
      `DBImpl::SwitchMemtable()` which caused a segfault due to an extra %s
      format parameter with no argument supplied.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5089
      
      Differential Revision: D14574795
      
      Pulled By: simpkins
      
      fbshipit-source-id: 0921b03f0743652bf4ae21e414ff54b3bb65422a
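
The annotation described in the summary above can be sketched as follows. This is a minimal illustration that assumes GCC or Clang; `DEMO_PRINTF_FORMAT` and `FormatLog` are hypothetical names chosen for the example, not RocksDB's actual macro or logging functions.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical macro: on GCC/Clang it tells the compiler that argument
// `fmt_idx` is a printf-style format string and that the variadic
// arguments start at `arg_idx`. It expands to nothing on other compilers.
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_PRINTF_FORMAT(fmt_idx, arg_idx) \
  __attribute__((__format__(__printf__, fmt_idx, arg_idx)))
#else
#define DEMO_PRINTF_FORMAT(fmt_idx, arg_idx)
#endif

// The attribute goes on the declaration, so the definition is kept separate.
std::string FormatLog(const char* format, ...) DEMO_PRINTF_FORMAT(1, 2);

// Hypothetical logging helper that formats its arguments into a string.
std::string FormatLog(const char* format, ...) {
  char buf[256];
  va_list ap;
  va_start(ap, format);
  std::vsnprintf(buf, sizeof(buf), format, ap);
  va_end(ap);
  return std::string(buf);
}
```

With format checking enabled (for example `-Wformat`), a call such as `FormatLog("%s %s", "only-one")` should be flagged at compile time rather than reading a missing argument at run time.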
    • #5145 , rename port/dirent.h to port/port_dirent.h to avoid compile err when... · f0edf9d5
      datonli committed
      #5145: rename port/dirent.h to port/port_dirent.h to avoid a compile error when using the port dir as the header dir output (#5152)
      
      Summary:
      Move port/dirent.h to port/port_dirent.h to avoid a compile error when using the port dir as the header dir output.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5152
      
      Differential Revision: D14779409
      
      Pulled By: siying
      
      fbshipit-source-id: d4162c47c979c6e8cc6a9e601802864ab3768ecb
  2. 04 Apr, 2019 3 commits
  3. 03 Apr, 2019 6 commits
    • add assert to silence clang analyzer and fix variable shadowing (#5140) · e8480d4d
      Zhongyi Xie committed
      Summary:
      This PR addresses two open issues:
      
      1. The clang analyzer is paranoid about db_ being nullptr after DB::Open calls in the test.
      See https://github.com/facebook/rocksdb/pull/5043#discussion_r271394579
      Add an assert to keep clang happy.
      2. PR https://github.com/facebook/rocksdb/pull/5049 introduced variable shadowing:
      ```
      db/db_iterator_test.cc: In constructor ‘rocksdb::DBIteratorWithReadCallbackTest_ReadCallback_Test::TestBody()::TestReadCallback::TestReadCallback(rocksdb::SequenceNumber)’:
      db/db_iterator_test.cc:2484:9: error: declaration of ‘max_visible_seq’ shadows a member of 'this' [-Werror=shadow]
               : ReadCallback(max_visible_seq) {}
               ^
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5140
      
      Differential Revision: D14735497
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 3219ea75cf4ae04f64d889323f6779e84be98144
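
The shadowing error quoted above can be illustrated with a small sketch. The classes below are simplified, hypothetical stand-ins (not the real RocksDB `ReadCallback` or the test's code), and renaming the constructor parameter is shown as one common way to resolve such a warning, not necessarily the change made in this PR.

```cpp
#include <cstdint>

using SequenceNumber = uint64_t;

// Hypothetical, simplified base class exposing an accessor named
// max_visible_seq(), similar in spirit to the name in the error message.
class ReadCallbackSketch {
 public:
  explicit ReadCallbackSketch(SequenceNumber last_visible_seq)
      : max_visible_seq_(last_visible_seq) {}
  virtual ~ReadCallbackSketch() = default;

  SequenceNumber max_visible_seq() const { return max_visible_seq_; }

 private:
  SequenceNumber max_visible_seq_;
};

class TestReadCallback : public ReadCallbackSketch {
 public:
  // Naming this parameter `max_visible_seq` would hide the inherited
  // accessor inside the constructor, which some GCC versions report
  // under -Wshadow. A distinct parameter name avoids the clash.
  explicit TestReadCallback(SequenceNumber last_visible_seq)
      : ReadCallbackSketch(last_visible_seq) {}
};
```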
    • Mark logs with prepare in PreReleaseCallback (#5121) · 5234fc1b
      Maysam Yabandeh committed
      Summary:
      In the prepare phase of 2PC, the DB promises to remember the prepared data for possible future commits. To fulfill the promise, the prepared data must be persisted in the WAL so that it can be recovered after a crash. A log that contains a prepare batch that is not yet committed is marked so that it is not garbage collected before the transaction commits or rolls back. The bug was that the write to the log file and the marking of the file were not atomic, so WAL GC could happen before the WAL file was actually marked. This patch moves the marking logic to PreReleaseCallback so that the WAL GC logic that joins both write threads sees the WAL write and the WAL mark atomically.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5121
      
      Differential Revision: D14665210
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 1d66aeb1c66a296cb4899a5a20c4d40c59e4b534
    • add compression options to table properties (#5081) · 26015f3b
      Zhongyi Xie committed
      Summary:
      Since we are planning to use dictionary compression and different compression levels, it is quite useful to add compression options to TableProperties. For example, in MyRocks, if the feature is available, we can query information_schema.rocksdb_sst_props to see whether all SST files have been converted to ZSTD dictionary compression. Resolves https://github.com/facebook/rocksdb/issues/4992
      
      With this PR, users can query table properties through the `GetPropertiesOfAllTables` API and get the compression options as a std::string:
      `window_bits=-14; level=32767; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0;`
      The output of table_properties->ToString() also contains it:
      `# data blocks=1; # entries=13; # deletions=0; # merge operands=0; # range deletions=0; raw key size=143; raw average key size=11.000000; raw value size=39; raw average value size=3.000000; data block size=120; index block size (user-key? 0, delta-value? 0)=27; filter block size=0; (estimated) table size=147; filter policy name=N/A; prefix extractor name=nullptr; column family ID=0; column family name=default; comparator name=leveldb.BytewiseComparator; merge operator name=nullptr; property collectors names=[]; SST file compression algo=Snappy; SST file compression options=window_bits=-14; level=32767; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; ; creation time=1552946632; time stamp of earliest key=1552946632;`
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5081
      
      Differential Revision: D14716692
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 7d2f2cf84e052bff876e71b4212cfdebf5be32dd
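
A caller that wants individual values out of the compression options string shown above might split it as in the following sketch. `ParseOptionString` is a hypothetical helper, not part of RocksDB, and it assumes the `key=value; ` layout seen in the sample output.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Splits "key=value; key=value; " into a map of strings.
// Assumes items are separated by ';' and may be preceded by spaces.
std::map<std::string, std::string> ParseOptionString(const std::string& s) {
  std::map<std::string, std::string> out;
  std::size_t pos = 0;
  while (pos < s.size()) {
    std::size_t end = s.find(';', pos);
    if (end == std::string::npos) end = s.size();
    const std::string item = s.substr(pos, end - pos);
    const std::size_t first = item.find_first_not_of(' ');
    const std::size_t eq = item.find('=');
    // Skip empty items and items without a key before '='.
    if (first != std::string::npos && eq != std::string::npos && eq > first) {
      out[item.substr(first, eq - first)] = item.substr(eq + 1);
    }
    pos = end + 1;
  }
  return out;
}
```

The values are kept as strings here; converting them (for example `level` to an integer) is left to the caller.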
    • WriteUnPrepared: less virtual in iterator callback (#5049) · 14b3f683
      Maysam Yabandeh committed
      Summary:
      WriteUnPrepared adds a virtual function, MaxUnpreparedSequenceNumber, to ReadCallback, which returns 0 unless WriteUnPrepared is enabled and the transaction has uncommitted data written to the DB. Together with snapshot sequence number, this determines the last sequence that is visible to reads.
      The patch clarifies the guarantees of the GetIterator API in WriteUnPrepared transactions and makes use of them to statically initialize the read callback and thus avoid the virtual call.
      Furthermore, it increases the minimum value for min_uncommitted from 0 to 1, as seq 0 is used only for last-level keys that are committed in all snapshots.
      
      The following benchmark shows +0.26% higher throughput in the seekrandom benchmark.
      
      Benchmark:
      ./db_bench --benchmarks=fillrandom --use_existing_db=0 --num=1000000 --db=/dev/shm/dbbench
      
      ./db_bench --benchmarks=seekrandom[X10] --use_existing_db=1 --db=/dev/shm/dbbench --num=1000000 --duration=60 --seek_nexts=100
      seekrandom [AVG    10 runs] : 20355 ops/sec;  225.2 MB/sec
      seekrandom [MEDIAN 10 runs] : 20425 ops/sec;  225.9 MB/sec
      
      ./db_bench_lessvirtual3 --benchmarks=seekrandom[X10] --use_existing_db=1 --db=/dev/shm/dbbench --num=1000000 --duration=60 --seek_nexts=100
      seekrandom [AVG    10 runs] : 20409 ops/sec;  225.8 MB/sec
      seekrandom [MEDIAN 10 runs] : 20487 ops/sec;  226.6 MB/sec
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5049
      
      Differential Revision: D14366459
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: ebaff8908332a5ae9af7defeadabcb624be660ef
    • Add a missing define to monitoring/iostats_context_imp.h (#5136) · d9d3caca
      Simon Grätzer committed
      Summary:
      I think when PR https://github.com/facebook/rocksdb/pull/4889 added the `IOSTATS_CPU_TIMER_GUARD` define to this header file, the noop version in the `#else` branch was forgotten.
      
      Not sure if this is common, but on my macOS machine it breaks the build.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5136
      
      Differential Revision: D14727727
      
      Pulled By: siying
      
      fbshipit-source-id: 1076e56bdbe6ecda01d461b371dabf7f1593a149
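
The kind of omission described above (a macro defined in one preprocessor branch but not in the `#else` branch) can be sketched as follows. All names here are hypothetical: `DEMO_ENABLE_TIMERS`, `DemoTimerGuard` and `DEMO_TIMER_GUARD` are illustrative only, and the guard measures wall-clock time for simplicity rather than CPU time.

```cpp
#include <chrono>
#include <cstdint>

#if defined(DEMO_ENABLE_TIMERS)  // hypothetical build switch
// Adds the elapsed time of the enclosing scope to *metric on destruction.
class DemoTimerGuard {
 public:
  explicit DemoTimerGuard(uint64_t* metric)
      : metric_(metric), start_(std::chrono::steady_clock::now()) {}
  ~DemoTimerGuard() {
    const auto elapsed = std::chrono::steady_clock::now() - start_;
    *metric_ += static_cast<uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count());
  }

 private:
  uint64_t* metric_;
  std::chrono::steady_clock::time_point start_;
};
#define DEMO_TIMER_GUARD(metric) \
  DemoTimerGuard demo_timer_guard_##metric(&(metric))
#else
// No-op fallback. Without this definition, every use of the macro would
// fail to compile whenever the feature is switched off.
#define DEMO_TIMER_GUARD(metric)
#endif

uint64_t demo_read_nanos = 0;

int ReadSomething() {
  DEMO_TIMER_GUARD(demo_read_nanos);  // expands to nothing when disabled
  return 7;
}
```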
    • Revert "Avoid per-key upper bound check in BlockBasedTableIterator (#5101)" (#5132) · ebcc8ae1
      Siying Dong committed
      Summary:
      This reverts commit f29dc1b9.
      
      In BlockBasedTableIterator, index_iter_->key() is sometimes a user key, so it is wrong to call ExtractUserKey() against it. This is a bug introduced by #5101.
      Temporarily revert the diff to keep the branch clean.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5132
      
      Differential Revision: D14718584
      
      Pulled By: siying
      
      fbshipit-source-id: 0ac55dc9b5dbc18c7809092146bdf7eb9364b9ad
  4. 02 Apr, 2019 2 commits
    • Add LevelDB repository link in the Readme · fa1b5582
      xinbenlv committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5134
      
      Differential Revision: D14719068
      
      Pulled By: siying
      
      fbshipit-source-id: c09a544f06ff414dbe2f90792aaf2bb5b8550bee
    • Add DBOptions::avoid_unnecessary_blocking_io to defer file deletions (#5043) · 120bc471
      Mike Kolupaev committed
      Summary:
      Just like ReadOptions::background_purge_on_iterator_cleanup but for ColumnFamilyHandle instead of Iterator.
      
      In our use case we sometimes call ColumnFamilyHandle's destructor from low-latency threads, and sometimes it blocks the thread for a few seconds while deleting files. To avoid that, we can either offload ColumnFamilyHandle's destruction to a background thread on our side, or add this option on the RocksDB side. This PR does the latter, to be consistent with how we solve exactly the same problem for iterators using the background_purge_on_iterator_cleanup option.
      
      (EDIT: It's avoid_unnecessary_blocking_io now, and affects both CF drops and iterator destructors.)
      I'm not quite comfortable with having two separate options (background_purge_on_iterator_cleanup and background_purge_on_cf_cleanup) for such a rarely used thing. Maybe we should merge them? Rename background_purge_on_cf_cleanup to something like delete_files_on_background_threads_only or avoid_blocking_io_in_unexpected_places, and make iterators use it instead of the one in ReadOptions? I can do that here if you guys think it's better.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5043
      
      Differential Revision: D14339233
      
      Pulled By: al13n321
      
      fbshipit-source-id: ccf7efa11c85c9a5b91d969bb55627d0fb01e7b8
  5. 30 Mar, 2019 4 commits
    • Fix arena allocation size in NewEmptyInternalIterator (#4905) · 127a850b
      Remington Brasga committed
      Summary:
      NewEmptyInternalIterator with arena mistakenly used the size of EmptyIterator when allocating from the arena, but then constructed a totally different object in that memory: EmptyInternalIterator. The patch fixes that.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4905
      
      Differential Revision: D14689840
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: af64fd8ee93d5a4ad54691c792e5ecc5efabc887
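
The class of bug described above (allocating with one type's size, then constructing another type in that memory) can be avoided by deriving the size from the constructed type. The sketch below is hypothetical: `DemoArena`, `SmallIter`, `LargeIter` and `NewInArena` are illustrative names, not RocksDB's `Arena` or iterator classes.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Hypothetical minimal arena that owns every block it hands out.
class DemoArena {
 public:
  char* AllocateAligned(std::size_t bytes) {
    last_size_ = bytes;
    blocks_.emplace_back(new char[bytes]);
    return blocks_.back().get();
  }
  std::size_t last_size() const { return last_size_; }

 private:
  std::vector<std::unique_ptr<char[]>> blocks_;
  std::size_t last_size_ = 0;
};

struct SmallIter {
  char tag = 's';
};

struct LargeIter {
  long long state[4] = {0, 0, 0, 0};
  char tag = 'l';
};

// Taking sizeof(T) from the same T that is constructed keeps allocation
// and placement new in sync. Only suitable for trivially destructible
// types here, because no destructor is ever run on the object.
template <typename T>
T* NewInArena(DemoArena* arena) {
  void* mem = arena->AllocateAligned(sizeof(T));
  return new (mem) T();
}
```

Writing `arena->AllocateAligned(sizeof(SmallIter))` followed by `new (mem) LargeIter()` would reproduce the mismatch: the object would be larger than the memory reserved for it.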
    • WriteUnPrepared: Enable auto-compaction after max_evicted_seq_ init (#5128) · a703f16d
      Maysam Yabandeh committed
      Summary:
      Compaction depends on the value of max_evicted_seq_, so the ::Initialize method should enable auto-compaction only after max_evicted_seq_ is properly initialized. The patch also backports #4853 from WritePrepared txn to WriteUnPrepared.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5128
      
      Differential Revision: D14686562
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: b2355025712a72676ac3b20a95258adcf4774490
    • Avoid per-key upper bound check in BlockBasedTableIterator (#5101) · f29dc1b9
      Yi Wu committed
      Summary:
      `BlockBasedTableIterator` avoids reading the next block on `Next()` if it detects that the iterator will be out of bounds, by checking against the index key. The optimization was added in #2239, and at that time it only checked the bound once per block. A later change seems to have made it a per-key check, which introduces unnecessary key comparisons.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5101
      
      Differential Revision: D14678707
      
      Pulled By: siying
      
      fbshipit-source-id: 2372446116753c7892ea4cec7b4b49ef87ba463e
    • Update RepeatableThreadTest with MockTimeEnv (#5107) · 09957ded
      Yanqin Jin committed
      Summary:
      **This PR updates RepeatableThread::wait, breaking some tests on OS X. The rest of the PR fixes the tests on OS X.**
      `RepeatableThreadTest.MockEnvTest` uses `MockTimeEnv` and `RepeatableThread`. If `RepeatableThread::wait` calls `TimedWait` with a time smaller than or equal to the current (real) time, `TimedWait` returns immediately on certain platforms, e.g. OS X. #4560 addresses this issue by replacing `TimedWait` with `Wait` in the test. This fixes the test but makes test and production code diverge, which is not optimal for test coverage.
      
      This PR proposes an alternative fix which unifies the test and production code paths for `RepeatableThread::wait`. We obtain the current (real) time in seconds and add 10 extra seconds to ensure that `RepeatableThread::wait` invokes `TimedWait` with a time greater than the (real) current time. This is to prevent the `TimedWait` function from returning immediately without sleeping and releasing the mutex. If `TimedWait` returns immediately, the mutex will not be released, and `RepeatableThread::TEST_WaitForRun` never has a chance to execute the callback which, in this case, updates the result returned by `mock_env->NowMicros()`. Consequently, `RepeatableThread::wait` cannot break out of the loop, causing the test to hang.
      
      The extra 10 seconds is a best-effort approach because there seems to be no reliable and deterministic way to provide the aforementioned guarantee. By the time `RepeatableThread::wait` is called, there is no guarantee that `delay + mock_env->NowMicros()` will be greater than the current real time. However, 10 seconds should be sufficient in most cases. We will keep an eye out for possible flakiness of this test.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5107
      
      Differential Revision: D14680885
      
      Pulled By: riversand963
      
      fbshipit-source-id: d1ecbe10e1dacd110bd464cd01e188bfee72b89e
  6. 29 Mar, 2019 4 commits
    • Fix db_stress for custom env (#5122) · d77476ef
      Yanqin Jin committed
      Summary:
      Fix some HDFS-related code so that it can compile and run 'db_stress'.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5122
      
      Differential Revision: D14675495
      
      Pulled By: riversand963
      
      fbshipit-source-id: cac280479efcf5451982558947eac1732e8bc45a
    • Smooth the deletion of WAL files (#5116) · dae3b554
      anand76 committed
      Summary:
      WAL files are currently not subject to deletion rate limiting by DeleteScheduler. If the size of the WAL files is significant, this can cause a high delete rate on SSDs that may affect other operations. To fix it, force WAL file deletions to go through the SstFileManager. Original PR for this is #2768
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5116
      
      Differential Revision: D14669437
      
      Pulled By: anand1976
      
      fbshipit-source-id: c5f62d0640cebaa1574de841a1d01e4ce2faadf0
    • Option string/map can set merge operator from object registry (#5123) · a98317f5
      Siying Dong committed
      Summary:
      Allow a customized merge operator to be loaded from an option file/map/string
      by allowing users to pre-register merge operators in the object registry.
      
      Also update HISTORY.md and header files for the same feature for comparator.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5123
      
      Differential Revision: D14658488
      
      Pulled By: siying
      
      fbshipit-source-id: 86ea2fbd2a0a04632d8ea9fceaffefd041f6ae61
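
The general idea of pre-registering an object under a name so that it can later be looked up from an option string can be sketched as below. This is a generic, hypothetical registry (`DemoRegistry`, `DemoMergeOperator`, `AppendOperator`); it does not reproduce RocksDB's actual object registry API.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a merge operator interface.
struct DemoMergeOperator {
  virtual ~DemoMergeOperator() = default;
  virtual std::string Name() const = 0;
};

struct AppendOperator : DemoMergeOperator {
  std::string Name() const override { return "AppendOperator"; }
};

// Maps a name (as it would appear in an option string) to a factory.
class DemoRegistry {
 public:
  using Factory = std::function<std::shared_ptr<DemoMergeOperator>()>;

  void Register(const std::string& name, Factory factory) {
    factories_[name] = std::move(factory);
  }

  // Returns nullptr when nothing was registered under `name`.
  std::shared_ptr<DemoMergeOperator> Create(const std::string& name) const {
    auto it = factories_.find(name);
    if (it == factories_.end()) {
      return nullptr;
    }
    return it->second();
  }

 private:
  std::map<std::string, Factory> factories_;
};
```

A tool would register its custom operators at startup and then resolve a name such as `AppendOperator` found in an option string through `Create`.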
    • Improve obsolete_files_test (#5125) · 106a94af
      Siying Dong committed
      Summary:
      We see a failure of obsolete_files_test but aren't able to identify
      the issue. Improve the test in the following ways, in the hope that we
      can debug it better next time:
      1. Place a sync point before automatic compaction runs so the race
         condition will always trigger.
      2. Disable the sync point before the test finishes.
      3. Use ASSERT_OK() instead of ASSERT_TRUE(status.ok()).
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5125
      
      Differential Revision: D14669456
      
      Pulled By: siying
      
      fbshipit-source-id: dccb7648e334501ad651eb212880096eef1f4ab2
  7. 28 Mar, 2019 7 commits
  8. 27 Mar, 2019 7 commits
    • Support for single-primary, multi-secondary instances (#4899) · 9358178e
      Yanqin Jin committed
      Summary:
      This PR allows RocksDB to run in single-primary, multi-secondary process mode.
      The writer is a regular RocksDB (e.g. a `DBImpl`) instance playing the role of the primary.
      Multiple `DBImplSecondary` processes (secondaries) share the same set of SST files, MANIFEST, WAL files with the primary. Secondaries tail the MANIFEST of the primary and apply updates to their own in-memory state of the file system, e.g. `VersionStorageInfo`.
      
      This PR has several components:
      1. (Originally in #4745). Add a `PathNotFound` subcode to `IOError` to denote the failure when a secondary tries to open a file which has been deleted by the primary.
      
      2. (Similar to #4602). Add `FragmentBufferedReader` to handle a partially read trailing record at the end of a log, from which a future read can continue.
      
      3. (Originally in #4710 and #4820). Add implementation of the secondary, i.e. `DBImplSecondary`.
      3.1 Tail the primary's MANIFEST during recovery.
      3.2 Tail the primary's MANIFEST during normal processing by calling `ReadAndApply`.
      3.3 Tailing WAL will be in a future PR.
      
      4. Add an example in 'examples/multi_processes_example.cc' to demonstrate the usage of secondary RocksDB instance in a multi-process setting. Instructions to run the example can be found at the beginning of the source code.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4899
      
      Differential Revision: D14510945
      
      Pulled By: riversand963
      
      fbshipit-source-id: 4ac1c5693e6012ad23f7b4b42d3c374fecbe8886
    • remove bundled but unused fbson library (#5108) · 2a5463ae
      jsteemann committed
      Summary:
      fbson library is still included in `third-party` directory, but is not needed by RocksDB anymore.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5108
      
      Differential Revision: D14622272
      
      Pulled By: siying
      
      fbshipit-source-id: 52b24ed17d8d870a71364f85e5bac4eafb192df5
    • Introduce CPU timers for iterator seek and next (#5076) · 01e6badb
      Shi Feng committed
      Summary:
      Introduce CPU timers for iterator seek and next operations. The seek
      counter includes SeekToFirst, SeekToLast and SeekForPrev, with the
      caveat that the SeekToLast timer doesn't include some post-processing
      time if an upper bound is defined.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5076
      
      Differential Revision: D14525218
      
      Pulled By: fredfsh
      
      fbshipit-source-id: 03ba25df3b22b06c072621e4de0eacfa1445f0d9
    • Allow option string to get comparator from object registry (#5106) · 4774a940
      Siying Dong committed
      Summary:
      Even a customized ldb may not be able to read data from some databases if
      the comparator is not standard. We modify the option helper to get the
      comparator from the object registry so that a customized ldb can read
      databases that use a non-standard comparator.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5106
      
      Differential Revision: D14622107
      
      Pulled By: siying
      
      fbshipit-source-id: 151dcb295a35a4c7d54f919cd4e322a89dc601c9
    • BlobDB::Open() should put all existing trash files to delete scheduler (#5103) · fe2bd190
      Siying Dong committed
      Summary:
      Right now, BlobDB::Open() fails to put all trash files into the delete scheduler,
      which causes some trash files to remain permanently untracked.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5103
      
      Differential Revision: D14606095
      
      Pulled By: siying
      
      fbshipit-source-id: 41a9437a2948abb235c0ed85f9a04612d0e50183
    • Fix SstFileReader not able to open ingested file (#5097) · 75133b1b
      Yi Wu committed
      Summary:
      Since `SstFileReader` doesn't know the largest seqno of a file, it fails this check when it opens a file with a global seqno: https://github.com/facebook/rocksdb/blob/ca89ac2ba997dfa0e135bd75d4ccf6f5774a7eff/table/block_based_table_reader.cc#L730
      Changes:
      * Pass largest_seqno=kMaxSequenceNumber from `SstFileReader` and allow it to bypass the above check.
      * `BlockBasedTable::VerifyChecksum` also double-checks whether the checksum matches when the global seqno is excluded (this is to make the new test in sst_table_reader_test pass).
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5097
      
      Differential Revision: D14607434
      
      Pulled By: riversand963
      
      fbshipit-source-id: 9008599227c5fccbf9b73fee46b3bf4a1523f023
    • Fix BlockBasedTableIterator construction missing index_key_is_full parameter · 7ca9eb75
      Yi Wu committed
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5104
      
      Differential Revision: D14619000
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: c2895794a3f31b826c149dcb698c1952dacc2332
  9. 26 Mar, 2019 3 commits
  10. 22 Mar, 2019 2 commits
    • Make it easier for users to load options from option file and set shared block cache. (#5063) · a4396f92
      Rashmi Sharma committed
      Summary:
      [RocksDB] Make it easier for users to load options from option file and set shared block cache.
      Right now, users need several dynamic casts to set the shared block cache on the option struct loaded from the option file.
      If people don't do that, every CF of every DB will generate its own 8MB block cache. It's not a usable setting. So we are dragging every user who loads options from the file into such a mess.
      Instead, we should allow them to pass their cache object to LoadLatestOptions() and LoadOptionsFromFile(), so that those loaded option structs will have the shared block cache.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5063
      
      Differential Revision: D14518584
      
      Pulled By: rashmishrm
      
      fbshipit-source-id: c91430ff9425a0e67d76fc67931d755f491ca5aa
    • fix NowNanos overflow (#5062) · 88d85b68
      Burton Li committed
      Summary:
      The original implementation of WinEnvIO::NowNanos() consistently overflows at:
      li.QuadPart *= std::nano::den;
      As a result, the API returns an incorrect result.
      e.g.:
      li.QuadPart=13477844301545
      std::nano::den=1e9
      
      The fix uses a pre-computed nano_seconds_per_period_ to represent the nanoseconds per performance counter period, in the case where nano::den is divisible by perf_counter_frequency_. Otherwise it falls back to high_resolution_clock.
      siying ajkr
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5062
      
      Differential Revision: D14426842
      
      Pulled By: anand1976
      
      fbshipit-source-id: 127f1daf423dd4b30edd0dcf8ea0466f468bec12
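
The overflow in the example above can be checked with a short sketch: 13477844301545 multiplied by 1e9 is about 1.35e22, which exceeds the signed 64-bit maximum of roughly 9.22e18. The helpers below are hypothetical (`MultiplyOverflows`, `TicksToNanos`), and the 10 MHz counter frequency in the usage note is an assumed value for illustration, not one taken from the commit.

```cpp
#include <cstdint>
#include <limits>
#include <ratio>

// Returns true if ticks * multiplier would not fit in int64_t.
// Assumes both inputs are non-negative.
bool MultiplyOverflows(int64_t ticks, int64_t multiplier) {
  if (multiplier == 0) {
    return false;
  }
  return ticks > std::numeric_limits<int64_t>::max() / multiplier;
}

// Hypothetical helper following the idea in the summary: when the counter
// frequency divides 1e9 evenly, multiply by a precomputed nanoseconds-per-
// tick value instead of by 1e9. Returns false when the caller should fall
// back to another clock source.
bool TicksToNanos(int64_t ticks, int64_t ticks_per_second, int64_t* nanos) {
  const int64_t den = std::nano::den;  // 1,000,000,000
  if (ticks_per_second <= 0 || den % ticks_per_second != 0) {
    return false;
  }
  const int64_t nanos_per_tick = den / ticks_per_second;
  if (MultiplyOverflows(ticks, nanos_per_tick)) {
    return false;
  }
  *nanos = ticks * nanos_per_tick;
  return true;
}
```

With an assumed frequency of 10,000,000 ticks per second, each tick is 100 ns, so the tick count from the example converts without overflow.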