1. 23 Jun, 2018 · 3 commits
  2. 22 Jun, 2018 · 6 commits
    • Option for timing measurement of non-blocking ops during compaction (#4029) · 795e663d
      Zhongyi Xie committed
      Summary:
      For example, calls to the CompactionFilter are always timed, and the user has no way to disable this.
      This PR disables the timer if `Statistics::stats_level_` (which is part of DBOptions) is `kExceptDetailedTimers`.
      Closes https://github.com/facebook/rocksdb/pull/4029
      
      Differential Revision: D8583670
      
      Pulled By: miasantreble
      
      fbshipit-source-id: 913be9fe433ae0c06e88193b59d41920a532307f
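
      The gating described above can be pictured as a scoped timer that reads the clock only when the configured level permits detailed timers. This is a self-contained toy model, not RocksDB code. The enumerator names mirror RocksDB's `StatsLevel`. Their ordering here, and the `ScopedTimer` and `CallFilterTimed` helpers, are assumptions made for illustration.

      ```cpp
      #include <cassert>
      #include <chrono>
      #include <cstdint>

      // Toy stats levels. Assumption: a larger value collects more detail, and
      // detailed timers are skipped only at kExceptDetailedTimers.
      enum class StatsLevel : uint8_t { kExceptDetailedTimers, kExceptTimeForMutex, kAll };

      // Hypothetical RAII timer that reads the clock only when detailed timers are on.
      class ScopedTimer {
       public:
        ScopedTimer(StatsLevel level, uint64_t* elapsed_nanos)
            : enabled_(level > StatsLevel::kExceptDetailedTimers),
              elapsed_nanos_(elapsed_nanos) {
          assert(elapsed_nanos_ != nullptr);
          if (enabled_) start_ = std::chrono::steady_clock::now();
        }

        ~ScopedTimer() {
          if (!enabled_) return;  // disabled: no clock read, counter untouched
          auto delta = std::chrono::steady_clock::now() - start_;
          *elapsed_nanos_ += static_cast<uint64_t>(
              std::chrono::duration_cast<std::chrono::nanoseconds>(delta).count());
        }

        ScopedTimer(const ScopedTimer&) = delete;
        ScopedTimer& operator=(const ScopedTimer&) = delete;

        bool enabled() const { return enabled_; }

       private:
        bool enabled_;
        uint64_t* elapsed_nanos_;
        std::chrono::steady_clock::time_point start_;
      };

      // Hypothetical wrapper around a compaction-filter call; reports whether it was timed.
      bool CallFilterTimed(StatsLevel level, uint64_t* elapsed_nanos) {
        ScopedTimer timer(level, elapsed_nanos);
        // ... the CompactionFilter would be invoked here ...
        return timer.enabled();
      }
      ```

      On the disabled path both clock reads are skipped. The only cost left is one comparison at construction.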
    • Clean up staging directory at start of checkpoint (#4035) · 0a5b16c7
      Andrew Kryczka committed
      Summary:
      - Attempt to clean the checkpoint staging directory before starting a checkpoint. It was already cleaned up at the end of a checkpoint, but not in the edge case where the process crashed while staging checkpoint files.
      - Attempt to clean the checkpoint directory before calling `Checkpoint::Create` in `db_stress`. This handles the case where the checkpoint directory was created by a previous `db_stress` run but the process crashed before cleaning it up.
      - Use `DestroyDB` to clean the checkpoint directory, since a checkpoint is a DB.
      Closes https://github.com/facebook/rocksdb/pull/4035
      
      Reviewed By: yiwu-arbug
      
      Differential Revision: D8580223
      
      Pulled By: ajkr
      
      fbshipit-source-id: 28c667400e249fad0fdedc664b349031b7b61599
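
      Here is a minimal sketch of the clean-before-start flow, using C++17 `std::filesystem`. It makes three assumptions. The staging directory is the checkpoint path with a `.tmp` suffix. `std::filesystem::remove_all` stands in for the commit's `DestroyDB`. A single marker file replaces the real DB files. `CreateCheckpointSketch` is a hypothetical name.

      ```cpp
      #include <cassert>
      #include <filesystem>
      #include <fstream>
      #include <system_error>

      namespace fs = std::filesystem;

      // Hypothetical checkpoint routine showing the staging-directory lifecycle.
      // Assumption: staging lives at "<checkpoint_dir>.tmp"; remove_all stands in
      // for DestroyDB, and one marker file stands in for the copied DB files.
      bool CreateCheckpointSketch(const fs::path& checkpoint_dir) {
        assert(!checkpoint_dir.empty());
        fs::path staging = checkpoint_dir;
        staging += ".tmp";
        std::error_code ec;

        // 1. Clean up leftovers from a run that crashed while staging files.
        fs::remove_all(staging, ec);
        if (ec) return false;

        // 2. Never overwrite an existing checkpoint.
        if (fs::exists(checkpoint_dir, ec) || ec) return false;

        // 3. Stage the checkpoint contents.
        fs::create_directories(staging, ec);
        if (ec) return false;
        {
          std::ofstream marker(staging / "CURRENT");
          marker << "MANIFEST-000001\n";
          if (!marker) return false;
        }  // marker is flushed and closed here, before the rename

        // 4. Publish the finished checkpoint with a single rename.
        fs::rename(staging, checkpoint_dir, ec);
        return !ec;
      }
      ```

      Step 1 runs unconditionally. A crash at any point before the rename therefore leaves only a staging directory, which the next attempt removes.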
    • Assert for Direct IO at the beginning in PositionedRead (#3891) · 645e57c2
      Sagar Vemuri committed
      Summary:
      Moved the direct-IO assertion to the top in `PosixSequentialFile::PositionedRead`, as it doesn't make sense to check for sector alignments before checking for direct IO.
      Closes https://github.com/facebook/rocksdb/pull/3891
      
      Differential Revision: D8267972
      
      Pulled By: sagar0
      
      fbshipit-source-id: 0ecf77c0fb5c35747a4ddbc15e278918c0849af7
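
      The reordering can be illustrated with a toy precondition check. Instead of asserting, it returns which requirement failed first, so the order is observable. `CheckPositionedRead` and `ReadCheck` are hypothetical names. The commit does not list the real function's full set of checks.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>

      enum class ReadCheck { kOk, kNotDirectIO, kMisalignedOffset, kMisalignedLength };

      // Hypothetical check in the order described above: direct IO is verified
      // first, because sector alignment is only meaningful for direct IO.
      ReadCheck CheckPositionedRead(bool use_direct_io, uint64_t offset, std::size_t n,
                                    std::size_t sector_size) {
        assert(sector_size > 0);
        if (!use_direct_io) return ReadCheck::kNotDirectIO;
        if (offset % sector_size != 0) return ReadCheck::kMisalignedOffset;
        if (n % sector_size != 0) return ReadCheck::kMisalignedLength;
        return ReadCheck::kOk;
      }
      ```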
    • Update TARGETS file (#4028) · 58c22144
      Yi Wu committed
      Summary:
      `-Wshorten-64-to-32` is an invalid flag in fbcode. Changing it to `-Wnarrowing`.
      Closes https://github.com/facebook/rocksdb/pull/4028
      
      Differential Revision: D8553694
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 1523cbcb4c76cf1d2b10a4d28b5f58c78e6cb876
    • Fix a warning (treated as error) caused by type mismatch. · 39749596
      Yanqin Jin committed
      Summary: Closes https://github.com/facebook/rocksdb/pull/4032
      
      Differential Revision: D8573061
      
      Pulled By: riversand963
      
      fbshipit-source-id: 112324dcb35956d6b3ec891073f4f21493933c8b
    • Improve direct IO range scan performance with readahead (#3884) · 7103559f
      Sagar Vemuri committed
      Summary:
      This PR extends the improvements in #3282 to also work when using Direct IO.
      We see **4.5X performance improvement** in seekrandom benchmark doing long range scans, when using direct reads, on flash.
      
      **Description:**
      This change improves the performance of iterators doing long range scans (e.g. big/full index or table scans in MyRocks) by using readahead: each disk IO prefetches additional data and stores it in a local buffer. Prefetching is enabled automatically once more than 2 IOs are observed for the same table file during iteration. The readahead size starts at 8KB and grows exponentially on each additional sequential IO, up to a maximum of 256KB. This cuts down the number of IOs needed to complete the range scan.
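
      The sizing rule in the paragraph above can be sketched as a small state machine. This is a hypothetical model. It assumes readahead starts on the third IO and doubles on every IO after that. It ignores how the real iterator decides that IOs are sequential. `AutoReadahead` is not a RocksDB class.

      ```cpp
      #include <cassert>
      #include <cstddef>

      // Hypothetical model of the auto-readahead sizing rule: no readahead for the
      // first two IOs on a file, then 8 KB, doubling per IO up to 256 KB.
      class AutoReadahead {
       public:
        static constexpr std::size_t kInitialSize = 8 * 1024;
        static constexpr std::size_t kMaxSize = 256 * 1024;
        static constexpr int kIOsBeforeReadahead = 2;

        // Call once per disk IO on the same table file. Returns the readahead size
        // to use for this IO (0 means a plain read with no readahead).
        std::size_t OnRead() {
          ++num_ios_;
          if (num_ios_ <= kIOsBeforeReadahead) return 0;
          const std::size_t current = readahead_size_;
          const std::size_t doubled = readahead_size_ * 2;
          if (doubled < kMaxSize) {
            readahead_size_ = doubled;   // grow exponentially
          } else {
            readahead_size_ = kMaxSize;  // cap at the maximum
          }
          assert(current <= kMaxSize);
          return current;
        }

       private:
        int num_ios_ = 0;
        std::size_t readahead_size_ = kInitialSize;
      };
      ```

      The returned sizes run 0, 0, 8 KB, 16 KB, 32 KB, 64 KB, 128 KB, 256 KB, and then stay at 256 KB.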
      
      **Implementation Details:**
      - Used `FilePrefetchBuffer` as the underlying buffer to store the readahead data. `FilePrefetchBuffer` can now take file_reader, readahead_size and max_readahead_size as input to the constructor, and automatically do readahead.
      - `FilePrefetchBuffer::TryReadFromCache` can now call `FilePrefetchBuffer::Prefetch` if readahead is enabled.
      - `AlignedBuffer` (which is the underlying store for `FilePrefetchBuffer`) now takes a few additional args in `AlignedBuffer::AllocateNewBuffer` to allow copying data from the old buffer.
      - Avoided re-reading from the device any partial chunks of data that were already available in the buffer.
      - Fixed a couple of cases where `AlignedBuffer::cursize_` was not being properly kept up-to-date.
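
      Two behaviours from the list above can be modelled over an in-memory string: serving reads from cache or prefetching on a miss, and reusing bytes that are already buffered. `ToyPrefetchBuffer` is a hypothetical simplification, not `FilePrefetchBuffer`. It has no alignment handling, no growing readahead, and no real file reader.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <string>
      #include <utility>

      // Hypothetical, simplified prefetch buffer over an in-memory "file". It counts
      // the bytes fetched from the simulated device so that reuse can be observed.
      class ToyPrefetchBuffer {
       public:
        ToyPrefetchBuffer(const std::string* file, std::size_t readahead_size)
            : file_(file), readahead_size_(readahead_size) {
          assert(file_ != nullptr);
        }

        // Buffers [offset, offset + n), clipped to the file size. Bytes already in
        // the buffer at the start of that range are kept rather than re-read.
        void Prefetch(uint64_t offset, std::size_t n) {
          uint64_t end = offset + n;
          if (end > file_->size()) end = file_->size();
          if (offset >= end) return;
          std::string fresh;
          uint64_t read_from = offset;
          const uint64_t buffered_end = buffer_offset_ + buffer_.size();
          if (offset >= buffer_offset_ && offset < buffered_end) {
            fresh = buffer_.substr(static_cast<std::size_t>(offset - buffer_offset_));
            read_from = buffered_end;
          }
          if (read_from < end) {  // simulated device read of the missing tail only
            const std::size_t len = static_cast<std::size_t>(end - read_from);
            fresh += file_->substr(static_cast<std::size_t>(read_from), len);
            device_bytes_read_ += len;
          }
          buffer_offset_ = offset;
          buffer_ = std::move(fresh);
        }

        // Serves the read from the buffer. On a miss, prefetches n + readahead bytes
        // if readahead is enabled; otherwise reports the miss to the caller.
        bool TryReadFromCache(uint64_t offset, std::size_t n, std::string* out) {
          if (!Covers(offset, n)) {
            if (readahead_size_ == 0) return false;
            Prefetch(offset, n + readahead_size_);
            if (!Covers(offset, n)) return false;  // e.g. the range passes end of file
          }
          *out = buffer_.substr(static_cast<std::size_t>(offset - buffer_offset_), n);
          return true;
        }

        std::size_t device_bytes_read() const { return device_bytes_read_; }

       private:
        bool Covers(uint64_t offset, std::size_t n) const {
          return offset >= buffer_offset_ &&
                 offset + n <= buffer_offset_ + buffer_.size();
        }

        const std::string* file_;
        std::size_t readahead_size_;
        uint64_t buffer_offset_ = 0;
        std::string buffer_;
        std::size_t device_bytes_read_ = 0;
      };
      ```

      For example, take a 16-byte readahead and 10-byte reads at offsets 0, 10 and 20. The first read fetches 26 bytes. The second is served entirely from the buffer. The third fetches only 20 more, because the 6 bytes at offsets 20 to 25 are already buffered.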
      
      **Constraints:**
      - Similar to #3282, this is currently enabled only when `ReadOptions.readahead_size = 0` (the default value).
      - Since the prefetched data is stored in a temporary heap-allocated buffer, memory usage could increase if many iterators are doing long range scans simultaneously.
      - Enabled only for user reads and disabled for compactions. Compaction reads are controlled by the options `use_direct_io_for_flush_and_compaction` and `compaction_readahead_size`, and this feature takes care not to interfere with them.
      
      **Benchmarks:**
      I used the same benchmark as used in #3282.
      Data fill:
      ```
      TEST_TMPDIR=/data/users/$USER/benchmarks/iter ./db_bench -benchmarks=fillrandom -num=1000000000 -compression_type="none" -level_compaction_dynamic_level_bytes
      ```
      
      Do a long range scan: seekrandom with a large number of nexts
      ```
      TEST_TMPDIR=/data/users/$USER/benchmarks/iter ./db_bench -benchmarks=seekrandom -use_direct_reads -duration=60 -num=1000000000 -use_existing_db -seek_nexts=10000 -statistics -histogram
      ```
      
      ```
      Before:
      seekrandom   :   37939.906 micros/op 26 ops/sec;   29.2 MB/s (1636 of 1999 found)
      With this change:
      seekrandom   :   8527.720 micros/op 117 ops/sec;  129.7 MB/s (6530 of 7999 found)
      ```
      ~4.5X perf improvement. Taken on an average of 3 runs.
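
      As a cross-check, the ratios of the single-run figures shown above are consistent with the reported ~4.5X:

      ```latex
      \frac{37939.906\,\mu\text{s/op}}{8527.720\,\mu\text{s/op}} \approx 4.45, \qquad
      \frac{129.7\,\text{MB/s}}{29.2\,\text{MB/s}} \approx 4.44, \qquad
      \frac{117\,\text{ops/s}}{26\,\text{ops/s}} = 4.5
      ```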
      Closes https://github.com/facebook/rocksdb/pull/3884
      
      Differential Revision: D8082143
      
      Pulled By: sagar0
      
      fbshipit-source-id: 4d7a8561cbac03478663713df4d31ad2620253bb
  3. 21 Jun, 2018 · 2 commits
    • Add file name info to SequentialFileReader. (#4026) · 524c6e6b
      Yanqin Jin committed
      Summary:
      We potentially need this information for tracing, profiling and diagnosis.
      Closes https://github.com/facebook/rocksdb/pull/4026
      
      Differential Revision: D8555214
      
      Pulled By: riversand963
      
      fbshipit-source-id: 4263e06c00b6d5410b46aa46eb4e358ff2161dd2
    • Support file ingestion in stress test (#4018) · 14cee194
      Andrew Kryczka committed
      Summary:
      Once every `ingest_external_file_one_in` operations, uses SstFileWriter to create a file containing `ingest_external_file_width` consecutive keys. The file name contains the thread ID to avoid clashes. The file is then added to the DB using `IngestExternalFile`.

      We can't enable it by default in the crash test because `nooverwritepercent` and `test_batches_snapshot` must both be zero for the DB's whole lifetime. Perhaps we should set up a separate test with that config, since range deletion also requires it.
      Closes https://github.com/facebook/rocksdb/pull/4018
      
      Differential Revision: D8507698
      
      Pulled By: ajkr
      
      fbshipit-source-id: 1437ea26fd989349a9ce8b94117241c65e40f10f
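
      The per-operation decision can be sketched as follows. Everything here is hypothetical: `MaybePlanIngestion`, `IngestionPlan`, `FormatKey`, and the `external_file_<thread_id>.sst` naming. The real stress test writes the keys with `SstFileWriter` and calls `IngestExternalFile`. This self-contained sketch invokes neither.

      ```cpp
      #include <cassert>
      #include <cstdint>
      #include <cstdio>
      #include <random>
      #include <string>
      #include <vector>

      // Hypothetical description of one ingestion step in a stress-test thread.
      struct IngestionPlan {
        std::string file_name;          // includes the thread ID to avoid clashes
        std::vector<std::string> keys;  // `width` consecutive keys, ascending
      };

      // Zero-padded decimal so that string order matches numeric order.
      std::string FormatKey(uint64_t k) {
        char buf[24];
        std::snprintf(buf, sizeof(buf), "%020llu", static_cast<unsigned long long>(k));
        return std::string(buf);
      }

      // Returns true, and fills *plan, roughly once every `one_in` calls.
      bool MaybePlanIngestion(std::mt19937_64& rng, uint64_t one_in, uint64_t width,
                              uint64_t max_key, int thread_id, IngestionPlan* plan) {
        assert(one_in > 0 && width > 0 && width <= max_key && plan != nullptr);
        if (std::uniform_int_distribution<uint64_t>(0, one_in - 1)(rng) != 0) return false;
        const uint64_t start = std::uniform_int_distribution<uint64_t>(0, max_key - width)(rng);
        plan->file_name = "external_file_" + std::to_string(thread_id) + ".sst";
        plan->keys.clear();
        for (uint64_t k = start; k < start + width; ++k) plan->keys.push_back(FormatKey(k));
        return true;
      }
      ```

      Consecutive keys suit this step because `SstFileWriter` requires keys to be added in increasing order. Zero-padding keeps string order equal to numeric order.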
  4. 20 Jun, 2018 · 4 commits
  5. 19 Jun, 2018 · 5 commits
  6. 18 Jun, 2018 · 1 commit
  7. 16 Jun, 2018 · 9 commits
  8. 14 Jun, 2018 · 4 commits
    • Check with PosixEnv before opening LOCK file (#3993) · 1f32dc7d
      Andrew Kryczka committed
      Summary:
      Rebased and resubmitting #1831 on behalf of stevelittle.
      
      The problem: when a single process attempts to open the same DB twice, the second attempt fails because the LOCK file is held. If the second attempt had opened the LOCK file, it then needs to close it, and closing it unlocks the file. Any subsequent attempt to open the DB then succeeds, which is the wrong behavior.

      The solution is to track in PosixEnv which files the process has locked, and to check them before opening a LOCK file.
      
      Fixes #1780.
      Closes https://github.com/facebook/rocksdb/pull/3993
      
      Differential Revision: D8398984
      
      Pulled By: ajkr
      
      fbshipit-source-id: 2755fe66950a0c9de63075f932f9e15768041918
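
      Some background on why the lock was lost: POSIX `fcntl` record locks are owned by the process. Closing any descriptor the process holds for a file releases all of its locks on that file. The fix can be modelled as a process-wide registry that is consulted before the LOCK file is opened. `LockRegistry` is a hypothetical name; this is not the PosixEnv code.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <mutex>
      #include <set>
      #include <string>

      // Hypothetical process-wide record of LOCK files this process already holds.
      class LockRegistry {
       public:
        // Returns false if this process already holds `fname`. The caller must then
        // fail without opening (and later closing) the file.
        bool TryAcquire(const std::string& fname) {
          std::lock_guard<std::mutex> guard(mu_);
          return locked_.insert(fname).second;
        }

        // Called when the DB that owns the lock releases it.
        void Release(const std::string& fname) {
          std::lock_guard<std::mutex> guard(mu_);
          std::size_t erased = locked_.erase(fname);
          assert(erased == 1);
          (void)erased;
        }

       private:
        std::mutex mu_;
        std::set<std::string> locked_;
      };
      ```

      A second `TryAcquire` for the same path fails without ever opening the file. No descriptor is closed, so the original lock survives.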
    • Run manual compaction in stress/crash tests (#3936) · 7497f992
      Andrew Kryczka committed
      Summary:
      - Add support to `db_stress` for `CompactRange`
      - Enable `CompactRange` and `CompactFiles` in crash tests
      Closes https://github.com/facebook/rocksdb/pull/3936
      
      Differential Revision: D8230953
      
      Pulled By: ajkr
      
      fbshipit-source-id: 208f9980b5bc8c204b1fa726e83791ad674e21e8
    • Choose unique keys faster in db_stress (#3990) · dd216dd7
      Andrew Kryczka committed
      Summary:
      db_stress initialization randomly chooses a set of keys not to overwrite. It did this separately for each column family, which caused 30+ second initialization times for the non-simple crash tests, since they have 10 CFs. This PR:
      
      - reuses the same set of randomly chosen no-overwrite keys across all CFs
      - logs a couple more timestamps so we can more easily see initialization time
      Closes https://github.com/facebook/rocksdb/pull/3990
      
      Differential Revision: D8393821
      
      Pulled By: ajkr
      
      fbshipit-source-id: d0b263a298df607285ffdd8b0983ff6575cc6c34
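
      Sharing one set across column families can be sketched like this. The commit does not say how the keys are sampled. The shuffle below is just one way to get unique keys without retries. `ChooseNoOverwriteKeys` and `NoOverwriteKeys` are hypothetical names.

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <numeric>
      #include <random>
      #include <unordered_set>
      #include <vector>

      // Hypothetical: choose `count` distinct keys in [0, max_key) that must never be
      // overwritten. Shuffling a key range yields unique keys with no retry loop.
      std::unordered_set<int64_t> ChooseNoOverwriteKeys(int64_t max_key, int64_t count,
                                                        uint64_t seed) {
        assert(count >= 0 && count <= max_key);
        std::vector<int64_t> keys(static_cast<std::size_t>(max_key));
        std::iota(keys.begin(), keys.end(), int64_t{0});
        std::mt19937_64 rng(seed);
        std::shuffle(keys.begin(), keys.end(), rng);
        return std::unordered_set<int64_t>(keys.begin(),
                                           keys.begin() + static_cast<std::ptrdiff_t>(count));
      }

      // Hypothetical shared state: the set is built once and consulted for every
      // column family, instead of being rebuilt per column family.
      class NoOverwriteKeys {
       public:
        NoOverwriteKeys(int64_t max_key, int64_t count, uint64_t seed)
            : keys_(ChooseNoOverwriteKeys(max_key, count, seed)) {}

        // The column family is intentionally ignored: all CFs share one set.
        bool AllowsOverwrite(int /*column_family*/, int64_t key) const {
          return keys_.count(key) == 0;
        }

        std::size_t size() const { return keys_.size(); }

       private:
        std::unordered_set<int64_t> keys_;
      };
      ```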
    • Avoid acquiring SyncPoint mutex when it is disabled (#3991) · a7204018
      Andrew Kryczka committed
      Summary:
      In the `db_stress` profile, the vast majority of CPU time is spent acquiring the `SyncPoint` mutex. I mistakenly assumed #3939 had fixed this contention by disabling `SyncPoint` processing, but the lock was still being acquired just to check whether processing is enabled. We avoid that overhead by using an atomic to track whether processing is enabled.
      Closes https://github.com/facebook/rocksdb/pull/3991
      
      Differential Revision: D8393825
      
      Pulled By: ajkr
      
      fbshipit-source-id: 5bc4e3c722ee7304e7a9c2439998c456b05a6897
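
      The fast-path check described above looks roughly like this in a toy registry. `ToySyncPoint` is hypothetical and much simpler than RocksDB's `SyncPoint`. It only shows that a disabled registry costs one atomic load and no mutex acquisition.

      ```cpp
      #include <atomic>
      #include <cassert>
      #include <functional>
      #include <map>
      #include <mutex>
      #include <string>
      #include <utility>

      // Hypothetical sync-point registry. When processing is disabled, Process()
      // returns after a single atomic load and never touches the mutex.
      class ToySyncPoint {
       public:
        void EnableProcessing() { enabled_.store(true, std::memory_order_release); }
        void DisableProcessing() { enabled_.store(false, std::memory_order_release); }

        void SetCallback(const std::string& point, std::function<void()> callback) {
          assert(callback);
          std::lock_guard<std::mutex> guard(mu_);
          callbacks_[point] = std::move(callback);
        }

        void Process(const std::string& point) {
          if (!enabled_.load(std::memory_order_acquire)) return;  // fast path, no lock
          std::function<void()> callback;
          {
            std::lock_guard<std::mutex> guard(mu_);
            locked_calls_.fetch_add(1, std::memory_order_relaxed);
            auto it = callbacks_.find(point);
            if (it != callbacks_.end()) callback = it->second;
          }
          if (callback) callback();  // run outside the lock so callbacks may re-enter
        }

        int locked_calls() const { return locked_calls_.load(std::memory_order_relaxed); }

       private:
        std::atomic<bool> enabled_{false};
        std::atomic<int> locked_calls_{0};
        std::mutex mu_;
        std::map<std::string, std::function<void()>> callbacks_;
      };
      ```

      The flag is not re-checked under the lock. A `Process` call racing with `DisableProcessing` may therefore still run one callback, which is usually acceptable for test-only instrumentation.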
  9. 13 Jun, 2018 · 4 commits
    • Fix regression bug of Prev() with upper bound (#3989) · d82f1421
      Siying Dong committed
      Summary:
      A recent change pushed the upper bound check down to child iterators. However, this breaks the logic of the following sequence:
        Seek(key);
        if (!Valid()) SeekToLast();
      because `!Valid()` may be caused by the upper bound rather than by reaching the end of the iterator. In that case `SeekToLast()` points to a completely wrong place, which can cause wrong results, infinite loops, or segfaults.
      This sequence is called when changing direction from forward to backward, and it also happens implicitly during the reseek optimization in Prev().

      Fix this bug by using SeekForPrev() instead of this sequence, as is already done in the prefix extractor case.
      Closes https://github.com/facebook/rocksdb/pull/3989
      
      Differential Revision: D8385422
      
      Pulled By: siying
      
      fbshipit-source-id: 429e869990cfd2dc389421e0836fc496bed67bb4
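
      The failure can be reproduced with a toy child iterator. The model loosely follows a merging iterator switching from forward to backward, where each child must land on its last key below the current key. Assumptions: the child applies the exclusive upper bound in `Seek` but not in `SeekToLast`. `ToyChildIter` and both helper functions are hypothetical.

      ```cpp
      #include <algorithm>
      #include <cassert>
      #include <cstddef>
      #include <string>
      #include <utility>
      #include <vector>

      // Hypothetical child iterator over sorted keys. By assumption it applies the
      // exclusive upper bound in Seek() but not in SeekToLast().
      class ToyChildIter {
       public:
        ToyChildIter(std::vector<std::string> keys, std::string upper_bound)
            : keys_(std::move(keys)), upper_bound_(std::move(upper_bound)) {
          assert(std::is_sorted(keys_.begin(), keys_.end()));
        }

        bool Valid() const { return pos_ >= 0; }
        const std::string& key() const { return keys_[static_cast<std::size_t>(pos_)]; }

        // First key >= target; invalid if none exists or it is at/after the bound.
        void Seek(const std::string& target) {
          auto it = std::lower_bound(keys_.begin(), keys_.end(), target);
          pos_ = (it == keys_.end() || *it >= upper_bound_)
                     ? -1
                     : static_cast<int>(it - keys_.begin());
        }

        // Last key in the child, ignoring the upper bound.
        void SeekToLast() { pos_ = static_cast<int>(keys_.size()) - 1; }

        // Last key <= target; invalid if none exists.
        void SeekForPrev(const std::string& target) {
          auto it = std::upper_bound(keys_.begin(), keys_.end(), target);
          pos_ = static_cast<int>(it - keys_.begin()) - 1;
        }

        void Prev() {
          if (pos_ >= 0) --pos_;  // stepping before the first key makes it invalid
        }

       private:
        std::vector<std::string> keys_;
        std::string upper_bound_;
        int pos_ = -1;  // -1 means invalid
      };

      // Old direction switch: "invalid after Seek" is read as "past the last key".
      void SwitchToBackwardOld(ToyChildIter* child, const std::string& current_key) {
        child->Seek(current_key);
        if (child->Valid()) {
          child->Prev();        // first key >= current_key, step back below it
        } else {
          child->SeekToLast();  // wrong when the bound, not the end, caused !Valid()
        }
      }

      // Fixed direction switch: land on the last key <= current_key, then make it strict.
      void SwitchToBackwardFixed(ToyChildIter* child, const std::string& current_key) {
        child->SeekForPrev(current_key);
        if (child->Valid() && child->key() == current_key) child->Prev();
      }
      ```

      Take keys `a, b, d, e`, upper bound `c`, and current key `bb`. The old sequence lands on `e`, beyond the bound, while `SeekForPrev` lands on `b`.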
    • Fix argument mismatch in BlockBasedTableBuilder (#3974) · 9d347332
      Andrew Kryczka committed
      Summary:
      The sixth argument should be `key_includes_seq` bool, the seventh a `GetContext*`. We were mistakenly passing the `GetContext*` as the sixth argument and relying on the default (nullptr) for the seventh. This would make statistics inaccurate, at least.
      
      Blame: 402b7aa0
      Closes https://github.com/facebook/rocksdb/pull/3974
      
      Differential Revision: D8344907
      
      Pulled By: ajkr
      
      fbshipit-source-id: 3ad865a0541d6d30f75dfc726352788118cfe12e
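
      This compiled for two reasons. A pointer converts implicitly to `bool`, and the defaulted last parameter absorbed the missing argument. A hypothetical function with the same parameter shape shows the effect. `NewIteratorSketch` and `ReceivedArgs` are illustrations, not the RocksDB signatures.

      ```cpp
      #include <cassert>

      struct GetContext {};  // hypothetical stand-in for the real type

      // Records what the callee actually received.
      struct ReceivedArgs {
        bool key_includes_seq;
        GetContext* get_context;
      };

      // Hypothetical function with the parameter shape described above: the first
      // five arguments are reduced to ints, the sixth is a bool, and the seventh is
      // a GetContext* defaulting to nullptr.
      ReceivedArgs NewIteratorSketch(int, int, int, int, int, bool key_includes_seq,
                                     GetContext* get_context = nullptr) {
        return ReceivedArgs{key_includes_seq, get_context};
      }
      ```

      A scoped enum in place of such a `bool` would turn this kind of slip into a compile error, since pointers do not convert to enum types.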
    • Fix a crash in WinEnvIO::GetSectorSize (#3975) · 9c7da963
      shpala committed
      Summary:
      Fix a crash in `WinEnvIO::GetSectorSize` that happens on old Windows systems (e.g., Windows 7).
      On old Windows systems that don't support querying `StorageAccessAlignmentProperty` via `IOCTL_STORAGE_QUERY_PROPERTY`, the code falls back to a different `DeviceIoControl` call that passes `nullptr` as `lpBytesReturned`.
      When execution reaches that call, it triggers an access violation.
      Closes https://github.com/facebook/rocksdb/pull/3975
      
      Differential Revision: D8385186
      
      Pulled By: ajkr
      
      fbshipit-source-id: fae4c9b4b0a52c8a10182e1b35bcaa30dc393bbb
    • Remove restart point from the properties_block (#3970) · 35932753
      Fenggang Wu committed
      Summary:
      The properties block is read sequentially and cached in a heap-allocated
      object, so it doesn't need restart points. We therefore set the restart
      interval to infinity to save space.
      Closes https://github.com/facebook/rocksdb/pull/3970
      
      Differential Revision: D8332586
      
      Pulled By: fgwu
      
      fbshipit-source-id: 899c3267832a81d0f084ec2db6b387332f461134
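
      A rough estimate of the space involved, as a hypothetical helper. It assumes each restart offset and the trailing restart count are 32-bit values, and that the first entry always starts a restart run. `kNoRestarts` stands in for an infinite interval.

      ```cpp
      #include <cassert>
      #include <cstddef>
      #include <cstdint>
      #include <limits>

      // Hypothetical stand-in for an "infinite" restart interval.
      constexpr std::size_t kNoRestarts = std::numeric_limits<std::size_t>::max();

      // Bytes used by the restart array of a block with `num_entries` entries.
      // Assumptions: every restart offset and the trailing restart count are 32-bit
      // integers, and the first entry always begins a restart run.
      std::size_t RestartTrailerBytes(std::size_t num_entries, std::size_t restart_interval) {
        assert(restart_interval > 0);
        const std::size_t restarts =
            num_entries == 0 ? 1 : 1 + (num_entries - 1) / restart_interval;
        return sizeof(uint32_t) * (restarts + 1);
      }
      ```

      For 100 entries with an interval of 16, this gives 7 restarts and 32 bytes, versus 8 bytes with no extra restarts. Keys after the first can also stay prefix-compressed, because only restart points store a full key.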
  10. 09 Jun, 2018 · 1 commit
  11. 08 Jun, 2018 · 1 commit