1. 31 Jul 2019 (1 commit)
    • Improve CPU Efficiency of ApproximateSize (part 2) (#5609) · 4834dab5
      Committed by Eli Pozniansky
      Summary:
      In some cases, we don't have to get a really accurate number; something like 10% off is fine, and we can create a new option for that use case. In that case, we can calculate the size of fully covered files first, and avoid estimation inside SST files if the full files already give us a huge number. For example, if we have already covered 100GB of data, we should be able to skip partial dives into 10 SST files of 30MB each.
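      The skip heuristic described above can be sketched in plain C++ as follows. `FileSizeInfo`, the error-margin parameter, and the half-file fallback are illustrative assumptions, not the actual RocksDB implementation:

      ```cpp
      #include <cstdint>
      #include <vector>

      // Hypothetical summary of one SST file's overlap with the queried range.
      struct FileSizeInfo {
        uint64_t file_size;    // total size of the SST file
        bool fully_contained;  // file lies entirely inside the queried range
      };

      // Approximate the size of a key range. Fully covered files are counted
      // exactly; partially overlapping files are only "dived into" when the
      // exact total so far is small relative to what they could add.
      uint64_t ApproximateRangeSize(const std::vector<FileSizeInfo>& files,
                                    double error_margin /* e.g. 0.1 ~= 10% */) {
        uint64_t total = 0;
        uint64_t partial_sum = 0;
        for (const auto& f : files) {
          if (f.fully_contained) {
            total += f.file_size;        // exact and cheap: no index lookups
          } else {
            partial_sum += f.file_size;  // upper bound on what partials add
          }
        }
        // If even counting every partial file in full would change the result
        // by less than the allowed margin, skip the expensive estimation.
        if (partial_sum <= error_margin * static_cast<double>(total)) {
          return total;
        }
        // Otherwise estimate inside each partial file (elided in this sketch:
        // assume half of each partial file overlaps, purely for illustration).
        for (const auto& f : files) {
          if (!f.fully_contained) total += f.file_size / 2;
        }
        return total;
      }
      ```

      With 100GB of fully covered files, ten 30MB partial files fall under a 10% margin and are skipped entirely.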
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5609
      
      Differential Revision: D16433481
      
      Pulled By: elipoz
      
      fbshipit-source-id: 5830b31e1c656d0fd3a00d7fd2678ddc8f6e601b
  2. 26 Jul 2019 (1 commit)
  3. 18 Jul 2019 (1 commit)
    • Export Import sst files (#5495) · 22ce4624
      Committed by Venki Pallipadi
      Summary:
      Refresh of the earlier change here - https://github.com/facebook/rocksdb/issues/5135
      
      This is a review request for code change needed for - https://github.com/facebook/rocksdb/issues/3469
      "Add support for taking snapshot of a column family and creating column family from a given CF snapshot"
      
      We have an implementation for this that we have been testing internally. We have two new APIs that together provide this functionality.
      
      (1) ExportColumnFamily() - This API is modelled after CreateCheckpoint() as below.
      // Exports all live SST files of a specified Column Family onto export_dir,
      // returning SST files information in metadata.
      // - SST files will be created as hard links when the directory specified
      //   is in the same partition as the db directory, copied otherwise.
      // - export_dir should not already exist and will be created by this API.
      // - Always triggers a flush.
      virtual Status ExportColumnFamily(ColumnFamilyHandle* handle,
                                        const std::string& export_dir,
                                        ExportImportFilesMetaData** metadata);
      
      Internally, the API will DisableFileDeletions(), call GetColumnFamilyMetaData(), parse through the
      metadata creating links/copies of all the sst files, EnableFileDeletions(), and complete the call by
      returning the list of file metadata.
      
      (2) CreateColumnFamilyWithImport() - This API is modeled after IngestExternalFile(), but invoked only during a CF creation as below.
      // CreateColumnFamilyWithImport() will create a new column family with
      // column_family_name and import external SST files specified in metadata into
      // this column family.
      // (1) External SST files can be created using SstFileWriter.
      // (2) External SST files can be exported from a particular column family in
      //     an existing DB.
      // Option in import_options specifies whether the external files are copied or
      // moved (default is copy). When option specifies copy, managing files at
      // external_file_path is caller's responsibility. When option specifies a
      // move, the call ensures that the specified files at external_file_path are
      // deleted on successful return and files are not modified on any error
      // return.
      // On error return, column family handle returned will be nullptr.
      // ColumnFamily will be present on successful return and will not be present
      // on error return. ColumnFamily may be present on any crash during this call.
      virtual Status CreateColumnFamilyWithImport(
          const ColumnFamilyOptions& options, const std::string& column_family_name,
          const ImportColumnFamilyOptions& import_options,
          const ExportImportFilesMetaData& metadata,
          ColumnFamilyHandle** handle);
      
      Internally, this API creates a new CF, parses all the sst files and adds them to the specified column family, at the same level and with the same sequence number as in the metadata. It also performs safety checks with respect to overlaps between the sst files being imported.
      
      If the incoming sequence number is higher than the current local sequence number, the local
      sequence number is updated to reflect this.
      
      Note: as the sst files are being moved across Column Families, the Column Family name in the sst file
      will no longer match the actual column family on the destination DB. The API does not modify the
      Column Family name or id in the sst files being imported.
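      A minimal usage sketch of the two APIs together. In the released headers, ExportColumnFamily lives on the `Checkpoint` utility class; the paths, CF name, and helper function below are illustrative, and error handling is abbreviated:

      ```cpp
      #include <string>

      #include "rocksdb/db.h"
      #include "rocksdb/utilities/checkpoint.h"

      using namespace rocksdb;

      // Export a column family from src_db and re-create it in dst_db.
      Status CloneColumnFamily(DB* src_db, ColumnFamilyHandle* src_cf,
                               DB* dst_db, const std::string& export_dir,
                               ColumnFamilyHandle** dst_cf) {
        Checkpoint* checkpoint = nullptr;
        Status s = Checkpoint::Create(src_db, &checkpoint);
        if (!s.ok()) return s;

        // Snapshot the CF's live SST files (hard links when the export dir is
        // on the same partition, copies otherwise). Triggers a flush.
        ExportImportFilesMetaData* metadata = nullptr;
        s = checkpoint->ExportColumnFamily(src_cf, export_dir, &metadata);
        delete checkpoint;
        if (!s.ok()) return s;

        // Import the exported files as a brand-new column family.
        ImportColumnFamilyOptions import_opts;
        import_opts.move_files = false;  // copy; caller keeps export_dir files
        s = dst_db->CreateColumnFamilyWithImport(ColumnFamilyOptions(),
                                                 "cloned_cf", import_opts,
                                                 *metadata, dst_cf);
        delete metadata;
        return s;
      }
      ```

      Note that export_dir must not already exist; the export step creates it.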
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5495
      
      Differential Revision: D16018881
      
      fbshipit-source-id: 9ae2251025d5916d35a9fc4ea4d6707f6be16ff9
  4. 21 Jun 2019 (1 commit)
  5. 15 Jun 2019 (1 commit)
    • Validate CF Options when creating a new column family (#5453) · f1219644
      Committed by Sagar Vemuri
      Summary:
      It seems like CF Options are not properly validated when creating a new column family with the `CreateColumnFamily` API; only a selected few checks are done. Calling `ColumnFamilyData::ValidateOptions`, which is the single source for all CFOptions validations, will help fix this. (`ColumnFamilyData::ValidateOptions` is already called at the time of `DB::Open`.)
      
      **Test Plan:**
      Added a new test: `DBTest.CreateColumnFamilyShouldFailOnIncompatibleOptions`
      ```
      TEST_TMPDIR=/dev/shm ./db_test --gtest_filter=DBTest.CreateColumnFamilyShouldFailOnIncompatibleOptions
      ```
      Also ran gtest-parallel to make sure the new test is not flaky.
      ```
      TEST_TMPDIR=/dev/shm ~/gtest-parallel/gtest-parallel ./db_test --gtest_filter=DBTest.CreateColumnFamilyShouldFailOnIncompatibleOptions --repeat=10000
      [10000/10000] DBTest.CreateColumnFamilyShouldFailOnIncompatibleOptions (15 ms)
      ```
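      What callers observe is that incompatible CF options now fail up front, at creation time. A hedged sketch; the incompatible pair chosen here (in-place memtable updates together with the default concurrent memtable writes) is purely illustrative:

      ```cpp
      #include "rocksdb/db.h"

      using namespace rocksdb;

      // Attempt to create a CF whose options should fail validation; returns
      // the status so the caller can inspect the rejection.
      Status CreateCFWithBadOptions(DB* db) {
        ColumnFamilyOptions bad_cf_opts;
        // In-place memtable updates clash with concurrent memtable writes,
        // which are on by default in DBOptions.
        bad_cf_opts.inplace_update_support = true;

        ColumnFamilyHandle* handle = nullptr;
        Status s = db->CreateColumnFamily(bad_cf_opts, "bad_cf", &handle);
        if (s.ok()) {
          db->DestroyColumnFamilyHandle(handle);
        }
        return s;  // expected: non-OK, with a message naming the conflict
      }
      ```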
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5453
      
      Differential Revision: D15816851
      
      Pulled By: sagar0
      
      fbshipit-source-id: 9e702b9850f5c4a7e0ef8d39e1e6f9b81e7fe1e5
  6. 13 Jun 2019 (1 commit)
  7. 11 Jun 2019 (1 commit)
  8. 04 Jun 2019 (1 commit)
  9. 01 Jun 2019 (1 commit)
  10. 31 May 2019 (3 commits)
  11. 30 May 2019 (1 commit)
  12. 12 Apr 2019 (1 commit)
    • Introduce a new MultiGet batching implementation (#5011) · fefd4b98
      Committed by anand76
      Summary:
      This PR introduces a new MultiGet() API, with the underlying implementation grouping keys based on SST file and batching lookups within a file. The reason for the new API is twofold: the definition allows callers to allocate storage for statuses and values on the stack instead of in std::vector, and to return values as PinnableSlices in order to avoid copying; it also keeps the original MultiGet() implementation intact while we experiment with batching.
      
      Batching is useful when there is some spatial locality to the keys being queried, and with larger batch sizes. The main benefits are due to -
      1. Fewer function calls, especially to BlockBasedTableReader::MultiGet() and FullFilterBlockReader::KeysMayMatch()
      2. Bloom filter cachelines can be prefetched, hiding the cache miss latency
      
      The next step is to optimize the binary searches in the level_storage_info, index blocks and data blocks, since we could reduce the number of key comparisons if the keys are relatively close to each other. The batching optimizations also need to be extended to other formats, such as PlainTable and filter formats. This also needs to be added to db_stress.
      
      Benchmark results from db_bench for various batch size/locality of reference combinations are given below. Locality was simulated by offsetting the keys in a batch by a stride length. Each SST file is about 8.6MB uncompressed and key/value size is 16/100 uncompressed. To focus on the cpu benefit of batching, the runs were single threaded and bound to the same cpu to eliminate interference from other system events. The results show a 10-25% improvement in micros/op from smaller to larger batch sizes (4 - 32).
      
      Batch sizes (micros/op; rows are Get, MultiGet without batching, MultiGet with batching):
      
                                 |     1 |     2 |     4 |     8 |    16 |    32
      Random pattern (stride 0)
        Get                      | 4.158 | 4.109 | 4.026 | 4.05  | 4.1   | 4.074
        MultiGet (no batching)   | 4.438 | 4.302 | 4.165 | 4.122 | 4.096 | 4.075
        MultiGet (w/ batching)   | 4.461 | 4.256 | 4.277 | 4.11  | 4.182 | 4.14
      
      Good locality (stride 16)
        Get                      | 4.048 | 3.659 | 3.248 | 2.99  | 2.84  | 2.753
        MultiGet (no batching)   | 4.429 | 3.728 | 3.406 | 3.053 | 2.911 | 2.781
        MultiGet (w/ batching)   | 4.452 | 3.45  | 2.833 | 2.451 | 2.233 | 2.135
      
      Good locality (stride 256)
        Get                      | 4.066 | 3.786 | 3.581 | 3.447 | 3.415 | 3.232
        MultiGet (no batching)   | 4.406 | 4.005 | 3.644 | 3.49  | 3.381 | 3.268
        MultiGet (w/ batching)   | 4.393 | 3.649 | 3.186 | 2.882 | 2.676 | 2.62
      
      Medium locality (stride 4096)
        Get                      | 4.012 | 3.922 | 3.768 | 3.61  | 3.582 | 3.555
        MultiGet (no batching)   | 4.364 | 4.057 | 3.791 | 3.65  | 3.57  | 3.465
        MultiGet (w/ batching)   | 4.479 | 3.758 | 3.316 | 3.077 | 2.959 | 2.891
      
      db_bench command used (on a DB with 4 levels, 12 million keys) -
      TEST_TMPDIR=/dev/shm numactl -C 10  ./db_bench.tmp -use_existing_db=true -benchmarks="readseq,multireadrandom" -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4
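      The batched overload can be called roughly as below (a sketch; the keys, helper name, and found-count return are illustrative). Note the C-style arrays of keys, statuses, and PinnableSlice values, which can all live on the stack:

      ```cpp
      #include <array>
      #include <cstddef>

      #include "rocksdb/db.h"

      using namespace rocksdb;

      // Look up three keys with the batched MultiGet overload and return how
      // many were found. Values come back as PinnableSlices, so no copy is
      // made while the slices stay alive.
      size_t BatchedLookup(DB* db) {
        constexpr size_t kNumKeys = 3;
        std::array<Slice, kNumKeys> keys = {Slice("k1"), Slice("k2"),
                                            Slice("k3")};
        std::array<PinnableSlice, kNumKeys> values;
        std::array<Status, kNumKeys> statuses;

        // Internally, keys that fall into the same SST file are looked up
        // together, which is where the fewer-calls and bloom-filter-prefetch
        // wins come from.
        db->MultiGet(ReadOptions(), db->DefaultColumnFamily(), kNumKeys,
                     keys.data(), values.data(), statuses.data());

        size_t found = 0;
        for (size_t i = 0; i < kNumKeys; ++i) {
          if (statuses[i].ok()) ++found;  // values[i] pins the underlying data
        }
        return found;
      }
      ```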
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5011
      
      Differential Revision: D14348703
      
      Pulled By: anand1976
      
      fbshipit-source-id: 774406dab3776d979c809522a67bedac6c17f84b
  13. 02 Mar 2019 (1 commit)
  14. 16 Feb 2019 (1 commit)
    • Deprecate ttl option from CompactionOptionsFIFO (#4965) · 3231a2e5
      Committed by Aubin Sanyal
      Summary:
      We introduced ttl option in CompactionOptionsFIFO when ttl-based file
      deletion (compaction) was supported only as part of FIFO Compaction. But
      with the extension of ttl semantics even to Level compaction,
      CompactionOptionsFIFO.ttl can now be deprecated. Instead we will start
      using ColumnFamilyOptions.ttl for FIFO compaction as well.
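      In options terms, the migration looks like this (a sketch; the 30-day TTL is an arbitrary example value):

      ```cpp
      #include "rocksdb/options.h"

      using namespace rocksdb;

      ColumnFamilyOptions MakeFifoTtlOptions() {
        ColumnFamilyOptions cf_opts;
        cf_opts.compaction_style = kCompactionStyleFIFO;

        // Deprecated: cf_opts.compaction_options_fifo.ttl = 30 * 24 * 60 * 60;
        // Preferred: the generic CF-level ttl now applies to FIFO compaction
        // too, matching its use in Level compaction.
        cf_opts.ttl = 30 * 24 * 60 * 60;  // seconds; 30 days
        return cf_opts;
      }
      ```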
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4965
      
      Differential Revision: D14072960
      
      Pulled By: sagar0
      
      fbshipit-source-id: c98cc2ae695a28136295787cd88d36a220fc219e
  15. 15 Feb 2019 (1 commit)
    • Apply modernize-use-override (2nd iteration) · ca89ac2b
      Committed by Michael Liu
      Summary:
      Use C++11’s override and remove virtual where applicable.
      Changes are automatically generated.
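      The transformation is mechanical; a small before/after sketch (the class names are made up for illustration):

      ```cpp
      #include <string>

      struct Base {
        virtual ~Base() = default;
        virtual std::string Name() const { return "base"; }
      };

      // Before the tool ran:
      //   virtual std::string Name() const { return "derived"; }
      // After: `override` replaces the redundant `virtual`, and the compiler
      // now verifies this really overrides a base method, so signature drift
      // (e.g. dropping `const`) becomes a compile error instead of a silently
      // unrelated function.
      struct Derived : Base {
        std::string Name() const override { return "derived"; }
      };
      ```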
      
      Reviewed By: Orvid
      
      Differential Revision: D14090024
      
      fbshipit-source-id: 1e9432e87d2657e1ff0028e15370a85d1739ba2a
  16. 13 Feb 2019 (1 commit)
    • Atomic ingest (#4895) · a69d4dee
      Committed by Yanqin Jin
      Summary:
      Make file ingestion atomic.
      
      Ingesting external SST files into multiple column families should be atomic. If
      a crash occurs and the db reopens, either all column families have successfully
      ingested the files before the crash, or none of the ingestions have any effect
      on the state of the db.
      
      Also add unit tests for atomic ingestion.
      
      Note that the unit test here does not cover the case of incomplete atomic group
      in the MANIFEST, which is covered in VersionSetTest already.
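      A usage sketch of the multi-CF ingestion call (file paths and the helper name are illustrative). Each IngestExternalFileArg pairs a column family with its SST files, and the whole call either takes effect or does not:

      ```cpp
      #include <string>
      #include <vector>

      #include "rocksdb/db.h"

      using namespace rocksdb;

      // Ingest one pre-built SST file into each of two column families as a
      // single atomic operation: after a crash and reopen, both CFs have
      // their files or neither does.
      Status AtomicIngest(DB* db, ColumnFamilyHandle* cf1,
                          ColumnFamilyHandle* cf2,
                          const std::string& sst_for_cf1,
                          const std::string& sst_for_cf2) {
        IngestExternalFileArg arg1;
        arg1.column_family = cf1;
        arg1.external_files = {sst_for_cf1};

        IngestExternalFileArg arg2;
        arg2.column_family = cf2;
        arg2.external_files = {sst_for_cf2};

        // The multi-arg IngestExternalFiles() is the atomic variant; calling
        // the single-CF IngestExternalFile() twice would not be atomic
        // across the two column families.
        return db->IngestExternalFiles({arg1, arg2});
      }
      ```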
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4895
      
      Differential Revision: D13718245
      
      Pulled By: riversand963
      
      fbshipit-source-id: 7df97cc483af73ad44dd6993008f99b083852198
  17. 08 Feb 2019 (1 commit)
  18. 06 Feb 2019 (2 commits)
    • exclude test CompactFilesShouldTriggerAutoCompaction from ROCKSDB_LITE (#4950) · 71cae59a
      Committed by Zhongyi Xie
      Summary:
      This will fix the following build error:
      
      > db/db_test.cc: In member function ‘virtual void rocksdb::DBTest_CompactFilesShouldTriggerAutoCompaction_Test::TestBody()’:
      > db/db_test.cc:5462:8: error: ‘class rocksdb::DB’ has no member named ‘GetColumnFamilyMetaData’
      >    db_->GetColumnFamilyMetaData(db_->DefaultColumnFamily(), &cf_meta_data);
      > db/db_test.cc:5490:8: error: ‘class rocksdb::DB’ has no member named ‘GetColumnFamilyMetaData’
      >    db_->GetColumnFamilyMetaData(db_->DefaultColumnFamily(), &cf_meta_data);
      > db/db_test.cc:5499:8: error: ‘class rocksdb::DB’ has no member named ‘GetColumnFamilyMetaData’
      >    db_->GetColumnFamilyMetaData(db_->DefaultColumnFamily(), &cf_meta_data);
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4950
      
      Differential Revision: D13965378
      
      Pulled By: miasantreble
      
      fbshipit-source-id: a975435476fe555b1cd9d5da263ee3da3acdea56
    • Fix potential DB hang while using CompactFiles (#4940) · c9a52cbd
      Committed by Jay Zhuang
      Summary:
      CompactFiles() may block auto compaction, which could cause a DB hang when it
      reaches level0_stop_writes_trigger.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4940
      
      Differential Revision: D13929648
      
      Pulled By: cooldoger
      
      fbshipit-source-id: 10842df38df3bebf862cd1a120a88ce961fdd381
  19. 14 Dec 2018 (1 commit)
  20. 15 Nov 2018 (1 commit)
  21. 10 Nov 2018 (2 commits)
    • Fix DBTest.SoftLimit flakyness (#4658) · 859dbda6
      Committed by Yi Wu
      Summary:
      The flakiness can be reproduced with the following patch:
      ```
       --- a/db/db_impl_compaction_flush.cc
      +++ b/db/db_impl_compaction_flush.cc
      @@ -2013,6 +2013,9 @@ void DBImpl::BackgroundCallFlush() {
             if (job_context.HaveSomethingToDelete()) {
               PurgeObsoleteFiles(job_context);
             }
      +      static int f_count = 0;
      +      printf("clean flush job context %d\n", ++f_count);
      +      env_->SleepForMicroseconds(1000000);
             job_context.Clean();
             mutex_.Lock();
           }
      ```
      The issue is that FlushMemtable with opt.wait=true does not wait for `OnStallConditionsChanged` to be called. The event listener is triggered on `JobContext::Clean`, which happens after the flush result is installed. At the time we check the stall condition after flushing the memtable, the job context cleanup may not be finished.
      
      To fix the flakiness, we use a sync point to create a custom WaitForFlush that waits for context cleanup.
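      The idea behind the fix, reduced to standard-library terms (this illustrates "wait until cleanup, not just until results are installed"; it is not RocksDB's actual sync-point code, and the class name is made up):

      ```cpp
      #include <condition_variable>
      #include <mutex>

      // Toy flush pipeline: results are installed first, and cleanup (which
      // fires the stall-condition listener) happens later. A correct
      // WaitForFlush must wait for the *cleanup* signal, or it may observe
      // stale stall state.
      class FlushTracker {
       public:
        void OnFlushInstalled() {
          // Results are visible here, but listeners have not yet run, so
          // waiting only for this point is the source of the flakiness.
        }

        void OnContextCleaned() {
          std::lock_guard<std::mutex> lk(mu_);
          cleaned_ = true;
          cv_.notify_all();
        }

        // Equivalent of the custom WaitForFlush built with a sync point:
        // block until JobContext::Clean has run, not merely until the flush
        // result landed.
        void WaitForFlushCleanup() {
          std::unique_lock<std::mutex> lk(mu_);
          cv_.wait(lk, [&] { return cleaned_; });
        }

       private:
        std::mutex mu_;
        std::condition_variable cv_;
        bool cleaned_ = false;
      };
      ```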
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4658
      
      Differential Revision: D13007301
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: d98395ee7b0ad4c62e83e8d0e9b6028058c61712
    • Update all unique/shared_ptr instances to be qualified with namespace std (#4638) · dc352807
      Committed by Sagar Vemuri
      Summary:
      Ran the following commands to recursively change all the files under RocksDB:
      ```
      find . -type f -name "*.cc" -exec sed -i 's/ unique_ptr/ std::unique_ptr/g' {} +
      find . -type f -name "*.cc" -exec sed -i 's/<unique_ptr/<std::unique_ptr/g' {} +
      find . -type f -name "*.cc" -exec sed -i 's/ shared_ptr/ std::shared_ptr/g' {} +
      find . -type f -name "*.cc" -exec sed -i 's/<shared_ptr/<std::shared_ptr/g' {} +
      ```
      Running `make format` updated some formatting on the files touched.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4638
      
      Differential Revision: D12934992
      
      Pulled By: sagar0
      
      fbshipit-source-id: 45a15d23c230cdd64c08f9c0243e5183934338a8
  22. 08 Nov 2018 (1 commit)
  23. 27 Oct 2018 (1 commit)
  24. 19 Oct 2018 (1 commit)
  25. 10 Oct 2018 (1 commit)
    • Handle mixed slowdown/no_slowdown writer properly (#4475) · 854a4be0
      Committed by Anand Ananthabhotla
      Summary:
      There is a bug when the write queue leader is blocked on a write
      delay/stop, and the queue has writers with WriteOptions::no_slowdown set
      to true. They are not woken up until the write stall is cleared.
      
      The fix introduces a dummy writer inserted at the tail to indicate a
      write stall and prevent further inserts into the queue, and a condition
      variable that writers who can tolerate slowdown wait on before adding
      themselves to the queue. The leader calls WriteThread::BeginWriteStall()
      to add the dummy writer and then walk the queue to fail any writers with
      no_slowdown set. Once the stall clears, the leader calls
      WriteThread::EndWriteStall() to remove the dummy writer and signal the
      condition variable.
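      A standard-library sketch of the mechanism (not RocksDB's WriteThread code; the class and method names are invented for illustration): during a stall, no_slowdown writers fail fast, while tolerant writers wait on a condition variable until the stall ends.

      ```cpp
      #include <condition_variable>
      #include <mutex>

      // Simplified write-queue gate. BeginWriteStall/EndWriteStall bracket a
      // stall; Enter() is called by a writer before joining the queue.
      class WriteGate {
       public:
        void BeginWriteStall() {
          std::lock_guard<std::mutex> lk(mu_);
          stalled_ = true;  // the real code also appends a dummy writer at
                            // the queue tail to block further inserts
        }

        void EndWriteStall() {
          {
            std::lock_guard<std::mutex> lk(mu_);
            stalled_ = false;  // the real code also removes the dummy writer
          }
          cv_.notify_all();  // wake writers that tolerated the slowdown
        }

        // Returns true once the writer may join the queue, false if it was
        // rejected because it set no_slowdown during a stall.
        bool Enter(bool no_slowdown) {
          std::unique_lock<std::mutex> lk(mu_);
          if (stalled_ && no_slowdown) {
            return false;  // fail fast instead of blocking until stall clears
          }
          cv_.wait(lk, [&] { return !stalled_; });
          return true;
        }

       private:
        std::mutex mu_;
        std::condition_variable cv_;
        bool stalled_ = false;
      };
      ```

      The bug being fixed is the first branch: before the patch, a no_slowdown writer could end up blocked behind the stalled leader instead of being failed immediately.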
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4475
      
      Differential Revision: D10285827
      
      Pulled By: anand1976
      
      fbshipit-source-id: 747465e5e7f07a829b1fb0bc1afcd7b93f4ab1a9
  26. 30 Aug 2018 (1 commit)
    • Avoiding write stall caused by manual flushes (#4297) · 927f2749
      Committed by Mikhail Antonov
      Summary:
      Basically, at the moment it seems possible to cause a write stall by calling flush (either manually via DB::Flush(), or from the Backup Engine directly calling FlushMemTable()) while a background flush may already be happening.
      
      One of the ways to fix it: in DBImpl::CompactRange() we already check for a possible stall and delay the flush if needed before we actually proceed to call FlushMemTable(). We can simply move this delay logic to a separate method and call it from FlushMemTable.
      
      This is a draft patch for a first look; we need to check tests/update SyncPoints and will most certainly need to add an allow_write_stall option to FlushOptions().
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4297
      
      Differential Revision: D9420705
      
      Pulled By: mikhail-antonov
      
      fbshipit-source-id: f81d206b55e1d7b39e4dc64242fdfbceeea03fcc
  27. 19 Jul 2018 (1 commit)
  28. 14 Jul 2018 (1 commit)
    • Per-thread unique test db names (#4135) · 8581a93a
      Committed by Maysam Yabandeh
      Summary:
      The patch makes sure that two parallel test threads will operate on different db paths. This enables using open source tools such as gtest-parallel to run the tests of a file in parallel.
      Example: ``` ~/gtest-parallel/gtest-parallel ./table_test```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4135
      
      Differential Revision: D8846653
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: 799bad1abb260e3d346bcb680d2ae207a852ba84
  29. 27 Jun 2018 (1 commit)
  30. 22 Jun 2018 (1 commit)
  31. 31 May 2018 (1 commit)
  32. 23 May 2018 (1 commit)
    • Avoid sleep in DBTest.GroupCommitTest to fix flakiness · 7db721b9
      Committed by Andrew Kryczka
      Summary:
      DBTest.GroupCommitTest would often fail when run under valgrind because its sleeps were insufficient to guarantee a group commit had multiple entries. Instead we can use sync point to force a leader to wait until a non-leader thread has enqueued its work, thus guaranteeing a leader can do group commit work for multiple threads.
      Closes https://github.com/facebook/rocksdb/pull/3883
      
      Differential Revision: D8079429
      
      Pulled By: ajkr
      
      fbshipit-source-id: 61dc50fad29d2c85547842f681288de60fa29049
  33. 22 May 2018 (1 commit)
    • Move prefix_extractor to MutableCFOptions · c3ebc758
      Committed by Zhongyi Xie
      Summary:
      Currently it is not possible to change the bloom filter config without restarting the db, which is causing a lot of operational complexity for users.
      This PR aims to make it possible to dynamically change bloom filter config.
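      With prefix_extractor mutable, it can be changed on a live DB via SetOptions (a sketch; the helper name and the 4-byte cap are illustrative, while `rocksdb.FixedPrefix.N` / `rocksdb.CappedPrefix.N` are the built-in string spellings of the stock extractors):

      ```cpp
      #include "rocksdb/db.h"

      using namespace rocksdb;

      // Switch the default CF of a running DB to a capped 4-byte prefix
      // extractor, without closing and reopening the database.
      Status UseCappedPrefix(DB* db) {
        return db->SetOptions(db->DefaultColumnFamily(),
                              {{"prefix_extractor", "rocksdb.CappedPrefix.4"}});
      }
      ```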
      Closes https://github.com/facebook/rocksdb/pull/3601
      
      Differential Revision: D7253114
      
      Pulled By: miasantreble
      
      fbshipit-source-id: f22595437d3e0b86c95918c484502de2ceca120c
  34. 10 May 2018 (1 commit)
  35. 13 Apr 2018 (1 commit)
  36. 27 Mar 2018 (1 commit)
    • Fix race condition via concurrent FlushWAL · 35a4469b
      Committed by Maysam Yabandeh
      Summary:
      Currently, log_writer->AddRecord in WriteImpl is protected from concurrent calls via FlushWAL only if the two_write_queues_ option is set. The patch fixes the problem by i) skipping log_writer->AddRecord in FlushWAL if manual_wal_flush is not set, and ii) protecting log_writer->AddRecord in WriteImpl via log_write_mutex_ if manual_wal_flush_ is set but two_write_queues_ is not.
      
      Fixes #3599
      Closes https://github.com/facebook/rocksdb/pull/3656
      
      Differential Revision: D7405608
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: d6cc265051c77ae49c7c6df4f427350baaf46934