1. 03 Mar, 2021 1 commit
    • Break down the amount of data written during flushes/compactions per file type (#8013) · a46f080c
      Levi Tamasi committed
      Summary:
      The patch breaks down the "bytes written" (as well as the "number of output files")
      compaction statistics into two, so the values are logged separately for table files
      and blob files in the info log, and are shown in separate columns (`Write(GB)` for table
      files, `Wblob(GB)` for blob files) when the compaction statistics are dumped.
      This will also come in handy for fixing the write amplification statistics, which currently
      do not consider the amount of data read from blob files during compaction. (This will
      be fixed by an upcoming patch.)
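
As a rough illustration of the breakdown described above, the sketch below keeps separate byte and file counters for table files and blob files and renders them as separate `Write(GB)` and `Wblob(GB)` values. The class, its field names, and the output layout are hypothetical; they do not mirror RocksDB's `InternalStats`.

```python
from dataclasses import dataclass

GB = 1024 ** 3


@dataclass
class CompactionStats:
    """Hypothetical per-level counters, split by file type."""
    bytes_written: int = 0          # bytes written to table files
    bytes_written_blob: int = 0     # bytes written to blob files
    num_output_files: int = 0       # table files produced
    num_output_files_blob: int = 0  # blob files produced

    def add(self, other: "CompactionStats") -> None:
        """Accumulate the counters of another flush or compaction."""
        self.bytes_written += other.bytes_written
        self.bytes_written_blob += other.bytes_written_blob
        self.num_output_files += other.num_output_files
        self.num_output_files_blob += other.num_output_files_blob

    def columns(self) -> str:
        """Render the two write columns separately, in GB."""
        return (f"Write(GB)={self.bytes_written / GB:.1f} "
                f"Wblob(GB)={self.bytes_written_blob / GB:.1f}")
```

Keeping the two counters apart means a later change can combine them (or the corresponding read counters) however the write amplification formula requires.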
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/8013
      
      Test Plan: Ran `make check` and `db_bench`.
      
      Reviewed By: riversand963
      
      Differential Revision: D26742156
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 31d18ee8f90438b438ca7ed1ea8cbd92114442d5
  2. 26 Jan, 2021 1 commit
    • Add a SystemClock class to capture the time functions of an Env (#7858) · 12f11373
      mrambacher committed
      Summary:
      Introduces a SystemClock class in RocksDB and starts using it.  This class contains the time-related functions of an Env, and these functions can be redirected from the Env to the SystemClock.
      
      Many of the places that used an Env (Timer, PerfStepTimer, RepeatableThread, RateLimiter, WriteController) for time-related functions have been changed to use SystemClock instead.  There are likely more places that can be changed, but this is a start to show what can/should be done.  Over time it would be nice to migrate most (if not all) of the uses of the time functions from the Env to the SystemClock.
      
      There are several Env classes that implement these functions.  Most of these have not been converted yet to SystemClock implementations; that will come in a subsequent PR.  It would be good to unify many of the Mock Timer implementations so that they behave and are tested similarly (some override Sleep, some use a MockSleep, etc.).
      
      Additionally, this change will allow new methods to be introduced to the SystemClock (like https://github.com/facebook/rocksdb/issues/7101 WaitFor) in a consistent manner across a smaller number of classes.
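
The idea of moving time functions behind a small clock interface can be sketched as follows. This is a conceptual Python illustration, not RocksDB's C++ API: the method names `now_micros` and `sleep_micros`, the `MockClock`, and the `StopWatch` consumer are all hypothetical.

```python
import time


class SystemClock:
    """Hypothetical clock interface holding only time-related functions."""

    def now_micros(self) -> int:
        return time.time_ns() // 1000

    def sleep_micros(self, micros: int) -> None:
        time.sleep(micros / 1_000_000)


class MockClock(SystemClock):
    """Test clock: sleeping advances a counter instead of blocking."""

    def __init__(self, start_micros: int = 0) -> None:
        self._now = start_micros

    def now_micros(self) -> int:
        return self._now

    def sleep_micros(self, micros: int) -> None:
        self._now += micros


class StopWatch:
    """Toy consumer that depends on a clock rather than on a full Env."""

    def __init__(self, clock: SystemClock) -> None:
        self._clock = clock
        self._start = clock.now_micros()

    def elapsed_micros(self) -> int:
        return self._clock.now_micros() - self._start
```

Because `StopWatch` only sees the clock interface, a test can inject `MockClock` and advance time deterministically, which is the kind of unification of mock timers the summary hopes for.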
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7858
      
      Reviewed By: pdillinger
      
      Differential Revision: D26006406
      
      Pulled By: mrambacher
      
      fbshipit-source-id: ed10a8abbdab7ff2e23d69d85bd25b3e7e899e90
  3. 20 Dec, 2020 1 commit
    • aggregated-table-properties with GetMapProperty (#7779) · 4d1ac19e
      Peter Dillinger committed
      Summary:
      So that we can more easily get aggregate live table data such
      as total filter, index, and data sizes.
      
      Also adds ldb support for getting properties
      
      Also fixed some missing/inaccurate related comments in db.h
      
      For example:
      
          $ ./ldb --db=testdb get_property rocksdb.aggregated-table-properties
          rocksdb.aggregated-table-properties.data_size: 102871
          rocksdb.aggregated-table-properties.filter_size: 0
          rocksdb.aggregated-table-properties.index_partitions: 0
          rocksdb.aggregated-table-properties.index_size: 2232
          rocksdb.aggregated-table-properties.num_data_blocks: 100
          rocksdb.aggregated-table-properties.num_deletions: 0
          rocksdb.aggregated-table-properties.num_entries: 15000
          rocksdb.aggregated-table-properties.num_merge_operands: 0
          rocksdb.aggregated-table-properties.num_range_deletions: 0
          rocksdb.aggregated-table-properties.raw_key_size: 288890
          rocksdb.aggregated-table-properties.raw_value_size: 198890
          rocksdb.aggregated-table-properties.top_level_index_size: 0
          $ ./ldb --db=testdb get_property rocksdb.aggregated-table-properties-at-level1
          rocksdb.aggregated-table-properties-at-level1.data_size: 80909
          rocksdb.aggregated-table-properties-at-level1.filter_size: 0
          rocksdb.aggregated-table-properties-at-level1.index_partitions: 0
          rocksdb.aggregated-table-properties-at-level1.index_size: 1787
          rocksdb.aggregated-table-properties-at-level1.num_data_blocks: 81
          rocksdb.aggregated-table-properties-at-level1.num_deletions: 0
          rocksdb.aggregated-table-properties-at-level1.num_entries: 12466
          rocksdb.aggregated-table-properties-at-level1.num_merge_operands: 0
          rocksdb.aggregated-table-properties-at-level1.num_range_deletions: 0
          rocksdb.aggregated-table-properties-at-level1.raw_key_size: 238210
          rocksdb.aggregated-table-properties-at-level1.raw_value_size: 163414
          rocksdb.aggregated-table-properties-at-level1.top_level_index_size: 0
          $
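
For scripts that consume the `ldb` output shown above, a small parser along these lines may be handy. It is only a sketch: the function name is hypothetical, and it assumes every value is an integer printed as `name: value`, as in the example.

```python
def parse_properties(output: str, prefix: str) -> dict:
    """Collect 'prefix.name: value' lines into a {name: int} dict."""
    result = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip shell prompts and lines that belong to another property,
        # e.g. the "-at-level1" variant when parsing the aggregate one.
        if not line.startswith(prefix + "."):
            continue
        name, _, value = line[len(prefix) + 1:].partition(":")
        result[name] = int(value)
    return result
```

For example, parsing the first block above with the prefix `rocksdb.aggregated-table-properties` yields a dict whose `data_size` entry is `102871`.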
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7779
      
      Test Plan: Added a test to ldb_test.py
      
      Reviewed By: jay-zhuang
      
      Differential Revision: D25653103
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 2905469a08a64dd6b5510cbd7be2e64d3234d6d3
  4. 16 Oct, 2020 1 commit
    • Introduce BlobFileCache and add support for blob files to Get() (#7540) · e8cb32ed
      Levi Tamasi committed
      Summary:
      The patch adds blob file support to the `Get` API by extending `Version` so that
      whenever a blob reference is read from a file, the blob is retrieved from the corresponding
      blob file and passed back to the caller. (This is assuming the blob reference is valid
      and the blob file is actually part of the given `Version`.) It also introduces a cache
      of `BlobFileReader`s called `BlobFileCache` that enables sharing `BlobFileReader`s
      between callers. `BlobFileCache` uses the same backing cache as `TableCache`, so
      `max_open_files` (if specified) limits the total number of open (table + blob) files.
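
To illustrate why sharing one backing cache makes `max_open_files` cap table and blob files together, here is a toy LRU cache keyed by file kind and number. It is a simplified sketch under stated assumptions: the class name, the `open_fn` callback, and the eviction policy are hypothetical, and the real cache also deals with handle pinning and concurrency.

```python
from collections import OrderedDict


class FileReaderCache:
    """Toy LRU cache of open readers shared by table and blob files."""

    def __init__(self, max_open_files, open_fn):
        self._capacity = max_open_files
        self._open_fn = open_fn          # opens a reader on a cache miss
        self._readers = OrderedDict()    # (kind, file_number) -> reader

    def get(self, kind, file_number):
        key = (kind, file_number)
        if key in self._readers:
            self._readers.move_to_end(key)      # mark as most recently used
            return self._readers[key]
        reader = self._open_fn(kind, file_number)
        self._readers[key] = reader
        if len(self._readers) > self._capacity:
            self._readers.popitem(last=False)   # evict least recently used
        return reader

    def __len__(self):
        return len(self._readers)
```

With a capacity of two, opening a third file evicts the least recently used reader regardless of whether it belongs to a table file or a blob file.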
      
      TODO: proactively open/cache blob files and pin the cache handles of the readers in the
      metadata objects similarly to what `VersionBuilder::LoadTableHandlers` does for
      table files.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7540
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D24260219
      
      Pulled By: ltamasi
      
      fbshipit-source-id: a8a2a4f11d3d04d6082201b52184bc4d7b0857ba
  5. 08 Oct, 2020 1 commit
    • Introduce a blob file reader class (#7461) · 22655a39
      Levi Tamasi committed
      Summary:
      The patch adds a class called `BlobFileReader` that can be used to retrieve blobs
      using the information available in blob references (e.g. blob file number, offset, and
      size). This will come in handy when implementing blob support for `Get`, `MultiGet`,
      and iterators, and also for compaction/garbage collection.
      
      When a `BlobFileReader` object is created (using the factory method `Create`),
      it first checks whether the specified file is potentially valid by comparing the file
      size against the combined size of the blob file header and footer (files smaller than
      the threshold are considered malformed). Then, it opens the file, and reads and verifies
      the header and footer. The verification involves magic number/CRC checks
      as well as checking for unexpected header/footer fields, e.g. incorrect column family ID
      or TTL blob files.
      
      Blobs can be retrieved using `GetBlob`. `GetBlob` validates the offset and compression
      type passed by the caller (because of the presence of the header and footer, the
      specified offset cannot be too close to the start/end of the file; also, the compression type
      has to match the one in the blob file header), and retrieves and potentially verifies and
      uncompresses the blob. In particular, when `ReadOptions::verify_checksums` is set,
      `BlobFileReader` reads the blob record header as well (as opposed to just the blob itself)
      and verifies the key/value size, the key itself, as well as the CRC of the blob record header
      and the key/value pair.
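
The size and offset checks described above can be sketched roughly as follows. The three size constants and the exact bounds (in particular, that a blob must start after the file header, its own record header, and its key) are assumptions made for illustration, not values confirmed by this commit message.

```python
HEADER_SIZE = 30         # assumed blob file header size, in bytes
FOOTER_SIZE = 32         # assumed blob file footer size, in bytes
RECORD_HEADER_SIZE = 32  # assumed per-record header size, in bytes


def is_file_size_plausible(file_size: int) -> bool:
    """A file smaller than header + footer is considered malformed."""
    return file_size >= HEADER_SIZE + FOOTER_SIZE


def is_valid_blob_offset(offset: int, key_size: int,
                         value_size: int, file_size: int) -> bool:
    """Reject offsets too close to the start or the end of the file."""
    # The blob must come after the file header, its record header, and key.
    if offset < HEADER_SIZE + RECORD_HEADER_SIZE + key_size:
        return False
    # The blob must end before the footer begins.
    if offset + value_size > file_size - FOOTER_SIZE:
        return False
    return True
```

Both checks are cheap arithmetic, so they can run before any I/O is issued for the blob itself.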
      
      In addition, the patch exposes the compression type from `BlobIndex` (both using an
      accessor and via `DebugString`), and adds a blob file read latency histogram to
      `InternalStats` that can be used with `BlobFileReader`.
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7461
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D23999219
      
      Pulled By: ltamasi
      
      fbshipit-source-id: deb6b1160d251258b308d5156e2ec063c3e12e5e
  6. 15 Sep, 2020 1 commit
    • Integrate blob file writing with the flush logic (#7345) · b0e78341
      Levi Tamasi committed
      Summary:
      The patch adds support for writing blob files during flush by integrating
      `BlobFileBuilder` with the flush logic, most importantly, `BuildTable` and
      `CompactionIterator`. If `enable_blob_files` is set, large values are extracted
      to blob files and replaced with references. The resulting blob files are then
      logged to the MANIFEST as part of the flush job's `VersionEdit` and
      added to the `Version`, similarly to table files. Errors related to writing
      blob files fail the flush, and any blob files written by such jobs are immediately
      deleted (again, similarly to how SST files are handled). In addition, the patch
      extends the logging and statistics around flushes to account for the presence
      of blob files (e.g. `InternalStats::CompactionStats::bytes_written`, which is
      used for calculating write amplification, now considers the blob files as well).
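
The extraction of large values during flush can be pictured with the toy function below. It is a sketch only: the `min_blob_size` threshold parameter and the tuple layout of a blob reference are assumptions (the text above says only that "large values" are extracted), and real blob references carry a file number, offset, and size.

```python
def flush_entries(entries, enable_blob_files, min_blob_size):
    """Split (key, value) pairs into table entries and blob records."""
    table, blobs = [], []
    for key, value in entries:
        if enable_blob_files and len(value) >= min_blob_size:
            # Store the value out of line and keep only a reference
            # (here: index into the blob list plus the value size).
            table.append((key, ("blob_ref", len(blobs), len(value))))
            blobs.append(value)
        else:
            table.append((key, ("inline", value)))
    return table, blobs
```

Summing `len(value)` over `blobs` gives the blob bytes that a `bytes_written`-style statistic would have to include alongside the table file bytes.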
      
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/7345
      
      Test Plan: Tested using `make check` and `db_bench`.
      
      Reviewed By: riversand963
      
      Differential Revision: D23506369
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 646885f22dfbe063f650d38a1fedc132f499a159
  7. 21 Feb, 2020 1 commit
    • Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) · fdf882de
      sdong committed
      Summary:
      When two binaries are dynamically linked together, different builds of RocksDB from two sources might cause errors. To give users a way to solve the problem, the RocksDB namespace is changed to a flag which can be overridden at build time.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433
      
      Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag.
      
      Differential Revision: D19977691
      
      fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
  8. 08 Jan, 2020 1 commit
  9. 07 Sep, 2019 1 commit
  10. 13 Jul, 2019 1 commit
  11. 20 Mar, 2019 1 commit
    • Collect compaction stats by priority and dump to info LOG (#5050) · a291f3a1
      Zhongyi Xie committed
      Summary:
      To better understand the compactions done by thread pools of different priorities, we now collect compaction stats by priority and also print them to the info LOG through the stats dump.
      
      ```
      ** Compaction Stats [default] **
      Priority    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
      -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
       Low      0/0    0.00 KB   0.0     16.8    11.3      5.5       5.6      0.1       0.0   0.0    406.4    136.1     42.24             34.96        45    0.939     13M  8865K
      High      0/0    0.00 KB   0.0      0.0     0.0      0.0      11.4     11.4       0.0   0.0      0.0     76.2    153.00             35.74     12185    0.013       0      0
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5050
      
      Differential Revision: D14408583
      
      Pulled By: miasantreble
      
      fbshipit-source-id: e53746586ea27cb8abc9fec35805bd80ed30f608
  12. 30 Jan, 2019 1 commit
  13. 06 Nov, 2018 1 commit
    • Add DB property for SST files kept from deletion (#4618) · fffac43c
      Andrew Kryczka committed
      Summary:
      This property can help debug why SST files aren't being deleted. Previously we only had the property "rocksdb.is-file-deletions-enabled". However, even when that returned true, obsolete SSTs may still not be deleted due to the coarse-grained mechanism we use to prevent newly created SSTs from being accidentally deleted. That coarse-grained mechanism uses a lower bound file number for SSTs that should not be deleted, and this property exposes that lower bound.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4618
      
      Differential Revision: D12898179
      
      Pulled By: ajkr
      
      fbshipit-source-id: fe68acc041ddbcc9276bbd48976524d95aafc776
  14. 16 Jun, 2018 1 commit
  15. 04 May, 2018 1 commit
    • Skip deleted WALs during recovery · d5954929
      Siying Dong committed
      Summary:
      This patch records the min log number to keep in the manifest while flushing SST files, so that WALs older than that number are ignored during recovery. This avoids scenarios where there is a gap between the WAL files fed to the recovery procedure. Such a gap could be caused by, for example, out-of-order WAL deletion. It could cause problems in 2PC recovery, where the prepare and commit entries are placed in two separate WALs: a gap in the WALs could result in the WAL with the commit entry not being processed, breaking the 2PC recovery logic.
      
      Before this commit, for the 2PC case, we determined which log number to keep in FindObsoleteFiles(). We looked at the earliest logs with outstanding prepare entries, or prepare entries whose respective commit or abort are in the memtable. With this commit, the same calculation is done when we apply the SST flush. Just before installing the flush file, we precompute the earliest log file to keep after the flush finishes using the same logic (but skipping the memtables just flushed), and record this information in the manifest entry for the newly flushed SST file. This precomputed value is also remembered in memory, and will later be used to determine whether a log file can be deleted. This value is unlikely to change until the next flush because the commit entry will stay in the memtable. (In WritePrepared, we could have removed the older log files as soon as all prepared entries are committed. That is not done yet. Even if we did it, the only thing we would lose with this new approach is earlier log deletion between two flushes, which is not guaranteed to happen anyway because the obsolete file clean-up function is only executed after a flush or compaction.)
      
      This min log number to keep is stored in the manifest using the safely ignorable custom field of the AddFile entry, to guarantee that a DB generated by a newer release can be opened by previous releases no older than 4.2.
      Closes https://github.com/facebook/rocksdb/pull/3765
      
      Differential Revision: D7747618
      
      Pulled By: siying
      
      fbshipit-source-id: d00c92105b4f83852e9754a1b70d6b64cb590729
  16. 19 Apr, 2018 1 commit
  17. 13 Apr, 2018 1 commit
  18. 12 Apr, 2018 1 commit
  19. 02 Mar, 2018 1 commit
    • Add "rocksdb.live-sst-files-size" DB property · bf937cf1
      Yi Wu committed
      Summary:
      Add the "rocksdb.live-sst-files-size" DB property, which includes only files of the latest version. The existing "rocksdb.total-sst-files-size" includes files from all versions and thus includes files that are obsolete but not yet deleted. I'm going to use this new property to cap the blob db sst + blob file size.
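
The difference between the two properties can be illustrated with a toy model in which each version maps file numbers to sizes. The function name and the data layout are hypothetical and chosen only for this sketch.

```python
def sst_file_sizes(versions, current):
    """Return (live_size, total_size) for a toy set of versions.

    versions: dict mapping version id -> {file_number: size_in_bytes}
    current:  id of the latest version
    """
    # Live size: only files referenced by the latest version.
    live = sum(versions[current].values())
    # Total size: every distinct file referenced by any version,
    # including obsolete files that have not been deleted yet.
    all_files = {}
    for files in versions.values():
        all_files.update(files)
    total = sum(all_files.values())
    return live, total
```

For two versions `{10: 100, 11: 200}` and `{11: 200, 12: 50}`, with the second one current, the live size is 250 bytes and the total size is 350 bytes, because file 10 is obsolete but still counted in the total.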
      Closes https://github.com/facebook/rocksdb/pull/3548
      
      Differential Revision: D7116939
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: c6a52e45ce0f24ef78708156e1a923c1dd6bc79a
  20. 24 Oct, 2017 1 commit
    • Add DB::Properties::kEstimateOldestKeyTime · 66a2c44e
      Yi Wu committed
      Summary:
      With FIFO compaction we would like to get the oldest data time for monitoring. The problem is that we don't have a timestamp for each key in the DB. As an approximation, we expose the earliest SST file "creation_time" property.
      
      My plan is to override the property with a more accurate value in blob db, where we actually have timestamps.
      Closes https://github.com/facebook/rocksdb/pull/2842
      
      Differential Revision: D5770600
      
      Pulled By: yiwu-arbug
      
      fbshipit-source-id: 03833c8f10bbfbee62f8ea5c0d03c0cafb5d853a
  21. 08 Sep, 2017 1 commit
  22. 31 Aug, 2017 1 commit
    • Extend property map with compaction stats · 8a6708f5
      Artem Danilov committed
      Summary:
      This branch extends the existing property map, which keeps values as doubles, to keep values as strings so that it can be used to provide a wider range of properties. The immediate need is to provide IO stall stats to MyRocks in an easily parseable way, which is also part of this branch.
      Closes https://github.com/facebook/rocksdb/pull/2794
      
      Differential Revision: D5717676
      
      Pulled By: Tema
      
      fbshipit-source-id: e34ba5b79ba774697f7b97ce1138d8fd55471b8a
  23. 16 Jul, 2017 1 commit
  24. 01 Jul, 2017 1 commit
  25. 30 Jun, 2017 1 commit
    • Add a fetch_add variation to AddDBStats · e9f91a51
      Maysam Yabandeh committed
      Summary:
      AddDBStats is done in two steps, a load and a store, which is more efficient than fetch_add. However, this is not thread-safe. Currently we have to protect concurrent access to AddDBStats with a mutex, which is less efficient than fetch_add.
      
      This patch adds the option to use fetch_add in AddDBStats. The results for my 2pc benchmark on sysbench are:
      - vanilla: 68618 tps
      - removing mutex on AddDBStats (unsafe): 69767 tps
      - fetch_add for all AddDBStats: 69200 tps
      - fetch_add only for concurrently accessed AddDBStats (this patch): 69579 tps
      Closes https://github.com/facebook/rocksdb/pull/2505
      
      Differential Revision: D5330656
      
      Pulled By: maysamyabandeh
      
      fbshipit-source-id: af64d7bee135b0e86b4fac323a4f9d9113eaa383
  26. 28 Apr, 2017 1 commit
  27. 22 Apr, 2017 1 commit
  28. 19 Apr, 2017 1 commit
  29. 11 Apr, 2017 1 commit
  30. 06 Apr, 2017 1 commit
    • Use a human readable size for level report · c50e3750
      Islam AbdelRahman committed
      Summary:
      Current
      ```
      ** Compaction Stats [default] **
      Level    Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
      ----------------------------------------------------------------------------------------------------------------------------------------------------------
        L0      2/0      49.02   0.5      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0     76.1         1         2    0.322       0      0
       Sum      2/0      49.02   0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0     76.1         1         2    0.322       0      0
       Int      0/0       0.00   0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0     76.1         1         2    0.322       0      0
      ```
      
      New
      ```
      ** Compaction Stats [default] **
      Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn Key
      ```
      Closes https://github.com/facebook/rocksdb/pull/2055
      
      Differential Revision: D4804576
      
      Pulled By: IslamAbdelRahman
      
      fbshipit-source-id: 719be6a
  31. 30 Mar, 2017 1 commit
  32. 09 Feb, 2017 1 commit
  33. 29 Dec, 2016 1 commit
  34. 12 Nov, 2016 1 commit
  35. 22 Sep, 2016 1 commit
  36. 17 Jun, 2016 1 commit
    • Add InternalStats and logging for AddFile() · 30a24f2d
      Islam AbdelRahman committed
      Summary:
      We don't report the bytes ingested through AddFile, which makes the write amplification numbers incorrect.
      Update InternalStats and add logging for AddFile().
      
      Test Plan: Make sure the code compiles and existing tests pass
      
      Reviewers: lightmark, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D59763
  37. 26 Apr, 2016 1 commit
  38. 21 Apr, 2016 1 commit
    • Add per-level compression ratio property · 73a847ef
      Andrew Kryczka committed
      Summary:
      This is needed so we can measure compression ratio improvements
      achieved by D52287.
      
      The property compares raw data size against the total file size for a given
      level. If the level is empty it should return 0.0.
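
The computation described above can be sketched as follows. The function name and input layout are hypothetical, and the direction of the ratio (raw data size divided by file size) is an assumption based on the wording "compares raw data size against the total file size".

```python
def level_compression_ratio(files):
    """Compute a compression ratio for one level.

    files: list of (raw_data_size, file_size) pairs, one per SST file,
           where raw_data_size is the uncompressed key + value bytes.
    """
    total_raw = sum(raw for raw, _ in files)
    total_file = sum(size for _, size in files)
    if total_file == 0:
        return 0.0  # empty level
    return total_raw / total_file
```

Under this convention, a level holding 400 raw bytes in 200 bytes of files reports a ratio of 2.0.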
      
      Test Plan: new unit test
      
      Reviewers: IslamAbdelRahman, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D56967
  39. 05 Mar, 2016 1 commit
    • Change Property name from "rocksdb.current_version_number" to... · 294bdf9e
      sdong committed
      Change Property name from "rocksdb.current_version_number" to "rocksdb.current-super-version-number"
      
      Summary: I realized I was wrong about the naming convention again. Let me change it to the correct one.
      
      Test Plan: Run unit tests.
      
      Reviewers: IslamAbdelRahman, kradhakrishnan, yhchiang, andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D55041
  40. 02 Mar, 2016 1 commit