1. 18 Jun 2020 (1 commit)
    • Z
      Store DB identity and DB session ID in SST files (#6983) · 94d04529
      Authored by Zitan Chen
      Summary:
      `db_id` and `db_session_id` are now part of the table properties for all formats and stored in SST files. This adds about 99 bytes to each new SST file.
      
      The `TablePropertiesNames` for these two identifiers are `rocksdb.creating.db.identity` and `rocksdb.creating.session.identity`.
      
      In addition, SST files generated from SstFileWriter and Repairer have DB identity “SST Writer” and “DB Repairer”, respectively. Their DB session IDs are generated in the same way as `DB::GetDbSessionId`.
      
      A table property test is added.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6983
      
      Test Plan: make check and some manual tests.
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D22048826
      
      Pulled By: gg814
      
      fbshipit-source-id: afdf8c11424a6f509b5c0b06dafad584a80103c9
      94d04529
  2. 17 Jun 2020 (1 commit)
    • Y
      Fix a bug of overwriting return code (#6989) · b7bab480
      Authored by Yanqin Jin
      Summary:
      In best-efforts recovery, an error that is not Corruption, IOError::kNotFound, or IOError::kPathNotFound will be overwritten silently. Fix this by checking all non-ok cases and returning early.
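      The early-return shape described above can be sketched as follows; this is a minimal stand-in with illustrative names, not RocksDB's actual Status API:

      ```cpp
      // Minimal stand-in status type; illustrative names, not RocksDB's API.
      struct MyStatus {
        enum Code { kOk, kNotFound, kCorruption, kIOError };
        Code code;
      };

      MyStatus MakeStatus(MyStatus::Code c) { MyStatus s; s.code = c; return s; }

      // The fixed shape: check every non-ok case and return early, so a later
      // step cannot silently overwrite the first error with its own result.
      MyStatus RecoverStep(MyStatus prev, MyStatus next_step_result) {
        if (prev.code != MyStatus::kOk) {
          return prev;  // early return: preserve the original error
        }
        return next_step_result;
      }
      ```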
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6989
      
      Test Plan: make check
      
      Reviewed By: ajkr
      
      Differential Revision: D22071418
      
      Pulled By: riversand963
      
      fbshipit-source-id: 5a4ea5dfb1a41f41c7a3fdaf62b163007b42f04b
      b7bab480
  3. 16 Jun 2020 (2 commits)
    • Y
      Let best-efforts recovery ignore CURRENT file (#6970) · 9bfd46d0
      Authored by Yanqin Jin
      Summary:
      Best-efforts recovery does not check the content of CURRENT file to determine which MANIFEST to recover from. However, it still checks the presence of CURRENT file to determine whether to create a new DB during `open()`. Therefore, we can tweak the logic in `open()` a little bit so that best-efforts recovery does not rely on CURRENT file at all.
      
      Test plan (dev server):
      make check
      ./db_basic_test --gtest_filter=DBBasicTest.RecoverWithNoCurrentFile
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6970
      
      Reviewed By: anand1976
      
      Differential Revision: D22013990
      
      Pulled By: riversand963
      
      fbshipit-source-id: db552a1868c60ed70e1f7cd252a3a076eb8ea58f
      9bfd46d0
    • Z
      Add a DB Session ID (#6959) · 88db97b0
      Authored by Zitan Chen
      Summary:
      Added DB::GetDbSessionId by using the same format and machinery as DB::GetDbIdentity.
      The DB Session ID is generated (and therefore, updated) each time a DB object is opened. It is written to the LOG file right after the line of “DB SUMMARY”.
      A test for the uniqueness, for different openings and during the same opening, is also added.
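      A session ID that is freshly generated on each open could look roughly like this; the encoding and length here are purely illustrative, not RocksDB's actual scheme:

      ```cpp
      #include <random>
      #include <string>

      // Illustrative sketch only: draw a fresh 20-character base-36 ID each
      // time it is called, modeling "new session ID per DB open".
      std::string NewSessionId() {
        static const char kAlphabet[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        std::random_device rd;
        std::mt19937_64 gen(rd());
        std::uniform_int_distribution<int> dist(0, 35);
        std::string id;
        for (int i = 0; i < 20; ++i) {
          id.push_back(kAlphabet[dist(gen)]);
        }
        return id;
      }
      ```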
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6959
      
      Test Plan: Passed make check
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D21951721
      
      Pulled By: gg814
      
      fbshipit-source-id: 958a48a612db49a39998ea703cded45987d3fa8b
      88db97b0
  4. 14 Jun 2020 (1 commit)
    • Z
      Fix persistent cache on windows (#6932) · 9c24a5cb
      Authored by Zhen Li
      Summary:
      The persistent cache feature caused a RocksDB crash on Windows. I posted an issue for it: https://github.com/facebook/rocksdb/issues/6919. I found this is because no "persistent_cache_key_prefix" is generated for the persistent cache. Looking at the repo history, "GetUniqueIdFromFile" is not implemented on Windows, so this fix adds a "NewId()" function to "persistent_cache" and uses it to generate the prefix for the persistent cache. This PR also re-enables the related test cases defined in "db_test2" and "persistent_cache_test" for Windows.
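      A NewId()-style counter can be as simple as a process-wide atomic; this is a hypothetical sketch of the idea, not the actual persistent_cache code:

      ```cpp
      #include <atomic>
      #include <cstdint>

      // Hypothetical sketch: hand out distinct IDs from a process-wide atomic
      // counter, usable as a cache key prefix on platforms where a file-based
      // unique ID (GetUniqueIdFromFile) is unavailable, e.g. Windows.
      uint64_t NewId() {
        static std::atomic<uint64_t> next_id{1};
        return next_id.fetch_add(1);
      }
      ```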
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6932
      
      Test Plan:
      1. Run the related test cases in "db_test2" and "persistent_cache_test" on Windows and see them pass.
      2. Manually run db_bench.exe with "read_cache_path" and verify.
      
      Reviewed By: riversand963
      
      Differential Revision: D21911608
      
      Pulled By: cheng-chang
      
      fbshipit-source-id: cdfd938d54a385edbb2836b13aaa1d39b0a6f1c2
      9c24a5cb
  5. 13 Jun 2020 (1 commit)
    • L
      Maintain the set of linked SSTs in BlobFileMetaData (#6945) · 83833637
      Authored by Levi Tamasi
      Summary:
      The `FileMetaData` objects associated with table files already contain the
      number of the oldest blob file referenced by the SST in question. This patch
      adds the inverse mapping to `BlobFileMetaData`, namely the set of table file
      numbers for which the oldest blob file link points to the given blob file (these
      are referred to as *linked SSTs*). This mapping will be used by the GC logic.
      
      Implementation-wise, the patch builds on the `BlobFileMetaDataDelta`
      functionality introduced in https://github.com/facebook/rocksdb/pull/6835: newly linked/unlinked SSTs are
      accumulated in `BlobFileMetaDataDelta`, and the changes to the linked SST set
      are applied in one shot when the new `Version` is saved. The patch also reworks
      the blob file related consistency checks in `VersionBuilder` so they validate the
      consistency of the forward table file -> blob file links and the backward blob file ->
      table file links for blob files that are part of the `Version`.
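      The inverse mapping can be sketched with standard containers; this is illustrative only and does not mirror RocksDB's actual `BlobFileMetaData` classes:

      ```cpp
      #include <cstdint>
      #include <set>
      #include <unordered_map>

      // Illustrative sketch: maintain, per blob file, the set of "linked SSTs",
      // i.e. table files whose oldest-blob-file link points at that blob file.
      struct BlobLinks {
        std::unordered_map<uint64_t, std::set<uint64_t>> linked_ssts;

        void LinkSst(uint64_t blob_file, uint64_t table_file) {
          linked_ssts[blob_file].insert(table_file);
        }
        void UnlinkSst(uint64_t blob_file, uint64_t table_file) {
          auto it = linked_ssts.find(blob_file);
          if (it != linked_ssts.end()) {
            it->second.erase(table_file);
            // An empty linked-SST set makes the blob file a GC candidate.
            if (it->second.empty()) linked_ssts.erase(it);
          }
        }
      };
      ```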
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6945
      
      Test Plan: `make check`
      
      Reviewed By: riversand963
      
      Differential Revision: D21912228
      
      Pulled By: ltamasi
      
      fbshipit-source-id: c5bc7acf6e729a8fccbb12672dd5cd00f6f000f8
      83833637
  6. 12 Jun 2020 (4 commits)
    • Y
      Fail point-in-time WAL recovery upon IOError reading WAL (#6963) · 717749f4
      Authored by Yanqin Jin
      Summary:
      If `options.wal_recovery_mode == WALRecoveryMode::kPointInTimeRecovery`, RocksDB stops replaying the WAL once it hits an error and discards the rest of the WAL. This can lead to data loss if the error occurs at an offset smaller than the last synced offset.
      Ideally, RocksDB point-in-time recovery should permit recovery if the error occurs after the last synced offset, while failing recovery if the error occurs before it. However, RocksDB does not track the synced offsets of WALs, so it cannot know whether an error occurs before or after the last synced offset. An error can be one of the following.
      - WAL record checksum mismatch. This can result from both corruption of synced data and dropping of unsynced data during shutdown, and we cannot tell which. In order not to defeat the original motivation to permit the latter case, we keep the original behavior of point-in-time WAL recovery.
      - IOError. This means the WAL may be bad: an indicator of the whole file becoming unavailable, including its synced portion. Therefore, we choose to modify the behavior of point-in-time recovery and fail the database recovery.
      
      Test plan (devserver):
      make check
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6963
      
      Reviewed By: ajkr
      
      Differential Revision: D22011083
      
      Pulled By: riversand963
      
      fbshipit-source-id: f9cbf29a37dc5cc40d3fa62f89eed1ad67ca1536
      717749f4
    • L
      Revisit the handling of the case when a file is re-added to the same level (#6939) · d854abad
      Authored by Levi Tamasi
      Summary:
      https://github.com/facebook/rocksdb/pull/6901 subtly changed the handling of the corner case
      when a table file is deleted from a level, then re-added to the same level. (Note: this
      should be extremely rare; one scenario that comes to mind is a trivial move followed by
      a call to `ReFitLevel` that moves the file back to the original level.) Before that change,
      a new `FileMetaData` object was created as a result of this sequence; after the change,
      the original `FileMetaData` was essentially resurrected (since the deletion and the addition
      simply cancel each other out with the change). This patch restores the original behavior,
      which is more intuitive considering the interface, and in sync with how trivial moves are handled.
      (Also note that `FileMetaData` contains some mutable data members, the values of which
      might be different in the resurrected object and the freshly created one.)
      The PR also fixes a bug in this area: with the original pre-6901 code, `VersionBuilder`
      would add the same file twice to the same level in the scenario described above.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6939
      
      Test Plan: `make check`
      
      Reviewed By: ajkr
      
      Differential Revision: D21905580
      
      Pulled By: ltamasi
      
      fbshipit-source-id: da07ae45384ecf3c6c53506d106432d88a7ec9df
      d854abad
    • L
      Turn DBTest2.CompressionFailures into a parameterized test (#6968) · 722ebba8
      Authored by Levi Tamasi
      Summary:
      `DBTest2.CompressionFailures` currently tests many configurations
      sequentially using nested loops, which often leads to timeouts
      in our test system. The patch turns it into a parameterized test
      instead.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6968
      
      Test Plan: `make check`
      
      Reviewed By: siying
      
      Differential Revision: D22006954
      
      Pulled By: ltamasi
      
      fbshipit-source-id: f71f2f7108086b7651ecfce3d79a7fab24620b2c
      722ebba8
    • Z
      Ingest SST files with checksum information (#6891) · b3585a11
      Authored by Zhichao Cao
      Summary:
      Applications can ingest SST files with file checksum information, such that during ingestion, the DB is able to check data integrity and the identity of the SST file. The PR introduces generate_and_verify_file_checksum to IngestExternalFileOption to control whether the ingested checksum information should be verified against the generated checksum.
      
          1. If the generate_and_verify_file_checksum option is *FALSE*: *1)* if the DB does not enable SST file checksums, the ingested checksum information will be ignored; *2)* if the DB enables SST file checksums and the checksum function name matches the checksum function name in the DB, we trust the ingested checksum and store it in the Manifest. If the checksum function name does not match, we treat that as an error and fail the IngestExternalFile() call.
          2. If the generate_and_verify_file_checksum option is *TRUE*: *1)* if the DB does not enable SST file checksums, the ingested checksum information will be ignored; *2)* if the DB enables SST file checksums, we will use the checksum generator from the DB to calculate the checksum for each ingested SST file after it is copied or moved. Then, we compare the checksum results with the ingested checksum information: _A)_ if the checksum function name does not match, _verification always reports true_ and we store the DB-generated checksum information in the Manifest; _B)_ if the checksum function name matches and the checksums match, ingestion continues and the checksum information is stored in the Manifest. Otherwise, file ingestion is terminated and file corruption is reported.
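      The decision table above can be condensed into one function; this is a hedged sketch with illustrative names, not the actual ingestion code:

      ```cpp
      #include <string>

      // Illustrative outcomes for the ingestion checksum decision.
      enum class IngestAction { kIgnore, kTrustIngested, kFail, kStoreGenerated };

      IngestAction DecideChecksum(bool db_checksum_enabled,
                                  bool generate_and_verify,
                                  const std::string& ingested_func,
                                  const std::string& db_func,
                                  bool checksums_match) {
        if (!db_checksum_enabled) return IngestAction::kIgnore;
        if (!generate_and_verify) {
          // Trust the ingested checksum only if the function names match.
          return ingested_func == db_func ? IngestAction::kTrustIngested
                                          : IngestAction::kFail;
        }
        // Re-generate with the DB's checksum function and compare.
        if (ingested_func != db_func) return IngestAction::kStoreGenerated;
        return checksums_match ? IngestAction::kTrustIngested
                               : IngestAction::kFail;
      }
      ```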
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6891
      
      Test Plan: added unit test, pass make asan_check
      
      Reviewed By: pdillinger
      
      Differential Revision: D21935988
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 7b55f486632db467e76d72602218d0658aa7f6ed
      b3585a11
  7. 11 Jun 2020 (1 commit)
  8. 10 Jun 2020 (2 commits)
  9. 09 Jun 2020 (2 commits)
    • A
      Fix a bug in looking up duplicate keys with MultiGet (#6953) · 1fb3593f
      Authored by anand76
      Summary:
      When MultiGet is called with duplicate keys, and the key matches the
      largest key in an SST file and the value type is merge, only the first
      instance of the duplicate key is returned with correct results. This is
      due to the incorrect assumption that if a key in a batch is equal to the
      largest key in the file, the next key cannot be present in that file.
      
      Tests:
      Add a new unit test
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6953
      
      Reviewed By: cheng-chang
      
      Differential Revision: D21935898
      
      Pulled By: anand1976
      
      fbshipit-source-id: a2cc327a15150e23fd997546ca64d1c33021cb4c
      1fb3593f
    • L
      Add convenience method GetFileMetaDataByNumber (#6940) · f5e64945
      Authored by Levi Tamasi
      Summary:
      The patch adds a convenience method `GetFileMetaDataByNumber` that
      builds on the `FileLocation` functionality introduced recently (see
      https://github.com/facebook/rocksdb/pull/6862). This method makes it possible to
      retrieve the `FileMetaData` directly as opposed to having to go through
      `LevelFiles` and friends.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6940
      
      Test Plan: `make check`
      
      Reviewed By: cheng-chang
      
      Differential Revision: D21905946
      
      Pulled By: ltamasi
      
      fbshipit-source-id: af99e19de21242b2b4a87594a535c6028d16ee72
      f5e64945
  10. 08 Jun 2020 (1 commit)
    • Y
      Remove unnecessary inclusion of version_edit.h in env (#6952) · 3020df9d
      Authored by Yanqin Jin
      Summary:
      In db_options.cc, we should avoid including header files from the `db` directory to avoid introducing unnecessary dependencies. The reason `version_edit.h` has been included in `db_options.cc` is that we need two constants, `kUnknownChecksum` and `kUnknownChecksumFuncName`. We can put these two constants as `constexpr` in the public header `file_checksum.h`.
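      The pattern is simply header-only constants; values and names below are illustrative stand-ins, not the actual RocksDB definitions:

      ```cpp
      #include <cstdint>

      // Sketch of the idea: constexpr constants in a public header need no
      // .cc definition and pull in no extra dependency for their users.
      constexpr uint32_t kUnknownChecksumSketch = 0;
      constexpr char kUnknownChecksumFuncNameSketch[] = "Unknown";
      ```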
      
      Test plan (devserver):
      make check
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6952
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D21925341
      
      Pulled By: riversand963
      
      fbshipit-source-id: 2902f3b74c97f0cf16c58ad24c095c787c3a40e2
      3020df9d
  11. 06 Jun 2020 (5 commits)
    • L
      Do not print messages to stderr in VersionBuilder (#6948) · f8c2e5a6
      Authored by Levi Tamasi
      Summary:
      RocksDB is an embedded library; we should not write to the application's
      console. Note: in each case, the same information is returned in the form of a
      `Status::Corruption` object.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6948
      
      Test Plan: `make check`
      
      Reviewed By: ajkr
      
      Differential Revision: D21914965
      
      Pulled By: ltamasi
      
      fbshipit-source-id: ae4b66789aa6b659eb8cc2ed4a048187962c86cc
      f8c2e5a6
    • L
      Fix up a VersionBuilder test case (#6942) · 8988f831
      Authored by Levi Tamasi
      Summary:
      We currently do not have any validation that would ensure that the `FileMetaData`
      objects are equivalent when a file gets deleted from the LSM tree and then re-added
      (think trivial moves); however, if we did, this test case would be in violation. The patch
      changes the values used in the test case so they are consistent.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6942
      
      Test Plan: `make check`
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D21911366
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 2f0486f8337373a6a111b6f28433d70507857104
      8988f831
    • Z
      Disable OpenForReadOnly tests in the LITE mode (#6947) · 23e446a1
      Authored by Zitan Chen
      Summary:
      Disable two OpenForReadOnly tests in the LITE mode
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6947
      
      Test Plan: passed db_test2
      
      Reviewed By: cheng-chang
      
      Differential Revision: D21914345
      
      Pulled By: gg814
      
      fbshipit-source-id: 58e81baf5d8cf8adcedaef3966aa3a427bbdf7c2
      23e446a1
    • A
      Check iterator status BlockBasedTableReader::VerifyChecksumInBlocks() (#6909) · 98b0cbea
      Authored by anand76
      Summary:
      The `for` loop in `VerifyChecksumInBlocks` only checks `index_iter->Valid()`, which could be `false` either due to reaching the end of the index or, in the case of a partitioned index, due to a checksum mismatch error when reading a second-level index block. Instead of throwing away the index iterator status, we need to return any errors back to the caller.
      
      Tests:
      Add a test in block_based_table_reader_test.cc.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6909
      
      Reviewed By: pdillinger
      
      Differential Revision: D21833922
      
      Pulled By: anand1976
      
      fbshipit-source-id: bc778ebf1121dbbdd768689de5183f07a9f0beae
      98b0cbea
    • A
      Add logs and stats in DeleteScheduler (#6927) · 2677bd59
      Authored by Akanksha Mahajan
      Summary:
      Add logs and stats for files marked as trash and files deleted immediately in DeleteScheduler
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6927
      
      Test Plan: make check -j64
      
      Reviewed By: riversand963
      
      Differential Revision: D21869068
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: e9f673c4fa8049ce648b23c75d742f2f9c6c57a1
      2677bd59
  12. 05 Jun 2020 (1 commit)
  13. 04 Jun 2020 (9 commits)
    • Y
      Fix a typo (bug) when setting error during Flush (#6928) · 2f326183
      Authored by Yanqin Jin
      Summary:
      As title. The prior change to the line is a typo. Fixing it.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6928
      
      Test Plan: make check
      
      Reviewed By: zhichao-cao
      
      Differential Revision: D21873587
      
      Pulled By: riversand963
      
      fbshipit-source-id: f4837fc8792d7106bc230b7b499dfbb7a2847430
      2f326183
    • Z
      API change: DB::OpenForReadOnly will not write to the file system unless... · 02df00d9
      Authored by Zitan Chen
      API change: DB::OpenForReadOnly will not write to the file system unless create_if_missing is true (#6900)
      
      Summary:
      DB::OpenForReadOnly will not write anything to the file system (i.e., create directories or files for the DB) unless create_if_missing is true.
      
      This change also fixes some subcommands of ldb, which wrote to the file system even though the purpose was read-only.
      
      Two tests for this updated behavior of DB::OpenForReadOnly are also added.
      
      Other minor changes:
      1. Updated HISTORY.md to include this API change of DB::OpenForReadOnly;
      2. Updated the help information for the put and batchput subcommands of ldb with the option [--create_if_missing];
      3. Updated the comment of Env::DeleteDir to emphasize that it returns OK only if the directory to be deleted is empty.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6900
      
      Test Plan: passed make check; also manually tested a few ldb subcommands
      
      Reviewed By: pdillinger
      
      Differential Revision: D21822188
      
      Pulled By: gg814
      
      fbshipit-source-id: 604cc0f0d0326a937ee25a32cdc2b512f9a3be6e
      02df00d9
    • S
      Add (some) getters for options to the C API (#6925) · b7c825d5
      Authored by Stanislav Tkach
      Summary:
      Additionally I have extended the incomplete test added in the https://github.com/facebook/rocksdb/issues/6880.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6925
      
      Reviewed By: ajkr
      
      Differential Revision: D21869788
      
      Pulled By: pdillinger
      
      fbshipit-source-id: e9db80f259c57ca1bdcbc2c66cb938cb1ac26e48
      b7c825d5
    • S
      Revert "Update googletest from 1.8.1 to 1.10.0 (#6808)" (#6923) · afa35188
      Authored by sdong
      Summary:
      This reverts commit 8d87e9ce.
      
      Based on offline discussions, it's too early to upgrade to gtest 1.10, as it prevents some developers from using an older version of gtest to integrate with some other systems. Revert it for now.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6923
      
      Reviewed By: pdillinger
      
      Differential Revision: D21864799
      
      fbshipit-source-id: d0726b1ff649fc911b9378f1763316200bd363fc
      afa35188
    • H
      correct level information in version_set.cc (#6920) · ffe08ffc
      Authored by Hao Chen
      Summary:
      Fix these two issues: https://github.com/facebook/rocksdb/issues/6912 and https://github.com/facebook/rocksdb/issues/6667.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6920
      
      Reviewed By: cheng-chang
      
      Differential Revision: D21864885
      
      Pulled By: ajkr
      
      fbshipit-source-id: 10e21fc1851b67a59d44358f59c64fa5523bd263
      ffe08ffc
    • A
      Add zstd_max_train_bytes to c interop (#6796) · 22e5c513
      Authored by Anatoly Zhmur
      Summary:
      Added setting of the zstd_max_train_bytes compression option parameter to the C interop.
      
      rocksdb_options_set_bottommost_compression_options was using a bool parameter and was thus not exported; it has been updated to unsigned char and added to c.h as well.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6796
      
      Reviewed By: cheng-chang
      
      Differential Revision: D21611471
      
      Pulled By: ajkr
      
      fbshipit-source-id: caaaf153de934837ad9af283c7f8c025ff0b0cf5
      22e5c513
    • P
      Some fixes for gcc 4.8 and add to Travis (#6915) · 43f8a9dc
      Authored by Peter Dillinger
      Summary:
      People keep breaking the gcc 4.8 compilation due to different
      warnings for shadowing member functions with locals. Adding to Travis
      to keep compatibility. (gcc 4.8 is default on CentOS 7.)
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6915
      
      Test Plan: local and Travis
      
      Reviewed By: siying
      
      Differential Revision: D21842894
      
      Pulled By: pdillinger
      
      fbshipit-source-id: bdcd4385127ee5d1cc222d87e53fb3695c32a9d4
      43f8a9dc
    • L
      Improve consistency checks in VersionBuilder (#6901) · 78e291b1
      Authored by Levi Tamasi
      Summary:
      The patch cleans up the code and improves the consistency checks around
      adding/deleting table files in `VersionBuilder`. Namely, it makes the checks
      stricter and improves them in the following ways:
      1) A table file can now only be deleted from the LSM tree using the level it
      resides on. Earlier, there was some unnecessary wiggle room for
      trivially moved files (they could be deleted using a lower level number than
      the actual one).
      2) A table file cannot be added to the tree if it is already present in the tree
      on any level (not just the target level). The earlier code only had an assertion
      (which is a no-op in release builds) that the newly added file is not already
      present on the target level.
      3) The above consistency checks around state transitions are now mandatory,
      as opposed to the earlier `CheckConsistencyForDeletes`, which was a no-op
      in release mode unless `force_consistency_checks` was set to `true`. The rationale
      here is that assuming that the initial state is consistent, a valid transition leads to a
      next state that is also consistent; however, an *invalid* transition offers no such
      guarantee. Hence it makes sense to validate the transitions unconditionally,
      and save `force_consistency_checks` for the paranoid checks that re-validate
      the entire state.
      4) The new checks build on the mechanism introduced in https://github.com/facebook/rocksdb/pull/6862,
      which enables us to efficiently look up the location (level and position within level)
      of files in a `Version` by file number. This makes the consistency checks much more
      efficient than the earlier `CheckConsistencyForDeletes`, which essentially
      performed a linear search.
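      The first two checks above can be sketched with such a lookup structure; this is illustrative only, not the actual `VersionBuilder` code:

      ```cpp
      #include <cstdint>
      #include <unordered_map>

      // Illustrative sketch: a file-number -> level map lets us reject
      // (1) deletions that name the wrong level and (2) additions of a file
      // already present on *any* level, both in O(1).
      struct LevelState {
        std::unordered_map<uint64_t, int> file_level;  // file number -> level

        bool DeleteFile(int level, uint64_t file) {
          auto it = file_level.find(file);
          if (it == file_level.end() || it->second != level) return false;
          file_level.erase(it);
          return true;
        }
        bool AddFile(int level, uint64_t file) {
          // emplace fails if the file is already present anywhere.
          return file_level.emplace(file, level).second;
        }
      };
      ```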
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6901
      
      Test Plan:
      Extended the unit tests and ran:
      
      `make check`
      `make whitebox_crash_test`
      
      Reviewed By: ajkr
      
      Differential Revision: D21822714
      
      Pulled By: ltamasi
      
      fbshipit-source-id: e2b29c8b6da1bf0f59004acc889e4870b2d18215
      78e291b1
    • P
      Fix handling of too-small filter partition size (#6905) · 9360776c
      Authored by Peter Dillinger
      Summary:
      Because ARM and some other platforms have a larger cache line
      size, they have a larger minimum filter size, which causes the recently
      added PartitionedMultiGet test in db_bloom_filter_test to fail on those
      platforms. The code would actually end up using larger partitions,
      because keys_per_partition_ would be 0 and never equal to the number of keys
      added.
      
      The code now attempts to get as close as possible to the small target
      size, while fully utilizing that filter size, if the target partition
      size is smaller than the minimum filter size.
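      The sizing rule can be sketched as follows, under the assumption that partition size is derived from a per-key byte estimate; names are hypothetical, not the actual filter-builder code:

      ```cpp
      #include <algorithm>
      #include <cstdint>

      // Hypothetical sketch: choose keys per filter partition. If the target
      // partition size is below the platform's minimum filter size, size the
      // partition to fully utilize that minimum instead of undershooting.
      uint64_t KeysPerPartition(uint64_t target_partition_bytes,
                                uint64_t min_filter_bytes,
                                double bytes_per_key) {
        uint64_t effective = std::max(target_partition_bytes, min_filter_bytes);
        uint64_t keys = static_cast<uint64_t>(effective / bytes_per_key);
        return std::max<uint64_t>(keys, 1);  // never 0: 0 meant "never cut"
      }
      ```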
      
      Also updated the test to break more uniformly across platforms.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6905
      
      Test Plan: updated test, tested on ARM
      
      Reviewed By: anand1976
      
      Differential Revision: D21840639
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 11684b6d35f43d2e98b85ddb2c8dcfd59d670817
      9360776c
  14. 03 Jun 2020 (3 commits)
    • Z
      Fix potential overflow of unsigned type in for loop (#6902) · 2adb7e37
      Authored by Zhichao Cao
      Summary:
      x.size() - 1 or y - 1 can wrap around to an extremely large value when x.size() or y is 0, since they are of unsigned type. The end condition for i in the for loop then becomes extremely large, potentially causing a segmentation fault. Fix these cases.
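      The hazard and the standard fix look like this in isolation (a generic example, not the patched RocksDB loop):

      ```cpp
      #include <cstddef>
      #include <vector>

      // With unsigned types, x.size() - 1 wraps to a huge value when the
      // vector is empty. Rearranging the comparison avoids the subtraction.
      size_t AdjacentPairs(const std::vector<int>& x) {
        size_t count = 0;
        // Buggy form: for (size_t i = 0; i < x.size() - 1; ++i)  // wraps if empty
        for (size_t i = 0; i + 1 < x.size(); ++i) {  // safe: no subtraction
          ++count;
        }
        return count;
      }
      ```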
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6902
      
      Test Plan: pass make asan_check
      
      Reviewed By: ajkr
      
      Differential Revision: D21843767
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 5b8b88155ac5a93d86246d832e89905a783bb5a1
      2adb7e37
    • S
      Expose rocksdb_options_copy function to the C API (#6880) · 38f988d3
      Authored by Stanislav Tkach
      Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6880
      
      Reviewed By: ajkr
      
      Differential Revision: D21842752
      
      Pulled By: pdillinger
      
      fbshipit-source-id: eda326f551ddd9cb397681544b9e9799ea614e52
      38f988d3
    • P
      For ApproximateSizes, pro-rate table metadata size over data blocks (#6784) · 14eca6bf
      Authored by Peter Dillinger
      Summary:
      The implementation of GetApproximateSizes was inconsistent in
      its treatment of the size of non-data blocks of SST files, sometimes
      including them and sometimes not. This was at its worst when a large portion
      of the table file was used by filters and a small range crossing
      a table boundary was queried: the size estimate would include the large filter size.
      
      It's conceivable that someone might want only to know the size in terms
      of data blocks, but I believe that's unlikely enough to ignore for now.
      Similarly, there's no evidence the internal function ApproximateOffsetOf
      is used for anything other than a one-sided ApproximateSize, so I intend
      to refactor to remove redundancy in a follow-up commit.
      
      So to fix this, GetApproximateSizes (and implementation details
      ApproximateSize and ApproximateOffsetOf) now consistently include in
      their returned sizes a portion of table file metadata (incl filters
      and indexes) based on the size portion of the data blocks in range. In
      other words, if a key range covers data blocks that are X% by size of all
      the table's data blocks, returned approximate size is X% of the total
      file size. It would technically be more accurate to attribute metadata
      based on the number of keys, but that's not computationally efficient with the
      data available and rarely a meaningful difference.
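      The pro-rating rule reduces to simple arithmetic; this sketch is illustrative only:

      ```cpp
      #include <cstdint>

      // Sketch of the rule: if the queried range covers data blocks whose size
      // is X% of all data blocks, report X% of the *total* file size
      // (data plus metadata such as filters and indexes).
      uint64_t ApproximateSizeInRange(uint64_t data_bytes_in_range,
                                      uint64_t total_data_bytes,
                                      uint64_t total_file_bytes) {
        if (total_data_bytes == 0) return 0;
        // Integer pro-rating; a real implementation would guard overflow.
        return total_file_bytes * data_bytes_in_range / total_data_bytes;
      }
      ```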
      
      Also includes miscellaneous comment improvements / clarifications.
      
      Also included is a new approximatesizerandom benchmark for db_bench.
      No significant performance difference seen with this change, whether ~700 ops/sec with cache_index_and_filter_blocks and small cache or ~150k ops/sec without cache_index_and_filter_blocks.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6784
      
      Test Plan:
      Test added to DBTest.ApproximateSizesFilesWithErrorMargin.
      Old code running new test...
      
          [ RUN      ] DBTest.ApproximateSizesFilesWithErrorMargin
          db/db_test.cc:1562: Failure
          Expected: (size) <= (11 * 100), actual: 9478 vs 1100
      
      Other tests updated to reflect consistent accounting of metadata.
      
      Reviewed By: siying
      
      Differential Revision: D21334706
      
      Pulled By: pdillinger
      
      fbshipit-source-id: 6f86870e45213334fedbe9c73b4ebb1d8d611185
      14eca6bf
  15. 02 Jun 2020 (1 commit)
  16. 29 May 2020 (3 commits)
    • A
      avoid `IterKey::UpdateInternalKey()` in `BlockIter` (#6843) · c5abf78b
      Authored by Andrew Kryczka
      Summary:
      `IterKey::UpdateInternalKey()` is an error-prone API as it's
      incompatible with `IterKey::TrimAppend()`, which is used for
      decoding delta-encoded internal keys. This PR stops using it in
      `BlockIter`. Instead, it assigns global seqno in a separate `IterKey`'s
      buffer when needed. The logic for safely getting a Slice with global
      seqno properly assigned is encapsulated in `GlobalSeqnoAppliedKey`.
      `BinarySeek()` is also migrated to use this API (previously it ignored
      global seqno entirely).
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6843
      
      Test Plan:
      benchmark setup -- single file DBs, in-memory, no compression. "normal_db"
      created by regular flush; "ingestion_db" created by ingesting a file. Both
      DBs have same contents.
      
      ```
      $ TEST_TMPDIR=/dev/shm/normal_db/ ./db_bench -benchmarks=fillrandom,compact -write_buffer_size=10485760000 -disable_auto_compactions=true -compression_type=none -num=1000000
      $ ./ldb write_extern_sst ./tmp.sst --db=/dev/shm/ingestion_db/dbbench/ --compression_type=no --hex --create_if_missing < <(./sst_dump --command=scan --output_hex --file=/dev/shm/normal_db/dbbench/000007.sst | awk 'began {print "0x" substr($1, 2, length($1) - 2), "==>", "0x" $5} ; /^Sst file format: block-based/ {began=1}')
      $ ./ldb ingest_extern_sst ./tmp.sst --db=/dev/shm/ingestion_db/dbbench/
      ```
      
      benchmark run command:
      ```
      TEST_TMPDIR=/dev/shm/$DB/ ./db_bench -benchmarks=seekrandom -seek_nexts=10 -use_existing_db=true -cache_index_and_filter_blocks=false -num=1000000 -cache_size=1048576000 -threads=1 -reads=40000000
      ```
      
      results:
      
      | DB | code | throughput |
      |---|---|---|
      | normal_db | master |  267.9 |
      | normal_db   |    PR6843 | 254.2 (-5.1%) |
      | ingestion_db |   master |  259.6 |
      | ingestion_db |   PR6843 | 250.5 (-3.5%) |
      
      Reviewed By: pdillinger
      
      Differential Revision: D21562604
      
      Pulled By: ajkr
      
      fbshipit-source-id: 937596f836930515da8084d11755e1f247dcb264
      c5abf78b
    • Y
      Add timestamp to delete (#6253) · 961c7590
      Authored by Yanqin Jin
      Summary:
      Preliminary user-timestamp support for delete.
      
      If ["a", ts=100] exists, you can delete it by calling `DB::Delete(write_options, key)` in which `write_options.timestamp` points to a `ts` higher than 100.
      
      Implementation
      A new ValueType, i.e. `kTypeDeletionWithTimestamp` is added for deletion marker with timestamp.
      The reason for a separate `kTypeDeletionWithTimestamp`: RocksDB may drop tombstones (keys with kTypeDeletion) when compacting them to the bottom level. This is OK and useful if timestamp is disabled. When timestamp is enabled, should we still reuse `kTypeDeletion`, we may drop the tombstone with a more recent timestamp, causing deleted keys to re-appear.
      
      Test plan (dev server)
      ```
      make check
      ```
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6253
      
      Reviewed By: ltamasi
      
      Differential Revision: D20995328
      
      Pulled By: riversand963
      
      fbshipit-source-id: a9e5c22968ad76f98e3dc6ee0151265a3f0df619
      961c7590
    • L
      Make it possible to look up files by number in VersionStorageInfo (#6862) · e3f953a8
      Authored by Levi Tamasi
      Summary:
      Does what it says on the can: the patch adds a hash map to `VersionStorageInfo`
      that maps file numbers to file locations, i.e. (level, position in level) pairs. This
      will enable stricter consistency checks in `VersionBuilder`. The patch also fixes
      all the unit tests that used duplicate file numbers in a version (which would trigger
      an assertion with the new code).
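      A minimal sketch of such a file number -> location map (illustrative types, not RocksDB's actual `VersionStorageInfo` code):

      ```cpp
      #include <cstddef>
      #include <cstdint>
      #include <unordered_map>

      // (level, position within level) pair for a table file.
      struct FileLocation {
        int level;
        size_t position;
      };

      // Hash map from file number to location, giving O(1) lookup instead of
      // scanning every level.
      class FileLocations {
       public:
        void Add(uint64_t file_number, int level, size_t pos) {
          locations_[file_number] = FileLocation{level, pos};
        }
        const FileLocation* Find(uint64_t file_number) const {
          auto it = locations_.find(file_number);
          return it == locations_.end() ? nullptr : &it->second;
        }
       private:
        std::unordered_map<uint64_t, FileLocation> locations_;
      };
      ```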
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6862
      
      Test Plan:
      `make check`
      `make whitebox_crash_test`
      
      Reviewed By: riversand963
      
      Differential Revision: D21670446
      
      Pulled By: ltamasi
      
      fbshipit-source-id: 2eac249945cf33d8fb8597b26bfff5221e1a861a
      e3f953a8
  17. 28 May 2020 (1 commit)
    • A
      Allow MultiGet users to limit cumulative value size (#6826) · bcefc59e
      Authored by Akanksha Mahajan
      Summary:
      1. Add a value_size option in ReadOptions which limits the cumulative size of the values read in a batch. Once the size exceeds read_options.value_size, all the remaining keys are returned with status Aborted without further fetching any key.
      2. Add a unit test case MultiGetBatchedValueSizeSimple that reads keys from memory and SST files.
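      The cutoff behavior can be sketched as follows; this is illustrative, not the actual MultiGet implementation:

      ```cpp
      #include <cstdint>
      #include <string>
      #include <vector>

      // Sketch of a cumulative value-size limit: once the running total of
      // fetched value bytes exceeds the limit, the remaining keys are left
      // unfetched (in the real API they would be returned as aborted).
      std::vector<bool> FetchWithValueSizeLimit(
          const std::vector<std::string>& values, uint64_t value_size_limit) {
        std::vector<bool> fetched(values.size(), false);
        uint64_t total = 0;
        for (size_t i = 0; i < values.size(); ++i) {
          if (total > value_size_limit) break;  // rest stay unfetched
          fetched[i] = true;
          total += values[i].size();
        }
        return fetched;
      }
      ```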
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6826
      
      Test Plan:
      1. make check -j64
      2. Add a new unit test case
      
      Reviewed By: anand1976
      
      Differential Revision: D21471483
      
      Pulled By: akankshamahajan15
      
      fbshipit-source-id: dea51b8e76d5d1df38ece8cdb29933b1d798b900
      bcefc59e
  18. 21 May 2020 (1 commit)
    • Z
      Generate file checksum in SstFileWriter (#6859) · 545e14b5
      Authored by Zhichao Cao
      Summary:
      If Options.file_checksum_gen_factory is set, RocksDB generates the file checksum during flush and compaction based on the checksum generator created by the factory, and stores the checksum and function name in vstorage and the Manifest.
      
      This PR enables file checksum generation in SstFileWriter and stores the checksum and checksum function name in the ExternalSstFileInfo, such that applications can use them for other purposes, for example, ingesting the file checksum with the files in IngestExternalFile().
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6859
      
      Test Plan: add unit test and pass make asan_check.
      
      Reviewed By: ajkr
      
      Differential Revision: D21656247
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 78a3570c76031d8832e3d2de3d6c79cdf2b675d0
      545e14b5