1. 30 Mar, 2020 1 commit
    • Z
      Use FileChecksumGenFactory for SST file checksum (#6600) · e8d332d9
      Zhichao Cao committed
      Summary:
      In the current implementation, the SST file checksum is calculated by a shared checksum function object, which makes some checksum functions, such as SHA1, hard to apply. With this change, each SST file has its own checksum generator object, created by FileChecksumGenFactory. Users need to implement their own FileChecksumGenerator and factory to plug in a checksum calculation method.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6600
      
      Test Plan: tested with make asan_check
      
      Reviewed By: riversand963
      
      Differential Revision: D20717670
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: 2a74c1c280ac11a07a1980185b43b671acaa71c6
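      The per-file generator idea in the commit above can be sketched as follows. This is a minimal illustration, not RocksDB's actual interface: the class names `ChecksumGenerator`, `ChecksumGeneratorFactory`, `Fnv1aGenerator`, and `Fnv1aFactory` are hypothetical, and 32-bit FNV-1a is only a short stand-in for a real checksum function.

      ```cpp
      #include <cstddef>
      #include <cstdint>
      #include <memory>
      #include <string>

      // Hypothetical interface: one generator accumulates the checksum of
      // exactly one file, so stateful algorithms do not interfere.
      class ChecksumGenerator {
       public:
        virtual ~ChecksumGenerator() = default;
        virtual void Update(const char* data, std::size_t n) = 0;
        virtual std::uint32_t Finalize() const = 0;
        virtual std::string Name() const = 0;
      };

      // Stand-in algorithm (32-bit FNV-1a), chosen only because it is short.
      class Fnv1aGenerator : public ChecksumGenerator {
       public:
        void Update(const char* data, std::size_t n) override {
          for (std::size_t i = 0; i < n; ++i) {
            state_ ^= static_cast<unsigned char>(data[i]);
            state_ *= 16777619u;  // FNV prime
          }
        }
        std::uint32_t Finalize() const override { return state_; }
        std::string Name() const override { return "Fnv1a32"; }

       private:
        std::uint32_t state_ = 2166136261u;  // FNV offset basis
      };

      // Hypothetical factory: called once per file, so no state is shared.
      class ChecksumGeneratorFactory {
       public:
        virtual ~ChecksumGeneratorFactory() = default;
        virtual std::unique_ptr<ChecksumGenerator> Create() const = 0;
      };

      class Fnv1aFactory : public ChecksumGeneratorFactory {
       public:
        std::unique_ptr<ChecksumGenerator> Create() const override {
          return std::unique_ptr<ChecksumGenerator>(new Fnv1aGenerator());
        }
      };
      ```

      Because every file gets a fresh generator, feeding a file in several chunks yields the same result as feeding it in one call, and two files being written concurrently never touch the same state.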
  2. 24 Mar, 2020 1 commit
    • A
      Simplify migration to FileSystem API (#6552) · a9d168cf
      anand76 committed
      Summary:
      The current Env/FileSystem API separation has a couple of issues -
      1. It requires the user to specify 2 options - ```Options::env``` and ```Options::file_system``` - which means they have to make code changes to benefit from the new APIs. Furthermore, there is a risk of accessing the same APIs in two different ways, through Env in the old way and through FileSystem in the new way. The two may not always match, for example, if env is ```PosixEnv``` and FileSystem is a custom implementation. Any stray RocksDB calls to env will use the ```PosixEnv``` implementation rather than the file_system implementation.
      2. There needs to be a simple way for the FileSystem developer to instantiate an Env for backward compatibility purposes.
      
      This PR solves the above issues and simplifies the migration in the following ways -
      1. Embed a shared_ptr to the ```FileSystem``` in the ```Env```, and remove ```Options::file_system``` as a configurable option. This way, no code changes will be required in application code to benefit from the new API. The default Env constructor uses a ```LegacyFileSystemWrapper``` as the embedded ```FileSystem```.
      1a. - This also makes it more robust by ensuring that even if RocksDB
        has some stray calls to Env APIs rather than FileSystem, they will go
        through the same object and thus there is no risk of getting out of
        sync.
      2. Provide a ```NewCompositeEnv()``` API that can be used to construct a
      PosixEnv with a custom FileSystem implementation. This eliminates an
      indirection to call Env APIs, and relieves the FileSystem developer of
      the burden of having to implement wrappers for the Env APIs.
      3. Add a couple of missing FileSystem APIs - ```SanitizeEnvOptions()``` and
      ```NewLogger()```
      
      Tests:
      1. New unit tests
      2. make check and make asan_check
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6552
      
      Reviewed By: riversand963
      
      Differential Revision: D20592038
      
      Pulled By: anand1976
      
      fbshipit-source-id: c3801ad4153f96d21d5a3ae26c92ba454d1bf1f7
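      A rough sketch of the "embed the FileSystem in the Env" design described in the commit above. All classes here are simplified, hypothetical stand-ins (the real `Env` and `FileSystem` have many more methods), and `MakeCompositeEnv` is only loosely modeled on the `NewCompositeEnv()` idea.

      ```cpp
      #include <memory>
      #include <string>
      #include <utility>

      // Hypothetical, heavily simplified stand-in for the storage interface.
      class FileSystem {
       public:
        virtual ~FileSystem() = default;
        virtual std::string Name() const = 0;
      };

      class LegacyFileSystem : public FileSystem {
       public:
        std::string Name() const override { return "legacy"; }
      };

      // Hypothetical custom implementation supplied by an application.
      class CustomFileSystem : public FileSystem {
       public:
        std::string Name() const override { return "custom"; }
      };

      class Env {
       public:
        // Default construction embeds a legacy wrapper.
        Env() : fs_(std::make_shared<LegacyFileSystem>()) {}
        explicit Env(std::shared_ptr<FileSystem> fs) : fs_(std::move(fs)) {}

        const std::shared_ptr<FileSystem>& GetFileSystem() const { return fs_; }

        // A stray Env-level file call forwards to the embedded object, so
        // Env and FileSystem cannot get out of sync.
        std::string FileBackend() const { return fs_->Name(); }

       private:
        std::shared_ptr<FileSystem> fs_;
      };

      // Builds an Env around a caller-provided FileSystem.
      inline std::unique_ptr<Env> MakeCompositeEnv(std::shared_ptr<FileSystem> fs) {
        return std::unique_ptr<Env>(new Env(std::move(fs)));
      }
      ```

      With a single configurable object, application code only has to choose an `Env`; the storage backend travels with it.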
  3. 07 Mar, 2020 1 commit
    • Y
      Iterator with timestamp (#6255) · d93812c9
      Yanqin Jin committed
      Summary:
      Preliminary support for iterator with user timestamp. Current implementation does not consider merge operator and reverse iterator. Auto compaction is also disabled in unit tests.
      
      Create an iterator with timestamp.
      ```
      ...
      read_opts.timestamp = &ts;
      auto* iter = db->NewIterator(read_opts);
      // target is key without timestamp.
      for (iter->Seek(target); iter->Valid(); iter->Next()) {}
      for (iter->SeekToFirst(); iter->Valid(); iter->Next()) {}
      delete iter;
      read_opts.timestamp = &ts1;
      // lower_bound and upper_bound are without timestamp.
      read_opts.iterate_lower_bound = &lower_bound;
      read_opts.iterate_upper_bound = &upper_bound;
      auto* iter1 = db->NewIterator(read_opts);
      // Do Seek or SeekToFirst()
      delete iter1;
      ```
      
      Test plan (dev server)
      ```
      $make check
      ```
      
      Simple benchmarking (dev server)
      1. The overhead introduced by this PR even when timestamp is disabled.
      key size: 16 bytes
      value size: 100 bytes
      Entries: 1000000
      Data resides in main memory, and the workload is meant to stress the iterator.
      Repeated three times on master and this PR.
      - Seek without next
      ```
      ./db_bench -db=/dev/shm/rocksdbtest-1000 -benchmarks=fillseq,seekrandom -enable_pipelined_write=false -disable_wal=true -format_version=3
      ```
      master: 159047.0 ops/sec
      this PR: 158922.3 ops/sec (about 0.1% drop in throughput)
      - Seek and next 10 times
      ```
      ./db_bench -db=/dev/shm/rocksdbtest-1000 -benchmarks=fillseq,seekrandom -enable_pipelined_write=false -disable_wal=true -format_version=3 -seek_nexts=10
      ```
      master: 109539.3 ops/sec
      this PR: 107519.7 ops/sec (2% drop in throughput)
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6255
      
      Differential Revision: D19438227
      
      Pulled By: riversand963
      
      fbshipit-source-id: b66b4979486f8474619f4aa6bdd88598870b0746
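      The relative throughput drops in the benchmark above can be recomputed from the reported ops/sec figures. The helper name `PercentDrop` is hypothetical; the formula is simply (master - PR) / master.

      ```cpp
      // Relative throughput drop of the PR versus master, in percent.
      inline double PercentDrop(double master_ops, double pr_ops) {
        return (master_ops - pr_ops) / master_ops * 100.0;
      }
      ```

      With the numbers reported above, `PercentDrop(159047.0, 158922.3)` is roughly 0.08 and `PercentDrop(109539.3, 107519.7)` is roughly 1.84.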
  4. 21 Feb, 2020 1 commit
    • S
      Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) · fdf882de
      sdong committed
      Summary:
      When two binaries are dynamically linked together, different builds of RocksDB from two sources might cause errors. To give users a way to solve the problem, the RocksDB namespace is changed to a flag that can be overridden at build time.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433
      
      Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag.
      
      Differential Revision: D19977691
      
      fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
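      A minimal sketch of how a build-time overridable namespace can work. The default-to-`rocksdb` fallback is an assumption about the mechanism, and `NamespaceDemo` and the example override value are hypothetical.

      ```cpp
      #include <string>

      // Fall back to the historical name unless the build overrides it, for
      // example with -DROCKSDB_NAMESPACE=my_rocksdb (hypothetical value).
      #ifndef ROCKSDB_NAMESPACE
      #define ROCKSDB_NAMESPACE rocksdb
      #endif

      namespace ROCKSDB_NAMESPACE {
      // Hypothetical function, used only to show that symbols land in the
      // macro-selected namespace.
      inline std::string NamespaceDemo() { return "hello from the library"; }
      }  // namespace ROCKSDB_NAMESPACE
      ```

      Two copies of the library built with different values of the macro export differently named symbols, so they can coexist in one process.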
  5. 11 Feb, 2020 1 commit
    • Z
      Checksum for each SST file and stores in MANIFEST (#6216) · 4369f2c7
      Zhichao Cao committed
      Summary:
      In the current code base, RocksDB generates a checksum for each block and verifies it on use. This PR enables SST file checksums. After an SST file is generated by Flush or Compaction, RocksDB generates the SST file checksum and stores the checksum value and checksum method name in the vs_info and MANIFEST as part of the FileMetadata.
      
      Added enable_sst_file_checksum to Options to enable or disable the file checksum. Added sst_file_checksum to Options so that users can plug in their own SST file checksum calculation method by overriding the SstFileChecksum class. The checksum information includes a uint32_t checksum value and a checksum name (string). A new tool is added to LDB so that users can dump a list of file checksum information from the MANIFEST. If the user enables the file checksum but does not provide an sst_file_checksum instance, RocksDB uses the default crc32checksum implemented in table/sst_file_checksum_crc32c.h
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/6216
      
      Test Plan: Added the testing case in table_test and ldb_cmd_test to verify checksum is correct in different level. Pass make asan_check.
      
      Differential Revision: D19171461
      
      Pulled By: zhichao-cao
      
      fbshipit-source-id: b2e53479eefc5bb0437189eaa1941670e5ba8b87
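      To illustrate the "checksum value plus method name" pair described above, here is a small sketch using a plain bitwise CRC32C. It is not RocksDB's implementation (which may mask or accelerate the CRC), and `FileChecksumInfo` and `ChecksumWholeFile` are hypothetical names.

      ```cpp
      #include <cstdint>
      #include <string>

      // Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78.
      inline std::uint32_t Crc32c(const std::string& data) {
        std::uint32_t crc = 0xFFFFFFFFu;
        for (unsigned char byte : data) {
          crc ^= byte;
          for (int bit = 0; bit < 8; ++bit) {
            // Conditionally XOR the polynomial when the low bit is set.
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
          }
        }
        return crc ^ 0xFFFFFFFFu;
      }

      // Hypothetical record: what would be kept alongside the file metadata.
      struct FileChecksumInfo {
        std::uint32_t checksum;
        std::string method_name;
      };

      inline FileChecksumInfo ChecksumWholeFile(const std::string& contents) {
        return FileChecksumInfo{Crc32c(contents), "crc32c"};
      }
      ```

      Storing the method name next to the value lets a reader tell which function to rerun when verifying the file later.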
  6. 14 Dec, 2019 1 commit
    • A
      Introduce a new storage specific Env API (#5761) · afa2420c
      anand76 committed
      Summary:
      The current Env API encompasses both storage/file operations and OS related operations. Most of the APIs return a Status, which does not carry enough metadata about an error, such as whether it is retryable or not, the scope (i.e. fault domain) of the error etc., that may be required in order to properly handle a storage error. The file APIs also do not provide enough control over the IO SLA, such as timeout, prioritization, hinting about placement and redundancy etc.
      
      This PR separates out the file/storage APIs from Env into a new FileSystem class. The APIs are updated to return an IOStatus with metadata about the error, as well as to take an IOOptions structure as input in order to allow more control over the IO.
      
      The user can set both ```options.env``` and ```options.file_system``` to specify that RocksDB should use the former for OS related operations and the latter for storage operations. Internally, a ```CompositeEnvWrapper``` has been introduced that inherits from ```Env``` and redirects individual methods to either an ```Env``` implementation or the ```FileSystem``` as appropriate. When options are sanitized during ```DB::Open```, ```options.env``` is replaced with a newly allocated ```CompositeEnvWrapper``` instance if both env and file_system have been specified. This way, the rest of the RocksDB code can continue to function as before.
      
      This PR also ports PosixEnv to the new API by splitting it into two - PosixEnv and PosixFileSystem. PosixEnv is defined as a sub-class of CompositeEnvWrapper, and threading/time functions are overridden with Posix specific implementations in order to avoid an extra level of indirection.
      
      The ```CompositeEnvWrapper``` translates ```IOStatus``` return code to ```Status```, and sets the severity to ```kSoftError``` if the io_status is retryable. The error handling code in RocksDB can then recover the DB automatically.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5761
      
      Differential Revision: D18868376
      
      Pulled By: anand1976
      
      fbshipit-source-id: 39efe18a162ea746fabac6360ff529baba48486f
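      The IOStatus-to-Status translation mentioned at the end of the commit above can be sketched like this. The structs are hypothetical simplifications, and mapping non-retryable errors to `kHardError` is an assumption made for the example; the commit only states the retryable case.

      ```cpp
      #include <string>

      enum class Severity { kNoError, kSoftError, kHardError };

      // Hypothetical, simplified error types.
      struct IOStatus {
        bool ok;
        bool retryable;
        std::string message;
      };

      struct Status {
        bool ok;
        Severity severity;
        std::string message;
      };

      // Retryable IO errors become soft errors so that recovery can be
      // attempted automatically; other errors are treated as hard here
      // (an assumption for this sketch).
      inline Status ToStatus(const IOStatus& io) {
        if (io.ok) {
          return Status{true, Severity::kNoError, ""};
        }
        Severity severity =
            io.retryable ? Severity::kSoftError : Severity::kHardError;
        return Status{false, severity, io.message};
      }
      ```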
  7. 11 Dec, 2019 1 commit
  8. 18 Oct, 2019 1 commit
    • L
      Support decoding blob indexes in sst_dump (#5926) · fdc1cb43
      Levi Tamasi committed
      Summary:
      The patch adds a new command line parameter --decode_blob_index to sst_dump.
      If this switch is specified, sst_dump prints blob indexes in a human readable format,
      printing the blob file number, offset, size, and expiration (if applicable) for blob
      references, and the blob value (and expiration) for inlined blobs.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5926
      
      Test Plan:
      Used db_bench's BlobDB mode to generate SST files containing blob references with
      and without expiration, as well as inlined blobs with and without expiration (note: the
      latter are stored as plain values), and confirmed sst_dump correctly prints all four types
      of records.
      
      Differential Revision: D17939077
      
      Pulled By: ltamasi
      
      fbshipit-source-id: edc5f58fee94ba35f6699c6a042d5758f5b3963d
  9. 09 Oct, 2019 1 commit
    • Y
      Support custom env in sst_dump (#5845) · 167cdc9f
      Yanqin Jin committed
      Summary:
      This PR allows for the creation of custom env when using sst_dump. If
      the user does not set options.env or sets options.env to nullptr, then sst_dump
      will automatically try to create a custom env depending on the path to the sst
      file or db directory. In order to use this feature, the user must call
      ObjectRegistry::Register() beforehand.
      
      Test Plan (on devserver):
      ```
      $make all && make check
      ```
      All tests must pass to ensure this change does not break anything.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5845
      
      Differential Revision: D17678038
      
      Pulled By: riversand963
      
      fbshipit-source-id: 58ecb4b3f75246d52b07c4c924a63ee61c1ee626
  10. 21 Sep, 2019 1 commit
  11. 16 Aug, 2019 1 commit
    • S
      Add command "list_file_range_deletes" in ldb (#5615) · bd2c753d
      sdong committed
      Summary:
      Add a command in ldb so that users can print out tombstones in SST files.
      In order to test the code, change the interface of LDBCommandRunner::RunCommand() so that it doesn't exit the program but returns the status code.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5615
      
      Test Plan: Add a new unit test
      
      Differential Revision: D16550326
      
      fbshipit-source-id: 88ddfe6984bdcbb3a528abdd115089df09eba52e
  12. 24 Jul, 2019 2 commits
    • M
      The ObjectRegistry class replaces the Registrar and NewCustomObjects.… (#5293) · cfcf045a
      Mark Rambacher committed
      Summary:
      The ObjectRegistry class replaces the Registrar and NewCustomObjects.  Objects are registered with the registry by Type (the class must implement the static const char *Type() method).
      
      This change is necessary for a few reasons:
      - By having a class (rather than static template instances), the class can be passed between compilation units, meaning that objects could be registered and shared from a dynamic library with an executable.
      - By having a class with instances, different units could have different objects registered.  This could be useful if, for example, one Option allowed for a dynamic library and one did not.
      
      When combined with some other PRs (being able to load shared libraries, a Configurable interface to configure objects to/from string), this code will allow objects in external shared libraries to be added to a RocksDB image at run-time, rather than requiring every new extension to be built into the main library and called explicitly by every program.
      
      Test plan (on riversand963's  devserver)
      ```
      $COMPILE_WITH_ASAN=1 make -j32 all && sleep 1 && make check
      ```
      All tests pass.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5293
      
      Differential Revision: D16363396
      
      Pulled By: riversand963
      
      fbshipit-source-id: fbe4acb615bfc11103eef40a0b288845791c0180
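      A sketch of a registry that is an ordinary class instance, keyed by a static `Type()` method, in the spirit of the commit above. `Registry`, `Logger`, and `StderrLogger` are hypothetical names and the real ObjectRegistry API differs; the point is only that separate instances can hold separate registrations.

      ```cpp
      #include <functional>
      #include <map>
      #include <memory>
      #include <string>

      // Hypothetical registrable base class exposing a static Type().
      class Logger {
       public:
        static const char* Type() { return "Logger"; }
        virtual ~Logger() = default;
        virtual std::string Describe() const = 0;
      };

      class StderrLogger : public Logger {
       public:
        std::string Describe() const override { return "stderr"; }
      };

      // Hypothetical registry: an instance rather than static templates, so
      // it can be passed around and different units can hold different ones.
      class Registry {
       public:
        template <typename T>
        void Register(const std::string& name,
                      std::function<std::shared_ptr<T>()> factory) {
          factories_[Key<T>(name)] = [factory]() -> std::shared_ptr<void> {
            return factory();
          };
        }

        // Returns nullptr when nothing is registered under (Type, name).
        template <typename T>
        std::shared_ptr<T> Create(const std::string& name) const {
          auto it = factories_.find(Key<T>(name));
          if (it == factories_.end()) return nullptr;
          return std::static_pointer_cast<T>(it->second());
        }

       private:
        template <typename T>
        static std::string Key(const std::string& name) {
          return std::string(T::Type()) + ":" + name;
        }

        std::map<std::string, std::function<std::shared_ptr<void>()>> factories_;
      };
      ```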
    • S
      ldb sometimes specify a string-append merge operator (#5607) · 3782accf
      sdong committed
      Summary:
      Right now, the default ldb cannot scan a DB that contains merge operands. There is no harm in providing a general merge operator so that it can at least print something.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5607
      
      Test Plan: Run ldb against a DB with merge operands and see the outputs.
      
      Differential Revision: D16442634
      
      fbshipit-source-id: c66c414ec07f219cfc6e6ec2cc14c783ee95df54
  13. 20 Jul, 2019 1 commit
  14. 10 Jul, 2019 1 commit
    • S
      Allow ldb to open DB as secondary (#5537) · aa0367aa
      sdong committed
      Summary:
      Right now ldb can open a running DB in read-only mode. However, it might leave info log files in the read-only DB's directory. Add an option to open the DB as secondary to avoid this.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5537
      
      Test Plan:
      Run
      ./ldb scan  --max_keys=10 --db=/tmp/rocksdbtest-2491/dbbench --secondary_path=/tmp --no_value --hex
      and
      ./ldb get 0x00000000000000103030303030303030 --hex --db=/tmp/rocksdbtest-2491/dbbench --secondary_path=/tmp
      against a normal db_bench run and observe the output changes. Also observe that no new info log files are created under /tmp/rocksdbtest-2491/dbbench.
      Run without --secondary_path and observe that new info log files are created under /tmp/rocksdbtest-2491/dbbench.
      
      Differential Revision: D16113886
      
      fbshipit-source-id: 4e09dec47c2528f6ca08a9e7a7894ba2d9daebbb
  15. 04 Jul, 2019 1 commit
  16. 28 Jun, 2019 1 commit
  17. 14 Jun, 2019 1 commit
  18. 07 Jun, 2019 1 commit
  19. 01 Jun, 2019 1 commit
  20. 30 May, 2019 1 commit
  21. 17 Apr, 2019 1 commit
  22. 05 Apr, 2019 1 commit
  23. 27 Mar, 2019 1 commit
    • Y
      Support for single-primary, multi-secondary instances (#4899) · 9358178e
      Yanqin Jin committed
      Summary:
      This PR allows RocksDB to run in single-primary, multi-secondary process mode.
      The writer is a regular RocksDB (e.g. a `DBImpl`) instance playing the role of a primary.
      Multiple `DBImplSecondary` processes (secondaries) share the same set of SST files, MANIFEST, WAL files with the primary. Secondaries tail the MANIFEST of the primary and apply updates to their own in-memory state of the file system, e.g. `VersionStorageInfo`.
      
      This PR has several components:
      1. (Originally in #4745). Add a `PathNotFound` subcode to `IOError` to denote the failure when a secondary tries to open a file which has been deleted by the primary.
      
      2. (Similar to #4602). Add `FragmentBufferedReader` to handle partially-read, trailing record at the end of a log from where future read can continue.
      
      3. (Originally in #4710 and #4820). Add implementation of the secondary, i.e. `DBImplSecondary`.
      3.1 Tail the primary's MANIFEST during recovery.
      3.2 Tail the primary's MANIFEST during normal processing by calling `ReadAndApply`.
      3.3 Tailing WAL will be in a future PR.
      
      4. Add an example in 'examples/multi_processes_example.cc' to demonstrate the usage of secondary RocksDB instance in a multi-process setting. Instructions to run the example can be found at the beginning of the source code.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4899
      
      Differential Revision: D14510945
      
      Pulled By: riversand963
      
      fbshipit-source-id: 4ac1c5693e6012ad23f7b4b42d3c374fecbe8886
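      The idea behind handling a partially-read trailing record while tailing a log can be sketched as follows. This is not the real `FragmentBufferedReader`: `TailingRecordReader` is a hypothetical class, and newline-terminated records stand in for the actual log record format.

      ```cpp
      #include <cstddef>
      #include <string>
      #include <vector>

      // Hypothetical tailing reader: complete records are returned, while an
      // incomplete trailing fragment is buffered until more bytes arrive.
      class TailingRecordReader {
       public:
        // Appends newly read bytes and returns every record completed so far.
        std::vector<std::string> Feed(const std::string& bytes) {
          buffer_ += bytes;
          std::vector<std::string> records;
          std::size_t start = 0;
          for (;;) {
            std::size_t end = buffer_.find('\n', start);
            if (end == std::string::npos) break;
            records.push_back(buffer_.substr(start, end - start));
            start = end + 1;
          }
          buffer_.erase(0, start);  // keep only the incomplete fragment
          return records;
        }

        const std::string& pending() const { return buffer_; }

       private:
        std::string buffer_;
      };
      ```

      A secondary that hits the end of the primary's log in the middle of a record can stop, then continue from the buffered fragment on its next read instead of failing or re-reading from the start.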
  24. 26 Mar, 2019 1 commit
    • Z
      ldb command parsing: allow option values to contain equals signs (#5088) · 52e6404e
      Zhongyi Xie committed
      Summary:
      Right now the ldb command doesn't allow option values that contain an equals sign. For example,
      ```
      ldb --db=/tmp/test scan --from='q=3' --max_keys=1
      ```
      after parsing, ldb ends up with the options 'db' and 'max_keys' and a flag 'from', instead of treating 'from' as an option with the value 'q=3'.
      This PR updates the parsing logic so that it supports such cases.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/5088
      
      Differential Revision: D14600869
      
      Pulled By: miasantreble
      
      fbshipit-source-id: c6ef518c74a98d7b6675ea5954ae08b1bda5554e
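      One way to accept values that contain equals signs is to split each argument on the first `=` only. The sketch below shows that approach; `ParseOption` is a hypothetical helper and not necessarily how ldb implements it.

      ```cpp
      #include <cstddef>
      #include <string>

      // Returns true and fills key/value when arg looks like --key=value.
      // Only the first '=' separates key from value, so the value may itself
      // contain '='. Arguments without '=' are treated as flags (false).
      inline bool ParseOption(const std::string& arg, std::string* key,
                              std::string* value) {
        const std::string prefix = "--";
        if (arg.compare(0, prefix.size(), prefix) != 0) return false;
        std::size_t eq = arg.find('=');
        if (eq == std::string::npos) return false;
        *key = arg.substr(prefix.size(), eq - prefix.size());
        *value = arg.substr(eq + 1);
        return true;
      }
      ```

      For `--from=q=3` this yields the key `from` and the value `q=3`.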
  25. 13 Mar, 2019 1 commit
  26. 15 Feb, 2019 1 commit
    • M
      Apply modernize-use-override (2nd iteration) · ca89ac2b
      Michael Liu committed
      Summary:
      Use C++11’s override and remove virtual where applicable.
      Changes are automatically generated.
      
      Reviewed By: Orvid
      
      Differential Revision: D14090024
      
      fbshipit-source-id: 1e9432e87d2657e1ff0028e15370a85d1739ba2a
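      For readers unfamiliar with the check, this is the kind of rewrite it performs. The class names are hypothetical and only illustrate the pattern.

      ```cpp
      #include <string>

      class Comparator {
       public:
        virtual ~Comparator() = default;
        virtual std::string Name() const { return "base"; }
      };

      // Before the rewrite the derived class would repeat `virtual`:
      //   virtual std::string Name() const { ... }
      // After it, `override` replaces `virtual`, and the compiler rejects the
      // declaration if no matching virtual function exists in the base class.
      class BytewiseComparator : public Comparator {
       public:
        std::string Name() const override { return "bytewise"; }
      };
      ```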
  27. 12 Jan, 2019 1 commit
  28. 04 Jan, 2019 1 commit
  29. 14 Dec, 2018 1 commit
  30. 28 Nov, 2018 1 commit
    • H
      Add SstFileReader to read sst files (#4717) · 5e72bc11
      Huachao Huang committed
      Summary:
      A user friendly sst file reader is useful when we want to access sst
      files outside of RocksDB. For example, we can generate an sst file
      with SstFileWriter and send it to other places, then use SstFileReader
      to read the file and process the entries in other ways.
      
      Also rename the original SstFileReader to SstFileDumper because of the
      name conflict; SstFileDumper also seems more appropriate for tools.
      
      TODO: there is only a very simple test now, because I want to get some feedback first.
      If the changes look good, I will add more tests soon.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4717
      
      Differential Revision: D13212686
      
      Pulled By: ajkr
      
      fbshipit-source-id: 737593383264c954b79e63edaf44aaae0d947e56
  31. 10 Nov, 2018 1 commit
    • S
      Update all unique/shared_ptr instances to be qualified with namespace std (#4638) · dc352807
      Sagar Vemuri committed
      Summary:
      Ran the following commands to recursively change all the files under RocksDB:
      ```
      find . -type f -name "*.cc" -exec sed -i 's/ unique_ptr/ std::unique_ptr/g' {} +
      find . -type f -name "*.cc" -exec sed -i 's/<unique_ptr/<std::unique_ptr/g' {} +
      find . -type f -name "*.cc" -exec sed -i 's/ shared_ptr/ std::shared_ptr/g' {} +
      find . -type f -name "*.cc" -exec sed -i 's/<shared_ptr/<std::shared_ptr/g' {} +
      ```
      Running `make format` updated some formatting on the files touched.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4638
      
      Differential Revision: D12934992
      
      Pulled By: sagar0
      
      fbshipit-source-id: 45a15d23c230cdd64c08f9c0243e5183934338a8
  32. 31 Oct, 2018 1 commit
    • A
      Promote rocksdb.{deleted.keys,merge.operands} to main table properties (#4594) · eaaf1a6f
      Abhishek Madan committed
      Summary:
      Since the number of range deletions is reported in
      TableProperties, it is confusing to not report the number of merge
      operands and point deletions as top-level properties; they are
      accessible through the public API, but since they are not the "main"
      properties, they do not appear in aggregated table properties, or the
      string representation of table properties.
      
      This change promotes those two property keys to
      `rocksdb/table_properties.h`, adds corresponding uint64 members for
      them, deprecates the old access methods `GetDeletedKeys()` and
      `GetMergeOperands()` (though they are still usable for now), and removes
      `InternalKeyPropertiesCollector`. The property key strings are the same
      as before this change, so this should be able to read DBs written from older
      versions (though I haven't tested this yet).
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4594
      
      Differential Revision: D12826893
      
      Pulled By: abhimadan
      
      fbshipit-source-id: 9e4e4fbdc5b0da161c89582566d184101ba8eb68
  33. 20 Oct, 2018 1 commit
    • Y
      Add read retry support to log reader (#4394) · da4aa59b
      Yanqin Jin committed
      Summary:
      Current `log::Reader` does not perform retry after encountering `EOF`. In the future, we need the log reader to be able to retry tailing the log even after `EOF`.
      
      Current implementation is simple. It does not provide more advanced retry policies. Will address this in the future.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4394
      
      Differential Revision: D9926508
      
      Pulled By: riversand963
      
      fbshipit-source-id: d86d145792a41bd64a72f642a2a08c7b7b5201e1
  34. 14 Sep, 2018 1 commit
  35. 17 Aug, 2018 1 commit
  36. 10 Aug, 2018 1 commit
    • Y
      Add SST ingestion to ldb (#4205) · de7f423a
      Yanqin Jin committed
      Summary:
      We add two subcommands `write_extern_sst` and `ingest_extern_sst` to ldb. This PR avoids changing existing code because we hope to cherry-pick to earlier releases to support compatibility check for external SST file ingestion.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4205
      
      Differential Revision: D9112711
      
      Pulled By: riversand963
      
      fbshipit-source-id: 7cae88380d4de86da8440230e87eca66755648e4
  37. 07 Jul, 2018 1 commit
    • M
      WriteUnPrepared: Add support for recovering WriteUnprepared transactions (#4078) · b9846370
      Manuel Ung committed
      Summary:
      This adds support for recovering WriteUnprepared transactions through the following changes:
      - The information in `RecoveredTransaction` is extended so that it can reference multiple batches.
      - `MarkBeginPrepare` is extended with a bool indicating whether it is an unprepared begin, and this is passed down to `InsertRecoveredTransaction` to indicate whether the current transaction is prepared or not.
      - `WriteUnpreparedTxnDB::Initialize` is overridden so that it will rollback unprepared transactions from the recovered transactions. This can be done without updating the prepare heap/commit map, because this is before the DB has finished initializing, and after writing the rollback batch, those data structures should not contain information about the rolled back transaction anyway.
      
      Commit/Rollback of live transactions is still unimplemented and will come later.
      Pull Request resolved: https://github.com/facebook/rocksdb/pull/4078
      
      Differential Revision: D8703382
      
      Pulled By: lth
      
      fbshipit-source-id: 7e0aada6c23bd39299f1f20d6c060492e0e6b60a
  38. 21 Jun, 2018 1 commit
  39. 08 Jun, 2018 1 commit