Fix use_after_free bug when underlying FS enables kFSBuffer (#11645)

Summary: Fix use_after_free bug in async_io MultiReads when the underlying FS enables kFSBuffer. With kFSBuffer, the underlying FS passes its own buffer instead of using the RocksDB scratch in FSReadRequest. Since it's an experimental feature, a hack was added for now to fix the bug. The plan is to make a public API change to remove const from the callback, as const doesn't make sense there.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11645

Test Plan: tested locally

Reviewed By: ltamasi

Differential Revision: D47819907

Pulled By: akankshamahajan15

fbshipit-source-id: 1faf5ef795bf27e2b3a60960374d91274931df8d

69ddf2e0 · akankshamahajan · 05f24254 · 69ddf2e0 · 69ddf2e0

Showing 2 changed files with 11 additions and 2 deletions

HISTORY.md +6 -2

util/async_file_reader.cc +5 -0

--- a/HISTORY.md
+++ b/HISTORY.md
 # Rocksdb Change Log
 > NOTE: Entries for next release do not go here. Follow instructions in `unreleased_history/README.txt`

+## Unreleased
+### Bug Fixes
+* Fix use_after_free bug in async_io MultiReads when underlying FS enabled kFSBuffer. kFSBuffer is when underlying FS pass their own buffer instead of using RocksDB scratch in FSReadRequest. Right now it's an experimental feature.
+
 ## 8.5.0 (07/21/2023)
 ### Public API Changes
 * Removed recently added APIs `GeneralCache` and `MakeSharedGeneralCache()` as our plan changed to stop exposing a general-purpose cache interface. The old forms of these APIs, `Cache` and `NewLRUCache()`, are still available, although general-purpose caching support will be dropped eventually.
@@ -18,7 +22,7 @@
 * Add FSReadRequest::fs_scratch which is a data buffer allocated and provided by underlying FileSystem to RocksDB during reads, when FS wants to provide its own buffer with data instead of using RocksDB provided FSReadRequest::scratch. This can help in cpu optimization by avoiding copy from file system's buffer to RocksDB buffer. More details on how to use/enable it in file_system.h. Right now its supported only for MultiReads(async + sync) with non direct io.
 * Start logging non-zero user-defined timestamp sizes in WAL to signal user key format in subsequent records and use it during recovery. This change will break recovery from WAL files written by early versions that contain user-defined timestamps. The workaround is to ensure there are no WAL files to recover (i.e. by flushing before close) before upgrade.
 * Added new property "rocksdb.obsolete-sst-files-size-property" that reports the size of SST files that have become obsolete but have not yet been deleted or scheduled for deletion
-* Start to record the value of the flag `AdvancedColumnFamilyOptions.persist_user_defined_timestamps` in the Manifest and table properties for a SST file when it is created. And use the recorded flag when creating a table reader for the SST file. This flag is only explicitly record if it's false. 
+* Start to record the value of the flag `AdvancedColumnFamilyOptions.persist_user_defined_timestamps` in the Manifest and table properties for a SST file when it is created. And use the recorded flag when creating a table reader for the SST file. This flag is only explicitly record if it's false.
 * Add a new option OptimisticTransactionDBOptions::shared_lock_buckets that enables sharing mutexes for validating transactions between DB instances, for better balancing memory efficiency and validation contention across DB instances. Different column families and DBs also now use different hash seeds in this validation, so that the same set of key names will not contend across DBs or column families.
 * Add a new ticker `rocksdb.files.marked.trash.deleted` to track the number of trash files deleted by background thread from the trash queue.
 * Add an API NewTieredVolatileCache() in include/rocksdb/cache.h to allocate an instance of a block cache with a primary block cache tier and a compressed secondary cache tier. A cache of this type distributes memory reservations against the block cache, such as WriteBufferManager, table reader memory etc., proportionally across both the primary and compressed secondary cache.
@@ -42,7 +46,7 @@ For Leveled Compaction users, `CompactRange()` with `bottommost_level_compaction
 ### Bug Fixes
 * Reduced cases of illegally using Env::Default() during static destruction by never destroying the internal PosixEnv itself (except for builds checking for memory leaks). (#11538)
 * Fix extra prefetching during seek in async_io when BlockBasedTableOptions.num_file_reads_for_auto_readahead is 1 leading to extra reads than required.
-* Fix a bug where compactions that are qualified to be run as 2 subcompactions were only run as one subcompaction. 
+* Fix a bug where compactions that are qualified to be run as 2 subcompactions were only run as one subcompaction.
 * Fix a use-after-move bug in block.cc.

 ## 8.3.0 (05/19/2023)

--- a/util/async_file_reader.cc
+++ b/util/async_file_reader.cc
@@ -26,6 +26,11 @@ bool AsyncFileReader::MultiReadAsyncImpl(ReadAwaiter* awaiter) {
          FSReadRequest* read_req = static_cast<FSReadRequest*>(cb_arg);
          read_req->status = req.status;
          read_req->result = req.result;
+          if (req.fs_scratch != nullptr) {
+            // TODO akanksha: Revisit to remove the const in the callback.
+            FSReadRequest& req_tmp = const_cast<FSReadRequest&>(req);
+            read_req->fs_scratch = std::move(req_tmp.fs_scratch);
+          }
        },
        &awaiter->read_reqs_[i], &awaiter->io_handle_[i], &awaiter->del_fn_[i],
        /*aligned_buf=*/nullptr);
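
The ownership transfer in the callback hunk can be illustrated in isolation. The sketch below is not RocksDB code: `ReadRequest`, `OnReadDone`, and `SimulateFsRead` are hypothetical stand-ins, and the buffer is simplified to a `std::unique_ptr<char[]>`. It assumes the file system's request object is destroyed right after the callback returns, which is the situation that would leave the caller's `result` dangling if the buffer were not moved. Because the callback receives a `const` reference, `std::move` alone would not compile against a move-only member, hence the `const_cast`. The cast is only safe here because the underlying object is itself non-const.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for a read request (not the RocksDB type).
struct ReadRequest {
  const char* result = nullptr;        // points into fs_scratch when FS-owned
  std::size_t result_len = 0;
  std::unique_ptr<char[]> fs_scratch;  // buffer allocated by the file system
};

// The callback signature hands the request over as const, as in the diff.
using ReadCallback = std::function<void(const ReadRequest&, void*)>;

// Callback: copies the result and takes ownership of the FS-provided buffer.
void OnReadDone(const ReadRequest& req, void* cb_arg) {
  ReadRequest* read_req = static_cast<ReadRequest*>(cb_arg);
  read_req->result = req.result;
  read_req->result_len = req.result_len;
  if (req.fs_scratch != nullptr) {
    // Safe only because the object behind `req` is not itself const.
    ReadRequest& req_tmp = const_cast<ReadRequest&>(req);
    read_req->fs_scratch = std::move(req_tmp.fs_scratch);
  }
}

// Simulates a file system that completes a read into its own buffer and then
// destroys its request object as soon as the callback returns.
void SimulateFsRead(const ReadCallback& cb, void* cb_arg) {
  ReadRequest fs_req;
  fs_req.fs_scratch = std::make_unique<char[]>(6);
  std::memcpy(fs_req.fs_scratch.get(), "hello", 6);
  fs_req.result = fs_req.fs_scratch.get();
  fs_req.result_len = 5;
  cb(fs_req, cb_arg);
}  // fs_req is destroyed here; the buffer survives only if it was moved out.
```

Calling `SimulateFsRead(OnReadDone, &caller_req)` with a default-constructed `ReadRequest caller_req` leaves `caller_req.fs_scratch` owning the buffer, so `caller_req.result` stays valid after the simulated file system request is gone. Without the move inside the `if`, `result` would point at memory freed when `fs_req` went out of scope.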