Commit 5fc57eec, authored by Hui Xiao, committed by Facebook GitHub Bot

Support parallel read and write/delete to same key in NonBatchedOpsStressTest (#11058)

Summary:
**Context:**
Currently, `NonBatchedOpsStressTest` does not allow multi-threaded reads (i.e., Get, Iterator) and writes (i.e., Put, Merge) or deletes to the same key. Every read or write/delete operation acquires a lock (`GetLocksForKeyRange`) on the target key to gain exclusive access to it. This does not match RocksDB's support for multi-threaded reads and writes/deletes to the same key; that is, concurrent threads can issue reads, writes, and deletes to RocksDB without external locking. This is therefore a gap in our testing coverage.

To close the gap, the biggest challenge is verifying the db value against the expected state in the presence of parallel reads and writes/deletes. The challenge arises because a read/write/delete to the db and the corresponding read/write to the expected state do not happen within one atomic operation. Therefore we may not know the exact expected state for a given db read: by the time we read the expected state for that db read, a write to the expected state on behalf of another db write to the same key might have changed it.

**Summary:**
Credit to ajkr for the idea: we now solve this challenge by breaking the 32-bit expected value of a key into different parts that can be read and written in parallel.

Basically, we divide the 32-bit expected value into `value_base` (corresponding to the previous whole 32 bits, but now with a smaller allowed value base range), `pending_write` (i.e., whether there is an ongoing concurrent write), `del_counter` (i.e., the number of times a value has been deleted, analogous to `value_base` for writes), `pending_delete` (similar to `pending_write`), and `deleted` (i.e., whether a key is deleted).
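
As a rough illustration of this layout, the sketch below decodes a raw 32-bit expected value into its five parts. The mask values are taken from the `ExpectedValue` class in this diff; the `DecodedExpectedValue` struct and the `Decode` function are hypothetical names used only for this example, and the deletion counter is shifted down to a plain count here, whereas the diff keeps it in place.

```cpp
#include <cstdint>

// Mask values copied from ExpectedValue in this diff; names are illustrative.
constexpr uint32_t kValueBaseMask = 0x7fff;       // bits 0-14
constexpr uint32_t kPendingWriteMask = 1u << 15;  // bit 15
constexpr uint32_t kDelCounterMask = 0x3fff0000;  // bits 16-29
constexpr uint32_t kPendingDelMask = 1u << 30;    // bit 30
constexpr uint32_t kDeletedMask = 1u << 31;       // bit 31

// The five parts together cover all 32 bits exactly once.
static_assert((kValueBaseMask | kPendingWriteMask | kDelCounterMask |
               kPendingDelMask | kDeletedMask) == 0xffffffffu,
              "masks must cover the whole 32-bit value");

// Hypothetical helper type, not part of the diff.
struct DecodedExpectedValue {
  uint32_t value_base;
  bool pending_write;
  uint32_t del_counter;  // shifted down so it reads as a plain count
  bool pending_delete;
  bool deleted;
};

// Split a raw expected value into its parts.
inline DecodedExpectedValue Decode(uint32_t raw) {
  return DecodedExpectedValue{
      raw & kValueBaseMask,
      (raw & kPendingWriteMask) != 0,
      (raw & kDelCounterMask) >> 16,
      (raw & kPendingDelMask) != 0,
      (raw & kDeletedMask) != 0,
  };
}
```

For instance, `Decode(0x00018005)` yields a value base of 5, a pending write, and a deletion count of 1.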

Also, we need to use an incremental `value_base` instead of a random value base as before, because we want to control the range of value bases that a correct db read result can fall in when reads and writes run in parallel. That way, we can verify the correctness of the read against the expected state more easily. The cost, which we are willing to accept, is reduced randomness of the values generated in NonBatchedOpsStressTest.
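
To make the range argument concrete, here is a minimal sketch of the check, modeled on `ExpectedValueHelper::InExpectedValueBaseRange` in this diff. The free functions and their parameter names are assumptions for illustration; the only facts taken from the source are the 15-bit value base mask and the wrap-around handling.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kValueBaseMask = 0x7fff;  // 15-bit value base, as in the diff

// Value base after one more write, wrapping within the 15-bit range.
inline uint32_t NextValueBase(uint32_t value_base) {
  return (value_base + 1) & kValueBaseMask;
}

// Returns true if `observed` could be the result of a read that overlapped
// writes moving the value base from `pre` (expected state read before the db
// read) to `post_final` (expected state read after the db read, with any
// pending write counted as finished).
inline bool InValueBaseRange(uint32_t observed, uint32_t pre,
                             uint32_t post_final) {
  assert(observed <= kValueBaseMask);
  if (pre <= post_final) {
    // No wrap-around: a single contiguous range.
    return pre <= observed && observed <= post_final;
  }
  // The counter wrapped: accept [0, post_final] or [pre, max].
  return observed <= post_final ||
         (pre <= observed && observed <= kValueBaseMask);
}
```

With a random value base there would be no such ordering, so a reader could not bound which values a concurrent writer might have produced.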

(For the detailed algorithm of how to use these parts to infer the expected state of a key, see the PR.)
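
The existence part of that inference can be sketched from the `ExpectedValueHelper` functions in this diff: a reader takes one snapshot of the expected value before the db read and one after, then compares counters. The `Snapshot` struct and the free functions below are simplified, hypothetical stand-ins, and the deletion counter is stored as a plain count rather than in bits 16-29.

```cpp
#include <cstdint>

constexpr uint32_t kValueBaseMax = 0x7fff;   // 15-bit value base
constexpr uint32_t kDelCounterMax = 0x3fff;  // 14-bit deletion counter

// Hypothetical unpacked view of an expected value.
struct Snapshot {
  uint32_t value_base;
  bool pending_write;
  uint32_t del_counter;
  bool pending_delete;
  bool deleted;
};

// Treat a pending write as if it had already finished.
inline uint32_t FinalValueBase(const Snapshot& s) {
  return s.pending_write ? (s.value_base + 1) & kValueBaseMax : s.value_base;
}

// Treat a pending delete as if it had already finished.
inline uint32_t FinalDelCounter(const Snapshot& s) {
  return s.pending_delete ? (s.del_counter + 1) & kDelCounterMax
                          : s.del_counter;
}

// The key existed for the whole read: it was live before the read and no
// delete finished or started in between.
inline bool MustHaveExisted(const Snapshot& pre, const Snapshot& post) {
  return !pre.deleted && pre.del_counter == FinalDelCounter(post);
}

// The key was absent for the whole read: it was deleted before the read and
// no write finished or started in between.
inline bool MustHaveNotExisted(const Snapshot& pre, const Snapshot& post) {
  return pre.deleted && pre.value_base == FinalValueBase(post);
}
```

When neither function returns true, the read overlapped a conflicting operation and either outcome may be legitimate.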

Misc: hide the value_base detail from callers of ExpectedState by abstracting the related logic into the ExpectedValue class
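
From a caller's point of view, the abstraction amounts to a prepare/commit pair around the db write, as in `PreloadDbAndReopenAsReadOnly` in this diff. The sketch below is a simplified, hypothetical analogue of `PreparePut` and `PendingExpectedValue::Commit` operating on a single atomic slot; the class and function names are assumptions, and only the put path is shown.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr uint32_t kValueBaseMask = 0x7fff;
constexpr uint32_t kPendingWriteMask = 1u << 15;
constexpr uint32_t kDeletedMask = 1u << 31;

// Hypothetical, simplified analogue of PendingExpectedValue.
class PendingPut {
 public:
  PendingPut(std::atomic<uint32_t>* slot, uint32_t final_value)
      : slot_(slot), final_value_(final_value) {}

  // Value base the caller should use to generate the value written to the db.
  uint32_t FinalValueBase() const { return final_value_ & kValueBaseMask; }

  // Publish the final expected value once the db write has returned.
  void Commit() {
    // Keep the expected-value update from being reordered before the db write.
    std::atomic_thread_fence(std::memory_order_release);
    slot_->store(final_value_);
  }

 private:
  std::atomic<uint32_t>* const slot_;
  const uint32_t final_value_;
};

// Mark the slot as having a write in flight and compute the value it will
// hold after the write. The caller must hold the key's lock.
inline PendingPut PreparePut(std::atomic<uint32_t>* slot) {
  assert(slot != nullptr);
  const uint32_t orig = slot->load();
  const uint32_t next_base = ((orig & kValueBaseMask) + 1) & kValueBaseMask;
  uint32_t final_value = (orig & ~kValueBaseMask) | next_base;
  final_value &= ~(kPendingWriteMask | kDeletedMask);
  slot->store(orig | kPendingWriteMask);
  // Keep the db write from being reordered before the pending flag is set.
  std::atomic_thread_fence(std::memory_order_release);
  return PendingPut(slot, final_value);
}
```

A caller would invoke `PreparePut`, write a value generated from `FinalValueBase()` to the db, and then call `Commit()`. If the process crashes in between, the slot still carries the pending flag.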

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11058

Test Plan:
- Manual test with a small number of keys (i.e., a high chance of parallel reads and writes/deletes to the same key) with equally distributed reads/writes/deletes for 30 min
```
python3 tools/db_crashtest.py --simple {blackbox|whitebox} --sync_fault_injection=1 --skip_verifydb=0 --continuous_verification_interval=1000 --clear_column_family_one_in=0 --max_key=10 --column_families=1 --threads=32 --readpercent=25 --writepercent=25 --nooverwritepercent=0 --iterpercent=25 --verify_iterator_with_expected_state_one_in=1 --num_iterations=5 --delpercent=15 --delrangepercent=10 --range_deletion_width=5 --use_merge={0|1} --use_put_entity_one_in=0 --use_txn=0 --verify_before_write=0 --user_timestamp_size=0 --compact_files_one_in=1000 --compact_range_one_in=1000 --flush_one_in=1000 --get_property_one_in=1000 --ingest_external_file_one_in=100 --backup_one_in=100 --checkpoint_one_in=100 --approximate_size_one_in=0 --acquire_snapshot_one_in=100 --use_multiget=0 --prefixpercent=0 --get_live_files_one_in=1000 --manual_wal_flush_one_in=1000 --pause_background_one_in=1000 --target_file_size_base=524288 --write_buffer_size=524288 --verify_checksum_one_in=1000 --verify_db_one_in=1000
```
- Rehearsal stress test with normal and aggressive parameters to see whether this change can find what the existing stress test can find (i.e., no regression in testing capability)
- [Ongoing] Try to find new bugs with this change that are not found by the current NonBatchedOpsStressTest, which has no parallel reads and writes/deletes to the same key

Reviewed By: ajkr

Differential Revision: D42257258

Pulled By: hx235

fbshipit-source-id: e6fdc18f1fad3753e5ac91731483a644d9b5b6eb
Parent fb636f24
......@@ -31,8 +31,7 @@ class BatchedOpsStressTest : public StressTest {
const std::string key_body = Key(rand_keys[0]);
const uint32_t value_base =
thread->rand.Next() % thread->shared->UNKNOWN_SENTINEL;
const uint32_t value_base = thread->rand.Next();
const size_t sz = GenerateValue(value_base, value, sizeof(value));
const std::string value_body = Slice(value, sz).ToString();
......
......@@ -43,12 +43,6 @@ class StressTest;
// State shared by all concurrent executions of the same benchmark.
class SharedState {
public:
// indicates a key may have any value (or not be present) as an operation on
// it is incomplete.
static constexpr uint32_t UNKNOWN_SENTINEL = 0xfffffffe;
// indicates a key should definitely be deleted
static constexpr uint32_t DELETION_SENTINEL = 0xffffffff;
// Errors when reading filter blocks are ignored, so we use a thread
// local variable updated via sync points to keep track of errors injected
// while reading filter blocks in order to ignore the Get/MultiGet result
......@@ -254,54 +248,70 @@ class SharedState {
return expected_state_manager_->ClearColumnFamily(cf);
}
// @param pending True if the update may have started but is not yet
// guaranteed finished. This is useful for crash-recovery testing when the
// process may crash before updating the expected values array.
// Prepare a Put that will be started but not finished yet
// This is useful for crash-recovery testing when the process may crash
// before updating the corresponding expected value
//
// Requires external locking covering `key` in `cf`.
void Put(int cf, int64_t key, uint32_t value_base, bool pending) {
return expected_state_manager_->Put(cf, key, value_base, pending);
// Requires external locking covering `key` in `cf` to prevent concurrent
// write or delete to the same `key`.
PendingExpectedValue PreparePut(int cf, int64_t key) {
return expected_state_manager_->PreparePut(cf, key);
}
// Requires external locking covering `key` in `cf`.
uint32_t Get(int cf, int64_t key) const {
// Does not require external locking.
ExpectedValue Get(int cf, int64_t key) {
return expected_state_manager_->Get(cf, key);
}
// @param pending See comment above Put()
// Returns true if the key was not yet deleted.
// Prepare a Delete that will be started but not finished yet
// This is useful for crash-recovery testing when the process may crash
// before updating the corresponding expected value
//
// Requires external locking covering `key` in `cf`.
bool Delete(int cf, int64_t key, bool pending) {
return expected_state_manager_->Delete(cf, key, pending);
// Requires external locking covering `key` in `cf` to prevent concurrent
// write or delete to the same `key`.
PendingExpectedValue PrepareDelete(int cf, int64_t key) {
return expected_state_manager_->PrepareDelete(cf, key);
}
// @param pending See comment above Put()
// Returns true if the key was not yet deleted.
//
// Requires external locking covering `key` in `cf`.
bool SingleDelete(int cf, int64_t key, bool pending) {
return expected_state_manager_->Delete(cf, key, pending);
// Requires external locking covering `key` in `cf` to prevent concurrent
// write or delete to the same `key`.
PendingExpectedValue PrepareSingleDelete(int cf, int64_t key) {
return expected_state_manager_->PrepareSingleDelete(cf, key);
}
// @param pending See comment above Put()
// Returns number of keys deleted by the call.
//
// Requires external locking covering keys in `[begin_key, end_key)` in `cf`.
int DeleteRange(int cf, int64_t begin_key, int64_t end_key, bool pending) {
return expected_state_manager_->DeleteRange(cf, begin_key, end_key,
pending);
// Requires external locking covering keys in `[begin_key, end_key)` in `cf`
// to prevent concurrent write or delete to the same `key`.
std::vector<PendingExpectedValue> PrepareDeleteRange(int cf,
int64_t begin_key,
int64_t end_key) {
return expected_state_manager_->PrepareDeleteRange(cf, begin_key, end_key);
}
bool AllowsOverwrite(int64_t key) const {
return no_overwrite_ids_.find(key) == no_overwrite_ids_.end();
}
// Requires external locking covering `key` in `cf`.
// Requires external locking covering `key` in `cf` to prevent concurrent
// delete to the same `key`.
bool Exists(int cf, int64_t key) {
return expected_state_manager_->Exists(cf, key);
}
// Sync the `value_base` to the corresponding expected value
void SyncPut(int cf, int64_t key, uint32_t value_base) {
return expected_state_manager_->SyncPut(cf, key, value_base);
}
// Sync the corresponding expected value to be pending Put
void SyncPendingPut(int cf, int64_t key) {
return expected_state_manager_->SyncPendingPut(cf, key);
}
// Sync the corresponding expected value to be deleted
void SyncDelete(int cf, int64_t key) {
return expected_state_manager_->SyncDelete(cf, key);
}
uint32_t GetSeed() const { return seed_; }
void SetShouldStopBgThread() { should_stop_bg_thread_ = true; }
......
......@@ -483,12 +483,13 @@ void StressTest::PreloadDbAndReopenAsReadOnly(int64_t number_of_keys,
for (int64_t k = 0; k != number_of_keys; ++k) {
const std::string key = Key(k);
constexpr uint32_t value_base = 0;
PendingExpectedValue pending_expected_value =
shared->PreparePut(cf_idx, k);
const uint32_t value_base = pending_expected_value.GetFinalValueBase();
const size_t sz = GenerateValue(value_base, value, sizeof(value));
const Slice v(value, sz);
shared->Put(cf_idx, k, value_base, true /* pending */);
std::string ts;
if (FLAGS_user_timestamp_size > 0) {
......@@ -534,7 +535,7 @@ void StressTest::PreloadDbAndReopenAsReadOnly(int64_t number_of_keys,
}
}
shared->Put(cf_idx, k, value_base, false /* pending */);
pending_expected_value.Commit();
if (!s.ok()) {
break;
}
......@@ -614,8 +615,7 @@ void StressTest::ProcessRecoveredPreparedTxnsHelper(Transaction* txn,
for (wbwi_iter->SeekToFirst(); wbwi_iter->Valid(); wbwi_iter->Next()) {
uint64_t key_val;
if (GetIntVal(wbwi_iter->Entry().key.ToString(), &key_val)) {
shared->Put(static_cast<int>(i) /* cf_idx */, key_val,
0 /* value_base */, true /* pending */);
shared->SyncPendingPut(static_cast<int>(i) /* cf_idx */, key_val);
}
}
}
......
......@@ -3,17 +3,126 @@
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
#include <atomic>
#ifdef GFLAGS
#include "db_stress_tool/expected_state.h"
#include "db/wide/wide_column_serialization.h"
#include "db_stress_tool/db_stress_common.h"
#include "db_stress_tool/db_stress_shared_state.h"
#include "db_stress_tool/expected_state.h"
#include "rocksdb/trace_reader_writer.h"
#include "rocksdb/trace_record_result.h"
namespace ROCKSDB_NAMESPACE {
void ExpectedValue::Put(bool pending) {
if (pending) {
SetPendingWrite();
} else {
SetValueBase(NextValueBase());
ClearDeleted();
ClearPendingWrite();
}
}
bool ExpectedValue::Delete(bool pending) {
if (!Exists()) {
return false;
}
if (pending) {
SetPendingDel();
} else {
SetDelCounter(NextDelCounter());
SetDeleted();
ClearPendingDel();
}
return true;
}
void ExpectedValue::SyncPut(uint32_t value_base) {
assert(ExpectedValue::IsValueBaseValid(value_base));
SetValueBase(value_base);
ClearDeleted();
ClearPendingWrite();
// This is needed in case a crash happens during a pending delete of the key
// associated with this expected value
ClearPendingDel();
}
void ExpectedValue::SyncPendingPut() { Put(true /* pending */); }
void ExpectedValue::SyncDelete() {
Delete(false /* pending */);
// This is needed in case a crash happens during a pending write of the key
// associated with this expected value
ClearPendingWrite();
}
uint32_t ExpectedValue::GetFinalValueBase() const {
return PendingWrite() ? NextValueBase() : GetValueBase();
}
uint32_t ExpectedValue::GetFinalDelCounter() const {
return PendingDelete() ? NextDelCounter() : GetDelCounter();
}
bool ExpectedValueHelper::MustHaveNotExisted(
ExpectedValue pre_read_expected_value,
ExpectedValue post_read_expected_value) {
const bool pre_read_expected_deleted = pre_read_expected_value.IsDeleted();
const uint32_t pre_read_expected_value_base =
pre_read_expected_value.GetValueBase();
const uint32_t post_read_expected_final_value_base =
post_read_expected_value.GetFinalValueBase();
const bool during_read_no_write_happened =
(pre_read_expected_value_base == post_read_expected_final_value_base);
return pre_read_expected_deleted && during_read_no_write_happened;
}
bool ExpectedValueHelper::MustHaveExisted(
ExpectedValue pre_read_expected_value,
ExpectedValue post_read_expected_value) {
const bool pre_read_expected_not_deleted =
!pre_read_expected_value.IsDeleted();
const uint32_t pre_read_expected_del_counter =
pre_read_expected_value.GetDelCounter();
const uint32_t post_read_expected_final_del_counter =
post_read_expected_value.GetFinalDelCounter();
const bool during_read_no_delete_happened =
(pre_read_expected_del_counter == post_read_expected_final_del_counter);
return pre_read_expected_not_deleted && during_read_no_delete_happened;
}
bool ExpectedValueHelper::InExpectedValueBaseRange(
uint32_t value_base, ExpectedValue pre_read_expected_value,
ExpectedValue post_read_expected_value) {
assert(ExpectedValue::IsValueBaseValid(value_base));
const uint32_t pre_read_expected_value_base =
pre_read_expected_value.GetValueBase();
const uint32_t post_read_expected_final_value_base =
post_read_expected_value.GetFinalValueBase();
if (pre_read_expected_value_base <= post_read_expected_final_value_base) {
const uint32_t lower_bound = pre_read_expected_value_base;
const uint32_t upper_bound = post_read_expected_final_value_base;
return lower_bound <= value_base && value_base <= upper_bound;
} else {
const uint32_t upper_bound_1 = post_read_expected_final_value_base;
const uint32_t lower_bound_2 = pre_read_expected_value_base;
const uint32_t upper_bound_2 = ExpectedValue::GetValueBaseMask();
return (value_base <= upper_bound_1) ||
(lower_bound_2 <= value_base && value_base <= upper_bound_2);
}
}
ExpectedState::ExpectedState(size_t max_key, size_t num_column_families)
: max_key_(max_key),
......@@ -21,70 +130,107 @@ ExpectedState::ExpectedState(size_t max_key, size_t num_column_families)
values_(nullptr) {}
void ExpectedState::ClearColumnFamily(int cf) {
std::fill(&Value(cf, 0 /* key */), &Value(cf + 1, 0 /* key */),
SharedState::DELETION_SENTINEL);
const uint32_t del_mask = ExpectedValue::GetDelMask();
std::fill(&Value(cf, 0 /* key */), &Value(cf + 1, 0 /* key */), del_mask);
}
void ExpectedState::Put(int cf, int64_t key, uint32_t value_base,
bool pending) {
if (!pending) {
// prevent expected-value update from reordering before Write
std::atomic_thread_fence(std::memory_order_release);
}
Value(cf, key).store(pending ? SharedState::UNKNOWN_SENTINEL : value_base,
std::memory_order_relaxed);
if (pending) {
// prevent Write from reordering before expected-value update
std::atomic_thread_fence(std::memory_order_release);
}
void ExpectedState::Precommit(int cf, int64_t key, const ExpectedValue& value) {
Value(cf, key).store(value.Read());
// To prevent low-level instruction reordering that results in the db write
// happening before the pending state is set in the expected value
std::atomic_thread_fence(std::memory_order_release);
}
uint32_t ExpectedState::Get(int cf, int64_t key) const {
return Value(cf, key);
PendingExpectedValue ExpectedState::PreparePut(int cf, int64_t key) {
ExpectedValue expected_value = Load(cf, key);
const ExpectedValue orig_expected_value = expected_value;
expected_value.Put(true /* pending */);
const ExpectedValue pending_expected_value = expected_value;
expected_value.Put(false /* pending */);
const ExpectedValue final_expected_value = expected_value;
Precommit(cf, key, pending_expected_value);
return PendingExpectedValue(&Value(cf, key), orig_expected_value,
final_expected_value);
}
bool ExpectedState::Delete(int cf, int64_t key, bool pending) {
if (Value(cf, key) == SharedState::DELETION_SENTINEL) {
return false;
}
Put(cf, key, SharedState::DELETION_SENTINEL, pending);
return true;
ExpectedValue ExpectedState::Get(int cf, int64_t key) { return Load(cf, key); }
PendingExpectedValue ExpectedState::PrepareDelete(int cf, int64_t key,
bool* prepared) {
ExpectedValue expected_value = Load(cf, key);
const ExpectedValue orig_expected_value = expected_value;
bool res = expected_value.Delete(true /* pending */);
if (prepared) {
*prepared = res;
}
if (!res) {
return PendingExpectedValue(&Value(cf, key), orig_expected_value,
orig_expected_value);
}
const ExpectedValue pending_expected_value = expected_value;
expected_value.Delete(false /* pending */);
const ExpectedValue final_expected_value = expected_value;
Precommit(cf, key, pending_expected_value);
return PendingExpectedValue(&Value(cf, key), orig_expected_value,
final_expected_value);
}
bool ExpectedState::SingleDelete(int cf, int64_t key, bool pending) {
return Delete(cf, key, pending);
PendingExpectedValue ExpectedState::PrepareSingleDelete(int cf, int64_t key) {
return PrepareDelete(cf, key);
}
int ExpectedState::DeleteRange(int cf, int64_t begin_key, int64_t end_key,
bool pending) {
int covered = 0;
std::vector<PendingExpectedValue> ExpectedState::PrepareDeleteRange(
int cf, int64_t begin_key, int64_t end_key) {
std::vector<PendingExpectedValue> pending_expected_values;
for (int64_t key = begin_key; key < end_key; ++key) {
if (Delete(cf, key, pending)) {
++covered;
bool prepared = false;
PendingExpectedValue pending_expected_value =
PrepareDelete(cf, key, &prepared);
if (prepared) {
pending_expected_values.push_back(pending_expected_value);
}
}
return covered;
return pending_expected_values;
}
bool ExpectedState::Exists(int cf, int64_t key) {
// UNKNOWN_SENTINEL counts as exists. That assures a key for which overwrite
// is disallowed can't be accidentally added a second time, in which case
// SingleDelete wouldn't be able to properly delete the key. It does allow
// the case where a SingleDelete might be added which covers nothing, but
// that's not a correctness issue.
uint32_t expected_value = Value(cf, key).load();
return expected_value != SharedState::DELETION_SENTINEL;
return Load(cf, key).Exists();
}
void ExpectedState::Reset() {
const uint32_t del_mask = ExpectedValue::GetDelMask();
for (size_t i = 0; i < num_column_families_; ++i) {
for (size_t j = 0; j < max_key_; ++j) {
Value(static_cast<int>(i), j)
.store(SharedState::DELETION_SENTINEL, std::memory_order_relaxed);
Value(static_cast<int>(i), j).store(del_mask, std::memory_order_relaxed);
}
}
}
void ExpectedState::SyncPut(int cf, int64_t key, uint32_t value_base) {
ExpectedValue expected_value = Load(cf, key);
expected_value.SyncPut(value_base);
Value(cf, key).store(expected_value.Read());
}
void ExpectedState::SyncPendingPut(int cf, int64_t key) {
ExpectedValue expected_value = Load(cf, key);
expected_value.SyncPendingPut();
Value(cf, key).store(expected_value.Read());
}
void ExpectedState::SyncDelete(int cf, int64_t key) {
ExpectedValue expected_value = Load(cf, key);
expected_value.SyncDelete();
Value(cf, key).store(expected_value.Read());
}
void ExpectedState::SyncDeleteRange(int cf, int64_t begin_key,
int64_t end_key) {
for (int64_t key = begin_key; key < end_key; ++key) {
SyncDelete(cf, key);
}
}
FileExpectedState::FileExpectedState(std::string expected_state_file_path,
size_t max_key, size_t num_column_families)
: ExpectedState(max_key, num_column_families),
......@@ -385,7 +531,7 @@ class ExpectedStateTraceRecordHandler : public TraceRecord::Handler,
if (!GetIntVal(key.ToString(), &key_id)) {
return Status::Corruption("unable to parse key", key.ToString());
}
uint32_t value_id = GetValueBase(value);
uint32_t value_base = GetValueBase(value);
bool should_buffer_write = !(buffered_writes_ == nullptr);
if (should_buffer_write) {
......@@ -393,8 +539,7 @@ class ExpectedStateTraceRecordHandler : public TraceRecord::Handler,
key, value);
}
state_->Put(column_family_id, static_cast<int64_t>(key_id), value_id,
false /* pending */);
state_->SyncPut(column_family_id, static_cast<int64_t>(key_id), value_base);
++num_write_ops_;
return Status::OK();
}
......@@ -431,8 +576,7 @@ class ExpectedStateTraceRecordHandler : public TraceRecord::Handler,
const uint32_t value_base = GetValueBase(columns.front().value());
state_->Put(column_family_id, static_cast<int64_t>(key_id), value_base,
false /* pending */);
state_->SyncPut(column_family_id, static_cast<int64_t>(key_id), value_base);
++num_write_ops_;
......@@ -454,8 +598,7 @@ class ExpectedStateTraceRecordHandler : public TraceRecord::Handler,
column_family_id, key);
}
state_->Delete(column_family_id, static_cast<int64_t>(key_id),
false /* pending */);
state_->SyncDelete(column_family_id, static_cast<int64_t>(key_id));
++num_write_ops_;
return Status::OK();
}
......@@ -499,8 +642,9 @@ class ExpectedStateTraceRecordHandler : public TraceRecord::Handler,
buffered_writes_.get(), column_family_id, begin_key, end_key);
}
state_->DeleteRange(column_family_id, static_cast<int64_t>(begin_key_id),
static_cast<int64_t>(end_key_id), false /* pending */);
state_->SyncDeleteRange(column_family_id,
static_cast<int64_t>(begin_key_id),
static_cast<int64_t>(end_key_id));
++num_write_ops_;
return Status::OK();
}
......
......@@ -22,6 +22,174 @@
#include "util/string_util.h"
namespace ROCKSDB_NAMESPACE {
// This class is not thread-safe.
class ExpectedValue {
public:
static uint32_t GetValueBaseMask() { return VALUE_BASE_MASK; }
static uint32_t GetValueBaseDelta() { return VALUE_BASE_DELTA; }
static uint32_t GetDelCounterDelta() { return DEL_COUNTER_DELTA; }
static uint32_t GetDelMask() { return DEL_MASK; }
static bool IsValueBaseValid(uint32_t value_base) {
return IsValuePartValid(value_base, VALUE_BASE_MASK);
}
explicit ExpectedValue(uint32_t expected_value)
: expected_value_(expected_value) {}
bool Exists() const { return PendingWrite() || !IsDeleted(); }
uint32_t Read() const { return expected_value_; }
void Put(bool pending);
bool Delete(bool pending);
void SyncPut(uint32_t value_base);
void SyncPendingPut();
void SyncDelete();
uint32_t GetValueBase() const { return GetValuePart(VALUE_BASE_MASK); }
uint32_t NextValueBase() const {
return GetIncrementedValuePart(VALUE_BASE_MASK, VALUE_BASE_DELTA);
}
void SetValueBase(uint32_t new_value_base) {
SetValuePart(VALUE_BASE_MASK, new_value_base);
}
bool PendingWrite() const {
const uint32_t pending_write = GetValuePart(PENDING_WRITE_MASK);
return pending_write != 0;
}
void SetPendingWrite() {
SetValuePart(PENDING_WRITE_MASK, PENDING_WRITE_MASK);
}
void ClearPendingWrite() { ClearValuePart(PENDING_WRITE_MASK); }
uint32_t GetDelCounter() const { return GetValuePart(DEL_COUNTER_MASK); }
uint32_t NextDelCounter() const {
return GetIncrementedValuePart(DEL_COUNTER_MASK, DEL_COUNTER_DELTA);
}
void SetDelCounter(uint32_t new_del_counter) {
SetValuePart(DEL_COUNTER_MASK, new_del_counter);
}
bool PendingDelete() const {
const uint32_t pending_del = GetValuePart(PENDING_DEL_MASK);
return pending_del != 0;
}
void SetPendingDel() { SetValuePart(PENDING_DEL_MASK, PENDING_DEL_MASK); }
void ClearPendingDel() { ClearValuePart(PENDING_DEL_MASK); }
bool IsDeleted() const {
const uint32_t deleted = GetValuePart(DEL_MASK);
return deleted != 0;
}
void SetDeleted() { SetValuePart(DEL_MASK, DEL_MASK); }
void ClearDeleted() { ClearValuePart(DEL_MASK); }
uint32_t GetFinalValueBase() const;
uint32_t GetFinalDelCounter() const;
private:
static bool IsValuePartValid(uint32_t value_part, uint32_t value_part_mask) {
return (value_part & (~value_part_mask)) == 0;
}
// The 32-bit expected_value_ is divided into following parts:
// Bit 0 - 14: value base
static constexpr uint32_t VALUE_BASE_MASK = 0x7fff;
static constexpr uint32_t VALUE_BASE_DELTA = 1;
// Bit 15: whether write to this value base is pending (0 equals `false`)
static constexpr uint32_t PENDING_WRITE_MASK = (uint32_t)1 << 15;
// Bit 16 - 29: deletion counter (i.e, number of times this value base has
// been deleted)
static constexpr uint32_t DEL_COUNTER_MASK = 0x3fff0000;
static constexpr uint32_t DEL_COUNTER_DELTA = (uint32_t)1 << 16;
// Bit 30: whether deletion of this value base is pending (0 equals `false`)
static constexpr uint32_t PENDING_DEL_MASK = (uint32_t)1 << 30;
// Bit 31: whether this value base is deleted (0 equals `false`)
static constexpr uint32_t DEL_MASK = (uint32_t)1 << 31;
uint32_t GetValuePart(uint32_t value_part_mask) const {
return expected_value_ & value_part_mask;
}
uint32_t GetIncrementedValuePart(uint32_t value_part_mask,
uint32_t value_part_delta) const {
uint32_t current_value_part = GetValuePart(value_part_mask);
ExpectedValue temp_expected_value(current_value_part + value_part_delta);
return temp_expected_value.GetValuePart(value_part_mask);
}
void SetValuePart(uint32_t value_part_mask, uint32_t new_value_part) {
assert(IsValuePartValid(new_value_part, value_part_mask));
ClearValuePart(value_part_mask);
expected_value_ |= new_value_part;
}
void ClearValuePart(uint32_t value_part_mask) {
expected_value_ &= (~value_part_mask);
}
uint32_t expected_value_;
};
class PendingExpectedValue {
public:
explicit PendingExpectedValue(std::atomic<uint32_t>* value_ptr,
ExpectedValue orig_value,
ExpectedValue final_value)
: value_ptr_(value_ptr),
orig_value_(orig_value),
final_value_(final_value) {}
void Commit() {
// To prevent low-level instruction reordering that results in the expected
// value being set before the db write happens
std::atomic_thread_fence(std::memory_order_release);
value_ptr_->store(final_value_.Read());
}
uint32_t GetFinalValueBase() { return final_value_.GetValueBase(); }
private:
std::atomic<uint32_t>* const value_ptr_;
const ExpectedValue orig_value_;
const ExpectedValue final_value_;
};
class ExpectedValueHelper {
public:
// Return whether the value is expected not to exist from the beginning till
// the end of the read based on `pre_read_expected_value` and
// `post_read_expected_value`.
static bool MustHaveNotExisted(ExpectedValue pre_read_expected_value,
ExpectedValue post_read_expected_value);
// Return whether the value is expected to exist from the beginning till the
// end of the read based on `pre_read_expected_value` and
// `post_read_expected_value`.
static bool MustHaveExisted(ExpectedValue pre_read_expected_value,
ExpectedValue post_read_expected_value);
// Return whether the `value_base` falls within the expected value base
static bool InExpectedValueBaseRange(uint32_t value_base,
ExpectedValue pre_read_expected_value,
ExpectedValue post_read_expected_value);
};
// An `ExpectedState` provides read/write access to expected values for every
// key.
......@@ -38,43 +206,79 @@ class ExpectedState {
// Requires external locking covering all keys in `cf`.
void ClearColumnFamily(int cf);
// @param pending True if the update may have started but is not yet
// guaranteed finished. This is useful for crash-recovery testing when the
// process may crash before updating the expected values array.
// Prepare a Put that will be started but not finished yet
// This is useful for crash-recovery testing when the process may crash
// before updating the corresponding expected value
//
// Requires external locking covering `key` in `cf`.
void Put(int cf, int64_t key, uint32_t value_base, bool pending);
// Requires external locking covering `key` in `cf` to prevent concurrent
// write or delete to the same `key`.
PendingExpectedValue PreparePut(int cf, int64_t key);
// Requires external locking covering `key` in `cf`.
uint32_t Get(int cf, int64_t key) const;
// Does not require external locking.
ExpectedValue Get(int cf, int64_t key);
// @param pending See comment above Put()
// Returns true if the key was not yet deleted.
// Prepare a Delete that will be started but not finished yet
// This is useful for crash-recovery testing when the process may crash
// before updating the corresponding expected value
//
// Requires external locking covering `key` in `cf`.
bool Delete(int cf, int64_t key, bool pending);
// Requires external locking covering `key` in `cf` to prevent concurrent
// write or delete to the same `key`.
PendingExpectedValue PrepareDelete(int cf, int64_t key,
bool* prepared = nullptr);
// Requires external locking covering `key` in `cf` to prevent concurrent
// write or delete to the same `key`.
PendingExpectedValue PrepareSingleDelete(int cf, int64_t key);
// Requires external locking covering keys in `[begin_key, end_key)` in `cf`
// to prevent concurrent write or delete to the same `key`.
std::vector<PendingExpectedValue> PrepareDeleteRange(int cf,
int64_t begin_key,
int64_t end_key);
// Update the expected value for the start of an incomplete write or delete
// operation on the key associated with this expected value
void Precommit(int cf, int64_t key, const ExpectedValue& value);
// Requires external locking covering `key` in `cf` to prevent concurrent
// delete to the same `key`.
bool Exists(int cf, int64_t key);
// @param pending See comment above Put()
// Returns true if the key was not yet deleted.
// Sync the `value_base` to the corresponding expected value
//
// Requires external locking covering `key` in `cf`.
bool SingleDelete(int cf, int64_t key, bool pending);
// Requires external locking covering `key` in `cf` or be in single thread
// to prevent concurrent write or delete to the same `key`
void SyncPut(int cf, int64_t key, uint32_t value_base);
// @param pending See comment above Put()
// Returns number of keys deleted by the call.
// Sync the corresponding expected value to be pending Put
//
// Requires external locking covering keys in `[begin_key, end_key)` in `cf`.
int DeleteRange(int cf, int64_t begin_key, int64_t end_key, bool pending);
// Requires external locking covering `key` in `cf` or be in single thread
// to prevent concurrent write or delete to the same `key`
void SyncPendingPut(int cf, int64_t key);
// Requires external locking covering `key` in `cf`.
bool Exists(int cf, int64_t key);
// Sync the corresponding expected value to be deleted
//
// Requires external locking covering `key` in `cf` or be in single thread
// to prevent concurrent write or delete to the same `key`
void SyncDelete(int cf, int64_t key);
// Sync the corresponding expected values to be deleted
//
// Requires external locking covering keys in `[begin_key, end_key)` in `cf`
// to prevent concurrent write or delete to the same `key`
void SyncDeleteRange(int cf, int64_t begin_key, int64_t end_key);
private:
// Requires external locking covering `key` in `cf`.
// Does not require external locking.
std::atomic<uint32_t>& Value(int cf, int64_t key) const {
return values_[cf * max_key_ + key];
}
// Does not require external locking
ExpectedValue Load(int cf, int64_t key) const {
return ExpectedValue(Value(cf, key).load());
}
const size_t max_key_;
const size_t num_column_families_;
......@@ -160,45 +364,52 @@ class ExpectedStateManager {
// Requires external locking covering all keys in `cf`.
void ClearColumnFamily(int cf) { return latest_->ClearColumnFamily(cf); }
// @param pending True if the update may have started but is not yet
// guaranteed finished. This is useful for crash-recovery testing when the
// process may crash before updating the expected values array.
//
// Requires external locking covering `key` in `cf`.
void Put(int cf, int64_t key, uint32_t value_base, bool pending) {
return latest_->Put(cf, key, value_base, pending);
// See ExpectedState::PreparePut()
PendingExpectedValue PreparePut(int cf, int64_t key) {
return latest_->PreparePut(cf, key);
}
// Requires external locking covering `key` in `cf`.
uint32_t Get(int cf, int64_t key) const { return latest_->Get(cf, key); }
// See ExpectedState::Get()
ExpectedValue Get(int cf, int64_t key) { return latest_->Get(cf, key); }
// @param pending See comment above Put()
// Returns true if the key was not yet deleted.
//
// Requires external locking covering `key` in `cf`.
bool Delete(int cf, int64_t key, bool pending) {
return latest_->Delete(cf, key, pending);
// See ExpectedState::PrepareDelete()
PendingExpectedValue PrepareDelete(int cf, int64_t key) {
return latest_->PrepareDelete(cf, key);
}
// @param pending See comment above Put()
// Returns true if the key was not yet deleted.
//
// Requires external locking covering `key` in `cf`.
bool SingleDelete(int cf, int64_t key, bool pending) {
return latest_->SingleDelete(cf, key, pending);
// See ExpectedState::PrepareSingleDelete()
PendingExpectedValue PrepareSingleDelete(int cf, int64_t key) {
return latest_->PrepareSingleDelete(cf, key);
}
// @param pending See comment above Put()
// Returns number of keys deleted by the call.
//
// Requires external locking covering keys in `[begin_key, end_key)` in `cf`.
int DeleteRange(int cf, int64_t begin_key, int64_t end_key, bool pending) {
return latest_->DeleteRange(cf, begin_key, end_key, pending);
// See ExpectedState::PrepareDeleteRange()
std::vector<PendingExpectedValue> PrepareDeleteRange(int cf,
int64_t begin_key,
int64_t end_key) {
return latest_->PrepareDeleteRange(cf, begin_key, end_key);
}
// Requires external locking covering `key` in `cf`.
// See ExpectedState::Exists()
bool Exists(int cf, int64_t key) { return latest_->Exists(cf, key); }
// See ExpectedState::SyncPut()
void SyncPut(int cf, int64_t key, uint32_t value_base) {
return latest_->SyncPut(cf, key, value_base);
}
// See ExpectedState::SyncPendingPut()
void SyncPendingPut(int cf, int64_t key) {
return latest_->SyncPendingPut(cf, key);
}
// See ExpectedState::SyncDelete()
void SyncDelete(int cf, int64_t key) { return latest_->SyncDelete(cf, key); }
// See ExpectedState::SyncDeleteRange()
void SyncDeleteRange(int cf, int64_t begin_key, int64_t end_key) {
return latest_->SyncDeleteRange(cf, begin_key, end_key);
}
protected:
const size_t max_key_;
const size_t num_column_families_;
......