1. 13 Oct, 2016 1 commit
  2. 12 Oct, 2016 5 commits
    • P
      Fix log_write_bench -bytes_per_sync option. (#1375) · f8d8cf53
      Peter (Stig) Edwards committed
      Hello and thanks for RocksDB,
       
      When log_write_bench is run with the -bytes_per_sync option, the option does not influence any *sync* behaviour.
       
      > strace -e trace=write,sync_file_range ./log_write_bench -record_interval 0 -record_size 1048576 -num_records 11 -bytes_per_sync 2097152 2>&1 | egrep '^(sync|write.*XXXX)'
      write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
      write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
      write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
      write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
      write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
      write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
      write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
      write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
      write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
      write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
      write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
       
      I suspect this is because the bytes_per_sync option only takes effect when writing through a `WritableFileWriter`, not a bare `WritableFile`.
       
      With the diff below applied, it changes to:
       
      > strace -e trace=write,sync_file_range ./log_write_bench -record_interval 0 -record_size 1048576 -num_records 11 -bytes_per_sync 2097152 2>&1 | egrep '^(sync|write.*XXXX)'
      write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
      write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
      write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
      sync_file_range(0x3, 0, 0x200000, 0x2)  = 0
      write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
      write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
      sync_file_range(0x3, 0x200000, 0x200000, 0x2) = 0
      write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
      write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
      sync_file_range(0x3, 0x400000, 0x200000, 0x2) = 0
      write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
      write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
      sync_file_range(0x3, 0x600000, 0x200000, 0x2) = 0
      write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
      write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 1048576) = 1048576
      sync_file_range(0x3, 0x800000, 0x200000, 0x2) = 0
       
      (Note that the most recently written 1MB is left unsynced, as mentioned in util/file_reader_writer.cc::WritableFileWriter::Flush().)
       
      This diff also includes the fix from https://github.com/facebook/rocksdb/pull/1373
       
      > diff -du util/log_write_bench.cc.orig util/log_write_bench.cc
      --- util/log_write_bench.cc.orig        2016-10-04 12:06:29.115122580 -0400
      +++ util/log_write_bench.cc     2016-10-05 07:24:09.677037576 -0400
      @@ -14,6 +14,7 @@
       #include <gflags/gflags.h>
      
       #include "rocksdb/env.h"
      +#include "util/file_reader_writer.h"
       #include "util/histogram.h"
       #include "util/testharness.h"
       #include "util/testutil.h"
      @@ -38,19 +39,21 @@
         env_options.bytes_per_sync = FLAGS_bytes_per_sync;
         unique_ptr<WritableFile> file;
         env->NewWritableFile(file_name, &file, env_options);
      +  unique_ptr<WritableFileWriter> writer;
      +  writer.reset(new WritableFileWriter(std::move(file), env_options));
      
         std::string record;
      -  record.assign('X', FLAGS_record_size);
      +  record.assign(FLAGS_record_size, 'X');
      
         HistogramImpl hist;
      
         uint64_t start_time = env->NowMicros();
         for (int i = 0; i < FLAGS_num_records; i++) {
           uint64_t start_nanos = env->NowNanos();
      -    file->Append(record);
      -    file->Flush();
      +    writer->Append(record);
      +    writer->Flush();
           if (FLAGS_enable_sync) {
      -      file->Sync();
      +      writer->Sync(false);
           }
           hist.Add(env->NowNanos() - start_nanos);
      f8d8cf53
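
The sync cadence in the second trace above (a `sync_file_range` after the 3rd, 5th, 7th, 9th and 11th write) can be reproduced with a small model. This is only a sketch inferred from the trace, not the RocksDB implementation: it assumes the writer holds back the most recently written 1 MiB and issues a range sync once at least `bytes_per_sync` bytes have accumulated behind that tail. The function name `SimulateRangeSyncs` and the 4 KiB alignment constant are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// ASSUMPTION: the writer never range-syncs the most recent 1 MiB.
constexpr uint64_t kUnsyncedTail = 1024 * 1024;
// ASSUMPTION: sync boundaries are rounded down to 4 KiB (illustrative value).
constexpr uint64_t kSyncAlignment = 4 * 1024;

// Returns the (offset, length) pairs that would be range-synced while
// appending `num_records` records of `record_size` bytes each.
std::vector<std::pair<uint64_t, uint64_t>> SimulateRangeSyncs(
    uint64_t record_size, int num_records, uint64_t bytes_per_sync) {
  assert(bytes_per_sync > 0);
  std::vector<std::pair<uint64_t, uint64_t>> syncs;
  uint64_t file_size = 0;    // bytes written so far
  uint64_t last_synced = 0;  // end offset of the previous range sync
  for (int i = 0; i < num_records; ++i) {
    file_size += record_size;  // models one Append() + Flush()
    if (file_size <= kUnsyncedTail) continue;
    uint64_t sync_to = file_size - kUnsyncedTail;
    sync_to -= sync_to % kSyncAlignment;
    if (sync_to - last_synced >= bytes_per_sync) {
      syncs.emplace_back(last_synced, sync_to - last_synced);
      last_synced = sync_to;
    }
  }
  return syncs;
}
```

Calling `SimulateRangeSyncs(1048576, 11, 2097152)` mirrors the benchmark invocation in the commit message and yields five ranges of 0x200000 bytes each, starting at offsets 0, 0x200000, 0x400000, 0x600000 and 0x800000.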
    • R
      Make txn->GetState() const · 02b3e398
      Reid Horuff committed
      Summary: makes Transaction::GetState() a const function.
      
      Test Plan: compiles.
      
      Reviewers: mung
      
      Reviewed By: mung
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D64929
      02b3e398
    • A
      new Prev() prefix support using SeekForPrev() · 447f1712
      Aaron Gao committed
      Summary:
      1) The previous solution for Prev() prefix support was not clean.
      Now that the SeekForPrev() API exists, Prev() can be symmetric to Next(),
      and Prev() no longer needs to call SeekToLast().
      
      Also, Next() now calls Seek(prefix_seek_key_) to resolve a possible inconsistency between db_iter and merge_iter when
      a merge_operator is configured. prefix_seek_key_ is refreshed only when the direction changes to forward.
      
      2) This diff also fixes a bug in Iterator::SeekToLast() when iterate_upper_bound_ is used together with a prefix extractor.
      
      Adds test cases for both of the above.
      
      Some existing tests still cover the SeekToLast() call in Prev(); I will clean them up later.
      
      Test Plan: make all check
      
      Reviewers: IslamAbdelRahman, andrewkr, yiwu, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D63933
      447f1712
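
The symmetry between Seek() and SeekForPrev() mentioned in this commit can be illustrated on an ordinary sorted container. This is a toy sketch, not the db_iter code: `ToyTable`, `ToySeek` and `ToySeekForPrev` are hypothetical names, and `std::map` stands in for the real iterator stack. It only shows the positioning rule: a forward seek lands on the first key at or after the target, a backward seek on the last key at or before it.

```cpp
#include <cassert>
#include <iterator>
#include <map>
#include <string>

// Toy ordered store (hypothetical stand-in for the real iterators).
using ToyTable = std::map<std::string, std::string>;

// Forward seek: first entry with key >= target, or end() if none.
inline ToyTable::const_iterator ToySeek(const ToyTable& t,
                                        const std::string& target) {
  return t.lower_bound(target);
}

// Backward seek: last entry with key <= target, or end() if none.
inline ToyTable::const_iterator ToySeekForPrev(const ToyTable& t,
                                               const std::string& target) {
  auto it = t.upper_bound(target);      // first key strictly greater
  if (it == t.begin()) return t.end();  // nothing at or before target
  auto result = std::prev(it);
  assert(result->first <= target);      // postcondition of a backward seek
  return result;
}
```

With keys `a1`, `a3` and `b1`, `ToySeek(t, "a2")` lands on `a3` while `ToySeekForPrev(t, "a2")` lands on `a1`, so stepping backward no longer needs to start from the last key.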
    • Y
      More block cache tickers · 991b585e
      Yi Wu committed
      Summary: Adding several missing block cache tickers.
      
      Test Plan:
        make all check
      
      Reviewers: IslamAbdelRahman, yhchiang, lightmark
      
      Reviewed By: lightmark
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D64881
      991b585e
    • Y
      Add Statistics::getAndResetTickerCount(). · d6ae6dec
      Yi Wu committed
      Summary: A convenience method to atomically get and reset a ticker count. I want to use it in a thin wrapper around the statistics object to export ticker counts to ODS for LogDevice (since they don't even use fb303).
      
      Test Plan:
      test in LogDevice shadow cluster.
      https://fburl.com/461868822
      
      Reviewers: andrewkr, yhchiang, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D64869
      d6ae6dec
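
Why an atomic get-and-reset matters for exporting counters can be seen in a toy model. This is a sketch under assumptions, not the Statistics implementation: `ToyTicker` is a hypothetical class, and it uses `std::atomic::exchange` so that reading and zeroing happen as one step. With a separate read followed by a reset, increments that arrive in between would be lost.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical single counter; a real statistics object holds many tickers.
class ToyTicker {
 public:
  void Record(uint64_t n) { count_.fetch_add(n, std::memory_order_relaxed); }
  uint64_t Get() const { return count_.load(std::memory_order_relaxed); }
  // Reads the current value and zeroes it in one atomic operation.
  uint64_t GetAndReset() {
    return count_.exchange(0, std::memory_order_relaxed);
  }

 private:
  std::atomic<uint64_t> count_{0};
};

// Usage example: each drain reports only the events since the previous one.
inline void ToyTickerExample() {
  ToyTicker t;
  t.Record(3);
  t.Record(4);
  assert(t.GetAndReset() == 7);
  t.Record(1);
  assert(t.GetAndReset() == 1);
  assert(t.Get() == 0);
}
```

An exporter built this way can publish each drained value as a delta without keeping its own copy of the previous total.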
  3. 11 Oct, 2016 1 commit
  4. 08 Oct, 2016 8 commits
    • I
      Support running consistency checks in release mode · 2ad68b97
      Islam AbdelRahman committed
      Summary:
      We always run consistency checks when compiling in debug mode.
      This change lets users set Options::force_consistency_checks to true to run these checks in release builds as well.
      
      Test Plan:
      make check -j64
      make release
      
      Reviewers: lightmark, sdong, yiwu
      
      Reviewed By: yiwu
      
      Subscribers: hermanlee4, andrewkr, yoshinorim, jkedgar, dhruba
      
      Differential Revision: https://reviews.facebook.net/D64701
      2ad68b97
    • I
      Fix -ve std::string::resize · 67501cfc
      Islam AbdelRahman committed
      Summary:
      This exception was thrown because we may sometimes call resize() with a negative value
      when the max_bytes_for_level_multiplier_additional vector is empty.
      
      Test Plan: run the tests
      
      Reviewers: yiwu
      
      Reviewed By: yiwu
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D64791
      67501cfc
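
The commit does not show the offending code, so the following is only a guess at the class of bug: building a delimited string from a vector and then trimming the trailing separator with `resize(size() - 1)`. On an empty vector the subtraction wraps around to a huge unsigned value and `std::string::resize` throws `std::length_error`. `JoinBuggy`, `JoinFixed` and `BuggyThrowsOnEmpty` are hypothetical helpers.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Buggy variant: always trims one trailing ':' even when nothing was added.
inline std::string JoinBuggy(const std::vector<int>& values) {
  std::string out;
  for (int v : values) {
    out += std::to_string(v);
    out += ':';
  }
  out.resize(out.size() - 1);  // wraps to a huge size when out is empty
  return out;
}

// Fixed variant: only trims when there is a separator to remove.
inline std::string JoinFixed(const std::vector<int>& values) {
  std::string out;
  for (int v : values) {
    out += std::to_string(v);
    out += ':';
  }
  if (!out.empty()) out.resize(out.size() - 1);
  return out;
}

// Returns true if the buggy variant throws on an empty input.
inline bool BuggyThrowsOnEmpty() {
  try {
    JoinBuggy({});
  } catch (const std::length_error&) {
    return true;
  }
  return false;
}

// Usage example.
inline void JoinExample() {
  assert(JoinFixed({}).empty());
  assert(JoinFixed({1, 2, 3}) == "1:2:3");
  assert(BuggyThrowsOnEmpty());
}
```

Guarding on emptiness is one way to avoid the wraparound; checking the source vector before building the string would work equally well.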
    • J
      Testing asset links after config change · 04b02dd1
      Joel Marcey committed
      04b02dd1
    • R
      Make Lock Info test multiple column families · 8c55bb87
      Reid Horuff committed
      Summary: Modifies the lock info export test to cover multiple column families, after I hit a bug while developing the MyRocks front-end for this.
      
      Test Plan: is test.
      
      Reviewers: mung
      
      Reviewed By: mung
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D64725
      8c55bb87
    • I
      Revert "Support SST files with Global sequence numbers" · d0623289
      Islam AbdelRahman committed
      This reverts commit ab01da54.
      d0623289
    • A
      [RocksJava] Adjusted RateLimiter to 3.10.0 (#1368) · 5cd28833
      Adam Retter committed
      Summary:
      - Deprecated RateLimiterConfig and GenericRateLimiterConfig
      - Introduced RateLimiter
      
      All of the corresponding C++ methods are now also available in RocksJava.
      A notable method is setBytesPerSecond, which can change the allowed
      number of bytes per second at runtime.
      
      Test Plan:
      make rocksdbjava
      make jtest
      
      Reviewers: adamretter, yhchiang, ankgup87
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D35715
      5cd28833
    • R
      Expose Transaction State Publicly · 37737c3a
      Reid Horuff committed
      Summary:
      This exposes a transaction's state through a public API rather than through a public member variable. I also do some name refactoring:
      ExecutionStatus => TransactionState
      exec_status_ => trx_state_
      
      Test Plan: It compiles and transaction_test passes.
      
      Reviewers: IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: andrewkr, mung, dhruba, sdong
      
      Differential Revision: https://reviews.facebook.net/D64689
      37737c3a
    • R
      Add facility to write only a portion of WriteBatch to WAL · 2c1f9529
      Reid Horuff committed
      Summary:
      When constructing a write batch, a client may now call MarkWalTerminationPoint() on that batch. Batch operations added after this call will not be written to the WAL but will still be inserted into the Memtable. This facility is used to remove one of the three WriteImpl calls in 2PC transactions. This produces a ~1% perf improvement.
      
      ```
      RocksDB - unoptimized 2pc, sync_binlog=1, disable_2pc=off
      INFO 2016-08-31 14:30:38,814 [main]: REQUEST PHASE COMPLETED. 75000000 requests done in 2619 seconds. Requests/second = 28628
      
      RocksDB - optimized 2pc , sync_binlog=1, disable_2pc=off
      INFO 2016-08-31 16:26:59,442 [main]: REQUEST PHASE COMPLETED. 75000000 requests done in 2581 seconds. Requests/second = 29054
      ```
      
      Test Plan: Two unit tests added.
      
      Reviewers: sdong, yiwu, IslamAbdelRahman
      
      Reviewed By: yiwu
      
      Subscribers: hermanlee4, dhruba, andrewkr
      
      Differential Revision: https://reviews.facebook.net/D64599
      2c1f9529
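
The effect of a WAL termination point can be pictured with a toy batch. This is a sketch of the idea only, not the WriteBatch implementation: `ToyBatch`, `WalPayload` and `MemtablePayload` are hypothetical names, and only the method name MarkWalTerminationPoint() is taken from the commit message. Operations before the mark go to both the log and the memtable; operations after it go to the memtable only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a write batch holding a list of keys.
class ToyBatch {
 public:
  void Put(std::string key) { ops_.push_back(std::move(key)); }

  // Operations added after this call are excluded from the WAL payload.
  void MarkWalTerminationPoint() {
    wal_end_ = ops_.size();
    marked_ = true;
  }

  // Everything up to the mark, or the whole batch if no mark was set.
  std::vector<std::string> WalPayload() const {
    const std::size_t end = marked_ ? wal_end_ : ops_.size();
    return std::vector<std::string>(
        ops_.begin(), ops_.begin() + static_cast<std::ptrdiff_t>(end));
  }

  // The memtable always receives every operation.
  const std::vector<std::string>& MemtablePayload() const { return ops_; }

 private:
  std::vector<std::string> ops_;
  std::size_t wal_end_ = 0;
  bool marked_ = false;
};

// Usage example: "b" reaches the memtable but is never logged.
inline void ToyBatchExample() {
  ToyBatch batch;
  batch.Put("a");
  batch.MarkWalTerminationPoint();
  batch.Put("b");
  assert(batch.WalPayload().size() == 1);
  assert(batch.MemtablePayload().size() == 2);
}
```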
  5. 07 Oct, 2016 3 commits
    • P
      Fix record_size in log_write_bench, swap args to std::string::assign. (#1373) · 043cb62d
      Peter (Stig) Edwards committed
      Hello and thank you for RocksDB,
       
      I noticed when using log_write_bench that writes were always 88 bytes:
       
      > strace -e trace=write ./log_write_bench -num_records 2 2>&1 | head -n 2
      write(3, "\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371"..., 88) = 88
      write(3, "\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371\371"..., 88) = 88
      
      > strace -e trace=write ./log_write_bench -record_size 4096 -num_records 2 2>&1 | head -n 2
      write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 88) = 88
      write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 88) = 88
       
      I think this should be:
      
      <<    record.assign('X', FLAGS_record_size);
      >>    record.assign(FLAGS_record_size, 'X');
      
      So fill and not buffer. Otherwise I always see writes of size 88 (the decimal value for chr "X").
      
      string& assign (const char* s, size_t n);
      buffer - Copies the first n characters from the array of characters pointed by s.
      
      string& assign (size_t n, char c);
      fill   - Replaces the current value by n consecutive copies of character c.
      
      perl -le 'print ord "X"'
      88
       
      With the change:
       
      > strace -e trace=write ./log_write_bench -record_size 4096 -num_records 2 2>&1 | head -n 2
      write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 4096) = 4096
      write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 4096) = 4096
       
      > strace -e trace=write ./log_write_bench -num_records 2 2>&1 | head -n 2
      write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 249) = 249
      write(3, "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"..., 249) = 249
      
      Thanks.
      
      https://github.com/facebook/rocksdb/commit/01c27be5fb42524c5052b4b4a23e05501e1d1421
      https://reviews.facebook.net/D16239
      043cb62d
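
The swapped-argument behaviour described in this commit can be reproduced in isolation. In the sketch below, `BuildRecordSwapped` and `BuildRecordFixed` are hypothetical helpers, and `record_size` stands in for the benchmark's flag. Both arguments convert implicitly, so the swapped call still selects the fill overload `assign(size_type n, char c)`, with n = 'X' (88) and c = the record size truncated to a char.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Swapped arguments: yields 88 copies of static_cast<char>(record_size).
inline std::string BuildRecordSwapped(int record_size) {
  std::string record;
  record.assign('X', record_size);  // the bug: count and character swapped
  return record;
}

// Intended call: record_size copies of 'X'.
inline std::string BuildRecordFixed(int record_size) {
  std::string record;
  record.assign(static_cast<std::size_t>(record_size), 'X');
  return record;
}

// Usage example matching the sizes seen in the strace output.
inline void AssignExample() {
  assert(BuildRecordSwapped(4096).size() == 88);
  assert(BuildRecordSwapped(4096)[0] == '\0');  // 4096 truncates to 0
  assert(BuildRecordFixed(4096).size() == 4096);
  assert(BuildRecordFixed(4096)[0] == 'X');
}
```

This also accounts for the bytes in the first trace: a record size of 249 truncates to the byte `\371` (octal for 249), and 4096 truncates to `\0`.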
    • S
      env_mirror: fix a few leaks (#1363) · 4985f60f
      Sage Weil committed
      * env_mirror: fix leak from LockFile
      Signed-off-by: Sage Weil <sage@redhat.com>
      
      * env_mirror: instruct EnvMirror whether mirrored Envs should be destroyed
      
      The lifecycle rules for Env are frustrating and undocumented.  Notably,
      Env::Default() should *not* be freed, but any Env instances we created
      should be.
      
      Explicitly instruct EnvMirror whether to clean up child Env instances.
      Default to false so that we do not affect existing callers.
      Signed-off-by: Sage Weil <sage@redhat.com>
      4985f60f
    • I
      update of c.h (#1371) · 5aded67d
      Igor Mihalik committed
      Exports rocksdb_options_set_memtable_prefix_bloom_size_ratio via c.h; the function was implemented in c.cc but missing from the header.
      5aded67d
  6. 06 Oct, 2016 1 commit
  7. 05 Oct, 2016 11 commits
  8. 04 Oct, 2016 4 commits
    • I
      Fix Mac build · 9d6c9613
      Islam AbdelRahman committed
      9d6c9613
    • I
      Support SST files with Global sequence numbers · ab01da54
      Islam AbdelRahman committed
      Summary:
      - Update SstFileWriter to include a property for a global sequence number in the SST file `rocksdb.external_sst_file.global_seqno`
      - Update TableProperties to be aware of the offset of each property in the file
      - Update BlockBasedTableReader and Block to be able to honor the sequence number in `rocksdb.external_sst_file.global_seqno` property and use it to overwrite all sequence number in the file
      
      Note that we don't update the seqno in the index block or when doing a binary search. SST files with a global seqno are guaranteed to contain only one version of each user_key, and each key has seqno=0 encoded in it. This means that such a key is greater than any other key with seqno > 0, so we can keep the current logic for these blocks.
      
      Test Plan: unit tests
      
      Reviewers: andrewkr, yhchiang, yiwu, sdong
      
      Reviewed By: sdong
      
      Subscribers: hcz, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D62523
      ab01da54
    • A
      Minor fixes around Windows 64 Java Artifacts (#1366) · d346ba24
      Adam Retter committed
      d346ba24
    • K
      Add factory method for creating persistent cache that is accessible from public · e91b4d0c
      krad committed
      Summary:
      Currently there is no mechanism to create a persistent cache from the
      public headers. This adds a simple factory method that creates a persistent cache with
      default or NVM-optimized settings.
      
      Note: any ideas on how to test this factory are appreciated.
      
      Test Plan: None
      
      Reviewers: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D64527
      e91b4d0c
  9. 01 Oct, 2016 2 commits
    • M
      Expose transaction id, lock state information and transaction wait information · be1f1092
      Manuel Ung committed
      Summary:
      This diff does 3 things:
      
      Expose TransactionID so that we can identify transactions when we retrieve locking and lock wait information. This is exposed as `Transaction::GetID`.
      
      Expose lock state information by locking all stripes in all column families and copying their contents to a data structure. This is exposed as `TransactionDB::GetLockStatusData`.
      
      Adds support for tracking the transaction and the key being waited on, and exposes this as `Transaction::GetWaitingTxn`.
      
      Test Plan: unit tests
      
      Reviewers: horuff, sdong
      
      Reviewed By: sdong
      
      Subscribers: vasilep, hermanlee4, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D64413
      be1f1092
    • A
      Store range tombstones in memtable · 6009c473
      Andrew Kryczka committed
      Summary:
      - Store range tombstones in a separate MemTableRep instantiated with ColumnFamilyOptions::memtable_factory
      - MemTable::NewRangeTombstoneIterator() returns a MemTableIterator over the separate MemTableRep
      - Part of the read path is not implemented yet (i.e., MemTable::Get())
      
      Test Plan: see unit tests
      
      Reviewers: wanning
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D62217
      6009c473
  10. 30 Sep, 2016 4 commits