1. 20 May 2016 · 4 commits
    • Add MaxOperator to utilities/merge_operators/ · 1f2dca0e
      Islam AbdelRahman committed
      Summary:
      Introduce MaxOperator, a simple merge operator that returns the max of all operands.
      This merge operator helps me in benchmarking.
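
      For illustration only, a minimal sketch of such an operator built on
      AssociativeMergeOperator (the actual class in the commit may differ):

        #include <string>
        #include "rocksdb/merge_operator.h"
        #include "rocksdb/slice.h"

        // Illustrative sketch: keeps the largest operand (byte-wise comparison).
        class MaxOperator : public rocksdb::AssociativeMergeOperator {
         public:
          bool Merge(const rocksdb::Slice& /*key*/,
                     const rocksdb::Slice* existing_value,
                     const rocksdb::Slice& value, std::string* new_value,
                     rocksdb::Logger* /*logger*/) const override {
            // With no existing value, the single operand is the max.
            if (existing_value == nullptr || existing_value->compare(value) < 0) {
              new_value->assign(value.data(), value.size());
            } else {
              new_value->assign(existing_value->data(), existing_value->size());
            }
            return true;  // merge always succeeds
          }

          const char* Name() const override { return "MaxOperator"; }
        };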
      
      Test Plan: Add new unit tests
      
      Reviewers: sdong, andrewkr, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D57873
    • Added "number of merge operands" to statistics in ssts. · f6e404c2
      Richard Cairns Jr committed
      Summary:
      A couple of notes from the diff:
        - The namespace block I added at the top of table_properties_collector.cc was in reaction to an issue I was having with PutVarint64 and reusing the "val" string.  I'm not sure this is the cleanest way of doing this, but abstracting it out at least results in the correct behavior.
        - I chose "rocksdb.merge.operands" as the property name.  I am open to suggestions for better names.  (A rough sketch of the collector idea appears after this list.)
        - The change to sst_dump_tool.cc seems a bit inelegant to me.  Is there a better way to do the if-else block?
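
      As a rough illustration of the idea (not the code in this diff), a user-side
      TablePropertiesCollector that counts merge operands could look like the sketch
      below; the encoding is simplified (std::to_string instead of PutVarint64):

        #include <cstdint>
        #include <string>
        #include "rocksdb/table_properties.h"

        // Illustrative sketch: count merge operands seen while building an SST file
        // and publish the count as a user-collected table property.
        class MergeOperandCountCollector : public rocksdb::TablePropertiesCollector {
         public:
          rocksdb::Status AddUserKey(const rocksdb::Slice& /*key*/,
                                     const rocksdb::Slice& /*value*/,
                                     rocksdb::EntryType type,
                                     rocksdb::SequenceNumber /*seq*/,
                                     uint64_t /*file_size*/) override {
            if (type == rocksdb::kEntryMerge) {
              ++num_merge_operands_;
            }
            return rocksdb::Status::OK();
          }

          rocksdb::Status Finish(rocksdb::UserCollectedProperties* properties) override {
            // The real collector encodes the count with PutVarint64; plain text keeps
            // this sketch self-contained.
            (*properties)["rocksdb.merge.operands"] = std::to_string(num_merge_operands_);
            return rocksdb::Status::OK();
          }

          rocksdb::UserCollectedProperties GetReadableProperties() const override {
            return {{"rocksdb.merge.operands", std::to_string(num_merge_operands_)}};
          }

          const char* Name() const override { return "MergeOperandCountCollector"; }

         private:
          uint64_t num_merge_operands_ = 0;
        };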
      
      Test Plan:
      I added a test case in table_properties_collector_test.cc.  It adds two merge operands and checks to make sure that both of them are reflected by GetMergeOperands.  It also checks to make sure the wasPropertyPresent bool is properly set in the method.
      
      Running both of these tests should pass:
      ./table_properties_collector_test
      ./sst_dump_test
      
      Reviewers: IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D58119
    • Fix formatting of HISTORY.md (#1126) · 7383b64b
      Evan Shaw committed
      A couple of "New Features" headers needed a blank line before them.
    • Disable long running GroupCommitTest (#1125) · 0e665c39
      Dmitri Smirnov committed
      Add db_test2
  2. 19 May 2016 · 3 commits
  3. 18 May 2016 · 5 commits
    • Fix build · 05c5c39a
      Islam AbdelRahman committed
    • Long outstanding prepare test · a6254f2b
      Reid Horuff committed
      Summary: This tests that a prepared transaction is not lost after several crashes, restarts, and memtable flushes.
      
      Test Plan: TwoPhaseLongPrepareTest
      
      Reviewers: sdong
      
      Subscribers: hermanlee4, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D58185
    • Fix TransactionTest.TwoPhaseMultiThreadTest under TSAN · 2ead1151
      Islam AbdelRahman committed
      Summary:
      TransactionTest.TwoPhaseMultiThreadTest runs forever under TSAN, causing our CI builds to time out.
      The reason appears to be that some threads keep running while other threads don't get a chance to increment the counter.
      
      Test Plan: run the test under TSAN
      
      Reviewers: sdong, horuff
      
      Reviewed By: horuff
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D58359
    • Persistent Read Cache (Part 2) Data structure for building persistent read cache index · 1f0142ce
      krad committed
      Summary:
      We expect the persistent read cache to perform at speeds up to 8 GB/s. In order
      to accomplish that, we need to build an index mechanism that operates at a rate
      of multiple millions of operations per second.
      
      This patch provides the basic data structures to accomplish that:
      
      (1) Hash table implementation with lock contention spread
          It is based on the StripedHashSet<T> implementation in
          The Art of Multiprocessor Programming by Maurice Herlihy and Nir Shavit
          (a simplified sketch follows the TODO list below)
      (2) LRU implementation
          Placeholder algorithm for further optimization
      (3) Evictable Hash Table implementation
          A building block for index data structures that evict data such as
          files, etc.
      
      TODO:
      (1) Figure out whether the sharded hash table and LRU can be used instead
      (2) Figure out whether we need to support a configurable eviction algorithm for
      EvictableHashTable
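
      A simplified sketch of the lock-striping idea from item (1), using a hypothetical
      class name and std::mutex rather than whatever locking primitive the patch uses:

        #include <cstddef>
        #include <list>
        #include <mutex>
        #include <vector>

        // Illustrative sketch of lock striping: the bucket array is guarded by a much
        // smaller array of mutexes, so inserts into different stripes never contend
        // on a single global lock.
        template <class T, class Hash>
        class StripedHashSetSketch {
         public:
          StripedHashSetSketch(size_t nbuckets, size_t nlocks)
              : buckets_(nbuckets), locks_(nlocks) {}

          void Insert(const T& t) {
            const size_t b = Hash()(t) % buckets_.size();
            std::lock_guard<std::mutex> guard(locks_[b % locks_.size()]);
            buckets_[b].push_back(t);
          }

          bool Contains(const T& t) {
            const size_t b = Hash()(t) % buckets_.size();
            std::lock_guard<std::mutex> guard(locks_[b % locks_.size()]);
            for (const T& e : buckets_[b]) {
              if (e == t) return true;
            }
            return false;
          }

         private:
          std::vector<std::list<T>> buckets_;  // bucket b is guarded by lock b % nlocks
          std::vector<std::mutex> locks_;
        };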
      
      Test Plan: Run unit tests
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D55785
    • [rocksdb] make more options dynamic · 43afd72b
      Aaron Gao committed
      Summary:
      make more ColumnFamilyOptions dynamic:
      - compression
      - soft_pending_compaction_bytes_limit
      - hard_pending_compaction_bytes_limit
      - min_partial_merge_operands
      - report_bg_io_stats
      - paranoid_file_checks
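
      For reference, options that are dynamic can be changed at runtime through
      DB::SetOptions with string values; a hedged usage sketch (the values shown are
      arbitrary examples):

        #include <cassert>
        #include "rocksdb/db.h"

        void TuneRunningDb(rocksdb::DB* db) {
          // Adjust a few of the now-dynamic ColumnFamilyOptions without reopening
          // the database.
          rocksdb::Status s = db->SetOptions({
              {"compression", "kLZ4Compression"},
              {"report_bg_io_stats", "true"},
              {"paranoid_file_checks", "false"},
          });
          assert(s.ok());
        }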
      
      Test Plan:
      Add sanity check in `db_test.cc` for all above options except for soft_pending_compaction_bytes_limit and hard_pending_compaction_bytes_limit.
      All passed.
      
      Reviewers: andrewkr, sdong, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D57519
  4. 17 May 2016 · 2 commits
  5. 16 May 2016 · 1 commit
    • Added PersistentCache abstraction · a08c8c85
      krad committed
      Summary:
      Added a new page-caching abstraction to RocksDB, designed for read-cache
      use.

      RocksDB's current block cache is more of an object cache. For the persistent read cache
      project, what we need is a page cache equivalent. This change adds a cache
      abstraction to RocksDB, called PersistentCache, for caching pages. PersistentCache can cache
      uncompressed pages or raw pages (content as stored in the filesystem). The user can
      choose to operate PersistentCache in either COMPRESSED or UNCOMPRESSED mode.
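
      An interface in the spirit of that description might look roughly like the
      sketch below; the names and signatures are illustrative, not the exact header
      introduced by this diff:

        #include <cstddef>
        #include <memory>
        #include "rocksdb/slice.h"
        #include "rocksdb/status.h"

        // Illustrative sketch of a page-oriented persistent cache: callers insert
        // pages keyed by, e.g., (file id, offset) and look them up later, with the
        // data living on a persistent medium rather than in RAM.
        class PersistentCacheSketch {
         public:
          virtual ~PersistentCacheSketch() {}

          // Insert a page of `size` bytes under `page_key`.
          virtual rocksdb::Status Insert(const rocksdb::Slice& page_key,
                                         const char* data, size_t size) = 0;

          // Look up a previously inserted page; on success `data` owns the bytes
          // and `size` is set to the page length.
          virtual rocksdb::Status Lookup(const rocksdb::Slice& page_key,
                                         std::unique_ptr<char[]>* data,
                                         size_t* size) = 0;

          // Hypothetical mode query: true when the cache stores uncompressed pages,
          // false when it stores raw (possibly compressed) file content.
          virtual bool StoresUncompressedPages() const = 0;
        };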
      
      Test Plan: Run unit tests
      
      Reviewers: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D55707
  6. 14 May 2016 · 1 commit
    • [ldb] Templatize the Selector · 5c06e081
      Arun Sharma committed
      Summary:
      So that a customized ldb tool can pass its own Selector.
      Such a selector is expected to call LDBCommand::SelectCommand
      and then add some of its own customized commands.
      
      Test Plan: make ldb
      
      Reviewers: sdong, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D57249
  7. 13 May 2016 · 3 commits
  8. 12 May 2016 · 3 commits
  9. 11 May 2016 · 9 commits
    • Isolate db env and backup Env in unit tests · e61ba052
      Andrew Kryczka committed
      Summary:
      - Used ChrootEnv so the database and backup Envs are isolated in the filesystem.
      - Removed DifferentEnvs test since now every test uses different Envs
      
      Depends on D57543
      
      Test Plan:
      - ran backupable_db_test
      - verified backupable_db_test now catches the bug when D57159 is backed out (this bug previously passed through the test cases, which motivated this change)
      
      Reviewers: sdong, lightmark, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D57615
    • Fix data race in GetObsoleteFiles() · 560358dc
      Islam AbdelRahman committed
      Summary:
      GetObsoleteFiles() and LogAndApply() modify the obsolete_manifests_ vector;
      we need to make sure that the mutex is held when we modify obsolete_manifests_.
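
      In generic terms (this is not the actual DBImpl code), the fix pattern is simply
      to take the same mutex around every access to the shared vector:

        #include <mutex>
        #include <string>
        #include <vector>

        // Generic illustration of the fix: every reader and writer of the shared
        // obsolete-manifests list must hold the same mutex.
        struct ObsoleteManifestsSketch {
          std::mutex mu;
          std::vector<std::string> obsolete_manifests;

          void Add(std::string fname) {
            std::lock_guard<std::mutex> guard(mu);
            obsolete_manifests.push_back(std::move(fname));
          }

          std::vector<std::string> TakeAll() {
            std::lock_guard<std::mutex> guard(mu);
            std::vector<std::string> out;
            out.swap(obsolete_manifests);
            return out;
          }
        };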
      
      Test Plan: run the test under TSAN
      
      Reviewers: andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D58011
    • ldb option for compression dictionary size · 5c1c9048
      Andrew Kryczka committed
      Summary:
      Expose the option so it's easy to run offline tests of the compression
      dictionary feature.
      
      Test Plan:
      verified the compression dictionary is loaded into lz4 for the command below:
      
        $ ./ldb compact --compression_type=lz4 --compression_max_dict_bytes=16384 --db=/tmp/feed-compression-test/
      
      Reviewers: IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D57441
    • [rocksdb] 2PC double recovery bug fix · c27061da
      Reid Horuff committed
      Summary:
      1. prepare()
      2. crash
      3. recover
      4. commit()
      5. crash
      6. data is lost
      
      This is due to the transaction data still residing only in the WAL; because the logs were flushed on the first recovery, the data is ignored on the second recovery. We must scan all logs found on recovery and only ignore redundant data at the time of replay. It is not possible to know which logs still contain relevant data at the time of recovery. We cannot simply ignore a log because all of the non-2PC data it contains has already been written to L0.

      The changes made to MemTableInserter ensure that prepared sections are still recovered even if all of the non-2PC data in that log has already been flushed to L0.
      
      Test Plan: Provided test.
      
      Reviewers: sdong
      
      Subscribers: andrewkr, hermanlee4, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D57729
    • [rocksdb] Recovery path sequence miscount fix · a657ee9a
      Reid Horuff committed
      Summary:
      Consider the following WAL with 4 batch entries prefixed with their sequence at time of memtable insert.
      [1: BEGIN_PREPARE, PUT, PUT, PUT, PUT, END_PREPARE(a)]
      [1: BEGIN_PREPARE, PUT, PUT, PUT, PUT, END_PREPARE(b)]
      [4: COMMIT(a)]
      [7: COMMIT(b)]
      
      The first two batches do not consume any sequence numbers so are both prefixed with seq=1.
      For 2pc commit, memtable insertion takes place before COMMIT batch is written to WAL.
      We can see that sequence number consumption takes place between WAL entries giving us the seemingly sparse sequence prefix for WAL entries.
      This is a valid WAL.
      
      Because, with 2PC markers, one WriteBatch points to another batch containing its inserts, a WriteBatch can consume more or fewer sequence numbers than the number of sequence-consuming entries it contains.

      We can see that, given the entries in the WAL, 6 sequence ids were consumed. Yet on recovery the maximum sequence consumed would be 7 + 3 (the number of sequence numbers consumed by COMMIT(b)).

      So, now upon recovery we must track the actual consumption of sequence numbers.
      In the provided scenario there will be no sequence gaps, but it is possible to produce a sequence gap. This should not be a problem, though. Correct?
      
      Test Plan: provided test.
      
      Reviewers: sdong
      
      Subscribers: andrewkr, leveldb, dhruba, hermanlee4
      
      Differential Revision: https://reviews.facebook.net/D57645
    • [rocksdb] Two Phase Transaction · 8a66c85e
      Reid Horuff committed
      Summary:
      Two Phase Commit addition to RocksDB.
      
      See wiki: https://github.com/facebook/rocksdb/wiki/Two-Phase-Commit-Implementation
      Quip: https://fb.quip.com/pxZrAyrx53r3
      
      Depends on:
      WriteBatch modification: https://reviews.facebook.net/D54093
      Memtable Log Referencing and Prepared Batch Recovery: https://reviews.facebook.net/D56919
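
      A hedged usage sketch of the new prepare step through the TransactionDB API
      (the path and transaction name below are placeholders):

        #include <cassert>
        #include "rocksdb/utilities/transaction.h"
        #include "rocksdb/utilities/transaction_db.h"

        int main() {
          rocksdb::Options options;
          options.create_if_missing = true;
          rocksdb::TransactionDBOptions txn_db_options;
          rocksdb::TransactionDB* db = nullptr;
          rocksdb::Status s = rocksdb::TransactionDB::Open(
              options, txn_db_options, "/tmp/2pc_example", &db);
          assert(s.ok());

          rocksdb::Transaction* txn = db->BeginTransaction(rocksdb::WriteOptions());
          assert(txn->SetName("xid-1").ok());  // 2PC transactions are named by an xid
          assert(txn->Put("key", "value").ok());
          assert(txn->Prepare().ok());         // phase 1: persist the prepared section
          assert(txn->Commit().ok());          // phase 2: write the commit marker
          delete txn;
          delete db;
          return 0;
        }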
      
      Test Plan:
      - SimpleTwoPhaseTransactionTest
      - PersistentTwoPhaseTransactionTest.
      - TwoPhaseRollbackTest
      - TwoPhaseMultiThreadTest
      - TwoPhaseLogRollingTest
      - TwoPhaseEmptyWriteTest
      - TwoPhaseExpirationTest
      
      Reviewers: IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb, hermanlee4, andrewkr, vasilep, dhruba, santoshb
      
      Differential Revision: https://reviews.facebook.net/D56925
    • [rocksdb] Memtable Log Referencing and Prepared Batch Recovery · 1b8a2e8f
      Reid Horuff committed
      Summary:
      This diff is built on top of the WriteBatch modification (https://reviews.facebook.net/D54093) and adds to the RocksDB core the functionality necessary to support 2PC.

      Modification of DBImpl::WriteImpl():
      - added two arguments: uint64_t* log_used = nullptr, uint64_t log_ref = 0
      - *log_used is an output argument that returns the log number the incoming batch was inserted into, or 0 if no WAL insert took place.
      - log_ref is a supplied log number which all memtables inserted into will reference after the batch insert takes place. This number will reside in FindMinPrepLogReferencedByMemTable() until all memtables inserted into have flushed.

      - The recovery/write path is now aware of prepared batches and commit and rollback markers.
      
      Test Plan: There is currently no test on this diff. All testing of this functionality takes place in the Transaction layer/diff but I will add some testing.
      
      Reviewers: IslamAbdelRahman, sdong
      
      Subscribers: leveldb, santoshb, andrewkr, vasilep, dhruba, hermanlee4
      
      Differential Revision: https://reviews.facebook.net/D56919
    • Modification of WriteBatch to support two phase commit · 0460e9dc
      Reid Horuff committed
      Summary: Adds three new WriteBatch data types: Prepare(xid), Commit(xid), Rollback(xid). Prepare(xid) should precede the (single) operation to which it applies. There can obviously be multiple Prepare(xid) markers. There should only be one Rollback(xid) or Commit(xid) marker, but not both. None of this logic is currently enforced; it will most likely be implemented further up, such as in the MemTableInserter. All three markers are similar to PutLogData in that they are WriteBatch metadata, i.e. stored but not counted. All three markers differ from PutLogData in that they will actually be written to disk. As for WriteBatchWithIndex, Prepare, Commit, and Rollback are all implemented just as PutLogData is, and none are tested, just like PutLogData.
      
      Test Plan: single unit test in write_batch_test.
      
      Reviewers: hermanlee4, sdong, anthony
      
      Subscribers: leveldb, dhruba, vasilep, andrewkr
      
      Differential Revision: https://reviews.facebook.net/D57867
    • Follow symlinks in chroot directory · f548da33
      Andrew Kryczka committed
      Summary:
      On Mac OS X, the chroot directory we typically use ("/tmp") is actually
      a symlink to "/private/tmp". Since we dereference symlinks in user-defined
      paths, we must also dereference symlinks in chroot_dir_ so that we can perform
      string comparisons on those paths.
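
      The dereferencing itself can be done with POSIX realpath(); a small, hedged
      illustration (the helper name and chroot path are just for the example):

        #include <stdlib.h>
        #include <stdexcept>
        #include <string>

        // Resolve symlinks (e.g. "/tmp" -> "/private/tmp" on Mac OS X) so that later
        // prefix comparisons against user-supplied paths use canonical paths.
        std::string ResolveSymlinks(const std::string& path) {
          char* resolved = realpath(path.c_str(), nullptr);
          if (resolved == nullptr) {
            throw std::runtime_error("realpath failed for " + path);
          }
          std::string result(resolved);
          free(resolved);
          return result;
        }

        // Usage sketch: canonicalize the chroot directory once, at construction time.
        //   std::string chroot_dir = ResolveSymlinks("/tmp");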
      
      Test Plan: ran env_test on Mac OS X and devserver
      
      Reviewers: sdong, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D57957
  10. 10 May 2016 · 7 commits
    • Fix lite build · d86f9b9c
      Islam AbdelRahman committed
      Summary: Fix lite build
      
      Test Plan: run under lite
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D57945
    • Add bottommost_compression option · 4b317234
      Islam AbdelRahman committed
      Summary:
      Add a new option that can be used to set a specific compression algorithm for bottommost level.
      This option will only affect levels larger than base level.
      
      I have also updated CompactionJobInfo to include the compression algorithm used in compaction
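
      A minimal usage sketch (the compression choices here are arbitrary examples):

        #include "rocksdb/options.h"

        rocksdb::Options MakeOptions() {
          rocksdb::Options options;
          // Upper levels: cheap, fast compression for data that is rewritten often.
          options.compression = rocksdb::kSnappyCompression;
          // Bottommost level: a stronger codec, since this data is rewritten rarely.
          options.bottommost_compression = rocksdb::kZlibCompression;
          return options;
        }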
      
      Test Plan:
      added a new unit test
      existing unit tests
      
      Reviewers: andrewkr, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: lightmark, andrewkr, dhruba, yoshinorim
      
      Differential Revision: https://reviews.facebook.net/D57669
    • Estimate pending compaction bytes more accurately · bfb6b1b8
      sdong committed
      Summary: Currently we estimate the bytes needed for compaction by assuming the fanout value to be the level multiplier. This overestimates when the size of a level exceeds its target by a large margin. Instead, we now estimate using the ratio of actual sizes between levels.
      
      Test Plan: Fix existing test cases and add a new one.
      
      Reviewers: IslamAbdelRahman, igor, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: MarkCallaghan, leveldb, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D57789
    • Properly destroy ChrootEnv in env_test · 258459ed
      Andrew Kryczka committed
      Summary: see title
      
      Test Plan:
        $ /mnt/gvfs/third-party2/valgrind/af85c56f424cd5edfc2c97588299b44ecdec96bb/3.10.0/gcc-4.9-glibc-2.20/e9936bf/bin/valgrind --error-exitcode=2 --leak-check=full ./env_test
      
      Reviewers: IslamAbdelRahman, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D57897
    • Initial script for the new regression test · fca5aa6f
      Yueh-Hsuan Chiang committed
      Summary:
      This diff includes an initial script that runs a set of benchmarks for
      regression testing.  The script does the following things:
      
        checkout the specified rocksdb commit (or origin/master as default)
        make clean && DEBUG_LEVEL=0 make db_bench
        setup test directories
        run set of benchmarks and store results
      
      Currently, the script runs a couple of benchmarks, stores all the benchmark
      output, extracts micros-per-op and percentile information for each benchmark,
      and stores them in a single SUMMARY.csv file.  The SUMMARY.csv will make
      follow-up regression detection easier.

      In addition, the current script only takes env arguments to set important
      attributes of db_bench.  A follow-up patch will allow db_bench
      to construct options from an options file.
      
      Test Plan:
      NUM_KEYS=100 ./tools/regression_test.sh
      
        Sample SUMMARY.csv file:
      
                                           commit id,                      benchmark,  ms-per-op,        p50,        p75,        p99,      p99.9,     p99.99
            7e23ddf575890510e7d2fc7a79b31a1bbf317917,                        fillseq,      15.28,      54.66,      77.14,    5000.00,   17900.00,   18483.00
            7e23ddf575890510e7d2fc7a79b31a1bbf317917,                      overwrite,      13.54,      57.69,      86.39,    3000.00,   15600.00,   17013.00
            7e23ddf575890510e7d2fc7a79b31a1bbf317917,                     readrandom,       1.04,       0.80,       1.67,     293.33,     395.00,     504.00
            7e23ddf575890510e7d2fc7a79b31a1bbf317917,               readwhilewriting,       2.75,       1.01,       1.87,     200.00,     460.00,     485.00
            7e23ddf575890510e7d2fc7a79b31a1bbf317917,                   deleterandom,       3.64,      48.12,      70.09,     200.00,     336.67,     347.00
            7e23ddf575890510e7d2fc7a79b31a1bbf317917,                     seekrandom,      24.31,     391.87,     513.69,     872.73,     990.00,    1048.00
            7e23ddf575890510e7d2fc7a79b31a1bbf317917,         seekrandomwhilewriting,      14.02,     185.14,     294.15,     700.00,    1440.00,    1527.00
      
      Reviewers: sdong, IslamAbdelRahman, kradhakrishnan, yiwu, andrewkr, gunnarku
      
      Reviewed By: gunnarku
      
      Subscribers: gunnarku, MarkCallaghan, andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D57597
    • Add --index_block_restart_interval option in db_bench · e1951b6f
      Islam AbdelRahman committed
      Summary:
      Pass the --index_block_restart_interval flag to block_based_options in the db_bench tool.
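
      The flag maps onto BlockBasedTableOptions; roughly (the option plumbing shown
      here is a generic sketch, not db_bench's code):

        #include "rocksdb/options.h"
        #include "rocksdb/table.h"

        rocksdb::Options MakeOptions(int index_block_restart_interval /* e.g. 16 */) {
          rocksdb::BlockBasedTableOptions table_options;
          // How many index entries share one restart point in the index block.
          table_options.index_block_restart_interval = index_block_restart_interval;

          rocksdb::Options options;
          options.table_factory.reset(
              rocksdb::NewBlockBasedTableFactory(table_options));
          return options;
        }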
      
      Test Plan: none
      
      Reviewers: sdong, kradhakrishnan
      
      Reviewed By: kradhakrishnan
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D57699
    • Fix win build · 730f7e2e
      Yi Wu committed
      Summary: Fix an error in the Windows build where we compare int64_t with size_t.
      
      Test Plan: make check
      
      Reviewers: andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D57885
  11. 07 May 2016 · 2 commits
    • Fix includes for clang on OS X · a9b3c47c
      Andrew Kryczka committed
      Summary:
      Fix the following error:
      
        use of undeclared identifier 'errno'
      
      Test Plan: doitlive
      
      Reviewers: IslamAbdelRahman, sdong
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D57849
    • Introduce chroot Env · 3f16a836
      Andrew Kryczka committed
      Summary:
      For testing backups, we needed an Env that is fully isolated from other
      Envs on the same machine. Our in-memory Envs (MockEnv and InMemoryEnv) were
      insufficient because they don't implement most directory operations.
      
      This diff introduces a new Env, "ChrootEnv", that translates paths such that the
      chroot directory appears to be the root directory. This way, multiple Envs can
      be isolated in the filesystem by using different chroot directories. Since we
      use the filesystem, all directory operations are trivially supported.
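
      The core of the idea is a simple path translation layered on top of an existing
      Env; an illustrative fragment (not the real ChrootEnv code):

        #include <string>

        // Illustrative sketch of the path translation at the heart of a chroot-style
        // Env: every user-visible absolute path is mapped into the chroot directory
        // before being handed to the underlying (real) Env, and mapped back on return.
        class ChrootPathTranslator {
         public:
          explicit ChrootPathTranslator(std::string chroot_dir)
              : chroot_dir_(std::move(chroot_dir)) {}

          // "/foo/bar" (as seen by the caller) -> "<chroot_dir>/foo/bar" on disk.
          std::string ToChroot(const std::string& path) const {
            return chroot_dir_ + path;
          }

          // Inverse mapping, used when the underlying Env reports real paths.
          bool FromChroot(const std::string& real_path, std::string* user_path) const {
            if (real_path.compare(0, chroot_dir_.size(), chroot_dir_) != 0) {
              return false;  // not inside the chroot directory
            }
            *user_path = real_path.substr(chroot_dir_.size());
            if (user_path->empty()) *user_path = "/";
            return true;
          }

         private:
          std::string chroot_dir_;  // e.g. "/tmp/rocksdb_chroot" (placeholder)
        };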
      
      Test Plan:
      I parameterized the existing EnvPosixTest so it runs tests on ChrootEnv
      except the ioctl-related cases.
      
      Reviewers: sdong, lightmark, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: andrewkr, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D57543