  1. 27 December 2013, 2 commits
    • Implement autovector · c01676e4
      Committed by kailiu
      Summary:
      A vector that leverages a pre-allocated, stack-based array to achieve better
      performance for arrays with a small number of items.
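
      The core idea, as a minimal sketch (not the committed implementation): keep the first kSize elements in a pre-allocated, in-object array and fall back to an internal std::vector only once that capacity is exceeded.

        #include <cstddef>
        #include <vector>

        // Minimal sketch: keep the first kSize elements in an in-object array,
        // spilling into a std::vector only when the inline capacity is exceeded.
        template <class T, size_t kSize = 8>
        class small_vector {
         public:
          void push_back(const T& v) {
            if (num_stack_items_ < kSize) {
              values_[num_stack_items_++] = v;   // stays in the in-object array
            } else {
              vect_.push_back(v);                // heap fallback, touched lazily
            }
          }
          size_t size() const { return num_stack_items_ + vect_.size(); }
          const T& operator[](size_t n) const {
            return n < num_stack_items_ ? values_[n] : vect_[n - num_stack_items_];
          }
         private:
          size_t num_stack_items_ = 0;
          T values_[kSize];                      // pre-allocated, in-object storage
          std::vector<T> vect_;                  // only grows past kSize elements
        };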
      
      Test Plan:
      Added tests for both correctness and performance
      
      Here is the performance benchmark between vector and autovector
      
      Please note that in the test "Creation and Insertion Test", the test cases were designed with the motivation described below:
      
      * no element inserted: the internal array of std::vector may not really get
        initialized.
      * one element inserted: the internal array of std::vector must have been
        initialized.
      * kSize elements inserted. This shows the most time we'll spend if we
        keep everything on the stack.
      * 2 * kSize elements inserted. The internal vector of
        autovector must have been initialized.
      
      Note: kSize is the capacity of autovector
      
        =====================================================
        Creation and Insertion Test
        =====================================================
        created 100000 vectors:
        	each was inserted with 0 elements
        	total time elapsed: 128000 (ns)
        created 100000 autovectors:
        	each was inserted with 0 elements
        	total time elapsed: 3641000 (ns)
        created 100000 VectorWithReserveSizes:
        	each was inserted with 0 elements
        	total time elapsed: 9896000 (ns)
        -----------------------------------
        created 100000 vectors:
        	each was inserted with 1 elements
        	total time elapsed: 11089000 (ns)
        created 100000 autovectors:
        	each was inserted with 1 elements
        	total time elapsed: 5008000 (ns)
        created 100000 VectorWithReserveSizes:
        	each was inserted with 1 elements
        	total time elapsed: 24271000 (ns)
        -----------------------------------
        created 100000 vectors:
        	each was inserted with 4 elements
        	total time elapsed: 39369000 (ns)
        created 100000 autovectors:
        	each was inserted with 4 elements
        	total time elapsed: 10121000 (ns)
        created 100000 VectorWithReserveSizes:
        	each was inserted with 4 elements
        	total time elapsed: 28473000 (ns)
        -----------------------------------
        created 100000 vectors:
        	each was inserted with 8 elements
        	total time elapsed: 75013000 (ns)
        created 100000 autovectors:
        	each was inserted with 8 elements
        	total time elapsed: 18237000 (ns)
        created 100000 VectorWithReserveSizes:
        	each was inserted with 8 elements
        	total time elapsed: 42464000 (ns)
        -----------------------------------
        created 100000 vectors:
        	each was inserted with 16 elements
        	total time elapsed: 102319000 (ns)
        created 100000 autovectors:
        	each was inserted with 16 elements
        	total time elapsed: 76724000 (ns)
        created 100000 VectorWithReserveSizes:
        	each was inserted with 16 elements
        	total time elapsed: 68285000 (ns)
        -----------------------------------
        =====================================================
        Sequence Access Test
        =====================================================
        performed 100000 sequence access against vector
        	size: 4
        	total time elapsed: 198000 (ns)
        performed 100000 sequence access against autovector
        	size: 4
        	total time elapsed: 306000 (ns)
        -----------------------------------
        performed 100000 sequence access against vector
        	size: 8
        	total time elapsed: 565000 (ns)
        performed 100000 sequence access against autovector
        	size: 8
        	total time elapsed: 512000 (ns)
        -----------------------------------
        performed 100000 sequence access against vector
        	size: 16
        	total time elapsed: 1076000 (ns)
        performed 100000 sequence access against autovector
        	size: 16
        	total time elapsed: 1070000 (ns)
        -----------------------------------
      
      Reviewers: dhruba, haobo, sdong, chip
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14655
    • Merge pull request #32 from jamesgolick/master · 5643ae1a
      Committed by Kai Liu
      Only try to use fallocate if it's actually present on the system.
  2. 24 December 2013, 1 commit
  3. 21 December 2013, 2 commits
    • Initialize sequence number in BatchResult - issue #39 · b26dc956
      Committed by Igor Canadi
    • [RocksDB] Optimize locking for Get · 1fdb3f7d
      Committed by Igor Canadi
      Summary:
      Instead of locking and saving a DB state, we can cache a DB state and update it only when it changes. This change reduces lock contention and speeds up read operations on the DB.
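
      A minimal sketch of the caching pattern (not the actual DBImpl code, and the names below are hypothetical): readers take a cheap reference to an immutable state object, and the mutex is held only long enough to copy or swap a pointer.

        #include <memory>
        #include <mutex>

        // Simplified sketch of the pattern: readers grab a cheap reference to an
        // immutable state object; writers build a new one and swap it under the lock.
        struct ReadState {
          // stands in for the memtable/version pointers a Get() needs
        };

        class StateCache {
         public:
          std::shared_ptr<const ReadState> Acquire() const {
            std::lock_guard<std::mutex> l(mu_);  // held only to copy a pointer
            return current_;
          }
          void Install(std::shared_ptr<const ReadState> s) {
            std::lock_guard<std::mutex> l(mu_);
            current_ = std::move(s);             // old state freed when the last reader drops it
          }
         private:
          mutable std::mutex mu_;
          std::shared_ptr<const ReadState> current_;
        };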
      
      Performance improvements are substantial, although there is some cost in no-read workloads. I ran the regression tests on my devserver and here are the numbers:
      
        overwrite                    56345  ->   63001
        fillseq                      193730 ->  185296
        readrandom                   771301 -> 1219803 (58% improvement!)
        readrandom_smallblockcache   677609 ->  862850
        readrandom_memtable_sst      710440 -> 1109223
        readrandom_fillunique_random 221589 ->  247869
        memtablefillrandom           105286 ->   92643
        memtablereadrandom           763033 -> 1288862
      
      Test Plan:
      make asan_check
      I am also running db_stress
      
      Reviewers: dhruba, haobo, sdong, kailiu
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14679
  4. 20 December 2013, 1 commit
  5. 19 December 2013, 4 commits
    • Add 'readtocache' test · ca92068b
      Committed by Mark Callaghan
      Summary:
      For some tests I want to cache the database before running other tests in the same invocation
      of db_bench. The readtocache test ignores --threads and --reads, so those can be set for the other tests,
      and it will still do a full read of --num rows with one thread. It might be invoked like:
        db_bench --benchmarks=readtocache,readrandom --reads 100 --num 10000 --threads 8
      
      Test Plan:
      run db_bench
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14739
    • Reorder tests · e914b649
      Committed by Igor Canadi
      Summary:
      db_test should be the first to execute because it finds the most bugs.
      
      Also, when third parties report issues, we don't want an ldb error message; we prefer a db_test error message. For example, see this thread: https://github.com/facebook/rocksdb/issues/25
      
      Test Plan: make check
      
      Reviewers: dhruba, haobo, kailiu
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14715
    • Merge pull request #35 from zizkovrb/rm-ds_store · cbb8da6f
      Committed by Igor Canadi
      Remove utilities/.DS_Store file.
    • Merge pull request #37 from mlin/more-c-bindings · 3b50b621
      Committed by Igor Canadi
      C bindings: add a bunch of the newer options
  6. 18 December 2013, 2 commits
  7. 17 December 2013, 1 commit
  8. 16 December 2013, 1 commit
  9. 13 December 2013, 3 commits
    • [backupable db] Delete db_dir children when restoring backup · 417b453f
      Committed by Igor Canadi
      Summary:
      I realized that the manifest will get deleted by PurgeObsoleteFiles in DBImpl, but it is still cleaner to delete
      the files before we restore the backup.
      
      Test Plan: backupable_db_test
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14619
    • Add monitoring for universal compaction and add counters for compaction IO · e9e6b00d
      Committed by Mark Callaghan
      Summary:
      Adds these counters
      { WAL_FILE_SYNCED, "rocksdb.wal.synced" }
        number of writes that request a WAL sync
      { WAL_FILE_BYTES, "rocksdb.wal.bytes" },
        number of bytes written to the WAL
      { WRITE_DONE_BY_SELF, "rocksdb.write.self" },
        number of writes processed by the calling thread
      { WRITE_DONE_BY_OTHER, "rocksdb.write.other" },
        number of writes not processed by the calling thread. Instead these were
        processed by the current holder of the write lock
      { WRITE_WITH_WAL, "rocksdb.write.wal" },
        number of writes that request WAL logging
      { COMPACT_READ_BYTES, "rocksdb.compact.read.bytes" },
        number of bytes read during compaction
      { COMPACT_WRITE_BYTES, "rocksdb.compact.write.bytes" },
        number of bytes written during compaction
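
      For reference, a sketch of reading these tickers through the Statistics object attached to the DB options (assuming the Statistics interface in rocksdb/statistics.h; ticker names as listed above):

        #include <iostream>
        #include "rocksdb/db.h"
        #include "rocksdb/statistics.h"

        int main() {
          rocksdb::Options options;
          options.create_if_missing = true;
          options.statistics = rocksdb::CreateDBStatistics();

          rocksdb::DB* db = nullptr;
          rocksdb::DB::Open(options, "/tmp/ticker_demo", &db);   // status checks omitted
          db->Put(rocksdb::WriteOptions(), "key", "value");

          std::cout << "wal bytes: "
                    << options.statistics->getTickerCount(rocksdb::WAL_FILE_BYTES)
                    << std::endl;
          delete db;
          return 0;
        }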
      
      Per-interval stats output was updated with WAL stats and correct stats for universal compaction
      including a correct value for write-amplification. It now looks like:
                                     Compactions
      Level  Files Size(MB) Score Time(sec)  Read(MB) Write(MB)    Rn(MB)  Rnp1(MB)  Wnew(MB) RW-Amplify Read(MB/s) Write(MB/s)      Rn     Rnp1     Wnp1     NewW    Count  Ln-stall Stall-cnt
      --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        0        7      464  46.4       281      3411      3875      3411         0      3875        2.1      12.1        13.8      621        0      240      240      628       0.0         0
      Uptime(secs): 310.8 total, 2.0 interval
      Writes cumulative: 9999999 total, 9999999 batches, 1.0 per batch, 1.22 ingest GB
      WAL cumulative: 9999999 WAL writes, 9999999 WAL syncs, 1.00 writes per sync, 1.22 GB written
      Compaction IO cumulative (GB): 1.22 new, 3.33 read, 3.78 write, 7.12 read+write
      Compaction IO cumulative (MB/sec): 4.0 new, 11.0 read, 12.5 write, 23.4 read+write
      Amplification cumulative: 4.1 write, 6.8 compaction
      Writes interval: 100000 total, 100000 batches, 1.0 per batch, 12.5 ingest MB
      WAL interval: 100000 WAL writes, 100000 WAL syncs, 1.00 writes per sync, 0.01 MB written
      Compaction IO interval (MB): 12.49 new, 14.98 read, 21.50 write, 36.48 read+write
      Compaction IO interval (MB/sec): 6.4 new, 7.6 read, 11.0 write, 18.6 read+write
      Amplification interval: 101.7 write, 102.9 compaction
      Stalls(secs): 142.924 level0_slowdown, 0.000 level0_numfiles, 0.805 memtable_compaction, 0.000 leveln_slowdown
      Stalls(count): 132461 level0_slowdown, 0 level0_numfiles, 3 memtable_compaction, 0 leveln_slowdown
      
      Task ID: #3329644, #3301695
      
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14583
    • portable %lu printing · 249e736b
      Committed by Igor Canadi
  10. 12 December 2013, 6 commits
    • Add readrandom with both memtable and sst regression test · f5f5c645
      Committed by Igor Canadi
      Summary: @MarkCallaghan's tests indicate that performance with 8k rows in the memtable is much worse than with an empty memtable. I wanted to add a regression test that measures this effect, so we can optimize it. However, the current config shows 634461 QPS on my devbox. Mark, any idea why this is so much faster than your measurements?
      
      Test Plan: Ran the regression test.
      
      Reviewers: MarkCallaghan, dhruba, haobo
      
      Reviewed By: MarkCallaghan
      
      CC: leveldb, MarkCallaghan
      
      Differential Revision: https://reviews.facebook.net/D14511
    • Introduce MergeContext to Lazily Initialize merge operand list · a8029fdc
      Committed by Siying Dong
      Summary: In Get operations, merge_operands is only used in a few cases. Lazily initializing it can reduce average latency in some cases.
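
      A minimal sketch of the lazy-initialization idea (not the actual MergeContext class): the operand list is only allocated the first time an operand is pushed, so a plain Get() that never sees a merge pays nothing.

        #include <deque>
        #include <memory>
        #include <string>

        // Sketch of the idea: the operand list is only allocated the first time
        // a merge operand is actually seen.
        class MergeContextSketch {
         public:
          void PushOperand(const std::string& operand) {
            if (!operands_) {
              operands_.reset(new std::deque<std::string>());  // allocated lazily
            }
            operands_->push_back(operand);
          }
          size_t NumOperands() const { return operands_ ? operands_->size() : 0; }
         private:
          std::unique_ptr<std::deque<std::string>> operands_;  // stays null for a plain Get()
        };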
      
      Test Plan: make all check
      
      Reviewers: haobo, kailiu, dhruba
      
      Reviewed By: haobo
      
      CC: igor, nkg-, leveldb
      
      Differential Revision: https://reviews.facebook.net/D14415
      
      Conflicts:
      	db/db_impl.cc
      	db/memtable.cc
    • oops - missed a spot · c28dd2a8
      Committed by James Golick
    • [RocksDB Performance Branch] Avoid sorting in Version::Get() by presorting them in VersionSet::Builder::SaveTo() · bc5dd19b
      Committed by Siying Dong

      Summary: Pre-sort files in VersionSet::Builder::SaveTo() so that they do not need to be sorted when getting a value. This avoids the cost of vector operations and sorting in Version::Get().
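
      A simplified sketch of the general idea (not the actual VersionSet code; the types below are hypothetical): sort file metadata once when the version is built so lookups can binary-search instead of sorting per Get().

        #include <algorithm>
        #include <string>
        #include <vector>

        // Simplified sketch: sort file metadata once at version-build time so a
        // lookup can binary-search by largest key instead of sorting per Get().
        struct FileMetaSketch {
          std::string smallest;
          std::string largest;
        };

        void SaveLevel(std::vector<FileMetaSketch>* files) {   // done once per version build
          std::sort(files->begin(), files->end(),
                    [](const FileMetaSketch& a, const FileMetaSketch& b) {
                      return a.largest < b.largest;
                    });
        }

        // At read time: the first file whose largest key is >= user_key may contain it.
        const FileMetaSketch* FindFile(const std::vector<FileMetaSketch>& files,
                                       const std::string& user_key) {
          auto it = std::lower_bound(
              files.begin(), files.end(), user_key,
              [](const FileMetaSketch& f, const std::string& k) { return f.largest < k; });
          return it == files.end() ? nullptr : &*it;
        }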
      
      Test Plan: make all check
      
      Reviewers: haobo, kailiu, dhruba
      
      Reviewed By: dhruba
      
      CC: nkg-, igor, leveldb
      
      Differential Revision: https://reviews.facebook.net/D14409
    • When flushing mem tables, create iterators out of mutex · 0304e3d2
      Committed by Siying Dong
      Summary:
      Creating new iterators over mem tables can be expensive, so move that work out of the mutex.
      DBImpl::WriteLevel0Table()'s mems appears to be a local vector that is only used by flushing. The memtables to flush are also immutable, so it should be safe to do so.
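
      A sketch of the pattern (not the actual DBImpl code; the types are placeholders), assuming the DB mutex is held on entry as in WriteLevel0Table():

        #include <mutex>
        #include <vector>

        struct IteratorSketch {};                    // stand-in for a memtable iterator
        struct MemTableSketch {
          IteratorSketch* NewIterator() const { return new IteratorSketch(); }
        };

        // 'mems' is a local, immutable snapshot, so it can be walked without the lock.
        void FlushMemTablesSketch(std::mutex& db_mutex,
                                  const std::vector<MemTableSketch*>& mems) {
          db_mutex.unlock();                         // drop the lock around the expensive part
          std::vector<IteratorSketch*> iterators;
          for (const MemTableSketch* m : mems) {
            iterators.push_back(m->NewIterator());   // expensive, now outside the mutex
          }
          // ... build the level-0 table from 'iterators' ...
          for (IteratorSketch* it : iterators) delete it;
          db_mutex.lock();                           // reacquire before touching shared state
        }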
      
      Test Plan: make all check
      
      Reviewers: haobo, dhruba, kailiu
      
      Reviewed By: dhruba
      
      CC: igor, leveldb
      
      Differential Revision: https://reviews.facebook.net/D14577
      
      Conflicts:
      	db/db_impl.cc
    • [RocksDB perf] Cache speedup · e8d40c31
      Committed by Igor Canadi
      Summary:
      I ran a Get benchmark where all the data is in the cache and observed that most of the time is spent waiting for the lock in LRUCache.
      
      This is an effort to optimize LRUCache.
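
      One common way to shrink an LRU cache's critical section, shown here as a generic sketch (not necessarily what this particular patch does): collect evicted entries while holding the shard mutex and run their deleters only after it is released.

        #include <cstddef>
        #include <mutex>
        #include <vector>

        struct EntrySketch {
          void (*deleter)(EntrySketch*);
        };

        class CacheShardSketch {
         public:
          void Insert(EntrySketch* e) {
            std::vector<EntrySketch*> evicted;
            {
              std::lock_guard<std::mutex> l(mu_);
              lru_.push_back(e);                     // link the new entry (simplified)
              while (lru_.size() > capacity_) {      // unlink anything over capacity
                evicted.push_back(lru_.front());
                lru_.erase(lru_.begin());
              }
            }                                        // shard mutex released here
            for (EntrySketch* victim : evicted) {
              victim->deleter(victim);               // potentially expensive, done lock-free
            }
          }
         private:
          std::mutex mu_;
          std::vector<EntrySketch*> lru_;
          size_t capacity_ = 4;
        };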
      
      Test Plan:
      The data was loaded with fillseq. Then, I ran a benchmark:
      
          /db_bench --db=/tmp/rocksdb_stat_bench --num=1000000 --benchmarks=readrandom --statistics=1 --use_existing_db=1 --threads=16 --disable_seek_compaction=1 --cache_size=20000000000 --cache_numshardbits=8 --table_cache_numshardbits=8
      
      I ran the benchmark three times. Here are the results:
      AFTER THE PATCH: 798072, 803998, 811807
      BEFORE THE PATCH: 782008, 815593, 763017
      
      Reviewers: dhruba, haobo, kailiu
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14571
  11. 11 December 2013, 9 commits
  12. 10 December 2013, 5 commits
    • Rename leveldb to rocksdb in C api · 6c4e110c
      Committed by Doğan Çeçen
    • Fix shared lib build · f6012ab8
      Committed by Doğan Çeçen
    • Fix unused variable warning · 784e62f9
      Committed by Igor Canadi
    • [RocksDB] BackupableDB · fb9fce4f
      Committed by Igor Canadi
      Summary:
      In this diff I present BackupableDB v1. You can easily use it to back up your DB, and it will do incremental snapshots for you.
      Let's first describe how you would use BackupableDB. It inherits the StackableDB interface, so you can easily construct it with your DB object -- it adds a method RollTheSnapshot() to the DB object. When you call RollTheSnapshot(), the current snapshot of the DB is stored in the backup dir. To restore, you just call RestoreDBFromBackup() on a BackupableDB (a static method) and it restores all files from the backup dir. In the next version it will even support automatic backups every X minutes.
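
      Based on that description, usage would look roughly like the following hypothetical sketch; the method names come from this diff, but the header path, constructor arguments, and exact signatures are assumptions, not the committed API.

        #include "rocksdb/db.h"
        #include "utilities/backupable_db.h"         // header path assumed

        // Hypothetical usage sketch; constructor arguments and exact signatures
        // are assumptions made for illustration only.
        void BackupSketch(rocksdb::DB* db) {
          rocksdb::BackupableDBOptions backup_options("/path/to/backup_dir");  // ctor args assumed
          rocksdb::BackupableDB backupable(db, backup_options);                // wraps the DB

          backupable.RollTheSnapshot();   // store an incremental snapshot in the backup dir

          // To restore, the diff describes a static RestoreDBFromBackup() on
          // BackupableDB; its parameters are not shown here, so the call is omitted.
        }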
      
      There are multiple things you can configure:
      1. backup_env and db_env can be different, which is awesome because then you can easily back up to HDFS or wherever you feel like.
      2. sync - if true, it *guarantees* backup consistency on machine reboot
      3. number of snapshots to keep - this keeps the last N snapshots around in case you want, for some reason, to be able to restore from an earlier snapshot. All backing up is done incrementally - if we already have 00010.sst, we will not copy it again. *IMPORTANT* -- this is based on the assumption that 00010.sst never changes; two files named 00010.sst from the same DB will always be exactly the same. Is this true? I always copy the manifest, current and log files.
      4. You can decide whether you want to flush the memtables before you back up, or whether you're fine with backing up the log files -- either way, you get a complete and consistent view of the database at the time of backup.
      5. More options can be found in BackupableDBOptions.
      
      Here is the directory structure I use:
      
         backup_dir/CURRENT_SNAPSHOT - just 4 bytes holding the latest snapshot
                     0, 1, 2, ... - files containing serialized version of each snapshot - containing a list of files
                     files/*.sst - sst files shared between snapshots - if one snapshot references 00010.sst and another one needs to backup it from the DB, it will just reference the same file
                     files/ 0/, 1/, 2/, ... - snapshot directories containing private snapshot files - current, manifest and log files
      
      All the files are ref counted and deleted immediately when they go out of scope.
      
      Some other stuff in this diff:
      1. Added GetEnv() method to the DB. Discussed with @haobo and we agreed that it seems like the right thing to do.
      2. Fixed StackableDB interface. The way it was set up before, I was not able to implement BackupableDB.
      
      Test Plan:
      I have a unittest, but please don't look at this yet. I just hacked it up to help me with debugging. I will write a lot of good tests and update the diff.
      
      Also, `make asan_check`
      
      Reviewers: dhruba, haobo, emayanke
      
      Reviewed By: dhruba
      
      CC: leveldb, haobo
      
      Differential Revision: https://reviews.facebook.net/D14295
    • Fixing git branch detection in Jenkins · 26bc40a8
      Committed by Igor Canadi
      Branch detection did not work in Jenkins. I realized that it sets the
      GIT_BRANCH env variable to point to the current branch, so let's try
      using that for branch detection.
  13. 07 December 2013, 3 commits
    • Print stack trace on assertion failure · 9644e0e0
      Committed by Igor Canadi
      Summary:
      This will help me a lot! When we hit an assertion in a unit test, we now get the whole stack trace.
      
      Also, I changed the stack trace a bit: we now include actual demangled C++ class::function symbols!
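
      A generic sketch of how such a trace can be produced (not necessarily the committed handler): capture frames with backtrace() and demangle each symbol with abi::__cxa_demangle().

        #include <cxxabi.h>
        #include <execinfo.h>
        #include <cstdio>
        #include <cstdlib>
        #include <cstring>

        void PrintStackTraceSketch() {
          void* frames[32];
          int n = backtrace(frames, 32);
          char** symbols = backtrace_symbols(frames, n);
          for (int i = 0; i < n; ++i) {
            // Linux format is "module(_MangledName+0x1a) [0xaddr]".
            char* begin = strchr(symbols[i], '(');
            char* end = begin ? strchr(begin, '+') : nullptr;
            if (begin != nullptr && end != nullptr) {
              *end = '\0';
              int status = 0;
              char* demangled =
                  abi::__cxa_demangle(begin + 1, nullptr, nullptr, &status);
              printf("#%d %s\n", i, status == 0 ? demangled : begin + 1);
              free(demangled);
            } else {
              printf("#%d %s\n", i, symbols[i]);
            }
          }
          free(symbols);
        }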
      
      Test Plan: Added ASSERT_TRUE(false) to a test, observed a stack trace
      
      Reviewers: haobo, dhruba, kailiu
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14499
    • Enable regression tests to be run on other branches · 07c84488
      Committed by Igor Canadi
      Summary: When running regression tests on other branches, this will push values to entity rocksdb_build.$git_branch
      
      Test Plan: Ran the regression test on the regression branch, observed values sent to ODS in entity rocksdb_build.regression
      
      Reviewers: kailiu
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14493
    • Make DBWithTTL more like StackableDB · 0a5ec498
      Committed by Igor Canadi
      Summary: DBWithTTL now takes a DB* and can behave more like StackableDB. This saves us a lot of duplicate work in defining interfaces.
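
      The StackableDB idea is essentially delegation; a generic sketch (not the actual class):

        #include "rocksdb/db.h"

        // Generic sketch of the StackableDB idea: wrap a DB* and forward calls by
        // default, overriding only what needs extra behavior.
        class StackableDBSketch {
         public:
          explicit StackableDBSketch(rocksdb::DB* db) : db_(db) {}
          virtual ~StackableDBSketch() {}

          virtual rocksdb::Status Put(const rocksdb::WriteOptions& opts,
                                      const rocksdb::Slice& key,
                                      const rocksdb::Slice& value) {
            return db_->Put(opts, key, value);   // default: pass straight through
          }
          // A TTL wrapper would override Put to append an expiry timestamp and
          // then call this base implementation.
         protected:
          rocksdb::DB* db_;
        };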
      
      Test Plan: ttl_test with ASAN - OK
      
      Reviewers: emayanke
      
      Reviewed By: emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D14481