1. 30 Apr 2013 (1 commit)
  2. 26 Apr 2013 (1 commit)
  3. 23 Apr 2013 (2 commits)
  4. 21 Apr 2013 (3 commits)
    • [RocksDB] CompactionFilter cleanup · b4243e5a
      Committed by Haobo Xu
      Summary:
      - Removed compaction_filter_value from the callback interface, restricting the compaction filter to purging values (see the sketch below).
      - Modified some comments to reflect the current status.
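      A minimal sketch of the post-cleanup idea, with an illustrative class name and hook (the exact 2013 callback signature may differ): the filter can only decide whether an entry is purged; it can no longer rewrite the value.

          #include "leveldb/slice.h"

          // Illustrative only: purge-or-keep is the whole interface now.
          class PurgeEmptyValuesFilter {
           public:
            // Return true to drop the key/value pair during compaction.
            bool Filter(int level, const leveldb::Slice& key,
                        const leveldb::Slice& existing_value) const {
              return existing_value.size() == 0;  // e.g. purge empty values
            }
          };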
      
      Test Plan: make check
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D10335
    • Add --writes_per_second rate limit, print p99.99 in histogram · b1ff9ac9
      Committed by Mark Callaghan
      Summary:
      Adds the --writes_per_second rate limit for the readwhilewriting test.
      The purpose is to optionally avoid saturating storage with writes and compaction,
      and to test read response time while some writes are being done.
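      A rough sketch of how a writes-per-second cap can be enforced in a writer loop (illustrative only, not the actual db_bench code): after each write, compare elapsed time with the target rate and sleep off any surplus.

          #include <chrono>
          #include <cstdint>
          #include <thread>

          void DoOneWrite();  // hypothetical: one Put() against the db

          // Illustrative limiter: hold a loop to roughly writes_per_second.
          void RateLimitedWrites(int64_t writes_per_second, int64_t total_writes) {
            using Clock = std::chrono::steady_clock;
            const auto start = Clock::now();
            for (int64_t done = 0; done < total_writes; ++done) {
              DoOneWrite();
              // When the (done+1)-th write should finish at the target rate.
              auto target = start + std::chrono::microseconds(
                  (done + 1) * 1000000 / writes_per_second);
              if (Clock::now() < target) std::this_thread::sleep_until(target);
            }
          }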
      
      Changes the histogram code to also print the p99.99 value
      
      Test Plan:
      make check, ran db_bench with it
      
      
      Reviewers: haobo
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D10305
    • [RocksDB] Add stacktrace signal handler · 1255dcd4
      Committed by Haobo Xu
      Summary:
      This diff provides the ability to print out a stacktrace when the process receives certain signals.
      Currently, we enable this for the following signals (program error related):
      SIGILL SIGSEGV SIGBUS SIGABRT
      An application that needs the handler simply #includes "util/stack_trace.h" and calls leveldb::InstallStackTraceHandler() during initialization. This is not done automatically when opening the db, because installing signal handlers is the application's (process's) responsibility, and some applications might already have their own (like fbcode).
      
      Sample output:
      Received signal 11 (Segmentation fault)
      #0  0x408ff0 ./signal_test() [0x408ff0] /home/haobo/rocksdb/util/signal_test.cc:4
      #1  0x40827d ./signal_test() [0x40827d] /home/haobo/rocksdb/util/signal_test.cc:24
      #2  0x7f8bb183172e /usr/local/fbcode/gcc-4.7.1-glibc-2.14.1/lib/libc.so.6(__libc_start_main+0x10e) [0x7f8bb183172e] ??:0
      #3  0x408ebc ./signal_test() [0x408ebc] /home/engshare/third-party/src/glibc/glibc-2.14.1/glibc-2.14.1/csu/../sysdeps/x86_64/elf/start.S:113
      Segmentation fault (core dumped)
      
      For each frame, we print the raw pointer, the symbol provided by backtrace_symbols (still not good enough), and the source file/line. Note that address translation is done by shelling out directly to addr2line; ??:0 means addr2line failed to do the translation. Hacky, but I think it's good for now.
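      The core mechanism can be sketched in a few lines (illustrative, not the actual util/stack_trace.cc): catch the fatal signal, dump the frames with backtrace(3), then re-raise with the default handler so the process still dies (and dumps core) normally.

          #include <execinfo.h>
          #include <initializer_list>
          #include <signal.h>
          #include <stdio.h>
          #include <unistd.h>

          static void StackTraceHandler(int sig) {
            void* frames[64];
            int n = backtrace(frames, 64);
            fprintf(stderr, "Received signal %d\n", sig);
            backtrace_symbols_fd(frames, n, STDERR_FILENO);  // signal-safe dump
            signal(sig, SIG_DFL);  // restore the default handler ...
            raise(sig);            // ... and re-raise so the crash proceeds
          }

          void InstallStackTraceHandler() {
            for (int sig : {SIGILL, SIGSEGV, SIGBUS, SIGABRT}) {
              signal(sig, StackTraceHandler);
            }
          }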
      
      Test Plan: signal_test.cc
      
      Reviewers: dhruba, MarkCallaghan
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D10173
  5. 16 Apr 2013 (2 commits)
  6. 13 Apr 2013 (1 commit)
    • [RocksDB] [Performance] Speed up FindObsoleteFiles · 013e9ebb
      Committed by Haobo Xu
      Summary:
      FindObsoleteFiles was slow and held the single big lock, which resulted in bad p99 behavior.
      Didn't profile anything, but several things could be improved:
      1. VersionSet::AddLiveFiles works with std::set, which is by itself slow (a tree),
         and you don't know how many dynamic allocations occur just to build up the tree.
         Switched to std::vector, and added logic to pre-calculate the total size and do just one allocation.
      2. There is no reason env_->GetChildren() needs to be mutex protected; moved it to PurgeObsoleteFiles,
         where the mutex can be unlocked.
      3. Switched std::set to std::unordered_set; the conversion from vector also happens inside PurgeObsoleteFiles (see the sketch below).
      I have a feeling this should pretty much fix it.
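      A minimal sketch of points 1 and 3, with illustrative names rather than the real VersionSet code: size one vector up front under the lock, then build the unordered_set outside it.

          #include <cstdint>
          #include <unordered_set>
          #include <vector>

          // Illustrative: collect live file numbers with a single allocation
          // instead of a std::set (one node allocation per insert).
          std::vector<uint64_t> CollectLiveFiles(
              const std::vector<std::vector<uint64_t>>& files_by_level) {
            size_t total = 0;
            for (const auto& level : files_by_level) total += level.size();
            std::vector<uint64_t> live;
            live.reserve(total);  // pre-calculated size: exactly one allocation
            for (const auto& level : files_by_level)
              live.insert(live.end(), level.begin(), level.end());
            return live;
          }

          // Later, outside the big lock, convert once for O(1) lookups:
          //   std::unordered_set<uint64_t> live_set(live.begin(), live.end());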
      
      Test Plan: make check;  db_stress
      
      Reviewers: dhruba, heyongqiang, MarkCallaghan
      
      Reviewed By: dhruba
      
      CC: leveldb, zshao
      
      Differential Revision: https://reviews.facebook.net/D10197
  7. 12 Apr 2013 (1 commit)
  8. 11 Apr 2013 (1 commit)
    • Prevent segfault in OpenCompactionOutputFile · 77305871
      Committed by Dhruba Borthakur
      Summary:
      The segfault happened because the program was unable to open a new
      sst file (as part of the compaction) once the process ran out of
      file descriptors.
      
      The fix is to check the return status of the file creation before taking
      any other action.
      
      Program received signal SIGSEGV, Segmentation fault.
      [Switching to Thread 0x7fabf03f9700 (LWP 29904)]
      leveldb::DBImpl::OpenCompactionOutputFile (this=this@entry=0x7fabf9011400, compact=compact@entry=0x7fabf741a2b0) at db/db_impl.cc:1399
      1399    db/db_impl.cc: No such file or directory.
      (gdb) where
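      In sketch form (a fragment with hypothetical variable names; the real code is in OpenCompactionOutputFile): propagate the error instead of using an unopened file.

          // Illustrative: check the status of file creation before anything else.
          WritableFile* outfile = nullptr;
          Status s = env_->NewWritableFile(fname, &outfile);
          if (!s.ok()) return s;  // e.g. "Too many open files"
          // Only now is it safe to build the table writer on top of outfile.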
      
      Test Plan: make check
      
      Reviewers: MarkCallaghan, sheki
      
      Reviewed By: MarkCallaghan
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D10101
  9. 09 Apr 2013 (1 commit)
    • [RocksDB][Bug] Look at all the files, not just the first file in TransactionLogIter as BatchWrites can leave it in Limbo · 574b76f7
      Committed by Abhishek Kona
      
      Summary:
      The Transaction Log Iterator did not move to the next file in the series if there was a write batch at the end of the current file.
      The solution: if the last seq no. of the current file is < RequestedSeqNo, assume the first seq no. of the next file will satisfy the request (see the sketch below).
      
      Also did major refactoring around the code: moved opening the log reader into a separate function and got rid of the goto.
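      The advance rule, sketched with illustrative names (not the exact iterator code):

          // If the current file ends before the sequence number we were asked
          // for, the requested batch must start in the next file of the series.
          if (last_seq_in_current_file < requested_seq_no && HasNextFile()) {
            current_file_index_++;
            OpenLogReader(current_file_index_);  // hypothetical helper from the refactor
          }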
      
      Test Plan: added a unit test for it.
      
      Reviewers: dhruba, heyongqiang
      
      Reviewed By: heyongqiang
      
      CC: leveldb, emayanke
      
      Differential Revision: https://reviews.facebook.net/D10029
  10. 06 Apr 2013 (1 commit)
    • [Rocksdb] Remove useless struct TableAndFile · 0e40185a
      Committed by Abhishek Kona
      Summary:
      TableAndFile was a struct used earlier to delete the file, since we did not have std::unique_ptr in the codebase.
      With Chip introducing C++11 hotness like std::unique_ptr, we can do away with the struct (sketched below).
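      Roughly the change, with illustrative member names:

          // Before: a wrapper struct existed only so both raw pointers could
          // be deleted together when the table cache entry was evicted.
          struct TableAndFile {
            Table* table;
            RandomAccessFile* file;
          };

          // After: unique_ptr members delete themselves; no helper struct.
          std::unique_ptr<Table> table;
          std::unique_ptr<RandomAccessFile> file;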
      
      Test Plan: make all check
      
      Reviewers: haobo, heyongqiang
      
      Reviewed By: heyongqiang
      
      CC: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D9975
  11. 03 Apr 2013 (2 commits)
  12. 29 Mar 2013 (4 commits)
    • Let's get rid of delete as much as possible; here are some examples. · 645ff8f2
      Committed by Haobo Xu
      Summary:
      If a class owns an object:
       - If the object can be null => use a unique_ptr; no delete.
       - If the object cannot be null => you don't even need new, let alone delete.
       - For a runtime-sized array => use a vector; no delete. (See the sketch below.)
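      All three rules, applied to one illustrative class:

          #include <cstddef>
          #include <memory>
          #include <vector>

          struct Widget {};  // placeholder type for the sketch

          class Owner {
            std::unique_ptr<Widget> optional_;  // may be null: unique_ptr, no delete
            Widget always_;                     // never null: plain member, no new
            std::vector<char> buffer_;          // runtime-sized array: vector
           public:
            explicit Owner(size_t n) : buffer_(n) {}
          };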
      
      Test Plan: make check
      
      Reviewers: dhruba, heyongqiang
      
      Reviewed By: heyongqiang
      
      CC: leveldb, zshao, sheki, emayanke, MarkCallaghan
      
      Differential Revision: https://reviews.facebook.net/D9783
    • [RocksDB] Fix binary search while finding probable wal files · 3b51605b
      Committed by Abhishek Kona
      Summary:
      In GetUpdatesSince, RocksDB does a binary search over the files which might contain the requested sequence number.
      There was a bug in the binary search: when the file pointed to by the middle index was empty/corrupt, the search needs to resize the vector and update the indexes.
      This now fixes that (see the sketch below).
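      The fixed loop, roughly (illustrative helper and return convention, not the real GetUpdatesSince code):

          #include <cstdint>
          #include <vector>

          bool ReadFirstSequence(uint64_t file, uint64_t* first_seq);  // hypothetical

          // Returns the index of the last file whose first seq no. <= target,
          // dropping empty/corrupt files from the candidate list as it goes.
          int64_t FindStartFile(std::vector<uint64_t>& files, uint64_t target_seq) {
            int64_t lo = 0, hi = static_cast<int64_t>(files.size()) - 1;
            while (lo <= hi) {
              int64_t mid = lo + (hi - lo) / 2;
              uint64_t first_seq;
              if (!ReadFirstSequence(files[mid], &first_seq)) {
                files.erase(files.begin() + mid);             // resize the vector ...
                hi = static_cast<int64_t>(files.size()) - 1;  // ... update the indexes
                continue;
              }
              if (first_seq <= target_seq) lo = mid + 1; else hi = mid - 1;
            }
            return hi;
          }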
      
      Test Plan: existing unit tests pass.
      
      Reviewers: heyongqiang, dhruba
      
      Reviewed By: heyongqiang
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9777
    • [Rocksdb] Fix Crash on finding a db with no log files. Error out instead · 8e9c781a
      Committed by Abhishek Kona
      Summary:
      If the vector returned by GetUpdatesSince was empty, it was still returned to the
      user, which caused a std::range error to be thrown.
      The probable file list is now checked, and an IOError status is returned instead of OK (see the sketch below).
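      The guard, in sketch form (illustrative variable and message):

          if (wal_files.empty()) {
            // Fail loudly instead of handing the caller an empty list.
            return Status::IOError("no wal files contain the requested seq no.");
          }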
      
      Test Plan: added a unit test.
      
      Reviewers: dhruba, heyongqiang
      
      Reviewed By: heyongqiang
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9771
    • Use non-mmapd files for Write-Ahead Files · 7fdd5f5b
      Committed by Abhishek Kona
      Summary:
      Use non-mmapd files for the Write-Ahead log.
      The earlier use of mmapd files made the log iterator read ahead and miss records.
      Now the reader and writer point to the same physical location.
      
      There is no perf regression:
      ./db_bench --benchmarks=fillseq --db=/dev/shm/mmap_test --num=$(million 20) --use_existing_db=0 --threads=2
      with this diff:
      fillseq      :      10.756 micros/op 185281 ops/sec;   20.5 MB/s
      without this diff:
      fillseq      :      11.085 micros/op 179676 ops/sec;   19.9 MB/s
      
      Test Plan: unit test included
      
      Reviewers: dhruba, heyongqiang
      
      Reviewed By: heyongqiang
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9741
  13. 28 Mar 2013 (1 commit)
    • Memory-manage statistics · 63f216ee
      Committed by Abhishek Kona
      Summary:
      Earlier, the Statistics object was a raw pointer, which meant the user had to clean up
      the Statistics object after creating the database. In most use cases the database is created in a function and the statistics pointer goes out of scope, so the Statistics object would never be deleted.
      Now a shared_ptr manages this (see the sketch below).
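      Usage then looks roughly like this (a sketch; the factory name CreateDBStatistics is an assumption about the API of that era):

          leveldb::Options options;
          // The shared_ptr keeps the Statistics alive for as long as the
          // options (and the db) reference it; no manual delete is needed
          // when the creating function returns.
          options.statistics = leveldb::CreateDBStatistics();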
      
      Want this in before the next release.
      
      Test Plan: make all check.
      
      Reviewers: dhruba, emayanke
      
      Reviewed By: emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9735
  14. 27 Mar 2013 (1 commit)
  15. 22 Mar 2013 (2 commits)
  16. 21 Mar 2013 (2 commits)
    • Run compactions even if workload is readonly or read-mostly. · d0798f67
      Committed by Dhruba Borthakur
      Summary:
      The events that trigger compaction:
      * opening the database
      * Get -> only if seek compaction is not disabled and other checks are true
      * MakeRoomForWrite -> when memtable is full
      * BackgroundCall ->
        If the background thread is about to do a compaction run, it schedules
        a new background task to trigger a possible compaction. This will cause
        additional background threads to find and process other compactions that
        can run concurrently (sketched below).
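      An illustrative sketch of the BackgroundCall idea (BGWork and NeedsCompaction are assumed names, not the exact code):

          void DBImpl::BackgroundCall() {
            MutexLock l(&mutex_);
            if (versions_->NeedsCompaction()) {
              // Queue another check so a second background thread can pick up
              // a different compaction and run it concurrently with this one.
              env_->Schedule(&DBImpl::BGWork, this);
            }
            BackgroundCompaction();
          }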
      
      Test Plan: ran db_bench with overwrite and readonly alternatively.
      
      Reviewers: sheki, MarkCallaghan
      
      Reviewed By: sheki
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9579
    • Ability to configure bufferedio-reads, filesystem-readaheads and mmap-read-write per database. · ad96563b
      Committed by Dhruba Borthakur
      Summary:
      This patch allows an application to specify, per database, whether to use bufferedio,
      reads-via-mmaps, and writes-via-mmaps. Earlier, a global static variable
      was used to configure this functionality.
      
      The default setting remains the same (and is backward compatible):
       1. use bufferedio
       2. do not use mmaps for reads
       3. use mmap for writes
       4. use readaheads for reads needed for compaction
      
      I also added a parameter to db_bench to be able to explicitly specify
      whether to do readaheads for compactions or not.
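      In option form, the defaults look roughly like this (the field names follow early RocksDB conventions but are assumptions here):

          leveldb::Options options;
          options.allow_os_buffer = true;    // 1. use bufferedio
          options.allow_mmap_reads = false;  // 2. do not use mmaps for reads
          options.allow_mmap_writes = true;  // 3. use mmap for writes
          // 4. compaction readahead stays on by default; db_bench gained a
          //    flag to toggle it explicitly.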
      
      Test Plan: make check
      
      Reviewers: sheki, heyongqiang, MarkCallaghan
      
      Reviewed By: sheki
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9429
  17. 20 Mar 2013 (4 commits)
  18. 19 Mar 2013 (1 commit)
  19. 15 Mar 2013 (1 commit)
    • Enhance db_bench · 5a8c8845
      Committed by Mark Callaghan
      Summary:
      Add --benchmarks=updaterandom for read-modify-write workloads. This is different
      from --benchmarks=readrandomwriterandom in a few ways. First, an "operation" is the
      combined time to do the read and write, rather than treating them as two ops. Second,
      the same key is used for the read and write (sketched below).
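      One updaterandom operation is thus roughly (a sketch, not the db_bench source):

          // Read-modify-write on the SAME key, timed as a single operation.
          std::string value;
          db->Get(leveldb::ReadOptions(), key, &value);   // read
          value.append("delta");                          // modify
          db->Put(leveldb::WriteOptions(), key, value);   // write
          // hist.Add(elapsed_micros);  // one combined measurement, not two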
      
      Change RandomGenerator to support rows larger than 1M. It was using "assert"
      to fail, and assert is compiled away when -DNDEBUG is used.
      
      Add more options to db_bench
      --duration - sets the number of seconds for tests to run. When not set the
      operation count continues to be the limit. This is used by random operation
      tests.
      
      --use_snapshot - when set GetSnapshot() is called prior to each random read.
      This is to measure the overhead from using snapshots.
      
      --get_approx - when set GetApproximateSizes() is called prior to each random
      read. This is to measure the overhead for a query optimizer.
      
      Test Plan:
      run db_bench
      
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D9267
  20. 13 Mar 2013 (1 commit)
  21. 12 Mar 2013 (1 commit)
    • Prevent segfault because SizeUnderCompaction was called without any locks. · ebf16f57
      Committed by Dhruba Borthakur
      Summary:
      SizeBeingCompacted was called without any lock protection. This caused
      crashes, especially when running db_bench with value_size=128K.
      The fix is to compute SizeUnderCompaction while holding the mutex and
      to pass these values into the call to Finalize (sketched after the stack trace below).
      
      (gdb) where
      #4  leveldb::VersionSet::SizeBeingCompacted (this=this@entry=0x7f0b490931c0, level=level@entry=4) at db/version_set.cc:1827
      #5  0x000000000043a3c8 in leveldb::VersionSet::Finalize (this=this@entry=0x7f0b490931c0, v=v@entry=0x7f0b3b86b480) at db/version_set.cc:1420
      #6  0x00000000004418d1 in leveldb::VersionSet::LogAndApply (this=0x7f0b490931c0, edit=0x7f0b3dc8c200, mu=0x7f0b490835b0, new_descriptor_log=<optimized out>) at db/version_set.cc:1016
      #7  0x00000000004222b2 in leveldb::DBImpl::InstallCompactionResults (this=this@entry=0x7f0b49083400, compact=compact@entry=0x7f0b2b8330f0) at db/db_impl.cc:1473
      #8  0x0000000000426027 in leveldb::DBImpl::DoCompactionWork (this=this@entry=0x7f0b49083400, compact=compact@entry=0x7f0b2b8330f0) at db/db_impl.cc:1757
      #9  0x0000000000426690 in leveldb::DBImpl::BackgroundCompaction (this=this@entry=0x7f0b49083400, madeProgress=madeProgress@entry=0x7f0b41bf2d1e, deletion_state=...) at db/db_impl.cc:1268
      #10 0x0000000000428f42 in leveldb::DBImpl::BackgroundCall (this=0x7f0b49083400) at db/db_impl.cc:1170
      #11 0x000000000045348e in BGThread (this=0x7f0b49023100) at util/env_posix.cc:941
      #12 leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper (arg=0x7f0b49023100) at util/env_posix.cc:874
      #13 0x00007f0b4a7cf10d in start_thread (arg=0x7f0b41bf3700) at pthread_create.c:301
      #14 0x00007f0b49b4b11d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
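      Roughly, with hypothetical signatures:

          // Compute per-level bytes-being-compacted while the db mutex is
          // held, then hand the precomputed vector to Finalize instead of
          // letting Finalize call back into VersionSet without the lock.
          std::vector<uint64_t> size_being_compacted(NumberLevels() - 1);
          mutex_.Lock();
          versions_->SizeBeingCompacted(&size_being_compacted);
          versions_->Finalize(v, size_being_compacted);
          mutex_.Unlock();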
      
      Test Plan:
      make check
      
      I am running db_bench with a value size of 128K to see if the segfault is fixed.
      
      Reviewers: MarkCallaghan, sheki, emayanke
      
      Reviewed By: sheki
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9279
  22. 08 Mar 2013 (1 commit)
    • A mechanism to detect manifest file write errors and put db in readonly mode. · 6d812b6a
      Committed by Dhruba Borthakur
      Summary:
      If there is an error while writing an edit to the manifest file, the manifest
      file is closed and reopened to check whether the edit made it in. If the
      re-opening of the manifest is unsuccessful and options.paranoid_checks is set
      to true, then the db refuses to accept new puts, effectively putting the db
      in readonly mode (see the sketch below).
      
      In a future diff, I would like to make the default value of paranoid_checks
      true.
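      The logic, in sketch form (illustrative; the helper name is hypothetical and the real code sits in the manifest write path):

          Status s = descriptor_log_->AddRecord(record);  // write the version edit
          if (!s.ok()) {
            s = ReopenManifestAndVerify();  // hypothetical: did the edit land?
            if (!s.ok() && options_.paranoid_checks) {
              bg_error_ = s;  // sticky error: future puts fail => read-only mode
            }
          }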
      
      Test Plan: make check
      
      Reviewers: sheki
      
      Reviewed By: sheki
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9201
  23. 07 Mar 2013 (2 commits)
    • Do not allow Transaction Log Iterator to fall ahead when writer is writing the same file · d68880a1
      Committed by Abhishek Kona
      Summary:
      Store the last flushed seq no. in db_impl and check against it in the
      transaction log iterator. Do not attempt to read ahead if we do not know
      whether the data is flushed completely (see the sketch below).
      This does not work if flush is disabled; any ideas on fixing that?
      Minor change: iter->Next is now called automatically the first time.
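      The guard is roughly (the accessor name is hypothetical):

          // Stop before data that may not be durably flushed yet: the writer
          // could still be appending to this wal file.
          if (next_record_seq > db_->GetLastFlushedSequence()) {
            return;  // do not read ahead past the flushed prefix
          }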
      
      Test Plan:
      existing tests pass.
      More ideas on testing this?
      Planning to run some stress tests.
      
      Reviewers: dhruba, heyongqiang
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9087
    • Fix db_stress crash by copying keys before changing sequence num to zero. · afed6093
      Committed by Dhruba Borthakur
      Summary:
      The compaction process zeros out sequence numbers if the output is
      part of the bottommost level.
      A Slice is supposed to refer to an immutable data buffer, and the
      merger that implements the priority queue while reading kvs as
      the input of a compaction run relies on this fact. The bug was that we
      were updating the sequence number of a record in place, which caused
      succeeding invocations of the merger to return kvs in
      arbitrary order of sequence numbers.
      The fix is to copy the key to a local memory buffer before setting
      its seqno to 0 (see the sketch below).
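      In sketch form (the helper name and key layout are simplified):

          // The input Slice must stay immutable for the merging iterator, so
          // copy the key into a local buffer before zeroing the seq number.
          std::string current_key(key.data(), key.size());  // local copy
          ZeroSequenceNumber(&current_key);                 // hypothetical helper
          key = leveldb::Slice(current_key);                // used for the output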
      
      Test Plan:
      Set Options.purge_redundant_kvs_while_flush = false and then run
      db_stress --ops_per_thread=1000 --max_key=320
      
      Reviewers: emayanke, sheki
      
      Reviewed By: emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9147
  24. 05 Mar 2013 (1 commit)
  25. 04 Mar 2013 (2 commits)
    • Add rate_delay_limit_milliseconds · 993543d1
      Committed by Mark Callaghan
      Summary:
      This adds the rate_delay_limit_milliseconds option to make the delay in
      MakeRoomForWrite configurable when the max compaction score is too high.
      This delay is called the Ln slowdown. The change also counts the Ln slowdown
      per level, to make it possible to see where the stalls occur (sketched below).
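      Illustrative fragment (the variable names are assumed):

          // Cap the write stall at the configured limit and attribute it to
          // the level whose compaction score triggered it.
          int delay_ms = std::min(computed_delay_ms,
                                  options_.rate_delay_limit_milliseconds);
          env_->SleepForMicroseconds(delay_ms * 1000);
          stall_leveln_slowdown_ms_[max_score_level] += delay_ms;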
      
      From IO-bound performance testing, the Level N stalls occur:
      * with compression -> at the largest uncompressed level. This makes sense
                            because compaction for compressed levels is much
                            slower. When Lx is uncompressed and Lx+1 is compressed
                            then files pile up at Lx because the (Lx,Lx+1)->Lx+1
                            compaction process is the first to be slowed by
                            compression.
      * without compression -> at level 1
      
      Task ID: #1832108
      
      Test Plan:
      run with real data, added test
      
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D9045
    • Ability for rocksdb to compact when flushing the in-memory memtable to a file in L0. · 806e2643
      Committed by Dhruba Borthakur
      Summary:
      Rocks accumulates recent writes and deletes in the in-memory memtable.
      When the memtable is full, it writes the contents of the memtable to
      a file in L0.
      
      This patch removes redundant records at the time of the flush. If there
      are multiple versions of the same key in the memtable, then only the
      most recent one is dumped into the output file. The purging of
      redundant records occurs only if the most recent snapshot is earlier
      than the earliest record in the memtable (see the sketch below).
      
      Should we switch this feature on by default, or keep it turned off in
      the default settings?
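      The purge condition and flush loop, in sketch form (helper names are illustrative; memtable iteration yields the newest version of a key first):

          // Safe to purge only when no live snapshot can still observe the
          // old versions, i.e. every memtable record is newer than the most
          // recent snapshot.
          bool can_purge = most_recent_snapshot_seq < earliest_seq_in_memtable;

          std::string last_user_key;
          for (iter->SeekToFirst(); iter->Valid(); iter->Next()) {
            leveldb::Slice user_key = ExtractUserKey(iter->key());
            if (can_purge && user_key == leveldb::Slice(last_user_key)) {
              continue;  // older version of a key already written: drop it
            }
            builder->Add(iter->key(), iter->value());
            last_user_key.assign(user_key.data(), user_key.size());
          }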
      
      Test Plan: Added test case to db_test.cc
      
      Reviewers: sheki, vamsi, emayanke, heyongqiang
      
      Reviewed By: sheki
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D8991