1. 27 Mar 2013 (1 commit)
  2. 22 Mar 2013 (2 commits)
  3. 21 Mar 2013 (2 commits)
    • Run compactions even if workload is readonly or read-mostly. · d0798f67
      Committed by Dhruba Borthakur
      Summary:
      The events that trigger compaction:
      * opening the database
      * Get -> only if seek compaction is not disabled and other checks are true
      * MakeRoomForWrite -> when memtable is full
      * BackgroundCall ->
        If the background thread is about to do a compaction run, it schedules
        a new background task to trigger a possible compaction. This will cause
        additional background threads to find and process other compactions that
        can run concurrently (see the sketch below).
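
      A minimal sketch of the rescheduling idea, using stock LevelDB names
      (BGWork, MaybeScheduleCompaction, shutting_down_); treat it as an
      illustration of the mechanism, not the diff's exact code:

        void DBImpl::BackgroundCall() {
          MutexLock l(&mutex_);
          if (!shutting_down_.Acquire_Load()) {
            // Sketch of the new behaviour: before starting a compaction run,
            // queue one more background task so another thread can find and
            // process compactions that are able to run concurrently.
            env_->Schedule(&DBImpl::BGWork, this);
            BackgroundCompaction();
          }
          bg_compaction_scheduled_ = false;
          MaybeScheduleCompaction();  // reschedule if more work piled up
        }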
      
      Test Plan: ran db_bench with overwrite and readonly workloads alternately.
      
      Reviewers: sheki, MarkCallaghan
      
      Reviewed By: sheki
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9579
    • Ability to configure bufferedio-reads, filesystem-readaheads and mmap-read-write per database. · ad96563b
      Committed by Dhruba Borthakur
      Summary:
      This patch allows an application to specify, per database, whether to use
      bufferedio, reads via mmaps, and writes via mmaps. Earlier, a global
      static variable was used to configure this functionality.
      
      The default setting remains the same (and is backward compatible):
       1. use bufferedio
       2. do not use mmaps for reads
       3. use mmap for writes
       4. use readaheads for reads needed for compaction
      
      I also added a parameter to db_bench to be able to explicitly specify
      whether to do readaheads for compactions or not.
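
      A hedged usage sketch of the per-database knobs; the field names
      (allow_os_buffer, allow_mmap_reads, allow_mmap_writes) are assumptions
      based on this summary, so check options.h at this revision for the
      exact spellings:

        #include "leveldb/db.h"
        #include "leveldb/options.h"

        int main() {
          leveldb::Options options;
          options.create_if_missing = true;
          // Assumed knobs, mirroring the defaults listed above:
          options.allow_os_buffer   = true;   // 1. use bufferedio
          options.allow_mmap_reads  = false;  // 2. do not use mmaps for reads
          options.allow_mmap_writes = true;   // 3. use mmap for writes
          leveldb::DB* db = nullptr;
          leveldb::Status s = leveldb::DB::Open(options, "/tmp/testdb", &db);
          delete db;
          return s.ok() ? 0 : 1;
        }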
      
      Test Plan: make check
      
      Reviewers: sheki, heyongqiang, MarkCallaghan
      
      Reviewed By: sheki
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9429
  4. 20 Mar 2013 (4 commits)
  5. 19 Mar 2013 (1 commit)
  6. 15 Mar 2013 (1 commit)
    • Enhance db_bench · 5a8c8845
      Committed by Mark Callaghan
      Summary:
      Add --benchmarks=updaterandom for read-modify-write workloads. This is different
      from --benchmarks=readrandomwriterandom in a few ways. First, an "operation" is the
      combined time to do the read & write rather than treating them as two ops. Second,
      the same key is used for the read & write.
      
      Change RandomGenerator to support rows larger than 1M. Previously it used
      "assert" to fail, and assert is compiled away when -DNDEBUG is used.
      
      Add more options to db_bench
      --duration - sets the number of seconds for tests to run. When not set the
      operation count continues to be the limit. This is used by random operation
      tests.
      
      --use_snapshot - when set GetSnapshot() is called prior to each random read.
      This is to measure the overhead from using snapshots.
      
      --get_approx - when set GetApproximateSizes() is called prior to each random
      read. This is to measure the overhead for a query optimizer.
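
      A sketch of what a single updaterandom "operation" measures, simplified
      from what db_bench does (the helper name is illustrative):

        // One op = Get + Put on the same key, timed as a single unit.
        void DoOneUpdate(leveldb::DB* db, const leveldb::Slice& key,
                         const std::string& new_value) {
          std::string existing;
          leveldb::Status s = db->Get(leveldb::ReadOptions(), key, &existing);
          if (!s.ok() && !s.IsNotFound()) return;  // real code reports errors
          // ...read-modify-write: derive the new value from the old one...
          db->Put(leveldb::WriteOptions(), key, new_value);
        }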
      
      Test Plan:
      run db_bench
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D9267
  7. 13 Mar 2013 (1 commit)
  8. 12 Mar 2013 (1 commit)
    • Prevent segfault because SizeUnderCompaction was called without any locks. · ebf16f57
      Committed by Dhruba Borthakur
      Summary:
      SizeBeingCompacted was called without any lock protection. This caused
      crashes, especially when running db_bench with value_size=128K.
      The fix is to compute SizeUnderCompaction while holding the mutex and to
      pass these values into the call to Finalize.
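
      A sketch of the fix inside LogAndApply, assuming a Finalize overload
      that accepts the precomputed sizes (exact signatures per version_set.cc
      of this revision):

        // Compute the per-level "size being compacted" numbers while the db
        // mutex is still held, then hand the snapshot of values to Finalize
        // so it never reads shared compaction state without protection.
        mu->AssertHeld();
        std::vector<uint64_t> size_being_compacted(NumberLevels() - 1);
        SizeBeingCompacted(size_being_compacted);  // safe: mutex held
        Finalize(v, size_being_compacted);         // consumes the snapshot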
      
      (gdb) where
      #4  leveldb::VersionSet::SizeBeingCompacted (this=this@entry=0x7f0b490931c0, level=level@entry=4) at db/version_set.cc:1827
      #5  0x000000000043a3c8 in leveldb::VersionSet::Finalize (this=this@entry=0x7f0b490931c0, v=v@entry=0x7f0b3b86b480) at db/version_set.cc:1420
      #6  0x00000000004418d1 in leveldb::VersionSet::LogAndApply (this=0x7f0b490931c0, edit=0x7f0b3dc8c200, mu=0x7f0b490835b0, new_descriptor_log=<optimized out>) at db/version_set.cc:1016
      #7  0x00000000004222b2 in leveldb::DBImpl::InstallCompactionResults (this=this@entry=0x7f0b49083400, compact=compact@entry=0x7f0b2b8330f0) at db/db_impl.cc:1473
      #8  0x0000000000426027 in leveldb::DBImpl::DoCompactionWork (this=this@entry=0x7f0b49083400, compact=compact@entry=0x7f0b2b8330f0) at db/db_impl.cc:1757
      #9  0x0000000000426690 in leveldb::DBImpl::BackgroundCompaction (this=this@entry=0x7f0b49083400, madeProgress=madeProgress@entry=0x7f0b41bf2d1e, deletion_state=...) at db/db_impl.cc:1268
      #10 0x0000000000428f42 in leveldb::DBImpl::BackgroundCall (this=0x7f0b49083400) at db/db_impl.cc:1170
      #11 0x000000000045348e in BGThread (this=0x7f0b49023100) at util/env_posix.cc:941
      #12 leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper (arg=0x7f0b49023100) at util/env_posix.cc:874
      #13 0x00007f0b4a7cf10d in start_thread (arg=0x7f0b41bf3700) at pthread_create.c:301
      #14 0x00007f0b49b4b11d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
      
      Test Plan:
      make check
      
      I am running db_bench with a value size of 128K to see if the segfault is fixed.
      
      Reviewers: MarkCallaghan, sheki, emayanke
      
      Reviewed By: sheki
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9279
  9. 08 Mar 2013 (1 commit)
    • A mechanism to detect manifest file write errors and put the db in readonly mode. · 6d812b6a
      Committed by Dhruba Borthakur
      Summary:
      If there is an error while writing an edit to the manifest file, the manifest
      file is closed and reopened to check whether the edit made it in. If the
      re-opening of the manifest is unsuccessful and options.paranoid_checks is set
      to true, then the db refuses to accept new puts, effectively putting the db
      in readonly mode.
      
      In a future diff, I would like to make the default value of paranoid_checks
      true.
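
      A sketch of the failure path (bg_error_ is the member that makes later
      writes fail fast; the reopen-and-verify helper is shown with an assumed
      name and signature):

        // Manifest write failed: reopen the manifest and check whether the
        // edit actually made it in. If it did not and paranoid checks are
        // on, refuse all future writes.
        if (!s.ok()) {
          if (options_.paranoid_checks && !ManifestContains(record)) {
            bg_error_ = s;  // db is now effectively readonly: new Puts fail
          }
        }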
      
      Test Plan: make check
      
      Reviewers: sheki
      
      Reviewed By: sheki
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9201
  10. 07 Mar 2013 (2 commits)
    • Do not allow the Transaction Log Iterator to fall ahead when the writer is writing the same file · d68880a1
      Committed by Abhishek Kona
      Summary:
      Store the last flushed sequence number in db_impl and check against it in
      the transaction log iterator. Do not attempt to read ahead if we do not
      know whether the data is flushed completely.
      This does not work if flush is disabled. Any ideas on fixing that?
      * Minor change: iter->Next is called automatically the first time.
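
      A sketch of the read-ahead guard (the accessor and member names are
      illustrative of the mechanism, not the diff's exact API):

        // Before reading further into the current log file, make sure the
        // records there are known to be fully flushed by the writer.
        SequenceNumber last_flushed = db_impl_->GetLastFlushedSequence();  // assumed accessor
        if (next_record_sequence_ > last_flushed) {
          // The writer may still be appending to this file; stop here
          // rather than risk reading a partially written tail.
          return;
        }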
      
      Test Plan:
      Existing tests pass.
      More ideas on testing this?
      Planning to run some stress tests.
      
      Reviewers: dhruba, heyongqiang
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9087
    • Fix db_stress crash by copying keys before changing the sequence number to zero. · afed6093
      Committed by Dhruba Borthakur
      Summary:
      The compaction process zeros out sequence numbers if the output is
      part of the bottommost level.
      A Slice is supposed to refer to an immutable data buffer. The
      merger that implements the priority queue while reading kvs as
      the input of a compaction run relies on this fact. The bug was that we
      were updating the sequence number of a record in-place, and that was
      causing succeeding invocations of the merger to return kvs in
      arbitrary order of sequence numbers.
      The fix is to copy the key to a local memory buffer before setting
      its seqno to 0.
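
      A sketch of the fix (UpdateSequenceNumber stands in for whatever helper
      rewrites the packed seqno/type tail of an internal key):

        // Copy the internal key into locally owned memory before mutating
        // it, so Slices previously handed to the merging iterator still
        // point at unchanged bytes.
        std::string current_key;                      // local, mutable buffer
        current_key.assign(key.data(), key.size());   // copy before editing
        UpdateSequenceNumber(&current_key, 0 /* new seqno */);  // assumed helper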
      
      Test Plan:
      Set Options.purge_redundant_kvs_while_flush = false and then run
      db_stress --ops_per_thread=1000 --max_key=320
      
      Reviewers: emayanke, sheki
      
      Reviewed By: emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9147
  11. 05 Mar 2013 (1 commit)
  12. 04 Mar 2013 (2 commits)
    • Add rate_delay_limit_milliseconds · 993543d1
      Committed by Mark Callaghan
      Summary:
      This adds the rate_delay_limit_milliseconds option to make the delay
      configurable in MakeRoomForWrite when the max compaction score is too high.
      This delay is called the Ln slowdown. This change also counts the Ln slowdown
      per level to make it possible to see where the stalls occur.
      
      From IO-bound performance testing, the Level N stalls occur:
      * with compression -> at the largest uncompressed level. This makes sense
                            because compaction for compressed levels is much
                            slower. When Lx is uncompressed and Lx+1 is compressed
                            then files pile up at Lx because the (Lx,Lx+1)->Lx+1
                            compaction process is the first to be slowed by
                            compression.
      * without compression -> at level 1
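
      A sketch of the Ln slowdown inside MakeRoomForWrite (the option name is
      from this summary; the surrounding members are illustrative):

        // When the max compaction score is too high, stall the writer for a
        // bounded time and attribute the stall to the offending level.
        uint64_t delay_us = ComputeDelayFromScore(max_compaction_score);  // assumed
        uint64_t limit_us = 1000 * options_.rate_delay_limit_milliseconds;
        if (delay_us > limit_us) delay_us = limit_us;  // the configurable cap
        env_->SleepForMicroseconds(delay_us);
        stall_leveln_slowdown_[max_score_level] += delay_us;  // per-level count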
      
      Task ID: #1832108
      
      Test Plan:
      run with real data, added test
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D9045
    • Ability for rocksdb to compact when flushing the in-memory memtable to a file in L0. · 806e2643
      Committed by Dhruba Borthakur
      Summary:
      Rocks accumulates recent writes and deletes in the in-memory memtable.
      When the memtable is full, it writes the contents of the memtable to
      a file in L0.
      
      This patch removes redundant records at the time of the flush. If there
      are multiple versions of the same key in the memtable, then only the
      most recent one is dumped into the output file. The purging of
      redundant records occurs only if the most recent snapshot is earlier
      than the earliest record in the memtable.
      
      Should we switch on this feature by default or should we keep this feature
      turned off in the default settings?
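
      A sketch of the purge pass over the memtable during flush. Memtable
      iteration yields the versions of a user key newest-first, so keeping
      only the first occurrence keeps the most recent one; the real code
      additionally checks the snapshot condition described above
      (ExtractUserKey is the dbformat.h helper, the rest is illustrative):

        std::string prev_user_key;  // owned copy: survives iter->Next()
        bool has_prev = false;
        for (iter->SeekToFirst(); iter->Valid(); iter->Next()) {
          Slice user_key = ExtractUserKey(iter->key());
          if (has_prev && user_key == Slice(prev_user_key)) {
            continue;  // older version of the key just emitted: purge it
          }
          builder->Add(iter->key(), iter->value());  // newest version wins
          prev_user_key.assign(user_key.data(), user_key.size());
          has_prev = true;
        }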
      
      Test Plan: Added test case to db_test.cc
      
      Reviewers: sheki, vamsi, emayanke, heyongqiang
      
      Reviewed By: sheki
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D8991
  13. 02 Mar 2013 (1 commit)
  14. 01 Mar 2013 (1 commit)
  15. 27 Feb 2013 (1 commit)
  16. 26 Feb 2013 (1 commit)
  17. 23 Feb 2013 (1 commit)
  18. 22 Feb 2013 (3 commits)
    • Counters for bytes written and read. · ec77366e
      Committed by Abhishek Kona
      Summary:
      * Counters for bytes read and written.
      As part of this diff, I also want to measure compaction times. @dhruba,
      can you point out which function I should time to get compaction times?
      I was looking at CompactRange.
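
      A sketch of how such counters get bumped (RecordTick is the statistics
      helper in this fork; the exact ticker names and call sites are
      assumptions):

        // On a successful read:
        RecordTick(options_.statistics, BYTES_READ, value->size());
        // On a write batch commit:
        RecordTick(options_.statistics, BYTES_WRITTEN,
                   WriteBatchInternal::ByteSize(batch));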
      
      Test Plan: db_test
      
      Reviewers: dhruba, emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D8763
    • [Missed adding cmdline parsing for new flags added in D8685] · 6abb30d4
      Committed by Vamsi Ponnekanti
      Summary:
      I had added FLAGS_numdistinct and FLAGS_deletepercent for randomwithverify
      but forgot to add cmdline parsing for those flags.
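
      A sketch of the missing parsing clauses, in the sscanf style db_bench
      used at the time (variable names assumed to mirror the flags):

        long n;
        char junk;
        if (sscanf(argv[i], "--numdistinct=%ld%c", &n, &junk) == 1) {
          FLAGS_numdistinct = n;
        } else if (sscanf(argv[i], "--deletepercent=%ld%c", &n, &junk) == 1) {
          FLAGS_deletepercent = n;
        }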
      
      Test Plan:
      [nponnekanti@dev902 /data/users/nponnekanti/rocksdb] ./db_bench --benchmarks=randomwithverify --numdistinct=500
      LevelDB:    version 1.5
      Date:       Thu Feb 21 10:34:40 2013
      CPU:        24 * Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz
      CPUCache:   12288 KB
      Keys:       16 bytes each
      Values:     100 bytes each (50 bytes after compression)
      Entries:    1000000
      RawSize:    110.6 MB (estimated)
      FileSize:   62.9 MB (estimated)
      Compression: snappy
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      ------------------------------------------------
      Created bg thread 0x7fbf90bff700
      randomwithverify :       4.693 micros/op 213098 ops/sec; ( get:900000 put:80000 del:20000 total:1000000 found:714556)
      
      [nponnekanti@dev902 /data/users/nponnekanti/rocksdb] ./db_bench --benchmarks=randomwithverify --deletepercent=5
      LevelDB:    version 1.5
      Date:       Thu Feb 21 10:35:03 2013
      CPU:        24 * Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz
      CPUCache:   12288 KB
      Keys:       16 bytes each
      Values:     100 bytes each (50 bytes after compression)
      Entries:    1000000
      RawSize:    110.6 MB (estimated)
      FileSize:   62.9 MB (estimated)
      Compression: snappy
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      ------------------------------------------------
      Created bg thread 0x7fe14dfff700
      randomwithverify :       4.883 micros/op 204798 ops/sec; ( get:900000 put:50000 del:50000 total:1000000 found:443847)
      [nponnekanti@dev902 /data/users/nponnekanti/rocksdb]
      [nponnekanti@dev902 /data/users/nponnekanti/rocksdb] ./db_bench --benchmarks=randomwithverify --deletepercent=5 --numdistinct=500
      LevelDB:    version 1.5
      Date:       Thu Feb 21 10:36:18 2013
      CPU:        24 * Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz
      CPUCache:   12288 KB
      Keys:       16 bytes each
      Values:     100 bytes each (50 bytes after compression)
      Entries:    1000000
      RawSize:    110.6 MB (estimated)
      FileSize:   62.9 MB (estimated)
      Compression: snappy
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      ------------------------------------------------
      Created bg thread 0x7fc31c7ff700
      randomwithverify :       4.920 micros/op 203233 ops/sec; ( get:900000 put:50000 del:50000 total:1000000 found:445522)
      
      Revert Plan: OK
      
      Reviewers: dhruba, emayanke
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D8769
    • [Add randomwithverify benchmark option] · 945d2b59
      Committed by Vamsi Ponnekanti
      Summary: Added RandomWithVerify benchmark option.
      
      Test Plan:
      This whole diff is to test.
      [nponnekanti@dev902 /data/users/nponnekanti/rocksdb] ./db_bench --benchmarks=randomwithverify
      LevelDB:    version 1.5
      Date:       Tue Feb 19 17:50:28 2013
      CPU:        24 * Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz
      CPUCache:   12288 KB
      Keys:       16 bytes each
      Values:     100 bytes each (50 bytes after compression)
      Entries:    1000000
      RawSize:    110.6 MB (estimated)
      FileSize:   62.9 MB (estimated)
      Compression: snappy
      WARNING: Assertions are enabled; benchmarks unnecessarily slow
      ------------------------------------------------
      Created bg thread 0x7fa9c3fff700
      randomwithverify :       5.004 micros/op 199836 ops/sec; ( get:900000 put:80000 del:20000 total:1000000 found:711992)
      
      Revert Plan: OK
      
      Reviewers: dhruba, emayanke
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D8685
  19. 21 Feb 2013 (2 commits)
    • Fix for the weird behaviour encountered by ldb Get where it could read only the second-latest value · b2c50f1c
      Committed by amayank
      Summary:
      Changed the Get and Scan options in openForReadOnly mode to have access to
      the memtable. Changed the visibility of NewInternalIterator in db_impl from
      private to protected so that the derived class db_impl_read_only can call it
      in its NewIterator function for the scan case. The previous approach, which
      changed the default for flush_on_destroy_ from false to true, caused many
      problems in the unit tests due to the empty sst files it created. All
      unit tests pass now.
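
      A sketch of the visibility change and how the read-only subclass uses
      it (signatures simplified; DBImpl::NewInternalIterator is the stock
      helper, the wrapping step is elided):

        class DBImpl : public DB {
         protected:  // was: private
          Iterator* NewInternalIterator(const ReadOptions&,
                                        SequenceNumber* latest_snapshot);
          // ...
        };

        class DBImplReadOnly : public DBImpl {
         public:
          Iterator* NewIterator(const ReadOptions& options) {
            SequenceNumber latest;
            // Read-only scans now see the memtable via the shared helper.
            Iterator* internal = NewInternalIterator(options, &latest);
            // ...wrap into a user-facing iterator as DBImpl::NewIterator does...
            return internal;  // simplified
          }
        };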
      
      Test Plan: make clean; make all check; ldb put and get and scans
      
      Reviewers: dhruba, heyongqiang, sheki
      
      Reviewed By: dhruba
      
      CC: kosievdmerwe, zshao, dilipj, kailiu
      
      Differential Revision: https://reviews.facebook.net/D8697
    • Introduce histogram in statistics.h · fe10200d
      Committed by Abhishek Kona
      Summary:
      * Introduce a histogram in statistics.h.
      * Add a stop watch to measure time.
      * Introduce two timers as a proof of concept.
      Replaced NULL with nullptr to fix some lint errors.
      Should be useful for google.
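
      A sketch of the RAII stopwatch pattern (StopWatch and the DB_GET
      histogram name follow this summary; the constructor shape is assumed):

        {
          StopWatch sw(env_, options_.statistics, DB_GET);  // starts timing
          s = Get(options, key, &value);  // the timed operation
        }  // sw's destructor records the elapsed time into the DB_GET histogram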
      
      Test Plan:
      ran db_bench and check stats.
      make all check
      
      Reviewers: dhruba, heyongqiang
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D8637
  20. 19 Feb 2013 (3 commits)
  21. 16 Feb 2013 (1 commit)
    • Fix for the weird behaviour encountered by ldb Get where it could read only the second-latest value · 4c696ed0
      Committed by amayank
      Summary:
      flush_on_destroy has a default value of false, and the memtable is flushed
      in the dbimpl destructor only when it is set to true. Because we want the
      memtable to be flushed every time the destructor is called (db is closed),
      and the cases where we work with the memtable only are rare, it is a good
      idea to give this a default value of true. Thus the put from ldb will have
      its data flushed to disk in the destructor, and the next Get will be able to
      read it when opened with OpenForReadOnly. The reason ldb could read the
      latest value when the db was opened in the normal Open mode is that Get from
      a normal Open first reads the memtable and directly finds the latest value
      written there, whereas Get from OpenForReadOnly doesn't have access to the
      memtable (which is correct because all its Put/Modify operations are disabled).
      
      Test Plan: make all; ldb put and get and scans
      
      Reviewers: dhruba, heyongqiang, sheki
      
      Reviewed By: heyongqiang
      
      CC: kosievdmerwe, zshao, dilipj, kailiu
      
      Differential Revision: https://reviews.facebook.net/D8631
  22. 05 Feb 2013 (1 commit)
    • Allow the logs to be purged by TTL. · b63aafce
      Committed by Kai Liu
      Summary:
      * Add a SplitByTTLLogger to enable this feature. In this diff I implemented a
      generalized AutoSplitLoggerBase class to simplify the development of such classes.
      * Refactor the existing AutoSplitLogger and fix several bugs.
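
      A sketch of the TTL check such an auto-split logger can run before each
      write (names illustrative; AutoSplitLoggerBase generalizes this kind of
      predicate):

        void LogvWithSplitCheck(const char* format, va_list ap) {
          int64_t now_sec = env_->NowMicros() / 1000000;
          if (now_sec - log_created_sec_ >= ttl_seconds_) {
            ResetLogger();  // close the current file, start a fresh one
            log_created_sec_ = now_sec;
          }
          wrapped_logger_->Logv(format, ap);
        }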
      
      Test Plan:
      * Added unit tests for the different types of "auto splittable" loggers individually.
      * Tested the composite logger, which allows the log files to be split by both TTL and log size.
      
      Reviewers: heyongqiang, dhruba
      
      Reviewed By: heyongqiang
      
      CC: zshao, leveldb
      
      Differential Revision: https://reviews.facebook.net/D8037
  23. 26 Jan 2013 (1 commit)
    • Fix poor error on num_levels mismatch and a few other minor improvements · 0b83a831
      Committed by Chip Turner
      Summary:
      Previously, if you opened a db with num_levels set lower than
      the database, you received the unhelpful message "Corruption:
      VersionEdit: new-file entry."  Now you get a more verbose message
      describing the issue.
      
      Also, fix handling of compression_levels (both the run-over-the-end
      issue and the memory management of it).
      
      Lastly, unique_ptr'ify a couple of minor calls.
      
      Test Plan: make check
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D8151
  24. 25 Jan 2013 (2 commits)
    • Stop continually re-creating build_version.c · 772f75b3
      Committed by Chip Turner
      Summary:
      We continually rebuilt build_version.c because we put the
      current date into it, but that's what __DATE__ already is.  This makes
      builds faster.
      
      This also fixes an issue with 'make clean FOO' not working properly.
      
      Also tweak the build rules to be more consistent, always have warnings,
      and add a 'make release' rule to handle flags for release builds.
      
      Test Plan: make, make clean
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D8139
    • Use fallocate to prevent excessive allocation of sst files and logs · 3dafdfb2
      Committed by Chip Turner
      Summary:
      On some filesystems, pre-allocation can be a considerable
      amount of space.  xfs in our production environment pre-allocates by
      1GB, for instance.  By using fallocate to inform the kernel of our
      expected file sizes, we eliminate this wastage (which isn't recovered
      until the file is closed which, in the case of LOG files, can be a
      considerable amount of time).
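
      A sketch of the hint (Linux-specific; error handling elided):

        #define _GNU_SOURCE  // for fallocate()
        #include <fcntl.h>

        // Reserve the blocks we expect to need without changing the visible
        // file size, so filesystems like xfs stop speculatively
        // preallocating far more than we will use.
        static void HintExpectedSize(int fd, off_t expected_size) {
          fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, expected_size);
        }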
      
      Test Plan:
      created an xfs loopback filesystem, mounted with
      allocsize=4M, and ran db_stress.  LOG file without this change was 4M,
      and with it it was 128k then grew to normal size.
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: adsharma, leveldb
      
      Differential Revision: https://reviews.facebook.net/D7953
  25. 24 Jan 2013 (1 commit)
    • Fix a number of object lifetime/ownership issues · 2fdf91a4
      Committed by Chip Turner
      Summary:
      Replace manual memory management with std::unique_ptr in a
      number of places; not exhaustive, but this fixes a few leaks with file
      handles as well as clarifies semantics of the ownership of file handles
      with log classes.
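
      A sketch of the ownership pattern (class shapes simplified):

        #include <memory>

        // The writer takes unique ownership of its file handle, so the
        // handle is released exactly once, when the writer is destroyed.
        class LogWriter {
         public:
          explicit LogWriter(std::unique_ptr<WritableFile> dest)
              : dest_(std::move(dest)) {}
         private:
          std::unique_ptr<WritableFile> dest_;  // sole owner of the handle
        };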
      
      Test Plan: db_stress, make check
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: zshao, leveldb, heyongqiang
      
      Differential Revision: https://reviews.facebook.net/D8043
  26. 18 Jan 2013 (2 commits)
    • Add counters to count gets and writes · 16903c35
      Committed by Abhishek Kona
      Summary: Add Tickers to count Write's and Get's
      
      Test Plan: make check
      
      Reviewers: dhruba, chip
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D7977
    • Fixed issues Valgrind found. · 3c3df740
      Committed by Kosie van der Merwe
      Summary:
      Found issues with `db_test` and `db_stress` when running valgrind.
      
      `DBImpl` had an issue where, if a compaction failed, the uninitialized file size of an output file was used. This manifested as the final call to output to the log in `DoCompactionWork()` branching on uninitialized memory (all the way down in printf's innards).
      
      Test Plan:
      Ran `valgrind --track-origins=yes ./db_test` and `valgrind ./db_stress` to see if the issues disappeared.
      
      Ran `make check` to see if there were no regressions.
      
      Reviewers: vamsi, dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D8001