1. 11 5月, 2013 1 次提交
  2. 07 5月, 2013 1 次提交
    • A
      [RocksDB] Clear Archive WAL files · 988c20b9
      Abhishek Kona 提交于
      Summary:
      WAL files are moved to archive directory and clear only at DB::Open.
      Can lead to a lot of space consumption in a Database. Added logic to periodically clear Archive Directory too.
      
      Test Plan: make all check + add unit test
      
      Reviewers: dhruba, heyongqiang
      
      Reviewed By: heyongqiang
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D10617
      988c20b9
  3. 04 5月, 2013 1 次提交
    • H
      [Rocksdb] Support Merge operation in rocksdb · 05e88540
      Haobo Xu 提交于
      Summary:
      This diff introduces a new Merge operation into rocksdb.
      The purpose of this review is mostly getting feedback from the team (everyone please) on the design.
      
      Please focus on the four files under include/leveldb/, as they spell the client visible interface change.
      include/leveldb/db.h
      include/leveldb/merge_operator.h
      include/leveldb/options.h
      include/leveldb/write_batch.h
      
      Please go over local/my_test.cc carefully, as it is a concerete use case.
      
      Please also review the impelmentation files to see if the straw man implementation makes sense.
      
      Note that, the diff does pass all make check and truly supports forward iterator over db and a version
      of Get that's based on iterator.
      
      Future work:
      - Integration with compaction
      - A raw Get implementation
      
      I am working on a wiki that explains the design and implementation choices, but coding comes
      just naturally and I think it might be a good idea to share the code earlier. The code is
      heavily commented.
      
      Test Plan: run all local tests
      
      Reviewers: dhruba, heyongqiang
      
      Reviewed By: dhruba
      
      CC: leveldb, zshao, sheki, emayanke, MarkCallaghan
      
      Differential Revision: https://reviews.facebook.net/D9651
      05e88540
  4. 23 4月, 2013 1 次提交
  5. 21 4月, 2013 1 次提交
    • H
      [RocksDB] CompactionFilter cleanup · b4243e5a
      Haobo Xu 提交于
      Summary:
      - removed the compaction_filter_value from the callback interface. Restrict compaction filter to purging values.
      - modify some comments to reflect curent status.
      
      Test Plan: make check
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D10335
      b4243e5a
  6. 16 4月, 2013 1 次提交
  7. 13 4月, 2013 1 次提交
    • H
      [RocksDB] [Performance] Speed up FindObsoleteFiles · 013e9ebb
      Haobo Xu 提交于
      Summary:
      FindObsoleteFiles was slow, holding the single big lock, resulted in bad p99 behavior.
      Didn't profile anything, but several things could be improved:
      1. VersionSet::AddLiveFiles works with std::set, which is by itself slow (a tree).
         You also don't know how many dynamic allocations occur just for building up this tree.
         switched to std::vector, also added logic to pre-calculate total size and do just one allocation
      2. Don't see why env_->GetChildren() needs to be mutex proteced, moved to PurgeObsoleteFiles where
         mutex could be unlocked.
      3. switched std::set to std:unordered_set, the conversion from vector is also inside PurgeObsoleteFiles
      I have a feeling this should pretty much fix it.
      
      Test Plan: make check;  db_stress
      
      Reviewers: dhruba, heyongqiang, MarkCallaghan
      
      Reviewed By: dhruba
      
      CC: leveldb, zshao
      
      Differential Revision: https://reviews.facebook.net/D10197
      013e9ebb
  8. 12 4月, 2013 1 次提交
  9. 11 4月, 2013 1 次提交
    • D
      Prevent segfault in OpenCompactionOutputFile · 77305871
      Dhruba Borthakur 提交于
      Summary:
      The segfault was happening because the program was unable to open a new
      sst file (as part of the compaction) because the process ran out of
      file descriptors.
      
      The fix is to check the return status of the file creation before taking
      any other action.
      
      Program received signal SIGSEGV, Segmentation fault.
      [Switching to Thread 0x7fabf03f9700 (LWP 29904)]
      leveldb::DBImpl::OpenCompactionOutputFile (this=this@entry=0x7fabf9011400, compact=compact@entry=0x7fabf741a2b0) at db/db_impl.cc:1399
      1399    db/db_impl.cc: No such file or directory.
      (gdb) where
      
      Test Plan: make check
      
      Reviewers: MarkCallaghan, sheki
      
      Reviewed By: MarkCallaghan
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D10101
      77305871
  10. 09 4月, 2013 1 次提交
    • A
      [RocksDB][Bug] Look at all the files, not just the first file in... · 574b76f7
      Abhishek Kona 提交于
      [RocksDB][Bug] Look at all the files, not just the first file in TransactionLogIter as BatchWrites can leave it in Limbo
      
      Summary:
      Transaction Log Iterator did not move to the next file in the series if there was a write batch at the end of the currentFile.
      The solution is if the last seq no. of the current file is < RequestedSeqNo. Assume the first seqNo. of the next file has to satisfy the request.
      
      Also major refactoring around the code. Moved opening the logreader to a seperate function, got rid of goto.
      
      Test Plan: added a unit test for it.
      
      Reviewers: dhruba, heyongqiang
      
      Reviewed By: heyongqiang
      
      CC: leveldb, emayanke
      
      Differential Revision: https://reviews.facebook.net/D10029
      574b76f7
  11. 03 4月, 2013 2 次提交
  12. 29 3月, 2013 4 次提交
    • H
      Let's get rid of delete as much as possible, here are some examples. · 645ff8f2
      Haobo Xu 提交于
      Summary:
      If a class owns an object:
       - If the object can be null => use a unique_ptr. no delete
       - If the object can not be null => don't even need new, let alone delete
       - for runtime sized array => use vector, no delete.
      
      Test Plan: make check
      
      Reviewers: dhruba, heyongqiang
      
      Reviewed By: heyongqiang
      
      CC: leveldb, zshao, sheki, emayanke, MarkCallaghan
      
      Differential Revision: https://reviews.facebook.net/D9783
      645ff8f2
    • A
      [RocksDB] Fix binary search while finding probable wal files · 3b51605b
      Abhishek Kona 提交于
      Summary:
      RocksDB does a binary search to look at the files which might contain the requested sequence number at the call GetUpdatesSince.
      There was a bug in the binary search => when the file pointed by the middle index of bsearch was empty/corrupt it needst to resize the vector and update indexes.
      This now fixes that.
      
      Test Plan: existing unit tests pass.
      
      Reviewers: heyongqiang, dhruba
      
      Reviewed By: heyongqiang
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9777
      3b51605b
    • A
      [Rocksdb] Fix Crash on finding a db with no log files. Error out instead · 8e9c781a
      Abhishek Kona 提交于
      Summary:
      If the vector returned by GetUpdatesSince is empty, it is still returned to the
      user. This causes it throw an std::range error.
      The probable file list is checked and it returns an IOError status instead of OK now.
      
      Test Plan: added a unit test.
      
      Reviewers: dhruba, heyongqiang
      
      Reviewed By: heyongqiang
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9771
      8e9c781a
    • A
      Use non-mmapd files for Write-Ahead Files · 7fdd5f5b
      Abhishek Kona 提交于
      Summary:
      Use non mmapd files for Write-Ahead log.
      Earlier use of MMaped files. made the log iterator read ahead and miss records.
      Now the reader and writer will point to the same physical location.
      
      There is no perf regression :
      ./db_bench --benchmarks=fillseq --db=/dev/shm/mmap_test --num=$(million 20) --use_existing_db=0 --threads=2
      with This diff :
      fillseq      :      10.756 micros/op 185281 ops/sec;   20.5 MB/s
      without this dif :
      fillseq      :      11.085 micros/op 179676 ops/sec;   19.9 MB/s
      
      Test Plan: unit test included
      
      Reviewers: dhruba, heyongqiang
      
      Reviewed By: heyongqiang
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9741
      7fdd5f5b
  13. 27 3月, 2013 1 次提交
  14. 21 3月, 2013 2 次提交
    • D
      Run compactions even if workload is readonly or read-mostly. · d0798f67
      Dhruba Borthakur 提交于
      Summary:
      The events that trigger compaction:
      * opening the database
      * Get -> only if seek compaction is not disabled and other checks are true
      * MakeRoomForWrite -> when memtable is full
      * BackgroundCall ->
        If the background thread is about to do a compaction run, it schedules
        a new background task to trigger a possible compaction. This will cause
        additional background threads to find and process other compactions that
        can run concurrently.
      
      Test Plan: ran db_bench with overwrite and readonly alternatively.
      
      Reviewers: sheki, MarkCallaghan
      
      Reviewed By: sheki
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9579
      d0798f67
    • D
      Ability to configure bufferedio-reads, filesystem-readaheads and mmap-read-write per database. · ad96563b
      Dhruba Borthakur 提交于
      Summary:
      This patch allows an application to specify whether to use bufferedio,
      reads-via-mmaps and writes-via-mmaps per database. Earlier, there
      was a global static variable that was used to configure this functionality.
      
      The default setting remains the same (and is backward compatible):
       1. use bufferedio
       2. do not use mmaps for reads
       3. use mmap for writes
       4. use readaheads for reads needed for compaction
      
      I also added a parameter to db_bench to be able to explicitly specify
      whether to do readaheads for compactions or not.
      
      Test Plan: make check
      
      Reviewers: sheki, heyongqiang, MarkCallaghan
      
      Reviewed By: sheki
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9429
      ad96563b
  15. 20 3月, 2013 3 次提交
  16. 08 3月, 2013 1 次提交
    • D
      A mechanism to detect manifest file write errors and put db in readonly mode. · 6d812b6a
      Dhruba Borthakur 提交于
      Summary:
      If there is an error while writing an edit to the manifest file, the manifest
      file is closed and reopened to check if the edit made it in. However, if the
      re-opening of the manifest is unsuccessful and options.paranoid_checks is set
      t true, then the db refuses to accept new puts, effectively putting the db
      in readonly mode.
      
      In a future diff, I would like to make the default value of paranoid_check
      to true.
      
      Test Plan: make check
      
      Reviewers: sheki
      
      Reviewed By: sheki
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9201
      6d812b6a
  17. 07 3月, 2013 2 次提交
    • A
      Do not allow Transaction Log Iterator to fall ahead when writer is writing the same file · d68880a1
      Abhishek Kona 提交于
      Summary:
      Store the last flushed, seq no. in db_impl. Check against it in
      transaction Log iterator. Do not attempt to read ahead if we do not know
      if the data is flushed completely.
      Does not work if flush is disabled. Any ideas on fixing that?
      * Minor change, iter->Next is called the first time automatically for
      * the first time.
      
      Test Plan:
      existing test pass.
      More ideas on testing this?
      Planning to run some stress test.
      
      Reviewers: dhruba, heyongqiang
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9087
      d68880a1
    • D
      Fox db_stress crash by copying keys before changing sequencenum to zero. · afed6093
      Dhruba Borthakur 提交于
      Summary:
      The compaction process zeros out sequence numbers if the output is
      part of the bottommost level.
      The Slice is supposed to refer to an immutable data buffer. The
      merger that implements the priority queue while reading kvs as
      the input of a compaction run reies on this fact. The bug was that
      were updating the sequence number of a record in-place and that was
      causing suceeding invocations of the merger to return kvs in
      arbitrary order of sequence numbers.
      The fix is to copy the key to a local memory buffer before setting
      its seqno to 0.
      
      Test Plan:
      Set Options.purge_redundant_kvs_while_flush = false and then run
      db_stress --ops_per_thread=1000 --max_key=320
      
      Reviewers: emayanke, sheki
      
      Reviewed By: emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9147
      afed6093
  18. 04 3月, 2013 2 次提交
    • M
      Add rate_delay_limit_milliseconds · 993543d1
      Mark Callaghan 提交于
      Summary:
      This adds the rate_delay_limit_milliseconds option to make the delay
      configurable in MakeRoomForWrite when the max compaction score is too high.
      This delay is called the Ln slowdown. This change also counts the Ln slowdown
      per level to make it possible to see where the stalls occur.
      
      From IO-bound performance testing, the Level N stalls occur:
      * with compression -> at the largest uncompressed level. This makes sense
                            because compaction for compressed levels is much
                            slower. When Lx is uncompressed and Lx+1 is compressed
                            then files pile up at Lx because the (Lx,Lx+1)->Lx+1
                            compaction process is the first to be slowed by
                            compression.
      * without compression -> at level 1
      
      Task ID: #1832108
      
      Blame Rev:
      
      Test Plan:
      run with real data, added test
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D9045
      993543d1
    • D
      Ability for rocksdb to compact when flushing the in-memory memtable to a file in L0. · 806e2643
      Dhruba Borthakur 提交于
      Summary:
      Rocks accumulates recent writes and deletes in the in-memory memtable.
      When the memtable is full, it writes the contents on the memtable to
      a file in L0.
      
      This patch removes redundant records at the time of the flush. If there
      are multiple versions of the same key in the memtable, then only the
      most recent one is dumped into the output file. The purging of
      redundant records occur only if the most recent snapshot is earlier
      than the earliest record in the memtable.
      
      Should we switch on this feature by default or should we keep this feature
      turned off in the default settings?
      
      Test Plan: Added test case to db_test.cc
      
      Reviewers: sheki, vamsi, emayanke, heyongqiang
      
      Reviewed By: sheki
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D8991
      806e2643
  19. 27 2月, 2013 1 次提交
  20. 23 2月, 2013 1 次提交
  21. 22 2月, 2013 1 次提交
    • A
      Counters for bytes written and read. · ec77366e
      Abhishek Kona 提交于
      Summary:
      * Counters for bytes read and write.
      as a part of this diff, I want to=>
      * Measure compaction times. @dhruba can you point which function, should
      * I time to get Compaction-times. Was looking at CompactRange.
      
      Test Plan: db_test
      
      Reviewers: dhruba, emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D8763
      ec77366e
  22. 21 2月, 2013 1 次提交
    • A
      Introduce histogram in statistics.h · fe10200d
      Abhishek Kona 提交于
      Summary:
      * Introduce is histogram in statistics.h
      * stop watch to measure time.
      * introduce two timers as a poc.
      Replaced NULL with nullptr to fight some lint errors
      Should be useful for google.
      
      Test Plan:
      ran db_bench and check stats.
      make all check
      
      Reviewers: dhruba, heyongqiang
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D8637
      fe10200d
  23. 19 2月, 2013 2 次提交
  24. 16 2月, 2013 1 次提交
    • A
      Fix for the weird behaviour encountered by ldb Get where it could read only the second-latest value · 4c696ed0
      amayank 提交于
      Summary:
      flush_on_destroy has a default value of false and the memtable is flushed
      in the dbimpl-destructor only when that is set to true. Because we want the memtable to be flushed everytime that
      the destructor is called(db is closed) and the cases where we work with the memtable only are very less
      it is a good idea to give this a default value of true. Thus the put from ldb
      wil have its data flushed to disk in the destructor and the next Get will be able to
      read it when opened with OpenForReadOnly. The reason that ldb could read the latest value when
      the db was opened in the normal Open mode is that the Get from normal Open first reads
      the memtable and directly finds the latest value written there and the Get from OpenForReadOnly
      doesn't have access to the memtable (which is correct because all its Put/Modify) are disabled
      
      Test Plan: make all; ldb put and get and scans
      
      Reviewers: dhruba, heyongqiang, sheki
      
      Reviewed By: heyongqiang
      
      CC: kosievdmerwe, zshao, dilipj, kailiu
      
      Differential Revision: https://reviews.facebook.net/D8631
      4c696ed0
  25. 05 2月, 2013 1 次提交
    • K
      Allow the logs to be purged by TTL. · b63aafce
      Kai Liu 提交于
      Summary:
      * Add a SplitByTTLLogger to enable this feature. In this diff I implemented generalized AutoSplitLoggerBase class to simplify the
      development of such classes.
      * Refactor the existing AutoSplitLogger and fix several bugs.
      
      Test Plan:
      * Added a unit tests for different types of "auto splitable" loggers individually.
      * Tested the composited logger which allows the log files to be splitted by both TTL and log size.
      
      Reviewers: heyongqiang, dhruba
      
      Reviewed By: heyongqiang
      
      CC: zshao, leveldb
      
      Differential Revision: https://reviews.facebook.net/D8037
      b63aafce
  26. 26 1月, 2013 1 次提交
    • C
      Fix poor error on num_levels mismatch and few other minor improvements · 0b83a831
      Chip Turner 提交于
      Summary:
      Previously, if you opened a db with num_levels set lower than
      the database, you received the unhelpful message "Corruption:
      VersionEdit: new-file entry."  Now you get a more verbose message
      describing the issue.
      
      Also, fix handling of compression_levels (both the run-over-the-end
      issue and the memory management of it).
      
      Lastly, unique_ptr'ify a couple of minor calls.
      
      Test Plan: make check
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D8151
      0b83a831
  27. 25 1月, 2013 2 次提交
    • C
      Stop continually re-creating build_version.c · 772f75b3
      Chip Turner 提交于
      Summary:
      We continually rebuilt build_version.c because we put the
      current date into it, but that's what __DATE__ already is.  This makes
      builds faster.
      
      This also fixes an issue with 'make clean FOO' not working properly.
      
      Also tweak the build rules to be more consistent, always have warnings,
      and add a 'make release' rule to handle flags for release builds.
      
      Test Plan: make, make clean
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D8139
      772f75b3
    • C
      Use fallocate to prevent excessive allocation of sst files and logs · 3dafdfb2
      Chip Turner 提交于
      Summary:
      On some filesystems, pre-allocation can be a considerable
      amount of space.  xfs in our production environment pre-allocates by
      1GB, for instance.  By using fallocate to inform the kernel of our
      expected file sizes, we eliminate this wasteage (that isn't recovered
      until the file is closed which, in the case of LOG files, can be a
      considerable amount of time).
      
      Test Plan:
      created an xfs loopback filesystem, mounted with
      allocsize=4M, and ran db_stress.  LOG file without this change was 4M,
      and with it it was 128k then grew to normal size.
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: adsharma, leveldb
      
      Differential Revision: https://reviews.facebook.net/D7953
      3dafdfb2
  28. 24 1月, 2013 1 次提交
    • C
      Fix a number of object lifetime/ownership issues · 2fdf91a4
      Chip Turner 提交于
      Summary:
      Replace manual memory management with std::unique_ptr in a
      number of places; not exhaustive, but this fixes a few leaks with file
      handles as well as clarifies semantics of the ownership of file handles
      with log classes.
      
      Test Plan: db_stress, make check
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: zshao, leveldb, heyongqiang
      
      Differential Revision: https://reviews.facebook.net/D8043
      2fdf91a4
  29. 18 1月, 2013 1 次提交