1. 01 Apr 2014 (1 commit)
    • Retry FS system calls on EINTR · 726c8084
      Authored by Igor Canadi
      Summary: EINTR means 'please retry'. We don't do that currently. We should.
      
      Test Plan: make check, although it doesn't really test the new code. we'll just have to believe in the code!
      
      Reviewers: haobo, ljin
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D17349
      726c8084
  2. 29 Mar 2014 (5 commits)
  3. 28 Mar 2014 (2 commits)
  4. 27 Mar 2014 (2 commits)
  5. 26 Mar 2014 (1 commit)
  6. 25 Mar 2014 (2 commits)
    • [rocksdb] new CompactionFilterV2 API · b47812fb
      Authored by Danny Guo
      Summary:
      This diff adds a new CompactionFilterV2 API that rolls up the
      decisions on kv pairs during compactions. These kv pairs must share the
      same key prefix. They are buffered inside the db.
      
          typedef std::vector<Slice> SliceVector;
          virtual std::vector<bool> Filter(int level,
                                       const SliceVector& keys,
                                       const SliceVector& existing_values,
                                       std::vector<std::string>* new_values,
                                       std::vector<bool>* values_changed
                                       ) const = 0;
      
      Applications can override the Filter() function to operate
      on the buffered kv pairs. More details in the inline documentation.
      
      Test Plan:
      make check. Added unit tests to make sure Keep, Delete, and
      Change all work.
      
      Reviewers: haobo
      
      CCs: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15087
      b47812fb
    • Enhance partial merge to support multiple arguments · cda4006e
      Authored by Yueh-Hsuan Chiang
      Summary:
      * PartialMerge api now takes a list of operands instead of two operands.
      * Add min_partial_merge_operands to Options, indicating the minimum
        number of operands to trigger partial merge.
      * This diff is based on Schalk's previous diff (D14601), but it also
        includes necessary changes such as updating the pure C api for
        partial merge.
      
      Test Plan:
      * make check all
      * develop tests for cases where partial merge takes more than two
        operands.
      
      TODOs (from Schalk):
      * Add test with min_partial_merge_operands > 2.
      * Perform benchmarks to measure the performance improvements (can probably
        use results of task #2837810.)
      * Add description of problem to doc/index.html.
      * Change wiki pages to reflect the interface changes.
      
      Reviewers: haobo, igor, vamsi
      
      Reviewed By: haobo
      
      CC: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D16815
      cda4006e
  7. 22 Mar 2014 (1 commit)
    • Fix data corruption by LogBuffer · 83ab62e2
      Authored by sdong
      Summary: LogBuffer::AddLogToBuffer() uses vsnprintf() in the wrong way, which might cause a buffer overflow when a log line is too long. Fix it.
      
      Test Plan: Add a unit test to cover most of LogBuffer's logic.
      
      Reviewers: igor, haobo, dhruba
      
      Reviewed By: igor
      
      CC: ljin, yhchiang, leveldb
      
      Differential Revision: https://reviews.facebook.net/D17103
      83ab62e2
  8. 21 Mar 2014 (2 commits)
  9. 20 Mar 2014 (1 commit)
  10. 18 Mar 2014 (1 commit)
    • Optimize fallocation · f26cb0f0
      Authored by Igor Canadi
      Summary:
      Based on my recent findings (posted in our internal group), if we use fallocate without KEEP_SIZE flag, we get superior performance of fdatasync() in append-only workloads.
      
      This diff provides an option for users not to use the KEEP_SIZE flag, thus improving their sync performance by up to 2x-3x.
      
      At one point we also just called posix_fallocate instead of fallocate, which isn't very fast: http://code.woboq.org/userspace/glibc/sysdeps/posix/posix_fallocate.c.html (tl;dr it manually writes out zero bytes to allocate storage). This diff also fixes that, by first calling fallocate and then posix_fallocate if fallocate is not supported.
      
      Test Plan: make check
      
      Reviewers: dhruba, sdong, haobo, ljin
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16761
      f26cb0f0
  11. 15 Mar 2014 (5 commits)
  12. 13 Mar 2014 (1 commit)
    • A heuristic way to check if a memtable is full · 11da8bc5
      Authored by Kai Liu
      Summary:
      This is based on https://reviews.facebook.net/D15027. It's not finished, but I would like to give a prototype to avoid arena over-allocation while making better use of the already allocated memory blocks.
      
      Instead of checking the approximate memtable size, we take a deeper look at the arena, which incorporates the essential idea that @sdong suggests: flush when the arena has allocated its last block and that block is "almost full".
      
      Test Plan: N/A
      
      Reviewers: haobo, sdong
      
      Reviewed By: sdong
      
      CC: leveldb, sdong
      
      Differential Revision: https://reviews.facebook.net/D15051
      11da8bc5
  13. 12 Mar 2014 (2 commits)
    • Fix data race against logging data structure because of LogBuffer · bd45633b
      Authored by sdong
      Summary:
      @igor pointed out that there is a potential data race because of the way we use the newly introduced LogBuffer. After "bg_compaction_scheduled_--" or "bg_flush_scheduled_--", they can both become 0. As soon as the lock is released after that, DBImpl's destructor can go ahead and destroy all the state inside the DB, including the info_log object held in a shared pointer of the options object it keeps. At that point it is no longer safe to continue using the info logger to write the delayed logs.
      
      With the patch, the lock is released temporarily for the log buffer to be flushed before "bg_compaction_scheduled_--" or "bg_flush_scheduled_--". In order to make sure we don't miss any pending flush or compaction, a new flag bg_schedule_needed_ is added, which is set to true if there is a pending flush or compaction that was not scheduled because of the max thread limit. If the flag is true, the scheduling function will be called before a compaction or flush thread finishes.
      
      Thanks @igor for this finding!
      
      Test Plan: make all check
      
      Reviewers: haobo, igor
      
      Reviewed By: haobo
      
      CC: dhruba, ljin, yhchiang, igor, leveldb
      
      Differential Revision: https://reviews.facebook.net/D16767
      bd45633b
    • Env to add a function to allow users to query waiting queue length · 01dcef11
      Authored by sdong
      Summary: Add a function to Env so that users can query the waiting queue length of each thread pool.
      
      Test Plan: add a test in env_test
      
      Reviewers: haobo
      
      Reviewed By: haobo
      
      CC: dhruba, igor, yhchiang, ljin, nkg-, leveldb
      
      Differential Revision: https://reviews.facebook.net/D16755
      01dcef11
  14. 11 Mar 2014 (3 commits)
    • Consolidate SliceTransform object ownership · 8d007b4a
      Authored by Lei Jin
      Summary:
      (1) Fix SanitizeOptions() to also check HashLinkList. The current
      dynamic case just happens to work because the 2 classes have the same
      layout.
      (2) Do not delete SliceTransform object in HashSkipListFactory and
      HashLinkListFactory destructor. Reason: SanitizeOptions() enforces
      prefix_extractor and SliceTransform to be the same object when
      Hash**Factory is used. This makes the behavior strange: when
      Hash**Factory is used, prefix_extractor will be released by RocksDB. If
      another memtable factory is used, prefix_extractor should be released by
      the user.
      
      Test Plan: db_bench && make asan_check
      
      Reviewers: haobo, igor, sdong
      
      Reviewed By: igor
      
      CC: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D16587
      8d007b4a
    • [RocksDB] LogBuffer Cleanup · 66da4679
      Authored by Haobo Xu
      Summary: Moved the LogBuffer class to an internal header. Removed some unnecessary indirection. Enabled the log buffer for BackgroundCallFlush. Forced a log buffer flush right after Unlock to improve the time ordering of the info log.
      
      Test Plan: make check; db_bench compare LOG output
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      CC: leveldb, igor
      
      Differential Revision: https://reviews.facebook.net/D16707
      66da4679
    • Add option verify_checksums_in_compaction · 04d2c26e
      Authored by Igor Canadi
      Summary:
      If verify_checksums_in_compaction is true, compaction will verify checksums. This is the default.
      If it's false, compaction doesn't verify checksums. This is useful for in-memory workloads.
      
      Test Plan: corruption_test
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16695
      04d2c26e
  15. 08 Mar 2014 (2 commits)
  16. 07 Mar 2014 (2 commits)
  17. 06 Mar 2014 (4 commits)
    • Make sure GetUniqueID related tests run on "regular" storage · abeee9f2
      Authored by Kai Liu
      Summary:
      With the use of tmpfs or ramfs, unit tests related to GetUniqueID()
      failed because of failures from ioctl, which doesn't work with these
      fancy file systems at all.
      
      I fixed this issue and made sure all related tests run on "regular"
      storage (disk or flash).
      
      Test Plan: TEST_TMPDIR=/dev/shm make check -j32
      
      Reviewers: igor, dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16593
      abeee9f2
    • Buffer info logs when picking compactions and write them out after releasing the mutex · ecb1ffa2
      Authored by sdong
      Summary: While the background thread is picking compactions, it currently writes out multiple info logs, especially for universal compaction, which introduces a risk of blocking on log writes while holding the mutex, which is bad. To remove this risk, write all those info logs to a buffer and flush it after releasing the mutex.
      
      Test Plan:
      make all check
      check the log lines while running some tests that trigger compactions.
      
      Reviewers: haobo, igor, dhruba
      
      Reviewed By: dhruba
      
      CC: i.am.jin.lei, dhruba, yhchiang, leveldb, nkg-
      
      Differential Revision: https://reviews.facebook.net/D16515
      ecb1ffa2
    • Allow user to specify log level for info_log · 4405f3a0
      Authored by sdong
      Summary:
      Currently, there is no easy way for users to change the log level of the info log. Add a parameter in options to specify it.
      Also make the default level INFO. Remove the [INFO] tag when the level is INFO, as I don't want to cause a performance regression (adding the tag means another mem-copy and string formatting).
      
      Test Plan:
      make all check
      manual check the levels work as expected.
      
      Reviewers: dhruba, yhchiang
      
      Reviewed By: yhchiang
      
      CC: dhruba, igor, i.am.jin.lei, ljin, haobo, leveldb
      
      Differential Revision: https://reviews.facebook.net/D16563
      4405f3a0
    • output perf_context in db_bench readrandom · 04298f8c
      Authored by Lei Jin
      Summary:
      Add a helper function to print perf context data in db_bench if enabled.
      I didn't find any code that actually exports perf context data. Not sure
      if I missed anything.
      
      Test Plan: ran db_bench
      
      Reviewers: haobo, sdong, igor
      
      Reviewed By: igor
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16575
      04298f8c
  18. 04 Mar 2014 (1 commit)
  19. 01 Mar 2014 (1 commit)
    • Make Log::Reader more robust · 58ca641d
      Authored by Igor Canadi
      Summary:
      This diff does two things:
      (1) Log::Reader does not report a corruption when the last record in a log or manifest file is truncated (meaning that log writer died in the middle of the write). Inherited the code from LevelDB: https://code.google.com/p/leveldb/source/detail?r=269fc6ca9416129248db5ca57050cd5d39d177c8#
      (2) Turn off mmap writes for all writes to log and manifest files
      
      (2) is necessary because if we use mmap writes, the last record is not truncated but is instead filled with zeros, causing the checksum to fail. It is hard to recover from a checksum failure.
      
      Test Plan:
      Added unit tests from LevelDB
      Actually recovered a "corrupted" MANIFEST file.
      
      Reviewers: dhruba, haobo
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16119
      58ca641d
  20. 28 Feb 2014 (1 commit)