1. 24 2月, 2015 1 次提交
  2. 20 2月, 2015 1 次提交
  3. 13 2月, 2015 1 次提交
    • I
      Introduce job_id for flush and compaction · e7ea51a8
      Igor Canadi 提交于
      Summary:
      It would be good to assing background job their IDs. Two benefits:
      1) makes LOGs more readable
      2) I might use it in my EventLogger, which will try to make our LOG easier to read/query/visualize
      
      Test Plan: ran rocksdb, read the LOG
      
      Reviewers: sdong, rven, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D31617
      e7ea51a8
  4. 05 2月, 2015 1 次提交
    • Y
      Add a counter for collecting the wait time on db mutex. · 181191a1
      Yueh-Hsuan Chiang 提交于
      Summary:
      Add a counter for collecting the wait time on db mutex.
      Also add MutexWrapper and CondVarWrapper for measuring wait time.
      
      Test Plan:
      ./db_test
      export ROCKSDB_TESTS=MutexWaitStats
      ./db_test
      
      verify stats output using db_bench
      make clean
      make release
      ./db_bench --statistics=1 --benchmarks=fillseq,readwhilewriting --num=10000 --threads=10
      
      Sample output:
          rocksdb.db.mutex.wait.micros COUNT : 7546866
      
      Reviewers: MarkCallaghan, rven, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D32787
      181191a1
  5. 28 1月, 2015 2 次提交
  6. 27 1月, 2015 1 次提交
    • I
      Fix data race #1 · f1c88624
      Igor Canadi 提交于
      Summary:
      This is first in a series of diffs that fixes data races detected by thread sanitizer.
      
      Here the problem is that we call Ref() on a column family during a single-threaded write, without holding a mutex.
      
      Test Plan: TSAN is no longer complaining about LevelLimitReopen.
      
      Reviewers: yhchiang, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D32121
      f1c88624
  7. 07 1月, 2015 1 次提交
    • I
      Simplify column family concurrency · 7731d51c
      Igor Canadi 提交于
      Summary:
      This patch changes concurrency guarantees around ColumnFamilySet::column_families_ and ColumnFamilySet::column_families_data_.
      
      Before:
      * When mutating: lock DB mutex and spin lock
      * When reading: lock DB mutex OR spin lock
      
      After:
      * When mutating: lock DB mutex and be in write thread
      * When reading: lock DB mutex or be in write thread
      
      That way, we eliminate the spin lock that protects these hash maps and  simplify concurrency. That means we don't need to lock the spin lock during writing, since writing is mutually exclusive with column family create/drop (the only operations that mutate those hash maps).
      
      With these new restrictions, I also needed to move column family create to the write thread (column family drop was already in the write thread).
      
      Even though we don't need to lock the spin lock during write, impact on performance should be minimal -- the spin lock is almost never busy, so locking it is almost free.
      
      This addresses task t5116919.
      
      Test Plan:
      make check
      
      Stress test with lots and lots of column family drop and create:
      
         time ./db_stress --threads=30 --ops_per_thread=5000000 --max_key=5000 --column_families=200 --clear_column_family_one_in=100000 --verify_before_write=0  --reopen=15 --max_background_compactions=10 --max_background_flushes=10 --db=/fast-rocksdb-tmp/db_stress/
      
      Reviewers: yhchiang, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D30651
      7731d51c
  8. 20 12月, 2014 1 次提交
    • I
      Rewritten system for scheduling background work · fdb6be4e
      Igor Canadi 提交于
      Summary:
      When scaling to higher number of column families, the worst bottleneck was MaybeScheduleFlushOrCompaction(), which did a for loop over all column families while holding a mutex. This patch addresses the issue.
      
      The approach is similar to our earlier efforts: instead of a pull-model, where we do something for every column family, we can do a push-based model -- when we detect that column family is ready to be flushed/compacted, we add it to the flush_queue_/compaction_queue_. That way we don't need to loop over every column family in MaybeScheduleFlushOrCompaction.
      
      Here are the performance results:
      
      Command:
      
          ./db_bench --write_buffer_size=268435456 --db_write_buffer_size=268435456 --db=/fast-rocksdb-tmp/rocks_lots_of_cf --use_existing_db=0 --open_files=55000 --statistics=1 --histogram=1 --disable_data_sync=1 --max_write_buffer_number=2 --sync=0 --benchmarks=fillrandom --threads=16 --num_column_families=5000  --disable_wal=1 --max_background_flushes=16 --max_background_compactions=16 --level0_file_num_compaction_trigger=2 --level0_slowdown_writes_trigger=2 --level0_stop_writes_trigger=3 --hard_rate_limit=1 --num=33333333 --writes=33333333
      
      Before the patch:
      
           fillrandom   :      26.950 micros/op 37105 ops/sec;    4.1 MB/s
      
      After the patch:
      
            fillrandom   :      17.404 micros/op 57456 ops/sec;    6.4 MB/s
      
      Next bottleneck is VersionSet::AddLiveFiles, which is painfully slow when we have a lot of files. This is coming in the next patch, but when I removed that code, here's what I got:
      
            fillrandom   :       7.590 micros/op 131758 ops/sec;   14.6 MB/s
      
      Test Plan:
      make check
      
      two stress tests:
      
      Big number of compactions and flushes:
      
          ./db_stress --threads=30 --ops_per_thread=20000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0  --reopen=15 --max_background_compactions=10 --max_background_flushes=10 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000
      
      max_background_flushes=0, to verify that this case also works correctly
      
          ./db_stress --threads=30 --ops_per_thread=2000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0  --reopen=3 --max_background_compactions=3 --max_background_flushes=0 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000
      
      Reviewers: ljin, rven, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D30123
      fdb6be4e
  9. 03 12月, 2014 2 次提交
    • J
      Enforce write buffer memory limit across column families · a14b7873
      Jonah Cohen 提交于
      Summary:
      Introduces a new class for managing write buffer memory across column
      families.  We supplement ColumnFamilyOptions::write_buffer_size with
      ColumnFamilyOptions::write_buffer, a shared pointer to a WriteBuffer
      instance that enforces memory limits before flushing out to disk.
      
      Test Plan: Added SharedWriteBuffer unit test to db_test.cc
      
      Reviewers: sdong, rven, ljin, igor
      
      Reviewed By: igor
      
      Subscribers: tnovak, yhchiang, dhruba, xjin, MarkCallaghan, yoshinorim
      
      Differential Revision: https://reviews.facebook.net/D22581
      a14b7873
    • I
      Fix linters · 37d73d59
      Igor Canadi 提交于
      Summary:
      Two fixes:
      1. if cpplint is not present on the system, don't return a confusing error in the linter
      2. Add include_alpha, which means our includes should be sorted lexicographically
      
      Test Plan: Tried unsorting our includes, lint complained
      
      Reviewers: rven, ljin, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D28845
      37d73d59
  10. 27 11月, 2014 1 次提交
  11. 19 11月, 2014 1 次提交
  12. 14 11月, 2014 2 次提交
  13. 08 11月, 2014 1 次提交
    • Y
      CompactFiles, EventListener and GetDatabaseMetaData · 28c82ff1
      Yueh-Hsuan Chiang 提交于
      Summary:
      This diff adds three sets of APIs to RocksDB.
      
      = GetColumnFamilyMetaData =
      * This APIs allow users to obtain the current state of a RocksDB instance on one column family.
      * See GetColumnFamilyMetaData in include/rocksdb/db.h
      
      = EventListener =
      * A virtual class that allows users to implement a set of
        call-back functions which will be called when specific
        events of a RocksDB instance happens.
      * To register EventListener, simply insert an EventListener to ColumnFamilyOptions::listeners
      
      = CompactFiles =
      * CompactFiles API inputs a set of file numbers and an output level, and RocksDB
        will try to compact those files into the specified level.
      
      = Example =
      * Example code can be found in example/compact_files_example.cc, which implements
        a simple external compactor using EventListener, GetColumnFamilyMetaData, and
        CompactFiles API.
      
      Test Plan:
      listener_test
      compactor_test
      example/compact_files_example
      export ROCKSDB_TESTS=CompactFiles
      db_test
      export ROCKSDB_TESTS=MetaData
      db_test
      
      Reviewers: ljin, igor, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: MarkCallaghan, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D24705
      28c82ff1
  14. 07 11月, 2014 1 次提交
    • I
      Turn -Wshadow back on · 9f20395c
      Igor Canadi 提交于
      Summary: It turns out that -Wshadow has different rules for gcc than clang. Previous commit fixed clang. This commits fixes the rest of the warnings for gcc.
      
      Test Plan: compiles
      
      Reviewers: ljin, yhchiang, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D28131
      9f20395c
  15. 05 11月, 2014 2 次提交
  16. 01 11月, 2014 2 次提交
    • I
      Turn on -Wshadow · 9f7fc3ac
      Igor Canadi 提交于
      Summary:
      ...and fix all the errors :)
      
      Jim suggested turning on -Wshadow because it helped him fix number of critical bugs in fbcode. I think it's a good idea to be -Wshadow clean.
      
      Test Plan: compiles
      
      Reviewers: yhchiang, rven, sdong, ljin
      
      Reviewed By: ljin
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D27711
      9f7fc3ac
    • S
      Make VersionBuilder unit testable · 4d2ba38b
      sdong 提交于
      Summary:
      Rename Version::Builder to VersionBuilder and expose its definition to a header.
      Make VerisonBuilder not reference Version or ColumnFamilyData, only working with VersionStorageInfo.
      Add version_builder_test which has a simple test.
      
      Test Plan: make all check
      
      Reviewers: rven, yhchiang, igor, ljin
      
      Reviewed By: igor
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D27969
      4d2ba38b
  17. 30 10月, 2014 2 次提交
    • S
      Make CompactionPicker more easily tested · 76d1c28e
      sdong 提交于
      Summary:
      Make compaction picker easier to test.
      The basic idea is to separate a minimum subcomponent of Version to VersionStorageInfo, which just responsible to LSM tree. A stub VersionStorageInfo can then be easily created and passed into compaction picker so that we can check the outputs.
      
      It now passes most tests. Still two things need to be done:
      (1) deal with the FIFO compaction's file size.
      (2) write an example test to make sure the interface can do the job.
      
      Add a compaction_picker_test to make sure compaction picker codes can be easily unit tested.
      
      Test Plan:
      Pass all unit tests and compaction_picker_test
      
      Reviewers: yhchiang, rven, igor, ljin
      
      Reviewed By: ljin
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D27639
      76d1c28e
    • Y
      Apply InfoLogLevel to the logs in db/column_family.cc · 34d436b7
      Yueh-Hsuan Chiang 提交于
      Summary: Apply InfoLogLevel to the logs in db/column_family.cc
      
      Test Plan: make
      
      Reviewers: ljin, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D27843
      34d436b7
  18. 29 10月, 2014 2 次提交
    • I
      FlushProcess · a39e931e
      Igor Canadi 提交于
      Summary:
      Abstract out FlushProcess and take it out of DBImpl.
      This also includes taking DeletionState outside of DBImpl.
      
      Currently this diff is only doing the refactoring. Future work includes:
      1. Decoupling flush_process.cc, make it depend on less state
      2. Write flush_process_test, which will mock out everything that FlushProcess depends on and test it in isolation
      
      Test Plan: make check
      
      Reviewers: rven, yhchiang, sdong, ljin
      
      Reviewed By: ljin
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D27561
      a39e931e
    • L
      unfriend ColumnFamilyData from VersionSet · f981e081
      Lei Jin 提交于
      Summary: as title
      
      Test Plan:
      make release
      will run full test on all stacked diffs before committing
      
      Reviewers: sdong, yhchiang, rven, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D27591
      f981e081
  19. 28 10月, 2014 1 次提交
    • L
      dynamic inplace_update options · f1841985
      Lei Jin 提交于
      Summary:
      Make inplace_update_support and inplace_update_num_locks dynamic.
      inplace_callback becomes immutable
      We are almost free of references to cfd->options() in db_impl
      
      Test Plan: unit test
      
      Reviewers: igor, yhchiang, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D25293
      f1841985
  20. 17 10月, 2014 2 次提交
  21. 02 10月, 2014 1 次提交
    • L
      make compaction related options changeable · 5ec53f3e
      Lei Jin 提交于
      Summary:
      make compaction related options changeable. Most of changes are tedious,
      following the same convention: grabs MutableCFOptions at the beginning
      of compaction under mutex, then pass it throughout the job and register
      it in SuperVersion at the end.
      
      Test Plan: make all check
      
      Reviewers: igor, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D23349
      5ec53f3e
  22. 25 9月, 2014 1 次提交
  23. 23 9月, 2014 1 次提交
    • S
      WriteBatchWithIndex to allow different Comparators for different column families · d0de413f
      sdong 提交于
      Summary:
      Previously, one single column family is given to WriteBatchWithIndex to index keys for all column families. An extra map from column family ID to comparator is maintained which can override the default comparator given in the constructor. A WriteBatchWithIndex::SetComparatorForCF() is added for user to add comparators per column family.
      
      Also move more codes into anonymous namespace.
      
      Test Plan: Add a unit test
      
      Reviewers: ljin, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb, yhchiang
      
      Differential Revision: https://reviews.facebook.net/D23355
      d0de413f
  24. 18 9月, 2014 1 次提交
  25. 11 9月, 2014 1 次提交
    • I
      Push model for flushing memtables · 3d9e6f77
      Igor Canadi 提交于
      Summary:
      When memtable is full it calls the registered callback. That callback then registers column family as needing the flush. Every write checks if there are some column families that need to be flushed. This completely eliminates the need for MakeRoomForWrite() function and simplifies our Write code-path.
      
      There is some complexity with the concurrency when the column family is dropped. I made it a bit less complex by dropping the column family from the write thread in https://reviews.facebook.net/D22965. Let me know if you want to discuss this.
      
      Test Plan: make check works. I'll also run db_stress with creating and dropping column families for a while.
      
      Reviewers: yhchiang, sdong, ljin
      
      Reviewed By: ljin
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D23067
      3d9e6f77
  26. 09 9月, 2014 3 次提交
    • L
      MemTableOptions · 52311463
      Lei Jin 提交于
      Summary: removed reference to options in WriteBatch and DBImpl::Get()
      
      Test Plan: make all check
      
      Reviewers: yhchiang, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D23049
      52311463
    • I
      Check stop level trigger-0 before slowdown level-0 trigger · 2d57828d
      Igor Canadi 提交于
      Summary: ...
      
      Test Plan: Can't repro the test failure, but let's see what jenkins says
      
      Reviewers: zagfox, sdong, ljin
      
      Reviewed By: sdong, ljin
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D23061
      2d57828d
    • I
      Push- instead of pull-model for managing Write stalls · a2bb7c3c
      Igor Canadi 提交于
      Summary:
      Introducing WriteController, which is a source of truth about per-DB write delays. Let's define an DB epoch as a period where there are no flushes and compactions (i.e. new epoch is started when flush or compaction finishes). Each epoch can either:
      * proceed with all writes without delay
      * delay all writes by fixed time
      * stop all writes
      
      The three modes are recomputed at each epoch change (flush, compaction), rather than on every write (which is currently the case).
      
      When we have a lot of column families, our current pull behavior adds a big overhead, since we need to loop over every column family for every write. With new push model, overhead on Write code-path is minimal.
      
      This is just the start. Next step is to also take care of stalls introduced by slow memtable flushes. The final goal is to eliminate function MakeRoomForWrite(), which currently needs to be called for every column family by every write.
      
      Test Plan: make check for now. I'll add some unit tests later. Also, perf test.
      
      Reviewers: dhruba, yhchiang, MarkCallaghan, sdong, ljin
      
      Reviewed By: ljin
      
      Subscribers: leveldb
      
      Differential Revision: https://reviews.facebook.net/D22791
      a2bb7c3c
  27. 05 9月, 2014 1 次提交
    • L
      introduce ImmutableOptions · 5665e5e2
      Lei Jin 提交于
      Summary:
      As a preparation to support updating some options dynamically, I'd like
      to first introduce ImmutableOptions, which is a subset of Options that
      cannot be changed during the course of a DB lifetime without restart.
      
      ColumnFamily will keep both Options and ImmutableOptions. Any component
      below ColumnFamily should only take ImmutableOptions in their
      constructor. Other options should be taken from APIs, which will be
      allowed to adjust dynamically.
      
      I am yet to make changes to memtable and other related classes to take
      ImmutableOptions in their ctor. That can be done in a seprate diff as
      this one is already pretty big.
      
      Test Plan: make all check
      
      Reviewers: yhchiang, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D22545
      5665e5e2
  28. 26 8月, 2014 1 次提交
  29. 19 8月, 2014 1 次提交
    • S
      WriteBatchWithIndex: a wrapper of WriteBatch, with a searchable index · 28b5c760
      sdong 提交于
      Summary:
      Add WriteBatchWithIndex so that a user can query data out of a WriteBatch, to support MongoDB's read-its-own-write.
      
      WriteBatchWithIndex uses a skiplist to store the binary index. The index stores the offset of the entry in the write batch. When searching for a key, the key for the entry is read by read the entry from the write batch from the offset.
      
      Define a new iterator class for querying data out of WriteBatchWithIndex. A user can create an iterator of the write batch for one column family, seek to a key and keep calling Next() to see next entries.
      
      I will add more unit tests if people are OK about this API.
      
      Test Plan:
      make all check
      Add unit tests.
      
      Reviewers: yhchiang, igor, MarkCallaghan, ljin
      
      Reviewed By: ljin
      
      Subscribers: dhruba, leveldb, xjin
      
      Differential Revision: https://reviews.facebook.net/D21381
      28b5c760
  30. 12 8月, 2014 1 次提交
    • M
      Changes to support unity build: · 93e6b5e9
      miguelportilla 提交于
      * Script for building the unity.cc file via Makefile
      * Unity executable Makefile target for testing builds
      * Source code changes to fix compilation of unity build
      93e6b5e9