1. 06 Jan, 2015 (1 commit)
    • Fail DB::Open() on WAL corruption · d7b4bb62
      Committed by Igor Canadi
      Summary:
      This is a serious bug. If paranoid_checks == true and the WAL is corrupted, we don't fail DB::Open(). I went through the history and it seems we've had this behavior for a very long time.
      
      I found this when investigating t5852041.
      
      Test Plan: Added unit test to verify correct behavior.
      
      Reviewers: yhchiang, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D30597
      d7b4bb62
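
      A minimal sketch of the behavior this fix guarantees: with paranoid_checks left at its default of true, DB::Open() should now return a non-OK status when the WAL is corrupted, so callers must check the status instead of assuming success. The database path below is a placeholder.

          #include <iostream>
          #include "rocksdb/db.h"

          int main() {
            rocksdb::Options options;
            options.create_if_missing = true;
            options.paranoid_checks = true;  // default; surface corruption at open time

            rocksdb::DB* db = nullptr;
            rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/example_db", &db);
            if (!s.ok()) {
              // After this fix, a corrupted WAL shows up here instead of being ignored.
              std::cerr << "Open failed: " << s.ToString() << std::endl;
              return 1;
            }
            delete db;
            return 0;
          }
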
  2. 24 Dec, 2014 (1 commit)
  3. 23 Dec, 2014 (1 commit)
    • Move GetThreadList() feature under Env. · 45bab305
      Committed by Yueh-Hsuan Chiang
      Summary:
      The GetThreadList() feature depends on thread creation and destruction, which are currently handled by Env.
      This patch moves the GetThreadList() feature under Env to better manage that dependency.
      
      Renamed ThreadStatusImpl to ThreadStatusUpdater.  Added ThreadStatusUtil, a static class that contains
      utility functions for ThreadStatusUpdater.
      
      Test Plan: run db_test, thread_list_test and db_bench and verify the life cycle of Env and ThreadStatusUpdater is properly managed.
      
      Reviewers: igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: ljin, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D30057
      45bab305
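
      Since the feature now lives on Env, callers query thread status through the Env instance they gave the DB (typically Env::Default()). A minimal sketch, assuming thread tracking has been enabled via DBOptions::enable_thread_tracking:

          #include <cstddef>
          #include <vector>
          #include "rocksdb/env.h"
          #include "rocksdb/thread_status.h"

          // Count the threads currently reported by the Env that owns them.
          std::size_t CountTrackedThreads() {
            std::vector<rocksdb::ThreadStatus> threads;
            rocksdb::Status s = rocksdb::Env::Default()->GetThreadList(&threads);
            return s.ok() ? threads.size() : 0;
          }
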
  4. 22 Dec, 2014 (2 commits)
    • Only execute flush from compaction if max_background_flushes = 0 · 4fd26f28
      Committed by Igor Canadi
      Summary: As title. We shouldn't need to execute flush from compaction if there are dedicated threads doing flushes.
      
      Test Plan: make check
      
      Reviewers: rven, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D30579
      4fd26f28
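
      For context, the two knobs involved are plain DBOptions. A hedged configuration sketch of the case with dedicated flush threads (the non-zero case), versus max_background_flushes = 0, where compaction threads still perform flushes:

          #include "rocksdb/options.h"

          rocksdb::Options MakeOptionsWithDedicatedFlushThreads() {
            rocksdb::Options options;
            options.create_if_missing = true;
            options.max_background_flushes = 2;      // dedicated flush threads (HIGH pool)
            options.max_background_compactions = 4;  // compaction threads (LOW pool)
            return options;
          }
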
    • Speed up FindObsoleteFiles() · 0acc7388
      Committed by Igor Canadi
      Summary:
      There are two versions of FindObsoleteFiles():
      * full scan, which is executed every 6 hours (and it's terribly slow)
      * no full scan, which is executed every time a background process finishes and iterator is deleted
      
      This diff is optimizing the second case (no full scan). Here's what we do before the diff:
      * Get the list of obsolete files (files with ref==0). Some files in obsolete_files set might actually be live.
      * Get the list of live files to avoid deleting files that are live.
      * Delete files that are in obsolete_files and not in live_files.
      
      After this diff:
      * The only files with ref==0 that are still live are files that have been part of move compaction. Don't include moved files in obsolete_files.
      * Get the list of obsolete files (which exclude moved files).
      * No need to get the list of live files, since all files in obsolete_files need to be deleted.
      
      I'll post the benchmark results, but you can get a feel for it here: https://reviews.facebook.net/D30123
      
      This depends on D30123.
      
      P.S. We should do full scan only in failure scenarios, not every 6 hours. I'll do this in a follow-up diff.
      
      Test Plan:
      One new unit test. Made sure that unit test fails if we don't have a `if (!f->moved)` safeguard in ~Version.
      
      make check
      
      Big number of compactions and flushes:
      
        ./db_stress --threads=30 --ops_per_thread=20000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0  --reopen=15 --max_background_compactions=10 --max_background_flushes=10 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000
      
      Reviewers: yhchiang, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D30249
      0acc7388
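
      An illustrative sketch (not the actual RocksDB bookkeeping) of the set logic described above: before, ref==0 candidates had to be checked against the live-file list; after, moved files never become candidates, so every candidate can be deleted directly.

          #include <set>
          #include <string>
          #include <vector>

          // Illustrative stand-in for the per-file state referenced in the summary.
          struct FileInfo {
            std::string name;
            int refs = 0;
            bool moved = false;  // file was part of a move ("trivial") compaction
          };

          // Before: collect ref==0 files, then filter out anything still live.
          std::vector<std::string> ObsoleteBefore(const std::vector<FileInfo>& files,
                                                  const std::set<std::string>& live) {
            std::vector<std::string> to_delete;
            for (const auto& f : files) {
              if (f.refs == 0 && live.count(f.name) == 0) to_delete.push_back(f.name);
            }
            return to_delete;
          }

          // After: moved files are excluded up front, so no live-file lookup is needed.
          std::vector<std::string> ObsoleteAfter(const std::vector<FileInfo>& files) {
            std::vector<std::string> to_delete;
            for (const auto& f : files) {
              if (f.refs == 0 && !f.moved) to_delete.push_back(f.name);
            }
            return to_delete;
          }
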
  5. 21 Dec, 2014 (1 commit)
    • Fix a SIGSEGV in BackgroundFlush · f8999fcf
      Committed by Igor Canadi
      Summary:
      This one wasn't easy to find :)
      
      What happens is we go through all cfds on flush_queue_ and find no cfds to flush, *but* cfd is left set to the last CF we looped through, and the following code assumes we want it flushed.
      
      BTW @sdong do you think we should also make BackgroundFlush() only check a single cfd for flushing instead of doing this `while (!flush_queue_.empty())`?
      
      Test Plan: regression test no longer fails
      
      Reviewers: sdong, rven, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: sdong, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D30591
      f8999fcf
  6. 20 Dec, 2014 (1 commit)
    • Rewritten system for scheduling background work · fdb6be4e
      Committed by Igor Canadi
      Summary:
      When scaling to a higher number of column families, the worst bottleneck was MaybeScheduleFlushOrCompaction(), which did a for loop over all column families while holding a mutex. This patch addresses the issue.
      
      The approach is similar to our earlier efforts: instead of a pull model, where we do something for every column family, we use a push model -- when we detect that a column family is ready to be flushed or compacted, we add it to flush_queue_/compaction_queue_. That way we don't need to loop over every column family in MaybeScheduleFlushOrCompaction().
      
      Here are the performance results:
      
      Command:
      
          ./db_bench --write_buffer_size=268435456 --db_write_buffer_size=268435456 --db=/fast-rocksdb-tmp/rocks_lots_of_cf --use_existing_db=0 --open_files=55000 --statistics=1 --histogram=1 --disable_data_sync=1 --max_write_buffer_number=2 --sync=0 --benchmarks=fillrandom --threads=16 --num_column_families=5000  --disable_wal=1 --max_background_flushes=16 --max_background_compactions=16 --level0_file_num_compaction_trigger=2 --level0_slowdown_writes_trigger=2 --level0_stop_writes_trigger=3 --hard_rate_limit=1 --num=33333333 --writes=33333333
      
      Before the patch:
      
           fillrandom   :      26.950 micros/op 37105 ops/sec;    4.1 MB/s
      
      After the patch:
      
            fillrandom   :      17.404 micros/op 57456 ops/sec;    6.4 MB/s
      
      Next bottleneck is VersionSet::AddLiveFiles, which is painfully slow when we have a lot of files. This is coming in the next patch, but when I removed that code, here's what I got:
      
            fillrandom   :       7.590 micros/op 131758 ops/sec;   14.6 MB/s
      
      Test Plan:
      make check
      
      two stress tests:
      
      Big number of compactions and flushes:
      
          ./db_stress --threads=30 --ops_per_thread=20000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0  --reopen=15 --max_background_compactions=10 --max_background_flushes=10 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000
      
      max_background_flushes=0, to verify that this case also works correctly
      
          ./db_stress --threads=30 --ops_per_thread=2000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0  --reopen=3 --max_background_compactions=3 --max_background_flushes=0 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000
      
      Reviewers: ljin, rven, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D30123
      fdb6be4e
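
      An illustrative sketch (not the actual DBImpl code) of the push model described above: a column family is enqueued once, at the moment it becomes ready, and the background thread pops from the queue instead of scanning every column family while holding the mutex.

          #include <deque>
          #include <mutex>

          // Illustrative stand-in for a column family's flush-related state.
          struct ColumnFamilyState {
            bool memtable_full = false;
            bool queued_for_flush = false;  // avoid enqueuing the same CF twice
          };

          class FlushScheduler {
           public:
            // Push model: called where a CF becomes ready (e.g. on write), instead of
            // looping over all CFs in MaybeScheduleFlushOrCompaction().
            void MarkReady(ColumnFamilyState* cf) {
              std::lock_guard<std::mutex> lock(mu_);
              if (cf->memtable_full && !cf->queued_for_flush) {
                cf->queued_for_flush = true;
                flush_queue_.push_back(cf);
              }
            }

            // Background thread takes one ready CF; no O(#column families) scan.
            ColumnFamilyState* PopReady() {
              std::lock_guard<std::mutex> lock(mu_);
              if (flush_queue_.empty()) return nullptr;
              ColumnFamilyState* cf = flush_queue_.front();
              flush_queue_.pop_front();
              cf->queued_for_flush = false;
              return cf;
            }

           private:
            std::mutex mu_;
            std::deque<ColumnFamilyState*> flush_queue_;
          };
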
  7. 17 Dec, 2014 (2 commits)
  8. 16 Dec, 2014 (1 commit)
    • RocksDB: Allow Level-Style Compaction to Place Files in Different Paths · 153f4f07
      Committed by Venkatesh Radhakrishnan
      Summary:
      Allow level-style compaction to place files in different paths.
      This diff provides the code for task 4854591. We now support level compaction
      placing files in different paths by specifying them in db_paths, along with
      the minimum level for files to be stored in that path.
      
      Test Plan: ManualLevelCompactionOutputPathId in db_test.cc
      
      Reviewers: yhchiang, MarkCallaghan, dhruba, yoshinorim, sdong
      
      Reviewed By: sdong
      
      Subscribers: yoshinorim, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D29799
      153f4f07
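
      A hedged sketch of the db_paths option that level compaction now honors; the directories and target sizes below are placeholders. Roughly, files land in the earlier paths until their target sizes are reached and then spill into later paths, so a fast device can be listed first and a large one last.

          #include "rocksdb/options.h"

          rocksdb::Options MakeMultiPathOptions() {
            rocksdb::Options options;
            options.create_if_missing = true;
            // Each DbPath is a directory plus a target size in bytes.
            options.db_paths.emplace_back("/fast_ssd/rocksdb", 64ull << 30);   // placeholder
            options.db_paths.emplace_back("/big_disk/rocksdb", 512ull << 30);  // placeholder
            return options;
          }
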
  9. 12 Dec, 2014 (1 commit)
    • Improve scalability of DB::GetSnapshot() · d7a48666
      Committed by sdong
      Summary: Currently DB::GetSnapshot() doesn't scale to many column families, as it needs to go through all the column families to find out whether snapshots are supported. This patch optimizes it.
      
      Test Plan:
      Add unit tests to cover negative cases.
      make all check
      
      Reviewers: yhchiang, rven, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D30093
      d7a48666
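
      The call whose scalability this change improves, shown as a minimal usage sketch (db is an already-open DB*):

          #include <string>
          #include "rocksdb/db.h"

          // Take a consistent snapshot, read through it, then release it.
          std::string ReadAtSnapshot(rocksdb::DB* db, const std::string& key) {
            const rocksdb::Snapshot* snap = db->GetSnapshot();  // the call optimized here
            rocksdb::ReadOptions read_options;
            read_options.snapshot = snap;
            std::string value;
            rocksdb::Status s = db->Get(read_options, key, &value);
            db->ReleaseSnapshot(snap);  // always release, or old data stays pinned
            return s.ok() ? value : std::string();
          }
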
  10. 09 Dec, 2014 (1 commit)
  11. 06 Dec, 2014 (1 commit)
  12. 04 Dec, 2014 (1 commit)
    • Add Moved(GB) to Compaction IO stats · 32a0a038
      Committed by Mark Callaghan
      Summary:
      Adds a counter for bytes moved (files pushed down a level rather than compacted) to the compaction
      IO stats as Moved(GB). Removes these infrequently used columns from the output: RW-Amp, Rn(cnt), Rnp1(cnt),
      Wnp1(cnt), Wnew(cnt).
      Example old output:
      Level   Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) RW-Amp W-Amp Rd(MB/s) Wr(MB/s)  Rn(cnt) Rnp1(cnt) Wnp1(cnt) Wnew(cnt)  Comp(sec) Comp(cnt) Avg(sec) Stall(sec) Stall(cnt) Avg(ms) RecordIn RecordDrop
      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        L0     0/0          0   0.0      0.0     0.0      0.0    2130.8   2130.8    0.0   0.0      0.0    109.1        0         0         0         0      20002     25068    0.798      28.75     182059    0.16       0          0
        L1   142/0        509   1.0   4618.5  2036.5   2582.0    4602.1   2020.2    4.5   2.3     88.5     88.1    24220    701246   1215528    514282      53466      4229   12.643       0.00          0    0.002032745988  300688729
      
      Example new output:
      Level   Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) Stall(sec) Stall(cnt) Avg(ms)     RecordIn   RecordDrop
      --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        L0     7/0         13   1.8      0.0     0.0      0.0       0.6      0.6       0.0   0.0      0.0     14.7        44       353    0.124       0.03        626    0.05            0            0
        L1     9/0         16   1.6      0.0     0.0      0.0       0.0      0.0       0.6   0.0      0.0      0.0         0         0    0.000       0.00          0    0.00            0            0
      
      Test Plan:
      make check, run db_bench --fillseq --stats_per_interval --stats_interval and look at output
      
      Reviewers: igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D29787
      32a0a038
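
      The tables above are what db_bench prints; an application can pull the same per-level compaction IO stats (including the new Moved(GB) column) through the stats property. A minimal sketch:

          #include <iostream>
          #include <string>
          #include "rocksdb/db.h"

          // Dump the human-readable stats block for an already-open database.
          void PrintCompactionStats(rocksdb::DB* db) {
            std::string stats;
            if (db->GetProperty("rocksdb.stats", &stats)) {
              std::cout << stats << std::endl;
            }
          }
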
  13. 03 Dec, 2014 (1 commit)
    • Enforce write buffer memory limit across column families · a14b7873
      Committed by Jonah Cohen
      Summary:
      Introduces a new class for managing write buffer memory across column
      families.  We supplement ColumnFamilyOptions::write_buffer_size with
      ColumnFamilyOptions::write_buffer, a shared pointer to a WriteBuffer
      instance that enforces memory limits before flushing out to disk.
      
      Test Plan: Added SharedWriteBuffer unit test to db_test.cc
      
      Reviewers: sdong, rven, ljin, igor
      
      Reviewed By: igor
      
      Subscribers: tnovak, yhchiang, dhruba, xjin, MarkCallaghan, yoshinorim
      
      Differential Revision: https://reviews.facebook.net/D22581
      a14b7873
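
      A hedged sketch of configuring the shared limit. It uses DBOptions::db_write_buffer_size, the option that exposes this shared write buffer in the public API; the exact knob differs between releases (newer versions expose a WriteBufferManager), so treat the field name as an assumption.

          #include "rocksdb/options.h"

          rocksdb::Options MakeSharedWriteBufferOptions() {
            rocksdb::Options options;
            options.create_if_missing = true;
            // Per-memtable size for each column family.
            options.write_buffer_size = 64 << 20;       // 64 MB
            // Shared cap across all column families: once total memtable memory
            // reaches this limit, a flush is triggered.
            options.db_write_buffer_size = 256 << 20;   // 256 MB
            return options;
          }
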
  14. 25 Nov, 2014 (1 commit)
  15. 22 Nov, 2014 (1 commit)
    • Fix leak when create_missing_column_families=true on ThreadStatus · 569853ed
      Committed by Yueh-Hsuan Chiang
      Summary:
      An entry of ConstantColumnFamilyInfo is created when:
      1. DB::Open
      2. CreateColumnFamily.
      
      However, there are cases where DB::Open also calls CreateColumnFamily
      (when create_missing_column_families=true). As a result, it creates a
      duplicate ConstantColumnFamilyInfo entry, and one of them is leaked.
      
      Test Plan: ./deletefile_test
      
      Reviewers: igor, sdong, ljin
      
      Reviewed By: ljin
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D29307
      569853ed
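
      For context, a minimal sketch of the open path that triggered the leak: DB::Open() with create_missing_column_families=true creates any listed column family that does not exist yet, which is the internal CreateColumnFamily call described above. The path and column family name are placeholders.

          #include <vector>
          #include "rocksdb/db.h"

          rocksdb::Status OpenWithColumnFamilies(
              rocksdb::DB** db, std::vector<rocksdb::ColumnFamilyHandle*>* handles) {
            rocksdb::DBOptions db_options;
            db_options.create_if_missing = true;
            // DB::Open will call CreateColumnFamily internally for "cf_new" if it is
            // missing; that is the path that used to register ThreadStatus info twice.
            db_options.create_missing_column_families = true;

            std::vector<rocksdb::ColumnFamilyDescriptor> cfs;
            cfs.emplace_back(rocksdb::kDefaultColumnFamilyName,
                             rocksdb::ColumnFamilyOptions());
            cfs.emplace_back("cf_new", rocksdb::ColumnFamilyOptions());  // placeholder

            return rocksdb::DB::Open(db_options, "/tmp/example_db", cfs, handles, db);
          }
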
  16. 21 Nov, 2014 (2 commits)
    • Add enable_thread_tracking to DBOptions · 4b63fcbf
      Committed by Yueh-Hsuan Chiang
      Summary:
      Add enable_thread_tracking to DBOptions to allow
      tracking thread status related to the DB.  Default is off.
      
      Test Plan:
      export ROCKSDB_TESTS=ThreadList
      ./db_test
      
      Reviewers: ljin, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D29289
      4b63fcbf
    • Introduce GetThreadList API · d0c5f28a
      Committed by Yueh-Hsuan Chiang
      Summary:
      Add the GetThreadList API, which allows developers to track the
      status of each thread.  Currently, calling GetThreadList will
      only get the list of background threads in RocksDB, with their
      thread-id and thread-type (priority) set.  More support will be
      added in later diffs.
      
      ThreadStatus currently has the following properties:
      
        // A unique ID for the thread.
        const uint64_t thread_id;

        // The type of the thread; it could be ROCKSDB_HIGH_PRIORITY,
        // ROCKSDB_LOW_PRIORITY, or USER_THREAD.
        const ThreadType thread_type;

        // The name of the DB instance the thread is currently
        // involved with.  It is set to an empty string if the thread
        // is not involved in any DB operation.
        const std::string db_name;

        // The name of the column family the thread is currently
        // involved with.  It is set to an empty string if the thread
        // is not involved in any column family.
        const std::string cf_name;

        // The event the current thread is involved in.
        // It is set to an empty string if information about the event
        // is not currently available.
      
      Test Plan:
      ./thread_list_test
      export ROCKSDB_TESTS=GetThreadList
      ./db_test
      
      Reviewers: rven, igor, sdong, ljin
      
      Reviewed By: ljin
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D25047
      d0c5f28a
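
      A hedged usage sketch tying the two commits in this section together: enable_thread_tracking (above) turns tracking on, and GetThreadList(), exposed through Env after commit 45bab305 earlier in this log, returns the ThreadStatus entries whose fields are listed in the summary. Enumerator names and the exact location of the API vary between releases, so treat this as a sketch rather than a reference.

          #include <iostream>
          #include <vector>
          #include "rocksdb/db.h"
          #include "rocksdb/env.h"
          #include "rocksdb/thread_status.h"

          void DumpThreadStatuses() {
            rocksdb::Options options;
            options.create_if_missing = true;
            options.enable_thread_tracking = true;  // off by default

            rocksdb::DB* db = nullptr;
            rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/example_db", &db);
            if (!s.ok()) return;

            std::vector<rocksdb::ThreadStatus> threads;
            if (options.env->GetThreadList(&threads).ok()) {
              for (const auto& t : threads) {
                std::cout << "id=" << t.thread_id
                          << " type=" << static_cast<int>(t.thread_type)  // pool priority
                          << " db=" << t.db_name
                          << " cf=" << t.cf_name << std::endl;
              }
            }
            delete db;
          }
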
  17. 19 Nov, 2014 (1 commit)
  18. 15 Nov, 2014 (2 commits)
    • Clean job context in DeleteFile · 84af2ff8
      Committed by Igor Canadi
      84af2ff8
    • Explicitly clean JobContext · 5c04acda
      Committed by Igor Canadi
      Summary: This way we can guarantee that old MemTables get destructed before DBImpl gets destructed, which might be useful if we want to make them depend on state from DBImpl.
      
      Test Plan: make check with asserts in JobContext's destructor
      
      Reviewers: ljin, sdong, yhchiang, rven, jonahcohen
      
      Reviewed By: jonahcohen
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D28959
      5c04acda
  19. 14 Nov, 2014 (4 commits)
    • Fix SIGSEGV · 4161de92
      Committed by Yueh-Hsuan Chiang
      Summary: As a short-term fix, let's go back to the previous way of calculating NeedsCompaction(). The SIGSEGV happens because NeedsCompaction() can be called before super_version (and thus MutableCFOptions) is initialized.
      
      Test Plan: make check
      
      Reviewers: ljin, sdong, rven, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D28875
      4161de92
    • No CompactFiles in ROCKSDB_LITE · 772bc97f
      Committed by Igor Canadi
      Summary: It adds lots of code.
      
      Test Plan: compile for iOS, compile for mac. works.
      
      Reviewers: rven, sdong, ljin, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D28857
      772bc97f
    • Move NeedsCompaction() from VersionStorageInfo to CompactionPicker · 1d1a64f5
      Committed by Yueh-Hsuan Chiang
      Summary:
      Move NeedsCompaction() from VersionStorageInfo to CompactionPicker
      to allow different compaction strategy to have their own way to
      determine whether doing compaction is necessary.
      
      When compaction style is set to kCompactionStyleNone, then
      NeedsCompaction() will always return false.
      
      Test Plan:
      export ROCKSDB_TESTS=Compact
      ./db_test
      
      Reviewers: ljin, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D28719
      1d1a64f5
    • Fix iOS compile with -Wshorten-64-to-32 · 25f27302
      Committed by Igor Canadi
      Summary: On iOS, size_t is 32-bit, so we need to static_cast<size_t> any uint64_t :(
      
      Test Plan: TARGET_OS=IOS make static_lib
      
      Reviewers: dhruba, ljin, yhchiang, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D28743
      25f27302
  20. 12 Nov, 2014 (1 commit)
    • Turn on -Wshorten-64-to-32 and fix all the errors · 767777c2
      Committed by Igor Canadi
      Summary:
      We need to turn on -Wshorten-64-to-32 for mobile. See D1671432 (internal phabricator) for details.
      
      This diff turns on the warning flag and fixes all the errors. There were also some interesting errors that I might call bugs, especially in plain table. Going forward, I think it makes sense to have this flag turned on and be very very careful when converting 64-bit to 32-bit variables.
      
      Test Plan: compiles
      
      Reviewers: ljin, rven, yhchiang, sdong
      
      Reviewed By: yhchiang
      
      Subscribers: bobbaldwin, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D28689
      767777c2
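
      An illustrative example (not taken from the RocksDB codebase) of the class of error this flag reports and the explicit-cast fix applied throughout: the cast has to be paired with a check that the value actually fits, which is where the "interesting errors" tend to hide.

          #include <cassert>
          #include <cstdint>
          #include <cstring>
          #include <limits>

          // With -Wshorten-64-to-32, the implicit conversion below becomes an error on
          // 32-bit targets (such as iOS), where size_t is 32 bits wide.
          void CopyPrefix(char* dst, const char* src, uint64_t prefix_len) {
            // error: implicit conversion loses integer precision: 'uint64_t' to 'size_t'
            // std::memcpy(dst, src, prefix_len);

            // Fix: check that the value fits, then cast explicitly.
            assert(prefix_len <= std::numeric_limits<size_t>::max());
            std::memcpy(dst, src, static_cast<size_t>(prefix_len));
          }
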
  21. 11 Nov, 2014 (2 commits)
  22. 08 Nov, 2014 (4 commits)
    • Get rid of mutex in CompactionJob's state · e3d3567b
      Committed by Igor Canadi
      Summary: Based on @sdong's feedback in the diff, we shouldn't keep db_mutex in CompactionJob's state. This diff removes db_mutex from CompactionJob's state by making next_file_number_ atomic. That way we only need to pass the lock to InstallCompactionResults(), because of LogAndApply().
      
      Test Plan: make check
      
      Reviewers: ljin, yhchiang, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: sdong, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D28491
      e3d3567b
    • Fixed compile error in db/db_impl.cc · b8b39034
      Committed by Yueh-Hsuan Chiang
      Summary:
      Fixed compile error in db/db_impl.cc
      
      Test Plan:
      make
      b8b39034
    • CompactFiles, EventListener and GetDatabaseMetaData · 28c82ff1
      Committed by Yueh-Hsuan Chiang
      Summary:
      This diff adds three sets of APIs to RocksDB.
      
      = GetColumnFamilyMetaData =
      * This API allows users to obtain the current state of a RocksDB instance for one column family.
      * See GetColumnFamilyMetaData in include/rocksdb/db.h
      
      = EventListener =
      * A virtual class that allows users to implement a set of
        call-back functions which will be called when specific
        events of a RocksDB instance happen.
      * To register EventListener, simply insert an EventListener to ColumnFamilyOptions::listeners
      
      = CompactFiles =
      * The CompactFiles API takes a set of file numbers and an output level as input, and RocksDB
        will try to compact those files into the specified level.
      
      = Example =
      * Example code can be found in example/compact_files_example.cc, which implements
        a simple external compactor using EventListener, GetColumnFamilyMetaData, and
        CompactFiles API.
      
      Test Plan:
      listener_test
      compactor_test
      example/compact_files_example
      export ROCKSDB_TESTS=CompactFiles
      db_test
      export ROCKSDB_TESTS=MetaData
      db_test
      
      Reviewers: ljin, igor, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: MarkCallaghan, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D24705
      28c82ff1
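
      A hedged sketch exercising the three APIs together, loosely modeled on example/compact_files_example.cc: list the default column family's L0 files through GetColumnFamilyMetaData(), then ask RocksDB to compact them into L1 with CompactFiles(). The listener registration shown in the comment follows current headers (DBOptions::listeners) and may have lived on ColumnFamilyOptions at the time of this diff.

          #include <memory>
          #include <string>
          #include <vector>
          #include "rocksdb/db.h"
          #include "rocksdb/listener.h"
          #include "rocksdb/options.h"

          // EventListener: called back when a compaction finishes.
          // Registration at open time (field location is an assumption, see above):
          //   options.listeners.emplace_back(std::make_shared<CompactionLogger>());
          class CompactionLogger : public rocksdb::EventListener {
           public:
            void OnCompactionCompleted(rocksdb::DB* /*db*/,
                                       const rocksdb::CompactionJobInfo& info) override {
              // info.status, info.output_level, info.input_files, ... are available here.
              (void)info;
            }
          };

          rocksdb::Status CompactLevel0(rocksdb::DB* db) {
            // GetColumnFamilyMetaData: snapshot of the default column family's LSM state.
            rocksdb::ColumnFamilyMetaData meta;
            db->GetColumnFamilyMetaData(&meta);

            std::vector<std::string> input_files;
            for (const auto& file : meta.levels[0].files) {
              input_files.push_back(file.name);  // L0 files to compact
            }
            if (input_files.empty()) return rocksdb::Status::OK();

            // CompactFiles: compact the chosen files into level 1.
            return db->CompactFiles(rocksdb::CompactionOptions(), input_files,
                                    /*output_level=*/1);
          }
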
    • Redesign pending_outputs_ · 53af5d87
      Committed by Igor Canadi
      Summary:
      Here's a prototype of redesigning pending_outputs_. This way, we don't have to expose pending_outputs_ to other classes (CompactionJob, FlushJob, MemtableList). DBImpl takes care of it.
      
      Still have to write some comments, but should be good enough to start the discussion.
      
      Test Plan: make check, will also run stress test
      
      Reviewers: ljin, sdong, rven, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D28353
      53af5d87
  23. 05 Nov, 2014 (4 commits)
  24. 04 Nov, 2014 (1 commit)
    • DB::Open() to automatically increase thread pool size if it is smaller than max number of parallel compactions or flushes · 09899f0b
      Committed by sdong
      
      Summary:
      With the patch, the thread pool size will be automatically increased if the DB's options ask for more parallelism of compactions or flushes.
      
      Too many users have been confused by the API. This change makes it harder for users to make mistakes.
      
      Test Plan: Add two unit tests to cover the function.
      
      Reviewers: yhchiang, rven, igor, MarkCallaghan, ljin
      
      Reviewed By: ljin
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D27555
      09899f0b
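
      A hedged sketch of what this change simplifies: previously, asking for more background compactions or flushes than the Env's thread pools could run meant also calling SetBackgroundThreads(); after this patch, DB::Open() grows the pools to match the options. The path is a placeholder.

          #include "rocksdb/db.h"
          #include "rocksdb/env.h"

          rocksdb::Status OpenWithMoreParallelism(rocksdb::DB** db) {
            rocksdb::Options options;
            options.create_if_missing = true;
            options.max_background_compactions = 8;
            options.max_background_flushes = 2;

            // No longer required after this patch, since DB::Open() enlarges the pools,
            // but this is what users previously had to remember to do:
            //   options.env->SetBackgroundThreads(8, rocksdb::Env::LOW);
            //   options.env->SetBackgroundThreads(2, rocksdb::Env::HIGH);

            return rocksdb::DB::Open(options, "/tmp/example_db", db);
          }
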
  25. 01 Nov, 2014 (2 commits)
    • CompactionJob · 74eb4fbe
      Committed by Igor Canadi
      Summary:
      Long awaited CompactionJob class! Move most compaction-related things from DBImpl to CompactionJob, making CompactionJob easier to test and understand.
      
      Currently this is just replicating exactly the same functionality with as little change as possible. As future work, we should:
      1. Add CompactionJob tests (I think I'll do that tomorrow)
      2. Reduce CompactionJob's state that it inherits from DBImpl
      3. Figure out how to do yielding to flush better. Currently I implemented a callback as we agreed yesterday, but I don't think it's a good long term solution.
      
      This reduces db_impl.cc from 5000+ LOC to 3400!
      
      Test Plan: make check, will add CompactionJob-specific tests, probably also move some tests from db_test to compaction_job_test
      
      Reviewers: rven, yhchiang, sdong, ljin
      
      Reviewed By: ljin
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D27957
      74eb4fbe
    • Turn on -Wshadow · 9f7fc3ac
      Committed by Igor Canadi
      Summary:
      ...and fix all the errors :)
      
      Jim suggested turning on -Wshadow because it helped him fix a number of critical bugs in fbcode. I think it's a good idea to be -Wshadow clean.
      
      Test Plan: compiles
      
      Reviewers: yhchiang, rven, sdong, ljin
      
      Reviewed By: ljin
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D27711
      9f7fc3ac
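
      An illustrative example (not from the RocksDB codebase) of the shadowing pattern -Wshadow reports and the rename that fixes it:

          #include <vector>

          int SumPositive(const std::vector<int>& values) {
            int total = 0;
            for (int value : values) {
              // With -Wshadow, declaring another `total` here is flagged, because it
              // would silently hide the accumulator:
              //   int total = value > 0 ? value : 0;  // shadows the outer `total`
              int clamped = value > 0 ? value : 0;     // fix: use a distinct name
              total += clamped;
            }
            return total;
          }
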