1. 12 Nov 2013, 1 commit
    • Fixing failed delete file test · 94e139f9
      Committed by Igor Canadi
      Summary: FindObsoleteFiles() has to be called before PurgeObsoleteFiles() because FindObsoleteFiles() sets manifest_file_number, log_number and prev_log_number to valid values.
      
      Test Plan: deletefile_test now works
      
      Reviewers: dhruba, emayanke, kailiu
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13995
  2. 09 Nov 2013, 1 commit
    • Speed up FindObsoleteFiles · 1510339e
      Committed by Igor Canadi
      Summary:
      Here's one solution we discussed for speeding up FindObsoleteFiles. Keep a set of all files in DBImpl and update the set every time we create a file. I probably missed a few other spots where we create a file.

      It might speed things up a bit, but it makes the code uglier. I don't really like it.

      A much better approach would be to abstract all file handling into a separate class. Think of it as a layer between DBImpl and Env. Having a separate class deal with file naming and deletion would improve code cleanliness (especially given the huge DBImpl) and speed things up. It will take a huge effort to do this, though.
      
      Let's discuss offline today.
      
      Test Plan: Ran ./db_stress, verified that files are getting deleted
      
      Reviewers: dhruba, haobo, kailiu, emayanke
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D13827
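The set-based idea in the summary above can be sketched roughly as follows. This is a minimal Python illustration, not the DBImpl code: `FileTracker`, `on_file_created`, and `find_obsolete` are hypothetical names, and dropping obsolete names from the set once they are reported is an assumption.

```python
class FileTracker:
    """Remembers every file the database created, so no directory scan is needed."""

    def __init__(self):
        self.all_files = set()

    def on_file_created(self, name):
        # Must be called at every spot that creates a file; a missed spot
        # means that file is never reported as obsolete.
        self.all_files.add(name)

    def find_obsolete(self, live_files):
        # Obsolete = tracked but no longer referenced by the live version.
        obsolete = self.all_files - set(live_files)
        self.all_files -= obsolete  # assumption: the caller deletes them next
        return obsolete
```

The sketch also shows the weakness the author mentions: correctness depends on every file-creating code path remembering to update the set.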
  3. 08 Nov 2013, 2 commits
    • Provide mechanism to configure when to flush the block · fd075d6e
      Committed by Kai Liu
      Summary: Allow the block-based table to configure how blocks are flushed. This feature will allow us to add support for prefix-aligned blocks.
      
      Test Plan: make check
      
      Reviewers: dhruba, haobo, sdong, igor
      
      Reviewed By: sdong
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13875
    • Flush the log outside of lock · 444cf88a
      Committed by Igor Canadi
      Summary:
      Added a new call, LogFlush(), that flushes the log contents to the OS buffers. We never call it with the lock held.

      We call it once for every Read/Write and often during the compaction/flush process, so the frequency should not be a problem.
      
      Test Plan: db_test
      
      Reviewers: dhruba, haobo, kailiu, emayanke
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13935
  4. 07 Nov 2013, 2 commits
    • [RocksDB] Generalize prefix-aware iterator to be used for more than one Seek · fd204488
      Committed by Haobo Xu
      Summary: Added a prefix_seek flag in ReadOptions to indicate that Seek is prefix aware (it might not return data with a different prefix) and is not bound to a specific prefix. Multiple Seeks and range scans can be invoked on the same iterator. If a specific prefix is specified, this flag will be ignored. This is just a quick prototype that works for PrefixHashRep; the new lockless memtable could easily be extended with this support too.
      
      Test Plan: test it on Leaf
      
      Reviewers: dhruba, kailiu, sdong, igor
      
      Reviewed By: igor
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13929
    • WAL log retention policy based on archive size. · c2be2cba
      Committed by shamdor
      Summary:
      Archive cleaning will still happen every WAL_ttl seconds,
      but archived logs will be deleted only if the archive size
      is greater than the WAL_size_limit value.
      Empty archived logs will be deleted every WAL_ttl seconds.
      
      Test Plan:
      1. Unit tests pass.
      2. Benchmark.
      
      Reviewers: emayanke, dhruba, haobo, sdong, kailiu, igor
      
      Reviewed By: emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13869
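One cleaning pass under this policy might look like the Python sketch below. It is an illustration under stated assumptions, not the actual RocksDB logic: `logs_to_delete` is a hypothetical helper, and deleting the oldest logs first until the archive fits under the limit is an assumption that the summary does not spell out.

```python
def logs_to_delete(archived, size_limit):
    """Pick archived WAL files to delete in one cleaning pass.

    archived: list of (name, size_in_bytes) tuples, oldest first.
    size_limit: stands in for the WAL_size_limit value.
    """
    # Empty archived logs are always removed.
    doomed = [name for name, size in archived if size == 0]
    total = sum(size for _, size in archived)
    # Non-empty logs are removed only while the archive exceeds the limit.
    for name, size in archived:
        if total <= size_limit:
            break
        if size == 0:
            continue  # already scheduled above
        doomed.append(name)
        total -= size
    return doomed
```

In this sketch the function would be invoked once per WAL_ttl interval by whatever timer drives archive cleaning.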
  5. 05 Nov 2013, 1 commit
    • Making the transaction log iterator more robust · f837f5b1
      Committed by Mayank Agarwal
      Summary:
      Strict essentially means that we MUST find the start sequence. Thus we should return if the start sequence is not found in the first file when strict is set. This takes care of ending the iterator in case of permanent gaps due to corruption in the log files.
      Also created a NextImpl function that has an internal variable to distinguish whether Next is being called from StartSequence or by the application.
      Set the NotFound::gaps status to indicate that gaps have occurred.
      Polished the inline documentation in various places.
      
      Test Plan:
      * db_repl_stress test
      * db_test relating to transaction log iterator
      * fbcode/wormhole/rocksdb/rocks_log_iterator
      * sigma production machine sigmafio032.prn1
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13689
  6. 02 Nov 2013, 1 commit
    • Implement a compressed block cache. · b4ad5e89
      Committed by Dhruba Borthakur
      Summary:
      RocksDB can now support an uncompressed block cache, a compressed
      block cache, or both. A lookup first looks for the block in the
      uncompressed cache; only if it is not found there is it looked up
      in the compressed cache. If it is found in the compressed cache,
      it is uncompressed and inserted into the uncompressed cache.
      
      It is possible that the same block resides in the compressed cache
      as well as the uncompressed cache at the same time. Both caches
      have their own individual LRU policy.
      
      Test Plan: Unit test case attached.
      
      Reviewers: kailiu, sdong, haobo, leveldb
      
      Reviewed By: haobo
      
      CC: xjin, haobo
      
      Differential Revision: https://reviews.facebook.net/D12675
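The two-tier lookup described in this commit can be sketched in Python as below. This is a simplified illustration, not the RocksDB code: `LRUCache`, `read_block`, and the `read_from_file` loader are hypothetical, zlib stands in for whatever compression the table uses, and inserting into both caches after a file read is an assumption.

```python
from collections import OrderedDict
import zlib


class LRUCache:
    """Tiny LRU cache with a fixed entry capacity (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used


def read_block(key, uncompressed, compressed, read_from_file):
    """Look up a block: uncompressed cache, then compressed cache, then file."""
    block = uncompressed.get(key)
    if block is not None:
        return block, "uncompressed"
    raw = compressed.get(key)
    source = "compressed"
    if raw is None:
        raw = read_from_file(key)  # hypothetical loader returning compressed bytes
        compressed.put(key, raw)
        source = "file"
    block = zlib.decompress(raw)
    uncompressed.put(key, block)  # promote so the next lookup hits the first tier
    return block, source
```

Because each cache evicts independently, the same block can sit in both tiers at once, which matches the behavior the summary describes.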
  7. 01 Nov 2013, 1 commit
    • [RocksDB] Add OnCompactionStart to CompactionFilter class · 8cbe5bb5
      Committed by Haobo Xu
      Summary: This gives the application's compaction filter a chance to access context information about a specific compaction run. For example, depending on whether a compaction goes through all data files, the application could do things differently.
      
      Test Plan: make check
      
      Reviewers: dhruba, kailiu, sdong
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13683
  8. 31 Oct 2013, 1 commit
    • Follow-up Cleaning-up After D13521 · f03b2df0
      Committed by Siying Dong
      Summary:
      This patch addresses @haobo's comments on D13521:
      1. Rename Table to TableReader and its factory function to GetTableReader.
      2. Move the compression type selection logic out of TableBuilder and into the compaction logic.
      3. Make comments more accurate.
      4. Move stat name constants into the BlockBasedTable implementation.
      5. Remove some leftover code in simple_table_db_test.
      
      Test Plan: pass test suites.
      
      Reviewers: haobo, dhruba, kailiu
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13785
  9. 29 Oct 2013, 3 commits
    • Make "Table" pluggable · d4eec30e
      Committed by Siying Dong
      Summary: This patch makes Table and TableBuilder abstract classes and moves the implementation of the current table into BlockBasedTable and BlockBasedTableBuilder.

      Test Plan: Make db_test.cc work with the block-based table. Add a new test, simple_table_db_test.cc, where a different simple table format is implemented.
      
      Reviewers: dhruba, haobo, kailiu, emayanke, vamsi
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13521
    • Support user-defined table stats collector · 994575c1
      Committed by Kai Liu
      Summary:
      1. Added a new option that supports user-defined table stats collection.
      2. Added a deleted key stats collector in `utilities`.
      
      Test Plan:
      Added a unit test for newly added code.
      Also ran make check to make sure other tests are not broken.
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13491
    • If a Put fails, fail all other puts · 100fa8e0
      Committed by Igor Canadi
      Summary:
      When a Put fails, it can leave the database in a messy state. We don't want to pretend that everything is OK when it may not be, so we fail every write following the failed one.
      
      I added checks for corruption to DBImpl::Write(). Is there anywhere else I need to add them?
      
      Test Plan: Corruption unit test.
      
      Reviewers: dhruba, haobo, kailiu
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13671
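The "sticky error" behavior described here can be illustrated with a short Python sketch. It is not the DBImpl::Write() code: `StickyErrorDB` and the `storage_write` callable are hypothetical, and OSError stands in for whatever failure status the real write path reports.

```python
class StickyErrorDB:
    """Once one write fails, every later write fails with the same error."""

    def __init__(self, storage_write):
        self._write = storage_write  # hypothetical callable; may raise OSError
        self._sticky_error = None

    def put(self, key, value):
        # Refuse to pretend everything is fine after an earlier failure.
        if self._sticky_error is not None:
            raise self._sticky_error
        try:
            self._write(key, value)
        except OSError as err:
            self._sticky_error = err  # remember the first failure
            raise
```

With this shape, a caller sees the failure on the bad write and on every write after it, even if the underlying storage would have accepted the later ones.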
  10. 25 Oct 2013, 2 commits
    • Unify DeleteFile and DeleteWalFiles · 56305221
      Committed by Mayank Agarwal
      Summary:
      This simplifies the RocksDB public APIs and improves code quality.
      Added a parameter to ParseFileName for the log subtype and improved the code for deleting a WAL file.
      Wrote exhaustive unit tests in delete_file_test.
      Unification of other redundant APIs can be taken up in a separate diff.
      
      Test Plan: Expanded delete_file test
      
      Reviewers: dhruba, haobo, kailiu, sdong
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13647
    • Fix the log number bug when updating MANIFEST file · c17607a2
      Committed by Kai Liu
      Summary:
      A crash may occur during the flush of more than two memtables.

      As the info log suggested, even when both were successfully flushed,
      the recovery process still picks up the log of one of the memtables for recovery.

      This diff fixes the problem by setting the correct "log number" in MANIFEST.

      Test Plan: make test; deployed to leaf4 and made sure it doesn't result in crashes of this type.
      
      Reviewers: haobo, dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13659
  11. 24 Oct 2013, 1 commit
  12. 23 Oct 2013, 1 commit
    • Dbid feature · 9b50106f
      Committed by Mayank Agarwal
      Summary:
      On startup, create a new type of file called DBID if it doesn't already exist.
      It stores a unique number generated with the Boost library's uuid header.
      The use case is to detect a db that has lost all its data and come back up either empty or from an image (a backup or a live replica's recovery).
      The key point to note is that DBID is not stored in a backup or db snapshot.
      It's preferable to use Boost for the uuid because:
      1) A non-standard way of generating a uuid is not good.
      2) /proc/sys/kernel/random/uuid generates a uuid, but only on Linux environments, so the solution would not be clean.
      3) C++ doesn't have any direct way to get a uuid.
      4) Boost is a very good library that was already linked into rocksdb from third-party.
      Note: I had to update the TOOLCHAIN_REV in build files to get the latest version of boost from third-party, as the older version had a bug.
      I had to put Wno-uninitialized in the Makefile because boost-1.51 has an uninitialized variable and rocksdb would not compile otherwise. The latest open-source boost is 1.54, but it is not in third-party. I have notified the concerned people in fbcode about it.
      @kailiu: While releasing to third-party, an additional dependency will need to be created for boost in the TARGETS file. I can help identify it.
      
      Test Plan:
      Expand db_test to test 2 cases:
      1) Restarting db with Id file present - verify that there is no change to Id
      2) Restarting db with Id file deleted - verify that a different Id is there after reopen
      Also run make all check
      
      Reviewers: dhruba, haobo, kailiu, sdong
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13587
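The create-if-missing behavior and the two test cases above can be mimicked with a small Python sketch. It is only an illustration: `get_or_create_db_id` is a hypothetical helper, the on-disk file name `DBID` is taken from the summary wording rather than from the code, and Python's `uuid` module stands in for Boost.

```python
import os
import tempfile
import uuid


def get_or_create_db_id(db_dir, filename="DBID"):
    """Return the id stored in db_dir, generating and persisting one if absent."""
    path = os.path.join(db_dir, filename)
    if os.path.exists(path):
        with open(path) as f:
            return f.read().strip()
    db_id = str(uuid.uuid4())  # random uuid; stands in for the Boost generator
    with open(path, "w") as f:
        f.write(db_id)
    return db_id


with tempfile.TemporaryDirectory() as db_dir:
    first = get_or_create_db_id(db_dir)
    # Case 1: restart with the id file present, the id must not change.
    same_after_restart = get_or_create_db_id(db_dir) == first
    # Case 2: restart with the id file deleted, a different id appears.
    os.remove(os.path.join(db_dir, "DBID"))
    changed_after_delete = get_or_create_db_id(db_dir) != first
```

Because the id lives only in this file, a database restored from a backup that omits the file comes up with a new id, which is what lets callers detect the situation.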
  13. 18 Oct 2013, 1 commit
    • Universal Compaction to Have a Size Percentage Threshold To Decide Whether to Compress · 9edda370
      Committed by Siying Dong
      Summary:
      This patch adds an option for universal compaction that lets us compress output files only if the files compacted previously have not yet reached a specified ratio, to save CPU costs in some cases.

      Compression is always skipped for flushing. This is because the size information is not easy to evaluate in the flushing case. We can improve it later.
      
      Test Plan:
      add test
      DBTest.UniversalCompactionCompressRatio1 and DBTest.UniversalCompactionCompressRatio12
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13467
  14. 17 Oct 2013, 2 commits
    • Add appropriate LICENSE and Copyright message. · 9cd22109
      Committed by Dhruba Borthakur
      Summary:
      Add appropriate LICENSE and Copyright message.
      
      Test Plan:
      make check
      
      Reviewers:
      
      CC:
      
      Task ID: #
      
      Blame Rev:
    • Enable background flush thread by default and fix issues related to it · 073cbfc8
      Committed by Siying Dong
      Summary:
      Enable the background flush thread in this patch and fix unit tests by:
      (1) After a background flush, scheduling a background compaction if the condition is satisfied;
      (2) Fixing a bug where, if universal compaction is enabled and the number of levels is set to 0, compaction is not automatically triggered;
      (3) Fixing unit tests to wait for compaction to finish, instead of flush, before checking the compaction results.
      
      Test Plan: pass all unit tests
      
      Reviewers: haobo, xjin, dhruba
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13461
  15. 15 Oct 2013, 3 commits
    • Features in Transaction log iterator · fe371396
      Committed by Mayank Agarwal
      Summary:
      * Logstore requested a valid change: return an empty iterator, not an error, when there are no log files.
      * Changed the code to return the write batch containing the sequence number requested from GetUpdatesSince even if it lies in the middle. Earlier we used to return the next write batch. This also allows me to guarantee that no files played upon by the iterator are redundant. I mean the starting log file has at least a sequence number >= the sequence number requested from GetUpdatesSince.
      * Cleaned up redundant logic in Iterator::Next and made a new function SeekToStartSequence for greater readability and maintainability.
      * Modified a test in db_test accordingly.
      Please check the logic carefully and suggest improvements. I have a separate patch out for more improvements like restricting the reader to read only up to written sequences.
      
      Test Plan:
      * transaction log iterator tests in db_test,
      * db_repl_stress.
      * rocks_log_iterator_test in fbcode/wormhole/rocksdb/test - 2 tests thriving on hacks till now can get simplified
      * testing on the shadow setup for sigma with replication
      
      Reviewers: dhruba, haobo, kailiu, sdong
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13437
    • Add statistics to sst file · 86ef6c3f
      Committed by Kai Liu
      Summary:
      So far we only store key/value pairs and the bloom filter in the
      sst file. It would be great if we could store more metadata about
      the table itself, for example the entry size, bloom filter name, etc.

      This diff is the first step of this effort. It allows a table to keep the
      basic statistics mentioned in http://fburl.com/14995441, and also
      allows writing user-collected stats to the stats block.

      After this diff, we will figure out the interface that lets users collect the statistics they are interested in.
      
      Test Plan:
      1. Added several unit tests.
      2. Ran `make check` to ensure it doesn't break other tests.
      
      Reviewers: dhruba, haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13419
    • Change Function names from Compaction->Flush When they really mean Flush · 88f2f890
      Committed by Siying Dong
      Summary: While debugging the unit test failures caused by enabling the background flush thread, I felt the function names could be made clearer for people to understand. Also, once the names are fixed, some tests' bugs become obvious in many places (and some of those tests are failing). This patch cleans it up for future maintenance.
      
      Test Plan: Run test suites.
      
      Reviewers: haobo, dhruba, xjin
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13431
  16. 11 Oct 2013, 1 commit
  17. 09 Oct 2013, 1 commit
    • Add option for storing transaction logs in a separate dir · cbf4a064
      Committed by Naman Gupta
      Summary: In some cases, you might not want to store the data log (write ahead log) files in the same dir as the sst files. An example use case is leaf, which stores sst files in tmpfs and would like to save the log files in a separate dir (disk) to save memory.

      Test Plan: make all. Ran db_test. A few tests are failing: P2785018. If you guys don't see an obvious problem with the code, maybe somebody from the rocksdb team could help me debug the issue here. Running this on leaf worked well. I could see logs stored on disk and deleted appropriately after compactions. Obviously this is only one set of options. The unit tests cover different options. It seems like I'm missing some edge cases.
      
      Reviewers: dhruba, haobo, leveldb
      
      CC: xinyaohu, sumeet
      
      Differential Revision: https://reviews.facebook.net/D13239
  18. 06 Oct 2013, 2 commits
  19. 05 Oct 2013, 3 commits
  20. 04 Oct 2013, 2 commits
  21. 03 Oct 2013, 1 commit
    • Fix SIGSEGV issue in universal compaction · 658a3ce2
      Committed by Xing Jin
      Summary:
      We saw a SIGSEGV when setting options.num_levels=1 in universal compaction
      style. Dug into this issue for a while and finally found the root cause (thanks to Haobo for the discussion).
      
      Test Plan: Add new unit test. It throws SIGSEGV without this change. Also run "make all check".
      
      Reviewers: haobo, dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13251
  22. 29 Sep 2013, 1 commit
  23. 27 Sep 2013, 1 commit
  24. 13 Sep 2013, 1 commit
    • [RocksDB] Remove Log file immediately after memtable flush · 0e422308
      Committed by Haobo Xu
      Summary: As title. The DB log file life cycle is tied to the memtable it backs. Once the memtable is flushed to sst and committed, we should be able to delete the log file without holding the mutex. This is part of the bigger change to avoid FindObsoleteFiles at runtime. It deals with log files; sst files will be dealt with later.
      
      Test Plan: make check; db_bench
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11709
  25. 07 Sep 2013, 1 commit
    • Flush was hanging because the configured options specified that more than 1... · 32c965d4
      Committed by Dhruba Borthakur
      Flush was hanging because the configured options specified that more than 1 memtable needs to be merged.
      
      Summary:
      There is a config option called Options.min_write_buffer_number_to_merge
      that specifies the minimum number of write buffers to merge in memory
      before flushing to a file in L0. But when the db is
      being closed, we should not use this config; instead we should
      flush whatever write buffers are available at that time.
      
      Test Plan: Unit test attached.
      
      Reviewers: haobo, emayanke
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12717
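The fix boils down to a flush decision that ignores the merge threshold during shutdown. A minimal Python sketch, assuming a hypothetical `should_flush` predicate rather than the real memtable list code:

```python
def should_flush(num_pending_memtables, min_to_merge, shutting_down):
    """Decide whether pending (immutable) memtables should be flushed now.

    min_to_merge plays the role of Options.min_write_buffer_number_to_merge.
    """
    if num_pending_memtables == 0:
        return False  # nothing to flush
    if shutting_down:
        # On close, flush whatever is available instead of waiting for
        # more memtables that will never arrive.
        return True
    return num_pending_memtables >= min_to_merge
```

Without the `shutting_down` branch, a close with one pending memtable and a threshold of two would wait forever, which is the hang the commit title refers to.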
  26. 05 Sep 2013, 2 commits
    • Return pathname relative to db dir in LogFile and cleanup AppendSortedWalsOfType · aa5c897d
      Committed by Mayank Agarwal
      Summary: So that replication can just download from wherever LogFile.Pathname is pointing them.
      
      Test Plan: make all check; ./db_repl_stress
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12609
    • New ldb command to convert compaction style · 42c109cc
      Committed by Xing Jin
      Summary:
      Add new command "change_compaction_style" to ldb tool. For
      universal->level, it shows "nothing to do". For level->universal, it
      compacts all files into a single one and moves the file to level 0.
      
      Also add check for number of files at level 1+ when opening db with
      universal compaction style.
      
      Test Plan:
      'make all check'. New unit test for the internal conversion function. Also manually tested various
      commands like:
      
      ./ldb change_compaction_style --old_compaction_style=0
      --new_compaction_style=1 --db=/tmp/leveldbtest-3088/db_test
      
      Reviewers: haobo, dhruba
      
      Reviewed By: haobo
      
      CC: vamsi, emayanke
      
      Differential Revision: https://reviews.facebook.net/D12603
  27. 02 Sep 2013, 1 commit
    • Fix bug in Counters and record Sequencenumber using only TickerCount · c34271a5
      Committed by Mayank Agarwal
      Summary:
      The way counters/statistics are implemented in rocksdb demands that enum Tickers and TickerNameMap follow the same order; otherwise statistics exposed from fbcode/rocks get out of sync. Two prefix counters had violated this order, and when I built counters for fbcode/mcrocksdb, statistics for the sequence number were appearing out of sync.
      The other change is to record the sequence number using SetTickerCount only and not RecordTick. This is because rocks/utils, which uses the ServiceData::statistics function, understands statistics differently from rocksdb. In rocksdb there is just 1 counter per counter name, but in ServiceData there are 4 independent buckets for every counter name: Count, Sum, Average and Rate. SetTickerCount and RecordTick update the same variable in rocksdb but different buckets in ServiceData. Therefore, I had to choose one consistent function, either RecordTick or SetTickerCount, for the sequence number in rocksdb. I chose SetTickerCount because the statistics object in options passed during rocksdb-open is user-dependent and SetTickerCount makes sense there.
      There will be a corresponding diff to mcrocksdb in fbcode shortly.
      
      Test Plan: make all check; check ticker value using fprintfs
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12669
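The ordering requirement between the ticker enum and its name table is easy to check mechanically. The Python sketch below shows one such check; the enum members, the name strings, and `names_in_sync` are illustrative stand-ins, not the real RocksDB definitions.

```python
from enum import IntEnum


class Ticker(IntEnum):
    # Illustrative counters only; the real enum is much longer.
    BLOCK_CACHE_MISS = 0
    BLOCK_CACHE_HIT = 1
    SEQUENCE_NUMBER = 2


# The name table is read by position, so entry i must describe Ticker(i).
TICKER_NAME_MAP = [
    (Ticker.BLOCK_CACHE_MISS, "block.cache.miss"),
    (Ticker.BLOCK_CACHE_HIT, "block.cache.hit"),
    (Ticker.SEQUENCE_NUMBER, "sequence.number"),
]


def names_in_sync(name_map):
    """True when the table covers every ticker, in enum order."""
    if len(name_map) != len(Ticker):
        return False
    return all(int(ticker) == index for index, (ticker, _) in enumerate(name_map))
```

A check like this, run in a unit test, would flag the kind of misordering that the two prefix counters introduced.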