1. 28 Aug 2015 (3 commits)
    • Merge pull request #701 from PraveenSinghRao/usewinapi_notcruntime · e2db15ef
      Igor Canadi authored
      Remove usage of C runtime API that has a file handle limitation
    • Fix DBTest.ApproximateMemoryUsage · e853191c
      Andres Noetzli authored
      Summary:
      This patch fixes two issues in DBTest.ApproximateMemoryUsage:
      - It was possible that a flush happened between getting the two properties in
        Phase 1, resulting in different numbers for the properties and failing the
        assertion. This is fixed by waiting for the flush to finish before getting
        the properties.
      - There was a similar issue in Phase 2 and additionally there was an issue that
        rocksdb.size-all-mem-tables was not monotonically increasing because it was
        possible that a flush happened just after getting the properties and then
        another flush just before getting the properties in the next round. In this
        situation, the reported memory usage decreased. This is fixed by forcing a
        flush before getting the properties.
      
      Note: during testing, I found that kFlushesPerRound does not seem very
      accurate. I added a TODO for this and it would be great to get some input on
      what to do there.
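
      A minimal sketch of the fixed pattern, assuming the helpers available in DBTest
      (TEST_WaitForFlushMemTable(), GetIntProperty() and the property names are real
      RocksDB APIs; the surrounding assertions are illustrative, not the patch itself):

          // Drain pending flushes first so both property reads observe the same
          // memtable state.
          uint64_t unflushed_mem = 0;
          uint64_t all_mem = 0;
          ASSERT_OK(dbfull()->TEST_WaitForFlushMemTable());
          ASSERT_TRUE(db_->GetIntProperty("rocksdb.cur-size-all-mem-tables", &unflushed_mem));
          ASSERT_TRUE(db_->GetIntProperty("rocksdb.size-all-mem-tables", &all_mem));
          // size-all-mem-tables also counts memtables pinned by snapshots/iterators,
          // so it should never be smaller than the unflushed-only number.
          ASSERT_GE(all_mem, unflushed_mem);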
      
      Test Plan:
      The first issue can be made more likely to trigger by inserting a
      `usleep(10000);` between the calls to GetIntProperty() in Phase 1.
      The second issue can be made more likely to trigger by inserting a
      `if (r != 0) usleep(10000);` before the calls to GetIntProperty() and a
      `usleep(10000);` after the calls.
      Then execute make db_test && ./db_test --gtest_filter=DBTest.ApproximateMemoryUsage
      
      Reviewers: rven, yhchiang, igor, sdong, anthony
      
      Reviewed By: anthony
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D45675
    • Merge pull request #699 from OpenChannelSSD/to_fb_master · 7c916a5d
      Siying Dong authored
      Helper functions to support direct IO
  2. 27 Aug 2015 (8 commits)
    • Helper functions to support direct IO · 0886f4f6
      Javier González authored
      Summary:
      This patch adds the helper functions and variables to allow a backend
      implementing WritableFile to support direct IO when persisting a
      memtable.
      
      Test Plan:
      Since there is no upstream implementation of WritableFile supporting
      direct IO, the new behavior is disabled.
      
      Tests should be provided by the backend implementing WritableFile.
    • 7e327980
    • Add argument --show_table_properties to db_bench · 8ef0144e
      Yueh-Hsuan Chiang authored
      Summary:
      Add argument --show_table_properties to db_bench
      
        -show_table_properties (If true, then per-level table properties will be
          printed on every stats-interval when stats_interval is set and
          stats_per_interval is on.) type: bool default: false
      
      Test Plan:
      ./db_bench --show_table_properties=1 --stats_interval=100000 --stats_per_interval=1
      ./db_bench --show_table_properties=1 --stats_interval=100000 --stats_per_interval=1 --num_column_families=2
      
      Sample Output:
      
          Compaction Stats [column_family_name_000001]
          Level    Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) Stall(cnt)  KeyIn KeyDrop
          ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
            L0      3/0          5   0.8      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0     86.3         0        17    0.021          0       0      0
            L1      5/0          9   0.9      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0         0         0    0.000          0       0      0
            L2      9/0         16   0.2      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0         0         0    0.000          0       0      0
           Sum     17/0         31   0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0     86.3         0        17    0.021          0       0      0
           Int      0/0          0   0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0     83.9         0         2    0.022          0       0      0
          Flush(GB): cumulative 0.030, interval 0.004
          Stalls(count): 0 level0_slowdown, 0 level0_numfiles, 0 memtable_compaction, 0 leveln_slowdown_soft, 0 leveln_slowdown_hard
      
          Level[0]: # data blocks=2571; # entries=84813; raw key size=2035512; raw average key size=24.000000; raw value size=8481300; raw average value size=100.000000; data block size=5690119; index block size=82415; filter block size=0; (estimated) table size=5772534; filter policy name=N/A;
          Level[1]: # data blocks=4285; # entries=141355; raw key size=3392520; raw average key size=24.000000; raw value size=14135500; raw average value size=100.000000; data block size=9487353; index block size=137377; filter block size=0; (estimated) table size=9624730; filter policy name=N/A;
          Level[2]: # data blocks=7713; # entries=254439; raw key size=6106536; raw average key size=24.000000; raw value size=25443900; raw average value size=100.000000; data block size=17077893; index block size=247269; filter block size=0; (estimated) table size=17325162; filter policy name=N/A;
          Level[3]: # data blocks=0; # entries=0; raw key size=0; raw average key size=0.000000; raw value size=0; raw average value size=0.000000; data block size=0; index block size=0; filter block size=0; (estimated) table size=0; filter policy name=N/A;
          Level[4]: # data blocks=0; # entries=0; raw key size=0; raw average key size=0.000000; raw value size=0; raw average value size=0.000000; data block size=0; index block size=0; filter block size=0; (estimated) table size=0; filter policy name=N/A;
          Level[5]: # data blocks=0; # entries=0; raw key size=0; raw average key size=0.000000; raw value size=0; raw average value size=0.000000; data block size=0; index block size=0; filter block size=0; (estimated) table size=0; filter policy name=N/A;
          Level[6]: # data blocks=0; # entries=0; raw key size=0; raw average key size=0.000000; raw value size=0; raw average value size=0.000000; data block size=0; index block size=0; filter block size=0; (estimated) table size=0; filter policy name=N/A;
      
      Reviewers: anthony, IslamAbdelRahman, MarkCallaghan, sdong, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D45651
    • ColumnFamilyOptions serialization / deserialization. · 1fb2abae
      Yueh-Hsuan Chiang authored
      Summary:
      This patch adds GetStringFromColumnFamilyOptions(), the inverse function
      of the existing GetColumnFamilyOptionsFromString(), and improves
      the implementation of GetColumnFamilyOptionsFromString().
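
      A hedged round-trip sketch of the new API; the parameter order of
      GetStringFromColumnFamilyOptions() and its header location are assumptions,
      since the serializer lives in an internal options helper at this point:

          #include <string>
          #include "rocksdb/options.h"

          void RoundTripColumnFamilyOptions() {
            rocksdb::ColumnFamilyOptions base_opts;
            std::string opts_str;
            // Serialize the options into a "name=value;name=value" string ...
            rocksdb::Status s =
                rocksdb::GetStringFromColumnFamilyOptions(&opts_str, base_opts);
            // ... then parse it back on top of a base options object.
            rocksdb::ColumnFamilyOptions parsed_opts;
            if (s.ok()) {
              s = rocksdb::GetColumnFamilyOptionsFromString(base_opts, opts_str, &parsed_opts);
            }
          }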
      
      Test Plan: Add a test in options_test.cc
      
      Reviewers: igor, sdong, anthony, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: noetzli, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D45009
    • ReadaheadRandomAccessFile -- userspace readahead · 5f4166c9
      Igor Canadi authored
      Summary:
      ReadaheadRandomAccessFile acts as a transparent layer on top of RandomAccessFile. When a Read() request is issued, it issues a much bigger request to the OS and caches the result. When a new request comes in and we already have the data cached, it doesn't have to issue any requests to the OS.
      
We add the ReadaheadRandomAccessFile layer only when a file is read during compactions.
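
      A rough sketch of the idea (not the actual ReadaheadRandomAccessFile code; the
      class name, members, and copy step below are illustrative):

          #include <algorithm>
          #include <cstring>
          #include <memory>
          #include "rocksdb/env.h"

          // Serve small reads from one larger cached read issued to the wrapped file.
          class ReadaheadSketch {
           public:
            ReadaheadSketch(std::unique_ptr<rocksdb::RandomAccessFile> file, size_t readahead_size)
                : file_(std::move(file)),
                  readahead_size_(readahead_size),
                  buf_(new char[readahead_size]) {}

            rocksdb::Status Read(uint64_t offset, size_t n, rocksdb::Slice* result) {
              if (offset >= buf_offset_ && offset + n <= buf_offset_ + buf_len_) {
                // Cache hit: no request to the OS at all.
                *result = rocksdb::Slice(buf_.get() + (offset - buf_offset_), n);
                return rocksdb::Status::OK();
              }
              // Cache miss: issue one bigger read and remember the result.
              rocksdb::Slice big;
              rocksdb::Status s = file_->Read(offset, readahead_size_, &big, buf_.get());
              if (!s.ok()) return s;
              if (big.data() != buf_.get()) {
                // The underlying file may return a pointer that is not our scratch buffer.
                memcpy(buf_.get(), big.data(), big.size());
              }
              buf_offset_ = offset;
              buf_len_ = big.size();
              *result = rocksdb::Slice(buf_.get(), std::min(n, buf_len_));
              return s;
            }

           private:
            std::unique_ptr<rocksdb::RandomAccessFile> file_;
            size_t readahead_size_;
            std::unique_ptr<char[]> buf_;
            uint64_t buf_offset_ = 0;
            size_t buf_len_ = 0;
          };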
      
      D45105 was incorrectly closed by Phabricator because I committed it to a separate branch (not master), so I'm resubmitting the diff.
      
      Test Plan: make check
      
      Reviewers: MarkCallaghan, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D45123
    • Mmap reads should not return an error when reading past the end of the file · 16ebe3a2
      Igor Canadi authored
      Summary:
Currently, mmap returns an IOError when the user tries to read data past the end of the file. This diff changes the behavior: now we return just the bytes that we can, and report the size we returned via the Slice result. This is consistent with the non-mmap behavior and with the pread() system call.
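
      An illustrative sketch of the clamped-read behavior (not the actual env code;
      `base` and `length` stand in for the mapped region and its size):

          #include <algorithm>
          #include "rocksdb/slice.h"
          #include "rocksdb/status.h"

          rocksdb::Status MmapRead(uint64_t offset, size_t n, rocksdb::Slice* result,
                                   const char* base, uint64_t length) {
            if (offset > length) {
              *result = rocksdb::Slice();
              return rocksdb::Status::IOError("offset past end of file");
            }
            // Clamp instead of failing: return the bytes we have and report the
            // size via the Slice, just as pread() would.
            n = std::min(n, static_cast<size_t>(length - offset));
            *result = rocksdb::Slice(base + offset, n);
            return rocksdb::Status::OK();
          }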
      
      This diff is taken out of D45123.
      
      Test Plan: make check
      
      Reviewers: sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D45645
    • DBIter to filter out extra keys with higher sequence numbers when changing direction... · d286b5df
      sdong authored
      DBIter to filter out extra keys with higher sequence numbers when changing direction from forward to backward
      
      Summary:
When DBIter changes its iterating direction from forward to backward, it might see extra keys with higher sequence IDs. With this commit, these keys are actively filtered out. It should fix the existing disabled tests in db_iter_test.
      
This may not be a perfect fix, but it has the least impact on the existing code, in order to be safe.
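
      A heavily simplified sketch of the filtering idea, assuming it lives inside
      DBIter so that iter_ (the internal iterator) and sequence_ (the snapshot
      sequence number) are members; the real change in db_iter.cc is more involved:

          // When switching from forward to backward iteration, step over internal
          // entries whose sequence number is newer than this iterator's snapshot.
          while (iter_->Valid()) {
            ParsedInternalKey ikey;
            if (ParseInternalKey(iter_->key(), &ikey) && ikey.sequence <= sequence_) {
              break;  // entry is visible to this iterator; stop skipping
            }
            iter_->Prev();  // too new (or unparsable): keep moving backward
          }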
      
      Test Plan:
      Enable existing tests and make sure they pass. Add a new test DBIterWithMergeIterTest.InnerMergeIteratorDataRace8.
      Also run all existing tests.
      
      Reviewers: yhchiang, rven, anthony, IslamAbdelRahman, kradhakrishnan, igor
      
      Reviewed By: igor
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D45567
    • Fix DBTest.GetProperty · 3795449c
      Andres Noetzli authored
      Summary:
DBTest.GetProperty was failing occasionally (see task #8131266). The reason was
that the test closed the database before the compaction was done. When the test
reopened the database, RocksDB would schedule a compaction which in turn
created table readers and led the test to fail the assertion that
rocksdb.estimate-table-readers-mem is 0. In most cases, GetIntProperty() of
rocksdb.estimate-table-readers-mem happened before the compaction created the
table readers, hiding the problem. This patch changes the
WaitForFlushMemTable() call to WaitForCompact(). WaitForFlushMemTable() is not
necessary because it is already being called a couple of lines earlier without
any insertions in between.
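
      A schematic of the fix (not the literal test body): make the check deterministic
      by waiting for background compactions, not just memtable flushes, before reading
      the property. TEST_WaitForCompact() and the property name are real; the rest is
      illustrative.

          ASSERT_OK(dbfull()->TEST_WaitForCompact());
          uint64_t table_readers_mem = 0;
          ASSERT_TRUE(db_->GetIntProperty("rocksdb.estimate-table-readers-mem",
                                          &table_readers_mem));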
      
      Test Plan:
      Insert `usleep(10000);` just after `Reopen(options);` on line 2333 to make the issue more likely, then run:
      make db_test && while ./db_test --gtest_filter=DBTest.GetProperty; do true; done
      
      Reviewers: rven, yhchiang, anthony, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D45603
  3. 26 Aug 2015 (12 commits)
  4. 25 Aug 2015 (4 commits)
    • Common base class for transactions · 20d1e547
      agiardullo authored
      Summary:
      As I keep adding new features to transactions, I keep creating more duplicate code.  This diff cleans this up by creating a base implementation class for Transaction and OptimisticTransaction to inherit from.
      
      The code in TransactionBase.h/.cc is all just copied from elsewhere.  The only entertaining part of this class worth looking at is the virtual TryLock method which allows OptimisticTransactions and Transactions to share the same common code for Put/Get/etc.
      
      The rest of this diff is mostly red and easy on the eyes.
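
      A very rough sketch of the shape described above; only the TryLock() idea comes
      from the commit, everything else here is illustrative rather than the real
      TransactionBase API:

          #include "rocksdb/slice.h"
          #include "rocksdb/status.h"

          class TransactionBaseSketch {
           public:
            virtual ~TransactionBaseSketch() {}

            // Shared write path: both transaction types funnel through TryLock().
            rocksdb::Status Put(const rocksdb::Slice& key, const rocksdb::Slice& /*value*/) {
              rocksdb::Status s = TryLock(key);
              // On success, the base class would record the write in its write batch.
              return s;
            }

           protected:
            // A pessimistic transaction takes a real lock here; an optimistic one only
            // remembers the key so it can be validated at commit time.
            virtual rocksdb::Status TryLock(const rocksdb::Slice& key) = 0;
          };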
      
      Test Plan: No functionality change.  existing tests pass.
      
      Reviewers: sdong, jkedgar, rven, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D45135
    • Fixing race condition in DBTest.DynamicMemtableOptions · 20508329
      Andres Noetzli authored
      Summary:
This patch fixes a race condition in DBTest.DynamicMemtableOptions. In rare cases,
      it was possible that the main thread would fill up both memtables before the flush
      job acquired its work. Then, the flush job was flushing both memtables together,
      producing only one L0 file while the test expected two. Now, the test waits for
      flushes to finish earlier, to make sure that the memtables are flushed in separate
      flush jobs.
      
      Test Plan:
      Insert "usleep(10000);" after "IOSTATS_SET_THREAD_POOL_ID(Env::Priority::HIGH);" in BGWorkFlush()
      to make the issue more likely. Then test with:
      make db_test && time while ./db_test --gtest_filter=*DynamicMemtableOptions; do true; done
      
      Reviewers: rven, sdong, yhchiang, anthony, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D45429
    • Remove an extra 's' from cur-size-all-mem-tabless · e46bcc08
      Igor Canadi authored
      Summary: As title
      
      Test Plan: make check
      
      Reviewers: yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D45447
    • Smarter purging during flush · 4ab26c5a
      Igor Canadi authored
      Summary:
      Currently, we only purge duplicate keys and deletions during flush if `earliest_seqno_in_memtable <= newest_snapshot`. This means that the newest snapshot happened before we first created the memtable. This is almost never true for MyRocks and MongoRocks.
      
      This patch makes purging during flush able to understand snapshots. The main logic is copied from compaction_job.cc, although the logic over there is much more complicated and extensive. However, we should try to merge the common functionality at some point.
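
      An illustrative reduction of the snapshot-aware rule (the real flush/compaction
      logic also handles deletions, merges, and more): an older version of a key can
      only be dropped if no snapshot still needs to see it.

          #include <cstdint>
          #include <vector>

          bool CanDropOlderVersion(uint64_t seq, uint64_t next_newer_seq,
                                   const std::vector<uint64_t>& snapshots) {
            for (uint64_t snapshot_seq : snapshots) {
              if (snapshot_seq >= seq && snapshot_seq < next_newer_seq) {
                return false;  // this snapshot sees `seq`, not the newer version
              }
            }
            return true;  // every snapshot predates `seq` or already sees the newer version
          }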
      
      I need this patch to implement no_overwrite_i_promise functionality for flush. We'll also need this to support SingleDelete() during Flush(). @yoshinorim requested the feature.
      
      Test Plan:
      make check
      I had to adjust some unit tests to understand this new behavior
      
      Reviewers: yhchiang, yoshinorim, anthony, sdong, noetzli
      
      Reviewed By: noetzli
      
      Subscribers: yoshinorim, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D42087
  5. 23 Aug 2015 (1 commit)
    • Fix benchmark report script · 4c81ac0c
      Mark Callaghan authored
      Summary:
      With --statistics, db_bench output now displays "Percentile" many times because
      read IO latency histograms were added, so the report only needs the last occurrence.
      
      Test Plan:
      run run_flash_bench.sh
      
      Reviewers: igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D45093
  6. 22 Aug 2015 (2 commits)
  7. 21 Aug 2015 (10 commits)
    • Add options.new_table_reader_for_compaction_inputs · 9130873a
      sdong authored
      Summary: Currently, compaction inputs share the same file descriptor and table reader as other foreground threads, which makes fadvise behave less predictably. Add options.new_table_reader_for_compaction_inputs to force creating a new file descriptor and a new table reader for compaction inputs.
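
      A hedged usage sketch; the option name comes from the commit, the surrounding
      setup is illustrative:

          rocksdb::Options options;
          // Give compaction inputs their own file descriptors and table readers.
          options.new_table_reader_for_compaction_inputs = true;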
      
      Test Plan: Add the option.
      
      Reviewers: rven, anthony, kradhakrishnan, IslamAbdelRahman, igor, yhchiang
      
      Reviewed By: igor
      
      Subscribers: igor, MarkCallaghan, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D43311
    • Add a counter about estimated pending compaction bytes · 07d2d341
      sdong authored
      Summary:
      Add a counter of estimated bytes the DB needs to compact for all the compactions to finish. Expose it as a DB Property.
In the future, we can use a threshold on this counter to replace the soft and hard rate limits. A single threshold on estimated compaction debt in bytes will be easier for users to reason about, when deciding when to slow down or stop writes, than the more abstract soft and hard rate limits.
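
      A hedged sketch of reading the counter through the DB property interface, given
      an open rocksdb::DB* db; the property name is an assumption (it matches the name
      used in later releases):

          uint64_t pending_compaction_bytes = 0;
          db->GetIntProperty("rocksdb.estimate-pending-compaction-bytes",
                             &pending_compaction_bytes);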
      
      Test Plan: Add unit tests
      
      Reviewers: IslamAbdelRahman, yhchiang, rven, kradhakrishnan, anthony, igor
      
      Reviewed By: igor
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D44205
    • Improve defaults for benchmarks · 41a0e281
      Mark Callaghan authored
      Summary:
      Changes include:
      * don't sync-on-commit for single writer thread in readwhile... tests
      * make default block size 8kb rather than 4kb to avoid too small blocks after compression
      * use snappy instead of zlib to avoid stalls from compression latency
      * disable statistics
      * use bytes_per_sync=8M to reduce throughput loss on disk
      * use open_files=-1 to reduce mutex contention
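
      The same defaults expressed as C++ options, for readers using the library
      directly; this is a sketch, since the benchmarks themselves set these via
      db_bench flags (sync-on-commit and statistics are benchmark-side settings not
      shown here):

          #include "rocksdb/options.h"
          #include "rocksdb/table.h"

          rocksdb::Options MakeBenchmarkLikeOptions() {
            rocksdb::Options options;
            options.compression = rocksdb::kSnappyCompression;  // snappy instead of zlib
            options.bytes_per_sync = 8 << 20;                   // 8MB
            options.max_open_files = -1;                        // open_files=-1
            rocksdb::BlockBasedTableOptions table_options;
            table_options.block_size = 8 * 1024;                // 8KB blocks instead of 4KB
            options.table_factory.reset(
                rocksdb::NewBlockBasedTableFactory(table_options));
            return options;
          }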
      
      Test Plan:
      run benchmark
      
      Reviewers: igor
      
      Reviewed By: igor
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D44961
    • Fixed a rare deadlock in DBTest.ThreadStatusFlush · a203b913
      Yueh-Hsuan Chiang authored
      Summary:
      Currently, ThreadStatusFlush uses two sync-points to ensure
      there is a flush running when GetThreadList() is called.
      However, one of the sync-points is inside the DB mutex, which could
      cause a deadlock if there is a concurrent DB::Get() call.

      This patch fixes the issue by moving the sync-point to a better
      place, where the flush job does not hold the mutex.
      
      Test Plan: db_test
      
      Reviewers: igor, sdong, anthony, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D45045
    • Merge pull request #695 from yuslepukhin/address_windows_build · 962aa642
      Siying Dong authored
      Address Windows build issues caused by introducing Subcompaction
    • More indent adjustment. · 5bf89076
      Dmitri Smirnov authored
    • Adjust indent · e2a9f43d
      Dmitri Smirnov authored
    • Merge branch 'address_windows_build' of https://github.com/yuslepukhin/rocksdb... · 6e9a260b
      Dmitri Smirnov authored
      Merge branch 'address_windows_build' of https://github.com/yuslepukhin/rocksdb into address_windows_build
    • Address Windows build issues · 1cac89c9
      Dmitri Smirnov authored
       Introduce SubCompactionState move functionality
       =delete copy functionality
       #ifdef SyncPoint usage in tests for Windows Release builds
    • Address Windows build issues · f25f06dd
      Dmitri Smirnov authored
        Introduce SubCompactionState move functionality
        =delete copy functionality
        #ifdef SyncPoint usage in tests for Windows Release builds