1. 02 Nov, 2012 (1 commit)
  2. 30 Oct, 2012 (4 commits)
    • M
      Use timer to measure sleep rather than assume it is 1000 usecs · 3e7e2692
      Authored by Mark Callaghan
      Summary:
      This makes the stall timers in MakeRoomForWrite more accurate by timing
      the sleeps. From looking at the logs the real sleep times are usually
      about 2000 usecs each when SleepForMicros(1000) is called. The modified LOG messages are:
      2012/10/29-12:06:33.271984 2b3cc872f700 delaying write 13 usecs for level0_slowdown_writes_trigger
      2012/10/29-12:06:34.688939 2b3cc872f700 delaying write 1728 usecs for rate limits with max score 3.83
      
      Task ID: #
      
      Blame Rev:
      
      Test Plan:
      run db_bench, look at DB/LOG
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6297
      3e7e2692
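The change above can be sketched as follows; the function name is hypothetical, but the technique is exactly what the commit describes: time the sleep with a real clock instead of assuming it lasted the requested 1000 usecs (observed sleeps were often ~2000 usecs).

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Sketch with illustrative names: measure the actual sleep duration and
// charge the stall timer with that, not with the requested interval.
uint64_t DelayWriteMicros(uint64_t requested_usecs) {
  auto start = std::chrono::steady_clock::now();
  std::this_thread::sleep_for(std::chrono::microseconds(requested_usecs));
  auto end = std::chrono::steady_clock::now();
  // Real sleeps commonly overshoot; return the measured time.
  return static_cast<uint64_t>(
      std::chrono::duration_cast<std::chrono::microseconds>(end - start)
          .count());
}
```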
    • D
      Allow having different compression algorithms on different levels. · 321dfdc3
      Authored by Dhruba Borthakur
      Summary:
      The leveldb API is enhanced to support different compression algorithms at
      different levels.
      
      This adds the option min_level_to_compress to db_bench that specifies
      the minimum level for which compression should be done when
      compression is enabled. This can be used to disable compression for levels
      0 and 1 which are likely to suffer from stalls because of the CPU load
      for memtable flushes and (L0,L1) compaction.  Level 0 is special as it
      gets frequent memtable flushes. Level 1 is special as it frequently
      gets all:all file compactions between it and level 0. But all other levels
      could be the same. For any level N where N > 1, the rate of sequential
      IO for that level should be the same. The last level is the
      exception because it might not be full and because files from it are
      not read to compact with the next larger level.
      
      The same amount of time will be spent doing compaction at any
      level N excluding N=0, 1 or the last level. By this standard all
      of those levels should use the same compression. The difference is that
      the loss (using more disk space) from a faster compression algorithm
      is less significant for N=2 than for N=3. So we might be willing to
      trade disk space for faster write rates with no compression
      for L0 and L1, snappy for L2, zlib for L3. Using a faster compression
      algorithm for the mid levels also allows us to reclaim some cpu
      without trading off much loss in disk space overhead.
      
      Also note that little is to be gained by compressing levels 0 and 1. For
      a 4-level tree they account for 10% of the data. For a 5-level tree they
      account for 1% of the data.
      
      With compression enabled:
      * memtable flush rate is ~18MB/second
      * (L0,L1) compaction rate is ~30MB/second
      
      With compression enabled but min_level_to_compress=2
      * memtable flush rate is ~320MB/second
      * (L0,L1) compaction rate is ~560MB/second
      
      This practically takes the same code from https://reviews.facebook.net/D6225
      but makes the leveldb API more general purpose with a few additional
      lines of code.
      
      Test Plan: make check
      
      Differential Revision: https://reviews.facebook.net/D6261
      321dfdc3
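A minimal sketch of the per-level policy described above; the enum and function names are illustrative, not the actual leveldb API. Levels below min_level_to_compress skip compression to spare CPU during memtable flushes and (L0,L1) compactions.

```cpp
#include <cassert>

// Illustrative compression selection: no compression below the
// configured minimum level, the configured algorithm at or above it.
enum CompressionType { kNoCompression, kSnappyCompression, kZlibCompression };

CompressionType CompressionForLevel(int level, int min_level_to_compress,
                                    CompressionType configured) {
  if (level < min_level_to_compress) {
    return kNoCompression;  // L0/L1: avoid CPU cost on the hot write path
  }
  return configured;
}
```

With min_level_to_compress=2 this yields the arrangement discussed in the summary: no compression for L0 and L1, the configured algorithm for L2 and beyond.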
    • M
      Add more rates to db_bench output · acc8567b
      Authored by Mark Callaghan
      Summary:
      Adds the "MB/sec in" and "MB/sec out" to this line:
      Amplification: 1.7 rate, 0.01 GB in, 0.02 GB out, 8.24 MB/sec in, 13.75 MB/sec out
      
      Changes all values to be reported per interval and since test start for this line:
      ... thread 0: (10000,60000) ops and (19155.6,27307.5) ops/second in (0.522041,2.197198) seconds
      
      Task ID: #
      
      Blame Rev:
      
      Test Plan:
      run db_bench
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6291
      acc8567b
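The new "MB/sec in" and "MB/sec out" fields are just the amplification byte counts divided by elapsed time; a sketch with hypothetical names:

```cpp
#include <cassert>

// Illustrative computation of the added rate columns: GB counts are
// converted to MB and divided by the elapsed wall-clock seconds.
struct Rates {
  double mb_in_per_sec;
  double mb_out_per_sec;
};

Rates AmplificationRates(double gb_in, double gb_out, double elapsed_secs) {
  Rates r;
  r.mb_in_per_sec = gb_in * 1024.0 / elapsed_secs;
  r.mb_out_per_sec = gb_out * 1024.0 / elapsed_secs;
  return r;
}
```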
    • M
      Adds DB::GetNextCompaction and then uses that for rate limiting db_bench · 70c42bf0
      Authored by Mark Callaghan
      Summary:
      Adds a method that returns the score for the next level that most
      needs compaction. That method is then used by db_bench to rate limit threads.
      Threads are put to sleep at the end of each stats interval until the score
      is less than the limit. The limit is set via the --rate_limit=$double option.
      The specified value must be > 1.0. Also adds the option --stats_per_interval
      to enable additional metrics reported every stats interval.
      
      Task ID: #
      
      Blame Rev:
      
      Test Plan:
      run db_bench
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6243
      70c42bf0
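A hedged sketch of the rate-limiting loop described above: at the end of each stats interval the thread sleeps until the next-compaction score drops below --rate_limit. get_score and sleep_once stand in for the real calls.

```cpp
#include <cassert>
#include <functional>

// Illustrative loop: sleep repeatedly while the compaction score is at
// or above the limit; returns how many sleeps were taken.
int RateLimitLoop(double rate_limit,
                  const std::function<double()>& get_score,
                  const std::function<void()>& sleep_once) {
  int sleeps = 0;
  while (get_score() >= rate_limit) {
    sleep_once();
    ++sleeps;
  }
  return sleeps;
}
```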
  3. 27 10月, 2012 2 次提交
  4. 25 10月, 2012 1 次提交
    • M
      Improve statistics · e7206f43
      Authored by Mark Callaghan
      Summary:
      This adds more statistics to be reported by GetProperty("leveldb.stats").
      The new stats include time spent waiting on stalls in MakeRoomForWrite.
      This also includes the total amplification rate where that is:
          (#bytes of sequential IO during compaction) / (#bytes from Put)
      This also includes a lot more data for the per-level compaction report.
      * Rn(MB) - MB read from level N during compaction between levels N and N+1
      * Rnp1(MB) - MB read from level N+1 during compaction between levels N and N+1
      * Wnew(MB) - new data written to the level during compaction
      * Amplify - ( Write(MB) + Rnp1(MB) ) / Rn(MB)
      * Rn - files read from level N during compaction between levels N and N+1
      * Rnp1 - files read from level N+1 during compaction between levels N and N+1
      * Wnp1 - files written to level N+1 during compaction between levels N and N+1
      * NewW - new files written to level N+1 during compaction
      * Count - number of compactions done for this level
      
      This is the new output from DB::GetProperty("leveldb.stats"). The old output stopped at the Write(MB) column.
      
                                     Compactions
      Level  Files Size(MB) Time(sec) Read(MB) Write(MB)  Rn(MB) Rnp1(MB) Wnew(MB) Amplify Read(MB/s) Write(MB/s)   Rn Rnp1 Wnp1 NewW Count
      -------------------------------------------------------------------------------------------------------------------------------------
        0        3        6        33        0       576       0        0      576    -1.0       0.0         1.3     0    0    0    0   290
        1      127      242       351     5316      5314     570     4747      567    17.0      12.1        12.1   287 2399 2685  286    32
        2      161      328        54      822       824     326      496      328     4.0       1.9         1.9   160  251  411  160   161
      Amplification: 22.3 rate, 0.56 GB in, 12.55 GB out
      Uptime(secs): 439.8
      Stalls(secs): 206.938 level0_slowdown, 0.000 level0_numfiles, 24.129 memtable_compaction
      
      Task ID: #
      
      Blame Rev:
      
      Test Plan:
      run db_bench
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      (cherry picked from commit ecdeead38f86cc02e754d0032600742c4f02fec8)
      
      Reviewers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6153
      e7206f43
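The Amplify column defined above can be written directly as a formula: Amplify = (Write(MB) + Rnp1(MB)) / Rn(MB), with -1 when the level read nothing from below (as in the level-0 row of the sample output).

```cpp
#include <cassert>

// Per-level write amplification, per the definition in this commit.
double AmplifyRatio(double write_mb, double rnp1_mb, double rn_mb) {
  if (rn_mb <= 0.0) return -1.0;  // nothing read from level N (e.g. L0)
  return (write_mb + rnp1_mb) / rn_mb;
}
```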
  5. 23 Oct, 2012 (1 commit)
  6. 17 Oct, 2012 (1 commit)
    • D
      The deletion of obsolete files should not occur very frequently. · aa73538f
      Authored by Dhruba Borthakur
      Summary:
      The method DeleteObsoleteFiles is very costly, especially
      when the number of files in the system is large. It makes a list of
      all live files and then scans the directory to compute the diff.
      By default, this method is executed after every compaction run.
      
      This patch ensures that DeleteObsoleteFiles is never
      invoked twice within a configured period.
      
      Test Plan: run all unit tests
      
      Reviewers: heyongqiang, MarkCallaghan
      
      Reviewed By: MarkCallaghan
      
      Differential Revision: https://reviews.facebook.net/D6045
      aa73538f
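A hypothetical sketch of the throttle: the costly scan runs at most once per configured period, and calls that land inside the quiet window are skipped. The struct and method names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative throttle for a costly periodic operation.
struct ObsoleteFileThrottle {
  uint64_t period_micros;
  uint64_t last_run_micros;

  explicit ObsoleteFileThrottle(uint64_t period)
      : period_micros(period), last_run_micros(0) {}

  // Returns true when the scan should actually run now.
  bool ShouldRun(uint64_t now_micros) {
    if (last_run_micros != 0 &&
        now_micros - last_run_micros < period_micros) {
      return false;  // still inside the quiet period; skip the scan
    }
    last_run_micros = now_micros;
    return true;
  }
};
```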
  7. 26 Sep, 2012 (1 commit)
  8. 22 Sep, 2012 (1 commit)
    • D
      Segfault in DoCompactionWork caused by buffer overflow · bb2dcd24
      Authored by Dhruba Borthakur
      Summary:
      The code was allocating 200 bytes on the stack but it
      writes 256 bytes into the array.
      
          @     0x7f134bee7eb0 (unknown)
          @           0x8a8ea5 std::_Rb_tree<>::erase()
          @           0x8a35d6 leveldb::DBImpl::CleanupCompaction()
          @           0x8a7810 leveldb::DBImpl::BackgroundCompaction()
          @           0x8a804d leveldb::DBImpl::BackgroundCall()
          @           0x8c4eff leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper()
          @     0x7f134b3c010d start_thread
          @     0x7f134bf9f10d clone
      
      Test Plan: run db_bench with overwrite option
      
      Reviewers: heyongqiang
      
      Reviewed By: heyongqiang
      
      Differential Revision: https://reviews.facebook.net/D5595
      bb2dcd24
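The class of bug fixed here, sketched with illustrative names: formatting into a stack buffer smaller than the message overruns the stack. Using snprintf with the real buffer size bounds the write.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// snprintf never writes past bufsize, unlike sprintf into a buffer that
// is smaller than the formatted output (200 bytes vs 256 in the bug).
size_t FormatCompactionMsg(char* buf, size_t bufsize, int files) {
  int n = std::snprintf(buf, bufsize, "compacted %d files", files);
  return n < 0 ? 0 : static_cast<size_t>(n);
}
```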
  9. 19 Sep, 2012 (1 commit)
  10. 18 Sep, 2012 (1 commit)
  11. 13 Sep, 2012 (1 commit)
  12. 07 Sep, 2012 (1 commit)
    • H
      put log in a separate dir · 0f43aa47
      Authored by heyongqiang
      Summary: added a new option db_log_dir, which points to the directory for the info LOG. To keep log names unique within that directory, each log file name is prefixed with the absolute path of the leveldb data dir.
      
      Test Plan: db_test
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D5205
      0f43aa47
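A hedged sketch of the naming scheme described: the LOG file lives in db_log_dir, and its name is derived from the data dir's absolute path with the separators flattened, so logs from different databases never collide.

```cpp
#include <cassert>
#include <algorithm>
#include <string>

// Illustrative name construction (not the exact leveldb helper): turn
// the data dir path into a flat, unique prefix inside db_log_dir.
std::string InfoLogFileName(const std::string& db_log_dir,
                            const std::string& db_absolute_path) {
  std::string name = db_absolute_path;
  std::replace(name.begin(), name.end(), '/', '_');
  return db_log_dir + "/" + name + "_LOG";
}
```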
  13. 30 Aug, 2012 (1 commit)
  14. 29 Aug, 2012 (3 commits)
  15. 28 Aug, 2012 (1 commit)
    • D
      Introduce a new method Env->Fsync() that issues fsync (instead of fdatasync). · fc20273e
      Authored by Dhruba Borthakur
      Summary:
      Introduce a new method Env->Fsync() that issues fsync (instead of fdatasync).
      This is needed for data durability when running on ext3 filesystems.
      Added options to the benchmark db_bench to generate performance numbers
      with either fsync or fdatasync enabled.
      
      Cleaned up Makefile to build leveldb_shell only when building the thrift
      leveldb server.
      
      Test Plan: build and run benchmark
      
      Reviewers: heyongqiang
      
      Reviewed By: heyongqiang
      
      Differential Revision: https://reviews.facebook.net/D4911
      fc20273e
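The distinction behind the new entry point, sketched for POSIX/Linux: fdatasync flushes file data but may skip metadata such as the modification time, while fsync flushes both, which is what ext3 needs for durability.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Illustrative wrapper: choose between the two sync calls, as a
// separate Env::Fsync() vs the existing fdatasync-based sync would.
bool SyncFd(int fd, bool use_fsync) {
  return (use_fsync ? fsync(fd) : fdatasync(fd)) == 0;
}
```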
  16. 25 8月, 2012 1 次提交
    • H
      add more logs · 1c99b0a6
      Authored by heyongqiang
      Summary:
      
      as subject
      
      Test Plan: db_test
      
      Reviewers: dhruba
      1c99b0a6
  17. 23 Aug, 2012 (1 commit)
  18. 22 Aug, 2012 (3 commits)
  19. 18 Aug, 2012 (2 commits)
  20. 04 Aug, 2012 (1 commit)
  21. 07 Jul, 2012 (1 commit)
  22. 06 Jul, 2012 (1 commit)
  23. 28 Jun, 2012 (1 commit)
  24. 02 Jun, 2012 (1 commit)
  25. 17 Apr, 2012 (1 commit)
    • S
      Added bloom filter support. · 85584d49
      Authored by Sanjay Ghemawat
      In particular, we add a new FilterPolicy class.  An instance
      of this class can be supplied in Options when opening a
      database.  If supplied, the instance is used to generate
      summaries of keys (e.g., a bloom filter) which are placed in
      sstables.  These summaries are consulted by DB::Get() so we
      can avoid reading sstable blocks that are guaranteed to not
      contain the key we are looking for.
      
      This change provides one implementation of FilterPolicy
      based on bloom filters.
      
      Other changes:
      - Updated version number to 1.4.
      - Some build tweaks.
      - C binding for CompactRange.
      - A few more benchmarks: deleteseq, deleterandom, readmissing, seekrandom.
      - Minor .gitignore update.
      85584d49
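A minimal, self-contained bloom-filter toy illustrating the idea above: a compact key summary that can answer "definitely not present", letting DB::Get() skip an sstable block entirely. This uses two probes over 1024 bits; leveldb's real FilterPolicy implementation is considerably more careful.

```cpp
#include <cassert>
#include <bitset>
#include <functional>
#include <string>

// Toy bloom filter: false positives are possible, false negatives
// are not, so a "no" answer safely avoids a block read.
struct MiniBloom {
  std::bitset<1024> bits;

  static size_t Probe(const std::string& key, char seed) {
    return std::hash<std::string>{}(key + seed) % 1024;
  }
  void Add(const std::string& key) {
    bits.set(Probe(key, 1));
    bits.set(Probe(key, 2));
  }
  bool MayContain(const std::string& key) const {
    return bits.test(Probe(key, 1)) && bits.test(Probe(key, 2));
  }
};
```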
  26. 09 Mar, 2012 (2 commits)
  27. 26 Jan, 2012 (1 commit)
  28. 01 Nov, 2011 (1 commit)
    • H
      A number of fixes: · 36a5f8ed
      Authored by Hans Wennborg
      - Replace raw slice comparison with a call to user comparator.
        Added test for custom comparators.
      
      - Fix end of namespace comments.
      
      - Fixed bug in picking inputs for a level-0 compaction.
      
        When finding overlapping files, the covered range may expand
        as files are added to the input set.  We now correctly expand
        the range when this happens instead of continuing to use the
        old range.  For example, suppose L0 contains files with the
        following ranges:
      
            F1: a .. d
            F2:    c .. g
            F3:       f .. j
      
        and the initial compaction target is F3.  We used to search
        for range f..j which yielded {F2,F3}.  However we now expand
        the range as soon as another file is added.  In this case,
        when F2 is added, we expand the range to c..j and restart the
        search.  That picks up file F1 as well.
      
        This change fixes a bug related to deleted keys showing up
        incorrectly after a compaction as described in Issue 44.
      
      (Sync with upstream @25072954)
      36a5f8ed
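The level-0 fix above can be sketched as follows (illustrative names, single-character keys as in the F1/F2/F3 example): keep widening the key range and restarting the overlap search until no new file joins the input set.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct FileRange {
  std::string smallest, largest;
};

// Expand the input set until it stops growing: starting from f..j,
// F2 (c..g) joins and widens the range to c..j, which pulls in F1 (a..d).
std::vector<FileRange> ExpandL0Inputs(const std::vector<FileRange>& level0,
                                      std::string begin, std::string end) {
  std::vector<FileRange> inputs;
  bool grew = true;
  while (grew) {
    grew = false;
    inputs.clear();
    for (const FileRange& f : level0) {
      if (f.largest >= begin && f.smallest <= end) {
        inputs.push_back(f);
        if (f.smallest < begin) { begin = f.smallest; grew = true; }
        if (f.largest > end) { end = f.largest; grew = true; }
      }
    }
  }
  return inputs;
}
```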
  29. 06 Oct, 2011 (1 commit)
    • G
      A number of bugfixes: · 299ccedf
      Authored by Gabor Cselle
      - Added DB::CompactRange() method.
      
        Changed manual compaction code so it breaks up compactions of
        big ranges into smaller compactions.
      
        Changed the code that pushes the output of memtable compactions
        to higher levels to obey the grandparent constraint: i.e., we
        must never have a single file in level L that overlaps too
        much data in level L+1 (to avoid very expensive L-1 compactions).
      
        Added code to pretty-print internal keys.
      
      - Fixed bug where we would not detect overlap with files in
        level-0 because we were incorrectly using binary search
        on an array of files with overlapping ranges.
      
        Added "leveldb.sstables" property that can be used to dump
        all of the sstables and ranges that make up the db state.
      
      - Removing post_write_snapshot support.  Email to leveldb mailing
        list brought up no users, just confusion from one person about
        what it meant.
      
      - Fixing static_cast char to unsigned on BIG_ENDIAN platforms.
      
        Fixes Issue 35 and Issue 36.
      
      - Comment clarification to address leveldb Issue 37.
      
      - Change license in posix_logger.h to match other files.
      
      - A build problem where uint32 was used instead of uint32_t.
      
      Sync with upstream @24408625
      299ccedf
  30. 02 Sep, 2011 (1 commit)
    • G
      Bugfixes: for Get(), don't hold mutex while writing log. · 72630236
      Authored by gabor@google.com
      - Fix bug in Get: when it triggers a compaction, it could sometimes
        mark the compaction with the wrong level (if there was a gap
        in the set of levels examined for the Get).
      
      - Do not hold mutex while writing to the log file or to the
        MANIFEST file.
      
        Added a new benchmark that runs a writer thread concurrently with
        reader threads.
      
        Percentiles
        ------------------------------
        micros/op: avg  median 99   99.9  99.99  99.999 max
        ------------------------------------------------------
        before:    42   38     110  225   32000  42000  48000
        after:     24   20     55   65    130    1100   7000
      
      - Fixed race in optimized Get.  It should have been using the
        pinned memtables, not the current memtables.
      
      
      
      git-svn-id: https://leveldb.googlecode.com/svn/trunk@50 62dab493-f737-651d-591e-8d6aee1b9529
      72630236
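A hedged sketch of the locking change: the DB mutex is held on entry, is released around the slow log/MANIFEST write so readers and writers can make progress, and is reacquired before shared state is touched again. The string append stands in for the actual file I/O.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Illustrative pattern: drop the mutex for the duration of slow I/O.
// The caller holds mu on entry and still holds it on return.
void AppendLogUnlocked(std::mutex& mu, std::string& log,
                       const std::string& record) {
  mu.unlock();    // other threads may proceed during the write
  log += record;  // stands in for writing the log/MANIFEST file
  mu.lock();      // reacquire before returning to locked code
}
```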