1. 11 Dec 2012 (2 commits)
  2. 08 Dec 2012 (1 commit)
    • GetUpdatesSince API to enable replication. · 80550089
      Committed by Abhishek Kona
      Summary:
      How it works:
      * GetUpdatesSince takes a SequenceNumber.
      * The LogFile whose first SequenceNumber is closest to, but not greater than, the requested SequenceNumber is found.
      * Seek in that LogFile until the requested SequenceNumber is found.
      * Return an iterator that returns the records one by one.
      
      Test Plan:
      * Test case included to check the good code path.
      * Will update with more test-cases.
      * Feedback required on test-cases.
      
      Reviewers: dhruba, emayanke
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D7119
      80550089
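      A minimal sketch of the lookup flow described in this commit, assuming the
      WAL files are kept sorted by their starting sequence number. LogFileInfo and
      FindStartingLog are illustrative names for this sketch, not the patch's
      actual types.

          #include <cstdint>
          #include <vector>

          using SequenceNumber = uint64_t;

          struct LogFileInfo {
            uint64_t file_number;
            SequenceNumber start_sequence;  // first sequence number stored in this WAL
          };

          // Pick the WAL whose starting sequence is the closest one at or below 'seq';
          // the caller then seeks within that file until 'seq' is reached and wraps
          // the remaining records in an iterator.
          const LogFileInfo* FindStartingLog(const std::vector<LogFileInfo>& sorted_logs,
                                             SequenceNumber seq) {
            const LogFileInfo* best = nullptr;
            for (const auto& log : sorted_logs) {  // sorted ascending by start_sequence
              if (log.start_sequence > seq) break;
              best = &log;
            }
            return best;
          }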
  3. 05 Dec 2012 (1 commit)
  4. 29 Nov 2012 (3 commits)
    • Move WAL files to archive directory, instead of deleting. · d4627e6d
      Committed by sheki
      Summary:
      Create a directory "archive" in the DB directory.
      During DeleteObsoleteFiles, move the WAL files (*.log) to the archive directory
      instead of deleting them.
      
      Test Plan: Created a DB using db_bench. Reopened it. Checked that the WAL files were moved.
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6975
      d4627e6d
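      A minimal sketch of the archiving step described above, assuming a
      POSIX-style rename and that the archive directory already exists;
      ArchiveWalFile is an illustrative helper, not the patch's code.

          #include <cstdio>   // std::rename
          #include <string>

          // Instead of unlinking an obsolete WAL, move it under <dbname>/archive.
          bool ArchiveWalFile(const std::string& dbname, const std::string& wal_name) {
            const std::string src = dbname + "/" + wal_name;            // e.g. 000012.log
            const std::string dst = dbname + "/archive/" + wal_name;
            return std::rename(src.c_str(), dst.c_str()) == 0;          // error handling omitted
          }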
    • Fix all the lint errors. · d29f1819
      Committed by Abhishek Kona
      Summary:
      Scripted and removed all trailing spaces and converted all tabs to
      spaces.
      
      Also fixed other lint errors.
      All lint errors from this point forward should be taken seriously.
      
      Test Plan: make all check
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D7059
      d29f1819
    • Delete non-visible keys during a compaction even in the presence of snapshots. · 9a357847
      Committed by Dhruba Borthakur
      Summary:
      LevelDB should delete almost-new keys when a long-open snapshot exists.
      The previous behavior is to keep all versions that were created after the
      oldest open snapshot. This can lead to database size bloat for
      high-update workloads when there are long-open snapshots, for example when a
      long-open snapshot is used for logical backup. By "almost new" I mean that the
      key was updated more than once after the oldest snapshot was taken.
      
      If there are two snapshots with sequence numbers s1 and s2 (s1 < s2), and
      we find two versions of the same key k1 whose sequence numbers both lie
      strictly between s1 and s2, then the earlier version
      of k1 can be safely deleted because that version is not visible from any snapshot.
      
      Test Plan:
      unit test attached
      make clean check
      
      Differential Revision: https://reviews.facebook.net/D6999
      9a357847
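      A minimal sketch of the drop rule described above, assuming a sorted list of
      open snapshot sequence numbers and two consecutive versions of the same user
      key; CanDropOlderVersion is an illustrative name, not the patch's internals.

          #include <algorithm>
          #include <cstdint>
          #include <vector>

          using SequenceNumber = uint64_t;

          // The older of two consecutive versions (prev_seq < cur_seq) can be dropped
          // iff no snapshot falls in [prev_seq, cur_seq): such a snapshot would be the
          // only one that could still see the older version.
          bool CanDropOlderVersion(const std::vector<SequenceNumber>& snapshots,  // sorted ascending
                                   SequenceNumber prev_seq, SequenceNumber cur_seq) {
            auto it = std::lower_bound(snapshots.begin(), snapshots.end(), prev_seq);
            return it == snapshots.end() || *it >= cur_seq;
          }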
  5. 28 Nov 2012 (1 commit)
  6. 27 Nov 2012 (1 commit)
  7. 21 Nov 2012 (1 commit)
  8. 19 Nov 2012 (1 commit)
    • Enhance dbstress to simulate a hard crash · 62e7583f
      Committed by Dhruba Borthakur
      Summary:
      dbstress has an option to reopen the database. Make it such that the
      previous handle is not closed before we reopen; this simulates a
      situation similar to a process crash.
      
      Added a new API to DBImpl to remove the lock file.
      
      Test Plan: run db_stress
      
      Reviewers: emayanke
      
      Reviewed By: emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D6777
      62e7583f
  9. 14 Nov 2012 (1 commit)
  10. 10 Nov 2012 (2 commits)
  11. 08 Nov 2012 (2 commits)
    • Move filesize-based sorting to outside the Mutex · 95dda378
      Committed by Dhruba Borthakur
      Summary:
      When a new version is created, we sort all the files at every
      level based on their size. This is necessary because we want
      to compact the largest file first. The sorting takes quite a
      bit of CPU.
      
      Moved the sorting code to be outside the mutex. Also, the
      earlier code was sorting files at all levels but we do not
      need to sort the highest-number level because those files
      are never the cause of any compaction. To reduce sorting
      costs, we sort only the first few files in each level
      because it is likely that those are the only files in that
      level that will be picked for compaction.
      
      At steady state, I have seen that this patch increases
      throughput from 1500 writes/sec to 1700 writes/sec at the
      end of a 72 hour run. The cpu saving from not sorting the
      last level was not noticeable in this test run because
      there were only 100K files in the highest numbered level.
      I expect the cpu saving to be significant when the number of
      files is much higher.
      
      This is mostly an early preview and not ready for rigorous review.
      
      With this patch, the writes/sec is now bottlenecked not by the sorting code but by GetOverlappingInputs. I am working on a patch to optimize GetOverlappingInputs.
      
      Test Plan: make check
      
      Reviewers: MarkCallaghan, heyongqiang
      
      Reviewed By: heyongqiang
      
      Differential Revision: https://reviews.facebook.net/D6411
      95dda378
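      A minimal sketch of the partial sort described above: outside the mutex, order
      only the first few (largest) files in a level, since only those are likely to
      be picked for compaction. The struct and helper below are illustrative.

          #include <algorithm>
          #include <cstdint>
          #include <vector>

          struct FileMetaData {
            uint64_t number;
            uint64_t file_size;
          };

          void SortLargestFilesFirst(std::vector<FileMetaData*>* files,
                                     size_t files_to_sort) {
            const size_t n = std::min(files_to_sort, files->size());
            std::partial_sort(files->begin(), files->begin() + n, files->end(),
                              [](const FileMetaData* a, const FileMetaData* b) {
                                return a->file_size > b->file_size;  // largest first
                              });
          }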
    • Add a readonly db · 3fcf533e
      Committed by heyongqiang
      Summary: as subject
      
      Test Plan: run db_bench readrandom
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: MarkCallaghan, emayanke, sheki
      
      Differential Revision: https://reviews.facebook.net/D6495
      3fcf533e
  12. 07 Nov 2012 (2 commits)
  13. 06 Nov 2012 (1 commit)
    • Ability to invoke application hook for every key during compaction. · 5273c814
      Committed by Dhruba Borthakur
      Summary:
      There are certain use-cases where the application intends to
      delete older keys after a certain time period has expired.
      One option for those applications is to periodically scan the
      entire database and delete appropriate keys.
      
      A better way is to allow the application to hook into the
      compaction process. This patch allows the application to set
      a method callback for every key that is being compacted. If
      this method returns true, then the key is not preserved in
      the output of the compaction.
      
      Test Plan:
      This is mostly to preview the proposed new public api.
      Since it is a public api, please do due diligence on reviewing it.
      
      I will be writing test cases for this api in my next version of
      this patch.
      
      Reviewers: MarkCallaghan, heyongqiang
      
      Reviewed By: heyongqiang
      
      CC: sheki, adsharma
      
      Differential Revision: https://reviews.facebook.net/D6285
      5273c814
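      A minimal sketch of what such a per-key hook could look like; the callback
      signature, the expiry-in-value layout, and the DropIfExpired example are all
      assumptions made for illustration, not the api proposed in the patch.

          #include <cstdint>
          #include <cstring>
          #include <ctime>
          #include <string>

          // Invoked for every key seen during compaction; returning true means the
          // key is not preserved in the compaction output.
          using CompactionKeyFilter = bool (*)(int level, const std::string& key,
                                               const std::string& existing_value);

          // Example hook: drop entries whose value starts with an 8-byte expiry
          // timestamp (seconds since epoch) that has already passed.
          bool DropIfExpired(int /*level*/, const std::string& /*key*/,
                             const std::string& value) {
            if (value.size() < sizeof(uint64_t)) return false;
            uint64_t expiry = 0;
            std::memcpy(&expiry, value.data(), sizeof(expiry));
            return expiry != 0 &&
                   expiry < static_cast<uint64_t>(std::time(nullptr));
          }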
  14. 02 Nov 2012 (1 commit)
  15. 30 Oct 2012 (4 commits)
    • Use timer to measure sleep rather than assume it is 1000 usecs · 3e7e2692
      Committed by Mark Callaghan
      Summary:
      This makes the stall timers in MakeRoomForWrite more accurate by timing
      the sleeps. From looking at the logs the real sleep times are usually
      about 2000 usecs each when SleepForMicros(1000) is called. The modified LOG messages are:
      2012/10/29-12:06:33.271984 2b3cc872f700 delaying write 13 usecs for level0_slowdown_writes_trigger
      2012/10/29-12:06:34.688939 2b3cc872f700 delaying write 1728 usecs for rate limits with max score 3.83
      
      Test Plan:
      run db_bench, look at DB/LOG
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6297
      3e7e2692
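      A minimal sketch of timing the stall rather than assuming it lasts 1000 usecs,
      using the Env clock that LevelDB already provides; accumulating the result
      into the stall counters is left out.

          #include <cstdint>
          #include "leveldb/env.h"

          // Returns how long the write thread actually stalled, in microseconds.
          uint64_t TimedStall(leveldb::Env* env) {
            const uint64_t start = env->NowMicros();
            env->SleepForMicroseconds(1000);      // the requested 1000 usec delay
            return env->NowMicros() - start;      // often ~2000 usecs in practice
          }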
    • Allow having different compression algorithms on different levels. · 321dfdc3
      Committed by Dhruba Borthakur
      Summary:
      The leveldb API is enhanced to support different compression algorithms at
      different levels.
      
      This adds the option min_level_to_compress to db_bench that specifies
      the minimum level for which compression should be done when
      compression is enabled. This can be used to disable compression for levels
      0 and 1 which are likely to suffer from stalls because of the CPU load
      for memtable flushes and (L0,L1) compaction.  Level 0 is special as it
      gets frequent memtable flushes. Level 1 is special as it frequently
      gets all:all file compactions between it and level 0. But all other levels
      could be the same. For any level N where N > 1, the rate of sequential
      IO for that level should be the same. The last level is the
      exception because it might not be full and because files from it are
      not read to compact with the next larger level.
      
      The same amount of time will be spent doing compaction at any
      level N excluding N=0, 1 or the last level. By this standard all
      of those levels should use the same compression. The difference is that
      the loss (using more disk space) from a faster compression algorithm
      is less significant for N=2 than for N=3. So we might be willing to
      trade disk space for faster write rates with no compression
      for L0 and L1, snappy for L2, zlib for L3. Using a faster compression
      algorithm for the mid levels also allows us to reclaim some cpu
      without trading off much loss in disk space overhead.
      
      Also note that little is to be gained by compressing levels 0 and 1. For
      a 4-level tree they account for 10% of the data. For a 5-level tree they
      account for 1% of the data.
      
      With compression enabled:
      * memtable flush rate is ~18MB/second
      * (L0,L1) compaction rate is ~30MB/second
      
      With compression enabled but min_level_to_compress=2
      * memtable flush rate is ~320MB/second
      * (L0,L1) compaction rate is ~560MB/second
      
      This practically takes the same code from https://reviews.facebook.net/D6225
      but makes the leveldb api more general purpose with a few additional
      lines of code.
      
      Test Plan: make check
      
      Differential Revision: https://reviews.facebook.net/D6261
      321dfdc3
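      A self-contained sketch of the min_level_to_compress idea from the summary:
      levels below the threshold stay uncompressed and the rest share one codec.
      The enum and helper below are illustrative, not this patch's option layout.

          #include <vector>

          enum CompressionType { kNoCompression, kSnappyCompression, kZlibCompression };

          std::vector<CompressionType> BuildPerLevelCompression(int num_levels,
                                                                int min_level_to_compress,
                                                                CompressionType codec) {
            std::vector<CompressionType> per_level(num_levels, codec);
            for (int i = 0; i < min_level_to_compress && i < num_levels; ++i) {
              per_level[i] = kNoCompression;  // e.g. leave L0 and L1 uncompressed
            }
            return per_level;
          }

          // Example from the summary: no compression for L0/L1, snappy from L2 on
          // (zlib for the deeper levels would be a further refinement):
          //   auto levels = BuildPerLevelCompression(7, 2, kSnappyCompression);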
    • Add more rates to db_bench output · acc8567b
      Committed by Mark Callaghan
      Summary:
      Adds the "MB/sec in" and "MB/sec out" to this line:
      Amplification: 1.7 rate, 0.01 GB in, 0.02 GB out, 8.24 MB/sec in, 13.75 MB/sec out
      
      Changes all values to be reported per interval and since test start for this line:
      ... thread 0: (10000,60000) ops and (19155.6,27307.5) ops/second in (0.522041,2.197198) seconds
      
      Test Plan:
      run db_bench
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6291
      acc8567b
    • Adds DB::GetNextCompaction and then uses that for rate limiting db_bench · 70c42bf0
      Committed by Mark Callaghan
      Summary:
      Adds a method that returns the score for the next level that most
      needs compaction. That method is then used by db_bench to rate limit threads.
      Threads are put to sleep at the end of each stats interval until the score
      is less than the limit. The limit is set via the --rate_limit=$double option.
      The specified value must be > 1.0. Also adds the option --stats_per_interval
      to enable additional metrics reported every stats interval.
      
      Test Plan:
      run db_bench
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6243
      70c42bf0
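      A minimal sketch of the rate limiting described above: at the end of each
      stats interval the benchmark thread sleeps until the compaction score drops
      below --rate_limit. The score getter is passed in as a stand-in for the new
      DB method, and the 100 ms polling interval is an arbitrary choice here.

          #include <chrono>
          #include <functional>
          #include <thread>

          void RateLimitWrites(const std::function<double()>& get_compaction_score,
                               double rate_limit /* must be > 1.0 */) {
            while (get_compaction_score() >= rate_limit) {
              std::this_thread::sleep_for(std::chrono::milliseconds(100));
            }
          }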
  16. 27 Oct 2012 (2 commits)
  17. 25 Oct 2012 (1 commit)
    • Improve statistics · e7206f43
      Committed by Mark Callaghan
      Summary:
      This adds more statistics to be reported by GetProperty("leveldb.stats").
      The new stats include time spent waiting on stalls in MakeRoomForWrite.
      This also includes the total amplification rate where that is:
          (#bytes of sequential IO during compaction) / (#bytes from Put)
      This also includes a lot more data for the per-level compaction report.
      * Rn(MB) - MB read from level N during compaction between levels N and N+1
      * Rnp1(MB) - MB read from level N+1 during compaction between levels N and N+1
      * Wnew(MB) - new data written to the level during compaction
      * Amplify - ( Write(MB) + Rnp1(MB) ) / Rn(MB)
      * Rn - files read from level N during compaction between levels N and N+1
      * Rnp1 - files read from level N+1 during compaction between levels N and N+1
      * Wnp1 - files written to level N+1 during compaction between levels N and N+1
      * NewW - new files written to level N+1 during compaction
      * Count - number of compactions done for this level
      
      This is the new output from DB::GetProperty("leveldb.stats"). The old output stopped at Write(MB)
      
                                     Compactions
      Level  Files Size(MB) Time(sec) Read(MB) Write(MB)  Rn(MB) Rnp1(MB) Wnew(MB) Amplify Read(MB/s) Write(MB/s)   Rn Rnp1 Wnp1 NewW Count
      -------------------------------------------------------------------------------------------------------------------------------------
        0        3        6        33        0       576       0        0      576    -1.0       0.0         1.3     0    0    0    0   290
        1      127      242       351     5316      5314     570     4747      567    17.0      12.1        12.1   287 2399 2685  286    32
        2      161      328        54      822       824     326      496      328     4.0       1.9         1.9   160  251  411  160   161
      Amplification: 22.3 rate, 0.56 GB in, 12.55 GB out
      Uptime(secs): 439.8
      Stalls(secs): 206.938 level0_slowdown, 0.000 level0_numfiles, 24.129 memtable_compaction
      
      Test Plan:
      run db_bench
      
      (cherry picked from commit ecdeead38f86cc02e754d0032600742c4f02fec8)
      
      Reviewers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6153
      e7206f43
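      As a worked example of the Amplify column, the level-2 row above gives
      Amplify = ( Write(MB) + Rnp1(MB) ) / Rn(MB) = (824 + 496) / 326 ≈ 4.0,
      and the headline "22.3 rate" is consistent with GB out over GB in:
      12.55 / 0.56 ≈ 22.4 (small differences come from rounding in the printed values).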
  18. 23 Oct 2012 (1 commit)
  19. 21 Oct 2012 (2 commits)
  20. 20 Oct 2012 (1 commit)
    • This is the mega-patch multi-threaded compaction · 1ca05843
      Committed by Dhruba Borthakur
      published in https://reviews.facebook.net/D5997.
      
      Summary:
      This patch allows compaction to occur in multiple background threads
      concurrently.
      
      If a manual compaction is issued, the system falls back to a
      single-compaction-thread model. This is done to ensure correctness
      and simplicity of code. When the manual compaction is finished,
      the system resumes its concurrent-compaction mode automatically.
      
      The updates to the manifest are done via a group-commit approach.
      
      Test Plan: run db_bench
      1ca05843
  21. 17 Oct 2012 (1 commit)
    • The deletion of obsolete files should not occur very frequently. · aa73538f
      Committed by Dhruba Borthakur
      Summary:
      The method DeleteObsoleteFiles is a very costly method, especially
      when the number of files in a system is large. It makes a list of
      all live files and then scans the directory to compute the diff.
      By default, this method is executed after every compaction run.
      
      This patch makes it such that DeleteObsoleteFiles is never
      invoked twice within a configured period.
      
      Test Plan: run all unit tests
      
      Reviewers: heyongqiang, MarkCallaghan
      
      Reviewed By: MarkCallaghan
      
      Differential Revision: https://reviews.facebook.net/D6045
      aa73538f
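      A minimal sketch of the throttling described above: skip the costly scan if it
      already ran within the configured period. The struct and field names are
      illustrative, not the patch's actual option.

          #include <cstdint>

          struct ObsoleteFileDeleter {
            uint64_t period_micros;       // configured minimum gap between scans
            uint64_t last_run_micros = 0;

            // Returns true (and records the time) only if enough time has passed.
            bool ShouldRun(uint64_t now_micros) {
              if (last_run_micros != 0 &&
                  now_micros - last_run_micros < period_micros) {
                return false;             // ran recently; skip this round
              }
              last_run_micros = now_micros;
              return true;
            }
          };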
  22. 26 Sep 2012 (1 commit)
  23. 22 Sep 2012 (1 commit)
    • Segfault in DoCompactionWork caused by buffer overflow · bb2dcd24
      Committed by Dhruba Borthakur
      Summary:
      The code was allocating 200 bytes on the stack but it
      writes 256 bytes into the array.
      
      0x8a8ea5 std::_Rb_tree<>::erase()
          @     0x7f134bee7eb0 (unknown)
          @           0x8a8ea5 std::_Rb_tree<>::erase()
          @           0x8a35d6 leveldb::DBImpl::CleanupCompaction()
          @           0x8a7810 leveldb::DBImpl::BackgroundCompaction()
          @           0x8a804d leveldb::DBImpl::BackgroundCall()
          @           0x8c4eff leveldb::(anonymous namespace)::PosixEnv::BGThreadWrapper()
          @     0x7f134b3c010d start_thread
          @     0x7f134bf9f10d clone
      
      Test Plan: run db_bench with overwrite option
      
      Reviewers: heyongqiang
      
      Reviewed By: heyongqiang
      
      Differential Revision: https://reviews.facebook.net/D5595
      bb2dcd24
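      A minimal illustration of the class of bug described above, not the actual
      DoCompactionWork code: the formatting call was told the buffer held 256 bytes
      while the stack array held 200, so long messages ran past the end. Bounding
      the write by sizeof(buf) is the usual fix.

          #include <cstdio>

          void LogCompactionError(const char* fname, long long seq) {
            char buf[200];
            // The buggy version passed a hard-coded 256 as the size argument here.
            std::snprintf(buf, sizeof(buf), "Compaction error: %s at sequence %lld",
                          fname, seq);
            std::puts(buf);
          }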
  24. 19 Sep 2012 (1 commit)
  25. 18 Sep 2012 (1 commit)
  26. 13 Sep 2012 (1 commit)
  27. 07 Sep 2012 (1 commit)
    • Put log in a separate dir · 0f43aa47
      Committed by heyongqiang
      Summary: Added a new option db_log_dir, which points to the log directory. Inside that directory, in order to make log names unique, the log file name is prefixed with the absolute path of the leveldb data dir.
      
      Test Plan: db_test
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D5205
      0f43aa47
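      A minimal sketch of the naming scheme described above, assuming '/' characters
      are simply replaced to embed the data dir path in the file name; the exact
      mangling used by the patch may differ.

          #include <string>

          std::string InfoLogFileName(const std::string& dbname,
                                      const std::string& db_log_dir) {
            if (db_log_dir.empty()) {
              return dbname + "/LOG";            // default: keep LOG inside the DB dir
            }
            std::string prefix = dbname;         // e.g. "/data/mydb" -> "_data_mydb"
            for (char& c : prefix) {
              if (c == '/') c = '_';
            }
            return db_log_dir + "/" + prefix + "_LOG";
          }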
  28. 30 Aug 2012 (1 commit)
  29. 29 Aug 2012 (1 commit)
    • merge 1.5 · a4f9b8b4
      Committed by heyongqiang
      Summary:
      
      as subject
      
      Test Plan:
      
      db_test table_test
      
      Reviewers: dhruba
      a4f9b8b4