1. 24 Jan 2013 (1 commit)
    • Fix a number of object lifetime/ownership issues · 2fdf91a4
      Committed by Chip Turner
      Summary:
      Replace manual memory management with std::unique_ptr in a
      number of places; not exhaustive, but this fixes a few leaks with file
      handles as well as clarifies semantics of the ownership of file handles
      with log classes.
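
      As a minimal sketch of the ownership pattern this change describes
      (the class and file names here are illustrative, not the actual
      rocksdb code): a FILE* wrapped in std::unique_ptr with a custom
      deleter, so the log class that owns the handle closes it exactly
      once, including on early-exit error paths.

      ```cpp
      #include <cstdio>
      #include <memory>

      // Deleter so the unique_ptr closes the FILE* when its owner dies.
      struct FileCloser {
        void operator()(std::FILE* f) const {
          if (f != nullptr) std::fclose(f);
        }
      };
      using FileHandle = std::unique_ptr<std::FILE, FileCloser>;

      // A log class that takes sole ownership of its file handle: callers
      // can no longer leak it or close it behind the writer's back.
      class LogWriter {
       public:
        explicit LogWriter(FileHandle file) : file_(std::move(file)) {}
        void Append(const char* line) { std::fputs(line, file_.get()); }
       private:
        FileHandle file_;  // closed automatically in ~LogWriter()
      };

      int main() {
        FileHandle f(std::fopen("example.log", "w"));
        if (!f) return 1;
        LogWriter writer(std::move(f));  // ownership transferred explicitly
        writer.Append("hello\n");
        return 0;  // handle closed when writer goes out of scope
      }
      ```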
      
      Test Plan: db_stress, make check
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: zshao, leveldb, heyongqiang
      
      Differential Revision: https://reviews.facebook.net/D8043
  2. 18 Jan 2013 (2 commits)
    • Add counters to count gets and writes · 16903c35
      Committed by Abhishek Kona
      Summary: Add Tickers to count Writes and Gets
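
      As a rough sketch of the mechanism (RecordTick/GetTickerCount and the
      ticker names follow the later public statistics API and are
      assumptions here), a ticker is just an atomic counter bumped on each
      operation:

      ```cpp
      #include <atomic>
      #include <cstdint>
      #include <cstdio>

      // Hypothetical ticker ids; the real list lives in the stats header.
      enum Tickers : uint32_t { NUMBER_WRITES, NUMBER_GETS, TICKER_ENUM_MAX };

      class Statistics {
       public:
        Statistics() {
          for (auto& t : tickers_) t.store(0, std::memory_order_relaxed);
        }
        void RecordTick(Tickers t, uint64_t count = 1) {
          tickers_[t].fetch_add(count, std::memory_order_relaxed);
        }
        uint64_t GetTickerCount(Tickers t) const {
          return tickers_[t].load(std::memory_order_relaxed);
        }
       private:
        std::atomic<uint64_t> tickers_[TICKER_ENUM_MAX];
      };

      int main() {
        Statistics stats;
        stats.RecordTick(NUMBER_WRITES);  // would be called from DB::Write()
        stats.RecordTick(NUMBER_GETS);    // would be called from DB::Get()
        std::printf("writes=%llu gets=%llu\n",
                    static_cast<unsigned long long>(stats.GetTickerCount(NUMBER_WRITES)),
                    static_cast<unsigned long long>(stats.GetTickerCount(NUMBER_GETS)));
        return 0;
      }
      ```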
      
      Test Plan: make check
      
      Reviewers: dhruba, chip
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D7977
    • Fixed issues Valgrind found. · 3c3df740
      Committed by Kosie van der Merwe
      Summary:
      Found issues with `db_test` and `db_stress` when running valgrind.
      
      `DBImpl` had an issue where, if a compaction failed, the uninitialised file size of an output file was used. This manifested as the final call that logs to the LOG in `DoCompactionWork()` branching on uninitialized memory (all the way down in printf's innards).
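
      The shape of the bug, as a minimal hypothetical sketch (the struct
      and field names are illustrative): an output-file record whose size
      field was only assigned on the success path, so value-initializing it
      at declaration makes the failure path safe to log.

      ```cpp
      #include <cstdint>
      #include <cstdio>

      // Illustrative stand-in for a compaction output record: initialize
      // every field so a failed compaction never logs garbage.
      struct CompactionOutput {
        uint64_t number = 0;
        uint64_t file_size = 0;  // previously left unset on failure
      };

      int main() {
        CompactionOutput out;  // compaction "fails" before file_size is set
        std::printf("compacted to file %llu (%llu bytes)\n",
                    static_cast<unsigned long long>(out.number),
                    static_cast<unsigned long long>(out.file_size));  // defined: 0
        return 0;
      }
      ```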
      
      Test Plan:
      Ran `valgrind --track-origins=yes ./db_test` and `valgrind ./db_stress` to see if the issues disappeared.
      
      Ran `make check` to confirm there were no regressions.
      
      Reviewers: vamsi, dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D8001
  3. 17 Jan 2013 (1 commit)
    • Roll over manifest file. · 7d5a4383
      Committed by Abhishek Kona
      Summary:
      Check in LogAndApply if the file size is more than the limit set in
      Options.
      Things to consider: will this be expensive?
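
      A minimal sketch of the check, assuming a hypothetical
      max_manifest_file_size limit in Options and a caller inside
      LogAndApply (both names are assumptions here). On the cost question:
      the test itself is one integer comparison per LogAndApply, so the
      expense is confined to the rare rollover, which creates a new
      manifest and writes a full snapshot of the current version state.

      ```cpp
      #include <cstdint>

      // Hypothetical option; the real limit would live in leveldb::Options.
      struct Options {
        uint64_t max_manifest_file_size = 1ULL << 30;  // assumed default
      };

      // Called from LogAndApply after appending a version edit. The check
      // is a single comparison, so it is cheap; rolling over is the
      // expensive part, and it happens only once per size limit reached.
      bool ShouldRollManifest(uint64_t current_manifest_size,
                              const Options& opts) {
        return current_manifest_size > opts.max_manifest_file_size;
      }
      ```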
      
      Test Plan: make all check. Inputs on a new unit test?
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D7701
  4. 15 Jan 2013 (1 commit)
    • Various build cleanups/improvements · c0cb289d
      Committed by Chip Turner
      Summary:
      Specific changes:
      
      1) Turn on -Werror so all warnings are errors
      2) Fix some warnings the above now complains about
      3) Add proper dependency support so changing a .h file forces a .c file
      to rebuild
      4) Automatically use fbcode gcc on any internal machine rather than
      whatever system compiler is lying around
      5) Fix jemalloc so that it is once again used in the builds (it
      seemed like it wasn't being used)
      6) Fix issue where 'git' would fail in build_detect_version because of
      LD_LIBRARY_PATH being set in the third-party build system
      
      Test Plan:
      make, make check, make clean, touch a header file, make sure the
      rebuild happens as expected
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D7887
  5. 09 Jan 2013 (1 commit)
  6. 08 Jan 2013 (1 commit)
    • Added clearer error message for failure to create db directory in DBImpl::Recover() · d6e873f2
      Committed by Kosie van der Merwe
      Summary:
      Changed CreateDir() to CreateDirIfMissing() so a directory that already exists no longer causes an error.
      
      Fixed CreateDirIfMissing() and added Env.DirExists()
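
      A POSIX-flavored sketch of those semantics (these are standalone
      stand-ins, not the actual Env code): creation succeeds when the
      directory already exists, and the error message names the real
      problem instead of the confusing one about lock files.

      ```cpp
      #include <sys/stat.h>
      #include <sys/types.h>
      #include <cerrno>
      #include <cstring>
      #include <string>

      // Stand-in for Env::DirExists(): true iff the path is a directory.
      bool DirExists(const std::string& d) {
        struct stat st;
        return ::stat(d.c_str(), &st) == 0 && S_ISDIR(st.st_mode);
      }

      // Stand-in for Env::CreateDirIfMissing(): an existing directory is
      // fine, but an existing non-directory yields a clear message.
      bool CreateDirIfMissing(const std::string& d, std::string* err) {
        if (::mkdir(d.c_str(), 0755) == 0) return true;
        if (errno == EEXIST) {
          if (DirExists(d)) return true;  // already there: not an error
          *err = d + " exists but is not a directory";
          return false;
        }
        *err = "while mkdir " + d + ": " + std::strerror(errno);
        return false;
      }
      ```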
      
      Test Plan:
      make check to test for regressions
      
      Ran the following to check that the error message is not the confusing one about missing lock files:
      ./db_bench --db=dir/testdb
      
      After creating a file "testdb", ran the following to see if it failed with a sane error message:
      ./db_bench --db=testdb
      
      Reviewers: dhruba, emayanke, vamsi, sheki
      
      Reviewed By: emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D7707
  7. 20 Dec 2012 (1 commit)
    • Enhance ReadOnly mode to process all committed transactions. · f4c2b7cf
      Committed by Dhruba Borthakur
      Summary:
      Leveldb has an api OpenForReadOnly() that opens the database
      in readonly mode. This call had an option to not process the
      transaction log.  This patch removes this option and always
      processes all transactions that had been committed. It has
      been done in such a way that it does not create/write to
      any new files in the process. The invariant of "no-writes"
      to the leveldb data directory is still true.
      
      This enhancement allows multiple threads to open the same database
      in readonly mode and access all transactions that were committed
      right up to the OpenForReadOnly call.
      
      I changed the public API to match the new semantics because
      there are no users who are currently using this api.
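
      A usage sketch under the new semantics (the path and error handling
      are illustrative, and the exact OpenForReadOnly signature of this era
      may differ): the reader sees every transaction committed before the
      open, and nothing is written into the database directory.

      ```cpp
      #include <string>
      #include "leveldb/db.h"  // header path assumed for this era

      int main() {
        leveldb::Options options;
        leveldb::DB* db = nullptr;
        // Multiple processes/threads may do this concurrently: the call
        // takes no lock and creates no files, preserving "no-writes".
        leveldb::Status s =
            leveldb::DB::OpenForReadOnly(options, "/tmp/testdb", &db);
        if (!s.ok()) return 1;
        std::string value;
        s = db->Get(leveldb::ReadOptions(), "some-key", &value);
        delete db;
        return s.ok() ? 0 : 1;
      }
      ```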
      
      Test Plan: make clean check
      
      Reviewers: sheki
      
      Reviewed By: sheki
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D7479
  8. 18 Dec 2012 (3 commits)
    • Enhancements to rocksdb for better support for replication. · 3d1e92b0
      Committed by Dhruba Borthakur
      Summary:
      1. The OpenForReadOnly() call should not lock the db. This is useful
      so that multiple processes can open the same database concurrently
      for reading.
      2. GetUpdatesSince should not error out if the archive directory
      does not exist.
      3. A new constructor for WriteBatch that takes a serialized
      string as a parameter (see the sketch below).
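
      A replication-flavored sketch of item 3 (how the serialized bytes are
      extracted on the primary and shipped is out of scope here, and the
      header path is assumed): the replica rebuilds the batch directly from
      the wire bytes instead of replaying individual Put/Delete calls.

      ```cpp
      #include <string>
      #include "leveldb/write_batch.h"  // header path assumed for this era

      // bytes_from_primary is the serialized representation of a
      // WriteBatch, received over the replication channel.
      leveldb::WriteBatch RebuildBatch(const std::string& bytes_from_primary) {
        // The new constructor accepts the serialized representation as-is.
        return leveldb::WriteBatch(bytes_from_primary);
      }
      ```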
      
      Test Plan: make clean check
      
      Reviewers: sheki
      
      Reviewed By: sheki
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D7449
    • Added meta-database support. · 62d48571
      Committed by Kosie van der Merwe
      Summary:
      Added kMetaDatabase for meta-databases in db/filename.h along with
      supporting functions.
      Fixed the switch in DBImpl so that it also handles kMetaDatabase.
      Fixed DestroyDB() so that it can handle destroying meta-databases.
      
      Test Plan: make check
      
      Reviewers: sheki, emayanke, vamsi, dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D7245
    • Fix a bug where DestroyDB deletes a non-existent archive directory. · 2f0585fb
      Committed by Abhishek Kona
      Summary:
      C tests would sometimes fail because DestroyDB returned a failure
      Status when deleting an archival directory that had never been
      created (WAL_ttl_seconds = 0).
      
      Fix: Ignore the Status returned on Deleting Archival Directory.
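
      A minimal sketch of the fix with a stubbed Env (the names are
      stand-ins for the real code in DestroyDB): when WAL_ttl_seconds is 0
      the archive directory was never created, so a failed delete is
      expected and must not fail DestroyDB.

      ```cpp
      #include <string>

      // Stub standing in for the real Env; DeleteDir fails if the
      // directory is absent.
      struct Env {
        bool DeleteDir(const std::string& /*dir*/) { return false; }
      };

      void DestroyArchiveDir(Env* env, const std::string& dbname) {
        // Status intentionally ignored: with WAL_ttl_seconds == 0 the
        // archive directory does not exist and this delete may fail.
        (void)env->DeleteDir(dbname + "/archive");
      }
      ```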
      
      Test Plan: make check
      
      Reviewers: dhruba, emayanke
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D7395
  9. 12 Dec 2012 (1 commit)
  10. 11 Dec 2012 (2 commits)
  11. 08 Dec 2012 (1 commit)
    • GetUpdatesSince API to enable replication. · 80550089
      Committed by Abhishek Kona
      Summary:
      How it works:
      * GetUpdatesSince takes a SequenceNumber.
      * The LogFile whose first SequenceNumber is nearest to, but less than, the requested SequenceNumber is found.
      * Seek in that LogFile until the requested SequenceNumber is found.
      * Return an iterator that returns records one by one (see the sketch below).
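
      A usage sketch of the flow above (the iterator and result type names
      follow the later public API and are assumptions for this era): a
      replica asks for everything since its last applied sequence number
      and replays batch by batch.

      ```cpp
      #include <memory>
      #include "leveldb/db.h"  // header path and type names assumed

      void CatchUpReplica(leveldb::DB* db, leveldb::SequenceNumber last_applied) {
        std::unique_ptr<leveldb::TransactionLogIterator> iter;
        leveldb::Status s = db->GetUpdatesSince(last_applied, &iter);
        if (!s.ok()) return;  // e.g. the log was already deleted
        for (; iter->Valid(); iter->Next()) {
          leveldb::BatchResult result = iter->GetBatch();  // one record at a time
          // Apply result's write batch to the replica's DB here.
        }
      }
      ```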
      
      Test Plan:
      * Test case included to check the good code path.
      * Will update with more test-cases.
      * Feedback required on test-cases.
      
      Reviewers: dhruba, emayanke
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D7119
  12. 05 Dec 2012 (1 commit)
  13. 29 Nov 2012 (3 commits)
    • Move WAL files to archive directory, instead of deleting. · d4627e6d
      Committed by sheki
      Summary:
      Create a directory "archive" in the DB directory.
      During DeleteObsoleteFiles(), move the WAL files (*.log) to the
      archive directory instead of deleting them.
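
      A minimal sketch of the move (std::rename stands in for the Env call
      the real code would use; the paths are illustrative):

      ```cpp
      #include <cstdio>
      #include <string>

      // Instead of unlinking an obsolete WAL, rename it into
      // dbname/archive so replication readers can still consume it.
      bool ArchiveLogFile(const std::string& dbname, const std::string& fname) {
        const std::string src = dbname + "/" + fname;
        const std::string dst = dbname + "/archive/" + fname;
        return std::rename(src.c_str(), dst.c_str()) == 0;
      }
      ```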
      
      Test Plan: Created a DB using db_bench. Reopened it. Checked that the files moved.
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6975
    • Fix all the lint errors. · d29f1819
      Committed by Abhishek Kona
      Summary:
      Used a script to remove all trailing spaces and convert all tabs to
      spaces.
      
      Also fixed other lint errors.
      All lint errors from this point on should be taken seriously.
      
      Test Plan: make all check
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D7059
    • Delete non-visible keys during a compaction even in the presence of snapshots. · 9a357847
      Committed by Dhruba Borthakur
      Summary:
      LevelDB should delete almost-new keys when a long-open snapshot exists.
      The previous behavior is to keep all versions that were created after
      the oldest open snapshot. This can lead to database size bloat for
      high-update workloads when there are long-open snapshots, e.g. when a
      long-open snapshot is used for logical backup. By "almost new" I mean
      that the key was updated more than once after the oldest snapshot.
      
      If there were two snapshots with seq numbers s1 and s2 (s1 < s2), and if
      we find two instances of the same key k1 that lie entirely within s1 and
      s2 (i.e. s1 < k1 < s2), then the earlier version
      of k1 can be safely deleted because that version is not visible in any snapshot.
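
      A sketch of that visibility check (the helper names are illustrative;
      snapshots are kept sorted ascending): two versions of the same key
      collapse when they are first visible in the same snapshot, because no
      snapshot can then observe the older one.

      ```cpp
      #include <algorithm>
      #include <cstdint>
      #include <vector>

      using SequenceNumber = uint64_t;

      // Earliest snapshot that can see a version with sequence number seq,
      // i.e. the smallest snapshot sequence >= seq.
      SequenceNumber EarliestVisibleSnapshot(
          SequenceNumber seq, const std::vector<SequenceNumber>& snapshots,
          SequenceNumber no_snapshot_sentinel) {
        auto it = std::lower_bound(snapshots.begin(), snapshots.end(), seq);
        return it == snapshots.end() ? no_snapshot_sentinel : *it;
      }

      // If the older and the newer version of a key map to the same
      // earliest visible snapshot, every snapshot that could see the older
      // version also sees the newer one first, so the older one can go.
      bool OlderVersionIsObsolete(SequenceNumber older, SequenceNumber newer,
                                  const std::vector<SequenceNumber>& snapshots) {
        const SequenceNumber kNone = UINT64_MAX;
        return EarliestVisibleSnapshot(older, snapshots, kNone) ==
               EarliestVisibleSnapshot(newer, snapshots, kNone);
      }
      ```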
      
      Test Plan:
      unit test attached
      make clean check
      
      Differential Revision: https://reviews.facebook.net/D6999
  14. 28 Nov 2012 (1 commit)
  15. 27 Nov 2012 (1 commit)
  16. 21 Nov 2012 (1 commit)
  17. 19 Nov 2012 (1 commit)
    • Enhance db_stress to simulate a hard crash · 62e7583f
      Committed by Dhruba Borthakur
      Summary:
      db_stress has an option to reopen the database. Make it such that the
      previous handle is not closed before we reopen; this simulates a
      situation similar to a process crash.
      
      Added a new API to DBImpl to remove the lock file.
      
      Test Plan: run db_stress
      
      Reviewers: emayanke
      
      Reviewed By: emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D6777
  18. 14 Nov 2012 (1 commit)
  19. 10 Nov 2012 (2 commits)
  20. 08 Nov 2012 (2 commits)
    • Move filesize-based sorting outside the mutex · 95dda378
      Committed by Dhruba Borthakur
      Summary:
      When a new version is created, we sort all the files at every
      level based on their size. This is necessary because we want
      to compact the largest file first. The sorting takes quite a
      bit of CPU.
      
      Moved the sorting code to be outside the mutex. Also, the
      earlier code was sorting files at all levels, but we do not
      need to sort the highest-numbered level because those files
      are never the cause of any compaction. To reduce sorting
      costs, we sort only the first few files in each level
      because it is likely that those are the only files in that
      level that will be picked for compaction.
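
      A sketch of the two ideas (the prefix length is an illustrative
      constant; the real code publishes the result back into the Version
      under the DB mutex): copy the file list, sort only a prefix, largest
      first, and do all of it without holding the lock.

      ```cpp
      #include <algorithm>
      #include <cstdint>
      #include <vector>

      struct FileMetaData {
        uint64_t number = 0;
        uint64_t file_size = 0;
      };

      // Sort only the first kPrefix files by descending size, outside the
      // DB mutex; compaction picks from the front, so a full sort is
      // wasted work on large levels.
      std::vector<FileMetaData> SortForCompaction(std::vector<FileMetaData> files) {
        const size_t kPrefix = std::min<size_t>(50, files.size());  // illustrative
        std::partial_sort(files.begin(), files.begin() + kPrefix, files.end(),
                          [](const FileMetaData& a, const FileMetaData& b) {
                            return a.file_size > b.file_size;  // largest first
                          });
        return files;  // publish while holding the mutex
      }
      ```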
      
      At steady state, I have seen this patch increase
      throughput from 1500 writes/sec to 1700 writes/sec at the
      end of a 72 hour run. The cpu saving from not sorting the
      last level was not noticeable in this test run because
      there were only 100K files in the highest-numbered level.
      I expect the cpu saving to be significant when the number of
      files is much higher.
      
      This is mostly an early preview and not ready for rigorous review.
      
      With this patch, the writes/sec is now bottlenecked not by the sorting code but by GetOverlappingInputs. I am working on a patch to optimize GetOverlappingInputs.
      
      Test Plan: make check
      
      Reviewers: MarkCallaghan, heyongqiang
      
      Reviewed By: heyongqiang
      
      Differential Revision: https://reviews.facebook.net/D6411
    • Add a readonly db · 3fcf533e
      Committed by heyongqiang
      Summary: as subject
      
      Test Plan: run db_bench readrandom
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: MarkCallaghan, emayanke, sheki
      
      Differential Revision: https://reviews.facebook.net/D6495
  21. 07 Nov 2012 (2 commits)
  22. 06 Nov 2012 (1 commit)
    • Ability to invoke application hook for every key during compaction. · 5273c814
      Committed by Dhruba Borthakur
      Summary:
      There are certain use-cases where the application intends to
      delete older keys after a certain time period has expired.
      One option for those applications is to periodically scan the
      entire database and delete appropriate keys.
      
      A better way is to allow the application to hook into the
      compaction process. This patch allows the application to set
      a method callback for every key that is being compacted. If
      this method returns true, then the key is not preserved in
      the output of the compaction.
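
      A sketch of what such a hook could look like (the callback signature
      is an assumption; this proposal predates the CompactionFilter class
      that later formalized it): returning true drops the key from the
      compaction output.

      ```cpp
      #include <string>

      // Assumed hook shape: invoked for every key seen by the compaction;
      // returning true means the key is not written to the output.
      using CompactionKeyFilter = bool (*)(int level, const std::string& key,
                                           const std::string& existing_value);

      // Example policy: drop entries carrying an "expired:" key prefix, so
      // aged-out data disappears during compaction with no periodic scan.
      bool DropExpiredEntries(int /*level*/, const std::string& key,
                              const std::string& /*existing_value*/) {
        return key.rfind("expired:", 0) == 0;  // true => remove from output
      }
      ```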
      
      Test Plan:
      This is mostly to preview the proposed new public api.
      Since it is a public api, please do due diligence on reviewing it.
      
      I will be writing test cases for this api in my next version of
      this patch.
      
      Reviewers: MarkCallaghan, heyongqiang
      
      Reviewed By: heyongqiang
      
      CC: sheki, adsharma
      
      Differential Revision: https://reviews.facebook.net/D6285
  23. 02 Nov 2012 (1 commit)
  24. 30 Oct 2012 (4 commits)
    • Use timer to measure sleep rather than assume it is 1000 usecs · 3e7e2692
      Committed by Mark Callaghan
      Summary:
      This makes the stall timers in MakeRoomForWrite more accurate by timing
      the sleeps. From looking at the logs the real sleep times are usually
      about 2000 usecs each when SleepForMicros(1000) is called. The modified LOG messages are:
      2012/10/29-12:06:33.271984 2b3cc872f700 delaying write 13 usecs for level0_slowdown_writes_trigger
      2012/10/29-12:06:34.688939 2b3cc872f700 delaying write 1728 usecs for rate limits with max score 3.83
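
      A self-contained sketch of the measurement (std::chrono stands in for
      the Env clock the real code would use): time the sleep with a
      monotonic clock rather than assuming SleepForMicroseconds(1000)
      slept for exactly 1000 usecs.

      ```cpp
      #include <chrono>
      #include <cstdio>
      #include <thread>

      int main() {
        const auto start = std::chrono::steady_clock::now();
        std::this_thread::sleep_for(std::chrono::microseconds(1000));
        const auto usecs = std::chrono::duration_cast<std::chrono::microseconds>(
                               std::chrono::steady_clock::now() - start)
                               .count();
        // Typically prints ~2000 usecs, not the 1000 that was requested.
        std::printf("delaying write %lld usecs\n", static_cast<long long>(usecs));
        return 0;
      }
      ```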
      
      Test Plan:
      run db_bench, look at DB/LOG
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6297
    • Allow having different compression algorithms on different levels. · 321dfdc3
      Committed by Dhruba Borthakur
      Summary:
      The leveldb API is enhanced to support different compression algorithms at
      different levels.
      
      This adds the option min_level_to_compress to db_bench that specifies
      the minimum level for which compression should be done when
      compression is enabled. This can be used to disable compression for levels
      0 and 1 which are likely to suffer from stalls because of the CPU load
      for memtable flushes and (L0,L1) compaction.  Level 0 is special as it
      gets frequent memtable flushes. Level 1 is special as it frequently
      gets all:all file compactions between it and level 0. But all other levels
      could be the same. For any level N where N > 1, the rate of sequential
      IO for that level should be the same. The last level is the
      exception because it might not be full and because files from it are
      not read to compact with the next larger level.
      
      The same amount of time will be spent doing compaction at any
      level N excluding N=0, 1 or the last level. By this standard all
      of those levels should use the same compression. The difference is that
      the loss (using more disk space) from a faster compression algorithm
      is less significant for N=2 than for N=3. So we might be willing to
      trade disk space for faster write rates with no compression
      for L0 and L1, snappy for L2, zlib for L3. Using a faster compression
      algorithm for the mid levels also allows us to reclaim some cpu
      without trading off much loss in disk space overhead.
      
      Also note that little is to be gained by compressing levels 0 and 1. For
      a 4-level tree they account for 10% of the data. For a 5-level tree they
      account for 1% of the data.
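
      A configuration sketch of that policy (compression_per_level follows
      the option's later public name and is an assumption for this era of
      the tree):

      ```cpp
      #include "leveldb/options.h"  // header path assumed for this era

      // No compression on the flush/compaction hot path (L0, L1), a cheap
      // codec in the middle, a denser codec at the bottom where data is
      // coldest.
      leveldb::Options MakeTieredCompressionOptions() {
        leveldb::Options options;
        options.compression_per_level = {
            leveldb::kNoCompression,      // L0: frequent memtable flushes
            leveldb::kNoCompression,      // L1: frequent all:all compactions
            leveldb::kSnappyCompression,  // L2: fast, modest space overhead
            leveldb::kZlibCompression,    // L3+: densest, read least often
        };
        return options;
      }
      ```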
      
      With compression enabled:
      * memtable flush rate is ~18MB/second
      * (L0,L1) compaction rate is ~30MB/second
      
      With compression enabled but min_level_to_compress=2
      * memtable flush rate is ~320MB/second
      * (L0,L1) compaction rate is ~560MB/second
      
      This practically takes the same code from https://reviews.facebook.net/D6225
      but makes the leveldb api more general purpose with a few additional
      lines of code.
      
      Test Plan: make check
      
      Differential Revision: https://reviews.facebook.net/D6261
    • Add more rates to db_bench output · acc8567b
      Committed by Mark Callaghan
      Summary:
      Adds the "MB/sec in" and "MB/sec out" to this line:
      Amplification: 1.7 rate, 0.01 GB in, 0.02 GB out, 8.24 MB/sec in, 13.75 MB/sec out
      
      Changes all values to be reported per interval and since test start for this line:
      ... thread 0: (10000,60000) ops and (19155.6,27307.5) ops/second in (0.522041,2.197198) seconds
      
      Test Plan:
      run db_bench
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6291
    • Adds DB::GetNextCompaction and then uses that for rate limiting db_bench · 70c42bf0
      Committed by Mark Callaghan
      Summary:
      Adds a method that returns the score for the next level that most
      needs compaction. That method is then used by db_bench to rate limit threads.
      Threads are put to sleep at the end of each stats interval until the score
      is less than the limit. The limit is set via the --rate_limit=$double option.
      The specified value must be > 1.0. Also adds the option --stats_per_interval
      to enable additional metrics reported every stats interval.
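
      A sketch of the throttling loop described above (the score getter
      stands in for the method the patch adds, and the polling interval is
      illustrative): at the end of each stats interval a load thread sleeps
      until the score drops below --rate_limit.

      ```cpp
      #include <chrono>
      #include <functional>
      #include <thread>

      // Sleep at the end of a stats interval until the most-pressed
      // level's compaction score falls below the limit (must be > 1.0).
      void RateLimitThread(double rate_limit,
                           const std::function<double()>& next_compaction_score) {
        while (next_compaction_score() >= rate_limit) {
          std::this_thread::sleep_for(std::chrono::seconds(1));  // illustrative
        }
      }
      ```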
      
      Test Plan:
      run db_bench
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6243
  25. 27 Oct 2012 (2 commits)
  26. 25 Oct 2012 (1 commit)
    • Improve statistics · e7206f43
      Committed by Mark Callaghan
      Summary:
      This adds more statistics to be reported by GetProperty("leveldb.stats").
      The new stats include time spent waiting on stalls in MakeRoomForWrite.
      This also includes the total amplification rate where that is:
          (#bytes of sequential IO during compaction) / (#bytes from Put)
      This also includes a lot more data for the per-level compaction report.
      * Rn(MB) - MB read from level N during compaction between levels N and N+1
      * Rnp1(MB) - MB read from level N+1 during compaction between levels N and N+1
      * Wnew(MB) - new data written to the level during compaction
      * Amplify - ( Write(MB) + Rnp1(MB) ) / Rn(MB)
      * Rn - files read from level N during compaction between levels N and N+1
      * Rnp1 - files read from level N+1 during compaction between levels N and N+1
      * Wnp1 - files written to level N+1 during compaction between levels N and N+1
      * NewW - new files written to level N+1 during compaction
      * Count - number of compactions done for this level
      
      This is the new output from DB::GetProperty("leveldb.stats"). The old output stopped at Write(MB).
      
                                     Compactions
      Level  Files Size(MB) Time(sec) Read(MB) Write(MB)  Rn(MB) Rnp1(MB) Wnew(MB) Amplify Read(MB/s) Write(MB/s)   Rn Rnp1 Wnp1 NewW Count
      -------------------------------------------------------------------------------------------------------------------------------------
        0        3        6        33        0       576       0        0      576    -1.0       0.0         1.3     0    0    0    0   290
        1      127      242       351     5316      5314     570     4747      567    17.0      12.1        12.1   287 2399 2685  286    32
        2      161      328        54      822       824     326      496      328     4.0       1.9         1.9   160  251  411  160   161
      Amplification: 22.3 rate, 0.56 GB in, 12.55 GB out
      Uptime(secs): 439.8
      Stalls(secs): 206.938 level0_slowdown, 0.000 level0_numfiles, 24.129 memtable_compaction
      
      Test Plan:
      run db_bench
      
      (cherry picked from commit ecdeead38f86cc02e754d0032600742c4f02fec8)
      
      Reviewers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6153
  27. 23 Oct 2012 (1 commit)