1. 29 Nov 2012, 1 commit
    • Fix all the lint errors. · d29f1819
      Abhishek Kona 提交于
      Summary:
      Scripted and removed all trailing spaces and converted all tabs to
      spaces.
      
      Also fixed other lint errors.
      All lint errors from this point on should be taken seriously.
      
      Test Plan: make all check
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D7059
      d29f1819
  2. 27 Nov 2012, 1 commit
    • Fix broken test; some ldb commands can run without a db_ · 6caf3b8e
      Chip Turner committed
      Summary:
      It would appear our unit tests make use of code from ldb_cmd,
      and don't always require a valid database handle. D6855 was not aware
      that db_ could sometimes be NULL for such commands, and so it broke
      reduce_levels_test.
      
      This moves the check elsewhere to (at least) fix the 'ldb dump' case of
      segfaulting when it couldn't open a database.
      
      Test Plan: make check
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D6903
      6caf3b8e
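
      A minimal C++ sketch of the kind of guard this implies; LDBCommand,
      RequiresDB(), and DoCommand() are illustrative names, not necessarily
      the actual ldb_cmd identifiers:

      #include <cstdio>

      // Illustrative only: commands declare whether they need an open
      // database, and the runner checks db_ once before dispatching,
      // instead of every command assuming Open() succeeded.
      class LDBCommand {
       public:
        virtual ~LDBCommand() {}
        virtual bool RequiresDB() const { return true; }  // e.g. 'dump'
        virtual void DoCommand() = 0;

        void Run() {
          if (RequiresDB() && db_ == nullptr) {
            fprintf(stderr, "ldb: failed to open database\n");
            return;  // never dereference a NULL db_
          }
          DoCommand();
        }

       protected:
        void* db_ = nullptr;  // stand-in for the real DB handle type
      };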
  3. 22 Nov 2012, 2 commits
    • Fix ldb segfault and use static libsnappy for all builds · 879e45eb
      Chip Turner committed
      Summary:
      Link statically against snappy, using the gvfs one for facebook
      environments, and the bundled one otherwise.
      
      In addition, fix a few minor segfaults in ldb when it couldn't open the
      database, and update .gitignore to include a few other build artifacts.
      
      Test Plan: make check
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D6855
      879e45eb
    • Support taking a configurable number of files from the same level to compact... · 7632fdb5
      Dhruba Borthakur committed
      Support taking a configurable number of files from the same level to compact in a single compaction run.
      
      Summary:
      The compaction process takes some files from LevelK and
      merges them into LevelK+1. The number of files it picks from
      LevelK was capped in such a way that the total amount of
      data picked does not exceed the maxfilesize of that level.
      This essentially meant that only one file from LevelK
      was picked for a single compaction.
      
      For bulk loads, we would like to take many files from
      LevelK and compact them in a single compaction run.
      
      This patch introduces an option called 'source_compaction_factor'
      (similar to expanded_compaction_factor). It is a multiplier
      applied to the maxfilesize of that level to arrive
      at the limit used to throttle the number of source
      files from LevelK. For bulk loads, set source_compaction_factor
      to a very high number so that multiple files from the same
      level are picked for compaction in a single run.
      
      The default value of source_compaction_factor is 1, which
      preserves backward compatibility with existing compaction semantics.
      
      Test Plan: make clean check
      
      Reviewers: emayanke, sheki
      
      Reviewed By: emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D6867
      7632fdb5
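
      A sketch of the throttle described above; the function and field
      names are illustrative, not the actual compaction-picker code:

      #include <cstdint>
      #include <vector>

      struct FileMetaData {
        uint64_t file_size;
      };

      // Pick files from LevelK until their total size would exceed
      // maxfilesize * source_compaction_factor. With the default factor
      // of 1 this degenerates to the old one-file-per-compaction behavior.
      std::vector<FileMetaData*> PickSourceFiles(
          const std::vector<FileMetaData*>& level_files,
          uint64_t max_file_size, int source_compaction_factor) {
        std::vector<FileMetaData*> picked;
        const uint64_t limit = max_file_size * source_compaction_factor;
        uint64_t total = 0;
        for (FileMetaData* f : level_files) {
          if (!picked.empty() && total + f->file_size > limit) break;
          total += f->file_size;
          picked.push_back(f);
        }
        return picked;
      }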
  4. 21 Nov 2012, 1 commit
  5. 20 Nov 2012, 3 commits
    • Fix LDB dumpwal to print the messages as in the file. · 661dc157
      Abhishek Kona committed
      Summary:
      std::stringstream::clear() does not clear the stream's contents. It only
      sets some state flags. Who knew? With that fixed, the tool no longer
      prints the same messages again and again.
      
      Test Plan: ran it on a local db
      
      Reviewers: dhruba, emayanke
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6795
      661dc157
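
      The gotcha is easy to reproduce: std::stringstream::clear() only
      resets the stream's state flags, while str("") is what actually
      empties the buffer. A self-contained illustration:

      #include <iostream>
      #include <sstream>

      int main() {
        std::stringstream ss;
        ss << "record 1";
        ss.clear();                     // only resets the state flags
        ss << " record 2";
        std::cout << ss.str() << "\n";  // prints "record 1 record 2"

        ss.str("");                     // this discards the buffered contents
        ss << "record 3";
        std::cout << ss.str() << "\n";  // prints "record 3"
        return 0;
      }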
    • LDB can read WAL. · 30742e16
      Abhishek Kona committed
      Summary:
      Add option to read WAL and print a summary for each record.
      facebook task => #1885013
      
      E.g. output:
      ./ldb dump_wal --walfile=/tmp/leveldbtest-5907/dbbench/026122.log --header
      Sequence,Count,ByteSize
      49981,1,100033
      49981,1,100033
      49982,1,100033
      49981,1,100033
      49982,1,100033
      49983,1,100033
      49981,1,100033
      49982,1,100033
      49983,1,100033
      49984,1,100033
      49981,1,100033
      49982,1,100033
      
      Test Plan:
      Works. Run:
      ./ldb read_wal --wal-file=/tmp/leveldbtest-5907/dbbench/000078.log --header
      
      Reviewers: dhruba, heyongqiang
      
      Reviewed By: dhruba
      
      CC: emayanke, leveldb, zshao
      
      Differential Revision: https://reviews.facebook.net/D6675
      30742e16
    • Fix LDB dumpwal to print the messages as in the file. · b648401a
      Abhishek Kona committed
      Summary:
      std::stringstream::clear() does not clear the stream's contents. It only
      sets some state flags. Who knew? With that fixed, the tool no longer
      prints the same messages again and again.
      
      Test Plan: ran it on a local db
      
      Reviewers: dhruba, emayanke
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6795
      b648401a
  6. 17 Nov 2012, 1 commit
    • LDB can read WAL. · f5cdf931
      Abhishek Kona committed
      Summary:
      Add option to read WAL and print a summary for each record.
      facebook task => #1885013
      
      E.g. output:
      ./ldb dump_wal --walfile=/tmp/leveldbtest-5907/dbbench/026122.log --header
      Sequence,Count,ByteSize
      49981,1,100033
      49981,1,100033
      49982,1,100033
      49981,1,100033
      49982,1,100033
      49983,1,100033
      49981,1,100033
      49982,1,100033
      49983,1,100033
      49984,1,100033
      49981,1,100033
      49982,1,100033
      
      Test Plan:
      Works run
      ./ldb read_wal --wal-file=/tmp/leveldbtest-5907/dbbench/000078.log --header
      
      Reviewers: dhruba, heyongqiang
      
      Reviewed By: dhruba
      
      CC: emayanke, leveldb, zshao
      
      Differential Revision: https://reviews.facebook.net/D6675
      f5cdf931
  7. 14 Nov 2012, 1 commit
  8. 13 Nov 2012, 1 commit
    • Fix test failure of reduce_num_levels · c64796fd
      heyongqiang committed
      Summary:
      I changed the reduce_num_levels logic to avoid the compactRange() call if the current number of levels in use (levels that contain files) is smaller than the new number of levels.
      That change broke the assert in reduce_levels_test.
      
      Test Plan: run reduce_levels_test
      
      Reviewers: dhruba, MarkCallaghan
      
      Reviewed By: dhruba
      
      CC: emayanke, sheki
      
      Differential Revision: https://reviews.facebook.net/D6651
      c64796fd
  9. 11 Nov 2012, 1 commit
    • Compilation error while compiling with OPT=-g · 9c6c232e
      Dhruba Borthakur committed
      Summary:
      make clean check OPT=-g fails
      leveldb::DBStatistics::getTickerCount(leveldb::Tickers)’:
      ./db/db_statistics.h:34: error: ‘MAX_NO_TICKERS’ was not declared in this scope
      util/ldb_cmd.cc:255: warning: left shift count >= width of type
      
      Test Plan:
      make clean check OPT=-g
      
      9c6c232e
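
      The shift warning is the classic symptom of shifting a 32-bit int by
      32 or more bits; the usual fix is to widen the left operand first. An
      illustration of the pattern (not the actual util/ldb_cmd.cc code):

      #include <cstdint>

      uint64_t mask_bad(int n) {
        return 1 << n;            // int is 32-bit: undefined for n >= 32
      }

      uint64_t mask_good(int n) {
        return uint64_t{1} << n;  // 64-bit operand, valid for n up to 63
      }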
  10. 10 Nov 2012, 1 commit
  11. 07 Nov 2012, 1 commit
  12. 06 Nov 2012, 2 commits
    • Ability to invoke application hook for every key during compaction. · 5273c814
      Dhruba Borthakur committed
      Summary:
      There are certain use-cases where the application intends to
      delete older keys after they have expired a certain time period.
      One option for those applications is to periodically scan the
      entire database and delete appropriate keys.
      
      A better way is to allow the application to hook into the
      compaction process. This patch allows the application to set
      a method callback for every key that is being compacted. If
      this method returns true, then the key is not preserved in
      the output of the compaction.
      
      Test Plan:
      This is mostly to preview the proposed new public api.
      Since it is a public api, please do due diligence on reviewing it.
      
      I will be writing test cases for this api in my next version of
      this patch.
      
      Reviewers: MarkCallaghan, heyongqiang
      
      Reviewed By: heyongqiang
      
      CC: sheki, adsharma
      
      Differential Revision: https://reviews.facebook.net/D6285
      5273c814
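
      A sketch of what such a hook might look like; the callback type, the
      value encoding, and NowMicros() are assumptions for illustration, not
      the api proposed in D6285:

      #include <chrono>
      #include <cstdint>
      #include <cstring>
      #include <string>

      // Hypothetical per-key hook: return true to drop the key from the
      // compaction output, false to preserve it.
      using CompactionKeyFilter = bool (*)(int level, const std::string& key,
                                           const std::string& value);

      static uint64_t NowMicros() {
        using namespace std::chrono;
        return duration_cast<microseconds>(
                   system_clock::now().time_since_epoch()).count();
      }

      // Example: drop keys whose value starts with an 8-byte expiry
      // timestamp (an illustrative encoding, not a leveldb convention).
      bool DropExpired(int /*level*/, const std::string& /*key*/,
                       const std::string& value) {
        if (value.size() < sizeof(uint64_t)) return false;
        uint64_t expiry_micros;
        memcpy(&expiry_micros, value.data(), sizeof(expiry_micros));
        return expiry_micros < NowMicros();
      }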
    • Add a tool to change number of levels · d55c2ba3
      heyongqiang committed
      Summary: as subject.
      
      Test Plan: manually tested it; will add a testcase
      
      Reviewers: dhruba, MarkCallaghan
      
      Differential Revision: https://reviews.facebook.net/D6345
      d55c2ba3
  13. 03 Nov 2012, 1 commit
  14. 02 Nov 2012, 1 commit
  15. 30 Oct 2012, 2 commits
    • Allow having different compression algorithms on different levels. · 321dfdc3
      Dhruba Borthakur committed
      Summary:
      The leveldb API is enhanced to support different compression algorithms at
      different levels.
      
      This adds the option min_level_to_compress to db_bench that specifies
      the minimum level for which compression should be done when
      compression is enabled. This can be used to disable compression for levels
      0 and 1 which are likely to suffer from stalls because of the CPU load
      for memtable flushes and (L0,L1) compaction.  Level 0 is special as it
      gets frequent memtable flushes. Level 1 is special as it frequently
      gets all:all file compactions between it and level 0. But all other levels
      could be the same. For any level N where N > 1, the rate of sequential
      IO for that level should be the same. The last level is the
      exception because it might not be full and because files from it are
      not read to compact with the next larger level.
      
      The same amount of time will be spent doing compaction at any
      level N excluding N=0, 1 or the last level. By this standard all
      of those levels should use the same compression. The difference is that
      the loss (using more disk space) from a faster compression algorithm
      is less significant for N=2 than for N=3. So we might be willing to
      trade disk space for faster write rates with no compression
      for L0 and L1, snappy for L2, zlib for L3. Using a faster compression
      algorithm for the mid levels also allows us to reclaim some cpu
      without trading off much loss in disk space overhead.
      
      Also note that little is to be gained by compressing levels 0 and 1. For
      a 4-level tree they account for 10% of the data. For a 5-level tree they
      account for 1% of the data.
      
      With compression enabled:
      * memtable flush rate is ~18MB/second
      * (L0,L1) compaction rate is ~30MB/second
      
      With compression enabled but min_level_to_compress=2
      * memtable flush rate is ~320MB/second
      * (L0,L1) compaction rate is ~560MB/second
      
      This practically takes the same code from https://reviews.facebook.net/D6225
      but makes the leveldb api more general purpose with a few additional
      lines of code.
      
      Test Plan: make check
      
      Differential Revision: https://reviews.facebook.net/D6261
      321dfdc3
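
      As a sketch of the configuration style this enables (the option name
      and types are assumptions, not necessarily the exact api from D6261):
      no compression for L0 and L1, snappy for L2, zlib for everything deeper:

      #include <vector>

      enum CompressionType { kNoCompression, kSnappyCompression,
                             kZlibCompression };

      struct Options {            // illustrative subset of the real Options
        int num_levels = 7;
        std::vector<CompressionType> compression_per_level;
      };

      void ConfigureCompression(Options* opt) {
        opt->compression_per_level.assign(opt->num_levels, kZlibCompression);
        opt->compression_per_level[0] = kNoCompression;  // L0: flush-bound
        opt->compression_per_level[1] = kNoCompression;  // L1: all:all with L0
        opt->compression_per_level[2] = kSnappyCompression;  // fast, cheap
      }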
    • Adds DB::GetNextCompaction and then uses that for rate limiting db_bench · 70c42bf0
      Mark Callaghan committed
      Summary:
      Adds a method that returns the score for the next level that most
      needs compaction. That method is then used by db_bench to rate limit threads.
      Threads are put to sleep at the end of each stats interval until the score
      is less than the limit. The limit is set via the --rate_limit=$double option.
      The specified value must be > 1.0. Also adds the option --stats_per_interval
      to enable additional metrics reported every stats interval.
      
      Test Plan:
      run db_bench
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6243
      70c42bf0
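
      In sketch form, each db_bench thread would do something like the
      following at the end of a stats interval; GetNextCompactionScore()
      is a stand-in for the new method described above:

      #include <chrono>
      #include <thread>

      double GetNextCompactionScore();  // assumed wrapper around the new api

      // Sleep until the score of the level that most needs compaction
      // drops below the --rate_limit value (which must be > 1.0).
      void MaybeThrottle(double rate_limit) {
        while (GetNextCompactionScore() >= rate_limit) {
          std::this_thread::sleep_for(std::chrono::milliseconds(100));
        }
      }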
  16. 27 Oct 2012, 2 commits
  17. 20 Oct 2012, 2 commits
    • db_bench was not correctly initializing the value for the delete_obsolete_files_period_micros option. · cf5adc80
      Dhruba Borthakur committed
      Summary:
      The parameter delete_obsolete_files_period_micros controls the
      periodicity of deleting obsolete files. db_bench was reading
      this parameter into a local variable called 'l' but was incorrectly
      using another local variable called 'n' while setting it in the
      db.options data structure.
      This patch also logs the value of delete_obsolete_files_period_micros
      in the LOG file at db startup time.
      
      I am hoping that this will improve the overall write throughput drastically.
      
      Test Plan: run db_bench
      
      Reviewers: MarkCallaghan, heyongqiang
      
      Reviewed By: MarkCallaghan
      
      Differential Revision: https://reviews.facebook.net/D6099
      cf5adc80
    • This is the mega-patch multi-threaded compaction · 1ca05843
      Dhruba Borthakur committed
      published in https://reviews.facebook.net/D5997.
      
      Summary:
      This patch allows compaction to occur in multiple background threads
      concurrently.
      
      If a manual compaction is issued, the system falls back to a
      single-compaction-thread model. This is done to ensure correctness
      and simplicity of the code. When the manual compaction is finished,
      the system resumes its concurrent-compaction mode automatically.
      
      The updates to the manifest are done via a group-commit approach.
      
      Test Plan: run db_bench
      1ca05843
  18. 17 Oct 2012, 1 commit
    • The deletion of obsolete files should not occur very frequently. · aa73538f
      Dhruba Borthakur committed
      Summary:
      The method DeleteObsoleteFiles is a very costly method, especially
      when the number of files in a system is large. It makes a list of
      all live files and then scans the directory to compute the diff.
      By default, this method is executed after every compaction run.
      
      This patch makes it such that DeleteObsoleteFiles is never
      invoked twice within a configured period.
      
      Test Plan: run all unit tests
      
      Reviewers: heyongqiang, MarkCallaghan
      
      Reviewed By: MarkCallaghan
      
      Differential Revision: https://reviews.facebook.net/D6045
      aa73538f
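
      A sketch of the throttle described above; the member names are
      illustrative, not the actual DBImpl code:

      #include <cstdint>

      class DBImpl {
        uint64_t last_purge_micros_ = 0;
        uint64_t delete_obsolete_files_period_micros_ = 0;  // configured

        void DeleteObsoleteFiles();  // the costly list-and-diff scan

        void MaybeDeleteObsoleteFiles(uint64_t now_micros) {
          // Never run the expensive scan twice within the configured period.
          if (now_micros - last_purge_micros_ <
              delete_obsolete_files_period_micros_) {
            return;
          }
          last_purge_micros_ = now_micros;
          DeleteObsoleteFiles();
        }
      };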
  19. 04 Oct 2012, 2 commits
    • Implement RowLocks for assoc schema · f7975ac7
      Dhruba Borthakur committed
      Summary:
      Each assoc is identified by (id1, assocType). This is the rowkey.
      Each row has a read/write rowlock. There is a statically allocated array
      of 2000 read/write locks. A rowkey is murmur-hashed to one of the
      read/write locks.
      
      assocPut and assocDelete acquire the rowlock in write mode.
      The key updates are done within the rowlock with an atomic nosync
      batch write to leveldb. Then the rowlock is released and
      a write-with-sync is done to sync the leveldb transaction log.
      
      Test Plan: added unit test
      
      Reviewers: heyongqiang
      
      Reviewed By: heyongqiang
      
      Differential Revision: https://reviews.facebook.net/D5859
      f7975ac7
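
      A sketch of the striped rowlock scheme; MurmurHash() is assumed to
      exist, and the surrounding assoc code is elided:

      #include <cstddef>
      #include <cstdint>
      #include <mutex>
      #include <shared_mutex>

      uint32_t MurmurHash(const void* data, size_t len, uint32_t seed);

      constexpr size_t kNumRowLocks = 2000;  // statically allocated locks
      static std::shared_mutex row_locks[kNumRowLocks];

      // Hash the (id1, assocType) rowkey onto one of the locks.
      std::shared_mutex& LockForRow(int64_t id1, int32_t assoc_type) {
        uint32_t h = MurmurHash(&id1, sizeof(id1),
                                static_cast<uint32_t>(assoc_type));
        return row_locks[h % kNumRowLocks];
      }

      void AssocPut(int64_t id1, int32_t assoc_type /*, ... */) {
        std::unique_lock<std::shared_mutex> guard(LockForRow(id1, assoc_type));
        // ... apply the key updates as an atomic nosync batch write ...
        guard.unlock();
        // ... then issue a write-with-sync to flush the transaction log.
      }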
    • A configurable option to write data using write instead of mmap. · c1006d42
      Dhruba Borthakur committed
      Summary:
      We have seen that reading data via the pread call (instead of
      mmap) is much faster on Linux 2.6.x kernels. This patch adds
      an equivalent option to switch off mmap for the write path
      as well.
      
      db_bench --mmap_write=0 will use write() instead of mmap() to
      write data to a file.
      
      This change is backward compatible; the default
      option is to continue using mmap for writing to a file.
      
      Test Plan: "make check all"
      
      Differential Revision: https://reviews.facebook.net/D5781
      c1006d42
  20. 02 Oct 2012, 1 commit
  21. 30 Sep 2012, 1 commit
  22. 25 Sep 2012, 1 commit
    • The BackupAPI should also list the length of the manifest file. · ae36e509
      Dhruba Borthakur committed
      Summary:
      The GetLiveFiles() api lists the set of sst files and the current
      MANIFEST file. But the database continues to append new data to the
      MANIFEST file even when the application is backing it up to the
      backup location. This means that the database version that is
      stored in the MANIFEST file in the backup location
      does not correspond to the sst files returned by GetLiveFiles.
      
      This patch adds a new parameter to GetLiveFiles. The new parameter
      returns the current size of the MANIFEST file.
      
      Test Plan: Unit test attached.
      
      Reviewers: heyongqiang
      
      Reviewed By: heyongqiang
      
      Differential Revision: https://reviews.facebook.net/D5631
      ae36e509
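
      A hedged usage sketch of the extended call; CopyToBackup() is an
      assumed helper and the exact GetLiveFiles signature may differ from
      D5631:

      #include <cstdint>
      #include <string>
      #include <vector>

      #include "leveldb/db.h"

      // Assumed helper: copies `name` into the backup directory, truncating
      // to `max_bytes` when max_bytes > 0.
      void CopyToBackup(const std::string& name, uint64_t max_bytes);

      void BackupDatabase(leveldb::DB* db) {
        std::vector<std::string> live_files;
        uint64_t manifest_size = 0;
        leveldb::Status s = db->GetLiveFiles(live_files, &manifest_size);
        if (!s.ok()) return;
        for (const std::string& f : live_files) {
          // Copy only the captured prefix of the MANIFEST so the backup
          // stays consistent with the sst files in the list.
          bool is_manifest = f.find("MANIFEST") != std::string::npos;
          CopyToBackup(f, is_manifest ? manifest_size : 0);
        }
      }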
  23. 20 Sep 2012, 1 commit
    • Allow a configurable number of background threads. · 9e84834e
      Dhruba Borthakur committed
      Summary:
      The background threads are necessary for compaction.
      For slower storage, it might be necessary to have more than
      one compaction thread per DB. This patch allows creating
      a configurable number of worker threads.
      The default remains at 1 (to maintain backward compatibility).
      
      Test Plan:
      run all unit tests. Changes to db_bench coming in
      a separate patch.
      
      Reviewers: heyongqiang
      
      Reviewed By: heyongqiang
      
      CC: MarkCallaghan
      
      Differential Revision: https://reviews.facebook.net/D5559
      9e84834e
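
      A usage sketch; the exact call shape added in D5559 may differ, so
      treat SetBackgroundThreads() here as an assumption based on the
      summary:

      #include "leveldb/env.h"
      #include "leveldb/options.h"

      // Allow up to 4 concurrent compaction workers; the default remains
      // a single background thread for backward compatibility.
      void ConfigureCompactionThreads(leveldb::Options* options) {
        options->env->SetBackgroundThreads(4);
      }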
  24. 18 Sep 2012, 1 commit
    • add an option to disable seek compaction · a8464ed8
      heyongqiang committed
      Summary:
      as subject. This diff should be good for benchmarking.
      
      will send another diff to make it better in the case seek compaction is enabled.
      In that coming diff, a seek will not be counted if the bloom filter filters it out.
      
      Test Plan: build
      
      Reviewers: dhruba, MarkCallaghan
      
      Reviewed By: MarkCallaghan
      
      Differential Revision: https://reviews.facebook.net/D5481
      a8464ed8
  25. 17 Sep 2012, 1 commit
  26. 15 Sep 2012, 1 commit
    • Remove use of mmap for random reads · 33323f21
      Mark Callaghan committed
      Summary:
      Reads via mmap on concurrent workloads are much slower than pread.
      For example on a 24-core server with storage that can do 100k IOPS or more
      I can get no more than 10k IOPS with mmap reads and 32+ threads.
      
      Test Plan: db_bench benchmarks
      
      Reviewers: dhruba, heyongqiang
      
      Reviewed By: heyongqiang
      
      Differential Revision: https://reviews.facebook.net/D5433
      33323f21
  27. 14 Sep 2012, 2 commits
  28. 13 Sep 2012, 1 commit
  29. 07 Sep 2012, 1 commit
    • put log in a separate dir · 0f43aa47
      heyongqiang committed
      Summary: added a new option db_log_dir, which points to the log dir. Inside that dir, in order to make log names unique, the log file name is prefixed with the absolute path of the leveldb data dir.
      
      Test Plan: db_test
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D5205
      0f43aa47
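
      A usage sketch of the new option (the field name follows the summary;
      the path is illustrative):

      #include "leveldb/options.h"

      leveldb::Options MakeOptions() {
        leveldb::Options options;
        // LOG files go under /var/log/leveldb, with each file name prefixed
        // by the absolute path of the data dir to keep names unique.
        options.db_log_dir = "/var/log/leveldb";
        return options;
      }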
  30. 30 Aug 2012, 2 commits
    • Clean up compiler warnings generated by -Wall option. · fe936316
      Dhruba Borthakur committed
      Summary:
      Clean up compiler warnings generated by -Wall option.
      make clean all OPT=-Wall
      
      This is a pre-requisite before making a new release.
      
      Test Plan: compile and run unit tests
      
      Reviewers: heyongqiang
      
      Reviewed By: heyongqiang
      
      Differential Revision: https://reviews.facebook.net/D5019
      fe936316
    • The sharding of the block cache is limited to 2**20 pieces. · e5fe80e4
      Dhruba Borthakur committed
      Summary:
      The number of shards that the block cache is divided into is
      configurable. However, if the user specifies that he/she wants
      the block cache to be divided into more than 2**20 pieces, then
      the system will try to allocate a huge array of that size, which
      could fail.
      
      It is better to limit the sharding of the block cache to an
      upper bound. The default sharding is 16 shards (i.e. 2**4)
      and the maximum is now about one million shards (i.e. 2**20).
      
      Also, fixed a bug in the LRUCache: numShardBits
      should be a private member of the LRUCache object rather than
      a static variable.
      
      Test Plan:
      run db_bench with --cache_numshardbits=64.
      
      Reviewers: heyongqiang
      
      Reviewed By: heyongqiang
      
      Differential Revision: https://reviews.facebook.net/D5013
      e5fe80e4
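
      A sketch of the clamping described above; names are illustrative:

      #include <cstdint>

      constexpr int kDefaultNumShardBits = 4;   // 16 shards by default
      constexpr int kMaxNumShardBits = 20;      // at most 2**20 shards

      class ShardedLRUCache {
        int num_shard_bits_;  // per-instance, not static (the bug fixed here)

       public:
        explicit ShardedLRUCache(int num_shard_bits) {
          if (num_shard_bits < 0) num_shard_bits = kDefaultNumShardBits;
          if (num_shard_bits > kMaxNumShardBits) {
            num_shard_bits = kMaxNumShardBits;  // bound the shard array size
          }
          num_shard_bits_ = num_shard_bits;
        }

        // A key's shard is taken from the top bits of its hash.
        uint32_t Shard(uint32_t hash) const {
          return num_shard_bits_ > 0 ? hash >> (32 - num_shard_bits_) : 0;
        }
      };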