1. 24 Jul, 2013: 1 commit
    • Use KeyMayExist for WriteBatch-Deletes · bf66c10b
      Committed by Mayank Agarwal
      Summary:
      Introduced KeyMayExist checking during writebatch-delete and removed from Outer Delete API because it uses writebatch-delete.
      Added code to skip getting Table from disk if not already present in table_cache.
      Some renaming of variables.
      Introduced KeyMayExistImpl, which allows checking from a specified sequence number in GetImpl; this is useful for checking a partially written writebatch.
      Changed KeyMayExist to not be pure virtual and provided a default implementation.
      Expanded unit-tests in db_test to check appropriately.
      Ran db_stress for 1 hour with ./db_stress --max_key=100000 --ops_per_thread=10000000 --delpercent=50 --filter_deletes=1 --statistics=1.
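      The change to a non-pure-virtual KeyMayExist with a default implementation can be pictured with a minimal sketch. It assumes a hypothetical `SimpleDB` base class, which is not the real RocksDB `DB` interface or `KeyMayExist` signature. The base version cannot rule anything out, so it conservatively answers "may exist". A subclass that tracks its own keys overrides it with a definite answer.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical interface; the real RocksDB DB::KeyMayExist signature differs.
class SimpleDB {
 public:
  virtual ~SimpleDB() = default;
  // Default implementation: with no filter available we cannot prove the key
  // is absent, so conservatively report that it may exist.
  virtual bool KeyMayExist(const std::string& /*key*/) const { return true; }
};

// A subclass that can answer a definite "no" for keys it has never seen.
class FilteredDB : public SimpleDB {
 public:
  void Add(const std::string& key) { seen_.insert(key); }
  bool KeyMayExist(const std::string& key) const override {
    return seen_.count(key) > 0;
  }

 private:
  std::unordered_set<std::string> seen_;
};
```

      Callers such as a delete path only skip work when KeyMayExist returns false, so the conservative default is always safe, just never faster.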
      
      Test Plan: db_stress;make check
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb, xjin
      
      Differential Revision: https://reviews.facebook.net/D11745
  2. 20 Jul, 2013: 1 commit
  3. 12 Jul, 2013: 1 commit
    • Make rocksdb-deletes faster using bloom filter · 2a986919
      Committed by Mayank Agarwal
      Summary:
      Wrote a new function in db_impl.c, CheckKeyMayExist, that calls Get with a new parameter turned on, which makes Get return false only if bloom filters can guarantee that the key is not in the database. Delete calls this function, and if the option deletes_use_filter is turned on and CheckKeyMayExist returns false, the delete is dropped, saving:
      1. A Put of delete type
      2. Space in the db, and
      3. Compaction time
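      The drop-the-delete idea can be sketched with a hypothetical toy `Store` that keeps its own small bloom filter. The filter size, the double-hashing probe scheme and the record log are illustrative assumptions, not RocksDB's. A filter miss proves the key was never written, so the tombstone can be skipped. A filter hit only means the key may exist, so the delete is still written.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Minimal bloom filter: kProbes bit positions derived from one std::hash
// value via double hashing (h1 + i * h2).
class BloomFilter {
 public:
  void Add(const std::string& key) {
    for (std::size_t i = 0; i < kProbes; ++i) bits_.set(Probe(key, i));
  }
  // false => key definitely absent; true => key may be present.
  bool MayContain(const std::string& key) const {
    for (std::size_t i = 0; i < kProbes; ++i)
      if (!bits_.test(Probe(key, i))) return false;
    return true;
  }

 private:
  static constexpr std::size_t kBits = 1024;
  static constexpr std::size_t kProbes = 4;
  static std::size_t Probe(const std::string& key, std::size_t i) {
    std::size_t h1 = std::hash<std::string>{}(key);
    std::size_t h2 = (h1 >> 17) | (h1 << 15);  // second hash mixed from h1
    return (h1 + i * h2) % kBits;
  }
  std::bitset<kBits> bits_;
};

// Hypothetical store; the flag name mirrors the option in the summary.
struct Store {
  bool deletes_use_filter = true;
  BloomFilter filter;
  std::vector<std::string> log;  // records written: "P:key" or "D:key"

  void Put(const std::string& key) {
    filter.Add(key);
    log.push_back("P:" + key);
  }
  // Returns true if a delete tombstone was written, false if it was dropped.
  bool Delete(const std::string& key) {
    if (deletes_use_filter && !filter.MayContain(key)) return false;
    log.push_back("D:" + key);
    return true;
  }
};
```

      False positives only cost a redundant tombstone. A false negative would lose a delete, which a bloom filter never produces for keys that were added.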
      
      Test Plan:
      make all check;
      Will run db_stress and db_bench and enhance the unit tests once the basic design is approved.
      
      Reviewers: dhruba, haobo, vamsi
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11607
  4. 19 Jun, 2013: 2 commits
  5. 13 Jun, 2013: 1 commit
    • [RocksDB] cleanup EnvOptions · bdf10859
      Committed by Haobo Xu
      Summary:
      This diff simplifies EnvOptions by treating it as POD, similar to Options.
      - virtual functions are removed and member fields are accessed directly.
      - StorageOptions is removed.
      - Options.allow_readahead and Options.allow_readahead_compactions are deprecated.
      - Unused global variables are removed: useOsBuffer, useFsReadAhead, useMmapRead, useMmapWrite
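      Treating an options type as POD can be illustrated with a plain struct whose fields are accessed directly and that has no virtual accessors. This is a sketch only. The type and field names are hypothetical and not necessarily those of the real EnvOptions. Because the struct has default member initializers it is not strictly POD, but it remains trivially copyable, so it can be passed by value like Options.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical POD-like options struct: public fields, defaults chosen for
// illustration, no virtual functions.
struct EnvOptionsSketch {
  bool use_os_buffer = true;
  bool use_mmap_reads = false;
  bool use_mmap_writes = true;
};

// With no virtual functions the struct stays trivially copyable.
static_assert(std::is_trivially_copyable<EnvOptionsSketch>::value,
              "options struct should be trivially copyable");
```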
      
      Test Plan: make check; db_stress
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11175
  6. 06 Jun, 2013: 1 commit
    • Very basic Multiget and simple test cases. · d8c7c45e
      Committed by Deon Nicholas
      Summary:
      Implemented the MultiGet operator which takes in a list of keys
      and returns their associated values. Currently uses std::vector as its
      container data structure. Otherwise, it works identically to "Get".
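      The MultiGet contract can be sketched over an in-memory std::map. The sketch assumes a hypothetical `Status` enum and free function rather than RocksDB's `DB::MultiGet` signature. Each key gets one lookup, and results come back in std::vectors aligned by position, as if Get had been called once per key.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical status type; RocksDB's Status class is richer.
enum class Status { kOk, kNotFound };

// One lookup per key; statuses[i] and (*values)[i] describe keys[i].
std::vector<Status> MultiGet(const std::map<std::string, std::string>& db,
                             const std::vector<std::string>& keys,
                             std::vector<std::string>* values) {
  std::vector<Status> statuses;
  values->clear();
  for (const auto& key : keys) {
    auto it = db.find(key);
    if (it != db.end()) {
      values->push_back(it->second);
      statuses.push_back(Status::kOk);
    } else {
      values->push_back("");  // placeholder keeps positions aligned
      statuses.push_back(Status::kNotFound);
    }
  }
  return statuses;
}
```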
      
      Test Plan:
       1. make db_test      ; compile it
       2. ./db_test         ; test it
       3. make all check    ; regress / run all tests
       4. make release      ; (optional) compile with release settings
      
      Reviewers: haobo, MarkCallaghan, dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D10875
  7. 04 Jun, 2013: 1 commit
    • Improve output for GetProperty('leveldb.stats') · d9f538e1
      Committed by Mark Callaghan
      Summary:
      Display separate values for read, write & total compaction IO.
      Display compaction amplification and write amplification.
      Add similar values for the period since the last call to GetProperty. Results since the server started
      are reported as "cumulative" stats. Results since the last call to GetProperty are reported as
      "interval" stats.
      
      Level  Files Size(MB) Time(sec)  Read(MB) Write(MB)    Rn(MB)  Rnp1(MB)  Wnew(MB) Amplify Read(MB/s) Write(MB/s)      Rn     Rnp1     Wnp1     NewW    Count  Ln-stall
      ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
        0        7       13        21         0       211         0         0       211     0.0       0.0        10.1        0        0        0        0      113       0.0
        1       79      157        88       993       989       198       795       194     9.0      11.3        11.2      106      405      502       97       14       0.0
        2       19       36         5        63        63        37        27        36     2.4      12.3        12.2       19       14       32       18       12       0.0
      >>>>>>>>>>>>>>>>>>>>>>>>> text below is new and/or reformatted
      Uptime(secs): 122.2 total, 0.9 interval
      Compaction IO cumulative (GB): 0.21 new, 1.03 read, 1.23 write, 2.26 read+write
      Compaction IO cumulative (MB/sec): 1.7 new, 8.6 read, 10.3 write, 19.0 read+write
      Amplification cumulative: 6.0 write, 11.0 compaction
      Compaction IO interval (MB): 5.59 new, 0.00 read, 5.59 write, 5.59 read+write
      Compaction IO interval (MB/sec): 6.5 new, 0.0 read, 6.5 write, 6.5 read+write
      Amplification interval: 1.0 write, 1.0 compaction
      >>>>>>>>>>>>>>>>>>>>>>>> text above is new and/or reformatted
      Stalls(secs): 90.574 level0_slowdown, 0.000 level0_numfiles, 10.165 memtable_compaction, 0.000 leveln_slowdown
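      The summary does not state the amplification formulas. Up to rounding of the GB figures, the sample lines are consistent with write amplification = write / new and compaction amplification = (read + write) / new; for the interval line, 5.59 / 5.59 = 1.0. The sketch below assumes those definitions, and its struct and function names are hypothetical. It also models "since the last call to GetProperty" as the difference between two cumulative snapshots.

```cpp
#include <cassert>

// Compaction IO counters in MB (hypothetical struct for illustration).
struct IoCounters {
  double new_mb = 0;    // new data written into the DB
  double read_mb = 0;   // data read by compactions
  double write_mb = 0;  // data written by flushes and compactions
};

// Assumed formulas, consistent with the sample output but not confirmed.
double WriteAmp(const IoCounters& c) {
  return c.new_mb > 0 ? c.write_mb / c.new_mb : 0.0;
}
double CompactionAmp(const IoCounters& c) {
  return c.new_mb > 0 ? (c.read_mb + c.write_mb) / c.new_mb : 0.0;
}

// "Interval" stats: current cumulative counters minus the previous snapshot.
IoCounters Interval(const IoCounters& now, const IoCounters& last) {
  return {now.new_mb - last.new_mb, now.read_mb - last.read_mb,
          now.write_mb - last.write_mb};
}
```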
      
      Task ID: #
      
      Blame Rev:
      
      Test Plan:
      make check, run db_bench
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      
      Reviewers: haobo
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11049
  8. 30 May, 2013: 1 commit
  9. 25 May, 2013: 1 commit
  10. 24 May, 2013: 1 commit
  11. 07 May, 2013: 1 commit
    • [RocksDB] Clear Archive WAL files · 988c20b9
      Committed by Abhishek Kona
      Summary:
      WAL files are moved to the archive directory and cleared only at DB::Open.
      This can lead to a lot of space consumption in a database. Added logic to periodically clear the archive directory too.
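      Periodic archive cleanup can be sketched as follows. The class, its TTL-based retention policy and the caller-supplied clock are assumptions for illustration. The summary says only that the archive directory is now cleared periodically rather than only at DB::Open.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical archived WAL entry: file name plus the time it was archived.
struct ArchivedLog {
  std::string name;
  uint64_t archived_at_secs;
};

// At most once per period_secs, drop archived logs at least ttl_secs old.
// Assumes now_secs never goes backwards and is >= every archived_at_secs.
class ArchivePurger {
 public:
  ArchivePurger(uint64_t period_secs, uint64_t ttl_secs)
      : period_secs_(period_secs), ttl_secs_(ttl_secs) {}

  // Returns how many archived files were purged by this call.
  std::size_t MaybePurge(std::vector<ArchivedLog>* archive, uint64_t now_secs) {
    if (ran_once_ && now_secs - last_run_secs_ < period_secs_) return 0;
    ran_once_ = true;
    last_run_secs_ = now_secs;
    std::size_t before = archive->size();
    std::vector<ArchivedLog> kept;
    for (const auto& log : *archive)
      if (now_secs - log.archived_at_secs < ttl_secs_) kept.push_back(log);
    *archive = std::move(kept);
    return before - archive->size();
  }

 private:
  uint64_t period_secs_;
  uint64_t ttl_secs_;
  uint64_t last_run_secs_ = 0;
  bool ran_once_ = false;
};
```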
      
      Test Plan: make all check + add unit test
      
      Reviewers: dhruba, heyongqiang
      
      Reviewed By: heyongqiang
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D10617
  12. 04 May, 2013: 1 commit
    • [Rocksdb] Support Merge operation in rocksdb · 05e88540
      Committed by Haobo Xu
      Summary:
      This diff introduces a new Merge operation into rocksdb.
      The purpose of this review is mostly getting feedback from the team (everyone please) on the design.
      
      Please focus on the four files under include/leveldb/, as they spell the client visible interface change.
      include/leveldb/db.h
      include/leveldb/merge_operator.h
      include/leveldb/options.h
      include/leveldb/write_batch.h
      
      Please go over local/my_test.cc carefully, as it is a concrete use case.
      
      Please also review the implementation files to see if the straw man implementation makes sense.
      
      Note that the diff passes all of make check and truly supports a forward iterator over the db and a version
      of Get that's based on the iterator.
      
      Future work:
      - Integration with compaction
      - A raw Get implementation
      
      I am working on a wiki that explains the design and implementation choices, but coding comes
      naturally, and I think it might be a good idea to share the code early. The code is
      heavily commented.
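      The idea behind Merge, recording an update operand now and folding it into the value later, can be sketched with a counter. This is a toy model. `CounterMerge` and `ToyStore` are hypothetical and do not reproduce the interface in include/leveldb/merge_operator.h. Merge only appends an operand. Get combines the base value, if any, with the pending operands, oldest first.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical merge operator: values are decimal integers, operands are
// increments. Not the real RocksDB MergeOperator signature.
class CounterMerge {
 public:
  std::string FullMerge(const std::optional<std::string>& existing,
                        const std::vector<std::string>& operands) const {
    int64_t sum = existing ? std::stoll(*existing) : 0;
    for (const auto& op : operands) sum += std::stoll(op);  // oldest first
    return std::to_string(sum);
  }
};

// Toy store: Put overwrites and discards pending operands; Merge records an
// operand lazily; Get folds pending operands into the value on read.
class ToyStore {
 public:
  void Put(const std::string& k, const std::string& v) {
    base_[k] = v;
    pending_[k].clear();
  }
  void Merge(const std::string& k, const std::string& operand) {
    pending_[k].push_back(operand);
  }
  std::optional<std::string> Get(const std::string& k) const {
    auto b = base_.find(k);
    auto p = pending_.find(k);
    bool has_ops = p != pending_.end() && !p->second.empty();
    if (!has_ops) {
      if (b == base_.end()) return std::nullopt;
      return b->second;
    }
    std::optional<std::string> existing;
    if (b != base_.end()) existing = b->second;
    return op_.FullMerge(existing, p->second);
  }

 private:
  CounterMerge op_;
  std::map<std::string, std::string> base_;
  std::map<std::string, std::vector<std::string>> pending_;
};
```

      Deferring the work avoids a read-modify-write on every increment. In a real engine, the "Integration with compaction" item listed above is where accumulated operands would eventually be collapsed.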
      
      Test Plan: run all local tests
      
      Reviewers: dhruba, heyongqiang
      
      Reviewed By: dhruba
      
      CC: leveldb, zshao, sheki, emayanke, MarkCallaghan
      
      Differential Revision: https://reviews.facebook.net/D9651
  13. 29 Mar, 2013: 1 commit
  14. 21 Mar, 2013: 1 commit
    • Ability to configure bufferedio-reads, filesystem-readaheads and mmap-read-write per database. · ad96563b
      Committed by Dhruba Borthakur
      Summary:
      This patch allows an application to specify whether to use bufferedio,
      reads-via-mmaps and writes-via-mmaps per database. Earlier, there
      was a global static variable that was used to configure this functionality.
      
      The default setting remains the same (and is backward compatible):
       1. use bufferedio
       2. do not use mmaps for reads
       3. use mmap for writes
       4. use readaheads for reads needed for compaction
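      A per-database settings struct in this spirit might look like the following. The field names are hypothetical. The defaults follow the four-item list above: buffered IO on, mmap reads off, mmap writes on, and readahead for compaction reads on.

```cpp
#include <cassert>

// Hypothetical per-database IO settings (names are illustrative).
struct DbIoOptions {
  bool use_buffered_io = true;
  bool use_mmap_reads = false;
  bool use_mmap_writes = true;
  bool readahead_for_compaction = true;
};
```

      Because each database carries its own copy, two databases in one process can make different choices. The former global static variable could not express that.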
      
      I also added a parameter to db_bench to be able to explicitly specify
      whether to do readaheads for compactions or not.
      
      Test Plan: make check
      
      Reviewers: sheki, heyongqiang, MarkCallaghan
      
      Reviewed By: sheki
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9429
  15. 20 Mar, 2013: 1 commit
  16. 07 Mar, 2013: 1 commit
    • Do not allow Transaction Log Iterator to fall ahead when writer is writing the same file · d68880a1
      Committed by Abhishek Kona
      Summary:
      Store the last flushed seq no. in db_impl. Check against it in the
      transaction log iterator. Do not attempt to read ahead if we do not know
      whether the data has been flushed completely.
      Does not work if flush is disabled. Any ideas on fixing that?
      * Minor change: iter->Next is now called automatically the first time.
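      The read-ahead guard can be sketched as a reader that refuses to return any record whose sequence number is beyond the last flushed one. `BoundedLogReader` is hypothetical, and records are reduced to bare sequence numbers. This is single-threaded. A real writer and reader on different threads would need synchronization, for example std::atomic, for the shared counter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using SequenceNumber = uint64_t;

// Never returns a record past what the writer reports as fully flushed, so a
// partially written tail of the active log file is not observed.
class BoundedLogReader {
 public:
  BoundedLogReader(const std::vector<SequenceNumber>* records,
                   const SequenceNumber* last_flushed)
      : records_(records), last_flushed_(last_flushed) {}

  std::optional<SequenceNumber> Next() {
    if (pos_ >= records_->size()) return std::nullopt;
    SequenceNumber seq = (*records_)[pos_];
    if (seq > *last_flushed_) return std::nullopt;  // do not read ahead
    ++pos_;
    return seq;
  }

 private:
  const std::vector<SequenceNumber>* records_;
  const SequenceNumber* last_flushed_;  // updated by the writer
  std::size_t pos_ = 0;
};
```

      A nullopt here means "nothing safely readable yet", so a later call can succeed once the writer advances the flushed sequence number.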
      
      Test Plan:
      existing test pass.
      More ideas on testing this?
      Planning to run some stress test.
      
      Reviewers: dhruba, heyongqiang
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9087
  17. 04 Mar, 2013: 1 commit
    • Add rate_delay_limit_milliseconds · 993543d1
      Committed by Mark Callaghan
      Summary:
      This adds the rate_delay_limit_milliseconds option to make the delay
      configurable in MakeRoomForWrite when the max compaction score is too high.
      This delay is called the Ln slowdown. This change also counts the Ln slowdown
      per level to make it possible to see where the stalls occur.
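      A capped, per-level-accounted slowdown can be sketched as below. Only the cap corresponds to rate_delay_limit_milliseconds. The struct name, the score threshold and the rule that the delay grows by 100 ms per unit of score above the limit are invented for illustration and are not taken from MakeRoomForWrite.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical slowdown policy: delay grows with the excess compaction score
// (illustrative rule) but never exceeds rate_delay_limit_milliseconds, and
// the stall time is charged to the level that triggered it.
struct SlowdownPolicy {
  double score_limit = 1.0;
  uint64_t rate_delay_limit_milliseconds = 1000;
  std::vector<uint64_t> stall_ms_per_level;

  explicit SlowdownPolicy(int num_levels)
      : stall_ms_per_level(static_cast<std::size_t>(num_levels), 0) {}

  uint64_t DelayFor(int level, double score) {
    if (score <= score_limit) return 0;
    double excess = score - score_limit;
    auto delay = static_cast<uint64_t>(excess * 100.0);  // 100 ms per unit
    delay = std::min(delay, rate_delay_limit_milliseconds);
    stall_ms_per_level[level] += delay;
    return delay;
  }
};
```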
      
      From IO-bound performance testing, the Level N stalls occur:
      * with compression -> at the largest uncompressed level. This makes sense
                            because compaction for compressed levels is much
                            slower. When Lx is uncompressed and Lx+1 is compressed
                            then files pile up at Lx because the (Lx,Lx+1)->Lx+1
                            compaction process is the first to be slowed by
                            compression.
      * without compression -> at level 1
      
      Task ID: #1832108
      
      Blame Rev:
      
      Test Plan:
      run with real data, added test
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D9045
  18. 23 Feb, 2013: 1 commit
  19. 21 Feb, 2013: 1 commit
    • Fix for the weird behaviour encountered by ldb Get where it could read only the second-latest value · b2c50f1c
      Committed by amayank
      Summary:
      Changed the Get and Scan options with openForReadOnly mode to have access to the memtable.
      Changed the visibility of NewInternalIterator in db_impl from private to protected so that
      the derived class db_impl_read_only can call that in its NewIterator function for the
      scan case. The previous approach, which changed the default for flush_on_destroy_ from false to true,
      caused many problems in the unit tests due to the empty sst files it created. All
      unit tests pass now.
      
      Test Plan: make clean; make all check; ldb put and get and scans
      
      Reviewers: dhruba, heyongqiang, sheki
      
      Reviewed By: dhruba
      
      CC: kosievdmerwe, zshao, dilipj, kailiu
      
      Differential Revision: https://reviews.facebook.net/D8697
  20. 26 Jan, 2013: 1 commit
    • Fix poor error on num_levels mismatch and few other minor improvements · 0b83a831
      Committed by Chip Turner
      Summary:
      Previously, if you opened a db with num_levels set lower than
      the database, you received the unhelpful message "Corruption:
      VersionEdit: new-file entry."  Now you get a more verbose message
      describing the issue.
      
      Also, fix handling of compression_levels (both the run-over-the-end
      issue and the memory management of it).
      
      Lastly, unique_ptr'ify a couple of minor calls.
      
      Test Plan: make check
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D8151
  21. 24 Jan, 2013: 1 commit
    • Fix a number of object lifetime/ownership issues · 2fdf91a4
      Committed by Chip Turner
      Summary:
      Replace manual memory management with std::unique_ptr in a
      number of places; not exhaustive, but this fixes a few leaks with file
      handles as well as clarifies semantics of the ownership of file handles
      with log classes.
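      Making file-handle ownership explicit with std::unique_ptr can be sketched as follows. `WritableFile` and `LogWriter` here are hypothetical stand-ins, not the RocksDB classes. The writer takes its file by unique_ptr, so the transfer is visible at the call site, and the handle is released exactly once when the writer is destroyed. The instance counter makes a leak observable.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical file handle that counts live instances.
struct WritableFile {
  inline static int live = 0;
  std::string contents;
  WritableFile() { ++live; }
  ~WritableFile() { --live; }
};

// The writer owns its file: the caller must std::move the handle in.
class LogWriter {
 public:
  explicit LogWriter(std::unique_ptr<WritableFile> file)
      : file_(std::move(file)) {}
  void AddRecord(const std::string& rec) { file_->contents += rec; }
  const WritableFile& file() const { return *file_; }

 private:
  std::unique_ptr<WritableFile> file_;
};
```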
      
      Test Plan: db_stress, make check
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: zshao, leveldb, heyongqiang
      
      Differential Revision: https://reviews.facebook.net/D8043
  22. 17 Jan, 2013: 1 commit
    • rollover manifest file. · 7d5a4383
      Committed by Abhishek Kona
      Summary:
      Check in LogAndApply if the file size is more than the limit set in
      Options.
      Things to consider: will this be expensive?
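      A size-triggered rollover inside a LogAndApply-like step can be sketched as below. The class is hypothetical, and `max_manifest_file_size` stands in for the Options limit. It is an assumption here that a real rollover writes a snapshot of the current state into the new manifest so the new file is self-contained. The sketch only accounts for that record's size.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical manifest writer that starts a new file once the current one
// has reached the configured size.
class ManifestLog {
 public:
  explicit ManifestLog(uint64_t max_manifest_file_size)
      : max_size_(max_manifest_file_size) {}

  void LogAndApply(const std::string& edit) {
    if (current_size_ >= max_size_) {
      ++manifest_number_;    // switch to MANIFEST-<n+1>
      current_size_ = 0;
      Append("<snapshot>");  // stand-in for a full-state record
    }
    Append(edit);
  }
  uint64_t manifest_number() const { return manifest_number_; }

 private:
  void Append(const std::string& rec) { current_size_ += rec.size(); }
  uint64_t max_size_;
  uint64_t current_size_ = 0;
  uint64_t manifest_number_ = 1;
};
```

      In this sketch the extra cost is one snapshot record per rollover. The size check itself is a single comparison per LogAndApply.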
      
      Test Plan: make all check. Inputs on a new unit test?
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D7701
  23. 20 Dec, 2012: 1 commit
    • Enhance ReadOnly mode to process all committed transactions. · f4c2b7cf
      Committed by Dhruba Borthakur
      Summary:
      Leveldb has an api OpenForReadOnly() that opens the database
      in readonly mode. This call had an option to not process the
      transaction log.  This patch removes this option and always
      processes all transactions that had been committed. It has
      been done in such a way that it does not create/write to
      any new files in the process. The invariant of "no-writes"
      to the leveldb data directory is still true.
      
      This enhancement allows multiple threads to open the same database
      in readonly mode and access all transactions that were committed right
      up to the OpenForReadOnly call.
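      Processing committed transactions without writing files can be sketched as replaying the log into a private, in-memory memtable at open time. All names are hypothetical, and deletions are not modeled. Get consults the replayed memtable before the on-disk table, and the on-disk state is only ever read.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical on-disk state: table contents plus committed log records
// that have not yet been flushed into a table.
struct DiskState {
  std::map<std::string, std::string> table;
  std::vector<std::pair<std::string, std::string>> wal;  // commit order
};

// Read-only view; `disk` must outlive the view and is never modified.
class ReadOnlyView {
 public:
  explicit ReadOnlyView(const DiskState& disk) : disk_(disk) {
    for (const auto& [k, v] : disk.wal) memtable_[k] = v;  // later wins
  }
  std::optional<std::string> Get(const std::string& key) const {
    if (auto it = memtable_.find(key); it != memtable_.end()) return it->second;
    if (auto it = disk_.table.find(key); it != disk_.table.end())
      return it->second;
    return std::nullopt;
  }

 private:
  const DiskState& disk_;
  std::map<std::string, std::string> memtable_;
};
```

      Each reader builds its own memtable, which is what allows several threads to open the same database read-only without coordinating writes.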
      
      I changed the public API to match the new semantics because
      there are no users who are currently using this api.
      
      Test Plan: make clean check
      
      Reviewers: sheki
      
      Reviewed By: sheki
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D7479
  24. 11 Dec, 2012: 2 commits
  25. 08 Dec, 2012: 1 commit
    • GetUpdatesSince API to enable replication. · 80550089
      Committed by Abhishek Kona
      Summary:
      How it works:
      * GetUpdatesSince takes a SequenceNumber.
      * The log file whose first SequenceNumber is nearest to, and not greater than, the requested SequenceNumber is found.
      * Seek in the log file until the requested SequenceNumber is found.
      * Return an iterator that returns records one by one.
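      The first two steps above amount to a binary search over log files sorted by their first sequence number, followed by a linear seek inside the chosen file. This is a sketch under simplifying assumptions. `LogFile`, `FindLogFile` and `SeekInFile` are hypothetical, and each record is treated as a single sequence number, although a real WAL record holds a batch covering several.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

using SequenceNumber = uint64_t;

// Hypothetical descriptor; files are sorted by first_seq ascending.
struct LogFile {
  uint64_t number;
  SequenceNumber first_seq;
};

// Index of the last file whose first_seq <= seq, or -1 if seq precedes all.
int FindLogFile(const std::vector<LogFile>& files, SequenceNumber seq) {
  auto it = std::upper_bound(
      files.begin(), files.end(), seq,
      [](SequenceNumber s, const LogFile& f) { return s < f.first_seq; });
  if (it == files.begin()) return -1;
  return static_cast<int>(std::distance(files.begin(), it)) - 1;
}

// Position of the first record >= seq within the chosen file.
std::size_t SeekInFile(const std::vector<SequenceNumber>& records,
                       SequenceNumber seq) {
  std::size_t i = 0;
  while (i < records.size() && records[i] < seq) ++i;
  return i;
}
```

      With files starting at 1, 100 and 250, a request for 150 selects the file starting at 100. From there an iterator would hand out records one by one.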
      
      Test Plan:
      * Test case included to check the good code path.
      * Will update with more test-cases.
      * Feedback required on test-cases.
      
      Reviewers: dhruba, emayanke
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D7119
  26. 29 Nov, 2012: 3 commits
    • Move WAL files to archive directory, instead of deleting. · d4627e6d
      Committed by sheki
      Summary:
      Create a directory "archive" in the DB directory.
      During DeleteObsoleteFiles, move the WAL files (*.log) to the archive directory,
      instead of deleting.
      
      Test Plan: Created a DB using DB_Bench. Reopened it. Checked if files move.
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6975
    • Fix all the lint errors. · d29f1819
      Committed by Abhishek Kona
      Summary:
      Scripted and removed all trailing spaces and converted all tabs to
      spaces.
      
      Also fixed other lint errors.
      All lint errors from this point of time should be taken seriously.
      
      Test Plan: make all check
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D7059
    • Delete non-visible keys during a compaction even in the presence of snapshots. · 9a357847
      Committed by Dhruba Borthakur
      Summary:
      LevelDB should delete almost-new keys when a long-open snapshot exists.
      The previous behavior was to keep all versions that were created after the
      oldest open snapshot. This can lead to database size bloat for
      high-update workloads when there are long-open snapshots, and long-open
      snapshots will be used for logical backups. By "almost new" I mean that the
      key was updated more than once after the oldest snapshot.
      
      If there were two snapshots with seq numbers s1 and s2 (s1 < s2), and if
      we find two instances of the same key k1 that lie entirely within s1 and
      s2 (i.e. s1 < k1 < s2), then the earlier version
      of k1 can be safely deleted because that version is not visible in any snapshot.
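      The rule generalizes to any number of snapshots. A version survives only if it is the newest version that some snapshot, or a new reader, can see. The sketch below makes simplifying assumptions: one user key's versions are given newest first as plain sequence numbers, a snapshot s sees versions with seq <= s, deletion markers are ignored, and the helper names are hypothetical. Versions are grouped into "stripes" by the earliest snapshot that can see them, and only the newest version in each stripe is kept.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

using SequenceNumber = uint64_t;
constexpr SequenceNumber kMaxSeq = std::numeric_limits<SequenceNumber>::max();

// Smallest snapshot (snaps sorted ascending) that can see a version written
// at `seq`; kMaxSeq if none can, i.e. only new readers see it.
SequenceNumber EarliestVisibleSnapshot(SequenceNumber seq,
                                       const std::vector<SequenceNumber>& snaps) {
  auto it = std::lower_bound(snaps.begin(), snaps.end(), seq);
  return it == snaps.end() ? kMaxSeq : *it;
}

// `versions` are one key's sequence numbers, newest first. An older version
// in the same stripe as a newer one is invisible to every snapshot.
std::vector<SequenceNumber> VersionsToKeep(
    const std::vector<SequenceNumber>& versions,
    const std::vector<SequenceNumber>& snaps) {
  std::vector<SequenceNumber> kept;
  SequenceNumber prev_stripe = 0;
  bool first = true;
  for (SequenceNumber v : versions) {
    SequenceNumber stripe = EarliestVisibleSnapshot(v, snaps);
    if (first || stripe != prev_stripe) kept.push_back(v);
    first = false;
    prev_stripe = stripe;
  }
  return kept;
}
```

      For example, with snapshots 10 and 20 and versions 25, 18, 15, 12, 8, 5, the kept versions are 25 (current readers), 18 (snapshot 20) and 8 (snapshot 10). Versions 15 and 12 lie between the snapshots, like k1 in the text, and 5 is shadowed by 8.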
      
      Test Plan:
      unit test attached
      make clean check
      
      Differential Revision: https://reviews.facebook.net/D6999
  27. 19 Nov, 2012: 1 commit
    • enhance dbstress to simulate hard crash · 62e7583f
      Committed by Dhruba Borthakur
      Summary:
      dbstress has an option to reopen the database. Make it such that the
      previous handle is not closed before we reopen, this simulates a
      situation similar to a process crash.
      
      Added a new API to DBImpl to remove the lock file.
      
      Test Plan: run db_stress
      
      Reviewers: emayanke
      
      Reviewed By: emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D6777
  28. 08 Nov, 2012: 1 commit
  29. 07 Nov, 2012: 1 commit
  30. 30 Oct, 2012: 1 commit
    • Adds DB::GetNextCompaction and then uses that for rate limiting db_bench · 70c42bf0
      Committed by Mark Callaghan
      Summary:
      Adds a method that returns the score for the next level that most
      needs compaction. That method is then used by db_bench to rate limit threads.
      Threads are put to sleep at the end of each stats interval until the score
      is less than the limit. The limit is set via the --rate_limit=$double option.
      The specified value must be > 1.0. Also adds the option --stats_per_interval
      to enable additional metrics reported every stats interval.
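      The db_bench throttling loop can be sketched with the score source and the sleep passed in as callbacks. `ThrottleUntilBelow` is hypothetical, and the callbacks stand in for DB::GetNextCompaction and a real sleep. The loop keeps sleeping while the score is at or above the limit and rejects limits that are not greater than 1.0, as the summary requires.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Called at the end of a stats interval; returns how many sleeps it took for
// the next-compaction score to drop below rate_limit.
int ThrottleUntilBelow(double rate_limit,
                       const std::function<double()>& next_compaction_score,
                       const std::function<void()>& sleep_once) {
  if (!(rate_limit > 1.0))
    throw std::invalid_argument("--rate_limit must be > 1.0");
  int sleeps = 0;
  while (next_compaction_score() >= rate_limit) {
    sleep_once();
    ++sleeps;
  }
  return sleeps;
}
```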
      
      Task ID: #
      
      Blame Rev:
      
      Test Plan:
      run db_bench
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6243
  31. 25 Oct, 2012: 1 commit
    • Improve statistics · e7206f43
      Committed by Mark Callaghan
      Summary:
      This adds more statistics to be reported by GetProperty("leveldb.stats").
      The new stats include time spent waiting on stalls in MakeRoomForWrite.
      This also includes the total amplification rate where that is:
          (#bytes of sequential IO during compaction) / (#bytes from Put)
      This also includes a lot more data for the per-level compaction report.
      * Rn(MB) - MB read from level N during compaction between levels N and N+1
      * Rnp1(MB) - MB read from level N+1 during compaction between levels N and N+1
      * Wnew(MB) - new data written to the level during compaction
      * Amplify - ( Write(MB) + Rnp1(MB) ) / Rn(MB)
      * Rn - files read from level N during compaction between levels N and N+1
      * Rnp1 - files read from level N+1 during compaction between levels N and N+1
      * Wnp1 - files written to level N+1 during compaction between levels N and N+1
      * NewW - new files written to level N+1 during compaction
      * Count - number of compactions done for this level
      
      This is the new output from DB::GetProperty("leveldb.stats"). The old output stopped at Write(MB).
      
                                     Compactions
      Level  Files Size(MB) Time(sec) Read(MB) Write(MB)  Rn(MB) Rnp1(MB) Wnew(MB) Amplify Read(MB/s) Write(MB/s)   Rn Rnp1 Wnp1 NewW Count
      -------------------------------------------------------------------------------------------------------------------------------------
        0        3        6        33        0       576       0        0      576    -1.0       0.0         1.3     0    0    0    0   290
        1      127      242       351     5316      5314     570     4747      567    17.0      12.1        12.1   287 2399 2685  286    32
        2      161      328        54      822       824     326      496      328     4.0       1.9         1.9   160  251  411  160   161
      Amplification: 22.3 rate, 0.56 GB in, 12.55 GB out
      Uptime(secs): 439.8
      Stalls(secs): 206.938 level0_slowdown, 0.000 level0_numfiles, 24.129 memtable_compaction
      
      Task ID: #
      
      Blame Rev:
      
      Test Plan:
      run db_bench
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      (cherry picked from commit ecdeead38f86cc02e754d0032600742c4f02fec8)
      
      Reviewers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D6153
  32. 23 Oct, 2012: 1 commit
  33. 21 Oct, 2012: 2 commits
  34. 20 Oct, 2012: 1 commit
    • This is the mega-patch for multi-threaded compaction · 1ca05843
      Committed by Dhruba Borthakur
      published in https://reviews.facebook.net/D5997.
      
      Summary:
      This patch allows compaction to occur in multiple background threads
      concurrently.
      
      If a manual compaction is issued, the system falls back to a
      single-compaction-thread model. This is done to ensure correctness
      and simplicity of code. When the manual compaction is finished,
      the system resumes its concurrent-compaction mode automatically.
      
      The updates to the manifest are done via a group-commit approach.
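      The fallback rule can be sketched as admission bookkeeping. `CompactionScheduler` and `max_background_compactions` are hypothetical names. Up to the configured number of automatic compactions may run at once. Once a manual compaction is requested, no new automatic ones start. The manual one runs alone after the others drain, and concurrent mode resumes when it finishes. A real implementation would guard this state with the DB mutex.

```cpp
#include <cassert>

// Single-threaded bookkeeping sketch of concurrent vs. manual compaction.
class CompactionScheduler {
 public:
  explicit CompactionScheduler(int max_background_compactions)
      : max_(max_background_compactions) {}

  bool TryStartAutomatic() {
    if (manual_active_ || running_ >= max_) return false;
    ++running_;
    return true;
  }
  // Marks a manual compaction as pending; it starts only when nothing runs.
  bool TryStartManual() {
    manual_active_ = true;  // blocks new automatic compactions
    if (running_ > 0) return false;
    ++running_;
    return true;
  }
  void Finish(bool was_manual) {
    --running_;
    if (was_manual) manual_active_ = false;  // resume concurrent mode
  }
  int running() const { return running_; }

 private:
  int max_;
  int running_ = 0;
  bool manual_active_ = false;
};
```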
      
      Test Plan: run db_bench
  35. 17 Oct, 2012: 1 commit
    • The deletion of obsolete files should not occur very frequently. · aa73538f
      Committed by Dhruba Borthakur
      Summary:
      The method DeleteObsoleteFiles is very costly, especially
      when the number of files in a system is large. It makes a list of
      all live-files and then scans the directory to compute the diff.
      By default, this method is executed after every compaction run.
      
      This patch makes it such that DeleteObsoleteFiles is never
      invoked twice within a configured period.
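      The throttling rule can be sketched as a gate consulted after every compaction. `ObsoleteFileGate` and `period_micros` are hypothetical stand-ins for the configured period. The expensive live-file listing and directory scan run only when at least one period has elapsed since the previous run.

```cpp
#include <cassert>
#include <cstdint>

// Decides whether the obsolete-file scan may run at time now_micros.
// Assumes the clock never goes backwards.
class ObsoleteFileGate {
 public:
  explicit ObsoleteFileGate(uint64_t period_micros) : period_(period_micros) {}

  bool ShouldRun(uint64_t now_micros) {
    if (has_run_ && now_micros - last_run_ < period_) return false;
    has_run_ = true;
    last_run_ = now_micros;
    return true;
  }

 private:
  uint64_t period_;
  uint64_t last_run_ = 0;
  bool has_run_ = false;
};
```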
      
      Test Plan: run all unit tests
      
      Reviewers: heyongqiang, MarkCallaghan
      
      Reviewed By: MarkCallaghan
      
      Differential Revision: https://reviews.facebook.net/D6045