1. 04 2月, 2014 3 次提交
  2. 03 2月, 2014 1 次提交
  3. 01 2月, 2014 3 次提交
    • D
      Fix corruption_test failure caused by auto-enablement of checksum verification. · 30a70065
      Dhruba Borthakur 提交于
      Summary:
      Patch  https://reviews.facebook.net/D15591 enabled checksum
      verification by default. This caused the unit test to fail.
      
      Test Plan: ./corruption_test
      
      Reviewers: igor, kailiu
      
      Reviewed By: igor
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15795
      30a70065
    • I
      Mark the log_number file number used · dbbffbd7
      Igor Canadi 提交于
      Summary:
      VersionSet::next_file_number_ is always assumed to be strictly greater than VersionSet::log_number_. In our new recovery code, we artificially set log_number_  to be (log_number + 1), so that once we flush, we don't recover from the same log file again (this is important because of merge operator non-idempotence)
      
      When we set VersionSet::log_number_ to (log_number + 1), we also have to mark that file number used, such that next_file_number_ is increased to a legal level. Otherwise, VersionSet might assert.
      
      This has not be a problem so far because here's what happens:
      1. assume next_file_number is 5, we're recovering log_number 10
      2. in DBImpl::Recover() we call MarkFileNumberUsed with 10. This will set VersionSet::next_file_number_ to 11.
      3. If there are some updates, we will call WriteTable0ForRecovery(), which will use file number 11 as a new table file and advance VersionSet::next_file_number_ to 12.
      4. When we LogAndApply() with log_number 11, assertion is true: assert(11 <= 12);
      
      However, this was a lucky occurrence. Even though this diff doesn't cause a bug, I think the issue is important to fix.
      
      Test Plan: In column families I have different recovery logic and this code path asserted. When adding MarkFileNumberUsed(log_number + 1) assert is gone.
      
      Reviewers: dhruba, kailiu
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15783
      dbbffbd7
    • S
      When using Universal Compaction, Zero out seqID in the last file too · 56bea9f8
      Siying Dong 提交于
      Summary: I didn't figure out the reason why the feature of zeroing out earlier sequence ID is disabled in universal compaction. I do see bottommost_level is set correctly. It should simply work if we remove the constraint of universal compaction.
      
      Test Plan: make all check
      
      Reviewers: haobo, dhruba, kailiu, igor
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15423
      56bea9f8
  4. 31 1月, 2014 1 次提交
  5. 30 1月, 2014 8 次提交
    • I
      InternalStatistics · 3c0dcf0e
      Igor Canadi 提交于
      Summary:
      In DBImpl we keep track of some statistics internally and expose them via GetProperty(). This diff encapsulates all the internal statistics into a class InternalStatisics. Most of it is copy/paste.
      
      Apart from cleaning up db_impl.cc, this diff is also necessary for Column families, since every column family should have its own CompactionStats, MakeRoomForWrite-stall stats, etc. It's much easier to keep track of it in every column family if it's nicely encapsulated in its own class.
      
      Test Plan: make check
      
      Reviewers: dhruba, kailiu, haobo, sdong, emayanke
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15273
      3c0dcf0e
    • L
      set bg_error_ when background flush goes wrong · d118707f
      Lei Jin 提交于
      Summary: as title
      
      Test Plan: unit test
      
      Reviewers: haobo, igor, sdong, kailiu, dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15435
      d118707f
    • L
      convert Tickers back to array with padding and alignment · fb84c49a
      Lei Jin 提交于
      Summary:
      Pad each Ticker structure to be 64 bytes and make them align on 64 bytes
      boundary to avoid cache line false sharing issue.
      Please refer to task 3615553 for more details
      
      Test Plan:
      db_bench
      
      LevelDB:    version 2.0s
      Date:       Wed Jan 29 12:23:17 2014
      CPU:        32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
      CPUCache:   20480 KB
      rocksdb.build.overwrite.qps 49638
      rocksdb.build.overwrite.p50_micros 58.73
      rocksdb.build.overwrite.p75_micros 210.56
      rocksdb.build.overwrite.p99_micros 733.28
      rocksdb.build.fillseq.qps 366729
      rocksdb.build.fillseq.p50_micros 1.00
      rocksdb.build.fillseq.p75_micros 1.00
      rocksdb.build.fillseq.p99_micros 2.65
      rocksdb.build.readrandom.qps 1152995
      rocksdb.build.readrandom.p50_micros 11.27
      rocksdb.build.readrandom.p75_micros 15.69
      rocksdb.build.readrandom.p99_micros 33.59
      rocksdb.build.readrandom_smallblockcache.qps 956047
      rocksdb.build.readrandom_smallblockcache.p50_micros 15.23
      rocksdb.build.readrandom_smallblockcache.p75_micros 17.31
      rocksdb.build.readrandom_smallblockcache.p99_micros 31.49
      rocksdb.build.readrandom_memtable_sst.qps 1105183
      rocksdb.build.readrandom_memtable_sst.p50_micros 12.04
      rocksdb.build.readrandom_memtable_sst.p75_micros 15.78
      rocksdb.build.readrandom_memtable_sst.p99_micros 32.49
      rocksdb.build.readrandom_fillunique_random.qps 487856
      rocksdb.build.readrandom_fillunique_random.p50_micros 29.65
      rocksdb.build.readrandom_fillunique_random.p75_micros 40.93
      rocksdb.build.readrandom_fillunique_random.p99_micros 78.68
      rocksdb.build.memtablefillrandom.qps 91304
      rocksdb.build.memtablefillrandom.p50_micros 171.05
      rocksdb.build.memtablefillrandom.p75_micros 196.12
      rocksdb.build.memtablefillrandom.p99_micros 291.73
      rocksdb.build.memtablereadrandom.qps 1340411
      rocksdb.build.memtablereadrandom.p50_micros 9.48
      rocksdb.build.memtablereadrandom.p75_micros 13.95
      rocksdb.build.memtablereadrandom.p99_micros 30.36
      rocksdb.build.readwhilewriting.qps 491004
      rocksdb.build.readwhilewriting.p50_micros 29.58
      rocksdb.build.readwhilewriting.p75_micros 40.34
      rocksdb.build.readwhilewriting.p99_micros 76.78
      
      Reviewers: igor, haobo
      
      Reviewed By: igor
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15573
      fb84c49a
    • K
      LIBNAME in Makefile is not really configurable · b7db2411
      Kai Liu 提交于
      Summary:
      In new third-party release tool, `LIBNAME=<customized_library> make`
      will not really change the LIBNAME.
      
      However it's very odd that the same approach works with old third-party
      release tools. I checked previous rocksdb version and both librocksdb.a
      and librocksdb_debug.a were correctly generated and copied to the
      right place.
      
      Test Plan:
      `LIBNAME=hello make -j32` generates hello.a
      `make -j32` generates librocksdb.a
      
      Reviewers: igor, sdong, haobo, dhruba
      
      Reviewed By: igor
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15555
      b7db2411
    • K
      Canonicalize "RocksDB" in make_new_version.sh · b1874af8
      kailiu 提交于
      Summary: Change all occurrences of "rocksdb" to its canonical form "RocksDB".
      
      Test Plan: N/A
      
      Reviewers: igor
      
      Reviewed By: igor
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15549
      b1874af8
    • K
      Improve make_new_version.sh · c9eef784
      kailiu 提交于
      c9eef784
    • I
      Installation instructions for CentOS · 9a597dc6
      Igor Canadi 提交于
      9a597dc6
    • I
      add include <atomic> to version_set.h · e57f0cc1
      Igor Canadi 提交于
      e57f0cc1
  6. 29 1月, 2014 6 次提交
    • K
      Add history log and revise script · 9fe60d50
      kailiu 提交于
      Summary:
      * Add a change log for rocksdb releases.
      * Remove the hacky parts of make_new_version.sh, which are either
        no longer useful or will be done in our dedicated 3rd-party release
        tool.
      
      Test Plan: N/A
      
      Reviewers: igor, haobo, sdong, dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15543
      9fe60d50
    • L
      only corrupt private file checksum in backupable_db_test · 9a126ba3
      Lei Jin 提交于
      Summary:
      if it happens (randomly) to corrupt shared file in the test, then the
          checksum will be inconsistent between meta files from different backup.
          BackupEngine will then detect this issue and fail. But in reality, this
          does not happen since the checksum is checked on every backup. So here,
          only corrupt checksum of private file to let BackupEngine to construct
          properly (but fail during restore).
      
      Test Plan: run test with valgrind
      
      Reviewers: igor
      
      Reviewed By: igor
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15531
      9a126ba3
    • I
      Only get the manifest file size if there is no error · 5d2c6282
      Igor Canadi 提交于
      Summary:
      I came across this while working on column families. CorruptionTest::RecoverWriteError threw a SIGSEG because the descriptor_log_->file() was nullptr. I'm not sure why it doesn't happen in master, but better safe than sorry.
      
      @kailiu, can we get this in release, too?
      
      Test Plan: make check
      
      Reviewers: kailiu, dhruba, haobo
      
      Reviewed By: haobo
      
      CC: leveldb, kailiu
      
      Differential Revision: https://reviews.facebook.net/D15513
      5d2c6282
    • I
      Better interface to create BackupEngine · e5ec7384
      Igor Canadi 提交于
      Summary: I think it looks nicer. In RocksDB we have both styles, but I think that static method is the more common version.
      
      Test Plan: backupable_db_test
      
      Reviewers: ljin, benj, swk
      
      Reviewed By: ljin
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15519
      e5ec7384
    • I
      Export BackupEngine · ec2fa4a6
      Igor Canadi 提交于
      Summary:
      Lots of clients have problems with using StackableDB interface. It's nice to have BackupableDB as a layer on top of DB, but not necessary.
      
      This diff exports BackupEngine, which can be used to create backups without forcing clients to use StackableDB interface.
      
      Test Plan: backupable_db_test
      
      Reviewers: dhruba, ljin, swk
      
      Reviewed By: ljin
      
      CC: leveldb, benj
      
      Differential Revision: https://reviews.facebook.net/D15477
      ec2fa4a6
    • L
      add checksum for backup files · 9dc29414
      Lei Jin 提交于
      Summary: Keep checksum of each backuped file in meta file. When it restores these files, compute their checksum on the fly and compare against what is in the meta file. Fail the restore process if checksum mismatch.
      
      Test Plan: unit test
      
      Reviewers: haobo, igor, sdong, kailiu
      
      Reviewed By: igor
      
      CC: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D15381
      9dc29414
  7. 28 1月, 2014 4 次提交
    • M
      Update monitoring to include average time per compaction and stall · 90f29ccb
      Mark Callaghan 提交于
      Summary:
      The new columns are msComp and msStall that provide average time per compaction and stall for that level in milliseconds.
      Level  Files Size(MB) Score Time(sec)  Read(MB) Write(MB)    Rn(MB)  Rnp1(MB)  Wnew(MB) RW-Amplify Read(MB/s) Write(MB/s)      Rn     Rnp1     Wnp1     NewW    Count   msComp   msStall  Ln-stall Stall-cnt
      ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
        0        8       15   1.5         2         0        30         0         0        30        0.0       0.0        15.5        0        0        0        0       16      112       0.2       1.3      7568
        1        8       16   1.6         1        26        26        15        11        16        3.5      17.6        18.1        8        6       13        7        3      362       0.0       0.0         0
        2        1        2   0.0         0         0         2         0         0         2        0.0       0.0        18.4        0        0        0        0        1       50       0.0       0.0         0
      
      Task ID: #
      
      Blame Rev:
      
      Test Plan:
      run db_bench
      
      Revert Plan:
      
      Database Impact:
      
      Memcache Impact:
      
      Other Notes:
      
      EImportant:
      
      - begin *PUBLIC* platform impact section -
      Bugzilla: #
      - end platform impact -
      
      Reviewers: haobo
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15345
      90f29ccb
    • S
      Fix UnmarkEOF for partial blocks · 3d33da75
      Schalk-Willem Kruger 提交于
      Summary:
      Blocks in the transaction log are a fixed size, but the last block in the transaction log file is usually a partial block. When a new record is added after the reader hit the end of the file, a new physical record will be appended to the last block. ReadPhysicalRecord can only read full blocks and assumes that the file position indicator is aligned to the start of a block. If the reader is forced to read further by simply clearing the EOF flag, ReadPhysicalRecord will read a full block starting from somewhere in the middle of a real block, causing it to lose alignment and to have a partial physical record at the end of the read buffer. This will result in length mismatches and checksum failures. When the log file is tailed for replication this will cause the log iterator to become invalid, necessitating the creation of a new iterator which will have to read the log file from scratch.
      
      This diff fixes this issue by reading the remaining portion of the last block we read from. This is done when the reader is forced to read further (UnmarkEOF is called).
      
      Test Plan:
      - Added unit tests
      - Stress test (with replication). Check dbdir/LOG file for corruptions.
      - Test on test tier
      
      Reviewers: emayanke, haobo, dhruba
      
      Reviewed By: haobo
      
      CC: vamsi, sheki, dhruba, kailiu, igor
      
      Differential Revision: https://reviews.facebook.net/D15249
      3d33da75
    • I
      Fsync directory after we create a new file · 832158e7
      Igor Canadi 提交于
      Summary:
      @dhruba, I'm not sure where we need to sync the directory. I implemented the function in Env() and added the dir sync just after we close the newly created file in the builder.
      
      Should I also add FsyncDir() to new files that get created by a compaction?
      
      Test Plan: Confirmed that FsyncDir is returning Status::OK()
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D14751
      832158e7
    • I
      Move NeedsCompaction() from VersionSet to Version · 6c2ca1d3
      Igor Canadi 提交于
      Summary: There is no reason to have functions NeedCompaction(), MaxCompactionScore() and MaxCompactionScoreLevel() in VersionSet, since they don't access any data in VersionSet.
      
      Test Plan: make check
      
      Reviewers: kailiu, haobo, sdong
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15333
      6c2ca1d3
  8. 27 1月, 2014 1 次提交
  9. 26 1月, 2014 1 次提交
  10. 25 1月, 2014 11 次提交
    • I
      Fix reduce levels · 04afa321
      Igor Canadi 提交于
      ReduceNumberOfLevels had segmentation fault in WriteSnapshot() since we
      didn't change the number of levels in VersionSet (we consider them
      immutable from now on). This fixes the problem.
      04afa321
    • S
      Moving Some includes from options.h to forward declaration · 8477255d
      Siying Dong 提交于
      Summary: By removing some includes form options.h and reply on forward declaration, we can more easily reason the dependencies.
      
      Test Plan: make all check
      
      Reviewers: kailiu, haobo, igor, dhruba
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15411
      8477255d
    • I
      Fixing iterator cleanup for Tailing iterator · f653fdcf
      Igor Canadi 提交于
      Immutable tailing iterator doesn't set CleanupState::mem, so we don't
      have to unref it.
      f653fdcf
    • I
      Add a call DisownData() to Cache, which should speed up shutdown · b13bdfa5
      Igor Canadi 提交于
      Summary: On a shutdown, freeing memory takes a long time. If we're shutting down, we don't really care about memory leaks. I added a call to Cache that will avoid freeing all objects in cache.
      
      Test Plan:
      I created a script to test the speedup and demonstrate how to use the call: https://phabricator.fb.com/P3864368
      
      Clean shutdown took 7.2 seconds, while fast and dirty one took 6.3 seconds. Unfortunately, the speedup is not that big, but should be bigger with bigger block_cache. I have set up the capacity to 80GB, but the script filled up only ~7GB.
      
      Reviewers: dhruba, haobo, MarkCallaghan, xjin
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15069
      b13bdfa5
    • I
      Make VersionSet::ReduceNumberOfLevels() static · 677fee27
      Igor Canadi 提交于
      Summary:
      A lot of our code implicitly assumes number_levels to be static. ReduceNumberOfLevels() breaks that assumption. For example, after calling ReduceNumberOfLevels(), DBImpl::NumberLevels() will be different from VersionSet::NumberLevels(). This is dangerous. Thankfully, it's not in public headers and is only used from LDB cmd tool. LDB tool is only using it statically, i.e. it never calls it with running DB instance. With this diff, we make it explicitly static. This way, we can assume number_levels to be immutable and not break assumption that lot of our code is relying upon. LDB tool can still use the method.
      
      Also, I removed the method from a separate file since it breaks filename completition. version_se<TAB> now completes to "version_set." instead of "version_set" (without the dot). I don't see a big reason that the function should be in a different file.
      
      Test Plan: reduce_levels_test
      
      Reviewers: dhruba, haobo, kailiu, sdong
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15303
      677fee27
    • I
      MemTableListVersion · c583157d
      Igor Canadi 提交于
      Summary:
      MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
      
      This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
      
      I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
      
      Test Plan: `make check` works. I will also do `make valgrind_check` before commit
      
      Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15255
      c583157d
    • K
      Add a make target for shared library · f131d4c2
      kailiu 提交于
      Summary:
      Previous we made `make release` also compile shared library. However it takes a long time to complete.
      
      To make our development process more efficient. I added a new make target shared_lib.
      
      User can of course run `make <library_name>` for direct compilation. However the <library_name> changed under certain condition. Thus we need `make shared_lib` to get rid of the memorization from users' side.
      
      Test Plan: make shared_lib
      
      Reviewers: igor, sdong, haobo, dhruba
      
      Reviewed By: igor
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15309
      f131d4c2
    • I
      Revert "Moving to glibc-fb" · e832e72b
      Igor Canadi 提交于
      This reverts commit d24961b6.
      
      For some reason, glibc2.17-fb breaks gflags. Reverting for now
      e832e72b
    • K
      Temporarily disable caching index/filter blocks · 66dc033a
      kailiu 提交于
      Summary:
      Mixing index/filter blocks with data blocks resulted in some known
      issues.  To make sure in next release our users won't be affected,
      we added a new option in BlockBasedTableFactory::TableOption to
      conceal this functionality for now.
      
      This patch also introduced a BlockBasedTableReader::OpenOptions,
      which avoids the "infinite" growth of parameters in
      BlockBasedTableReader::Open().
      
      Test Plan: make check
      
      Reviewers: haobo, sdong, igor, dhruba
      
      Reviewed By: igor
      
      CC: leveldb, tnovak
      
      Differential Revision: https://reviews.facebook.net/D15327
      66dc033a
    • I
      Moving to glibc-fb · d24961b6
      Igor Canadi 提交于
      Summary:
      It looks like we might have some trouble when building the new release with 4.8, since fbcode is using glibc2.17-fb by default and we are using glibc2.17. It was reported by Benjamin Renard in our internal group.
      
      This diff moves our fbcode build to use glibc2.17-fb by default. I got some linker errors when compiling, complaining that `google::SetUsageMessage()` was undefined. After deleting all offending lines, the compile was successful and everything works.
      
      Test Plan:
      Compiled
      Ran ./db_bench ./db_stress ./db_repl_stress
      
      Reviewers: kailiu
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15405
      d24961b6
    • S
      If User setting of compaction multipliers overflow, use default value 1 instead · 4605e20c
      Siying Dong 提交于
      Summary: Currently, compaction multipliers can overflow and cause unexpected behaviors. In this patch, we detect those overflows and use multiplier 1 for them.
      
      Test Plan: make all check
      
      Reviewers: dhruba, haobo, igor, kailiu
      
      Reviewed By: kailiu
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D15321
      4605e20c
  11. 24 1月, 2014 1 次提交