1. 01 Apr, 2014 1 commit
    • I
      Retry FS system calls on EINTR · 726c8084
      Igor Canadi committed
      Summary: EINTR means 'please retry'. We don't do that currently. We should.
      
      Test Plan: make check, although it doesn't really test the new code. We'll just have to believe in the code!
      
      Reviewers: haobo, ljin
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D17349
      726c8084
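The retry-on-EINTR idea can be sketched roughly as follows. This is an illustration, not the code from the commit: `WriteFully` and `WriteFullyDemo` are hypothetical names, and only `write()` is covered, although the same loop shape applies to other interruptible calls.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper, not the code from this commit. Writes all of `data`,
// retrying when write() is interrupted by a signal (EINTR).
// Returns 0 on success or the errno value of the first real failure.
int WriteFully(int fd, const char* data, size_t len) {
  assert(data != nullptr || len == 0);
  while (len > 0) {
    ssize_t n = write(fd, data, len);
    if (n < 0) {
      if (errno == EINTR) continue;  // interrupted before writing: retry
      return errno;
    }
    data += n;  // a short write is not an error, keep going
    len -= static_cast<size_t>(n);
  }
  return 0;
}

// Usage: push five bytes through a pipe and read them back.
bool WriteFullyDemo() {
  int fds[2];
  if (pipe(fds) != 0) return false;
  char buf[5] = {0};
  bool ok = WriteFully(fds[1], "hello", 5) == 0 &&
            read(fds[0], buf, sizeof(buf)) == 5 &&
            buf[0] == 'h' && buf[4] == 'o';
  close(fds[0]);
  close(fds[1]);
  return ok;
}
```

Returning the errno value rather than -1 is a design choice of this sketch, so that a caller could turn it into a status message.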
  2. 29 Mar, 2014 1 commit
  3. 26 Mar, 2014 1 commit
  4. 20 Mar, 2014 1 commit
  5. 18 Mar, 2014 1 commit
    • I
      Optimize fallocation · f26cb0f0
      Igor Canadi committed
      Summary:
      Based on my recent findings (posted in our internal group), calling fallocate without the KEEP_SIZE flag gives much better fdatasync() performance in append-only workloads.
      
      This diff adds an option that lets the user skip the KEEP_SIZE flag, which can speed up sync by up to 2x-3x.
      
      At one point we also just called posix_fallocate instead of fallocate, which isn't very fast: http://code.woboq.org/userspace/glibc/sysdeps/posix/posix_fallocate.c.html (tl;dr it manually writes out zero bytes to allocate storage). This diff also fixes that by calling fallocate first and falling back to posix_fallocate only if fallocate is not supported.
      
      Test Plan: make check
      
      Reviewers: dhruba, sdong, haobo, ljin
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D16761
      f26cb0f0
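A minimal, Linux-only sketch of the fallocate-then-posix_fallocate order described above. `Preallocate` and `PreallocateDemo` are hypothetical names, and the set of errno values treated as "not supported" is an assumption of this sketch.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <fcntl.h>
#include <linux/falloc.h>  // FALLOC_FL_KEEP_SIZE (Linux only)
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper. Preallocates the byte range [0, len) of `fd`.
// keep_size=true leaves st_size unchanged; keep_size=false extends the file.
// Returns 0 on success or an errno value.
int Preallocate(int fd, off_t len, bool keep_size) {
  assert(len > 0);
  const int mode = keep_size ? FALLOC_FL_KEEP_SIZE : 0;
  if (fallocate(fd, mode, 0, len) == 0) return 0;
  if (errno != EOPNOTSUPP && errno != ENOSYS) return errno;
  // Assumed fallback condition: the filesystem has no fallocate support.
  // posix_fallocate may be emulated by writing to the file, and it always
  // extends the file size, so keep_size cannot be honored on this path.
  return posix_fallocate(fd, 0, len);
}

// Usage: preallocate 4 KiB without KEEP_SIZE and report the file size.
off_t PreallocateDemo() {
  char path[] = "/tmp/falloc_sketch_XXXXXX";
  int fd = mkstemp(path);
  if (fd < 0) return -1;
  struct stat st;
  off_t size = -1;
  if (Preallocate(fd, 4096, /*keep_size=*/false) == 0 && fstat(fd, &st) == 0) {
    size = st.st_size;
  }
  close(fd);
  unlink(path);
  return size;
}
```

Without KEEP_SIZE the file size already covers the preallocated range, which is the variant the commit message reports as faster for fdatasync() in append-only workloads.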
  6. 12 Mar, 2014 1 commit
  7. 08 Mar, 2014 1 commit
  8. 07 Mar, 2014 1 commit
  9. 26 Feb, 2014 1 commit
    • L
      thread local pointer storage · b2795b79
      Lei Jin committed
      Summary:
      This is not a generic thread-local implementation, in the sense that it
      only stores pointers. But it does support multiple instances per thread
      and lets the user plug in a function to perform cleanup when a thread exits or an
      instance gets destroyed.
      
      Test Plan: unit test for now
      
      Reviewers: haobo, igor, sdong, dhruba
      
      Reviewed By: igor
      
      CC: leveldb, kailiu
      
      Differential Revision: https://reviews.facebook.net/D16131
      b2795b79
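One way to picture such a pointer-only thread-local store is the sketch below. It is not RocksDB's implementation: `ThreadLocalPtrSketch` and its members are hypothetical, it relies on C++11 `thread_local`, and destroying an instance only cleans up the calling thread's value (reaching other threads' values would need a shared registry, which is left out).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical sketch. Each instance owns one pointer slot per thread.
class ThreadLocalPtrSketch {
 public:
  using CleanupFn = void (*)(void*);

  explicit ThreadLocalPtrSketch(CleanupFn cleanup = nullptr)
      : id_(NextId()), cleanup_(cleanup) {}
  ThreadLocalPtrSketch(const ThreadLocalPtrSketch&) = delete;

  // Cleans up the calling thread's value only (limitation of this sketch).
  ~ThreadLocalPtrSketch() {
    Reset(nullptr);
    Local().slots.erase(id_);
  }

  void* Get() const {
    auto& slots = Local().slots;
    auto it = slots.find(id_);
    return it == slots.end() ? nullptr : it->second.ptr;
  }

  // Stores `ptr` for the calling thread, cleaning up any previous value.
  void Reset(void* ptr) {
    Slot& slot = Local().slots[id_];
    if (slot.ptr != nullptr && slot.cleanup != nullptr) slot.cleanup(slot.ptr);
    slot.ptr = ptr;
    slot.cleanup = cleanup_;
  }

 private:
  struct Slot {
    void* ptr = nullptr;
    CleanupFn cleanup = nullptr;
  };
  struct PerThread {
    std::unordered_map<uint32_t, Slot> slots;
    ~PerThread() {  // runs when the owning thread exits
      for (auto& kv : slots) {
        if (kv.second.ptr && kv.second.cleanup) kv.second.cleanup(kv.second.ptr);
      }
    }
  };
  static PerThread& Local() {
    thread_local PerThread per_thread;
    return per_thread;
  }
  static uint32_t NextId() {
    static std::atomic<uint32_t> next{0};
    return next++;
  }

  const uint32_t id_;
  const CleanupFn cleanup_;
};

int g_cleanups = 0;
void CountingDelete(void* p) {
  delete static_cast<int*>(p);
  ++g_cleanups;
}

// Usage: two independent instances on one thread. Returns the number of
// cleanups that ran before the instances themselves were destroyed.
int ThreadLocalDemo() {
  g_cleanups = 0;
  ThreadLocalPtrSketch a(&CountingDelete), b(&CountingDelete);
  a.Reset(new int(1));
  b.Reset(new int(2));
  assert(*static_cast<int*>(a.Get()) == 1);
  assert(*static_cast<int*>(b.Get()) == 2);
  a.Reset(new int(3));  // the cleanup function runs for the old value
  return g_cleanups;
}
```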
  10. 07 Feb, 2014 1 commit
  11. 28 Jan, 2014 1 commit
    • I
      Fsync directory after we create a new file · 832158e7
      Igor Canadi committed
      Summary:
      @dhruba, I'm not sure where we need to sync the directory. I implemented the function in Env() and added the dir sync just after we close the newly created file in the builder.
      
      Should I also add FsyncDir() to new files that get created by a compaction?
      
      Test Plan: Confirmed that FsyncDir is returning Status::OK()
      
      Reviewers: dhruba, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D14751
      832158e7
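For readers unfamiliar with directory syncing: on POSIX systems a newly created file's directory entry is only durable once the directory itself has been fsynced. A minimal sketch, assuming a plain open/fsync/close sequence; `FsyncDirSketch` is a hypothetical name and the real FsyncDir signature in Env may differ.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper. Flushes the directory's own metadata (its entries)
// to stable storage. Returns 0 on success or an errno value.
int FsyncDirSketch(const char* dirname) {
  assert(dirname != nullptr);
  int fd = open(dirname, O_RDONLY);  // directories can be opened read-only
  if (fd < 0) return errno;
  int err = (fsync(fd) == 0) ? 0 : errno;
  close(fd);
  return err;
}
```

A caller would typically write and fsync the new file first, close it, and then call the helper on the parent directory.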
  12. 08 Jan, 2014 1 commit
    • M
      Eliminate stdout message when launching a posix thread. · 4c75e21c
      Mike Lin committed
      This seems out of place as it's the only time RocksDB prints to stdout in the
      normal course of operations. Thread IDs can still be retrieved from the LOG
      file: cut -d ' ' -f2 LOG | sort | uniq | egrep -x '[0-9a-f]+'
      4c75e21c
  13. 12 Dec, 2013 1 commit
  14. 11 Dec, 2013 1 commit
  15. 05 Dec, 2013 1 commit
  16. 02 Dec, 2013 1 commit
    • L
      Fix build without glibc · 45a2f2d8
      lovro committed
      Summary: The preprocessor does not short-circuit && the way compiled code does. It still has to parse __GLIBC_PREREQ(2, 12) even when the defined() check fails, which breaks the build if __GLIBC_PREREQ is absent.
      
      Test Plan: Try adding #undef __GLIBC_PREREQ above the offending line, build no longer breaks
      
      Reviewed By: igor
      
      Blame Rev: 4c813836
      45a2f2d8
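The usual way around this preprocessor behavior is to nest the two checks so the macro call is never parsed when the macro is missing. The sketch below shows that pattern; `SKETCH_HAVE_GLIBC_2_12` and `HaveGlibc212` are hypothetical names, and whether the commit used exactly this nesting is an assumption.

```cpp
#include <cassert>
#include <cstdlib>  // on glibc this pulls in <features.h> and __GLIBC_PREREQ

// Fragile form: "#if defined(__GLIBC_PREREQ) && __GLIBC_PREREQ(2, 12)".
// With the macro undefined, the second operand turns into "0(2, 12)",
// which is a syntax error even though the first operand is false.

// Robust form: the inner #if is skipped entirely when the macro is absent.
#if defined(__GLIBC_PREREQ)
#if __GLIBC_PREREQ(2, 12)
#define SKETCH_HAVE_GLIBC_2_12 1
#endif
#endif

#ifndef SKETCH_HAVE_GLIBC_2_12
#define SKETCH_HAVE_GLIBC_2_12 0
#endif

// Hypothetical accessor so regular code can branch on the result.
bool HaveGlibc212() { return SKETCH_HAVE_GLIBC_2_12 != 0; }
```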
  17. 28 Nov, 2013 1 commit
  18. 27 Nov, 2013 1 commit
  19. 21 Nov, 2013 1 commit
    • S
      A Simple Plain Table · b59d4d5a
      Siying Dong committed
      Summary:
      A simple plain table format with no block structure. When the table reader is created, it scans the full table to build the indexes.
      
      Test Plan: Add unit test
      
      Reviewers: haobo, dhruba, kailiu
      
      CC:
      
      Task ID: #
      
      Blame Rev:
      b59d4d5a
  20. 17 Nov, 2013 1 commit
  21. 02 Nov, 2013 1 commit
    • D
      Implement a compressed block cache. · b4ad5e89
      Dhruba Borthakur committed
      Summary:
      RocksDB can now support an uncompressed block cache, a compressed
      block cache, or both. A lookup first checks the uncompressed cache;
      only if the block is not found there is it looked up
      in the compressed cache. If it is found in the compressed cache,
      it is uncompressed and inserted into the uncompressed cache.
      
      It is possible that the same block resides in the compressed cache
      as well as the uncompressed cache at the same time. Both caches
      have their own individual LRU policy.
      
      Test Plan: Unit test case attached.
      
      Reviewers: kailiu, sdong, haobo, leveldb
      
      Reviewed By: haobo
      
      CC: xjin, haobo
      
      Differential Revision: https://reviews.facebook.net/D12675
      b4ad5e89
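The lookup order described in this commit can be sketched as below. Everything here is a stand-in: `TwoTierCacheSketch` is a hypothetical name, plain maps replace the two LRU caches, and reversing the bytes replaces a real codec, so only the control flow mirrors the description.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical sketch: maps stand in for the two LRU caches, and byte
// reversal stands in for real decompression.
struct TwoTierCacheSketch {
  std::unordered_map<std::string, std::string> uncompressed;
  std::unordered_map<std::string, std::string> compressed;

  static std::string FakeUncompress(const std::string& in) {
    return std::string(in.rbegin(), in.rend());
  }

  // Returns true and fills *block on a hit in either cache.
  bool Lookup(const std::string& key, std::string* block) {
    assert(block != nullptr);
    auto hit = uncompressed.find(key);
    if (hit != uncompressed.end()) {
      *block = hit->second;
      return true;
    }
    auto chit = compressed.find(key);
    if (chit == compressed.end()) return false;  // caller reads the file
    *block = FakeUncompress(chit->second);
    uncompressed[key] = *block;  // promote; the compressed copy stays too
    return true;
  }
};

// Usage: a block present only in the compressed cache is found, promoted,
// and then resides in both caches at once.
bool TwoTierCacheDemo() {
  TwoTierCacheSketch cache;
  cache.compressed["block-7"] = "atad";
  std::string block;
  bool found = cache.Lookup("block-7", &block);
  return found && block == "data" &&
         cache.uncompressed.count("block-7") == 1 &&
         cache.compressed.count("block-7") == 1 &&
         !cache.Lookup("missing", &block);
}
```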
  22. 01 Nov, 2013 1 commit
  23. 23 Oct, 2013 1 commit
    • M
      Dbid feature · 9b50106f
      Mayank Agarwal committed
      Summary:
      On startup, create a new type of file called DBID if it doesn't already exist.
      It stores a unique number generated with the Boost library's uuid header.
      The use case is to detect a db that lost all its data and came back up either empty or from an image (backup or live replica recovery).
      The key point to note is that DBID is not stored in a backup or db snapshot.
      It's preferable to use Boost for the uuid because:
      1) A non-standard way of generating a uuid is not good
      2) /proc/sys/kernel/random/uuid generates a uuid, but only on Linux, and the solution would not be clean
      3) C++ doesn't have any direct way to get a uuid
      4) Boost is a very good library that was already linked into rocksdb from third-party
      Note: I had to update the TOOLCHAIN_REV in the build files to get the latest version of Boost from third-party, as the older version had a bug.
      I had to put Wno-uninitialized in the Makefile because boost-1.51 has an uninitialized variable and rocksdb would not compile otherwise. The latest open-source Boost is 1.54, but it is not in third-party. I have notified the concerned people in fbcode about it.
      @kailiu : While releasing to third-party, an additional dependency will need to be created for boost in the TARGETS file. I can help identify.
      
      Test Plan:
      Expand db_test to test 2 cases
      1) Restarting db with Id file present - verify that no change to Id
      2) Restarting db with Id file deleted - verify that a different Id is there after reopen
      Also run make all check
      
      Reviewers: dhruba, haobo, kailiu, sdong
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13587
      9b50106f
  24. 17 Oct, 2013 1 commit
  25. 10 Oct, 2013 1 commit
    • I
      Env class that can randomly read and write · d0beadd4
      Igor Canadi committed
      Summary: I have implemented the basic use case that I need for the External Value Store I'm working on. There is potential for making this prettier by refactoring/combining WritableFile and RandomAccessFile, avoiding some copypasta. However, I decided to implement just the basic functionality, so I can continue working on the other diff.
      
      Test Plan: Added a unittest
      
      Reviewers: dhruba, haobo, kailiu
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13365
      d0beadd4
  26. 06 Oct, 2013 1 commit
  27. 05 Oct, 2013 1 commit
  28. 24 Sep, 2013 1 commit
  29. 16 Sep, 2013 2 commits
  30. 13 Sep, 2013 1 commit
    • H
      [RocksDB] Enhance Env to support two thread pools LOW and HIGH · 1565dab8
      Haobo Xu committed
      Summary:
      This is the groundwork for separating memtable flush jobs into their own thread pool.
      Both SetBackgroundThreads and Schedule take a third parameter, Priority, to indicate which thread pool they operate on. The names LOW and HIGH are just identifiers for two different thread pools and do not indicate a real difference in 'priority'. The number of threads in each pool can be set independently.
      The thread pool implementation is refactored.
      
      Test Plan: make check
      
      Reviewers: dhruba, emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12885
      1565dab8
  31. 08 Sep, 2013 1 commit
    • H
      [RocksDB] Added nano second stopwatch and new perf counters to track block read cost · f2f4c807
      Haobo Xu committed
      Summary: The purpose of this diff is to expose precise, per-user-call timing of block reads, so that we can answer questions like: a Get() costs me 100ms, is that somehow related to loading blocks from the file system, or something else? We will answer that with EXACTLY how many blocks have been read, how much time was spent on transferring the bytes from the OS, how much time was spent on checksum verification, and how much time was spent on block decompression, just for that one Get. A nanosecond stopwatch was introduced to track time with higher precision. The cost/precision of the stopwatch is also measured in the unit test. On my dev box, retrieving one time instance costs about 30ns on average. The deviation of timing results is good enough to track 100ns-1us level events. And the overhead can safely be ignored for 100us level events (10000 instances/s), for example, a viewstate thrift call.
      
      Test Plan: perf_context_test, also testing with viewstate shadow traffic.
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb, xjin
      
      Differential Revision: https://reviews.facebook.net/D12351
      f2f4c807
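A nanosecond stopwatch, and a way to measure its own cost, might look like the sketch below. It is built on `std::chrono::steady_clock`, which is an assumption (the commit does not say which clock it reads), and `NanoStopwatchSketch` and `AverageReadCostNanos` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stopwatch: starts at construction, reports elapsed ns.
class NanoStopwatchSketch {
 public:
  NanoStopwatchSketch() : start_(Clock::now()) {}

  uint64_t ElapsedNanos() const {
    auto delta = Clock::now() - start_;
    return static_cast<uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(delta).count());
  }

 private:
  using Clock = std::chrono::steady_clock;  // monotonic, never goes backward
  Clock::time_point start_;
};

// Average cost, in nanoseconds, of one start/stop pair (two clock reads),
// measured over `samples` pairs.
double AverageReadCostNanos(int samples) {
  assert(samples > 0);
  NanoStopwatchSketch total;
  uint64_t sink = 0;
  for (int i = 0; i < samples; ++i) {
    sink += NanoStopwatchSketch().ElapsedNanos();
  }
  (void)sink;  // the per-sample results are not needed, only the total time
  return static_cast<double>(total.ElapsedNanos()) / samples;
}
```

The measured cost depends on the machine and clock source, so the figure of about 30ns quoted above should not be expected to reproduce exactly.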
  32. 24 Aug, 2013 1 commit
  33. 20 Jun, 2013 1 commit
  34. 19 Jun, 2013 1 commit
  35. 13 Jun, 2013 2 commits
    • H
      [RocksDB] Sync file to disk incrementally · 778e1790
      Haobo Xu committed
      Summary:
      During compaction, we sync the output files after they are fully written out. This causes unnecessary blocking of the compaction thread and burstiness of the write traffic.
      This diff simply asks the OS to sync data incrementally, in the background, as it is written. The hope is that, at the final sync, most of the data is already on disk and we block less on the sync call. Thus, each compaction runs faster and we can use fewer compaction threads to saturate IO.
      In addition, the write traffic will be smoothed out, hopefully reducing the IO P99 latency too.
      
      Some quick tests show a 10~20% improvement in per-thread compaction throughput. Combined with posix advice on compaction reads, just 5 threads are enough to almost saturate the udb flash bandwidth for the 800-byte write-only benchmark.
      What's more promising is that, with saturated IO, iostat shows the average wait time is actually smoother and much smaller.
      For the 800-byte write-only test:
      Before the change: await oscillates between 10ms and 3ms
      After the change: await ranges 1-3ms
      
      Will test against read-modify-write workload too, see if high read latency P99 could be resolved.
      
      Will introduce a parameter to control the sync interval in a follow up diff after cleaning up EnvOptions.
      
      Test Plan: make check; db_bench; db_stress
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11115
      778e1790
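On Linux, one way to ask the OS to start writing data back in the background is `sync_file_range` with `SYNC_FILE_RANGE_WRITE`. The sketch below assumes that call; the commit message does not name the mechanism. `WriteWithRangeSync` and `RangeSyncDemo` are hypothetical names, and the chunk size stands in for the sync interval that the message says will become a parameter later.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical sketch (Linux only). Writes `total` bytes in pieces of at
// most `chunk` bytes and asks the kernel to start asynchronous writeback of
// each piece, so the final fdatasync() has less left to flush.
// Returns the number of bytes written, or -1 on failure.
long WriteWithRangeSync(int fd, const char* data, size_t total, size_t chunk) {
  assert(chunk > 0);
  size_t done = 0;
  while (done < total) {
    size_t want = (total - done < chunk) ? total - done : chunk;
    ssize_t n = write(fd, data + done, want);
    if (n < 0) {
      if (errno == EINTR) continue;
      return -1;
    }
    // Only a hint in this sketch: the result is ignored because the final
    // fdatasync() below is what guarantees durability.
    (void)sync_file_range(fd, static_cast<off_t>(done), n,
                          SYNC_FILE_RANGE_WRITE);
    done += static_cast<size_t>(n);
  }
  return fdatasync(fd) == 0 ? static_cast<long>(done) : -1;
}

// Usage: write 16 bytes in 4-byte pieces to a temporary file.
long RangeSyncDemo() {
  char path[] = "/tmp/range_sync_sketch_XXXXXX";
  int fd = mkstemp(path);
  if (fd < 0) return -1;
  const char payload[] = "0123456789abcdef";
  long written = WriteWithRangeSync(fd, payload, 16, 4);
  close(fd);
  unlink(path);
  return written;
}
```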
    • H
      [RocksDB] cleanup EnvOptions · bdf10859
      Haobo Xu committed
      Summary:
      This diff simplifies EnvOptions by treating it as POD, similar to Options.
      - virtual functions are removed and member fields are accessed directly.
      - StorageOptions is removed.
      - Options.allow_readahead and Options.allow_readahead_compactions are deprecated.
      - Unused global variables are removed: useOsBuffer, useFsReadAhead, useMmapRead, useMmapWrite
      
      Test Plan: make check; db_stress
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11175
      bdf10859
  36. 31 May, 2013 1 commit
    • H
      [RocksDB] [Performance] Allow different posix advice to be applied to the same table file · ab8d2f6a
      Haobo Xu committed
      Summary:
      The current posix advice implementation ties the access pattern hint to the creation of a file.
      It is not possible to apply different advice for different access patterns (random get vs compaction read)
      without keeping two open files for the same table. This patch extends the RandomAccessFile interface
      to accept a new access hint at any time. In particular, we are able to set different access hints on the same
      table file based on when/how the file is used.
      Two options are added to set the access hint: one for after the file is first opened and one for when the file is being
      compacted.
      
      Test Plan: make check; db_stress; db_bench
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: MarkCallaghan, leveldb
      
      Differential Revision: https://reviews.facebook.net/D10905
      ab8d2f6a
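Changing the advice on an already-open file comes down to calling `posix_fadvise` again on the same descriptor. A minimal sketch under that assumption; `AccessHintSketch`, `ApplyAccessHint`, and `AccessHintDemo` are hypothetical names and not the interface added by this commit.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical hint type for this sketch.
enum class AccessHintSketch { kNormal, kRandom, kSequential };

// Applies `hint` to the whole file behind `fd`. Can be called repeatedly on
// the same descriptor. Returns 0 on success or an error number
// (posix_fadvise returns the error directly instead of setting errno).
int ApplyAccessHint(int fd, AccessHintSketch hint) {
  int advice = POSIX_FADV_NORMAL;
  switch (hint) {
    case AccessHintSketch::kNormal:
      break;
    case AccessHintSketch::kRandom:
      advice = POSIX_FADV_RANDOM;
      break;
    case AccessHintSketch::kSequential:
      advice = POSIX_FADV_SEQUENTIAL;
      break;
  }
  return posix_fadvise(fd, 0, 0, advice);  // length 0 means "to end of file"
}

// Usage: random advice for point lookups, then sequential advice on the
// same descriptor once a compaction-style scan starts.
bool AccessHintDemo() {
  char path[] = "/tmp/fadvise_sketch_XXXXXX";
  int fd = mkstemp(path);
  if (fd < 0) return false;
  bool ok = ApplyAccessHint(fd, AccessHintSketch::kRandom) == 0 &&
            ApplyAccessHint(fd, AccessHintSketch::kSequential) == 0;
  close(fd);
  unlink(path);
  return ok;
}
```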
  37. 22 May, 2013 1 commit
    • V
      [Kill randomly at various points in source code for testing] · 760dd475
      Vamsi Ponnekanti committed
      Summary:
      This is the initial version. A few ways in which this could
      be extended in the future are:
      (a) Killing from more places in source code
      (b) Hashing stack and using that hash in determining whether to crash.
          This is to avoid crashing more often at source lines that are executed
          more often.
      (c) Raising exceptions or returning errors instead of killing
      
      Test Plan:
      This whole thing is for testing.
      
      Here is part of output:
      
      python2.7 tools/db_crashtest2.py -d 600
      Running db_stress
      
      db_stress retncode -15 output LevelDB version     : 1.5
      Number of threads   : 32
      Ops per thread      : 10000000
      Read percentage     : 50
      Write-buffer-size   : 4194304
      Delete percentage   : 30
      Max key             : 1000
      Ratio #ops/#keys    : 320000
      Num times DB reopens: 0
      Batches/snapshots   : 1
      Purge redundant %   : 50
      Num keys per lock   : 4
      Compression         : snappy
      ------------------------------------------------
      No lock creation because test_batches_snapshots set
      2013/04/26-17:55:17  Starting database operations
      Created bg thread 0x7fc1f07ff700
      ... finished 60000 ops
      Running db_stress
      
      db_stress retncode -15 output LevelDB version     : 1.5
      Number of threads   : 32
      Ops per thread      : 10000000
      Read percentage     : 50
      Write-buffer-size   : 4194304
      Delete percentage   : 30
      Max key             : 1000
      Ratio #ops/#keys    : 320000
      Num times DB reopens: 0
      Batches/snapshots   : 1
      Purge redundant %   : 50
      Num keys per lock   : 4
      Compression         : snappy
      ------------------------------------------------
      Created bg thread 0x7ff0137ff700
      No lock creation because test_batches_snapshots set
      2013/04/26-17:56:15  Starting database operations
      ... finished 90000 ops
      
      Revert Plan: OK
      
      Task ID: #2252691
      
      Reviewers: dhruba, emayanke
      
      Reviewed By: emayanke
      
      CC: leveldb, haobo
      
      Differential Revision: https://reviews.facebook.net/D10581
      760dd475
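A random kill point of the kind this commit describes could be sketched as follows. The names `ShouldKill`, `MaybeKill`, and `CountKills` are hypothetical. Sending SIGTERM is an assumption chosen to match the return code -15 in the output above, and the real code may pick its random numbers differently.

```cpp
#include <cassert>
#include <csignal>
#include <random>
#include <unistd.h>

// odds <= 0 disables killing; otherwise about one call in `odds` says yes.
bool ShouldKill(std::mt19937& rng, int odds) {
  if (odds <= 0) return false;
  return std::uniform_int_distribution<int>(0, odds - 1)(rng) == 0;
}

// Hypothetical kill point: placed at interesting source locations, it
// terminates the process with SIGTERM when the dice say so.
void MaybeKill(std::mt19937& rng, int odds) {
  if (ShouldKill(rng, odds)) kill(getpid(), SIGTERM);
}

// Usage: count how often a kill would have fired, without actually dying.
int CountKills(unsigned seed, int odds, int calls) {
  assert(calls >= 0);
  std::mt19937 rng(seed);
  int kills = 0;
  for (int i = 0; i < calls; ++i) {
    if (ShouldKill(rng, odds)) ++kills;
  }
  return kills;
}
```

Keeping the decision (`ShouldKill`) separate from the action (`MaybeKill`) is a choice of this sketch so that the decision can be tested without terminating the test process.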
  38. 23 Apr, 2013 1 commit
    • K
      Avoid global static initialization in Env::Default() · 958b9c80
      Kai Liu committed
      Summary:
      Mark's task description from #2316777
      
      Env::Default() comes from util/env_posix.cc
      
      This is a static global.
      
      static PosixEnv default_env;
      
      Env* Env::Default() {
        return &default_env;
      }
      
      -----
      
      These globals assume default_env was initialized first. I don't think that is safe or correct to do (http://stackoverflow.com/questions/1005685/c-static-initialization-order)
      
      const string AutoRollLoggerTest::kTestDir(
      test::TmpDir() + "/db_log_test");
      const string AutoRollLoggerTest::kLogFile(
      test::TmpDir() + "/db_log_test/LOG");
      Env* AutoRollLoggerTest::env = Env::Default();
      
      Test Plan:
      run make clean && make && make check
      But how can I know if it works on Ubuntu?
      
      Reviewers: MarkCallaghan, chip
      
      Reviewed By: chip
      
      CC: leveldb, dhruba, haobo
      
      Differential Revision: https://reviews.facebook.net/D10491
      958b9c80
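A common way to sidestep the static initialization order problem quoted above is to make the default object a function-local static, which is constructed on first use. The sketch below shows that pattern with hypothetical stand-in classes (`EnvSketch`, `PosixEnvSketch`). Whether the commit uses exactly this form is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-ins for Env and PosixEnv.
class EnvSketch {
 public:
  virtual ~EnvSketch() {}
  virtual const char* Name() const = 0;
  static EnvSketch* Default();
};

class PosixEnvSketch : public EnvSketch {
 public:
  const char* Name() const override { return "posix"; }
};

EnvSketch* EnvSketch::Default() {
  // Constructed on the first call rather than at an unspecified point during
  // static initialization, so early callers always see a fully built object.
  // C++11 also makes this initialization thread-safe.
  static PosixEnvSketch default_env;
  return &default_env;
}

// A global in any translation unit can now depend on Default() safely,
// like AutoRollLoggerTest::env in the commit message.
EnvSketch* const g_env_at_startup = EnvSketch::Default();
```

The trade-off is at shutdown: a function-local static is still destroyed at exit, so code that runs during static destruction should not call into it.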