1. 11 Dec, 2013 1 commit
  2. 02 Dec, 2013 1 commit
    • L
      Fix build without glibc · 45a2f2d8
      lovro committed
      Summary: The preprocessor does not short-circuit && the way ordinary C++ code does: it still tries to evaluate __GLIBC_PREREQ(2, 12) even when the defined() check fails. This breaks the build if __GLIBC_PREREQ is absent.
      
      Test Plan: Add #undef __GLIBC_PREREQ above the offending line and confirm the build no longer breaks.
      
      Reviewed By: igor
      
      Blame Rev: 4c813836
      45a2f2d8
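A minimal sketch of the usual workaround for the problem described in this commit, not necessarily the exact change it made: the function-like macro is tested in a nested #if, so the preprocessor never sees the call when the macro is undefined. The flag name HAVE_GLIBC_2_12 and the helper HaveGlibc212 are hypothetical.

```cpp
// HAVE_GLIBC_2_12 is a hypothetical feature flag invented for this sketch.
#if defined(__GLIBC_PREREQ)
// This inner condition is only evaluated when the macro exists; inside a
// skipped group the preprocessor tracks nesting but evaluates nothing.
#if __GLIBC_PREREQ(2, 12)
#define HAVE_GLIBC_2_12 1
#endif
#endif

#ifndef HAVE_GLIBC_2_12
#define HAVE_GLIBC_2_12 0
#endif

// Returns true when the build saw glibc 2.12 or newer.
inline bool HaveGlibc212() { return HAVE_GLIBC_2_12 != 0; }
```

Writing the two checks on one line with && fails because the whole #if expression must be parsed before it is evaluated, and an undefined function-like macro followed by parentheses is not a valid expression.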
  3. 28 Nov, 2013 1 commit
  4. 17 Nov, 2013 1 commit
  5. 02 Nov, 2013 1 commit
    • D
      Implement a compressed block cache. · b4ad5e89
      Dhruba Borthakur committed
      Summary:
      RocksDB can now support an uncompressed block cache, a compressed
      block cache, or both. A lookup first checks the uncompressed cache;
      only if the block is not found there is it looked up in the
      compressed cache. If it is found in the compressed cache, it is
      uncompressed and inserted into the uncompressed cache.
      
      It is possible that the same block resides in the compressed cache
      as well as the uncompressed cache at the same time. Both caches
      have their own individual LRU policy.
      
      Test Plan: Unit test case attached.
      
      Reviewers: kailiu, sdong, haobo, leveldb
      
      Reviewed By: haobo
      
      CC: xjin, haobo
      
      Differential Revision: https://reviews.facebook.net/D12675
      b4ad5e89
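The lookup order described in this commit can be sketched roughly as follows. This is an illustration under stated assumptions, not RocksDB's code: TwoTierBlockCache, ToyCompress and ToyUncompress are hypothetical names, plain hash maps stand in for the two LRU caches, and the "codec" is a one-byte marker rather than a real compressor.

```cpp
#include <string>
#include <unordered_map>

// Stand-in for a real codec: a "compressed" block is the raw bytes behind
// a one-byte marker. Purely illustrative.
inline std::string ToyCompress(const std::string& raw) { return "Z" + raw; }
inline std::string ToyUncompress(const std::string& packed) {
  return packed.substr(1);
}

// Hypothetical two-tier cache. The real caches each have their own LRU
// policy; eviction is omitted here.
struct TwoTierBlockCache {
  std::unordered_map<std::string, std::string> uncompressed;
  std::unordered_map<std::string, std::string> compressed;

  // Returns true and fills *out when the block is found in either tier.
  bool Lookup(const std::string& key, std::string* out) {
    auto hit = uncompressed.find(key);
    if (hit != uncompressed.end()) {
      *out = hit->second;
      return true;
    }
    auto packed = compressed.find(key);
    if (packed == compressed.end()) return false;  // miss in both tiers
    *out = ToyUncompress(packed->second);
    uncompressed[key] = *out;  // promote; the compressed copy stays too
    return true;
  }
};
```

After a hit in the compressed tier the block lives in both maps, which matches the commit's note that the same block may reside in both caches at once.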
  6. 01 Nov, 2013 1 commit
  7. 23 Oct, 2013 1 commit
    • M
      Dbid feature · 9b50106f
      Mayank Agarwal committed
      Summary:
      Create a new type of file on startup if it doesn't already exist called DBID.
      This will store a unique number generated from boost library's uuid header file.
      The use-case is to identify the case of a db losing all its data and coming back up either empty or from an image(backup/live replica's recovery)
      the key point to note is that DBID is not stored in a backup or db snapshot
      It's preferable to use Boost for uuid because:
      1) A non-standard way of generating uuid is not good
      2) /proc/sys/kernel/random/uuid generates a uuid but only on linux environments and the solution would not be clean
      3) c++ doesn't have any direct way to get a uuid
      4) Boost is a very good library that was already having linkage in rocksdb from third-party
      Note: I had to update the TOOLCHAIN_REV in build files to get the latest version of boost from third-party, as the older version had a bug.
      I had to put Wno-uninitialized in the Makefile because boost-1.51 has an uninitialized variable and rocksdb would not compile otherwise. The latest open-source release of boost is 1.54, but it is not in third-party. I have notified the concerned people in fbcode about it.
      @kailiu : While releasing to third-party, an additional dependency will need to be created for boost in TARGETS file. I can help identify.
      
      Test Plan:
      Expand db_test to test 2 cases
      1) Restarting db with Id file present - verify that no change to Id
      2)Restarting db with Id file deleted - verify that a different Id is there after reopen
      Also run make all check
      
      Reviewers: dhruba, haobo, kailiu, sdong
      
      Reviewed By: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13587
      9b50106f
  8. 17 Oct, 2013 1 commit
  9. 10 Oct, 2013 1 commit
    • I
      Env class that can randomly read and write · d0beadd4
      Igor Canadi committed
      Summary: I have implemented the basic use case that I need for the External Value Store I'm working on. There is potential for making this prettier by refactoring/combining WritableFile and RandomAccessFile, avoiding some copypasta. However, I decided to implement just the basic functionality, so I can continue working on the other diff.
      
      Test Plan: Added a unittest
      
      Reviewers: dhruba, haobo, kailiu
      
      Reviewed By: haobo
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D13365
      d0beadd4
  10. 06 Oct, 2013 1 commit
  11. 05 Oct, 2013 1 commit
  12. 24 Sep, 2013 1 commit
  13. 16 Sep, 2013 2 commits
  14. 13 Sep, 2013 1 commit
    • H
      [RocksDB] Enhance Env to support two thread pools LOW and HIGH · 1565dab8
      Haobo Xu committed
      Summary:
      This is the groundwork for separating memtable flush jobs into their own thread pool.
      Both SetBackgroundThreads and Schedule take a third parameter Priority to indicate which thread pool they are working on. The names LOW and HIGH are just identifiers for two different thread pools and do not indicate a real difference in 'priority'. We can set the number of threads in each pool independently.
      The thread pool implementation is refactored.
      
      Test Plan: make check
      
      Reviewers: dhruba, emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D12885
      1565dab8
  15. 08 Sep, 2013 1 commit
    • H
      [RocksDB] Added nano second stopwatch and new perf counters to track block read cost · f2f4c807
      Haobo Xu committed
      Summary: The purpose of this diff is to expose precise per-user-call timing of block reads, so that we can answer questions like: a Get() costs me 100ms, is that somehow related to loading blocks from the file system, or something else? We will answer that with EXACTLY how many blocks have been read, how much time was spent on transferring the bytes from the OS, how much time was spent on checksum verification and how much time was spent on block decompression, just for that one Get. A nanosecond stopwatch was introduced to track time with higher precision. The cost/precision of the stopwatch is also measured in the unit test. On my dev box, retrieving one time instance costs about 30ns on average. The deviation of timing results is good enough to track 100ns-1us level events. And the overhead could be safely ignored for 100us level events (10000 instances/s), for example, a viewstate thrift call.
      
      Test Plan: perf_context_test, also testing with viewstate shadow traffic.
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb, xjin
      
      Differential Revision: https://reviews.facebook.net/D12351
      f2f4c807
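A nanosecond stopwatch of the kind this commit describes could look roughly like the sketch below. It is built on std::chrono::steady_clock as an assumption; the commit does not say which clock source it used, and the names NanoStopWatch and AverageClockCostNanos are hypothetical.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stopwatch: records a start point on construction and
// reports the nanoseconds elapsed since then.
class NanoStopWatch {
  using Clock = std::chrono::steady_clock;

 public:
  NanoStopWatch() : start_(Clock::now()) {}

  uint64_t ElapsedNanos() const {
    return static_cast<uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(Clock::now() -
                                                             start_)
            .count());
  }

 private:
  Clock::time_point start_;
};

// Estimates the average cost of one clock read by timing many of them,
// similar in spirit to the measurement mentioned in the commit message.
inline uint64_t AverageClockCostNanos(int samples) {
  NanoStopWatch outer;
  for (int i = 0; i < samples; ++i) {
    NanoStopWatch inner;  // one clock read per iteration
    (void)inner;
  }
  return samples > 0 ? outer.ElapsedNanos() / static_cast<uint64_t>(samples)
                     : 0;
}
```

Wrapping a block read, a checksum check or a decompression call between construction and ElapsedNanos() gives the per-call breakdown the summary talks about; the measured cost of a clock read tells you which events are too short to time reliably.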
  16. 24 Aug, 2013 1 commit
  17. 20 Jun, 2013 1 commit
  18. 19 Jun, 2013 1 commit
  19. 13 Jun, 2013 2 commits
    • H
      [RocksDB] Sync file to disk incrementally · 778e1790
      Haobo Xu committed
      Summary:
      During compaction, we sync the output files after they are fully written out. This causes unnecessary blocking of the compaction thread and burstiness of the write traffic.
      This diff simply asks the OS to sync data incrementally, in the background, as it is written. The hope is that, at the final sync, most of the data is already on disk and we block less on the sync call. Thus, each compaction runs faster and we can use fewer compaction threads to saturate IO.
      In addition, the write traffic will be smoothed out, hopefully reducing the IO P99 latency too.
      
      Some quick tests show 10~20% improvement in per thread compaction throughput. Combined with posix advice on compaction read, just 5 threads are enough to almost saturate the udb flash bandwidth for 800 bytes write only benchmark.
      What's more promising is that, with saturated IO, iostat shows average wait time is actually smoother and much smaller.
      For the write-only 800-byte test:
      Before the change: await oscillates between 10ms and 3ms
      After the change: await ranges from 1ms to 3ms
      
      Will test against read-modify-write workload too, see if high read latency P99 could be resolved.
      
      Will introduce a parameter to control the sync interval in a follow up diff after cleaning up EnvOptions.
      
      Test Plan: make check; db_bench; db_stress
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11115
      778e1790
    • H
      [RocksDB] cleanup EnvOptions · bdf10859
      Haobo Xu committed
      Summary:
      This diff simplifies EnvOptions by treating it as POD, similar to Options.
      - virtual functions are removed and member fields are accessed directly.
      - StorageOptions is removed.
      - Options.allow_readahead and Options.allow_readahead_compactions are deprecated.
      - Unused global variables are removed: useOsBuffer, useFsReadAhead, useMmapRead, useMmapWrite
      
      Test Plan: make check; db_stress
      
      Reviewers: dhruba
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D11175
      bdf10859
  20. 31 May, 2013 1 commit
    • H
      [RocksDB] [Performance] Allow different posix advice to be applied to the same table file · ab8d2f6a
      Haobo Xu committed
      Summary:
      Current posix advice implementation ties up the access pattern hint with the creation of a file.
      It is not possible to apply different advice for different access (random get vs compaction read),
      without keeping two open files for the same table. This patch extends the RandomAccessFile interface
      to accept a new access hint at any time. In particular, we are able to set a different access hint on the same
      table file based on when/how the file is used.
      Two options are added to set the access hint, after the file is first opened and after the file is being
      compacted.
      
      Test Plan: make check; db_stress; db_bench
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: MarkCallaghan, leveldb
      
      Differential Revision: https://reviews.facebook.net/D10905
      ab8d2f6a
  21. 22 May, 2013 1 commit
    • V
      [Kill randomly at various points in source code for testing] · 760dd475
      Vamsi Ponnekanti committed
      Summary:
      This is initial version. A few ways in which this could
      be extended in the future are:
      (a) Killing from more places in source code
      (b) Hashing stack and using that hash in determining whether to crash.
          This is to avoid crashing more often at source lines that are executed
          more often.
      (c) Raising exceptions or returning errors instead of killing
      
      Test Plan:
      This whole thing is for testing.
      
      Here is part of output:
      
      python2.7 tools/db_crashtest2.py -d 600
      Running db_stress
      
      db_stress retncode -15 output LevelDB version     : 1.5
      Number of threads   : 32
      Ops per thread      : 10000000
      Read percentage     : 50
      Write-buffer-size   : 4194304
      Delete percentage   : 30
      Max key             : 1000
      Ratio #ops/#keys    : 320000
      Num times DB reopens: 0
      Batches/snapshots   : 1
      Purge redundant %   : 50
      Num keys per lock   : 4
      Compression         : snappy
      ------------------------------------------------
      No lock creation because test_batches_snapshots set
      2013/04/26-17:55:17  Starting database operations
      Created bg thread 0x7fc1f07ff700
      ... finished 60000 ops
      Running db_stress
      
      db_stress retncode -15 output LevelDB version     : 1.5
      Number of threads   : 32
      Ops per thread      : 10000000
      Read percentage     : 50
      Write-buffer-size   : 4194304
      Delete percentage   : 30
      Max key             : 1000
      Ratio #ops/#keys    : 320000
      Num times DB reopens: 0
      Batches/snapshots   : 1
      Purge redundant %   : 50
      Num keys per lock   : 4
      Compression         : snappy
      ------------------------------------------------
      Created bg thread 0x7ff0137ff700
      No lock creation because test_batches_snapshots set
      2013/04/26-17:56:15  Starting database operations
      ... finished 90000 ops
      
      Revert Plan: OK
      
      Task ID: #2252691
      
      Reviewers: dhruba, emayanke
      
      Reviewed By: emayanke
      
      CC: leveldb, haobo
      
      Differential Revision: https://reviews.facebook.net/D10581
      760dd475
  22. 23 Apr, 2013 2 commits
    • K
      Avoid global static initialization in Env::Default() · 958b9c80
      Kai Liu committed
      Summary:
      Mark's task description from #2316777
      
      Env::Default() comes from util/env_posix.cc
      
      This is a static global.
      
      static PosixEnv default_env;
      
      Env* Env::Default() {
        return &default_env;
      }
      
      -----
      
      These globals assume default_env was initialized first. I don't think that is safe or correct to do (http://stackoverflow.com/questions/1005685/c-static-initialization-order)
      
      const string AutoRollLoggerTest::kTestDir(
      test::TmpDir() + "/db_log_test");
      const string AutoRollLoggerTest::kLogFile(
      test::TmpDir() + "/db_log_test/LOG");
      Env* AutoRollLoggerTest::env = Env::Default();
      
      Test Plan:
      run make clean && make && make check
      But how can I know if it works in Ubuntu?
      
      Reviewers: MarkCallaghan, chip
      
      Reviewed By: chip
      
      CC: leveldb, dhruba, haobo
      
      Differential Revision: https://reviews.facebook.net/D10491
      958b9c80
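One common way to sidestep the static initialization order problem quoted in this commit is to construct the object on first use inside the accessor. The sketch below shows that technique with a function-local static; it is not necessarily how the commit fixed it. ToyEnv and kTestDir are hypothetical stand-ins for Env and the test constants.

```cpp
#include <string>

// Hypothetical stand-in for Env.
class ToyEnv {
 public:
  // The instance is constructed the first time Default() is called, so it
  // exists even when another translation unit's static initializer runs
  // first. Since C++11 this initialization is also thread-safe.
  static ToyEnv* Default() {
    static ToyEnv instance;
    return &instance;
  }

  std::string TmpDir() const { return "/tmp"; }

 private:
  ToyEnv() {}
};

// A namespace-scope static that depends on the environment during startup,
// like the AutoRollLoggerTest members in the commit message. It is safe
// here because Default() builds the instance on demand.
static const std::string kTestDir = ToyEnv::Default()->TmpDir() + "/db_log_test";
```

With a plain namespace-scope `static PosixEnv default_env;`, the same dependent initializer could run before default_env was constructed if the two lived in different source files.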
    • D
      Initialize parameters in the constructor. · 3cb7bf81
      Dhruba Borthakur committed
      Summary:
      RocksDB doesn't build on an Ubuntu VM; this patch should fix it.
      
      g++ --version
      g++ (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
      
      util/env_posix.cc:68:24: sorry, unimplemented: non-static data member initializers
      util/env_posix.cc:68:24: error: ISO C++ forbids in-class initialization of non-const static member ‘use_os_buffer’
      util/env_posix.cc:113:24: sorry, unimplemented: non-static data member initializers
      util/env_posix.cc:113:24: error: ISO C++ forbids in-class initialization of non-const static member ‘use_os_buffer’
      
      Test Plan: make check
      
      Reviewers: sheki, leveldb
      
      Reviewed By: sheki
      
      Differential Revision: https://reviews.facebook.net/D10461
      3cb7bf81
  23. 11 Apr, 2013 2 commits
    • M
      Exit and Join the background compaction threads while running rocksdb tests · 6594fef7
      Mayank Agarwal committed
      Summary:
      The background compaction threads never exited and therefore caused
      memory leaks while running rocksdb tests. Changed the PosixEnv destructor to exit and join them, and changed the tests accordingly.
      The memory leaked has been reduced from 320 bytes to 64 bytes in all the tests. The 64
      bytes are related to
      pthread_exit, but we still have to figure out why. The stack trace right now with
      table_test.cc = 64 bytes in 1 blocks are possibly lost in loss record 4 of 5
         at 0x475D8C: malloc (jemalloc.c:914)
         by 0x400D69E: _dl_map_object_deps (dl-deps.c:505)
         by 0x4013393: dl_open_worker (dl-open.c:263)
         by 0x400F015: _dl_catch_error (dl-error.c:178)
         by 0x4013B2B: _dl_open (dl-open.c:569)
         by 0x5D3E913: do_dlopen (dl-libc.c:86)
         by 0x400F015: _dl_catch_error (dl-error.c:178)
         by 0x5D3E9D6: __libc_dlopen_mode (dl-libc.c:47)
         by 0x5048BF3: pthread_cancel_init (unwind-forcedunwind.c:53)
         by 0x5048DC9: _Unwind_ForcedUnwind (unwind-forcedunwind.c:126)
         by 0x5046D9F: __pthread_unwind (unwind.c:130)
         by 0x50413A4: pthread_exit (pthreadP.h:289)
      
      Test Plan: make all check
      
      Reviewers: dhruba, sheki, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb, chip
      
      Differential Revision: https://reviews.facebook.net/D9573
      6594fef7
    • H
      Set FD_CLOEXEC after each file open · e21ba94a
      heyongqiang committed
      Summary: As subject. This is causing a problem in adsconv. Ideally, this flag should be set in open, but that is only supported in Linux kernel ≥2.6.23 and glibc ≥2.7.
      
      Test Plan:
      db_test
      
      run db_test
      
      Reviewers: dhruba, MarkCallaghan, haobo
      
      Reviewed By: dhruba
      
      CC: leveldb, chip
      
      Differential Revision: https://reviews.facebook.net/D10089
      e21ba94a
  24. 10 Apr, 2013 1 commit
  25. 03 Apr, 2013 1 commit
    • H
      [RocksDB] env_posix cleanup · d8150821
      Haobo Xu committed
      Summary:
      1. SetBackgroundThreads was not thread safe
      2. queue_size_ does not seem necessary
      3. moved condition signal after shared state change. Even though the original
         order is in practice ok (because the mutex is still held), it looks fishy
         and non-intuitive.
      
      Test Plan: make check
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: leveldb, zshao
      
      Differential Revision: https://reviews.facebook.net/D9825
      d8150821
  26. 22 Mar, 2013 1 commit
  27. 21 Mar, 2013 1 commit
    • D
      Ability to configure bufferedio-reads, filesystem-readaheads and mmap-read-write per database. · ad96563b
      Dhruba Borthakur committed
      Summary:
      This patch allows an application to specify whether to use bufferedio,
      reads-via-mmaps and writes-via-mmaps per database. Earlier, there
      was a global static variable that was used to configure this functionality.
      
      The default setting remains the same (and is backward compatible):
       1. use bufferedio
       2. do not use mmaps for reads
       3. use mmap for writes
       4. use readaheads for reads needed for compaction
      
      I also added a parameter to db_bench to be able to explicitly specify
      whether to do readaheads for compactions or not.
      
      Test Plan: make check
      
      Reviewers: sheki, heyongqiang, MarkCallaghan
      
      Reviewed By: sheki
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D9429
      ad96563b
  28. 14 Mar, 2013 1 commit
    • A
      Use posix_fallocate as default. · 1ba5abca
      Abhishek Kona committed
      Summary:
      Ftruncate does not throw an error on disk-full. This causes Sig-bus in
      the case where the database tries to issue a Put call on a full-disk.
      
      Use posix_fallocate for allocation instead of truncate.
      Add a check to use MMaped files only on ext4, xfs and tmpfs, as
      posix_fallocate is very slow on ext3 and older.
      
      Test Plan: make all check
      
      Reviewers: dhruba, chip
      
      Reviewed By: dhruba
      
      CC: adsharma, leveldb
      
      Differential Revision: https://reviews.facebook.net/D9291
      1ba5abca
  29. 01 Mar, 2013 1 commit
  30. 01 Feb, 2013 1 commit
    • K
      Fixed cache key for block cache · 4dcc0c89
      Kosie van der Merwe committed
      Summary:
      Added a function to `RandomAccessFile` to generate a unique ID for that file. Currently only `PosixRandomAccessFile` has this behaviour implemented, and only on Linux.
      
      Changed how key is generated in `Table::BlockReader`.
      
      Added tests to check whether the unique ID is stable, unique and not a prefix of another unique ID. Added tests to see that `Table` uses the cache more efficiently.
      
      Test Plan: make check
      
      Reviewers: chip, vamsi, dhruba
      
      Reviewed By: chip
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D8145
      4dcc0c89
  31. 29 Jan, 2013 1 commit
  32. 25 Jan, 2013 1 commit
    • C
      Use fallocate to prevent excessive allocation of sst files and logs · 3dafdfb2
      Chip Turner committed
      Summary:
      On some filesystems, pre-allocation can be a considerable
      amount of space.  xfs in our production environment pre-allocates by
      1GB, for instance.  By using fallocate to inform the kernel of our
      expected file sizes, we eliminate this wastage (that isn't recovered
      until the file is closed which, in the case of LOG files, can be a
      considerable amount of time).
      
      Test Plan:
      created an xfs loopback filesystem, mounted with
      allocsize=4M, and ran db_stress.  LOG file without this change was 4M,
      and with it it was 128k then grew to normal size.
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: adsharma, leveldb
      
      Differential Revision: https://reviews.facebook.net/D7953
      3dafdfb2
  33. 24 Jan, 2013 1 commit
    • C
      Fix a number of object lifetime/ownership issues · 2fdf91a4
      Chip Turner committed
      Summary:
      Replace manual memory management with std::unique_ptr in a
      number of places; not exhaustive, but this fixes a few leaks with file
      handles as well as clarifies semantics of the ownership of file handles
      with log classes.
      
      Test Plan: db_stress, make check
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      CC: zshao, leveldb, heyongqiang
      
      Differential Revision: https://reviews.facebook.net/D8043
      2fdf91a4
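The ownership pattern this commit moves toward can be illustrated with a small sketch: a file handle held in a std::unique_ptr with a custom deleter, handed to a log writer that then owns it. UniqueFile, FileCloser and ToyLogWriter are hypothetical names; RocksDB's own classes differ.

```cpp
#include <cstdio>
#include <memory>
#include <utility>

// Deleter that closes the stream when the owning pointer goes away.
struct FileCloser {
  void operator()(std::FILE* f) const {
    if (f != nullptr) std::fclose(f);
  }
};

using UniqueFile = std::unique_ptr<std::FILE, FileCloser>;

// Hypothetical log writer. Taking the handle by value makes the transfer
// of ownership explicit at the call site (the caller must std::move).
class ToyLogWriter {
 public:
  explicit ToyLogWriter(UniqueFile file) : file_(std::move(file)) {}

  bool Append(const char* line) {
    return std::fputs(line, file_.get()) >= 0;
  }

 private:
  UniqueFile file_;  // closed automatically when the writer is destroyed
};
```

With raw pointers, every early return and error path has to remember to close the handle; here the destructor does it, and the type signature documents who owns the file.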
  34. 16 Jan, 2013 1 commit
    • C
      Add optional clang compile mode · a2dcd79c
      Chip Turner committed
      Summary:
      clang is an alternate compiler based on llvm.  It produces
      nicer error messages and finds some bugs that gcc doesn't, such as the
      size_t change in this file (which caused some write return values to be
      misinterpreted!)
      
      Clang isn't the default; to try it, do "USE_CLANG=1 make" or "export
      USE_CLANG=1" then make as normal
      
      Test Plan: "make check" and "USE_CLANG=1 make check"
      
      Reviewers: dhruba
      
      Reviewed By: dhruba
      
      Differential Revision: https://reviews.facebook.net/D7899
      a2dcd79c
  35. 08 Jan, 2013 1 commit
    • K
      Added clearer error message for failure to create db directory in DBImpl::Recover() · d6e873f2
      Kosie van der Merwe committed
      Summary:
      Changed CreateDir() to CreateDirIfMissing() so a directory that already exists now causes an error.
      
      Fixed CreateDirIfMissing() and added Env.DirExists()
      
      Test Plan:
      make check to test for regressions
      
      Ran the following to test if the error message is not about lock files not existing
      ./db_bench --db=dir/testdb
      
      After creating a file "testdb", ran the following to see if it failed with sane error message:
      ./db_bench --db=testdb
      
      Reviewers: dhruba, emayanke, vamsi, sheki
      
      Reviewed By: emayanke
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D7707
      d6e873f2
  36. 10 Dec, 2012 1 commit
    • D
      Fix a race condition while processing tasks by background threads. · 38671c4d
      Dhruba Borthakur committed
      Summary:
      Suppose you submit 100 background tasks one after another. The first
      enqueued task finds that the queue is empty and wakes up one worker thread.
      Now suppose that all remaining 99 work items are enqueued, they do not
      wake up any worker threads because the queue is already non-empty.
      This causes a situation when there are 99 tasks in the task queue but
      only one worker thread is processing a task while the remaining
      worker threads are waiting.
      The fix is to always wake up one worker thread when enqueuing a task.
      
      I also added a check to count the number of elements in the queue
      to help in debugging.
      
      Test Plan: make clean check.
      
      Reviewers: chip
      
      Reviewed By: chip
      
      CC: leveldb
      
      Differential Revision: https://reviews.facebook.net/D7203
      38671c4d