Commits · f62fbd2c8575f6f14dabe1567f99a62a8ebe07c0 · kvdb / rocksdb

28 May, 2016 · 1 commit

Handle overflow case of rate limiter's parameters · f62fbd2c

Authored by sdong on May 27, 2016

Summary: When rate_bytes_per_sec * refill_period_us_ overflows, the effective rate limit becomes very low. Handle this case so that the rate limit stays large.
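A minimal sketch of such an overflow guard, assuming the refill amount per period is computed as rate_bytes_per_sec * refill_period_us / 1,000,000 and that both inputs are positive. `RefillBytesPerPeriod` is a hypothetical name; this is not necessarily the committed code.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: bytes a rate limiter may refill in each period.
// rate_bytes_per_sec * refill_period_us can exceed INT64_MAX for very large
// rates. Signed overflow is undefined behavior and in practice tends to wrap
// to a small or negative value, which throttles I/O instead of leaving it
// effectively unlimited.
int64_t RefillBytesPerPeriod(int64_t rate_bytes_per_sec,
                             int64_t refill_period_us) {
  constexpr int64_t kMicrosPerSecond = 1000000;
  constexpr int64_t kMax = std::numeric_limits<int64_t>::max();
  // Assumes both inputs are positive; detect overflow before multiplying.
  if (rate_bytes_per_sec > kMax / refill_period_us) {
    // Not exact, but large enough to behave as "practically unlimited".
    return kMax / kMicrosPerSecond;
  }
  return rate_bytes_per_sec * refill_period_us / kMicrosPerSecond;
}
```

Clamping to a large value, rather than reporting an error, keeps a caller that passes a huge rate (for example, to mean "no limit") working as intended.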

Test Plan: Add a unit test for it.

Reviewers: IslamAbdelRahman, andrewkr

Reviewed By: andrewkr

Subscribers: yiwu, lightmark, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D58929

f62fbd2c

20 May, 2016 · 2 commits

Eliminate use of 'using namespace std'. Also remove a number of ADL references to std functions. · 2073cf37

Authored by Aaron Orenstein on May 20, 2016

Summary: Reduce use of argument-dependent name lookup in RocksDB.
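A small illustration of the kind of change involved (the functions below are hypothetical, not taken from the diff): with `using namespace std` removed, calls to standard functions are written with explicit `std::` qualification rather than left to argument-dependent lookup (ADL).

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Without "using namespace std", an unqualified call such as
//   swap(a, b);
// still compiles when a and b are std:: types, because ADL searches
// namespace std. Spelling out std:: makes the intended function explicit.
void SwapBuffers(std::vector<char>& a, std::vector<char>& b) {
  std::swap(a, b);  // explicit qualification instead of relying on ADL
}

// For built-in types ADL finds nothing, so an unqualified max(x, y) compiles
// only if a using-directive or using-declaration brings std::max into scope.
// The qualified form works either way.
int Larger(int x, int y) { return std::max(x, y); }
```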

Test Plan: 'make check' passed.

Reviewers: andrewkr

Reviewed By: andrewkr

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D58203

2073cf37

Split WinEnv into separate classes. (#1128) · 26adaad4

Authored by Dmitri Smirnov on May 19, 2016

For ease of reuse and customization as a library
  without wrapping.
  WinEnvThreads is a class intended for replacement.
  WinEnvIO is a class intended for reuse and behavior override.
  Added private virtual functions so that the I/O classes can
  customize fallocate and pread.
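The override pattern described above (a non-virtual public entry point that delegates to a private virtual hook) can be sketched as follows. The class and method names are hypothetical and are not the actual WinEnvIO interface; an in-memory subclass replaces only the positioned read.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical base I/O class: the public entry point is non-virtual and
// delegates to a private virtual hook, so a subclass can change how the
// positioned read is performed without re-implementing the public API.
class FileReaderSketch {
 public:
  virtual ~FileReaderSketch() = default;

  // Public and non-virtual: shared bookkeeping lives here.
  std::string Read(uint64_t offset, size_t n) {
    ++calls_;
    return PositionedReadImpl(offset, n);
  }
  int calls() const { return calls_; }

 private:
  // Customization point. A private virtual can still be overridden in C++;
  // it just cannot be called from outside the class.
  virtual std::string PositionedReadImpl(uint64_t /*offset*/, size_t n) {
    return std::string(n, '\0');  // default: pretend the file is all zeros
  }
  int calls_ = 0;
};

// A user-supplied subclass that overrides only the hook.
class InMemoryReader : public FileReaderSketch {
 public:
  explicit InMemoryReader(std::string data) : data_(std::move(data)) {}

 private:
  std::string PositionedReadImpl(uint64_t offset, size_t n) override {
    if (offset >= data_.size()) return std::string();
    return data_.substr(offset, n);
  }
  std::string data_;
};
```

The public `Read` keeps common logic in one place, while subclasses only supply the platform-specific part.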

26adaad4

17 May, 2016 · 1 commit

Fix build issue. (#1123) · bac3be7c

Authored by Dmitri Smirnov on May 16, 2016

Implement GetUniqueIdFromFile to support new tests and the feature.

bac3be7c

13 May, 2016 · 1 commit

Use generic threadpool for Windows environment (#1120) · aab91b8d

Authored by Dmitri Smirnov on May 12, 2016

Conditionally retrofit thread_posix for use with std::thread
  and reuse the same logic. Posix users continue using Posix interfaces.
  Enable XPRESS compression in test runs.
  Fix a signed/unsigned mismatch introduced on master.

aab91b8d

30 Apr, 2016 · 1 commit

Fix multiple issues with WinMmapFile for sequential writing (#1108) · 4ea6e051

Authored by Dmitri Smirnov on Apr 29, 2016

Make preallocation in line with other writable files.
  Make sure that we map no more than the pre-allocated size.

4ea6e051

29 Apr, 2016 · 1 commit

Revert "Use async file handle for better parallelism (#1049)" (#1105) · e8115cea

Authored by PraveenSinghRao on Apr 28, 2016

This reverts commit b54c3474.

Revert the async file handle change, as it causes failures on AppVeyor.

e8115cea

28 Apr, 2016 · 1 commit

Merge pull request #1101 from flyd1005/wip-fix-typo · 6d4832a9

Authored by Li Peng on Apr 28, 2016

Fix typos and remove duplicated words.

6d4832a9

23 Apr, 2016 · 2 commits

Alpine Linux Build (#990) · b71c4e61

Authored by dx9 on Apr 22, 2016

* Musl libc does not provide adaptive mutex. Added feature test for PTHREAD_MUTEX_ADAPTIVE_NP.

* Musl libc does not provide backtrace(3). Added a feature check for backtrace(3).

* Fixed compiler error.

* Musl libc does not implement backtrace(3). Added platform check for libexecinfo.

* Alpine does not appear to support gcc -pg option. By default (gcc has PIE option enabled) it fails with:

gcc: error: -pie and -pg|p|profile are incompatible when linking

When -fno-PIE and -nopie are used it fails with:

/usr/lib/gcc/x86_64-alpine-linux-musl/5.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find gcrt1.o: No such file or directory

Added gcc -pg platform test and output PROFILING_FLAGS accordingly. Replaced pg var in Makefile with PROFILING_FLAGS.

* Fix a segfault when TEST_IOCTL_FRIENDLY_TMPDIR is undefined and the default candidates are not suitable.

* use ASSERT_DOUBLE_EQ instead of ASSERT_EQ

* When compiled with ROCKSDB_MALLOC_USABLE_SIZE UniversalCompactionFourPaths and UniversalCompactionSecondPathRatio tests fail due to premature memtable flushes on systems with 16-byte alignment. Arena runs out of block space before GenerateNewFile() completes.

Increased options.write_buffer_size.
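One way to handle the first item above is to guard the adaptive mutex type behind a macro set by a configure-time feature test: in glibc, `PTHREAD_MUTEX_ADAPTIVE_NP` is declared as an enumerator rather than a preprocessor macro, so `#ifdef` on it alone cannot detect it. The macro `ROCKSDB_SKETCH_HAVE_ADAPTIVE_MUTEX` and the `InitMutex` helper are hypothetical; the build script would define the macro only when a small test program using the constant compiles.

```cpp
#include <pthread.h>

// Initializes a mutex, requesting the glibc-specific adaptive (spin, then
// block) type only when the build system detected support for it.
// ROCKSDB_SKETCH_HAVE_ADAPTIVE_MUTEX is a hypothetical macro standing in for
// the result of a configure-time check; on musl it would stay undefined and
// the mutex falls back to the default type.
int InitMutex(pthread_mutex_t* mu) {
  pthread_mutexattr_t attr;
  int rc = pthread_mutexattr_init(&attr);
  if (rc != 0) return rc;
#ifdef ROCKSDB_SKETCH_HAVE_ADAPTIVE_MUTEX
  rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ADAPTIVE_NP);
  if (rc != 0) {
    pthread_mutexattr_destroy(&attr);
    return rc;
  }
#endif
  rc = pthread_mutex_init(mu, &attr);
  pthread_mutexattr_destroy(&attr);  // the mutex keeps its own copy
  return rc;
}
```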

b71c4e61

Use async file handle for better parallelism (#1049) · b54c3474

Authored by PraveenSinghRao on Apr 22, 2016

b54c3474

20 Apr, 2016 · 1 commit

Introduce XPRESS compression on Windows. (#1081) · ee221d2d

Authored by Dmitri Smirnov on Apr 19, 2016

Comparable with Snappy in compression ratio.
  Implemented using the Windows API; does not require an external package.
  Available since Windows 8 and Windows Server 2012.
  Use -DXPRESS=1 with CMake to enable it.

ee221d2d

01 Apr, 2016 · 1 commit

Fixed compile warnings in posix_logger.h and coding.h · a558830f

Authored by Yueh-Hsuan Chiang on Mar 31, 2016

Summary:
Fixed the following compile warnings:

/Users/yhchiang/rocksdb/util/posix_logger.h:32:11: error: unused variable 'kDebugLogChunkSize' [-Werror,-Wunused-const-variable]
const int kDebugLogChunkSize = 128 * 1024;
          ^
/Users/yhchiang/rocksdb/util/coding.h:24:20: error: unused variable 'kMaxVarint32Length' [-Werror,-Wunused-const-variable]
const unsigned int kMaxVarint32Length = 5;
                   ^
2 errors generated.

Test Plan: make clean rocksdb

Reviewers: igor, sdong, anthony, IslamAbdelRahman, rven, kradhakrishnan, adamretter

Reviewed By: adamretter

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D56223

a558830f

18 Mar, 2016 · 1 commit

Latest versions of Jemalloc library do not require je_init()/je_uninit() · 2ca0994c

Authored by Dmitri Smirnov on Mar 17, 2016

  calls. Guard them with #ifdef in the source code and make this the default build option.

2ca0994c

20 Feb, 2016 · 1 commit

Implement ConsistentChildrenAttribute · 9ea2968d

Authored by Dmitri Smirnov on Feb 19, 2016

  by using the default implementation for now, as it works.

9ea2968d

18 Feb, 2016 · 1 commit

[build] Fix env_win.cc compiler errors · d733dd57

Authored by Andrew Kryczka on Feb 17, 2016

Summary: I broke it in D53781.

Test Plan: tried the same code in util/env_posix.cc and it compiled successfully

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D54303

d733dd57

10 Feb, 2016 · 2 commits

Updated all copyright headers to the new format. · 21e95811

Authored by Baraa Hamodi on Feb 09, 2016

21e95811

Env function for bulk metadata retrieval · 59b3ee65

Authored by Andrew Kryczka on Feb 09, 2016

Summary:
Added this new function, which returns filename, size, and modified
timestamp for each file in the provided directory. The default implementation
retrieves the metadata sequentially using existing functions. In the next diff
I'll make HdfsEnv override this function to use libhdfs's bulk get function.

This won't work on Windows due to the path separator.
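The sequential default described above can be sketched with POSIX `opendir`/`readdir`/`stat`. The struct and function names are hypothetical (this part of the message does not name them), and, like the original default, the sketch joins paths with '/'.

```cpp
#include <dirent.h>
#include <sys/stat.h>

#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record holding the three fields the commit describes.
struct FileMetadataSketch {
  std::string name;
  uint64_t size_bytes;
  uint64_t modification_time;  // seconds since the Unix epoch
};

// Sequential default: list the directory, then stat each entry one by one.
// Joins paths with '/', so it is POSIX-only. Returns false if the directory
// cannot be opened. Entries that cannot be stat'ed (for example, deleted in
// the meantime) are skipped in this sketch.
bool GetChildrenMetadataSketch(const std::string& dir,
                               std::vector<FileMetadataSketch>* out) {
  out->clear();
  DIR* d = opendir(dir.c_str());
  if (d == nullptr) return false;
  while (dirent* entry = readdir(d)) {
    std::string name = entry->d_name;
    if (name == "." || name == "..") continue;
    struct stat st;
    if (stat((dir + "/" + name).c_str(), &st) != 0) continue;
    out->push_back({name, static_cast<uint64_t>(st.st_size),
                    static_cast<uint64_t>(st.st_mtime)});
  }
  closedir(d);
  return true;
}
```

A bulk-capable backend (the commit mentions HdfsEnv with libhdfs) could replace the per-file `stat` loop with a single listing call that already returns sizes and timestamps.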

Test Plan:
new unit test

  $ ./env_test --gtest_filter=EnvPosixTest.ConsistentChildrenMetadata

Reviewers: yhchiang, sdong

Reviewed By: sdong

Subscribers: IslamAbdelRahman, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D53781

59b3ee65

02 Feb, 2016 · 3 commits

Making use of GetSystemTimePreciseAsFileTime dynamic - code review fixes · 57a95a70

Authored by Tomas Kolda on Jan 27, 2016

57a95a70

Making use of GetSystemTimePreciseAsFileTime dynamic to not · 502d41f1

Authored by Tomas Kolda on Jan 27, 2016

break compatibility with Windows 7. The issue with rotated logs
was fixed another way.

502d41f1

Enable per-request buffer allocation in RandomAccessFile · 36300fbb

Authored by Dmitri Smirnov on Feb 01, 2016

 This change impacts only non-buffered I/O on Windows.
 Currently, there is a buffer per RandomAccessFile
 instance that is protected by a lock. The buffer is
 maintained because non-buffered I/O requires an aligned
 buffer to work.
 XPerf traces demonstrate that we accumulate considerable
 wait time waiting for that lock.
 This change makes it possible to set the random access buffer size
 to zero, which indicates per-request allocation.
 We expect the allocation expense to be much less than
 the I/O cost plus the lock wait time, because the memory heap
 tends to reuse page-aligned allocations, especially with
 Jemalloc.
 This change does not affect the use of the buffer as a read-ahead
 buffer for compaction purposes.
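A rough sketch of the two buffering policies, assuming a 4096-byte alignment and using an in-memory string in place of a real unbuffered file handle. `AlignedReaderSketch` is hypothetical and is not the RocksDB Windows implementation; it only contrasts a shared, lock-protected buffer with per-request aligned allocation.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical reader over an in-memory "file" that mimics two policies:
//  - buffer_size > 0: one shared aligned buffer per instance, behind a lock
//  - buffer_size == 0: a fresh aligned buffer for every request, no lock
class AlignedReaderSketch {
 public:
  static constexpr size_t kAlignment = 4096;  // assumed sector/page alignment

  AlignedReaderSketch(std::string file, size_t buffer_size)
      : file_(std::move(file)), shared_size_(buffer_size) {
    if (shared_size_ > 0) shared_ = AllocateAligned(shared_size_);
  }

  // Copies up to n bytes starting at offset into scratch; returns bytes read.
  size_t Read(uint64_t offset, size_t n, char* scratch) {
    if (shared_size_ == 0) {
      AlignedPtr buf = AllocateAligned(n);  // per request: no contention
      return ReadThrough(buf.get(), offset, n, scratch);
    }
    std::lock_guard<std::mutex> lock(mu_);  // every reader serializes here
    if (n > shared_size_) {                 // grow the shared buffer if needed
      shared_ = AllocateAligned(n);
      shared_size_ = n;
    }
    return ReadThrough(shared_.get(), offset, n, scratch);
  }

 private:
  struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
  };
  using AlignedPtr = std::unique_ptr<char, FreeDeleter>;

  static AlignedPtr AllocateAligned(size_t n) {
    // std::aligned_alloc expects a size that is a multiple of the alignment.
    size_t blocks = std::max<size_t>(1, (n + kAlignment - 1) / kAlignment);
    return AlignedPtr(
        static_cast<char*>(std::aligned_alloc(kAlignment, blocks * kAlignment)));
  }

  // Stands in for an unbuffered OS read into the aligned buffer, followed by
  // a copy into the caller's scratch space.
  size_t ReadThrough(char* buf, uint64_t offset, size_t n, char* scratch) const {
    if (offset >= file_.size()) return 0;
    size_t len = std::min<size_t>(n, file_.size() - offset);
    std::memcpy(buf, file_.data() + offset, len);
    std::memcpy(scratch, buf, len);
    return len;
  }

  std::string file_;
  size_t shared_size_;
  AlignedPtr shared_;
  std::mutex mu_;
};
```

The trade-off matches the reasoning above: the zero-size mode pays an allocation per call but never waits on `mu_`.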

36300fbb

14 Jan, 2016 · 1 commit

Align statistics · ac50fd3a

Authored by Dmitri Smirnov on Jan 13, 2016

Use the Yield macro to make this a little more portable across platforms.

ac50fd3a

05 Jan, 2016 · 1 commit

Fix failing assertion in logger on Windows when the disk is full. · 92d0850f

Authored by Marek Kurdej on Jan 05, 2016

92d0850f

26 Dec, 2015 · 1 commit

support for concurrent adds to memtable · 7d87f027

Authored by Nathan Bronson on Aug 14, 2015

Summary:
This diff adds support for concurrent adds to the skiplist memtable
implementations. Memory allocation is made thread-safe by the addition of
a spinlock, with small per-core buffers to avoid contention. Concurrent
memtable writes are made via an additional method and don't impose a
performance overhead on the non-concurrent case, so parallelism can be
selected on a per-batch basis.

Write thread synchronization is an increasing bottleneck for higher levels
of concurrency, so this diff adds --enable_write_thread_adaptive_yield
(default off). This feature causes threads joining a write batch
group to spin for a short time (default 100 usec) using sched_yield,
rather than going to sleep on a mutex. If the timing of the yield calls
indicates that another thread has actually run during the yield then
spinning is avoided. This option improves performance for concurrent
situations even without parallel adds, although it has the potential to
increase CPU usage (and the heuristic adaptation is not yet mature).
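The adaptive-yield heuristic in the paragraph above can be approximated by the following simplified sketch. `AdaptiveSpinWait` is hypothetical and uses a single "slow yield" check; the heuristic in the commit may differ (it is described as not yet mature). `std::this_thread::yield()` stands in for the `sched_yield` call.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Before blocking on a mutex, spin for up to max_spin calling yield().
// If one yield takes longer than slow_yield, another thread probably ran on
// this core, so spinning is wasting CPU and we stop early.
// Returns true if `ready` became true while spinning; false means the caller
// should fall back to sleeping on its mutex/condition variable.
bool AdaptiveSpinWait(const std::atomic<bool>& ready,
                      std::chrono::microseconds max_spin,
                      std::chrono::microseconds slow_yield) {
  using Clock = std::chrono::steady_clock;
  const auto deadline = Clock::now() + max_spin;
  while (Clock::now() < deadline) {
    if (ready.load(std::memory_order_acquire)) return true;
    const auto before = Clock::now();
    std::this_thread::yield();
    if (Clock::now() - before > slow_yield) break;  // the yield really slept
  }
  return ready.load(std::memory_order_acquire);
}
```

With the commit's default, `max_spin` would be 100 microseconds; `slow_yield` is an extra parameter of this sketch.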

Parallel writes are not currently compatible with
inplace updates, update callbacks, or delete filtering.
Enable it with --allow_concurrent_memtable_write (and
--enable_write_thread_adaptive_yield). Parallel memtable writes
are performance neutral when there is no actual parallelism, and in
my experiments (SSD server-class Linux and varying contention and key
sizes for fillrandom) they are always a performance win when there is
more than one thread.

Statistics are updated earlier in the write path, dropping the number
of DB mutex acquisitions from 2 to 1 for almost all cases.

This diff was motivated and inspired by Yahoo's cLSM work. It is more
conservative than cLSM: RocksDB's write batch group leader role is
preserved (along with all of the existing flush and write throttling
logic) and concurrent writers are blocked until all memtable insertions
have completed and the sequence number has been advanced, to preserve
linearizability.

My test config is "db_bench -benchmarks=fillrandom -threads=$T
-batch_size=1 -memtablerep=skip_list -value_size=100 --num=1000000/$T
-level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999
-disable_auto_compactions --max_write_buffer_number=8
-max_background_flushes=8 --disable_wal --write_buffer_size=160000000
--block_size=16384 --allow_concurrent_memtable_write" on a two-socket
Xeon E5-2660 @ 2.2GHz with lots of memory and an SSD hard drive. With 1
thread I get ~440Kops/sec. Peak performance for 1 socket (numactl
-N1) is slightly more than 1Mops/sec, at 16 threads. Peak performance
across both sockets happens at 30 threads, and is ~900Kops/sec, although
with fewer threads there is less performance loss when the system has
background work.

Test Plan:
1. concurrent stress tests for InlineSkipList and DynamicBloom
2. make clean; make check
3. make clean; DISABLE_JEMALLOC=1 make valgrind_check; valgrind db_bench
4. make clean; COMPILE_WITH_TSAN=1 make all check; db_bench
5. make clean; COMPILE_WITH_ASAN=1 make all check; db_bench
6. make clean; OPT=-DROCKSDB_LITE make check
7. verify no perf regressions when disabled

Reviewers: igor, sdong

Reviewed By: sdong

Subscribers: MarkCallaghan, IslamAbdelRahman, anthony, yhchiang, rven, sdong, guyg8, kradhakrishnan, dhruba

Differential Revision: https://reviews.facebook.net/D50589

7d87f027

15 Dec, 2015 · 1 commit

Running manual compactions in parallel with other automatic or manual... · 030215bf

Authored by Venkatesh Radhakrishnan on Dec 14, 2015

Running manual compactions in parallel with other automatic or manual compactions in restricted cases

Summary:
This diff provides a framework for running manual compactions in parallel with other compactions. We now keep a deque of manual compactions. We also pass manual compactions as an argument from RunManualCompactions down to BackgroundCompactions, so that RunManualCompactions can be reentrant.
Parallelism is controlled by the two routines
ConflictingManualCompaction to allow/disallow new parallel/manual
compactions based on already existing ManualCompactions. In this diff, by default, manual compactions still have to run exclusively of other compactions. However, by setting the compaction option exclusive_manual_compaction to false, it is possible to run other compactions in parallel with a manual compaction. We are still restricted to one manual compaction per column family at a time. All of these restrictions will be relaxed in future diffs.
I will be adding more tests later.
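The restrictions stated above can be modeled as a small admission check. This is a hypothetical sketch of the two stated rules only, not the implementation of ConflictingManualCompaction; `RunningCompaction` and `ManualCompactionConflicts` are invented names.

```cpp
#include <vector>

// Hypothetical view of a compaction that is already running.
struct RunningCompaction {
  int column_family_id;
  bool is_manual;
  bool exclusive;  // only meaningful for manual compactions
};

// Returns true if a new manual compaction on cf_id must wait. Rules modeled:
//  1. an exclusive manual compaction cannot run alongside any other
//     compaction (exclusive_manual_compaction == true, the default);
//  2. at most one manual compaction per column family at a time.
bool ManualCompactionConflicts(const std::vector<RunningCompaction>& running,
                               int cf_id, bool exclusive) {
  for (const RunningCompaction& c : running) {
    if (exclusive) return true;                   // new one needs the DB alone
    if (c.is_manual && c.exclusive) return true;  // an exclusive one is running
    if (c.is_manual && c.column_family_id == cf_id) return true;  // one per CF
  }
  return false;
}
```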

Test Plan: Rocksdb regression + new tests + valgrind

Reviewers: igor, anthony, IslamAbdelRahman, kradhakrishnan, yhchiang, sdong

Reviewed By: sdong

Subscribers: yoshinorim, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47973

030215bf

12 Dec, 2015 · 1 commit

Enable MS compiler warning c4244. · 236fe21c

Authored by Dmitri Smirnov on Nov 19, 2015

  Mostly because the sizes of int and long differ
  between 64-bit Windows and GNU systems.

236fe21c

11 Dec, 2015 · 1 commit

fix typos in comments · c30b4995

Authored by charsyam on Dec 11, 2015

c30b4995

09 Dec, 2015 · 1 commit

Fix up VS 15 build. · 78de0c92

Authored by yuslepukhin on Dec 08, 2015

 Fix warnings.
 Take advantage of native snprintf on VS 15.

78de0c92

24 Nov, 2015 · 1 commit

Enable C4267 warning · 41b32c60

Authored by Vasili Svirski on Nov 15, 2015

* conversion from 'size_t' to 'type', fixed by adding static_cast

Tested:
* built the solution locally on Windows and Linux
* ran the tests
* CI build succeeded

41b32c60
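For illustration, this is the kind of site C4267 flags and how an explicit `static_cast` silences it. `EncodedLength` is a hypothetical function, and the range check is an addition of this sketch rather than something the commit claims to do.

```cpp
#include <cstdint>
#include <limits>
#include <string>

// On 64-bit MSVC, an implicit size_t -> uint32_t conversion such as
//   uint32_t len = s.size();   // warning C4267: possible loss of data
// triggers the warning. The explicit cast documents that narrowing is
// intended; the range check makes the assumption visible.
bool EncodedLength(const std::string& s, uint32_t* out) {
  if (s.size() > std::numeric_limits<uint32_t>::max()) return false;
  *out = static_cast<uint32_t>(s.size());
  return true;
}
```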

21 Nov, 2015 · 1 commit

Build on Visual Studio 2015 Update 1 · 047bd22a

Authored by yuslepukhin on Nov 20, 2015

047bd22a

17 Nov, 2015 · 3 commits

Remove headers from the cc since they are in the module's header. · 314f6219

Authored by Dmitri Smirnov on Nov 16, 2015

314f6219

Add necessary headers after cpplint rearranged includes · 472c7400

Authored by Dmitri Smirnov on Nov 16, 2015

472c7400

Lint everything · a163cc2d

Authored by Islam AbdelRahman on Nov 16, 2015

Summary:

  arc2 lint --everything

Run the linter on the whole code repo to fix existing lint issues.

Test Plan: make check -j64

Reviewers: sdong, rven, anthony, kradhakrishnan, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D50769

a163cc2d

11 Nov, 2015 · 2 commits

Make use of portable `uint64_t` type to make possible file access · 5270b33b

Authored by Dmitri Smirnov on Nov 10, 2015

  in 64-bit.

  Currently, a signed off_t type is being used for the following
  interfaces for both offset and the length in bytes:
  * `Allocate`
  * `RangeSync`

  On Linux, `off_t` is either 32-bit or 64-bit depending on
  the platform. On Windows it is always a 32-bit signed long, which
  limits file access, and in particular space pre-allocation,
  to effectively 2 GB.

  The proposal is to replace off_t with uint64_t as a portable type
  so that files are always accessed through 64-bit interfaces.

  The POSIX code may need to be modified as well, but I lack the resources to test it.

5270b33b
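The 2 GB figure follows from the range of a 32-bit signed offset, 2^31 - 1 bytes. The sketch below uses a hypothetical `WritableFileSketch` type to show that a 3 GiB preallocation request becomes representable once `Allocate` takes `uint64_t` arguments; it only records the request and does not touch a real file.

```cpp
#include <cstdint>
#include <limits>

// Largest offset a 32-bit signed long (Windows off_t) can express: 2^31 - 1.
constexpr int64_t kMax32BitSignedOffset = std::numeric_limits<int32_t>::max();

// Hypothetical interface shape after the change: offset and length are
// uint64_t on every platform.
struct WritableFileSketch {
  uint64_t allocated_bytes = 0;

  bool Allocate(uint64_t offset, uint64_t len) {
    // A real implementation would ask the OS to reserve space (for example,
    // fallocate() on Linux); this sketch only records the requested extent.
    allocated_bytes = offset + len;
    return true;
  }
};
```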

Make use of portable `uint64_t` type to make possible file access · 5421c972

Authored by Dmitri Smirnov on Nov 10, 2015

  in 64-bit.

  Currently, a signed off_t type is being used for the following
  interfaces for both offset and the length in bytes:
  * `Allocate`
  * `RangeSync`

  On Linux, `off_t` is either 32-bit or 64-bit depending on
  the platform. On Windows it is always a 32-bit signed long, which
  limits file access, and in particular space pre-allocation,
  to effectively 2 GB.

  The proposal is to replace off_t with uint64_t as a portable type
  so that files are always accessed through 64-bit interfaces.

  The POSIX code may need to be modified as well, but I lack the resources to test it.

5421c972

30 Oct, 2015 · 1 commit

"make format" in some recent commits · 296c3a1f

Authored by sdong on Oct 29, 2015

Summary: Run "make format" for some recent commits.

Test Plan: Build and run tests

Reviewers: IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D49707

296c3a1f

28 Oct, 2015 · 1 commit

Implement smart buffer management. · 6fbc4f9f

Authored by Dmitri Smirnov on Oct 27, 2015

  Introduce a new DBOption, random_access_max_buffer_size, to limit
  the size of the random access buffer used for unbuffered access.
  Implement read-ahead buffering when enabled.
  To that effect, propagate compaction_readahead_size and the new option
  to the env options to make them available to the implementation.
  Add a Hint() override so that the SetupForCompaction() call invokes Hint();
  read-ahead can now be set up from both Hint() and EnableReadAhead().
  Add support for the new option random_access_max_buffer_size to
  db_bench and options_helper (to make it string-parsable),
  and add a unit test.

6fbc4f9f

16 Oct, 2015 · 1 commit

Allow users to disable some kill points in db_stress · e1a5ff85

Authored by sdong on Oct 14, 2015

Summary:
Give every kill point a name, and allow users to disable some kill points based on name prefixes. The prefixes to disable can be passed to db_stress through a command-line parameter. This gives users a way to boost the chance of triggering low-frequency kill points.
This allows follow-up changes in the crash test scripts to improve crash test coverage.
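A minimal sketch of prefix-based disabling, assuming the flag value is a comma-separated list as in the `--kill_prefix_blacklist` example in the test plan. The helper names are hypothetical and are not db_stress internals; the kill-point names used with them would come from the instrumented code.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits a flag value such as "Posix,WritableFileWriter::Append" into
// individual prefixes, ignoring empty items.
std::vector<std::string> ParsePrefixList(const std::string& flag) {
  std::vector<std::string> prefixes;
  std::stringstream ss(flag);
  std::string item;
  while (std::getline(ss, item, ',')) {
    if (!item.empty()) prefixes.push_back(item);
  }
  return prefixes;
}

// A named kill point is disabled if its name starts with any listed prefix.
bool KillPointDisabled(const std::string& kill_point,
                       const std::vector<std::string>& blacklist) {
  for (const std::string& prefix : blacklist) {
    if (kill_point.compare(0, prefix.size(), prefix) == 0) return true;
  }
  return false;
}
```

Matching on prefixes lets one flag entry such as `Posix` switch off a whole family of kill points, so the remaining, rarer ones fire more often.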

Test Plan:
Manually run db_stress with variable values of --kill_random_test and --kill_prefix_blacklist. Like this:
 --kill_random_test=2 --kill_prefix_blacklist=Posix,WritableFileWriter::Append,WritableFileWriter::WriteBuffered,WritableFileWriter::Sync

Reviewers: igor, kradhakrishnan, rven, IslamAbdelRahman, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D48735

e1a5ff85

14 Oct, 2015 · 1 commit

move debug variable under ifndef · 91c041e5

Authored by Praveen Rao on Oct 13, 2015

91c041e5

13 Oct, 2015 · 1 commit

Fixing mutex to not use unique_lock · a1d37602

Authored by Praveen Rao on Oct 12, 2015

a1d37602

07 Oct, 2015 · 1 commit

Mmap reads should not return error if reading past file · e95b703b

Authored by Dmitri Smirnov on Oct 06, 2015

Summary:
  This mirrors https://reviews.facebook.net/D45645
  Currently, mmap returns IOError when the user tries to read
  data past the end of the file. This diff changes that behavior.
  Now we return just the bytes that are available and report the size
  returned via the Slice result. This is consistent with the non-mmap
  behavior and with the pread() system call.
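The new read semantics can be sketched as a clamp at end of file. `MmapRegionSketch` is hypothetical, and `std::string_view` stands in for RocksDB's `Slice`.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// `base`/`length` describe an already-mapped file. Instead of failing when
// [offset, offset + n) runs past EOF, Read returns only the bytes that exist,
// as pread() does; an offset at or beyond EOF yields an empty result.
struct MmapRegionSketch {
  const char* base;
  size_t length;

  std::string_view Read(uint64_t offset, size_t n) const {
    if (offset >= length) return std::string_view();  // nothing left to read
    size_t available = length - static_cast<size_t>(offset);
    size_t to_read = n < available ? n : available;    // clamp at EOF
    return std::string_view(base + offset, to_read);
  }
};
```

The caller learns how much was read from the returned size, so a short read is not an error, matching the non-mmap path.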

e95b703b
