1. 21 Jul, 2016 1 commit
    • Introduce FullMergeV2 (eliminate memcpy from merge operators) · 68a8e6b8
      Islam AbdelRahman committed
      Summary:
      This diff updates the code to pin the merge operator operands while the merge operation runs, which eliminates the memcpy cost. Doing that requires a new public API for FullMerge that replaces std::deque<std::string> with std::vector<Slice>.
      
      This diff is stacked on top of D56493 and D56511
      
      In this diff we
      - Update FullMergeV2 arguments to be encapsulated in MergeOperationInput and MergeOperationOutput which will make it easier to add new arguments in the future
      - Replace std::deque<std::string> with std::vector<Slice> to pass operands
      - Replace MergeContext std::deque with std::vector (based on a simple benchmark I ran https://gist.github.com/IslamAbdelRahman/78fc86c9ab9f52b1df791e58943fb187)
      - Allow FullMergeV2 output to be an existing operand
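The shape of the new API can be sketched as follows. This is a minimal illustration, not the RocksDB code: the `Slice`, `MergeOperationInput`, and `MergeOperationOutput` types below are simplified stand-ins whose field names (`existing_value`, `operand_list`, `new_value`, `existing_operand`) are assumptions, and `FullMergeV2Max` is a hypothetical free function. It shows why a "max" style merge benefits from being allowed to return an existing operand: the result is always one of the inputs, so no bytes need to be copied.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Simplified stand-in for a non-owning view of bytes (assumption, not the
// real Slice). The viewed string must outlive the Slice.
struct Slice {
  const char* data;
  size_t size;
  Slice() : data(""), size(0) {}
  Slice(const std::string& s) : data(s.data()), size(s.size()) {}
  // Lexicographic byte comparison: <0, 0, or >0.
  int compare(const Slice& other) const {
    const size_t n = std::min(size, other.size);
    const int r = std::memcmp(data, other.data, n);
    if (r != 0) return r;
    if (size < other.size) return -1;
    return size > other.size ? 1 : 0;
  }
};

// Hypothetical mirrors of the argument structs named in the summary.
struct MergeOperationInput {
  const Slice& key;
  const Slice* existing_value;             // nullptr when there is no base value
  const std::vector<Slice>& operand_list;  // views of pinned operands, no copies
};

struct MergeOperationOutput {
  std::string& new_value;   // filled only when a new value has to be built
  Slice& existing_operand;  // set when the result is one of the inputs
};

// "max" merge: pick the largest input and hand it back as a view.
bool FullMergeV2Max(const MergeOperationInput& in, MergeOperationOutput* out) {
  const Slice* best = in.existing_value;
  for (const Slice& op : in.operand_list) {
    if (best == nullptr || op.compare(*best) > 0) best = &op;
  }
  if (best == nullptr) return false;  // nothing to merge
  out->existing_operand = *best;      // no memcpy of the value bytes
  return true;
}
```

Grouping the arguments into two structs means a later field can be added without changing the function signature, which is the motivation the summary gives for the encapsulation.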
      
      ```
      [Everything in Memtable | 10K operands | 10 KB each | 1 operand per key]
      
      DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="mergerandom,readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --merge_keys=10000 --num=10000 --disable_auto_compactions --value_size=10240 --write_buffer_size=1000000000
      
      [FullMergeV2]
      readseq      :       0.607 micros/op 1648235 ops/sec; 16121.2 MB/s
      readseq      :       0.478 micros/op 2091546 ops/sec; 20457.2 MB/s
      readseq      :       0.252 micros/op 3972081 ops/sec; 38850.5 MB/s
      readseq      :       0.237 micros/op 4218328 ops/sec; 41259.0 MB/s
      readseq      :       0.247 micros/op 4043927 ops/sec; 39553.2 MB/s
      
      [master]
      readseq      :       3.935 micros/op 254140 ops/sec; 2485.7 MB/s
      readseq      :       3.722 micros/op 268657 ops/sec; 2627.7 MB/s
      readseq      :       3.149 micros/op 317605 ops/sec; 3106.5 MB/s
      readseq      :       3.125 micros/op 320024 ops/sec; 3130.1 MB/s
      readseq      :       4.075 micros/op 245374 ops/sec; 2400.0 MB/s
      ```
      
      ```
      [Everything in Memtable | 10K operands | 10 KB each | 10 operand per key]
      
      DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="mergerandom,readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --merge_keys=1000 --num=10000 --disable_auto_compactions --value_size=10240 --write_buffer_size=1000000000
      
      [FullMergeV2]
      readseq      :       3.472 micros/op 288018 ops/sec; 2817.1 MB/s
      readseq      :       2.304 micros/op 434027 ops/sec; 4245.2 MB/s
      readseq      :       1.163 micros/op 859845 ops/sec; 8410.0 MB/s
      readseq      :       1.192 micros/op 838926 ops/sec; 8205.4 MB/s
      readseq      :       1.250 micros/op 800000 ops/sec; 7824.7 MB/s
      
      [master]
      readseq      :      24.025 micros/op 41623 ops/sec;  407.1 MB/s
      readseq      :      18.489 micros/op 54086 ops/sec;  529.0 MB/s
      readseq      :      18.693 micros/op 53495 ops/sec;  523.2 MB/s
      readseq      :      23.621 micros/op 42335 ops/sec;  414.1 MB/s
      readseq      :      18.775 micros/op 53262 ops/sec;  521.0 MB/s
      
      ```
      
      ```
      [Everything in Block cache | 10K operands | 10 KB each | 1 operand per key]
      
      [FullMergeV2]
      $ DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --num=100000 --db="/dev/shm/merge-random-10K-10KB" --cache_size=1000000000 --use_existing_db --disable_auto_compactions
      readseq      :      14.741 micros/op 67837 ops/sec;  663.5 MB/s
      readseq      :       1.029 micros/op 971446 ops/sec; 9501.6 MB/s
      readseq      :       0.974 micros/op 1026229 ops/sec; 10037.4 MB/s
      readseq      :       0.965 micros/op 1036080 ops/sec; 10133.8 MB/s
      readseq      :       0.943 micros/op 1060657 ops/sec; 10374.2 MB/s
      
      [master]
      readseq      :      16.735 micros/op 59755 ops/sec;  584.5 MB/s
      readseq      :       3.029 micros/op 330151 ops/sec; 3229.2 MB/s
      readseq      :       3.136 micros/op 318883 ops/sec; 3119.0 MB/s
      readseq      :       3.065 micros/op 326245 ops/sec; 3191.0 MB/s
      readseq      :       3.014 micros/op 331813 ops/sec; 3245.4 MB/s
      ```
      
      ```
      [Everything in Block cache | 10K operands | 10 KB each | 10 operand per key]
      
      DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --num=100000 --db="/dev/shm/merge-random-10-operands-10K-10KB" --cache_size=1000000000 --use_existing_db --disable_auto_compactions
      
      [FullMergeV2]
      readseq      :      24.325 micros/op 41109 ops/sec;  402.1 MB/s
      readseq      :       1.470 micros/op 680272 ops/sec; 6653.7 MB/s
      readseq      :       1.231 micros/op 812347 ops/sec; 7945.5 MB/s
      readseq      :       1.091 micros/op 916590 ops/sec; 8965.1 MB/s
      readseq      :       1.109 micros/op 901713 ops/sec; 8819.6 MB/s
      
      [master]
      readseq      :      27.257 micros/op 36687 ops/sec;  358.8 MB/s
      readseq      :       4.443 micros/op 225073 ops/sec; 2201.4 MB/s
      readseq      :       5.830 micros/op 171526 ops/sec; 1677.7 MB/s
      readseq      :       4.173 micros/op 239635 ops/sec; 2343.8 MB/s
      readseq      :       4.150 micros/op 240963 ops/sec; 2356.8 MB/s
      ```
      
      Test Plan: COMPILE_WITH_ASAN=1 make check -j64
      
      Reviewers: yhchiang, andrewkr, sdong
      
      Reviewed By: sdong
      
      Subscribers: lovro, andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D57075
  2. 14 Jun, 2016 1 commit
  3. 04 May, 2016 1 commit
    • Fix Iterator::Prev memory pinning bug · ff4b3fb5
      Islam AbdelRahman committed
      Summary: We should not use IterKey::SetKey with copy = false unless we are pinning the iterator through its lifetime; otherwise we may release the temporarily pinned blocks, and the IterKey will then point to freed memory.
      
      Test Plan: added a new test
      
      Reviewers: sdong, andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D57561
  4. 03 May, 2016 1 commit
    • Eliminate memcpy in Iterator::Prev() by pinning blocks for keys spanning multiple blocks · 6e801b0b
      Islam AbdelRahman committed
      Summary:
      This diff is stacked on top of https://reviews.facebook.net/D56493
      The current Iterator::Prev() implementation needs to copy every value, since the underlying Iterator may move after reading the value.
      This can be optimized by making sure that the block containing the value is pinned until the Iterator moves, which improves throughput by up to 1.5X.
      
      master
      ```
      ==> 1000000_Keys_100Byte.txt <==
      readreverse  :       0.449 micros/op 2225887 ops/sec;  246.2 MB/s
      readreverse  :       0.433 micros/op 2311508 ops/sec;  255.7 MB/s
      readreverse  :       0.436 micros/op 2294335 ops/sec;  253.8 MB/s
      readreverse  :       0.471 micros/op 2121295 ops/sec;  234.7 MB/s
      readreverse  :       0.465 micros/op 2152227 ops/sec;  238.1 MB/s
      readreverse  :       0.454 micros/op 2203011 ops/sec;  243.7 MB/s
      readreverse  :       0.451 micros/op 2216095 ops/sec;  245.2 MB/s
      readreverse  :       0.462 micros/op 2162447 ops/sec;  239.2 MB/s
      readreverse  :       0.476 micros/op 2099151 ops/sec;  232.2 MB/s
      readreverse  :       0.472 micros/op 2120710 ops/sec;  234.6 MB/s
      
      avg : 242.34 MB/s
      
      ==> 1000000_Keys_1KB.txt <==
      readreverse  :       1.013 micros/op 986793 ops/sec;  978.7 MB/s
      readreverse  :       0.942 micros/op 1061136 ops/sec; 1052.5 MB/s
      readreverse  :       0.951 micros/op 1051901 ops/sec; 1043.3 MB/s
      readreverse  :       0.932 micros/op 1072894 ops/sec; 1064.1 MB/s
      readreverse  :       1.024 micros/op 976720 ops/sec;  968.7 MB/s
      readreverse  :       0.935 micros/op 1069169 ops/sec; 1060.4 MB/s
      readreverse  :       1.012 micros/op 988132 ops/sec;  980.1 MB/s
      readreverse  :       0.962 micros/op 1039579 ops/sec; 1031.1 MB/s
      readreverse  :       0.991 micros/op 1008924 ops/sec; 1000.7 MB/s
      readreverse  :       1.004 micros/op 996144 ops/sec;  988.0 MB/s
      
      avg : 1016.76 MB/s
      
      ==> 1000000_Keys_10KB.txt <==
      readreverse  :       4.167 micros/op 239952 ops/sec; 2346.9 MB/s
      readreverse  :       4.070 micros/op 245713 ops/sec; 2403.3 MB/s
      readreverse  :       4.572 micros/op 218733 ops/sec; 2139.4 MB/s
      readreverse  :       4.497 micros/op 222388 ops/sec; 2175.2 MB/s
      readreverse  :       4.203 micros/op 237920 ops/sec; 2327.1 MB/s
      readreverse  :       4.206 micros/op 237756 ops/sec; 2325.5 MB/s
      readreverse  :       4.181 micros/op 239149 ops/sec; 2339.1 MB/s
      readreverse  :       4.157 micros/op 240552 ops/sec; 2352.8 MB/s
      readreverse  :       4.187 micros/op 238848 ops/sec; 2336.1 MB/s
      readreverse  :       4.106 micros/op 243575 ops/sec; 2382.4 MB/s
      
      avg : 2312.78 MB/s
      
      ==> 100000_Keys_100KB.txt <==
      readreverse  :      41.281 micros/op 24224 ops/sec; 2366.0 MB/s
      readreverse  :      39.722 micros/op 25175 ops/sec; 2458.9 MB/s
      readreverse  :      40.319 micros/op 24802 ops/sec; 2422.5 MB/s
      readreverse  :      39.762 micros/op 25149 ops/sec; 2456.4 MB/s
      readreverse  :      40.916 micros/op 24440 ops/sec; 2387.1 MB/s
      readreverse  :      41.188 micros/op 24278 ops/sec; 2371.4 MB/s
      readreverse  :      40.061 micros/op 24962 ops/sec; 2438.1 MB/s
      readreverse  :      40.221 micros/op 24862 ops/sec; 2428.4 MB/s
      readreverse  :      40.084 micros/op 24947 ops/sec; 2436.7 MB/s
      readreverse  :      40.655 micros/op 24597 ops/sec; 2402.4 MB/s
      
      avg : 2416.79 MB/s
      
      ==> 10000_Keys_1MB.txt <==
      readreverse  :     298.038 micros/op 3355 ops/sec; 3355.3 MB/s
      readreverse  :     335.001 micros/op 2985 ops/sec; 2985.1 MB/s
      readreverse  :     286.956 micros/op 3484 ops/sec; 3484.9 MB/s
      readreverse  :     329.954 micros/op 3030 ops/sec; 3030.8 MB/s
      readreverse  :     306.428 micros/op 3263 ops/sec; 3263.5 MB/s
      readreverse  :     330.749 micros/op 3023 ops/sec; 3023.5 MB/s
      readreverse  :     328.903 micros/op 3040 ops/sec; 3040.5 MB/s
      readreverse  :     324.853 micros/op 3078 ops/sec; 3078.4 MB/s
      readreverse  :     320.488 micros/op 3120 ops/sec; 3120.3 MB/s
      readreverse  :     320.536 micros/op 3119 ops/sec; 3119.8 MB/s
      
      avg : 3150.21 MB/s
      ```
      
      After memcpy elimination
      ```
      
      ==> 1000000_Keys_100Byte.txt <==
      readreverse  :       0.395 micros/op 2529890 ops/sec;  279.9 MB/s
      readreverse  :       0.368 micros/op 2715922 ops/sec;  300.5 MB/s
      readreverse  :       0.384 micros/op 2603929 ops/sec;  288.1 MB/s
      readreverse  :       0.375 micros/op 2663286 ops/sec;  294.6 MB/s
      readreverse  :       0.357 micros/op 2802180 ops/sec;  310.0 MB/s
      readreverse  :       0.363 micros/op 2757684 ops/sec;  305.1 MB/s
      readreverse  :       0.372 micros/op 2689603 ops/sec;  297.5 MB/s
      readreverse  :       0.379 micros/op 2638599 ops/sec;  291.9 MB/s
      readreverse  :       0.375 micros/op 2663803 ops/sec;  294.7 MB/s
      readreverse  :       0.375 micros/op 2665579 ops/sec;  294.9 MB/s
      
      avg: 295.72 MB/s (1.22 X)
      
      ==> 1000000_Keys_1KB.txt <==
      readreverse  :       0.879 micros/op 1138112 ops/sec; 1128.8 MB/s
      readreverse  :       0.842 micros/op 1187998 ops/sec; 1178.3 MB/s
      readreverse  :       0.837 micros/op 1194915 ops/sec; 1185.1 MB/s
      readreverse  :       0.845 micros/op 1182983 ops/sec; 1173.3 MB/s
      readreverse  :       0.877 micros/op 1140308 ops/sec; 1131.0 MB/s
      readreverse  :       0.849 micros/op 1177581 ops/sec; 1168.0 MB/s
      readreverse  :       0.915 micros/op 1093284 ops/sec; 1084.3 MB/s
      readreverse  :       0.863 micros/op 1159418 ops/sec; 1149.9 MB/s
      readreverse  :       0.895 micros/op 1117670 ops/sec; 1108.5 MB/s
      readreverse  :       0.852 micros/op 1174116 ops/sec; 1164.5 MB/s
      
      avg: 1147.17 MB/s (1.12 X)
      
      ==> 1000000_Keys_10KB.txt <==
      readreverse  :       3.870 micros/op 258386 ops/sec; 2527.2 MB/s
      readreverse  :       3.568 micros/op 280296 ops/sec; 2741.5 MB/s
      readreverse  :       4.005 micros/op 249694 ops/sec; 2442.2 MB/s
      readreverse  :       3.550 micros/op 281719 ops/sec; 2755.5 MB/s
      readreverse  :       3.562 micros/op 280758 ops/sec; 2746.1 MB/s
      readreverse  :       3.507 micros/op 285125 ops/sec; 2788.8 MB/s
      readreverse  :       3.463 micros/op 288739 ops/sec; 2824.1 MB/s
      readreverse  :       3.428 micros/op 291734 ops/sec; 2853.4 MB/s
      readreverse  :       3.553 micros/op 281491 ops/sec; 2753.2 MB/s
      readreverse  :       3.535 micros/op 282885 ops/sec; 2766.9 MB/s
      
      avg : 2719.89 MB/s (1.17 X)
      
      ==> 100000_Keys_100KB.txt <==
      readreverse  :      22.815 micros/op 43830 ops/sec; 4281.0 MB/s
      readreverse  :      29.957 micros/op 33381 ops/sec; 3260.4 MB/s
      readreverse  :      25.334 micros/op 39473 ops/sec; 3855.4 MB/s
      readreverse  :      23.037 micros/op 43409 ops/sec; 4239.8 MB/s
      readreverse  :      27.810 micros/op 35958 ops/sec; 3512.1 MB/s
      readreverse  :      30.327 micros/op 32973 ops/sec; 3220.6 MB/s
      readreverse  :      29.704 micros/op 33665 ops/sec; 3288.2 MB/s
      readreverse  :      29.423 micros/op 33987 ops/sec; 3319.6 MB/s
      readreverse  :      23.334 micros/op 42856 ops/sec; 4185.9 MB/s
      readreverse  :      29.969 micros/op 33368 ops/sec; 3259.1 MB/s
      
      avg : 3642.21 MB/s (1.5 X)
      
      ==> 10000_Keys_1MB.txt <==
      readreverse  :     244.748 micros/op 4085 ops/sec; 4085.9 MB/s
      readreverse  :     230.208 micros/op 4343 ops/sec; 4344.0 MB/s
      readreverse  :     235.655 micros/op 4243 ops/sec; 4243.6 MB/s
      readreverse  :     235.730 micros/op 4242 ops/sec; 4242.2 MB/s
      readreverse  :     237.346 micros/op 4213 ops/sec; 4213.3 MB/s
      readreverse  :     227.306 micros/op 4399 ops/sec; 4399.4 MB/s
      readreverse  :     194.957 micros/op 5129 ops/sec; 5129.4 MB/s
      readreverse  :     238.359 micros/op 4195 ops/sec; 4195.4 MB/s
      readreverse  :     221.588 micros/op 4512 ops/sec; 4513.0 MB/s
      readreverse  :     235.911 micros/op 4238 ops/sec; 4239.0 MB/s
      
      avg : 4360.52 MB/s (1.38 X)
      ```
      
      Test Plan: COMPILE_WITH_ASAN=1 make check -j64
      
      Reviewers: andrewkr, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D56511
  5. 29 Apr, 2016 1 commit
  6. 27 Apr, 2016 1 commit
    • Introduce PinnedIteratorsManager (Reduce PinData() overhead / Refactor PinData) · d719b095
      Islam AbdelRahman committed
      Summary:
      While trying to reuse PinData() / ReleasePinnedData() to optimize away some memcpys, I realized that PinData() / ReleasePinnedData() carry significant overhead when called many times.
      This diff refactors the pinning logic by introducing PinnedIteratorsManager, a centralized component that is created once and notified whenever we need to pin an Iterator. This implementation has much less overhead than the original one.
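The idea of a single, centrally owned pinning component can be sketched like this. It is an illustration under assumptions, not the class from the diff: `PinnedResourcesManager` and its method names are hypothetical, and release callbacks stand in for whatever the real component tracks. Owners hand over the cleanup they would otherwise run immediately, and everything is released in one pass.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch of a centralized pinning component: created once,
// notified whenever something must stay alive, released in a single pass.
class PinnedResourcesManager {
 public:
  void StartPinning() { pinning_enabled_ = true; }
  bool PinningEnabled() const { return pinning_enabled_; }

  // Called by an owner that would otherwise free its resource right away.
  // The callback is kept and run later by ReleasePinnedData().
  void Pin(std::function<void()> release) {
    pinned_.push_back(std::move(release));
  }

  // Runs every stored release callback once and turns pinning off.
  void ReleasePinnedData() {
    for (auto& release : pinned_) release();
    pinned_.clear();
    pinning_enabled_ = false;
  }

  // Anything still pinned is released when the manager goes away.
  ~PinnedResourcesManager() { ReleasePinnedData(); }

 private:
  bool pinning_enabled_ = false;
  std::vector<std::function<void()>> pinned_;
};
```

In this sketch the per-call cost of pinning is one `push_back`, which is one way a centralized component could stay cheap when pin and release are invoked many times.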
      
      Test Plan:
      make check -j64
      COMPILE_WITH_ASAN=1 make check -j64
      
      Reviewers: yhchiang, sdong, andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D56493
  7. 02 Apr, 2016 1 commit
    • Eliminate std::deque initialization while iterating over merge operands · 8a1a603f
      Islam AbdelRahman committed
      Summary:
      This patch is similar to D52563. When we iterate over a DB with merge operands, we keep creating a std::deque to store the operands; optimize this by reusing the merge_operands_ data member.
      
      Before the patch
      
      ```
      ./db_bench --benchmarks="mergerandom,readseq,readseq,readseq,readseq" --db="/dev/shm/bench_merge_memcpy_on_the_fly/" --merge_operator="put" --merge_keys=10000 --num=10000
      
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      mergerandom  :       3.757 micros/op 266141 ops/sec;   29.4 MB/s ( updates:10000)
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.413 micros/op 2423538 ops/sec;  268.1 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.451 micros/op 2219071 ops/sec;  245.5 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.420 micros/op 2382039 ops/sec;  263.5 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.408 micros/op 2452017 ops/sec;  271.3 MB/s
      
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      mergerandom  :       3.947 micros/op 253376 ops/sec;   28.0 MB/s ( updates:10000)
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.441 micros/op 2266473 ops/sec;  250.7 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.471 micros/op 2122033 ops/sec;  234.8 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.440 micros/op 2271407 ops/sec;  251.3 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.429 micros/op 2331471 ops/sec;  257.9 MB/s
      ```
      
      with the patch
      
      ```
      ./db_bench --benchmarks="mergerandom,readseq,readseq,readseq,readseq" --db="/dev/shm/bench_merge_memcpy_on_the_fly/" --merge_operator="put" --merge_keys=10000 --num=10000
      
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      mergerandom  :       4.080 micros/op 245092 ops/sec;   27.1 MB/s ( updates:10000)
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.308 micros/op 3241843 ops/sec;  358.6 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.312 micros/op 3200408 ops/sec;  354.0 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.332 micros/op 3013962 ops/sec;  333.4 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.300 micros/op 3328017 ops/sec;  368.2 MB/s
      
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      mergerandom  :       3.973 micros/op 251705 ops/sec;   27.8 MB/s ( updates:10000)
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.320 micros/op 3123752 ops/sec;  345.6 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.335 micros/op 2986641 ops/sec;  330.4 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.339 micros/op 2950047 ops/sec;  326.4 MB/s
      DB path: [/dev/shm/bench_merge_memcpy_on_the_fly/]
      readseq      :       0.319 micros/op 3131565 ops/sec;  346.4 MB/s
      ```
      
      Test Plan: make check -j64
      
      Reviewers: yhchiang, andrewkr, sdong
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D56031
  8. 12 Mar, 2016 1 commit
    • Aggregate hot Iterator counters in LocalStatistics (DBIter::Next perf regression) · 580fede3
      Islam AbdelRahman committed
      Summary:
      This patch bumps the counters on the frequent code paths DBIter::Next() / DBIter::Prev() in local data members and sends them to Statistics when the iterator is destroyed.
      A better solution would be a thread_local implementation of Statistics.
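A minimal sketch of the local aggregation pattern described above, assuming a shared statistics object backed by atomics. `SharedStats` and `CountingIterator` are hypothetical names for illustration and do not match the real Statistics or DBIter classes. The hot path touches only plain member integers, and the shared counters are updated once, in the destructor.

```cpp
#include <atomic>
#include <cstdint>

// Stand-in for a process-wide statistics object (assumption).
struct SharedStats {
  std::atomic<uint64_t> next_count{0};
  std::atomic<uint64_t> prev_count{0};
};

class CountingIterator {
 public:
  explicit CountingIterator(SharedStats* stats) : stats_(stats) {}

  // Flush the locally accumulated counts to the shared object exactly once.
  ~CountingIterator() {
    stats_->next_count.fetch_add(local_next_, std::memory_order_relaxed);
    stats_->prev_count.fetch_add(local_prev_, std::memory_order_relaxed);
  }

  // Hot paths: plain increments, no atomic operations.
  void Next() { ++local_next_; }
  void Prev() { ++local_prev_; }

 private:
  SharedStats* stats_;
  uint64_t local_next_ = 0;
  uint64_t local_prev_ = 0;
};
```

The trade-off in this sketch is that the shared counters lag behind until the iterator is destroyed, so a long-lived iterator reports its activity late.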
      
      New performance
      ```
      readseq      :       0.035 micros/op 28597881 ops/sec; 3163.7 MB/s
           1,851,568,819      stalled-cycles-frontend   #   31.29% frontend cycles idle    [49.86%]
             884,929,823      stalled-cycles-backend    #   14.95% backend  cycles idle    [50.21%]
      readreverse  :       0.071 micros/op 14077393 ops/sec; 1557.3 MB/s
           3,239,575,993      stalled-cycles-frontend   #   27.36% frontend cycles idle    [49.96%]
           1,558,253,983      stalled-cycles-backend    #   13.16% backend  cycles idle    [50.14%]
      
      ```
      
      Existing performance
      
      ```
      readreverse  :       0.174 micros/op 5732342 ops/sec;  634.1 MB/s
          20,570,209,389      stalled-cycles-frontend   #   70.71% frontend cycles idle    [50.01%]
          18,422,816,837      stalled-cycles-backend    #   63.33% backend  cycles idle    [50.04%]
      
      readseq      :       0.119 micros/op 8400537 ops/sec;  929.3 MB/s
          15,634,225,844      stalled-cycles-frontend   #   79.07% frontend cycles idle    [49.96%]
          14,227,427,453      stalled-cycles-backend    #   71.95% backend  cycles idle    [50.09%]
      ```
      
      Test Plan: unit tests
      
      Reviewers: yhchiang, sdong, igor
      
      Reviewed By: sdong
      
      Subscribers: andrewkr, dhruba
      
      Differential Revision: https://reviews.facebook.net/D55107
  9. 05 Mar, 2016 1 commit
    • Change Property name from "rocksdb.current_version_number" to "rocksdb.current-super-version-number" · 294bdf9e
      sdong committed
      Summary: I realized I was wrong about the naming convention again. Let me change it to the correct one.
      
      Test Plan: Run unit tests.
      
      Reviewers: IslamAbdelRahman, kradhakrishnan, yhchiang, andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D55041
  10. 03 Mar, 2016 1 commit
    • Add Iterator Property rocksdb.iterator.version_number · e79ad9e1
      sdong committed
      Summary: We want to provide a way to detect whether an iterator is stale and needs to be recreated. Add an iterator property that returns the version number.
      
      Test Plan: Add two unit tests for it.
      
      Reviewers: IslamAbdelRahman, yhchiang, anthony, kradhakrishnan, andrewkr
      
      Reviewed By: andrewkr
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D54921
  11. 02 Mar, 2016 1 commit
  12. 01 Mar, 2016 1 commit
    • Introduce Iterator::GetProperty() and replace Iterator::IsKeyPinned() · 1f595414
      sdong committed
      Summary:
      Add Iterator::GetProperty(), a way for users to communicate with the iterator, and replace Iterator::IsKeyPinned() with it.
      As a follow-up, I'll add a property for the version number attached to the iterator.
      
      Test Plan: Rerun existing tests and add a negative test case.
      
      Reviewers: yhchiang, andrewkr, kradhakrishnan, anthony, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D54783
  13. 11 Feb, 2016 1 commit
  14. 10 Feb, 2016 1 commit
  15. 07 Jan, 2016 1 commit
    • Optimize DBIter::Prev() by reducing stack overhead · 8c71eb5a
      Islam AbdelRahman committed
      Summary:
      It looks like we are spending a significant amount of time creating a std::deque<std::string> every time we call Iterator::Prev()
      
      {F921567}
      
      By making merge_operands_ a DBIter data member, we create it once, which removes this overhead and gives a ~30% performance improvement when using Iterator::Prev() on hot data
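The change amounts to hoisting a container out of a hot function and into the object that owns it. A minimal sketch, assuming a hypothetical `PrevScanner` class (only the `merge_operands_` name comes from the summary): the deque is constructed once with the object and cleared on each call, instead of being constructed and destroyed as a local on every call.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical class for illustration; not the DBIter implementation.
class PrevScanner {
 public:
  // Gathers the operands for one key, newest first. The member deque is
  // reused across calls rather than being a fresh local each time.
  size_t CollectOperands(const std::vector<std::string>& source) {
    merge_operands_.clear();
    for (const std::string& s : source) merge_operands_.push_front(s);
    return merge_operands_.size();
  }

  const std::deque<std::string>& operands() const { return merge_operands_; }

 private:
  std::deque<std::string> merge_operands_;  // created once per scanner
};
```

How much this saves depends on the standard library: some std::deque implementations allocate on construction, which is the kind of per-call cost the summary points at.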
      
      Original performance
      
      ```
      DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readreverse" --db="/dev/shm/bench_prev_opt/" --use_existing_db --disable_auto_compactions
      readreverse  :       0.713 micros/op 1402219 ops/sec;  155.1 MB/s
      readreverse  :       0.609 micros/op 1641386 ops/sec;  181.6 MB/s
      readreverse  :       0.684 micros/op 1461150 ops/sec;  161.6 MB/s
      readreverse  :       0.629 micros/op 1589842 ops/sec;  175.9 MB/s
      readreverse  :       0.647 micros/op 1544530 ops/sec;  170.9 MB/s
      ```
      
      After optimization
      
      ```
      DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readreverse" --db="/dev/shm/bench_prev_opt/" --use_existing_db --disable_auto_compactions
      readreverse  :       0.488 micros/op 2051189 ops/sec;  226.9 MB/s
      readreverse  :       0.505 micros/op 1980892 ops/sec;  219.1 MB/s
      readreverse  :       0.541 micros/op 1846971 ops/sec;  204.3 MB/s
      readreverse  :       0.497 micros/op 2013612 ops/sec;  222.8 MB/s
      readreverse  :       0.480 micros/op 2082665 ops/sec;  230.4 MB/s
      ```
      
      Test Plan: make check -j64
      
      Reviewers: sdong, anthony, rven, igor, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: jkedgar, dhruba
      
      Differential Revision: https://reviews.facebook.net/D52563
  16. 17 Dec, 2015 1 commit
    • Introduce ReadOptions::pin_data (support zero copy for keys) · aececc20
      Islam AbdelRahman committed
      Summary:
      This patch updates the Iterator API with new functions that let users keep the Slices returned by key() valid as long as the Iterator is not deleted

      ReadOptions::pin_data : If true, keep loaded blocks in memory as long as the iterator is not deleted
      Iterator::IsKeyPinned() : If true, the Slice returned by key() is valid as long as the iterator is not deleted
      
      Also add a new option BlockBasedTableOptions::use_delta_encoding to allow users to disable delta_encoding if needed.
      
      Benchmark results (using https://phabricator.fb.com/P20083553)
      
      ```
      // $ du -h /home/tec/local/normal.4K.Snappy/db10077
      // 6.1G    /home/tec/local/normal.4K.Snappy/db10077
      
      // $ du -h /home/tec/local/zero.8K.LZ4/db10077
      // 6.4G    /home/tec/local/zero.8K.LZ4/db10077
      
      // Benchmarks for shard db10077
      // _build/opt/rocks/benchmark/rocks_copy_benchmark \
      //      --normal_db_path="/home/tec/local/normal.4K.Snappy/db10077" \
      //      --zero_db_path="/home/tec/local/zero.8K.LZ4/db10077"
      
      // First run
      // ============================================================================
      // rocks/benchmark/RocksCopyBenchmark.cpp          relative  time/iter  iters/s
      // ============================================================================
      // BM_StringCopy                                                 1.73s  576.97m
      // BM_StringPiece                                   103.74%      1.67s  598.55m
      // ============================================================================
      // Match rate : 1000000 / 1000000
      
      // Second run
      // ============================================================================
      // rocks/benchmark/RocksCopyBenchmark.cpp          relative  time/iter  iters/s
      // ============================================================================
      // BM_StringCopy                                              611.99ms     1.63
      // BM_StringPiece                                   203.76%   300.35ms     3.33
      // ============================================================================
      // Match rate : 1000000 / 1000000
      ```
      
      Test Plan: Unit tests
      
      Reviewers: sdong, igor, anthony, yhchiang, rven
      
      Reviewed By: rven
      
      Subscribers: dhruba, lovro, adsharma
      
      Differential Revision: https://reviews.facebook.net/D48999
  17. 01 Dec, 2015 1 commit
    • Revert previous behavior of internal_key_skipped_count · 459c7fba
      sdong committed
      Summary: With recent commit 33e0c938, db iterator skips perf context counter internal_key_skipped_count when blindly issuing internal Next(). Now increment the counter by one when issuing this Next()
      
      Test Plan: Run all existing tests
      
      Reviewers: rven, yhchiang, IslamAbdelRahman, kradhakrishnan, igor, anthony
      
      Reviewed By: anthony
      
      Subscribers: yoshinorim, leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D51465
  18. 25 Nov, 2015 1 commit
    • Reduce extra key comparison in DBIter::Next() · 33e0c938
      sdong committed
      Summary: Currently DBIter::Next() always compares the current key with itself first, which is unnecessary if the last key is not a merge key. I made the change and didn't see db_iter_test fail. I'd like to hear whether anyone can see something I missed.
      
      Test Plan: Run all unit tests
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D48279
  19. 06 Nov, 2015 2 commits
    • Fix regression failure in PrefixTest.PrefixValid · ae7940b6
      Venkatesh Radhakrishnan committed
      Summary: Use IterKey to store prefix_start_ so that it doesn't get freed
      
      Test Plan: PrefixTest.PrefixValid
      
      Reviewers: anthony, IslamAbdelRahman
      
      Reviewed By: IslamAbdelRahman
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D50289
    • Prefix-based iterating only shows keys in prefix · 9d50afc3
      Venkatesh Radhakrishnan committed
      Summary:
      MyRocks testing found an issue that while iterating over keys
      that are outside the prefix, sometimes wrong results were seen for keys
      outside the prefix. We now tighten the range of keys seen with a new
      read option called prefix_seen_at_start. This remembers the starting
      prefix and then compares it on a Next for equality of prefix. If they
      are from a different prefix, it sets valid to false.
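The behaviour described above can be sketched with a toy cursor over a sorted key list. Everything here is an assumption made for illustration: `PrefixBoundedCursor` is a hypothetical class, the prefix is a fixed-length leading substring, and the starting prefix is taken from the seek target. The point is only the check in `Next()`: once the key's prefix differs from the remembered one, the cursor reports itself invalid.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy cursor for illustration; not the DBIter implementation.
class PrefixBoundedCursor {
 public:
  // sorted_keys must be in ascending order; prefix_len is the fixed prefix size.
  PrefixBoundedCursor(std::vector<std::string> sorted_keys, size_t prefix_len)
      : keys_(std::move(sorted_keys)), prefix_len_(prefix_len) {}

  // Positions at the first key >= target and remembers the target's prefix.
  void Seek(const std::string& target) {
    prefix_start_ = Prefix(target);
    pos_ = static_cast<size_t>(
        std::lower_bound(keys_.begin(), keys_.end(), target) - keys_.begin());
    UpdateValid();
  }

  // Advances one key; becomes invalid when the prefix changes.
  void Next() {
    ++pos_;
    UpdateValid();
  }

  bool Valid() const { return valid_; }
  const std::string& key() const { return keys_[pos_]; }

 private:
  std::string Prefix(const std::string& k) const {
    return k.substr(0, prefix_len_);
  }
  void UpdateValid() {
    valid_ = pos_ < keys_.size() && Prefix(keys_[pos_]) == prefix_start_;
  }

  std::vector<std::string> keys_;
  size_t prefix_len_;
  std::string prefix_start_;
  size_t pos_ = 0;
  bool valid_ = false;
};
```

With keys `a:1`, `a:2`, `b:1` and a prefix length of 2, seeking to `a:` yields `a:1` and `a:2`, and the cursor turns invalid on reaching `b:1` rather than exposing a key outside the prefix.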
      
      Test Plan: PrefixTest.PrefixValid
      
      Reviewers: IslamAbdelRahman, sdong, yhchiang, anthony
      
      Reviewed By: anthony
      
      Subscribers: spetrunia, hermanlee4, yoshinorim, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D50211
  20. 14 Oct, 2015 1 commit
    • Separate InternalIterator from Iterator · 35ad531b
      sdong committed
      Summary:
      Separate a new class InternalIterator from class Iterator, for use when the look-up is done internally, which also means it operates on keys with sequence ID and type.
      
      This change will enable potential future optimizations but for now InternalIterator's functions are still the same as Iterator's.
      At the same time, separate the cleanup function to a separate class and let both of InternalIterator and Iterator inherit from it.
      
      Test Plan: Run all existing tests.
      
      Reviewers: igor, yhchiang, anthony, kradhakrishnan, IslamAbdelRahman, rven
      
      Reviewed By: rven
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D48549
  21. 18 Sep, 2015 1 commit
    • Support for SingleDelete() · 014fd55a
      Andres Noetzli committed
      Summary:
      This patch fixes #7460559. It introduces SingleDelete as a new database
      operation. This operation can be used to delete keys that were never
      overwritten (no put following another put of the same key). If an overwritten
      key is single deleted the behavior is undefined. Single deletion of a
      non-existent key has no effect but multiple consecutive single deletions are
      not allowed (see limitations).
      
      In contrast to the conventional Delete() operation, the deletion entry is
      removed along with the value when the two are lined up in a compaction. Note:
      The semantics are similar to @igor's prototype that allowed this
      behavior at the granularity of a column family (
      https://reviews.facebook.net/D42093 ). This new patch, however, is more
      aggressive when it comes to removing tombstones: It removes the SingleDelete
      together with the value whenever there is no snapshot between them while the
      older patch only did this when the sequence number of the deletion was older
      than the earliest snapshot.
      
      Most of the complex additions are in the Compaction Iterator, all other changes
      should be relatively straightforward. The patch also includes basic support for
      single deletions in db_stress and db_bench.
      
      Limitations:
      - Not compatible with cuckoo hash tables
      - Single deletions cannot be used in combination with merges and normal
        deletions on the same key (other keys are not affected by this)
      - Consecutive single deletions are currently not allowed (an older version of
        this patch supported this, so it could be resurrected if needed)
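
The difference between Delete() and SingleDelete() during compaction, as described above, can be illustrated with a small stand-alone model. This is only a sketch under simplifying assumptions, not the Compaction Iterator from the patch: `Entry`, `SnapshotBetween` and `CompactKey` are hypothetical names, the input holds the versions of one user key ordered newest first, and a conventional tombstone is always kept because older copies of the key may live outside the compaction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class EntryType { kPut, kDelete, kSingleDelete };

// One version of a single user key (hypothetical, simplified record).
struct Entry {
  EntryType type;
  std::uint64_t seq;  // sequence number; larger means newer
};

// True if some snapshot s could see `older` but not `newer`,
// i.e. older <= s < newer.
inline bool SnapshotBetween(const std::vector<std::uint64_t>& snapshots,
                            std::uint64_t older, std::uint64_t newer) {
  for (std::uint64_t s : snapshots) {
    if (s >= older && s < newer) return true;
  }
  return false;
}

// Compacts the versions of ONE user key, given newest first.
// Put-over-Put is left untouched to keep the sketch short.
inline std::vector<Entry> CompactKey(
    const std::vector<Entry>& in,
    const std::vector<std::uint64_t>& snapshots) {
  std::vector<Entry> out;
  std::size_t i = 0;
  while (i < in.size()) {
    const Entry& cur = in[i];
    const bool shadows_put =
        i + 1 < in.size() && in[i + 1].type == EntryType::kPut;
    if (shadows_put) {
      assert(in[i + 1].seq < cur.seq);  // input must be newest first
    }
    const bool no_snapshot_between =
        shadows_put && !SnapshotBetween(snapshots, in[i + 1].seq, cur.seq);
    if (cur.type == EntryType::kSingleDelete && no_snapshot_between) {
      i += 2;  // tombstone and value are dropped together
    } else if (cur.type == EntryType::kDelete && no_snapshot_between) {
      out.push_back(cur);  // the conventional tombstone is kept
      i += 2;              // only the shadowed value is dropped
    } else {
      out.push_back(cur);
      ++i;
    }
  }
  return out;
}
```

In this model a SingleDelete at sequence 20 over a Put at sequence 10 leaves nothing behind when no snapshot separates them, while a snapshot at sequence 15 forces both entries to be kept.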
      
      Test Plan: make all check
      
      Reviewers: yhchiang, sdong, rven, anthony, yoshinorim, igor
      
      Reviewed By: igor
      
      Subscribers: maykov, dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43179
      014fd55a
  22. 12 Sep, 2015 1 commit
    • M
      Add counters for seek/next/prev · aeb46126
      Manuel Ung committed
      Summary:
      There are currently no statistics on seeks, only on gets. This adds the following counters:
      
      rocksdb.number.db.seek
      rocksdb.number.db.next
      rocksdb.number.db.prev
      (number of calls)
      
      rocksdb.db.iterate.bytes.read
      (number of bytes read from key + value using seek/next/prev)
      
      rocksdb.number.keys.seek.found
      rocksdb.number.keys.next.found
      rocksdb.number.keys.prev.found
      (number of calls where seek/next/prev found a value)
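
To make the meaning of the three counter groups concrete, here is a small stand-alone sketch that counts calls, successful positions, and bytes read on an iterator over a `std::map`. It does not use RocksDB's statistics classes: `IterStats` and `CountingIterator` are hypothetical names, and the mapping of fields to ticker names in the comments is an assumption based on the list above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical counter block mirroring the tickers listed in the summary.
struct IterStats {
  std::uint64_t seek = 0;        // rocksdb.number.db.seek
  std::uint64_t next = 0;        // rocksdb.number.db.next
  std::uint64_t prev = 0;        // rocksdb.number.db.prev
  std::uint64_t seek_found = 0;  // rocksdb.number.keys.seek.found
  std::uint64_t next_found = 0;  // rocksdb.number.keys.next.found
  std::uint64_t prev_found = 0;  // rocksdb.number.keys.prev.found
  std::uint64_t bytes_read = 0;  // rocksdb.db.iterate.bytes.read
};

class CountingIterator {
 public:
  CountingIterator(const std::map<std::string, std::string>& data,
                   IterStats* stats)
      : data_(data), it_(data.end()), stats_(stats) {
    assert(stats_ != nullptr);
  }

  void Seek(const std::string& target) {
    ++stats_->seek;
    it_ = data_.lower_bound(target);
    if (Valid()) { ++stats_->seek_found; AddBytes(); }
  }

  void Next() {
    assert(Valid());
    ++stats_->next;
    ++it_;
    if (Valid()) { ++stats_->next_found; AddBytes(); }
  }

  void Prev() {
    assert(Valid());
    ++stats_->prev;
    if (it_ == data_.begin()) {
      it_ = data_.end();  // moved before the first key: now invalid
    } else {
      --it_;
    }
    if (Valid()) { ++stats_->prev_found; AddBytes(); }
  }

  bool Valid() const { return it_ != data_.end(); }

 private:
  // Bytes read are counted as key size plus value size.
  void AddBytes() {
    stats_->bytes_read += it_->first.size() + it_->second.size();
  }

  const std::map<std::string, std::string>& data_;
  std::map<std::string, std::string>::const_iterator it_;
  IterStats* stats_;
};
```

The call counters increase on every call, while the "found" counters and the byte counter increase only when the call leaves the iterator on a valid entry.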
      
      Test Plan:
      ./db_bench -statistics -benchmarks fillrandom,seekrandom -seek_nexts 5
      ./db_bench -statistics -benchmarks fillrandom,seekrandom -seek_nexts 5 -reverse_iterator
      
      Reviewers: yhchiang, rven, kradhakrishnan, IslamAbdelRahman, MarkCallaghan, sdong, igor
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D46605
      aeb46126
  23. 09 Sep, 2015 1 commit
    • A
      Added Equal method to Comparator interface · 6bdc484f
      Andres Noetzli committed
      Summary:
      In some cases, equality comparisons can be done more efficiently than three-way
      comparisons. There are quite a few places in the code where we only care about
      equality. This patch adds an Equal() method that defaults to using the
      Compare() method.
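
The pattern described here, an equality method that falls back on the three-way comparison unless a subclass supplies something cheaper, might look roughly like the sketch below. It is not RocksDB's `Comparator` header: the `Sketch*` class names are hypothetical, and `std::string` stands in for the key type.

```cpp
#include <string>

// Hypothetical comparator interface illustrating the default Equal().
class SketchComparator {
 public:
  virtual ~SketchComparator() = default;

  // Three-way comparison: negative, zero, or positive.
  virtual int Compare(const std::string& a, const std::string& b) const = 0;

  // Defaults to the three-way comparison; subclasses may override it
  // when equality can be decided more cheaply.
  virtual bool Equal(const std::string& a, const std::string& b) const {
    return Compare(a, b) == 0;
  }
};

// Overrides Equal() with a direct equality test.
class SketchBytewiseComparator : public SketchComparator {
 public:
  int Compare(const std::string& a, const std::string& b) const override {
    return a.compare(b);
  }
  bool Equal(const std::string& a, const std::string& b) const override {
    return a == b;
  }
};

// Relies on the inherited default Equal().
class SketchReverseComparator : public SketchComparator {
 public:
  int Compare(const std::string& a, const std::string& b) const override {
    return b.compare(a);
  }
};
```

Callers that only care about equality can call Equal() on any comparator; existing subclasses keep working unchanged because of the default.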
      
      Test Plan: make clean all check
      
      Reviewers: rven, anthony, yhchiang, igor, sdong
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D46233
      6bdc484f
  24. 27 Aug, 2015 1 commit
    • S
      DBIter to filter out extra keys with higher sequence numbers when changing direction... · d286b5df
      sdong committed
      DBIter to filter out extra keys with higher sequence numbers when changing direction from forward to backward
      
      Summary:
      When DBIter changes iterating direction from forward to backward, it might see some much larger keys with higher sequence ID. With this commit, these rows will be actively filtered out. It should fix existing disabled tests in db_iter_test.
      
      This may not be a perfect fix, but it has the least impact on existing code, which keeps the change safe.
      
      Test Plan:
      Enable existing tests and make sure they pass. Add a new test DBIterWithMergeIterTest.InnerMergeIteratorDataRace8.
      Also run all existing tests.
      
      Reviewers: yhchiang, rven, anthony, IslamAbdelRahman, kradhakrishnan, igor
      
      Reviewed By: igor
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D45567
      d286b5df
  25. 20 Aug, 2015 1 commit
  26. 12 Aug, 2015 1 commit
    • A
      Removing duplicate code in db_bench/db_stress, fixing typos · 4249f159
      Andres Notzli committed
      Summary:
      While working on single delete support for db_bench, I realized that
      db_bench/db_stress contain a bunch of duplicate code related to
      compression and found some typos. This patch removes duplicate code,
      typos and a redundant #ifndef in internal_stats.cc.
      
      Test Plan: make db_stress && make db_bench && ./db_bench --benchmarks=compress,uncompress
      
      Reviewers: yhchiang, sdong, rven, anthony, igor
      
      Reviewed By: igor
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43965
      4249f159
  27. 07 Aug, 2015 1 commit
    • A
      Fixing endless loop if seeking to end of key with seq num 0 · d7314ba7
      Andres Noetzli committed
      Summary:
      When seeking to the last occurrence of a key with sequence number 0, db_iter
      ends up in an endless loop because it seeks to type kValueTypeForSeek
      which is larger than kTypeDeletion/kTypeValue. Added test case that triggers
      the behavior.
      
      Test Plan: make clean all check
      
      Reviewers: igor, rven, anthony, yhchiang, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D43653
      d7314ba7
  28. 06 Aug, 2015 1 commit
    • S
      Fix misplaced position for reversing iterator direction while current key is a merge · 8e01bd11
      sdong committed
      Summary:
      During forward iteration, if the current key is a merge, the internal iterator is positioned at the next key. If Prev() is called at that point, an extra Prev() is needed to recover the location.
      This is the second attempt at a fix, after reverting ec70fea4. This time the fix is narrowed to the case where the current key is a merge key, and it avoids the reseeking logic for max_iterating skipping.
      
      Test Plan: enable the two disabled tests and make sure they pass
      
      Reviewers: rven, IslamAbdelRahman, kradhakrishnan, tnovak, yhchiang
      
      Reviewed By: yhchiang
      
      Subscribers: leveldb, dhruba
      
      Differential Revision: https://reviews.facebook.net/D43557
      8e01bd11
  29. 08 Jul, 2015 1 commit
  30. 30 Jun, 2015 1 commit
    • T
      Fix a comparison in DBIter::FindPrevUserKey() · ec70fea4
      Tomislav Novak committed
      Summary:
      When seek target is a merge key (`kTypeMerge`), `DBIter::FindNextUserEntry()`
      advances the underlying iterator _past_ the current key (`saved_key_`); see
      `MergeValuesNewToOld()`. However, `FindPrevUserKey()` assumes that `iter_`
      points to an entry with the same user key as `saved_key_`. As a result,
      `it->Seek(key) && it->Prev()` can cause the iterator to be positioned at the
      _next_, instead of the previous, entry (new test, written by @lovro, reproduces
      the bug).
      
      This diff changes `FindPrevUserKey()` to also skip keys that are _greater_ than
      `saved_key_`.
      
      Test Plan: db_test
      
      Reviewers: igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: leveldb, dhruba, lovro
      
      Differential Revision: https://reviews.facebook.net/D40791
      ec70fea4
  31. 26 Jun, 2015 1 commit
  32. 25 Apr, 2015 1 commit
  33. 25 Mar, 2015 1 commit
    • A
      Adding stats for the merge and filter operation · 3d1a924f
      Anurag Indu committed
      Summary:
      We have added new stats and perf_context counters for measuring the time consumed by merge and filter operations.
      We have wrapped all the merge operations in the GUARD statement and collect the total time for these operations in the DB.
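
One common way to realize such a guard is a scope-based timer that adds the elapsed time to a running total when it goes out of scope. The sketch below shows that general technique only; it is not the macro or stats plumbing from this patch, and `ScopedTimer` and `TimedAppendMerge` are hypothetical names.

```cpp
#include <chrono>
#include <cstdint>
#include <string>

// Hypothetical RAII guard: adds the time spent in its scope to a total.
class ScopedTimer {
 public:
  explicit ScopedTimer(std::uint64_t* total_nanos)
      : total_nanos_(total_nanos),
        start_(std::chrono::steady_clock::now()) {}

  ~ScopedTimer() {
    const auto elapsed = std::chrono::steady_clock::now() - start_;
    *total_nanos_ += static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed)
            .count());
  }

  ScopedTimer(const ScopedTimer&) = delete;
  ScopedTimer& operator=(const ScopedTimer&) = delete;

 private:
  std::uint64_t* total_nanos_;
  std::chrono::steady_clock::time_point start_;
};

// Hypothetical merge step (string append) whose duration is charged to
// `merge_nanos`. The guard is destroyed after the return value is built,
// so the append itself is included in the measurement.
inline std::string TimedAppendMerge(const std::string& existing,
                                    const std::string& operand,
                                    std::uint64_t* merge_nanos) {
  ScopedTimer guard(merge_nanos);
  return existing + "," + operand;
}
```

Because the total is only ever added to, calling the timed function repeatedly accumulates the time of all merge operations in a single counter.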
      
      Test Plan: WIP
      
      Reviewers: rven, yhchiang, kradhakrishnan, igor, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba
      
      Differential Revision: https://reviews.facebook.net/D34377
      3d1a924f
  34. 27 Feb, 2015 1 commit
    • I
      rocksdb: Add missing override · 62247ffa
      Igor Sugak committed
      Summary:
      When using the latest clang (3.6 or 3.7/trunk), the rocksdb build fails with many errors. Almost all of them are missing override errors. This diff adds the missing override keywords. No manual changes.
      
      Prerequisites: bear and clang 3.5 built with extra tools
      
      ```lang=bash
      % USE_CLANG=1 bear make all # generate a compilation database http://clang.llvm.org/docs/JSONCompilationDatabase.html
      % clang-modernize -p . -include . -add-override
      % make format
      ```
      
      Test Plan:
      Make sure all tests are passing.
      ```lang=bash
      % #Use default fb code clang.
      % make check
      ```
      Verify that there are fewer errors and no missing override errors.
      ```lang=bash
      % # Have trunk clang present in path.
      % ROCKSDB_NO_FBCODE=1 CC=clang CXX=clang++ make
      ```
      
      Reviewers: igor, kradhakrishnan, rven, meyering, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D34077
      62247ffa
  35. 24 Feb, 2015 1 commit
  36. 05 Dec, 2014 1 commit
  37. 07 Nov, 2014 1 commit
    • I
      Turn -Wshadow back on · 9f20395c
      Igor Canadi committed
      Summary: It turns out that -Wshadow has different rules for gcc than for clang. The previous commit fixed clang. This commit fixes the rest of the warnings for gcc.
      
      Test Plan: compiles
      
      Reviewers: ljin, yhchiang, rven, sdong
      
      Reviewed By: sdong
      
      Subscribers: dhruba, leveldb
      
      Differential Revision: https://reviews.facebook.net/D28131
      9f20395c
  38. 31 Oct, 2014 1 commit
  39. 01 Oct, 2014 1 commit