- 21 5月, 2016 4 次提交
-
-
由 Sergey Senozhatsky 提交于
debug_stat sysfs is read-only and represents various debugging data that zram developers may need. This file is not meant to be used by anyone else: its content is not documented and will change any time w/o any notice. Therefore, the output of debug_stat file contains a version string. To avoid any confusion, we will increase the version number every time we modify the output. At the moment this file exports only one value -- the number of re-compressions, IOW, the number of times compression fast path has failed. This stat is temporary any will be useful in case if any per-cpu compression streams regressions will be reported. Link: http://lkml.kernel.org/r/20160513230834.GB26763@bbox Link: http://lkml.kernel.org/r/20160511134553.12655-1-sergey.senozhatsky@gmail.comSigned-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: NMinchan Kim <minchan@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Sergey Senozhatsky 提交于
Remove the internal part of max_comp_streams interface, since we switched to per-cpu streams. We will keep RW max_comp_streams attr around, because: a) we may (silently) switch back to idle compression streams list and don't want to disturb user space b) max_comp_streams attr must wait for the next 'lay off cycle'; we give user space 2 years to adjust before we remove/downgrade the attr, and there are already several attrs scheduled for removal in 4.11, so it's too late for max_comp_streams. This slightly change a user visible behaviour: - First, reading from max_comp_stream file now will always return the number of online CPUs. - Second, writing to max_comp_stream will not take any effect. Link: http://lkml.kernel.org/r/20160503165546.25201-1-sergey.senozhatsky@gmail.comSigned-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Sergey Senozhatsky 提交于
Remove idle streams list and keep compression streams in per-cpu data. This removes two contented spin_lock()/spin_unlock() calls from write path and also prevent write OP from being preempted while holding the compression stream, which can cause slow downs. For instance, let's assume that we have N cpus and N-2 max_comp_streams.TASK1 owns the last idle stream, TASK2-TASK3 come in with the write requests: TASK1 TASK2 TASK3 zram_bvec_write() spin_lock find stream spin_unlock compress <<preempted>> zram_bvec_write() spin_lock find stream spin_unlock no_stream schedule zram_bvec_write() spin_lock find_stream spin_unlock no_stream schedule spin_lock release stream spin_unlock wake up TASK2 not only TASK2 and TASK3 will not get the stream, TASK1 will be preempted in the middle of its operation; while we would prefer it to finish compression and release the stream. Test environment: x86_64, 4 CPU box, 3G zram, lzo The following fio tests were executed: read, randread, write, randwrite, rw, randrw with the increasing number of jobs from 1 to 10. 4 streams 8 streams per-cpu =========================================================== jobs1 READ: 2520.1MB/s 2566.5MB/s 2491.5MB/s READ: 2102.7MB/s 2104.2MB/s 2091.3MB/s WRITE: 1355.1MB/s 1320.2MB/s 1378.9MB/s WRITE: 1103.5MB/s 1097.2MB/s 1122.5MB/s READ: 434013KB/s 435153KB/s 439961KB/s WRITE: 433969KB/s 435109KB/s 439917KB/s READ: 403166KB/s 405139KB/s 403373KB/s WRITE: 403223KB/s 405197KB/s 403430KB/s jobs2 READ: 7958.6MB/s 8105.6MB/s 8073.7MB/s READ: 6864.9MB/s 6989.8MB/s 7021.8MB/s WRITE: 2438.1MB/s 2346.9MB/s 3400.2MB/s WRITE: 1994.2MB/s 1990.3MB/s 2941.2MB/s READ: 981504KB/s 973906KB/s 1018.8MB/s WRITE: 981659KB/s 974060KB/s 1018.1MB/s READ: 937021KB/s 938976KB/s 987250KB/s WRITE: 934878KB/s 936830KB/s 984993KB/s jobs3 READ: 13280MB/s 13553MB/s 13553MB/s READ: 11534MB/s 11785MB/s 11755MB/s WRITE: 3456.9MB/s 3469.9MB/s 4810.3MB/s WRITE: 3029.6MB/s 3031.6MB/s 4264.8MB/s READ: 1363.8MB/s 1362.6MB/s 1448.9MB/s WRITE: 1361.9MB/s 1360.7MB/s 1446.9MB/s READ: 1309.4MB/s 1310.6MB/s 1397.5MB/s WRITE: 1307.4MB/s 1308.5MB/s 1395.3MB/s jobs4 READ: 20244MB/s 20177MB/s 20344MB/s READ: 17886MB/s 17913MB/s 17835MB/s WRITE: 4071.6MB/s 4046.1MB/s 6370.2MB/s WRITE: 3608.9MB/s 3576.3MB/s 5785.4MB/s READ: 1824.3MB/s 1821.6MB/s 1997.5MB/s WRITE: 1819.8MB/s 1817.4MB/s 1992.5MB/s READ: 1765.7MB/s 1768.3MB/s 1937.3MB/s WRITE: 1767.5MB/s 1769.1MB/s 1939.2MB/s jobs5 READ: 18663MB/s 18986MB/s 18823MB/s READ: 16659MB/s 16605MB/s 16954MB/s WRITE: 3912.4MB/s 3888.7MB/s 6126.9MB/s WRITE: 3506.4MB/s 3442.5MB/s 5519.3MB/s READ: 1798.2MB/s 1746.5MB/s 1935.8MB/s WRITE: 1792.7MB/s 1740.7MB/s 1929.1MB/s READ: 1727.6MB/s 1658.2MB/s 1917.3MB/s WRITE: 1726.5MB/s 1657.2MB/s 1916.6MB/s jobs6 READ: 21017MB/s 20922MB/s 21162MB/s READ: 19022MB/s 19140MB/s 18770MB/s WRITE: 3968.2MB/s 4037.7MB/s 6620.8MB/s WRITE: 3643.5MB/s 3590.2MB/s 6027.5MB/s READ: 1871.8MB/s 1880.5MB/s 2049.9MB/s WRITE: 1867.8MB/s 1877.2MB/s 2046.2MB/s READ: 1755.8MB/s 1710.3MB/s 1964.7MB/s WRITE: 1750.5MB/s 1705.9MB/s 1958.8MB/s jobs7 READ: 21103MB/s 20677MB/s 21482MB/s READ: 18522MB/s 18379MB/s 19443MB/s WRITE: 4022.5MB/s 4067.4MB/s 6755.9MB/s WRITE: 3691.7MB/s 3695.5MB/s 5925.6MB/s READ: 1841.5MB/s 1933.9MB/s 2090.5MB/s WRITE: 1842.7MB/s 1935.3MB/s 2091.9MB/s READ: 1832.4MB/s 1856.4MB/s 1971.5MB/s WRITE: 1822.3MB/s 1846.2MB/s 1960.6MB/s jobs8 READ: 20463MB/s 20194MB/s 20862MB/s READ: 18178MB/s 17978MB/s 18299MB/s WRITE: 4085.9MB/s 4060.2MB/s 7023.8MB/s WRITE: 3776.3MB/s 3737.9MB/s 6278.2MB/s READ: 1957.6MB/s 1944.4MB/s 2109.5MB/s WRITE: 1959.2MB/s 1946.2MB/s 2111.4MB/s READ: 1900.6MB/s 1885.7MB/s 2082.1MB/s WRITE: 1896.2MB/s 1881.4MB/s 2078.3MB/s jobs9 READ: 19692MB/s 19734MB/s 19334MB/s READ: 17678MB/s 18249MB/s 17666MB/s WRITE: 4004.7MB/s 4064.8MB/s 6990.7MB/s WRITE: 3724.7MB/s 3772.1MB/s 6193.6MB/s READ: 1953.7MB/s 1967.3MB/s 2105.6MB/s WRITE: 1953.4MB/s 1966.7MB/s 2104.1MB/s READ: 1860.4MB/s 1897.4MB/s 2068.5MB/s WRITE: 1858.9MB/s 1895.9MB/s 2066.8MB/s jobs10 READ: 19730MB/s 19579MB/s 19492MB/s READ: 18028MB/s 18018MB/s 18221MB/s WRITE: 4027.3MB/s 4090.6MB/s 7020.1MB/s WRITE: 3810.5MB/s 3846.8MB/s 6426.8MB/s READ: 1956.1MB/s 1994.6MB/s 2145.2MB/s WRITE: 1955.9MB/s 1993.5MB/s 2144.8MB/s READ: 1852.8MB/s 1911.6MB/s 2075.8MB/s WRITE: 1855.7MB/s 1914.6MB/s 2078.1MB/s perf stat 4 streams 8 streams per-cpu ==================================================================================================================== jobs1 stalled-cycles-frontend 23,174,811,209 ( 38.21%) 23,220,254,188 ( 38.25%) 23,061,406,918 ( 38.34%) stalled-cycles-backend 11,514,174,638 ( 18.98%) 11,696,722,657 ( 19.27%) 11,370,852,810 ( 18.90%) instructions 73,925,005,782 ( 1.22) 73,903,177,632 ( 1.22) 73,507,201,037 ( 1.22) branches 14,455,124,835 ( 756.063) 14,455,184,779 ( 755.281) 14,378,599,509 ( 758.546) branch-misses 69,801,336 ( 0.48%) 80,225,529 ( 0.55%) 72,044,726 ( 0.50%) jobs2 stalled-cycles-frontend 49,912,741,782 ( 46.11%) 50,101,189,290 ( 45.95%) 32,874,195,633 ( 35.11%) stalled-cycles-backend 27,080,366,230 ( 25.02%) 27,949,970,232 ( 25.63%) 16,461,222,706 ( 17.58%) instructions 122,831,629,690 ( 1.13) 122,919,846,419 ( 1.13) 121,924,786,775 ( 1.30) branches 23,725,889,239 ( 692.663) 23,733,547,140 ( 688.062) 23,553,950,311 ( 794.794) branch-misses 90,733,041 ( 0.38%) 96,320,895 ( 0.41%) 84,561,092 ( 0.36%) jobs3 stalled-cycles-frontend 66,437,834,608 ( 45.58%) 63,534,923,344 ( 43.69%) 42,101,478,505 ( 33.19%) stalled-cycles-backend 34,940,799,661 ( 23.97%) 34,774,043,148 ( 23.91%) 21,163,324,388 ( 16.68%) instructions 171,692,121,862 ( 1.18) 171,775,373,044 ( 1.18) 170,353,542,261 ( 1.34) branches 32,968,962,622 ( 628.723) 32,987,739,894 ( 630.512) 32,729,463,918 ( 717.027) branch-misses 111,522,732 ( 0.34%) 110,472,894 ( 0.33%) 99,791,291 ( 0.30%) jobs4 stalled-cycles-frontend 98,741,701,675 ( 49.72%) 94,797,349,965 ( 47.59%) 54,535,655,381 ( 33.53%) stalled-cycles-backend 54,642,609,615 ( 27.51%) 55,233,554,408 ( 27.73%) 27,882,323,541 ( 17.14%) instructions 220,884,807,851 ( 1.11) 220,930,887,273 ( 1.11) 218,926,845,851 ( 1.35) branches 42,354,518,180 ( 592.105) 42,362,770,587 ( 590.452) 41,955,552,870 ( 716.154) branch-misses 138,093,449 ( 0.33%) 131,295,286 ( 0.31%) 121,794,771 ( 0.29%) jobs5 stalled-cycles-frontend 116,219,747,212 ( 48.14%) 110,310,397,012 ( 46.29%) 66,373,082,723 ( 33.70%) stalled-cycles-backend 66,325,434,776 ( 27.48%) 64,157,087,914 ( 26.92%) 32,999,097,299 ( 16.76%) instructions 270,615,008,466 ( 1.12) 270,546,409,525 ( 1.14) 268,439,910,948 ( 1.36) branches 51,834,046,557 ( 599.108) 51,811,867,722 ( 608.883) 51,412,576,077 ( 729.213) branch-misses 158,197,086 ( 0.31%) 142,639,805 ( 0.28%) 133,425,455 ( 0.26%) jobs6 stalled-cycles-frontend 138,009,414,492 ( 48.23%) 139,063,571,254 ( 48.80%) 75,278,568,278 ( 32.80%) stalled-cycles-backend 79,211,949,650 ( 27.68%) 79,077,241,028 ( 27.75%) 37,735,797,899 ( 16.44%) instructions 319,763,993,731 ( 1.12) 319,937,782,834 ( 1.12) 316,663,600,784 ( 1.38) branches 61,219,433,294 ( 595.056) 61,250,355,540 ( 598.215) 60,523,446,617 ( 733.706) branch-misses 169,257,123 ( 0.28%) 154,898,028 ( 0.25%) 141,180,587 ( 0.23%) jobs7 stalled-cycles-frontend 162,974,812,119 ( 49.20%) 159,290,061,987 ( 48.43%) 88,046,641,169 ( 33.21%) stalled-cycles-backend 92,223,151,661 ( 27.84%) 91,667,904,406 ( 27.87%) 44,068,454,971 ( 16.62%) instructions 369,516,432,430 ( 1.12) 369,361,799,063 ( 1.12) 365,290,380,661 ( 1.38) branches 70,795,673,950 ( 594.220) 70,743,136,124 ( 597.876) 69,803,996,038 ( 732.822) branch-misses 181,708,327 ( 0.26%) 165,767,821 ( 0.23%) 150,109,797 ( 0.22%) jobs8 stalled-cycles-frontend 185,000,017,027 ( 49.30%) 182,334,345,473 ( 48.37%) 99,980,147,041 ( 33.26%) stalled-cycles-backend 105,753,516,186 ( 28.18%) 107,937,830,322 ( 28.63%) 51,404,177,181 ( 17.10%) instructions 418,153,161,055 ( 1.11) 418,308,565,828 ( 1.11) 413,653,475,581 ( 1.38) branches 80,035,882,398 ( 592.296) 80,063,204,510 ( 589.843) 79,024,105,589 ( 730.530) branch-misses 199,764,528 ( 0.25%) 177,936,926 ( 0.22%) 160,525,449 ( 0.20%) jobs9 stalled-cycles-frontend 210,941,799,094 ( 49.63%) 204,714,679,254 ( 48.55%) 114,251,113,756 ( 33.96%) stalled-cycles-backend 122,640,849,067 ( 28.85%) 122,188,553,256 ( 28.98%) 58,360,041,127 ( 17.35%) instructions 468,151,025,415 ( 1.10) 467,354,869,323 ( 1.11) 462,665,165,216 ( 1.38) branches 89,657,067,510 ( 585.628) 89,411,550,407 ( 588.990) 88,360,523,943 ( 730.151) branch-misses 218,292,301 ( 0.24%) 191,701,247 ( 0.21%) 178,535,678 ( 0.20%) jobs10 stalled-cycles-frontend 233,595,958,008 ( 49.81%) 227,540,615,689 ( 49.11%) 160,341,979,938 ( 43.07%) stalled-cycles-backend 136,153,676,021 ( 29.03%) 133,635,240,742 ( 28.84%) 65,909,135,465 ( 17.70%) instructions 517,001,168,497 ( 1.10) 516,210,976,158 ( 1.11) 511,374,038,613 ( 1.37) branches 98,911,641,329 ( 585.796) 98,700,069,712 ( 591.583) 97,646,761,028 ( 728.712) branch-misses 232,341,823 ( 0.23%) 199,256,308 ( 0.20%) 183,135,268 ( 0.19%) per-cpu streams tend to cause significantly less stalled cycles; execute less branches and hit less branch-misses. perf stat reported execution time 4 streams 8 streams per-cpu ==================================================================== jobs1 seconds elapsed 20.909073870 20.875670495 20.817838540 jobs2 seconds elapsed 18.529488399 18.720566469 16.356103108 jobs3 seconds elapsed 18.991159531 18.991340812 16.766216066 jobs4 seconds elapsed 19.560643828 19.551323547 16.246621715 jobs5 seconds elapsed 24.746498464 25.221646740 20.696112444 jobs6 seconds elapsed 28.258181828 28.289765505 22.885688857 jobs7 seconds elapsed 32.632490241 31.909125381 26.272753738 jobs8 seconds elapsed 35.651403851 36.027596308 29.108024711 jobs9 seconds elapsed 40.569362365 40.024227989 32.898204012 jobs10 seconds elapsed 44.673112304 43.874898137 35.632952191 Please see Link: http://marc.info/?l=linux-kernel&m=146166970727530 Link: http://marc.info/?l=linux-kernel&m=146174716719650 for more test results (under low memory conditions). Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com> Suggested-by: NMinchan Kim <minchan@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Sergey Senozhatsky 提交于
Pass GFP flags to zs_malloc() instead of using a fixed mask supplied to zs_create_pool(), so we can be more flexible, but, more importantly, we need this to switch zram to per-cpu compression streams -- zram will try to allocate handle with preemption disabled in a fast path and switch to a slow path (using different gfp mask) if the fast one has failed. Apart from that, this also align zs_malloc() interface with zspool/zbud. [sergey.senozhatsky@gmail.com: pass GFP flags to zs_malloc() instead of using a fixed mask] Link: http://lkml.kernel.org/r/20160429150942.GA637@swordfish Link: http://lkml.kernel.org/r/20160429150942.GA637@swordfishSigned-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: NMinchan Kim <minchan@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 20 5月, 2016 1 次提交
-
-
由 Joonsoo Kim 提交于
Many developers already know that field for reference count of the struct page is _count and atomic type. They would try to handle it directly and this could break the purpose of page reference count tracepoint. To prevent direct _count modification, this patch rename it to _refcount and add warning message on the code. After that, developer who need to handle reference count will find that field should not be accessed directly. [akpm@linux-foundation.org: fix comments, per Vlastimil] [akpm@linux-foundation.org: Documentation/vm/transhuge.txt too] [sfr@canb.auug.org.au: sync ethernet driver changes] Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Sunil Goutham <sgoutham@cavium.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Manish Chopra <manish.chopra@qlogic.com> Cc: Yuval Mintz <yuval.mintz@qlogic.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 5月, 2016 1 次提交
-
-
由 Nicolas Dichtel 提交于
The attribute 0 is never used in drbd, so let's use it as pad attribute in netlink messages. This minimizes the patch. Note that this patch is only compile-tested. Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 4月, 2016 2 次提交
-
-
由 Ilya Dryomov 提交于
... instead of just returning an error. Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NJosh Durgin <jdurgin@redhat.com>
-
由 Ilya Dryomov 提交于
A while ago, commit 9875201e ("rbd: fix use-after free of rbd_dev->disk") fixed rbd unmap vs notify race by introducing an exported wrapper for flushing notifies and sticking it into do_rbd_remove(). A similar problem exists on the rbd map path, though: the watch is registered in rbd_dev_image_probe(), while the disk is set up quite a few steps later, in rbd_dev_device_setup(). Nothing prevents a notify from coming in and crashing on a NULL rbd_dev->disk: BUG: unable to handle kernel NULL pointer dereference at 0000000000000050 Call Trace: [<ffffffffa0508344>] rbd_watch_cb+0x34/0x180 [rbd] [<ffffffffa04bd290>] do_event_work+0x40/0xb0 [libceph] [<ffffffff8109d5db>] process_one_work+0x17b/0x470 [<ffffffff8109e3ab>] worker_thread+0x11b/0x400 [<ffffffff8109e290>] ? rescuer_thread+0x400/0x400 [<ffffffff810a5acf>] kthread+0xcf/0xe0 [<ffffffff810b41b3>] ? finish_task_switch+0x53/0x170 [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140 [<ffffffff81645dd8>] ret_from_fork+0x58/0x90 [<ffffffff810a5a00>] ? kthread_create_on_node+0x140/0x140 RIP [<ffffffffa050828a>] rbd_dev_refresh+0xfa/0x180 [rbd] If an error occurs during rbd map, we have to error out, potentially tearing down a watch. Just like on rbd unmap, notifies have to be flushed, otherwise rbd_watch_cb() may end up trying to read in the image header after rbd_dev_image_release() has run: Assertion failure in rbd_dev_header_info() at line 4722: rbd_assert(rbd_image_format_valid(rbd_dev->image_format)); Call Trace: [<ffffffff81cccee0>] ? rbd_parent_request_create+0x150/0x150 [<ffffffff81cd4e59>] rbd_dev_refresh+0x59/0x390 [<ffffffff81cd5229>] rbd_watch_cb+0x69/0x290 [<ffffffff81fde9bf>] do_event_work+0x10f/0x1c0 [<ffffffff81107799>] process_one_work+0x689/0x1a80 [<ffffffff811076f7>] ? process_one_work+0x5e7/0x1a80 [<ffffffff81132065>] ? finish_task_switch+0x225/0x640 [<ffffffff81107110>] ? pwq_dec_nr_in_flight+0x2b0/0x2b0 [<ffffffff81108c69>] worker_thread+0xd9/0x1320 [<ffffffff81108b90>] ? process_one_work+0x1a80/0x1a80 [<ffffffff8111b02d>] kthread+0x21d/0x2e0 [<ffffffff8111ae10>] ? kthread_stop+0x550/0x550 [<ffffffff82022802>] ret_from_fork+0x22/0x40 [<ffffffff8111ae10>] ? kthread_stop+0x550/0x550 RIP [<ffffffff81ccd8f9>] rbd_dev_header_info+0xa19/0x1e30 To fix this, a) check if RBD_DEV_FLAG_EXISTS is set before calling revalidate_disk(), b) move ceph_osdc_flush_notifies() call into rbd_dev_header_unwatch_sync() to cover rbd map error paths and c) turn header read-in into a critical section. The latter also happens to take care of rbd map foo@bar vs rbd snap rm foo@bar race. Fixes: http://tracker.ceph.com/issues/15490Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NJosh Durgin <jdurgin@redhat.com>
-
- 26 4月, 2016 1 次提交
-
-
由 Jeff Moyer 提交于
Simply creating a file system on an skd device, followed by mount and fstrim will result in errors in the logs and then a BUG(). Let's remove discard support from that driver. As far as I can tell, it hasn't worked right since it was merged. This patch also has a side-effect of cleaning up an unintentional shadowed declaration inside of skd_end_request. I tested to ensure that I can still do I/O to the device using xfstests ./check -g quick. I didn't do anything more extensive than that, though. Signed-off-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 15 4月, 2016 1 次提交
-
-
由 Ming Lei 提交于
Starting from commit e36f6204(block: split bios to max possible length), block core starts to split bio in the middle of bvec. Unfortunately loop dio/aio doesn't consider this situation, and always treat 'iter.iov_offset' as zero. Then filesystem corruption is observed. This patch figures out the offset of the base bvevc via 'bio->bi_iter.bi_bvec_done' and fixes the issue by passing the offset to iov iterator. Fixes: e36f6204 (block: split bios to max possible length) Cc: Keith Busch <keith.busch@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org (4.5) Signed-off-by: NMing Lei <ming.lei@canonical.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 14 4月, 2016 1 次提交
-
-
由 Jens Axboe 提交于
Now that we converted everything to the newer block write cache interface, kill off the queue flush_flags and queueable flush entries. Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 13 4月, 2016 11 次提交
-
-
由 Jens Axboe 提交于
Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Jens Axboe 提交于
Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Jens Axboe 提交于
Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Jens Axboe 提交于
Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Jens Axboe 提交于
Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Jens Axboe 提交于
Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Jens Axboe 提交于
The driver calls it with 0 for flags, since it doesn't have a writeback cache. Just remove the call, as it's a no-op right now. Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Jens Axboe 提交于
Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Jens Axboe 提交于
Signed-off-by: NJens Axboe <axboe@fb.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Keith Busch 提交于
Only a single tags array anyway. Signed-off-by: NKeith Busch <keith.busch@intel.com> Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de> Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Ming Lin 提交于
We could kmalloc() the payload, so need the offset in page. Signed-off-by: NMing Lin <ming.l@ssi.samsung.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 06 4月, 2016 1 次提交
-
-
由 David Disseldorp 提交于
As of 5a60e876, RBD object request allocations are made via rbd_obj_request_create() with GFP_NOIO. However, subsequent OSD request allocations in rbd_osd_req_create*() use GFP_ATOMIC. With heavy page cache usage (e.g. OSDs running on same host as krbd client), rbd_osd_req_create() order-1 GFP_ATOMIC allocations have been observed to fail, where direct reclaim would have allowed GFP_NOIO allocations to succeed. Cc: stable@vger.kernel.org # 3.18+ Suggested-by: NVlastimil Babka <vbabka@suse.cz> Suggested-by: NNeil Brown <neilb@suse.com> Signed-off-by: NDavid Disseldorp <ddiss@suse.de> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 05 4月, 2016 2 次提交
-
-
由 Kirill A. Shutemov 提交于
Mostly direct substitution with occasional adjustment or removing outdated comments. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 3月, 2016 3 次提交
-
-
由 Geliang Tang 提交于
Use KMEM_CACHE() instead of kmem_cache_create() to simplify the code. Signed-off-by: NGeliang Tang <geliangtang@163.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
Turn r_ops into a flexible array member to enable large, consisting of up to 16 ops, OSD requests. The use case is scattered writeback in cephfs and, as far as the kernel client is concerned, 16 is just a made up number. r_ops had size 3 for copyup+hint+write, but copyup is really a special case - it can only happen once. ceph_osd_request_cache is therefore stuffed with num_ops=2 requests, anything bigger than that is allocated with kmalloc(). req_mempool is backed by ceph_osd_request_cache, which means either num_ops=1 or num_ops=2 for use_mempool=true - all existing users (ceph_writepages_start(), ceph_osdc_writepages()) are fine with that. Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Yan, Zheng 提交于
This avoids defining large array of r_reply_op_{len,result} in in struct ceph_osd_request. Signed-off-by: NYan, Zheng <zyan@redhat.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 19 3月, 2016 2 次提交
-
-
由 Alexey Khoroshilov 提交于
exec_drive_taskfile() checks for dma mapping errors by comparison returned address with zero, while pci_dma_mapping_error() should be used. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: NAlexey Khoroshilov <khoroshilov@ispras.ru> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Wenwei Tao 提交于
After register null_blk devices into lightnvm, we forget to add these devices to the the nullb_list, makes them invisible to the null_blk driver. Signed-off-by: NWenwei Tao <ww.tao0320@gmail.com> Fixes: a514379b ("null_blk: oops when initializing without lightnvm") Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 18 3月, 2016 1 次提交
-
-
由 Joonsoo Kim 提交于
The success of CMA allocation largely depends on the success of migration and key factor of it is page reference count. Until now, page reference is manipulated by direct calling atomic functions so we cannot follow up who and where manipulate it. Then, it is hard to find actual reason of CMA allocation failure. CMA allocation should be guaranteed to succeed so finding offending place is really important. In this patch, call sites where page reference is manipulated are converted to introduced wrapper function. This is preparation step to add tracepoint to each page reference manipulation function. With this facility, we can easily find reason of CMA allocation failure. There is no functional change in this patch. In addition, this patch also converts reference read sites. It will help a second step that renames page._count to something else and prevents later attempt to direct access to it (Suggested by Andrew). Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: NMichal Nazarewicz <mina86@mina86.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 3月, 2016 3 次提交
-
-
由 Arnd Bergmann 提交于
gcc-6.0 found an ancient bug in the paride driver, which had a "module_param(verbose, bool, 0);" since before 2.6.12, but actually uses it to accept '0', '1' or '2' as arguments: drivers/block/paride/pd.c: In function 'pd_init_dev_parms': drivers/block/paride/pd.c:298:29: warning: comparison of constant '1' with boolean expression is always false [-Wbool-compare] #define DBMSG(msg) ((verbose>1)?(msg):NULL) In 2012, Rusty did a cleanup patch that also changed the type of the variable to 'bool', which introduced what is now a gcc warning. This changes the type back to 'int' and adapts the module_param() line instead, so it should work as documented in case anyone ever cares about running the ancient driver with debugging. Fixes: 90ab5ee9 ("module_param: make bool parameters really bool (drivers & misc)") Signed-off-by: NArnd Bergmann <arnd@arndb.de> Rusty Russell <rusty@rustcorp.com.au> Cc: Tim Waugh <tim@cyberelk.net> Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Cc: Jens Axboe <axboe@fb.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Valentin Rothberg 提交于
Commit d4366414 ("cpqarray: remove it from the kernel") removes the Kconfig option BLK_CPQ_DA and cpqarray. Remove the dead build rule in the Makefile. Signed-off-by: NValentin Rothberg <valentinrothberg@gmail.com> Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Bart Van Assche 提交于
Avoid that discard requests with size => PAGE_SIZE fail with -EIO. Refuse discard requests if the discard size is not a multiple of the page size. Fixes: 2dbe5495 ("brd: Refuse improperly aligned discard requests") Signed-off-by: NBart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: NJan Kara <jack@suse.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Robert Elliot <elliott@hp.com> Cc: stable <stable@vger.kernel.org> # v4.4+ Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 14 3月, 2016 1 次提交
-
-
由 Jens Axboe 提交于
We disabled the ability to enable this driver back in October of 2013, we should be able to safely remove it at this point. The initial goal was to remove it in 3.15, so now is the time. Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 05 3月, 2016 1 次提交
-
-
由 Arnd Bergmann 提交于
The do_div() macro now checks its arguments for the correct type, and refuses anything other than u64, so we get a warning about nbd_ioctl passing in an loff_t: drivers/block/nbd.c: In function '__nbd_ioctl': drivers/block/nbd.c:757:77: error: comparison of distinct pointer types lacks a cast [-Werror] This changes the nbd code to use div_s64() instead, which takes a signed argument. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Fixes: 37091fdd ("nbd: Create size change events for userspace") Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 04 3月, 2016 4 次提交
-
-
由 Jens Axboe 提交于
We always return BLK_EH_RESET_TIMER, so no point in storing that in an integer. Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Konrad Rzeszutek Wilk 提交于
The processes names are truncated to 17, while we had the length of the process as name 20 - which meant that while we filled it out with various details - the last 3 characters (which had the queue number) never surfaced to the user-space. To simplify this and be able to fit the device name, domain id, and the queue number we remove the 'blkback' from the name. Prior to this patch the device name is "blkback.<domid>.<name>" for example: blkback.8.xvda, blkback.11.hda. With the multiqueue block backend we add "-%d" for the queue. But sadly this is already way past the limit so it gets stripped. Possible solution had been identified by Ian: http://lists.xenproject.org/archives/html/xen-devel/2015-05/msg03516.html " If you are pressed for space then the "xvd" is probably a bit redundant in a string which starts blkbk. The guest may not even call the device xvdN (iirc BSD has another prefix) any how, so having blkback say so seems of limited use anyway. Since this seems to not include a partition number how does this work in the split partition scheme? (i.e. one where the guest is given xvda1 and xvda2 rather than xvda with a partition table) [It will be 'blkback.8.xvda1', and 'blkback.11.xvda2'] Perhaps something derived from one of the schemes in http://xenbits.xen.org/docs/unstable/misc/vbd-interface.txt might be a better fit? After a bit of discussion (see http://lists.xenproject.org/archives/html/xen-devel/2015-12/msg01588.html) we settled on dropping the "blback" part. This will make it possible to have the <domid>.<name>-<queue>: [1.xvda-0] [1.xvda-1] And we enough space to make it go up to: [32100.xvdfg9-5] Acked-by: NRoger Pau Monné <roger.pau@citrix.com> Reported-by: NJan Beulich <jbeulich@suse.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Jan Beulich 提交于
There's no reason to defer this until the connect phase, and in fact there are frontend implementations expecting this to be available earlier. Move it into the probe function. Acked-by: NRoger Pau Monné <roger.pau@citrix.com> Signed-off-by: NJan Beulich <jbeulich@suse.com> Cc: Bob Liu <bob.liu@oracle.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Jan Beulich 提交于
"max" is rather ambiguous and carries pretty little meaning, the more that there are also "max_queues" and "max_ring_page_order". Make this "max_indirect_segments" instead, and at once change the type from int to uint (to match the respective variable's type). Acked-by: NRoger Pau Monné <roger.pau@citrix.com> Signed-off-by: NJan Beulich <jbeulich@suse.com> Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-