1. 12 8月, 2018 18 次提交
  2. 11 8月, 2018 1 次提交
    • C
      bcache: fix error setting writeback_rate through sysfs interface · 46451874
      Coly Li 提交于
      Commit ea8c5356 ("bcache: set max writeback rate when I/O request
      is idle") changes struct bch_ratelimit member rate from uint32_t to
      atomic_long_t and uses atomic_long_set() in drivers/md/bcache/sysfs.c
      to set new writeback rate, after the input is converted from memory
      buf to long int by sysfs_strtoul_clamp().
      
      The above change has a problem because there is an implicit return
      inside sysfs_strtoul_clamp() so the following atomic_long_set()
      won't be called. This error is detected by 0day system with following
      snipped smatch warnings:
      
      drivers/md/bcache/sysfs.c:271 __cached_dev_store() error: uninitialized
      symbol 'v'.
      270  sysfs_strtoul_clamp(writeback_rate, v, 1, INT_MAX);
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      @271 atomic_long_set(&dc->writeback_rate.rate, v);
      
      This patch fixes the above error by using strtoul_safe_clamp() to
      convert the input buffer into a long int type result.
      
      Fixes: ea8c5356 ("bcache: set max writeback rate when I/O request is idle")
      Cc: Kai Krakow <kai@kaishome.de>
      Cc: Stefan Priebe <s.priebe@profihost.ag>
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      46451874
  3. 10 8月, 2018 3 次提交
    • J
      null_blk: add lock drop/acquire annotation · 61884de0
      Jens Axboe 提交于
      sparse complains:
      
      drivers/block/null_blk_main.c:816:24: sparse: context imbalance in 'null_insert_page' - unexpected unlock
      
      Fix it by adding the necessary annotations to the function.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      61884de0
    • L
      Blk-throttle: reduce tail io latency when iops limit is enforced · 991f61fe
      Liu Bo 提交于
      When an application's iops has exceeded its cgroup's iops limit, surely it
      is throttled and kernel will set a timer for dispatching, thus IO latency
      includes the delay.
      
      However, the dispatch delay which is calculated by the limit and the
      elapsed jiffies is suboptimal.  As the dispatch delay is only calculated
      once the application's iops is (iops limit + 1), it doesn't need to wait
      any longer than the remaining time of the current slice.
      
      The difference can be proved by the following fio job and cgroup iops
      setting,
      -----
      $ echo 4 > /mnt/config/nullb/disk1/mbps    # limit nullb's bandwidth to 4MB/s for testing.
      $ echo "253:1 riops=100 rbps=max" > /sys/fs/cgroup/unified/cg1/io.max
      $ cat r2.job
      [global]
      name=fio-rand-read
      filename=/dev/nullb1
      rw=randread
      bs=4k
      direct=1
      numjobs=1
      time_based=1
      runtime=60
      group_reporting=1
      
      [file1]
      size=4G
      ioengine=libaio
      iodepth=1
      rate_iops=50000
      norandommap=1
      thinktime=4ms
      -----
      
      wo patch:
      file1: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
      fio-3.7-66-gedfc
      Starting 1 process
      
         read: IOPS=99, BW=400KiB/s (410kB/s)(23.4MiB/60001msec)
          slat (usec): min=10, max=336, avg=27.71, stdev=17.82
          clat (usec): min=2, max=28887, avg=5929.81, stdev=7374.29
           lat (usec): min=24, max=28901, avg=5958.73, stdev=7366.22
          clat percentiles (usec):
           |  1.00th=[    4],  5.00th=[    4], 10.00th=[    4], 20.00th=[    4],
           | 30.00th=[    4], 40.00th=[    4], 50.00th=[    6], 60.00th=[11731],
           | 70.00th=[11863], 80.00th=[11994], 90.00th=[12911], 95.00th=[22676],
           | 99.00th=[23725], 99.50th=[23987], 99.90th=[23987], 99.95th=[25035],
           | 99.99th=[28967]
      
      w/ patch:
      file1: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
      fio-3.7-66-gedfc
      Starting 1 process
      
         read: IOPS=100, BW=400KiB/s (410kB/s)(23.4MiB/60005msec)
          slat (usec): min=10, max=155, avg=23.24, stdev=16.79
          clat (usec): min=2, max=12393, avg=5961.58, stdev=5959.25
           lat (usec): min=23, max=12412, avg=5985.91, stdev=5951.92
          clat percentiles (usec):
           |  1.00th=[    3],  5.00th=[    3], 10.00th=[    4], 20.00th=[    4],
           | 30.00th=[    4], 40.00th=[    5], 50.00th=[   47], 60.00th=[11863],
           | 70.00th=[11994], 80.00th=[11994], 90.00th=[11994], 95.00th=[11994],
           | 99.00th=[11994], 99.50th=[11994], 99.90th=[12125], 99.95th=[12125],
           | 99.99th=[12387]
      Signed-off-by: NLiu Bo <bo.liu@linux.alibaba.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      991f61fe
    • G
      block: paride: pd: mark expected switch fall-throughs · 0a1c749d
      Gustavo A. R. Silva 提交于
      In preparation to enabling -Wimplicit-fallthrough, mark switch cases
      where we are expecting to fall through.
      
      Addresses-Coverity-ID: 1056543 ("Missing break in switch")
      Addresses-Coverity-ID: 1056544 ("Missing break in switch")
      Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      0a1c749d
  4. 09 8月, 2018 18 次提交
    • B
      block: Ensure that a request queue is dissociated from the cgroup controller · 24ecc358
      Bart Van Assche 提交于
      Several block drivers call alloc_disk() followed by put_disk() if
      something fails before device_add_disk() is called without calling
      blk_cleanup_queue(). Make sure that also for this scenario a request
      queue is dissociated from the cgroup controller. This patch avoids
      that loading the parport_pc, paride and pf drivers triggers the
      following kernel crash:
      
      BUG: KASAN: null-ptr-deref in pi_init+0x42e/0x580 [paride]
      Read of size 4 at addr 0000000000000008 by task modprobe/744
      Call Trace:
      dump_stack+0x9a/0xeb
      kasan_report+0x139/0x350
      pi_init+0x42e/0x580 [paride]
      pf_init+0x2bb/0x1000 [pf]
      do_one_initcall+0x8e/0x405
      do_init_module+0xd9/0x2f2
      load_module+0x3ab4/0x4700
      SYSC_finit_module+0x176/0x1a0
      do_syscall_64+0xee/0x2b0
      entry_SYSCALL_64_after_hwframe+0x42/0xb7
      Reported-by: NAlexandru Moise <00moses.alexander00@gmail.com>
      Fixes: a063057d ("block: Fix a race between request queue removal and the block cgroup controller") # v4.17
      Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
      Tested-by: NAlexandru Moise <00moses.alexander00@gmail.com>
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Alexandru Moise <00moses.alexander00@gmail.com>
      Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      24ecc358
    • B
      block: Introduce blk_exit_queue() · 4cf6324b
      Bart Van Assche 提交于
      This patch does not change any functionality.
      Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Alexandru Moise <00moses.alexander00@gmail.com>
      Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      4cf6324b
    • B
      blkcg: Introduce blkg_root_lookup() · 6bad9b21
      Bart Van Assche 提交于
      This new function will be used in a later patch to verify whether a
      queue has been dissociated from the cgroup controller before being
      released.
      Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Cc: Alexandru Moise <00moses.alexander00@gmail.com>
      Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      6bad9b21
    • B
      block: Remove two superfluous #include directives · b1f4267c
      Bart Van Assche 提交于
      Commit 12f5b931 ("blk-mq: Remove generation seqeunce") removed the
      only seqcount_t and u64_stats_sync instances from <linux/blkdev.h> but
      did not remove the corresponding #include directives. Since these
      include directives are no longer needed, remove them.
      Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
      Cc: Hannes Reinecke <hare@suse.com>,
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      b1f4267c
    • J
      blk-mq: count the hctx as active before allocating tag · d263ed99
      Jianchao Wang 提交于
      Currently, we count the hctx as active after allocate driver tag
      successfully. If a previously inactive hctx try to get tag first
      time, it may fails and need to wait. However, due to the stale tag
      ->active_queues, the other shared-tags users are still able to
      occupy all driver tags while there is someone waiting for tag.
      Consequently, even if the previously inactive hctx is waked up, it
      still may not be able to get a tag and could be starved.
      
      To fix it, we count the hctx as active before try to allocate driver
      tag, then when it is waiting the tag, the other shared-tag users
      will reserve budget for it.
      Reviewed-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJianchao Wang <jianchao.w.wang@oracle.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      d263ed99
    • G
      block: bvec_nr_vecs() returns value for wrong slab · d6c02a9b
      Greg Edwards 提交于
      In commit ed996a52 ("block: simplify and cleanup bvec pool
      handling"), the value of the slab index is incremented by one in
      bvec_alloc() after the allocation is done to indicate an index value of
      0 does not need to be later freed.
      
      bvec_nr_vecs() was not updated accordingly, and thus returns the wrong
      value.  Decrement idx before performing the lookup.
      
      Fixes: ed996a52 ("block: simplify and cleanup bvec pool handling")
      Signed-off-by: NGreg Edwards <gedwards@ddn.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      d6c02a9b
    • J
      Merge branch 'nvme-4.19' of git://git.infradead.org/nvme into for-4.19/block · 4884f8bf
      Jens Axboe 提交于
      Pull NVMe updates from Christoph:
      
      "This should be the last round of NVMe updates before the 4.19 merge
       window opens.  It conatins support for write protected (aka read-only)
       namespaces from Chaitanya, two ANA fixes from Hannes and a fabrics
       fix from Tal Shorer."
      
      * 'nvme-4.19' of git://git.infradead.org/nvme:
        nvme-fabrics: fix ctrl_loss_tmo < 0 to reconnect forever
        nvmet: add ns write protect support
        nvme: set gendisk read only based on nsattr
        nvme.h: add support for ns write protect definitions
        nvme.h: fixup ANA group descriptor format
        nvme: fixup crash on failed discovery
      4884f8bf
    • S
      bcache: trivial - remove tailing backslash in macro BTREE_FLAG · cbb751c0
      Shenghui Wang 提交于
      Remove the tailing backslash in macro BTREE_FLAG in btree.h
      Signed-off-by: NShenghui Wang <shhuiw@foxmail.com>
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      cbb751c0
    • S
      bcache: make the pr_err statement used for ENOENT only in sysfs_attatch section · e921efeb
      Shenghui Wang 提交于
      The pr_err statement in the code for sysfs_attatch section would run
      for various error codes, which maybe confusing.
      
      E.g,
      
      Run the command twice:
         echo 796b5c05-b03c-4bc7-9cbd-a8df5e8be891 > \
      				/sys/block/bcache0/bcache/attach
         [the backing dev got attached on the first run]
         echo 796b5c05-b03c-4bc7-9cbd-a8df5e8be891 > \
      				/sys/block/bcache0/bcache/attach
      
      In dmesg, after the command run twice, we can get:
      	bcache: bch_cached_dev_attach() Can't attach sda6: already attached
      	bcache: __cached_dev_store() Can't attach 796b5c05-b03c-4bc7-9cbd-\
      a8df5e8be891
                     : cache set not found
      The first statement in the message was right, but the second was
      confusing.
      
      bch_cached_dev_attach has various pr_ statements for various error
      codes, except ENOENT.
      
      After the change, rerun above command twice:
      	echo 796b5c05-b03c-4bc7-9cbd-a8df5e8be891 > \
      			/sys/block/bcache0/bcache/attach
      	echo 796b5c05-b03c-4bc7-9cbd-a8df5e8be891 > \
      			/sys/block/bcache0/bcache/attach
      
      In dmesg we only got:
      	bcache: bch_cached_dev_attach() Can't attach sda6: already attached
      No confusing "cache set not found" message anymore.
      
      And for some not exist SET-UUID:
      	echo 796b5c05-b03c-4bc7-9cbd-a8df5e8be898 > \
      			/sys/block/bcache0/bcache/attach
      In dmesg we can get:
      	bcache: __cached_dev_store() Can't attach 796b5c05-b03c-4bc7-9cbd-\
      a8df5e8be898
      	               : cache set not found
      Signed-off-by: NShenghui Wang <shhuiw@foxmail.com>
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      e921efeb
    • C
      bcache: set max writeback rate when I/O request is idle · ea8c5356
      Coly Li 提交于
      Commit b1092c9a ("bcache: allow quick writeback when backing idle")
      allows the writeback rate to be faster if there is no I/O request on a
      bcache device. It works well if there is only one bcache device attached
      to the cache set. If there are many bcache devices attached to a cache
      set, it may introduce performance regression because multiple faster
      writeback threads of the idle bcache devices will compete the btree level
      locks with the bcache device who have I/O requests coming.
      
      This patch fixes the above issue by only permitting fast writebac when
      all bcache devices attached on the cache set are idle. And if one of the
      bcache devices has new I/O request coming, minimized all writeback
      throughput immediately and let PI controller __update_writeback_rate()
      to decide the upcoming writeback rate for each bcache device.
      
      Also when all bcache devices are idle, limited wrieback rate to a small
      number is wast of thoughput, especially when backing devices are slower
      non-rotation devices (e.g. SATA SSD). This patch sets a max writeback
      rate for each backing device if the whole cache set is idle. A faster
      writeback rate in idle time means new I/Os may have more available space
      for dirty data, and people may observe a better write performance then.
      
      Please note bcache may change its cache mode in run time, and this patch
      still works if the cache mode is switched from writeback mode and there
      is still dirty data on cache.
      
      Fixes: Commit b1092c9a ("bcache: allow quick writeback when backing idle")
      Cc: stable@vger.kernel.org #4.16+
      Signed-off-by: NColy Li <colyli@suse.de>
      Tested-by: NKai Krakow <kai@kaishome.de>
      Tested-by: NStefan Priebe <s.priebe@profihost.ag>
      Cc: Michael Lyle <mlyle@lyle.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      ea8c5356
    • C
      bcache: add code comments for bset.c · b467a6ac
      Coly Li 提交于
      This patch tries to add code comments in bset.c, to make some
      tricky code and designment to be more comprehensible. Most information
      of this patch comes from the discussion between Kent and I, he
      offers very informative details. If there is any mistake
      of the idea behind the code, no doubt that's from me misrepresentation.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      b467a6ac
    • C
      bcache: fix mistaken comments in request.c · 0cba2e71
      Coly Li 提交于
      This patch updates code comment in bch_keylist_realloc() by fixing
      incorrected function names, to make the code to be more comprehennsible.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      0cba2e71
    • C
      bcache: fix mistaken code comments in bcache.h · cb329dec
      Coly Li 提交于
      This patch updates the code comment in struct cache with correct array
      names, to make the code to be more comprehensible.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      cb329dec
    • C
      bcache: add a comment in super.c · e57fd746
      Coly Li 提交于
      This patch adds a line of code comment in super.c:register_bdev(), to
      make code to be more comprehensible.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      e57fd746
    • C
      bcache: avoid unncessary cache prefetch bch_btree_node_get() · c2e8dcf7
      Coly Li 提交于
      In bch_btree_node_get() the read-in btree node will be partially
      prefetched into L1 cache for following bset iteration (if there is).
      But if the btree node read is failed, the perfetch operations will
      waste L1 cache space. This patch checkes whether read operation and
      only does cache prefetch when read I/O succeeded.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      c2e8dcf7
    • C
      bcache: display rate debug parameters to 0 when writeback is not running · b4cb6efc
      Coly Li 提交于
      When writeback is not running, writeback rate should be 0, other value is
      misleading. And the following dyanmic writeback rate debug parameters
      should be 0 too,
      	rate, proportional, integral, change
      otherwise they are misleading when writeback is not running.
      Signed-off-by: NColy Li <colyli@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      b4cb6efc
    • C
      bcache: do not check return value of debugfs_create_dir() · 78ac2107
      Coly Li 提交于
      Greg KH suggests that normal code should not care about debugfs. Therefore
      no matter successful or failed of debugfs_create_dir() execution, it is
      unncessary to check its return value.
      
      There are two functions called debugfs_create_dir() and check the return
      value, which are bch_debug_init() and closure_debug_init(). This patch
      changes these two functions from int to void type, and ignore return values
      of debugfs_create_dir().
      
      This patch does not fix exact bug, just makes things work as they should.
      Signed-off-by: NColy Li <colyli@suse.de>
      Suggested-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: stable@vger.kernel.org
      Cc: Kai Krakow <kai@kaishome.de>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      78ac2107
    • Z
      drivers/block/drbd: remove the null check for kmem_cache_destroy · a12fc00b
      zhong jiang 提交于
      kmem_cache_destroy has taken null pointer into account. So it is
      safe to drop the null check before calling the function.
      Signed-off-by: Nzhong jiang <zhongjiang@huawei.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      a12fc00b