1. 06 October 2017, 1 commit
    • Merge branch 'nvme-4.14' of git://git.infradead.org/nvme into for-linus · d7b544de
      Authored by Jens Axboe
      Pull NVMe fixes from Christoph:
      
      "A trivial one-liner from Martin to fix the visible of the uuid attr,
      and another one (originally from Abhishek Shah, rewritten by me) to fix
      the CMB addresses passed back to the controller in case of a system that
      remaps BAR addresses between host and device."
      d7b544de
  2. 04 October 2017, 6 commits
    • bsg-lib: fix use-after-free under memory-pressure · eab40cf3
      Authored by Benjamin Block
      Under memory pressure it is possible that the mempool which backs the
      'struct request_queue' will make use of up to BLKDEV_MIN_RQ preallocated
      emergency buffers, in case it can't get a regular allocation. Once these
      are also in use, they are re-supplied with old, finished requests from
      the same request_queue (see mempool_free()).
      
      The bug: when re-supplying the emergency pool, the old requests are not
      run through the callback mempool_t->alloc() again, and thus also not
      through bsg_init_rq(). So initialization is skipped and, while the
      sense-buffer should still be good, scsi_request->cmd may have become an
      invalid pointer in the meantime. When the request is initialized in
      bsg.c and the user's CDB is larger than BLK_MAX_CDB, bsg replaces it
      with a custom allocated buffer, which is freed when the user's command
      finishes, so the pointer dangles afterwards. When the user next sends a
      command whose CDB is no larger than BLK_MAX_CDB, bsg assumes that
      scsi_request->cmd is backed by scsi_request->__cmd, makes no custom
      allocation, and writes into undefined memory.
      
      Fix this by splitting bsg_init_rq() into two functions (a condensed
      sketch follows this entry):
       - bsg_init_rq() is changed to only do the allocation of the
         sense-buffer, which is used to back the bsg job's reply buffer. This
         pointer should never change during the lifetime of a scsi_request, so
         it doesn't need re-initialization.
       - bsg_initialize_rq() is a new function that makes use of
         'struct request_queue's initialize_rq_fn callback (which was
         introduced in v4.12). This is always called before the request is
         given out via blk_get_request(). This function does the remaining
         initialization that was previously done in bsg_init_rq(), and will
         also do it when the request is taken from the emergency-pool of the
         backing mempool.
      
      Fixes: 50b4d485 ("bsg-lib: fix kernel panic resulting from missing allocation of reply-buffer")
      Cc: <stable@vger.kernel.org> # 4.11+
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      eab40cf3
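      A condensed sketch of the split described above, assuming the
      v4.14-era bsg-lib names (not the verbatim patch):

      	static int bsg_init_rq(struct request_queue *q, struct request *req, gfp_t gfp)
      	{
      		struct bsg_job *job = blk_mq_rq_to_pdu(req);

      		/* Runs via mempool_t->alloc() only when the request memory is
      		 * first allocated, so only the long-lived sense-buffer is set
      		 * up here. */
      		job->sreq.sense = kzalloc(SCSI_SENSE_BUFFERSIZE, gfp);
      		return job->sreq.sense ? 0 : -ENOMEM;
      	}

      	static void bsg_initialize_rq(struct request *req)
      	{
      		struct bsg_job *job = blk_mq_rq_to_pdu(req);
      		void *sense = job->sreq.sense;

      		/* Runs on every blk_get_request(), including requests recycled
      		 * through the mempool's emergency pool, so per-use state is
      		 * reset here while the sense-buffer allocation is preserved. */
      		memset(job, 0, sizeof(*job));
      		scsi_req_init(&job->sreq);
      		job->sreq.sense = sense;
      		job->sreq.sense_len = SCSI_SENSE_BUFFERSIZE;
      	}

      	/* wired up during queue setup: */
      	q->init_rq_fn       = bsg_init_rq;
      	q->initialize_rq_fn = bsg_initialize_rq;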
    • nvme-pci: Use PCI bus address for data/queues in CMB · 8969f1f8
      Authored by Christoph Hellwig
      Currently the NVMe PCI host driver programs the CMB dma address as the
      I/O SQ addresses. This results in failures on systems where 1:1
      outbound mapping is not used (for example, Broadcom iProc SoCs),
      because the CMB BAR is programmed with the PCI bus address while the
      NVMe PCI EP tries to access the CMB using the dma address.
      
      To have the CMB working on systems without 1:1 outbound mapping,
      program the PCI bus address for the I/O SQs instead of the dma address.
      This approach works on systems with and without 1:1 outbound mapping.
      
      Based on a report and previous patch from Abhishek Shah.
      
      Fixes: 8ffaadf7 ("NVMe: Use CMB for the IO SQes if available")
      Cc: stable@vger.kernel.org
      Reported-by: Abhishek Shah <abhishek.shah@broadcom.com>
      Tested-by: Abhishek Shah <abhishek.shah@broadcom.com>
      Reviewed-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      8969f1f8
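      A minimal sketch of the idea, assuming the v4.14-era nvme-pci field
      names (not the verbatim patch): record the device-visible address of
      the CMB BAR via pci_bus_address(), and hand that, rather than the
      host-side resource address, to the controller.

      	/* in nvme_map_cmb(): remember both views of the CMB BAR */
      	int bar = NVME_CMB_BIR(dev->cmbloc);

      	dev->cmb = ioremap_wc(pci_resource_start(pdev, bar) + offset, size);
      	dev->cmb_bus_addr = pci_bus_address(pdev, bar) + offset;

      	/* in nvme_alloc_sq_cmds(): the controller gets the bus address */
      	nvmeq->sq_cmds_io  = dev->cmb + sq_offset;           /* host mapping */
      	nvmeq->sq_dma_addr = dev->cmb_bus_addr + sq_offset;  /* device view  */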
    • blk-mq-debugfs: fix device sched directory for default scheduler · 70e62f4b
      Authored by Omar Sandoval
      In blk_mq_debugfs_register(), I remembered to set up the per-hctx sched
      directories if a default scheduler was already configured by
      blk_mq_sched_init() from blk_mq_init_allocated_queue(), but I didn't do
      the same for the device-wide sched directory. Fix it.
      
      Fixes: d332ce09 ("blk-mq-debugfs: allow schedulers to register debugfs attributes")
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      70e62f4b
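      The missing piece, sketched against blk_mq_debugfs_register() as
      described above (hedged; the helper names assume the v4.14 debugfs
      code):

      	/* in blk_mq_debugfs_register(): the per-hctx sched dirs were
      	 * already handled; also create the device-wide sched dir when a
      	 * default scheduler is in place but its dir was never registered */
      	if (q->elevator && !q->sched_debugfs_dir)
      		blk_mq_debugfs_register_sched(q);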
    • null_blk: change configfs dependency to select · 6cd1a6fe
      Authored by Jens Axboe
      A recent commit made null_blk depend on configfs, which is kind of
      annoying since you now have to find that dependency and enable it as
      well. I discovered this when I no longer had null_blk available on a
      box I needed to debug: the option was dropped when the config was
      updated after the configfs change was merged.
      
      Fixes: 3bf2bd20 ("nullb: add configfs interface")
      Reviewed-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6cd1a6fe
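      The shape of the fix in Kconfig terms, sketched as a diff (not
      necessarily the verbatim patch): 'select' pulls CONFIGFS_FS in
      automatically instead of hiding null_blk until the user hunts down
      the dependency.

      	 config BLK_DEV_NULL_BLK
      	 	tristate "Null test block driver"
      	-	depends on CONFIGFS_FS
      	+	select CONFIGFS_FS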
    • blk-throttle: fix possible io stall when upgrade to max · 4f02fb76
      Authored by Joseph Qi
      There is a case that leads to io stall, described as follows.
      /test1
        |-subtest1
      /test2
        |-subtest2
      subtest1 and subtest2 each already have 32 queued bios.
      
      Now upgrade to max. In throtl_upgrade_state, it will try to dispatch
      bios as follows:
      1) tg=subtest1, do nothing;
      2) tg=test1, transfer 32 queued bios from subtest1 to test1; no pending
      left, no need to schedule next dispatch;
      3) tg=subtest2, do nothing;
      4) tg=test2, transfer 32 queued bios from subtest2 to test2; no pending
      left, no need to schedule next dispatch;
      5) tg=/, transfer 8 queued bios from test1 to /, 8 queued bios from
      test2 to /, 8 queued bios from test1 to /, and 8 queued bios from test2
      to /; note that test1 and test2 each still has 16 queued bios left;
      6) tg=/, try to schedule the next dispatch, but since disptime is now
      (updated in tg_update_disptime with wait=0), the pending timer is in
      fact not scheduled;
      7) in total, throtl_upgrade_state dispatches 32 queued bios, with 32
      left over; test1 and test2 each have 16 queued bios;
      8) throtl_pending_timer_fn sees the leftover bios but can do nothing,
      because throtl_select_dispatch returns 0 and test1/test2 have no
      pending tg.
      
      The blktrace shows the following:
      8,32   0        0     2.539007641     0  m   N throtl upgrade to max
      8,32   0        0     2.539072267     0  m   N throtl /test2 dispatch nr_queued=16 read=0 write=16
      8,32   7        0     2.539077142     0  m   N throtl /test1 dispatch nr_queued=16 read=0 write=16
      
      So force schedule dispatch if there are pending children (see the
      sketch after this entry).
      Reviewed-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Joseph Qi <qijiang.qj@alibaba-inc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      4f02fb76
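      A sketch of what "force schedule dispatch" means in
      throtl_upgrade_state(), assuming the v4.14-era helpers (not the
      verbatim patch): pass force=true so the pending timer is armed even
      though disptime is "now".

      	blkg_for_each_descendant_post(blkg, pos_css, td->queue->root_blkg) {
      		struct throtl_grp *tg = blkg_to_tg(blkg);
      		struct throtl_service_queue *sq = &tg->service_queue;

      		tg->disptime = jiffies - 1;
      		throtl_select_dispatch(sq);
      		/* force=true: schedule even when tg_update_disptime()
      		 * computed disptime == now, so leftover children still
      		 * get drained by a later dispatch */
      		throtl_schedule_next_dispatch(sq, true);
      	}
      	throtl_select_dispatch(&td->service_queue);
      	throtl_schedule_next_dispatch(&td->service_queue, true);
      	queue_work(kthrotld_workqueue, &td->dispatch_work);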
    • MAINTAINERS: update list for NBD · 38b249bc
      Authored by Wouter Verhelst
      nbd-general@sourceforge.net becomes nbd@other.debian.org, because
      sourceforge is just a spamtrap these days.
      Signed-off-by: Wouter Verhelst <w@uter.be>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      38b249bc
  3. 03 October 2017, 1 commit
    • nbd: fix -ERESTARTSYS handling · 6e60a3bb
      Authored by Josef Bacik
      After Christoph switched ->queue_rq to blk_status_t, whenever we got
      -ERESTARTSYS from sending our packets we ended up returning BLK_STS_OK
      instead of BLK_STS_RESOURCE, which means we'd never requeue and would
      just hang.  We really need to return the right value to the upper
      layer.
      
      Fixes: fc17b653 ("blk-mq: switch ->queue_rq return value to blk_status_t")
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6e60a3bb
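      A minimal sketch of the idea (the real function also handles command
      completion and locking; names assume the v4.14 nbd driver): let the
      blk_status_t from the send path reach blk-mq unchanged, so an
      interrupted send maps to BLK_STS_RESOURCE and the request is requeued.

      	static blk_status_t nbd_queue_rq(struct blk_mq_hw_ctx *hctx,
      					 const struct blk_mq_queue_data *bd)
      	{
      		struct nbd_cmd *cmd = blk_mq_rq_to_pdu(bd->rq);

      		/* nbd_handle_cmd() returns BLK_STS_RESOURCE when sending
      		 * the packets got -ERESTARTSYS; passing that through makes
      		 * blk-mq requeue the request instead of hanging. */
      		return nbd_handle_cmd(cmd, hctx->queue_num);
      	}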
  4. 01 October 2017, 1 commit
  5. 28 September 2017, 1 commit
    • bcache: use llist_for_each_entry_safe() in __closure_wake_up() · a5f3d8a5
      Authored by Coly Li
      Commit 09b3efec ("bcache: Don't reinvent the wheel but use existing llist
      API") replaced the following while loop with llist_for_each_entry():
      
      -	while (reverse) {
      -		cl = container_of(reverse, struct closure, list);
      -		reverse = llist_next(reverse);
      -
      +	llist_for_each_entry(cl, reverse, list) {
       		closure_set_waiting(cl, 0);
       		closure_sub(cl, CLOSURE_WAITING + 1);
       	}
      
      This modification introduces a potential race by iterating a corrupted
      list. Here is how it happens.
      
      In the above modification, closure_sub() may wake up a process that is
      waiting on the reverse list. If that process decides to wait again by
      calling closure_wait(), its cl->list is added to another wait list. When
      llist_for_each_entry() then advances to the next node, it walks the new
      wait list built up in closure_wait(), not the original reverse list in
      __closure_wake_up(). This is more likely to happen on a UP machine,
      because the woken process may preempt the process that woke it.
      
      Using llist_for_each_entry_safe() fixes the issue: the safe version
      fetches the next node before waking up a process, so the saved copy of
      the next node keeps the list iteration on the original reverse list.
      
      Fixes: 09b3efec ("bcache: Don't reinvent the wheel but use existing llist API")
      Signed-off-by: Coly Li <colyli@suse.de>
      Reported-by: Michael Lyle <mlyle@lyle.org>
      Reviewed-by: Byungchul Park <byungchul.park@lge.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      a5f3d8a5
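      For reference, a sketch of the fixed loop in __closure_wake_up()
      (hedged against the upstream closure code): the _safe variant caches
      the next node before closure_sub() can wake the waiter and let it
      re-link cl->list onto a different wait list.

      	void __closure_wake_up(struct closure_waitlist *wait_list)
      	{
      		struct llist_node *list, *reverse;
      		struct closure *cl, *t;

      		list = llist_del_all(&wait_list->list);
      		/* reverse to preserve FIFO ordering and fairness */
      		reverse = llist_reverse_order(list);

      		/* _safe: 't' holds the next node before 'cl' may requeue */
      		llist_for_each_entry_safe(cl, t, reverse, list) {
      			closure_set_waiting(cl, 0);
      			closure_sub(cl, CLOSURE_WAITING + 1);
      		}
      	}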
  6. 27 September 2017, 2 commits
  7. 26 September 2017, 15 commits
  8. 25 September 2017, 13 commits