1. 12 June 2021 (1 commit)
  2. 09 June 2021 (3 commits)
    • libnvdimm/pmem: Fix blk_cleanup_disk() usage · a624eb52
      Committed by Dan Williams
      The queue_to_disk() helper can not be used after del_gendisk(), so
      communicate @disk via pgmap->owner instead (a sketch of this
      hand-off follows the trace below).

      Otherwise, queue_to_disk() returns NULL, resulting in the splat
      below.
      
       Kernel attempted to read user page (330) - exploit attempt? (uid: 0)
       BUG: Kernel NULL pointer dereference on read at 0x00000330
       Faulting instruction address: 0xc000000000906344
       Oops: Kernel access of bad area, sig: 11 [#1]
       [..]
       NIP [c000000000906344] pmem_pagemap_cleanup+0x24/0x40
       LR [c0000000004701d4] memunmap_pages+0x1b4/0x4b0
       Call Trace:
       [c000000022cbb9c0] [c0000000009063c8] pmem_pagemap_kill+0x28/0x40 (unreliable)
       [c000000022cbb9e0] [c0000000004701d4] memunmap_pages+0x1b4/0x4b0
       [c000000022cbba90] [c0000000008b28a0] devm_action_release+0x30/0x50
       [c000000022cbbab0] [c0000000008b39c8] release_nodes+0x2f8/0x3e0
       [c000000022cbbb60] [c0000000008ac440] device_release_driver_internal+0x190/0x2b0
       [c000000022cbbba0] [c0000000008a8450] unbind_store+0x130/0x170
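
      A standalone sketch of the back-pointer hand-off the fix relies on
      (illustrative only: struct pagemap, struct pmem_dev, pmem_setup()
      and pagemap_cleanup() below are hypothetical stand-ins, not the
      actual pmem driver code). The point is that teardown reads the
      stashed owner pointer instead of re-deriving the disk from the
      queue:

       #include <stdio.h>

       struct pagemap {                /* stand-in for struct dev_pagemap */
               void *owner;            /* back-pointer recorded at setup time */
       };

       struct pmem_dev {               /* stand-in for the driver's private data */
               const char *disk_name;
               struct pagemap pgmap;
       };

       /* setup: record the owner while the disk is still fully registered */
       static void pmem_setup(struct pmem_dev *pmem)
       {
               pmem->pgmap.owner = pmem;
       }

       /* teardown: use the stashed pointer, never a queue-to-disk lookup */
       static void pagemap_cleanup(struct pagemap *pgmap)
       {
               struct pmem_dev *pmem = pgmap->owner;

               printf("cleaning up %s via pgmap->owner\n", pmem->disk_name);
       }

       int main(void)
       {
               struct pmem_dev pmem = { .disk_name = "pmem0" };

               pmem_setup(&pmem);
               pagemap_cleanup(&pmem.pgmap);
               return 0;
       }
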
      Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
      Fixes: 87eb73b2 ("nvdimm-pmem: convert to blk_alloc_disk/blk_cleanup_disk")
      Link: http://lore.kernel.org/r/DFB75BA8-603F-4A35-880B-C5B23EF8FA7D@linux.vnet.ibm.com
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ulf Hansson <ulf.hansson@linaro.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
      Link: https://lore.kernel.org/r/162310994435.1571616.334551212901820961.stgit@dwillia2-desk3.amr.corp.intel.com
      [axboe: fold in compile warning fix]
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • rq-qos: fix missed wake-ups in rq_qos_throttle try two · 11c7aa0d
      Committed by Jan Kara
      Commit 545fbd07 ("rq-qos: fix missed wake-ups in rq_qos_throttle")
      tried to fix a problem that a process could be sleeping in rq_qos_wait()
      without anyone to wake it up. However, the fix is not complete and the
      following can still happen:
      
      CPU1 (waiter1)		CPU2 (waiter2)		CPU3 (waker)
      rq_qos_wait()		rq_qos_wait()
        acquire_inflight_cb() -> fails
      			  acquire_inflight_cb() -> fails
      
      						completes IOs, inflight
      						  decreased
        prepare_to_wait_exclusive()
      			  prepare_to_wait_exclusive()
        has_sleeper = !wq_has_single_sleeper() -> true as there are two sleepers
      			  has_sleeper = !wq_has_single_sleeper() -> true
        io_schedule()		  io_schedule()
      
      Deadlock, as now there is nobody to wake up the two waiters. The
      logic of automatically blocking when there are already sleepers is
      really subtle, and the only way to make it work reliably is to check
      whether there are already waiters in the queue at the moment we add
      ourselves to it. That way, we are guaranteed that at least the first
      process to enter the wait queue will recheck the waiting condition
      before going to sleep and thus guarantee forward progress.
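
      A minimal userspace sketch of that idea, assuming a simplified
      lock-protected list rather than the kernel wait-queue API (struct
      waitq, enqueue_and_check_first() and the pthread locking are
      hypothetical): the "was the queue empty?" question is answered
      atomically with the enqueue, so exactly one waiter knows it must
      recheck the waiting condition itself:

       #include <pthread.h>
       #include <stdbool.h>
       #include <stdio.h>

       struct waiter {
               struct waiter *next;
       };

       struct waitq {
               pthread_mutex_t lock;
               struct waiter *head;
       };

       /*
        * Add @w to @q and report whether the queue was empty beforehand.
        * The check happens under the same lock as the insertion, so two
        * waiters racing to enqueue cannot both conclude "someone else is
        * already sleeping and will be woken before me".
        */
       static bool enqueue_and_check_first(struct waitq *q, struct waiter *w)
       {
               bool was_empty;

               pthread_mutex_lock(&q->lock);
               was_empty = (q->head == NULL);
               w->next = q->head;
               q->head = w;
               pthread_mutex_unlock(&q->lock);

               return was_empty;
       }

       int main(void)
       {
               struct waitq q = { .lock = PTHREAD_MUTEX_INITIALIZER };
               struct waiter w1, w2;

               printf("w1 first? %d\n", enqueue_and_check_first(&q, &w1)); /* prints 1 */
               printf("w2 first? %d\n", enqueue_and_check_first(&q, &w2)); /* prints 0 */
               return 0;
       }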
      
      Fixes: 545fbd07 ("rq-qos: fix missed wake-ups in rq_qos_throttle")
      CC: stable@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20210607112613.25344-1-jack@suse.cz
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: return the correct bvec when checking for gaps · c9c9762d
      Committed by Long Li
      After commit 07173c3e ("block: enable multipage bvecs"), a bvec can
      have multiple pages. But bio_will_gap() still assumes a one-page
      bvec while checking for merging. If the pages in the bvec cross the
      seg_boundary_mask, this check for merging can potentially succeed if
      only the 1st page is tested, and can fail if all the pages are
      tested.
      
      Later, when SCSI builds the SG list, the same check for merging is
      done in __blk_segment_map_sg_merge() with all the pages in the bvec
      tested. This time the check may fail if the pages in the bvec cross
      the seg_boundary_mask (but they tested okay in bio_will_gap()
      earlier, so those BIOs were merged). If this check fails, we end up
      with a broken SG list for drivers that assume the SG list has no
      offsets in intermediate pages.
      This results in incorrect pages written to the disk.
      
      Fix this by returning the multi-page bvec when testing gaps for merging.
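
      A worked example with made-up numbers (not kernel code) showing why
      checking only the first page is not enough: with 64 KiB segments, a
      three-page bvec whose first page ends exactly at a segment boundary
      passes a first-page-only check but crosses the boundary once its
      full length is considered:

       #include <stdbool.h>
       #include <stdint.h>
       #include <stdio.h>

       #define PAGE_SIZE          4096u
       #define SEG_BOUNDARY_MASK  0xffffu   /* 64 KiB segments */

       /* true if [addr, addr + len) stays within one segment */
       static bool fits_in_segment(uint64_t addr, uint32_t len)
       {
               return (addr & ~(uint64_t)SEG_BOUNDARY_MASK) ==
                      ((addr + len - 1) & ~(uint64_t)SEG_BOUNDARY_MASK);
       }

       int main(void)
       {
               uint64_t bvec_addr = 0xf000;          /* last page below a 64 KiB boundary */
               uint32_t one_page  = PAGE_SIZE;
               uint32_t full_bvec = 3 * PAGE_SIZE;   /* multi-page bvec */

               printf("first page only: %s\n",
                      fits_in_segment(bvec_addr, one_page) ? "ok" : "crosses");
               printf("whole bvec:      %s\n",
                      fits_in_segment(bvec_addr, full_bvec) ? "ok" : "crosses");
               return 0;
       }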
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Jeffle Xu <jefflexu@linux.alibaba.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: stable@vger.kernel.org
      Fixes: 07173c3e ("block: enable multipage bvecs")
      Signed-off-by: Long Li <longli@microsoft.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/1623094445-22332-1-git-send-email-longli@linuxonhyperv.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 04 June 2021 (2 commits)
    • block: Update blk_update_request() documentation · 7cc2623d
      Committed by Bart Van Assche
      Although the original intent was to use blk_update_request() in stacking
      block drivers only, it is used much more widely today. Reflect this in the
      documentation block above this function. See also:
      * commit 32fab448 ("block: add request update interface").
      * commit 2e60e022 ("block: clean up request completion API").
      * commit ed6565e7 ("block: handle partial completions for special
        payload requests").
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Link: https://lore.kernel.org/r/20210519175226.8853-1-bvanassche@acm.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Do not pull requests from the scheduler when we cannot dispatch them · 61347154
      Committed by Jan Kara
      Provided the device driver does not implement dispatch budget
      accounting (which only SCSI does), the loop in
      __blk_mq_do_dispatch_sched() pulls requests from the IO scheduler as
      long as it is willing to give out any. That defeats scheduling
      heuristics inside the scheduler by creating the false impression
      that the device can take more IO when in fact it cannot.
      
      For example, with the BFQ IO scheduler on top of a virtio-blk
      device, setting the blkio cgroup weight has barely any impact on the
      observed throughput of async IO because __blk_mq_do_dispatch_sched()
      always sucks out all the IO queued in BFQ. BFQ first submits IO from
      higher-weight cgroups but when that is all dispatched, it will give
      out IO of lower-weight cgroups as well. And then we have to wait for
      all this IO to be dispatched to the disk (which means a lot of it
      actually has to complete) before the IO scheduler is queried again
      for dispatching more requests. This completely destroys any service
      differentiation.
      
      So grab a request tag for a request already when pulling it out of
      the IO scheduler in __blk_mq_do_dispatch_sched(), and do not pull
      any more requests if we cannot get one, because we are unlikely to
      be able to dispatch them. That way only a single request is going to
      wait in the dispatch list for some tag to free up.
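
      A schematic of that loop change, using hypothetical stand-ins rather
      than the real blk-mq API (get_driver_tag(), sched_has_work() and the
      counters below are invented for illustration): the loop stops asking
      the scheduler for more work as soon as a driver tag cannot be
      obtained, so at most one pulled request is left waiting:

       #include <stdbool.h>
       #include <stdio.h>

       #define NR_TAGS 2               /* pretend the device exposes two tags */

       static int tags_in_use;
       static int sched_queue = 5;     /* requests currently held by the IO scheduler */

       static bool get_driver_tag(void)        /* stand-in for tag allocation */
       {
               if (tags_in_use >= NR_TAGS)
                       return false;
               tags_in_use++;
               return true;
       }

       static bool sched_has_work(void)        /* stand-in for ops.has_work() */
       {
               return sched_queue > 0;
       }

       static void dispatch_loop(void)
       {
               while (sched_has_work()) {
                       /* pull one request from the IO scheduler */
                       sched_queue--;

                       /*
                        * Try to get the tag right here; on failure, park this
                        * single request on the dispatch list and stop pulling
                        * more from the scheduler.
                        */
                       if (!get_driver_tag()) {
                               printf("no tag: 1 request parked, %d still queued in the scheduler\n",
                                      sched_queue);
                               return;
                       }
                       printf("dispatched, %d tag(s) in use\n", tags_in_use);
               }
       }

       int main(void)
       {
               dispatch_loop();
               return 0;
       }
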
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20210603104721.6309-1-jack@suse.cz
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. 03 June 2021 (1 commit)
  5. 01 June 2021 (33 commits)