1. 18 Jun, 2021  1 commit
  2. 12 Jun, 2021  4 commits
  3. 09 Jun, 2021  2 commits
    • rq-qos: fix missed wake-ups in rq_qos_throttle try two · 11c7aa0d
      Committed by Jan Kara
      Commit 545fbd07 ("rq-qos: fix missed wake-ups in rq_qos_throttle")
      tried to fix a problem where a process could be left sleeping in
      rq_qos_wait() with nobody to wake it up. However, that fix is not
      complete and the following can still happen:
      
      CPU1 (waiter1)		CPU2 (waiter2)		CPU3 (waker)
      rq_qos_wait()		rq_qos_wait()
        acquire_inflight_cb() -> fails
      			  acquire_inflight_cb() -> fails
      
      						completes IOs, inflight
      						  decreased
        prepare_to_wait_exclusive()
      			  prepare_to_wait_exclusive()
        has_sleeper = !wq_has_single_sleeper() -> true as there are two sleepers
      			  has_sleeper = !wq_has_single_sleeper() -> true
        io_schedule()		  io_schedule()
      
      This deadlocks, as there is now nobody left to wake up the two waiters.
      The logic of automatically blocking when there are already sleepers is
      really subtle, and the only way to make it work reliably is to check
      whether there are any waiters in the queue at the moment we add
      ourselves to it. That way we are guaranteed that at least the first
      process to enter the wait queue will recheck the waiting condition
      before going to sleep, which guarantees forward progress (a sketch of
      this pattern follows this entry).
      
      Fixes: 545fbd07 ("rq-qos: fix missed wake-ups in rq_qos_throttle")
      CC: stable@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20210607112613.25344-1-jack@suse.cz
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
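      A minimal sketch of the waiting pattern described above, assuming a
      prepare_to_wait_exclusive() variant that reports whether the wait queue
      was empty at the moment the caller was added; try_acquire() here is an
      illustrative stand-in for acquire_inflight_cb(), and this is not the
      literal patch:

      #include <linux/sched.h>
      #include <linux/wait.h>

      /* Wait until try_acquire() succeeds; the completion side is assumed to
       * call wake_up() on wq whenever it frees a slot. */
      static void throttle_wait_sketch(wait_queue_head_t *wq,
                                       bool (*try_acquire)(void *arg), void *arg)
      {
              DEFINE_WAIT(wait);

              if (try_acquire(arg))
                      return;

              do {
                      /* Assumption: returns true when nobody was queued before us. */
                      bool first_waiter = prepare_to_wait_exclusive(wq, &wait,
                                                      TASK_UNINTERRUPTIBLE);

                      /* The first waiter rechecks the condition before sleeping,
                       * so a wake-up issued just before we enqueued is not lost. */
                      if (first_waiter && try_acquire(arg))
                              break;

                      io_schedule();
              } while (!try_acquire(arg));

              finish_wait(wq, &wait);
      }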
    • block: return the correct bvec when checking for gaps · c9c9762d
      Committed by Long Li
      After commit 07173c3e ("block: enable multipage bvecs"), a bvec can
      hold multiple pages, but bio_will_gap() still assumes a single-page
      bvec when checking for merging. If the pages in the bvec cross the
      seg_boundary_mask, the merge check can succeed when only the first
      page is tested, yet fail when all the pages are tested.
      
      Later, when SCSI builds the SG list, the same merge check is done in
      __blk_segment_map_sg_merge(), this time with all the pages in the bvec
      tested. The check may now fail because the pages in the bvec cross the
      seg_boundary_mask (even though they tested okay in bio_will_gap()
      earlier, so those BIOs were merged). When this check fails we end up
      with a broken SG list for drivers that assume the SG list has no
      offsets in intermediate pages, and incorrect pages are written to the
      disk.
      
      Fix this by returning the multi-page bvec when testing gaps for
      merging; a small boundary-mask illustration follows this entry.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Jeffle Xu <jefflexu@linux.alibaba.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: stable@vger.kernel.org
      Fixes: 07173c3e ("block: enable multipage bvecs")
      Signed-off-by: Long Li <longli@microsoft.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/1623094445-22332-1-git-send-email-longli@linuxonhyperv.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
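      The seg_boundary_mask reasoning above can be made concrete with a small
      standalone C program (crosses_boundary() is a hypothetical helper, not a
      kernel function): a range may be merged into one segment only if it does
      not straddle a (mask + 1) aligned boundary, and testing only the first
      page of a multi-page bvec can give the opposite answer from testing the
      whole bvec.

      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>

      /* A range crosses a segment boundary if its first and last bytes lie in
       * different (mask + 1) sized windows. */
      static bool crosses_boundary(uint64_t start, uint64_t len, uint64_t mask)
      {
              return (start | mask) != ((start + len - 1) | mask);
      }

      int main(void)
      {
              const uint64_t mask = 0xffff;        /* 64K segment boundary        */
              const uint64_t page = 4096;
              const uint64_t bvec_start = 0xf000;  /* last page before a boundary */
              const uint64_t bvec_len = 3 * page;  /* multi-page bvec             */

              /* Testing only the first page: looks mergeable. */
              printf("first page crosses: %d\n",
                     crosses_boundary(bvec_start, page, mask));

              /* Testing the whole multi-page bvec: it crosses the boundary. */
              printf("whole bvec crosses: %d\n",
                     crosses_boundary(bvec_start, bvec_len, mask));
              return 0;
      }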
  4. 01 Jun, 2021  7 commits
  5. 24 May, 2021  2 commits
  6. 23 May, 2021  1 commit
  7. 20 May, 2021  1 commit
  8. 19 May, 2021  6 commits
  9. 15 May, 2021  3 commits
  10. 14 May, 2021  3 commits
  11. 13 May, 2021  1 commit
  12. 12 May, 2021  1 commit
  13. 11 May, 2021  3 commits
    • kyber: fix out of bounds access when preempted · efed9a33
      Committed by Omar Sandoval
      __blk_mq_sched_bio_merge() gets the ctx and hctx for the current CPU and
      passes the hctx to ->bio_merge(). kyber_bio_merge() then gets the ctx
      for the current CPU again and uses that to get the corresponding Kyber
      context in the passed hctx. However, the thread may be preempted between
      the two calls to blk_mq_get_ctx(), and the ctx returned the second time
      may no longer correspond to the passed hctx. This "works" accidentally
      most of the time, but it can cause us to read garbage if the second ctx
      came from an hctx with more ctx's than the first one (i.e., if
      ctx->index_hw[hctx->type] > hctx->nr_ctx).
      
      This manifested as the following UBSAN array-index-out-of-bounds
      error, reported by Jakub:
      
      UBSAN: array-index-out-of-bounds in ../kernel/locking/qspinlock.c:130:9
      index 13106 is out of range for type 'long unsigned int [128]'
      Call Trace:
       dump_stack+0xa4/0xe5
       ubsan_epilogue+0x5/0x40
       __ubsan_handle_out_of_bounds.cold.13+0x2a/0x34
       queued_spin_lock_slowpath+0x476/0x480
       do_raw_spin_lock+0x1c2/0x1d0
       kyber_bio_merge+0x112/0x180
       blk_mq_submit_bio+0x1f5/0x1100
       submit_bio_noacct+0x7b0/0x870
       submit_bio+0xc2/0x3a0
       btrfs_map_bio+0x4f0/0x9d0
       btrfs_submit_data_bio+0x24e/0x310
       submit_one_bio+0x7f/0xb0
       submit_extent_page+0xc4/0x440
       __extent_writepage_io+0x2b8/0x5e0
       __extent_writepage+0x28d/0x6e0
       extent_write_cache_pages+0x4d7/0x7a0
       extent_writepages+0xa2/0x110
       do_writepages+0x8f/0x180
       __writeback_single_inode+0x99/0x7f0
       writeback_sb_inodes+0x34e/0x790
       __writeback_inodes_wb+0x9e/0x120
       wb_writeback+0x4d2/0x660
       wb_workfn+0x64d/0xa10
       process_one_work+0x53a/0xa80
       worker_thread+0x69/0x5b0
       kthread+0x20b/0x240
       ret_from_fork+0x1f/0x30
      
      Only Kyber uses the hctx, so fix it by passing the request_queue to
      ->bio_merge() instead. BFQ and mq-deadline only need the request_queue
      anyway, and Kyber can map the queues itself to avoid the mismatch (a
      sketch of that approach follows this entry).
      
      Fixes: a6088845 ("block: kyber: make kyber more friendly with merging")
      Reported-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Link: https://lore.kernel.org/r/c7598605401a48d5cfeadebb678abd10af22b83f.1620691329.git.osandov@fb.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
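      A simplified sketch of the approach described above, with Kyber's
      per-ctx merge logic elided (kyber_try_merge() is a hypothetical
      placeholder, and the exact signatures in the real patch may differ):
      ctx and hctx are derived together inside ->bio_merge(), from a single
      blk_mq_get_ctx() snapshot, so they cannot disagree after a preemption.

      static bool kyber_bio_merge(struct request_queue *q, struct bio *bio,
                                  unsigned int nr_segs)
      {
              /* One snapshot of the software queue for the current CPU ... */
              struct blk_mq_ctx *ctx = blk_mq_get_ctx(q);
              /* ... and the hardware queue derived from that same ctx. */
              struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(q, bio->bi_opf, ctx);

              /*
               * ctx->index_hw[hctx->type] is now a valid index into this
               * hctx's per-ctx data even if the task migrated to another
               * CPU just before ->bio_merge() was called.
               */
              return kyber_try_merge(hctx, ctx, bio, nr_segs);
      }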
    • stack: Replace "o" output with "r" input constraint · 2515dd6c
      Committed by Nick Desaulniers
      "o" isn't a common asm() constraint to use; it triggers an assertion in
      assert-enabled builds of LLVM that it's not recognized when targeting
      aarch64 (though it appears to fall back to "m"). It's fixed in LLVM 13 now,
      but there isn't really a good reason to use "o" in particular here. To
      avoid causing build issues for those using assert-enabled builds of earlier
      LLVM versions, the constraint needs changing.
      
      Instead, if the point is to retain the __builtin_alloca(), make ptr
      appear to "escape" by passing it as an input to an empty inline asm
      block. This is preferable anyway, since otherwise the allocation looks
      like a dead store.
      
      While the use of "r" was considered in
      
        https://lore.kernel.org/lkml/202104011447.2E7F543@keescook/
      
      it was only tested as an output (which looks like a dead store, and wasn't
      sufficient).
      
      Use "r" as an input constraint instead, which behaves correctly across
      compilers and architectures; a standalone illustration of the resulting
      pattern follows this entry.
      
      Fixes: 39218ff4 ("stack: Optionally randomize kernel stack offset each syscall")
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Kees Cook <keescook@chromium.org>
      Tested-by: Nathan Chancellor <nathan@kernel.org>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Link: https://reviews.llvm.org/D100412
      Link: https://bugs.llvm.org/show_bug.cgi?id=49956
      Link: https://lore.kernel.org/r/20210419231741.4084415-1-keescook@chromium.org
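      A standalone illustration of the resulting pattern (a simplified
      stand-in, not the kernel's add_random_kstack_offset() macro): feeding
      the alloca'd pointer to an empty asm block as an "r" input forces the
      compiler to keep the allocation instead of treating it as a dead store.

      void do_work_with_random_offset(unsigned long entropy)
      {
              /* Variable-sized stack allocation, up to 1023 bytes here. */
              void *ptr = __builtin_alloca(entropy & 0x3ff);

              /*
               * "r" input constraint: the pointer is consumed by the empty asm
               * block, so the compiler cannot elide the alloca as a dead store.
               * Both GCC and Clang accept this on every architecture.
               */
              asm volatile("" : : "r" (ptr));

              /* The real work would now run on a slightly shifted stack. */
      }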
    • PM: runtime: Fix unpaired parent child_count for force_resume · c745253e
      Committed by Tony Lindgren
      Because pm_runtime_need_not_resume() also relies on usage_count, it can
      return a different value in pm_runtime_force_suspend() than in a later
      call from pm_runtime_force_resume(). The return values differ whenever
      anything calls PM runtime functions in between, and the result is that
      the parent's child_count increases on every resume.
      
      So far I've seen the issue only with omapdrm, which for legacy reasons
      does complicated things with PM runtime calls during system suspend:
      
      omap_atomic_commit_tail() for omapdrm.0
       dispc_runtime_get()
        wakes up 58000000.dss as it's the dispc parent
         dispc_runtime_resume()
          rpm_resume() increases parent child_count
       dispc_runtime_put() won't idle, PM runtime suspend blocked
      pm_runtime_force_suspend() for 58000000.dss, !pm_runtime_need_not_resume()
       __update_runtime_status()
      system suspended
      pm_runtime_force_resume() for 58000000.dss, pm_runtime_need_not_resume()
       pm_runtime_enable() only called because of pm_runtime_need_not_resume()
      omap_atomic_commit_tail() for omapdrm.0
       dispc_runtime_get()
        wakes up 58000000.dss as it's the dispc parent
         dispc_runtime_resume()
          rpm_resume() increases parent child_count
       dispc_runtime_put() won't idle, PM runtime suspend blocked
      ...
      rpm_suspend for 58000000.dss but parent child_count is now unbalanced
      
      Let's fix the issue by adding a needs_force_resume flag and using it in
      pm_runtime_force_resume() instead of pm_runtime_need_not_resume() (a
      sketch of this approach follows this entry).
      
      Additionally, omapdrm system suspend could later be simplified to avoid
      many unnecessary PM runtime calls and the complexity they add. The
      driver can just use internal functions that are shared between the PM
      runtime and system suspend paths.
      
      Fixes: 4918e1f8 ("PM / runtime: Rework pm_runtime_force_suspend/resume()")
      Signed-off-by: Tony Lindgren <tony@atomide.com>
      Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
      Tested-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
      Cc: 4.16+ <stable@vger.kernel.org> # 4.16+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
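      A condensed sketch of the shape of the fix described above (bodies
      heavily trimmed, error handling omitted; the needs_force_resume bit in
      dev->power is taken from the commit message, and the real patch may
      differ in detail): the suspend side records whether a resume will be
      needed, and the resume side consumes that flag instead of re-evaluating
      pm_runtime_need_not_resume(), whose answer may have changed in between.

      int pm_runtime_force_suspend(struct device *dev)
      {
              /* ... suspend callback invoked, runtime PM disabled ... */
              if (pm_runtime_need_not_resume(dev))
                      pm_runtime_set_suspended(dev);
              else
                      dev->power.needs_force_resume = 1;  /* remember the decision */
              return 0;
      }

      int pm_runtime_force_resume(struct device *dev)
      {
              /* Consume the recorded decision instead of re-evaluating it. */
              if (!dev->power.needs_force_resume)
                      return 0;

              /* ... resume callback invoked, runtime PM re-enabled ... */
              dev->power.needs_force_resume = 0;
              return 0;
      }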
  14. 10 May, 2021  2 commits
  15. 09 May, 2021  1 commit
  16. 08 May, 2021  2 commits