1. 18 Jun, 2021 1 commit
  2. 12 Jun, 2021 4 commits
  3. 09 Jun, 2021 2 commits
    • rq-qos: fix missed wake-ups in rq_qos_throttle try two · 11c7aa0d
      Jan Kara authored
      Commit 545fbd07 ("rq-qos: fix missed wake-ups in rq_qos_throttle")
      tried to fix a problem that a process could be sleeping in rq_qos_wait()
      without anyone to wake it up. However the fix is not complete and the
      following can still happen:
      
      CPU1 (waiter1)		CPU2 (waiter2)		CPU3 (waker)
      rq_qos_wait()		rq_qos_wait()
        acquire_inflight_cb() -> fails
      			  acquire_inflight_cb() -> fails
      
      						completes IOs, inflight
      						  decreased
        prepare_to_wait_exclusive()
      			  prepare_to_wait_exclusive()
        has_sleeper = !wq_has_single_sleeper() -> true as there are two sleepers
      			  has_sleeper = !wq_has_single_sleeper() -> true
        io_schedule()		  io_schedule()
      
      Deadlock as now there's nobody to wake up the two waiters. The logic
      automatically blocking when there are already sleepers is really subtle
      and the only way to make it work reliably is that we check whether there
      are some waiters in the queue when adding ourselves there. That way, we
      are guaranteed that at least the first process to enter the wait queue
      will recheck the waiting condition before going to sleep and thus
      guarantee forward progress.
      
      Fixes: 545fbd07 ("rq-qos: fix missed wake-ups in rq_qos_throttle")
      CC: stable@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20210607112613.25344-1-jack@suse.cz
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
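      The rule argued for in this commit message (decide whether to block from the state of the queue at the moment of enqueueing) can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel code: `struct wait_queue_model` and all three helper names are hypothetical, and a plain counter stands in for the real wait queue.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a wait queue: it only counts queued waiters. */
struct wait_queue_model {
    int nr_waiters;
};

/* Add a waiter and report whether the queue was empty beforehand. */
static bool add_waiter_was_first(struct wait_queue_model *q)
{
    bool was_empty = (q->nr_waiters == 0);

    q->nr_waiters++;
    return was_empty;
}

/*
 * Racy pattern from the scenario above: inspect the queue some time after
 * enqueueing. Once two waiters are queued, both see "not a single sleeper"
 * and both go to sleep without rechecking the condition.
 */
static bool old_has_sleeper(const struct wait_queue_model *q)
{
    return q->nr_waiters != 1;
}

/*
 * Pattern the commit message argues for: the answer is fixed at enqueue
 * time, so the first waiter is always told to recheck the condition.
 */
static bool new_has_sleeper(struct wait_queue_model *q)
{
    return !add_waiter_was_first(q);
}
```

      In this model the first caller of `new_has_sleeper()` gets `false` and therefore rechecks the wait condition, while later callers get `true`; with `old_has_sleeper()` and two queued waiters, both get `true`.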
    • block: return the correct bvec when checking for gaps · c9c9762d
      Long Li authored
      After commit 07173c3e ("block: enable multipage bvecs"), a bvec can
      have multiple pages. But bio_will_gap() still assumes one page bvec while
      checking for merging. If the pages in the bvec go across the
      seg_boundary_mask, this check for merging can potentially succeed if only
      the 1st page is tested, and can fail if all the pages are tested.
      
      Later, when SCSI builds the SG list the same check for merging is done in
      __blk_segment_map_sg_merge() with all the pages in the bvec tested. This
      time the check may fail if the pages in bvec go across the
      seg_boundary_mask (but tested okay in bio_will_gap() earlier, so those
      BIOs were merged). If this check fails, we end up with a broken SG list
      for drivers that assume the SG list has no offsets in intermediate pages.
      This results in incorrect pages being written to the disk.
      
      Fix this by returning the multi-page bvec when testing gaps for merging.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Jeffle Xu <jefflexu@linux.alibaba.com>
      Cc: linux-kernel@vger.kernel.org
      Cc: stable@vger.kernel.org
      Fixes: 07173c3e ("block: enable multipage bvecs")
      Signed-off-by: Long Li <longli@microsoft.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/1623094445-22332-1-git-send-email-longli@linuxonhyperv.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
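      The difference between testing only the first page and testing the whole bvec against a segment boundary can be shown with a little address arithmetic. The sketch below is a simplified userspace model, not the block-layer code: the helper names and `MODEL_PAGE_SIZE` are assumptions made for illustration, and `len` is assumed to be non-zero.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u /* assumed page size for this sketch */

/* True if both addresses fall inside the same boundary window. */
static bool same_window(uint64_t first, uint64_t last, uint64_t boundary_mask)
{
    return (first | boundary_mask) == (last | boundary_mask);
}

/* Incomplete check: only the first page of the bvec is tested. */
static bool first_page_fits(uint64_t start, uint64_t len, uint64_t boundary_mask)
{
    uint64_t page_last =
        (start & ~(uint64_t)(MODEL_PAGE_SIZE - 1)) + MODEL_PAGE_SIZE - 1;
    uint64_t vec_last = start + len - 1;
    uint64_t last = vec_last < page_last ? vec_last : page_last;

    return same_window(start, last, boundary_mask);
}

/* Complete check: every page of the multi-page bvec is covered. */
static bool whole_bvec_fits(uint64_t start, uint64_t len, uint64_t boundary_mask)
{
    return same_window(start, start + len - 1, boundary_mask);
}
```

      With a 64 KiB boundary (mask `0xFFFF`), a three-page bvec starting at `0xF000` passes `first_page_fits()` but fails `whole_bvec_fits()`, which is the kind of disagreement between the two checks that the message describes.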
  4. 01 Jun, 2021 7 commits
  5. 24 May, 2021 2 commits
  6. 23 May, 2021 1 commit
  7. 20 May, 2021 1 commit
  8. 19 May, 2021 5 commits
  9. 15 May, 2021 3 commits
  10. 14 May, 2021 2 commits
    • dyndbg: avoid calling dyndbg_emit_prefix when it has no work · 640d1eaf
      Jim Cromie authored
      Wrap function in a static-inline one, which checks flags to avoid
      calling the function unnecessarily.
      
      And hoist its output-buffer initialization to the grand-caller, which
      is already allocating the buffer on the stack, and can trivially
      initialize it too.
      Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
      Link: https://lore.kernel.org/r/20210504222235.1033685-2-jim.cromie@gmail.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
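      The pattern described here (a cheap inline flag test in front of the real function, with the caller owning and initializing the buffer) might look roughly like the sketch below. All names, including `MODEL_PREFIX_FLAGS` and the `slow_calls` counter, are hypothetical and exist only to make the effect observable; this is not the dyndbg code.

```c
#include <stddef.h>
#include <stdio.h>

#define MODEL_PREFIX_FLAGS 0x0eu /* hypothetical "some prefix requested" bits */

static int slow_calls; /* counts how often the out-of-line function runs */

/* Out-of-line worker: does the comparatively expensive formatting. */
static char *model_emit_prefix_slow(unsigned int flags, char *buf, size_t len)
{
    slow_calls++;
    snprintf(buf, len, "[flags=%#x] ", flags);
    return buf;
}

/* Inline wrapper: skips the call entirely when no prefix flag is set. */
static inline char *model_emit_prefix(unsigned int flags, char *buf, size_t len)
{
    if (flags & MODEL_PREFIX_FLAGS)
        return model_emit_prefix_slow(flags, buf, len);
    return buf;
}

/* The caller owns the buffer and initializes it before asking for a prefix. */
static const char *model_format(unsigned int flags, char *buf, size_t len)
{
    buf[0] = '\0'; /* assumes len > 0 */
    return model_emit_prefix(flags, buf, len);
}
```

      Calling `model_format()` with no prefix bits set leaves the buffer empty and never reaches `model_emit_prefix_slow()`.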
    • vt: Fix character height handling with VT_RESIZEX · 860dafa9
      Maciej W. Rozycki authored
      Restore the original intent of the VT_RESIZEX ioctl's `v_clin' parameter
      which is the number of pixel rows per character (cell) rather than the
      height of the font used.
      
      For framebuffer devices the two values are always the same, because the
      former is inferred from the latter one.  For VGA used as a true text
      mode device these two parameters are independent from each other: the
      number of pixel rows per character is set in the CRT controller, while
      font height is in fact hardwired to 32 pixel rows and fonts of heights
      below that value are handled by padding their data with blanks when
      loaded to hardware for use by the character generator.  One can change
      the setting in the CRT controller and it will update the screen contents
      accordingly regardless of the font loaded.
      
      The `v_clin' parameter is used by the `vgacon' driver to set the height
      of the character cell and then the cursor position within.  Make the
      parameter explicit then, by defining a new `vc_cell_height' struct
      member of `vc_data', set it instead of `vc_font.height' from `v_clin' in
      the VT_RESIZEX ioctl, and then use it throughout the `vgacon' driver
      except where actual font data is accessed which as noted above is
      independent from the CRTC setting.
      
      This way the framebuffer console driver is free to ignore the `v_clin'
      parameter as irrelevant, as it always should have, avoiding any issues
      attempts to give the parameter a meaning there could have caused, such
      as one that has led to commit 988d0763 ("vt_ioctl: make VT_RESIZEX
      behave like VT_RESIZE"):
      
       "syzbot is reporting UAF/OOB read at bit_putcs()/soft_cursor() [1][2],
        for vt_resizex() from ioctl(VT_RESIZEX) allows setting font height
        larger than actual font height calculated by con_font_set() from
        ioctl(PIO_FONT). Since fbcon_set_font() from con_font_set() allocates
        minimal amount of memory based on actual font height calculated by
        con_font_set(), use of vt_resizex() can cause UAF/OOB read for font
        data."
      
      The problem first appeared around Linux 2.5.66 which predates our repo
      history, but the origin could be identified with the old MIPS/Linux repo
      also at: <git://git.kernel.org/pub/scm/linux/kernel/git/ralf/linux.git>
      as commit 9736a3546de7 ("Merge with Linux 2.5.66."), where VT_RESIZEX
      code in `vt_ioctl' was updated as follows:
      
       		if (clin)
      -			video_font_height = clin;
      +			vc->vc_font.height = clin;
      
      making the parameter apply to framebuffer devices as well, perhaps due
      to the use of "font" in the name of the original `video_font_height'
      variable.  Use "cell" in the new struct member then to avoid ambiguity.
      
      References:
      
      [1] https://syzkaller.appspot.com/bug?id=32577e96d88447ded2d3b76d71254fb855245837
      [2] https://syzkaller.appspot.com/bug?id=6b8355d27b2b94fb5cedf4655e3a59162d9e48e3

      Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Cc: stable@vger.kernel.org # v2.6.12+
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
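      The separation this message describes, where the resize parameter updates a cell height and leaves the font height alone, can be modelled in a few lines. The struct and function below are hypothetical simplifications for illustration only and do not reproduce `vc_data` or the ioctl handler.

```c
/* Hypothetical, reduced model of the two independent heights. */
struct console_model {
    unsigned int font_height; /* rows of glyph data actually loaded */
    unsigned int cell_height; /* pixel rows per character cell */
};

/*
 * Model of the resize request: a non-zero clin changes only the cell
 * height, so code that reads font data keeps seeing the real font height.
 */
static void model_resizex(struct console_model *vc, unsigned int clin)
{
    if (clin)
        vc->cell_height = clin;
}
```

      Because `font_height` is never written by `model_resizex()`, a caller cannot make it larger than the font data that was actually allocated.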
  11. 13 May, 2021 1 commit
  12. 12 May, 2021 1 commit
  13. 11 May, 2021 3 commits
    • kyber: fix out of bounds access when preempted · efed9a33
      Omar Sandoval authored
      __blk_mq_sched_bio_merge() gets the ctx and hctx for the current CPU and
      passes the hctx to ->bio_merge(). kyber_bio_merge() then gets the ctx
      for the current CPU again and uses that to get the corresponding Kyber
      context in the passed hctx. However, the thread may be preempted between
      the two calls to blk_mq_get_ctx(), and the ctx returned the second time
      may no longer correspond to the passed hctx. This "works" accidentally
      most of the time, but it can cause us to read garbage if the second ctx
      came from an hctx with more ctx's than the first one (i.e., if
      ctx->index_hw[hctx->type] > hctx->nr_ctx).
      
      This manifested as this UBSAN array index out of bounds error reported
      by Jakub:
      
      UBSAN: array-index-out-of-bounds in ../kernel/locking/qspinlock.c:130:9
      index 13106 is out of range for type 'long unsigned int [128]'
      Call Trace:
       dump_stack+0xa4/0xe5
       ubsan_epilogue+0x5/0x40
       __ubsan_handle_out_of_bounds.cold.13+0x2a/0x34
       queued_spin_lock_slowpath+0x476/0x480
       do_raw_spin_lock+0x1c2/0x1d0
       kyber_bio_merge+0x112/0x180
       blk_mq_submit_bio+0x1f5/0x1100
       submit_bio_noacct+0x7b0/0x870
       submit_bio+0xc2/0x3a0
       btrfs_map_bio+0x4f0/0x9d0
       btrfs_submit_data_bio+0x24e/0x310
       submit_one_bio+0x7f/0xb0
       submit_extent_page+0xc4/0x440
       __extent_writepage_io+0x2b8/0x5e0
       __extent_writepage+0x28d/0x6e0
       extent_write_cache_pages+0x4d7/0x7a0
       extent_writepages+0xa2/0x110
       do_writepages+0x8f/0x180
       __writeback_single_inode+0x99/0x7f0
       writeback_sb_inodes+0x34e/0x790
       __writeback_inodes_wb+0x9e/0x120
       wb_writeback+0x4d2/0x660
       wb_workfn+0x64d/0xa10
       process_one_work+0x53a/0xa80
       worker_thread+0x69/0x5b0
       kthread+0x20b/0x240
       ret_from_fork+0x1f/0x30
      
      Only Kyber uses the hctx, so fix it by passing the request_queue to
      ->bio_merge() instead. BFQ and mq-deadline just use that, and Kyber can
      map the queues itself to avoid the mismatch.
      
      Fixes: a6088845 ("block: kyber: make kyber more friendly with merging")
      Reported-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Link: https://lore.kernel.org/r/c7598605401a48d5cfeadebb678abd10af22b83f.1620691329.git.osandov@fb.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
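      The mismatch described above comes from pairing a software context with a hardware context it does not belong to. A reduced userspace model of that hazard, and of deriving the hardware context from the same software context instead, is sketched below. The structs and helpers are hypothetical and far simpler than the blk-mq ones.

```c
#include <stdbool.h>

/* Hypothetical model: a software context knows its hardware context and
 * its slot index within that hardware context. */
struct ctx_model {
    int hctx_id;
    unsigned int index_hw;
};

/* Hypothetical model: a hardware context has nr_ctx per-context slots. */
struct hctx_model {
    unsigned int nr_ctx;
};

/* Would indexing hctx's per-context array with this ctx stay in bounds? */
static bool slot_in_bounds(const struct hctx_model *hctx,
                           const struct ctx_model *ctx)
{
    return ctx->index_hw < hctx->nr_ctx;
}

/* Safe lookup: map the hardware context from the very same ctx. */
static const struct hctx_model *hctx_for_ctx(const struct hctx_model *hctxs,
                                             const struct ctx_model *ctx)
{
    return &hctxs[ctx->hctx_id];
}
```

      If a thread obtains an `hctx_model` for one CPU and, after migrating, a `ctx_model` for another, `slot_in_bounds()` can fail; looking the hardware context up through `hctx_for_ctx()` keeps the pair consistent.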
    • stack: Replace "o" output with "r" input constraint · 2515dd6c
      Nick Desaulniers authored
      "o" isn't a common asm() constraint to use; it triggers an assertion in
      assert-enabled builds of LLVM that it's not recognized when targeting
      aarch64 (though it appears to fall back to "m"). It's fixed in LLVM 13 now,
      but there isn't really a good reason to use "o" in particular here. To
      avoid causing build issues for those using assert-enabled builds of earlier
      LLVM versions, the constraint needs changing.
      
      Instead, if the point is to retain the __builtin_alloca(), make ptr appear
      to "escape" via being an input to an empty inline asm block. This is
      preferable anyways, since otherwise this looks like a dead store.
      
      While the use of "r" was considered in
      
        https://lore.kernel.org/lkml/202104011447.2E7F543@keescook/
      
      it was only tested as an output (which looks like a dead store, and wasn't
      sufficient).
      
      Use "r" as an input constraint instead, which behaves correctly across
      compilers and architectures.
      
      Fixes: 39218ff4 ("stack: Optionally randomize kernel stack offset each syscall")
      Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Kees Cook <keescook@chromium.org>
      Tested-by: Nathan Chancellor <nathan@kernel.org>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Link: https://reviews.llvm.org/D100412
      Link: https://bugs.llvm.org/show_bug.cgi?id=49956
      Link: https://lore.kernel.org/r/20210419231741.4084415-1-keescook@chromium.org
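      A minimal userspace sketch of the idiom described above, in which the result of `__builtin_alloca()` is passed as an "r" input to an empty inline asm block so the allocation appears to escape. It relies on GCC/Clang extensions; the function name and `MODEL_OFFSET_MASK` are hypothetical and chosen only for illustration.

```c
#include <stddef.h>

#define MODEL_OFFSET_MASK 0x3FFu /* hypothetical cap on the stack offset */

/*
 * Shift the stack by a bounded offset. The allocation is otherwise unused,
 * so the pointer is fed to an empty asm statement as an input operand to
 * keep the compiler from treating the alloca as dead.
 */
static size_t model_add_stack_offset(size_t offset)
{
    size_t bounded = offset & MODEL_OFFSET_MASK;
    unsigned char *ptr = __builtin_alloca(bounded);

    /* No instructions are emitted; ptr is only made to "escape". */
    __asm__ volatile("" : : "r"(ptr) : "memory");

    return bounded;
}
```

      The return value is just the bounded offset, which makes the helper easy to check; the stack adjustment itself only lasts for the lifetime of this function's frame in the sketch.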
    • PM: runtime: Fix unpaired parent child_count for force_resume · c745253e
      Tony Lindgren authored
      As pm_runtime_need_not_resume() relies also on usage_count, it can return
      a different value in pm_runtime_force_suspend() compared to when called in
      pm_runtime_force_resume(). Different return values can happen if anything
      calls PM runtime functions in between, and causes the parent child_count
      to increase on every resume.
      
      So far I've seen the issue only for omapdrm that does complicated things
      with PM runtime calls during system suspend for legacy reasons:
      
      omap_atomic_commit_tail() for omapdrm.0
       dispc_runtime_get()
        wakes up 58000000.dss as it's the dispc parent
         dispc_runtime_resume()
          rpm_resume() increases parent child_count
       dispc_runtime_put() won't idle, PM runtime suspend blocked
      pm_runtime_force_suspend() for 58000000.dss, !pm_runtime_need_not_resume()
       __update_runtime_status()
      system suspended
      pm_runtime_force_resume() for 58000000.dss, pm_runtime_need_not_resume()
       pm_runtime_enable() only called because of pm_runtime_need_not_resume()
      omap_atomic_commit_tail() for omapdrm.0
       dispc_runtime_get()
        wakes up 58000000.dss as it's the dispc parent
         dispc_runtime_resume()
          rpm_resume() increases parent child_count
       dispc_runtime_put() won't idle, PM runtime suspend blocked
      ...
      rpm_suspend for 58000000.dss but parent child_count is now unbalanced
      
      Let's fix the issue by adding a flag for needs_force_resume and use it in
      pm_runtime_force_resume() instead of pm_runtime_need_not_resume().
      
      Additionally omapdrm system suspend could be simplified later on to avoid
      lots of unnecessary PM runtime calls and the complexity it adds. The
      driver can just use internal functions that are shared between the PM
      runtime and system suspend related functions.
      
      Fixes: 4918e1f8 ("PM / runtime: Rework pm_runtime_force_suspend/resume()")
      Signed-off-by: Tony Lindgren <tony@atomide.com>
      Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
      Tested-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
      Cc: 4.16+ <stable@vger.kernel.org> # 4.16+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
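      The core idea of the fix, recording the decision at suspend time in a flag and consulting that flag at resume time instead of re-evaluating a condition that may have changed, can be sketched in userspace as follows. The struct and functions are a hypothetical, heavily reduced model; the real condition and the real suspend/resume paths involve more state than a single usage count.

```c
#include <stdbool.h>

/* Hypothetical, reduced model of the per-device runtime PM state. */
struct pm_dev_model {
    int usage_count;
    bool needs_force_resume;
};

/* Simplified stand-in for the "need not resume" condition. */
static bool model_need_not_resume(const struct pm_dev_model *dev)
{
    return dev->usage_count <= 1;
}

/* Suspend: remember whether a matching forced resume will be required. */
static void model_force_suspend(struct pm_dev_model *dev)
{
    dev->needs_force_resume = !model_need_not_resume(dev);
}

/* Unpaired variant: re-evaluates the condition, which may have changed. */
static bool model_force_resume_old(const struct pm_dev_model *dev)
{
    return !model_need_not_resume(dev);
}

/* Paired variant: uses and clears the flag recorded at suspend time. */
static bool model_force_resume_new(struct pm_dev_model *dev)
{
    bool resume = dev->needs_force_resume;

    dev->needs_force_resume = false;
    return resume;
}
```

      If `usage_count` drops from 2 to 1 between suspend and resume, the old variant skips the resume step while the new one still performs it, so the two halves stay paired.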
  14. 10 May, 2021 1 commit
  15. 09 May, 2021 1 commit
  16. 08 May, 2021 1 commit
    • linux/kconfig.h: replace IF_ENABLED() with PTR_IF() in <linux/kernel.h> · 0ab1438b
      Masahiro Yamada authored
      <linux/kconfig.h> is included from all the kernel-space source files,
      including C, assembly, linker scripts. It is intended to contain a
      minimal set of macros to evaluate CONFIG options.
      
      IF_ENABLED() is an intruder here because (x ? y : z) is C code, which
      should not be included from assembly files or linker scripts.
      
      Also, <linux/kconfig.h> is no longer self-contained because NULL is
      defined in <linux/stddef.h>.
      
      Move IF_ENABLED() out to <linux/kernel.h> as PTR_IF(). PTR_IF()
      takes a general boolean expression instead of a CONFIG option
      so that it fits better in <linux/kernel.h>.
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
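      A small userspace sketch of how a `PTR_IF()`-style macro can be used. The expansion shown follows the `(x ? y : z)` shape mentioned in the message and is an assumption of this sketch; the feature macros and handler functions are hypothetical.

```c
#include <stddef.h>

/* Assumed expansion: yield the pointer if the condition holds, else NULL. */
#define PTR_IF(cond, ptr) ((cond) ? (ptr) : NULL)

/* Hypothetical compile-time switches standing in for real conditions. */
#define MODEL_FEATURE_FOO 1
#define MODEL_FEATURE_BAR 0

typedef int (*handler_fn)(void);

static int foo_handler(void) { return 42; }
static int bar_handler(void) { return 7; }

/* Each accessor exposes its handler only when the feature is enabled. */
static handler_fn foo_op(void)
{
    return PTR_IF(MODEL_FEATURE_FOO, foo_handler);
}

static handler_fn bar_op(void)
{
    return PTR_IF(MODEL_FEATURE_BAR, bar_handler);
}
```

      Since the macro is ordinary C with a ternary and `NULL`, it only makes sense in C translation units, which matches the reason given for moving it out of a header shared with assembly and linker scripts.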
  17. 07 May, 2021 4 commits