1. 29 Nov, 2021 20 commits
  2. 27 Nov, 2021 1 commit
  3. 26 Nov, 2021 1 commit
  4. 23 Nov, 2021 1 commit
  5. 19 Nov, 2021 2 commits
  6. 17 Nov, 2021 1 commit
  7. 16 Nov, 2021 3 commits
    • M
      blk-mq: cancel blk-mq dispatch work in both blk_cleanup_queue and disk_release() · 2a19b28f
      Ming Lei authored
      To avoid slowing down queue destruction, we don't call
      blk_mq_quiesce_queue() in blk_cleanup_queue(); instead, canceling
      dispatch work is delayed to blk_release_queue().
      
      However, this approach has caused a kernel oops [1], reported by Changhui.
      The log shows that scsi_device can be freed before blk_release_queue() runs,
      which is expected, since scsi_device is released once the scsi disk is
      closed and the scsi_device is removed.
      
      Fix the issue by canceling blk-mq dispatch work in both blk_cleanup_queue()
      and disk_release():
      
      1) When disk_release() runs, the disk has been closed and all sync
      dispatch activity has finished, so canceling dispatch work is enough to
      quiesce filesystem I/O dispatch activity.
      
      2) In blk_cleanup_queue(), only passthrough requests matter. A
      passthrough request is always explicitly allocated and freed by
      its caller, so once the queue is frozen, all sync dispatch activity
      for passthrough requests has finished, and canceling dispatch work is
      enough to prevent any further dispatch activity.
      
      [1] kernel panic log
      [12622.769416] BUG: kernel NULL pointer dereference, address: 0000000000000300
      [12622.777186] #PF: supervisor read access in kernel mode
      [12622.782918] #PF: error_code(0x0000) - not-present page
      [12622.788649] PGD 0 P4D 0
      [12622.791474] Oops: 0000 [#1] PREEMPT SMP PTI
      [12622.796138] CPU: 10 PID: 744 Comm: kworker/10:1H Kdump: loaded Not tainted 5.15.0+ #1
      [12622.804877] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 1.5.4 10/002/2015
      [12622.813321] Workqueue: kblockd blk_mq_run_work_fn
      [12622.818572] RIP: 0010:sbitmap_get+0x75/0x190
      [12622.823336] Code: 85 80 00 00 00 41 8b 57 08 85 d2 0f 84 b1 00 00 00 45 31 e4 48 63 cd 48 8d 1c 49 48 c1 e3 06 49 03 5f 10 4c 8d 6b 40 83 f0 01 <48> 8b 33 44 89 f2 4c 89 ef 0f b6 c8 e8 fa f3 ff ff 83 f8 ff 75 58
      [12622.844290] RSP: 0018:ffffb00a446dbd40 EFLAGS: 00010202
      [12622.850120] RAX: 0000000000000001 RBX: 0000000000000300 RCX: 0000000000000004
      [12622.858082] RDX: 0000000000000006 RSI: 0000000000000082 RDI: ffffa0b7a2dfe030
      [12622.866042] RBP: 0000000000000004 R08: 0000000000000001 R09: ffffa0b742721334
      [12622.874003] R10: 0000000000000008 R11: 0000000000000008 R12: 0000000000000000
      [12622.881964] R13: 0000000000000340 R14: 0000000000000000 R15: ffffa0b7a2dfe030
      [12622.889926] FS:  0000000000000000(0000) GS:ffffa0baafb40000(0000) knlGS:0000000000000000
      [12622.898956] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [12622.905367] CR2: 0000000000000300 CR3: 0000000641210001 CR4: 00000000001706e0
      [12622.913328] Call Trace:
      [12622.916055]  <TASK>
      [12622.918394]  scsi_mq_get_budget+0x1a/0x110
      [12622.922969]  __blk_mq_do_dispatch_sched+0x1d4/0x320
      [12622.928404]  ? pick_next_task_fair+0x39/0x390
      [12622.933268]  __blk_mq_sched_dispatch_requests+0xf4/0x140
      [12622.939194]  blk_mq_sched_dispatch_requests+0x30/0x60
      [12622.944829]  __blk_mq_run_hw_queue+0x30/0xa0
      [12622.949593]  process_one_work+0x1e8/0x3c0
      [12622.954059]  worker_thread+0x50/0x3b0
      [12622.958144]  ? rescuer_thread+0x370/0x370
      [12622.962616]  kthread+0x158/0x180
      [12622.966218]  ? set_kthread_struct+0x40/0x40
      [12622.970884]  ret_from_fork+0x22/0x30
      [12622.974875]  </TASK>
      [12622.977309] Modules linked in: scsi_debug rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs sunrpc dm_multipath intel_rapl_msr intel_rapl_common dell_wmi_descriptor sb_edac rfkill video x86_pkg_temp_thermal intel_powerclamp dcdbas coretemp kvm_intel kvm mgag200 irqbypass i2c_algo_bit rapl drm_kms_helper ipmi_ssif intel_cstate intel_uncore syscopyarea sysfillrect sysimgblt fb_sys_fops pcspkr cec mei_me lpc_ich mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter drm fuse xfs libcrc32c sr_mod cdrom sd_mod t10_pi sg ixgbe ahci libahci crct10dif_pclmul crc32_pclmul crc32c_intel libata megaraid_sas ghash_clmulni_intel tg3 wdat_wdt mdio dca wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_debug]
      Reported-by: ChanghuiZhong <czhong@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: linux-scsi@vger.kernel.org
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20211116014343.610501-1-ming.lei@redhat.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      2a19b28f
    • J
      block: fix missing queue put in error path · 95febeb6
      Jens Axboe authored
      If we fail the submission queue checks, we don't put the queue afterwards.
      This can cause various issues like stalls on scheduler switch or failure
      to remove the device, or like in the original bug report, timeout waiting
      for the device on reboot/restart.
      
      While in there, fix a few whitespace discrepancies in the surrounding
      code.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=215039
      Fixes: b637108a ("blk-mq: fix filesystem I/O request allocation")
      Reported-and-tested-by: Stephen Smith <stephenmsmith@blueyonder.co.uk>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      95febeb6
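The leak described in this commit follows a common shape: a reference is taken on entry and an early return skips the matching put. The sketch below models that shape in plain userspace C. `struct queue`, `queue_enter()`, `queue_exit()`, `checks_pass()`, `submit_buggy()` and `submit_fixed()` are hypothetical stand-ins chosen for illustration; they are not the kernel's functions, and the real patch may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request queue with a usage counter. */
struct queue {
    int usage;  /* number of outstanding references */
};

static void queue_enter(struct queue *q)
{
    q->usage++;
}

static void queue_exit(struct queue *q)
{
    assert(q->usage > 0);
    q->usage--;
}

/* Hypothetical submission check: rejects negative sizes. */
static bool checks_pass(int size)
{
    return size >= 0;
}

/* Buggy shape: the early return skips queue_exit(). */
static int submit_buggy(struct queue *q, int size)
{
    queue_enter(q);
    if (!checks_pass(size))
        return -1;  /* reference leaked here */
    /* ... issue the I/O ... */
    queue_exit(q);
    return 0;
}

/* Fixed shape: every exit path drops the reference. */
static int submit_fixed(struct queue *q, int size)
{
    int ret = 0;

    queue_enter(q);
    if (!checks_pass(size)) {
        ret = -1;
        goto out;
    }
    /* ... issue the I/O ... */
out:
    queue_exit(q);
    return ret;
}
```

In this model, a failed check leaves `usage` at 1 after `submit_buggy()` and at 0 after `submit_fixed()`. A counter that never returns to zero is the kind of state that would keep a freeze or a device removal waiting, which matches the stalls the commit message lists.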
    • A
      block: Check ADMIN before NICE for IOPRIO_CLASS_RT · 94c4b4fd
      Alistair Delva authored
      Booting to Android userspace on 5.14 or newer triggers the following
      SELinux denial:
      
      avc: denied { sys_nice } for comm="init" capability=23
           scontext=u:r:init:s0 tcontext=u:r:init:s0 tclass=capability
           permissive=0
      
      Init is PID 0 running as root, so it already has CAP_SYS_ADMIN. For
      better compatibility with older SEPolicy, check ADMIN before NICE.
      
      Fixes: 9d3a39a5 ("block: grant IOPRIO_CLASS_RT to CAP_SYS_NICE")
      Signed-off-by: Alistair Delva <adelva@google.com>
      Cc: Khazhismel Kumykov <khazhy@google.com>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: selinux@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: kernel-team@android.com
      Cc: stable@vger.kernel.org # v5.14+
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Acked-by: Serge Hallyn <serge@hallyn.com>
      Link: https://lore.kernel.org/r/20211115181655.3608659-1-adelva@google.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      94c4b4fd
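Why the order of the two capability checks matters can be shown with a small userspace model. The sketch assumes a simplified `capable()` that counts an audited denial whenever the task holds a capability that the policy rejects; `struct task`, its fields, and the two `may_set_rt_*` helpers are hypothetical names for illustration only. The numeric values of `CAP_SYS_ADMIN` (21) and `CAP_SYS_NICE` (23) match the Linux definitions, and 23 is the capability number shown in the denial above.

```c
#include <assert.h>
#include <stdbool.h>

/* Values match the Linux capability numbers; everything else is a model. */
enum { CAP_SYS_ADMIN = 21, CAP_SYS_NICE = 23 };

/* Hypothetical task: capabilities held, plus what the modelled policy allows. */
struct task {
    unsigned long held;     /* bitmask of capabilities the task has */
    unsigned long allowed;  /* bitmask the policy permits */
    int denials;            /* number of audited denials so far */
};

/* Model of capable(): a held capability that the policy rejects is audited. */
static bool capable(struct task *t, int cap)
{
    unsigned long bit = 1UL << cap;

    if (!(t->held & bit))
        return false;
    if (!(t->allowed & bit)) {
        t->denials++;
        return false;
    }
    return true;
}

/* Order before the fix: NICE is probed first. */
static bool may_set_rt_nice_first(struct task *t)
{
    return capable(t, CAP_SYS_NICE) || capable(t, CAP_SYS_ADMIN);
}

/* Order after the fix: ADMIN short-circuits the NICE probe. */
static bool may_set_rt_admin_first(struct task *t)
{
    return capable(t, CAP_SYS_ADMIN) || capable(t, CAP_SYS_NICE);
}
```

For a task that holds both capabilities under a policy that only allows `CAP_SYS_ADMIN`, both orders grant access, but only the NICE-first order records a denial. That is the compatibility effect the commit describes for older SEPolicy.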
  8. 13 Nov, 2021 1 commit
  9. 12 Nov, 2021 5 commits
    • L
      blkcg: Remove extra blkcg_bio_issue_init · b781d8db
      Laibin Qiu authored
      KASAN reports a use-after-free when running block tests:
      
      ==================================================================
      [10050.967049] BUG: KASAN: use-after-free in
      submit_bio_checks+0x1539/0x1550
      
      [10050.977638] Call Trace:
      [10050.978190]  dump_stack+0x9b/0xce
      [10050.979674]  print_address_description.constprop.6+0x3e/0x60
      [10050.983510]  kasan_report.cold.9+0x22/0x3a
      [10050.986089]  submit_bio_checks+0x1539/0x1550
      [10050.989576]  submit_bio_noacct+0x83/0xc80
      [10050.993714]  submit_bio+0xa7/0x330
      [10050.994435]  mpage_readahead+0x380/0x500
      [10050.998009]  read_pages+0x1c1/0xbf0
      [10051.002057]  page_cache_ra_unbounded+0x4c2/0x6f0
      [10051.007413]  do_page_cache_ra+0xda/0x110
      [10051.008207]  force_page_cache_ra+0x23d/0x3d0
      [10051.009087]  page_cache_sync_ra+0xca/0x300
      [10051.009970]  generic_file_buffered_read+0xbea/0x2130
      [10051.012685]  generic_file_read_iter+0x315/0x490
      [10051.014472]  blkdev_read_iter+0x113/0x1b0
      [10051.015300]  aio_read+0x2ad/0x450
      [10051.023786]  io_submit_one+0xc8e/0x1d60
      [10051.029855]  __se_sys_io_submit+0x125/0x350
      [10051.033442]  do_syscall_64+0x2d/0x40
      [10051.034156]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [10051.048733] Allocated by task 18598:
      [10051.049482]  kasan_save_stack+0x19/0x40
      [10051.050263]  __kasan_kmalloc.constprop.1+0xc1/0xd0
      [10051.051230]  kmem_cache_alloc+0x146/0x440
      [10051.052060]  mempool_alloc+0x125/0x2f0
      [10051.052818]  bio_alloc_bioset+0x353/0x590
      [10051.053658]  mpage_alloc+0x3b/0x240
      [10051.054382]  do_mpage_readpage+0xddf/0x1ef0
      [10051.055250]  mpage_readahead+0x264/0x500
      [10051.056060]  read_pages+0x1c1/0xbf0
      [10051.056758]  page_cache_ra_unbounded+0x4c2/0x6f0
      [10051.057702]  do_page_cache_ra+0xda/0x110
      [10051.058511]  force_page_cache_ra+0x23d/0x3d0
      [10051.059373]  page_cache_sync_ra+0xca/0x300
      [10051.060198]  generic_file_buffered_read+0xbea/0x2130
      [10051.061195]  generic_file_read_iter+0x315/0x490
      [10051.062189]  blkdev_read_iter+0x113/0x1b0
      [10051.063015]  aio_read+0x2ad/0x450
      [10051.063686]  io_submit_one+0xc8e/0x1d60
      [10051.064467]  __se_sys_io_submit+0x125/0x350
      [10051.065318]  do_syscall_64+0x2d/0x40
      [10051.066082]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [10051.067455] Freed by task 13307:
      [10051.068136]  kasan_save_stack+0x19/0x40
      [10051.068931]  kasan_set_track+0x1c/0x30
      [10051.069726]  kasan_set_free_info+0x1b/0x30
      [10051.070621]  __kasan_slab_free+0x111/0x160
      [10051.071480]  kmem_cache_free+0x94/0x460
      [10051.072256]  mempool_free+0xd6/0x320
      [10051.072985]  bio_free+0xe0/0x130
      [10051.073630]  bio_put+0xab/0xe0
      [10051.074252]  bio_endio+0x3a6/0x5d0
      [10051.074984]  blk_update_request+0x590/0x1370
      [10051.075870]  scsi_end_request+0x7d/0x400
      [10051.076667]  scsi_io_completion+0x1aa/0xe50
      [10051.077503]  scsi_softirq_done+0x11b/0x240
      [10051.078344]  blk_mq_complete_request+0xd4/0x120
      [10051.079275]  scsi_mq_done+0xf0/0x200
      [10051.080036]  virtscsi_vq_done+0xbc/0x150
      [10051.080850]  vring_interrupt+0x179/0x390
      [10051.081650]  __handle_irq_event_percpu+0xf7/0x490
      [10051.082626]  handle_irq_event_percpu+0x7b/0x160
      [10051.083527]  handle_irq_event+0xcc/0x170
      [10051.084297]  handle_edge_irq+0x215/0xb20
      [10051.085122]  asm_call_irq_on_stack+0xf/0x20
      [10051.085986]  common_interrupt+0xae/0x120
      [10051.086830]  asm_common_interrupt+0x1e/0x40
      
      ==================================================================
      
      A bio is checked at the beginning of submit_bio_noacct(). If the bio needs
      to be throttled, the throttle code starts the timer and submission stops
      there. The bio is submitted later from blk_throtl_dispatch_work_fn(), when
      the timer expires. In the current code, however, a throttled bio still has
      its issue->value set by blkcg_bio_issue_init() after being queued. This is
      redundant and may cause the above use-after-free.
      
      CPU0                                   CPU1
      submit_bio
      submit_bio_noacct
        submit_bio_checks
          blk_throtl_bio()
            <=mod_timer(&sq->pending_timer
                                            blk_throtl_dispatch_work_fn
                                              submit_bio_noacct() <= bio has the
                                              throttle tag, passes straight
                                              through, and bio issue->value is
                                              set here
      
                                            bio_endio()
                                            bio_put()
                                            bio_free() <= free this bio
      
          blkcg_bio_issue_init(bio)
            <= bio has been freed and
            will lead to UAF
        return BLK_QC_T_NONE
      
      Fix this by removing the extra blkcg_bio_issue_init().
      
      Fixes: e439bedf (blkcg: consolidate bio_issue_init() to be a part of core)
      Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
      Link: https://lore.kernel.org/r/20211112093354.3581504-1-qiulaibin@huawei.com
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b781d8db
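The rule behind this fix is an ownership one: once a bio has been handed to the throttling code, the submitter must not touch it again, because it may already have been dispatched, completed and freed elsewhere. The userspace sketch below models that rule. `struct bio`, `issue_init()`, `throttle_bio()` and `submit_checks()` are simplified, hypothetical stand-ins, and the immediate `free()` in `throttle_bio()` is an assumption that compresses the race on the other CPU into a single call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified bio. */
struct bio {
    unsigned long issue_value;  /* stands in for bio issue->value */
    bool throttle;              /* whether the model throttles this bio */
};

static int issue_inits;  /* counts calls to issue_init() */

static void issue_init(struct bio *bio)
{
    bio->issue_value = 1;
    issue_inits++;
}

/*
 * Model of the throttle path: returns true when the bio was taken over.
 * The free() stands in for dispatch and completion on another CPU, which
 * may finish before the submitter continues.
 */
static bool throttle_bio(struct bio *bio)
{
    if (!bio->throttle)
        return false;
    free(bio);
    return true;
}

/* Fixed shape: a throttled bio is never touched again by the submitter. */
static bool submit_checks(struct bio *bio)
{
    if (throttle_bio(bio))
        return false;  /* bio may already be freed: do not dereference it */
    issue_init(bio);
    return true;
}
```

Calling `issue_init(bio)` inside the throttled branch of `submit_checks()` would write to freed memory in this model, which is the access pattern KASAN reports above. The throttled bio still gets its issue value when it passes through the submission path again from the dispatch work, as the CPU0/CPU1 diagram shows.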
    • S
      block: Hold invalidate_lock in BLKRESETZONE ioctl · 86399ea0
      Shin'ichiro Kawasaki authored
      When BLKRESETZONE ioctl and data read race, the data read leaves stale
      page cache. The commit e5113505 ("block: Discard page cache of zone
      reset target range") added page cache truncation to avoid stale page
      cache after the ioctl. However, the stale page cache still can be read
      during the reset zone operation for the ioctl. To avoid the stale page
      cache completely, hold invalidate_lock of the block device file mapping.
      
      Fixes: e5113505 ("block: Discard page cache of zone reset target range")
      Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Cc: stable@vger.kernel.org # v5.15
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20211111085238.942492-1-shinichiro.kawasaki@wdc.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      86399ea0
    • M
      blk-mq: rename blk_attempt_bio_merge · b131f201
      Ming Lei authored
      It is very annoying to have two block layer functions that share the
      same name, so rename blk_attempt_bio_merge in blk-mq.c to
      blk_mq_attempt_bio_merge.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20211111085134.345235-3-ming.lei@redhat.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b131f201
    • M
      blk-mq: don't grab ->q_usage_counter in blk_mq_sched_bio_merge · 10f7335e
      Ming Lei authored
      blk_mq_sched_bio_merge is only called from blk-mq.c:blk_attempt_bio_merge(),
      which is called when queue usage counter is grabbed already:
      
      1) blk_mq_get_new_requests()
      
      2) blk_mq_get_request()
      - cached request in current plug owns one queue usage counter
      
      So don't grab ->q_usage_counter in blk_mq_sched_bio_merge(). More
      importantly, this nested grab causes a hang in blk_mq_freeze_queue_wait().
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20211111085134.345235-2-ming.lei@redhat.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      10f7335e
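The hang mentioned in this commit can be illustrated with a single-threaded model of a usage counter and a freeze. The sketch assumes that entering the queue has to wait while a freeze is in progress, and that the freeze itself waits for the counter to reach zero. `struct queue`, `queue_try_enter()`, `freeze_start()`, `freeze_wait_done()`, `merge_nested()` and `merge_flat()` are hypothetical names, and "would sleep" is modelled as returning `false`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical queue: a usage counter plus a freeze-in-progress flag. */
struct queue {
    int usage;
    bool freezing;
};

/* Returns false where a real caller would sleep until the freeze ends. */
static bool queue_try_enter(struct queue *q)
{
    if (q->freezing)
        return false;
    q->usage++;
    return true;
}

static void queue_exit(struct queue *q)
{
    assert(q->usage > 0);
    q->usage--;
}

static void freeze_start(struct queue *q)
{
    q->freezing = true;
}

/* The freeze completes only once every reference has been dropped. */
static bool freeze_wait_done(const struct queue *q)
{
    return q->freezing && q->usage == 0;
}

/* Nested shape: the merge helper takes a second reference of its own. */
static bool merge_nested(struct queue *q)
{
    if (!queue_try_enter(q))
        return false;  /* would sleep while the caller still holds a ref */
    /* ... attempt the merge ... */
    queue_exit(q);
    return true;
}

/* Flat shape: the merge helper relies on the reference the caller holds. */
static bool merge_flat(struct queue *q)
{
    assert(q->usage > 0);
    /* ... attempt the merge ... */
    return true;
}
```

If a freeze starts between the caller's enter and the nested enter, `merge_nested()` cannot proceed until the freeze ends, while the freeze cannot end until the caller drops its reference: neither side makes progress. `merge_flat()` avoids the second enter, so the caller finishes, drops its reference, and the freeze completes.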
    • J
      block: fix kerneldoc for disk_register_independent_access__ranges() · 438cd742
      Jens Axboe authored
      The naming got changed as part of a revision of the patchset, but the
      kerneldoc apparently never got updated. Fix it.
      Reported-by: kernel test robot <lkp@intel.com>
      Fixes: a2247f19 ("block: Add independent access ranges support")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      438cd742
  10. 10 Nov, 2021 4 commits
  11. 09 Nov, 2021 1 commit