1. 18 April 2022, 20 commits
  2. 15 April 2022, 3 commits
  3. 01 April 2022, 1 commit
  4. 31 March 2022, 1 commit
  5. 28 March 2022, 2 commits
    • block: Fix the maximum minor value in blk_alloc_ext_minor() · d1868328
      Christophe JAILLET authored
      ida_alloc_range(..., min, max, ...) returns values from min to max,
      inclusive.
      
      So, NR_EXT_DEVT is a valid idx returned by blk_alloc_ext_minor().
      
      This is an issue because in device_add_disk(), this value is used in:
         ddev->devt = MKDEV(disk->major, disk->first_minor);
      and NR_EXT_DEVT is '(1 << MINORBITS)'.
      
      So, should 'disk->first_minor' be NR_EXT_DEVT, it would overflow.
      
      Fixes: 22ae8ce8 ("block: simplify bdev/disk lookup in blkdev_get")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/cc17199798312406b90834e433d2cefe8266823d.1648306232.git.christophe.jaillet@wanadoo.fr
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
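
      A minimal sketch of the fix, following the description above (close to
      the upstream change, simplified for illustration): the inclusive upper
      bound passed to ida_alloc_range() becomes NR_EXT_DEVT - 1.

        /*
         * ida_alloc_range()'s 'max' is inclusive, so the largest minor that
         * may be handed out must be NR_EXT_DEVT - 1; otherwise MKDEV() can
         * be fed a first_minor that no longer fits in MINORBITS.
         */
        int blk_alloc_ext_minor(void)
        {
                int idx;

                idx = ida_alloc_range(&ext_devt_ida, 0, NR_EXT_DEVT - 1,
                                      GFP_KERNEL);
                if (idx == -ENOSPC)
                        return -EBUSY;
                return idx;
        }
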
    • block: restore the old set_task_ioprio() behaviour wrt PF_EXITING · 15583a56
      Jiri Slaby authored
      PF_EXITING tasks were silently ignored before the commits below.
      Continue doing so; otherwise the python-psutil tests fail:
        ERROR: psutil.tests.test_process.TestProcess.test_zombie_process
        ----------------------------------------------------------------------
        Traceback (most recent call last):
          File "/home/abuild/rpmbuild/BUILD/psutil-5.9.0/build/lib.linux-x86_64-3.9/psutil/_pslinux.py", line 1661, in wrapper
            return fun(self, *args, **kwargs)
          File "/home/abuild/rpmbuild/BUILD/psutil-5.9.0/build/lib.linux-x86_64-3.9/psutil/_pslinux.py", line 2133, in ionice_set
            return cext.proc_ioprio_set(self.pid, ioclass, value)
        ProcessLookupError: [Errno 3] No such process
      
        During handling of the above exception, another exception occurred:
      
        Traceback (most recent call last):
          File "/home/abuild/rpmbuild/BUILD/psutil-5.9.0/psutil/tests/test_process.py", line 1313, in test_zombie_process
            succeed_or_zombie_p_exc(fun)
          File "/home/abuild/rpmbuild/BUILD/psutil-5.9.0/psutil/tests/test_process.py", line 1288, in succeed_or_zombie_p_exc
            return fun()
          File "/home/abuild/rpmbuild/BUILD/psutil-5.9.0/build/lib.linux-x86_64-3.9/psutil/__init__.py", line 792, in ionice
            return self._proc.ionice_set(ioclass, value)
          File "/home/abuild/rpmbuild/BUILD/psutil-5.9.0/build/lib.linux-x86_64-3.9/psutil/_pslinux.py", line 1665, in wrapper
            raise NoSuchProcess(self.pid, self._name)
        psutil.NoSuchProcess: process no longer exists (pid=2057)
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Fixes: 5fc11eeb (block: open code create_task_io_context in set_task_ioprio)
      Fixes: a957b612 (block: fix error in handling dead task for ioprio setting)
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20220328085928.7899-1-jslaby@suse.cz
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
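
      A hedged sketch of the restored behaviour inside set_task_ioprio()
      (fragment only; the surrounding allocation and error paths are elided):

        /*
         * Fragment: silently ignore a task that is exiting, as the
         * pre-regression code did, instead of surfacing an error that
         * breaks callers such as python-psutil.
         */
        task_lock(task);
        if (task->flags & PF_EXITING) {
                task_unlock(task);
                return 0;       /* pretend success for an exiting task */
        }
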
  6. 23 March 2022, 3 commits
    • block: avoid calling blkg_free() in atomic context · d578c770
      Ming Lei authored
      blkg_free() can currently be called in atomic context, either with a
      spinlock held or from an RCU callback. Meanwhile, the request queue's
      release handler and ->pd_free_fn() may both sleep.
      
      Fix the issue by deferring the freeing of the blkcg_gq instance to a
      work item.
      
      [  148.553894] BUG: sleeping function called from invalid context at block/blk-sysfs.c:767
      [  148.557381] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/13
      [  148.560741] preempt_count: 101, expected: 0
      [  148.562577] RCU nest depth: 0, expected: 0
      [  148.564379] 1 lock held by swapper/13/0:
      [  148.566127]  #0: ffffffff82615f80 (rcu_callback){....}-{0:0}, at: rcu_lock_acquire+0x0/0x1b
      [  148.569640] Preemption disabled at:
      [  148.569642] [<ffffffff8123f9c3>] ___slab_alloc+0x554/0x661
      [  148.573559] CPU: 13 PID: 0 Comm: swapper/13 Kdump: loaded Not tainted 5.17.0_up+ #110
      [  148.576834] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-1.fc33 04/01/2014
      [  148.579768] Call Trace:
      [  148.580567]  <IRQ>
      [  148.581262]  dump_stack_lvl+0x56/0x7c
      [  148.582367]  ? ___slab_alloc+0x554/0x661
      [  148.583526]  __might_resched+0x1af/0x1c8
      [  148.584678]  blk_release_queue+0x24/0x109
      [  148.585861]  kobject_cleanup+0xc9/0xfe
      [  148.586979]  blkg_free+0x46/0x63
      [  148.587962]  rcu_do_batch+0x1c5/0x3db
      [  148.589057]  rcu_core+0x14a/0x184
      [  148.590065]  __do_softirq+0x14d/0x2c7
      [  148.591167]  __irq_exit_rcu+0x7a/0xd4
      [  148.592264]  sysvec_apic_timer_interrupt+0x82/0xa5
      [  148.593649]  </IRQ>
      [  148.594354]  <TASK>
      [  148.595058]  asm_sysvec_apic_timer_interrupt+0x12/0x20
      
      Cc: Tejun Heo <tj@kernel.org>
      Fixes: 0a9a25ca ("block: let blkcg_gq grab request queue's refcnt")
      Reported-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/linux-block/20220322093322.GA27283@lst.de/
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20220323011308.2010380-1-ming.lei@redhat.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
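
      A sketch of the deferral the commit describes; __blkg_free() stands in
      for the original (possibly sleeping) free path and is an illustrative
      name, not the upstream one:

        /*
         * Runs in process context, so ->pd_free_fn() and the final queue
         * refcount drop may safely sleep here.
         */
        static void blkg_free_workfn(struct work_struct *work)
        {
                struct blkcg_gq *blkg = container_of(work, struct blkcg_gq,
                                                     free_work);

                __blkg_free(blkg);  /* the old free path, now allowed to sleep */
        }

        /*
         * May be called in atomic context (under a spinlock or from an RCU
         * callback), so only queue the real work instead of freeing inline.
         */
        static void blkg_free(struct blkcg_gq *blkg)
        {
                if (!blkg)
                        return;

                INIT_WORK(&blkg->free_work, blkg_free_workfn);
                schedule_work(&blkg->free_work);
        }
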
    • fs: allocate inode by using alloc_inode_sb() · fd60b288
      Muchun Song authored
      Inode allocation is supposed to use alloc_inode_sb(), so convert the
      kmem_cache_alloc() calls in all filesystems to alloc_inode_sb().
      
      Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Acked-by: Theodore Ts'o <tytso@mit.edu>		[ext4]
      Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Alex Shi <alexs@kernel.org>
      Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Cc: Chao Yu <chao@kernel.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Fam Zheng <fam.zheng@bytedance.com>
      Cc: Jaegeuk Kim <jaegeuk@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kari Argillander <kari.argillander@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Qi Zheng <zhengqi.arch@bytedance.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Xiongchun Duan <duanxiongchun@bytedance.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
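
      The conversion is mechanical; ext4 shown as one representative example
      (alloc_inode_sb() wraps kmem_cache_alloc_lru(), so the inode is charged
      to the superblock's LRU-aware slab):

        static struct inode *ext4_alloc_inode(struct super_block *sb)
        {
                struct ext4_inode_info *ei;

                /* before: ei = kmem_cache_alloc(ext4_inode_cachep, GFP_NOFS); */
                ei = alloc_inode_sb(sb, ext4_inode_cachep, GFP_NOFS);
                if (!ei)
                        return NULL;
                /* ... remaining initialisation unchanged ... */
                return &ei->vfs_inode;
        }
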
    • block/bfq-iosched.c: use "false" rather than "BLK_RW_ASYNC" · f6bad159
      NeilBrown authored
      bfq_get_queue() expects a "bool" for the third arg, so pass "false"
      rather than "BLK_RW_ASYNC" which will soon be removed.
      
      Link: https://lkml.kernel.org/r/164549983746.9187.7949730109246767909.stgit@noble.brown
      Signed-off-by: NeilBrown <neilb@suse.de>
      Acked-by: Jens Axboe <axboe@kernel.dk>
      Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Cc: Chao Yu <chao@kernel.org>
      Cc: Darrick J. Wong <djwong@kernel.org>
      Cc: Ilya Dryomov <idryomov@gmail.com>
      Cc: Jaegeuk Kim <jaegeuk@kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jeff Layton <jlayton@kernel.org>
      Cc: Lars Ellenberg <lars.ellenberg@linbit.com>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Paolo Valente <paolo.valente@linaro.org>
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
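
      The change itself is a one-word substitution at the call site, roughly
      as below (the exact argument list is abridged here):

        /*
         * The third parameter of bfq_get_queue() is "bool is_sync", so pass
         * a real bool; BLK_RW_ASYNC merely happened to have the value 0.
         */
        bfqq = bfq_get_queue(bfqd, bio, false /* was BLK_RW_ASYNC */, bic, true);
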
  7. 18 March 2022, 4 commits
    • block: cancel all throttled bios in del_gendisk() · 8f9e7b65
      Yu Kuai authored
      Throttled bios can't be issued once del_gendisk() is done, so it's
      better to cancel them immediately rather than waiting for the
      throttling to complete.
      
      For example, if a user thread is throttled at a low bps while issuing
      large IO and the device is then deleted, the thread will wait a long
      time for the IO to return.
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20220318130144.1066064-4-ming.lei@redhat.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
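
      A simplified sketch of the cancellation path (THROTL_TG_CANCELING and
      blk_throtl_cancel_bios() come from this series; the loop body is
      reduced here, e.g. the disptime update is omitted):

        void blk_throtl_cancel_bios(struct request_queue *q)
        {
                struct cgroup_subsys_state *pos_css;
                struct blkcg_gq *blkg;

                spin_lock_irq(&q->queue_lock);
                blkg_for_each_descendant_post(blkg, pos_css, q->root_blkg) {
                        struct throtl_grp *tg = blkg_to_tg(blkg);

                        /*
                         * Mark the group so its timer keeps dispatching until
                         * the queued bios drain, then fire the timer at once.
                         */
                        tg->flags |= THROTL_TG_CANCELING;
                        throtl_schedule_pending_timer(&tg->service_queue,
                                                      jiffies + 1);
                }
                spin_unlock_irq(&q->queue_lock);
        }
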
    • block: let blkcg_gq grab request queue's refcnt · 0a9a25ca
      Ming Lei authored
      Throughout the lifetime of a blkcg_gq instance, ->q is referenced: for
      example, ->pd_free_fn() is called in blkg_free(), and throtl_pd_free()
      may still touch the request queue via &tg->service_queue.pending_timer,
      which is handled by throtl_pending_timer_fn(). So it is reasonable for
      the blkcg_gq instance to hold a reference on the request queue.
      
      Previously, blkcg_exit_queue() was called from blk_release_queue(),
      which made the use-after-free hard to avoid. But now that commit
      1059699f ("block: move blkcg initialization/destroy into disk
      allocation/release handler") has been merged into for-5.18/block, the
      issue can be fixed simply by grabbing the request queue's refcnt.
      Reported-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20220318130144.1066064-3-ming.lei@redhat.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
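
      A sketch of the refcount pairing (allocation side shown; error handling
      is abbreviated, and the matching blk_put_queue() sits in the free path):

        static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg,
                                           struct request_queue *q,
                                           gfp_t gfp_mask)
        {
                struct blkcg_gq *blkg;

                blkg = kzalloc_node(sizeof(*blkg), gfp_mask, q->node);
                if (!blkg)
                        return NULL;

                /*
                 * Hold the queue for the blkg's whole lifetime so that
                 * ->pd_free_fn() and pending timers can still dereference
                 * ->q safely; blkg_free() ends with blk_put_queue(blkg->q).
                 */
                if (!blk_get_queue(q)) {
                        kfree(blkg);
                        return NULL;
                }
                blkg->q = q;
                /* ... remaining initialisation elided ... */
                return blkg;
        }
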
    • block: avoid use-after-free on throttle data · ee37eddb
      Ming Lei authored
      In throtl_pending_timer_fn(), the request queue is retrieved from the
      throttle data. The tg's pending timer is deleted synchronously when the
      associated blkg is released, but by then the throttle data may already
      have been freed, since commit 1059699f ("block: move blkcg
      initialization/destroy into disk allocation/release handler") moved the
      freeing of q->td from blk_release_queue() to disk_release(). So a
      use-after-free on q->td can be triggered in throtl_pending_timer_fn().
      
      Fix the issue by:
      
      - doing nothing when the disk has been released and there isn't any
        bio to dispatch
      
      - retrieving the request queue from the blkg instead of the throttle
        data for non-top-level pending timers.
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20220318130144.1066064-2-ming.lei@redhat.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
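
      A fragment of throtl_pending_timer_fn() reflecting the two points above
      (slightly reduced):

        struct throtl_grp *tg = sq_to_tg(sq);
        struct throtl_data *td = sq_to_td(sq);
        struct request_queue *q;

        /*
         * q->td may already be freed by disk_release(), so derive the queue
         * from the blkg for non-top-level timers.
         */
        if (tg)
                q = tg->pd.blkg->q;
        else
                q = td->queue;

        spin_lock_irq(&q->queue_lock);
        /* Disk already released and nothing left to dispatch: bail out. */
        if (!q->root_blkg)
                goto out_unlock;
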
    • block: limit request dispatch loop duration · 572299f0
      Shin'ichiro Kawasaki authored
      When IO requests are issued continuously and the target block device
      handles them faster than they arrive, the request dispatch loop keeps
      repeating for a very long time, more than a minute, to dispatch the
      arriving requests. Since the loop runs as a workqueue worker task, the
      very long loop duration triggers the workqueue watchdog timeout and a
      BUG [1].
      
      To avoid such long loop durations, break the loop periodically. While
      opportunities to dispatch requests still exist, check need_resched():
      if it returns true, the dispatch loop has already consumed its time
      slice, so reschedule the dispatch work and break the loop. Under heavy
      IO load, need_resched() may not return true for 20~30 seconds. To cover
      that case, also track the time spent in the dispatch loop with jiffies;
      if more than 1 second has elapsed, reschedule the dispatch work and
      break the loop.
      
      [1]
      
      [  609.691437] BUG: workqueue lockup - pool cpus=10 node=1 flags=0x0 nice=-20 stuck for 35s!
      [  609.701820] Showing busy workqueues and worker pools:
      [  609.707915] workqueue events: flags=0x0
      [  609.712615]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
      [  609.712626]     pending: drm_fb_helper_damage_work [drm_kms_helper]
      [  609.712687] workqueue events_freezable: flags=0x4
      [  609.732943]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
      [  609.732952]     pending: pci_pme_list_scan
      [  609.732968] workqueue events_power_efficient: flags=0x80
      [  609.751947]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
      [  609.751955]     pending: neigh_managed_work
      [  609.752018] workqueue kblockd: flags=0x18
      [  609.769480]   pwq 21: cpus=10 node=1 flags=0x0 nice=-20 active=3/256 refcnt=4
      [  609.769488]     in-flight: 1020:blk_mq_run_work_fn
      [  609.769498]     pending: blk_mq_timeout_work, blk_mq_run_work_fn
      [  609.769744] pool 21: cpus=10 node=1 flags=0x0 nice=-20 hung=35s workers=2 idle: 67
      [  639.899730] BUG: workqueue lockup - pool cpus=10 node=1 flags=0x0 nice=-20 stuck for 66s!
      [  639.909513] Showing busy workqueues and worker pools:
      [  639.915404] workqueue events: flags=0x0
      [  639.920197]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 refcnt=2
      [  639.920215]     pending: drm_fb_helper_damage_work [drm_kms_helper]
      [  639.920365] workqueue kblockd: flags=0x18
      [  639.939932]   pwq 21: cpus=10 node=1 flags=0x0 nice=-20 active=3/256 refcnt=4
      [  639.939942]     in-flight: 1020:blk_mq_run_work_fn
      [  639.939955]     pending: blk_mq_timeout_work, blk_mq_run_work_fn
      [  639.940212] pool 21: cpus=10 node=1 flags=0x0 nice=-20 hung=66s workers=2 idle: 67
      
      Fixes: 6e6fcbc2 ("blk-mq: support batching dispatch in case of io")
      Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Cc: stable@vger.kernel.org # v5.10+
      Link: https://lore.kernel.org/linux-block/20220310091649.zypaem5lkyfadymg@shindev/
      Link: https://lore.kernel.org/r/20220318022641.133484-1-shinichiro.kawasaki@wdc.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
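
      The resulting loop looks essentially like this (a sketch following the
      description above; __blk_mq_do_dispatch_sched() returns 1 while more
      requests may still be dispatched):

        static int blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
        {
                unsigned long end = jiffies + HZ;  /* cap one pass at ~1 second */
                int ret;

                do {
                        ret = __blk_mq_do_dispatch_sched(hctx);
                        if (ret != 1)
                                break;
                        if (need_resched() || time_is_before_jiffies(end)) {
                                /*
                                 * Time slice consumed or 1 second elapsed:
                                 * hand the rest back to kblockd and yield.
                                 */
                                blk_mq_delay_run_hw_queue(hctx, 0);
                                break;
                        }
                } while (1);

                return ret;
        }
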
  8. 17 March 2022, 1 commit
  9. 16 March 2022, 1 commit
  10. 15 March 2022, 4 commits
    • fs: Turn block_invalidatepage into block_invalidate_folio · 7ba13abb
      Matthew Wilcox (Oracle) authored
      Remove the special-casing of a NULL ->invalidatepage, since
      block_invalidatepage no longer exists.
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
      Tested-by: David Howells <dhowells@redhat.com> # afs
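
      For a block-backed filesystem the conversion amounts to switching the
      aops entry, roughly as below ("examplefs" is a placeholder name):

        #include <linux/buffer_head.h>

        static const struct address_space_operations examplefs_aops = {
                /* was: .invalidatepage = block_invalidatepage, */
                .invalidate_folio = block_invalidate_folio,
                /* ... other operations unchanged ... */
        };
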
    • block: don't merge across cgroup boundaries if blkcg is enabled · 6b2b0459
      Tejun Heo authored
      blk-iocost and iolatency are cgroup-aware rq-qos policies, but they
      didn't disable merges across different cgroups. This can obviously lead
      to accounting and control errors, but more importantly to priority
      inversions - e.g. an IO which belongs to a higher-priority cgroup or IO
      class may end up getting throttled incorrectly because it gets merged
      with an IO issued from a low-priority cgroup.
      
      Fix it by adding blk_cgroup_mergeable() which is called from merge paths and
      rejects cross-cgroup and cross-issue_as_root merges.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: d7067512 ("block: introduce blk-iolatency io controller")
      Cc: stable@vger.kernel.org # v4.19+
      Cc: Josef Bacik <jbacik@fb.com>
      Link: https://lore.kernel.org/r/Yi/eE/6zFNyWJ+qd@slm.duckdns.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
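
      The helper boils down to comparing the blkg and the issue_as_root state
      of the two bios; a sketch consistent with the description:

        static inline bool blk_cgroup_mergeable(struct request *rq,
                                                struct bio *bio)
        {
                /*
                 * Merge only when both bios belong to the same blkcg_gq and
                 * agree on whether they are issued as the root blkg, so
                 * rq-qos policies keep accounting and prioritising correctly.
                 */
                return rq->bio->bi_blkg == bio->bi_blkg &&
                       bio_issue_as_root_blkg(rq->bio) ==
                       bio_issue_as_root_blkg(bio);
        }
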
    • block: fix rq-qos breakage from skipping rq_qos_done_bio() · aa1b46dc
      Tejun Heo authored
      a647a524 ("block: don't call rq_qos_ops->done_bio if the bio isn't
      tracked") made bio_endio() skip rq_qos_done_bio() if BIO_TRACKED is not set.
      While this fixed a potential oops, it also broke blk-iocost by skipping the
      done_bio callback for merged bios.
      
      Before, whether a bio goes through rq_qos_throttle() or rq_qos_merge(),
      rq_qos_done_bio() would be called on the bio on completion with BIO_TRACKED
      distinguishing the former from the latter. rq_qos_done_bio() is not called
      for bios which wenth through rq_qos_merge(). This royally confuses
      blk-iocost as the merged bios never finish and are considered perpetually
      in-flight.
      
      One reliably reproducible failure mode is an intermediate cgroup
      getting stuck active, preventing its children from being activated due
      to the leaf-only rule and leading to loss of control. The following is
      from a resctl-bench protection scenario, which emulates isolating a web
      server-like workload from a memory bomb, run on an iocost configuration
      which should yield a reasonable level of protection.
      
        # cat /sys/block/nvme2n1/device/model
        Samsung SSD 970 PRO 512GB
        # cat /sys/fs/cgroup/io.cost.model
        259:0 ctrl=user model=linear rbps=834913556 rseqiops=93622 rrandiops=102913 wbps=618985353 wseqiops=72325 wrandiops=71025
        # cat /sys/fs/cgroup/io.cost.qos
        259:0 enable=1 ctrl=user rpct=95.00 rlat=18776 wpct=95.00 wlat=8897 min=60.00 max=100.00
        # resctl-bench -m 29.6G -r out.json run protection::scenario=mem-hog,loops=1
        ...
        Memory Hog Summary
        ==================
      
        IO Latency: R p50=242u:336u/2.5m p90=794u:1.4m/7.5m p99=2.7m:8.0m/62.5m max=8.0m:36.4m/350m
                    W p50=221u:323u/1.5m p90=709u:1.2m/5.5m p99=1.5m:2.5m/9.5m max=6.9m:35.9m/350m
      
        Isolation and Request Latency Impact Distributions:
      
                      min   p01   p05   p10   p25   p50   p75   p90   p95   p99   max  mean stdev
        isol%       15.90 15.90 15.90 40.05 57.24 59.07 60.01 74.63 74.63 90.35 90.35 58.12 15.82
        lat-imp%        0     0     0     0     0  4.55 14.68 15.54 233.5 548.1 548.1 53.88 143.6
      
        Result: isol=58.12:15.82% lat_imp=53.88%:143.6 work_csv=100.0% missing=3.96%
      
      The isolation result of 58.12% is close to what this device would show
      without any IO control.
      
      Fix it by introducing a new flag BIO_QOS_MERGED to mark merged bios and
      calling rq_qos_done_bio() on them too. For consistency and clarity, rename
      BIO_TRACKED to BIO_QOS_THROTTLED. The flag checks are moved into
      rq_qos_done_bio() so that it's next to the code paths that set the flags.
      
      With the patch applied, the above same benchmark shows:
      
        # resctl-bench -m 29.6G -r out.json run protection::scenario=mem-hog,loops=1
        ...
        Memory Hog Summary
        ==================
      
        IO Latency: R p50=123u:84.4u/985u p90=322u:256u/2.5m p99=1.6m:1.4m/9.5m max=11.1m:36.0m/350m
                    W p50=429u:274u/995u p90=1.7m:1.3m/4.5m p99=3.4m:2.7m/11.5m max=7.9m:5.9m/26.5m
      
        Isolation and Request Latency Impact Distributions:
      
                      min   p01   p05   p10   p25   p50   p75   p90   p95   p99   max  mean stdev
        isol%       84.91 84.91 89.51 90.73 92.31 94.49 96.36 98.04 98.71 100.0 100.0 94.42  2.81
        lat-imp%        0     0     0     0     0  2.81  5.73 11.11 13.92 17.53 22.61  4.10  4.68
      
        Result: isol=94.42:2.81% lat_imp=4.10%:4.68 work_csv=58.34% missing=0%
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: a647a524 ("block: don't call rq_qos_ops->done_bio if the bio isn't tracked")
      Cc: stable@vger.kernel.org # v5.15+
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/Yi7rdrzQEHjJLGKB@slm.duckdns.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
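
      After the patch the flag handling lives next to the paths that set the
      flags; a sketch of the relevant pair of helpers:

        static inline void rq_qos_merge(struct request_queue *q,
                                        struct request *rq, struct bio *bio)
        {
                if (q->rq_qos) {
                        bio_set_flag(bio, BIO_QOS_MERGED); /* mark merged bios too */
                        __rq_qos_merge(q->rq_qos, rq, bio);
                }
        }

        static inline void rq_qos_done_bio(struct request_queue *q,
                                           struct bio *bio)
        {
                /*
                 * Both throttled and merged bios must reach ->done_bio,
                 * otherwise blk-iocost counts merged bios as forever
                 * in-flight.
                 */
                if (q->rq_qos && (bio_flagged(bio, BIO_QOS_THROTTLED) ||
                                  bio_flagged(bio, BIO_QOS_MERGED)))
                        __rq_qos_done_bio(q->rq_qos, bio);
        }
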
    • block: release rq qos structures for queue without disk · daaca352
      Ming Lei authored
      blkcg_init_queue() may add rq-qos structures to a request queue.
      Previously, blk_cleanup_queue() called rq_qos_exit() to release them,
      but commit 8e141f9e ("block: drain file system I/O on del_gendisk")
      moved rq_qos_exit() into del_gendisk(), so a memory leak is caused for
      queues that never get a disk, such as not-present SCSI LUNs, the NVMe
      admin queue, ...
      
      Fix the issue by adding rq_qos_exit() back to blk_cleanup_queue().
      
      Note that v5.18 won't need this patch any more, since
      blkcg_init_queue()/blkcg_exit_queue() have been moved into the disk
      allocation/release handlers, and those patches are already in
      for-5.18/block.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: stable@vger.kernel.org
      Fixes: 8e141f9e ("block: drain file system I/O on del_gendisk")
      Reported-by: syzbot+b42749a851a47a0f581b@syzkaller.appspotmail.com
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20220314043018.177141-1-ming.lei@redhat.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
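
      The restored call is a one-liner in the teardown path, roughly as
      below (surrounding teardown abbreviated):

        void blk_cleanup_queue(struct request_queue *q)
        {
                /* ... earlier teardown ... */

                /*
                 * Queues that never got a disk don't pass through
                 * del_gendisk(), so release any rq-qos structures here as
                 * well; for queues that did, the rqos list is already empty
                 * and this is a no-op.
                 */
                rq_qos_exit(q);

                /* ... rest of teardown ... */
        }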