1. 08 3月, 2023 2 次提交
    • L
      block: Fix kabi broken by "block: split .sysfs_lock into two locks" · 5974d33c
      Li Lingfeng 提交于
      hulk inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I6HOKY
      CVE: NA
      
      --------------------------------
      
      commit 92567b1d6f5a ("block: split .sysfs_lock into two locks") introduce
      new member 'sysfs_dir_lock' in struct request_queue, thus move it to
      request_queue_wrapper to avoid kabi broken.
      
      Fixes: 92567b1d6f5a ("block: split .sysfs_lock into two locks")
      Signed-off-by: NLi Lingfeng <lilingfeng3@huawei.com>
      Reviewed-by: NHou Tao <houtao1@huawei.com>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      5974d33c
    • M
      block: split .sysfs_lock into two locks · e5264f98
      Ming Lei 提交于
      mainline inclusion
      from mainline-5.4-rc1
      commit cecf5d87
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I6HOKY
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.2&id=cecf5d87ff2035127bb5a9ee054d0023a4a7cad3
      
      ---------------------------
      
      The kernfs built-in lock of 'kn->count' is held in sysfs .show/.store
      path. Meantime, inside block's .show/.store callback, q->sysfs_lock is
      required.
      
      However, when mq & iosched kobjects are removed via
      blk_mq_unregister_dev() & elv_unregister_queue(), q->sysfs_lock is held
      too. This way causes AB-BA lock because the kernfs built-in lock of
      'kn-count' is required inside kobject_del() too, see the lockdep warning[1].
      
      On the other hand, it isn't necessary to acquire q->sysfs_lock for
      both blk_mq_unregister_dev() & elv_unregister_queue() because
      clearing REGISTERED flag prevents storing to 'queue/scheduler'
      from being happened. Also sysfs write(store) is exclusive, so no
      necessary to hold the lock for elv_unregister_queue() when it is
      called in switching elevator path.
      
      So split .sysfs_lock into two: one is still named as .sysfs_lock for
      covering sync .store, the other one is named as .sysfs_dir_lock
      for covering kobjects and related status change.
      
      sysfs itself can handle the race between add/remove kobjects and
      showing/storing attributes under kobjects. For switching scheduler
      via storing to 'queue/scheduler', we use the queue flag of
      QUEUE_FLAG_REGISTERED with .sysfs_lock for avoiding the race, then
      we can avoid to hold .sysfs_lock during removing/adding kobjects.
      
      [1]  lockdep warning
          ======================================================
          WARNING: possible circular locking dependency detected
          5.3.0-rc3-00044-g73277fc75ea0 #1380 Not tainted
          ------------------------------------------------------
          rmmod/777 is trying to acquire lock:
          00000000ac50e981 (kn->count#202){++++}, at: kernfs_remove_by_name_ns+0x59/0x72
      
          but task is already holding lock:
          00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10b
      
          which lock already depends on the new lock.
      
          the existing dependency chain (in reverse order) is:
      
          -> #1 (&q->sysfs_lock){+.+.}:
                 __lock_acquire+0x95f/0xa2f
                 lock_acquire+0x1b4/0x1e8
                 __mutex_lock+0x14a/0xa9b
                 blk_mq_hw_sysfs_show+0x63/0xb6
                 sysfs_kf_seq_show+0x11f/0x196
                 seq_read+0x2cd/0x5f2
                 vfs_read+0xc7/0x18c
                 ksys_read+0xc4/0x13e
                 do_syscall_64+0xa7/0x295
                 entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
          -> #0 (kn->count#202){++++}:
                 check_prev_add+0x5d2/0xc45
                 validate_chain+0xed3/0xf94
                 __lock_acquire+0x95f/0xa2f
                 lock_acquire+0x1b4/0x1e8
                 __kernfs_remove+0x237/0x40b
                 kernfs_remove_by_name_ns+0x59/0x72
                 remove_files+0x61/0x96
                 sysfs_remove_group+0x81/0xa4
                 sysfs_remove_groups+0x3b/0x44
                 kobject_del+0x44/0x94
                 blk_mq_unregister_dev+0x83/0xdd
                 blk_unregister_queue+0xa0/0x10b
                 del_gendisk+0x259/0x3fa
                 null_del_dev+0x8b/0x1c3 [null_blk]
                 null_exit+0x5c/0x95 [null_blk]
                 __se_sys_delete_module+0x204/0x337
                 do_syscall_64+0xa7/0x295
                 entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
          other info that might help us debug this:
      
           Possible unsafe locking scenario:
      
                 CPU0                    CPU1
                 ----                    ----
            lock(&q->sysfs_lock);
                                         lock(kn->count#202);
                                         lock(&q->sysfs_lock);
            lock(kn->count#202);
      
           *** DEADLOCK ***
      
          2 locks held by rmmod/777:
           #0: 00000000e69bd9de (&lock){+.+.}, at: null_exit+0x2e/0x95 [null_blk]
           #1: 00000000fb16ae21 (&q->sysfs_lock){+.+.}, at: blk_unregister_queue+0x78/0x10b
      
          stack backtrace:
          CPU: 0 PID: 777 Comm: rmmod Not tainted 5.3.0-rc3-00044-g73277fc75ea0 #1380
          Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS ?-20180724_192412-buildhw-07.phx4
          Call Trace:
           dump_stack+0x9a/0xe6
           check_noncircular+0x207/0x251
           ? print_circular_bug+0x32a/0x32a
           ? find_usage_backwards+0x84/0xb0
           check_prev_add+0x5d2/0xc45
           validate_chain+0xed3/0xf94
           ? check_prev_add+0xc45/0xc45
           ? mark_lock+0x11b/0x804
           ? check_usage_forwards+0x1ca/0x1ca
           __lock_acquire+0x95f/0xa2f
           lock_acquire+0x1b4/0x1e8
           ? kernfs_remove_by_name_ns+0x59/0x72
           __kernfs_remove+0x237/0x40b
           ? kernfs_remove_by_name_ns+0x59/0x72
           ? kernfs_next_descendant_post+0x7d/0x7d
           ? strlen+0x10/0x23
           ? strcmp+0x22/0x44
           kernfs_remove_by_name_ns+0x59/0x72
           remove_files+0x61/0x96
           sysfs_remove_group+0x81/0xa4
           sysfs_remove_groups+0x3b/0x44
           kobject_del+0x44/0x94
           blk_mq_unregister_dev+0x83/0xdd
           blk_unregister_queue+0xa0/0x10b
           del_gendisk+0x259/0x3fa
           ? disk_events_poll_msecs_store+0x12b/0x12b
           ? check_flags+0x1ea/0x204
           ? mark_held_locks+0x1f/0x7a
           null_del_dev+0x8b/0x1c3 [null_blk]
           null_exit+0x5c/0x95 [null_blk]
           __se_sys_delete_module+0x204/0x337
           ? free_module+0x39f/0x39f
           ? blkcg_maybe_throttle_current+0x8a/0x718
           ? rwlock_bug+0x62/0x62
           ? __blkcg_punt_bio_submit+0xd0/0xd0
           ? trace_hardirqs_on_thunk+0x1a/0x20
           ? mark_held_locks+0x1f/0x7a
           ? do_syscall_64+0x4c/0x295
           do_syscall_64+0xa7/0x295
           entry_SYSCALL_64_after_hwframe+0x49/0xbe
          RIP: 0033:0x7fb696cdbe6b
          Code: 73 01 c3 48 8b 0d 1d 20 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 008
          RSP: 002b:00007ffec9588788 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
          RAX: ffffffffffffffda RBX: 0000559e589137c0 RCX: 00007fb696cdbe6b
          RDX: 000000000000000a RSI: 0000000000000800 RDI: 0000559e58913828
          RBP: 0000000000000000 R08: 00007ffec9587701 R09: 0000000000000000
          R10: 00007fb696d4eae0 R11: 0000000000000206 R12: 00007ffec95889b0
          R13: 00007ffec95896b3 R14: 0000559e58913260 R15: 0000559e589137c0
      
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Reviewed-by: NBart Van Assche <bvanassche@acm.org>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      
      Conflicts:
        block/blk.h
        block/elevator.c
        block/blk-sysfs.c
        block/blk-core.c
      Signed-off-by: NLi Lingfeng <lilingfeng3@huawei.com>
      Reviewed-by: NHou Tao <houtao1@huawei.com>
      Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
      e5264f98
  2. 09 10月, 2022 3 次提交
  3. 09 3月, 2022 2 次提交
  4. 17 1月, 2022 1 次提交
  5. 09 12月, 2021 1 次提交
  6. 30 8月, 2021 1 次提交
  7. 16 8月, 2021 2 次提交
    • Y
      blk-mq: fix kabi broken by "blk-mq: fix hang caused by freeze/unfreeze sequence" · 2fd10d61
      Yu Kuai 提交于
      hulk inclusion
      category: bugfix
      bugzilla: 173119
      CVE: NA
      
      -----------------------------------------------
      
      Add struct request_queue_wrapper to avoid kabi broken.
      Signed-off-by: NYu Kuai <yukuai3@huawei.com>
      Reviewed-by: NHou Tao <houtao1@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      2fd10d61
    • B
      blk-mq: fix hang caused by freeze/unfreeze sequence · cbd9fe6e
      Bob Liu 提交于
      mainline inclusion
      from mainline-v5.2-rc2
      commit 7996a8b5
      category: bugfix
      bugzilla: 173119
      CVE: NA
      
      -----------------------------------------------
      
      The following is a description of a hang in blk_mq_freeze_queue_wait().
      The hang happens on attempt to freeze a queue while another task does
      queue unfreeze.
      
      The root cause is an incorrect sequence of percpu_ref_resurrect() and
      percpu_ref_kill() and as a result those two can be swapped:
      
       CPU#0                         CPU#1
       ----------------              -----------------
       q1 = blk_mq_init_queue(shared_tags)
      
                                      q2 = blk_mq_init_queue(shared_tags):
                                        blk_mq_add_queue_tag_set(shared_tags):
                                          blk_mq_update_tag_set_depth(shared_tags):
      				     list_for_each_entry()
                                            blk_mq_freeze_queue(q1)
                                             > percpu_ref_kill()
                                             > blk_mq_freeze_queue_wait()
      
       blk_cleanup_queue(q1)
        blk_mq_freeze_queue(q1)
         > percpu_ref_kill()
                       ^^^^^^ freeze_depth can't guarantee the order
      
                                            blk_mq_unfreeze_queue()
                                              > percpu_ref_resurrect()
      
         > blk_mq_freeze_queue_wait()
                       ^^^^^^ Hang here!!!!
      
      This wrong sequence raises kernel warning:
      percpu_ref_kill_and_confirm called more than once on blk_queue_usage_counter_release!
      WARNING: CPU: 0 PID: 11854 at lib/percpu-refcount.c:336 percpu_ref_kill_and_confirm+0x99/0xb0
      
      But the most unpleasant effect is a hang of a blk_mq_freeze_queue_wait(),
      which waits for a zero of a q_usage_counter, which never happens
      because percpu-ref was reinited (instead of being killed) and stays in
      PERCPU state forever.
      
      How to reproduce:
       - "insmod null_blk.ko shared_tags=1 nr_devices=0 queue_mode=2"
       - cpu0: python Script.py 0; taskset the corresponding process running on cpu0
       - cpu1: python Script.py 1; taskset the corresponding process running on cpu1
      
       Script.py:
       ------
       #!/usr/bin/python3
      
      import os
      import sys
      
      while True:
          on = "echo 1 > /sys/kernel/config/nullb/%s/power" % sys.argv[1]
          off = "echo 0 > /sys/kernel/config/nullb/%s/power" % sys.argv[1]
          os.system(on)
          os.system(off)
      ------
      
      This bug was first reported and fixed by Roman, previous discussion:
      [1] Message id: 1443287365-4244-7-git-send-email-akinobu.mita@gmail.com
      [2] Message id: 1443563240-29306-6-git-send-email-tj@kernel.org
      [3] https://patchwork.kernel.org/patch/9268199/Reviewed-by: NHannes Reinecke <hare@suse.com>
      Reviewed-by: NMing Lei <ming.lei@redhat.com>
      Reviewed-by: NBart Van Assche <bvanassche@acm.org>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NRoman Pen <roman.penyaev@profitbricks.com>
      Signed-off-by: NBob Liu <bob.liu@oracle.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NYu Kuai <yukuai3@huawei.com>
      Reviewed-by: NHou Tao <houtao1@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      cbd9fe6e
  8. 22 2月, 2021 1 次提交
    • Y
      scsi: do quiesce for enclosure driver · c3adeb60
      Yufen Yu 提交于
      hulk inclusion
      category: bugfix
      bugzilla: 46860
      
      --------------------------------
      
      Drivers (such as scsi enclosure) will not call blk_register_queue()
      to do initialize for request_queue. And we rely on driver self
      to deal with the race in that case when cleanup queue.
      
      But, some self-developed drivers cannot deal the race. To avoid
      null pointer reference as following, we do quiesce in kernel.
      
      [67760.308034] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
      ...
      [67760.308069] pc : blk_mq_do_dispatch_sched+0x94/0x130
      [67760.308072] lr : blk_mq_sched_dispatch_requests+0x128/0x1f0
      [67760.308072] sp : ffff0000b2bb3ca0
      [67760.308073] x29: ffff0000b2bb3ca0 x28: 0000000000000000
      [67760.308075] x27: 0000000000000000 x26: ffff00008128a000
      [67760.308076] x25: ffff8042b13e9700 x24: 0000000000000000
      [67760.308077] x23: ffff0000b2bb3cf8 x22: ffff8042b2976808
      [67760.308078] x21: ffff0000b2bb3d58 x20: ffff00008128a000
      [67760.308079] x19: ffff8042b2976800 x18: 0000000000000000
      [67760.308080] x17: 0000000000000000 x16: 0000000000000000
      [67760.308081] x15: 0000000000000000 x14: 0000000000000000
      [67760.308082] x13: 0000000000000000 x12: 0000000000000000
      [67760.308084] x11: ffff800040801004 x10: ffff80004080100c
      [67760.308085] x9 : 0000000000000060 x8 : ffff809e58f7e500
      [67760.308087] x7 : 0000000000000000 x6 : 00000000ffffffff
      [67760.308088] x5 : ffff000080ab7550 x4 : ffff80418e84ec80
      [67760.308089] x3 : ffff0000815d7f10 x2 : 94e133dc71839c00
      [67760.308090] x1 : 0000000000000000 x0 : ffff00008128a748
      [67760.308091] Call trace:
      [67760.308093]  blk_mq_do_dispatch_sched+0x94/0x130
      [67760.308095]  blk_mq_sched_dispatch_requests+0x128/0x1f0
      [67760.308096]  __blk_mq_run_hw_queue+0x98/0x138
      [67760.308097]  blk_mq_run_work_fn+0x28/0x38
      Signed-off-by: NYufen Yu <yuyufen@huawei.com>
      Reviewed-by: NJason Yan <yanaijie@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: NCheng Jian <cj.chengjian@huawei.com>
      c3adeb60
  9. 12 3月, 2020 1 次提交
  10. 27 12月, 2019 4 次提交
  11. 12 9月, 2018 1 次提交
  12. 09 8月, 2018 1 次提交
    • B
      block: Remove two superfluous #include directives · b1f4267c
      Bart Van Assche 提交于
      Commit 12f5b931 ("blk-mq: Remove generation seqeunce") removed the
      only seqcount_t and u64_stats_sync instances from <linux/blkdev.h> but
      did not remove the corresponding #include directives. Since these
      include directives are no longer needed, remove them.
      Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
      Cc: Hannes Reinecke <hare@suse.com>,
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      b1f4267c
  13. 27 7月, 2018 1 次提交
  14. 18 7月, 2018 1 次提交
    • T
      block: make bdev_ops->rw_page() take a REQ_OP instead of bool · 3f289dcb
      Tejun Heo 提交于
      c11f0c0b ("block/mm: make bdev_ops->rw_page() take a bool for
      read/write") replaced @OP with boolean @is_write, which limited the
      amount of information going into ->rw_page() and more importantly
      page_endio(), which removed the need to expose block internals to mm.
      
      Unfortunately, we want to track discards separately and @is_write
      isn't enough information.  This patch updates bdev_ops->rw_page() to
      take REQ_OP instead but leaves page_endio() to take bool @is_write.
      This allows the block part of operations to have enough information
      while not leaking it to mm.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      3f289dcb
  15. 13 7月, 2018 1 次提交
  16. 09 7月, 2018 5 次提交
  17. 27 6月, 2018 1 次提交
  18. 15 6月, 2018 1 次提交
  19. 14 6月, 2018 1 次提交
  20. 31 5月, 2018 1 次提交
  21. 29 5月, 2018 5 次提交
  22. 14 5月, 2018 1 次提交
  23. 09 5月, 2018 2 次提交