1. 07 Jun, 2023 7 commits
  2. 06 Jun, 2023 4 commits
  3. 05 Jun, 2023 1 commit
  4. 03 Jun, 2023 25 commits
    • md/raid10: fix incorrect done of recovery · 304e8d84
      Li Nan committed
      hulk inclusion
      category: bugfix
      bugzilla: 188535, https://gitee.com/openeuler/kernel/issues/I6O61Q
      CVE: NA
      
      --------------------------------
      
      If there are bad blocks, recovery in raid10_sync_request() goes to giveup
      and increments chunks_skipped, and it returns max_sector once
      chunks_skipped >= geo.raid_disks. At that point recovery has failed and
      the data is inconsistent, but the user thinks recovery is done, which is
      wrong.

      Fix it by setting the mirror's recovery_disabled; a spare device
      shouldn't be added here.
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      (cherry picked from commit b0ac58c9)
    • md/raid10: fix null-ptr-deref in raid10_sync_request · 94831546
      Li Nan committed
      hulk inclusion
      category: bugfix
      bugzilla: 188378, https://gitee.com/openeuler/kernel/issues/I6GGV7
      CVE: NA
      
      --------------------------------
      
      init_resync() initializes the mempool and sets conf->have_replacement at
      the beginning of sync; close_sync() frees the mempool when sync is
      completed.

      After commit 7e83ccbe ("md/raid10: Allow skipping recovery when clean
      arrays are assembled"), recovery might be skipped, in which case
      init_resync() is called but close_sync() is not. A null-ptr-deref occurs
      as below:
        1) Create an array and wait for resync to complete;
           mddev->recovery_cp is set to MaxSector.
        2) Recovery is woken and is skipped. conf->have_replacement is set to
           0 in init_resync(). close_sync() is not called.
        3) Some io errors occur and rdev A is set to WantReplacement.
        4) A new device is added and set as A's replacement.
        5) Recovery is woken. A has a replacement, but conf->have_replacement
           is 0, so r10bio->devs[i].repl_bio is not allocated and a
           null-ptr-deref occurs.

      Fix it by not calling init_resync() if recovery is skipped.
      
      Fixes: 7e83ccbe ("md/raid10: Allow skipping recovery when clean arrays are assembled")
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      (cherry picked from commit 2de30b8f)
    • block/badblocks: fix badblocks loss when badblocks combine · cbc212e5
      Li Nan committed
      hulk inclusion
      category: bugfix
      bugzilla: 188569, https://gitee.com/openeuler/kernel/issues/I6ZG5B
      CVE: NA
      
      --------------------------------
      
      Badblocks are lost if we set them as below:
      
        # echo 1 1 > bad_blocks
        # echo 3 1 > bad_blocks
        # echo 1 5 > bad_blocks
        # cat bad_blocks
          1 3
      
      In badblocks_set(), we combine badblocks if there is an intersection
      between p[lo] and p[hi]. Currently the end of the combined badblocks is
      p[hi]'s end, but p[lo] may extend past p[hi], so the new end should be
      the larger of the two ends.
        lo: |------------------------|
        hi:		|--------|
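
The effect of taking the larger end can be sketched in userspace. This is a minimal Python model, not the kernel code: ranges are plain `(start, length)` tuples rather than the kernel's packed entries, and `merge_buggy` / `merge_fixed` are hypothetical names standing in for the combine step described above.

```python
# Toy model of combining two overlapping bad-block ranges.
# Ranges are (start, length) tuples; this is NOT the kernel's packed format.

def end(r):
    """Exclusive end sector of a (start, length) range."""
    return r[0] + r[1]

def merge_buggy(lo, hi):
    # Behaviour described as the bug: the merged end is always hi's end.
    return (lo[0], end(hi) - lo[0])

def merge_fixed(lo, hi):
    # Behaviour described as the fix: lo may extend past hi, so take the larger end.
    return (lo[0], max(end(lo), end(hi)) - lo[0])

# State after "echo 1 5": lo covers sectors 1..5, hi is still the old "3 1".
lo, hi = (1, 5), (3, 1)
print(merge_buggy(lo, hi))  # (1, 3), matching the "1 3" shown by cat
print(merge_fixed(lo, hi))  # (1, 5)
```

When `hi` ends after `lo`, both variants agree, so only the "lo crosses hi" case drawn above changes.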
      
      Fixes: 9e0e252a ("badblocks: Add core badblock management code")
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      (cherry picked from commit e35a7762)
    • block/badblocks: fix the bug of reverse order · f4b34d10
      Li Nan committed
      hulk inclusion
      category: bugfix
      bugzilla: 188569, https://gitee.com/openeuler/kernel/issues/I6ZG5B
      CVE: NA
      
      --------------------------------
      
      The order of badblocks is reversed if we set a large area at once.
      Leaving 'hi' unchanged while adding consecutive badblocks is wrong: each
      following range is greater than the one at 'hi', so it should be added at
      the next position. Increment 'hi' by 1 on each cycle.
      
        # echo 0 2048 > bad_blocks
        # cat bad_blocks
          1536 512
          1024 512
          512 512
          0 512
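
A minimal Python sketch of why the order flips, assuming the table starts empty and each piece is inserted at index `hi`. The function name `set_range` and the `advance_hi` switch are hypothetical; 512 is the maximum entry length seen in the output above.

```python
# Toy model of adding one large bad range in BB_MAX_LEN-sized pieces.
BB_MAX_LEN = 512  # maximum length of one table entry, as in the output above

def set_range(start, sectors, advance_hi):
    table = []   # starts empty; entries are (start, length)
    hi = 0       # insertion position for the next piece
    while sectors > 0:
        this_len = min(sectors, BB_MAX_LEN)
        table.insert(hi, (start, this_len))
        start += this_len
        sectors -= this_len
        if advance_hi:
            hi += 1  # the fix: the next piece belongs after this one
    return table

print(set_range(0, 2048, advance_hi=False))
# [(1536, 512), (1024, 512), (512, 512), (0, 512)]
print(set_range(0, 2048, advance_hi=True))
# [(0, 512), (512, 512), (1024, 512), (1536, 512)]
```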
      
      Fixes: 9e0e252a ("badblocks: Add core badblock management code")
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      (cherry picked from commit f9a3eea0)
    • md: fix unexpected changes of return value in rdev_set_badblocks · 74720ee6
      Li Nan committed
      hulk inclusion
      category: bugfix
      bugzilla: 188569, https://gitee.com/openeuler/kernel/issues/I6XBZQ
      CVE: NA
      
      --------------------------------
      
      If setting any badblocks fails, we remove this rdev (set it to Faulty
      or set recovery_disabled). The previous patch "md/raid10: fix io hung in
      md_wait_for_blocked_rdev()" checks badblocks->changed instead of the
      return value in rdev_set_badblocks(), but the return value of this
      function also changed as a result, which is not what we expected.

      Keep the return value consistent with before.
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      (cherry picked from commit bebf3d97)
    • md/raid10: fix io hung in md_wait_for_blocked_rdev() · 1f407ca9
      Li Nan committed
      hulk inclusion
      category: bugfix
      bugzilla: 188569, https://gitee.com/openeuler/kernel/issues/I6XBZQ
      CVE: NA
      
      --------------------------------
      
      If badblocks are merged but bb->count is exceeded, badblocks_set()
      returns 1 and the merged badblocks become unacknowledged.
      rdev_set_badblocks() then does not set sb_flags or wake up
      mddev->thread, and io waiting in md_wait_for_blocked_rdev() hangs
      because BlockedBadBlocks may never be cleared.

      Fix it by checking badblocks->changed instead of the return value. This
      flag is set when badblocks change.
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      (cherry picked from commit c23e1cd1)
    • block: Only set bb->changed when badblocks changes · 219f6154
      Li Nan committed
      hulk inclusion
      category: bugfix
      bugzilla: 188569, https://gitee.com/openeuler/kernel/issues/I6XBZQ
      CVE: NA
      
      --------------------------------
      
      bb->changed and unacked_exist are set and badblocks_update_acked() is
      invoked even if no badblocks change in badblocks_set(). Only update
      them when badblocks change.
      
      Fixes: 9e0e252a ("badblocks: Add core badblock management code")
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      (cherry picked from commit 78cba163)
    • md/raid10: fix incorrect counting of rdev->nr_pending · 24ad8fdd
      Li Nan committed
      hulk inclusion
      category: bugfix
      bugzilla: 188605, https://gitee.com/openeuler/kernel/issues/I6ZJ3T
      CVE: NA
      
      --------------------------------
      
      We get rdev from mirrors.replacement twice in raid10_write_request().
      If the replacement changes between the two reads, we increase
      A->nr_pending but decrease B->nr_pending.
      
        T1 (write)	   T2 (remove)	    T3 (add)
                         raid10_remove_disk
      
        raid10_write_request
         rrdev = conf->mirrors[d].replacement; ->rdev A
         A nr_pending++
      
                          p->rdev = p->replacement; ->rdev A
                          p->replacement = NULL;
      
      				    //A is set to WantReplacement
                                          raid10_add_disk
      				     p->replacement = rdev; ->rdev B
      
         if blocked_rdev
          rdev = conf->mirrors[d].replacement; ->rdev B
          B nr_pending--
      
      Fix it by recording rdev in r10bio and getting rdev from r10bio afterwards.
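
The counting mismatch can be reproduced with a small single-threaded Python model. Everything here is illustrative: `Rdev`, the `slot` dict, and the `swap` callback (which stands in for the concurrent remove and add in the diagram) are assumptions, not kernel structures.

```python
class Rdev:
    def __init__(self, name):
        self.name = name
        self.nr_pending = 0

def write_with_two_reads(slot, swap):
    """Pre-fix pattern: look the replacement up again before the decrement."""
    rrdev = slot["replacement"]
    rrdev.nr_pending += 1
    swap(slot)                            # concurrent remove + add happens here
    slot["replacement"].nr_pending -= 1   # second read may see a different rdev

def write_with_recorded_rdev(slot, swap):
    """Post-fix pattern: remember the rdev whose counter was incremented."""
    recorded = slot["replacement"]        # stands in for the pointer kept in r10bio
    recorded.nr_pending += 1
    swap(slot)
    recorded.nr_pending -= 1

def run(write):
    a, b = Rdev("A"), Rdev("B")
    slot = {"rdev": Rdev("old"), "replacement": a}

    def swap(s):
        s["rdev"] = s["replacement"]      # remove path: A becomes rdev
        s["replacement"] = b              # add path: B becomes the replacement

    write(slot, swap)
    return a.nr_pending, b.nr_pending

print(run(write_with_two_reads))      # (1, -1): A leaked, B underflowed
print(run(write_with_recorded_rdev))  # (0, 0)
```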
      
      Fixes: 475b0321 ("md/raid10: writes should get directed to replacement as well as original.")
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      (cherry picked from commit 7b3b8187)
    • md/raid10: remove WARN_ON_ONCE in raid10_end_write_request · 7599ee43
      Li Nan committed
      hulk inclusion
      category: bugfix
      bugzilla: 188605, https://gitee.com/openeuler/kernel/issues/I6GOYF
      CVE: NA
      
      --------------------------------
      
      Because of memory reordering, raid10_end_write_request() might read
      mirror->rdev first and mirror->replacement afterwards, so WARN_ON
      triggers if we remove a disk at the same time.
      
        T1 remove			T2 io end
        raid10_remove_disk		raid10_end_write_request
         p->rdev = NULL
      				 read rdev -> NULL
         smp_mb
         p->replacement = NULL
      				 read replacement -> NULL
      
      It is meaningless to compare rdev with mirror->rdev after we get it from
      r10_bio in raid10_end_write_request(). Remove this WARN_ON_ONCE.
      
      Fixes: 2ecf5e6ecbfd ("md/raid10: fix uaf if replacement replaces rdev")
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      (cherry picked from commit a3ebeed7)
    • md/raid10: fix uaf if replacement replaces rdev · a7cc3cf3
      Li Nan committed
      hulk inclusion
      category: bugfix
      bugzilla: 188377, https://gitee.com/openeuler/kernel/issues/I6GOYF
      CVE: NA
      
      --------------------------------
      
      After commit 4ca40c2c ("md/raid10: Allow replacement device to be
      replace old drive."), mirrors->replacement can replace rdev while the
      replacement's io is pending, and repl_bio will write to rdev (see
      raid10_write_one_disk()). raid10_end_write_request() then gets the wrong
      device from r10conf. In that case, r10_bio->devs[slot].repl_bio is put
      but not set to IO_MADE_GOOD, and it is put again later in
      raid_end_bio_io(), so a uaf occurs.

      Fix it by using r10_bio to record rdev. Handle io failure and the
      no-replacement case together, so there is no need to change repl.
      
        ==================================================================
        BUG: KASAN: use-after-free in bio_flagged include/linux/bio.h:238 [inline]
        BUG: KASAN: use-after-free in bio_put+0x78/0x80 block/bio.c:650
        Read of size 2 at addr ffff888116524dd4 by task md0_raid10/2618
      
        CPU: 0 PID: 2618 Comm: md0_raid10 Not tainted 5.10.0+ #3
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
        sd 0:0:0:0: rejecting I/O to offline device
        Call Trace:
         __dump_stack lib/dump_stack.c:77 [inline]
         dump_stack+0x107/0x167 lib/dump_stack.c:118
         print_address_description.constprop.0+0x1c/0x270 mm/kasan/report.c:390
         __kasan_report mm/kasan/report.c:550 [inline]
         kasan_report.cold+0x22/0x3a mm/kasan/report.c:567
         bio_flagged include/linux/bio.h:238 [inline]
         bio_put+0x78/0x80 block/bio.c:650
         put_all_bios drivers/md/raid10.c:248 [inline]
         free_r10bio drivers/md/raid10.c:257 [inline]
         raid_end_bio_io+0x3b5/0x590 drivers/md/raid10.c:309
         handle_write_completed drivers/md/raid10.c:2699 [inline]
         raid10d+0x2f85/0x5af0 drivers/md/raid10.c:2759
         md_thread+0x444/0x4b0 drivers/md/md.c:7932
         kthread+0x38c/0x470 kernel/kthread.c:313
         ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:299
      
        Allocated by task 1400:
         kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
         kasan_set_track mm/kasan/common.c:56 [inline]
         set_alloc_info mm/kasan/common.c:498 [inline]
         __kasan_kmalloc.constprop.0+0xb5/0xe0 mm/kasan/common.c:530
         slab_post_alloc_hook mm/slab.h:512 [inline]
         slab_alloc_node mm/slub.c:2923 [inline]
         slab_alloc mm/slub.c:2931 [inline]
         kmem_cache_alloc+0x144/0x360 mm/slub.c:2936
         mempool_alloc+0x146/0x360 mm/mempool.c:391
         bio_alloc_bioset+0x375/0x610 block/bio.c:486
         bio_clone_fast+0x20/0x50 block/bio.c:711
         raid10_write_one_disk+0x166/0xd30 drivers/md/raid10.c:1240
         raid10_write_request+0x1600/0x2c90 drivers/md/raid10.c:1484
         __make_request drivers/md/raid10.c:1508 [inline]
         raid10_make_request+0x376/0x620 drivers/md/raid10.c:1537
         md_handle_request+0x699/0x970 drivers/md/md.c:451
         md_submit_bio+0x204/0x400 drivers/md/md.c:489
         __submit_bio block/blk-core.c:959 [inline]
         __submit_bio_noacct block/blk-core.c:1007 [inline]
         submit_bio_noacct+0x2e3/0xcf0 block/blk-core.c:1086
         submit_bio+0x1a0/0x3a0 block/blk-core.c:1146
         submit_bh_wbc+0x685/0x8e0 fs/buffer.c:3053
         ext4_commit_super+0x37e/0x6c0 fs/ext4/super.c:5696
         flush_stashed_error_work+0x28b/0x400 fs/ext4/super.c:791
         process_one_work+0x9a6/0x1590 kernel/workqueue.c:2280
         worker_thread+0x61d/0x1310 kernel/workqueue.c:2426
         kthread+0x38c/0x470 kernel/kthread.c:313
         ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:299
      
        Freed by task 2618:
         kasan_save_stack+0x1b/0x40 mm/kasan/common.c:48
         kasan_set_track+0x1c/0x30 mm/kasan/common.c:56
         kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:361
         __kasan_slab_free+0x151/0x180 mm/kasan/common.c:482
         slab_free_hook mm/slub.c:1569 [inline]
         slab_free_freelist_hook+0xa9/0x180 mm/slub.c:1608
         slab_free mm/slub.c:3179 [inline]
         kmem_cache_free+0xcd/0x3d0 mm/slub.c:3196
         mempool_free+0xe3/0x3b0 mm/mempool.c:500
         bio_free+0xe2/0x140 block/bio.c:266
         bio_put+0x58/0x80 block/bio.c:651
         raid10_end_write_request+0x885/0xb60 drivers/md/raid10.c:516
         bio_endio+0x376/0x6a0 block/bio.c:1465
         req_bio_endio block/blk-core.c:289 [inline]
         blk_update_request+0x5f5/0xf40 block/blk-core.c:1525
         blk_mq_end_request+0x4c/0x510 block/blk-mq.c:654
         blk_flush_complete_seq+0x835/0xd80 block/blk-flush.c:204
         flush_end_io+0x7b7/0xb90 block/blk-flush.c:261
         __blk_mq_end_request+0x282/0x4c0 block/blk-mq.c:645
         scsi_end_request+0x3a8/0x850 drivers/scsi/scsi_lib.c:607
         scsi_io_completion+0x3f5/0x1320 drivers/scsi/scsi_lib.c:970
         scsi_softirq_done+0x11b/0x490 drivers/scsi/scsi_lib.c:1448
         blk_mq_complete_request block/blk-mq.c:788 [inline]
         blk_mq_complete_request+0x84/0xb0 block/blk-mq.c:785
         scsi_mq_done+0x155/0x360 drivers/scsi/scsi_lib.c:1603
         virtscsi_vq_done drivers/scsi/virtio_scsi.c:184 [inline]
         virtscsi_req_done+0x14c/0x220 drivers/scsi/virtio_scsi.c:199
         vring_interrupt drivers/virtio/virtio_ring.c:2061 [inline]
         vring_interrupt+0x27a/0x300 drivers/virtio/virtio_ring.c:2047
         __handle_irq_event_percpu+0x2f8/0x830 kernel/irq/handle.c:156
         handle_irq_event_percpu kernel/irq/handle.c:196 [inline]
         handle_irq_event+0x105/0x280 kernel/irq/handle.c:213
         handle_edge_irq+0x258/0xd20 kernel/irq/chip.c:828
         asm_call_irq_on_stack+0xf/0x20
         __run_irq_on_irqstack arch/x86/include/asm/irq_stack.h:48 [inline]
         run_irq_on_irqstack_cond arch/x86/include/asm/irq_stack.h:101 [inline]
         handle_irq arch/x86/kernel/irq.c:230 [inline]
         __common_interrupt arch/x86/kernel/irq.c:249 [inline]
         common_interrupt+0xe2/0x190 arch/x86/kernel/irq.c:239
         asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:626
      
      Fixes: 4ca40c2c ("md/raid10: Allow replacement device to be replace old drive.")
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      (cherry picked from commit af959500)
    • md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request · 02fd87d7
      Li Nan committed
      hulk inclusion
      category: bugfix
      bugzilla: 188527, https://gitee.com/openeuler/kernel/issues/I6O3HO
      CVE: NA
      
      --------------------------------
      
      need_replace is set to 1 if a non-Faulty mreplace exists, and mreplace
      is dereferenced later. However, the later check of mreplace might set
      mreplace to NULL, so a null-ptr-deref occurs if need_replace is 1 at
      that time.

      Fix it by merging the two checks into one.
      
      Fixes: ee37d731 ("md/raid10: Fix raid10 replace hang when new added disk faulty")
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      (cherry picked from commit 7718714e)
    • md/raid10: fix io loss while replacement replace rdev · f76a47d5
      Li Nan committed
      hulk inclusion
      category: bugfix
      bugzilla: 188787, https://gitee.com/openeuler/kernel/issues/I78YIW
      CVE: NA
      
      --------------------------------
      
      When we remove a disk which has a replacement, we first set rdev to
      NULL, then set rdev to replacement, and finally set replacement to NULL
      (see raid10_remove_disk()). If io is submitted at the same time, it
      might read both rdev and replacement as NULL, and the io will not be
      submitted.
      
        rdev -> NULL
                              read rdev
        replacement -> NULL
                              read replacement
      
      Fix it by reading replacement first and rdev afterwards, and use
      smp_mb() to prevent memory reordering.
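
Why the read order matters can be checked by brute force. The Python sketch below assumes sequentially consistent execution (the role smp_mb() plays here) and enumerates every interleaving of the three stores in the removal path with the two reads. The function names and the dict-based mirror are hypothetical.

```python
from itertools import combinations

def remove_steps():
    """The three stores of the removal path, in program order."""
    return [
        lambda m: m.update(rdev=None),
        lambda m: m.update(rdev=m["replacement"]),
        lambda m: m.update(replacement=None),
    ]

def can_see_both_null(read_order):
    """Try every interleaving of two reads with the three stores."""
    for read_slots in combinations(range(5), 2):
        mirror = {"rdev": "old", "replacement": "new"}
        stores = iter(remove_steps())
        reads = iter(read_order)
        seen = []
        for pos in range(5):
            if pos in read_slots:
                seen.append(mirror[next(reads)])
            else:
                next(stores)(mirror)
        if seen == [None, None]:
            return True
    return False

print(can_see_both_null(("rdev", "replacement")))   # True: io can be lost
print(can_see_both_null(("replacement", "rdev")))   # False
```

Reading replacement first works because it only becomes NULL after rdev has already been pointed at the former replacement.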
      
      Fixes: 475b0321 ("md/raid10: writes should get directed to replacement as well as original.")
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      (cherry picked from commit e8025850)
    • md/raid10: prioritize adding disk to 'removed' mirror · 9ddd479b
      Li Nan committed
      hulk inclusion
      category: bugfix
      bugzilla: 188804, https://gitee.com/openeuler/kernel/issues/I78YIS
      CVE: NA
      
      --------------------------------
      
      When a new disk is added to raid10, conf->mirrors is traversed from the
      start to find a mirror matching one of the following:
        1. mirror->rdev is set to WantReplacement and has no replacement:
           set the new disk as mirror->replacement.
        2. There is no rdev: set the new disk as mirror->rdev.

      Consider the array below (sda is set to WantReplacement):
      
          Number   Major   Minor   RaidDevice State
             0       8        0        0      active sync set-A   /dev/sda
             -       0        0        1      removed
             2       8       32        2      active sync set-A   /dev/sdc
             3       8       48        3      active sync set-B   /dev/sdd
      
      If 'mdadm --add' is used to add a new disk to this array, the new disk
      becomes sda's replacement instead of filling the removed position, which
      is confusing for users. Moreover, after the new disk recovers
      successfully, sda is set to Faulty.

      Prioritizing the 'removed' mirror when adding a disk is a better choice.
      In the above scenario, the behavior is the same as before, except that
      sda will not be deleted. Until other disks are added, continuing to use
      sda is more reliable.
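
The change in slot selection can be sketched as follows. This is a simplified Python model under stated assumptions: mirrors are dicts with hypothetical keys `rdev`, `want_replacement` and `replacement`, and `pick_slot` is an illustrative name, not a kernel function.

```python
def pick_slot(mirrors, prefer_removed):
    """Return (index, role) for a newly added disk, or None if no slot fits."""
    repl_slot = None
    for i, m in enumerate(mirrors):
        if m["rdev"] is None:
            return (i, "rdev")               # empty ('removed') slot
        if m["want_replacement"] and m["replacement"] is None:
            if not prefer_removed:
                return (i, "replacement")    # old behaviour: first match wins
            if repl_slot is None:
                repl_slot = i                # remember it, keep looking for a hole
    if repl_slot is not None:
        return (repl_slot, "replacement")
    return None

# The array from the example above: sda wants replacement, slot 1 is removed.
array = [
    {"rdev": "sda", "want_replacement": True, "replacement": None},
    {"rdev": None, "want_replacement": False, "replacement": None},
    {"rdev": "sdc", "want_replacement": False, "replacement": None},
    {"rdev": "sdd", "want_replacement": False, "replacement": None},
]
print(pick_slot(array, prefer_removed=False))  # (0, 'replacement')
print(pick_slot(array, prefer_removed=True))   # (1, 'rdev')
```
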
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      (cherry picked from commit 2e2e7ab6)
    • md: fix io loss when remove rdev fail · 37f812e2
      Li Nan committed
      hulk inclusion
      category: bugfix
      bugzilla: 188628, https://gitee.com/openeuler/kernel/issues/I71EKW
      CVE: NA
      
      --------------------------------
      
      In raid10_remove_disk(), we first set rdev to WantRemove and then check
      whether any io is pending; if so, we clear the flag and return busy.
      Io will be lost as below:
      
        raid10_remove_disk
         set WantRemove
      			write rdev
      			 if WantRemove
      			  do not submit io
         if rdev->nr_pending
          clear WantRemove
          return BUSY
      					read rdev
      					 get error data
      
      Fix it by calling md_error() on an rdev that has io pending while being
      removed. When the code reaches this point, the rdev will be removed
      later anyway, so setting it as Faulty has little impact.
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      (cherry picked from commit 894f89fa)
    • md/raid10: fix a race between removing rdev and access conf->mirrors[i].rdev · e31232eb
      Li Nan committed
      hulk inclusion
      category: bugfix
      bugzilla: 188533, https://gitee.com/openeuler/kernel/issues/I6O7YB
      CVE: NA
      
      --------------------------------
      
      Commit ceff49d9 ("md/raid1: fix a race between removing rdev and
      access conf->mirrors[i].rdev") fixed a null-ptr-deref in raid1. The
      same bug exists in raid10; fix it in the same way.

      There is no sync_thread running while removing rdev, so there is no
      need to check the flag in raid10_sync_request().
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      (cherry picked from commit 4461a62e)
    • md/raid10: fix task hung in raid10d · 1e2b11c4
      Li Nan committed
      hulk inclusion
      category: bugfix
      bugzilla: 188380, https://gitee.com/openeuler/kernel/issues/I6GISC
      CVE: NA
      
      --------------------------------
      
      Commit fe630de0 ("md/raid10: avoid deadlock on recovery.") allowed
      normal io and sync io to exist at the same time. A task hang occurs as
      below:
      
      T1                      T2		T3		T4
      raid10d
       handle_read_error
        allow_barrier
         conf->nr_pending--
          -> 0
                              //submit sync io
                              raid10_sync_request
                               raise_barrier
      			  ->will not be blocked
      			  ...
      			//submit to drivers
        raid10_read_request
         wait_barrier
          conf->nr_pending++
           -> 1
      					//retry read fail
      					raid10_end_read_request
      					 reschedule_retry
      					  add to retry_list
      					  conf->nr_queued++
      					   -> 1
      							//sync io fail
      							end_sync_read
      							 __end_sync_read
      							  reschedule_retry
      							   add to retry_list
      					                    conf->nr_queued++
      							     -> 2
       ...
       handle_read_error
        freeze_array
         wait nr_pending == nr_queued+1
              ->1	      ->3
         //task hung
      
      The retried read and the sync io are both added to retry_list
      (nr_queued -> 2) if they fail. raid10d() calls handle_read_error() and
      hangs in freeze_array(). nr_queued will not decrease because raid10d is
      blocked, and nr_pending will not increase because conf->barrier is not
      released.

      Fix it by moving allow_barrier() after raid10_read_request().
      raise_barrier() will wait for nr_waiting to become 0. Therefore, sync io
      and regular io will not be issued at the same time.

      Also remove the check of nr_queued: it can be 0 without needing to
      block. MD_RECOVERY_RUNNING is always set after this patch, because all
      sync io is waiting in raise_barrier(), so remove that check, too.
      
      Fixes: fe630de0 ("md/raid10: avoid deadlock on recovery.")
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      (cherry picked from commit 1fe782f0)
    • md/raid10: factor out code from wait_barrier() to stop_waiting_barrier() · 1a4e4cab
      Yu Kuai committed
      mainline inclusion
      from mainline-v6.1-rc1
      commit ed2e063f
      category: bugfix
      bugzilla: 188380, https://gitee.com/openeuler/kernel/issues/I6GISC
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=ed2e063f92c44c891ccd883e289dde6ca870edcc
      
      --------------------------------
      
      Currently the nasty condition in wait_barrier() is hard to read. This
      patch factors out the condition into a function.
      
      There are no functional changes.
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Acked-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
      Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev>
      Signed-off-by: Song Liu <song@kernel.org>
      
      conflict:
      	drivers/md/raid10.c
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      (cherry picked from commit 7aad54e0)
    • md/raid10: fix softlockup in raid10_unplug · 32aceee5
      Li Nan committed
      hulk inclusion
      category: bugfix
      bugzilla: 188628, https://gitee.com/openeuler/kernel/issues/I6WKDR
      CVE: NA
      
      --------------------------------
      
      There is no limit to the number of io for raid10 plug, which may result
      in excessive memory usage and a potential softlockup when a large number
      of io are submitted at once. There is no good way to fix it now, so just
      add a schedule point to prevent the softlockup.
      
      Fixes: 57c67df4 ("md/raid10: submit IO from originating thread instead of md thread.")
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      (cherry picked from commit f8cecf7a)
    • md/raid1: stop mdx_raid1 thread when raid1 array run failed · 025dac6f
      Jiang Li committed
      mainline inclusion
      from mainline-v6.2-rc1
      commit b611ad14
      category: bugfix
      bugzilla: 188662, https://gitee.com/openeuler/kernel/issues/I6UMUF
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=b611ad14006e5be2170d9e8e611bf49dff288911
      
      --------------------------------
      
      Running a raid1 array fails when we assemble the array with only the
      inactive disk, but the mdx_raid1 thread is not stopped even though the
      associated resources have been released. This causes a NULL dereference
      when we power off.
      
      This causes the following Oops:
          [  287.587787] BUG: kernel NULL pointer dereference, address: 0000000000000070
          [  287.594762] #PF: supervisor read access in kernel mode
          [  287.599912] #PF: error_code(0x0000) - not-present page
          [  287.605061] PGD 0 P4D 0
          [  287.607612] Oops: 0000 [#1] SMP NOPTI
          [  287.611287] CPU: 3 PID: 5265 Comm: md0_raid1 Tainted: G     U            5.10.146 #0
          [  287.619029] Hardware name: xxxxxxx/To be filled by O.E.M, BIOS 5.19 06/16/2022
          [  287.626775] RIP: 0010:md_check_recovery+0x57/0x500 [md_mod]
          [  287.632357] Code: fe 01 00 00 48 83 bb 10 03 00 00 00 74 08 48 89 ......
          [  287.651118] RSP: 0018:ffffc90000433d78 EFLAGS: 00010202
          [  287.656347] RAX: 0000000000000000 RBX: ffff888105986800 RCX: 0000000000000000
          [  287.663491] RDX: ffffc90000433bb0 RSI: 00000000ffffefff RDI: ffff888105986800
          [  287.670634] RBP: ffffc90000433da0 R08: 0000000000000000 R09: c0000000ffffefff
          [  287.677771] R10: 0000000000000001 R11: ffffc90000433ba8 R12: ffff888105986800
          [  287.684907] R13: 0000000000000000 R14: fffffffffffffe00 R15: ffff888100b6b500
          [  287.692052] FS:  0000000000000000(0000) GS:ffff888277f80000(0000) knlGS:0000000000000000
          [  287.700149] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          [  287.705897] CR2: 0000000000000070 CR3: 000000000320a000 CR4: 0000000000350ee0
          [  287.713033] Call Trace:
          [  287.715498]  raid1d+0x6c/0xbbb [raid1]
          [  287.719256]  ? __schedule+0x1ff/0x760
          [  287.722930]  ? schedule+0x3b/0xb0
          [  287.726260]  ? schedule_timeout+0x1ed/0x290
          [  287.730456]  ? __switch_to+0x11f/0x400
          [  287.734219]  md_thread+0xe9/0x140 [md_mod]
          [  287.738328]  ? md_thread+0xe9/0x140 [md_mod]
          [  287.742601]  ? wait_woken+0x80/0x80
          [  287.746097]  ? md_register_thread+0xe0/0xe0 [md_mod]
          [  287.751064]  kthread+0x11a/0x140
          [  287.754300]  ? kthread_park+0x90/0x90
          [  287.757974]  ret_from_fork+0x1f/0x30
      
      In fact, when running a raid1 array fails, we need to call
      md_unregister_thread() before raid1_free().
      Signed-off-by: Jiang Li <jiang.li@ugreen.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      (cherry picked from commit 22eeb5d1)
    • md: fix sysfs duplicate file while adding rdev · 0b18dcc1
      Li Nan committed
      hulk inclusion
      category: bugfix
      bugzilla: 188553, https://gitee.com/openeuler/kernel/issues/I6TNFX
      CVE: NA
      
      --------------------------------
      
      If two threads add and remove the same device, rdev->del_work may not
      yet have been queued to md_rdev_misc_wq, and flush_workqueue() will not
      flush it. sysfs might then WARN about a duplicate filename as below.
      
          //T1	             //T2
          mdadm write super
      			     add success
      			     remove
      			      unbind_rdev_from_array
      
          md_ioctl
           flush_workqueue
      			      INIT_WORK
                                     queue_work
           md_add_new_disk
            duplicate filename dev-xxx
      
      Check if there is any kobj with the same name, and return busy if true.
      
      Fixes: 5792a285 ("md: avoid a deadlock when removing a device from an md array via sysfs")
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      (cherry picked from commit 5815341f)
    • md: replace invalid function flush_rdev_wq() with flush_workqueue() · 6912b8bb
      Li Nan committed
      hulk inclusion
      category: bugfix
      bugzilla: 188553, https://gitee.com/openeuler/kernel/issues/I6TNFX
      CVE: NA
      
      --------------------------------
      
      To remove a device, we first delete it from the mddev->disks list and
      then init rdev->del_work to put it (see unbind_rdev_from_array()).

      flush_rdev_wq() traverses mddev->disks to check whether there is any
      pending rdev->del_work and, if so, flushes it. However, an rdev is no
      longer in the mddev->disks list once its rdev->del_work exists, so
      flush_workqueue() is never executed.
      
      Replace it with flush_workqueue() to ensure del_work has been completed
      when adding devices.
      
      Fixes: cc1ffe61 ("md: add new workqueue for delete rdev")
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      (cherry picked from commit ff461e2d)
    • md: Flush workqueue md_rdev_misc_wq in md_alloc() · 47a700c6
      David Sloan committed
      mainline inclusion
      from mainline-v6.0-rc3
      commit 5e8daf90
      category: bugfix
      bugzilla: 188015, https://gitee.com/openeuler/kernel/issues/I6OERX
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=5e8daf906f890560df430d30617c692a794acb73
      
      --------------------------------
      
      A race condition still exists when removing and re-creating md devices
      in test cases. However, it is only seen on some setups.
      
      The race condition was tracked down to a reference still being held
      to the kobject by the rdev in the md_rdev_misc_wq which will be released
      in rdev_delayed_delete().
      
      md_alloc() waits for previous deletions by waiting on the md_misc_wq,
      but the md_rdev_misc_wq may still be holding a reference to a recently
      removed device.
      
      To fix this, also flush the md_rdev_misc_wq in md_alloc().
      Signed-off-by: David Sloan <david.sloan@eideticom.com>
      [logang@deltatee.com: rewrote commit message]
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Song Liu <song@kernel.org>
      
      Conflict:
      	drivers/md/md.c
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      (cherry picked from commit 5fa41917)
    • block: don't allow the same type rq_qos add more than once · a1909dad
      Committed by Jinke Han
      mainline inclusion
      from mainline-v6.0-rc1
      commit 14a6e2eb
      category: bugfix
      bugzilla: 188088, https://gitee.com/openeuler/kernel/issues/I66GIL
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=14a6e2eb7df5c7897c15b109cba29ab0c4a791b6
      
      ----------------------------------------------------------------------
      
      In our test of iocost, we encountered some list add/del corruptions of
      inner_walk list in ioc_timer_fn.
      
      The reason can be described as follows:
      
      cpu 0					cpu 1
      ioc_qos_write				ioc_qos_write
      
      ioc = q_to_ioc(queue);
      if (!ioc) {
              ioc = kzalloc();
      					ioc = q_to_ioc(queue);
      					if (!ioc) {
      						ioc = kzalloc();
      						...
      						rq_qos_add(q, rqos);
      					}
              ...
              rq_qos_add(q, rqos);
              ...
      }
      
      When the io.cost.qos file is written by two CPUs concurrently, rq_qos may
      be added to one disk twice. In that case, two iocs are enabled and running
      on one disk, each owning different iocgs on its active list. In
      ioc_timer_fn(), because the iocgs from the two iocs share the same root
      iocg, each ioc may overwrite the root iocg's walk_list, which leads to
      list add/del corruptions when building or destroying the inner_walk list.
      
      So far, the blk-rq-qos framework has implicitly assumed one instance per
      rq_qos type per queue. This patch makes that explicit and also fixes the
      crash above.
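
A minimal userspace sketch of the "one instance per type" rule follows. The names `queue_model`, `rq_qos_model`, `rq_qos_find()` and `rq_qos_add_model()` are hypothetical and only mirror the idea in the commit message. Locking is left out of the sketch: in concurrent code the lookup and the insertion have to happen under the same lock, otherwise the race shown above remains.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical policy identifiers, one per rq_qos type. */
enum rq_qos_id_model { RQ_QOS_WBT_M, RQ_QOS_LATENCY_M, RQ_QOS_COST_M };

/* Hypothetical stand-in for struct rq_qos. */
struct rq_qos_model {
    enum rq_qos_id_model id;
    struct rq_qos_model *next;
};

/* Hypothetical stand-in for the request queue: head of the policy list. */
struct queue_model {
    struct rq_qos_model *rq_qos;
};

/* Returns the registered policy of the given type, or NULL if none. */
static struct rq_qos_model *rq_qos_find(struct queue_model *q,
                                        enum rq_qos_id_model id)
{
    for (struct rq_qos_model *r = q->rq_qos; r; r = r->next)
        if (r->id == id)
            return r;
    return NULL;
}

/*
 * Adds a policy unless one of the same type is already registered.
 * Returns 0 on success and -EBUSY on a duplicate.
 */
static int rq_qos_add_model(struct queue_model *q, struct rq_qos_model *rqos)
{
    assert(q && rqos);
    if (rq_qos_find(q, rqos->id))
        return -EBUSY;
    rqos->next = q->rq_qos;
    q->rq_qos = rqos;
    return 0;
}
```

With this check, the second writer in the race gets `-EBUSY` back and can free its freshly allocated object instead of registering a second instance on the same queue.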
      Signed-off-by: Jinke Han <hanjinke.666@bytedance.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: <stable@vger.kernel.org>
      Link: https://lore.kernel.org/r/20220720093616.70584-1-hanjinke.666@bytedance.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      
      Conflicts:
      	block/blk-rq-qos.h
      	block/blk-wbt.c
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      (cherry picked from commit 8ce9b7c4)
    • blk-iocost: use spin_lock_irqsave in adjust_inuse_and_calc_cost · cf9b73f0
      Committed by Li Nan
      hulk inclusion
      category: bugfix
      bugzilla: 188152, https://gitee.com/openeuler/kernel/issues/I67BPT
      CVE: NA
      
      -------------------------------
      
      adjust_inuse_and_calc_cost() uses spin_lock_irq(), so IRQs are
      unconditionally enabled on unlock. A deadlock can happen if the caller
      already holds another lock with IRQs disabled:
      
        ================================
        WARNING: inconsistent lock state
        5.10.0-02758-g8e5f91fd772f #26 Not tainted
        --------------------------------
        inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
        kworker/2:3/388 [HC0[0]:SC0[0]:HE0:SE1] takes:
        ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: spin_lock_irq
        ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: bfq_bio_merge+0x141/0x390
        {IN-HARDIRQ-W} state was registered at:
          __lock_acquire+0x3d7/0x1070
          lock_acquire+0x197/0x4a0
          __raw_spin_lock_irqsave
          _raw_spin_lock_irqsave+0x3b/0x60
          bfq_idle_slice_timer_body
          bfq_idle_slice_timer+0x53/0x1d0
          __run_hrtimer+0x477/0xa70
          __hrtimer_run_queues+0x1c6/0x2d0
          hrtimer_interrupt+0x302/0x9e0
          local_apic_timer_interrupt
          __sysvec_apic_timer_interrupt+0xfd/0x420
          run_sysvec_on_irqstack_cond
          sysvec_apic_timer_interrupt+0x46/0xa0
          asm_sysvec_apic_timer_interrupt+0x12/0x20
        irq event stamp: 837522
        hardirqs last  enabled at (837521): [<ffffffff84b9419d>] __raw_spin_unlock_irqrestore
        hardirqs last  enabled at (837521): [<ffffffff84b9419d>] _raw_spin_unlock_irqrestore+0x3d/0x40
        hardirqs last disabled at (837522): [<ffffffff84b93fa3>] __raw_spin_lock_irq
        hardirqs last disabled at (837522): [<ffffffff84b93fa3>] _raw_spin_lock_irq+0x43/0x50
        softirqs last  enabled at (835852): [<ffffffff84e00558>] __do_softirq+0x558/0x8ec
        softirqs last disabled at (835845): [<ffffffff84c010ff>] asm_call_irq_on_stack+0xf/0x20
      
        other info that might help us debug this:
         Possible unsafe locking scenario:
      
               CPU0
               ----
          lock(&bfqd->lock);
          <Interrupt>
            lock(&bfqd->lock);
      
         *** DEADLOCK ***
      
        3 locks held by kworker/2:3/388:
         #0: ffff888107af0f38 ((wq_completion)kthrotld){+.+.}-{0:0}, at: process_one_work+0x742/0x13f0
         #1: ffff8881176bfdd8 ((work_completion)(&td->dispatch_work)){+.+.}-{0:0}, at: process_one_work+0x777/0x13f0
         #2: ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: spin_lock_irq
         #2: ffff888118c00c28 (&bfqd->lock){?.-.}-{2:2}, at: bfq_bio_merge+0x141/0x390
      
        stack backtrace:
        CPU: 2 PID: 388 Comm: kworker/2:3 Not tainted 5.10.0-02758-g8e5f91fd772f #26
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
        Workqueue: kthrotld blk_throtl_dispatch_work_fn
        Call Trace:
         __dump_stack lib/dump_stack.c:77 [inline]
         dump_stack+0x107/0x167
         print_usage_bug
         valid_state
         mark_lock_irq.cold+0x32/0x3a
         mark_lock+0x693/0xbc0
         mark_held_locks+0x9e/0xe0
         __trace_hardirqs_on_caller
         lockdep_hardirqs_on_prepare.part.0+0x151/0x360
         trace_hardirqs_on+0x5b/0x180
         __raw_spin_unlock_irq
         _raw_spin_unlock_irq+0x24/0x40
         spin_unlock_irq
         adjust_inuse_and_calc_cost+0x4fb/0x970
         ioc_rqos_merge+0x277/0x740
         __rq_qos_merge+0x62/0xb0
         rq_qos_merge
         bio_attempt_back_merge+0x12c/0x4a0
         blk_mq_sched_try_merge+0x1b6/0x4d0
         bfq_bio_merge+0x24a/0x390
         __blk_mq_sched_bio_merge+0xa6/0x460
         blk_mq_sched_bio_merge
         blk_mq_submit_bio+0x2e7/0x1ee0
         __submit_bio_noacct_mq+0x175/0x3b0
         submit_bio_noacct+0x1fb/0x270
         blk_throtl_dispatch_work_fn+0x1ef/0x2b0
         process_one_work+0x83e/0x13f0
         process_scheduled_works
         worker_thread+0x7e3/0xd80
         kthread+0x353/0x470
         ret_from_fork+0x1f/0x30
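
The difference between the two locking variants can be illustrated with a small userspace model. This is a sketch under simplifying assumptions, not kernel code: `irqs_enabled` stands in for one CPU's interrupt-enable state, the `*_model` helpers are hypothetical, and the locks themselves are not modelled, only their effect on that flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU's local interrupt-enable flag. */
static bool irqs_enabled = true;

/* Models the _irq variants: disable on lock, always enable on unlock. */
static void lock_irq_model(void)   { irqs_enabled = false; }
static void unlock_irq_model(void) { irqs_enabled = true; }

/* Models the _irqsave variant: disable IRQs and return the previous state. */
static bool lock_irqsave_model(void)
{
    bool flags = irqs_enabled;

    irqs_enabled = false;
    return flags;
}

/* Models the _irqrestore variant: put back whatever state was saved. */
static void unlock_irqrestore_model(bool flags)
{
    irqs_enabled = flags;
}

/*
 * The inner section uses the _irq variant while an outer lock already
 * holds IRQs off. Returns true if IRQs were re-enabled too early.
 */
static bool inner_with_irq(void)
{
    lock_irq_model();            /* outer lock, taken by the caller */
    lock_irq_model();            /* inner lock */
    unlock_irq_model();          /* inner unlock turns IRQs back on */
    bool leaked = irqs_enabled;  /* outer lock is still held here */
    unlock_irq_model();          /* outer unlock */
    return leaked;
}

/* Same nesting, but the inner section saves and restores the IRQ state. */
static bool inner_with_irqsave(void)
{
    lock_irq_model();
    bool flags = lock_irqsave_model();
    unlock_irqrestore_model(flags);
    bool leaked = irqs_enabled;
    unlock_irq_model();
    return leaked;
}
```

In the model, `inner_with_irq()` reports that interrupts are on while the outer lock is still held, which is the window in which an interrupt handler taking that same lock would deadlock. `inner_with_irqsave()` keeps them off until the outer unlock.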
      
      Fixes: b0853ab4 ("blk-iocost: revamp in-period donation snapbacks")
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      (cherry picked from commit 60e8843c)
    • blk-iocost: don't allow to configure bio based device · a3cb1621
      Committed by Yu Kuai
      hulk inclusion
      category: bugfix
      bugzilla: 188033, https://gitee.com/openeuler/kernel/issues/I663ZP
      CVE: NA
      
      --------------------------------
      
      iocost is based on rq_qos, which only works for request-based devices,
      so it doesn't make sense to configure iocost for a bio-based device.
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Li Nan <linan122@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      (cherry picked from commit e11c64a9)
  5. 01 Jun, 2023 2 commits
  6. 31 May, 2023 1 commit
    • !861 Backport CVEs and bugfixes · b3a07a12
      Committed by openeuler-ci-bot
      Merge Pull Request from: @zhangjialin11 
       
      Pull new CVEs:
      CVE-2023-22998
      
      cgroup bugfix from Gaosheng Cui
      sched bugfix from Xia Fukun
      block bugfixes from Zhong Jinghua and Yu Kuai
      iomap and ext4 bugfixes from Baokun Li
      md bugfixes from Yu Kuai 
       
      Link:https://gitee.com/openeuler/kernel/pulls/861 
      
      Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> 