1. 15 Nov 2022 (1 commit)
    • md/raid0, raid10: Don't set discard sectors for request queue · 8e1a2279
      Committed by Xiao Ni
      Stacking drivers should use disk_stack_limits to derive a proper
      max_discard_sectors from the member devices rather than setting the
      value themselves (a sketch of this pattern follows the entry).
      
      There is also a bug: raid0/raid10 set max_discard_sectors even when
      all member disks are rotational devices. So although the members are
      not SSD/NVMe, raid0/raid10 export a wrong discard limit, which
      triggers warning messages in __blkdev_issue_discard when running
      mkfs.xfs, like this:
      
      [ 4616.022599] ------------[ cut here ]------------
      [ 4616.027779] WARNING: CPU: 4 PID: 99634 at block/blk-lib.c:50 __blkdev_issue_discard+0x16a/0x1a0
      [ 4616.140663] RIP: 0010:__blkdev_issue_discard+0x16a/0x1a0
      [ 4616.146601] Code: 24 4c 89 20 31 c0 e9 fe fe ff ff c1 e8 09 8d 48 ff 4c 89 f0 4c 09 e8 48 85 c1 0f 84 55 ff ff ff b8 ea ff ff ff e9 df fe ff ff <0f> 0b 48 8d 74 24 08 e8 ea d6 00 00 48 c7 c6 20 1e 89 ab 48 c7 c7
      [ 4616.167567] RSP: 0018:ffffaab88cbffca8 EFLAGS: 00010246
      [ 4616.173406] RAX: ffff9ba1f9e44678 RBX: 0000000000000000 RCX: ffff9ba1c9792080
      [ 4616.181376] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9ba1c9792080
      [ 4616.189345] RBP: 0000000000000cc0 R08: ffffaab88cbffd10 R09: 0000000000000000
      [ 4616.197317] R10: 0000000000000012 R11: 0000000000000000 R12: 0000000000000000
      [ 4616.205288] R13: 0000000000400000 R14: 0000000000000cc0 R15: ffff9ba1c9792080
      [ 4616.213259] FS:  00007f9a5534e980(0000) GS:ffff9ba1b7c80000(0000) knlGS:0000000000000000
      [ 4616.222298] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 4616.228719] CR2: 000055a390a4c518 CR3: 0000000123e40006 CR4: 00000000001706e0
      [ 4616.236689] Call Trace:
      [ 4616.239428]  blkdev_issue_discard+0x52/0xb0
      [ 4616.244108]  blkdev_common_ioctl+0x43c/0xa00
      [ 4616.248883]  blkdev_ioctl+0x116/0x280
      [ 4616.252977]  __x64_sys_ioctl+0x8a/0xc0
      [ 4616.257163]  do_syscall_64+0x5c/0x90
      [ 4616.261164]  ? handle_mm_fault+0xc5/0x2a0
      [ 4616.265652]  ? do_user_addr_fault+0x1d8/0x690
      [ 4616.270527]  ? do_syscall_64+0x69/0x90
      [ 4616.274717]  ? exc_page_fault+0x62/0x150
      [ 4616.279097]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
      [ 4616.284748] RIP: 0033:0x7f9a55398c6b
      Signed-off-by: Xiao Ni <xni@redhat.com>
      Reported-by: Yi Zhang <yi.zhang@redhat.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Song Liu <song@kernel.org>
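      A minimal sketch of the limit-stacking pattern this change moves toward,
      assuming the standard block-layer helpers (disk_stack_limits,
      blk_queue_max_discard_sectors) and the md internals from md.h; it is an
      illustration of the idea, not the literal patch:
      
      /* Sketch within drivers/md context (md.h, linux/blkdev.h assumed). */
      static void raid_setup_queue_limits_sketch(struct mddev *mddev)
      {
              struct md_rdev *rdev;
      
              rdev_for_each(rdev, mddev) {
                      /* disk_stack_limits() merges each member's limits,
                       * including max_discard_sectors, into the array's
                       * queue; members without discard yield a zero limit. */
                      disk_stack_limits(mddev->gendisk, rdev->bdev,
                                        rdev->data_offset << 9);
              }
      
              /* The removed code was roughly an unconditional
               * blk_queue_max_discard_sectors(mddev->queue, some_value),
               * which advertised discard even when the members do not
               * support it. */
      }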
  2. 22 Sep 2022 (6 commits)
  3. 25 Aug 2022 (1 commit)
  4. 03 Aug 2022 (1 commit)
    • md-raid10: fix KASAN warning · d17f744e
      Committed by Mikulas Patocka
      There is a KASAN slab-out-of-bounds warning in raid10_remove_disk when
      running the lvm test lvconvert-raid-reshape.sh. Fix it by verifying
      that the value "number" is a valid index before it is used (a sketch
      of the check follows the entry).
      
      BUG: KASAN: slab-out-of-bounds in raid10_remove_disk+0x61/0x2a0 [raid10]
      Read of size 8 at addr ffff889108f3d300 by task mdX_raid10/124682
      
      CPU: 3 PID: 124682 Comm: mdX_raid10 Not tainted 5.19.0-rc6 #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0x34/0x44
       print_report.cold+0x45/0x57a
       ? __lock_text_start+0x18/0x18
       ? raid10_remove_disk+0x61/0x2a0 [raid10]
       kasan_report+0xa8/0xe0
       ? raid10_remove_disk+0x61/0x2a0 [raid10]
       raid10_remove_disk+0x61/0x2a0 [raid10]
      Buffer I/O error on dev dm-76, logical block 15344, async page read
       ? __mutex_unlock_slowpath.constprop.0+0x1e0/0x1e0
       remove_and_add_spares+0x367/0x8a0 [md_mod]
       ? super_written+0x1c0/0x1c0 [md_mod]
       ? mutex_trylock+0xac/0x120
       ? _raw_spin_lock+0x72/0xc0
       ? _raw_spin_lock_bh+0xc0/0xc0
       md_check_recovery+0x848/0x960 [md_mod]
       raid10d+0xcf/0x3360 [raid10]
       ? sched_clock_cpu+0x185/0x1a0
       ? rb_erase+0x4d4/0x620
       ? var_wake_function+0xe0/0xe0
       ? psi_group_change+0x411/0x500
       ? preempt_count_sub+0xf/0xc0
       ? _raw_spin_lock_irqsave+0x78/0xc0
       ? __lock_text_start+0x18/0x18
       ? raid10_sync_request+0x36c0/0x36c0 [raid10]
       ? preempt_count_sub+0xf/0xc0
       ? _raw_spin_unlock_irqrestore+0x19/0x40
       ? del_timer_sync+0xa9/0x100
       ? try_to_del_timer_sync+0xc0/0xc0
       ? _raw_spin_lock_irqsave+0x78/0xc0
       ? __lock_text_start+0x18/0x18
       ? _raw_spin_unlock_irq+0x11/0x24
       ? __list_del_entry_valid+0x68/0xa0
       ? finish_wait+0xa3/0x100
       md_thread+0x161/0x260 [md_mod]
       ? unregister_md_personality+0xa0/0xa0 [md_mod]
       ? _raw_spin_lock_irqsave+0x78/0xc0
       ? prepare_to_wait_event+0x2c0/0x2c0
       ? unregister_md_personality+0xa0/0xa0 [md_mod]
       kthread+0x148/0x180
       ? kthread_complete_and_exit+0x20/0x20
       ret_from_fork+0x1f/0x30
       </TASK>
      
      Allocated by task 124495:
       kasan_save_stack+0x1e/0x40
       __kasan_kmalloc+0x80/0xa0
       setup_conf+0x140/0x5c0 [raid10]
       raid10_run+0x4cd/0x740 [raid10]
       md_run+0x6f9/0x1300 [md_mod]
       raid_ctr+0x2531/0x4ac0 [dm_raid]
       dm_table_add_target+0x2b0/0x620 [dm_mod]
       table_load+0x1c8/0x400 [dm_mod]
       ctl_ioctl+0x29e/0x560 [dm_mod]
       dm_compat_ctl_ioctl+0x7/0x20 [dm_mod]
       __do_compat_sys_ioctl+0xfa/0x160
       do_syscall_64+0x90/0xc0
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      Last potentially related work creation:
       kasan_save_stack+0x1e/0x40
       __kasan_record_aux_stack+0x9e/0xc0
       kvfree_call_rcu+0x84/0x480
       timerfd_release+0x82/0x140
        __fput+0xfa/0x400
       task_work_run+0x80/0xc0
       exit_to_user_mode_prepare+0x155/0x160
       syscall_exit_to_user_mode+0x12/0x40
       do_syscall_64+0x42/0xc0
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      Second to last potentially related work creation:
       kasan_save_stack+0x1e/0x40
       __kasan_record_aux_stack+0x9e/0xc0
       kvfree_call_rcu+0x84/0x480
       timerfd_release+0x82/0x140
       __fput+0xfa/0x400
       task_work_run+0x80/0xc0
       exit_to_user_mode_prepare+0x155/0x160
       syscall_exit_to_user_mode+0x12/0x40
       do_syscall_64+0x42/0xc0
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      The buggy address belongs to the object at ffff889108f3d200
       which belongs to the cache kmalloc-256 of size 256
      The buggy address is located 0 bytes to the right of
       256-byte region [ffff889108f3d200, ffff889108f3d300)
      
      The buggy address belongs to the physical page:
      page:000000007ef2a34c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1108f3c
      head:000000007ef2a34c order:2 compound_mapcount:0 compound_pincount:0
      flags: 0x4000000000010200(slab|head|zone=2)
      raw: 4000000000010200 0000000000000000 dead000000000001 ffff889100042b40
      raw: 0000000000000000 0000000080200020 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff889108f3d200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff889108f3d280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      >ffff889108f3d300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                         ^
       ffff889108f3d380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff889108f3d400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Song Liu <song@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
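      A minimal sketch of the kind of check described above, using raid10's
      usual locals (number = rdev->raid_disk); illustrative rather than the
      literal diff:
      
      static int raid10_remove_disk_sketch(struct mddev *mddev, struct md_rdev *rdev)
      {
              struct r10conf *conf = mddev->private;
              int number = rdev->raid_disk;
      
              /* During a dm-raid reshape "number" can point past the current
               * array size, so validate it before indexing conf->mirrors[]. */
              if (unlikely(number >= mddev->raid_disks))
                      return 0;
      
              /* ... original removal logic on conf->mirrors + number ... */
              return 0;
      }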
  5. 15 Jul 2022 (3 commits)
  6. 23 May 2022 (1 commit)
  7. 26 Apr 2022 (1 commit)
    • md: Set MD_BROKEN for RAID1 and RAID10 · 9631abdb
      Committed by Mariusz Tkaczyk
      There is no direct mechanism to determine raid failure outside the
      personality. It is done by checking rdev->flags after executing
      md_error(). If the "faulty" flag is not set, -EBUSY is returned to
      userspace; -EBUSY means that the array would be failed by the drive
      removal.
      
      Mdadm has a special routine to handle the array failure, and it is
      executed when md returns -EBUSY.
      
      There are at least two known reasons not to consider this mechanism
      correct:
      1. the drive can be removed even if the array will be failed [1];
      2. -EBUSY seems to be the wrong status: the array is not busy, but the
         removal cannot proceed safely.
      
      The -EBUSY expectation cannot be dropped without breaking compatibility
      with userspace. This patch resolves the first issue by adding support
      for the MD_BROKEN flag to RAID1 and RAID10. Support for RAID456 is
      added in the next commit.
      
      The idea is to set MD_BROKEN once we are sure the raid is now in a
      failed state. This is done in each error_handler(). md_error() then
      checks the MD_BROKEN flag; if it is set, -EBUSY is returned to
      userspace (a sketch of this flow follows the entry).
      
      As in the previous commit, this makes "#mdadm --set-faulty" able to
      fail the array. The previously proposed workaround is valid only if
      the optional functionality [1] is disabled.
      
      [1] commit 9a567843("md: allow last device to be forcibly removed from
          RAID1/RAID10.")
      Reviewed-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
      Signed-off-by: Song Liu <song@kernel.org>
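      A minimal sketch of the flow described above, simplified and with
      illustrative helper names (locking and the sysfs plumbing are
      condensed); not the literal diff:
      
      /* Personality error handler: when the last in-sync device fails, the
       * array as a whole is failed, so record that on the mddev. */
      static void raid1_error_sketch(struct mddev *mddev, struct md_rdev *rdev)
      {
              struct r1conf *conf = mddev->private;
      
              if (test_bit(In_sync, &rdev->flags) &&
                  (conf->raid_disks - mddev->degraded) == 1) {
                      set_bit(MD_BROKEN, &mddev->flags);
                      if (!mddev->fail_last_dev)
                              return;  /* keep the last device unless [1] is enabled */
              }
              set_bit(Faulty, &rdev->flags);
      }
      
      /* md core, userspace "faulty" request: keep the mdadm contract by
       * returning -EBUSY whenever the array is broken. */
      static int handle_set_faulty_sketch(struct mddev *mddev, struct md_rdev *rdev)
      {
              md_error(mddev, rdev);
              return test_bit(MD_BROKEN, &mddev->flags) ? -EBUSY : 0;
      }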
  8. 18 Apr 2022 (3 commits)
  9. 09 Mar 2022 (1 commit)
  10. 23 Feb 2022 (1 commit)
  11. 04 Feb 2022 (1 commit)
  12. 02 Feb 2022 (2 commits)
  13. 07 Jan 2022 (2 commits)
  14. 19 Oct 2021 (1 commit)
  15. 27 Aug 2021 (1 commit)
    • md/raid10: Remove unnecessary rcu_dereference in raid10_handle_discard · 46d4703b
      Committed by Xiao Ni
      We are seeing the following warning in raid10_handle_discard.
      [  695.110751] =============================
      [  695.131439] WARNING: suspicious RCU usage
      [  695.151389] 4.18.0-319.el8.x86_64+debug #1 Not tainted
      [  695.174413] -----------------------------
      [  695.192603] drivers/md/raid10.c:1776 suspicious rcu_dereference_check() usage!
      [  695.225107] other info that might help us debug this:
      [  695.260940] rcu_scheduler_active = 2, debug_locks = 1
      [  695.290157] no locks held by mkfs.xfs/10186.
      
      The first loop of raid10_handle_discard already determines which disks
      need to handle the discard request and takes a reference on each rdev
      by incrementing rdev->nr_pending. So conf->mirrors will not change
      until all bios come back from the underlying disks, and there is no
      need to use rcu_dereference to get the rdev (see the sketch after this
      entry).
      
      Cc: stable@vger.kernel.org
      Fixes: d30588b2 ('md/raid10: improve raid10 discard request')
      Signed-off-by: Xiao Ni <xni@redhat.com>
      Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev>
      Signed-off-by: Song Liu <songliubraving@fb.com>
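      A minimal sketch of the access pattern being changed, assuming the
      second pass over the disks in raid10_handle_discard; surrounding code
      is elided and submit_discard_bio_to() is a hypothetical helper:
      
      int disk;
      
      /* The first pass (not shown) already incremented rdev->nr_pending for
       * every disk that will receive a discard bio, so the mirror slots are
       * stable here. */
      for (disk = 0; disk < conf->geo.raid_disks; disk++) {
              struct md_rdev *rdev;
      
              /* Before: rdev = rcu_dereference(conf->mirrors[disk].rdev);
               * which trips lockdep because no rcu_read_lock() is held. */
      
              /* After: a plain load is enough; the nr_pending reference taken
               * in the first pass keeps the rdev alive. */
              rdev = conf->mirrors[disk].rdev;
              if (rdev)
                      submit_discard_bio_to(rdev);    /* hypothetical helper */
      }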
  16. 24 Jul 2021 (1 commit)
  17. 15 Jun 2021 (1 commit)
  18. 25 Mar 2021 (4 commits)
    • md/raid10: improve discard request for far layout · 254c271d
      Committed by Xiao Ni
      For the far layout, the discard region is not contiguous on the disks,
      so it takes far_copies r10bios to cover all regions, and there must be
      a way to know whether all of those r10bios have finished. Similar to
      raid10_sync_request, only the first r10bio's master_bio records the
      discard bio; the other r10bios' master_bio points back to the first
      r10bio. The first r10bio can finish only after the other r10bios have
      finished, and then the discard bio is returned (see the sketch after
      this entry).
      Tested-by: Adrian Huang <ahuang12@lenovo.com>
      Signed-off-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
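      A minimal sketch of the completion chaining described above; per-device
      accounting and freeing are elided, names follow raid10's conventions,
      and the real code differs in detail:
      
      /* Called once per finished r10bio of the discard (illustrative). */
      static void end_discard_r10bio_sketch(struct r10bio *r10_bio)
      {
              if (!test_bit(R10BIO_Discard, &r10_bio->state)) {
                      /* A follow-on r10bio: its master_bio actually points
                       * back at the first r10bio, so drop one reference. */
                      struct r10bio *first = (struct r10bio *)r10_bio->master_bio;
      
                      if (atomic_dec_and_test(&first->remaining))
                              bio_endio(first->master_bio);  /* the real discard bio */
                      return;
              }
      
              /* The first r10bio: its master_bio is the discard bio itself,
               * ended only after every far-copy r10bio has completed. */
              if (atomic_dec_and_test(&r10_bio->remaining))
                      bio_endio(r10_bio->master_bio);
      }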
    • md/raid10: improve raid10 discard request · d30588b2
      Committed by Xiao Ni
      Currently the discard request is split by chunk size, so it takes a
      long time to finish mkfs on disks that support discard. This patch
      improves the handling of raid10 discard requests, using an approach
      similar to patch 29efc390 (md/md0: optimize raid0 discard handling).
      
      It is a little more complex than raid0 because raid10 has a different
      layout. If raid10 uses the offset layout and the discard request is
      smaller than the stripe size, there are holes when we submit discard
      bios to the underlying disks.
      
      For example, with five disks (disk1 - disk5):
      D01 D02 D03 D04 D05
      D05 D01 D02 D03 D04
      D06 D07 D08 D09 D10
      D10 D06 D07 D08 D09
      If the discard bio only wants to discard from D03 to D10, then for
      disk3 there is a hole between D03 and D08, and for disk4 there is a
      hole between D04 and D09. D03 is a single chunk, and
      raid10_write_request can handle one chunk perfectly, so the part that
      is not aligned with the stripe size is still handled by
      raid10_write_request (see the alignment sketch after this entry).
      
      If a reshape is running when the discard bio arrives and the bio spans
      the reshape position, raid10_write_request is responsible for handling
      it.
      
      I did a test with this patch set.
      Without patch:
      time mkfs.xfs /dev/md0
      real    4m39.775s
      user    0m0.000s
      sys     0m0.298s
      
      With patch:
      time mkfs.xfs /dev/md0
      real    0m0.105s
      user    0m0.000s
      sys     0m0.007s
      
      nvme3n1           259:1    0   477G  0 disk
      └─nvme3n1p1       259:10   0    50G  0 part
      nvme4n1           259:2    0   477G  0 disk
      └─nvme4n1p1       259:11   0    50G  0 part
      nvme5n1           259:6    0   477G  0 disk
      └─nvme5n1p1       259:12   0    50G  0 part
      nvme2n1           259:9    0   477G  0 disk
      └─nvme2n1p1       259:15   0    50G  0 part
      nvme0n1           259:13   0   477G  0 disk
      └─nvme0n1p1       259:14   0    50G  0 part
      Reviewed-by: Coly Li <colyli@suse.de>
      Reviewed-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
      Tested-by: Adrian Huang <ahuang12@lenovo.com>
      Signed-off-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
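      A minimal sketch of the stripe-alignment split described above, with
      illustrative names; the real code also handles far/offset copies and
      reshape, and would use sector_div() rather than a plain 64-bit modulo:
      
      /* Only whole stripes take the per-disk discard path; the unaligned
       * head and tail still go through raid10_write_request(). */
      static bool discard_covers_full_stripes_sketch(struct r10conf *conf,
                                                     struct bio *bio,
                                                     sector_t *split_start,
                                                     sector_t *split_end)
      {
              /* Data chunks per stripe times chunk size, in sectors. */
              sector_t stripe_sectors = (conf->geo.raid_disks / conf->geo.near_copies)
                                        << conf->geo.chunk_shift;
              sector_t start = bio->bi_iter.bi_sector;
              sector_t end = start + bio_sectors(bio);
              sector_t rem = start % stripe_sectors;
      
              /* Round the head up and the tail down to a stripe boundary. */
              *split_start = rem ? start + (stripe_sectors - rem) : start;
              *split_end = end - (end % stripe_sectors);
      
              /* Less than one full stripe: everything falls back to
               * raid10_write_request(), which works a chunk at a time. */
              return *split_start < *split_end;
      }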
    • md/raid10: pull the code that wait for blocked dev into one function · f2e7e269
      Committed by Xiao Ni
      The following patch will reuse this logic, so pull the duplicated code
      into one function (a simplified sketch follows the entry).
      Tested-by: Adrian Huang <ahuang12@lenovo.com>
      Signed-off-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
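      A heavily simplified sketch of what such a shared helper looks like;
      the real function also deals with replacement devices, bad blocks, and
      the barrier handling, so treat this only as the general pattern:
      
      static void wait_blocked_dev_sketch(struct mddev *mddev, struct r10bio *r10_bio)
      {
              struct r10conf *conf = mddev->private;
              struct md_rdev *blocked_rdev;
              int i;
      
      retry_wait:
              blocked_rdev = NULL;
              for (i = 0; i < conf->copies; i++) {
                      struct md_rdev *rdev = conf->mirrors[r10_bio->devs[i].devnum].rdev;
      
                      if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
                              atomic_inc(&rdev->nr_pending);
                              blocked_rdev = rdev;
                              break;
                      }
              }
      
              if (unlikely(blocked_rdev)) {
                      /* Wait until the blocked device is usable again, then
                       * re-check the whole set of devices. */
                      md_wait_for_blocked_rdev(blocked_rdev, mddev);
                      goto retry_wait;
              }
      }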
    • md/raid10: extend r10bio devs to raid disks · c2968285
      Committed by Xiao Ni
      Currently r10bio->devs[] is allocated with conf->copies entries. A
      discard bio needs to be submitted to all member disks and it needs an
      r10bio, so extend the allocation to r10bio->devs[geo.raid_disks] (see
      the sketch after this entry).
      Reviewed-by: Coly Li <colyli@suse.de>
      Tested-by: Adrian Huang <ahuang12@lenovo.com>
      Signed-off-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
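      A minimal sketch of the allocation-size change: r10bio ends in a
      flexible devs[] array, so sizing it by geo.raid_disks instead of
      conf->copies lets a single r10bio address every member disk (mempool
      plumbing elided; the allocator shape is assumed, not the literal diff):
      
      static void *r10bio_pool_alloc_sketch(gfp_t gfp_flags, void *data)
      {
              struct r10conf *conf = data;
      
              /* Before: offsetof(struct r10bio, devs[conf->copies]) -- enough
               * slots for one copy set, but not for every member disk. */
              int size = offsetof(struct r10bio, devs[conf->geo.raid_disks]);
      
              /* Room for raid_disks entries, so a discard r10bio can carry
               * one bio per member disk. */
              return kzalloc(size, gfp_flags);
      }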
  19. 08 Feb 2021 (1 commit)
  20. 28 Jan 2021 (1 commit)
  21. 25 Jan 2021 (1 commit)
  22. 10 Dec 2020 (4 commits)
  23. 05 Dec 2020 (1 commit)