1. 03 Aug 2022, 1 commit
  2. 15 Jul 2022, 2 commits
  3. 28 Jun 2022, 1 commit
  4. 16 Jun 2022, 1 commit
  5. 23 May 2022, 4 commits
  6. 26 Apr 2022, 5 commits
    •
      md: Replace role magic numbers with defined constants · 9151ad5d
      David Sloan committed
      There are several instances where magic numbers are used in md.c instead
      of the defined constants in md_p.h. This patch set improves code
      readability by replacing all occurrences of 0xffff, 0xfffe, and 0xfffd
      relating to md roles with their equivalent defined constants.
      Signed-off-by: David Sloan <david.sloan@eideticom.com>
      Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Song Liu <song@kernel.org>
      9151ad5d
    •
      md: replace deprecated strlcpy & remove duplicated line · 92d9aac9
      Heming Zhao committed
      This commit includes two topics:
      
      1> replace deprecated strlcpy
      
      Change strlcpy to strscpy, since strlcpy is marked as deprecated in
      Documentation/process/deprecated.rst.
      
      2> remove duplicated strlcpy line
      
      In md_bitmap_read_sb@md-bitmap.c there are two duplicated strlcpy()
      calls. The history:
      
      - commit cf921cc1 ("Add node recovery callbacks") introduced the first
        usage of strlcpy().
      
      - commit b97e9257 ("Use separate bitmaps for each nodes in the cluster")
        introduced the second strlcpy(). At that time the two strlcpy() calls
        were identical, so either one could be removed safely.
      
      - commit d3b178ad ("md: Skip cluster setup for dm-raid") added dm-raid
        special handling. The "nodes" value is the key to this patch: from this
        patch on, the strlcpy() introduced by b97e9257 became necessary.
      
      - commit 3c462c88 ("md: Increment version for clustered bitmaps") used
        the clustered major version so the code only runs in a clustered
        environment; it can be seen as a polish of the clustered code logic.
      
      So the strlcpy() from cf921cc1 became useless after d3b178ad, and we can
      remove it safely.
      Signed-off-by: Heming Zhao <heming.zhao@suse.com>
      Signed-off-by: Song Liu <song@kernel.org>
      92d9aac9
    •
      md: fix an incorrect NULL check in md_reload_sb · 64c54d92
      Xiaomeng Tong committed
      The bug is here:
      	if (!rdev || rdev->desc_nr != nr) {
      
      The list iterator value 'rdev' will *always* be set and non-NULL
      by rdev_for_each_rcu(), so it is incorrect to assume that the
      iterator value will be NULL if the list is empty or no element is
      found (in fact, it will be a bogus pointer to an invalid struct
      object containing the HEAD), so the check can be bypassed, leading
      to invalid memory access.
      
      To fix the bug, use a new variable 'iter' as the list iterator,
      while using the original variable 'rdev' as a dedicated pointer to
      point to the found element.
      
      Cc: stable@vger.kernel.org
      Fixes: 70bcecdb ("md-cluster: Improve md_reload_sb to be less error prone")
      Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
      Signed-off-by: Song Liu <song@kernel.org>
      64c54d92
    •
      md: fix an incorrect NULL check in does_sb_need_changing · fc873834
      Xiaomeng Tong committed
      The bug is here:
      	if (!rdev)
      
      The list iterator value 'rdev' will *always* be set and non-NULL
      by rdev_for_each(), so it is incorrect to assume that the iterator
      value will be NULL if the list is empty or no element is found;
      the NULL check can therefore be bypassed, leading to invalid
      memory access.
      
      To fix the bug, use a new variable 'iter' as the list iterator,
      while using the original variable 'rdev' as a dedicated pointer to
      point to the found element.
      
      Cc: stable@vger.kernel.org
      Fixes: 2aa82191 ("md-cluster: Perform a lazy update")
      Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev>
      Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
      Acked-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: Song Liu <song@kernel.org>
      fc873834
    •
      md: Set MD_BROKEN for RAID1 and RAID10 · 9631abdb
      Mariusz Tkaczyk committed
      There is no direct mechanism to determine raid failure outside the
      personality. It is done by checking rdev->flags after executing
      md_error(). If the "faulty" flag is not set, -EBUSY is returned to
      userspace. -EBUSY means that the array would be failed after the drive
      removal.
      
      Mdadm has a special routine to handle the array failure, and it is
      executed if -EBUSY is returned by md.
      
      There are at least two known reasons not to consider this mechanism
      correct:
      1. A drive can be removed even if the array will be failed[1].
      2. -EBUSY seems to be the wrong status. The array is not busy, but the
         removal process cannot proceed safely.
      
      The -EBUSY expectation cannot be removed without breaking compatibility
      with userspace. In this patch the first issue is resolved by adding
      support for the MD_BROKEN flag for RAID1 and RAID10. Support for RAID456
      is added in the next commit.
      
      The idea is to set MD_BROKEN once we are sure that the raid is now in a
      failed state. This is done in each error_handler(). In md_error() the
      MD_BROKEN flag is checked; if it is set, -EBUSY is returned to userspace.
      
      As in the previous commit, this makes "mdadm --set-faulty" able to fail
      the array. The previously proposed workaround remains valid if the
      optional functionality[1] is disabled.
      
      [1] commit 9a567843("md: allow last device to be forcibly removed from
          RAID1/RAID10.")
      Reviewed-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
      Signed-off-by: Song Liu <song@kernel.org>
      9631abdb
  7. 18 Apr 2022, 2 commits
  8. 09 Mar 2022, 1 commit
  9. 04 Feb 2022, 1 commit
  10. 03 Feb 2022, 1 commit
    •
      md: fix NULL pointer deref with nowait but no mddev->queue · 0f9650bd
      Song Liu committed
      Leon reported NULL pointer deref with nowait support:
      
      [   15.123761] device-mapper: raid: Loading target version 1.15.1
      [   15.124185] device-mapper: raid: Ignoring chunk size parameter for RAID 1
      [   15.124192] device-mapper: raid: Choosing default region size of 4MiB
      [   15.129524] BUG: kernel NULL pointer dereference, address: 0000000000000060
      [   15.129530] #PF: supervisor write access in kernel mode
      [   15.129533] #PF: error_code(0x0002) - not-present page
      [   15.129535] PGD 0 P4D 0
      [   15.129538] Oops: 0002 [#1] PREEMPT SMP NOPTI
      [   15.129541] CPU: 5 PID: 494 Comm: ldmtool Not tainted 5.17.0-rc2-1-mainline #1 9fe89d43dfcb215d2731e6f8851740520778615e
      [   15.129546] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ELITE/X570 AORUS ELITE, BIOS F36e 10/14/2021
      [   15.129549] RIP: 0010:blk_queue_flag_set+0x7/0x20
      [   15.129555] Code: 00 00 00 0f 1f 44 00 00 48 8b 35 e4 e0 04 02 48 8d 57 28 bf 40 01 \
             00 00 e9 16 c1 be ff 66 0f 1f 44 00 00 0f 1f 44 00 00 89 ff <f0> 48 0f ab 7e 60 \
             31 f6 89 f7 c3 66 66 2e 0f 1f 84 00 00 00 00 00
      [   15.129559] RSP: 0018:ffff966b81987a88 EFLAGS: 00010202
      [   15.129562] RAX: ffff8b11c363a0d0 RBX: ffff8b11e294b070 RCX: 0000000000000000
      [   15.129564] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000001d
      [   15.129566] RBP: ffff8b11e294b058 R08: 0000000000000000 R09: 0000000000000000
      [   15.129568] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b11e294b070
      [   15.129570] R13: 0000000000000000 R14: ffff8b11e294b000 R15: 0000000000000001
      [   15.129572] FS:  00007fa96e826780(0000) GS:ffff8b18deb40000(0000) knlGS:0000000000000000
      [   15.129575] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   15.129577] CR2: 0000000000000060 CR3: 000000010b8ce000 CR4: 00000000003506e0
      [   15.129580] Call Trace:
      [   15.129582]  <TASK>
      [   15.129584]  md_run+0x67c/0xc70 [md_mod 1e470c1b6bcf1114198109f42682f5a2740e9531]
      [   15.129597]  raid_ctr+0x134a/0x28ea [dm_raid 6a645dd7519e72834bd7e98c23497eeade14cd63]
      [   15.129604]  ? dm_split_args+0x63/0x150 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e]
      [   15.129615]  dm_table_add_target+0x188/0x380 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e]
      [   15.129625]  table_load+0x13b/0x370 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e]
      [   15.129635]  ? dev_suspend+0x2d0/0x2d0 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e]
      [   15.129644]  ctl_ioctl+0x1bd/0x460 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e]
      [   15.129655]  dm_ctl_ioctl+0xa/0x20 [dm_mod 0d7b0bc3414340a79c4553bae5ca97294b78336e]
      [   15.129663]  __x64_sys_ioctl+0x8e/0xd0
      [   15.129667]  do_syscall_64+0x5c/0x90
      [   15.129672]  ? syscall_exit_to_user_mode+0x23/0x50
      [   15.129675]  ? do_syscall_64+0x69/0x90
      [   15.129677]  ? do_syscall_64+0x69/0x90
      [   15.129679]  ? syscall_exit_to_user_mode+0x23/0x50
      [   15.129682]  ? do_syscall_64+0x69/0x90
      [   15.129684]  ? do_syscall_64+0x69/0x90
      [   15.129686]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [   15.129689] RIP: 0033:0x7fa96ecd559b
      [   15.129692] Code: ff ff ff 85 c0 79 9b 49 c7 c4 ff ff ff ff 5b 5d 4c 89 e0 41 5c \
          c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff \
          ff 73 01 c3 48 8b 0d a5 a8 0c 00 f7 d8 64 89 01 48
      [   15.129696] RSP: 002b:00007ffcaf85c258 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
      [   15.129699] RAX: ffffffffffffffda RBX: 00007fa96f1b48f0 RCX: 00007fa96ecd559b
      [   15.129701] RDX: 00007fa97017e610 RSI: 00000000c138fd09 RDI: 0000000000000003
      [   15.129702] RBP: 00007fa96ebab583 R08: 00007fa97017c9e0 R09: 00007ffcaf85bf27
      [   15.129704] R10: 0000000000000001 R11: 0000000000000206 R12: 00007fa97017e610
      [   15.129706] R13: 00007fa97017e640 R14: 00007fa97017e6c0 R15: 00007fa97017e530
      [   15.129709]  </TASK>
      
      This is caused by the missing mddev->queue check for setting
      QUEUE_FLAG_NOWAIT. Fix this by moving the QUEUE_FLAG_NOWAIT logic
      under the mddev->queue check.
      
      Fixes: f51d46d0 ("md: add support for REQ_NOWAIT")
      Reported-by: Leon Möller <jkhsjdhjs@totally.rip>
      Tested-by: Leon Möller <jkhsjdhjs@totally.rip>
      Cc: Vishal Verma <vverma@digitalocean.com>
      Signed-off-by: Song Liu <song@kernel.org>
      0f9650bd
  11. 02 Feb 2022, 2 commits
  12. 07 Jan 2022, 4 commits
    •
      md: use default_groups in kobj_type · 1745e857
      Greg Kroah-Hartman committed
      There are currently two ways to create a set of sysfs files for a
      kobj_type: through the default_attrs field and through the default_groups
      field.  Move the md rdev sysfs code to use the default_groups field, which
      has been the preferred way since commit aa30f47c ("kobject: Add
      support for default attribute groups to kobj_type"), so that we can soon
      get rid of the obsolete default_attrs field.
      
      Cc: Song Liu <song@kernel.org>
      Cc: linux-raid@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Song Liu <song@kernel.org>
      1745e857
    •
      md: Move alloc/free acct bioset in to personality · 0c031fd3
      Xiao Ni committed
      The acct bioset is only needed for raid0 and raid5, so md_run only
      allocates it for raid0 and raid5. However, this does not cover
      personality takeover, which may leave the bioset uninitialized. For
      example, the following repro steps:
      
        mdadm -CR /dev/md0 -l1 -n2 /dev/loop0 /dev/loop1
        mdadm --wait /dev/md0
        mkfs.xfs /dev/md0
        mdadm /dev/md0 --grow -l5
        mount /dev/md0 /mnt
      
      causes panic like:
      
      [  225.933939] BUG: kernel NULL pointer dereference, address: 0000000000000000
      [  225.934903] #PF: supervisor instruction fetch in kernel mode
      [  225.935639] #PF: error_code(0x0010) - not-present page
      [  225.936361] PGD 0 P4D 0
      [  225.936677] Oops: 0010 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI
      [  225.937525] CPU: 27 PID: 1133 Comm: mount Not tainted 5.16.0-rc3+ #706
      [  225.938416] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.module_el8.4.0+547+a85d02ba 04/01/2014
      [  225.939922] RIP: 0010:0x0
      [  225.940289] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
      [  225.941196] RSP: 0018:ffff88815897eff0 EFLAGS: 00010246
      [  225.941897] RAX: 0000000000000000 RBX: 0000000000092800 RCX: ffffffff81370a39
      [  225.942813] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: 0000000000092800
      [  225.943772] RBP: 1ffff1102b12fe04 R08: fffffbfff0b43c01 R09: fffffbfff0b43c01
      [  225.944807] R10: ffffffff85a1e007 R11: fffffbfff0b43c00 R12: ffff88810eaaaf58
      [  225.945757] R13: 0000000000000000 R14: ffff88810eaaafb8 R15: ffff88815897f040
      [  225.946709] FS:  00007ff3f2505080(0000) GS:ffff888fb5e00000(0000) knlGS:0000000000000000
      [  225.947814] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  225.948556] CR2: ffffffffffffffd6 CR3: 000000015aa5a006 CR4: 0000000000370ee0
      [  225.949537] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  225.950455] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  225.951414] Call Trace:
      [  225.951787]  <TASK>
      [  225.952120]  mempool_alloc+0xe5/0x250
      [  225.952625]  ? mempool_resize+0x370/0x370
      [  225.953187]  ? rcu_read_lock_sched_held+0xa1/0xd0
      [  225.953862]  ? rcu_read_lock_bh_held+0xb0/0xb0
      [  225.954464]  ? sched_clock_cpu+0x15/0x120
      [  225.955019]  ? find_held_lock+0xac/0xd0
      [  225.955564]  bio_alloc_bioset+0x1ed/0x2a0
      [  225.956080]  ? lock_downgrade+0x3a0/0x3a0
      [  225.956644]  ? bvec_alloc+0xc0/0xc0
      [  225.957135]  bio_clone_fast+0x19/0x80
      [  225.957651]  raid5_make_request+0x1370/0x1b70
      [  225.958286]  ? sched_clock_cpu+0x15/0x120
      [  225.958797]  ? __lock_acquire+0x8b2/0x3510
      [  225.959339]  ? raid5_get_active_stripe+0xce0/0xce0
      [  225.959986]  ? lock_is_held_type+0xd8/0x130
      [  225.960528]  ? rcu_read_lock_sched_held+0xa1/0xd0
      [  225.961135]  ? rcu_read_lock_bh_held+0xb0/0xb0
      [  225.961703]  ? sched_clock_cpu+0x15/0x120
      [  225.962232]  ? lock_release+0x27a/0x6c0
      [  225.962746]  ? do_wait_intr_irq+0x130/0x130
      [  225.963302]  ? lock_downgrade+0x3a0/0x3a0
      [  225.963815]  ? lock_release+0x6c0/0x6c0
      [  225.964348]  md_handle_request+0x342/0x530
      [  225.964888]  ? set_in_sync+0x170/0x170
      [  225.965397]  ? blk_queue_split+0x133/0x150
      [  225.965988]  ? __blk_queue_split+0x8b0/0x8b0
      [  225.966524]  ? submit_bio_checks+0x3b2/0x9d0
      [  225.967069]  md_submit_bio+0x127/0x1c0
      [...]
      
      Fix this by moving alloc/free of the acct bioset to pers->run and
      pers->free.
      
      While we are at it, properly handle the md_integrity_register() error
      in raid0_run().
      
      Fixes: daee2024 ("md: check level before create and exit io_acct_set")
      Cc: stable@vger.kernel.org
      Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev>
      Signed-off-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: Song Liu <song@kernel.org>
      0c031fd3
    •
      md: fix spelling of "its" · dd3dc5f4
      Randy Dunlap committed
      Use the possessive "its" instead of the contraction "it's"
      in printed messages.
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: linux-raid@vger.kernel.org
      Signed-off-by: Song Liu <song@kernel.org>
      dd3dc5f4
    •
      md: add support for REQ_NOWAIT · f51d46d0
      Vishal Verma committed
      commit 021a2446 ("block: add QUEUE_FLAG_NOWAIT") added support
      for checking whether a given bdev supports handling of REQ_NOWAIT or not.
      Since then, commit 6abc4946 ("dm: add support for REQ_NOWAIT and enable
      it for linear target") added support for REQ_NOWAIT to dm. This patch
      uses a similar approach to incorporate REQ_NOWAIT for md-based bios.
      
      This patch was tested using the t/io_uring tool within FIO. An nvme drive
      was partitioned into 2 partitions and a simple raid0 configuration
      /dev/md0 was created.
      
      md0 : active raid0 nvme4n1p1[1] nvme4n1p2[0]
            937423872 blocks super 1.2 512k chunks
      
      Before patch:
      
      $ ./t/io_uring /dev/md0 -p 0 -a 0 -d 1 -r 100
      
      Running top while the above runs:
      
      $ ps -eL | grep $(pidof io_uring)
      
        38396   38396 pts/2    00:00:00 io_uring
        38396   38397 pts/2    00:00:15 io_uring
        38396   38398 pts/2    00:00:13 iou-wrk-38397
      
      We can see that the iou-wrk-38397 io worker thread was created; such a
      thread gets created when io_uring sees that the underlying device
      (/dev/md0 in this case) doesn't support nowait.
      
      After patch:
      
      $ ./t/io_uring /dev/md0 -p 0 -a 0 -d 1 -r 100
      
      Running top while the above runs:
      
      $ ps -eL | grep $(pidof io_uring)
      
        38341   38341 pts/2    00:10:22 io_uring
        38341   38342 pts/2    00:10:37 io_uring
      
      After running with this patch, we don't see any io worker thread
      being created, which indicates that io_uring saw that the
      underlying device does support nowait. This is the exact behaviour
      observed on a dm device, which also supports nowait.
      
      For all the other raid personalities except raid0, we would need to
      adapt the code paths involving their make_request functions in order
      for them to correctly handle REQ_NOWAIT.
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Vishal Verma <vverma@digitalocean.com>
      Signed-off-by: Song Liu <song@kernel.org>
      f51d46d0
  13. 11 Dec 2021, 2 commits
  14. 29 Nov 2021, 1 commit
  15. 19 Oct 2021, 7 commits
  16. 18 Oct 2021, 3 commits
  17. 22 Sep 2021, 1 commit
    •
      md: fix a lock order reversal in md_alloc · 7df835a3
      Christoph Hellwig committed
      Commit b0140891 ("md: Fix race when creating a new md device.")
      not only moved assigning mddev->gendisk before calling add_disk, which
      fixes the races described in the commit log, but also added a
      mddev->open_mutex critical section over add_disk and the creation of the
      md kobj.  Adding a kobject after add_disk is racy vs deleting the gendisk
      right after adding it, but md already protects against that by holding
      a mddev->active reference.
      
      On the other hand, taking this lock added a lock order reversal with what
      is now disk->open_mutex (which used to be bdev->bd_mutex when the commit
      was added) for partition devices, which need that lock for the internal
      open for the partition scan; a recent commit also takes it for
      non-partitioned devices, leading to further lockdep splatter.
      
      Fixes: b0140891 ("md: Fix race when creating a new md device.")
      Fixes: d6263387 ("block: support delayed holder registration")
      Reported-by: syzbot+fadc0aaf497e6a493b9f@syzkaller.appspotmail.com
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Tested-by: syzbot+fadc0aaf497e6a493b9f@syzkaller.appspotmail.com
      Reviewed-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Song Liu <songliubraving@fb.com>
      7df835a3
  18. 15 Jun 2021, 1 commit