1. 04 Feb 2022, 12 commits
  2. 02 Feb 2022, 9 commits
  3. 29 Jan 2022, 2 commits
  4. 07 Jan 2022, 9 commits
    • md: use default_groups in kobj_type · 1745e857
      Greg Kroah-Hartman committed
      There are currently two ways to create a set of sysfs files for a
      kobj_type: through the default_attrs field and through the
      default_groups field.  Move the md rdev sysfs code over to the
      default_groups field, which has been the preferred way since commit
      aa30f47c ("kobject: Add support for default attribute groups to
      kobj_type"), so that the obsolete default_attrs field can soon be
      removed.
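
      For context, the preferred pattern looks roughly like this (a minimal
      sketch rather than the actual md diff; the identifier names are
      illustrative, and ATTRIBUTE_GROUPS() generates the *_groups array from
      the matching *_attrs list):

        /* Sketch: expose a kobj_type's attributes through default_groups. */
        static struct attribute *rdev_default_attrs[] = {
                &rdev_state.attr,               /* existing rdev sysfs entries */
                NULL,
        };
        ATTRIBUTE_GROUPS(rdev_default);         /* emits rdev_default_groups[] */

        static struct kobj_type rdev_ktype = {
                .release        = rdev_free,
                .sysfs_ops      = &rdev_sysfs_ops,
                .default_groups = rdev_default_groups,  /* was .default_attrs */
        };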
      
      Cc: Song Liu <song@kernel.org>
      Cc: linux-raid@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Song Liu <song@kernel.org>
    • md: Move alloc/free acct bioset in to personality · 0c031fd3
      Xiao Ni committed
      The bioset used for I/O accounting (acct) is only needed for raid0 and
      raid5, so md_run() only allocates it for those levels. However, this
      does not cover personality takeover, which can leave the bioset
      uninitialized. For example, the following repro steps:
      
        mdadm -CR /dev/md0 -l1 -n2 /dev/loop0 /dev/loop1
        mdadm --wait /dev/md0
        mkfs.xfs /dev/md0
        mdadm /dev/md0 --grow -l5
        mount /dev/md0 /mnt
      
      cause a panic like:
      
      [  225.933939] BUG: kernel NULL pointer dereference, address: 0000000000000000
      [  225.934903] #PF: supervisor instruction fetch in kernel mode
      [  225.935639] #PF: error_code(0x0010) - not-present page
      [  225.936361] PGD 0 P4D 0
      [  225.936677] Oops: 0010 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN PTI
      [  225.937525] CPU: 27 PID: 1133 Comm: mount Not tainted 5.16.0-rc3+ #706
      [  225.938416] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-2.module_el8.4.0+547+a85d02ba 04/01/2014
      [  225.939922] RIP: 0010:0x0
      [  225.940289] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
      [  225.941196] RSP: 0018:ffff88815897eff0 EFLAGS: 00010246
      [  225.941897] RAX: 0000000000000000 RBX: 0000000000092800 RCX: ffffffff81370a39
      [  225.942813] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: 0000000000092800
      [  225.943772] RBP: 1ffff1102b12fe04 R08: fffffbfff0b43c01 R09: fffffbfff0b43c01
      [  225.944807] R10: ffffffff85a1e007 R11: fffffbfff0b43c00 R12: ffff88810eaaaf58
      [  225.945757] R13: 0000000000000000 R14: ffff88810eaaafb8 R15: ffff88815897f040
      [  225.946709] FS:  00007ff3f2505080(0000) GS:ffff888fb5e00000(0000) knlGS:0000000000000000
      [  225.947814] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  225.948556] CR2: ffffffffffffffd6 CR3: 000000015aa5a006 CR4: 0000000000370ee0
      [  225.949537] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  225.950455] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  225.951414] Call Trace:
      [  225.951787]  <TASK>
      [  225.952120]  mempool_alloc+0xe5/0x250
      [  225.952625]  ? mempool_resize+0x370/0x370
      [  225.953187]  ? rcu_read_lock_sched_held+0xa1/0xd0
      [  225.953862]  ? rcu_read_lock_bh_held+0xb0/0xb0
      [  225.954464]  ? sched_clock_cpu+0x15/0x120
      [  225.955019]  ? find_held_lock+0xac/0xd0
      [  225.955564]  bio_alloc_bioset+0x1ed/0x2a0
      [  225.956080]  ? lock_downgrade+0x3a0/0x3a0
      [  225.956644]  ? bvec_alloc+0xc0/0xc0
      [  225.957135]  bio_clone_fast+0x19/0x80
      [  225.957651]  raid5_make_request+0x1370/0x1b70
      [  225.958286]  ? sched_clock_cpu+0x15/0x120
      [  225.958797]  ? __lock_acquire+0x8b2/0x3510
      [  225.959339]  ? raid5_get_active_stripe+0xce0/0xce0
      [  225.959986]  ? lock_is_held_type+0xd8/0x130
      [  225.960528]  ? rcu_read_lock_sched_held+0xa1/0xd0
      [  225.961135]  ? rcu_read_lock_bh_held+0xb0/0xb0
      [  225.961703]  ? sched_clock_cpu+0x15/0x120
      [  225.962232]  ? lock_release+0x27a/0x6c0
      [  225.962746]  ? do_wait_intr_irq+0x130/0x130
      [  225.963302]  ? lock_downgrade+0x3a0/0x3a0
      [  225.963815]  ? lock_release+0x6c0/0x6c0
      [  225.964348]  md_handle_request+0x342/0x530
      [  225.964888]  ? set_in_sync+0x170/0x170
      [  225.965397]  ? blk_queue_split+0x133/0x150
      [  225.965988]  ? __blk_queue_split+0x8b0/0x8b0
      [  225.966524]  ? submit_bio_checks+0x3b2/0x9d0
      [  225.967069]  md_submit_bio+0x127/0x1c0
      [...]
      
      Fix this by moving the alloc/free of the acct bioset into pers->run and
      pers->free.

      While we are at it, properly handle the md_integrity_register() error in
      raid0_run().
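
      A rough sketch of the resulting shape (the acct_bioset_init() and
      acct_bioset_exit() helper names are assumed here for illustration only):

        /* Sketch: the personality, not md_run(), owns the accounting bioset. */
        static int raid0_run(struct mddev *mddev)
        {
                int ret;

                ret = acct_bioset_init(mddev);  /* assumed helper around bioset_init() */
                if (ret)
                        return ret;

                /* ... existing raid0 setup ... */

                ret = md_integrity_register(mddev);  /* now checked, not ignored */
                if (ret)
                        acct_bioset_exit(mddev);
                return ret;
        }

        static void raid0_free(struct mddev *mddev, void *priv)
        {
                /* ... existing teardown ... */
                acct_bioset_exit(mddev);
        }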
      
      Fixes: daee2024 ("md: check level before create and exit io_acct_set")
      Cc: stable@vger.kernel.org
      Acked-by: Guoqing Jiang <guoqing.jiang@linux.dev>
      Signed-off-by: Xiao Ni <xni@redhat.com>
      Signed-off-by: Song Liu <song@kernel.org>
    • md: fix spelling of "its" · dd3dc5f4
      Randy Dunlap committed
      Use the possessive "its" instead of the contraction "it's"
      in printed messages.
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Song Liu <song@kernel.org>
      Cc: linux-raid@vger.kernel.org
      Signed-off-by: Song Liu <song@kernel.org>
    • md: raid456 add nowait support · bf2c411b
      Vishal Verma committed
      Return EAGAIN when the raid456 driver would otherwise block waiting for
      a reshape.
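
      The general shape of such a check, as a sketch (not the exact raid5.c
      hunk; bio_wouldblock_error() completes the bio with BLK_STS_AGAIN):

        /* Sketch: fail fast instead of sleeping when REQ_NOWAIT is set. */
        if (bio->bi_opf & REQ_NOWAIT) {
                bio_wouldblock_error(bio);      /* ends the bio with BLK_STS_AGAIN */
                return true;                    /* bio handled, nothing queued */
        }
        /* otherwise fall through to the existing wait for the reshape */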
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Vishal Verma <vverma@digitalocean.com>
      Signed-off-by: Song Liu <song@kernel.org>
    • md: raid10 add nowait support · c9aa889b
      Vishal Verma committed
      This adds nowait support to the RAID10 driver, very similar to the
      raid1 driver changes. It makes the RAID10 driver return EAGAIN in
      situations where it could otherwise wait, for example:
      
        - Waiting for the barrier,
        - Reshape operation,
        - Discard operation.
      
      wait_barrier() and regular_request_wait() are modified to return bool so
      that barrier waits can report an error. They return true if the wait
      completed or no wait was required, and false if a wait was required but
      was skipped in order to honor nowait.
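
      In outline, the new contract looks like this (a sketch with an
      illustrative barrier_is_raised() helper, not the exact raid10.c code):

        /* Sketch: wait_barrier() now reports whether the caller may proceed. */
        static bool wait_barrier(struct r10conf *conf, bool nowait)
        {
                if (barrier_is_raised(conf)) {  /* illustrative condition */
                        if (nowait)
                                return false;   /* would block: caller returns EAGAIN */
                        wait_event(conf->wait_barrier, !barrier_is_raised(conf));
                }
                /* ... pending-I/O bookkeeping elided ... */
                return true;
        }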
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Vishal Verma <vverma@digitalocean.com>
      Signed-off-by: Song Liu <song@kernel.org>
    • md: raid1 add nowait support · 5aa70503
      Vishal Verma committed
      This adds nowait support to the RAID1 driver. It makes the RAID1 driver
      return EAGAIN in situations where it could otherwise wait, for example:
        - Waiting for the barrier,
      
      wait_barrier() is modified to return bool so that barrier waits can
      report an error. It returns true if the wait completed or no wait was
      required, and false if a wait was required but was skipped in order to
      honor nowait.
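
      On the caller side the pattern is then roughly as follows (a sketch;
      compare the raid10 wait_barrier() sketch above):

        /* Sketch: a write path bails out when the barrier wait was skipped. */
        if (!wait_barrier(conf, bio->bi_opf & REQ_NOWAIT)) {
                bio_wouldblock_error(bio);      /* ends the bio with BLK_STS_AGAIN */
                return true;
        }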
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Vishal Verma <vverma@digitalocean.com>
      Signed-off-by: Song Liu <song@kernel.org>
    • md: add support for REQ_NOWAIT · f51d46d0
      Vishal Verma committed
      Commit 021a2446 ("block: add QUEUE_FLAG_NOWAIT") added support for
      checking whether a given bdev supports handling of REQ_NOWAIT. Commit
      6abc4946 ("dm: add support for REQ_NOWAIT and enable it for linear
      target") then added REQ_NOWAIT support for dm. This patch uses a similar
      approach to incorporate REQ_NOWAIT for md-based bios.
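
      For a bio-based driver this largely amounts to advertising the
      capability on the queue and failing fast in any path that would
      otherwise sleep (a sketch; the exact placement inside md is not shown,
      and would_block is an illustrative placeholder):

        /* Sketch: advertise nowait support when the md queue is set up ... */
        blk_queue_flag_set(QUEUE_FLAG_NOWAIT, mddev->queue);

        /* ... and honor it wherever the code would otherwise block: */
        if ((bio->bi_opf & REQ_NOWAIT) && would_block) {
                bio_wouldblock_error(bio);
                return;
        }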
      
      This patch was tested using the t/io_uring tool shipped with fio. An
      NVMe drive was partitioned into two partitions and a simple RAID 0
      array /dev/md0 was created.
      
      md0 : active raid0 nvme4n1p1[1] nvme4n1p2[0]
            937423872 blocks super 1.2 512k chunks
      
      Before patch:
      
      $ ./t/io_uring /dev/md0 -p 0 -a 0 -d 1 -r 100
      
      While the above runs, in another terminal:
      
      $ ps -eL | grep $(pidof io_uring)
      
        38396   38396 pts/2    00:00:00 io_uring
        38396   38397 pts/2    00:00:15 io_uring
        38396   38398 pts/2    00:00:13 iou-wrk-38397
      
      We can see that an io worker thread (iou-wrk-38397) was created;
      io_uring creates one when it sees that the underlying device (/dev/md0
      in this case) doesn't support nowait.
      
      After patch:
      
      $ ./t/io_uring /dev/md0 -p 0 -a 0 -d 1 -r 100
      
      While the above runs, in another terminal:
      
      $ ps -eL | grep $(pidof io_uring)
      
        38341   38341 pts/2    00:10:22 io_uring
        38341   38342 pts/2    00:10:37 io_uring
      
      After this patch, no io worker thread is created, which indicates that
      io_uring saw that the underlying device does support nowait. This is
      exactly the behaviour observed on a dm device, which also supports
      nowait.

      For the raid personalities other than raid0, the code paths involving
      their make_request functions would also need to be taught to handle
      REQ_NOWAIT correctly.
      Reviewed-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Vishal Verma <vverma@digitalocean.com>
      Signed-off-by: Song Liu <song@kernel.org>
    • md: drop queue limitation for RAID1 and RAID10 · a92ce0fe
      Mariusz Tkaczyk committed
      As suggested by Neil Brown [1], this limitation appears to be obsolete.

      With plugging in use, writes are processed behind the raid thread and
      conf->pending_count is not increased; the limitation only takes effect
      when the caller doesn't use plugs.

      It can be avoided, and with plugging it usually is. There are no reports
      of the queue growing to an enormous size, so remove the queue limitation
      for non-plugged IOs too.
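
      For reference, the kind of throttle being removed looks roughly like
      this (an illustrative sketch of the non-plugged write path, not the
      exact hunk):

        /* Sketch: previously, a non-plugged write could be held back here. */
        if (conf->pending_count >= max_queued_requests) {
                md_wakeup_thread(mddev->thread);
                wait_event(conf->wait_barrier,
                           conf->pending_count < max_queued_requests);
        }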
      
      [1] https://lore.kernel.org/linux-raid/162496301481.7211.18031090130574610495@noble.neil.brown.name
      Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
      Signed-off-by: Song Liu <song@kernel.org>
    • md/raid5: play nice with PREEMPT_RT · 770b1d21
      Davidlohr Bueso committed
      raid_run_ops() relies on implicitly disabled preemption for its percpu
      ops, although what it really needs is CPU locality. This breaks RT
      semantics, as the code can take regular (and thus sleeping) spinlocks,
      such as stripe_lock.

      Add a local_lock such that non-RT behaviour does not change and
      continues to map to preempt_disable/enable, while RT becomes happy
      because the region will use a per-CPU spinlock and thus be preemptible
      while still guaranteeing CPU locality.
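
      The local_lock pattern itself, in generic form (illustrative structure
      and field names, not the actual raid5_percpu layout):

        #include <linux/local_lock.h>
        #include <linux/percpu.h>

        struct scratch {
                local_lock_t lock;
                int buf;
        };
        static DEFINE_PER_CPU(struct scratch, scratch) = {
                .lock = INIT_LOCAL_LOCK(lock),
        };

        static void use_scratch(void)
        {
                /* !RT: behaves like preempt_disable()/preempt_enable().
                 * RT:  takes a per-CPU spinlock, so the section stays
                 *      preemptible while still guaranteeing CPU locality. */
                local_lock(&scratch.lock);
                this_cpu_add(scratch.buf, 1);
                local_unlock(&scratch.lock);
        }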
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Signed-off-by: Song Liu <songliubraving@fb.com>
  5. 06 Jan 2022, 2 commits
  6. 05 Jan 2022, 5 commits
  7. 04 Jan 2022, 1 commit
    • md/raid1: fix missing bitmap update w/o WriteMostly devices · 46669e86
      Song Liu committed
      Commit [1] causes missing bitmap updates when there aren't any
      WriteMostly devices.

      Detailed steps to reproduce, from Norbert (which somehow didn't make it
      to lore):
      
         # setup md10 (raid1) with two drives (1 GByte sparse files)
         dd if=/dev/zero of=disk1 bs=1024k seek=1024 count=0
         dd if=/dev/zero of=disk2 bs=1024k seek=1024 count=0
      
         losetup /dev/loop11 disk1
         losetup /dev/loop12 disk2
      
         mdadm --create /dev/md10 --level=1 --raid-devices=2 /dev/loop11 /dev/loop12
      
         # add bitmap (aka write-intent log)
         mdadm /dev/md10 --grow --bitmap=internal
      
         echo check > /sys/block/md10/md/sync_action
      
         root:# cat /sys/block/md10/md/mismatch_cnt
         0
         root:#
      
         # remove member drive disk2 (loop12)
         mdadm /dev/md10 -f loop12 ; mdadm /dev/md10 -r loop12
      
         # modify degraded md device
         dd if=/dev/urandom of=/dev/md10 bs=512 count=1
      
         # no blocks recorded as out of sync on the remaining member disk1/loop11
         root:# mdadm -X /dev/loop11 | grep Bitmap
                   Bitmap : 16 bits (chunks), 0 dirty (0.0%)
         root:#
      
         # re-add disk2, nothing synced because of empty bitmap
         mdadm /dev/md10 --re-add /dev/loop12
      
         # check integrity again
         echo check > /sys/block/md10/md/sync_action
      
         # disk1 and disk2 are no longer in sync, reads return different data
         root:# cat /sys/block/md10/md/mismatch_cnt
         128
         root:#
      
         # clean up
         mdadm -S /dev/md10
         losetup -d /dev/loop11
         losetup -d /dev/loop12
         rm disk1 disk2
      
      Fix this by moving the WriteMostly check to the if condition for
      alloc_behind_master_bio().
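
      In condensed form, the fixed flow looks roughly like this (a simplified
      sketch of the first_clone block in raid1_write_request(), not the
      literal hunk; write_behind is the flag introduced by [1] that records
      whether any WriteMostly device is present):

        /* Sketch: only the write-behind allocation depends on WriteMostly;
         * the bitmap write-intent update happens for every first clone. */
        if (first_clone) {
                if (bitmap && write_behind &&
                    atomic_read(&bitmap->behind_writes) <
                            mddev->bitmap_info.max_write_behind &&
                    !waitqueue_active(&bitmap->behind_wait))
                        alloc_behind_master_bio(r1_bio, bio);

                md_bitmap_startwrite(bitmap, r1_bio->sector, r1_bio->sectors,
                                     test_bit(R1BIO_BehindIO, &r1_bio->state));
                first_clone = 0;
        }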
      
      [1] commit fd3b6975 ("md/raid1: only allocate write behind bio for WriteMostly device")
      Fixes: fd3b6975 ("md/raid1: only allocate write behind bio for WriteMostly device")
      Cc: stable@vger.kernel.org # v5.12+
      Cc: Guoqing Jiang <guoqing.jiang@linux.dev>
      Cc: Jens Axboe <axboe@kernel.dk>
      Reported-by: Norbert Warmuth <nwarmuth@t-online.de>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Song Liu <song@kernel.org>