1. 03 Aug, 2022 32 commits
  2. 29 Jul, 2022 8 commits
    • dm: fix dm-raid crash if md_handle_request() splits bio · 9dd1cd32
      Mike Snitzer committed
      Commit ca522482 ("dm: pass NULL bdev to bio_alloc_clone")
      introduced the optimization to _not_ perform bio_associate_blkg()'s
      relatively costly work when DM core clones its bio. But in doing so it
      exposed the possibility for DM's cloned bio to alter DM target
      behavior (e.g. crash) if a target were to issue IO without first
      calling bio_set_dev().
      
      The DM raid target can trigger an MD crash due to its need to split
      the DM bio that is passed to md_handle_request(). The split will
      recurse to submit_bio_noacct() using a bio with an uninitialized
      ->bi_blkg. This NULL bio->bi_blkg causes blk_throtl_bio() to
      dereference a NULL blkg_to_tg(bio->bi_blkg).
      
      Fix this in DM core by adding a new 'needs_bio_set_dev' target flag that
      will make alloc_tio() call bio_set_dev() on behalf of the target.
      dm-raid is the only target that requires this flag. bio_set_dev()
      initializes the DM cloned bio's ->bi_blkg, using bio_associate_blkg,
      before passing the bio to md_handle_request().
      
      A long-term fix would be to audit and refactor MD code to rely on DM to
      split its bio, using dm_accept_partial_bio(), but there are MD raid
      personalities (e.g. raid1 and raid10) whose implementations are tightly
      coupled to handling the bio splitting inline.
      
      Fixes: ca522482 ("dm: pass NULL bdev to bio_alloc_clone")
      Cc: stable@vger.kernel.org
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
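The flag-based fix described in this commit can be sketched in userspace C. This is a minimal model, not the kernel code: only the flag name `needs_bio_set_dev` comes from the commit message, while the struct layouts and the `sketch_*` helpers are hypothetical stand-ins for `alloc_tio()` and `bio_set_dev()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
struct blkg { int id; };

struct bio {
    struct blkg *bi_blkg;   /* stays NULL until a device is set on the bio */
};

struct target {
    bool needs_bio_set_dev; /* flag name taken from the commit message */
};

static struct blkg default_blkg = { 1 };

/* Stand-in for bio_set_dev(): in this model it only initializes ->bi_blkg. */
static void sketch_bio_set_dev(struct bio *bio)
{
    bio->bi_blkg = &default_blkg;
}

/*
 * Stand-in for alloc_tio(): the clone starts without a device, and the
 * comparatively costly association is done only for targets that set
 * the flag.
 */
static struct bio sketch_alloc_tio(const struct target *ti)
{
    struct bio clone = { NULL };

    assert(ti != NULL);
    if (ti->needs_bio_set_dev)
        sketch_bio_set_dev(&clone);
    return clone;
}
```

In this model a target that leaves the flag clear keeps the cheap clone with a NULL `bi_blkg`, while a target that sets it always receives an initialized bio before it issues IO.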
    • dm raid: fix address sanitizer warning in raid_resume · 7dad24db
      Mikulas Patocka committed
      There is a KASAN warning in raid_resume when running the lvm test
      lvconvert-raid.sh. The reason for the warning is that mddev->raid_disks
      is greater than rs->raid_disks, so the loop touches one entry beyond
      the allocated length.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
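The out-of-bounds pattern described here (a loop bounded by a count larger than the allocation) can be illustrated with a small userspace sketch. The commit message does not say how the loop was changed, so clamping to the smaller count is an assumption made for illustration, and `clamp_disks()` and `sum_flags()` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: never iterate past the smaller of the two counts. */
static size_t clamp_disks(size_t mddev_raid_disks, size_t rs_raid_disks)
{
    return mddev_raid_disks < rs_raid_disks ? mddev_raid_disks
                                            : rs_raid_disks;
}

/*
 * dev[] has rs_raid_disks entries. Looping up to mddev_raid_disks when it
 * is larger would read one or more entries past the allocation, so the
 * bound is clamped first.
 */
static int sum_flags(const int *dev, size_t rs_raid_disks,
                     size_t mddev_raid_disks)
{
    size_t n = clamp_disks(mddev_raid_disks, rs_raid_disks);
    int sum = 0;

    assert(dev != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        sum += dev[i];
    return sum;
}
```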
    • dm raid: fix address sanitizer warning in raid_status · 1fbeea21
      Mikulas Patocka committed
      There is this warning when using a kernel with the address sanitizer
      and running this testsuite:
      https://gitlab.com/cki-project/kernel-tests/-/tree/main/storage/swraid/scsi_raid
      
      ==================================================================
      BUG: KASAN: slab-out-of-bounds in raid_status+0x1747/0x2820 [dm_raid]
      Read of size 4 at addr ffff888079d2c7e8 by task lvcreate/13319
      CPU: 0 PID: 13319 Comm: lvcreate Not tainted 5.18.0-0.rc3.<snip> #1
      Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      Call Trace:
       <TASK>
       dump_stack_lvl+0x6a/0x9c
       print_address_description.constprop.0+0x1f/0x1e0
       print_report.cold+0x55/0x244
       kasan_report+0xc9/0x100
       raid_status+0x1747/0x2820 [dm_raid]
       dm_ima_measure_on_table_load+0x4b8/0xca0 [dm_mod]
       table_load+0x35c/0x630 [dm_mod]
       ctl_ioctl+0x411/0x630 [dm_mod]
       dm_ctl_ioctl+0xa/0x10 [dm_mod]
       __x64_sys_ioctl+0x12a/0x1a0
       do_syscall_64+0x5b/0x80
      
      The warning is caused by reading conf->max_nr_stripes in raid_status. The
      code in raid_status reads mddev->private, casts it to struct r5conf and
      reads the entry max_nr_stripes.
      
      However, if we have a different raid type than 4/5/6, mddev->private
      doesn't point to struct r5conf; it may point to struct r0conf, struct
      r1conf, struct r10conf or struct mpconf. If we cast a pointer to one
      of these structs to struct r5conf, we will be reading invalid memory
      and KASAN warns about it.
      
      Fix this bug by reading struct r5conf only if raid type is 4, 5 or 6.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
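The guard described in this commit (only treat the private pointer as struct r5conf for raid 4, 5 or 6) can be sketched as follows. The `*_sketch` types are hypothetical simplifications of the kernel structs, and returning 0 for other raid levels is an assumption of this sketch, not something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct r5conf_sketch { int max_nr_stripes; };

struct mddev_sketch {
    int level;      /* raid level: 0, 1, 4, 5, 6, 10, ... */
    void *private;  /* points to a per-personality conf struct */
};

static bool level_is_raid456(int level)
{
    return level == 4 || level == 5 || level == 6;
}

/*
 * Read max_nr_stripes only when the level says private really is an
 * r5conf. For any other level the pointer is never dereferenced.
 */
static int stripe_cache_size(const struct mddev_sketch *mddev)
{
    assert(mddev != NULL);
    if (!level_is_raid456(mddev->level) || mddev->private == NULL)
        return 0; /* assumed fallback value for this sketch */
    return ((const struct r5conf_sketch *)mddev->private)->max_nr_stripes;
}
```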
    • dm: Start pr_preempt from the same starting path · c6adada5
      Mike Christie committed
      pr_preempt has an issue similar to reserve: for all reservation types
      except the All Registrants ones, the preempt can create a reservation,
      and a follow-up reservation or release needs to go down the same path
      the preempt did. This commit makes pr_preempt work like reserve and
      release, where we always start from the first path in the first group.
      
      This commit has been tested with Windows failover clustering's
      validation test and libiscsi's PGR tests to check for regressions.
      Neither has a test that verifies this case, so I tested it
      manually.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
    • dm: Fix PR release handling for non All Registrants · 08a3c338
      Mike Christie committed
      This commit fixes a bug where we are leaving the reservation in place
      even though pr_release has run and returned success.
      
      If we have a Write Exclusive, Exclusive Access, or Write/Exclusive
      Registrants only reservation, the release must be sent down the path
      that is the reservation holder. The problem is multipath_prepare_ioctl
      most likely selected path N for the reservation, then later when we do
      the release multipath_prepare_ioctl will select a completely different
      path. The device will then return success because the nvme and scsi
      specs say to return success if there is no reservation or if the
      release is sent down from a path that is not the holder. We then think
      we have released the reservation.
      
      This commit has us loop over each path and send a release so we can
      make sure the release is executed on the correct path. It has been
      tested with Windows failover clustering's validation test which checks
      this case, and it has been tested manually (the libiscsi PGR tests
      don't have a test case for this yet, but I will be adding one).
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
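The release problem and the loop-over-every-path approach described in this commit can be modeled with a toy device. This is a sketch under stated assumptions: `device_sketch`, `send_release()` and `release_on_all_paths()` are hypothetical names, and the model encodes only the behaviour the commit message attributes to the specs (a release sent on a non-holder path reports success without releasing anything).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device model: the index of the path that holds the
 * reservation, or -1 when no reservation is held. */
struct device_sketch { int holder_path; };

/*
 * Models the behaviour described above: a release always reports
 * success (0), but only clears the reservation when it arrives on the
 * holder's path.
 */
static int send_release(struct device_sketch *dev, int path)
{
    assert(dev != NULL);
    if (dev->holder_path == path)
        dev->holder_path = -1;
    return 0;
}

/* Send the release down every path so the holder's path is covered. */
static int release_on_all_paths(struct device_sketch *dev, int nr_paths)
{
    int ret = 0;

    for (int path = 0; path < nr_paths; path++) {
        int r = send_release(dev, path);

        if (r != 0 && ret == 0)
            ret = r; /* remember the first error, keep trying other paths */
    }
    return ret;
}
```

A single release on the wrong path returns 0 yet leaves `holder_path` unchanged, which is the false success the commit describes; iterating over all paths removes the dependence on which path was selected.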
    • dm: Start pr_reserve from the same starting path · 70151087
      Mike Christie committed
      When an app does a pr_reserve it will go to whatever path we happen to
      be using at the time. This can result in errors when the app does a
      second pr_reserve call and expects success but gets a failure because
      the reserve is not done on the holder's path. This commit has us
      always start trying to do reserves from the first path in the first
      group.
      
      Windows failover clustering will produce the type of pattern above.
      With this commit, we will now pass its validation test for this case.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
    • dm: Allow dm_call_pr to be used for path searches · 8dd87f3c
      Mike Christie committed
      The specs state that if you send a reserve down a path that is already
      the holder, success must be returned, and if it goes down a path that
      is not the holder, reservation conflict must be returned. Windows
      failover clustering will send a second reservation and expects the
      device to return success. The problem for multipathing is that for an
      All Registrants reservation we can send the reserve down any path, but
      for all other reservation types there is one path that is the holder.
      
      To handle this we could add PR state to dm but that can get nasty.
      Look at target_core_pr.c for an example of the type of things we'd
      have to track. It will also get more complicated because other
      initiators can change the state so we will have to add in async
      event/sense handling.
      
      This commit, and the 3 commits that follow, try to keep dm simple
      and keep just doing passthrough. This commit modifies dm_call_pr to be
      able to find the first usable path that can execute our pr_op and then
      return. When dm_pr_reserve is converted to dm_call_pr in the next
      commit, for the normal case we will use the same path for every
      reserve.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
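The "first usable path" search described in this commit can be sketched as a deterministic scan. This is an illustrative model only: `path_sketch` and `first_usable_path()` are hypothetical, and the real dm_call_pr iterates over the table's devices rather than a plain array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical path model: whether the path can currently execute the op. */
struct path_sketch { bool usable; };

/*
 * Scan the paths in a fixed order and stop at the first usable one.
 * Returns its index, or -1 if no path is usable. Because the scan
 * always starts at index 0, repeated calls pick the same path as long
 * as path state does not change.
 */
static int first_usable_path(const struct path_sketch *paths,
                             size_t nr_paths)
{
    assert(paths != NULL || nr_paths == 0);
    for (size_t i = 0; i < nr_paths; i++) {
        if (paths[i].usable)
            return (int)i;
    }
    return -1;
}
```

Starting every search from the same place is what lets a second reserve land on the path that already holds the reservation, instead of on whichever path happens to be in use at the time.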
    • dm: return early from dm_pr_call() if DM device is suspended · e120a5f1
      Mike Snitzer committed
      Otherwise PR ops may be issued while the broader DM device is being
      reconfigured, etc.
      
      Fixes: 9c72bad1 ("dm: call PR reserve/unreserve on each underlying device")
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>