1. 29 Jul 2022 (2 commits)
    • dm: Allow dm_call_pr to be used for path searches · 8dd87f3c
      Mike Christie authored
      The specs state that if you send a reserve down a path that is already
      the holder, success must be returned, and if it goes down a path that
      is not the holder, a reservation conflict must be returned. Windows
      failover clustering will send a second reservation and expects that the
      device returns success. The problem for multipathing is that for an
      All Registrants reservation we can send the reserve down any path, but
      for all other reservation types there is one path that is the holder.
      
      To handle this we could add PR state to dm but that can get nasty.
      Look at target_core_pr.c for an example of the type of things we'd
      have to track. It will also get more complicated because other
      initiators can change the state so we will have to add in async
      event/sense handling.
      
      This commit, and the 3 commits that follow, try to keep dm simple
      and continue just doing passthrough. This commit modifies dm_call_pr
      so that it can find the first usable path that can execute our pr_op
      and then return. When dm_pr_reserve is converted to dm_call_pr in the
      next commit, for the normal case we will use the same path for every
      reserve.
      Signed-off-by: Mike Christie <michael.christie@oracle.com>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
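      The path-search behavior this commit describes can be sketched in
      plain userspace C. This is an illustrative model only: the names
      first_usable_path and demo_op, and the -1 "path unusable" convention,
      are invented here and are not the kernel's dm_call_pr API. The idea
      is simply to iterate over the underlying paths and return from the
      first one that can execute the PR operation.

      ```c
      #include <stddef.h>

      typedef int (*pr_op_fn)(int path_id, void *data);

      /* Try the op on each path in order; return the id of the first path
       * that can execute it (storing the op's result in *ret), or -1 if
       * no path is usable. A -1 from the op means "try the next path". */
      int first_usable_path(const int *paths, size_t npaths,
                            pr_op_fn op, void *data, int *ret)
      {
          for (size_t i = 0; i < npaths; i++) {
              int r = op(paths[i], data);
              if (r != -1) {
                  *ret = r;
                  return paths[i];
              }
          }
          return -1;
      }

      /* Example op: only path 3 (say, the reservation holder) succeeds. */
      int demo_op(int path_id, void *data)
      {
          (void)data;
          return path_id == 3 ? 0 : -1;
      }
      ```

      In this model, the "same path for every reserve" property from the
      next commit falls out naturally: as long as the path list and PR
      state don't change, the search always stops at the same path.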
    • dm: return early from dm_pr_call() if DM device is suspended · e120a5f1
      Mike Snitzer authored
      Otherwise PR ops may be issued while the broader DM device is being
      reconfigured, etc.
      
      Fixes: 9c72bad1 ("dm: call PR reserve/unreserve on each underlying device")
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
  2. 07 Jul 2022 (3 commits)
    • dm table: audit all dm_table_get_target() callers · 564b5c54
      Mike Snitzer authored
      All callers of dm_table_get_target() are expected to do proper bounds
      checking on the index they pass.
      
      Move dm_table_get_target() to dm-core.h to make it extra clear that only
      DM core code should be using it. Switch it to be inlined while at it.
      
      Standardize all DM core callers to use the same for loop pattern and
      make associated variables as local as possible. Rename some variables
      (e.g. s/table/t/ and s/tgt/ti/) along the way.
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
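      The standardized caller pattern mentioned above might look roughly
      like this userspace sketch. The struct layouts and the accessor are
      simplified stand-ins, not the actual dm-core.h definitions; only the
      shape of the loop (inline accessor, caller-side bounds via
      num_targets, short local names) mirrors the commit.

      ```c
      struct target { int id; };
      struct table { unsigned num_targets; struct target targets[4]; };

      /* Bounds checking is the caller's job, as the commit message notes. */
      static inline struct target *table_get_target(struct table *t, unsigned i)
      {
          return &t->targets[i];
      }

      /* The standardized for-loop caller pattern, with 't' and 'ti' kept
       * as local as possible (mirroring the s/table/t/ and s/tgt/ti/
       * renames described above). */
      int sum_target_ids(struct table *t)
      {
          int sum = 0;
          for (unsigned i = 0; i < t->num_targets; i++) {
              struct target *ti = table_get_target(t, i);
              sum += ti->id;
          }
          return sum;
      }
      ```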
    • dm table: remove dm_table_get_num_targets() wrapper · 2aec377a
      Mike Snitzer authored
      More efficient and readable to just access table->num_targets directly.
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
    • dm: add two stage requeue mechanism · 8b211aac
      Ming Lei authored
      Commit 61b6e2e5 ("dm: fix BLK_STS_DM_REQUEUE handling when dm_io
      represents split bio") reverted DM core's bio splitting back to using
      bio_split()+bio_chain() because it was found that otherwise DM's
      BLK_STS_DM_REQUEUE would trigger a live-lock waiting for bio
      completion that would never occur.
      
      Restore using bio_trim()+bio_inc_remaining(), like was done in commit
      7dd76d1f ("dm: improve bio splitting and associated IO
      accounting"), but this time with proper handling for the above
      scenario that is covered in more detail in the commit header for
      61b6e2e5.
      
      Solve this issue by adding a two staged dm_io requeue mechanism that
      uses the new dm_bio_rewind() via dm_io_rewind():
      
      1) requeue the dm_io into the requeue_list added to struct
         mapped_device, and schedule it via new added requeue work. This
         workqueue just clones the dm_io->orig_bio (which DM saves and
         ensures its end sector isn't modified). dm_io_rewind() uses the
         sectors and sectors_offset members of the dm_io that are recorded
         relative to the end of orig_bio: dm_bio_rewind()+bio_trim() are
         then used to make that cloned bio reflect the subset of the
         original bio that is represented by the dm_io that is being
         requeued.
      
      2) the second-stage requeue is the same as the original requeue,
         except that io->orig_bio now points to the new cloned bio (which
         matches the requeued dm_io as described above).
      
      This allows DM core to shift the need for bio cloning from bio-split
      time (during IO submission) to the less likely BLK_STS_DM_REQUEUE
      handling (after IO completes with that error).
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
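      The two-stage shape of this mechanism can be sketched in userspace C.
      Everything below is an illustrative stand-in: the real code uses
      struct mapped_device's requeue_list, a workqueue, and
      dm_io_rewind(), none of which appear here. Stage 1 only defers the
      io onto a list; stage 2 (the "work function") drains the list and
      re-submits each entry.

      ```c
      #include <stddef.h>

      struct io {
          int id;
          struct io *next;
      };

      struct requeue_ctx {
          struct io *head;    /* stage 1: deferred ios land here */
      };

      /* Stage 1: defer the io instead of completing it with an error. */
      void requeue_stage1(struct requeue_ctx *ctx, struct io *io)
      {
          io->next = ctx->head;
          ctx->head = io;
      }

      /* Stage 2: drain the list and re-submit each io via the supplied
       * callback (standing in for the rewind-and-resubmit work item).
       * Returns how many ios were re-submitted. */
      int requeue_stage2(struct requeue_ctx *ctx, void (*resubmit)(struct io *))
      {
          int n = 0;
          while (ctx->head) {
              struct io *io = ctx->head;
              ctx->head = io->next;
              resubmit(io);
              n++;
          }
          return n;
      }

      /* Example resubmit hook that just counts invocations. */
      static int resubmitted;
      static void demo_resubmit(struct io *io) { (void)io; resubmitted++; }
      ```

      Splitting the work this way is what lets the expensive part (the
      clone) happen only on the rare BLK_STS_DM_REQUEUE path rather than on
      every submission.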
  3. 30 Jun 2022 (2 commits)
  4. 28 Jun 2022 (1 commit)
  5. 27 Jun 2022 (1 commit)
  6. 24 Jun 2022 (1 commit)
    • dm: fix BLK_STS_DM_REQUEUE handling when dm_io represents split bio · 61b6e2e5
      Ming Lei authored
      Commit 7dd76d1f ("dm: improve bio splitting and associated IO
      accounting") removed using cloned bio when dm io splitting is needed.
      Using bio_trim()+bio_inc_remaining() rather than bio_split()+bio_chain()
      causes multiple dm_io instances to share the same original bio, and it
      works fine if IOs are completed successfully.
      
      But a regression was caused for the case when BLK_STS_DM_REQUEUE is
      returned from any one of DM's cloned bios (whose dm_io share the same
      orig_bio). In this BLK_STS_DM_REQUEUE case only the mapped subset of
      the original bio for the current exact dm_io needs to be re-submitted.
      However, since the original bio is shared among all dm_io instances,
      the ->orig_bio actually only represents the last dm_io instance, so
      requeue can't work as expected. Also, when more than one dm_io is
      requeued, the same original bio is requeued from each dm_io's
      completion handler, which causes a race.
      
      Fix this issue by still allocating one clone bio for completing io
      only, then io accounting can rely on ->orig_bio being unmodified. This
      is needed because the dm_io's sector_offset and sectors members are
      recorded relative to an unmodified ->orig_bio.
      
      In the future, we can go back to using bio_trim()+bio_inc_remaining()
      for dm's io splitting and delay needing a bio clone until
      BLK_STS_DM_REQUEUE is handled, but that approach is a bit complicated
      (so it needs a development cycle):
      1) bio clone needs to be done in task context
      2) a block interface for unwinding bio is required
      
      Fixes: 7dd76d1f ("dm: improve bio splitting and associated IO accounting")
      Reported-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
  7. 22 Jun 2022 (1 commit)
    • dm: do not return early from dm_io_complete if BLK_STS_AGAIN without polling · 78ccef91
      Mike Snitzer authored
      Commit 52919840 ("dm: fix bio polling to handle possibile
      BLK_STS_AGAIN") inadvertently introduced an early return from
      dm_io_complete() without first queueing the bio to DM if BLK_STS_AGAIN
      occurs and bio-polling is _not_ being used.
      
      Fix this by only returning early from dm_io_complete() if the bio has
      first been properly queued to DM. Otherwise, the bio will never finish
      via bio_endio.
      
      Fixes: 52919840 ("dm: fix bio polling to handle possibile BLK_STS_AGAIN")
      Cc: stable@vger.kernel.org
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
  8. 17 Jun 2022 (2 commits)
    • dm: fix narrow race for REQ_NOWAIT bios being issued despite no support · 1ee88de3
      Mikulas Patocka authored
      Starting with the commit 63a225c9fd20, device mapper has an
      optimization: it takes a cheaper table lock (dm_get_live_table_fast
      instead of dm_get_live_table) if the bio has REQ_NOWAIT. Bios with
      REQ_NOWAIT must not block in the target request routine; if they did,
      we would be blocking while holding rcu_read_lock, which is prohibited.
      
      The targets that are suitable for REQ_NOWAIT optimization (and that don't
      block in the map routine) have the flag DM_TARGET_NOWAIT set. Device
      mapper will test if all the targets and all the devices in a table
      support nowait (see the function dm_table_supports_nowait) and it will set
      or clear the QUEUE_FLAG_NOWAIT flag on its request queue according to
      this check.
      
      There's a test in submit_bio_noacct: "if ((bio->bi_opf & REQ_NOWAIT) &&
      !blk_queue_nowait(q)) goto not_supported" - this will make sure that
      REQ_NOWAIT bios can't enter a request queue that doesn't support them.
      
      This mechanism works to prevent REQ_NOWAIT bios from reaching dm targets
      that don't support the REQ_NOWAIT flag (and that may block in the map
      routine) - except that there is a small race condition:
      
      submit_bio_noacct checks if the queue has the QUEUE_FLAG_NOWAIT without
      holding any locks. Immediately after this check, the device mapper table
      may be reloaded with a table that doesn't support REQ_NOWAIT (for example,
      if we start moving the logical volume or if we activate a snapshot).
      However the REQ_NOWAIT bio that already passed the check in
      submit_bio_noacct would be sent to device mapper, where it could be
      redirected to a dm target that doesn't support REQ_NOWAIT - the result is
      sleeping while we hold rcu_read_lock.
      
      In order to fix this race, we double-check if the target supports
      REQ_NOWAIT while we hold the table lock (so that the table can't change
      under us).
      
      Fixes: 563a225c ("dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio")
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
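      The "re-check under the lock" pattern this fix uses can be sketched
      in userspace C. This is a hypothetical model, not the kernel code:
      saw_support stands for what the unlocked submit_bio_noacct check
      observed, the comment stands in for holding the live-table
      reference, and the -1/-2 return codes are illustrative stand-ins for
      the rejection and "punt back to caller" paths.

      ```c
      #include <stdbool.h>

      struct dev { bool nowait_supported; };   /* flipped by a table reload */

      int submit_nowait(bool saw_support, const struct dev *d)
      {
          if (!saw_support)
              return -1;          /* fast path rejected the bio up front */
          /* ...the real code holds the live-table reference here... */
          if (!d->nowait_supported)
              return -2;          /* table reloaded in between: bail out */
          return 0;               /* safe to issue without blocking */
      }
      ```

      The key point is the second test: because it runs while the table
      cannot change, a reload that lands between the two checks is caught
      instead of letting a REQ_NOWAIT bio reach a target that might block.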
    • dm: fix use-after-free in dm_put_live_table_bio · 5d7362d0
      Mikulas Patocka authored
      dm_put_live_table_bio is called from the end of dm_submit_bio.
      However, at this point, the bio may already be finished and the
      caller may have freed it. Consequently, dm_put_live_table_bio
      accesses the stale "bio" pointer.
      
      Fix this bug by loading the bi_opf value and passing it to
      dm_get_live_table_bio and dm_put_live_table_bio instead of the bio.
      
      This bug was found by running the lvm2 testsuite with kasan.
      
      Fixes: 563a225c ("dm: introduce dm_{get,put}_live_table_bio called from dm_submit_bio")
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
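      The fix follows a classic pattern: copy the field you need before
      the call after which the object may be freed, and use only the copy
      afterwards. A minimal userspace sketch, with invented names
      (bio_like, submit, submit_and_report) rather than the kernel API:

      ```c
      #include <stdlib.h>

      struct bio_like { unsigned opf; };

      /* Stands in for bio submission: the bio may complete and be freed
       * before this returns, so the pointer must not be touched after. */
      void submit(struct bio_like *b)
      {
          free(b);
      }

      /* The fixed pattern: snapshot the bi_opf-like field first, then
       * use only the snapshot once the bio has been handed off. */
      unsigned submit_and_report(struct bio_like *b)
      {
          unsigned opf = b->opf;  /* copy before the bio can go away */
          submit(b);
          return opf;             /* no stale-pointer dereference */
      }
      ```

      This is exactly the shape of the fix above: dm_submit_bio loads
      bi_opf once and passes the value, not the bio, to the get/put
      helpers.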
  9. 15 Jun 2022 (1 commit)
  10. 11 Jun 2022 (1 commit)
    • dm: fix zoned locking imbalance due to needless check in clone_endio · dddf3056
      Mike Snitzer authored
      After the commit ca522482 ("dm: pass NULL bdev to bio_alloc_clone"),
      clone_endio() only calls dm_zone_endio() when DM targets remap the
      clone bio's bdev to something other than the md->disk->part0 default.
      
      However, if a DM target (e.g. dm-crypt) stacked on top of a dm-zoned
      device does not remap the clone bio using bio_set_dev(), then
      dm_zone_endio() is not called at completion of the bios and zone
      locks are not properly unlocked. This triggers a hang, in
      dm_zone_map_bio(), when
      blktests block/004 is run for dm-crypt on zoned block devices. To
      avoid the hang, simply remove the clone_endio() check that verifies
      the target remapped the clone bio to a device other than the default.
      
      Fixes: ca522482 ("dm: pass NULL bdev to bio_alloc_clone")
      Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
  11. 09 Jun 2022 (1 commit)
    • dm: fix bio_set allocation · 29dec90a
      Christoph Hellwig authored
      The use of bioset_init_from_src meant that the pre-allocated pools weren't
      used for anything except parameter passing, and the integrity pool
      creation got completely lost for the actual live mapped_device.  Fix that
      by assigning the actual preallocated dm_md_mempools to the mapped_device
      and using that for I/O instead of creating new mempools.
      
      Fixes: 2a2a4c51 ("dm: use bioset_init_from_src() to copy bio_set")
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
  12. 17 May 2022 (2 commits)
  13. 12 May 2022 (1 commit)
    • dm: pass NULL bdev to bio_alloc_clone · ca522482
      Mike Snitzer authored
      Most DM targets will remap the clone bio passed to their ->map
      function using bio_set_dev(). So this change to pass NULL bdev to
      bio_alloc_clone avoids clone-time work that sets up resources for a
      bdev association that will not be used in practice (e.g. clone issued
      to underlying device will not use DM device's blk-cgroups resources).
      
      But clone->bi_bdev is still initialized following bio_alloc_clone to
      preserve DM target expectations that clone->bi_bdev will be set.
      Follow-up work is needed to audit DM targets to remove accesses to a
      clone->bi_bdev that the target didn't initialize with bio_set_dev().
      
      Depends-on: 7ecc56c6 ("block: allow passing a NULL bdev to bio_alloc_clone/bio_init_clone")
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
  14. 06 May 2022 (18 commits)
  15. 18 Apr 2022 (1 commit)
  16. 16 Apr 2022 (1 commit)
    • dm: fix bio length of empty flush · 92b914e2
      Shin'ichiro Kawasaki authored
      The commit 92986f6b ("dm: use bio_clone_fast in alloc_io/alloc_tio")
      removed the bio_clone_fast() call from alloc_tio() when ci->io->tio is
      available. In this case, ci->bio is not copied to ci->io->tio.clone.
      This is fine since init_clone_info() sets the same values in ci->bio
      and ci->io->tio.clone.
      
      However, when incoming bios have the REQ_PREFLUSH flag, __send_empty_flush()
      prepares a zero-length bio on the stack and sets it as ci->bio. At this
      time, ci->io->tio.clone still has a non-zero length. When alloc_tio()
      chooses this ci->io->tio.clone as the bio to map, it is passed to targets
      as a non-empty flush bio. This causes a bio length check failure in
      dm-zoned and unexpected operations such as a dm_accept_partial_bio() call.
      
      To avoid the non-empty flush bio, set the length of ci->io->tio.clone
      to zero in __send_empty_flush().
      
      Fixes: 92986f6b ("dm: use bio_clone_fast in alloc_io/alloc_tio")
      Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
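      The shape of this fix is simple enough to sketch: when a preallocated
      clone is reused for a zero-length flush, its stale length must be
      reset so targets see a genuinely empty bio. The struct and names
      below are invented for illustration and do not match the kernel's
      bio layout.

      ```c
      struct clone_like { unsigned len; unsigned flags; };
      #define F_PREFLUSH 0x1u

      /* Prepare a reused clone as an empty flush: mark it as a preflush
       * and clear any leftover non-zero length from its previous use. */
      void send_empty_flush(struct clone_like *clone)
      {
          clone->flags |= F_PREFLUSH;
          clone->len = 0;
      }
      ```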
  17. 15 Apr 2022 (1 commit)