1. 05 Jan, 2018 1 commit
    • dm mpath: implement NVMe bio-based support · cd025384
      Mike Snitzer committed
      This DM multipath NVMe bio-based support requires CONFIG_NVME_MULTIPATH
      to not be set.  In the future hopefully NVMe multipath and DM multipath
      can co-exist more seamlessly.  But as is, if CONFIG_NVME_MULTIPATH=Y
      then all the individual NVMe paths will remain hidden to upper layers and
      as such DM multipath will not be able to manage them.
      
      However, NVMe's native multipathing doesn't multipath namespaces across
      subsystems, so technically a user _could_ use CONFIG_NVME_MULTIPATH=Y
      and also use DM multipath to multipath across subsystems.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  2. 03 Jan, 2018 1 commit
  3. 20 Dec, 2017 5 commits
  4. 18 Dec, 2017 1 commit
    • dm: simplify start of block stats accounting for bio-based · f3986374
      Mike Snitzer committed
      There is no apparent need to call generic_start_io_acct() until just before the IO is ready
      for submission.  start_io_acct() is the proper place to do this
      accounting -- it is also where DM accounts for pending IO and, if
      enabled, starts dm-stats accounting.
      
      Replace start_io_acct()'s part_round_stats() with generic_start_io_acct().
      This eliminates needing to take part_stat_lock() multiple times when
      starting an IO on bio-based devices.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  5. 17 Dec, 2017 5 commits
  6. 14 Dec, 2017 15 commits
    • dm: set QUEUE_FLAG_DAX accordingly in dm_table_set_restrictions() · ad3793fc
      Mike Snitzer committed
      Do this rather than treating DAX support as a special case that is set
      based on table type in dm_setup_md_queue().
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: fix __send_changing_extent_only() to send first bio and chain remainder · 3d7f4562
      Mike Snitzer committed
      __send_changing_extent_only() must follow the same pattern that was
      established with commit "dm: ensure bio submission follows a depth-first
      tree walk".  That is: submit first bio up to split boundary and then
      split the remainder to further submissions.
      Suggested-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: ensure bio-based DM's bioset and io_pool support targets' maximum IOs · 0776aa0e
      Mike Snitzer committed
      alloc_multiple_bios() assumes it can allocate the requested number of
      bios but until now there was no guarantee that the mempools would be
      accommodating.
      Suggested-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: remove BIOSET_NEED_RESCUER based dm_offload infrastructure · 4a3f54d9
      Mike Snitzer committed
      Now that all of DM has been revised and/or verified to no longer require
      the use of BIOSET_NEED_RESCUER the dm_offload code may be removed.
      Suggested-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: safely allocate multiple bioset bios · 318716dd
      Mike Snitzer committed
      DM targets can request multiple bios be sent to them by DM core (see:
      num_{flush,discard,write_same,write_zeroes}_bios).  But until now these
      bios were allocated in an unsafe manner that could potentially exhaust
      the DM device's bioset -- in the face of multiple threads each trying to
      do multiple allocations from the same DM device's bioset.
      
      Fix __send_duplicate_bios() by using the new alloc_multiple_bios().  The
      allocation strategy used by alloc_multiple_bios() models that used by
      dm-crypt.c:crypt_alloc_buffer().
      
      Neil Brown initially proposed this fix but the implementation has been
      revised enough that it is inappropriate to attribute the entirety of it to
      him.
      Suggested-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
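The allocation strategy described in the commit above can be illustrated with a toy model. This is only a sketch of the general two-pass idea (try without blocking, roll back on failure, then retry while serialized and allowed to block), written in Python rather than kernel C. `BioPool`, `try_alloc_all`, and `alloc_multiple` are hypothetical names for illustration and are not kernel interfaces.

```python
import threading


class BioPool:
    """Toy stand-in for a fixed-size bioset mempool (hypothetical, not a kernel API)."""

    def __init__(self, size):
        self._free = threading.Semaphore(size)

    def alloc(self, blocking):
        # Non-blocking mode mimics a fail-fast (GFP_NOWAIT-like) attempt.
        return object() if self._free.acquire(blocking=blocking) else None

    def free(self, bio):
        self._free.release()


def try_alloc_all(pool, count, blocking=False):
    """Allocate `count` bios or none: on any failure, give back what was taken."""
    bios = []
    for _ in range(count):
        bio = pool.alloc(blocking)
        if bio is None:
            for taken in bios:
                pool.free(taken)
            return None
        bios.append(bio)
    return bios


def alloc_multiple(pool, count, lock):
    """Two-pass strategy: opportunistic first, then serialized and blocking."""
    bios = try_alloc_all(pool, count, blocking=False)
    if bios is not None:
        return bios
    # Only one thread at a time may wait on the pool, so threads never sit
    # on partial allocations while waiting for each other.
    with lock:
        return try_alloc_all(pool, count, blocking=True)
```

In this model the blocking pass can only finish if `count` does not exceed the pool size, which is the kind of guarantee the "ensure bio-based DM's bioset and io_pool support targets' maximum IOs" commit is concerned with.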
    • dm: remove unused 'num_write_bios' target interface · f31c21e4
      NeilBrown committed
      No DM target provides num_write_bios and none has since dm-cache's
      brief use in 2013.
      
      Having the possibility of num_write_bios > 1 complicates bio
      allocation.  So remove the interface and assume there is only one bio
      needed.
      
      If a target ever needs more, it must provide a suitable bioset and
      allocate itself based on its particular needs.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: ensure bio submission follows a depth-first tree walk · 18a25da8
      NeilBrown committed
      A dm device can, in general, represent a tree of targets, each of which
      handles a sub-range of the range of blocks handled by the parent.
      
      The bio sequencing managed by generic_make_request() requires that bios
      are generated and handled in a depth-first manner.  Each call to a
      make_request_fn() may submit bios to a single member device, and may
      submit bios for a reduced region of the same device as the
      make_request_fn.
      
      In particular, any bios submitted to member devices must be expected to
      be processed in order, so a later one must never wait for an earlier
      one.
      
      This ordering is usually achieved by using bio_split() to reduce a bio
      to a size that can be completely handled by one target, and resubmitting
      the remainder to the originating device. bio_queue_split() shows the
      canonical approach.
      
      dm doesn't follow this approach, largely because it has needed to split
      bios since long before bio_split() was available.  It currently can
      submit bios to separate targets within the one dm_make_request() call.
      Dependencies between these targets, as can happen with dm-snap, can
      cause deadlocks if either bio gets stuck behind the other in the queues
      managed by generic_make_request().  This requires the 'rescue'
      functionality provided by dm_offload_{start,end}.
      
      Some of this requirement can be removed by changing the order of bio
      submission to follow the canonical approach.  That is, if dm finds that
      it needs to split a bio, the remainder should be sent to
      generic_make_request() rather than being handled immediately.  This
      delays the handling until the first part is completely processed, so the
      deadlock problems do not occur.
      
      __split_and_process_bio() can be called both from dm_make_request() and
      from dm_wq_work().  When called from dm_wq_work() the current approach
      is perfectly satisfactory as each bio will be processed immediately.
      When called from dm_make_request(), current->bio_list will be non-NULL,
      and in this case it is best to create a separate "clone" bio for the
      remainder.
      
      When we use bio_clone_bioset() to split off the front part of a bio
      and chain the two together and submit the remainder to
      generic_make_request(), it is important that the newly allocated
      bio is used as the head to be processed immediately, and the original
      bio gets "bio_advance()"d and sent to generic_make_request() as the
      remainder.  Otherwise, if the newly allocated bio is used as the
      remainder, and if it then needs to be split again, then the next
      bio_clone_bioset() call will be made while holding a reference to a bio
      (result of the first clone) from the same bioset.  This can potentially
      exhaust the bioset mempool and result in a memory allocation deadlock.
      
      Note that there is no race caused by reassigning cio.io->bio after already
      calling __map_bio().  This bio will only be dereferenced again after
      dec_pending() has found io->io_count to be zero, and this cannot happen
      before the dec_pending() call at the end of __split_and_process_bio().
      
      To provide the clone bio when splitting, we use q->bio_split.  This
      was previously being freed by bio-based dm to avoid having excess
      rescuer threads.  As bio_split bio sets no longer create rescuer
      threads, there is little cost and much gain from restoring the
      q->bio_split bio set.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
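The split-and-resubmit ordering described in the commit above can be sketched with a toy model. This is a simplified illustration in Python, not the kernel code: `Bio`, `split_and_process`, `submit`, and the fixed `boundary` between targets are hypothetical, and the `pending` queue merely plays the role of the list that generic_make_request() keeps in current->bio_list.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Bio:
    sector: int  # first sector covered by this bio
    length: int  # number of sectors


def split_and_process(bio, boundary, pending, handled):
    """Handle the front of `bio` up to the next target boundary."""
    room = boundary - (bio.sector % boundary)
    if bio.length <= room:
        handled.append((bio.sector, bio.length))
        return
    # The freshly allocated clone becomes the head and is handled right away.
    head = Bio(bio.sector, room)
    handled.append((head.sector, head.length))
    # The original bio is advanced and requeued as the remainder, so no
    # clone is still held when the next split needs to allocate one.
    bio.sector += room
    bio.length -= room
    pending.append(bio)


def submit(bio, boundary):
    """Drain the queue so each remainder is handled only after its head."""
    pending, handled = deque([bio]), []
    while pending:
        split_and_process(pending.popleft(), boundary, pending, handled)
    return handled
```

For example, `submit(Bio(2, 10), 4)` handles the pieces in the order `(2, 2)`, `(4, 4)`, `(8, 4)`: each remainder waits in the queue until the part in front of it has been dealt with.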
    • dm io: remove BIOSET_NEED_RESCUER flag from bios bioset · c110a4b6
      NeilBrown committed
      The BIOSET_NEED_RESCUER flag is only needed when a make_request_fn might
      do two allocations from the one bioset, and the second one could block
      until the first bio completes.
      
      dm_io() is called from make_request_fn() context.  The closest it comes
      to multiple allocations is in chunk_io() in dm-snap-persistent.  But
      there the code uses a separate thread to avoid problems.
      
      So BIOSET_NEED_RESCUER is not needed.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm crypt: remove BIOSET_NEED_RESCUER flag · 80cd1757
      NeilBrown committed
      The BIOSET_NEED_RESCUER flag is only needed when a make_request_fn might
      do two allocations from the one bioset, and the second one could block
      until the first bio completes.
      
      dm-crypt does allocate from this bioset inside the dm make_request_fn,
      but does so using GFP_NOWAIT so that the allocation will not block.
      
      So BIOSET_NEED_RESCUER is not needed.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: fix comment above dm_accept_partial_bio · c06b3e58
      NeilBrown committed
      Clarify that dm_accept_partial_bio isn't allowed for REQ_OP_ZONE_RESET
      bios.
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm raid: use rs_is_raid*() · 552aa679
      Heinz Mauelshagen committed
      Cleanup, no functional change.
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm raid: simplify rs_get_progress() · 7c29744e
      Heinz Mauelshagen committed
      No need to calculate the reshaping progress because
      mddev->curr_resync_completed holds it.
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm raid: ensure 'a' chars during reshape · dc15b943
      Heinz Mauelshagen committed
      During reshape, 'A' chars were reported in status rather than 'a'.
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm raid: stop keeping raid set frozen altogether · 11e47232
      Heinz Mauelshagen committed
      In order to avoid redoing synchronization/recovery/reshape partially,
      the raid set got frozen until after all passed in table line flags had
      been cleared.  The related table reload sequence had to be precisely
      followed, or reshaping may lead to data corruption caused by the active
      mapping carrying on with a reshape when the inactive mapping already
      had retrieved a stale reshape position.
      
      Harden by retrieving the actual resync/recovery/reshape position
      during resume whilst the active table is suspended, thus avoiding
      the need to keep the raid set frozen altogether.  This prevents superfluous
      redoing of an already resynchronized or recovered segment and,
      most importantly, potential for redoing of an already reshaped
      segment causing data corruption.
      
      Fixes: d39f0010 ("dm raid: fix raid_resume() to keep raid set frozen as needed")
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm raid: validate current raid sets redundancy · 53bf5384
      Heinz Mauelshagen committed
      Verifying the current raid set's redundancy based on retrieved
      superblock content has to use the superblock's raid level (e.g. raid0),
      not the constructor requested one (e.g. raid10).
      
      Using the requested raid level of raid10 led to a "divide error"
      on raid0, which defines the data copies used as the divisor to be zero.
      
      Also check for bogus data copies.
      Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  7. 08 Dec, 2017 12 commits