1. 20 8月, 2015 1 次提交
  2. 14 8月, 2015 1 次提交
    • K
      block: kill merge_bvec_fn() completely · 8ae12666
      Kent Overstreet 提交于
      As generic_make_request() is now able to handle arbitrarily sized bios,
      it's no longer necessary for each individual block driver to define its
      own ->merge_bvec_fn() callback. Remove every invocation completely.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: drbd-user@lists.linbit.com
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@kernel.org>
      Cc: ceph-devel@vger.kernel.org
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Neil Brown <neilb@suse.de>
      Cc: linux-raid@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits)
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NKent Overstreet <kent.overstreet@gmail.com>
      [dpark: also remove ->merge_bvec_fn() in dm-thin as well as
       dm-era-target, and resolve merge conflicts]
      Signed-off-by: NDongsu Park <dpark@posteo.net>
      Signed-off-by: NMing Lin <ming.l@ssi.samsung.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      8ae12666
  3. 26 6月, 2015 2 次提交
  4. 30 5月, 2015 2 次提交
    • M
      dm: do not allocate any mempools for blk-mq request-based DM · cbc4e3c1
      Mike Snitzer 提交于
      Do not allocate the io_pool mempool for blk-mq request-based DM
      (DM_TYPE_MQ_REQUEST_BASED) in dm_alloc_rq_mempools().
      
      Also refine __bind_mempools() to have more precise awareness of which
      mempools each type of DM device uses -- avoids mempool churn when
      reloading DM tables (particularly for DM_TYPE_REQUEST_BASED).
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      cbc4e3c1
    • J
      dm: fix reload failure of 0 path multipath mapping on blk-mq devices · 15b94a69
      Junichi Nomura 提交于
      dm-multipath accepts 0 path mapping.
      
        # echo '0 2097152 multipath 0 0 0 0' | dmsetup create newdev
      
      Such a mapping can be used to release underlying devices while still
      holding requests in its queue until working paths come back.
      
      However, once the multipath device is created over blk-mq devices,
      it rejects reloading of 0 path mapping:
      
        # echo '0 2097152 multipath 0 0 1 1 queue-length 0 1 1 /dev/sda 1' \
            | dmsetup create mpath1
        # echo '0 2097152 multipath 0 0 0 0' | dmsetup load mpath1
        device-mapper: reload ioctl on mpath1 failed: Invalid argument
        Command failed
      
      With following kernel message:
        device-mapper: ioctl: can't change device type after initial table load.
      
      DM tries to inherit the current table type using dm_table_set_type()
      but it doesn't work as expected because of unnecessary check about
      whether the target type is hybrid or not.
      
      Hybrid type is for targets that work as either request-based or bio-based
      and not required for blk-mq or non blk-mq checking.
      
      Fixes: 65803c20 ("dm table: train hybrid target type detection to select blk-mq if appropriate")
      Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      15b94a69
  5. 22 5月, 2015 1 次提交
    • C
      block, dm: don't copy bios for request clones · 5f1b670d
      Christoph Hellwig 提交于
      Currently dm-multipath has to clone the bios for every request sent
      to the lower devices, which wastes cpu cycles and ties down memory.
      
      This patch instead adds a new REQ_CLONE flag that instructs req_bio_endio
      to not complete bios attached to a request, which we set on clone
      requests similar to bios in a flush sequence.  With this change I/O
      errors on a path failure only get propagated to dm-multipath, which
      can then either resubmit the I/O or complete the bios on the original
      request.
      
      I've done some basic testing of this on a Linux target with ALUA support,
      and it survives path failures during I/O nicely.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      5f1b670d
  6. 16 4月, 2015 4 次提交
    • J
      dm table: use bool function return values of true/false not 1/0 · 7f61f5a0
      Joe Perches 提交于
      Use the normal return values for bool functions.
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      7f61f5a0
    • D
      dm table: fall back to getting device using name_to_dev_t() · 644bda6f
      Dan Ehrenberg 提交于
      If a device is used as the root filesystem, it can't be built
      off of devices which are within the root filesystem (just like
      command line arguments to root=).  For this reason, Linux has a
      pseudo-filesystem for root= and MD initialization (based on the
      function name_to_dev_t) which handles different ways of specifying
      devices including PARTUUID and major:minor.
      
      Switch to using name_to_dev_t() in dm_get_device().  Rather than
      having DM assume that all things which are not major:minor are paths in
      an already-mounted filesystem, change dm_get_device() to first attempt
      to look up the device in the filesystem, and if not found it will fall
      back to using name_to_dev_t().
      
      In terms of backwards compatibility, there are some cases where
      behavior will be different:
      - If you have a file in the current working directory named 1:2 and
        you initialze DM there, then it will try to use that file rather
        than the disk with that major:minor pair as a backing device.
      - Similarly for other bdev types which name_to_dev_t() knows how to
        interpret, the previous behavior was to repeatedly check for the
        existence of the file (e.g., while waiting for rootfs to come up)
        but the new behavior is to use the name_to_dev_t() interpretation.
        For example, if you have a file named /dev/ubiblock0_0 which is
        a symlink to /dev/sda3, but it is not yet present when DM starts
        to initialize, then the name_to_dev_t() interpretation will take
        precedence.
      
      These incompatibilities would only show up in really strange setups
      with bad practices so we shouldn't have to worry about them.
      Signed-off-by: NDan Ehrenberg <dehrenberg@chromium.org>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      644bda6f
    • M
      dm: add 'use_blk_mq' module param and expose in per-device ro sysfs attr · 17e149b8
      Mike Snitzer 提交于
      Request-based DM's blk-mq support defaults to off; but a user can easily
      change the default using the dm_mod.use_blk_mq module/boot option.
      
      Also, you can check what mode a given request-based DM device is using
      with: cat /sys/block/dm-X/dm/use_blk_mq
      
      This change enabled further cleanup and reduced work (e.g. the
      md->io_pool and md->rq_pool isn't created if using blk-mq).
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      17e149b8
    • M
      dm: add full blk-mq support to request-based DM · bfebd1cd
      Mike Snitzer 提交于
      Commit e5863d9a ("dm: allocate requests in target when stacking on
      blk-mq devices") served as the first step toward fully utilizing blk-mq
      in request-based DM -- it enabled stacking an old-style (request_fn)
      request_queue ontop of the underlying blk-mq device(s).  That first step
      didn't improve performance of DM multipath ontop of fast blk-mq devices
      (e.g. NVMe) because the top-level old-style request_queue was severely
      limited by the queue_lock.
      
      The second step offered here enables stacking a blk-mq request_queue
      ontop of the underlying blk-mq device(s).  This unlocks significant
      performance gains on fast blk-mq devices, Keith Busch tested on his NVMe
      testbed and offered this really positive news:
      
       "Just providing a performance update. All my fio tests are getting
        roughly equal performance whether accessed through the raw block
        device or the multipath device mapper (~470k IOPS). I could only push
        ~20% of the raw iops through dm before this conversion, so this latest
        tree is looking really solid from a performance standpoint."
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Tested-by: NKeith Busch <keith.busch@intel.com>
      bfebd1cd
  7. 01 4月, 2015 1 次提交
    • M
      dm: remove request-based DM queue's lld_busy_fn hook · d56b9b28
      Mike Snitzer 提交于
      DM multipath is the only caller of blk_lld_busy() -- which calls a
      queue's lld_busy_fn hook.  Request-based DM doesn't support stacking
      multipath devices so there is no reason to register the lld_busy_fn hook
      on a multipath device's queue using blk_queue_lld_busy().
      
      As such, remove functions dm_lld_busy and dm_table_any_busy_target.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      d56b9b28
  8. 11 2月, 2015 1 次提交
  9. 10 2月, 2015 2 次提交
    • M
      dm table: train hybrid target type detection to select blk-mq if appropriate · 65803c20
      Mike Snitzer 提交于
      Otherwise replacing the multipath target with the error target fails:
        device-mapper: ioctl: can't change device type after initial table load.
      
      The error target was mistakenly considered to be target type
      DM_TYPE_REQUEST_BASED rather than DM_TYPE_MQ_REQUEST_BASED even if the
      target it was to replace was of type DM_TYPE_MQ_REQUEST_BASED.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      65803c20
    • M
      dm: allocate requests in target when stacking on blk-mq devices · e5863d9a
      Mike Snitzer 提交于
      For blk-mq request-based DM the responsibility of allocating a cloned
      request is transfered from DM core to the target type.  Doing so
      enables the cloned request to be allocated from the appropriate
      blk-mq request_queue's pool (only the DM target, e.g. multipath, can
      know which block device to send a given cloned request to).
      
      Care was taken to preserve compatibility with old-style block request
      completion that requires request-based DM _not_ acquire the clone
      request's queue lock in the completion path.  As such, there are now 2
      different request-based DM target_type interfaces:
      1) the original .map_rq() interface will continue to be used for
         non-blk-mq devices -- the preallocated clone request is passed in
         from DM core.
      2) a new .clone_and_map_rq() and .release_clone_rq() will be used for
         blk-mq devices -- blk_get_request() and blk_put_request() are used
         respectively from these hooks.
      
      dm_table_set_type() was updated to detect if the request-based target is
      being stacked on blk-mq devices, if so DM_TYPE_MQ_REQUEST_BASED is set.
      DM core disallows switching the DM table's type after it is set.  This
      means that there is no mixing of non-blk-mq and blk-mq devices within
      the same request-based DM table.
      
      [This patch was started by Keith and later heavily modified by Mike]
      Tested-by: NBart Van Assche <bvanassche@acm.org>
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      e5863d9a
  10. 20 11月, 2014 1 次提交
  11. 06 10月, 2014 1 次提交
    • B
      dm: allow active and inactive tables to share dm_devs · 86f1152b
      Benjamin Marzinski 提交于
      Until this change, when loading a new DM table, DM core would re-open
      all of the devices in the DM table.  Now, DM core will avoid redundant
      device opens (and closes when destroying the old table) if the old
      table already has a device open using the same mode.  This is achieved
      by managing reference counts on the table_devices that DM core now
      stores in the mapped_device structure (rather than in the dm_table
      structure).  So a mapped_device's active and inactive dm_tables' dm_dev
      lists now just point to the dm_devs stored in the mapped_device's
      table_devices list.
      
      This improvement in DM core's device reference counting has the
      side-effect of fixing a long-standing limitation of the multipath
      target: a DM multipath table couldn't include any paths that were unusable
      (failed).  For example: if all paths have failed and you add a new,
      working, path to the table; you can't use it since the table load would
      fail due to it still containing failed paths.  Now a re-load of a
      multipath table can include failed devices and when those devices become
      active again they can be used instantly.
      
      The device list code in dm.c isn't a straight copy/paste from the code in
      dm-table.c, but it's very close (aside from some variable renames).  One
      subtle difference is that find_table_device for the tables_devices list
      will only match devices with the same name and mode.  This is because we
      don't want to upgrade a device's mode in the active table when an
      inactive table is loaded.
      
      Access to the mapped_device structure's tables_devices list requires a
      mutex (tables_devices_lock), so that tables cannot be created and
      destroyed concurrently.
      Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      86f1152b
  12. 11 8月, 2014 1 次提交
    • J
      dm table: propagate QUEUE_FLAG_NO_SG_MERGE · 200612ec
      Jeff Moyer 提交于
      Commit 05f1dd53 ("block: add queue flag for disabling SG merging")
      introduced a new queue flag: QUEUE_FLAG_NO_SG_MERGE.  This gets set by
      default in blk_mq_init_queue for mq-enabled devices.  The effect of
      the flag is to bypass the SG segment merging.  Instead, the
      bio->bi_vcnt is used as the number of hardware segments.
      
      With a device mapper target on top of a device with
      QUEUE_FLAG_NO_SG_MERGE set, we can end up sending down more segments
      than a driver is prepared to handle.  I ran into this when backporting
      the virtio_blk mq support.  It triggerred this BUG_ON, in
      virtio_queue_rq:
      
              BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);
      
      The queue's max is set here:
              blk_queue_max_segments(q, vblk->sg_elems-2);
      
      Basically, what happens is that a bio is built up for the dm device
      (which does not have the QUEUE_FLAG_NO_SG_MERGE flag set) using
      bio_add_page.  That path will call into __blk_recalc_rq_segments, so
      what you end up with is bi_phys_segments being much smaller than bi_vcnt
      (and bi_vcnt grows beyond the maximum sg elements).  Then, when the bio
      is submitted, it gets cloned.  When the cloned bio is submitted, it will
      end up in blk_recount_segments, here:
      
              if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags))
                      bio->bi_phys_segments = bio->bi_vcnt;
      
      and now we've set bio->bi_phys_segments to a number that is beyond what
      was registered as queue_max_segments by the driver.
      
      The right way to fix this is to propagate the queue flag up the stack.
      
      The rules for propagating the flag are simple:
      - if the flag is set for any underlying device, it must be set for the
        upper device
      - consequently, if the flag is not set for any underlying device, it
        should not be set for the upper device.
      Signed-off-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.16+
      200612ec
  13. 02 8月, 2014 1 次提交
  14. 04 6月, 2014 1 次提交
  15. 28 3月, 2014 2 次提交
  16. 07 1月, 2014 1 次提交
    • M
      dm table: remove unused buggy code that extends the targets array · 57a2f238
      Mikulas Patocka 提交于
      A device mapper table is allocated in the following way:
      * The function dm_table_create is called, it gets the number of targets
        as an argument -- it allocates a targets array accordingly.
      * For each target, we call dm_table_add_target.
      
      If we add more targets than were specified in dm_table_create, the
      function dm_table_add_target reallocates the targets array.  However,
      this reallocation code is wrong - it moves the targets array to a new
      location, while some target constructors hold pointers to the array in
      the old location.
      
      The following DM target drivers save the pointer to the target
      structure, so they corrupt memory if the target array is moved:
      multipath, raid, mirror, snapshot, stripe, switch, thin, verity.
      
      Under normal circumstances, the reallocation function is not called
      (because dm_table_create is called with the correct number of targets),
      so the buggy reallocation code is not used.
      
      Prior to the fix "dm table: fail dm_table_create on dm_round_up
      overflow", the reallocation code could only be used in case the user
      specifies too large a value in param->target_count, such as 0xffffffff.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      57a2f238
  17. 11 12月, 2013 1 次提交
  18. 10 11月, 2013 1 次提交
  19. 01 11月, 2013 1 次提交
  20. 06 9月, 2013 2 次提交
  21. 11 7月, 2013 1 次提交
    • M
      dm: optimize use SRCU and RCU · 83d5e5b0
      Mikulas Patocka 提交于
      This patch removes "io_lock" and "map_lock" in struct mapped_device and
      "holders" in struct dm_table and replaces these mechanisms with
      sleepable-rcu.
      
      Previously, the code would call "dm_get_live_table" and "dm_table_put" to
      get and release table. Now, the code is changed to call "dm_get_live_table"
      and "dm_put_live_table". dm_get_live_table locks sleepable-rcu and
      dm_put_live_table unlocks it.
      
      dm_get_live_table_fast/dm_put_live_table_fast can be used instead of
      dm_get_live_table/dm_put_live_table. These *_fast functions use
      non-sleepable RCU, so the caller must not block between them.
      
      If the code changes active or inactive dm table, it must call
      dm_sync_table before destroying the old table.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      83d5e5b0
  22. 10 5月, 2013 1 次提交
  23. 02 3月, 2013 2 次提交
  24. 22 12月, 2012 3 次提交
    • M
      dm: introduce per_bio_data · c0820cf5
      Mikulas Patocka 提交于
      Introduce a field per_bio_data_size in struct dm_target.
      
      Targets can set this field in the constructor. If a target sets this
      field to a non-zero value, "per_bio_data_size" bytes of auxiliary data
      are allocated for each bio submitted to the target. These data can be
      used for any purpose by the target and help us improve performance by
      removing some per-target mempools.
      
      Per-bio data is accessed with dm_per_bio_data. The
      argument data_size must be the same as the value per_bio_data_size in
      dm_target.
      
      If the target has a pointer to per_bio_data, it can get a pointer to
      the bio with dm_bio_from_per_bio_data() function (data_size must be the
      same as the value passed to dm_per_bio_data).
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      c0820cf5
    • M
      dm: prepare to support WRITE SAME · d54eaa5a
      Mike Snitzer 提交于
      Allow targets to opt in to WRITE SAME support by setting
      'num_write_same_requests' in the dm_target structure.
      
      A dm device will only advertise WRITE SAME support if all its
      targets and all its underlying devices support it.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      d54eaa5a
    • M
      dm: disable WRITE SAME · c1a94672
      Mike Snitzer 提交于
      WRITE SAME bios are not yet handled correctly by device-mapper so
      disable their use on device-mapper devices by setting
      max_write_same_sectors to zero.
      
      As an example, a ciphertext device is incompatible because the data
      gets changed according to the location at which it written and so the
      dm crypt target cannot support it.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      Cc: Milan Broz <mbroz@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      c1a94672
  25. 27 9月, 2012 2 次提交
    • M
      dm: retain table limits when swapping to new table with no devices · 3ae70656
      Mike Snitzer 提交于
      Add a safety net that will re-use the DM device's existing limits in the
      event that DM device has a temporary table that doesn't have any
      component devices.  This is to reduce the chance that requests not
      respecting the hardware limits will reach the device.
      
      DM recalculates queue limits based only on devices which currently exist
      in the table.  This creates a problem in the event all devices are
      temporarily removed such as all paths being lost in multipath.  DM will
      reset the limits to the maximum permissible, which can then assemble
      requests which exceed the limits of the paths when the paths are
      restored.  The request will fail the blk_rq_check_limits() test when
      sent to a path with lower limits, and will be retried without end by
      multipath.  This became a much bigger issue after v3.6 commit fe86cdce
      ("block: do not artificially constrain max_sectors for stacking
      drivers").
      Reported-by: NDavid Jeffery <djeffery@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      3ae70656
    • M
      dm table: clear add_random unless all devices have it set · c3c4555e
      Milan Broz 提交于
      Always clear QUEUE_FLAG_ADD_RANDOM if any underlying device does not
      have it set. Otherwise devices with predictable characteristics may
      contribute entropy.
      
      QUEUE_FLAG_ADD_RANDOM specifies whether or not queue IO timings
      contribute to the random pool.
      
      For bio-based targets this flag is always 0 because such devices have no
      real queue.
      
      For request-based devices this flag was always set to 1 by default.
      
      Now set it according to the flags on underlying devices. If there is at
      least one device which should not contribute, set the flag to zero: If a
      device, such as fast SSD storage, is not suitable for supplying entropy,
      a request-based queue stacked over it will not be either.
      
      Because the checking logic is exactly same as for the rotational flag,
      share the iteration function with device_is_nonrot().
      Signed-off-by: NMilan Broz <mbroz@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      c3c4555e
  26. 27 7月, 2012 1 次提交
  27. 29 3月, 2012 2 次提交
    • M
      dm: reject trailing characters in sccanf input · 31998ef1
      Mikulas Patocka 提交于
      Device mapper uses sscanf to convert arguments to numbers. The problem is that
      the way we use it ignores additional unmatched characters in the scanned string.
      
      For example, this `if (sscanf(string, "%d", &number) == 1)' will match a number,
      but also it will match number with some garbage appended, like "123abc".
      
      As a result, device mapper accepts garbage after some numbers. For example
      the command `dmsetup create vg1-new --table "0 16384 linear 254:1bla 34816bla"'
      will pass without an error.
      
      This patch fixes all sscanf uses in device mapper. It appends "%c" with
      a pointer to a dummy character variable to every sscanf statement.
      
      The construct `if (sscanf(string, "%d%c", &number, &dummy) == 1)' succeeds
      only if string is a null-terminated number (optionally preceded by some
      whitespace characters). If there is some character appended after the number,
      sscanf matches "%c", writes the character to the dummy variable and returns 2.
      We check the return value for 1 and consequently reject numbers with some
      garbage appended.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      31998ef1
    • H
      dm table: simplify call to free_devices · 574ce07e
      Hannes Reinecke 提交于
      free_devices in dm_table.c already uses list_for_each(), so we don't
      need to check if the list is empty.
      Signed-off-by: NHannes Reinecke <hare@suse.de>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      574ce07e