1. 20 11月, 2014 1 次提交
  2. 06 10月, 2014 1 次提交
    • B
      dm: allow active and inactive tables to share dm_devs · 86f1152b
      Benjamin Marzinski 提交于
      Until this change, when loading a new DM table, DM core would re-open
      all of the devices in the DM table.  Now, DM core will avoid redundant
      device opens (and closes when destroying the old table) if the old
      table already has a device open using the same mode.  This is achieved
      by managing reference counts on the table_devices that DM core now
      stores in the mapped_device structure (rather than in the dm_table
      structure).  So a mapped_device's active and inactive dm_tables' dm_dev
      lists now just point to the dm_devs stored in the mapped_device's
      table_devices list.
      
      This improvement in DM core's device reference counting has the
      side-effect of fixing a long-standing limitation of the multipath
      target: a DM multipath table couldn't include any paths that were unusable
      (failed).  For example: if all paths have failed and you add a new,
      working, path to the table; you can't use it since the table load would
      fail due to it still containing failed paths.  Now a re-load of a
      multipath table can include failed devices and when those devices become
      active again they can be used instantly.
      
      The device list code in dm.c isn't a straight copy/paste from the code in
      dm-table.c, but it's very close (aside from some variable renames).  One
      subtle difference is that find_table_device for the tables_devices list
      will only match devices with the same name and mode.  This is because we
      don't want to upgrade a device's mode in the active table when an
      inactive table is loaded.
      
      Access to the mapped_device structure's tables_devices list requires a
      mutex (tables_devices_lock), so that tables cannot be created and
      destroyed concurrently.
      Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      86f1152b
  3. 11 8月, 2014 1 次提交
    • J
      dm table: propagate QUEUE_FLAG_NO_SG_MERGE · 200612ec
      Jeff Moyer 提交于
      Commit 05f1dd53 ("block: add queue flag for disabling SG merging")
      introduced a new queue flag: QUEUE_FLAG_NO_SG_MERGE.  This gets set by
      default in blk_mq_init_queue for mq-enabled devices.  The effect of
      the flag is to bypass the SG segment merging.  Instead, the
      bio->bi_vcnt is used as the number of hardware segments.
      
      With a device mapper target on top of a device with
      QUEUE_FLAG_NO_SG_MERGE set, we can end up sending down more segments
      than a driver is prepared to handle.  I ran into this when backporting
      the virtio_blk mq support.  It triggerred this BUG_ON, in
      virtio_queue_rq:
      
              BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);
      
      The queue's max is set here:
              blk_queue_max_segments(q, vblk->sg_elems-2);
      
      Basically, what happens is that a bio is built up for the dm device
      (which does not have the QUEUE_FLAG_NO_SG_MERGE flag set) using
      bio_add_page.  That path will call into __blk_recalc_rq_segments, so
      what you end up with is bi_phys_segments being much smaller than bi_vcnt
      (and bi_vcnt grows beyond the maximum sg elements).  Then, when the bio
      is submitted, it gets cloned.  When the cloned bio is submitted, it will
      end up in blk_recount_segments, here:
      
              if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags))
                      bio->bi_phys_segments = bio->bi_vcnt;
      
      and now we've set bio->bi_phys_segments to a number that is beyond what
      was registered as queue_max_segments by the driver.
      
      The right way to fix this is to propagate the queue flag up the stack.
      
      The rules for propagating the flag are simple:
      - if the flag is set for any underlying device, it must be set for the
        upper device
      - consequently, if the flag is not set for any underlying device, it
        should not be set for the upper device.
      Signed-off-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.16+
      200612ec
  4. 02 8月, 2014 1 次提交
  5. 04 6月, 2014 1 次提交
  6. 28 3月, 2014 2 次提交
  7. 07 1月, 2014 1 次提交
    • M
      dm table: remove unused buggy code that extends the targets array · 57a2f238
      Mikulas Patocka 提交于
      A device mapper table is allocated in the following way:
      * The function dm_table_create is called, it gets the number of targets
        as an argument -- it allocates a targets array accordingly.
      * For each target, we call dm_table_add_target.
      
      If we add more targets than were specified in dm_table_create, the
      function dm_table_add_target reallocates the targets array.  However,
      this reallocation code is wrong - it moves the targets array to a new
      location, while some target constructors hold pointers to the array in
      the old location.
      
      The following DM target drivers save the pointer to the target
      structure, so they corrupt memory if the target array is moved:
      multipath, raid, mirror, snapshot, stripe, switch, thin, verity.
      
      Under normal circumstances, the reallocation function is not called
      (because dm_table_create is called with the correct number of targets),
      so the buggy reallocation code is not used.
      
      Prior to the fix "dm table: fail dm_table_create on dm_round_up
      overflow", the reallocation code could only be used in case the user
      specifies too large a value in param->target_count, such as 0xffffffff.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      57a2f238
  8. 11 12月, 2013 1 次提交
  9. 10 11月, 2013 1 次提交
  10. 01 11月, 2013 1 次提交
  11. 06 9月, 2013 2 次提交
  12. 11 7月, 2013 1 次提交
    • M
      dm: optimize use SRCU and RCU · 83d5e5b0
      Mikulas Patocka 提交于
      This patch removes "io_lock" and "map_lock" in struct mapped_device and
      "holders" in struct dm_table and replaces these mechanisms with
      sleepable-rcu.
      
      Previously, the code would call "dm_get_live_table" and "dm_table_put" to
      get and release table. Now, the code is changed to call "dm_get_live_table"
      and "dm_put_live_table". dm_get_live_table locks sleepable-rcu and
      dm_put_live_table unlocks it.
      
      dm_get_live_table_fast/dm_put_live_table_fast can be used instead of
      dm_get_live_table/dm_put_live_table. These *_fast functions use
      non-sleepable RCU, so the caller must not block between them.
      
      If the code changes active or inactive dm table, it must call
      dm_sync_table before destroying the old table.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      83d5e5b0
  13. 10 5月, 2013 1 次提交
  14. 02 3月, 2013 2 次提交
  15. 22 12月, 2012 3 次提交
    • M
      dm: introduce per_bio_data · c0820cf5
      Mikulas Patocka 提交于
      Introduce a field per_bio_data_size in struct dm_target.
      
      Targets can set this field in the constructor. If a target sets this
      field to a non-zero value, "per_bio_data_size" bytes of auxiliary data
      are allocated for each bio submitted to the target. These data can be
      used for any purpose by the target and help us improve performance by
      removing some per-target mempools.
      
      Per-bio data is accessed with dm_per_bio_data. The
      argument data_size must be the same as the value per_bio_data_size in
      dm_target.
      
      If the target has a pointer to per_bio_data, it can get a pointer to
      the bio with dm_bio_from_per_bio_data() function (data_size must be the
      same as the value passed to dm_per_bio_data).
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      c0820cf5
    • M
      dm: prepare to support WRITE SAME · d54eaa5a
      Mike Snitzer 提交于
      Allow targets to opt in to WRITE SAME support by setting
      'num_write_same_requests' in the dm_target structure.
      
      A dm device will only advertise WRITE SAME support if all its
      targets and all its underlying devices support it.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      d54eaa5a
    • M
      dm: disable WRITE SAME · c1a94672
      Mike Snitzer 提交于
      WRITE SAME bios are not yet handled correctly by device-mapper so
      disable their use on device-mapper devices by setting
      max_write_same_sectors to zero.
      
      As an example, a ciphertext device is incompatible because the data
      gets changed according to the location at which it written and so the
      dm crypt target cannot support it.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      Cc: Milan Broz <mbroz@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      c1a94672
  16. 27 9月, 2012 2 次提交
    • M
      dm: retain table limits when swapping to new table with no devices · 3ae70656
      Mike Snitzer 提交于
      Add a safety net that will re-use the DM device's existing limits in the
      event that DM device has a temporary table that doesn't have any
      component devices.  This is to reduce the chance that requests not
      respecting the hardware limits will reach the device.
      
      DM recalculates queue limits based only on devices which currently exist
      in the table.  This creates a problem in the event all devices are
      temporarily removed such as all paths being lost in multipath.  DM will
      reset the limits to the maximum permissible, which can then assemble
      requests which exceed the limits of the paths when the paths are
      restored.  The request will fail the blk_rq_check_limits() test when
      sent to a path with lower limits, and will be retried without end by
      multipath.  This became a much bigger issue after v3.6 commit fe86cdce
      ("block: do not artificially constrain max_sectors for stacking
      drivers").
      Reported-by: NDavid Jeffery <djeffery@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      3ae70656
    • M
      dm table: clear add_random unless all devices have it set · c3c4555e
      Milan Broz 提交于
      Always clear QUEUE_FLAG_ADD_RANDOM if any underlying device does not
      have it set. Otherwise devices with predictable characteristics may
      contribute entropy.
      
      QUEUE_FLAG_ADD_RANDOM specifies whether or not queue IO timings
      contribute to the random pool.
      
      For bio-based targets this flag is always 0 because such devices have no
      real queue.
      
      For request-based devices this flag was always set to 1 by default.
      
      Now set it according to the flags on underlying devices. If there is at
      least one device which should not contribute, set the flag to zero: If a
      device, such as fast SSD storage, is not suitable for supplying entropy,
      a request-based queue stacked over it will not be either.
      
      Because the checking logic is exactly same as for the rotational flag,
      share the iteration function with device_is_nonrot().
      Signed-off-by: NMilan Broz <mbroz@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      c3c4555e
  17. 27 7月, 2012 1 次提交
  18. 29 3月, 2012 2 次提交
    • M
      dm: reject trailing characters in sccanf input · 31998ef1
      Mikulas Patocka 提交于
      Device mapper uses sscanf to convert arguments to numbers. The problem is that
      the way we use it ignores additional unmatched characters in the scanned string.
      
      For example, this `if (sscanf(string, "%d", &number) == 1)' will match a number,
      but also it will match number with some garbage appended, like "123abc".
      
      As a result, device mapper accepts garbage after some numbers. For example
      the command `dmsetup create vg1-new --table "0 16384 linear 254:1bla 34816bla"'
      will pass without an error.
      
      This patch fixes all sscanf uses in device mapper. It appends "%c" with
      a pointer to a dummy character variable to every sscanf statement.
      
      The construct `if (sscanf(string, "%d%c", &number, &dummy) == 1)' succeeds
      only if string is a null-terminated number (optionally preceded by some
      whitespace characters). If there is some character appended after the number,
      sscanf matches "%c", writes the character to the dummy variable and returns 2.
      We check the return value for 1 and consequently reject numbers with some
      garbage appended.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      31998ef1
    • H
      dm table: simplify call to free_devices · 574ce07e
      Hannes Reinecke 提交于
      free_devices in dm_table.c already uses list_for_each(), so we don't
      need to check if the list is empty.
      Signed-off-by: NHannes Reinecke <hare@suse.de>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      574ce07e
  19. 11 1月, 2012 1 次提交
    • M
      block: Introduce blk_set_stacking_limits function · b1bd055d
      Martin K. Petersen 提交于
      Stacking driver queue limits are typically bounded exclusively by the
      capabilities of the low level devices, not by the stacking driver
      itself.
      
      This patch introduces blk_set_stacking_limits() which has more liberal
      metrics than the default queue limits function. This allows us to
      inherit topology parameters from bottom devices without manually
      tweaking the default limits in each driver prior to calling the stacking
      function.
      
      Since there is now a clear distinction between stacking and low-level
      devices, blk_set_default_limits() has been modified to carry the more
      conservative values that we used to manually set in
      blk_queue_make_request().
      Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      b1bd055d
  20. 01 11月, 2011 4 次提交
  21. 26 9月, 2011 2 次提交
  22. 02 8月, 2011 6 次提交
    • M
      dm table: set flush capability based on underlying devices · ed8b752b
      Mike Snitzer 提交于
      DM has always advertised both REQ_FLUSH and REQ_FUA flush capabilities
      regardless of whether or not a given DM device's underlying devices
      also advertised a need for them.
      
      Block's flush-merge changes from 2.6.39 have proven to be more costly
      for DM devices.  Performance regressions have been reported even when
      DM's underlying devices do not advertise that they have a write cache.
      
      Fix the performance regressions by configuring a DM device's flushing
      capabilities based on those of the underlying devices' capabilities.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      ed8b752b
    • M
      dm table: share target argument parsing functions · 498f0103
      Mike Snitzer 提交于
      Move multipath target argument parsing code into dm-table so other
      targets can share it.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      498f0103
    • M
      dm: ignore merge_bvec for snapshots when safe · d5b9dd04
      Mikulas Patocka 提交于
      Add a new flag DMF_MERGE_IS_OPTIONAL to struct mapped_device to indicate
      whether the device can accept bios larger than the size its merge
      function returns.  When set, use this to send large bios to snapshots
      which can split them if necessary.  Snapshot I/O may be significantly
      fragmented and this approach seems to improve peformance.
      
      Before the patch, dm_set_device_limits restricted bio size to page size
      if the underlying device had a merge function and the target didn't
      provide a merge function.  After the patch, dm_set_device_limits
      restricts bio size to page size if the underlying device has a merge
      function, doesn't have DMF_MERGE_IS_OPTIONAL flag and the target doesn't
      provide a merge function.
      
      The snapshot target can't provide a merge function because when the merge
      function is called, it is impossible to determine where the bio will be
      remapped.  Previously this led us to impose a 4k limit, which we can
      now remove if the snapshot store is located on a device without a merge
      function.  Together with another patch for optimizing full chunk writes,
      it improves performance from 29MB/s to 40MB/s when writing to the
      filesystem on snapshot store.
      
      If the snapshot store is placed on a non-dm device with a merge function
      (such as md-raid), device mapper still limits all bios to page size.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      d5b9dd04
    • M
      dm table: clean dm_get_device and move exports · 08649012
      Mike Snitzer 提交于
      There is no need for __table_get_device to be factored out.
      Also move the exports to the end of their respective functions.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      08649012
    • J
      dm: use vzalloc · e29e65aa
      Joe Perches 提交于
      Use vzalloc() instead of vmalloc()+memset().
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      e29e65aa
    • M
      dm table: fix discard support · 936688d7
      Mike Snitzer 提交于
      Remove 'discards_supported' from the dm_table structure.  The same
      information can be easily discovered from the table's target(s) in
      dm_table_supports_discards().
      
      Before this fix dm_table_supports_discards() would skip checking the
      individual targets' 'discards_supported' flag if any one target in the
      table didn't set num_discard_requests > 0.  Now the per-target
      'discards_supported' flag is effective at insuring the final DM device
      advertises discard support.  But, to be clear, targets that don't
      support discards (!num_discard_requests) will not receive discard
      requests.
      
      Also DMWARN if a target sets 'discards_supported' override but forgets
      to set 'num_discard_requests'.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      936688d7
  23. 27 7月, 2011 1 次提交
  24. 29 5月, 2011 1 次提交
    • M
      dm table: reject devices without request fns · f4808ca9
      Milan Broz 提交于
      This patch adds a check that a block device has a request function
      defined before it is used.  Otherwise, misconfiguration can cause an oops.
      
      Because we are allowing devices with zero size e.g. an offline multipath
      device as in commit 2cd54d9b
      ("dm: allow offline devices") there needs to be an additional check
      to ensure devices are initialised.  Some block devices, like a loop
      device without a backing file, exist but have no request function.
      
      Reproducer is trivial: dm-mirror on unbound loop device
      (no backing file on loop devices)
      
      dmsetup create x --table "0 8 mirror core 2 8 sync 2 /dev/loop0 0 /dev/loop1 0"
      
      and mirror resync will immediatelly cause OOps.
      
      BUG: unable to handle kernel NULL pointer dereference at   (null)
       ? generic_make_request+0x2bd/0x590
       ? kmem_cache_alloc+0xad/0x190
       submit_bio+0x53/0xe0
       ? bio_add_page+0x3b/0x50
       dispatch_io+0x1ca/0x210 [dm_mod]
       ? read_callback+0x0/0xd0 [dm_mirror]
       dm_io+0xbb/0x290 [dm_mod]
       do_mirror+0x1e0/0x748 [dm_mirror]
      Signed-off-by: NMilan Broz <mbroz@redhat.com>
      Reported-by: NZdenek Kabelac <zkabelac@redhat.com>
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      f4808ca9