1. 01 Nov, 2013 1 commit
  2. 06 Sep, 2013 2 commits
  3. 11 Jul, 2013 1 commit
    • dm: optimize use SRCU and RCU · 83d5e5b0
      Authored by Mikulas Patocka
      This patch removes "io_lock" and "map_lock" in struct mapped_device and
      "holders" in struct dm_table and replaces these mechanisms with
      sleepable-rcu.
      
      Previously, the code would call "dm_get_live_table" and "dm_table_put" to
      get and release table. Now, the code is changed to call "dm_get_live_table"
      and "dm_put_live_table". dm_get_live_table locks sleepable-rcu and
      dm_put_live_table unlocks it.
      
      dm_get_live_table_fast/dm_put_live_table_fast can be used instead of
      dm_get_live_table/dm_put_live_table. These *_fast functions use
      non-sleepable RCU, so the caller must not block between them.
      
      If the code changes active or inactive dm table, it must call
      dm_sync_table before destroying the old table.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
  4. 10 May, 2013 1 commit
  5. 02 Mar, 2013 2 commits
  6. 22 Dec, 2012 3 commits
    • dm: introduce per_bio_data · c0820cf5
      Authored by Mikulas Patocka
      Introduce a field per_bio_data_size in struct dm_target.
      
      Targets can set this field in the constructor. If a target sets this
      field to a non-zero value, "per_bio_data_size" bytes of auxiliary data
      are allocated for each bio submitted to the target. These data can be
      used for any purpose by the target and help us improve performance by
      removing some per-target mempools.
      
      Per-bio data is accessed with dm_per_bio_data. The
      argument data_size must be the same as the value per_bio_data_size in
      dm_target.
      
      If the target has a pointer to per_bio_data, it can get a pointer to
      the bio with dm_bio_from_per_bio_data() function (data_size must be the
      same as the value passed to dm_per_bio_data).
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
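
The pointer arithmetic behind the per-bio data accessors described above can be sketched in plain userspace C. This is a minimal model, not the kernel implementation: `struct target_io`, `front_pad()`, `alloc_bio_with_data()`, `per_bio_data()`, `bio_from_per_bio_data()` and `free_bio_with_data()` are hypothetical stand-ins, and the layout (per-bio data placed directly in front of a bookkeeping structure that embeds the bio) is an assumption made for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct bio; only its address matters here. */
struct bio {
	unsigned long sector;
};

/* Hypothetical per-bio bookkeeping that embeds the bio handed to the target. */
struct target_io {
	void *target;
	struct bio clone;
};

/* Round data_size up so that struct target_io stays properly aligned. */
static size_t front_pad(size_t data_size)
{
	size_t align = _Alignof(struct target_io);

	return (data_size + align - 1) / align * align;
}

/* Allocate [padding | per-bio data | struct target_io] as one zeroed block. */
static struct bio *alloc_bio_with_data(size_t data_size)
{
	char *mem = calloc(1, front_pad(data_size) + sizeof(struct target_io));

	if (!mem)
		return NULL;
	return &((struct target_io *)(mem + front_pad(data_size)))->clone;
}

/* The per-bio data ends exactly where struct target_io begins. */
static void *per_bio_data(struct bio *bio, size_t data_size)
{
	return (char *)bio - offsetof(struct target_io, clone) - data_size;
}

/* Inverse mapping: recover the bio from a per-bio data pointer. */
static struct bio *bio_from_per_bio_data(void *data, size_t data_size)
{
	return (struct bio *)((char *)data + data_size +
			      offsetof(struct target_io, clone));
}

/* Release the whole block, starting from its padded beginning. */
static void free_bio_with_data(struct bio *bio, size_t data_size)
{
	free((char *)bio - offsetof(struct target_io, clone) -
	     front_pad(data_size));
}
```

Both accessors only work if the caller passes the same `data_size` that was used at allocation time, which mirrors the requirement stated in the commit message.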
    • dm: prepare to support WRITE SAME · d54eaa5a
      Authored by Mike Snitzer
      Allow targets to opt in to WRITE SAME support by setting
      'num_write_same_requests' in the dm_target structure.
      
      A dm device will only advertise WRITE SAME support if all its
      targets and all its underlying devices support it.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
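
The "all targets and all underlying devices" rule can be illustrated with a small userspace sketch. The types `device_sketch` and `target_sketch` and the function `table_supports_write_same()` are hypothetical; the sketch also assumes that a device with `max_write_same_sectors` equal to zero does not support WRITE SAME.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an underlying device's limits. */
struct device_sketch {
	unsigned int max_write_same_sectors; /* assumed: 0 means unsupported */
};

/* Hypothetical stand-in for a target and the devices it uses. */
struct target_sketch {
	unsigned int num_write_same_requests; /* 0 means the target did not opt in */
	const struct device_sketch *devs;
	size_t ndevs;
};

/* True only if every target opted in and every device supports WRITE SAME. */
static bool table_supports_write_same(const struct target_sketch *targets,
				      size_t ntargets)
{
	for (size_t i = 0; i < ntargets; i++) {
		if (targets[i].num_write_same_requests == 0)
			return false;
		for (size_t j = 0; j < targets[i].ndevs; j++)
			if (targets[i].devs[j].max_write_same_sectors == 0)
				return false;
	}
	return true;
}
```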
    • dm: disable WRITE SAME · c1a94672
      Authored by Mike Snitzer
      WRITE SAME bios are not yet handled correctly by device-mapper so
      disable their use on device-mapper devices by setting
      max_write_same_sectors to zero.
      
      As an example, a ciphertext device is incompatible because the data
      gets changed according to the location at which it is written, and so
      the dm crypt target cannot support it.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      Cc: Milan Broz <mbroz@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
  7. 27 Sep, 2012 2 commits
    • dm: retain table limits when swapping to new table with no devices · 3ae70656
      Authored by Mike Snitzer
      Add a safety net that will re-use the DM device's existing limits in the
      event that DM device has a temporary table that doesn't have any
      component devices.  This is to reduce the chance that requests not
      respecting the hardware limits will reach the device.
      
      DM recalculates queue limits based only on devices which currently exist
      in the table.  This creates a problem in the event all devices are
      temporarily removed such as all paths being lost in multipath.  DM will
      reset the limits to the maximum permissible, which can then assemble
      requests which exceed the limits of the paths when the paths are
      restored.  The request will fail the blk_rq_check_limits() test when
      sent to a path with lower limits, and will be retried without end by
      multipath.  This became a much bigger issue after v3.6 commit fe86cdce
      ("block: do not artificially constrain max_sectors for stacking
      drivers").
      Reported-by: David Jeffery <djeffery@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm table: clear add_random unless all devices have it set · c3c4555e
      Authored by Milan Broz
      Always clear QUEUE_FLAG_ADD_RANDOM if any underlying device does not
      have it set. Otherwise devices with predictable characteristics may
      contribute entropy.
      
      QUEUE_FLAG_ADD_RANDOM specifies whether or not queue IO timings
      contribute to the random pool.
      
      For bio-based targets this flag is always 0 because such devices have no
      real queue.
      
      For request-based devices this flag was always set to 1 by default.
      
      Now set it according to the flags on underlying devices. If there is at
      least one device which should not contribute, set the flag to zero: If a
      device, such as fast SSD storage, is not suitable for supplying entropy,
      a request-based queue stacked over it will not be either.
      
      Because the checking logic is exactly the same as for the rotational
      flag, share the iteration function with device_is_nonrot().
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
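
The "set only if every underlying device has it set" rule amounts to a logical AND over the devices. A minimal userspace sketch follows; `struct dev_flags` and `table_all_devices_add_random()` are hypothetical names used only for illustration and are not the kernel's helpers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one underlying device's queue flags. */
struct dev_flags {
	bool add_random; /* I/O timings of this device may feed the entropy pool */
};

/*
 * Returns true only if every underlying device has add_random set.
 * A single device without the flag clears it for the stacked queue.
 * An empty device list yields true in this sketch (vacuous truth).
 */
static bool table_all_devices_add_random(const struct dev_flags *devs, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!devs[i].add_random)
			return false;
	return true;
}
```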
  8. 27 Jul, 2012 1 commit
  9. 29 Mar, 2012 2 commits
    • dm: reject trailing characters in sscanf input · 31998ef1
      Authored by Mikulas Patocka
      Device mapper uses sscanf to convert arguments to numbers. The problem is that
      the way we use it ignores additional unmatched characters in the scanned string.
      
      For example, this `if (sscanf(string, "%d", &number) == 1)' will match a number,
      but also it will match number with some garbage appended, like "123abc".
      
      As a result, device mapper accepts garbage after some numbers. For example
      the command `dmsetup create vg1-new --table "0 16384 linear 254:1bla 34816bla"'
      will pass without an error.
      
      This patch fixes all sscanf uses in device mapper. It appends "%c" with
      a pointer to a dummy character variable to every sscanf statement.
      
      The construct `if (sscanf(string, "%d%c", &number, &dummy) == 1)' succeeds
      only if string is a null-terminated number (optionally preceded by some
      whitespace characters). If there is some character appended after the number,
      sscanf matches "%c", writes the character to the dummy variable and returns 2.
      We check the return value for 1 and consequently reject numbers with some
      garbage appended.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
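
The `"%d%c"` idiom from this commit message can be tried in userspace with the standard C library. The helper names `parse_int_strict()` and `parse_int_lenient()` are hypothetical; only the `sscanf` format strings come from the text above.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Strict form: the extra "%c" only matches if something follows the
 * number, in which case sscanf returns 2 and the input is rejected.
 */
static bool parse_int_strict(const char *s, int *out)
{
	char dummy;

	return sscanf(s, "%d%c", out, &dummy) == 1;
}

/* Lenient form: stops at the first non-digit, so "123abc" is accepted. */
static bool parse_int_lenient(const char *s, int *out)
{
	return sscanf(s, "%d", out) == 1;
}
```

With these helpers, `"123"` and `" 42"` pass both checks, while `"123abc"` passes only the lenient one. Note that the strict form also rejects trailing whitespace such as `"123 "`, because `%c` matches any character, including a space.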
    • dm table: simplify call to free_devices · 574ce07e
      Authored by Hannes Reinecke
      free_devices in dm_table.c already uses list_for_each(), so we don't
      need to check if the list is empty.
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
  10. 11 Jan, 2012 1 commit
    • block: Introduce blk_set_stacking_limits function · b1bd055d
      Authored by Martin K. Petersen
      Stacking driver queue limits are typically bounded exclusively by the
      capabilities of the low level devices, not by the stacking driver
      itself.
      
      This patch introduces blk_set_stacking_limits() which has more liberal
      metrics than the default queue limits function. This allows us to
      inherit topology parameters from bottom devices without manually
      tweaking the default limits in each driver prior to calling the stacking
      function.
      
      Since there is now a clear distinction between stacking and low-level
      devices, blk_set_default_limits() has been modified to carry the more
      conservative values that we used to manually set in
      blk_queue_make_request().
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  11. 01 Nov, 2011 4 commits
  12. 26 Sep, 2011 2 commits
  13. 02 Aug, 2011 6 commits
    • dm table: set flush capability based on underlying devices · ed8b752b
      Authored by Mike Snitzer
      DM has always advertised both REQ_FLUSH and REQ_FUA flush capabilities
      regardless of whether or not a given DM device's underlying devices
      also advertised a need for them.
      
      Block's flush-merge changes from 2.6.39 have proven to be more costly
      for DM devices.  Performance regressions have been reported even when
      DM's underlying devices do not advertise that they have a write cache.
      
      Fix the performance regressions by configuring a DM device's flushing
      capabilities based on those of the underlying devices' capabilities.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm table: share target argument parsing functions · 498f0103
      Authored by Mike Snitzer
      Move multipath target argument parsing code into dm-table so other
      targets can share it.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: ignore merge_bvec for snapshots when safe · d5b9dd04
      Authored by Mikulas Patocka
      Add a new flag DMF_MERGE_IS_OPTIONAL to struct mapped_device to indicate
      whether the device can accept bios larger than the size its merge
      function returns.  When set, use this to send large bios to snapshots
      which can split them if necessary.  Snapshot I/O may be significantly
      fragmented and this approach seems to improve performance.
      
      Before the patch, dm_set_device_limits restricted bio size to page size
      if the underlying device had a merge function and the target didn't
      provide a merge function.  After the patch, dm_set_device_limits
      restricts bio size to page size if the underlying device has a merge
      function, doesn't have DMF_MERGE_IS_OPTIONAL flag and the target doesn't
      provide a merge function.
      
      The snapshot target can't provide a merge function because when the merge
      function is called, it is impossible to determine where the bio will be
      remapped.  Previously this led us to impose a 4k limit, which we can
      now remove if the snapshot store is located on a device without a merge
      function.  Together with another patch for optimizing full chunk writes,
      it improves performance from 29MB/s to 40MB/s when writing to the
      filesystem on snapshot store.
      
      If the snapshot store is placed on a non-dm device with a merge function
      (such as md-raid), device mapper still limits all bios to page size.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
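
The before/after conditions for restricting bio size to page size can be written as two predicates. This is a sketch of the logic as stated in the commit message, not the body of dm_set_device_limits; `struct merge_facts` and both function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical summary of the facts the check depends on. */
struct merge_facts {
	bool underlying_has_merge_fn; /* underlying device has a merge function */
	bool target_has_merge_fn;     /* the dm target provides a merge function */
	bool merge_is_optional;       /* models the DMF_MERGE_IS_OPTIONAL flag */
};

/* Before the patch: any underlying merge function forces the limit. */
static bool limit_to_page_size_old(const struct merge_facts *f)
{
	return f->underlying_has_merge_fn && !f->target_has_merge_fn;
}

/* After the patch: the limit is skipped when merging is optional. */
static bool limit_to_page_size_new(const struct merge_facts *f)
{
	return f->underlying_has_merge_fn && !f->merge_is_optional &&
	       !f->target_has_merge_fn;
}
```

The two predicates differ only when the underlying device has a merge function, the target has none, and the optional flag is set. That is the case in which the page-size limit is lifted.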
    • dm table: clean dm_get_device and move exports · 08649012
      Authored by Mike Snitzer
      There is no need for __table_get_device to be factored out.
      Also move the exports to the end of their respective functions.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: use vzalloc · e29e65aa
      Authored by Joe Perches
      Use vzalloc() instead of vmalloc()+memset().
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm table: fix discard support · 936688d7
      Authored by Mike Snitzer
      Remove 'discards_supported' from the dm_table structure.  The same
      information can be easily discovered from the table's target(s) in
      dm_table_supports_discards().
      
      Before this fix dm_table_supports_discards() would skip checking the
      individual targets' 'discards_supported' flag if any one target in the
      table didn't set num_discard_requests > 0.  Now the per-target
      'discards_supported' flag is effective at ensuring the final DM device
      advertises discard support.  But, to be clear, targets that don't
      support discards (!num_discard_requests) will not receive discard
      requests.
      
      Also DMWARN if a target sets 'discards_supported' override but forgets
      to set 'num_discard_requests'.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
  14. 27 Jul, 2011 1 commit
  15. 29 May, 2011 2 commits
    • dm table: reject devices without request fns · f4808ca9
      Authored by Milan Broz
      This patch adds a check that a block device has a request function
      defined before it is used.  Otherwise, misconfiguration can cause an oops.
      
      Because we are allowing devices with zero size e.g. an offline multipath
      device as in commit 2cd54d9b
      ("dm: allow offline devices") there needs to be an additional check
      to ensure devices are initialised.  Some block devices, like a loop
      device without a backing file, exist but have no request function.
      
      Reproducer is trivial: dm-mirror on unbound loop device
      (no backing file on loop devices)
      
      dmsetup create x --table "0 8 mirror core 2 8 sync 2 /dev/loop0 0 /dev/loop1 0"
      
      and mirror resync will immediately cause an oops.
      
      BUG: unable to handle kernel NULL pointer dereference at   (null)
       ? generic_make_request+0x2bd/0x590
       ? kmem_cache_alloc+0xad/0x190
       submit_bio+0x53/0xe0
       ? bio_add_page+0x3b/0x50
       dispatch_io+0x1ca/0x210 [dm_mod]
       ? read_callback+0x0/0xd0 [dm_mirror]
       dm_io+0xbb/0x290 [dm_mod]
       do_mirror+0x1e0/0x748 [dm_mirror]
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm table: allow targets to support discards internally · 4c259327
      Authored by Mike Snitzer
      Permit a target to support discards regardless of whether or not all its
      underlying devices do.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
  16. 06 Apr, 2011 1 commit
    • dm: improve block integrity support · a63a5cf8
      Authored by Mike Snitzer
      The current block integrity (DIF/DIX) support in DM is verifying that
      all devices' integrity profiles match during DM device resume (which
      is past the point of no return).  To some degree that is unavoidable
      (stacked DM devices force this late checking).  But for most DM
      devices (which aren't stacking on other DM devices) the ideal time to
      verify all integrity profiles match is during table load.
      
      Introduce the notion of an "initialized" integrity profile: a profile
      that was blk_integrity_register()'d with a non-NULL 'blk_integrity'
      template.  Add blk_integrity_is_initialized() to allow checking if a
      profile was initialized.
      
      Update DM integrity support to:
      - check all devices with _initialized_ integrity profiles match
        during table load; uninitialized profiles (e.g. for underlying DM
        device(s) of a stacked DM device) are ignored.
      - disallow a table load that would result in an integrity profile that
        conflicts with a DM device's existing (in-use) integrity profile
      - avoid clearing an existing integrity profile
      - validate all integrity profiles match during resume; but if they
        don't all we can do is report the mismatch (during resume we're past
        the point of no return)
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  17. 17 Mar, 2011 1 commit
  18. 10 Mar, 2011 1 commit
  19. 15 Jan, 2011 1 commit
    • block: restore multiple bd_link_disk_holder() support · 49731baa
      Authored by Tejun Heo
      Commit e09b457b (block: simplify holder symlink handling) incorrectly
      assumed that there is only one link at maximum.  dm may use multiple
      links and expects block layer to track reference count for each link,
      which is different from and unrelated to the exclusive device holder
      identified by @holder when the device is opened.
      
      Remove the single holder assumption and automatic removal of the link
      and revive the per-link reference count tracking.  The code
      essentially behaves the same as before commit e09b457b sans the
      unnecessary kobject reference count dancing.
      
      While at it, note that this facility should not be used by anyone
      other than its current users.  Sysfs symlinks shouldn't be abused like
      this and the whole thing doesn't belong in the block layer at all.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Milan Broz <mbroz@redhat.com>
      Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: linux-raid@vger.kernel.org
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  20. 14 Jan, 2011 2 commits
  21. 17 Dec, 2010 2 commits
    • block: max hardware sectors limit wrapper · 72d4cd9f
      Authored by Mike Snitzer
      Implement blk_limits_max_hw_sectors() and make
      blk_queue_max_hw_sectors() a wrapper around it.
      
      DM needs this to avoid setting queue_limits' max_hw_sectors and
      max_sectors directly.  dm_set_device_limits() now leverages
      blk_limits_max_hw_sectors() logic to establish the appropriate
      max_hw_sectors minimum (PAGE_SIZE).  Fixes issue where DM was
      incorrectly setting max_sectors rather than max_hw_sectors (which
      caused dm_merge_bvec()'s max_hw_sectors check to be ineffective).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@kernel.org
      Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
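
The split between a limits-level helper and a queue-level wrapper might look roughly like the following userspace sketch. All names here are hypothetical, the 4096-byte page size is an assumption, and the cap applied to `max_sectors` (1024 sectors) is likewise an assumed default rather than something stated in the commit message.

```c
/* Assumed values for illustration only. */
#define SKETCH_PAGE_SIZE       4096u /* bytes per page */
#define SKETCH_DEF_MAX_SECTORS 1024u /* default cap on max_sectors */

/* Hypothetical stand-in for struct queue_limits. */
struct limits_sketch {
	unsigned int max_hw_sectors;
	unsigned int max_sectors;
};

/* Hypothetical stand-in for a request queue that owns its limits. */
struct queue_sketch {
	struct limits_sketch limits;
};

/* Limits-level helper: enforce a one-page minimum, then derive max_sectors. */
static void limits_set_max_hw_sectors(struct limits_sketch *lim,
				      unsigned int max_hw_sectors)
{
	unsigned int min_sectors = SKETCH_PAGE_SIZE >> 9; /* 512-byte sectors */

	if (max_hw_sectors < min_sectors)
		max_hw_sectors = min_sectors;

	lim->max_hw_sectors = max_hw_sectors;
	lim->max_sectors = max_hw_sectors < SKETCH_DEF_MAX_SECTORS ?
			   max_hw_sectors : SKETCH_DEF_MAX_SECTORS;
}

/* Queue-level wrapper: a thin call into the limits-level helper. */
static void queue_set_max_hw_sectors(struct queue_sketch *q,
				     unsigned int max_hw_sectors)
{
	limits_set_max_hw_sectors(&q->limits, max_hw_sectors);
}
```

A stacking driver that only has a limits structure and no queue yet can call the limits-level helper directly, which is the use case the commit message describes for DM.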
    • block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead · e692cb66
      Authored by Martin K. Petersen
      When stacking devices, a request_queue is not always available. This
      forced us to have a no_cluster flag in the queue_limits that could be
      used as a carrier until the request_queue had been set up for a
      metadevice.
      
      There were several problems with that approach. First of all it was up
      to the stacking device to remember to set the queue flag after stacking had
      completed. Also, the queue flag and the queue limits had to be kept in
      sync at all times. We got that wrong, which could lead to us issuing
      commands that went beyond the max scatterlist limit set by the driver.
      
      The proper fix is to avoid having two flags for tracking the same thing.
      We deprecate QUEUE_FLAG_CLUSTER and use the queue limit directly in the
      block layer merging functions. The queue_limit 'no_cluster' is turned
      into 'cluster' to avoid double negatives and to ease stacking.
      Clustering defaults to being enabled as before. The queue flag logic is
      removed from the stacking function, and explicitly setting the cluster
      flag is no longer necessary in DM and MD.
      Reported-by: Ed Lin <ed.lin@promise.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  22. 13 Nov, 2010 1 commit
    • block: clean up blkdev_get() wrappers and their users · d4d77629
      Authored by Tejun Heo
      After recent blkdev_get() modifications, open_by_devnum() and
      open_bdev_exclusive() are simple wrappers around blkdev_get().
      Replace them with blkdev_get_by_dev() and blkdev_get_by_path().
      
      blkdev_get_by_dev() is identical to open_by_devnum().
      blkdev_get_by_path() is slightly different in that it doesn't
      automatically add %FMODE_EXCL to @mode.
      
      All users are converted.  Most conversions are mechanical and don't
      introduce any behavior difference.  There are several exceptions.
      
      * btrfs now sets FMODE_EXCL in btrfs_device->mode, so there's no
        reason to OR it explicitly on blkdev_put().
      
      * gfs2, nilfs2 and the generic mount_bdev() now set FMODE_EXCL in
        sb->s_mode.
      
      * With the above changes, sb->s_mode now always should contain
        FMODE_EXCL.  WARN_ON_ONCE() added to kill_block_super() to detect
        errors.
      
      The new blkdev_get_*() functions come with proper docbook comments.
      While at it, add function description to blkdev_get() too.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Joern Engel <joern@lazybastard.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: reiserfs-devel@vger.kernel.org
      Cc: xfs-masters@oss.sgi.com
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>