1. 21 Jul 2016 (2 commits)
    • dm error: add DAX support · f8df1fdf
      Authored by Mike Snitzer
      Allow the error target to replace an existing DAX-enabled target.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: add infrastructure for DAX support · 545ed20e
      Authored by Toshi Kani
      Change the mapped device to implement a direct_access function,
      dm_blk_direct_access(), which calls the target's direct_access function.
      'struct target_type' is extended with a target direct_access interface.
      This function limits the directly accessible size to the dm_target's
      limit using max_io_len().
      
      Add dm_table_supports_dax() to iterate over all targets and associated
      block devices to check for DAX support.  To add DAX support to a DM
      target, the target need only implement the direct_access function
      (sketched below).
      
      Add a new dm type, DM_TYPE_DAX_BIO_BASED, which indicates that the
      mapped device supports DAX and is bio-based.  This new type is used to
      ensure that all target devices have DAX support and remain that way
      after QUEUE_FLAG_DAX is set on the mapped device.
      
      At initial table load, QUEUE_FLAG_DAX is set on the mapped device when
      its type is set to DM_TYPE_DAX_BIO_BASED.  Any subsequent table load to
      the mapped device must have the same type, or else it fails per the
      check in table_load().
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
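
      A sketch of the table-wide check described above (illustrative, not the
      verbatim kernel code; device_supports_dax is an assumed iterate_devices
      callback that tests the underlying block device):

        static bool dm_table_supports_dax(struct dm_table *t)
        {
                struct dm_target *ti;
                unsigned i;

                /* every target must advertise DAX and sit on DAX-capable devices */
                for (i = 0; i < dm_table_get_num_targets(t); i++) {
                        ti = dm_table_get_target(t, i);

                        if (!ti->type->direct_access)
                                return false;

                        if (!ti->type->iterate_devices ||
                            !ti->type->iterate_devices(ti, device_supports_dax, NULL))
                                return false;
                }

                return true;
        }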
  2. 11 Jun 2016 (2 commits)
    • dm mpath: add optional "queue_mode" feature · e83068a5
      Authored by Mike Snitzer
      Allow a user to specify an optional feature 'queue_mode <mode>' where
      <mode> may be "bio", "rq" or "mq" -- corresponding to bio-based,
      request_fn rq-based, and blk-mq rq-based respectively (parsing sketched
      below).
      
      If the queue_mode feature isn't specified, the default for the
      "multipath" target is still "rq"; but if dm_mod.use_blk_mq is set to Y,
      it'll default to mode "mq".
      
      This new queue_mode feature introduces the ability for each multipath
      device to have its own queue_mode (whereas before this feature all
      multipath devices effectively had to have the same queue_mode).
      
      This commit also goes a long way toward eliminating the awkward (ab)use
      of DM_TYPE_*, the associated filter_md_type() and other relatively
      fragile and difficult-to-maintain code.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
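
      A sketch of how the optional feature might be parsed (illustrative; it
      follows the dm_shift_arg() style used by DM targets, and 'm' stands for
      the multipath instance):

        if (!strcasecmp(arg_name, "queue_mode")) {
                const char *queue_mode_name = dm_shift_arg(as);

                if (!strcasecmp(queue_mode_name, "bio"))
                        m->queue_mode = DM_TYPE_BIO_BASED;
                else if (!strcasecmp(queue_mode_name, "rq"))
                        m->queue_mode = DM_TYPE_REQUEST_BASED;
                else if (!strcasecmp(queue_mode_name, "mq"))
                        m->queue_mode = DM_TYPE_MQ_REQUEST_BASED;
                else {
                        ti->error = "Unknown 'queue_mode' requested";
                        r = -EINVAL;
                }
        }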
    • dm: move request-based code out to dm-rq.[hc] · 4cc96131
      Authored by Mike Snitzer
      Add some separation between bio-based and request-based DM core code.
      
      'struct mapped_device' and other DM-core-only structures and functions
      have been moved to dm-core.h, and all relevant DM core .c files have
      been updated to include dm-core.h rather than dm.h.

      DM targets should _never_ include dm-core.h!  (Include discipline
      sketched below.)
      
      [block core merge conflict resolution from Stephen Rothwell]
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
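
      The resulting include discipline, sketched (illustrative):

        /* DM core .c files only (dm.c, dm-rq.c, dm-table.c, ...): */
        #include "dm-core.h"

        /* DM targets stick to the public interface: */
        #include <linux/device-mapper.h>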
  3. 14 Apr 2016 (1 commit)
  4. 13 Apr 2016 (1 commit)
  5. 11 Mar 2016 (1 commit)
    • dm snapshot: disallow the COW and origin devices from being identical · 4df2bf46
      Authored by Ding Xiang
      Otherwise, loading a "snapshot" table that uses the same device for the
      origin and COW devices (guard sketched below), e.g.:
      
      echo "0 20971520 snapshot 253:3 253:3 P 8" | dmsetup create snap
      
      will trigger:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
      [ 1958.979934] IP: [<ffffffffa040efba>] dm_exception_store_set_chunk_size+0x7a/0x110 [dm_snapshot]
      [ 1958.989655] PGD 0
      [ 1958.991903] Oops: 0000 [#1] SMP
      ...
      [ 1959.059647] CPU: 9 PID: 3556 Comm: dmsetup Tainted: G          IO    4.5.0-rc5.snitm+ #150
      ...
      [ 1959.083517] task: ffff8800b9660c80 ti: ffff88032a954000 task.ti: ffff88032a954000
      [ 1959.091865] RIP: 0010:[<ffffffffa040efba>]  [<ffffffffa040efba>] dm_exception_store_set_chunk_size+0x7a/0x110 [dm_snapshot]
      [ 1959.104295] RSP: 0018:ffff88032a957b30  EFLAGS: 00010246
      [ 1959.110219] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000001
      [ 1959.118180] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff880329334a00
      [ 1959.126141] RBP: ffff88032a957b50 R08: 0000000000000000 R09: 0000000000000001
      [ 1959.134102] R10: 000000000000000a R11: f000000000000000 R12: ffff880330884d80
      [ 1959.142061] R13: 0000000000000008 R14: ffffc90001c13088 R15: ffff880330884d80
      [ 1959.150021] FS:  00007f8926ba3840(0000) GS:ffff880333440000(0000) knlGS:0000000000000000
      [ 1959.159047] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1959.165456] CR2: 0000000000000098 CR3: 000000032f48b000 CR4: 00000000000006e0
      [ 1959.173415] Stack:
      [ 1959.175656]  ffffc90001c13040 ffff880329334a00 ffff880330884ed0 ffff88032a957bdc
      [ 1959.183946]  ffff88032a957bb8 ffffffffa040f225 ffff880329334a30 ffff880300000000
      [ 1959.192233]  ffffffffa04133e0 ffff880329334b30 0000000830884d58 00000000569c58cf
      [ 1959.200521] Call Trace:
      [ 1959.203248]  [<ffffffffa040f225>] dm_exception_store_create+0x1d5/0x240 [dm_snapshot]
      [ 1959.211986]  [<ffffffffa040d310>] snapshot_ctr+0x140/0x630 [dm_snapshot]
      [ 1959.219469]  [<ffffffffa0005c44>] ? dm_split_args+0x64/0x150 [dm_mod]
      [ 1959.226656]  [<ffffffffa0005ea7>] dm_table_add_target+0x177/0x440 [dm_mod]
      [ 1959.234328]  [<ffffffffa0009203>] table_load+0x143/0x370 [dm_mod]
      [ 1959.241129]  [<ffffffffa00090c0>] ? retrieve_status+0x1b0/0x1b0 [dm_mod]
      [ 1959.248607]  [<ffffffffa0009e35>] ctl_ioctl+0x255/0x4d0 [dm_mod]
      [ 1959.255307]  [<ffffffff813304e2>] ? memzero_explicit+0x12/0x20
      [ 1959.261816]  [<ffffffffa000a0c3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
      [ 1959.268615]  [<ffffffff81215eb6>] do_vfs_ioctl+0xa6/0x5c0
      [ 1959.274637]  [<ffffffff81120d2f>] ? __audit_syscall_entry+0xaf/0x100
      [ 1959.281726]  [<ffffffff81003176>] ? do_audit_syscall_entry+0x66/0x70
      [ 1959.288814]  [<ffffffff81216449>] SyS_ioctl+0x79/0x90
      [ 1959.294450]  [<ffffffff8167e4ae>] entry_SYSCALL_64_fastpath+0x12/0x71
      ...
      [ 1959.323277] RIP  [<ffffffffa040efba>] dm_exception_store_set_chunk_size+0x7a/0x110 [dm_snapshot]
      [ 1959.333090]  RSP <ffff88032a957b30>
      [ 1959.336978] CR2: 0000000000000098
      [ 1959.344121] ---[ end trace b049991ccad1169e ]---
      
      Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1195899
      Cc: stable@vger.kernel.org
      Signed-off-by: Ding Xiang <dingxiang@huawei.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
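
      The added guard, sketched (illustrative; the field names and the exact
      placement in snapshot_ctr() are assumptions):

        /* reject tables whose COW device aliases the origin device */
        if (s->origin->bdev == s->cow->bdev) {
                ti->error = "COW device cannot be the same as origin device";
                r = -EINVAL;
                goto bad_cow;
        }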
  6. 23 Feb 2016 (3 commits)
    • dm: rename target's per_bio_data_size to per_io_data_size · 30187e1d
      Authored by Mike Snitzer
      Request-based DM will also make use of per_bio_data_size.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: optimize dm_mq_queue_rq() · 16f12266
      Authored by Mike Snitzer
      DM multipath is the only dm-mq target.  But that aside, request-based DM
      only supports tables with a single target that is immutable.  Leverage
      this fact in dm_mq_queue_rq() by using the 'immutable_target' stored in
      the mapped_device when the table was made active.  This saves the need
      to even take the read-side of the SRCU via dm_{get,put}_live_table.
      
      If the active DM table does not have an immutable target (e.g. the
      "error" target was swapped in), then fall back to the slow path where
      the target is looked up from the live table (sketched below).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
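
      The fast/slow split, sketched (illustrative, simplified from the
      description above):

        /* fast path: immutable target cached when the table was made active */
        struct dm_target *ti = md->immutable_target;

        if (unlikely(!ti)) {
                /* slow path: e.g. the "error" target was swapped in */
                int srcu_idx;
                struct dm_table *map = dm_get_live_table(md, &srcu_idx);

                ti = dm_table_find_target(map, 0);
                dm_put_live_table(md, srcu_idx);
        }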
    • dm: set DM_TARGET_WILDCARD feature on "error" target · f083b09b
      Authored by Mike Snitzer
      The DM_TARGET_WILDCARD feature indicates that the "error" target may
      replace any target, even immutable targets.  This feature will be useful
      to preserve the ability to replace the "multipath" target even once it
      is formally converted over to having the DM_TARGET_IMMUTABLE feature.
      
      Also, implicit in the DM_TARGET_WILDCARD feature flag being set is that
      .map, .map_rq, .clone_and_map_rq and .release_clone_rq are all defined
      in the target_type (sketched below).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
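
      Sketched, the "error" target's declaration would then look roughly like
      this (illustrative; the io_err_* hook names follow dm-target.c
      conventions but are shown here as assumptions):

        static struct target_type error_target = {
                .name = "error",
                .version = {1, 4, 0},
                .features = DM_TARGET_WILDCARD,
                .ctr = io_err_ctr,
                .dtr = io_err_dtr,
                .map = io_err_map,
                .map_rq = io_err_map_rq,
                .clone_and_map_rq = io_err_clone_and_map_rq,
                .release_clone_rq = io_err_release_clone_rq,
        };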
  7. 22 Oct 2015 (1 commit)
    • block: Inline blk_integrity in struct gendisk · 25520d55
      Authored by Martin K. Petersen
      Up until now, the integrity profile has been dynamically allocated and
      attached to struct gendisk after the disk has been made active.
      
      This causes problems because NVMe devices need to register the profile
      prior to the partition table being read due to a mandatory metadata
      buffer requirement. In addition, DM goes through hoops to deal with
      preallocating, but not initializing integrity profiles.
      
      Since the integrity profile is small (4 bytes + a pointer), Christoph
      suggested moving it to struct gendisk proper (sketched below).  This
      requires several changes:
      
       - Moving the blk_integrity definition to genhd.h.
      
       - Inlining blk_integrity in struct gendisk.
      
       - Removing the dynamic allocation code.
      
       - Adding helper functions which allow gendisk to set up and tear down
         the integrity sysfs dir when a disk is added/deleted.
      
       - Adding a blk_integrity_revalidate() callback for updating the stable
         pages bdi setting.
      
       - The calls that depend on whether a device has an integrity profile or
         not now key off of the bi->profile pointer.
      
       - Simplifying the integrity support routines in DM (Mike Snitzer).
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reported-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
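
      The shape of the change, sketched (illustrative; surrounding fields
      omitted):

        struct gendisk {
                /* ... */
                struct blk_integrity integrity; /* inlined; was a dynamically
                                                   allocated pointer */
                /* ... */
        };

        /* presence checks now key off the profile pointer, e.g.: */
        bool has_integrity = disk->integrity.profile != NULL;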
  8. 20 Aug 2015 (1 commit)
  9. 14 Aug 2015 (1 commit)
    • block: kill merge_bvec_fn() completely · 8ae12666
      Authored by Kent Overstreet
      As generic_make_request() is now able to handle arbitrarily sized bios,
      it's no longer necessary for each individual block driver to define its
      own ->merge_bvec_fn() callback. Remove every invocation completely.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: drbd-user@lists.linbit.com
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@kernel.org>
      Cc: ceph-devel@vger.kernel.org
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Neil Brown <neilb@suse.de>
      Cc: linux-raid@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits)
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
      [dpark: also remove ->merge_bvec_fn() in dm-thin as well as
       dm-era-target, and resolve merge conflicts]
      Signed-off-by: Dongsu Park <dpark@posteo.net>
      Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  10. 26 Jun 2015 (2 commits)
  11. 30 May 2015 (2 commits)
    • dm: do not allocate any mempools for blk-mq request-based DM · cbc4e3c1
      Authored by Mike Snitzer
      Do not allocate the io_pool mempool for blk-mq request-based DM
      (DM_TYPE_MQ_REQUEST_BASED) in dm_alloc_rq_mempools().
      
      Also refine __bind_mempools() to have more precise awareness of which
      mempools each type of DM device uses -- this avoids mempool churn when
      reloading DM tables (particularly for DM_TYPE_REQUEST_BASED).  The
      per-type rule is sketched below.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
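
      The per-type allocation rule, sketched (illustrative, heavily
      simplified):

        switch (type) {
        case DM_TYPE_BIO_BASED:
                /* front-padded bioset plus io_pool for struct dm_io */
                break;
        case DM_TYPE_REQUEST_BASED:
                /* io_pool plus rq_pool for clone requests */
                break;
        case DM_TYPE_MQ_REQUEST_BASED:
                /* nothing: blk-mq allocates requests from its own pools */
                break;
        }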
    • dm: fix reload failure of 0 path multipath mapping on blk-mq devices · 15b94a69
      Authored by Junichi Nomura
      dm-multipath accepts a 0-path mapping.
      
        # echo '0 2097152 multipath 0 0 0 0' | dmsetup create newdev
      
      Such a mapping can be used to release underlying devices while still
      holding requests in its queue until working paths come back.
      
      However, once the multipath device is created over blk-mq devices,
      it rejects reloading of a 0-path mapping:
      
        # echo '0 2097152 multipath 0 0 1 1 queue-length 0 1 1 /dev/sda 1' \
            | dmsetup create mpath1
        # echo '0 2097152 multipath 0 0 0 0' | dmsetup load mpath1
        device-mapper: reload ioctl on mpath1 failed: Invalid argument
        Command failed
      
      with the following kernel message:
        device-mapper: ioctl: can't change device type after initial table load.
      
      DM tries to inherit the current table type using dm_table_set_type(),
      but it doesn't work as expected because of an unnecessary check on
      whether the target type is hybrid.

      The hybrid type is for targets that can work as either request-based or
      bio-based; it has no bearing on the blk-mq vs. non-blk-mq distinction
      (fix sketched below).
      
      Fixes: 65803c20 ("dm table: train hybrid target type detection to select blk-mq if appropriate")
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
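
      Conceptually, the fix makes the inheritance unconditional (illustrative
      sketch, not the verbatim patch):

        if (!bio_based && !request_based) {
                /*
                 * No targets/devices to inspect (e.g. a 0-path reload):
                 * inherit the live table's type -- which may be
                 * DM_TYPE_MQ_REQUEST_BASED -- instead of doing so only
                 * for hybrid targets.
                 */
                t->type = dm_get_md_type(t->md);
                return 0;
        }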
  12. 22 May 2015 (1 commit)
    • block, dm: don't copy bios for request clones · 5f1b670d
      Authored by Christoph Hellwig
      Currently dm-multipath has to clone the bios for every request sent
      to the lower devices, which wastes cpu cycles and ties down memory.
      
      This patch instead adds a new REQ_CLONE flag that instructs
      req_bio_endio to not complete bios attached to a request; we set it on
      clone requests, similar to bios in a flush sequence (sketched below).
      With this change, I/O errors on a path failure only get propagated to
      dm-multipath, which can then either resubmit the I/O or complete the
      bios on the original request.
      
      I've done some basic testing of this on a Linux target with ALUA support,
      and it survives path failures during I/O nicely.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
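
      The heart of the req_bio_endio() change, sketched (illustrative,
      simplified):

        /* don't actually finish the bio if it's part of a flush sequence
         * or attached to a clone request -- dm-multipath will resubmit or
         * complete the bios on the original request */
        if (bio->bi_iter.bi_size == 0 &&
            !(rq->cmd_flags & (REQ_FLUSH_SEQ | REQ_CLONE)))
                bio_endio(bio, error);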
  13. 16 Apr 2015 (4 commits)
    • dm table: use bool function return values of true/false not 1/0 · 7f61f5a0
      Authored by Joe Perches
      Use the normal return values for bool functions.
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm table: fall back to getting device using name_to_dev_t() · 644bda6f
      Authored by Dan Ehrenberg
      If a device is used as the root filesystem, it can't be built from
      devices which are within the root filesystem (just like command-line
      arguments to root=).  For this reason, Linux has a pseudo-filesystem
      for root= and MD initialization (based on the function name_to_dev_t)
      which handles different ways of specifying devices, including PARTUUID
      and major:minor.
      
      Switch to using name_to_dev_t() in dm_get_device().  Rather than
      having DM assume that everything which is not major:minor is a path in
      an already-mounted filesystem, change dm_get_device() to first attempt
      to look up the device in the filesystem, and if it is not found, fall
      back to using name_to_dev_t() (sketched below).
      
      In terms of backwards compatibility, there are some cases where
      behavior will be different:
      - If you have a file in the current working directory named 1:2 and
        you initialize DM there, then it will try to use that file rather
        than the disk with that major:minor pair as a backing device.
      - Similarly for other bdev types which name_to_dev_t() knows how to
        interpret, the previous behavior was to repeatedly check for the
        existence of the file (e.g., while waiting for rootfs to come up)
        but the new behavior is to use the name_to_dev_t() interpretation.
        For example, if you have a file named /dev/ubiblock0_0 which is
        a symlink to /dev/sda3, but it is not yet present when DM starts
        to initialize, then the name_to_dev_t() interpretation will take
        precedence.
      
      These incompatibilities would only show up in really strange setups
      with bad practices so we shouldn't have to worry about them.
      Signed-off-by: Dan Ehrenberg <dehrenberg@chromium.org>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
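
      The resulting lookup order in dm_get_device(), sketched (illustrative;
      error handling trimmed):

        dev_t dev;
        struct block_device *bdev;

        /* first try the path as a node in an already-mounted filesystem */
        bdev = lookup_bdev(path);
        if (IS_ERR(bdev)) {
                /* fall back to name_to_dev_t(): major:minor, PARTUUID=, ... */
                dev = name_to_dev_t(path);
        } else {
                dev = bdev->bd_dev;
                bdput(bdev);
        }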
    • dm: add 'use_blk_mq' module param and expose in per-device ro sysfs attr · 17e149b8
      Authored by Mike Snitzer
      Request-based DM's blk-mq support defaults to off; but a user can
      easily change the default using the dm_mod.use_blk_mq module/boot
      option (sketched below).
      
      Also, you can check what mode a given request-based DM device is using
      with: cat /sys/block/dm-X/dm/use_blk_mq
      
      This change enables further cleanup and reduces work (e.g. md->io_pool
      and md->rq_pool aren't created when using blk-mq).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
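
      The knob itself, sketched (illustrative; the kernel additionally gates
      the default behind a Kconfig option):

        static bool use_blk_mq;
        module_param(use_blk_mq, bool, S_IRUGO | S_IWUSR);
        MODULE_PARM_DESC(use_blk_mq,
                         "Use block multiqueue for request-based DM devices");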
    • dm: add full blk-mq support to request-based DM · bfebd1cd
      Authored by Mike Snitzer
      Commit e5863d9a ("dm: allocate requests in target when stacking on
      blk-mq devices") served as the first step toward fully utilizing blk-mq
      in request-based DM -- it enabled stacking an old-style (request_fn)
      request_queue on top of the underlying blk-mq device(s).  That first
      step didn't improve performance of DM multipath on top of fast blk-mq
      devices (e.g. NVMe) because the top-level old-style request_queue was
      severely limited by the queue_lock.
      
      The second step offered here enables stacking a blk-mq request_queue
      on top of the underlying blk-mq device(s) (setup sketched below).  This
      unlocks significant performance gains on fast blk-mq devices.  Keith
      Busch tested this on his NVMe testbed and offered this really positive
      news:
      
       "Just providing a performance update. All my fio tests are getting
        roughly equal performance whether accessed through the raw block
        device or the multipath device mapper (~470k IOPS). I could only push
        ~20% of the raw iops through dm before this conversion, so this latest
        tree is looking really solid from a performance standpoint."
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Tested-by: Keith Busch <keith.busch@intel.com>
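
      Setting up the blk-mq queue for the DM device, sketched (illustrative
      fragment; the depth and queue-count values are examples, not the
      kernel's exact choices):

        md->tag_set.ops = &dm_mq_ops;
        md->tag_set.queue_depth = BLKDEV_MAX_RQ;
        md->tag_set.numa_node = NUMA_NO_NODE;
        md->tag_set.nr_hw_queues = 1;
        md->tag_set.driver_data = md;

        err = blk_mq_alloc_tag_set(&md->tag_set);
        if (err)
                goto out;
        q = blk_mq_init_allocated_queue(&md->tag_set, md->queue);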
  14. 01 Apr 2015 (1 commit)
    • dm: remove request-based DM queue's lld_busy_fn hook · d56b9b28
      Authored by Mike Snitzer
      DM multipath is the only caller of blk_lld_busy() -- which calls a
      queue's lld_busy_fn hook.  Request-based DM doesn't support stacking
      multipath devices, so there is no reason to register the lld_busy_fn
      hook on a multipath device's queue using blk_queue_lld_busy().
      
      As such, remove functions dm_lld_busy and dm_table_any_busy_target.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
  15. 11 Feb 2015 (1 commit)
  16. 10 Feb 2015 (2 commits)
    • dm table: train hybrid target type detection to select blk-mq if appropriate · 65803c20
      Authored by Mike Snitzer
      Otherwise replacing the multipath target with the error target fails:
        device-mapper: ioctl: can't change device type after initial table load.
      
      The error target was mistakenly considered to be of target type
      DM_TYPE_REQUEST_BASED rather than DM_TYPE_MQ_REQUEST_BASED, even when
      the target it was to replace was of type DM_TYPE_MQ_REQUEST_BASED.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • dm: allocate requests in target when stacking on blk-mq devices · e5863d9a
      Authored by Mike Snitzer
      For blk-mq request-based DM, the responsibility of allocating a cloned
      request is transferred from DM core to the target type.  Doing so
      enables the cloned request to be allocated from the appropriate
      blk-mq request_queue's pool (only the DM target, e.g. multipath, can
      know which block device to send a given cloned request to).
      
      Care was taken to preserve compatibility with old-style block request
      completion that requires request-based DM _not_ acquire the clone
      request's queue lock in the completion path.  As such, there are now 2
      different request-based DM target_type interfaces:
      1) the original .map_rq() interface will continue to be used for
         non-blk-mq devices -- the preallocated clone request is passed in
         from DM core.
      2) a new .clone_and_map_rq() and .release_clone_rq() will be used for
         blk-mq devices -- blk_get_request() and blk_put_request() are used
         respectively from these hooks (both interfaces sketched below).
      
      dm_table_set_type() was updated to detect whether the request-based
      target is being stacked on blk-mq devices, and if so,
      DM_TYPE_MQ_REQUEST_BASED is set.  DM core disallows switching the DM
      table's type after it is set.  This means that there is no mixing of
      non-blk-mq and blk-mq devices within the same request-based DM table.
      
      [This patch was started by Keith and later heavily modified by Mike]
      Tested-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
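
      The two interfaces, sketched against the multipath target's hooks
      (illustrative; hook names mirror dm-mpath.c):

        static struct target_type multipath_target = {
                .name = "multipath",
                /* 1) old-style: DM core passes in a preallocated clone */
                .map_rq = multipath_map,
                /* 2) blk-mq: the target allocates the clone itself */
                .clone_and_map_rq = multipath_clone_and_map,  /* blk_get_request() */
                .release_clone_rq = multipath_release_clone,  /* blk_put_request() */
        };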
  17. 20 Nov 2014 (1 commit)
  18. 06 Oct 2014 (1 commit)
    • dm: allow active and inactive tables to share dm_devs · 86f1152b
      Authored by Benjamin Marzinski
      Until this change, when loading a new DM table, DM core would re-open
      all of the devices in the DM table.  Now, DM core will avoid redundant
      device opens (and closes when destroying the old table) if the old
      table already has a device open using the same mode.  This is achieved
      by managing reference counts on the table_devices that DM core now
      stores in the mapped_device structure (rather than in the dm_table
      structure).  So a mapped_device's active and inactive dm_tables' dm_dev
      lists now just point to the dm_devs stored in the mapped_device's
      table_devices list.
      
      This improvement in DM core's device reference counting has the
      side-effect of fixing a long-standing limitation of the multipath
      target: a DM multipath table couldn't include any paths that were
      unusable (failed).  For example: if all paths have failed and you add a
      new, working path to the table, you can't use it, since the table load
      would fail due to the table still containing failed paths.  Now a
      reload of a multipath table can include failed devices, and when those
      devices become active again they can be used instantly.
      
      The device list code in dm.c isn't a straight copy/paste from the code
      in dm-table.c, but it's very close (aside from some variable renames).
      One subtle difference is that find_table_device for the table_devices
      list will only match devices with the same name and mode.  This is
      because we don't want to upgrade a device's mode in the active table
      when an inactive table is loaded (sketched below).
      
      Access to the mapped_device structure's table_devices list requires a
      mutex (table_devices_lock), so that tables cannot be created and
      destroyed concurrently.
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
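
      The name+mode match, sketched (illustrative; close to the dm.c helper,
      though the field names here are assumptions):

        static struct table_device *find_table_device(struct list_head *l,
                                                      dev_t dev, fmode_t mode)
        {
                struct table_device *td;

                /* match the mode too, so loading an inactive table never
                 * upgrades the mode of a device in the active table */
                list_for_each_entry(td, l, list)
                        if (td->dm_dev.bdev->bd_dev == dev &&
                            td->dm_dev.mode == mode)
                                return td;

                return NULL;
        }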
  19. 11 Aug 2014 (1 commit)
    • dm table: propagate QUEUE_FLAG_NO_SG_MERGE · 200612ec
      Authored by Jeff Moyer
      Commit 05f1dd53 ("block: add queue flag for disabling SG merging")
      introduced a new queue flag: QUEUE_FLAG_NO_SG_MERGE.  This gets set by
      default in blk_mq_init_queue for mq-enabled devices.  The effect of
      the flag is to bypass the SG segment merging.  Instead, the
      bio->bi_vcnt is used as the number of hardware segments.
      
      With a device mapper target on top of a device with
      QUEUE_FLAG_NO_SG_MERGE set, we can end up sending down more segments
      than a driver is prepared to handle.  I ran into this when backporting
      the virtio_blk mq support.  It triggered this BUG_ON in
      virtio_queue_rq:
      
              BUG_ON(req->nr_phys_segments + 2 > vblk->sg_elems);
      
      The queue's max is set here:
              blk_queue_max_segments(q, vblk->sg_elems-2);
      
      Basically, what happens is that a bio is built up for the dm device
      (which does not have the QUEUE_FLAG_NO_SG_MERGE flag set) using
      bio_add_page.  That path will call into __blk_recalc_rq_segments, so
      what you end up with is bi_phys_segments being much smaller than bi_vcnt
      (and bi_vcnt grows beyond the maximum sg elements).  Then, when the bio
      is submitted, it gets cloned.  When the cloned bio is submitted, it will
      end up in blk_recount_segments, here:
      
              if (test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags))
                      bio->bi_phys_segments = bio->bi_vcnt;
      
      and now we've set bio->bi_phys_segments to a number that is beyond what
      was registered as queue_max_segments by the driver.
      
      The right way to fix this is to propagate the queue flag up the stack
      (sketched below).
      
      The rules for propagating the flag are simple:
      - if the flag is set for any underlying device, it must be set for the
        upper device
      - consequently, if the flag is not set for any underlying device, it
        should not be set for the upper device.
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.16+
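
      The propagation rule, sketched (illustrative; modeled on dm-table.c's
      iterate_devices pattern, and the callback name is an assumption):

        static int device_no_sg_merge(struct dm_target *ti, struct dm_dev *dev,
                                      sector_t start, sector_t len, void *data)
        {
                struct request_queue *q = bdev_get_queue(dev->bdev);

                return q && test_bit(QUEUE_FLAG_NO_SG_MERGE, &q->queue_flags);
        }

        /* while stacking limits: set the flag on the DM device iff some
         * underlying device has it set */
        if (ti->type->iterate_devices &&
            ti->type->iterate_devices(ti, device_no_sg_merge, NULL))
                queue_flag_set_unlocked(QUEUE_FLAG_NO_SG_MERGE, q);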
  20. 02 Aug 2014 (1 commit)
  21. 04 Jun 2014 (1 commit)
  22. 28 Mar 2014 (2 commits)
  23. 07 Jan 2014 (1 commit)
    • dm table: remove unused buggy code that extends the targets array · 57a2f238
      Authored by Mikulas Patocka
      A device mapper table is allocated in the following way:
      * The function dm_table_create is called; it gets the number of targets
        as an argument and allocates a targets array accordingly.
      * For each target, we call dm_table_add_target.
      
      If we add more targets than were specified in dm_table_create, the
      function dm_table_add_target reallocates the targets array.  However,
      this reallocation code is wrong -- it moves the targets array to a new
      location while some target constructors hold pointers to the array in
      the old location (hazard sketched below).
      
      The following DM target drivers save the pointer to the target
      structure, so they corrupt memory if the target array is moved:
      multipath, raid, mirror, snapshot, stripe, switch, thin, verity.
      
      Under normal circumstances, the reallocation function is not called
      (because dm_table_create is called with the correct number of targets),
      so the buggy reallocation code is not used.
      
      Prior to the fix "dm table: fail dm_table_create on dm_round_up
      overflow", the reallocation code could only be reached if the user
      specified too large a value in param->target_count, such as 0xffffffff.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
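
      The hazard, sketched (illustrative):

        /* a target constructor has already saved a pointer into the array... */
        struct dm_target *saved_ti = &t->targets[0];

        /* ...so moving the array leaves that pointer dangling */
        t->targets = krealloc(t->targets, new_size, GFP_KERNEL);
        /* saved_ti may now point into freed memory */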
  24. 11 Dec 2013 (1 commit)
  25. 10 Nov 2013 (1 commit)
  26. 01 Nov 2013 (1 commit)
  27. 06 Sep 2013 (2 commits)
  28. 11 Jul 2013 (1 commit)
    • dm: optimize use SRCU and RCU · 83d5e5b0
      Authored by Mikulas Patocka
      This patch removes "io_lock" and "map_lock" in struct mapped_device,
      and "holders" in struct dm_table, and replaces these mechanisms with
      sleepable RCU.
      
      Previously, the code would call "dm_get_live_table" and "dm_table_put"
      to get and release the table.  Now, the code is changed to call
      "dm_get_live_table" and "dm_put_live_table": dm_get_live_table locks
      sleepable RCU and dm_put_live_table unlocks it (usage sketched below).
      
      dm_get_live_table_fast/dm_put_live_table_fast can be used instead of
      dm_get_live_table/dm_put_live_table. These *_fast functions use
      non-sleepable RCU, so the caller must not block between them.
      
      If the code changes the active or inactive dm table, it must call
      dm_sync_table before destroying the old table.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
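
      Illustrative usage of the new pair (signatures as described above):

        int srcu_idx;
        struct dm_table *map = dm_get_live_table(md, &srcu_idx);

        /* the caller may block while using 'map' (sleepable RCU) */

        dm_put_live_table(md, srcu_idx);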