1. 24 Mar, 2015 1 commit
  2. 28 Feb, 2015 3 commits
    • M
      dm snapshot: suspend merging snapshot when doing exception handover · 09ee96b2
      Mikulas Patocka committed
      The "dm snapshot: suspend origin when doing exception handover" commit
      fixed an exception store handover bug associated with pending exceptions
      to the "snapshot-origin" target.
      
      However, a similar problem exists in snapshot merging.  When snapshot
      merging is in progress, we use the target "snapshot-merge" instead of
      "snapshot-origin".  Consequently, during exception store handover, we
      must find the snapshot-merge target and suspend its associated
      mapped_device.
      
      To avoid lockdep warnings, the target must be suspended and resumed
      without holding _origins_lock.
      
      Introduce a dm_hold() function that grabs a reference on a
      mapped_device.  Unlike dm_get(), it doesn't crash if the device has
      the DMF_FREEING flag set; it returns an error in this case.
      
      In snapshot_resume() we grab the reference to the origin device using
      dm_hold() while holding _origins_lock (_origins_lock guarantees that the
      device won't disappear).  Then we release _origins_lock, suspend the
      device and grab _origins_lock again.
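
      A minimal userspace sketch of the reference-taking rule described
      above, not the kernel implementation.  struct hold_md,
      model_dm_hold() and model_dm_put() are hypothetical names, the
      boolean stands in for DMF_FREEING, and the -EBUSY return value is an
      assumption (the commit only says that an error is returned).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a mapped_device; not the kernel structure. */
struct hold_md {
	int holders;   /* reference count */
	bool freeing;  /* models the DMF_FREEING flag */
};

/*
 * Model of dm_hold(): take a reference unless the device is being freed.
 * Returns 0 on success and -EBUSY (assumed error code) otherwise.
 * The caller is assumed to hold whatever lock keeps the device from
 * disappearing (_origins_lock in the commit above).
 */
static int model_dm_hold(struct hold_md *md)
{
	if (md->freeing)
		return -EBUSY;  /* refuse instead of crashing */
	md->holders++;
	return 0;
}

/* Drop a reference previously taken with model_dm_hold(). */
static void model_dm_put(struct hold_md *md)
{
	md->holders--;
}
```

      In this model a caller would check the return value, drop the lock,
      suspend the device, re-take the lock, and finally call
      model_dm_put().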
      
      NOTE to stable@ people:
      When backporting to kernels 3.18 and older, use dm_internal_suspend and
      dm_internal_resume instead of dm_internal_suspend_fast and
      dm_internal_resume_fast.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      09ee96b2
    • M
      dm snapshot: suspend origin when doing exception handover · b735fede
      Mikulas Patocka committed
      In the function snapshot_resume we perform exception store handover.  If
      there is another active snapshot target, the exception store is moved
      from this target to the target that is being resumed.
      
      The problem is that if there is some pending exception, it will point to
      an incorrect exception store after that handover, causing a crash due to
      dm-snap-persistent.c:get_exception()'s BUG_ON.
      
      This bug can be triggered by repeatedly changing snapshot permissions
      with "lvchange -p r" and "lvchange -p rw" while there are writes on the
      associated origin device.
      
      To fix this bug, we must suspend the origin device when doing the
      exception store handover to make sure that there are no pending
      exceptions:
      - introduce _origin_hash that keeps track of dm_origin structures.
      - introduce functions __lookup_dm_origin, __insert_dm_origin and
        __remove_dm_origin that manipulate the origin hash.
      - modify snapshot_resume so that it calls dm_internal_suspend_fast() and
        dm_internal_resume_fast() on the origin device.
      
      NOTE to stable@ people:
      
      When backporting to kernels 3.12-3.18, use dm_internal_suspend and
      dm_internal_resume instead of dm_internal_suspend_fast and
      dm_internal_resume_fast.
      
      When backporting to kernels older than 3.12, you need to pick functions
      dm_internal_suspend and dm_internal_resume from the commit
      fd2ed4d2.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      b735fede
    • M
      dm: hold suspend_lock while suspending device during device deletion · ab7c7bb6
      Mikulas Patocka committed
      __dm_destroy() must take the suspend_lock so that its presuspend and
      postsuspend calls do not race with an internal suspend.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      ab7c7bb6
  3. 18 Feb, 2015 1 commit
    • M
      dm: fix a race condition in dm_get_md · 2bec1f4a
      Mikulas Patocka committed
      The function dm_get_md finds a device mapper device with a given dev_t,
      increases the reference count and returns the pointer.
      
      dm_get_md calls dm_find_md.  dm_find_md takes _minor_lock, finds the
      device, tests that the device doesn't have the DMF_DELETING or
      DMF_FREEING flag, drops _minor_lock and returns a pointer to the
      device.  dm_get_md then calls dm_get.  dm_get calls BUG if the device
      has the DMF_FREEING flag, otherwise it increments the reference count.
      
      There is a possible race condition: after dm_find_md exits and before
      dm_get is called, there are no locks held, so the device may disappear
      or the DMF_FREEING flag may be set, which results in BUG.
      
      To fix this bug, we need to call dm_get while we hold _minor_lock. This
      patch renames dm_find_md to dm_get_md and changes it so that it calls
      dm_get while holding the lock.
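
      The fixed lookup can be sketched in userspace as follows.  This is an
      illustrative model only: struct minor_dev, model_minors[],
      model_get_md() and the tiny atomic_flag spinlock are hypothetical
      stand-ins for the kernel's minor table and _minor_lock.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_MINORS 4

/* Hypothetical stand-in for a mapped_device. */
struct minor_dev {
	int holders;    /* reference count */
	bool deleting;  /* models DMF_DELETING */
	bool freeing;   /* models DMF_FREEING */
};

/* A minimal spinlock standing in for _minor_lock. */
static atomic_flag model_minor_lock = ATOMIC_FLAG_INIT;
static struct minor_dev *model_minors[MODEL_MAX_MINORS];

static void model_lock(void)
{
	while (atomic_flag_test_and_set(&model_minor_lock))
		;  /* spin until the flag is released */
}

static void model_unlock(void)
{
	atomic_flag_clear(&model_minor_lock);
}

/*
 * Look up a device and take the reference while the lock is still held,
 * so there is no window in which the device can start being freed
 * between the check and the reference-count increment.
 */
static struct minor_dev *model_get_md(unsigned int minor)
{
	struct minor_dev *md = NULL;

	model_lock();
	if (minor < MODEL_MAX_MINORS) {
		md = model_minors[minor];
		if (md && (md->deleting || md->freeing))
			md = NULL;       /* device is going away */
		if (md)
			md->holders++;   /* taken before the lock is dropped */
	}
	model_unlock();
	return md;
}
```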
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      2bec1f4a
  4. 10 Feb, 2015 6 commits
    • M
      dm: allocate requests in target when stacking on blk-mq devices · e5863d9a
      Mike Snitzer committed
      For blk-mq request-based DM the responsibility of allocating a cloned
      request is transferred from DM core to the target type.  Doing so
      enables the cloned request to be allocated from the appropriate
      blk-mq request_queue's pool (only the DM target, e.g. multipath, can
      know which block device to send a given cloned request to).
      
      Care was taken to preserve compatibility with old-style block request
      completion that requires request-based DM _not_ acquire the clone
      request's queue lock in the completion path.  As such, there are now 2
      different request-based DM target_type interfaces:
      1) the original .map_rq() interface will continue to be used for
         non-blk-mq devices -- the preallocated clone request is passed in
         from DM core.
      2) a new .clone_and_map_rq() and .release_clone_rq() will be used for
         blk-mq devices -- blk_get_request() and blk_put_request() are used
         respectively from these hooks.
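
      The two interfaces listed above can be pictured with the following
      userspace sketch.  Only the hook names come from this commit; the
      signatures, struct model_request, model_map_request() and the
      example_* hooks are hypothetical, and malloc()/free() merely stand
      in for blk_get_request()/blk_put_request().

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct request. */
struct model_request {
	int mapped;
};

/* Simplified target_type with both request-based interfaces. */
struct model_target_type {
	int (*map_rq)(struct model_request *clone);
	int (*clone_and_map_rq)(struct model_request *orig,
				struct model_request **clone);
	void (*release_clone_rq)(struct model_request *clone);
};

/*
 * If the core preallocated a clone (old-style, non-blk-mq), hand it to
 * .map_rq().  Otherwise let the target allocate the clone itself through
 * .clone_and_map_rq() (blk-mq case).
 */
static int model_map_request(const struct model_target_type *tt,
			     struct model_request *orig,
			     struct model_request *core_clone,
			     struct model_request **out)
{
	if (core_clone) {
		*out = core_clone;
		return tt->map_rq(core_clone);
	}
	return tt->clone_and_map_rq(orig, out);
}

/* Example hooks for a hypothetical target. */
static int example_map_rq(struct model_request *clone)
{
	clone->mapped = 1;
	return 0;
}

static int example_clone_and_map_rq(struct model_request *orig,
				    struct model_request **clone)
{
	struct model_request *c = malloc(sizeof(*c)); /* models blk_get_request() */

	(void)orig;
	if (!c)
		return -1;
	c->mapped = 1;
	*clone = c;
	return 0;
}

static void example_release_clone_rq(struct model_request *clone)
{
	free(clone); /* models blk_put_request() */
}
```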
      
      dm_table_set_type() was updated to detect if the request-based target is
      being stacked on blk-mq devices, if so DM_TYPE_MQ_REQUEST_BASED is set.
      DM core disallows switching the DM table's type after it is set.  This
      means that there is no mixing of non-blk-mq and blk-mq devices within
      the same request-based DM table.
      
      [This patch was started by Keith and later heavily modified by Mike]
      Tested-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      e5863d9a
    • K
      dm: prepare for allocating blk-mq clone requests in target · 466d89a6
      Keith Busch committed
      For blk-mq request-based DM the responsibility of allocating a cloned
      request will be transferred from DM core to the target type.
      
      To prepare for conditionally using this new model the original
      request's 'special' now points to the dm_rq_target_io because the
      clone is allocated later in the block layer rather than in DM core.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      466d89a6
    • K
      dm: submit stacked requests in irq enabled context · 2eb6e1e3
      Keith Busch committed
      Switch to having request-based DM enqueue all prep'ed requests into work
      processed by another thread.  This allows request-based DM to invoke
      block APIs that assume interrupt enabled context (e.g. blk_get_request)
      and is a prerequisite for adding blk-mq support to request-based DM.
      
      The new kernel thread is only initialized for request-based DM devices.
      
      multipath_map() is now always in irq enabled context so change multipath
      spinlock (m->lock) locking to always disable interrupts.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      2eb6e1e3
    • M
      dm: split request structure out from dm_rq_target_io structure · 1ae49ea2
      Mike Snitzer committed
      Request-based DM support for blk-mq devices requires that
      dm_rq_target_io structures not be allocated with an embedded request
      structure.  The request-based DM target (e.g. dm-multipath) must
      allocate the request from the blk-mq devices' request_queue using
      blk_get_request().
      
      The unfortunate side-effect of this change is old-style request-based DM
      support will no longer use contiguous memory for the dm_rq_target_io and
      request structures for each clone.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      1ae49ea2
    • M
      dm: remove exports for request-based interfaces without external callers · dbf9782c
      Mike Snitzer committed
      Remove exports for dm_dispatch_request, dm_requeue_unmapped_request,
      and dm_kill_unmapped_request.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      dbf9782c
    • M
      dm: fix multipath regression due to initializing wrong request · db507b3f
      Mike Snitzer committed
      Commit febf7158 ("block: require blk_rq_prep_clone() be given an
      initialized clone request") introduced a regression by calling
      blk_rq_init() on the original request rather than the clone
      request that is passed to setup_clone().
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Fixes: febf7158 ("block: require blk_rq_prep_clone() be given an initialized clone request")
      Signed-off-by: Jens Axboe <axboe@fb.com>
      db507b3f
  5. 29 Jan, 2015 1 commit
  6. 25 Jan, 2015 1 commit
    • M
      dm: fix handling of multiple internal suspends · 96b26c8c
      Mikulas Patocka committed
      Commit ffcc3936 ("dm: enhance internal suspend and resume interface")
      attempted to handle multiple internal suspends on the same device, but
      it did that incorrectly.  When these functions are called in this order
      on the same device, the device is no longer suspended, but it should be:
      	dm_internal_suspend_noflush
      	dm_internal_suspend_noflush
      	dm_internal_resume
      
      Fix this bug by maintaining an 'internal_suspend_count' and resuming
      the device when this count drops to zero.
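
      A minimal sketch of the counting scheme, assuming a simplified device
      model.  struct susp_md, model_internal_suspend() and
      model_internal_resume() are hypothetical names; only the idea of an
      'internal_suspend_count' comes from this commit.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a mapped_device. */
struct susp_md {
	unsigned int internal_suspend_count; /* nesting depth */
	bool internally_suspended;           /* models the internal suspend flag */
};

/* Suspend on the first call only; nested calls just bump the count. */
static void model_internal_suspend(struct susp_md *md)
{
	if (md->internal_suspend_count++)
		return;  /* already suspended by an outer caller */
	md->internally_suspended = true;
}

/* Resume only when the last outstanding suspend is released. */
static void model_internal_resume(struct susp_md *md)
{
	if (!md->internal_suspend_count)
		return;  /* unbalanced resume; ignored in this model */
	if (--md->internal_suspend_count)
		return;  /* an outer suspend is still active */
	md->internally_suspended = false;
}
```

      With this model, the suspend, suspend, resume sequence shown above
      leaves the device suspended until a second resume is issued.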
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      96b26c8c
  7. 18 Dec, 2014 1 commit
  8. 24 Nov, 2014 2 commits
  9. 20 Nov, 2014 3 commits
    • M
      dm: enhance internal suspend and resume interface · ffcc3936
      Mike Snitzer committed
      Rename dm_internal_{suspend,resume} to dm_internal_{suspend,resume}_fast
      -- dm-stats will continue using these methods to avoid all the extra
      suspend/resume logic that is not needed in order to quickly flush IO.
      
      Introduce dm_internal_suspend_noflush() variant that actually calls the
      mapped_device's target callbacks -- otherwise target-specific hooks are
      avoided (e.g. dm-thin's thin_presuspend and thin_postsuspend).  Common
      code between dm_internal_{suspend_noflush,resume} and
      dm_{suspend,resume} was factored out as __dm_{suspend,resume}.
      
      Update dm_internal_{suspend_noflush,resume} to always take and release
      the mapped_device's suspend_lock.  Also update dm_{suspend,resume} to be
      aware of potential for DM_INTERNAL_SUSPEND_FLAG to be set and respond
      accordingly by interruptibly waiting for the DM_INTERNAL_SUSPEND_FLAG to
      be cleared.  Add lockdep annotation to dm_suspend() and dm_resume().
      
      The existing DM_SUSPEND_FLAG remains unchanged.
      DM_INTERNAL_SUSPEND_FLAG is set by dm_internal_suspend_noflush() and
      cleared by dm_internal_resume().
      
      Both DM_SUSPEND_FLAG and DM_INTERNAL_SUSPEND_FLAG may be set if a device
      was already suspended when dm_internal_suspend_noflush() was called --
      this can be thought of as a "nested suspend".  A "nested suspend" can
      occur with legacy userspace dm-thin code that might suspend all active
      thin volumes before suspending the pool for resize.
      
      But otherwise, in the normal dm-thin-pool suspend case moving forward:
      the thin-pool will have DM_SUSPEND_FLAG set and all active thins from
      that thin-pool will have DM_INTERNAL_SUSPEND_FLAG set.
      
      Also add DM_INTERNAL_SUSPEND_FLAG to status report.  This new
      DM_INTERNAL_SUSPEND_FLAG state is being reported to assist with
      debugging (e.g. 'dmsetup info' will report an internally suspended
      device accordingly).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      ffcc3936
    • M
      dm: add presuspend_undo hook to target_type · d67ee213
      Mike Snitzer committed
      The DM thin-pool target now must undo the changes performed during
      pool_presuspend() so introduce presuspend_undo hook in target_type.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <ejt@redhat.com>
      d67ee213
    • M
      dm: return earlier from dm_blk_ioctl if target doesn't implement .ioctl · 4d341d82
      Mike Snitzer committed
      No point checking if the device is suspended if the current target
      doesn't even implement .ioctl.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      4d341d82
  10. 11 Nov, 2014 4 commits
  11. 06 Oct, 2014 3 commits
    • B
      dm: allow active and inactive tables to share dm_devs · 86f1152b
      Benjamin Marzinski committed
      Until this change, when loading a new DM table, DM core would re-open
      all of the devices in the DM table.  Now, DM core will avoid redundant
      device opens (and closes when destroying the old table) if the old
      table already has a device open using the same mode.  This is achieved
      by managing reference counts on the table_devices that DM core now
      stores in the mapped_device structure (rather than in the dm_table
      structure).  So a mapped_device's active and inactive dm_tables' dm_dev
      lists now just point to the dm_devs stored in the mapped_device's
      table_devices list.
      
      This improvement in DM core's device reference counting has the
      side-effect of fixing a long-standing limitation of the multipath
      target: a DM multipath table couldn't include any paths that were unusable
      (failed).  For example: if all paths have failed and you add a new,
      working, path to the table; you can't use it since the table load would
      fail due to it still containing failed paths.  Now a re-load of a
      multipath table can include failed devices and when those devices become
      active again they can be used instantly.
      
      The device list code in dm.c isn't a straight copy/paste from the code in
      dm-table.c, but it's very close (aside from some variable renames).  One
      subtle difference is that find_table_device for the table_devices list
      will only match devices with the same name and mode.  This is because we
      don't want to upgrade a device's mode in the active table when an
      inactive table is loaded.
      
      Access to the mapped_device structure's table_devices list requires a
      mutex (table_devices_lock), so that tables cannot be created and
      destroyed concurrently.
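
      The sharing scheme can be sketched in userspace as below.  This is a
      simplified model under stated assumptions: the struct and function
      names are hypothetical, a fixed array replaces the kernel list, a
      numeric id stands in for the device name, the opens/closes counters
      replace real device opens and closes, and locking is omitted.

```c
#include <stddef.h>

#define MODEL_MAX_TABLE_DEVICES 8

/* One shared, reference-counted device entry. */
struct model_table_device {
	unsigned int dev;   /* stands in for the device identity */
	unsigned int mode;  /* stands in for the open mode */
	int count;          /* references from tables; 0 marks a free slot */
};

/* Hypothetical stand-in for the per-mapped_device list. */
struct model_mapped_device {
	struct model_table_device devs[MODEL_MAX_TABLE_DEVICES];
	int opens;   /* simulated device opens */
	int closes;  /* simulated device closes */
};

/* Match only entries with the same device AND the same mode. */
static struct model_table_device *
model_find_table_device(struct model_mapped_device *md,
			unsigned int dev, unsigned int mode)
{
	for (size_t i = 0; i < MODEL_MAX_TABLE_DEVICES; i++) {
		struct model_table_device *td = &md->devs[i];

		if (td->count > 0 && td->dev == dev && td->mode == mode)
			return td;
	}
	return NULL;
}

/* Reuse an existing entry if possible; otherwise "open" the device. */
static struct model_table_device *
model_get_table_device(struct model_mapped_device *md,
		       unsigned int dev, unsigned int mode)
{
	struct model_table_device *td = model_find_table_device(md, dev, mode);

	if (td) {
		td->count++;  /* shared: no redundant open */
		return td;
	}
	for (size_t i = 0; i < MODEL_MAX_TABLE_DEVICES; i++) {
		td = &md->devs[i];
		if (td->count == 0) {
			td->dev = dev;
			td->mode = mode;
			td->count = 1;
			md->opens++;
			return td;
		}
	}
	return NULL;  /* no free slot in this fixed-size model */
}

/* Drop a reference; "close" the device when the last user is gone. */
static void model_put_table_device(struct model_mapped_device *md,
				   struct model_table_device *td)
{
	if (--td->count == 0)
		md->closes++;
}
```

      Because the lookup also compares the mode, requesting the same
      device with a different mode creates a separate entry rather than
      upgrading the existing one.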
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      86f1152b
    • J
      dm: use bioset_create_nobvec() · 3d8aab2d
      Junichi Nomura committed
      Since DM core uses bio_clone_fast() for both bio-based and request-based
      DM devices there is no need for DM's bioset to have a bvec mempool.
      
      With this patch, on an architecture with 4KB pages for example, memory
      usage will be reduced by 64KB for each bio-based DM device and 1MB for
      each request-based DM device.
      
      For example, when you create 10,000 bio-based DM devices and 1,000
      request-based DM devices, memory usage of biovec under no load is:
        # grep biovec /proc/slabinfo
      
        biovec-256        418068 418068   4096  ...
        biovec-128             0      0   2048  ...
        biovec-64              0      0   1024  ...
        biovec-16              0      0    256  ...
      
      With this patch series applied, the usage becomes:
        # grep biovec /proc/slabinfo
      
        biovec-256           116    116   4096  ...
        biovec-128             0      0   2048  ...
        biovec-64              0      0   1024  ...
        biovec-16              0      0    256  ...
      
      So 4096 * (418068 - 116) = 1.6GB of memory is saved in this example.
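
      The arithmetic above can be checked with a small helper.  The
      function name biovec_bytes_saved() is hypothetical; the inputs are
      the biovec-256 object counts and object size from the slabinfo
      output quoted in this message.

```c
#include <stdint.h>

/*
 * Bytes saved when the number of allocated slab objects drops from
 * objs_before to objs_after, each object being obj_size bytes.
 */
static uint64_t biovec_bytes_saved(uint64_t objs_before,
				   uint64_t objs_after,
				   uint64_t obj_size)
{
	return (objs_before - objs_after) * obj_size;
}
```

      For the figures above, biovec_bytes_saved(418068, 116, 4096) gives
      1,711,931,392 bytes, which is roughly 1.6 GiB.  The per-device
      estimate is in the same range: 10,000 x 64KB plus 1,000 x 1MB comes
      to about 1,625MB.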
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      3d8aab2d
    • J
      dm: remove nr_iovecs parameter from alloc_tio() · 99778273
      Junichi Nomura committed
      alloc_tio() uses bio_alloc_bioset() to allocate a clone-bio for a bio.
      alloc_tio() takes the number of bvecs to allocate for the clone-bio.
      However, with v3.14's immutable biovec changes DM now uses
      __bio_clone_fast() and no longer needs to allocate bvecs.
      
      In practice, the 'nr_iovecs' passed to alloc_tio() is always effectively
      0.  __clone_and_map_simple_bio() looked like it was passing non-zero
      nr_iovecs, but its value was always within the range of inline bvecs and
      no allocation actually happened.  If allocation happened, the BUG_ON() in
      __bio_clone_fast() would've triggered.
      
      Remove the nr_iovecs parameter from alloc_tio() to prevent possible
      future bio_alloc_bioset() mis-use of a new bioset interface that will no
      longer allow bvecs to be allocated.
      
      Also fix extra whitespace before the __bio_clone_fast() call in
      __clone_and_map_simple_bio().
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      99778273
  12. 11 Jul, 2014 1 commit
    • M
      dm: allocate a special workqueue for deferred device removal · acfe0ad7
      Mikulas Patocka committed
      The commit 2c140a24 ("dm: allow remove to be deferred") introduced a
      deferred removal feature for the device mapper.  When this feature is
      used (by passing a flag DM_DEFERRED_REMOVE to DM_DEV_REMOVE_CMD ioctl)
      and the user tries to remove a device that is currently in use, the
      device will be removed automatically in the future when the last user
      closes it.
      
      Device mapper used the system workqueue to perform deferred removals.
      However, some targets (dm-raid1, dm-mpath, dm-stripe) flush work items
      scheduled for the system workqueue from their destructor.  If the
      destructor itself is called from the system workqueue during deferred
      removal, it introduces a possible deadlock - the workqueue tries to flush
      itself.
      
      Fix this possible deadlock by introducing a new workqueue for deferred
      removals.  We allocate just one workqueue for all dm targets.  The
      ability of dm targets to process IOs isn't dependent on deferred removal
      of unused targets, so a deadlock due to shared workqueue isn't possible.
      
      Also, clean up local_init() to eliminate potential for returning success
      on failure.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.13+
      acfe0ad7
  13. 04 Jun, 2014 4 commits
  14. 18 Apr, 2014 1 commit
  15. 16 Apr, 2014 1 commit
    • J
      block: remove struct request buffer member · b4f42e28
      Jens Axboe committed
      This was used in the olden days, back when onions were proper
      yellow. Basically it mapped to the current buffer to be
      transferred. With highmem being added more than a decade ago,
      most drivers map pages out of a bio, and rq->buffer isn't
      pointing at anything valid.
      
      Convert old style drivers to just use bio_data().
      
      For the discard payload use case, just reference the page
      in the bio.
      Signed-off-by: Jens Axboe <axboe@fb.com>
      b4f42e28
  16. 28 Mar, 2014 4 commits
  17. 15 Jan, 2014 1 commit
    • M
      dm sysfs: fix a module unload race · 2995fa78
      Mikulas Patocka committed
      This reverts commit be35f486 ("dm: wait until embedded kobject is
      released before destroying a device") and provides an improved fix.
      
      The kobject release code that calls the completion must be placed in a
      non-module file, otherwise there is a module unload race (if the process
      calling dm_kobject_release is preempted and the DM module unloaded after
      the completion is triggered, but before dm_kobject_release returns).
      
      To fix this race, this patch moves the completion code to dm-builtin.c
      which is always compiled directly into the kernel if BLK_DEV_DM is
      selected.
      
      The patch introduces a new dm_kobject_holder structure.  Its purpose is
      to keep the completion and kobject in one place, so that they can be
      accessed from non-module code without the need to export the layout of
      struct mapped_device to that code.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      2995fa78
  18. 08 Jan, 2014 2 commits
    • M
      dm: wait until embedded kobject is released before destroying a device · be35f486
      Mikulas Patocka committed
      There may be other parts of the kernel holding a reference on the dm
      kobject.  We must wait until all references are dropped before
      deallocating the mapped_device structure.
      
      The dm_kobject_release method signals that all references are dropped
      via completion.  But dm_kobject_release doesn't free the kobject (which
      is embedded in the mapped_device structure).
      
      This is the sequence of operations:
      * when destroying a DM device, call kobject_put from dm_sysfs_exit
      * wait until all users stop using the kobject; when that happens, the
        release method is called
      * the release method signals the completion and should return without
        delay
      * the dm device removal code that waits on the completion continues
      * the dm device removal code drops the dm_mod reference the device had
      * the dm device removal code frees the mapped_device structure that
        contains the kobject
      
      Using kobject this way should avoid the module unload race that was
      mentioned at the beginning of this thread:
      https://lkml.org/lkml/2014/1/4/83
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      be35f486
    • M
      dm: remove pointless kobject comparison in dm_get_from_kobject · 1ddd641d
      Mikulas Patocka committed
      The comparison is always true and the compiler optimizes it out anyway.
      
      Milan offered additional context relative to the original commit
      784aae73 ("dm: add name and uuid to sysfs") which introduced the code:
      "I think it is just relict of some experiments before I committed this
      simple embedded sysfs kobj handling".
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Acked-by: Milan Broz <gmazyland@gmail.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      1ddd641d