1. 01 Nov, 2011 4 commits
  2. 02 Aug, 2011 4 commits
    • dm table: set flush capability based on underlying devices · ed8b752b
      Committed by Mike Snitzer
      DM has always advertised both REQ_FLUSH and REQ_FUA flush capabilities
      regardless of whether or not a given DM device's underlying devices
      also advertised a need for them.
      
      Block's flush-merge changes from 2.6.39 have proven to be more costly
      for DM devices.  Performance regressions have been reported even when
      DM's underlying devices do not advertise that they have a write cache.
      
      Fix the performance regressions by configuring a DM device's flushing
      capabilities based on those of the underlying devices.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
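      A minimal sketch of the idea in the commit above, not the kernel code.
      It assumes the stacked device advertises a capability as soon as any
      underlying device does, which the commit message does not spell out.
      SKETCH_FLUSH, SKETCH_FUA and stacked_flush_flags() are hypothetical
      names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's REQ_FLUSH / REQ_FUA bits. */
#define SKETCH_FLUSH 0x1u
#define SKETCH_FUA   0x2u

/*
 * Return the flush flags a stacked device would advertise under the
 * assumption above: the union of the flags of its underlying devices.
 */
unsigned stacked_flush_flags(const unsigned *dev_flags, size_t ndevs)
{
    unsigned flags = 0;

    assert(dev_flags != NULL || ndevs == 0);
    for (size_t i = 0; i < ndevs; i++)
        flags |= dev_flags[i] & (SKETCH_FLUSH | SKETCH_FUA);
    return flags;
}
```

      With no write cache anywhere below, the result is 0, so no flush
      capability is advertised and the flush machinery can be skipped.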
    • dm: ignore merge_bvec for snapshots when safe · d5b9dd04
      Committed by Mikulas Patocka
      Add a new flag DMF_MERGE_IS_OPTIONAL to struct mapped_device to indicate
      whether the device can accept bios larger than the size its merge
      function returns.  When set, use this to send large bios to snapshots
      which can split them if necessary.  Snapshot I/O may be significantly
      fragmented and this approach seems to improve performance.
      
      Before the patch, dm_set_device_limits restricted bio size to page size
      if the underlying device had a merge function and the target didn't
      provide a merge function.  After the patch, dm_set_device_limits
      restricts bio size to page size if the underlying device has a merge
      function, the DMF_MERGE_IS_OPTIONAL flag is not set and the target doesn't
      provide a merge function.
      
      The snapshot target can't provide a merge function because when the merge
      function is called, it is impossible to determine where the bio will be
      remapped.  Previously this led us to impose a 4k limit, which we can
      now remove if the snapshot store is located on a device without a merge
      function.  Together with another patch for optimizing full chunk writes,
      it improves performance from 29MB/s to 40MB/s when writing to the
      filesystem on snapshot store.
      
      If the snapshot store is placed on a non-dm device with a merge function
      (such as md-raid), device mapper still limits all bios to page size.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
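      The "after the patch" rule from the commit above can be written as a
      single predicate. This is an illustrative sketch only: struct
      sketch_dev and must_limit_to_page_size() are hypothetical and do not
      mirror the real dm_set_device_limits() signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of the underlying device, for this sketch only. */
struct sketch_dev {
    bool has_merge_fn;       /* device provides a merge function */
    bool merge_is_optional;  /* models DMF_MERGE_IS_OPTIONAL being set */
};

/*
 * True when bios must be restricted to page size: the underlying device
 * has a merge function, its merge is not optional, and the target itself
 * provides no merge function.
 */
bool must_limit_to_page_size(const struct sketch_dev *underlying,
                             bool target_has_merge_fn)
{
    assert(underlying != NULL);
    return underlying->has_merge_fn &&
           !underlying->merge_is_optional &&
           !target_has_merge_fn;
}
```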
    • dm table: fix discard support · 936688d7
      Committed by Mike Snitzer
      Remove 'discards_supported' from the dm_table structure.  The same
      information can be easily discovered from the table's target(s) in
      dm_table_supports_discards().
      
      Before this fix dm_table_supports_discards() would skip checking the
      individual targets' 'discards_supported' flag if any one target in the
      table didn't set num_discard_requests > 0.  Now the per-target
      'discards_supported' flag is effective at ensuring the final DM device
      advertises discard support.  But, to be clear, targets that don't
      support discards (!num_discard_requests) will not receive discard
      requests.
      
      Also DMWARN if a target sets 'discards_supported' override but forgets
      to set 'num_discard_requests'.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
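      A rough sketch of the per-target check described in the commit above.
      The field names num_discard_requests and discards_supported come from
      the message; device_supports_discard and the struct layout are
      assumptions made for illustration, and the real
      dm_table_supports_discards() may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_target {
    unsigned num_discard_requests;  /* 0: target never receives discards */
    bool discards_supported;        /* target-level override */
    bool device_supports_discard;   /* assumed: an underlying device can discard */
};

/*
 * A target without num_discard_requests is skipped, but it no longer
 * stops the remaining targets from being checked.
 */
bool table_supports_discards(const struct sketch_target *t, size_t n)
{
    assert(t != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (t[i].num_discard_requests == 0)
            continue;
        if (t[i].discards_supported || t[i].device_supports_discard)
            return true;
    }
    return false;
}
```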
    • dm: fix idr leak on module removal · d15b774c
      Committed by Alasdair G Kergon
      Destroy _minor_idr when unloading the core dm module.  (Found by kmemleak.)
      
      Cc: stable@kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
  3. 22 Mar, 2011 1 commit
  4. 17 Mar, 2011 1 commit
  5. 10 Mar, 2011 1 commit
  6. 14 Jan, 2011 5 commits
  7. 07 Jan, 2011 1 commit
  8. 16 Nov, 2010 1 commit
  9. 05 Oct, 2010 1 commit
    • block: autoconvert trivial BKL users to private mutex · 2a48fc0a
      Committed by Arnd Bergmann
      The block device drivers have all gained new lock_kernel
      calls from a recent pushdown, and some of the drivers
      were already using the BKL before.
      
      This turns the BKL into a set of per-driver mutexes.
      Still need to check whether this is safe to do.
      
      file=$1
      name=$2
      if grep -q lock_kernel ${file} ; then
          if grep -q 'include.*linux.mutex.h' ${file} ; then
                  sed -i '/include.*<linux\/smp_lock.h>/d' ${file}
          else
                  sed -i 's/include.*<linux\/smp_lock.h>.*$/include <linux\/mutex.h>/g' ${file}
          fi
          sed -i ${file} \
              -e "/^#include.*linux.mutex.h/,$ {
                      1,/^\(static\|int\|long\)/ {
                           /^\(static\|int\|long\)/istatic DEFINE_MUTEX(${name}_mutex);
      
      } }"  \
          -e "s/\(un\)*lock_kernel\>[ ]*()/mutex_\1lock(\&${name}_mutex)/g" \
          -e '/[      ]*cycle_kernel_lock();/d'
      else
          sed -i -e '/include.*\<smp_lock.h\>/d' ${file}  \
                      -e '/cycle_kernel_lock()/d'
      fi
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
  10. 10 Sep, 2010 6 commits
    • dm: convey that all flushes are processed as empty · b372d360
      Committed by Mike Snitzer
      Rename __clone_and_map_flush to __clone_and_map_empty_flush for added
      clarity.
      
      Simplify logic associated with REQ_FLUSH conditionals.
      
      Introduce a BUG_ON() and add a few more helpful comments to the code
      so that it is clear that all flushes are empty.
      
      Cleanup __split_and_process_bio() so that an empty flush isn't processed
      by a 'sector_count' focused while loop.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • dm: fix locking context in queue_io() · 05447420
      Committed by Kiyoshi Ueda
      Now queue_io() is called from dec_pending(), which may be called with
      interrupts disabled, so queue_io() must not enable interrupts
      unconditionally and must save/restore the current interrupts status.
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • dm: relax ordering of bio-based flush implementation · 6a8736d1
      Committed by Tejun Heo
      Unlike REQ_HARDBARRIER, REQ_FLUSH/FUA doesn't mandate any ordering
      against other bio's.  This patch relaxes ordering around flushes.
      
      * A flush bio is no longer deferred to workqueue directly.  It's
        processed like other bio's but __split_and_process_bio() uses
        md->flush_bio as the clone source.  md->flush_bio is initialized to
        empty flush during md initialization and shared for all flushes.
      
      * As a flush bio now travels through the same execution path as other
        bio's, there's no need for dedicated error handling path either.  It
        can use the same error handling path in dec_pending().  Dedicated
        error handling removed along with md->flush_error.
      
      * When dec_pending() detects that a flush has completed, it checks
        whether the original bio has data.  If so, the bio is queued to the
        deferred list w/ REQ_FLUSH cleared; otherwise, it's completed.
      
      * As flush sequencing is handled in the usual issue/completion path,
        dm_wq_work() no longer needs to handle flushes differently.  Now its
        only responsibility is re-issuing deferred bio's the same way as
        _dm_request() would.  REQ_FLUSH handling logic including
        process_flush() is dropped.
      
      * There's no reason for queue_io() and dm_wq_work() to write lock
        dm->io_lock.  queue_io() now only uses md->deferred_lock and
        dm_wq_work() read locks dm->io_lock.
      
      * bio's no longer need to be queued on the deferred list while a flush
        is in progress, making DMF_QUEUE_IO_TO_THREAD unnecessary.  Drop it.
      
      This avoids stalling the device during flushes and simplifies the
      implementation.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
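      The completion step described in the commit above (what happens once
      a flush has finished) can be sketched as follows. This is a simplified
      model, not dec_pending() itself: struct sketch_bio, SKETCH_REQ_FLUSH
      and on_flush_done() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_REQ_FLUSH 0x1u  /* stand-in for the kernel's REQ_FLUSH bit */

struct sketch_bio {
    unsigned flags;
    unsigned size;   /* payload size in bytes; 0 means an empty flush */
};

enum sketch_action { SKETCH_COMPLETE, SKETCH_REQUEUE };

/*
 * Called when the flush part of the original bio has completed.  An
 * empty flush is simply completed; a flush that carries data is queued
 * again with the flush flag cleared so only the data remains to be done.
 */
enum sketch_action on_flush_done(struct sketch_bio *bio)
{
    assert(bio != NULL && (bio->flags & SKETCH_REQ_FLUSH));
    if (bio->size == 0)
        return SKETCH_COMPLETE;
    bio->flags &= ~SKETCH_REQ_FLUSH;
    return SKETCH_REQUEUE;
}
```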
    • dm: implement REQ_FLUSH/FUA support for request-based dm · 29e4013d
      Committed by Tejun Heo
      This patch converts request-based dm to support the new REQ_FLUSH/FUA.
      
      The original request-based flush implementation depended on
      request_queue blocking other requests while a barrier sequence is in
      progress, which is no longer true for the new REQ_FLUSH/FUA.
      
      In general, request-based dm doesn't have infrastructure for cloning
      one source request to multiple targets, but the original flush
      implementation had a special mostly independent path which can issue
      flushes to multiple targets and sequence them.  However, the
      capability isn't currently in use and adds a lot of complexity.
      Moreover, it's unlikely to be useful in its current form as it
      doesn't make sense to be able to send out flushes to multiple targets
      when write requests can't be.
      
      This patch rips out the special flush code path and handles
      REQ_FLUSH/FUA requests the same way as other requests.  The only
      special treatment is that REQ_FLUSH requests use the block address 0
      when finding target, which is enough for now.
      
      * added BUG_ON(!dm_target_is_valid(ti)) in dm_request_fn() as
        suggested by Mike Snitzer
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Tested-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • dm: implement REQ_FLUSH/FUA support for bio-based dm · d87f4c14
      Committed by Tejun Heo
      This patch converts bio-based dm to support REQ_FLUSH/FUA instead of
      now deprecated REQ_HARDBARRIER.
      
      * -EOPNOTSUPP handling logic dropped.
      
      * Preflush is handled as before but postflush is dropped and replaced
        with passing down REQ_FUA to member request_queues.  This replaces
        one array wide cache flush w/ member specific FUA writes.
      
      * __split_and_process_bio() now calls __clone_and_map_flush() directly
        for flushes and guarantees all FLUSH bio's going to targets are zero
        length.
      
      * It's now guaranteed that all FLUSH bio's which are passed onto dm
        targets are zero length.  bio_empty_barrier() tests are replaced
        with REQ_FLUSH tests.
      
      * Empty WRITE_BARRIERs are replaced with WRITE_FLUSHes.
      
      * Dropped unlikely() around REQ_FLUSH tests.  Flushes are not unlikely
        enough to be marked with unlikely().
      
      * Block layer now filters out REQ_FLUSH/FUA bio's if the request_queue
        doesn't support cache flushing.  Advertise REQ_FLUSH | REQ_FUA
        capability.
      
      * Request based dm isn't converted yet.  dm_init_request_based_queue()
        resets flush support to 0 for now.  To avoid disturbing request
        based dm code, dm->flush_error is added for bio based dm while
        request based dm continues to use dm->barrier_error.
      
      Lightly tested linear, stripe, raid1, snap and crypt targets.  Please
      proceed with caution as I'm not familiar with the code base.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: dm-devel@redhat.com
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • block: deprecate barrier and replace blk_queue_ordered() with blk_queue_flush() · 4913efe4
      Committed by Tejun Heo
      Barrier is deemed too heavy and will soon be replaced by FLUSH/FUA
      requests.  Deprecate barrier.  All REQ_HARDBARRIERs are failed with
      -EOPNOTSUPP and blk_queue_ordered() is replaced with simpler
      blk_queue_flush().
      
      blk_queue_flush() takes combinations of REQ_FLUSH and REQ_FUA.  If a
      device has write cache and can flush it, it should set REQ_FLUSH.  If
      the device can handle FUA writes, it should also set REQ_FUA.
      
      All blk_queue_ordered() users are converted.
      
      * ORDERED_DRAIN is mapped to 0 which is the default value.
      * ORDERED_DRAIN_FLUSH is mapped to REQ_FLUSH.
      * ORDERED_DRAIN_FLUSH_FUA is mapped to REQ_FLUSH | REQ_FUA.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Alasdair G Kergon <agk@redhat.com>
      Cc: Pierre Ossman <drzeus@drzeus.cx>
      Cc: Stefan Weinhuber <wein@de.ibm.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
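      The three conversion rules listed in the commit above amount to a
      small lookup. The sketch below restates them with hypothetical
      SKETCH_* names standing in for the kernel's ORDERED_* modes and
      REQ_* bits; it is not the conversion code from the patch.

```c
#include <assert.h>

#define SKETCH_REQ_FLUSH 0x1u
#define SKETCH_REQ_FUA   0x2u

enum sketch_ordered {
    SKETCH_ORDERED_DRAIN,
    SKETCH_ORDERED_DRAIN_FLUSH,
    SKETCH_ORDERED_DRAIN_FLUSH_FUA
};

/* Map an old blk_queue_ordered() mode to blk_queue_flush() flags. */
unsigned ordered_to_flush_flags(enum sketch_ordered mode)
{
    switch (mode) {
    case SKETCH_ORDERED_DRAIN:
        return 0;  /* the default: no cache flushing needed */
    case SKETCH_ORDERED_DRAIN_FLUSH:
        return SKETCH_REQ_FLUSH;
    case SKETCH_ORDERED_DRAIN_FLUSH_FUA:
        return SKETCH_REQ_FLUSH | SKETCH_REQ_FUA;
    }
    assert(!"unknown ordered mode");
    return 0;
}
```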
  11. 12 Aug, 2010 10 commits
    • dm: split discard requests on target boundaries · a79245b3
      Committed by Mike Snitzer
      Update __clone_and_map_discard to loop across all targets in a DM
      device's table when it processes a discard bio.  If a discard crosses a
      target boundary it must be split accordingly.
      
      Update __issue_target_requests and __issue_target_request to allow a
      cloned discard bio to have a custom start sector and size.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
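      To illustrate the splitting described in the commit above, the sketch
      below cuts one discard range into per-target pieces. It assumes the
      table is a sorted, gap-free list of targets; all type and function
      names are hypothetical and the real __clone_and_map_discard() works
      on bios rather than plain ranges.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long sketch_sector;

struct sketch_target { sketch_sector begin, len; };
struct sketch_piece  { size_t target; sketch_sector start, len; };

/*
 * Split the discard [start, start + len) on target boundaries and store
 * the pieces in out[].  Returns the number of pieces written.
 */
size_t split_discard(const struct sketch_target *t, size_t nt,
                     sketch_sector start, sketch_sector len,
                     struct sketch_piece *out, size_t max_out)
{
    size_t n = 0;

    assert(t != NULL && out != NULL);
    for (size_t i = 0; i < nt && len > 0; i++) {
        sketch_sector end = t[i].begin + t[i].len;

        if (start >= end)
            continue;           /* discard starts beyond this target */
        if (start < t[i].begin)
            break;              /* gap in the table: stop */

        sketch_sector chunk = end - start;
        if (chunk > len)
            chunk = len;

        assert(n < max_out);
        out[n].target = i;
        out[n].start = start;
        out[n].len = chunk;
        n++;

        start += chunk;
        len -= chunk;
    }
    return n;
}
```

      For a table of two targets covering sectors 0-99 and 100-149, a
      discard of 30 sectors starting at sector 90 yields one piece of 10
      sectors for the first target and one of 20 sectors for the second.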
    • dm: factor out max_io_len_target_boundary · 56a67df7
      Committed by Mike Snitzer
      Split max_io_len_target_boundary out of max_io_len so that the discard
      support can make use of it without duplicating max_io_len code.
      
      Avoiding max_io_len's split_io logic enables DM's discard support to
      submit the entire discard request to a target.  But discards must still
      be split on target boundaries.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: use common __issue_target_request for flush and discard support · 06a426ce
      Committed by Mike Snitzer
      Rename __flush_target to __issue_target_request now that it is used to
      issue both flush and discard requests.
      
      Introduce __issue_target_requests as a convenient wrapper to
      __issue_target_request 'num_flush_requests' or 'num_discard_requests'
      times per target.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: linear support discard · 5ae89a87
      Committed by Mike Snitzer
      Allow discards to be passed through to linear mappings if at least one
      underlying device supports it.  Discards will be forwarded only to
      devices that support them.
      
      A target that supports discards should set num_discard_requests to
      indicate how many times each discard request must be submitted to it.
      
      Verify table's underlying devices support discards prior to setting the
      associated DM device as capable of discards (via QUEUE_FLAG_DISCARD).
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Reviewed-by: Joe Thornber <thornber@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: rename map_info flush_request to target_request_nr · 57cba5d3
      Committed by Mike Snitzer
      'target_request_nr' is a more generic name that reflects the fact that
      it will be used for both flush and discard support.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: do not initialise full request queue when bio based · 4a0b4ddf
      Committed by Mike Snitzer
      Bio-based mapped devices no longer have a fully initialized
      request_queue (request_fn, elevator, etc).  This means bio-based DM
      devices no longer register elevator sysfs attributes ('iosched/' tree
      or 'scheduler' other than "none").
      
      In contrast, a request-based DM device will continue to have a full
      request_queue and will register elevator sysfs attributes.  Therefore
      a user can determine a DM device's type by checking if elevator sysfs
      attributes exist.
      
      First allocate a minimalist request_queue structure for a DM device
      (needed for both bio and request-based DM).
      
      Initialization of a full request_queue is deferred until it is known
      that the DM device is request-based, at the end of the table load
      sequence.
      
      Factor DM device's request_queue initialization:
      - common to both request-based and bio-based into dm_init_md_queue().
      - specific to request-based into dm_init_request_based_queue().
      
      The md->type_lock mutex is used to protect md->queue, in addition to
      md->type, during table_load().
      
      A DM device's first table_load will establish the immutable md->type.
      But md->queue initialization, based on md->type, may fail at that time
      (because blk_init_allocated_queue cannot allocate memory).  Therefore
      any subsequent table_load must (re)try dm_setup_md_queue independently of
      establishing md->type.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm ioctl: make bio or request based device type immutable · a5664dad
      Committed by Mike Snitzer
      Determine whether a mapped device is bio-based or request-based when
      loading its first (inactive) table and don't allow that to be changed
      later.
      
      This patch performs different device initialisation in each of the two
      cases.  (We don't think it's necessary to add code to support changing
      between the two types.)
      
      Allowed md->type transitions:
        DM_TYPE_NONE to DM_TYPE_BIO_BASED
        DM_TYPE_NONE to DM_TYPE_REQUEST_BASED
      
      We now prevent table_load from replacing the inactive table with a
      conflicting type of table even after an explicit table_clear.
      
      Introduce 'type_lock' into the struct mapped_device to protect md->type
      and to prepare for the next patch that will change the queue
      initialization and allocate memory while md->type_lock is held.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
      
       drivers/md/dm-ioctl.c    |   15 +++++++++++++++
       drivers/md/dm.c          |   37 ++++++++++++++++++++++++++++++-------
       drivers/md/dm.h          |    5 +++++
       include/linux/dm-ioctl.h |    4 ++--
       4 files changed, 52 insertions(+), 9 deletions(-)
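      The allowed md->type transitions from the commit above can be
      expressed as a small check. This is a sketch with hypothetical names.
      Treating a reload with the same type as permitted is an assumption
      here, since the message lists only the transitions away from
      DM_TYPE_NONE.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_type {
    SKETCH_TYPE_NONE,
    SKETCH_TYPE_BIO_BASED,
    SKETCH_TYPE_REQUEST_BASED
};

/*
 * Decide whether a table of type 'wanted' may be loaded into a device
 * whose type is currently 'cur'.  Once set, the type never changes.
 */
bool type_change_allowed(enum sketch_type cur, enum sketch_type wanted)
{
    /* A table being loaded always has a concrete type. */
    assert(wanted != SKETCH_TYPE_NONE);
    if (cur == SKETCH_TYPE_NONE)
        return true;
    return cur == wanted;
}
```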
    • dm: skip second flush on bio unsupported error · 708e9295
      Committed by Mikulas Patocka
      When processing barriers, skip the second flush if processing the bio
      failed with -EOPNOTSUPP.  This can happen with discard+barrier requests.
      If the device doesn't support discard, there would be two useless
      SYNCHRONIZE CACHE commands.  The first dm_flush cannot be so easily
      optimized out, so we leave it there.
      
      Previously, -EOPNOTSUPP could be received in dec_pending only with empty
      barriers and we ignored that error, assuming that a device which does not
      support cache flushes always has a consistent cache.  With the addition of discard
      barriers, this -EOPNOTSUPP can also be generated by discards and we
      must record it in md->barrier_error for process_barrier.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: separate device deletion from dm_put · 3f77316d
      Committed by Kiyoshi Ueda
      This patch separates the device deletion code from dm_put()
      to make sure the deletion happens in the process context.
      
      By this patch, device deletion always occurs in an ioctl (process)
      context and dm_put() can be called in interrupt context.
      As a result, the request-based dm's bad dm_put() usage pointed out
      by Mikulas below disappears.
          http://marc.info/?l=dm-devel&m=126699981019735&w=2
      
      Without this patch, I confirmed that the system can crash:
          dm_put() => dm_table_destroy() => vfree() => BUG_ON(in_interrupt())
      
      Some more background and details:
      In request-based dm, a device opener can remove a mapped_device
      while the last request is still completing, because bios in the last
      request complete first and then the device opener can close and remove
      the mapped_device before the last request completes:
        CPU0                                          CPU1
        =================================================================
        <<INTERRUPT>>
        blk_end_request_all(clone_rq)
          blk_update_request(clone_rq)
            bio_endio(clone_bio) == end_clone_bio
              blk_update_request(orig_rq)
                bio_endio(orig_bio)
                                                      <<I/O completed>>
                                                      dm_blk_close()
                                                      dev_remove()
                                                        dm_put(md)
                                                          <<Free md>>
         blk_finish_request(clone_rq)
           ....
           dm_end_request(clone_rq)
             free_rq_clone(clone_rq)
             blk_end_request_all(orig_rq)
             rq_completed(md)
      
      So request-based dm used dm_get()/dm_put() to hold md for each I/O
      until its request completion handling is fully done.
      However, the final dm_put() can call the device deletion code which
      must not be run in interrupt context and may cause kernel panic.
      
      To solve the problem, this patch moves the device deletion code,
      dm_destroy(), to the predetermined places that actually delete
      the mapped_device in ioctl (process) context, and changes dm_put()
      just to decrement the reference count of the mapped_device.
      By this change, dm_put() can be used in any context and the symmetric
      model below is introduced:
          dm_create():  create a mapped_device
          dm_destroy(): destroy a mapped_device
          dm_get():     increment the reference count of a mapped_device
          dm_put():     decrement the reference count of a mapped_device
      
      dm_destroy() waits for all references of the mapped_device to disappear,
      then deletes the mapped_device.
      
      dm_destroy() uses active waiting with msleep(1), since deleting
      the mapped_device isn't a performance-critical task.
      And since at this point, nobody opens the mapped_device and no new
      reference will be taken, the pending counts are just for racing
      completing activity and will eventually decrease to zero.
      
      For the unlikely case of the forced module unload, dm_destroy_immediate(),
      which doesn't wait and forcibly deletes the mapped_device, is also
      introduced and used in dm_hash_remove_all().  Otherwise, "rmmod -f"
      may be stuck and never return.
      And now, because the mapped_device is deleted at this point, subsequent
      accesses to the mapped_device may cause NULL pointer dereferences.
      
      Cc: stable@kernel.org
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
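      The symmetric create/destroy and get/put model from the commit above
      can be modelled in user space roughly as follows. This is a sketch,
      not the kernel implementation: the sketch_* names are hypothetical,
      C11 atomics stand in for the kernel's atomic_t, and a caller-supplied
      pause function stands in for msleep(1).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_md {
    atomic_int holders;  /* reference count */
    bool deleted;        /* set once the device has been torn down */
};

void sketch_init(struct sketch_md *md)
{
    atomic_init(&md->holders, 0);
    md->deleted = false;
}

/* Take a reference; safe in any context. */
void sketch_get(struct sketch_md *md)
{
    atomic_fetch_add(&md->holders, 1);
}

/* Drop a reference; only decrements, never deletes. */
void sketch_put(struct sketch_md *md)
{
    int old = atomic_fetch_sub(&md->holders, 1);

    assert(old > 0);
    (void)old;
}

/*
 * Wait until every reference is gone, then delete.  pause_fn is called
 * between polls where the kernel version would sleep.
 */
void sketch_destroy(struct sketch_md *md, void (*pause_fn)(struct sketch_md *))
{
    while (atomic_load(&md->holders) > 0)
        pause_fn(md);
    md->deleted = true;
}
```

      Because the put side only decrements the counter, it can run from a
      completion path, while deletion stays in the single place that calls
      the destroy function.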
    • dm: prevent access to md being deleted · abdc568b
      Committed by Kiyoshi Ueda
      This patch prevents access to mapped_device which is being deleted.
      
      Currently, even after a mapped_device has been removed from the hash,
      it could be accessed through idr_find() using minor number.
      That could cause the race and NULL pointer dereference shown below:
        CPU0                          CPU1
        ------------------------------------------------------------------
        dev_remove(param)
          down_write(_hash_lock)
          dm_lock_for_deletion(md)
            spin_lock(_minor_lock)
            set_bit(DMF_DELETING)
            spin_unlock(_minor_lock)
          __hash_remove(hc)
          up_write(_hash_lock)
                                      dev_status(param)
                                        md = find_device(param)
                                               down_read(_hash_lock)
                                               __find_device_hash_cell(param)
                                                 dm_get_md(param->dev)
                                                   md = dm_find_md(dev)
                                                          spin_lock(_minor_lock)
                                                          md = idr_find(MINOR(dev))
                                                          spin_unlock(_minor_lock)
          dm_put(md)
            free_dev(md)
                                                   dm_get(md)
                                               up_read(_hash_lock)
                                        __dev_status(md, param)
                                        dm_put(md)
      
      This patch fixes such problems.
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Cc: stable@kernel.org
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
  12. 08 Aug, 2010 5 commits