1. 22 6月, 2009 5 次提交
    • K
      dm: do not set QUEUE_ORDERED_DRAIN if request based · 5d67aa23
      Kiyoshi Ueda 提交于
      Request-based dm doesn't have barrier support yet.
      So we need to set QUEUE_ORDERED_DRAIN only for bio-based dm.
      Since the device type is decided at the first table loading time,
      the flag set is deferred until then.
      Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Acked-by: NHannes Reinecke <hare@suse.de>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      5d67aa23
    • K
      dm: enable request based option · e6ee8c0b
      Kiyoshi Ueda 提交于
      This patch enables request-based dm.
      
      o Request-based dm and bio-based dm coexist, since there are
        some target drivers which are more fitting to bio-based dm.
        Also, there are other bio-based devices in the kernel
        (e.g. md, loop).
        Since bio-based device can't receive struct request,
        there are some limitations on device stacking between
        bio-based and request-based.
      
                           type of underlying device
                         bio-based      request-based
         ----------------------------------------------
          bio-based         OK                OK
          request-based     --                OK
      
        The device type is recognized by the queue flag in the kernel,
        so dm follows that.
      
      o The type of a dm device is decided at the first table binding time.
        Once the type of a dm device is decided, the type can't be changed.
      
      o Mempool allocations are deferred to at the table loading time, since
        mempools for request-based dm are different from those for bio-based
        dm and needed mempool type is fixed by the type of table.
      
      o Currently, request-based dm supports only tables that have a single
        target.  To support multiple targets, we need to support request
        splitting or prevent bio/request from spanning multiple targets.
        The former needs lots of changes in the block layer, and the latter
        needs that all target drivers support merge() function.
        Both will take a time.
      Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      e6ee8c0b
    • K
      dm: prepare for request based option · cec47e3d
      Kiyoshi Ueda 提交于
      This patch adds core functions for request-based dm.
      
      When struct mapped device (md) is initialized, md->queue has
      an I/O scheduler and the following functions are used for
      request-based dm as the queue functions:
          make_request_fn: dm_make_request()
          pref_fn:         dm_prep_fn()
          request_fn:      dm_request_fn()
          softirq_done_fn: dm_softirq_done()
          lld_busy_fn:     dm_lld_busy()
      Actual initializations are done in another patch (PATCH 2).
      
      Below is a brief summary of how request-based dm behaves, including:
        - making request from bio
        - cloning, mapping and dispatching request
        - completing request and bio
        - suspending md
        - resuming md
      
        bio to request
        ==============
        md->queue->make_request_fn() (dm_make_request()) calls __make_request()
        for a bio submitted to the md.
        Then, the bio is kept in the queue as a new request or merged into
        another request in the queue if possible.
      
        Cloning and Mapping
        ===================
        Cloning and mapping are done in md->queue->request_fn() (dm_request_fn()),
        when requests are dispatched after they are sorted by the I/O scheduler.
      
        dm_request_fn() checks busy state of underlying devices using
        target's busy() function and stops dispatching requests to keep them
        on the dm device's queue if busy.
        It helps better I/O merging, since no merge is done for a request
        once it is dispatched to underlying devices.
      
        Actual cloning and mapping are done in dm_prep_fn() and map_request()
        called from dm_request_fn().
        dm_prep_fn() clones not only request but also bios of the request
        so that dm can hold bio completion in error cases and prevent
        the bio submitter from noticing the error.
        (See the "Completion" section below for details.)
      
        After the cloning, the clone is mapped by target's map_rq() function
          and inserted to underlying device's queue using
          blk_insert_cloned_request().
      
        Completion
        ==========
        Request completion can be hooked by rq->end_io(), but then, all bios
        in the request will have been completed even error cases, and the bio
        submitter will have noticed the error.
        To prevent the bio completion in error cases, request-based dm clones
        both bio and request and hooks both bio->bi_end_io() and rq->end_io():
            bio->bi_end_io(): end_clone_bio()
            rq->end_io():     end_clone_request()
      
        Summary of the request completion flow is below:
        blk_end_request() for a clone request
          => blk_update_request()
             => bio->bi_end_io() == end_clone_bio() for each clone bio
                => Free the clone bio
                => Success: Complete the original bio (blk_update_request())
                   Error:   Don't complete the original bio
          => blk_finish_request()
             => rq->end_io() == end_clone_request()
                => blk_complete_request()
                   => dm_softirq_done()
                      => Free the clone request
                      => Success: Complete the original request (blk_end_request())
                         Error:   Requeue the original request
      
        end_clone_bio() completes the original request on the size of
        the original bio in successful cases.
        Even if all bios in the original request are completed by that
        completion, the original request must not be completed yet to keep
        the ordering of request completion for the stacking.
        So end_clone_bio() uses blk_update_request() instead of
        blk_end_request().
        In error cases, end_clone_bio() doesn't complete the original bio.
        It just frees the cloned bio and gives over the error handling to
        end_clone_request().
      
        end_clone_request(), which is called with queue lock held, completes
        the clone request and the original request in a softirq context
        (dm_softirq_done()), which has no queue lock, to avoid a deadlock
        issue on submission of another request during the completion:
            - The submitted request may be mapped to the same device
            - Request submission requires queue lock, but the queue lock
              has been held by itself and it doesn't know that
      
        The clone request has no clone bio when dm_softirq_done() is called.
        So target drivers can't resubmit it again even error cases.
        Instead, they can ask dm core for requeueing and remapping
        the original request in that cases.
      
        suspend
        =======
        Request-based dm uses stopping md->queue as suspend of the md.
        For noflush suspend, just stops md->queue.
      
        For flush suspend, inserts a marker request to the tail of md->queue.
        And dispatches all requests in md->queue until the marker comes to
        the front of md->queue.  Then, stops dispatching request and waits
        for the all dispatched requests to complete.
        After that, completes the marker request, stops md->queue and
        wake up the waiter on the suspend queue, md->wait.
      
        resume
        ======
        Starts md->queue.
      Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      cec47e3d
    • M
      dm: calculate queue limits during resume not load · 754c5fc7
      Mike Snitzer 提交于
      Currently, device-mapper maintains a separate instance of 'struct
      queue_limits' for each table of each device.  When the configuration of
      a device is to be changed, first its table is loaded and this structure
      is populated, then the device is 'resumed' and the calculated
      queue_limits are applied.
      
      This places restrictions on how userspace may process related devices,
      where it is often advantageous to 'load' tables for several devices
      at once before 'resuming' them together.  As the new queue_limits
      only take effect after the 'resume', if they are changing and one
      device uses another, the latter must be 'resumed' before the former
      may be 'loaded'.
      
      This patch moves the calculation of these queue_limits out of
      the 'load' operation into 'resume'.  Since we are no longer
      pre-calculating this struct, we no longer need to maintain copies
      within our dm structs.
      
      dm_set_device_limits() now passes the 'start' of the device's
      data area (aka pe_start) as the 'offset' to blk_stack_limits().
      
      init_valid_queue_limits() is replaced by blk_set_default_limits().
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: martin.petersen@oracle.com
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      754c5fc7
    • M
      dm ioctl: support cookies for udev · 60935eb2
      Milan Broz 提交于
      Add support for passing a 32 bit "cookie" into the kernel with the
      DM_SUSPEND, DM_DEV_RENAME and DM_DEV_REMOVE ioctls.  The (unsigned)
      value of this cookie is returned to userspace alongside the uevents
      issued by these ioctls in the variable DM_COOKIE.
      
      This means the userspace process issuing these ioctls can be notified
      by udev after udev has completed any actions triggered.
      
      To minimise the interface extension, we pass the cookie into the
      kernel in the event_nr field which is otherwise unused when calling
      these ioctls.  Incrementing the version number allows userspace to
      determine in advance whether or not the kernel supports the cookie.
      If the kernel does support this but userspace does not, there should
      be no impact as the new variable will just get ignored.
      Signed-off-by: NMilan Broz <mbroz@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      60935eb2
  2. 09 4月, 2009 1 次提交
  3. 03 4月, 2009 1 次提交
  4. 06 1月, 2009 3 次提交
    • M
      dm: add name and uuid to sysfs · 784aae73
      Milan Broz 提交于
      Implement simple read-only sysfs entry for device-mapper block device.
      
      This patch adds a simple sysfs directory named "dm" under block device
      properties and implements
      	- name attribute (string containing mapped device name)
      	- uuid attribute (string containing UUID, or empty string if not set)
      
      The kobject is embedded in mapped_device struct, so no additional
      memory allocation is needed for initializing sysfs entry.
      
      During the processing of sysfs attribute we need to lock mapped device
      which is done by a new function dm_get_from_kobj, which returns the md
      associated with kobject and increases the usage count.
      
      Each 'show attribute' function is responsible for its own locking.
      Signed-off-by: NMilan Broz <mbroz@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      784aae73
    • M
      dm table: rework reference counting · d5816876
      Mikulas Patocka 提交于
      Rework table reference counting.
      
      The existing code uses a reference counter. When the last reference is
      dropped and the counter reaches zero, the table destructor is called.
      Table reference counters are acquired/released from upcalls from other
      kernel code (dm_any_congested, dm_merge_bvec, dm_unplug_all).
      If the reference counter reaches zero in one of the upcalls, the table
      destructor is called from almost random kernel code.
      
      This leads to various problems:
      * dm_any_congested being called under a spinlock, which calls the
        destructor, which calls some sleeping function.
      * the destructor attempting to take a lock that is already taken by the
        same process.
      * stale reference from some other kernel code keeps the table
        constructed, which keeps some devices open, even after successful
        return from "dmsetup remove". This can confuse lvm and prevent closing
        of underlying devices or reusing device minor numbers.
      
      The patch changes reference counting so that the table destructor can be
      called only at predetermined places.
      
      The table has always exactly one reference from either mapped_device->map
      or hash_cell->new_map. After this patch, this reference is not counted
      in table->holders.  A pair of dm_create_table/dm_destroy_table functions
      is used for table creation/destruction.
      
      Temporary references from the other code increase table->holders. A pair
      of dm_table_get/dm_table_put functions is used to manipulate it.
      
      When the table is about to be destroyed, we wait for table->holders to
      reach 0. Then, we call the table destructor.  We use active waiting with
      msleep(1), because the situation happens rarely (to one user in 5 years)
      and removing the device isn't performance-critical task: the user doesn't
      care if it takes one tick more or not.
      
      This way, the destructor is called only at specific points
      (dm_table_destroy function) and the above problems associated with lazy
      destruction can't happen.
      
      Finally remove the temporary protection added to dm_any_congested().
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      d5816876
    • A
      dm: support barriers on simple devices · ab4c1424
      Andi Kleen 提交于
      Implement barrier support for single device DM devices
      
      This patch implements barrier support in DM for the common case of dm linear
      just remapping a single underlying device. In this case we can safely
      pass the barrier through because there can be no reordering between
      devices.
      
       NB. Any DM device might cease to support barriers if it gets
           reconfigured so code must continue to allow for a possible
           -EOPNOTSUPP on every barrier bio submitted.  - agk
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      ab4c1424
  5. 22 10月, 2008 1 次提交
  6. 10 10月, 2008 4 次提交
  7. 21 7月, 2008 1 次提交
  8. 25 4月, 2008 3 次提交
  9. 21 12月, 2007 2 次提交
    • A
      dm: trigger change uevent on rename · 69267a30
      Alasdair G Kergon 提交于
      Insert a missing KOBJ_CHANGE notification when a device is renamed.
      
      Cc: Scott James Remnant <scott@ubuntu.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      69267a30
    • J
      dm: table detect io beyond device · 512875bd
      Jun'ichi Nomura 提交于
      This patch fixes a panic on shrinking a DM device if there is
      outstanding I/O to the part of the device that is being removed.
      (Normally this doesn't happen - a filesystem would be resized first,
      for example.)
      
      The bug is that __clone_and_map() assumes dm_table_find_target()
      always returns a valid pointer.  It may fail if a bio arrives from the
      block layer but its target sector is no longer included in the DM
      btree.
      
      This patch appends an empty entry to table->targets[] which will
      be returned by a lookup beyond the end of the device.
      
      After calling dm_table_find_target(), __clone_and_map() and target_message()
      check for this condition using
      dm_target_is_valid().
      
      Sample test script to trigger oops:
      512875bd
  10. 16 10月, 2007 1 次提交
  11. 13 7月, 2007 1 次提交
  12. 09 12月, 2006 4 次提交
    • K
      [PATCH] dm: suspend: add noflush pushback · 2e93ccc1
      Kiyoshi Ueda 提交于
      In device-mapper I/O is sometimes queued within targets for later processing.
      For example the multipath target can be configured to store I/O when no paths
      are available instead of returning it -EIO.
      
      This patch allows the device-mapper core to instruct a target to transfer the
      contents of any such in-target queue back into the core.  This frees up the
      resources used by the target so the core can replace that target with an
      alternative one and then resend the I/O to it.  Without this patch the only
      way to change the target in such circumstances involves returning the I/O with
      an error back to the filesystem/application.  In the multipath case, this
      patch will let us add new paths for existing I/O to try after all the existing
      paths have failed.
      
          DMF_NOFLUSH_SUSPENDING
          ----------------------
      
      If the DM_NOFLUSH_FLAG ioctl option is specified at suspend time, the
      DMF_NOFLUSH_SUSPENDING flag is set in md->flags during dm_suspend().  It
      is always cleared before dm_suspend() returns.
      
      The flag must be visible while the target is flushing pending I/Os so it
      is set before presuspend where the flush starts and unset after the wait
      for md->pending where the flush ends.
      
      Target drivers can check this flag by calling dm_noflush_suspending().
      
          DM_MAPIO_REQUEUE / DM_ENDIO_REQUEUE
          -----------------------------------
      
      A target's map() function can now return DM_MAPIO_REQUEUE to request the
      device mapper core queue the bio.
      
      Similarly, a target's end_io() function can return DM_ENDIO_REQUEUE to request
      the same.  This has been labelled 'pushback'.
      
      The __map_bio() and clone_endio() functions in the core treat these return
      values as errors and call dec_pending() to end the I/O.
      
          dec_pending
          -----------
      
      dec_pending() saves the pushback request in struct dm_io->error.  Once all
      the split clones have ended, dec_pending() will put the original bio on
      the md->pushback list.  Note that this supercedes any I/O errors.
      
      It is possible for the suspend with DM_NOFLUSH_FLAG to be aborted while
      in progress (e.g. by user interrupt).  dec_pending() checks for this and
      returns -EIO if it happened.
      
          pushdback list and pushback_lock
          --------------------------------
      
      The bio is queued on md->pushback temporarily in dec_pending(), and after
      all pending I/Os return, md->pushback is merged into md->deferred in
      dm_suspend() for re-issuing at resume time.
      
      md->pushback_lock protects md->pushback.
      The lock should be held with irq disabled because dec_pending() can be
      called from interrupt context.
      
      Queueing bios to md->pushback in dec_pending() must be done atomically
      with the check for DMF_NOFLUSH_SUSPENDING flag.  So md->pushback_lock is
      held when checking the flag.  Otherwise dec_pending() may queue a bio to
      md->pushback after the interrupted dm_suspend() flushes md->pushback.
      Then the bio would be left in md->pushback.
      
      Flag setting in dm_suspend() can be done without md->pushback_lock because
      the flag is checked only after presuspend and the set value is already
      made visible via the target's presuspend function.
      
      The flag can be checked without md->pushback_lock (e.g. the first part of
      the dec_pending() or target drivers), because the flag is checked again
      with md->pushback_lock held when the bio is really queued to md->pushback
      as described above.  So even if the flag is cleared after the lockless
      checkings, the bio isn't left in md->pushback but returned to applications
      with -EIO.
      
          Other notes on the current patch
          --------------------------------
      
      - md->pushback is added to the struct mapped_device instead of using
        md->deferred directly because md->io_lock which protects md->deferred is
        rw_semaphore and can't be used in interrupt context like dec_pending(),
        and md->io_lock protects the DMF_BLOCK_IO flag of md->flags too.
      
      - Don't issue lock_fs() in dm_suspend() if the DM_NOFLUSH_FLAG
        ioctl option is specified, because I/Os generated by lock_fs() would be
        pushed back and never return if there were no valid devices.
      
      - If an error occurs in dm_suspend() after the DMF_NOFLUSH_SUSPENDING
        flag is set, md->pushback must be flushed because I/Os may be queued to
        the list already.  (flush_and_out label in dm_suspend())
      
          Test results
          ------------
      
      I have tested using multipath target with the next patch.
      
      The following tests are for regression/compatibility:
        - I/Os succeed when valid paths exist;
        - I/Os fail when there are no valid paths and queue_if_no_path is not
          set;
        - I/Os are queued in the multipath target when there are no valid paths and
          queue_if_no_path is set;
        - The queued I/Os above fail when suspend is issued without the
          DM_NOFLUSH_FLAG ioctl option.  I/Os spanning 2 multipath targets also
          fail.
      
      The following tests are for the normal code path of new pushback feature:
        - Queued I/Os in the multipath target are flushed from the target
          but don't return when suspend is issued with the DM_NOFLUSH_FLAG
          ioctl option;
        - The I/Os above are queued in the multipath target again when
          resume is issued without path recovery;
        - The I/Os above succeed when resume is issued after path recovery
          or table load;
        - Queued I/Os in the multipath target succeed when resume is issued
          with the DM_NOFLUSH_FLAG ioctl option after table load. I/Os
          spanning 2 multipath targets also succeed.
      
      The following tests are for the error paths of the new pushback feature:
        - When the bdget_disk() fails in dm_suspend(), the
          DMF_NOFLUSH_SUSPENDING flag is cleared and I/Os already queued to the
          pushback list are flushed properly.
        - When suspend with the DM_NOFLUSH_FLAG ioctl option is interrupted,
            o I/Os which had already been queued to the pushback list
              at the time don't return, and are re-issued at resume time;
            o I/Os which hadn't been returned at the time return with EIO.
      Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      Cc: dm-devel@redhat.com
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      2e93ccc1
    • K
      [PATCH] dm: ioctl: add noflush suspend · 81fdb096
      Kiyoshi Ueda 提交于
      Provide a dm ioctl option to request noflush suspending.  (See next patch for
      what this is for.) As the interface is extended, the version number is
      incremented.
      
      Other than accepting the new option through the interface, There is no change
      to existing behaviour.
      
      Test results:
      Confirmed the option is given from user-space correctly.
      Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      Cc: dm-devel@redhat.com
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      81fdb096
    • K
      [PATCH] dm: map and endio return code clarification · 45cbcd79
      Kiyoshi Ueda 提交于
      Tighten the use of return values from the target map and end_io functions.
      Values of 2 and above are now explictly reserved for future use.  There are no
      existing targets using such values.
      
      The patch has no effect on existing behaviour.
      
      o Reserve return values of 2 and above from target map functions.
        Any positive value currently indicates "mapping complete", but all
        existing drivers use the value 1.  We now make that a requirement
        so we can assign new meaning to higher values in future.
      
        The new definition of return values from target map functions is:
            < 0 : error
            = 0 : The target will handle the io (DM_MAPIO_SUBMITTED).
            = 1 : Mapping completed (DM_MAPIO_REMAPPED).
            > 1 : Reserved (undefined).  Previously this was the same as '= 1'.
      
      o Reserve return values of 2 and above from target end_io functions
        for similar reasons.
        DM_ENDIO_INCOMPLETE is introduced for a return value of 1.
      
      Test results:
      
        I have tested by using the multipath target.
      
        I/Os succeed when valid paths exist.
      
        I/Os are queued in the multipath target when there are no valid paths and
      queue_if_no_path is set.
      
        I/Os fail when there are no valid paths and queue_if_no_path is not set.
      Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      Cc: dm-devel@redhat.com
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      45cbcd79
    • K
      [PATCH] dm: suspend: parameter change · a3d77d35
      Kiyoshi Ueda 提交于
      Change the interface of dm_suspend() so that we can pass several options
      without increasing the number of parameters.  The existing 'do_lockfs' integer
      parameter is replaced by a flag DM_SUSPEND_LOCKFS_FLAG.
      
      There is no functional change to the code.
      
      Test results:
      I have tested 'dmsetup suspend' command with/without the '--nolockfs'
      option and confirmed the do_lockfs value is correctly set.
      Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      Cc: dm-devel@redhat.com
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a3d77d35
  13. 03 10月, 2006 2 次提交
  14. 27 6月, 2006 4 次提交
  15. 28 3月, 2006 4 次提交
  16. 07 1月, 2006 3 次提交