1. 13 11月, 2010 3 次提交
    • T
      block: clean up blkdev_get() wrappers and their users · d4d77629
      Tejun Heo 提交于
      After recent blkdev_get() modifications, open_by_devnum() and
      open_bdev_exclusive() are simple wrappers around blkdev_get().
      Replace them with blkdev_get_by_dev() and blkdev_get_by_path().
      
      blkdev_get_by_dev() is identical to open_by_devnum().
      blkdev_get_by_path() is slightly different in that it doesn't
      automatically add %FMODE_EXCL to @mode.
      
      All users are converted.  Most conversions are mechanical and don't
      introduce any behavior difference.  There are several exceptions.
      
      * btrfs now sets FMODE_EXCL in btrfs_device->mode, so there's no
        reason to OR it explicitly on blkdev_put().
      
      * gfs2, nilfs2 and the generic mount_bdev() now set FMODE_EXCL in
        sb->s_mode.
      
      * With the above changes, sb->s_mode now always should contain
        FMODE_EXCL.  WARN_ON_ONCE() added to kill_block_super() to detect
        errors.
      
      The new blkdev_get_*() functions are with proper docbook comments.
      While at it, add function description to blkdev_get() too.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Philipp Reisner <philipp.reisner@linbit.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Joern Engel <joern@lazybastard.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: reiserfs-devel@vger.kernel.org
      Cc: xfs-masters@oss.sgi.com
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      d4d77629
    • T
      block: make blkdev_get/put() handle exclusive access · e525fd89
      Tejun Heo 提交于
      Over time, block layer has accumulated a set of APIs dealing with bdev
      open, close, claim and release.
      
      * blkdev_get/put() are the primary open and close functions.
      
      * bd_claim/release() deal with exclusive open.
      
      * open/close_bdev_exclusive() are combination of open and claim and
        the other way around, respectively.
      
      * bd_link/unlink_disk_holder() to create and remove holder/slave
        symlinks.
      
      * open_by_devnum() wraps bdget() + blkdev_get().
      
      The interface is a bit confusing and the decoupling of open and claim
      makes it impossible to properly guarantee exclusive access as
      in-kernel open + claim sequence can disturb the existing exclusive
      open even before the block layer knows the current open if for another
      exclusive access.  Reorganize the interface such that,
      
      * blkdev_get() is extended to include exclusive access management.
        @holder argument is added and, if is @FMODE_EXCL specified, it will
        gain exclusive access atomically w.r.t. other exclusive accesses.
      
      * blkdev_put() is similarly extended.  It now takes @mode argument and
        if @FMODE_EXCL is set, it releases an exclusive access.  Also, when
        the last exclusive claim is released, the holder/slave symlinks are
        removed automatically.
      
      * bd_claim/release() and close_bdev_exclusive() are no longer
        necessary and either made static or removed.
      
      * bd_link_disk_holder() remains the same but bd_unlink_disk_holder()
        is no longer necessary and removed.
      
      * open_bdev_exclusive() becomes a simple wrapper around lookup_bdev()
        and blkdev_get().  It also has an unexpected extra bdev_read_only()
        test which probably should be moved into blkdev_get().
      
      * open_by_devnum() is modified to take @holder argument and pass it to
        blkdev_get().
      
      Most of bdev open/close operations are unified into blkdev_get/put()
      and most exclusive accesses are tested atomically at the open time (as
      it should).  This cleans up code and removes some, both valid and
      invalid, but unnecessary all the same, corner cases.
      
      open_bdev_exclusive() and open_by_devnum() can use further cleanup -
      rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop
      special features.  Well, let's leave them for another day.
      
      Most conversions are straight-forward.  drbd conversion is a bit more
      involved as there was some reordering, but the logic should stay the
      same.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NNeil Brown <neilb@suse.de>
      Acked-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Acked-by: NPhilipp Reisner <philipp.reisner@linbit.com>
      Cc: Peter Osterlund <petero2@telia.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Cc: Alex Elder <aelder@sgi.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: dm-devel@redhat.com
      Cc: drbd-dev@lists.linbit.com
      Cc: Leo Chen <leochen@broadcom.com>
      Cc: Scott Branden <sbranden@broadcom.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: Joern Engel <joern@logfs.org>
      Cc: reiserfs-devel@vger.kernel.org
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      e525fd89
    • T
      block: simplify holder symlink handling · e09b457b
      Tejun Heo 提交于
      Code to manage symlinks in /sys/block/*/{holders|slaves} are overly
      complex with multiple holder considerations, redundant extra
      references to all involved kobjects, unused generic kobject holder
      support and unnecessary mixup with bd_claim/release functionalities.
      
      Strip it down to what's necessary (single gendisk holder) and make it
      use a separate interface.  This is a step for cleaning up
      bd_claim/release.  This patch makes dm-table slightly more complex but
      it will be simplified again with further changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NNeil Brown <neilb@suse.de>
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      e09b457b
  2. 11 9月, 2010 1 次提交
  3. 12 8月, 2010 2 次提交
  4. 06 3月, 2010 2 次提交
    • N
      dm table: remove unused dm_get_device range parameters · 8215d6ec
      Nikanth Karthikesan 提交于
      Remove unused parameters(start and len) of dm_get_device()
      and fix the callers.
      Signed-off-by: NNikanth Karthikesan <knikanth@suse.de>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      8215d6ec
    • K
      dm table: remove dm_get from dm_table_get_md · ecdb2e25
      Kiyoshi Ueda 提交于
      Remove the dm_get() in dm_table_get_md() because dm_table_get_md() could
      be called from presuspend/postsuspend, which are called while
      mapped_device is in DMF_FREEING state, where dm_get() is not allowed.
      
      Justification for that is the lifetime of both objects: As far as the
      current dm design/implementation, mapped_device is never freed while
      targets are doing something, because dm core waits for targets to become
      quiet in dm_put() using presuspend/postsuspend.  So targets should be
      able to touch mapped_device without holding reference count of the
      mapped_device, and we should allow targets to touch mapped_device even
      if it is in DMF_FREEING state.
      
      Backgrounds:
      I'm trying to remove the multipath internal queue, since dm core now has
      a generic queue for request-based dm.  In the patch-set, the multipath
      target wants to request dm core to start/stop queue.  One of such
      start/stop requests can happen during postsuspend() while the target
      waits for pg-init to complete, because the target stops queue when
      starting pg-init and tries to restart it when completing pg-init.  Since
      queue belongs to mapped_device, it involves calling dm_table_get_md()
      and dm_put().  On the other hand, postsuspend() is called in dm_put()
      for mapped_device which is in DMF_FREEING state, and that triggers
      BUG_ON(DMF_FREEING) in the 2nd dm_put().
      
      I had tried to solve this problem by changing only multipath not to
      touch mapped_device which is in DMF_FREEING state, but I couldn't and I
      came up with a question why we need dm_get() in dm_table_get_md().
      Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      ecdb2e25
  5. 11 1月, 2010 1 次提交
  6. 16 12月, 2009 1 次提交
    • A
      tree-wide: convert open calls to remove spaces to skip_spaces() lib function · e7d2860b
      André Goddard Rosa 提交于
      Makes use of skip_spaces() defined in lib/string.c for removing leading
      spaces from strings all over the tree.
      
      It decreases lib.a code size by 47 bytes and reuses the function tree-wide:
         text    data     bss     dec     hex filename
        64688     584     592   65864   10148 (TOTALS-BEFORE)
        64641     584     592   65817   10119 (TOTALS-AFTER)
      
      Also, while at it, if we see (*str && isspace(*str)), we can be sure to
      remove the first condition (*str) as the second one (isspace(*str)) also
      evaluates to 0 whenever *str == 0, making it redundant. In other words,
      "a char equals zero is never a space".
      
      Julia Lawall tried the semantic patch (http://coccinelle.lip6.fr) below,
      and found occurrences of this pattern on 3 more files:
          drivers/leds/led-class.c
          drivers/leds/ledtrig-timer.c
          drivers/video/output.c
      
      @@
      expression str;
      @@
      
      ( // ignore skip_spaces cases
      while (*str &&  isspace(*str)) { \(str++;\|++str;\) }
      |
      - *str &&
      isspace(*str)
      )
      Signed-off-by: NAndré Goddard Rosa <andre.goddard@gmail.com>
      Cc: Julia Lawall <julia@diku.dk>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Richard Purdie <rpurdie@rpsys.net>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
      Cc: David Howells <dhowells@redhat.com>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: Samuel Ortiz <samuel@sortiz.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e7d2860b
  7. 11 12月, 2009 1 次提交
    • A
      dm: bind new table before destroying old · a7940155
      Alasdair G Kergon 提交于
      When replacing a mapped device's table during a 'resume', delay the
      destruction of the old table until the new one is successfully in place.
      
      This will make it easier for a later patch to transfer internal state
      information from the old table to the new one (something we do not currently
      support) while giving us more options for reversion if a later part
      of the operation fails.
      
      Devices are always in the suspended state during dm_swap_table().
      This patch reinforces the requirement that all I/O must have been
      flushed from the table targets while in this state (including any in
      workqueues).  In the case of 'noflush' suspending, unprocessed
      I/O should have been 'pushed back' to the dm core prior to this point,
      for resubmission after the new table is in place.
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      a7940155
  8. 05 9月, 2009 3 次提交
  9. 24 7月, 2009 2 次提交
  10. 30 6月, 2009 1 次提交
  11. 22 6月, 2009 10 次提交
    • K
      dm: do not set QUEUE_ORDERED_DRAIN if request based · 5d67aa23
      Kiyoshi Ueda 提交于
      Request-based dm doesn't have barrier support yet.
      So we need to set QUEUE_ORDERED_DRAIN only for bio-based dm.
      Since the device type is decided at the first table loading time,
      the flag set is deferred until then.
      Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Acked-by: NHannes Reinecke <hare@suse.de>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      5d67aa23
    • K
      dm: enable request based option · e6ee8c0b
      Kiyoshi Ueda 提交于
      This patch enables request-based dm.
      
      o Request-based dm and bio-based dm coexist, since there are
        some target drivers which are more fitting to bio-based dm.
        Also, there are other bio-based devices in the kernel
        (e.g. md, loop).
        Since bio-based device can't receive struct request,
        there are some limitations on device stacking between
        bio-based and request-based.
      
                           type of underlying device
                         bio-based      request-based
         ----------------------------------------------
          bio-based         OK                OK
          request-based     --                OK
      
        The device type is recognized by the queue flag in the kernel,
        so dm follows that.
      
      o The type of a dm device is decided at the first table binding time.
        Once the type of a dm device is decided, the type can't be changed.
      
      o Mempool allocations are deferred to at the table loading time, since
        mempools for request-based dm are different from those for bio-based
        dm and needed mempool type is fixed by the type of table.
      
      o Currently, request-based dm supports only tables that have a single
        target.  To support multiple targets, we need to support request
        splitting or prevent bio/request from spanning multiple targets.
        The former needs lots of changes in the block layer, and the latter
        needs that all target drivers support merge() function.
        Both will take a time.
      Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      e6ee8c0b
    • K
      dm: prepare for request based option · cec47e3d
      Kiyoshi Ueda 提交于
      This patch adds core functions for request-based dm.
      
      When struct mapped device (md) is initialized, md->queue has
      an I/O scheduler and the following functions are used for
      request-based dm as the queue functions:
          make_request_fn: dm_make_request()
          pref_fn:         dm_prep_fn()
          request_fn:      dm_request_fn()
          softirq_done_fn: dm_softirq_done()
          lld_busy_fn:     dm_lld_busy()
      Actual initializations are done in another patch (PATCH 2).
      
      Below is a brief summary of how request-based dm behaves, including:
        - making request from bio
        - cloning, mapping and dispatching request
        - completing request and bio
        - suspending md
        - resuming md
      
        bio to request
        ==============
        md->queue->make_request_fn() (dm_make_request()) calls __make_request()
        for a bio submitted to the md.
        Then, the bio is kept in the queue as a new request or merged into
        another request in the queue if possible.
      
        Cloning and Mapping
        ===================
        Cloning and mapping are done in md->queue->request_fn() (dm_request_fn()),
        when requests are dispatched after they are sorted by the I/O scheduler.
      
        dm_request_fn() checks busy state of underlying devices using
        target's busy() function and stops dispatching requests to keep them
        on the dm device's queue if busy.
        It helps better I/O merging, since no merge is done for a request
        once it is dispatched to underlying devices.
      
        Actual cloning and mapping are done in dm_prep_fn() and map_request()
        called from dm_request_fn().
        dm_prep_fn() clones not only request but also bios of the request
        so that dm can hold bio completion in error cases and prevent
        the bio submitter from noticing the error.
        (See the "Completion" section below for details.)
      
        After the cloning, the clone is mapped by target's map_rq() function
          and inserted to underlying device's queue using
          blk_insert_cloned_request().
      
        Completion
        ==========
        Request completion can be hooked by rq->end_io(), but then, all bios
        in the request will have been completed even error cases, and the bio
        submitter will have noticed the error.
        To prevent the bio completion in error cases, request-based dm clones
        both bio and request and hooks both bio->bi_end_io() and rq->end_io():
            bio->bi_end_io(): end_clone_bio()
            rq->end_io():     end_clone_request()
      
        Summary of the request completion flow is below:
        blk_end_request() for a clone request
          => blk_update_request()
             => bio->bi_end_io() == end_clone_bio() for each clone bio
                => Free the clone bio
                => Success: Complete the original bio (blk_update_request())
                   Error:   Don't complete the original bio
          => blk_finish_request()
             => rq->end_io() == end_clone_request()
                => blk_complete_request()
                   => dm_softirq_done()
                      => Free the clone request
                      => Success: Complete the original request (blk_end_request())
                         Error:   Requeue the original request
      
        end_clone_bio() completes the original request on the size of
        the original bio in successful cases.
        Even if all bios in the original request are completed by that
        completion, the original request must not be completed yet to keep
        the ordering of request completion for the stacking.
        So end_clone_bio() uses blk_update_request() instead of
        blk_end_request().
        In error cases, end_clone_bio() doesn't complete the original bio.
        It just frees the cloned bio and gives over the error handling to
        end_clone_request().
      
        end_clone_request(), which is called with queue lock held, completes
        the clone request and the original request in a softirq context
        (dm_softirq_done()), which has no queue lock, to avoid a deadlock
        issue on submission of another request during the completion:
            - The submitted request may be mapped to the same device
            - Request submission requires queue lock, but the queue lock
              has been held by itself and it doesn't know that
      
        The clone request has no clone bio when dm_softirq_done() is called.
        So target drivers can't resubmit it again even error cases.
        Instead, they can ask dm core for requeueing and remapping
        the original request in that cases.
      
        suspend
        =======
        Request-based dm uses stopping md->queue as suspend of the md.
        For noflush suspend, just stops md->queue.
      
        For flush suspend, inserts a marker request to the tail of md->queue.
        And dispatches all requests in md->queue until the marker comes to
        the front of md->queue.  Then, stops dispatching request and waits
        for the all dispatched requests to complete.
        After that, completes the marker request, stops md->queue and
        wake up the waiter on the suspend queue, md->wait.
      
        resume
        ======
        Starts md->queue.
      Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      cec47e3d
    • M
      dm: calculate queue limits during resume not load · 754c5fc7
      Mike Snitzer 提交于
      Currently, device-mapper maintains a separate instance of 'struct
      queue_limits' for each table of each device.  When the configuration of
      a device is to be changed, first its table is loaded and this structure
      is populated, then the device is 'resumed' and the calculated
      queue_limits are applied.
      
      This places restrictions on how userspace may process related devices,
      where it is often advantageous to 'load' tables for several devices
      at once before 'resuming' them together.  As the new queue_limits
      only take effect after the 'resume', if they are changing and one
      device uses another, the latter must be 'resumed' before the former
      may be 'loaded'.
      
      This patch moves the calculation of these queue_limits out of
      the 'load' operation into 'resume'.  Since we are no longer
      pre-calculating this struct, we no longer need to maintain copies
      within our dm structs.
      
      dm_set_device_limits() now passes the 'start' of the device's
      data area (aka pe_start) as the 'offset' to blk_stack_limits().
      
      init_valid_queue_limits() is replaced by blk_set_default_limits().
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: martin.petersen@oracle.com
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      754c5fc7
    • M
      dm table: establish queue limits by copying table limits · 1197764e
      Mike Snitzer 提交于
      Copy the table's queue_limits to the DM device's request_queue.  This
      properly initializes the queue's topology limits and also avoids having
      to track the evolution of 'struct queue_limits' in
      dm_table_set_restrictions()
      
      Also fixes a bug that was introduced in dm_table_set_restrictions() via
      commit ae03bf63.  In addition to
      establishing 'bounce_pfn' in the queue's limits blk_queue_bounce_limit()
      also performs an allocation to setup the ISA DMA pool.  This allocation
      resulted in "sleeping function called from invalid context" when called
      from dm_table_set_restrictions().
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      1197764e
    • M
      dm table: replace struct io_restrictions with struct queue_limits · 5ab97588
      Mike Snitzer 提交于
      Use blk_stack_limits() to stack block limits (including topology) rather
      than duplicate the equivalent within Device Mapper.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      5ab97588
    • M
      dm table: validate device logical_block_size · be6d4305
      Mike Snitzer 提交于
      Impose necessary and sufficient conditions on a devices's table such
      that any incoming bio which respects its logical_block_size can be
      processed successfully.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      be6d4305
    • M
      dm table: ensure targets are aligned to logical_block_size · 02acc3a4
      Mike Snitzer 提交于
      Ensure I/O is aligned to the logical block size of target devices.
      
      Rename check_device_area() to device_area_is_valid() for clarity and
      establish the device limits including the logical block size prior to
      calling it.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      02acc3a4
    • J
      dm table: improve warning message when devices not freed before destruction · 1b6da754
      Jonthan Brassow 提交于
      Report any devices forgotten to be freed before a table is destroyed.
      Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      1b6da754
    • M
      dm: use i_size_read · 5657e8fa
      Mikulas Patocka 提交于
      Use i_size_read() instead of reading i_size.
      
      If someone changes the size of the device simultaneously, i_size_read
      is guaranteed to return a valid value (either the old one or the new one).
      
      i_size can return some intermediate invalid value (on 32-bit computers
      with 64-bit i_size, the reads to both halves of i_size can be interleaved
      with updates to i_size, resulting in garbage being returned).
      
      Cc: stable@kernel.org
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      5657e8fa
  12. 09 6月, 2009 1 次提交
  13. 03 6月, 2009 1 次提交
  14. 23 5月, 2009 2 次提交
  15. 09 4月, 2009 2 次提交
  16. 03 4月, 2009 1 次提交
    • A
      dm table: fix upgrade mode race · 570b9d96
      Alasdair G Kergon 提交于
      upgrade_mode() sets bdev to NULL temporarily, and does not have any
      locking to exclude anything from seeing that NULL.
      
      In dm_table_any_congested() bdev_get_queue() can dereference that NULL and
      cause a reported oops.
      
      Fix this by not changing that field during the mode upgrade.
      
      Cc: stable@kernel.org
      Cc: Neil Brown <neilb@suse.de>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      570b9d96
  17. 06 1月, 2009 2 次提交
    • M
      dm table: rework reference counting · d5816876
      Mikulas Patocka 提交于
      Rework table reference counting.
      
      The existing code uses a reference counter. When the last reference is
      dropped and the counter reaches zero, the table destructor is called.
      Table reference counters are acquired/released from upcalls from other
      kernel code (dm_any_congested, dm_merge_bvec, dm_unplug_all).
      If the reference counter reaches zero in one of the upcalls, the table
      destructor is called from almost random kernel code.
      
      This leads to various problems:
      * dm_any_congested being called under a spinlock, which calls the
        destructor, which calls some sleeping function.
      * the destructor attempting to take a lock that is already taken by the
        same process.
      * stale reference from some other kernel code keeps the table
        constructed, which keeps some devices open, even after successful
        return from "dmsetup remove". This can confuse lvm and prevent closing
        of underlying devices or reusing device minor numbers.
      
      The patch changes reference counting so that the table destructor can be
      called only at predetermined places.
      
      The table has always exactly one reference from either mapped_device->map
      or hash_cell->new_map. After this patch, this reference is not counted
      in table->holders.  A pair of dm_create_table/dm_destroy_table functions
      is used for table creation/destruction.
      
      Temporary references from the other code increase table->holders. A pair
      of dm_table_get/dm_table_put functions is used to manipulate it.
      
      When the table is about to be destroyed, we wait for table->holders to
      reach 0. Then, we call the table destructor.  We use active waiting with
      msleep(1), because the situation happens rarely (to one user in 5 years)
      and removing the device isn't performance-critical task: the user doesn't
      care if it takes one tick more or not.
      
      This way, the destructor is called only at specific points
      (dm_table_destroy function) and the above problems associated with lazy
      destruction can't happen.
      
      Finally remove the temporary protection added to dm_any_congested().
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      d5816876
    • A
      dm: support barriers on simple devices · ab4c1424
      Andi Kleen 提交于
      Implement barrier support for single device DM devices
      
      This patch implements barrier support in DM for the common case of dm linear
      just remapping a single underlying device. In this case we can safely
      pass the barrier through because there can be no reordering between
      devices.
      
       NB. Any DM device might cease to support barriers if it gets
           reconfigured so code must continue to allow for a possible
           -EOPNOTSUPP on every barrier bio submitted.  - agk
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      ab4c1424
  18. 03 12月, 2008 1 次提交
    • M
      block: fix setting of max_segment_size and seg_boundary mask · 0e435ac2
      Milan Broz 提交于
      Fix setting of max_segment_size and seg_boundary mask for stacked md/dm
      devices.
      
      When stacking devices (LVM over MD over SCSI) some of the request queue
      parameters are not set up correctly in some cases by default, namely
      max_segment_size and and seg_boundary mask.
      
      If you create MD device over SCSI, these attributes are zeroed.
      
      Problem become when there is over this mapping next device-mapper mapping
      - queue attributes are set in DM this way:
      
      request_queue   max_segment_size  seg_boundary_mask
      SCSI                65536             0xffffffff
      MD RAID1                0                      0
      LVM                 65536                 -1 (64bit)
      
      Unfortunately bio_add_page (resp.  bio_phys_segments) calculates number of
      physical segments according to these parameters.
      
      During the generic_make_request() is segment cout recalculated and can
      increase bio->bi_phys_segments count over the allowed limit.  (After
      bio_clone() in stack operation.)
      
      Thi is specially problem in CCISS driver, where it produce OOPS here
      
          BUG_ON(creq->nr_phys_segments > MAXSGENTRIES);
      
      (MAXSEGENTRIES is 31 by default.)
      
      Sometimes even this command is enough to cause oops:
      
        dd iflag=direct if=/dev/<vg>/<lv> of=/dev/null bs=128000 count=10
      
      This command generates bios with 250 sectors, allocated in 32 4k-pages
      (last page uses only 1024 bytes).
      
      For LVM layer, it allocates bio with 31 segments (still OK for CCISS),
      unfortunatelly on lower layer it is recalculated to 32 segments and this
      violates CCISS restriction and triggers BUG_ON().
      
      The patch tries to fix it by:
      
       * initializing attributes above in queue request constructor
         blk_queue_make_request()
      
       * make sure that blk_queue_stack_limits() inherits setting
      
       (DM uses its own function to set the limits because it
       blk_queue_stack_limits() was introduced later.  It should probably switch
       to use generic stack limit function too.)
      
       * sets the default seg_boundary value in one place (blkdev.h)
      
       * use this mask as default in DM (instead of -1, which differs in 64bit)
      
      Bugs related to this:
      https://bugzilla.redhat.com/show_bug.cgi?id=471639
      http://bugzilla.kernel.org/show_bug.cgi?id=8672Signed-off-by: NMilan Broz <mbroz@redhat.com>
      Reviewed-by: NAlasdair G Kergon <agk@redhat.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Tejun Heo <htejun@gmail.com>
      Cc: Mike Miller <mike.miller@hp.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      0e435ac2
  19. 23 10月, 2008 1 次提交
  20. 21 10月, 2008 2 次提交