1. 15 May 2008 (1 commit)
    • Remove blkdev warning triggered by using md · e7e72bf6
      Committed by Neil Brown
      As setting and clearing queue flags now requires that we hold a spinlock
      on the queue, and as blk_queue_stack_limits is called without that lock,
      get the lock inside blk_queue_stack_limits.
      
      For blk_queue_stack_limits to be able to find the right lock, each md
      personality needs to set q->queue_lock to point to the appropriate lock.
      Those personalities which didn't previously use a spin_lock use
      q->__queue_lock.  So always initialise that lock when it is allocated.
      
      With this in place, setting/clearing of the QUEUE_FLAG_PLUGGED bit will no
      longer cause warnings as it will be clear that the proper lock is held.
      
      Thanks to Dan Williams for review and fixing the silly bugs.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Alistair John Strachan <alistair@devzero.co.uk>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Jacek Luczak <difrost.kernel@gmail.com>
      Cc: Prakash Punnoor <prakash@punnoor.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
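
      A minimal C sketch of the resulting shape of blk_queue_stack_limits(),
      consistent with the description above (illustrative, not the verbatim
      patch):

        void blk_queue_stack_limits(struct request_queue *t,
                                    struct request_queue *b)
        {
                /* ... limits are copied from b to t here ... */
                if (!test_bit(QUEUE_FLAG_CLUSTER, &b->queue_flags)) {
                        unsigned long flags;

                        /* take the lock the md personality installed in
                         * q->queue_lock (or the default __queue_lock) */
                        spin_lock_irqsave(t->queue_lock, flags);
                        queue_flag_clear(QUEUE_FLAG_CLUSTER, t);
                        spin_unlock_irqrestore(t->queue_lock, flags);
                }
        }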
  2. 07 May 2008 (2 commits)
    • block: avoid duplicate calls to get_part() in disk stat code · 28f13702
      Committed by Jens Axboe
      get_part() is fairly expensive, as it loops over partitions (O(N))
      to find the right one.  In lots of normal IO paths we end up looking
      up the partition twice, to make matters even worse.  Change the
      stat add code to accept a passed-in partition instead.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
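
      A sketch of the pattern, with a hypothetical helper name (the real
      stat macros differ in detail): resolve the partition once, then hand
      it to the stat code instead of letting it redo the O(N) scan:

        struct hd_struct *part = get_part(rq->rq_disk, rq->sector);

        /* hypothetical signature: the stat helper now accepts the
         * already-resolved partition... */
        disk_stat_account(rq->rq_disk, part, rq_data_dir(rq), bytes >> 9);
        /* ...rather than calling get_part() again internally */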
    • block: optimize generic_unplug_device() · dbaf2c00
      Committed by Jens Axboe
      Original patch from Mikulas Patocka <mpatocka@redhat.com>
      
      Mike Anderson was doing an OLTP benchmark on a computer with 48 physical
      disks mapped to one logical device via device mapper.
      
      He found that there was a slowdown on request_queue->lock in function
      generic_unplug_device. The slowdown is caused by the fact that when some
      code calls unplug on the device mapper, device mapper calls unplug on all
      physical disks. These unplug calls take the lock, find that the queue is
      already unplugged, release the lock and exit.
      
      With the below patch, performance of the benchmark was increased by 18%
      (the whole OLTP application, not just block layer microbenchmarks).
      
      So I'm submitting this patch for upstream.  I think the patch is
      correct, because when multiple threads call plug and unplug
      simultaneously, it is unspecified whether the queue is plugged or
      unplugged (so the patch can't make this worse).  And the caller that
      plugged the queue should unplug it anyway (if it doesn't, there's a
      3ms timeout).
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
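
      The optimization is to test the plugged bit before taking the queue
      lock; a sketch consistent with the description above (illustrative,
      not the verbatim patch):

        void generic_unplug_device(struct request_queue *q)
        {
                /* lockless fast path: most of the stacked queues are
                 * already unplugged, so don't touch queue_lock at all */
                if (blk_queue_plugged(q)) {
                        spin_lock_irq(q->queue_lock);
                        __generic_unplug_device(q);
                        spin_unlock_irq(q->queue_lock);
                }
        }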
  3. 01 May 2008 (1 commit)
  4. 29 Apr 2008 (5 commits)
  5. 04 Mar 2008 (3 commits)
  6. 19 Feb 2008 (2 commits)
    • block: add request->raw_data_len · 6b00769f
      Committed by Tejun Heo
      With padding and draining moved into it, the block layer may now extend
      requests as directed by queue parameters, so a request now has two
      sizes: the original request size and the extended size, which matches
      the size of the area pointed to by bios and later by sgs.  The latter
      size is what lower layers are primarily interested in when allocating,
      filling up DMA tables and setting up the controller.
      
      Both padding and draining extend the data area to accommodate
      controller characteristics.  As any controller which speaks SCSI can
      handle underflows, feeding it a larger data area is safe.
      
      So, this patch makes the primary data length field, request->data_len,
      indicate the size of the full data area, and adds a separate length
      field, request->raw_data_len, for the unmodified request size.  The
      latter is used for reporting to higher layers (userland) and wherever
      the original request size should be fed to the controller or device.
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
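
      A sketch of the two sizes, with hypothetical pad/drain lengths:

        /* keep the unmodified size for reporting to upper layers */
        rq->raw_data_len = rq->data_len;
        /* padding/draining then grow what the controller will see;
         * pad_len and drain_len are illustrative names only */
        rq->data_len += pad_len + drain_len;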
    • make blk-core.c:request_cachep static again · 5ece6c52
      Committed by Adrian Bunk
      request_cachep needlessly became global.
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
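
      The fix itself is a one-word linkage change in blk-core.c, sketched:

        /* visible only within blk-core.c again */
        static struct kmem_cache *request_cachep;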
  7. 08 Feb 2008 (3 commits)
  8. 01 Feb 2008 (2 commits)
  9. 30 Jan 2008 (6 commits)
  10. 28 Jan 2008 (12 commits)
    • block: implement drain buffers · fa0ccd83
      Committed by James Bottomley
      These DMA drain buffer implementations in drivers are pretty horrible
      to do in terms of manipulating the scatterlist.  Plus they're being
      done at least in drivers/ide and drivers/ata, so we now have code
      duplication.
      
      The one use case for this, as I understand it, is AHCI controllers doing
      PIO mode to mmc devices but translating this to DMA at the controller
      level.
      
      So, what about adding a callback to the block layer that permits
      adding a drain buffer for the problem devices?  The idea is that
      you'd do this in slave_configure after you find one of these devices.
      
      The beauty of doing it in the block layer is that it quietly adds the
      drain buffer to the end of the sg list, so it automatically gets mapped
      (and unmapped) without anything unusual having to be done to the
      scatterlist in drivers/scsi or drivers/ata and without any alteration to
      the transfer length.
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
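
      A hedged usage sketch: a SCSI LLD wiring up a drain buffer from its
      slave_configure, as suggested above.  blk_queue_dma_drain() is the
      helper this patch adds; the predicate, buffer and size below are
      placeholders, and the exact signature may differ:

        static int my_slave_configure(struct scsi_device *sdev)
        {
                /* hypothetical test for "one of these devices" */
                if (device_needs_drain(sdev))
                        blk_queue_dma_drain(sdev->request_queue,
                                            my_drain_buf, MY_DRAIN_SIZE);
                return 0;
        }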
    • block: cfq: make the io context sharing lockless · 4ac845a2
      Committed by Jens Axboe
      The io context sharing introduced a per-ioc spinlock that would protect
      the cfq io context lookup.  That is a regression from the original, since
      we never needed any locking there because the ioc/cic were process private.
      
      The cic lookup is changed from an rbtree construct to a radix tree,
      which lets us use RCU to make the reader side lockless.  That is the
      performance-critical path; modifying the radix tree is only done on
      process creation (when that process first does IO, actually) and on
      process exit (if that process has done IO).
      
      As it so happens, radix trees are also much faster for this type of
      lookup where the key is a pointer. It's a very sparse tree.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
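
      The reader side, sketched: an RCU-protected radix tree lookup keyed
      on a pointer value (field names follow the io_context of this era;
      treat the details as illustrative):

        struct cfq_io_context *cic;

        rcu_read_lock();
        /* the key is the per-queue cfqd pointer cast to an index;
         * the resulting tree is very sparse */
        cic = radix_tree_lookup(&ioc->radix_root, (unsigned long)cfqd);
        rcu_read_unlock();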
    • io context sharing: preliminary support · d38ecf93
      Committed by Jens Axboe
      Detach task state from the ioc; instead, keep track of how many
      processes are accessing the ioc.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
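
      A sketch of the sharing model (helper name from the iocontext code
      of this era; the body is illustrative):

        static inline struct io_context *ioc_task_link(struct io_context *ioc)
        {
                /* refuse to share an ioc whose refcount already hit
                 * zero (it is going away); otherwise count the task */
                if (ioc && atomic_inc_not_zero(&ioc->refcount)) {
                        atomic_inc(&ioc->nr_tasks);
                        return ioc;
                }
                return NULL;
        }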
    • ioprio: move io priority from task_struct to io_context · fd0928df
      Committed by Jens Axboe
      This is where it belongs, and then it doesn't take up space for a
      process that doesn't do IO.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • blk_end_request: cleanup of request completion (take 4) · b8286239
      Committed by Kiyoshi Ueda
      This patch merges complete_request() into end_that_request_last()
      for cleanup.
      
      complete_request() was introduced by an earlier part of this patch set
      so as not to break the existing users of end_that_request_last().

      Since all users are converted to the blk_end_request interfaces and
      end_that_request_last() is no longer exported, the code can be
      merged into end_that_request_last().
      
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • blk_end_request: cleanup 'uptodate' related code (take 4) · 5450d3e1
      Committed by Kiyoshi Ueda
      This patch converts the 'uptodate' arguments of the no-longer-exported
      interfaces, end_that_request_first/last, to 'error', and removes the
      internal conversions for it in the blk_end_request interfaces.

      Also, this patch removes the no-longer-needed end_io_error().
      
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
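
      The internal translation being removed followed the usual mapping;
      a sketch (illustrative):

        /* old 'uptodate': 1 = ok, 0 = generic failure, < 0 = -Exx.
         * new 'error':    0 = ok, < 0 = -Exx. */
        int error = 0;

        if (uptodate <= 0)
                error = uptodate ? uptodate : -EIO;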
    • blk_end_request: remove/unexport end_that_request_* (take 4) · 3bcddeac
      Committed by Kiyoshi Ueda
      This patch removes the following functions:
        o end_that_request_first()
        o end_that_request_chunk()
      and stops exporting the functions below:
        o end_that_request_last()
      
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • blk_end_request: add bidi completion interface (take 4) · e3a04fe3
      Committed by Kiyoshi Ueda
      This patch adds a variant of the interface, blk_end_bidi_request(),
      which completes a bidi request.
      
      A bidi request must be completed as a whole, both rq and rq->next_rq
      at once, so the interface has two arguments for the completion sizes.
      
      As for ->end_io, only rq->end_io is called (rq->next_rq->end_io is not
      called).  So if special completion handling is needed, the handler
      must be set to rq->end_io.
      And the handler must take care of freeing next_rq too, since
      the interface doesn't take care of it if rq->end_io is not NULL.
      
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
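
      A hedged usage sketch, paired with the size helpers exported
      elsewhere in this series:

        /* complete both directions at once; if rq->end_io is set it
         * must also free rq->next_rq */
        if (blk_end_bidi_request(rq, error, blk_rq_bytes(rq),
                                 blk_rq_bytes(rq->next_rq)))
                return;         /* leftover, not fully completed */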
    • blk_end_request: add callback feature (take 4) · e19a3ab0
      Committed by Kiyoshi Ueda
      This patch adds a variant of the interface, blk_end_request_callback(),
      which has a driver callback feature.
      
      Drivers may need to do special work between end_that_request_first()
      and end_that_request_last().
      For such drivers, blk_end_request_callback() allows them to pass
      a callback function which is called between end_that_request_first()
      and end_that_request_last().
      
      This interface is only a fallback for the other blk_end_request
      interfaces.  Drivers should avoid such tricky behavior and use the
      other interfaces as much as possible.

      Currently, only one driver, ide-cd, needs this interface.
      So this interface should/will be removed once the driver has dropped
      such tricky behavior.
      
      o ide-cd (cdrom_newpc_intr())
        In PIO mode, cdrom_newpc_intr() needs to defer end_that_request_last()
        until the device clears DRQ_STAT and raises an interrupt after
        end_that_request_first().
        So end_that_request_first() and end_that_request_last() are called
        separately in cdrom_newpc_intr().
      
        This means blk_end_request_callback() has to return without
        completing the request even if there is no leftover in the request.
        To satisfy this requirement, the callback function has a return
        value so that drivers can tell blk_end_request_callback() to return
        without completing the request.
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
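
      A sketch of the callback convention (the callback name is
      illustrative; only blk_end_request_callback() is from the patch):

        /* returning nonzero tells blk_end_request_callback() to stop
         * before end_that_request_last(), keeping rq alive until the
         * device clears DRQ_STAT and interrupts again */
        static int wait_for_drq_cb(struct request *rq)
        {
                return 1;
        }

        /* in the ISR, when PIO data is done but DRQ is still set: */
        blk_end_request_callback(rq, 0, nr_bytes, wait_for_drq_cb);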
    • blk_end_request: changing block layer core (take 4) · 9e6e39f2
      Committed by Kiyoshi Ueda
      This patch converts core parts of block layer to use blk_end_request
      interfaces.  Related 'uptodate' arguments are converted to 'error'.
      
      The 'dequeue' argument was originally introduced for end_dequeued_request(),
      where no attempt should be made to dequeue the request as it is already
      dequeued.
      However, it is not necessary, as this can be checked with
      list_empty(&rq->queuelist): a dequeued request has an empty list and
      a queued request doesn't.
      That check is now done in the blk_end_request interfaces.
      
      As a result of this patch, end_queued_request() and
      end_dequeued_request() become identical.  A future patch will merge
      and rename them and change users of those functions.
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
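
      The check that replaces the 'dequeue' flag, sketched:

        /* a request still on the queue has a non-empty queuelist */
        if (!list_empty(&rq->queuelist))
                blkdev_dequeue_request(rq);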
    • blk_end_request: add/export functions to get request size (take 4) · 3b11313a
      Committed by Kiyoshi Ueda
      This patch adds/exports functions to get the size of a request in bytes.
      They are useful because the blk_end_request interfaces take the
      completed I/O size in bytes instead of sectors.
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
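
      A usage sketch of the added helpers:

        unsigned int total = blk_rq_bytes(rq);      /* whole request */
        unsigned int cur   = blk_rq_cur_bytes(rq);  /* current segment */

        /* complete everything that is left, with the queue lock held */
        __blk_end_request(rq, error, total);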
    • blk_end_request: add new request completion interface (take 4) · 336cdb40
      Committed by Kiyoshi Ueda
      This patch adds 2 new interfaces for request completion:
        o blk_end_request()   : called without queue lock
        o __blk_end_request() : called with queue lock held
      
      blk_end_request takes 'error' as an argument instead of the 'uptodate'
      that the current end_that_request_* interfaces take.
      The value is used when the bio is completed; its meanings are:
          0 : success
        < 0 : error
      
      Some device drivers call the generic functions below between
      end_that_request_{first/chunk} and end_that_request_last():
        o add_disk_randomness()
        o blk_queue_end_tag()
        o blkdev_dequeue_request()
      These are called in the blk_end_request interfaces as a part of
      generic request completion, so all device drivers come to call the
      above functions through them.
      To decide whether to call blkdev_dequeue_request(), blk_end_request
      uses list_empty(&rq->queuelist) (a blk_queued_rq() macro is added for it).
      So if drivers use rq->queuelist for their own purposes, they must
      re-initialize it (e.g. with INIT_LIST_HEAD()) before calling
      blk_end_request.
      (Currently, no driver completes a request without re-initializing
       the queuelist after using it, so rq->queuelist can be used for the
       purpose above.)
      
      "Normal" drivers can be converted to use blk_end_request()
      in a standard way shown below.
      
       a) end_that_request_{chunk/first}
          spin_lock_irqsave()
          (add_disk_randomness(), blk_queue_end_tag(), blkdev_dequeue_request())
          end_that_request_last()
          spin_unlock_irqrestore()
          => blk_end_request()
      
       b) spin_lock_irqsave()
          end_that_request_{chunk/first}
          (add_disk_randomness(), blk_queue_end_tag(), blkdev_dequeue_request())
          end_that_request_last()
          spin_unlock_irqrestore()
          => spin_lock_irqsave()
             __blk_end_request()
             spin_unlock_irqrestore()
      
       c) spin_lock_irqsave()
          (add_disk_randomness(), blk_queue_end_tag(), blkdev_dequeue_request())
          end_that_request_last()
          spin_unlock_irqrestore()
          => blk_end_request()   or   spin_lock_irqsave()
                                      __blk_end_request()
                                      spin_unlock_irqrestore()
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
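
      Pattern (a) above, written out as a before/after sketch
      (illustrative; the error and byte values depend on the driver):

        /* before: several steps, queue lock taken by the driver */
        end_that_request_first(rq, uptodate, nr_sectors);
        spin_lock_irqsave(q->queue_lock, flags);
        blkdev_dequeue_request(rq);
        end_that_request_last(rq, uptodate);
        spin_unlock_irqrestore(q->queue_lock, flags);

        /* after: one call, no queue lock needed by the caller */
        if (blk_end_request(rq, error, nr_bytes))
                return;         /* leftover, not fully completed */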
  11. 25 Jan 2008 (3 commits)