1. 27 Aug, 2008 1 commit
    • block: move cmdfilter from gendisk to request_queue · abf54393
      FUJITA Tomonori committed
      cmd_filter works only for the block layer SG_IO with SCSI block
      devices. It breaks scsi/sg.c, bsg, and the block layer SG_IO with SCSI
      character devices (such as st). We hit a kernel crash with them.
      
      The problem is that the cmd_filter code accesses gendisk (which holds
      struct blk_scsi_cmd_filter) via inode->i_bdev->bd_disk. That works only
      for SCSI block device files. With character device files, inode->i_bdev
      leads you to struct cdev, so inode->i_bdev->bd_disk->blk_scsi_cmd_filter
      isn't safe.
      
      SCSI ULDs don't expose gendisk; they keep it private. bsg needs to be
      independent of any protocol. We shouldn't change ULDs to expose their
      gendisk.
      
      This patch moves struct blk_scsi_cmd_filter from gendisk to
      request_queue, a common object that everyone can access.
      
      The user interface doesn't change; users can change the filters via
      /sys/block/. gendisk has a pointer to request_queue, so the cmd_filter
      code can still reach struct blk_scsi_cmd_filter from there.
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
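A minimal userspace sketch of the layout this commit describes, not the kernel code itself. All `mock_*` names and the bitmap layout are assumptions made for illustration; only the idea (filter lives in the queue, the disk reaches it through its queue pointer) comes from the message above.

```c
#include <stdbool.h>
#include <stddef.h>

#define MOCK_OPCODES   256  /* one bit per possible SCSI opcode */
#define MOCK_WORD_BITS (8 * sizeof(unsigned long))

/* Hypothetical stand-in for struct blk_scsi_cmd_filter. */
struct mock_cmd_filter {
    unsigned long read_ok[MOCK_OPCODES / MOCK_WORD_BITS];
};

/* After the move, the filter is embedded in the queue ... */
struct mock_request_queue {
    struct mock_cmd_filter cmd_filter;
};

/* ... and the disk only holds a pointer to that queue. */
struct mock_gendisk {
    struct mock_request_queue *queue;
};

/* sysfs-style path: start from the disk, follow the queue pointer. */
static struct mock_cmd_filter *mock_disk_filter(struct mock_gendisk *disk)
{
    return &disk->queue->cmd_filter;
}

/* Mark one opcode as permitted in the filter bitmap. */
static void mock_filter_allow(struct mock_cmd_filter *f, unsigned char opcode)
{
    f->read_ok[opcode / MOCK_WORD_BITS] |= 1UL << (opcode % MOCK_WORD_BITS);
}

/* SG_IO-style path: only a queue is needed, no gendisk and no inode. */
static bool mock_queue_allows(const struct mock_request_queue *q,
                              unsigned char opcode)
{
    return (q->cmd_filter.read_ok[opcode / MOCK_WORD_BITS]
            >> (opcode % MOCK_WORD_BITS)) & 1UL;
}
```

Because `mock_queue_allows()` takes only a queue, a caller that has no disk at hand (the character-device case in the message) can still consult the same filter that was configured through the disk.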
  2. 02 Aug, 2008 1 commit
  3. 17 Jul, 2008 1 commit
  4. 16 Jul, 2008 1 commit
  5. 04 Jul, 2008 1 commit
  6. 03 Jul, 2008 8 commits
  7. 30 Apr, 2008 2 commits
  8. 29 Apr, 2008 4 commits
  9. 21 Apr, 2008 2 commits
    • block: fix memory hotplug and bouncing in block layer · 2472892a
      Andi Kleen committed
      Only noticed this while hacking something else, no test case.
      
      blk_max_low_pfn is initialized once at bootup by the block layer from
      max_low_pfn.  But max_low_pfn is not necessarily constant over the runtime of
      the system when you consider memory hotplug.  What could happen is that
      if someone adds memory later, the block layer wouldn't get updated and
      would then start bouncing memory unnecessarily.
      
      Also on 64bit blk_max_low_pfn actually isn't needed because it just disables
      bouncing essentially and there is no highmem.  And nobody can pass pfns >
      max_low_pfn to the block layer, because those wouldn't have a struct page and
      I suspect block layer wouldn't be very happy without that.
      
      So set BLK_BOUNCE_HIGH to infinity (-1ULL) on 64bit.  That avoids the problem
      of having to update it on memory hotadd.
      
      On 32bit I kept the same behaviour because at least on i386
      memory hotadd only adds HIGHMEM, never lowmem.
      
      BLK_BOUNCE_ANY is always set to infinity on both 32 and 64bit.
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Acked-by: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
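The rule in this message can be sketched in plain C as follows. This is an illustrative model only: the `mock_*` functions, the explicit `bits_per_long` parameter, and the 4 KiB page size are assumptions; the kernel expresses the same choice with compile-time macros.

```c
#include <stdint.h>

#define MOCK_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* BLK_BOUNCE_ANY analogue: no address limit, on 32bit and 64bit alike. */
static uint64_t mock_bounce_any(void)
{
    return UINT64_MAX;
}

/*
 * BLK_BOUNCE_HIGH analogue.
 * 64bit: "infinity", so nothing has to be updated on memory hotadd.
 * 32bit: unchanged behaviour, the limit is derived from max_low_pfn.
 */
static uint64_t mock_bounce_high(unsigned int bits_per_long,
                                 uint64_t max_low_pfn)
{
    if (bits_per_long == 64)
        return UINT64_MAX;
    return max_low_pfn << MOCK_PAGE_SHIFT;
}
```

For example, with a hypothetical `max_low_pfn` of `0x38000`, the 32bit limit is `0x38000000` bytes, while the 64bit limit ignores the value entirely.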
    • block: move the padding adjustment to blk_rq_map_sg · f18573ab
      FUJITA Tomonori committed
      blk_rq_map_user adjusts bi_size of the last bio. It breaks the rule
      that req->data_len (the true data length) is equal to sum(bio). It
      broke the scsi command completion code.
      
      commit e97a294e was introduced to fix
      the above issue. However, the partial completion code doesn't work
      with it. The commit is also a layer violation (scsi mid-layer should
      not know about the block layer's padding).
      
      This patch moves the padding adjustment to blk_rq_map_sg (suggested by
      James). The padding works like the drain buffer. This patch breaks the
      rule that req->data_len is equal to sum(sg); however, the drain buffer
      already broke it. So this patch just restores the rule that
      req->data_len is equal to sum(bio) without breaking anything new.
      
      Now when a low level driver needs padding, blk_rq_map_user and
      blk_rq_map_user_iov guarantee there's enough room for padding.
      blk_rq_map_sg can safely extend the last entry of a scatter list.
      
      blk_rq_map_sg must extend the last entry of a scatter list only for a
      request that got through bio_copy_user_iov. This patch introduces a
      new REQ_COPY_USER flag.
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Tejun Heo <htejun@gmail.com>
      Cc: Mike Christie <michaelc@cs.wisc.edu>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
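A rough sketch of the padding step described above, written as standalone C. The `mock_*` types, the flag value, and the function name are hypothetical; the point is only that the last scatterlist entry grows while the request's `data_len` is left untouched, and that this happens only for requests carrying the copy-user flag.

```c
#include <stddef.h>

#define MOCK_REQ_COPY_USER 0x1u  /* hypothetical flag value */

struct mock_sg {
    unsigned int length;
};

struct mock_request {
    unsigned int flags;
    unsigned int data_len;  /* true data length, stays equal to sum(bio) */
};

/*
 * Extend the last scatterlist entry so the mapped area ends on a
 * (dma_pad_mask + 1) boundary. Returns the number of pad bytes added.
 * data_len is deliberately not modified.
 */
static unsigned int mock_map_sg_pad(const struct mock_request *rq,
                                    unsigned int dma_pad_mask,
                                    struct mock_sg *sgl, size_t nents)
{
    unsigned int pad_len;

    if (nents == 0 || !(rq->flags & MOCK_REQ_COPY_USER))
        return 0;  /* only copied requests are known to have room */
    if (!(rq->data_len & dma_pad_mask))
        return 0;  /* already aligned, nothing to pad */

    pad_len = (dma_pad_mask & ~rq->data_len) + 1;
    sgl[nents - 1].length += pad_len;
    return pad_len;
}
```

With a pad mask of 3 and a 4106-byte request, the last entry grows by 2 bytes, so sum(sg) becomes 4108 while `data_len` remains 4106.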
  10. 04 Mar, 2008 2 commits
  11. 19 Feb, 2008 2 commits
    • block: implement request_queue->dma_drain_needed · 2fb98e84
      Tejun Heo committed
      Draining shouldn't be done for commands where overflow may indicate
      data integrity issues.  Add dma_drain_needed callback to
      request_queue.  Drain buffer is appended iff this function returns
      non-zero.
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
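One way to picture the callback, as a self-contained C sketch. Everything named `mock_*` is hypothetical, and the example policy (never drain READ(10) or WRITE(10)) is an assumption chosen only to illustrate "commands where overflow may indicate data integrity issues"; it is not taken from any driver. Treating a missing callback as "no drain" is likewise a choice made for this sketch.

```c
#include <stddef.h>

struct mock_request {
    unsigned char cmd0;  /* first CDB byte, i.e. the SCSI opcode */
};

typedef int (mock_drain_needed_fn)(const struct mock_request *rq);

struct mock_request_queue {
    unsigned int dma_drain_size;
    mock_drain_needed_fn *dma_drain_needed;  /* may be NULL in this sketch */
};

/* Hypothetical policy: skip draining for READ(10) and WRITE(10). */
static int mock_drain_unless_rw10(const struct mock_request *rq)
{
    return rq->cmd0 != 0x28 && rq->cmd0 != 0x2a;
}

/* Number of drain bytes to append for this request (0 means none). */
static unsigned int mock_drain_bytes(const struct mock_request_queue *q,
                                     const struct mock_request *rq)
{
    if (q->dma_drain_size && q->dma_drain_needed && q->dma_drain_needed(rq))
        return q->dma_drain_size;
    return 0;
}
```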
    • block: add request->raw_data_len · 6b00769f
      Tejun Heo committed
      With padding and draining moved into it, block layer now may extend
      requests as directed by queue parameters, so now a request has two
      sizes - the original request size and the extended size which matches
      the size of area pointed to by bios and later by sgs.  The latter size
      is what lower layers are primarily interested in when allocating,
      filling up DMA tables and setting up the controller.
      
      Both padding and draining extend the data area to accommodate
      controller characteristics.  As any controller which speaks SCSI can
      handle underflows, feeding a larger data area is safe.
      
      So, this patch makes the primary data length field, request->data_len,
      indicate the size of the full data area and adds a separate length field,
      request->raw_data_len, for the unmodified request size.  The latter is
      used to report to the higher layer (userland) and where the original
      request size should be fed to the controller or device.
      Signed-off-by: Tejun Heo <htejun@gmail.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
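The two sizes can be modelled with a small standalone sketch. The helper names are hypothetical; only the split between `data_len` (extended size) and `raw_data_len` (original size) comes from the message.

```c
struct mock_request {
    unsigned int data_len;      /* full data area, including padding/drain */
    unsigned int raw_data_len;  /* size the submitter originally asked for */
};

/* Initially both lengths describe the same, unmodified request. */
static void mock_rq_set_len(struct mock_request *rq, unsigned int len)
{
    rq->data_len = len;
    rq->raw_data_len = len;
}

/* Padding or draining grows only the full data area. */
static void mock_rq_extend(struct mock_request *rq, unsigned int extra)
{
    rq->data_len += extra;
}

/* What lower layers size their DMA tables against. */
static unsigned int mock_rq_dma_len(const struct mock_request *rq)
{
    return rq->data_len;
}

/* What is reported back to the higher layer (userland). */
static unsigned int mock_rq_reported_len(const struct mock_request *rq)
{
    return rq->raw_data_len;
}
```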
  12. 08 Feb, 2008 1 commit
    • block: fixup rq_init() a bit · 63a71386
      Jens Axboe committed
      Rearrange fields in cache order and initialize some fields that
      we didn't previously init. Remove init of ->completion_data, it's
      part of a union with ->hash. Luckily clearing the rb node is the same
      as setting it to null!
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  13. 01 Feb, 2008 2 commits
  14. 30 Jan, 2008 1 commit
  15. 28 Jan, 2008 10 commits
    • block: implement drain buffers · fa0ccd83
      James Bottomley committed
      These DMA drain buffer implementations in drivers are pretty horrible
      to do in terms of manipulating the scatterlist.  Plus they're being
      done at least in drivers/ide and drivers/ata, so we now have code
      duplication.
      
      The one use case for this, as I understand it, is AHCI controllers doing
      PIO mode to mmc devices but translating this to DMA at the controller
      level.
      
      So, what about adding a callback to the block layer that permits the
      adding of the drain buffer for the problem devices?  The idea is that
      you'd do this in slave_configure after you find one of these devices.
      
      The beauty of doing it in the block layer is that it quietly adds the
      drain buffer to the end of the sg list, so it automatically gets mapped
      (and unmapped) without anything unusual having to be done to the
      scatterlist in drivers/scsi or drivers/ata and without any alteration to
      the transfer length.
      Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
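A simplified illustration of "quietly adds the drain buffer to the end of the sg list", as standalone C. The `mock_*` structures and the function are hypothetical, and the sketch assumes the caller has reserved one spare scatterlist slot.

```c
#include <stddef.h>

struct mock_sg {
    void *addr;
    unsigned int length;
};

struct mock_request_queue {
    void *dma_drain_buffer;
    unsigned int dma_drain_size;  /* 0 means no drain buffer configured */
};

/*
 * Append the queue's drain buffer as one extra scatterlist entry.
 * Returns the new entry count. The caller must have room for one more
 * entry in sgl. The request's transfer length is not touched here.
 */
static size_t mock_append_drain(const struct mock_request_queue *q,
                                struct mock_sg *sgl, size_t nents)
{
    if (!q->dma_drain_size)
        return nents;

    sgl[nents].addr = q->dma_drain_buffer;
    sgl[nents].length = q->dma_drain_size;
    return nents + 1;
}
```

Since the extra entry is part of the list, whatever maps and unmaps the list handles the drain buffer along with the data entries, which is the property the message highlights.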
    • io context sharing: preliminary support · d38ecf93
      Jens Axboe committed
      Detach task state from ioc, instead keep track of how many processes
      are accessing the ioc.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • ioprio: move io priority from task_struct to io_context · fd0928df
      Jens Axboe committed
      This is where it belongs and then it doesn't take up space for a
      process that doesn't do IO.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • blk_end_request: cleanup 'uptodate' related code (take 4) · 5450d3e1
      Kiyoshi Ueda committed
      This patch converts 'uptodate' arguments of no longer exported
      interfaces, end_that_request_first/last, to 'error', and removes
      internal conversions for it in blk_end_request interfaces.
      
      Also, this patch removes no longer needed end_io_error().
      
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • blk_end_request: remove/unexport end_that_request_* (take 4) · 3bcddeac
      Kiyoshi Ueda committed
      This patch removes the following functions:
        o end_that_request_first()
        o end_that_request_chunk()
      and stops exporting the functions below:
        o end_that_request_last()
      
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • blk_end_request: add bidi completion interface (take 4) · e3a04fe3
      Kiyoshi Ueda committed
      This patch adds a variant of the interface, blk_end_bidi_request(),
      which completes a bidi request.
      
      A bidi request must be completed as a whole, both rq and rq->next_rq
      at once.  So the interface has 2 arguments for completion size.
      
      As for ->end_io, only rq->end_io is called (rq->next_rq->end_io is not
      called).  So if special completion handling is needed, the handler
      must be set to rq->end_io.
      And the handler must take care of freeing next_rq too, since
      the interface doesn't take care of it if rq->end_io is not NULL.
      
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • blk_end_request: add callback feature (take 4) · e19a3ab0
      Kiyoshi Ueda committed
      This patch adds a variant of the interface, blk_end_request_callback(),
      which has a driver callback feature.
      
      Drivers may need to do special work between end_that_request_first()
      and end_that_request_last().
      For such drivers, blk_end_request_callback() allows them to pass
      a callback function which is called between end_that_request_first()
      and end_that_request_last().
      
      This interface is only a fallback for the other blk_end_request interfaces.
      Drivers should avoid such tricky behaviors and use the other interfaces
      as much as possible.
      
      Currently, only one driver, ide-cd, needs this interface.
      So this interface should/will be removed after the driver removes
      such tricky behaviors.
      
      o ide-cd (cdrom_newpc_intr())
        In PIO mode, cdrom_newpc_intr() needs to defer end_that_request_last()
        until the device clears DRQ_STAT and raises an interrupt after
        end_that_request_first().
        So end_that_request_first() and end_that_request_last() are called
        separately in cdrom_newpc_intr().
      
        This means blk_end_request_callback() has to return without
        completing the request even if there is no leftover in the request.
        To satisfy the requirement, the callback function has a return value
        so that drivers can tell blk_end_request_callback() to return
        without completing the request.
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
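The control flow described in this message can be sketched in userspace C as below. The `mock_*` names, the return convention (0 = completed, 1 = still pending), and the simplified request state are assumptions for illustration; the real interface works on struct request and takes locks that are omitted here.

```c
#include <stddef.h>

struct mock_request {
    unsigned int bytes_left;
    int completed;
};

typedef int (mock_drv_callback)(struct mock_request *rq);

/* end_that_request_first() analogue: returns 1 while data is left over. */
static int mock_end_first(struct mock_request *rq, unsigned int nr_bytes)
{
    rq->bytes_left -= nr_bytes < rq->bytes_left ? nr_bytes : rq->bytes_left;
    return rq->bytes_left != 0;
}

/* end_that_request_last() analogue. */
static void mock_end_last(struct mock_request *rq)
{
    rq->completed = 1;
}

/* Returns 0 if the request was completed, 1 if it is still pending. */
static int mock_end_request_callback(struct mock_request *rq,
                                     unsigned int nr_bytes,
                                     mock_drv_callback *cb)
{
    if (mock_end_first(rq, nr_bytes))
        return 1;  /* leftover data, not done yet */
    if (cb && cb(rq))
        return 1;  /* driver asked us not to complete the request */
    mock_end_last(rq);
    return 0;
}

/* Hypothetical ide-cd style callback: always defer the final step. */
static int mock_defer_last(struct mock_request *rq)
{
    (void)rq;
    return 1;
}
```

With `mock_defer_last`, a request whose bytes are all accounted for still comes back as pending, which mirrors the "return without completing request even if no leftover" requirement.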
    • blk_end_request: add/export functions to get request size (take 4) · 3b11313a
      Kiyoshi Ueda committed
      This patch adds/exports functions to get the size of request in bytes.
      They are useful because blk_end_request interfaces take bytes
      as a completed I/O size instead of sectors.
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
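A small sketch of why a bytes helper is convenient when completion is expressed in bytes. The structure, the `is_fs` flag, and the helper name are hypothetical; the sketch simply assumes 512-byte sectors for sector-counted requests and a byte count for everything else.

```c
struct mock_request {
    int is_fs;                      /* hypothetical: size tracked in sectors */
    unsigned long hard_nr_sectors;  /* remaining sectors (512 bytes each) */
    unsigned int data_len;          /* byte length for other request types */
};

/* Size of the whole request in bytes, whichever way it is tracked. */
static unsigned int mock_rq_bytes(const struct mock_request *rq)
{
    if (rq->is_fs)
        return (unsigned int)(rq->hard_nr_sectors << 9);
    return rq->data_len;
}
```

A driver completing a whole request could then pass the helper's result as the completed size instead of converting sectors by hand.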
    • blk_end_request: add new request completion interface (take 4) · 336cdb40
      Kiyoshi Ueda committed
      This patch adds 2 new interfaces for request completion:
        o blk_end_request()   : called without queue lock
        o __blk_end_request() : called with queue lock held
      
      blk_end_request takes 'error' as an argument instead of 'uptodate',
      which current end_that_request_* take.
      The meanings of values are below and the value is used when bio is
      completed.
          0 : success
        < 0 : error
      
      Some device drivers call some generic functions below between
      end_that_request_{first/chunk} and end_that_request_last().
        o add_disk_randomness()
        o blk_queue_end_tag()
        o blkdev_dequeue_request()
      These are called in the blk_end_request interfaces as a part of
      generic request completion.
      So all device drivers end up calling the above functions.
      To decide whether to call blkdev_dequeue_request(), blk_end_request
      uses list_empty(&rq->queuelist) (blk_queued_rq() macro is added for it).
      So drivers must re-initialize it using INIT_LIST_HEAD() or similar before
      calling blk_end_request if they use it for their own purposes.
      (Currently, there is no driver which completes a request without
       re-initializing the queuelist after using it.  So rq->queuelist
       can be used for the purpose above.)
      
      "Normal" drivers can be converted to use blk_end_request()
      in a standard way shown below.
      
       a) end_that_request_{chunk/first}
          spin_lock_irqsave()
          (add_disk_randomness(), blk_queue_end_tag(), blkdev_dequeue_request())
          end_that_request_last()
          spin_unlock_irqrestore()
          => blk_end_request()
      
       b) spin_lock_irqsave()
          end_that_request_{chunk/first}
          (add_disk_randomness(), blk_queue_end_tag(), blkdev_dequeue_request())
          end_that_request_last()
          spin_unlock_irqrestore()
          => spin_lock_irqsave()
             __blk_end_request()
             spin_unlock_irqrestore()
      
       c) spin_lock_irqsave()
          (add_disk_randomness(), blk_queue_end_tag(), blkdev_dequeue_request())
          end_that_request_last()
          spin_unlock_irqrestore()
          => blk_end_request()   or   spin_lock_irqsave()
                                      __blk_end_request()
                                      spin_unlock_irqrestore()
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
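The completion semantics above (error is 0 or negative, dequeue only if the request is still queued) can be modelled in standalone C roughly as follows. The list helpers and `mock_*` names are hypothetical userspace stand-ins; locking, disk randomness, and tag handling are left out of the sketch.

```c
#include <stddef.h>

/* Tiny circular doubly linked list, modelled on the kernel's list_head. */
struct mock_list {
    struct mock_list *next, *prev;
};

static void mock_list_init(struct mock_list *l)
{
    l->next = l;
    l->prev = l;
}

static int mock_list_empty(const struct mock_list *l)
{
    return l->next == l;
}

static void mock_list_add(struct mock_list *n, struct mock_list *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

static void mock_list_del_init(struct mock_list *l)
{
    l->prev->next = l->next;
    l->next->prev = l->prev;
    mock_list_init(l);
}

struct mock_request {
    struct mock_list queuelist;
    unsigned int bytes_left;
    int error;      /* 0: success, < 0: error */
    int completed;
};

/* Returns 0 if the request was completed, 1 if data is left over. */
static int mock_end_request(struct mock_request *rq, int error,
                            unsigned int nr_bytes)
{
    if (nr_bytes < rq->bytes_left) {
        rq->bytes_left -= nr_bytes;
        return 1;
    }
    rq->bytes_left = 0;

    /* blk_queued_rq() analogue: dequeue only if still on a queue. */
    if (!mock_list_empty(&rq->queuelist))
        mock_list_del_init(&rq->queuelist);

    rq->error = error;
    rq->completed = 1;
    return 0;
}
```

The queued check is why a driver that borrows `queuelist` for its own bookkeeping has to re-initialize it first: a stale non-empty list would be mistaken for "still queued".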
    • block: allow queue dma_alignment of zero · 482eb689
      Pete Wyckoff committed
      Let queue_dma_alignment return 0 if it was specifically set to 0.
      This permits devices with no particular alignment restrictions to
      use arbitrary user space buffers without copying.
      Signed-off-by: Pete Wyckoff <pw@osc.edu>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
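A possible shape of the change, sketched in standalone C. The `mock_*` names are hypothetical, and the default mask of 511 is an assumption of this sketch; the point is that an explicit 0 is returned as 0 instead of being treated as "unset".

```c
#include <stddef.h>
#include <stdint.h>

#define MOCK_DEFAULT_DMA_ALIGNMENT 511u  /* assumed default mask */

struct mock_request_queue {
    unsigned int dma_alignment;  /* alignment mask; 0 means unrestricted */
};

/* The default is applied once at init, so a later 0 is a real setting. */
static void mock_queue_init(struct mock_request_queue *q)
{
    q->dma_alignment = MOCK_DEFAULT_DMA_ALIGNMENT;
}

/* Returns the stored mask as-is, including 0; default only without a queue. */
static unsigned int mock_queue_dma_alignment(const struct mock_request_queue *q)
{
    return q ? q->dma_alignment : MOCK_DEFAULT_DMA_ALIGNMENT;
}

/* A user buffer can be used without copying if it satisfies the mask. */
static int mock_can_map_directly(const struct mock_request_queue *q,
                                 uintptr_t addr, size_t len)
{
    unsigned int mask = mock_queue_dma_alignment(q);

    return ((addr | len) & mask) == 0;
}
```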
  16. 27 Jan, 2008 1 commit