1. 22 4月, 2009 2 次提交
    • T
      block: fix queue bounce limit setting · cd0aca2d
      Tejun Heo 提交于
      Impact: don't set GFP_DMA in q->bounce_gfp unnecessarily
      
      All DMA address limits are expressed in terms of the last addressable
      unit (byte or page) instead of one plus that.  However, when
      determining bounce_gfp for 64bit machines in blk_queue_bounce_limit(),
      it compares the specified limit against 0x100000000UL to determine
      whether it's below 4G ending up falsely setting GFP_DMA in
      q->bounce_gfp.
      
      As DMA zone is very small on x86_64, this makes larger SG_IO transfers
      very eager to trigger OOM killer.  Fix it.  While at it, rename the
      parameter to @dma_mask for clarity and convert comment to proper
      winged style.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      cd0aca2d
    • T
      block: fix SG_IO vector request data length handling · 25636e28
      Tejun Heo 提交于
      Impact: fix SG_IO behavior such that it matches the documentation
      
      SG_IO howto says that if ->dxfer_len and sum of iovec disagress, the
      shorter one wins.  However, the current implementation returns -EINVAL
      for such cases.  Trim iovc if it's longer than ->dxfer_len.
      
      This patch uses iov_*() helpers which take struct iovec * by casting
      struct sg_iovec * to it.  sg_iovec is always identical to iovec and
      this will be further cleaned up with later patches.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      25636e28
  2. 15 4月, 2009 11 次提交
  3. 07 4月, 2009 7 次提交
  4. 06 4月, 2009 3 次提交
  5. 03 4月, 2009 1 次提交
    • L
      blktrace: fix pdu_len when tracing packet command requests · e2494e1b
      Li Zefan 提交于
      Impact: output all of packet commands - not just the first 4 / 8 bytes
      
      Since commit d7e3c324 ("block: add
      large command support"), struct request->cmd has been changed from
      unsinged char cmd[BLK_MAX_CDB] to unsigned char *cmd.
      
      v1 -> v2: by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      
      - make sure rq->cmd_len is always intialized, and then we can use
        rq->cmd_len instead of BLK_MAX_CDB.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      Acked-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      LKML-Reference: <49D4507E.2060602@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e2494e1b
  6. 26 3月, 2009 2 次提交
  7. 24 3月, 2009 3 次提交
  8. 13 3月, 2009 2 次提交
  9. 06 3月, 2009 1 次提交
  10. 26 2月, 2009 2 次提交
    • J
      block: reduce stack footprint of blk_recount_segments() · 1e428079
      Jens Axboe 提交于
      blk_recalc_rq_segments() requires a request structure passed in, which
      we don't have from blk_recount_segments(). So the latter allocates one on
      the stack, using > 400 bytes of stack for that. This can cause us to spill
      over one page of stack from ext4 at least:
      
       0)     4560     400   blk_recount_segments+0x43/0x62
       1)     4160      32   bio_phys_segments+0x1c/0x24
       2)     4128      32   blk_rq_bio_prep+0x2a/0xf9
       3)     4096      32   init_request_from_bio+0xf9/0xfe
       4)     4064     112   __make_request+0x33c/0x3f6
       5)     3952     144   generic_make_request+0x2d1/0x321
       6)     3808      64   submit_bio+0xb9/0xc3
       7)     3744      48   submit_bh+0xea/0x10e
       8)     3696     368   ext4_mb_init_cache+0x257/0xa6a [ext4]
       9)     3328     288   ext4_mb_regular_allocator+0x421/0xcd9 [ext4]
      10)     3040     160   ext4_mb_new_blocks+0x211/0x4b4 [ext4]
      11)     2880     336   ext4_ext_get_blocks+0xb61/0xd45 [ext4]
      12)     2544      96   ext4_get_blocks_wrap+0xf2/0x200 [ext4]
      13)     2448      80   ext4_da_get_block_write+0x6e/0x16b [ext4]
      14)     2368     352   mpage_da_map_blocks+0x7e/0x4b3 [ext4]
      15)     2016     352   ext4_da_writepages+0x2ce/0x43c [ext4]
      16)     1664      32   do_writepages+0x2d/0x3c
      17)     1632     144   __writeback_single_inode+0x162/0x2cd
      18)     1488      96   generic_sync_sb_inodes+0x1e3/0x32b
      19)     1392      16   sync_sb_inodes+0xe/0x10
      20)     1376      48   writeback_inodes+0x69/0xb3
      21)     1328     208   balance_dirty_pages_ratelimited_nr+0x187/0x2f9
      22)     1120     224   generic_file_buffered_write+0x1d4/0x2c4
      23)      896     176   __generic_file_aio_write_nolock+0x35f/0x393
      24)      720      80   generic_file_aio_write+0x6c/0xc8
      25)      640      80   ext4_file_write+0xa9/0x137 [ext4]
      26)      560     320   do_sync_write+0xf0/0x137
      27)      240      48   vfs_write+0xb3/0x13c
      28)      192      64   sys_write+0x4c/0x74
      29)      128     128   system_call_fastpath+0x16/0x1b
      
      Split the segment counting out into a __blk_recalc_rq_segments() helper
      to avoid allocating an onstack request just for checking the physical
      segment count.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      1e428079
    • M
      block: add documentation for register_blkdev() · 9e8c0bcc
      Márton Németh 提交于
      Add documentation for register_blkdev() function and for the parameters.
      Signed-off-by: NMárton Németh <nm127@freemail.hu>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      9e8c0bcc
  11. 25 2月, 2009 1 次提交
    • P
      generic-ipi: remove CSD_FLAG_WAIT · 6e275637
      Peter Zijlstra 提交于
      Oleg noticed that we don't strictly need CSD_FLAG_WAIT, rework
      the code so that we can use CSD_FLAG_LOCK for both purposes.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6e275637
  12. 20 2月, 2009 1 次提交
  13. 18 2月, 2009 4 次提交
    • H
      block: fix deadlock in blk_abort_queue() for drivers that readd to timeout list · be987fdb
      Hannes Reinecke 提交于
      blk_abort_queue() iterates the timeout list and aborts each request on the
      list, but if the driver error handling readds a request to the timeout list
      during this processing, we could be looping forever. Fix this by splicing
      current entries to a local list and run over that list instead.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      be987fdb
    • N
      block: fix booting from partitioned md array · 41b8c853
      Neil Brown 提交于
      Hi Tejun,
      
       it looks like your commit:
      
         block: don't depend on consecutive minor space
         f331c029
      
       broke a particular case for booting from partitioned md/raid devices.
       That is the second time this has been broken recently.  The previous
       time was fixed by
      
         block: do_mounts - accept root=<non-existant partition>
         30f2f0eb
      
       Because the data isn't available when an md device is first created
       (we add disks and set it up after creation), the initial partition
       scan finds nothing.  It is not until the device is opened that
       another partition scan happens and finds something.
      
       So at the point where the kernel parameter "root=/dev/md_d0p1" is
       being parsed, md_d0 exists, but md_d0p1 does not.
       However if we let blk_lookup_devt return the correct device number
       even though the device doesn't exist, then the attempt to mount it
       will successfully find the partition.
      
       I have tried in the past to find a way to get the partition table to
       be read as soon as the array is assembled but that proved impossible
       (at the time).  I don't remember the details, and could possibly
       revisit it.  However it would be really nice if blk_lookup_devt
       could be adjusted to again accept non existant partitions.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      41b8c853
    • J
      block: fix bad definition of BIO_RW_SYNC · 93dbb393
      Jens Axboe 提交于
      We can't OR shift values, so get rid of BIO_RW_SYNC and use BIO_RW_SYNCIO
      and BIO_RW_UNPLUG explicitly. This brings back the behaviour from before
      213d9417.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      93dbb393
    • B
      bsg: Fix sense buffer bug in SG_IO · c1c20120
      Boaz Harrosh 提交于
      When submitting requests via SG_IO, which does a sync io, a
      bsg_command is not allocated. So an in-Kernel sense_buffer was not
      set. However when calling blk_execute_rq() with no sense buffer
      one is provided from the stack. Now bsg at blk_complete_sgv4_hdr_rq()
      would check if rq->sense_len and a sense was requested by sg_io_v4
      the rq->sense was copy_user() back, but by now it is already mangled
      stack memory.
      
      I have fixed that by forcing a sense_buffer when calling bsg_map_hdr().
      The bsg_command->sense is provided in the write/read path like before,
      and on-the-stack buffer is provided when doing SG_IO.
      
      I have also fixed a dprintk message to print rq->errors in hex because
      of the scsi bit-field use of this member. For other block devices it
      does not matter anyway.
      Signed-off-by: NBoaz Harrosh <bharrosh@panasas.com>
      Acked-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      c1c20120