1. 29 Dec, 2008 9 commits
  2. 06 Dec, 2008 1 commit
  3. 04 Dec, 2008 2 commits
  4. 03 Dec, 2008 4 commits
    • M
      block: fix setting of max_segment_size and seg_boundary mask · 0e435ac2
      Milan Broz committed
      Fix setting of max_segment_size and seg_boundary mask for stacked md/dm
      devices.
      
      When stacking devices (LVM over MD over SCSI) some of the request queue
      parameters are not set up correctly in some cases by default, namely
      max_segment_size and seg_boundary mask.
      
      If you create an MD device over SCSI, these attributes are zeroed.
      
      The problem appears when another device-mapper mapping is stacked on top
      of this mapping - queue attributes are set in DM this way:
      
      request_queue   max_segment_size  seg_boundary_mask
      SCSI                65536             0xffffffff
      MD RAID1                0                      0
      LVM                 65536                 -1 (64bit)
      
      Unfortunately bio_add_page (resp.  bio_phys_segments) calculates the number
      of physical segments according to these parameters.
      
      During generic_make_request() the segment count is recalculated and can
      increase the bio->bi_phys_segments count over the allowed limit.  (After
      bio_clone() in the stack operation.)
      
      This is especially a problem in the CCISS driver, where it produces an OOPS here:
      
          BUG_ON(creq->nr_phys_segments > MAXSGENTRIES);
      
      (MAXSGENTRIES is 31 by default.)
      
      Sometimes even this command is enough to cause an oops:
      
        dd iflag=direct if=/dev/<vg>/<lv> of=/dev/null bs=128000 count=10
      
      This command generates bios with 250 sectors, allocated in 32 4k-pages
      (last page uses only 1024 bytes).
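
The figures quoted above follow from simple arithmetic on the dd block size. A minimal sketch, assuming 512-byte sectors, 4096-byte pages and a page-aligned buffer (the helper names are hypothetical and exist only for this illustration):

```c
#define SKETCH_SECTOR_SIZE 512u   /* assumed bytes per sector */
#define SKETCH_PAGE_SIZE   4096u  /* assumed bytes per page */

/* Number of 512-byte sectors covered by a transfer of `bytes` bytes. */
unsigned int sketch_sectors_for(unsigned int bytes)
{
    return bytes / SKETCH_SECTOR_SIZE;
}

/* Number of pages needed to hold `bytes` bytes, rounding up. */
unsigned int sketch_pages_for(unsigned int bytes)
{
    return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/* Bytes used in the final page, for bytes > 0 (a full page if aligned). */
unsigned int sketch_last_page_bytes(unsigned int bytes)
{
    unsigned int rem = bytes % SKETCH_PAGE_SIZE;
    return rem ? rem : SKETCH_PAGE_SIZE;
}
```

With bs=128000 this gives 250 sectors and 32 pages, with 1024 bytes used in the last page, matching the numbers in the commit message.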
      
      For the LVM layer, it allocates a bio with 31 segments (still OK for CCISS),
      unfortunately on the lower layer it is recalculated to 32 segments and this
      violates the CCISS restriction and triggers BUG_ON().
      
      The patch tries to fix it by:
      
       * initializing the attributes above in the request queue constructor
         blk_queue_make_request()
      
       * making sure that blk_queue_stack_limits() inherits the settings
      
       (DM uses its own function to set the limits because
       blk_queue_stack_limits() was introduced later.  It should probably switch
       to the generic stack limit function too.)
      
       * setting the default seg_boundary value in one place (blkdev.h)
      
       * using this mask as the default in DM (instead of -1, which differs on
         64bit)
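
The idea behind the first two items can be sketched in userspace C. This is not the kernel code: the struct, the sketch_* names and the default constants are assumptions for illustration (the defaults simply mirror the SCSI row of the table above). Every queue starts from sane defaults, and stacking keeps the more restrictive value of each limit.

```c
/* Hypothetical stand-in for the two request queue fields discussed above. */
struct queue_limits_sketch {
    unsigned int  max_segment_size;
    unsigned long seg_boundary_mask;
};

/* Assumed defaults, taken from the SCSI row of the table. */
#define SKETCH_MAX_SEGMENT_SIZE  65536u
#define SKETCH_SEG_BOUNDARY_MASK 0xffffffffUL

/* Constructor step: never leave the limits zeroed. */
void sketch_init_limits(struct queue_limits_sketch *q)
{
    q->max_segment_size = SKETCH_MAX_SEGMENT_SIZE;
    q->seg_boundary_mask = SKETCH_SEG_BOUNDARY_MASK;
}

/* Stacking step: the top queue inherits the stricter (smaller) limits. */
void sketch_stack_limits(struct queue_limits_sketch *top,
                         const struct queue_limits_sketch *bottom)
{
    if (bottom->max_segment_size < top->max_segment_size)
        top->max_segment_size = bottom->max_segment_size;
    if (bottom->seg_boundary_mask < top->seg_boundary_mask)
        top->seg_boundary_mask = bottom->seg_boundary_mask;
}
```

Because both queues are initialized before stacking, an intermediate layer no longer propagates zeroed limits upwards in this model.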
      
      Bugs related to this:
      https://bugzilla.redhat.com/show_bug.cgi?id=471639
      http://bugzilla.kernel.org/show_bug.cgi?id=8672
      Signed-off-by: Milan Broz <mbroz@redhat.com>
      Reviewed-by: Alasdair G Kergon <agk@redhat.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Tejun Heo <htejun@gmail.com>
      Cc: Mike Miller <mike.miller@hp.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • T
      block: internal dequeue shouldn't start timer · 53a08807
      Tejun Heo committed
      blkdev_dequeue_request() and elv_dequeue_request() are equivalent and
      both start the timeout timer.  Barrier code dequeues the original
      barrier request but doesn't pass the request itself to the lower level
      driver, only broken down proxy requests; however, the original
      barrier request goes through the same dequeue path, so the timeout timer
      is started on it.  If the barrier sequence takes long enough, this timer
      expires but the low level driver has no idea about this request and
      an oops follows.
      
      The timeout timer shouldn't have been started on the original barrier
      request as it never goes through actual IO.  This patch unexports
      elv_dequeue_request(), which has no external user anyway, and makes it
      operate on the elevator proper w/o adding the timer, and makes
      blkdev_dequeue_request() call elv_dequeue_request() and add the timer.
      Internal users which don't pass the request to the driver - barrier code
      and end_that_request_last() - are converted to use
      elv_dequeue_request().
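
The split described above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: the struct and the sketch_* functions are hypothetical stand-ins, not the kernel's request structure or dequeue functions.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced model of a block request. */
struct sketch_request {
    bool on_queue;     /* still sitting on the elevator queue */
    bool timer_armed;  /* timeout timer has been started */
};

/* Internal dequeue: only takes the request off the queue, no timer. */
void sketch_elv_dequeue(struct sketch_request *rq)
{
    rq->on_queue = false;
}

/* Driver-facing dequeue: the request is about to be issued, so arm the timer. */
void sketch_blkdev_dequeue(struct sketch_request *rq)
{
    sketch_elv_dequeue(rq);
    rq->timer_armed = true;
}
```

In this model a barrier request dequeued through sketch_elv_dequeue() never gets a timer, so nothing can expire on a request the driver has not seen.
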
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Mike Anderson <andmike@linux.vnet.ibm.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • C
      block: set disk->node_id before it's being used · bf91db18
      Cheng Renquan committed
      disk->node_id is referenced during allocation in disk_expand_part_tbl, so
      we should set it before it is referenced.
      Signed-off-by: Cheng Renquan <crquan@gmail.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • P
      When block layer fails to map iov, it calls bio_unmap_user to undo · 53cc0b29
      Petr Vandrovec committed
      mapping.  That is fine if the pages were mapped - but if they were provided
      by someone else and just copied, then bad things happen - the pages are
      released once here and once by the caller, leading to a user-triggerable
      BUG at include/linux/mm.h:246.
      Signed-off-by: Petr Vandrovec <petr@vandrovec.name>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  5. 26 Nov, 2008 2 commits
  6. 18 Nov, 2008 3 commits
    • J
      block: hold extra reference to bio in blk_rq_map_user_iov() · c26156b2
      Jens Axboe committed
      If the size passed in is OK but we end up mapping too many segments,
      we call the unmap path directly, as IO completion does. But on IO
      completion we hold an extra reference to the bio, so this error case
      goes OOPS when it attempts to free an already freed bio.
      
      Fix it by getting an extra reference to the bio before calling the
      unmap failure case.
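
The reference counting involved can be illustrated with a small userspace model. This is a sketch, not the kernel code: the struct and the sketch_* functions are hypothetical, and the unmap path is reduced to "drops one reference", which is the only property that matters here.

```c
/* Hypothetical, reduced model of a reference-counted bio. */
struct sketch_bio {
    int refcount;  /* number of holders */
    int freed;     /* set once the last reference is dropped */
};

void sketch_bio_get(struct sketch_bio *b)
{
    b->refcount++;
}

void sketch_bio_put(struct sketch_bio *b)
{
    if (--b->refcount == 0)
        b->freed = 1;
}

/* Modelled unmap path: drops one reference, as it would on IO completion. */
void sketch_unmap(struct sketch_bio *b)
{
    sketch_bio_put(b);
}

/* Error path with the fix: take an extra reference before unmapping.
 * Returns 1 if the bio was still alive after the unmap call. */
int sketch_map_failure(struct sketch_bio *b)
{
    int alive;

    sketch_bio_get(b);   /* the extra reference */
    sketch_unmap(b);     /* drops one reference, bio survives */
    alive = !b->freed;
    sketch_bio_put(b);   /* final put frees it exactly once */
    return alive;
}
```

Without the sketch_bio_get() call, the unmap step would already drop the count to zero and the final put would touch a freed object, which corresponds to the oops described in the commit message.
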
      Reported-by: Petr Vandrovec <vandrove@vc.cvut.cz>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • Z
      block: fix boot failure with CONFIG_DEBUG_BLOCK_EXT_DEVT=y and nash · 561ec68e
      Zhang, Yanmin committed
      We ran into a system boot failure with kernel 2.6.28-rc. We found it on a
      couple of machines, including a T61 notebook, a nehalem machine, and another
      HPC NX6325 notebook.  All the machines use FedoraCore 8 or FedoraCore 9.
      With kernels prior to 2.6.28-rc, system boot doesn't fail.
      
      I debugged it and located the root cause. Please see
      http://bugzilla.kernel.org/show_bug.cgi?id=11899
      https://bugzilla.redhat.com/show_bug.cgi?id=471517
      
      As a matter of fact, there are 2 bugs.
      
      1) root=/dev/sda1: system boot fails randomly, mostly about once in 5
      boots. nash has a bug. Some of its functions misuse return
      value 0.  Sometimes, 0 means timeout and no uevent available. Sometimes,
      0 means nash got a uevent, but the uevent isn't block-related (for
      example, usb). If by coincidence the kernel tells nash that uevents are
      available but also sets the timeout, nash might stop collecting
      other uevents in the queue if the current uevent isn't block-related.  I
      worked out a patch for nash to fix it.
      http://bugzilla.kernel.org/attachment.cgi?id=18858
      
      2) root=LABEL=/: the system can never boot. initrd init reports that
      switchroot fails. Here is an execution path of nash when booting:
          (1) nash reads /sys/block/sda/dev; assume the major is 8 (on my desktop)
          (2) nash queries /proc/devices with the major number; it finds the line
      	"8 sd";
          (3) nash uses 'sd' to search its own probe table to find the device (DISK)
      	type for the device and adds it to its own list;
          (4) Later on, it probes all devices in its list to get filesystem
      	labels; scsi always registers "8 sd".
      
      When the major is 259, nash fails to find the device (DISK) type. I enabled
      CONFIG_DEBUG_BLOCK_EXT_DEVT=y when compiling the kernel, so 259 is picked
      for device /dev/sda1, which causes nash to fail to find the device (DISK)
      type.
      
      To fix issue 2), I created a patch for nash and another patch for the
      kernel.
      
      http://bugzilla.kernel.org/attachment.cgi?id=18859
      http://bugzilla.kernel.org/attachment.cgi?id=18837
      
      Below is the patch for kernel 2.6.28-rc4. It registers blkext, a new
      block device, in /proc/devices.
      
      With 2 patches on nash and 1 patch on the kernel, I booted my machines
      dozens of times without failure.
      
      Signed-off-by: Zhang Yanmin <yanmin.zhang@linux.intel.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • T
      block: make add_partition() return pointer to hd_struct · ba32929a
      Tejun Heo committed
      Make add_partition() return a pointer to the new hd_struct on success
      and an ERR_PTR() value on failure.  This change will be used to fix an md
      autodetection bug.
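
The pointer-or-error convention mentioned above can be imitated in userspace C. This is a sketch under assumptions: the sketch_* helpers and struct sketch_part are hypothetical and only mimic the idea of encoding a negative errno in the top of the address range; they are not the kernel's add_partition() or hd_struct.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_ERRNO 4095  /* assumed size of the reserved error range */

/* Encode a negative errno value as a pointer. */
void *sketch_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* Decode the errno value from an error pointer. */
long sketch_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* True if the pointer lies in the reserved error range. */
int sketch_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical stand-in for the partition structure. */
struct sketch_part {
    int partno;
};

/* Returns the new partition on success, an encoded errno on failure. */
struct sketch_part *sketch_add_partition(int partno)
{
    struct sketch_part *p;

    if (partno < 1)
        return sketch_err_ptr(-EINVAL);
    p = malloc(sizeof(*p));
    if (!p)
        return sketch_err_ptr(-ENOMEM);
    p->partno = partno;
    return p;
}
```

A caller checks the result with sketch_is_err() before using it, which lets one return value carry both the new object and the failure reason.
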
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Neil Brown <neilb@suse.de>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  7. 06 Nov, 2008 4 commits
    • A
      Block: use round_jiffies_up() · 7838c15b
      Alan Stern committed
      This patch (as1159b) changes the timeout routines in the block core to
      use round_jiffies_up().  There's no point in rounding the timer
      deadline down, since if it expires too early we will have to restart
      it.
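
The rounding direction argued for above can be shown with a simplified userspace sketch. It assumes a tick rate of 250 per second and rounds to whole seconds; the function name and the constant are hypothetical, and the sketch ignores counter wraparound and any per-CPU adjustment the real helper may apply.

```c
#define SKETCH_HZ 250UL  /* assumed ticks per second */

/* Round a deadline up to the next whole-second boundary, never down,
 * so the timer cannot fire before the requested deadline. */
unsigned long sketch_round_up_to_second(unsigned long deadline)
{
    unsigned long rem = deadline % SKETCH_HZ;

    return rem ? deadline + (SKETCH_HZ - rem) : deadline;
}
```

Rounding up trades a slightly late expiry for never having to restart a timer that fired too early.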
      
      The patch also removes some unnecessary tests when a request is
      removed from the queue's timer list.
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • M
      blk: move blk_delete_timer call in end_that_request_last · e78042e5
      Mike Anderson committed
      Move the blk_delete_timer call to later in end_that_request_last to
      address an issue where blkdev_dequeue_request may have added a timer for
      the request.
      Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • T
      block: add timer on blkdev_dequeue_request() not elv_next_request() · 2920ebbd
      Tejun Heo committed
      Block queue supports two usage models - one where the block driver peeks
      at the front of the queue using elv_next_request(), processes it and
      finishes it, and the other where the block driver peeks at the front of
      the queue, dequeues the request using blkdev_dequeue_request() and finishes
      it.  The latter is more flexible as it allows the driver to process
      multiple commands concurrently.
      
      These two inconsistent usage models make the block layer
      implementation confusing.  Some consider elv_next_request()
      the issue point while others consider blkdev_dequeue_request() the
      issue point.
      
      Until now the inconsistency mostly affected only accounting, so it didn't
      really break anything seriously; however, with the block layer timeout,
      this inconsistency hits hard.  The block layer considers
      elv_next_request() the issue point and adds the timer, but the SCSI layer
      thinks it was just peeking and, when it can't process the
      command right away, the request is just left there without further
      processing.  This leaves the request dangling on the timer list and, when
      the timer goes off, the request which the SCSI layer and below think is
      still on the block queue ends up in the EH queue, causing various problems
      - EH hang (failed count goes over busy count and EH never wakes up),
      WARN_ON() and oopses as the low level driver tries to handle the unknown
      command, etc., depending on the timing.
      
      As the SCSI midlayer is the only user of the block layer timer at the
      moment, moving blk_add_timer() to elv_dequeue_request() fixes the problem;
      however, these two usage models definitely need to be cleaned up in the
      future.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • F
      43381785
  8. 24 Oct, 2008 1 commit
  9. 23 Oct, 2008 2 commits
  10. 21 Oct, 2008 12 commits