1. 27 7月, 2011 1 次提交
  2. 26 7月, 2011 1 次提交
    • J
      block: fix warning with calling smp_processor_id() in preemptible section · 11ccf116
      Jens Axboe 提交于
      After commit 5757a6d7 introduced an unsafe calling of
      smp_processor_id(), with preempt debuggin turned on we spew a lot of:
      
      BUG: using smp_processor_id() in preemptible [00000000] code: kjournald/514
      caller is __make_request+0x1b8/0x308
      [<c0019f44>] (unwind_backtrace+0x0/0xe8) from [<c024b4cc>] (debug_smp_processor_id+0xbc/0xf0)
      [<c024b4cc>] (debug_smp_processor_id+0xbc/0xf0) from [<c0223d14>] (__make_request+0x1b8/0x308)
      [<c0223d14>] (__make_request+0x1b8/0x308) from [<c02215ac>] (generic_make_request+0x4dc/0x558)
      [<c02215ac>] (generic_make_request+0x4dc/0x558) from [<c022173c>] (submit_bio+0x114/0x138)
      [<c022173c>] (submit_bio+0x114/0x138) from [<c011f504>] (submit_bh+0x148/0x16c)
      [<c011f504>] (submit_bh+0x148/0x16c) from [<c0121ed8>] (__sync_dirty_buffer+0x88/0xd8)
      [<c0121ed8>] (__sync_dirty_buffer+0x88/0xd8) from [<c01aff78>] (journal_commit_transaction+0x1198/0x1688)
      [<c01aff78>] (journal_commit_transaction+0x1198/0x1688) from [<c01b4034>] (kjournald+0xb4/0x224)
      [<c01b4034>] (kjournald+0xb4/0x224) from [<c0069ea0>] (kthread+0x8c/0x94)
      [<c0069ea0>] (kthread+0x8c/0x94) from [<c00137f8>] (kernel_thread_exit+0x0/0x8)
      
      Fix this by just using raw_smp_processor_id(), it's just a hint
      after all. There's no pinning of the CPU or accessing per-cpu
      structures involved.
      Reported-by: NMing Lei <tom.leiming@gmail.com>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      11ccf116
  3. 24 7月, 2011 2 次提交
  4. 22 7月, 2011 1 次提交
    • J
      [SCSI] fix crash in scsi_dispatch_cmd() · bfe159a5
      James Bottomley 提交于
      USB surprise removal of sr is triggering an oops in
      scsi_dispatch_command().  What seems to be happening is that USB is
      hanging on to a queue reference until the last close of the upper
      device, so the crash is caused by surprise remove of a mounted CD
      followed by attempted unmount.
      
      The problem is that USB doesn't issue its final commands as part of
      the SCSI teardown path, but on last close when the block queue is long
      gone.  The long term fix is probably to make sr do the teardown in the
      same way as sd (so remove all the lower bits on ejection, but keep the
      upper disk alive until last close of user space).  However, the
      current oops can be simply fixed by not allowing any commands to be
      sent to a dead queue.
      
      Cc: stable@kernel.org
      Signed-off-by: NJames Bottomley <JBottomley@Parallels.com>
      bfe159a5
  5. 21 7月, 2011 1 次提交
  6. 12 7月, 2011 4 次提交
    • S
      CFQ: add think time check for group · 7700fc4f
      Shaohua Li 提交于
      Currently when the last queue of a group has no request, we don't expire
      the queue to hope request from the group comes soon, so the group doesn't
      miss its share. But if the think time is big, the assumption isn't correct
      and we just waste bandwidth. In such case, we don't do idle.
      
      [global]
      runtime=30
      direct=1
      
      [test1]
      cgroup=test1
      cgroup_weight=1000
      rw=randread
      ioengine=libaio
      size=500m
      runtime=30
      directory=/mnt
      filename=file1
      thinktime=9000
      
      [test2]
      cgroup=test2
      cgroup_weight=1000
      rw=randread
      ioengine=libaio
      size=500m
      runtime=30
      directory=/mnt
      filename=file2
      
      	patched		base
      test1	64k		39k
      test2	548k		540k
      total	604k		578k
      
      group1 gets much better throughput because it waits less time.
      
      To check if the patch changes behavior of queue without think time. I also
      tried to give test1 2ms think time or no think time. The test result is stable.
      The thoughput doesn't change with/without the patch.
      Signed-off-by: NShaohua Li <shaohua.li@intel.com>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      7700fc4f
    • S
      CFQ: add think time check for service tree · f5f2b6ce
      Shaohua Li 提交于
      Currently when the last queue of a service tree has no request, we don't
      expire the queue to hope request from the service tree comes soon, so the
      service tree doesn't miss its share. But if the think time is big, the
      assumption isn't correct and we just waste bandwidth. In such case, we
      don't do idle.
      
      [global]
      runtime=10
      direct=1
      
      [test1]
      rw=randread
      ioengine=libaio
      size=500m
      directory=/mnt
      filename=file1
      thinktime=9000
      
      [test2]
      rw=read
      ioengine=libaio
      size=1G
      directory=/mnt
      filename=file2
      
      	patched		base
      test1	41k/s		33k/s
      test2	15868k/s	15789k/s
      total	15902k/s	15817k/s
      
      A slightly better
      
      To check if the patch changes behavior of queue without think time. I also
      tried to give test1 2ms think time or no think time. The test has variation
      even without the patch, but the average throughput doesn't change with/without
      the patch.
      Signed-off-by: NShaohua Li <shaohua.li@intel.com>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      f5f2b6ce
    • S
      CFQ: move think time check variables to a separate struct · 383cd721
      Shaohua Li 提交于
      Move the variables to do think time check to a sepatate struct. This is
      to prepare adding think time check for service tree and group. No
      functional change.
      Signed-off-by: NShaohua Li <shaohua.li@intel.com>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      383cd721
    • J
      fixlet: Remove fs_excl from struct task. · 4aede84b
      Justin TerAvest 提交于
      fs_excl is a poor man's priority inheritance for filesystems to hint to
      the block layer that an operation is important. It was never clearly
      specified, not widely adopted, and will not prevent starvation in many
      cases (like across cgroups).
      
      fs_excl was introduced with the time sliced CFQ IO scheduler, to
      indicate when a process held FS exclusive resources and thus needed
      a boost.
      
      It doesn't cover all file systems, and it was never fully complete.
      Lets kill it.
      Signed-off-by: NJustin TerAvest <teravest@google.com>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      4aede84b
  7. 11 7月, 2011 1 次提交
  8. 08 7月, 2011 1 次提交
    • S
      block: avoid building too big plug list · 55c022bb
      Shaohua Li 提交于
      When I test fio script with big I/O depth, I found the total throughput drops
      compared to some relative small I/O depth. The reason is the thread accumulates
      big requests in its plug list and causes some delays (surely this depends
      on CPU speed).
      I thought we'd better have a threshold for requests. When a threshold reaches,
      this means there is no request merge and queue lock contention isn't severe
      when pushing per-task requests to queue, so the main advantages of blk plug
      don't exist. We can force a plug list flush in this case.
      With this, my test throughput actually increases and almost equals to small
      I/O depth. Another side effect is irq off time decreases in blk_flush_plug_list()
      for big I/O depth.
      The BLK_MAX_REQUEST_COUNT is choosen arbitarily, but 16 is efficiently to
      reduce lock contention to me. But I'm open here, 32 is ok in my test too.
      Signed-off-by: NShaohua Li <shaohua.li@intel.com>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      55c022bb
  9. 07 7月, 2011 1 次提交
    • M
      block: eliminate potential for infinite loop in blkdev_issue_discard · 0f799603
      Mike Snitzer 提交于
      Due to the recently identified overflow in read_capacity_16() it was
      possible for max_discard_sectors to be zero but still have discards
      enabled on the associated device's queue.
      
      Eliminate the possibility for blkdev_issue_discard to infinitely loop.
      
      Interestingly this issue wasn't identified until a device, whose
      discard_granularity was 0 due to read_capacity_16 overflow, was consumed
      by blk_stack_limits() to construct limits for a higher-level DM
      multipath device.  The multipath device's resulting limits never had the
      discard limits stacked because blk_stack_limits() will only do so if
      the bottom device's discard_granularity != 0.  This resulted in the
      multipath device's limits.max_discard_sectors being 0.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      0f799603
  10. 02 7月, 2011 1 次提交
    • J
      compat_ioctl: fix warning caused by qemu · 390192b3
      Johannes Stezenbach 提交于
      On Linux x86_64 host with 32bit userspace, running
      qemu or even just "qemu-img create -f qcow2 some.img 1G"
      causes a kernel warning:
      
      ioctl32(qemu-img:5296): Unknown cmd fd(3) cmd(00005326){t:'S';sz:0} arg(7fffffff) on some.img
      ioctl32(qemu-img:5296): Unknown cmd fd(3) cmd(801c0204){t:02;sz:28} arg(fff77350) on some.img
      
      ioctl 00005326 is CDROM_DRIVE_STATUS,
      ioctl 801c0204 is FDGETPRM.
      
      The warning appears because the Linux compat-ioctl handler for these
      ioctls only applies to block devices, while qemu also uses the ioctls on
      plain files.
      Signed-off-by: NJohannes Stezenbach <js@sig21.net>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      390192b3
  11. 01 7月, 2011 1 次提交
    • T
      block: flush MEDIA_CHANGE from drivers on close(2) · 85ef06d1
      Tejun Heo 提交于
      Currently, only open(2) is defined as the 'clearing' point.  It has
      two roles - first, it's an acknowledgement from userland indicating
      that the event has been received and kernel can clear pending states
      and proceed to generate more events.  Secondly, it's passed on to
      device drivers as a hint indicating that a synchronization point has
      been reached and it might want to take a deeper look at the device.
      
      The latter currently is only used by sr which uses two different
      mechanisms - GET_EVENT_MEDIA_STATUS_NOTIFICATION and TEST_UNIT_READY
      to discover events, where the former is lighter weight and safe to be
      used repeatedly but may not provide full coverage.  Among other
      things, GET_EVENT can't detect media removal while TUR can.
      
      This patch makes close(2) - blkdev_put() - indicate clearing hint for
      MEDIA_CHANGE to drivers.  disk_check_events() is renamed to
      disk_flush_events() and updated to take @mask for events to flush
      which is or'd to ev->clearing and will be passed to the driver on the
      next ->check_events() invocation.
      
      This change makes sr generate MEDIA_CHANGE when media is ejected from
      userland - e.g. with eject(1).
      
      Note: Given the current usage, it seems @clearing hint is needlessly
      complex.  disk_clear_events() can simply clear all events and the hint
      can be boolean @flush.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      85ef06d1
  12. 27 6月, 2011 2 次提交
  13. 20 6月, 2011 3 次提交
  14. 14 6月, 2011 2 次提交
  15. 13 6月, 2011 2 次提交
  16. 10 6月, 2011 3 次提交
  17. 06 6月, 2011 4 次提交
  18. 03 6月, 2011 1 次提交
    • J
      iosched: prevent aliased requests from starving other I/O · 796d5116
      Jeff Moyer 提交于
      Hi, Jens,
      
      If you recall, I posted an RFC patch for this back in July of last year:
      http://lkml.org/lkml/2010/7/13/279
      
      The basic problem is that a process can issue a never-ending stream of
      async direct I/Os to the same sector on a device, thus starving out
      other I/O in the system (due to the way the alias handling works in both
      cfq and deadline).  The solution I proposed back then was to start
      dispatching from the fifo after a certain number of aliases had been
      dispatched.  Vivek asked why we had to treat aliases differently at all,
      and I never had a good answer.  So, I put together a simple patch which
      allows aliases to be added to the rb tree (it adds them to the right,
      though that doesn't matter as the order isn't guaranteed anyway).  I
      think this is the preferred solution, as it doesn't break up time slices
      in CFQ or batches in deadline.  I've tested it, and it does solve the
      starvation issue.  Let me know what you think.
      
      Cheers,
      Jeff
      Signed-off-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      796d5116
  19. 02 6月, 2011 2 次提交
  20. 01 6月, 2011 1 次提交
  21. 27 5月, 2011 4 次提交
  22. 24 5月, 2011 1 次提交