1. 19 Oct, 2011 2 commits
  2. 28 Sep, 2011 1 commit
    • block: Free queue resources at blk_release_queue() · 777eb1bf
      Committed by Hannes Reinecke
      A kernel crash is observed when a mounted ext3/ext4 filesystem is
      physically removed. The problem is that blk_cleanup_queue() frees up
      some resources, e.g. by calling elevator_exit(), which are not checked for
      in normal operation. So we should rather move these calls to the
      destructor function blk_release_queue() as at that point all remaining
      references are gone. However, in doing so we have to ensure that any
      externally supplied queue_lock is disconnected as the driver might free
      up the lock after the call of blk_cleanup_queue().
      Signed-off-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 21 Sep, 2011 1 commit
    • block: document blk-plug · 75df7136
      Committed by Suresh Jayaraman
      Thus spake Andrew Morton:
      
      "And I have the usual maintainability whine.  If someone comes up to
      vmscan.c and sees it calling blk_start_plug(), how are they supposed to
      work out why that call is there?  They go look at the blk_start_plug()
      definition and it is undocumented.  I think we can do better than this?"
      
      Adapted from the LWN article - http://lwn.net/Articles/438256/ by Jens
      Axboe and from an earlier attempt by Shaohua Li to document blk-plug.
      
      [akpm@linux-foundation.org: grammatical and spelling tweaks]
      Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
      Cc: Shaohua Li <shaohua.li@intel.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Andrew Morton <akpm@google.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. 15 Sep, 2011 1 commit
    • block: refactor generic_make_request · 27a84d54
      Committed by Christoph Hellwig
      Move all the checks performed on a bio into a new helper, and call it as
      soon as bio is submitted even if it is a re-submission from ->make_request.
      
      We explicitly mark the new helper as being non-inlined, as the stack
      usage for printing the block device name in the failure case is quite
      high and this is a path where we have to be extremely conservative about
      stack usage.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. 12 Sep, 2011 3 commits
  6. 24 Aug, 2011 2 commits
  7. 16 Aug, 2011 1 commit
    • block: fix flush machinery for stacking drivers with differing flush flags · 4853abaa
      Committed by Jeff Moyer
      Commit ae1b1539, block: reimplement
      FLUSH/FUA to support merge, introduced a performance regression when
      running any sort of fsyncing workload using dm-multipath and certain
      storage (in our case, an HP EVA).  The test I ran was fs_mark, and it
      dropped from ~800 files/sec on ext4 to ~100 files/sec.  It turns out
      that dm-multipath always advertised flush+fua support, and passed
      commands on down the stack, where those flags used to get stripped off.
      The above commit changed that behavior:
      
      static inline struct request *__elv_next_request(struct request_queue *q)
      {
              struct request *rq;
      
              while (1) {
      -               while (!list_empty(&q->queue_head)) {
      +               if (!list_empty(&q->queue_head)) {
                              rq = list_entry_rq(q->queue_head.next);
      -                       if (!(rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) ||
      -                           (rq->cmd_flags & REQ_FLUSH_SEQ))
      -                               return rq;
      -                       rq = blk_do_flush(q, rq);
      -                       if (rq)
      -                               return rq;
      +                       return rq;
                      }
      
      Note that previously, a command would come in here, have
      REQ_FLUSH|REQ_FUA set, and then get handed off to blk_do_flush:
      
      struct request *blk_do_flush(struct request_queue *q, struct request *rq)
      {
              unsigned int fflags = q->flush_flags; /* may change, cache it */
              bool has_flush = fflags & REQ_FLUSH, has_fua = fflags & REQ_FUA;
              bool do_preflush = has_flush && (rq->cmd_flags & REQ_FLUSH);
              bool do_postflush = has_flush && !has_fua && (rq->cmd_flags &
              REQ_FUA);
              unsigned skip = 0;
      ...
              if (blk_rq_sectors(rq) && !do_preflush && !do_postflush) {
                      rq->cmd_flags &= ~REQ_FLUSH;
      		if (!has_fua)
      			rq->cmd_flags &= ~REQ_FUA;
      	        return rq;
      	}
      
      So, the flush machinery was bypassed in such cases (q->flush_flags == 0
      && rq->cmd_flags & (REQ_FLUSH|REQ_FUA)).
      
      Now, however, we don't get into the flush machinery at all.  Instead,
      __elv_next_request just hands a request with flush and fua bits set to
      the scsi_request_fn, even if the underlying request_queue does not
      support flush or fua.
      
      The agreed-upon approach is to fix the flush machinery to allow
      stacking.  While this isn't used in practice (since there is only one
      request-based dm target, and that target will now reflect the flush
      flags of the underlying device), it does future-proof the solution, and
      makes it function as designed.
      
      In order to make this work, I had to add a field to the struct request,
      inside the flush structure (to store the original req->end_io).  Shaohua
      had suggested overloading the union with rb_node and completion_data,
      but the completion data is used by device mapper and can also be used by
      other drivers.  So, I didn't see a way around the additional field.
      
      I tested this patch on an HP EVA with both ext4 and xfs, and it recovers
      the lost performance.  Comments and other testers, as always, are
      appreciated.
      
      Cheers,
      Jeff
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  8. 04 Aug, 2011 1 commit
    • fault-injection: add ability to export fault_attr in arbitrary directory · dd48c085
      Committed by Akinobu Mita
      init_fault_attr_dentries() is used to export fault_attr via debugfs.
      But it can only export it in the debugfs root directory.
      
      Per Forlin is working on mmc_fail_request, which adds support for injecting
      data errors after a completed host transfer in the MMC subsystem.
      
      The fault_attr for mmc_fail_request should be defined per mmc host and
      exported in a per-host debugfs directory such as
      /sys/kernel/debug/mmc0/mmc_fail_request.
      
      init_fault_attr_dentries() doesn't help for mmc_fail_request.  So this
      introduces fault_create_debugfs_attr(), which can create the attribute
      directory under an arbitrary parent directory, and replaces
      init_fault_attr_dentries() with it.
      
      [akpm@linux-foundation.org: extraneous semicolon, per Randy]
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Tested-by: Per Forlin <per.forlin@linaro.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 27 Jul, 2011 1 commit
  10. 26 Jul, 2011 1 commit
    • block: fix warning with calling smp_processor_id() in preemptible section · 11ccf116
      Committed by Jens Axboe
      Commit 5757a6d7 introduced an unsafe call to smp_processor_id();
      with preempt debugging turned on we spew a lot of:
      
      BUG: using smp_processor_id() in preemptible [00000000] code: kjournald/514
      caller is __make_request+0x1b8/0x308
      [<c0019f44>] (unwind_backtrace+0x0/0xe8) from [<c024b4cc>] (debug_smp_processor_id+0xbc/0xf0)
      [<c024b4cc>] (debug_smp_processor_id+0xbc/0xf0) from [<c0223d14>] (__make_request+0x1b8/0x308)
      [<c0223d14>] (__make_request+0x1b8/0x308) from [<c02215ac>] (generic_make_request+0x4dc/0x558)
      [<c02215ac>] (generic_make_request+0x4dc/0x558) from [<c022173c>] (submit_bio+0x114/0x138)
      [<c022173c>] (submit_bio+0x114/0x138) from [<c011f504>] (submit_bh+0x148/0x16c)
      [<c011f504>] (submit_bh+0x148/0x16c) from [<c0121ed8>] (__sync_dirty_buffer+0x88/0xd8)
      [<c0121ed8>] (__sync_dirty_buffer+0x88/0xd8) from [<c01aff78>] (journal_commit_transaction+0x1198/0x1688)
      [<c01aff78>] (journal_commit_transaction+0x1198/0x1688) from [<c01b4034>] (kjournald+0xb4/0x224)
      [<c01b4034>] (kjournald+0xb4/0x224) from [<c0069ea0>] (kthread+0x8c/0x94)
      [<c0069ea0>] (kthread+0x8c/0x94) from [<c00137f8>] (kernel_thread_exit+0x0/0x8)
      
      Fix this by just using raw_smp_processor_id(); it's just a hint
      after all. There's no pinning of the CPU or accessing per-cpu
      structures involved.
      Reported-by: Ming Lei <tom.leiming@gmail.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  11. 24 Jul, 2011 1 commit
    • block: strict rq_affinity · 5757a6d7
      Committed by Dan Williams
      Some systems benefit from completions always being steered to the strict
      requester cpu rather than the looser "per-socket" steering that
      blk_cpu_to_group() attempts by default. This is because the first
      CPU in the group mask ends up being completely overloaded with work,
      while the others (including the original submitter) have power left
      to spare.
      
      Allow the strict mode to be set by writing '2' to the sysfs control
      file. This is identical to the scheme used for the nomerges file,
      where '2' is a more aggressive setting than just being turned on.
      
      echo 2 > /sys/block/<bdev>/queue/rq_affinity
      
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Roland Dreier <roland@purestorage.com>
      Tested-by: Dave Jiang <dave.jiang@intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
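The difference between the default "per-socket" steering and the strict mode described in this commit can be sketched with a small user-space model. This is not kernel code: the topology (8 CPUs, 4 per group), the enum, and the `*_model` function names are all assumptions made for illustration, and `cpu_to_group_model()` is only a stand-in for what blk_cpu_to_group() is described as doing.

```c
#include <assert.h>

/* Hypothetical topology for the model: 8 CPUs, 4 per socket/group. */
#define NR_CPUS_MODEL   8
#define CPUS_PER_GROUP  4

/* Values mirror what is written to the rq_affinity sysfs file. */
enum rq_affinity_mode {
    RQ_AFFINITY_OFF    = 0, /* complete wherever the interrupt lands   */
    RQ_AFFINITY_GROUP  = 1, /* steer to the submitter's group (looser) */
    RQ_AFFINITY_STRICT = 2  /* steer to the exact submitting CPU       */
};

/* Stand-in for blk_cpu_to_group(): the first CPU of the group. */
static int cpu_to_group_model(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    return cpu - (cpu % CPUS_PER_GROUP);
}

/* Return the CPU on which the completion work would run in this model. */
static int completion_cpu_model(enum rq_affinity_mode mode,
                                int submit_cpu, int irq_cpu)
{
    if (mode == RQ_AFFINITY_OFF)
        return irq_cpu;
    if (mode == RQ_AFFINITY_STRICT)
        return submit_cpu;

    /* Group mode: stay local if the interrupt already landed in the
     * submitter's group, otherwise go to that group's first CPU. */
    if (cpu_to_group_model(irq_cpu) == cpu_to_group_model(submit_cpu))
        return irq_cpu;
    return cpu_to_group_model(submit_cpu);
}
```

In this model, a request submitted on CPU 6 whose interrupt arrives on CPU 1 completes on CPU 4 in group mode (the first CPU of the submitter's group, which is the CPU the commit message describes as overloaded) and on CPU 6 in strict mode.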
  12. 22 Jul, 2011 1 commit
    • [SCSI] fix crash in scsi_dispatch_cmd() · bfe159a5
      Committed by James Bottomley
      USB surprise removal of sr is triggering an oops in
      scsi_dispatch_cmd().  What seems to be happening is that USB is
      hanging on to a queue reference until the last close of the upper
      device, so the crash is caused by surprise remove of a mounted CD
      followed by attempted unmount.
      
      The problem is that USB doesn't issue its final commands as part of
      the SCSI teardown path, but on last close when the block queue is long
      gone.  The long term fix is probably to make sr do the teardown in the
      same way as sd (so remove all the lower bits on ejection, but keep the
      upper disk alive until last close of user space).  However, the
      current oops can be simply fixed by not allowing any commands to be
      sent to a dead queue.
      
      Cc: stable@kernel.org
      Signed-off-by: James Bottomley <JBottomley@Parallels.com>
  13. 08 Jul, 2011 1 commit
    • block: avoid building too big plug list · 55c022bb
      Committed by Shaohua Li
      When I run a fio script with a large I/O depth, total throughput drops
      compared to a relatively small I/O depth. The reason is that the thread
      accumulates a large number of requests in its plug list, which causes
      delays (how much depends on CPU speed).
      I think we'd better have a threshold for the number of requests. When the
      threshold is reached, it means requests are not being merged and queue lock
      contention isn't severe when pushing per-task requests to the queue, so the
      main advantages of blk plug don't apply. We can force a plug list flush in
      this case.
      With this, my test throughput actually increases and almost equals that of
      the small I/O depth. Another side effect is that irq-off time decreases in
      blk_flush_plug_list() for a big I/O depth.
      BLK_MAX_REQUEST_COUNT is chosen arbitrarily, but 16 is effective at
      reducing lock contention for me. I'm open here; 32 is OK in my test too.
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
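The threshold idea in this commit can be illustrated with a small user-space model of a per-task plug list. This is a sketch only, not the kernel implementation: `struct plug_model` and the `plug_*` helpers are hypothetical names, requests are reduced to a counter, and the limit of 16 is taken from the value mentioned in the commit message.

```c
#include <assert.h>

/* Threshold from the commit message; the model's name is hypothetical. */
#define MAX_REQUEST_COUNT_MODEL 16

/* Minimal model of a per-task plug: only counts, no real requests. */
struct plug_model {
    unsigned int pending;    /* requests currently held in the plug list */
    unsigned int flushes;    /* how many times the list was flushed      */
    unsigned int dispatched; /* requests pushed down to the queue so far */
};

/* Push everything held in the plug list to the (imaginary) queue. */
static void plug_flush(struct plug_model *p)
{
    assert(p);
    if (p->pending == 0)
        return;
    p->dispatched += p->pending;
    p->pending = 0;
    p->flushes++;
}

/* Add one request; force a flush first if the list has hit the limit. */
static void plug_add(struct plug_model *p)
{
    assert(p);
    if (p->pending >= MAX_REQUEST_COUNT_MODEL)
        plug_flush(p);
    p->pending++;
}

/* End of the plugged section: flush whatever is left. */
static void plug_finish(struct plug_model *p)
{
    plug_flush(p);
}
```

With this model, adding 40 requests and then finishing the plug results in three flushes (two forced by the threshold, one at the end) instead of a single flush of 40 requests, which is the behaviour the commit argues keeps the list short when merging is not helping.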
  14. 27 May, 2011 2 commits
  15. 23 May, 2011 1 commit
  16. 21 May, 2011 2 commits
  17. 18 May, 2011 1 commit
    • block: don't delay blk_run_queue_async · 3ec717b7
      Committed by Shaohua Li
      Consider this scenario:
      1. blk_delay_queue(q, SCSI_QUEUE_DELAY);
      2. blk_run_queue_async();
      The second call becomes a no-op because q->delay_work already has
      WORK_STRUCT_PENDING_BIT set, so the delayed work still runs only after
      SCSI_QUEUE_DELAY. But blk_run_queue_async() actually wants the delayed
      work to run immediately.
      
      Fix this by doing a cancel on potentially pending delayed work
      before queuing an immediate run of the workqueue.
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
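The scenario in this commit can be reproduced with a small user-space model of a delayed work item. This is a sketch under stated assumptions rather than the kernel code: `struct delayed_work_model` and every function here are hypothetical, time is a plain counter, and the only behaviour borrowed from the commit message is that queuing a work item that is already pending does nothing.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of a delayed work item: a pending bit plus its scheduled time. */
struct delayed_work_model {
    bool pending;
    unsigned long run_at;
};

/* Queue the work to run at now + delay. As in the scenario described in
 * the commit, queuing is a no-op if the work is already pending. */
static bool queue_delayed_model(struct delayed_work_model *w,
                                unsigned long now, unsigned long delay)
{
    assert(w);
    if (w->pending)
        return false;
    w->pending = true;
    w->run_at = now + delay;
    return true;
}

/* Drop a pending (not yet running) work item. */
static void cancel_delayed_model(struct delayed_work_model *w)
{
    assert(w);
    w->pending = false;
}

/* Behaviour before the fix: the immediate run is silently lost. */
static void run_queue_async_old(struct delayed_work_model *w,
                                unsigned long now)
{
    (void)queue_delayed_model(w, now, 0);
}

/* Behaviour after the fix: cancel first, then queue an immediate run. */
static void run_queue_async_fixed(struct delayed_work_model *w,
                                  unsigned long now)
{
    cancel_delayed_model(w);
    (void)queue_delayed_model(w, now, 0);
}
```

If a work item is first queued at time 100 with a delay of 3, the old variant leaves it scheduled for 103, while the fixed variant reschedules it for 100.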
  18. 19 Apr, 2011 4 commits
  19. 18 Apr, 2011 5 commits
  20. 16 Apr, 2011 1 commit
  21. 15 Apr, 2011 3 commits
  22. 12 Apr, 2011 4 commits
    • block: move queue run on unplug to kblockd · f4af3c3d
      Committed by Jens Axboe
      There are worries that we are now consuming a lot more stack in
      some cases, since we potentially call into IO dispatch from
      schedule() or io_schedule(). We can reduce this problem by moving
      the running of the queue to kblockd, like the old plugging scheme
      did as well.
      
      This may or may not be a good idea from a performance perspective,
      depending on how many tasks have queue plugs running at the same
      time. For even the slightly contended case, doing just a single
      queue run from kblockd instead of multiple runs directly from the
      unpluggers will be faster.
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • block: kill queue_sync_plugs() · cf82c798
      Committed by Jens Axboe
      The original use for this dates back to when we had to track write
      requests for serializing around barriers. That's not needed anymore,
      so kill it.
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • block: readd plug trace event · dc6d36c9
      Committed by Jens Axboe
      This was removed with the queue plug state. But we can easily re-add it
      by checking if this is the first request going to this queue. It's
      good information to have when tracing to see how effective the
      plugging is.
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • block: add callback function for unplug notification · f7566457
      Committed by Jens Axboe
      MD would like to know when a queue is unplugged, so it can flush
      its bitmap writes. Add such a callback.
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>