1. 19 8月, 2011 1 次提交
    • J
      Revert "cfq: Remove special treatment for metadata rqs." · b53d1ed7
      Jens Axboe 提交于
      We have a kernel build regression since 3.1-rc1, which is about 10%
      regression. The kernel source is in an ext3 filesystem.
      Alex Shi bisect it to commit:
      commit a07405b7
      Author: Justin TerAvest <teravest@google.com>
      Date:   Sun Jul 10 22:09:19 2011 +0200
      
          cfq: Remove special treatment for metadata rqs.
      
      Apparently this is caused by lack metadata preemption, where ext3/ext4
      do use READ_META. I didn't see a way to fix the issue, so suggest
      reverting the patch.
      
      This reverts commit a07405b7.
      
      Reported-by: Alex Shi<alex.shi@intel.com>
      Reported-by: Shaohua Li<shaohua.li@intel.com>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      b53d1ed7
  2. 16 8月, 2011 1 次提交
    • J
      block: fix flush machinery for stacking drivers with differring flush flags · 4853abaa
      Jeff Moyer 提交于
      Commit ae1b1539, block: reimplement
      FLUSH/FUA to support merge, introduced a performance regression when
      running any sort of fsyncing workload using dm-multipath and certain
      storage (in our case, an HP EVA).  The test I ran was fs_mark, and it
      dropped from ~800 files/sec on ext4 to ~100 files/sec.  It turns out
      that dm-multipath always advertised flush+fua support, and passed
      commands on down the stack, where those flags used to get stripped off.
      The above commit changed that behavior:
      
      static inline struct request *__elv_next_request(struct request_queue *q)
      {
              struct request *rq;
      
              while (1) {
      -               while (!list_empty(&q->queue_head)) {
      +               if (!list_empty(&q->queue_head)) {
                              rq = list_entry_rq(q->queue_head.next);
      -                       if (!(rq->cmd_flags & (REQ_FLUSH | REQ_FUA)) ||
      -                           (rq->cmd_flags & REQ_FLUSH_SEQ))
      -                               return rq;
      -                       rq = blk_do_flush(q, rq);
      -                       if (rq)
      -                               return rq;
      +                       return rq;
                      }
      
      Note that previously, a command would come in here, have
      REQ_FLUSH|REQ_FUA set, and then get handed off to blk_do_flush:
      
      struct request *blk_do_flush(struct request_queue *q, struct request *rq)
      {
              unsigned int fflags = q->flush_flags; /* may change, cache it */
              bool has_flush = fflags & REQ_FLUSH, has_fua = fflags & REQ_FUA;
              bool do_preflush = has_flush && (rq->cmd_flags & REQ_FLUSH);
              bool do_postflush = has_flush && !has_fua && (rq->cmd_flags &
              REQ_FUA);
              unsigned skip = 0;
      ...
              if (blk_rq_sectors(rq) && !do_preflush && !do_postflush) {
                      rq->cmd_flags &= ~REQ_FLUSH;
      		if (!has_fua)
      			rq->cmd_flags &= ~REQ_FUA;
      	        return rq;
      	}
      
      So, the flush machinery was bypassed in such cases (q->flush_flags == 0
      && rq->cmd_flags & (REQ_FLUSH|REQ_FUA)).
      
      Now, however, we don't get into the flush machinery at all.  Instead,
      __elv_next_request just hands a request with flush and fua bits set to
      the scsi_request_fn, even if the underlying request_queue does not
      support flush or fua.
      
      The agreed upon approach is to fix the flush machinery to allow
      stacking.  While this isn't used in practice (since there is only one
      request-based dm target, and that target will now reflect the flush
      flags of the underlying device), it does future-proof the solution, and
      make it function as designed.
      
      In order to make this work, I had to add a field to the struct request,
      inside the flush structure (to store the original req->end_io).  Shaohua
      had suggested overloading the union with rb_node and completion_data,
      but the completion data is used by device mapper and can also be used by
      other drivers.  So, I didn't see a way around the additional field.
      
      I tested this patch on an HP EVA with both ext4 and xfs, and it recovers
      the lost performance.  Comments and other testers, as always, are
      appreciated.
      
      Cheers,
      Jeff
      Signed-off-by: NJeff Moyer <jmoyer@redhat.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      4853abaa
  3. 11 8月, 2011 3 次提交
    • S
      block: improve rq_affinity placement · bcf30e75
      Shaohua Li 提交于
      This patch reverts commit 35ae66e0(block: Make rq_affinity = 1
      work as expected). The purpose is to avoid an unnecessary IPI.
      Let's take an example. My test box has cpu 0-7, one socket. Say request is
      added from CPU 1, blk_complete_request() occurs at CPU 7. Without the reverted
      patch, softirq will be done at CPU 7. With it, an IPI will be directed to CPU
      0, and softirq will be done at CPU 0. In this case, doing softirq at CPU 0 and
      CPU 7 have no difference from cache sharing point view and we can avoid an
      ipi if doing it in CPU 7.
      An immediate concern is this is just like QUEUE_FLAG_SAME_FORCE, but actually
      not. blk_complete_request() is running in interrupt handler, and currently
      I/O controller doesn't support multiple interrupts (I checked several LSI
      cards and AHCI), so only one CPU can run blk_complete_request(). This is
      still quite different as QUEUE_FLAG_SAME_FORCE.
      Since only one CPU runs softirq, the only difference with below patch is
      softirq not always runs at the first CPU of a group.
      Signed-off-by: NShaohua Li <shaohua.li@intel.com>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      bcf30e75
    • N
      blktrace: add FLUSH/FUA support · c09c47ca
      Namhyung Kim 提交于
      Add FLUSH/FUA support to blktrace. As FLUSH precedes WRITE and/or
      FUA follows WRITE, use the same 'F' flag for both cases and
      distinguish them by their (relative) position. The end results
      look like (other flags might be shown also):
      
       - WRITE:            W
       - WRITE_FLUSH:      FW
       - WRITE_FUA:        WF
       - WRITE_FLUSH_FUA:  FWF
      
      Note that we reuse TC_BARRIER due to lack of bit space of act_mask
      so that the older versions of blktrace tools will report flush
      requests as barriers from now on.
      
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: NNamhyung Kim <namhyung@gmail.com>
      Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      c09c47ca
    • M
      Move some REQ flags to the common bio/request area · 8e4bf844
      Matthew Wilcox 提交于
      REQ_SECURE, REQ_FLUSH and REQ_FUA may all be set on a bio as well as
      on a request, so relocate them to the shared part of the enum.
      Signed-off-by: NMatthew Wilcox <willy@linux.intel.com>
      Signed-off-by: NNamhyung Kim <namhyung@gmail.com>
      Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      8e4bf844
  4. 10 8月, 2011 2 次提交
    • J
      Merge branch 'stable/for-jens' of... · 40bb96ad
      Jens Axboe 提交于
      Merge branch 'stable/for-jens' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen into for-linus
      40bb96ad
    • J
      allow blk_flush_policy to return REQ_FSEQ_DATA independent of *FLUSH · fa1bf42f
      Jeff Moyer 提交于
      blk_insert_flush has the following check:
      
      	/*
      	 * If there's data but flush is not necessary, the request can be
      	 * processed directly without going through flush machinery.  Queue
      	 * for normal execution.
      	 */
      	if ((policy & REQ_FSEQ_DATA) &&
      	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
      		list_add_tail(&rq->queuelist, &q->queue_head);
      		return;
      	}
      
      However, blk_flush_policy will not return with policy set to only
      REQ_FSEQ_DATA:
      
      static unsigned int blk_flush_policy(unsigned int fflags, struct request *rq)
      {
      	unsigned int policy = 0;
      
      	if (fflags & REQ_FLUSH) {
      		if (rq->cmd_flags & REQ_FLUSH)
      			policy |= REQ_FSEQ_PREFLUSH;
      		if (blk_rq_sectors(rq))
      			policy |= REQ_FSEQ_DATA;
      		if (!(fflags & REQ_FUA) && (rq->cmd_flags & REQ_FUA))
      			policy |= REQ_FSEQ_POSTFLUSH;
      	}
      	return policy;
      }
      
      Notice that REQ_FSEQ_DATA is only set if REQ_FLUSH is set.  Fix this
      mismatch by moving the setting of REQ_FSEQ_DATA outside of the REQ_FLUSH
      check.
      
      Tejun notes:
      
        Hmmm... yes, this can become a correctness issue if (and only if)
        blk_queue_flush() is called to change q->flush_flags while requests
        are in-flight; otherwise, requests wouldn't reach the function at all.
        Also, I think it would be a generally good idea to always set
        FSEQ_DATA if the request has data.
      
      Cheers,
      Jeff
      Signed-off-by: NJeff Moyer <jmoyer@redhat.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      fa1bf42f
  5. 09 8月, 2011 1 次提交
  6. 05 8月, 2011 2 次提交
    • V
      cfq-iosched: Add documentation about idling · 4931402a
      Vivek Goyal 提交于
      There are always questions about why CFQ is idling on various conditions.
      Recent ones is Christoph asking again why to idle on REQ_NOIDLE. His
      assertion is that XFS is relying more and more on workqueues and is
      concerned that CFQ idling on IO from every workqueue will impact
      XFS badly.
      
      So he suggested that I add some more documentation about CFQ idling
      and that can provide more clarity on the topic and also gives an
      opprotunity to poke a hole in theory and lead to improvements.
      
      So here is my attempt at that. Any comments are welcome.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      4931402a
    • T
      block: Make rq_affinity = 1 work as expected · 35ae66e0
      Tao Ma 提交于
      Commit 5757a6d7 introduced a new rq_affinity = 2 so as to make
      the request completed in the __make_request cpu. But it makes the
      old rq_affinity = 1 not work any more. The root cause is that
      if the 'cpu' and 'req->cpu' is in the same group and cpu != req->cpu,
      ccpu will be the same as group_cpu, so the completion will be
      excuted in the 'cpu' not 'group_cpu'.
      
      This patch fix problem by simpling removing group_cpu and the codes
      are more explicit now. If ccpu == cpu, we complete in cpu, otherwise
      we raise_blk_irq to ccpu.
      
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Roland Dreier <roland@purestorage.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jens Axboe <jaxboe@fusionio.com>
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      Reviewed-by: NShaohua Li <shaohua.li@intel.com>
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      35ae66e0
  7. 03 8月, 2011 1 次提交
  8. 02 8月, 2011 5 次提交
  9. 01 8月, 2011 24 次提交