1. 10 September 2010, 5 commits
    • block: rename blk-barrier.c to blk-flush.c · 8839a0e0
      Committed by Tejun Heo
      Without ordering requirements, barrier and ordering are misnomers.
      Rename block/blk-barrier.c to block/blk-flush.c.  Rename of symbols
      will follow.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • block: drop barrier ordering by queue draining · 28e7d184
      Committed by Tejun Heo
      Filesystems will take all responsibility for ordering requests around
      commit writes and will only indicate how the commit writes themselves
      should be handled by the block layer (a rough sketch of this model
      follows this entry).  This patch drops barrier ordering by queue
      draining from the block layer.  The ordering-by-draining
      implementation was somewhat invasive to request handling.  Notable
      changes follow.
      
      * Each queue has a 1-bit color which is flipped on each barrier issue.
        This is used to track whether a given request is issued before the
        current barrier or not.  The REQ_ORDERED_COLOR flag and the coloring
        implementation in __elv_add_request() are removed.
      
      * Requests which shouldn't be processed yet for draining were stalled
        by returning -EAGAIN from blk_do_ordered() according to the result
        of comparing blk_ordered_req_seq() and blk_ordered_cur_seq().  This
        logic is removed.
      
      * Draining completion logic in elv_completed_request() removed.
      
      * All barrier sequence requests were queued to the request queue and
        then trickled to the lower layer according to progress, and thus
        maintaining request order during requeue was necessary.  This is
        replaced by queueing the next request in the barrier sequence only
        after the current one is complete, from blk_ordered_complete_seq(),
        which removes the need for multiple proxy requests in struct
        request_queue and the request sorting logic in the
        ELEVATOR_INSERT_REQUEUE path of elv_insert().
      
      * As barriers no longer have ordering constraints, there's no need to
        dump the whole elevator onto the dispatch queue on each barrier.
        Insert barriers at the front instead.
      
      * If other barrier requests come to the front of the dispatch queue
        while one is already in progress, they are stored in
        q->pending_barriers and restored to the dispatch queue one by one
        after each barrier completion from blk_ordered_complete_seq().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
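      A rough sketch of the filesystem-side model described above (illustrative
      only: the example_journal type and the submit_/wait_ helpers are
      hypothetical; only submit_bio() and the REQ_FLUSH/REQ_FUA flags come
      from the interface introduced in this series):

      	/*
      	 * A journaling filesystem orders its own commit: it waits for the
      	 * journal data itself, then issues the commit record as a
      	 * flush + FUA write instead of relying on block-layer draining.
      	 */
      	static void example_journal_commit(struct example_journal *j)
      	{
      		submit_journal_data(j);			/* hypothetical helper */
      		wait_for_journal_data(j);		/* hypothetical helper */

      		/* flush the device cache, then write the commit record with FUA */
      		j->commit_bio->bi_rw |= REQ_FLUSH | REQ_FUA;
      		submit_bio(WRITE, j->commit_bio);
      		wait_for_commit_record(j);		/* hypothetical helper */
      	}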
    • block: misc cleanups in barrier code · dd831006
      Committed by Tejun Heo
      Make the following cleanups in preparation for the barrier/flush update.
      
      * blk_do_ordered() declaration is moved from include/linux/blkdev.h to
        block/blk.h.
      
      * blk_do_ordered() now returns a pointer to struct request, with %NULL
        meaning "try the next request" and ERR_PTR(-EAGAIN) meaning "try
        again later".  The third case will be dropped with further changes.
      
      * In the initialization of the proxy barrier request, the data
        direction is already set by init_request_from_bio().  Drop the
        unnecessary explicit REQ_WRITE setting and move
        init_request_from_bio() above the REQ_FUA flag setting.
      
      * add_request() is collapsed into __make_request().
      
      These changes don't make any functional difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
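      A minimal sketch of how a dispatch-side caller might consume the new
      return convention (illustrative, not taken from the patch; the
      surrounding loop and elv_next_candidate() are hypothetical):

      	while ((rq = elv_next_candidate(q)) != NULL) {	/* hypothetical helper */
      		rq = blk_do_ordered(q, rq);
      		if (!rq)
      			continue;		/* %NULL: try the next request */
      		if (IS_ERR(rq))
      			return NULL;		/* ERR_PTR(-EAGAIN): try again later */
      		return rq;			/* dispatch this request */
      	}
      	return NULL;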
    • T
      block: deprecate barrier and replace blk_queue_ordered() with blk_queue_flush() · 4913efe4
      Tejun Heo 提交于
      Barrier is deemed too heavy and will soon be replaced by FLUSH/FUA
      requests.  Deprecate barrier.  All REQ_HARDBARRIERs are failed with
      -EOPNOTSUPP and blk_queue_ordered() is replaced with the simpler
      blk_queue_flush().
      
      blk_queue_flush() takes combinations of REQ_FLUSH and REQ_FUA.  If a
      device has a write cache and can flush it, it should set REQ_FLUSH.
      If the device can handle FUA writes, it should also set REQ_FUA.
      
      All blk_queue_ordered() users are converted.
      
      * ORDERED_DRAIN is mapped to 0 which is the default value.
      * ORDERED_DRAIN_FLUSH is mapped to REQ_FLUSH.
      * ORDERED_DRAIN_FLUSH_FUA is mapped to REQ_FLUSH | REQ_FUA.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Alasdair G Kergon <agk@redhat.com>
      Cc: Pierre Ossman <drzeus@drzeus.cx>
      Cc: Stefan Weinhuber <wein@de.ibm.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
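      A sketch of how a driver advertises its cache behaviour with the new
      interface (based on the description above; queue setup details omitted):

      	/* device with a volatile write cache that also supports FUA writes */
      	blk_queue_flush(q, REQ_FLUSH | REQ_FUA);

      	/* device with a write cache but no FUA support */
      	blk_queue_flush(q, REQ_FLUSH);

      	/* write-through device: no flush machinery needed (the default) */
      	blk_queue_flush(q, 0);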
    • block: kill QUEUE_ORDERED_BY_TAG · 6958f145
      Committed by Tejun Heo
      Nobody is making meaningful use of ORDERED_BY_TAG now, and queue
      draining for barrier requests will be removed soon, which will render
      the advantage of tag ordering moot.  Kill ORDERED_BY_TAG.  The
      following users are affected.
      
      * brd: converted to ORDERED_DRAIN.
      * virtio_blk: ORDERED_TAG path was already marked deprecated.  Removed.
      * xen-blkfront: ORDERED_TAG case dropped.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  2. 12 August 2010, 1 commit
  3. 09 August 2010, 2 commits
  4. 08 August 2010, 19 commits
  5. 24 June 2010, 1 commit
  6. 21 June 2010, 1 commit
  7. 19 June 2010, 1 commit
    • cfq-iosched: Fixed boot warning with BLK_CGROUP=y and CFQ_GROUP_IOSCHED=n · e98ef89b
      Committed by Vivek Goyal
      Hi Jens,

      A few days back Ingo noticed a CFQ boot-time warning.  This patch
      fixes it.  The issue is that with CFQ_GROUP_IOSCHED=n, CFQ should not
      be making blkio stat related calls.
      
      > Hm, it's still not entirely fixed, as of 2.6.35-rc2-00131-g7908a9e5. With
      > some
      > configs i get bad spinlock warnings during bootup:
      >
      > [   28.968013] initcall net_olddevs_init+0x0/0x82 returned 0 after 93750
      > usecs
      > [   28.972003] calling  b44_init+0x0/0x55 @ 1
      > [   28.976009] bus: 'pci': add driver b44
      > [   28.976374]  sda:
      > [   28.978157] BUG: spinlock bad magic on CPU#1, async/0/117
      > [   28.980000]  lock: 7e1c5bbc, .magic: 00000000, .owner: <none>/-1, +.owner_cpu: 0
      > [   28.980000] Pid: 117, comm: async/0 Not tainted +2.6.35-rc2-tip-01092-g010e7ef-dirty #8183
      > [   28.980000] Call Trace:
      > [   28.980000]  [<41ba6d55>] ? printk+0x20/0x24
      > [   28.980000]  [<4134b7b7>] spin_bug+0x7c/0x87
      > [   28.980000]  [<4134b853>] do_raw_spin_lock+0x1e/0x123
      > [   28.980000]  [<41ba92ca>] ? _raw_spin_lock_irqsave+0x12/0x20
      > [   28.980000]  [<41ba92d2>] _raw_spin_lock_irqsave+0x1a/0x20
      > [   28.980000]  [<4133476f>] blkiocg_update_io_add_stats+0x25/0xfb
      > [   28.980000]  [<41335dae>] ? cfq_prio_tree_add+0xb1/0xc1
      > [   28.980000]  [<41337bc7>] cfq_insert_request+0x8c/0x425
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
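      The shape of the fix, sketched (wrapper and parameter names are
      illustrative, not quoted from the patch): CFQ goes through wrappers
      that compile to no-ops when CFQ_GROUP_IOSCHED is not set, so no blkio
      stat code or its spinlock is touched in that configuration.

      	#ifdef CONFIG_CFQ_GROUP_IOSCHED
      	static inline void cfq_blkiocg_update_io_add_stats(struct blkio_group *blkg,
      				struct blkio_group *curr_blkg, bool direction, bool sync)
      	{
      		blkiocg_update_io_add_stats(blkg, curr_blkg, direction, sync);
      	}
      	#else	/* CFQ_GROUP_IOSCHED=n: the stat call compiles away entirely */
      	static inline void cfq_blkiocg_update_io_add_stats(struct blkio_group *blkg,
      				struct blkio_group *curr_blkg, bool direction, bool sync)
      	{
      	}
      	#endif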
  8. 18 June 2010, 1 commit
    • cfq: Don't allow queue merges for queues that have no process references · c10b61f0
      Committed by Jeff Moyer
      Hi,
      
      A user reported a kernel bug when running a particular program that did
      the following:
      
      created 32 threads
      - each thread took a mutex, grabbed a global offset, added a buffer size
        to that offset, released the lock
      - read from the given offset in the file
      - created a new thread to do the same
      - exited
      
      The result is that cfq's close cooperator logic would trigger, as the
      threads were issuing I/O within the mean seek distance of one another.
      This workload managed to routinely trigger a use after free bug when
      walking the list of merge candidates for a particular cfqq
      (cfqq->new_cfqq).  The logic used for merging queues looks like this:
      
      static void cfq_setup_merge(struct cfq_queue *cfqq, struct cfq_queue *new_cfqq)
      {
      	int process_refs, new_process_refs;
      	struct cfq_queue *__cfqq;
      
      	/* Avoid a circular list and skip interim queue merges */
      	while ((__cfqq = new_cfqq->new_cfqq)) {
      		if (__cfqq == cfqq)
      			return;
      		new_cfqq = __cfqq;
      	}
      
      	process_refs = cfqq_process_refs(cfqq);
      	/*
      	 * If the process for the cfqq has gone away, there is no
      	 * sense in merging the queues.
      	 */
      	if (process_refs == 0)
      		return;
      
      	/*
      	 * Merge in the direction of the lesser amount of work.
      	 */
      	new_process_refs = cfqq_process_refs(new_cfqq);
      	if (new_process_refs >= process_refs) {
      		cfqq->new_cfqq = new_cfqq;
      		atomic_add(process_refs, &new_cfqq->ref);
      	} else {
      		new_cfqq->new_cfqq = cfqq;
      		atomic_add(new_process_refs, &cfqq->ref);
      	}
      }
      
      When a merge candidate is found, we add the process references for the
      queue with fewer references to the queue with more.  The actual merging
      of queues happens when a new request is issued for a given cfqq.  In
      the case of the test program, it only does a single pread call to read
      in 1MB, so the actual merge never happens.
      
      Normally, this is fine, as when the queue exits, we simply drop the
      references we took on the other cfqqs in the merge chain:
      
      	/*
      	 * If this queue was scheduled to merge with another queue, be
      	 * sure to drop the reference taken on that queue (and others in
      	 * the merge chain).  See cfq_setup_merge and cfq_merge_cfqqs.
      	 */
      	__cfqq = cfqq->new_cfqq;
      	while (__cfqq) {
      		if (__cfqq == cfqq) {
      			WARN(1, "cfqq->new_cfqq loop detected\n");
      			break;
      		}
      		next = __cfqq->new_cfqq;
      		cfq_put_queue(__cfqq);
      		__cfqq = next;
      	}
      
      However, there is a hole in this logic.  Consider the following (and
      keep in mind that each I/O keeps a reference to the cfqq):
      
      q1->new_cfqq = q2   // q2 now has 2 process references
      q3->new_cfqq = q2   // q2 now has 3 process references
      
      // the process associated with q2 exits
      // q2 now has 2 process references
      
      // queue 1 exits, drops its reference on q2
      // q2 now has 1 process reference
      
      // q3 exits, so has 0 process references, and hence drops its references
      // to q2, which leaves q2 also with 0 process references
      
      q4 comes along and wants to merge with q3
      
      q3->new_cfqq still points at q2!  We follow that link and end up at an
      already freed cfqq.
      
      So, the fix is to not follow a merge chain if the top-most queue does
      not have a process reference, since otherwise any queue in the chain
      could already have been freed.  I also changed the logic to disallow
      merging with a queue that does not have any process references.
      Previously, we did this check for one of the merge candidates, but not
      the other.  That doesn't really make sense.
      
      Without the attached patch, my system would BUG within a couple of
      seconds of running the reproducer program.  With the patch applied, my
      system ran the program for over an hour without issues.
      
      This addresses the following bugzilla:
          https://bugzilla.kernel.org/show_bug.cgi?id=16217
      
      Thanks a ton to Phil Carns for providing the bug report and an excellent
      reproducer.
      
      [ Note for stable: this applies to 2.6.32/33/34 ].
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Reported-by: Phil Carns <carns@mcs.anl.gov>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
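      The fixed checks, sketched from the description above (trimmed; the
      merge-direction logic at the end is unchanged):

      	static void cfq_setup_merge(struct cfq_queue *cfqq, struct cfq_queue *new_cfqq)
      	{
      		int process_refs, new_process_refs;
      		struct cfq_queue *__cfqq;

      		/*
      		 * If new_cfqq has no process references, queues further down
      		 * its merge chain may already have been freed; don't follow
      		 * the chain at all.
      		 */
      		if (!cfqq_process_refs(new_cfqq))
      			return;

      		/* Avoid a circular list and skip interim queue merges */
      		while ((__cfqq = new_cfqq->new_cfqq)) {
      			if (__cfqq == cfqq)
      				return;
      			new_cfqq = __cfqq;
      		}

      		process_refs = cfqq_process_refs(cfqq);
      		new_process_refs = cfqq_process_refs(new_cfqq);
      		/* refuse to merge if either side has no process references left */
      		if (process_refs == 0 || new_process_refs == 0)
      			return;

      		/* ... merge in the direction of the lesser amount of work, as before ... */
      	}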
  9. 17 June 2010, 1 commit
    • block: fix DISCARD_BARRIER requests · fbbf0556
      Committed by Christoph Hellwig
      Filesystems assume that DISCARD_BARRIER requests are full barriers, so
      that they don't have to track in-progress discard operations when
      submitting new I/O.  But currently we only treat them as elevator
      barriers, which don't actually do the necessary queue drains.

      Also remove the unlikely() around both the DISCARD and BARRIER
      requests - they happen far too often for a static mispredict.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
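      On the unlikely() removal, an illustrative before/after (handle_discard()
      and the surrounding context are hypothetical, not code from the patch):

      	/* before: the branch is annotated as rare */
      	if (unlikely(bio->bi_rw & REQ_DISCARD))
      		handle_discard(req, bio);	/* hypothetical helper */

      	/* after: discards and barriers are common enough that the static
      	 * hint only pessimizes the layout of the hot path */
      	if (bio->bi_rw & REQ_DISCARD)
      		handle_discard(req, bio);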
  10. 04 June 2010, 2 commits
  11. 25 May 2010, 1 commit
    • cfq-iosched: fix an oops caused by slab leak · d02a2c07
      Committed by Shaohua Li
      I got the oops below when unloading cfq-iosched.  Consider this
      scenario: queue A merges to B, C merges to D, and B will be merged to
      D.  Before B is merged to D, we split B.  We should put B's reference
      for D.
      
      [  807.768536] =============================================================================
      [  807.768539] BUG cfq_queue: Objects remaining on kmem_cache_close()
      [  807.768541] -----------------------------------------------------------------------------
      [  807.768543]
      [  807.768546] INFO: Slab 0xffffea0003e6b4e0 objects=26 used=1 fp=0xffff88011d584fd8 flags=0x200000000004082
      [  807.768550] Pid: 5946, comm: rmmod Tainted: G        W   2.6.34-07097-gf4b87dee-dirty #724
      [  807.768552] Call Trace:
      [  807.768560]  [<ffffffff81104e8d>] slab_err+0x8f/0x9d
      [  807.768564]  [<ffffffff811059e1>] ? flush_cpu_slab+0x0/0x93
      [  807.768569]  [<ffffffff8164be52>] ? add_preempt_count+0xe/0xca
      [  807.768572]  [<ffffffff8164bd9c>] ? sub_preempt_count+0xe/0xb6
      [  807.768577]  [<ffffffff81648871>] ? _raw_spin_unlock+0x15/0x30
      [  807.768580]  [<ffffffff8164bd9c>] ? sub_preempt_count+0xe/0xb6
      [  807.768584]  [<ffffffff811061bc>] list_slab_objects+0x9b/0x19f
      [  807.768588]  [<ffffffff8164bf0a>] ? add_preempt_count+0xc6/0xca
      [  807.768591]  [<ffffffff81109e27>] kmem_cache_destroy+0x13f/0x21d
      [  807.768597]  [<ffffffffa000ff13>] cfq_slab_kill+0x1a/0x43 [cfq_iosched]
      [  807.768601]  [<ffffffffa000ffcf>] cfq_exit+0x93/0x9e [cfq_iosched]
      [  807.768606]  [<ffffffff810973a2>] sys_delete_module+0x1b1/0x219
      [  807.768612]  [<ffffffff8102fb5b>] system_call_fastpath+0x16/0x1b
      [  807.768618] INFO: Object 0xffff88011d584618 @offset=1560
      [  807.768622] INFO: Allocated in cfq_get_queue+0x11e/0x274 [cfq_iosched] age=7173 cpu=1 pid=5496
      [  807.768626] =============================================================================
      
      Cc: stable@kernel.org
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  12. 24 May 2010, 3 commits
  13. 22 May 2010, 1 commit
  14. 11 May 2010, 1 commit
    • block: allow initialization of previously allocated request_queue · 01effb0d
      Committed by Mike Snitzer
      blk_init_queue() allocates the request_queue structure and then
      initializes it as needed (request_fn, elevator, etc.).

      Split the initialization out into blk_init_allocated_queue_node.
      Introduce a blk_init_allocated_queue wrapper function to model the
      existing blk_init_queue and blk_init_queue_node interfaces.
      
      Export elv_register_queue to allow a newly added elevator to be
      registered with sysfs.  Export elv_unregister_queue for symmetry.
      
      These changes allow DM to initialize a device's request_queue with more
      precision.  In particular, DM no longer unconditionally initializes a
      full request_queue (elevator et al).  It only does so for a
      request-based DM device.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
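      A sketch of the intended usage (my_request_fn, my_queue_lock and the
      error handling are illustrative; the function signatures are assumed
      from the description above):

      	struct request_queue *q;

      	/* allocation only: no request_fn and no elevator yet */
      	q = blk_alloc_queue(GFP_KERNEL);
      	if (!q)
      		return -ENOMEM;

      	/* later, once the device turns out to be request-based */
      	if (!blk_init_allocated_queue(q, my_request_fn, &my_queue_lock))
      		goto out_cleanup;	/* hypothetical error path */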