1. 14 January 2011 (2 commits)
  2. 07 January 2011 (2 commits)
  3. 17 December 2010 (1 commit)
  4. 13 December 2010 (1 commit)
  5. 01 December 2010 (2 commits)
  6. 09 November 2010 (1 commit)
  7. 08 November 2010 (3 commits)
    • cfq-iosched: don't idle if a deep seek queue is slow · 8e1ac665
      Committed by Shaohua Li
      If a deep seek queue delivers requests slowly but the disk is much faster, idling
      on the queue just wastes disk throughput. If the queue delivers all of its requests
      before half of its slice is used, the patch disables idling for it.
      In my test, the application delivers 32 requests at a time, the disk can accept
      128 requests at maximum, and the disk is fast. Without the patch, the throughput
      is just around 30 MB/s, while with it the speed is about 80 MB/s. The disk is
      an SSD, but it is detected as a rotational disk. I could configure it as an SSD, but
      I thought the deep seek queue logic should be fixed too, for example to handle
      a fast RAID.
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      8e1ac665
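      Illustrative sketch (not from the commit itself): the heuristic described above amounts to skipping idling when a queue has already drained all of its requests before half of its time slice has elapsed. The structure and field names below are hypothetical, standalone C, not the kernel code.

      #include <stdbool.h>
      #include <stdio.h>

      /* Hypothetical, simplified view of a CFQ queue's slice state. */
      struct queue_state {
          unsigned long slice_start;   /* when the slice began (ms) */
          unsigned long slice_length;  /* total slice length (ms)   */
          unsigned int  queued;        /* requests still queued     */
      };

      /*
       * Skip idling when a deep (seeky) queue has already delivered all of
       * its requests before half of its slice was consumed: the disk is
       * clearly faster than the submitter, so idling only wastes throughput.
       */
      static bool should_idle(const struct queue_state *q, unsigned long now)
      {
          unsigned long used = now - q->slice_start;

          if (q->queued == 0 && used < q->slice_length / 2)
              return false;            /* fast disk, slow submitter: don't idle */
          return true;
      }

      int main(void)
      {
          struct queue_state q = { .slice_start = 1000, .slice_length = 100, .queued = 0 };

          printf("idle at t=1030? %s\n", should_idle(&q, 1030) ? "yes" : "no"); /* no  */
          printf("idle at t=1080? %s\n", should_idle(&q, 1080) ? "yes" : "no"); /* yes */
          return 0;
      }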
    • cfq-iosched: schedule dispatch for noidle queue · d2d59e18
      Committed by Shaohua Li
      A queue is idle at cfq_dispatch_requests(), but it becomes noidle later. Unless
      another task explicitly does an unplug or all requests are drained, we will not
      deliver requests to the disk even though cfq_arm_slice_timer doesn't make the
      queue idle. For example, cfq_should_idle() returns true because
      service_tree->count == 1, and then other queues are added. Note that I haven't
      seen obvious performance impacts with the patch so far, but I thought
      this could be a problem.
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      d2d59e18
    • cfq-iosched: do cleanup · c1e44756
      Committed by Shaohua Li
      Some functions should return boolean.
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      c1e44756
  8. 02 November 2010 (1 commit)
  9. 22 October 2010 (1 commit)
    • cfq-iosched: Fix a gcc 4.5 warning and put some comments · b4627321
      Committed by Vivek Goyal
      - Andi encountered the following warning with gcc 4.5
      
        linux/block/cfq-iosched.c: In function ‘cfq_dispatch_requests’:
        linux/block/cfq-iosched.c:2156:3: warning: array subscript is above array
        bounds
      
      - The warning happens due to the following code.
      
        slice = group_slice * count /
      		max_t(unsigned, cfqg->busy_queues_avg[cfqd->serving_prio],
      		cfq_group_busy_queues_wl(cfqd->serving_prio, cfqd, cfqg));
      
        gcc is complaining about cfqg->busy_queues_avg[] being indexed by CFQ
        prio classes (RT, BE and IDLE) while the array size is only 2.
      
      - At run time, we never access cfqg->busy_queues_avg[IDLE], and we return from
        the function before this code is hit.
      
      - To fix the warning, increase the array size even though the extra entry will
        remain unused. This patch also adds some comments to clear up some of the confusion.
      
      - I have taken Jens's patch and modified it a bit.
      
      - Compile tested with gcc 4.4 and boot tested. I don't have gcc 4.5
        running; Andi, can you please test it with gcc 4.5 to make sure it
        works.
      Reported-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      b4627321
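      Illustrative sketch (not the kernel patch, and not guaranteed to reproduce the exact gcc diagnostic): a standalone C rendering of the pattern described above, where an array sized for two workload classes can syntactically be indexed by a three-valued priority enum, and the fix is simply to size the array by the enum count and leave the extra slot unused.

      #include <stdio.h>

      enum wl_prio { PRIO_RT, PRIO_BE, PRIO_IDLE, PRIO_NR };

      struct group {
          /*
           * Sized for RT and BE only.  This is the shape gcc 4.5's bounds
           * analysis complained about in cfq: the array can be indexed by
           * all three priority classes even though IDLE is never used at
           * run time.  The fix sizes it by the enum count instead:
           *     unsigned busy_queues_avg[PRIO_NR];
           * with the IDLE slot simply staying unused.
           */
          unsigned busy_queues_avg[2];
      };

      static unsigned get_busy_queues_avg(const struct group *g, enum wl_prio prio)
      {
          if (prio == PRIO_IDLE)   /* run-time guard: IDLE never reaches the array */
              return 0;
          return g->busy_queues_avg[prio];
      }

      int main(void)
      {
          struct group g = { { 3, 5 } };

          printf("RT=%u BE=%u IDLE=%u\n",
                 get_busy_queues_avg(&g, PRIO_RT),
                 get_busy_queues_avg(&g, PRIO_BE),
                 get_busy_queues_avg(&g, PRIO_IDLE));
          return 0;
      }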
  10. 01 October 2010 (1 commit)
    • blkio: Recalculate the throttled bio dispatch time upon throttle limit change · fe071437
      Committed by Vivek Goyal
      o Currently any cgroup throttle limit changes are processed asynchronously, and
        the change does not take effect until a new bio is dispatched from the same group.
      
      o It might happen that a user sets a ridiculously low limit on throttling,
        say 1 byte per second on reads. In such cases simple operations like mounting
        a disk can wait for a very long time.
      
      o Once a bio is throttled, there is no easy way to come out of that wait even if
        the user increases the read limit later.
      
      o This patch fixes it. Now if a user changes the cgroup limits, we recalculate
        the bio dispatch time according to the new limits.
      
      o We can't take the queue lock under blkcg_lock, hence after the change I wake
        up the dispatch thread again, which recalculates the time. So there are some
        variables being synchronized across two threads without a lock, and I had to
        make use of barriers. I hope I have used the barriers correctly; any review of
        the memory barrier code especially will help.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      fe071437
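      Illustrative sketch (not the blk-throttle code; all names here are made up): the cross-thread notification described above, rendered as a standalone userspace program in which C11 release/acquire atomics stand in for the kernel's memory barriers. The limit writer publishes the new value and then raises a flag; the dispatch side recalculates once it observes the flag.

      #include <pthread.h>
      #include <stdatomic.h>
      #include <stdio.h>
      #include <unistd.h>

      /* Hypothetical throttle state shared by a "limit writer" and the
       * dispatch thread. */
      static unsigned long bps_limit = 1;          /* bytes per second     */
      static atomic_bool   limits_changed = false; /* "please recalculate" */

      static void *change_limit(void *arg)
      {
          (void)arg;
          bps_limit = 1024 * 1024;                   /* publish the new limit... */
          atomic_store_explicit(&limits_changed, true,
                                memory_order_release); /* ...then raise the flag */
          return NULL;
      }

      static void dispatch_loop(void)
      {
          for (int i = 0; i < 1000; i++) {
              if (atomic_load_explicit(&limits_changed, memory_order_acquire)) {
                  /* Flag seen: the new limit is guaranteed to be visible, so
                   * the throttled bio's dispatch time can be recalculated. */
                  printf("recalculating dispatch time with limit %lu B/s\n", bps_limit);
                  atomic_store_explicit(&limits_changed, false, memory_order_relaxed);
                  return;
              }
              usleep(1000);                          /* pretend to do other work */
          }
      }

      int main(void)
      {
          pthread_t t;

          pthread_create(&t, NULL, change_limit, NULL);
          dispatch_loop();
          pthread_join(t, NULL);
          return 0;
      }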
  11. 21 September 2010 (1 commit)
    • cfq-iosched: fix a kernel OOPs when usb key is inserted · 180be2a0
      Committed by Vivek Goyal
      Mike reported a kernel crash when a usb key hotplug is performed while all
      kernel threads are not in the root cgroup and are running in one of the child
      cgroups of the blkio controller.
      
      	BUG: unable to handle kernel NULL pointer dereference at 0000002c
      	IP: [<c11c7b08>] cfq_get_queue+0x232/0x412
      	*pde = 00000000
      	Oops: 0000 [#1] PREEMPT
      	last sysfs file: /sys/devices/pci0000:00/0000:00:1d.7/usb2/2-1/2-1:1.0/host3/scsi_host/host3/uevent
      
      	[..]
      	Pid: 30039, comm: scsi_scan_3 Not tainted 2.6.35.2-fg.roam #1 Volvi2                         /Aspire 4315
      	EIP: 0060:[<c11c7b08>] EFLAGS: 00010086 CPU: 0
      	EIP is at cfq_get_queue+0x232/0x412
      	EAX: f705f9c0 EBX: e977abac ECX: 00000000 EDX: 00000000
      	ESI: f00da400 EDI: f00da4ec EBP: e977a800 ESP: dff8fd00
      	 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
      	Process scsi_scan_3 (pid: 30039, ti=dff8e000 task=f6b6c9a0 task.ti=dff8e000)
      	Stack:
      	 00000000 00000000 00000001 01ff0000 f00da508 00000000 f00da524 f00da540
      	<0> e7994940 dd631750 f705f9c0 e977a820 e977ac44 f00da4d0 00000001 f6b6c9a0
      	<0> 00000010 00008010 0000000b 00000000 00000001 e977a800 dd76fac0 00000246
      	Call Trace:
      	 [<c11c7f10>] ? cfq_set_request+0x228/0x34c
      	 [<c11c7ce8>] ? cfq_set_request+0x0/0x34c
      	 [<c11bb3b9>] ? elv_set_request+0xf/0x1c
      	 [<c11bdd51>] ? get_request+0x1ad/0x22f
      	 [<c11bddf2>] ? get_request_wait+0x1f/0x11a
      	 [<c11d013b>] ? kvasprintf+0x33/0x3b
      	 [<c127b537>] ? scsi_execute+0x1d/0x103
      	 [<c127b675>] ? scsi_execute_req+0x58/0x83
      	 [<c127c391>] ? scsi_probe_and_add_lun+0x188/0x7c2
      	 [<c12718c6>] ? attribute_container_add_device+0x15/0xfa
      	 [<c11c95d1>] ? kobject_get+0xf/0x13
      	 [<c126d1db>] ? get_device+0x10/0x14
      	 [<c127be93>] ? scsi_alloc_target+0x217/0x24d
      	 [<c127cbd8>] ? __scsi_scan_target+0x95/0x480
      	 [<c10204eb>] ? dequeue_entity+0x14/0x1fe
      	 [<c1020491>] ? update_curr+0x165/0x1ab
      	 [<c1020491>] ? update_curr+0x165/0x1ab
      	 [<c127d00d>] ? scsi_scan_channel+0x4a/0x76
      	 [<c127d0b0>] ? scsi_scan_host_selected+0x77/0xad
      	 [<c127d13c>] ? do_scan_async+0x0/0x11a
      	 [<c127d137>] ? do_scsi_scan_host+0x51/0x56
      	 [<c127d13c>] ? do_scan_async+0x0/0x11a
      	 [<c127d14a>] ? do_scan_async+0xe/0x11a
      	 [<c127d13c>] ? do_scan_async+0x0/0x11a
      	 [<c10354c5>] ? kthread+0x5e/0x63
      	 [<c1035467>] ? kthread+0x0/0x63
      	 [<c1002af6>] ? kernel_thread_helper+0x6/0x10
      	Code: 44 24 1c 54 83 44 24 18 54 83 fa 03 75 94 8b 06 c7 86 64 02 00 00 01 00 00 00 83 e0 03 09 f0 89 06 8b 44 24 28 8b 90 58 01 00 00 <8b> 42 2c 85 c0 75 03 8b 42 08 8d 54 24 48 52 8d 4c 24 50 51 68
      	EIP: [<c11c7b08>] cfq_get_queue+0x232/0x412 SS:ESP 0068:dff8fd00
      	CR2: 000000000000002c
      	---[ end trace 9a88306573f69b12 ]---
      
      The problem here is that we don't have bdi->dev information available when
      the thread does some IO.  Hence when dev_name() tries to access bdi->dev, it
      crashes.
      
      This problem does not happen if kernel threads are in the root group, as the root
      group is statically allocated at device initialization time and we don't
      hit this piece of code.
      
      Fix it by delaying the filling in of the major and minor number information of the
      device in blk_group.  Initially a blk_group is created with 0 as its device
      information, and this information is filled in later once some more IO comes
      in from the same group.
      Reported-by: Mike Kazantsev <mk.fraggod@gmail.com>
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      180be2a0
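      Illustrative sketch (hypothetical structures and helper, not the actual blk-cgroup code): the lazy-fill idea described above in standalone C. The group starts with a 0:0 device number, and the major:minor pair is only filled in on a later IO, once the backing device information is actually available.

      #include <stdbool.h>
      #include <stdio.h>

      /* Hypothetical stand-ins for the real structures. */
      struct backing_dev { bool registered; unsigned major, minor; };
      struct blk_group   { unsigned major, minor; };          /* 0:0 == unknown */

      /* Called on IO from the group: fill in the device number lazily,
       * only once the backing device has been registered and is safe to query. */
      static void blk_group_fill_dev(struct blk_group *g, const struct backing_dev *bdi)
      {
          if (g->major || g->minor)
              return;                  /* already filled in               */
          if (!bdi->registered)
              return;                  /* too early, try again on next IO */
          g->major = bdi->major;
          g->minor = bdi->minor;
      }

      int main(void)
      {
          struct backing_dev bdi = { .registered = false };
          struct blk_group grp = { 0, 0 };

          blk_group_fill_dev(&grp, &bdi);            /* early IO: stays 0:0 */
          printf("after early IO: %u:%u\n", grp.major, grp.minor);

          bdi.registered = true; bdi.major = 8; bdi.minor = 16;
          blk_group_fill_dev(&grp, &bdi);            /* later IO: filled in */
          printf("after later IO: %u:%u\n", grp.major, grp.minor);
          return 0;
      }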
  12. 20 September 2010 (1 commit)
    • cfq: improve fsync performance for small files · 749ef9f8
      Committed by Corrado Zoccolo
      Fsync performance for small files achieved by cfq on high-end disks is
      lower than what deadline can achieve, due to idling introduced between
      the sync write happening in process context and the journal commit.
      
      Moreover, when competing with a sequential reader, a process writing
      small files and fsync-ing them is starved.
      
      This patch fixes the two problems by:
      - marking journal commits as WRITE_SYNC, so that they get the REQ_NOIDLE
        flag set,
      - forcing all queues that have REQ_NOIDLE requests to be put in the noidle
        tree.
      
      Having the queue associated with the fsync-ing process and the one associated
      with journal commits in the noidle tree allows:
      - switching between them without idling,
      - fairness vs. competing idling queues, since they will be serviced only
        after the noidle tree expires its slice.
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Tested-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      749ef9f8
  13. 16 September 2010 (1 commit)
  14. 23 August 2010 (4 commits)
  15. 08 August 2010 (2 commits)
  16. 19 June 2010 (1 commit)
    • cfq-iosched: Fixed boot warning with BLK_CGROUP=y and CFQ_GROUP_IOSCHED=n · e98ef89b
      Committed by Vivek Goyal
      Hi Jens,
      
      A few days back Ingo noticed a CFQ boot time warning. This patch fixes it.
      The issue here is that with CFQ_GROUP_IOSCHED=n, CFQ should not really
      be making blkio stat related calls.
      
      > Hm, it's still not entirely fixed, as of 2.6.35-rc2-00131-g7908a9e5. With
      > some
      > configs i get bad spinlock warnings during bootup:
      >
      > [   28.968013] initcall net_olddevs_init+0x0/0x82 returned 0 after 93750
      > usecs
      > [   28.972003] calling  b44_init+0x0/0x55 @ 1
      > [   28.976009] bus: 'pci': add driver b44
      > [   28.976374]  sda:
      > [   28.978157] BUG: spinlock bad magic on CPU#1, async/0/117
      > [   28.980000]  lock: 7e1c5bbc, .magic: 00000000, .owner: <none>/-1, +.owner_cpu: 0
      > [   28.980000] Pid: 117, comm: async/0 Not tainted +2.6.35-rc2-tip-01092-g010e7ef-dirty #8183
      > [   28.980000] Call Trace:
      > [   28.980000]  [<41ba6d55>] ? printk+0x20/0x24
      > [   28.980000]  [<4134b7b7>] spin_bug+0x7c/0x87
      > [   28.980000]  [<4134b853>] do_raw_spin_lock+0x1e/0x123
      > [   28.980000]  [<41ba92ca>] ? _raw_spin_lock_irqsave+0x12/0x20
      > [   28.980000]  [<41ba92d2>] _raw_spin_lock_irqsave+0x1a/0x20
      > [   28.980000]  [<4133476f>] blkiocg_update_io_add_stats+0x25/0xfb
      > [   28.980000]  [<41335dae>] ? cfq_prio_tree_add+0xb1/0xc1
      > [   28.980000]  [<41337bc7>] cfq_insert_request+0x8c/0x425
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      e98ef89b
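      Illustrative sketch (hypothetical, standalone; the wrapper name is modeled on the changelog but the code is not the actual patch): a common way to keep such stat calls out of a configuration that does not support them is to route them through wrappers that compile to empty inlines when the option is off, so CFQ never touches blkio-cgroup state or its locks.

      #include <stdio.h>

      /* Toggle this to simulate CFQ_GROUP_IOSCHED=y / =n. */
      /* #define CONFIG_CFQ_GROUP_IOSCHED 1 */

      struct cfq_group { unsigned long queued; };

      #ifdef CONFIG_CFQ_GROUP_IOSCHED
      /* Group scheduling enabled: forward to the real accounting. */
      static inline void cfq_blkiocg_update_io_add_stats(struct cfq_group *g)
      {
          g->queued++;
          printf("group stats updated, queued=%lu\n", g->queued);
      }
      #else
      /* Group scheduling disabled: the wrapper is an empty inline, so the
       * caller makes no blkio stat related calls at all. */
      static inline void cfq_blkiocg_update_io_add_stats(struct cfq_group *g)
      {
          (void)g;
      }
      #endif

      int main(void)
      {
          struct cfq_group g = { 0 };

          cfq_blkiocg_update_io_add_stats(&g);   /* no-op unless the option is set */
          return 0;
      }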
  17. 18 June 2010 (1 commit)
    • cfq: Don't allow queue merges for queues that have no process references · c10b61f0
      Committed by Jeff Moyer
      Hi,
      
      A user reported a kernel bug when running a particular program that did
      the following:
      
      created 32 threads
      - each thread took a mutex, grabbed a global offset, added a buffer size
        to that offset, released the lock
      - read from the given offset in the file
      - created a new thread to do the same
      - exited
      
      The result is that cfq's close cooperator logic would trigger, as the
      threads were issuing I/O within the mean seek distance of one another.
      This workload managed to routinely trigger a use after free bug when
      walking the list of merge candidates for a particular cfqq
      (cfqq->new_cfqq).  The logic used for merging queues looks like this:
      
      static void cfq_setup_merge(struct cfq_queue *cfqq, struct cfq_queue *new_cfqq)
      {
      	int process_refs, new_process_refs;
      	struct cfq_queue *__cfqq;
      
      	/* Avoid a circular list and skip interim queue merges */
      	while ((__cfqq = new_cfqq->new_cfqq)) {
      		if (__cfqq == cfqq)
      			return;
      		new_cfqq = __cfqq;
      	}
      
      	process_refs = cfqq_process_refs(cfqq);
      	/*
      	 * If the process for the cfqq has gone away, there is no
      	 * sense in merging the queues.
      	 */
      	if (process_refs == 0)
      		return;
      
      	/*
      	 * Merge in the direction of the lesser amount of work.
      	 */
      	new_process_refs = cfqq_process_refs(new_cfqq);
      	if (new_process_refs >= process_refs) {
      		cfqq->new_cfqq = new_cfqq;
      		atomic_add(process_refs, &new_cfqq->ref);
      	} else {
      		new_cfqq->new_cfqq = cfqq;
      		atomic_add(new_process_refs, &cfqq->ref);
      	}
      }
      
      When a merge candidate is found, we add the process references for the
      queue with fewer references to the queue with more.  The actual merging
      of queues happens when a new request is issued for a given cfqq.  In the
      case of the test program, it only does a single pread call to read in
      1MB, so the actual merge never happens.
      
      Normally, this is fine, as when the queue exits, we simply drop the
      references we took on the other cfqqs in the merge chain:
      
      	/*
      	 * If this queue was scheduled to merge with another queue, be
      	 * sure to drop the reference taken on that queue (and others in
      	 * the merge chain).  See cfq_setup_merge and cfq_merge_cfqqs.
      	 */
      	__cfqq = cfqq->new_cfqq;
      	while (__cfqq) {
      		if (__cfqq == cfqq) {
      			WARN(1, "cfqq->new_cfqq loop detected\n");
      			break;
      		}
      		next = __cfqq->new_cfqq;
      		cfq_put_queue(__cfqq);
      		__cfqq = next;
      	}
      
      However, there is a hole in this logic.  Consider the following (and
      keep in mind that each I/O keeps a reference to the cfqq):
      
      q1->new_cfqq = q2   // q2 now has 2 process references
      q3->new_cfqq = q2   // q2 now has 3 process references
      
      // the process associated with q2 exits
      // q2 now has 2 process references
      
      // queue 1 exits, drops its reference on q2
      // q2 now has 1 process reference
      
      // q3 exits, so has 0 process references, and hence drops its references
      // to q2, which leaves q2 also with 0 process references
      
      q4 comes along and wants to merge with q3
      
      q3->new_cfqq still points at q2!  We follow that link and end up at an
      already freed cfqq.
      
      So, the fix is to not follow a merge chain if the top-most queue does
      not have a process reference, otherwise any queue in the chain could be
      already freed.  I also changed the logic to disallow merging with a
      queue that does not have any process references.  Previously, we did
      this check for one of the merge candidates, but not the other.  That
      doesn't really make sense.
      
      Without the attached patch, my system would BUG within a couple of
      seconds of running the reproducer program.  With the patch applied, my
      system ran the program for over an hour without issues.
      
      This addresses the following bugzilla:
          https://bugzilla.kernel.org/show_bug.cgi?id=16217
      
      Thanks a ton to Phil Carns for providing the bug report and an excellent
      reproducer.
      
      [ Note for stable: this applies to 2.6.32/33/34 ].
      Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
      Reported-by: Phil Carns <carns@mcs.anl.gov>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
      c10b61f0
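      Illustrative sketch (a hedged, standalone rendering of the fix described above, with simplified types and plain ints instead of the kernel's atomics; not the literal patch): the merge chain is not followed at all when its head has no process references, and a merge is refused when either side has none.

      #include <stdio.h>

      /* Minimal stand-in for struct cfq_queue: only what the merge logic needs. */
      struct cfq_queue {
          int ref;                    /* total references (IO + process) */
          int process_refs;           /* references held by live tasks   */
          struct cfq_queue *new_cfqq; /* scheduled merge target          */
      };

      static void cfq_setup_merge(struct cfq_queue *cfqq, struct cfq_queue *new_cfqq)
      {
          int process_refs, new_process_refs;
          struct cfq_queue *__cfqq;

          /* If the head of the chain has no process references, anything it
           * points at may already have been freed: don't follow the chain. */
          if (new_cfqq->process_refs == 0)
              return;

          /* Avoid a circular list and skip interim queue merges. */
          while ((__cfqq = new_cfqq->new_cfqq)) {
              if (__cfqq == cfqq)
                  return;
              new_cfqq = __cfqq;
          }

          process_refs = cfqq->process_refs;
          new_process_refs = new_cfqq->process_refs;

          /* Refuse the merge if either queue has no process references;
           * previously only one side was checked. */
          if (process_refs == 0 || new_process_refs == 0)
              return;

          /* Merge in the direction of the lesser amount of work. */
          if (new_process_refs >= process_refs) {
              cfqq->new_cfqq = new_cfqq;
              new_cfqq->ref += process_refs;
          } else {
              new_cfqq->new_cfqq = cfqq;
              cfqq->ref += new_process_refs;
          }
      }

      int main(void)
      {
          /* The scenario from the changelog: q3 has 0 process refs and a stale
           * link to q2; q4 wants to merge with q3. */
          struct cfq_queue q2 = { .ref = 0, .process_refs = 0, .new_cfqq = NULL };
          struct cfq_queue q3 = { .ref = 1, .process_refs = 0, .new_cfqq = &q2 };
          struct cfq_queue q4 = { .ref = 2, .process_refs = 1, .new_cfqq = NULL };

          cfq_setup_merge(&q4, &q3);
          printf("q4->new_cfqq set? %s\n", q4.new_cfqq ? "yes" : "no"); /* no */
          return 0;
      }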
  18. 25 May 2010 (1 commit)
    • cfq-iosched: fix an oops caused by slab leak · d02a2c07
      Committed by Shaohua Li
      I got the oops below when unloading cfq-iosched. Consider the scenario:
      queue A merges to B, C merges to D, and B will be merged to D. Before B is merged
      to D, we split B. We should put B's reference on D.
      
      [  807.768536] =============================================================================
      [  807.768539] BUG cfq_queue: Objects remaining on kmem_cache_close()
      [  807.768541] -----------------------------------------------------------------------------
      [  807.768543]
      [  807.768546] INFO: Slab 0xffffea0003e6b4e0 objects=26 used=1 fp=0xffff88011d584fd8 flags=0x200000000004082
      [  807.768550] Pid: 5946, comm: rmmod Tainted: G        W   2.6.34-07097-gf4b87dee-dirty #724
      [  807.768552] Call Trace:
      [  807.768560]  [<ffffffff81104e8d>] slab_err+0x8f/0x9d
      [  807.768564]  [<ffffffff811059e1>] ? flush_cpu_slab+0x0/0x93
      [  807.768569]  [<ffffffff8164be52>] ? add_preempt_count+0xe/0xca
      [  807.768572]  [<ffffffff8164bd9c>] ? sub_preempt_count+0xe/0xb6
      [  807.768577]  [<ffffffff81648871>] ? _raw_spin_unlock+0x15/0x30
      [  807.768580]  [<ffffffff8164bd9c>] ? sub_preempt_count+0xe/0xb6
      [  807.768584]  [<ffffffff811061bc>] list_slab_objects+0x9b/0x19f
      [  807.768588]  [<ffffffff8164bf0a>] ? add_preempt_count+0xc6/0xca
      [  807.768591]  [<ffffffff81109e27>] kmem_cache_destroy+0x13f/0x21d
      [  807.768597]  [<ffffffffa000ff13>] cfq_slab_kill+0x1a/0x43 [cfq_iosched]
      [  807.768601]  [<ffffffffa000ffcf>] cfq_exit+0x93/0x9e [cfq_iosched]
      [  807.768606]  [<ffffffff810973a2>] sys_delete_module+0x1b1/0x219
      [  807.768612]  [<ffffffff8102fb5b>] system_call_fastpath+0x16/0x1b
      [  807.768618] INFO: Object 0xffff88011d584618 @offset=1560
      [  807.768622] INFO: Allocated in cfq_get_queue+0x11e/0x274 [cfq_iosched] age=7173 cpu=1 pid=5496
      [  807.768626] =============================================================================
      
      Cc: stable@kernel.org
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      d02a2c07
  19. 24 May 2010 (2 commits)
    • cfq-iosched: compact io_context radix_tree · 80b15c73
      Committed by Konstantin Khlebnikov
      Use small consecutive indexes as radix tree keys instead of the sparse cfqd address.
      
      This change reduces the radix tree depth from 11 (6 for 32-bit hosts)
      to 1 if the host has <= 64 disks under cfq control, or to 0 if there is only one disk.
      So this patch saves 10*560 bytes for each process (5*296 for 32-bit hosts).
      
      For each cfqd, allocate a cic index from an ida.
      To unlink a dead cic from the tree without cfqd access, store the index into ->key
      (bit 0 -- dead mark, bits 1..30 -- index; the ida produces ids in the range 0..2^31-1).
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      80b15c73
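      Illustrative sketch (hypothetical helper names, standalone C; not the actual patch): the key encoding described above, where the ida-allocated index occupies bits 1..30 and bit 0 serves as a dead mark, so a dead cic can be recognized and unlinked without touching its cfqd.

      #include <assert.h>
      #include <stdbool.h>
      #include <stdio.h>

      #define CIC_DEAD_MARK   1ul   /* bit 0: dead mark          */
      #define CIC_INDEX_SHIFT 1     /* bits 1..30: the ida index */

      /* Encode an ida-allocated index into a lookup key. */
      static unsigned long cic_index_to_key(unsigned int index)
      {
          return (unsigned long)index << CIC_INDEX_SHIFT;
      }

      /* Mark the key dead in place; the index stays recoverable. */
      static unsigned long cic_dead_key(unsigned long key)
      {
          return key | CIC_DEAD_MARK;
      }

      static bool cic_is_dead(unsigned long key)
      {
          return key & CIC_DEAD_MARK;
      }

      static unsigned int cic_key_to_index(unsigned long key)
      {
          return (unsigned int)(key >> CIC_INDEX_SHIFT);
      }

      int main(void)
      {
          unsigned long key = cic_index_to_key(42);   /* this cfqd got index 42 */

          assert(!cic_is_dead(key));
          key = cic_dead_key(key);                    /* cfqd is going away     */
          assert(cic_is_dead(key) && cic_key_to_index(key) == 42);

          printf("dead=%d index=%u\n", (int)cic_is_dead(key), cic_key_to_index(key));
          return 0;
      }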
    • cfq-iosched: remove dead_key from cfq_io_context · bca4b914
      Committed by Konstantin Khlebnikov
      Remove the ->dead_key field from cfq_io_context to shrink its size to 128 bytes
      (64 bytes for 32-bit hosts).
      
      Use the lower bit in ->key as a dead mark, instead of moving the key to a separate field.
      After this, for a dead cfq_io_context we get cic->key != cfqd automatically.
      Thus, io_context's last-hit cache should keep working without changes.
      
      Now, to check ->key for the non-dead state, compare it with cfqd,
      instead of checking ->key for a non-null value as before.
      
      Also remove the obsolete race protection in cfq_cic_lookup.
      This race went away after v2.6.24-1728-g4ac845a2.
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      bca4b914
  20. 22 May 2010 (1 commit)
  21. 06 May 2010 (1 commit)
    • blk-cgroup: Fix RCU correctness warning in cfq_init_queue() · dcf097b2
      Committed by Vivek Goyal
      It is necessary to be in an RCU read-side critical section when invoking
      css_id(), so this patch adds one to blkiocg_add_blkio_group().  This is
      actually a false positive, because this is called at initialization time
      and hence always refers to the root cgroup, which cannot go away.
      
      [  103.790505] ===================================================
      [  103.790509] [ INFO: suspicious rcu_dereference_check() usage. ]
      [  103.790511] ---------------------------------------------------
      [  103.790514] kernel/cgroup.c:4432 invoked rcu_dereference_check() without protection!
      [  103.790517]
      [  103.790517] other info that might help us debug this:
      [  103.790519]
      [  103.790521]
      [  103.790521] rcu_scheduler_active = 1, debug_locks = 1
      [  103.790524] 4 locks held by bash/4422:
      [  103.790526]  #0:  (&buffer->mutex){+.+.+.}, at: [<ffffffff8114befa>] sysfs_write_file+0x3c/0x144
      [  103.790537]  #1:  (s_active#102){.+.+.+}, at: [<ffffffff8114bfa5>] sysfs_write_file+0xe7/0x144
      [  103.790544]  #2:  (&q->sysfs_lock){+.+.+.}, at: [<ffffffff812263b1>] queue_attr_store+0x49/0x8f
      [  103.790552]  #3:  (&(&blkcg->lock)->rlock){......}, at: [<ffffffff8122e4db>] blkiocg_add_blkio_group+0x2b/0xad
      [  103.790560]
      [  103.790561] stack backtrace:
      [  103.790564] Pid: 4422, comm: bash Not tainted 2.6.34-rc4-blkio-second-crash #81
      [  103.790567] Call Trace:
      [  103.790572]  [<ffffffff81068f57>] lockdep_rcu_dereference+0x9d/0xa5
      [  103.790577]  [<ffffffff8107fac1>] css_id+0x44/0x57
      [  103.790581]  [<ffffffff8122e503>] blkiocg_add_blkio_group+0x53/0xad
      [  103.790586]  [<ffffffff81231936>] cfq_init_queue+0x139/0x32c
      [  103.790591]  [<ffffffff8121f2d0>] elv_iosched_store+0xbf/0x1bf
      [  103.790595]  [<ffffffff812263d8>] queue_attr_store+0x70/0x8f
      [  103.790599]  [<ffffffff8114bfa5>] ? sysfs_write_file+0xe7/0x144
      [  103.790603]  [<ffffffff8114bfc6>] sysfs_write_file+0x108/0x144
      [  103.790609]  [<ffffffff810f527f>] vfs_write+0xae/0x10b
      [  103.790612]  [<ffffffff81069863>] ? trace_hardirqs_on_caller+0x10c/0x130
      [  103.790616]  [<ffffffff810f539c>] sys_write+0x4a/0x6e
      [  103.790622]  [<ffffffff81002b5b>] system_call_fastpath+0x16/0x1b
      [  103.790625]
      Located-by: Miles Lane <miles.lane@gmail.com>
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      dcf097b2
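      Illustrative sketch (schematic, standalone; the rcu_* calls are stubbed out as no-ops here so the file compiles on its own, whereas in the kernel they are the real RCU primitives and css_id() is the real cgroup helper): the fix is simply to wrap the css_id() lookup in an RCU read-side critical section.

      #include <stdio.h>

      /* Stand-ins so this sketch builds outside the kernel. */
      #define rcu_read_lock()    do { } while (0)
      #define rcu_read_unlock()  do { } while (0)

      struct blkio_cgroup { int css_id; };

      static int css_id(const struct blkio_cgroup *blkcg)
      {
          /* In the kernel this dereferences RCU-protected cgroup state and
           * must be called under rcu_read_lock(). */
          return blkcg->css_id;
      }

      static void blkiocg_add_blkio_group(struct blkio_cgroup *blkcg)
      {
          int id;

          /*
           * Enter an RCU read-side critical section around css_id(), as the
           * lockdep-RCU splat above demands.  For the root cgroup at init
           * time this is technically a false positive, but taking the read
           * lock is cheap and silences the warning correctly.
           */
          rcu_read_lock();
          id = css_id(blkcg);
          rcu_read_unlock();

          printf("registered blkio group for css id %d\n", id);
      }

      int main(void)
      {
          struct blkio_cgroup root = { .css_id = 1 };

          blkiocg_add_blkio_group(&root);
          return 0;
      }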
  22. 29 April 2010 (1 commit)
  23. 27 April 2010 (2 commits)
    • blk-cgroup: config options re-arrangement · afc24d49
      Committed by Vivek Goyal
      This patch fixes a few usability and configurability issues.
      
      o All the cgroup based controller options are configurable from the
        "General Setup/Control Group Support/" menu; blkio is the only exception.
        Hence make this option visible in the above menu and make it configurable from
        there, to bring it in line with the rest of the cgroup based controllers.
      
      o Get rid of CONFIG_DEBUG_CFQ_IOSCHED.
      
        This option currently does two things.
      
        - Enables printing of cgroup paths in blktrace
        - Enables CONFIG_DEBUG_BLK_CGROUP, which in turn displays additional stat
          files in the cgroup.
      
        If we are using group scheduling, blktrace data is not really of much use
        if cgroup information is not present. To get this data, one currently has to
        also enable CONFIG_DEBUG_CFQ_IOSCHED, which in turn brings in the overhead of
        all the additional debug stat files, which is not desired.
      
        Hence, this patch moves printing of cgroup paths under
        CONFIG_CFQ_GROUP_IOSCHED.
      
        This allows us to get rid of CONFIG_DEBUG_CFQ_IOSCHED completely. Now all
        the debug stat files are controlled only by CONFIG_DEBUG_BLK_CGROUP, which
        can be enabled through the config menu.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Divyesh Shah <dpshah@google.com>
      Reviewed-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      afc24d49
    • blkio: Fix another BUG_ON() crash due to cfqq movement across groups · e5ff082e
      Committed by Vivek Goyal
      o Once in a while I was hitting a BUG_ON() in the blkio code. empty_time was
        assuming that upon slice expiry, a group can't be marked empty already (except
        during forced dispatch).
      
        But this assumption is broken if a cfqq can move (group_isolation=0) across
        groups after receiving a request.
      
        I think most likely in this case we got a request in a cfqq and accounted
        the rq in one group, then later, while adding the cfqq to the tree, we moved the queue
        to a different group which was already marked empty, and after dispatch from the
        slice we found the group already marked empty and raised the alarm.
      
        This patch does not error out if the group is already marked empty. This can
        introduce some empty_time stat error, but only in the case of group_isolation=0. This
        is better than crashing. In the case of group_isolation=1 we should still get the
        same stats as before this patch.
      
      [  222.308546] ------------[ cut here ]------------
      [  222.309311] kernel BUG at block/blk-cgroup.c:236!
      [  222.309311] invalid opcode: 0000 [#1] SMP
      [  222.309311] last sysfs file: /sys/devices/virtual/block/dm-3/queue/scheduler
      [  222.309311] CPU 1
      [  222.309311] Modules linked in: dm_round_robin dm_multipath qla2xxx scsi_transport_fc dm_zero dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
      [  222.309311]
      [  222.309311] Pid: 4780, comm: fio Not tainted 2.6.34-rc4-blkio-config #68 0A98h/HP xw8600 Workstation
      [  222.309311] RIP: 0010:[<ffffffff8121ad88>]  [<ffffffff8121ad88>] blkiocg_set_start_empty_time+0x50/0x83
      [  222.309311] RSP: 0018:ffff8800ba6e79f8  EFLAGS: 00010002
      [  222.309311] RAX: 0000000000000082 RBX: ffff8800a13b7990 RCX: ffff8800a13b7808
      [  222.309311] RDX: 0000000000002121 RSI: 0000000000000082 RDI: ffff8800a13b7a30
      [  222.309311] RBP: ffff8800ba6e7a18 R08: 0000000000000000 R09: 0000000000000001
      [  222.309311] R10: 000000000002f8c8 R11: ffff8800ba6e7ad8 R12: ffff8800a13b78ff
      [  222.309311] R13: ffff8800a13b7990 R14: 0000000000000001 R15: ffff8800a13b7808
      [  222.309311] FS:  00007f3beec476f0(0000) GS:ffff880001e40000(0000) knlGS:0000000000000000
      [  222.309311] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  222.309311] CR2: 000000000040e7f0 CR3: 00000000a12d5000 CR4: 00000000000006e0
      [  222.309311] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  222.309311] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  222.309311] Process fio (pid: 4780, threadinfo ffff8800ba6e6000, task ffff8800b3d6bf00)
      [  222.309311] Stack:
      [  222.309311]  0000000000000001 ffff8800bab17a48 ffff8800bab17a48 ffff8800a13b7800
      [  222.309311] <0> ffff8800ba6e7a68 ffffffff8121da35 ffff880000000001 00ff8800ba5c5698
      [  222.309311] <0> ffff8800ba6e7a68 ffff8800a13b7800 0000000000000000 ffff8800bab17a48
      [  222.309311] Call Trace:
      [  222.309311]  [<ffffffff8121da35>] __cfq_slice_expired+0x2af/0x3ec
      [  222.309311]  [<ffffffff8121fd7b>] cfq_dispatch_requests+0x2c8/0x8e8
      [  222.309311]  [<ffffffff8120f1cd>] ? spin_unlock_irqrestore+0xe/0x10
      [  222.309311]  [<ffffffff8120fb1a>] ? blk_insert_cloned_request+0x70/0x7b
      [  222.309311]  [<ffffffff81210461>] blk_peek_request+0x191/0x1a7
      [  222.309311]  [<ffffffffa0002799>] dm_request_fn+0x38/0x14c [dm_mod]
      [  222.309311]  [<ffffffff810ae61f>] ? sync_page_killable+0x0/0x35
      [  222.309311]  [<ffffffff81210fd4>] __generic_unplug_device+0x32/0x37
      [  222.309311]  [<ffffffff81211274>] generic_unplug_device+0x2e/0x3c
      [  222.309311]  [<ffffffffa00011a6>] dm_unplug_all+0x42/0x5b [dm_mod]
      [  222.309311]  [<ffffffff8120ca37>] blk_unplug+0x29/0x2d
      [  222.309311]  [<ffffffff8120ca4d>] blk_backing_dev_unplug+0x12/0x14
      [  222.309311]  [<ffffffff81109a7a>] block_sync_page+0x35/0x39
      [  222.309311]  [<ffffffff810ae616>] sync_page+0x41/0x4a
      [  222.309311]  [<ffffffff810ae62d>] sync_page_killable+0xe/0x35
      [  222.309311]  [<ffffffff8158aa59>] __wait_on_bit_lock+0x46/0x8f
      [  222.309311]  [<ffffffff810ae4f5>] __lock_page_killable+0x66/0x6d
      [  222.309311]  [<ffffffff81056f9c>] ? wake_bit_function+0x0/0x33
      [  222.309311]  [<ffffffff810ae528>] lock_page_killable+0x2c/0x2e
      [  222.309311]  [<ffffffff810afbc5>] generic_file_aio_read+0x361/0x4f0
      [  222.309311]  [<ffffffff810ea044>] do_sync_read+0xcb/0x108
      [  222.309311]  [<ffffffff811e42f7>] ? security_file_permission+0x16/0x18
      [  222.309311]  [<ffffffff810ea6ab>] vfs_read+0xab/0x108
      [  222.309311]  [<ffffffff810ea7c8>] sys_read+0x4a/0x6e
      [  222.309311]  [<ffffffff81002b5b>] system_call_fastpath+0x16/0x1b
      [  222.309311] Code: 58 01 00 00 00 48 89 c6 75 0a 48 83 bb 60 01 00 00 00 74 09 48 8d bb a0 00 00 00 eb 35 41 fe cc 74 0d f6 83 c0 01 00 00 04 74 04 <0f> 0b eb fe 48 89 75 e8 e8 be e0 de ff 66 83 8b c0 01 00 00 04
      [  222.309311] RIP  [<ffffffff8121ad88>] blkiocg_set_start_empty_time+0x50/0x83
      [  222.309311]  RSP <ffff8800ba6e79f8>
      [  222.309311] ---[ end trace 32b4f71dffc15712 ]---
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Divyesh Shah <dpshah@google.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      e5ff082e
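      Illustrative sketch (hypothetical, simplified; not the actual blk-cgroup code): the tolerance change described above in standalone C. Instead of asserting that the group is not yet marked empty, the stat update quietly returns when it already is.

      #include <assert.h>
      #include <stdbool.h>
      #include <stdio.h>

      struct blkio_group_stats {
          bool empty;                     /* group currently marked empty */
          unsigned long start_empty_time; /* when the empty period began  */
      };

      /*
       * Old behaviour (sketch): BUG_ON(stats->empty) -- which crashes when a
       * cfqq moved across groups (group_isolation=0) and left the group
       * already marked empty at slice expiry.
       *
       * New behaviour: if the group is already marked empty, just return.
       * empty_time may be slightly off with group_isolation=0, which is far
       * better than crashing.
       */
      static void set_start_empty_time(struct blkio_group_stats *stats,
                                       unsigned long now)
      {
          if (stats->empty)
              return;                     /* already empty: tolerate it */

          stats->empty = true;
          stats->start_empty_time = now;
      }

      int main(void)
      {
          struct blkio_group_stats st = { .empty = true, .start_empty_time = 100 };

          set_start_empty_time(&st, 200); /* no crash, stat left unchanged */
          assert(st.start_empty_time == 100);
          printf("start_empty_time=%lu\n", st.start_empty_time);
          return 0;
      }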
  24. 21 April 2010 (1 commit)
    • blkio: Fix blkio crash during rq stat update · 7f1dc8a2
      Committed by Vivek Goyal
      blkio + cfq was crashing even when two sequential readers were put in two
      separate cgroups (group_isolation=0).
      
      The reason is that a cfqq can migrate across groups based on whether it is
      sync-noidle or not, so it can happen that at request insertion time the cfqq
      belonged to one cfqg and at request dispatch time it belonged to the root
      group. In this case request stats per cgroup can go wrong, and it also runs
      into a BUG_ON().
      
      This patch stashes a cfq group pointer in the rq instead of relying
      on the cfqq->cfqg pointer alone for rq stat accounting.
      
      [   65.163523] ------------[ cut here ]------------
      [   65.164301] kernel BUG at block/blk-cgroup.c:117!
      [   65.164301] invalid opcode: 0000 [#1] SMP
      [   65.164301] last sysfs file: /sys/devices/pci0000:00/0000:00:05.0/0000:60:00.1/host9/rport-9:0-0/target9:0:0/9:0:0:2/block/sde/stat
      [   65.164301] CPU 1
      [   65.164301] Modules linked in: dm_round_robin dm_multipath qla2xxx scsi_transport_fc dm_zero dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
      [   65.164301]
      [   65.164301] Pid: 4505, comm: fio Not tainted 2.6.34-rc4-blk-for-35 #34 0A98h/HP xw8600 Workstation
      [   65.164301] RIP: 0010:[<ffffffff8121924f>]  [<ffffffff8121924f>] blkiocg_update_io_remove_stats+0x5b/0xaf
      [   65.164301] RSP: 0018:ffff8800ba5a79e8  EFLAGS: 00010046
      [   65.164301] RAX: 0000000000000096 RBX: ffff8800bb268d60 RCX: 0000000000000000
      [   65.164301] RDX: ffff8800bb268eb8 RSI: 0000000000000000 RDI: ffff8800bb268e00
      [   65.164301] RBP: ffff8800ba5a7a08 R08: 0000000000000064 R09: 0000000000000001
      [   65.164301] R10: 0000000000079640 R11: ffff8800a0bd5bf0 R12: ffff8800bab4af01
      [   65.164301] R13: ffff8800bab4af00 R14: ffff8800bb1d8928 R15: 0000000000000000
      [   65.164301] FS:  00007f18f75056f0(0000) GS:ffff880001e40000(0000) knlGS:0000000000000000
      [   65.164301] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   65.164301] CR2: 000000000040e7f0 CR3: 00000000ba52b000 CR4: 00000000000006e0
      [   65.164301] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   65.164301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [   65.164301] Process fio (pid: 4505, threadinfo ffff8800ba5a6000, task ffff8800ba45ae80)
      [   65.164301] Stack:
      [   65.164301]  ffff8800ba5a7a08 ffff8800ba722540 ffff8800bab4af68 ffff8800bab4af68
      [   65.164301] <0> ffff8800ba5a7a38 ffffffff8121d814 ffff8800ba722540 ffff8800bab4af68
      [   65.164301] <0> ffff8800ba722540 ffff8800a08f6800 ffff8800ba5a7a68 ffffffff8121d8ca
      [   65.164301] Call Trace:
      [   65.164301]  [<ffffffff8121d814>] cfq_remove_request+0xe4/0x116
      [   65.164301]  [<ffffffff8121d8ca>] cfq_dispatch_insert+0x84/0xe1
      [   65.164301]  [<ffffffff8121e833>] cfq_dispatch_requests+0x767/0x8e8
      [   65.164301]  [<ffffffff8120e524>] ? submit_bio+0xc3/0xcc
      [   65.164301]  [<ffffffff810ad657>] ? sync_page_killable+0x0/0x35
      [   65.164301]  [<ffffffff8120ea8d>] blk_peek_request+0x191/0x1a7
      [   65.164301]  [<ffffffffa000109c>] ? dm_get_live_table+0x44/0x4f [dm_mod]
      [   65.164301]  [<ffffffffa0002799>] dm_request_fn+0x38/0x14c [dm_mod]
      [   65.164301]  [<ffffffff810ad657>] ? sync_page_killable+0x0/0x35
      [   65.164301]  [<ffffffff8120f600>] __generic_unplug_device+0x32/0x37
      [   65.164301]  [<ffffffff8120f8a0>] generic_unplug_device+0x2e/0x3c
      [   65.164301]  [<ffffffffa00011a6>] dm_unplug_all+0x42/0x5b [dm_mod]
      [   65.164301]  [<ffffffff8120b063>] blk_unplug+0x29/0x2d
      [   65.164301]  [<ffffffff8120b079>] blk_backing_dev_unplug+0x12/0x14
      [   65.164301]  [<ffffffff81108a82>] block_sync_page+0x35/0x39
      [   65.164301]  [<ffffffff810ad64e>] sync_page+0x41/0x4a
      [   65.164301]  [<ffffffff810ad665>] sync_page_killable+0xe/0x35
      [   65.164301]  [<ffffffff81589027>] __wait_on_bit_lock+0x46/0x8f
      [   65.164301]  [<ffffffff810ad52d>] __lock_page_killable+0x66/0x6d
      [   65.164301]  [<ffffffff81055fd4>] ? wake_bit_function+0x0/0x33
      [   65.164301]  [<ffffffff810ad560>] lock_page_killable+0x2c/0x2e
      [   65.164301]  [<ffffffff810aebfd>] generic_file_aio_read+0x361/0x4f0
      [   65.164301]  [<ffffffff810e906c>] do_sync_read+0xcb/0x108
      [   65.164301]  [<ffffffff811e32a3>] ? security_file_permission+0x16/0x18
      [   65.164301]  [<ffffffff810e96d3>] vfs_read+0xab/0x108
      [   65.164301]  [<ffffffff810e97f0>] sys_read+0x4a/0x6e
      [   65.164301]  [<ffffffff81002b5b>] system_call_fastpath+0x16/0x1b
      [   65.164301] Code: 00 74 1c 48 8b 8b 60 01 00 00 48 85 c9 75 04 0f 0b eb fe 48 ff c9 48 89 8b 60 01 00 00 eb 1a 48 8b 8b 58 01 00 00 48 85 c9 75 04 <0f> 0b eb fe 48 ff c9 48 89 8b 58 01 00 00 45 84 e4 74 16 48 8b
      [   65.164301] RIP  [<ffffffff8121924f>] blkiocg_update_io_remove_stats+0x5b/0xaf
      [   65.164301]  RSP <ffff8800ba5a79e8>
      [   65.164301] ---[ end trace 1b2b828753032e68 ]---
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      7f1dc8a2
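      Illustrative sketch (simplified, hypothetical structures; not the actual patch): the stashing idea described above in standalone C. The group pointer is recorded in the request itself at insertion time, so stat accounting at removal/dispatch time charges the group the request was actually accounted to, even if the cfqq has migrated to another group in between.

      #include <assert.h>
      #include <stdio.h>

      struct cfq_group { const char *name; unsigned queued; };
      struct cfq_queue { struct cfq_group *cfqg; };     /* may change over time */

      struct request {
          struct cfq_queue *cfqq;
          struct cfq_group *stats_group;  /* group charged at insertion time */
      };

      static void insert_request(struct request *rq)
      {
          /* Stash the group pointer in the rq instead of trusting
           * rq->cfqq->cfqg later. */
          rq->stats_group = rq->cfqq->cfqg;
          rq->stats_group->queued++;
      }

      static void remove_request(struct request *rq)
      {
          /* Decrement the same group that was incremented, even if the cfqq
           * has since migrated (e.g. to the root group). */
          assert(rq->stats_group->queued > 0);
          rq->stats_group->queued--;
      }

      int main(void)
      {
          struct cfq_group grp_a = { "cgroup-a", 0 }, root = { "root", 0 };
          struct cfq_queue q = { .cfqg = &grp_a };
          struct request rq = { .cfqq = &q, .stats_group = NULL };

          insert_request(&rq);
          q.cfqg = &root;        /* queue migrates between insert and dispatch */
          remove_request(&rq);   /* still balanced against cgroup-a            */

          printf("%s queued=%u, %s queued=%u\n",
                 grp_a.name, grp_a.queued, root.name, root.queued);
          return 0;
      }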
  25. 16 April 2010 (1 commit)
  26. 14 April 2010 (2 commits)
    • blkio: Fix compile errors · 28baf442
      Committed by Divyesh Shah
      Fixes compile errors in the blk-cgroup code for the empty_time stat, plus a merge fix in
      CFQ. The first error occurred when CONFIG_DEBUG_CFQ_IOSCHED is not set.
      Signed-off-by: Divyesh Shah <dpshah@google.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      28baf442
    • block: Update to io-controller stats · a11cdaa7
      Committed by Divyesh Shah
      Changelog from v1:
      o Call blkiocg_update_idle_time_stats() at cfq_rq_enqueued() instead of at
        dispatch time.
      
      Changelog from original patchset: (in response to Vivek Goyal's comments)
      o group blkiocg_update_blkio_group_dequeue_stats() with other DEBUG functions
      o rename blkiocg_update_set_active_queue_stats() to
        blkiocg_update_avg_queue_size_stats()
      o s/request/io/ in blkiocg_update_request_add_stats() and
        blkiocg_update_request_remove_stats()
      o Call cfq_del_timer() at request dispatch() instead of
        blkiocg_update_idle_time_stats()
      
      Signed-off-by: Divyesh Shah <dpshah@google.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      a11cdaa7
  27. 13 April 2010 (1 commit)
    • io-controller: Add a new interface "weight_device" for IO-Controller · 34d0f179
      Committed by Gui Jianfeng
      Currently, the IO Controller makes use of blkio.weight to assign a weight for
      all devices. Here a new user interface "blkio.weight_device" is introduced to
      assign different weights to different devices. blkio.weight becomes the
      default value for devices which are not configured via "blkio.weight_device".
      
      You can use the following format to assign a specific weight to a given
      device:
      #echo "major:minor weight" > blkio.weight_device
      
      major:minor represents the device number.
      
      And you can remove the weight for a given device as follows:
      #echo "major:minor 0" > blkio.weight_device
      
      V1->V2 changes:
      - use the user interface "weight_device" instead of "policy", as suggested by Vivek
      - rename some structs, as suggested by Vivek
      - rebase to the 2.6-block "for-linus" branch
      - remove a useless list_empty check pointed out by Li Zefan
      - some trivial typo fixes
      
      V2->V3 changes:
      - Move policy_*_node() functions up to get rid of forward declarations
      - rename related functions by adding prefix "blkio_"
      Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      34d0f179
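      Illustrative sketch (not the actual blk-cgroup parser; names and the weight bound are assumptions): one plausible way to parse and validate the "major:minor weight" strings written to blkio.weight_device, treating weight 0 as "remove the per-device rule", in standalone C.

      #include <stdbool.h>
      #include <stdio.h>

      struct weight_rule {
          unsigned major, minor, weight;   /* weight == 0 means "remove rule" */
      };

      /* Parse one "major:minor weight" line as written to blkio.weight_device. */
      static bool parse_weight_device(const char *buf, struct weight_rule *rule)
      {
          if (sscanf(buf, "%u:%u %u", &rule->major, &rule->minor, &rule->weight) != 3)
              return false;
          /* The real interface range-checks the weight; the bound used here
           * is only an example. */
          if (rule->weight > 1000)
              return false;
          return true;
      }

      int main(void)
      {
          const char *lines[] = { "8:16 300", "8:16 0", "bogus" };
          struct weight_rule r;

          for (int i = 0; i < 3; i++) {
              if (parse_weight_device(lines[i], &r))
                  printf("%u:%u -> %s (weight %u)\n", r.major, r.minor,
                         r.weight ? "set weight" : "remove rule", r.weight);
              else
                  printf("rejected: %s\n", lines[i]);
          }
          return 0;
      }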
  28. 09 April 2010 (1 commit)
    • cfq-iosched: Fix the incorrect timeslice accounting with forced_dispatch · 3440c49f
      Committed by Divyesh Shah
      When CFQ dispatches requests forcefully due to a barrier or a change of iosched,
      it runs through all cfqq's, dispatching requests and then expiring each queue.
      However, it does not activate a cfqq before flushing its IOs, resulting in
      stale values being used for computing slice_used.
      This patch fixes it by activating the queue before flushing requests from
      each queue.
      
      This is useful mostly for barrier requests, because when the iosched is changing
      it really doesn't matter if we have incorrect accounting since we're going to
      tear down all the structures anyway.
      
      We also now expire the current timeslice before moving on with the dispatch,
      to accurately account the slice used for that cfqq.
      
      Signed-off-by: Divyesh Shah <dpshah@google.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      3440c49f