1. 16 May 2011, 1 commit
    • blk-throttle: Use task_subsys_state() to determine a task's blkio_cgroup · 70087dc3
      Vivek Goyal committed
      Currently we first map the task to a cgroup and then the cgroup to a
      blkio_cgroup. There is a more direct way to get to the blkio_cgroup
      from the task using task_subsys_state(). Use that.
      
      The real reason for the fix is that it also avoids a race in the generic
      cgroup code. During remount/umount, rebind_subsystems() is called and
      it can do the following without any RCU protection.
      
      cgrp->subsys[i] = NULL;
      
      That means if somebody got hold of a cgroup under RCU and then tried
      to use cgroup->subsys[] to get to the blkio_cgroup, it would get NULL, which
      is wrong. I was running into this race condition with LTP running on an
      upstream-derived kernel, and that led to a crash.
      
      So ideally we should also fix the generic cgroup code to wait for an RCU
      grace period before setting the pointer to NULL. Li Zefan is not very keen
      on introducing synchronize_wait() as he thinks it will slow
      down mount/remount/umount operations.
      
      So for the time being, at least fix the kernel crash by taking a more
      direct route to blkio_cgroup.
      
      One tester had reported a crash while running LTP on a derived kernel,
      and with this fix the crash is no longer seen while the test has been
      running for over 6 days.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
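The difference between the two lookup routes described in this commit can be illustrated with a small user-space model. This is only a sketch, not kernel code: the struct layouts, `lookup_via_cgroup()`, `lookup_direct()`, and `rebind_clear()` are hypothetical names, and the per-task table is a simplification of what task_subsys_state() reads.

```c
#include <stddef.h>

#define SUBSYS_COUNT 2      /* hypothetical: number of modeled subsystems */
#define BLKIO_SUBSYS_ID 1   /* hypothetical index of the blkio subsystem */

struct subsys_state { int id; };

/* Per-cgroup table; the generic code may reset entries to NULL on rebind. */
struct cgroup_model {
    struct subsys_state *subsys[SUBSYS_COUNT];
};

/* Per-task view: a cgroup pointer plus a table reachable without the cgroup. */
struct task_model {
    struct cgroup_model *cgrp;
    struct subsys_state *subsys[SUBSYS_COUNT];
};

/* Indirect route: task -> cgroup -> subsys[]. Can observe NULL after a rebind. */
static struct subsys_state *lookup_via_cgroup(const struct task_model *t, int id)
{
    return t->cgrp->subsys[id];
}

/* Direct route: task -> subsys[], never touching the cgroup's table. */
static struct subsys_state *lookup_direct(const struct task_model *t, int id)
{
    return t->subsys[id];
}

/* Models the assignment quoted in the commit message: cgrp->subsys[i] = NULL. */
static void rebind_clear(struct cgroup_model *cgrp, int id)
{
    cgrp->subsys[id] = NULL;
}
```

After `rebind_clear()` runs, `lookup_via_cgroup()` returns NULL for the same task while `lookup_direct()` still returns the original state, which is the behavior the commit relies on to avoid the crash.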
  2. 19 April 2011, 1 commit
  3. 18 April 2011, 1 commit
  4. 31 March 2011, 1 commit
  5. 23 March 2011, 3 commits
  6. 17 March 2011, 1 commit
  7. 12 March 2011, 1 commit
  8. 10 March 2011, 1 commit
  9. 07 March 2011, 3 commits
  10. 02 March 2011, 2 commits
    • block: add @force_kblockd to __blk_run_queue() · 1654e741
      Tejun Heo committed
      __blk_run_queue() automatically either calls q->request_fn() directly
      or schedules kblockd depending on whether the function is recursed.
      blk-flush implementation needs to be able to explicitly choose
      kblockd.  Add @force_kblockd.
      
      All the current users are converted to specify %false for the
      parameter and this patch doesn't introduce any behavior change.
      
      stable: This is a prerequisite for fixing the IDE oops caused by the new
              blk-flush implementation.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: stable@kernel.org
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
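The choice this commit describes, calling the request function directly unless the call is recursive or the caller forces deferral, can be sketched in user space as follows. All names here (`queue_model`, `run_queue_model()`, the counters) are hypothetical stand-ins, and the counters merely record which path was taken; the real function does more than this.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the request queue. */
struct queue_model {
    bool reentered;     /* set while the request function is running */
    int direct_calls;   /* times the request function was called directly */
    int deferred_calls; /* times work was handed to the kblockd stand-in */
};

static void request_fn_model(struct queue_model *q)
{
    q->direct_calls++;
}

static void schedule_kblockd_model(struct queue_model *q)
{
    q->deferred_calls++;
}

/*
 * Call the request function directly unless we are already inside it
 * or the caller explicitly asks for deferral via force_kblockd.
 */
static void run_queue_model(struct queue_model *q, bool force_kblockd)
{
    if (!force_kblockd && !q->reentered) {
        q->reentered = true;
        request_fn_model(q);
        q->reentered = false;
    } else {
        schedule_kblockd_model(q);
    }
}
```

Passing `false` keeps the automatic behavior, which matches the commit's statement that converted callers see no behavior change; passing `true` always takes the deferred path.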
    • cfq-iosched: Always provide group isolation. · 0bbfeb83
      Justin TerAvest committed
      Effectively, make group_isolation=1 the default and remove the tunable.
      The group_isolation=0 setting existed because by default we idle on the
      sync-noidle tree, and on fast devices this can be very harmful for
      throughput.
      
      However, this problem can also be addressed by tuning slice_idle and
      possibly group_idle on faster storage devices.
      
      This change simplifies the CFQ code by removing the feature entirely.
      Signed-off-by: Justin TerAvest <teravest@google.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  11. 11 February 2011, 1 commit
  12. 09 February 2011, 1 commit
    • cfq-iosched: Don't wait if queue already has requests. · 02a8f01b
      Justin TerAvest committed
      Commit 7667aa06 added logic to wait for
      the last queue of the group to become busy (have at least one request),
      so that the group does not lose out for not being continuously
      backlogged. The commit did not check for the condition that the last
      queue already has some requests. As a result, if the queue already has
      requests, wait_busy is set. Later on, cfq_select_queue() checks the
      flag, and decides that since the queue has a request now and wait_busy
      is set, the queue is expired.  This results in early expiration of the
      queue.
      
      This patch fixes the problem by adding a check to see if queue already
      has requests. If it does, wait_busy is not set. As a result, time slices
      do not expire early.
      
      The queues with more than one request are usually buffered writers.
      Testing shows improvement in isolation between buffered writers.
      
      Cc: stable@kernel.org
      Signed-off-by: Justin TerAvest <teravest@google.com>
      Reviewed-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
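The added check can be pictured with a small decision function. This is a hedged sketch rather than the scheduler's code: `cfqq_model` and `should_wait_busy_model()` are hypothetical, and reducing the remaining conditions to a single `slice_used` flag is an assumption made for brevity.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a queue for this decision only. */
struct cfqq_model {
    int queued_requests;  /* requests already sitting in the queue */
    int queues_in_group;  /* queues currently in the same group */
    bool slice_used;      /* the queue has consumed its time slice */
};

/* Decide whether to wait for the queue to become busy again. */
static bool should_wait_busy_model(const struct cfqq_model *q)
{
    /* The check this commit adds: never wait on a queue that has requests. */
    if (q->queued_requests > 0)
        return false;

    /* Waiting only makes sense for the last queue of the group. */
    if (q->queues_in_group > 1)
        return false;

    /* Assumption: remaining conditions collapsed into one flag. */
    return q->slice_used;
}
```

With the first check in place, a queue that already holds requests never gets the wait_busy treatment, so it cannot be expired early by the later flag test.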
  13. 19 January 2011, 1 commit
  14. 14 January 2011, 2 commits
  15. 07 January 2011, 2 commits
  16. 17 December 2010, 1 commit
  17. 13 December 2010, 1 commit
  18. 01 December 2010, 2 commits
  19. 09 November 2010, 1 commit
  20. 08 November 2010, 3 commits
    • cfq-iosched: don't idle if a deep seek queue is slow · 8e1ac665
      Shaohua Li committed
      If a deep seek queue delivers requests slowly but the disk is much faster, idling
      for the queue just wastes disk throughput. If the queue delivers all requests
      before half its slice is used, the patch disables idling for it.
      In my test, the application delivers 32 requests at a time, the disk can accept
      at most 128 requests, and the disk is fast. Without the patch, the throughput
      is just around 30m/s, while with it, the speed is about 80m/s. The disk is
      an SSD, but is detected as a rotational disk. I could configure it as an SSD, but
      I thought the deep seek queue logic should be fixed too, for example,
      considering a fast RAID.
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
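The "before half its slice is used" rule can be expressed as a comparison between the time remaining in the slice and the time already spent. The following is a minimal sketch under that reading; `slice_model`, `under_half_slice_used()`, and `keep_idling()` are hypothetical names, and time is modeled as plain tick counts without wraparound handling.

```c
#include <stdbool.h>

/* Hypothetical time slice, expressed in ticks. */
struct slice_model {
    unsigned long start;  /* tick at which the slice began */
    unsigned long end;    /* tick at which the slice ends */
};

/* True when less than half of the slice has elapsed at time `now`. */
static bool under_half_slice_used(const struct slice_model *s, unsigned long now)
{
    /* Remaining time exceeds elapsed time, so under half is used. */
    return (s->end - now) > (now - s->start);
}

/* Keep idling only if the queue drained at or past the half-slice mark. */
static bool keep_idling(const struct slice_model *s, unsigned long drained_at)
{
    return !under_half_slice_used(s, drained_at);
}
```

For a slice running from tick 100 to tick 200, a queue that drains at tick 130 would stop idling, while one that drains at tick 170 would keep it.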
    • cfq-iosched: schedule dispatch for noidle queue · d2d59e18
      Shaohua Li committed
      A queue is idle at cfq_dispatch_requests(), but it becomes noidle later. Unless
      another task explicitly does an unplug or all requests are drained, we will not
      deliver requests to the disk even if cfq_arm_slice_timer() doesn't make the
      queue idle. For example, cfq_should_idle() returns true because
      service_tree->count == 1, and then other queues are added. Note, I haven't
      seen obvious performance impacts so far with the patch, but just thought
      this could be a problem.
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • cfq-iosched: do cleanup · c1e44756
      Shaohua Li committed
      Some functions should return boolean.
      Signed-off-by: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  21. 02 November 2010, 1 commit
  22. 22 October 2010, 1 commit
    • cfq-iosched: Fix a gcc 4.5 warning and put some comments · b4627321
      Vivek Goyal committed
      - Andi encountered the following warning with gcc 4.5
      
        linux/block/cfq-iosched.c: In function ‘cfq_dispatch_requests’:
        linux/block/cfq-iosched.c:2156:3: warning: array subscript is above array
        bounds
      
      - The warning happens due to the following code.
      
        slice = group_slice * count /
      		max_t(unsigned, cfqg->busy_queues_avg[cfqd->serving_prio],
      		cfq_group_busy_queues_wl(cfqd->serving_prio, cfqd, cfqg));
      
        gcc is complaining about cfqg->busy_queues_avg[] being indexed by CFQ
        prio classes (RT, BE and IDLE) while the array size is only 2.
      
      - At run time, we never access cfqg->busy_queues_avg[IDLE] and return from
        the function before this code is hit.
      
      - To fix the warning, increase the array size, though the extra slot will remain
        unused. This patch also adds some comments to clarify some points of confusion.
      
      - I have taken Jens's patch and modified it a bit.
      
      - Compile tested with gcc 4.4 and boot tested. I don't have gcc 4.5
        running. Andi, can you please test it with gcc 4.5 to make sure it
        works?
      Reported-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
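The shape of the fix, sizing the array by the number of priority classes so that every index the compiler can see is in bounds, might look like the sketch below. The enum, `group_model`, `max_unsigned()`, and `scaled_slice()` are hypothetical illustrations; the zero-divisor guard is an addition for this example and is not taken from the commit.

```c
#include <stddef.h>

/* Hypothetical priority classes mirroring the RT, BE and IDLE classes above. */
enum prio_class { PRIO_BE = 0, PRIO_RT = 1, PRIO_IDLE = 2, PRIO_NR };

struct group_model {
    /* One slot per class, including IDLE, whose slot stays unused at run time. */
    unsigned busy_queues_avg[PRIO_NR];
};

/* Stand-in for max_t(unsigned, a, b). */
static unsigned max_unsigned(unsigned a, unsigned b)
{
    return a > b ? a : b;
}

/* Scale the group slice by queue count over the larger busy-queue figure. */
static unsigned scaled_slice(const struct group_model *g, enum prio_class prio,
                             unsigned group_slice, unsigned count,
                             unsigned busy_now)
{
    unsigned divisor = max_unsigned(g->busy_queues_avg[prio], busy_now);

    /* Guard added for this sketch only: avoid dividing by zero. */
    return divisor ? group_slice * count / divisor : 0;
}
```

Because the array has `PRIO_NR` entries, indexing it with any `enum prio_class` value stays within bounds even on paths that never execute.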
  23. 01 October 2010, 1 commit
    • blkio: Recalculate the throttled bio dispatch time upon throttle limit change · fe071437
      Vivek Goyal committed
      o Currently any cgroup throttle limit changes are processed asynchronously and
        the change does not take effect until a new bio is dispatched from the same group.
      
      o It might happen that a user sets a ridiculously low limit on throttling,
        say 1 byte per second on reads. In such cases simple operations like mounting
        a disk can wait for a very long time.
      
      o Once a bio is throttled, there is no easy way to come out of that wait even if
        the user increases the read limit later.
      
      o This patch fixes it. Now if a user changes the cgroup limits, we recalculate
        the bio dispatch time according to the new limits.
      
      o Can't take the queue lock under blkcg_lock, hence after the change I wake
        up the dispatch thread again, which recalculates the time. So there are some
        variables being synchronized across two threads without a lock and I had to
        make use of barriers. Hoping I have used barriers correctly. Any review of
        the memory barrier code especially will help.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
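The effect of recalculating the dispatch time when a limit changes can be shown with simple arithmetic. This is a deliberately simplified sketch: `throttled_bio_model`, `wait_ms()`, and `set_dispatch_time()` are hypothetical, the model ignores bytes already charged to the group, and it does not attempt to show the locking or memory barriers the commit mentions.

```c
#include <stdint.h>

/* Hypothetical record of one throttled bio. */
struct throttled_bio_model {
    uint64_t bytes;           /* size of the pending bio */
    uint64_t queued_at_ms;    /* time at which the bio was throttled */
    uint64_t dispatch_at_ms;  /* computed dispatch time */
};

/* Milliseconds needed to pass `bytes` at `bps`, rounded up; bps must be nonzero. */
static uint64_t wait_ms(uint64_t bytes, uint64_t bps)
{
    return (bytes * 1000 + bps - 1) / bps;
}

/* Called when the bio is first throttled and again whenever the limit changes. */
static void set_dispatch_time(struct throttled_bio_model *b, uint64_t bps)
{
    b->dispatch_at_ms = b->queued_at_ms + wait_ms(b->bytes, bps);
}
```

Under a limit of 1 byte per second a 4096-byte bio would wait 4,096,000 ms; calling `set_dispatch_time()` again after raising the limit to 1 MiB per second brings that down to 4 ms, instead of leaving the bio stuck with the old deadline.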
  24. 21 September 2010, 1 commit
    • cfq-iosched: fix a kernel OOPs when usb key is inserted · 180be2a0
      Vivek Goyal committed
      Mike reported a kernel crash when a USB key hotplug is performed while the
      kernel threads are not in the root cgroup and are running in one of the child
      cgroups of the blkio controller.
      
      	BUG: unable to handle kernel NULL pointer dereference at 0000002c
      	IP: [<c11c7b08>] cfq_get_queue+0x232/0x412
      	*pde = 00000000
      	Oops: 0000 [#1] PREEMPT
      	last sysfs file: /sys/devices/pci0000:00/0000:00:1d.7/usb2/2-1/2-1:1.0/host3/scsi_host/host3/uevent
      
      	[..]
      	Pid: 30039, comm: scsi_scan_3 Not tainted 2.6.35.2-fg.roam #1 Volvi2                         /Aspire 4315
      	EIP: 0060:[<c11c7b08>] EFLAGS: 00010086 CPU: 0
      	EIP is at cfq_get_queue+0x232/0x412
      	EAX: f705f9c0 EBX: e977abac ECX: 00000000 EDX: 00000000
      	ESI: f00da400 EDI: f00da4ec EBP: e977a800 ESP: dff8fd00
      	 DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
      	Process scsi_scan_3 (pid: 30039, ti=dff8e000 task=f6b6c9a0 task.ti=dff8e000)
      	Stack:
      	 00000000 00000000 00000001 01ff0000 f00da508 00000000 f00da524 f00da540
      	<0> e7994940 dd631750 f705f9c0 e977a820 e977ac44 f00da4d0 00000001 f6b6c9a0
      	<0> 00000010 00008010 0000000b 00000000 00000001 e977a800 dd76fac0 00000246
      	Call Trace:
      	 [<c11c7f10>] ? cfq_set_request+0x228/0x34c
      	 [<c11c7ce8>] ? cfq_set_request+0x0/0x34c
      	 [<c11bb3b9>] ? elv_set_request+0xf/0x1c
      	 [<c11bdd51>] ? get_request+0x1ad/0x22f
      	 [<c11bddf2>] ? get_request_wait+0x1f/0x11a
      	 [<c11d013b>] ? kvasprintf+0x33/0x3b
      	 [<c127b537>] ? scsi_execute+0x1d/0x103
      	 [<c127b675>] ? scsi_execute_req+0x58/0x83
      	 [<c127c391>] ? scsi_probe_and_add_lun+0x188/0x7c2
      	 [<c12718c6>] ? attribute_container_add_device+0x15/0xfa
      	 [<c11c95d1>] ? kobject_get+0xf/0x13
      	 [<c126d1db>] ? get_device+0x10/0x14
      	 [<c127be93>] ? scsi_alloc_target+0x217/0x24d
      	 [<c127cbd8>] ? __scsi_scan_target+0x95/0x480
      	 [<c10204eb>] ? dequeue_entity+0x14/0x1fe
      	 [<c1020491>] ? update_curr+0x165/0x1ab
      	 [<c1020491>] ? update_curr+0x165/0x1ab
      	 [<c127d00d>] ? scsi_scan_channel+0x4a/0x76
      	 [<c127d0b0>] ? scsi_scan_host_selected+0x77/0xad
      	 [<c127d13c>] ? do_scan_async+0x0/0x11a
      	 [<c127d137>] ? do_scsi_scan_host+0x51/0x56
      	 [<c127d13c>] ? do_scan_async+0x0/0x11a
      	 [<c127d14a>] ? do_scan_async+0xe/0x11a
      	 [<c127d13c>] ? do_scan_async+0x0/0x11a
      	 [<c10354c5>] ? kthread+0x5e/0x63
      	 [<c1035467>] ? kthread+0x0/0x63
      	 [<c1002af6>] ? kernel_thread_helper+0x6/0x10
      	Code: 44 24 1c 54 83 44 24 18 54 83 fa 03 75 94 8b 06 c7 86 64 02 00 00 01 00 00 00 83 e0 03 09 f0 89 06 8b 44 24 28 8b 90 58 01 00 00 <8b> 42 2c 85 c0 75 03 8b 42 08 8d 54 24 48 52 8d 4c 24 50 51 68
      	EIP: [<c11c7b08>] cfq_get_queue+0x232/0x412 SS:ESP 0068:dff8fd00
      	CR2: 000000000000002c
      	---[ end trace 9a88306573f69b12 ]---
      
      The problem here is that we don't have bdi->dev information available when
      thread does some IO.  Hence when dev_name() tries to access bdi->dev, it
      crashes.
      
      This problem does not happen if kernel threads are in root group as root
      group is statically allocated at device initialization time and we don't
      hit this piece of code.
      
      Fix it by delaying the filling of major and minor number information of
      device in blk_group.  Initially a blk_group is created with 0 as device
      information and this information is filled later once some more IO comes
      in from same group.
      Reported-by: Mike Kazantsev <mk.fraggod@gmail.com>
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
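The deferred-fill approach described in this fix can be sketched as a helper that leaves the device field at 0 until a device name is available and retries on later IO. Everything here is illustrative: `blk_group_model`, `pack_dev()`, and `fill_dev_if_ready()` are hypothetical names, and both the "major:minor" name format and the bit packing are assumptions made for the example.

```c
#include <stdio.h>

/* Hypothetical group record; dev == 0 means "not filled in yet". */
struct blk_group_model {
    unsigned dev;
};

/* Assumed packing of major/minor numbers, for illustration only. */
static unsigned pack_dev(unsigned major, unsigned minor)
{
    return (major << 20) | minor;
}

/*
 * Try to fill in the device numbers. dev_name is NULL while the device
 * information is not yet available. Returns 1 once the field is filled.
 */
static int fill_dev_if_ready(struct blk_group_model *g, const char *dev_name)
{
    unsigned major, minor;

    if (g->dev != 0)
        return 1;   /* already filled on an earlier IO */
    if (dev_name == NULL)
        return 0;   /* keep 0 and retry on a later IO */
    if (sscanf(dev_name, "%u:%u", &major, &minor) != 2)
        return 0;   /* name not in the assumed format */

    g->dev = pack_dev(major, minor);
    return 1;
}
```

The first call with a NULL name leaves the group at 0 without dereferencing anything; a later call with a name such as "8:16" fills the field, and further calls become no-ops.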
  25. 20 September 2010, 1 commit
    • cfq: improve fsync performance for small files · 749ef9f8
      Corrado Zoccolo committed
      Fsync performance for small files achieved by cfq on high-end disks is
      lower than what deadline can achieve, due to idling introduced between
      the sync write happening in process context and the journal commit.
      
      Moreover, when competing with a sequential reader, a process writing
      small files and fsync-ing them is starved.
      
      This patch fixes the two problems by:
      - marking journal commits as WRITE_SYNC, so that they get the REQ_NOIDLE
        flag set,
      - force all queues that have REQ_NOIDLE requests to be put in the noidle
        tree.
      
      Having the queue associated with the fsync-ing process and the one associated
      with journal commits in the noidle tree allows:
      - switching between them without idling,
      - fairness vs. competing idling queues, since they will be serviced only
        after the noidle tree expires its slice.
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Tested-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  26. 16 September 2010, 1 commit
  27. 23 August 2010, 4 commits