1. 16 May, 2011 1 commit
    • blk-throttle: Use task_subsys_state() to determine a task's blkio_cgroup · 70087dc3
      Vivek Goyal authored
      Currently we first map the task to a cgroup and then the cgroup to
      blkio_cgroup. There is a more direct way to get to blkio_cgroup
      from the task using task_subsys_state(). Use that.
      
      The real reason for the fix is that it also avoids a race in generic
      cgroup code. During remount/umount rebind_subsystems() is called and
      it can do the following without any RCU protection.
      
      cgrp->subsys[i] = NULL;
      
      That means if somebody got hold of a cgroup under RCU and then tried
      to do cgroup->subsys[] to get to blkio_cgroup, it would get NULL, which
      is wrong. I was running into this race condition with LTP running on an
      upstream-derived kernel, and that led to a crash.
      
      So ideally we should also fix the generic cgroup code to wait for an RCU
      grace period before setting the pointer to NULL. Li Zefan is not very keen
      on introducing synchronize_wait() as he thinks it will slow
      down mount/remount/umount operations.
      
      So for the time being, at least fix the kernel crash by taking a more
      direct route to blkio_cgroup.
      
      One tester had reported a crash while running LTP on a derived kernel,
      and with this fix the crash is no longer seen while the test has been
      running for over 6 days.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
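The difference between the two lookup routes can be sketched in userspace C. This is only an illustration: the structures below are heavily simplified stand-ins for the kernel's types, and `lookup_via_cgroup()` and `lookup_direct()` are hypothetical names, not kernel functions. It assumes the direct route reads the per-task subsystem array, which is what task_subsys_state() is described as doing above.

```c
#include <stddef.h>

/* Simplified stand-ins; the real kernel structures carry far more state. */
enum { BLKIO_SUBSYS_ID = 0, SUBSYS_COUNT = 1 };

struct cgroup;

struct cgroup_subsys_state {
    struct cgroup *cgroup;                      /* back-pointer to the owning cgroup */
};

struct cgroup {
    struct cgroup_subsys_state *subsys[SUBSYS_COUNT];
};

struct css_set {
    struct cgroup_subsys_state *subsys[SUBSYS_COUNT];
};

struct task {
    struct css_set *cgroups;
};

/* Old route (hypothetical name): task -> cgroup -> cgroup->subsys[]. */
struct cgroup_subsys_state *lookup_via_cgroup(const struct task *tsk)
{
    struct cgroup *cgrp = tsk->cgroups->subsys[BLKIO_SUBSYS_ID]->cgroup;

    /* This second hop is the one that can observe a cleared slot. */
    return cgrp->subsys[BLKIO_SUBSYS_ID];
}

/* New route (hypothetical name): task -> per-task subsystem array. */
struct cgroup_subsys_state *lookup_direct(const struct task *tsk)
{
    return tsk->cgroups->subsys[BLKIO_SUBSYS_ID];
}
```

If `cgrp->subsys[BLKIO_SUBSYS_ID]` is set to NULL, as rebind_subsystems() is described as doing, the indirect route returns NULL while the direct route still returns the state attached to the task.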
  2. 06 Apr, 2011 1 commit
  3. 31 Mar, 2011 1 commit
  4. 23 Mar, 2011 1 commit
  5. 10 Mar, 2011 2 commits
  6. 08 Mar, 2011 2 commits
  7. 03 Mar, 2011 1 commit
    • block: Move blk_throtl_exit() call to blk_cleanup_queue() · da527770
      Vivek Goyal authored
      Move blk_throtl_exit() into blk_cleanup_queue(), as blk_throtl_exit() is
      written in such a way that it needs the queue lock. In blk_release_queue()
      there is no guarantee that ->queue_lock is still around.
      
      Initially blk_throtl_exit() was in blk_cleanup_queue() but Ingo reported
      one problem.
      
        https://lkml.org/lkml/2010/10/23/86
      
        And a quick fix moved blk_throtl_exit() to blk_release_queue().
      
              commit 7ad58c02
              Author: Jens Axboe <jaxboe@fusionio.com>
              Date:   Sat Oct 23 20:40:26 2010 +0200
      
              block: fix use-after-free bug in blk throttle code
      
      This patch reverts the above change and does not try to shut down the
      throtl work in blk_sync_queue(). By avoiding the call to
      throtl_shutdown_timer_wq() from blk_sync_queue(), we should also avoid
      the problem reported by Ingo.
      
      blk_sync_queue() seems to be used only by md driver and it seems to be
      using it to make sure q->unplug_fn is not called as md registers its
      own unplug functions and it is about to free up the data structures
      used by unplug_fn(). Block throttle does not call back into unplug_fn()
      or into md. So there is no need to cancel blk throttle work.
      
      In fact I think cancelling block throttle work is bad because it might
      happen that some bios are throttled and scheduled to be dispatched later
      with the help of pending work and if work is cancelled, these bios might
      never be dispatched.
      
      Block layer also uses blk_sync_queue() during blk_cleanup_queue() and
      blk_release_queue() time. That should be safe as we are also calling
      blk_throtl_exit() which should make sure all the throttling related
      data structures are cleaned up.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  8. 02 Mar, 2011 1 commit
    • blk-throttle: Do not use kblockd workqueue for throtl work · 450adcbe
      Vivek Goyal authored
      o Dominik Klein reported a system hang issue while doing some blkio
        throttling testing.
      
        https://lkml.org/lkml/2011/2/24/173
      
      o Some tracing revealed that CFQ was not dispatching any more jobs as
        queue unplug was not happening. And queue unplug was not happening
        because the unplug work was not being called, as there was one throttling
        work item on the same cpu which had not finished yet. And the throttling
        work had not finished as it was trying to dispatch a bio to CFQ, but all
        the request descriptors were consumed, so it was put to sleep.
      
      o So basically it is a cyclic dependency between CFQ unplug work and
        throtl dispatch work. Tejun suggested using a separate workqueue for
        such cases.
      
      o This patch uses a separate workqueue for throttle related work and
        does not rely on kblockd workqueue anymore.
      
      Cc: stable@kernel.org
      Reported-by: Dominik Klein <dk@in-telegence.net>
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  9. 19 Jan, 2011 1 commit
  10. 02 Dec, 2010 2 commits
    • blk-throttle: Correct the placement of smp_rmb() · 04a6b516
      Vivek Goyal authored
      o I was discussing which variables are updated without the spin lock and
        why we need barriers, and Oleg pointed out that the smp_rmb() should be
        placed between the reads of td->limits_changed and tg->limits_changed.
        This patch fixes it.
      
      o Following is one possible sequence of events. Say cpu0 is executing
        throtl_update_blkio_group_read_bps() and cpu1 is executing
        throtl_process_limit_change().
      
       cpu0                                                cpu1
      
       tg->limits_changed = true;
       smp_mb__before_atomic_inc();
       atomic_inc(&td->limits_changed);
      
                                           if (!atomic_read(&td->limits_changed))
                                                   return;
      
                                           if (tg->limits_changed)
                                                   do_something;
      
       If cpu0 has updated tg->limits_changed and td->limits_changed, we want to
       make sure that if the update to td->limits_changed is visible on cpu1, then
       the update to tg->limits_changed is also visible.
      
       Oleg pointed out that to ensure this, we need to insert an smp_rmb()
       between the td->limits_changed read and the tg->limits_changed read.
      
      o I had erroneously put smp_rmb() before atomic_read(&td->limits_changed).
        This patch fixes it.
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
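The ordering requirement in the cpu0/cpu1 sequence above can be approximated in portable C11. This is a sketch under assumptions, not the kernel code: the `_sketch` types and both function names are hypothetical, a release fence stands in for smp_mb__before_atomic_inc(), and an acquire fence stands in for smp_rmb(). It models a single publish; the kernel's own barrier primitives have their own semantics.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, minimal stand-ins for the per-group and per-queue state. */
struct throtl_grp_sketch {
    bool limits_changed;
};

struct throtl_data_sketch {
    atomic_int limits_changed;
};

/* cpu0 side: mark the group first, then bump the queue-wide counter. */
void publish_limit_change(struct throtl_data_sketch *td,
                          struct throtl_grp_sketch *tg)
{
    tg->limits_changed = true;
    /* Stand-in for smp_mb__before_atomic_inc(): order the flag store
     * before the counter increment. */
    atomic_thread_fence(memory_order_release);
    atomic_fetch_add_explicit(&td->limits_changed, 1, memory_order_relaxed);
}

/* cpu1 side: read the counter, then the per-group flag. */
bool group_needs_update(struct throtl_data_sketch *td,
                        struct throtl_grp_sketch *tg)
{
    if (!atomic_load_explicit(&td->limits_changed, memory_order_relaxed))
        return false;
    /* Stand-in for smp_rmb(): it sits between the two reads, so a visible
     * counter update implies the flag update is visible too. */
    atomic_thread_fence(memory_order_acquire);
    return tg->limits_changed;
}
```

Placing the read-side fence before the counter load instead would leave the two reads unordered relative to each other, which is the mistake the commit describes.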
    • blk-throttle: Trim/adjust slice_end once a bio has been dispatched · d1ae8ffd
      Vivek Goyal authored
      o During some testing I did the following and noticed that throttling stops working.
      
              - Put a very low limit on a cgroup, say 1 byte per second.
              - Start some reads; this will set slice_end to a very high value.
              - Change the limit to a higher value, say 1MB/s.
              - Now IO unthrottles and finishes as expected.
              - Try to do the read again, but IO is not limited to 1MB/s as expected.
      
      o What is happening:
              - Initially the low limit sets slice_end to a very high value.
              - When the limit is updated, slice_end is not truncated.
              - The very high value of slice_end keeps the existing slice
                valid for a very long time, and a new slice does not start.
              - tg_may_dispatch() is called in blk_throtl_bio(), and trim_slice()
                is not called in this path. So slice_start is some old value and
                in practice we are able to do a huge amount of IO.
      
      o There are many ways this can be fixed. I have fixed it by trying to
        adjust/clean up slice_end in trim_slice(). Generally we extend slices if a
        bio is big and can't be dispatched in one slice. After dispatch of the bio,
        readjust slice_end to make sure we don't end up with huge values.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
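The arithmetic behind the runaway slice_end can be shown with a small model. Everything here is an assumption made for illustration: time is in milliseconds rather than jiffies, the 100 ms slice length is an arbitrary choice, and the three functions are hypothetical helpers rather than the kernel's trim_slice() logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SLICE_MS 100u   /* assumed slice length, chosen only for this sketch */

/* Round a timestamp up to the next slice boundary. */
static uint64_t roundup_slice(uint64_t t_ms)
{
    return (t_ms + SLICE_MS - 1) / SLICE_MS * SLICE_MS;
}

/* Extend the slice far enough for a bio of bio_bytes at a rate of bps. */
uint64_t extended_slice_end(uint64_t slice_start_ms, uint64_t bio_bytes,
                            uint64_t bps)
{
    assert(bps > 0);
    return roundup_slice(slice_start_ms + bio_bytes * 1000 / bps);
}

/* After a dispatch, pull slice_end back to one slice past "now". */
uint64_t trimmed_slice_end(uint64_t now_ms)
{
    return now_ms + SLICE_MS;
}

/* A slice stays valid, and no new slice starts, until slice_end passes. */
bool slice_used_up(uint64_t now_ms, uint64_t slice_end_ms)
{
    return now_ms >= slice_end_ms;
}
```

With a 4096-byte bio at 1 byte per second, `extended_slice_end(0, 4096, 1)` lands more than an hour in the future, so without trimming the old slice would remain valid long after the limit is raised.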
  11. 16 Nov, 2010 1 commit
  12. 02 Oct, 2010 2 commits
  13. 01 Oct, 2010 3 commits
    • blkio-throttle: Fix link failure on i386 · 3aad5d3e
      Vivek Goyal authored
      o Randy Dunlap reported the following linux-next failure. This patch fixes it.
      
      on i386:
      
      blk-throttle.c:(.text+0x1abb8): undefined reference to `__udivdi3'
      blk-throttle.c:(.text+0x1b1dc): undefined reference to `__udivdi3'
      
      o The bytes_per_second interface is 64-bit, and I was continuing to do 64-bit
        division even on 32-bit platforms without the help of special
        macros/functions, hence the failure.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
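On 32-bit targets the compiler lowers a plain 64-bit `/` to a call to `__udivdi3`, which the kernel does not link against, so kernel code uses helpers such as do_div() or div_u64() instead. What such a helper has to do can be sketched with shift-and-subtract long division, which never applies `/` to a 64-bit operand. `div_u64_sketch()` is a hypothetical name for this illustration and is not how the kernel helpers are implemented.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Divide a 64-bit value by a 32-bit divisor one bit at a time, using only
 * shifts, compares and subtraction (no 64-bit '/' or '%').
 */
uint64_t div_u64_sketch(uint64_t dividend, uint32_t divisor,
                        uint32_t *remainder)
{
    uint64_t quotient = 0;
    uint64_t rem = 0;   /* stays below divisor between iterations */

    assert(divisor != 0);

    for (int bit = 63; bit >= 0; bit--) {
        /* Bring down the next bit of the dividend. */
        rem = (rem << 1) | ((dividend >> bit) & 1);
        if (rem >= divisor) {
            rem -= divisor;
            quotient |= (uint64_t)1 << bit;
        }
    }

    if (remainder)
        *remainder = (uint32_t)rem;
    return quotient;
}
```

For example, dividing a byte count by a 32-bit time interval this way keeps the 64-bit bytes_per_second value intact without pulling in libgcc's division routine.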
    • blkio: Recalculate the throttled bio dispatch time upon throttle limit change · fe071437
      Vivek Goyal authored
      o Currently any cgroup throttle limit changes are processed asynchronously, and
        the change does not take effect until a new bio is dispatched from the same group.
      
      o It might happen that a user sets a ridiculously low limit on throttling,
        say 1 byte per second on reads. In such cases simple operations like mounting
        a disk can wait for a very long time.
      
      o Once a bio is throttled, there is no easy way to come out of that wait even if
        the user increases the read limit later.
      
      o This patch fixes it. Now if a user changes the cgroup limits, we recalculate
        the bio dispatch time according to the new limits.
      
      o We can't take the queue lock under blkcg_lock, hence after the change I wake
        up the dispatch thread again, which recalculates the time. So there are some
        variables being synchronized across two threads without a lock, and I had to
        make use of barriers. Hoping I have used barriers correctly. Any review of
        the memory barrier code especially will help.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • blkio: Add root group to td->tg_list · 02977e4a
      Vivek Goyal authored
      o Currently all the dynamically allocated groups, but not the root group, are
        added to td->tg_list. This was not a problem so far, but in the next patch I
        will walk td->tg_list to process any updates of limits on the groups.
        If the root group is not in tg_list, then the root group's updates are not
        processed.
      
      o It is better to add the root group to tg_list as well instead of doing
        special processing for it during limit updates.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  14. 16 Sep, 2010 2 commits