• P
    sched: Cleanup bandwidth timers · 77a4d1a1
    Peter Zijlstra 提交于
    Roman reported a 3 cpu lockup scenario involving __start_cfs_bandwidth().
    
    The more I look at that code the more I'm convinced its crack, that
    entire __start_cfs_bandwidth() thing is brain melting, we don't need to
    cancel a timer before starting it, *hrtimer_start*() will happily remove
    the timer for you if its still enqueued.
    
    Removing that, removes a big part of the problem, no more ugly cancel
    loop to get stuck in.
    
    So now, if I understand things right, the entire reason you have this
    cfs_b->lock guarded ->timer_active nonsense is to make sure we don't
    accidentally lose the timer.
    
    It appears to me that it should be possible to guarantee that same by
    unconditionally (re)starting the timer when !queued. Because regardless
    what hrtimer::function will return, if we beat it to (re)enqueue the
    timer, it doesn't matter.
    
    Now, because hrtimers don't come with any serialization guarantees we
    must ensure both handler and (re)start loop serialize their access to
    the hrtimer to avoid both trying to forward the timer at the same
    time.
    
    Update the rt bandwidth timer to match.
    
    This effectively reverts: 09dc4ab0 ("sched/fair: Fix
    tg_set_cfs_bandwidth() deadlock on rq->lock").
    Reported-by: NRoman Gushchin <klamm@yandex-team.ru>
    Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: NBen Segall <bsegall@google.com>
    Cc: Paul Turner <pjt@google.com>
    Link: http://lkml.kernel.org/r/20150415095011.804589208@infradead.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
    77a4d1a1
rt.c 53.0 KB