• X
    sched/fair: Fix bandwidth timer clock drift condition · 512ac999
    Xunlei Pang 提交于
    I noticed that cgroup task groups constantly get throttled even
    if they have low CPU usage, this causes some jitters on the response
    time to some of our business containers when enabling CPU quotas.
    
    It's very simple to reproduce:
    
      mkdir /sys/fs/cgroup/cpu/test
      cd /sys/fs/cgroup/cpu/test
      echo 100000 > cpu.cfs_quota_us
      echo $$ > tasks
    
    then repeat:
    
      cat cpu.stat | grep nr_throttled  # nr_throttled will increase steadily
    
    After some analysis, we found that cfs_rq::runtime_remaining will
    be cleared by expire_cfs_rq_runtime() due to two equal but stale
    "cfs_{b|q}->runtime_expires" after period timer is re-armed.
    
    The current condition to judge clock drift in expire_cfs_rq_runtime()
    is wrong, the two runtime_expires are actually the same when clock
    drift happens, so this condtion can never hit. The orginal design was
    correctly done by this commit:
    
      a9cf55b2 ("sched: Expire invalid runtime")
    
    ... but was changed to be the current implementation due to its locking bug.
    
    This patch introduces another way, it adds a new field in both structures
    cfs_rq and cfs_bandwidth to record the expiration update sequence, and
    uses them to figure out if clock drift happens (true if they are equal).
    Signed-off-by: NXunlei Pang <xlpang@linux.alibaba.com>
    Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
    Reviewed-by: NBen Segall <bsegall@google.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Fixes: 51f2176d ("sched/fair: Fix unlocked reads of some cfs_b->quota/period")
    Link: http://lkml.kernel.org/r/20180620101834.24455-1-xlpang@linux.alibaba.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
    512ac999
sched.h 57.0 KB