1. 04 12月, 2015 6 次提交
    • F
      sched/cputime: Introduce vtime accounting check for readers · e5925394
      Frederic Weisbecker 提交于
      Readers need to know if vtime runs at all on some CPU somewhere, this
      is a fast-path check to determine if we need to check further the need
      to add up any tickless cputime delta.
      
      This fast path check uses context tracking state because vtime is tied
      to context tracking as of now. This check appears to be confusing though
      so lets use a vtime function that deals with context tracking details
      in vtime implementation instead.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1447948054-28668-7-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e5925394
    • F
      sched/cputime: Rename vtime_accounting_enabled() to vtime_accounting_cpu_enabled() · 55dbdcfa
      Frederic Weisbecker 提交于
      vtime_accounting_enabled() checks if vtime is running on the current CPU
      and is as such a misnomer. Lets rename it to a function that reflect its
      locality. We are going to need the current name for a function that tells
      if vtime runs at all on some CPU.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1447948054-28668-6-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      55dbdcfa
    • F
      sched/cputime: Correctly handle task guest time on housekeepers · cab245d6
      Frederic Weisbecker 提交于
      When a task runs on a housekeeper (a CPU running with the periodic tick
      with neighbours running tickless), it doesn't account cputime using vtime
      but relies on the tick. Such a task has its vtime_snap_whence value set
      to VTIME_INACTIVE.
      
      Readers won't handle that correctly though. As long as vtime is running
      on some CPU, readers incorretly assume that vtime runs on all CPUs and
      always compute the tickless cputime delta, which is only junk on
      housekeepers.
      
      So lets fix this with checking that the target runs on a vtime CPU through
      the appropriate state check before computing the tickless delta.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1447948054-28668-5-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cab245d6
    • F
      sched/cputime: Clarify vtime symbols and document them · 7098c1ea
      Frederic Weisbecker 提交于
      VTIME_SLEEPING state happens either when:
      
      1) The task is sleeping and no tickless delta is to be added on the task
         cputime stats.
      2) The CPU isn't running vtime at all, so the same properties of 1) applies.
      
      Lets rename the vtime symbol to reflect both states.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1447948054-28668-4-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7098c1ea
    • H
      sched/cputime: Remove extra cost in task_cputime() · 7877a0ba
      Hiroshi Shimamoto 提交于
      There is an extra cost in task_cputime() and task_cputime_scaled() when
      nohz_full is not activated. When vtime accounting is not enabled, we
      don't need to get deltas of utime and stime under vtime seqlock.
      
      This patch removes that cost with adding a shortcut route if vtime
      accounting is not enabled.
      
      Use context_tracking_is_enabled() to check if vtime is accounting on
      some cpu, in which case only we need to check the tickless cputime delta.
      Signed-off-by: NHiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1447948054-28668-3-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7877a0ba
    • H
      sched/cputime: Fix invalid gtime in proc · 2541117b
      Hiroshi Shimamoto 提交于
      /proc/stats shows invalid gtime when the thread is running in guest.
      When vtime accounting is not enabled, we cannot get a valid delta.
      The delta is calculated with now - tsk->vtime_snap, but tsk->vtime_snap
      is only updated when vtime accounting is runtime enabled.
      
      This patch makes task_gtime() just return gtime without computing the
      buggy non-existing tickless delta when vtime accounting is not enabled.
      
      Use context_tracking_is_enabled() to check if vtime is accounting on
      some cpu, in which case only we need to check the tickless delta. This
      way we fix the gtime value regression on machines not running nohz full.
      
      The kernel config contains CONFIG_VIRT_CPU_ACCOUNTING_GEN=y and
      CONFIG_NO_HZ_FULL_ALL=n and boot without nohz_full.
      
      I ran and stop a busy loop in VM and see the gtime in host.
      Dump the 43rd field which shows the gtime in every second:
      
      	 # while :; do awk '{print $3" "$43}' /proc/3955/task/4014/stat; sleep 1; done
      	S 4348
      	R 7064566
      	R 7064766
      	R 7064967
      	R 7065168
      	S 4759
      	S 4759
      
      During running busy loop, it returns large value.
      
      After applying this patch, we can see right gtime.
      
      	 # while :; do awk '{print $3" "$43}' /proc/10913/task/10956/stat; sleep 1; done
      	S 5338
      	R 5365
      	R 5465
      	R 5566
      	R 5666
      	S 5726
      	S 5726
      Signed-off-by: NHiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1447948054-28668-2-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2541117b
  2. 01 10月, 2015 1 次提交
  3. 03 8月, 2015 1 次提交
  4. 08 5月, 2015 1 次提交
  5. 03 10月, 2014 1 次提交
  6. 19 9月, 2014 1 次提交
  7. 08 9月, 2014 2 次提交
    • R
      sched, time: Atomically increment stime & utime · eb1b4af0
      Rik van Riel 提交于
      The functions task_cputime_adjusted and thread_group_cputime_adjusted()
      can be called locklessly, as well as concurrently on many different CPUs.
      
      This can occasionally lead to the utime and stime reported by times(), and
      other syscalls like it, going backward. The cause for this appears to be
      multiple threads racing in cputime_adjust(), both with values for utime or
      stime that is larger than the original, but each with a different value.
      
      Sometimes the larger value gets saved first, only to be immediately
      overwritten with a smaller value by another thread.
      
      Using atomic exchange prevents that problem, and ensures time
      progresses monotonically.
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: umgwanakikbuti@gmail.com
      Cc: fweisbec@gmail.com
      Cc: akpm@linux-foundation.org
      Cc: srao@redhat.com
      Cc: lwoodman@redhat.com
      Cc: atheurer@redhat.com
      Cc: oleg@redhat.com
      Link: http://lkml.kernel.org/r/1408133138-22048-4-git-send-email-riel@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      eb1b4af0
    • R
      time, signal: Protect resource use statistics with seqlock · e78c3496
      Rik van Riel 提交于
      Both times() and clock_gettime(CLOCK_PROCESS_CPUTIME_ID) have scalability
      issues on large systems, due to both functions being serialized with a
      lock.
      
      The lock protects against reporting a wrong value, due to a thread in the
      task group exiting, its statistics reporting up to the signal struct, and
      that exited task's statistics being counted twice (or not at all).
      
      Protecting that with a lock results in times() and clock_gettime() being
      completely serialized on large systems.
      
      This can be fixed by using a seqlock around the events that gather and
      propagate statistics. As an additional benefit, the protection code can
      be moved into thread_group_cputime(), slightly simplifying the calling
      functions.
      
      In the case of posix_cpu_clock_get_task() things can be simplified a
      lot, because the calling function already ensures that the task sticks
      around, and the rest is now taken care of in thread_group_cputime().
      
      This way the statistics reporting code can run lockless.
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Daeseok Youn <daeseok.youn@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Guillaume Morin <guillaume@morinfr.org>
      Cc: Ionut Alexa <ionut.m.alexa@gmail.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Li Zefan <lizefan@huawei.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Michal Schmidt <mschmidt@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: umgwanakikbuti@gmail.com
      Cc: fweisbec@gmail.com
      Cc: srao@redhat.com
      Cc: lwoodman@redhat.com
      Cc: atheurer@redhat.com
      Link: http://lkml.kernel.org/r/20140816134010.26a9b572@annuminas.surriel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e78c3496
  8. 20 8月, 2014 1 次提交
  9. 07 5月, 2014 1 次提交
  10. 13 3月, 2014 1 次提交
    • F
      cputime: Fix jiffies based cputime assumption on steal accounting · dee08a72
      Frederic Weisbecker 提交于
      The steal guest time accounting code assumes that cputime_t is based on
      jiffies. So when CONFIG_NO_HZ_FULL=y, which implies that cputime_t
      is based on nsecs, steal_account_process_tick() passes the delta in
      jiffies to account_steal_time() which then accounts it as if it's a
      value in nsecs.
      
      As a result, accounting 1 second of steal time (with HZ=100 that would
      be 100 jiffies) is spuriously accounted as 100 nsecs.
      
      As such /proc/stat may report 0 values of steal time even when two
      guests have run concurrently for a few seconds on the same host and
      same CPU.
      
      In order to fix this, lets convert the nsecs based steal delta to
      cputime instead of jiffies by using the right conversion API.
      
      Given that the steal time is stored in cputime_t and this type can have
      a smaller granularity than nsecs, we only account the rounded converted
      value and leave the remaining nsecs for the next deltas.
      Reported-by: NHuiqingding <huding@redhat.com>
      Reported-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      dee08a72
  11. 09 2月, 2014 1 次提交
  12. 04 9月, 2013 1 次提交
    • S
      sched/cputime: Do not scale when utime == 0 · 5a8e01f8
      Stanislaw Gruszka 提交于
      scale_stime() silently assumes that stime < rtime, otherwise
      when stime == rtime and both values are big enough (operations
      on them do not fit in 32 bits), the resulting scaling stime can
      be bigger than rtime. In consequence utime = rtime - stime
      results in negative value.
      
      User space visible symptoms of the bug are overflowed TIME
      values on ps/top, for example:
      
       $ ps aux | grep rcu
       root         8  0.0  0.0      0     0 ?        S    12:42   0:00 [rcuc/0]
       root         9  0.0  0.0      0     0 ?        S    12:42   0:00 [rcub/0]
       root        10 62422329  0.0  0     0 ?        R    12:42 21114581:37 [rcu_preempt]
       root        11  0.1  0.0      0     0 ?        S    12:42   0:02 [rcuop/0]
       root        12 62422329  0.0  0     0 ?        S    12:42 21114581:35 [rcuop/1]
       root        10 62422329  0.0  0     0 ?        R    12:42 21114581:37 [rcu_preempt]
      
      or overflowed utime values read directly from /proc/$PID/stat
      
      Reference:
      
        https://lkml.org/lkml/2013/8/20/259Reported-and-tested-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com>
      Cc: stable@vger.kernel.org
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: http://lkml.kernel.org/r/20130904131602.GC2564@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5a8e01f8
  13. 16 8月, 2013 1 次提交
  14. 14 8月, 2013 6 次提交
    • F
      vtime: Always debug check snapshot source _before_ updating it · af2350bd
      Frederic Weisbecker 提交于
      The vtime delta update performed by get_vtime_delta() always check
      that the source of the snapshot is valid.
      
      Meanhile the snapshot updaters that rely on get_vtime_delta() also
      set the new snapshot origin. But some of them do this right before
      the call to get_vtime_delta(), making its debug check useless.
      
      This is easily fixable by moving the snapshot origin update after
      the call to get_vtime_delta(). The order doesn't matter there.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      af2350bd
    • F
      vtime: Always scale generic vtime accounting results · b854fafa
      Frederic Weisbecker 提交于
      The cputime accounting in full dynticks can be a subtle
      mixup of CPUs using tick based accounting and others using
      generic vtime.
      
      As long as the tick can have a share on producing these stats, we
      want to scale the result against CFS precise accounting as the tick
      can miss some task hiding between the periodic interrupt.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      b854fafa
    • F
      vtime: Optimize full dynticks accounting off case with static keys · b0493406
      Frederic Weisbecker 提交于
      If no CPU is in the full dynticks range, we can avoid the full
      dynticks cputime accounting through generic vtime along with its
      overhead and use the traditional tick based accounting instead.
      
      Let's do this and nope the off case with static keys.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      b0493406
    • F
      vtime: Fix racy cputime delta update · 54461562
      Frederic Weisbecker 提交于
      get_vtime_delta() must be called under the task vtime_seqlock
      with the code that does the cputime accounting flush.
      
      Otherwise the cputime reader can be fooled and run into
      a race where it sees the snapshot update but misses the
      cputime flush. As a result it can report a cputime that is
      way too short.
      
      Fix vtime_account_user() that wasn't complying to that rule.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      54461562
    • F
      vtime: Remove a few unneeded generic vtime state checks · 7621d1f8
      Frederic Weisbecker 提交于
      Some generic vtime APIs check if the vtime accounting
      is enabled on the local CPU before doing their work.
      
      Some of these are not needed because all their callers already
      take care of that. Let's remove the checks on these.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      7621d1f8
    • F
      context_tracking: Optimize guest APIs off case with static key · 48d6a816
      Frederic Weisbecker 提交于
      Optimize guest entry/exit APIs with static keys. This minimize
      the overhead for those who enable CONFIG_NO_HZ_FULL without
      always using it. Having no range passed to nohz_full= should
      result in the probes overhead to be minimized.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      48d6a816
  15. 13 8月, 2013 1 次提交
    • F
      vtime: Update a few comments · 5b206d48
      Frederic Weisbecker 提交于
      Update a stale comment from the old vtime era and document some
      locking that might be non obvious.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      5b206d48
  16. 31 5月, 2013 1 次提交
    • F
      vtime: Use consistent clocks among nohz accounting · 45eacc69
      Frederic Weisbecker 提交于
      While computing the cputime delta of dynticks CPUs,
      we are mixing up clocks of differents natures:
      
      * local_clock() which takes care of unstable clock
      sources and fix these if needed.
      
      * sched_clock() which is the weaker version of
      local_clock(). It doesn't compute any fixup in case
      of unstable source.
      
      If the clock source is stable, those two clocks are the
      same and we can safely compute the difference against
      two random points.
      
      Otherwise it results in random deltas as sched_clock()
      can randomly drift away, back or forward, from local_clock().
      
      As a consequence, some strange behaviour with unstable tsc
      has been observed such as non progressing constant zero cputime.
      (The 'top' command showing no load).
      
      Fix this by only using local_clock(), or its irq safe/remote
      equivalent, in vtime code.
      Reported-by: NMike Galbraith <efault@gmx.de>
      Suggested-by: NMike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      45eacc69
  17. 28 5月, 2013 1 次提交
  18. 01 5月, 2013 3 次提交
  19. 10 4月, 2013 1 次提交
  20. 08 4月, 2013 1 次提交
  21. 14 3月, 2013 1 次提交
    • F
      sched: Lower chances of cputime scaling overflow · d9a3c982
      Frederic Weisbecker 提交于
      Some users have reported that after running a process with
      hundreds of threads on intensive CPU-bound loads, the cputime
      of the group started to freeze after a few days.
      
      This is due to how we scale the tick-based cputime against
      the scheduler precise execution time value.
      
      We add the values of all threads in the group and we multiply
      that against the sum of the scheduler exec runtime of the whole
      group.
      
      This easily overflows after a few days/weeks of execution.
      
      A proposed solution to solve this was to compute that multiplication
      on stime instead of utime:
         62188451
         ("cputime: Avoid multiplication overflow on utime scaling")
      
      The rationale behind that was that it's easy for a thread to
      spend most of its time in userspace under intensive CPU-bound workload
      but it's much harder to do CPU-bound intensive long run in the kernel.
      
      This postulate got defeated when a user recently reported he was still
      seeing cputime freezes after the above patch. The workload that
      triggers this issue relates to intensive networking workloads where
      most of the cputime is consumed in the kernel.
      
      To reduce much more the opportunities for multiplication overflow,
      lets reduce the multiplication factors to the remainders of the division
      between sched exec runtime and cputime. Assuming the difference between
      these shouldn't ever be that large, it could work on many situations.
      
      This gets the same results as in the upstream scaling code except for
      a small difference: the upstream code always rounds the results to
      the nearest integer not greater to what would be the precise result.
      The new code rounds to the nearest integer either greater or not
      greater. In practice this difference probably shouldn't matter but
      it's worth mentioning.
      
      If this solution appears not to be enough in the end, we'll
      need to partly revert back to the behaviour prior to commit
           0cf55e1e
           ("sched, cputime: Introduce thread_group_times()")
      
      Back then, the scaling was done on exit() time before adding the cputime
      of an exiting thread to the signal struct. And then we'll need to
      scale one-by-one the live threads cputime in thread_group_cputime(). The
      drawback may be a slightly slower code on exit time.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      d9a3c982
  22. 08 3月, 2013 1 次提交
    • F
      cputime: Dynamically scale cputime for full dynticks accounting · 9fbc42ea
      Frederic Weisbecker 提交于
      The full dynticks cputime accounting is able to account either
      using the tick or the context tracking subsystem. This way
      the housekeeping CPU can keep the low overhead tick based
      solution.
      
      This latter mode has a low jiffies resolution granularity and
      need to be scaled against CFS precise runtime accounting to
      improve its result. We are doing this for CONFIG_TICK_CPU_ACCOUNTING,
      now we also need to expand it to full dynticks accounting dynamic
      off-case as well.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Mats Liljegren <mats.liljegren@enea.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      9fbc42ea
  23. 24 2月, 2013 1 次提交
    • F
      cputime: Use local_clock() for full dynticks cputime accounting · 7f6575f1
      Frederic Weisbecker 提交于
      Running the full dynticks cputime accounting with preemptible
      kernel debugging trigger the following warning:
      
      	[    4.488303] BUG: using smp_processor_id() in preemptible [00000000] code: init/1
      	[    4.490971] caller is native_sched_clock+0x22/0x80
      	[    4.493663] Pid: 1, comm: init Not tainted 3.8.0+ #13
      	[    4.496376] Call Trace:
      	[    4.498996]  [<ffffffff813410eb>] debug_smp_processor_id+0xdb/0xf0
      	[    4.501716]  [<ffffffff8101e642>] native_sched_clock+0x22/0x80
      	[    4.504434]  [<ffffffff8101db99>] sched_clock+0x9/0x10
      	[    4.507185]  [<ffffffff81096ccd>] fetch_task_cputime+0xad/0x120
      	[    4.509916]  [<ffffffff81096dd5>] task_cputime+0x35/0x60
      	[    4.512622]  [<ffffffff810f146e>] acct_update_integrals+0x1e/0x40
      	[    4.515372]  [<ffffffff8117d2cf>] do_execve_common+0x4ff/0x5c0
      	[    4.518117]  [<ffffffff8117cf14>] ? do_execve_common+0x144/0x5c0
      	[    4.520844]  [<ffffffff81867a10>] ? rest_init+0x160/0x160
      	[    4.523554]  [<ffffffff8117d457>] do_execve+0x37/0x40
      	[    4.526276]  [<ffffffff810021a3>] run_init_process+0x23/0x30
      	[    4.528953]  [<ffffffff81867aac>] kernel_init+0x9c/0xf0
      	[    4.531608]  [<ffffffff8188356c>] ret_from_fork+0x7c/0xb0
      
      We use sched_clock() to perform and fixup the cputime
      accounting. However we are calling it with preemption enabled
      from the read side, which trigger the bug above.
      
      To fix this up, use local_clock() instead. It takes care of
      preemption and also provide a more reliable clock source. This
      is welcome for this kind of statistic that is widely relied on
      in userspace.
      Reported-by: NThomas Gleixner <tglx@linutronix.de>
      Reported-by: NIngo Molnar <mingo@kernel.org>
      Suggested-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Link: http://lkml.kernel.org/r/1361636925-22288-3-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7f6575f1
  24. 19 2月, 2013 1 次提交
  25. 28 1月, 2013 3 次提交
    • F
      cputime: Safely read cputime of full dynticks CPUs · 6a61671b
      Frederic Weisbecker 提交于
      While remotely reading the cputime of a task running in a
      full dynticks CPU, the values stored in utime/stime fields
      of struct task_struct may be stale. Its values may be those
      of the last kernel <-> user transition time snapshot and
      we need to add the tickless time spent since this snapshot.
      
      To fix this, flush the cputime of the dynticks CPUs on
      kernel <-> user transition and record the time / context
      where we did this. Then on top of this snapshot and the current
      time, perform the fixup on the reader side from task_times()
      accessors.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      [fixed kvm module related build errors]
      Signed-off-by: NSedat Dilek <sedat.dilek@gmail.com>
      6a61671b
    • F
      kvm: Prepare to add generic guest entry/exit callbacks · c11f11fc
      Frederic Weisbecker 提交于
      Do some ground preparatory work before adding guest_enter()
      and guest_exit() context tracking callbacks. Those will
      be later used to read the guest cputime safely when we
      run in full dynticks mode.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      c11f11fc
    • F
      cputime: Use accessors to read task cputime stats · 6fac4829
      Frederic Weisbecker 提交于
      This is in preparation for the full dynticks feature. While
      remotely reading the cputime of a task running in a full
      dynticks CPU, we'll need to do some extra-computation. This
      way we can account the time it spent tickless in userspace
      since its last cputime snapshot.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      6fac4829