1. 01 Feb, 2017 (3 commits)
  2. 14 Jan, 2017 (3 commits)
  3. 15 Nov, 2016 (3 commits)
  4. 30 Sep, 2016 (4 commits)
  5. 18 Aug, 2016 (3 commits)
    • sched/cputime: Improve scalability by not accounting thread group tasks pending runtime · a1eb1411
      Committed by Stanislaw Gruszka
      Commit:
      
        d670ec13 ("posix-cpu-timers: Cure SMP wobbles")
      
      started accounting thread group tasks pending runtime in thread_group_cputime().
      
      Another commit:
      
        6e998916 ("sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency")
      
      updated the scheduler runtime statistics (by calling update_curr()) when reading a task's
      pending runtime. Those changes cause poor performance of the SYS_times() and
      SYS_clock_gettime(CLOCK_PROCESS_CPUTIME_ID) syscalls, especially on
      larger systems with many CPUs.
      
      While we would like to keep cpuclock monotonicity, i.e. keep the
      problems fixed by the above commits fixed, we would also like to have
      good performance.
      
      However, the change from commit d670ec13 is no longer needed to
      solve the problem it addressed, because of the change from the second
      commit, 6e998916, and that gives us room for optimization. Since we
      update a task while reading its pending runtime in task_sched_runtime(),
      clock_gettime(CLOCK_PROCESS_CPUTIME_ID) will see the updated values, and
      on the testcase from d670ec13 the process cpuclock will not be
      smaller than the thread cpuclock.
      
      I tested the patch on the testcases from commits d670ec13 and
      6e998916, as well as some other cpuclock/cputimer testcases, and
      did not find cpuclock monotonicity problems or other malfunctions; a small
      user-space check of this property is sketched below.
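      For reference, the cpuclock monotonicity property discussed above can be
      exercised from user space with a small check of roughly this shape (a
      hypothetical reconstruction in the spirit of the d670ec13 testcase, not
      the original test program):

        /* Build with: gcc -O2 -pthread check.c */
        #include <pthread.h>
        #include <stdio.h>
        #include <time.h>

        static void *burn(void *arg)
        {
            volatile unsigned long i = 0;

            (void)arg;
            while (1)
                i++;            /* burn CPU so cputime keeps advancing */
            return NULL;
        }

        int main(void)
        {
            pthread_t workers[4];
            clockid_t thread_clock;
            struct timespec proc, thr;

            for (int i = 0; i < 4; i++)
                pthread_create(&workers[i], NULL, burn, NULL);

            pthread_getcpuclockid(pthread_self(), &thread_clock);

            for (int i = 0; i < 100000; i++) {
                /* Read the thread clock first, the process clock second: the
                 * later, wider-scope reading must never be the smaller one. */
                clock_gettime(thread_clock, &thr);
                clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &proc);

                if (proc.tv_sec < thr.tv_sec ||
                    (proc.tv_sec == thr.tv_sec && proc.tv_nsec < thr.tv_nsec)) {
                    printf("non-monotonic: process cpuclock < thread cpuclock\n");
                    return 1;
                }
            }
            printf("ok: process cpuclock never read below thread cpuclock\n");
            return 0;
        }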
      
      This patch has the drawback that the thread group cputime we provide is not
      up to date to the very last moment. For example, when arming a cputime timer,
      we will arm it with possibly slightly outdated values and that timer will
      trigger earlier compared to the behaviour without the patch. However, that
      was the behaviour before commit d670ec13 (kernel v3.1), so it is
      unlikely to affect applications.
      
      The patch improves the performance of the related syscalls, as measured by
      Giovanni's benchmarks described in commit:
      
        6075620b ("sched/cputime: Mitigate performance regression in times()/clock_gettime()")
      
      The benchmark results are:
      
      SYS_clock_gettime():
      
        threads    4.7-rc7     3.18-rc3              4.7-rc7 + prefetch    4.7-rc7 + patch
                               (pre-6e998916)
        2          3.48        2.23 ( 35.68%)        3.06 ( 11.83%)        1.08 ( 68.81%)
        5          3.33        2.83 ( 14.84%)        3.25 (  2.40%)        0.71 ( 78.55%)
        8          3.37        2.84 ( 15.80%)        3.26 (  3.30%)        0.56 ( 83.49%)
        12         3.32        3.09 (  6.69%)        3.37 ( -1.60%)        0.42 ( 87.28%)
        21         4.01        3.14 ( 21.70%)        3.90 (  2.74%)        0.35 ( 91.35%)
        30         3.63        3.28 (  9.75%)        3.36 (  7.41%)        0.28 ( 92.23%)
        48         3.71        3.02 ( 18.69%)        3.11 ( 16.27%)        0.39 ( 89.39%)
        79         3.75        2.88 ( 23.23%)        3.16 ( 15.74%)        0.46 ( 87.76%)
        110        3.81        2.95 ( 22.62%)        3.25 ( 14.80%)        0.56 ( 85.41%)
        128        3.88        3.05 ( 21.28%)        3.31 ( 14.76%)        0.62 ( 84.10%)
      
      SYS_times():
      
        threads    4.7-rc7     3.18-rc3              4.7-rc7 + prefetch    4.7-rc7 + patch
                               (pre-6e998916)
        2          3.65        2.27 ( 37.94%)        3.25 ( 11.03%)        1.62 ( 55.71%)
        5          3.45        2.78 ( 19.34%)        3.17 (  7.92%)        2.33 ( 32.28%)
        8          3.52        2.79 ( 20.66%)        3.22 (  8.69%)        2.06 ( 41.44%)
        12         3.29        3.02 (  8.33%)        3.36 ( -2.04%)        2.00 ( 39.18%)
        21         4.07        3.10 ( 23.86%)        3.92 (  3.78%)        2.07 ( 49.18%)
        30         3.87        3.33 ( 13.80%)        3.40 ( 12.17%)        1.89 ( 51.12%)
        48         3.79        2.96 ( 21.94%)        3.16 ( 16.61%)        1.69 ( 55.46%)
        79         3.88        2.88 ( 25.82%)        3.28 ( 15.42%)        1.60 ( 58.81%)
        110        3.90        2.98 ( 23.73%)        3.38 ( 13.35%)        1.73 ( 55.61%)
        128        4.00        3.10 ( 22.40%)        3.38 ( 15.45%)        1.66 ( 58.52%)
      Reported-and-tested-by: Giovanni Gherdovich <ggherdovich@suse.cz>
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <mgalbraith@suse.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/20160817093043.GA25206@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a1eb1411
    • sched/cputime: Resync steal time when guest & host lose sync · 03cbc732
      Committed by Wanpeng Li
      Commit:
      
        57430218 ("sched/cputime: Count actually elapsed irq & softirq time")
      
      ... fixed a bug but also triggered a regression:
      
      On an i5 laptop with 4 pCPUs and 4 vCPUs for one full-dynticks guest, with four
      CPU-hog processes (busy loops) running in the guest, I hot-unplug the pCPUs
      on the host one by one until only one is left, then observe CPU utilization
      via 'top' in the guest. It shows:
      
        100% st for cpu0 (housekeeping)
         75% st for other CPUs (nohz full mode)
      
      However, w/o this commit it shows the correct 75% for all four CPUs.
      
      When a guest is interrupted for a longer amount of time, missed clock ticks
      are not redelivered later. Because of that, we should not limit the amount
      of steal time accounted to the amount of time that the calling functions
      think has passed.
      
      However, the interval returned by account_other_time() is NOT rounded down
      to the nearest jiffy, while the base interval it is subtracted from in
      get_vtime_delta() is, so the max cputime limit is required there to avoid underflow.
      
      This patch fixes the regression by limiting the account_other_time() call
      from get_vtime_delta() to avoid underflow, and lets the other three call sites
      (in account_other_time() and steal_account_process_time()) account however
      much steal time the host told us elapsed, as sketched below.
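      A minimal sketch of the resulting scheme (simplified, not the verbatim
      kernel code; it assumes account_other_time() caps the steal/irq/softirq
      time it accounts at the limit passed to it):

        static cputime_t get_vtime_delta(struct task_struct *tsk)
        {
            unsigned long now = READ_ONCE(jiffies);
            cputime_t delta, other;

            /* The base interval is rounded down to jiffy granularity... */
            delta = jiffies_to_cputime(now - tsk->vtime_snap);
            /* ...so cap the "other" (steal/irq/softirq) time at delta,
             * which makes the subtraction below underflow-proof. */
            other = account_other_time(delta);
            tsk->vtime_snap = now;

            return delta - other;
        }

        /* The tick-based call sites, by contrast, pass no artificial cap,
         * e.g. steal_account_process_time(ULONG_MAX), and so account however
         * much steal time the host reports. */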
      Suggested-by: Rik van Riel <riel@redhat.com>
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm@vger.kernel.org
      Link: http://lkml.kernel.org/r/1471399546-4069-1-git-send-email-wanpeng.li@hotmail.com
      [ Improved the changelog. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      03cbc732
    • sched/cputime: Fix NO_HZ_FULL getrusage() monotonicity regression · 173be9a1
      Committed by Peter Zijlstra
      Mike reports:
      
       Roughly 10% of the time, ltp testcase getrusage04 fails:
       getrusage04    0  TINFO  :  Expected timers granularity is 4000 us
       getrusage04    0  TINFO  :  Using 1 as multiply factor for max [us]time increment (1000+4000us)!
       getrusage04    0  TINFO  :  utime:           0us; stime:         179us
       getrusage04    0  TINFO  :  utime:        3751us; stime:           0us
       getrusage04    1  TFAIL  :  getrusage04.c:133: stime increased > 5000us:
      
      And tracked it down to the case where the task simply doesn't get
      _any_ [us]time ticks.
      
      Update the code to assume all rtime is utime when we lack the information to
      split it, thus ensuring that a task which elides the tick still gets its time
      accounted; a simplified model follows.
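      A simplified model of the adjusted split (a hypothetical helper written
      for illustration; the kernel does this inside cputime_adjust(), using an
      overflow-safe scale_stime()):

        /* Given the tick-sampled utime/stime and the precise rtime, return a
         * split of rtime that does not get stuck when one sample is missing. */
        static void adjust_cputime(u64 tick_utime, u64 tick_stime, u64 rtime,
                                   u64 *utime, u64 *stime)
        {
            if (tick_stime == 0) {
                /* No system-time ticks seen (e.g. the tick was elided):
                 * attribute all of rtime to user time. */
                *utime = rtime;
                *stime = 0;
                return;
            }
            if (tick_utime == 0) {
                /* No user-time ticks seen: all of rtime is system time. */
                *stime = rtime;
                *utime = 0;
                return;
            }
            /* Otherwise split rtime in the observed stime:utime ratio.
             * (Naive arithmetic here; the kernel guards against overflow.) */
            *stime = div64_u64(rtime * tick_stime, tick_utime + tick_stime);
            *utime = rtime - *stime;
        }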
      Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
      Tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Fredrik Markstrom <fredrik.markstrom@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: stable@vger.kernel.org # 4.3+
      Fixes: 9d7fb042 ("sched/cputime: Guarantee stime + utime == rtime")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      173be9a1
  6. 11 Aug, 2016 (2 commits)
  7. 14 Jul, 2016 (4 commits)
    • sched/cputime: Drop local_irq_save/restore from irqtime_account_irq() · 553bf6bb
      Committed by Rik van Riel
      Paolo pointed out that irqs are already blocked when irqtime_account_irq()
      is called. That means there is no reason to call local_irq_save/restore()
      again.
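      For illustration only (hypothetical helper and per-CPU variables, not the
      kernel function): when a function is guaranteed to run with IRQs already
      disabled, wrapping its body in local_irq_save()/local_irq_restore() is
      pure overhead.

        static DEFINE_PER_CPU(u64, irqtime_start);   /* hypothetical per-CPU state */
        static DEFINE_PER_CPU(u64, irqtime_total);

        /* Called only from the irq_enter()/irq_exit() paths, i.e. with
         * interrupts already disabled, so the per-CPU state can be updated
         * directly without disabling them again. */
        static void account_irq_time(u64 now)
        {
            u64 delta = now - __this_cpu_read(irqtime_start);

            __this_cpu_write(irqtime_start, now);
            __this_cpu_add(irqtime_total, delta);
        }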
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1468421405-20056-6-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      553bf6bb
    • sched/cputime: Clean up the old vtime gen irqtime accounting completely · 0cfdf9a1
      Committed by Frederic Weisbecker
      Vtime generic irqtime accounting has been removed but there are a few
      remnants to clean up:
      
      * The vtime_accounting_cpu_enabled() check in irq entry was only used
        by CONFIG_VIRT_CPU_ACCOUNTING_GEN. We can safely remove it.
      
      * Without the vtime_accounting_cpu_enabled() check, we no longer need
        the vtime_common_account_irq_enter() indirection.

      * Move the vtime_account_irq_enter() implementation under
        CONFIG_VIRT_CPU_ACCOUNTING_NATIVE, which is its last user.
      
      * The vtime_account_user() call was only used on irq entry for
        CONFIG_VIRT_CPU_ACCOUNTING_GEN. We can remove that too.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1468421405-20056-4-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0cfdf9a1
    • sched/cputime: Replace VTIME_GEN irq time code with IRQ_TIME_ACCOUNTING code · b58c3584
      Committed by Rik van Riel
      The CONFIG_VIRT_CPU_ACCOUNTING_GEN irq time tracking code does not
      appear to currently work right.
      
      On CPUs without nohz_full=, only tick-based irq time sampling is
      done, which breaks down when dealing with a nohz_idle CPU.
      
      On firewalls and similar systems, no ticks may happen on a CPU for a
      while, and the irq time spent may never get accounted properly. This
      can cause issues with capacity planning and power saving, which use
      the CPU statistics as inputs in decision making.
      
      Remove the VTIME_GEN vtime irq time code, and replace it with the
      IRQ_TIME_ACCOUNTING code, when selected as a config option by the user.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1468421405-20056-3-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b58c3584
    • sched/cputime: Count actually elapsed irq & softirq time · 57430218
      Committed by Rik van Riel
      Currently, if there was any irq or softirq time during 'ticks'
      jiffies, the entire period will be accounted as irq or softirq
      time.
      
      This is inaccurate if only a subset of the time was actually spent
      handling irqs, and could conceivably mis-count all of the ticks during
      a period as irq time, when there was some irq and some softirq time.
      
      This can actually happen when irqtime_account_process_tick() is called
      from account_idle_ticks(), which can pass a larger number of ticks down
      all at once.
      
      Fix this by changing irqtime_account_hi_update(), irqtime_account_si_update(),
      and steal_account_process_ticks() to work with cputime_t time units, and
      return the amount of time spent in each mode.
      
      Rename steal_account_process_ticks() to steal_account_process_time(), to
      reflect that time is now accounted in cputime_t, instead of ticks.
      
      Additionally, have irqtime_account_process_tick() take into account how
      much time was spent in steal, irq, and softirq time.

      That last change could help improve the accuracy of cputime
      accounting when returning from idle on a NO_HZ_IDLE CPU.

      Properly accounting how much time was spent in hardirq and
      softirq context will also allow the NO_HZ_FULL code to re-use
      these same functions for hardirq and softirq accounting; a
      simplified sketch of the new tick path follows.
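      A minimal sketch of the reworked tick accounting (simplified and not the
      verbatim kernel code; it assumes each helper takes an upper bound in
      cputime_t and returns the time it actually accounted):

        static void irqtime_account_process_tick(struct task_struct *p,
                                                 int user_tick, int ticks)
        {
            cputime_t cputime = jiffies_to_cputime(ticks);
            cputime_t other = 0;

            /* Time that really went to steal, hardirq and softirq during
             * this period, each helper capped by what is left of it. */
            other += steal_account_process_time(cputime);
            other += irqtime_account_hi_update(cputime - other);
            other += irqtime_account_si_update(cputime - other);
            if (other >= cputime)
                return;

            /* Only the remainder is charged to the task. */
            cputime -= other;
            if (user_tick)
                account_user_time(p, cputime, cputime_to_scaled(cputime));
            else
                account_system_time(p, HARDIRQ_OFFSET, cputime,
                                    cputime_to_scaled(cputime));
        }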
      Signed-off-by: Rik van Riel <riel@redhat.com>
      [ Make nsecs_to_cputime64() actually return cputime64_t. ]
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1468421405-20056-2-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      57430218
  8. 06 Jul, 2016 (1 commit)
  9. 14 Jun, 2016 (1 commit)
  10. 08 Mar, 2016 (1 commit)
  11. 29 Feb, 2016 (1 commit)
    • sched, time: Switch VIRT_CPU_ACCOUNTING_GEN to jiffy granularity · ff9a9b4c
      Committed by Rik van Riel
      When profiling syscall overhead on nohz-full kernels,
      after removing __acct_update_integrals() from the profile,
      native_sched_clock() remains the top CPU user. This can be
      reduced by moving VIRT_CPU_ACCOUNTING_GEN to jiffy granularity.

      This will reduce timing accuracy on nohz_full CPUs to jiffy-based
      sampling, just like on normal CPUs. It results in removing
      native_sched_clock() from the profile entirely, and in significantly
      speeding up the syscall entry and exit path, as well as irq entry
      and exit, and KVM guest entry & exit.
      
      Additionally, only call the more expensive functions (and
      advance the seqlock) when jiffies has actually changed; see the
      sketch below.
      
      This code relies on another CPU advancing jiffies when the
      system is busy. On a nohz_full system, this is done by a
      housekeeping CPU.
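      A sketch of the resulting fast path (simplified, with approximate field
      and helper names; not the verbatim kernel code):

        static cputime_t vtime_delta(struct task_struct *tsk)
        {
            unsigned long now = READ_ONCE(jiffies);

            if (now == tsk->vtime_snap)
                return 0;       /* still within the same jiffy: nothing to do */

            return jiffies_to_cputime(now - tsk->vtime_snap);
        }

        void vtime_account_user(struct task_struct *tsk)
        {
            cputime_t delta = vtime_delta(tsk);

            if (!delta)
                return;         /* jiffies unchanged: skip seqcount and accounting */

            write_seqcount_begin(&tsk->vtime_seqcount);
            tsk->vtime_snap = jiffies;
            account_user_time(tsk, delta, cputime_to_scaled(delta));
            write_seqcount_end(&tsk->vtime_seqcount);
        }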
      
      A microbenchmark calling an invalid syscall number 10 million
      times in a row speeds up an additional 30% over the numbers
      with just the previous patches, for a total speedup of about
      40% over 4.4 and 4.5-rc1.
      
      Run times for the microbenchmark:
      
       4.4				3.8 seconds
       4.5-rc1			3.7 seconds
       4.5-rc1 + first patch		3.3 seconds
       4.5-rc1 + first 3 patches	3.1 seconds
       4.5-rc1 + all patches		2.3 seconds
      
      A non-NOHZ_FULL CPU (not the housekeeping CPU):
      
       all kernels			1.86 seconds
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: clark@redhat.com
      Cc: eric.dumazet@gmail.com
      Cc: fweisbec@gmail.com
      Cc: luto@amacapital.net
      Link: http://lkml.kernel.org/r/1455152907-18495-5-git-send-email-riel@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ff9a9b4c
  12. 21 Dec, 2015 (1 commit)
    • missing include asm/paravirt.h in cputime.c · 1fe7c4ef
      Committed by Stefano Stabellini
      Add #include <asm/paravirt.h> to cputime.c, as steal_account_process_tick()
      calls paravirt_steal_clock(), which is defined in asm/paravirt.h.

      The #ifdef CONFIG_PARAVIRT guard is necessary because not all architectures
      have an asm/paravirt.h to include.
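      The guarded include described in this changelog amounts to:

        #ifdef CONFIG_PARAVIRT
        #include <asm/paravirt.h>
        #endif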
      
      The reason why cputime.c currently compiles even though the include of
      <asm/paravirt.h> is missing is that, on x86, asm/paravirt.h is pulled in
      indirectly by one of the other headers included in kernel/sched/cputime.c.

      On arm and arm64, where I am about to introduce asm/paravirt.h and
      stolen time support, I would get an error without #include <asm/paravirt.h>
      in cputime.c.
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      1fe7c4ef
  13. 04 Dec, 2015 (7 commits)
  14. 01 Oct, 2015 (1 commit)
  15. 03 Aug, 2015 (1 commit)
  16. 08 May, 2015 (1 commit)
  17. 03 Oct, 2014 (1 commit)
  18. 19 Sep, 2014 (1 commit)
  19. 08 Sep, 2014 (1 commit)
    • sched, time: Atomically increment stime & utime · eb1b4af0
      Committed by Rik van Riel
      The functions task_cputime_adjusted() and thread_group_cputime_adjusted()
      can be called locklessly, as well as concurrently on many different CPUs.

      This can occasionally lead to the utime and stime reported by times(), and
      other syscalls like it, going backward. The cause appears to be multiple
      threads racing in cputime_adjust(), each with a value for utime or stime
      that is larger than the original, but different from the others'.
      
      Sometimes the larger value gets saved first, only to be immediately
      overwritten with a smaller value by another thread.

      Using an atomic exchange prevents that problem and ensures that time
      progresses monotonically; a sketch of the update follows.
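      A sketch of the monotonic update (close to, but not necessarily verbatim,
      the helper this patch adds; cmpxchg() stands in for the cputime_t-width
      compare-and-exchange the kernel actually uses):

        static void cputime_advance(cputime_t *counter, cputime_t new)
        {
            cputime_t old;

            /* Only ever move the published counter forward; if another CPU
             * raced us and already stored a larger value, keep that one. */
            while (new > (old = READ_ONCE(*counter)))
                cmpxchg(counter, old, new);
        }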
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: umgwanakikbuti@gmail.com
      Cc: fweisbec@gmail.com
      Cc: akpm@linux-foundation.org
      Cc: srao@redhat.com
      Cc: lwoodman@redhat.com
      Cc: atheurer@redhat.com
      Cc: oleg@redhat.com
      Link: http://lkml.kernel.org/r/1408133138-22048-4-git-send-email-riel@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      eb1b4af0