1. 05 Jul, 2017 2 commits
  2. 04 Jul, 2017 1 commit
    • Revert "sched/cputime: Refactor the cputime_adjust() code" · 3b9c08ae
      Committed by Ingo Molnar
      This reverts commit 72298e5c.
      
      As Peter explains:
      
      > Argh, no... That code was perfectly fine. The new code otoh is
      > convoluted.
      >
      > The old code had the following form:
      >
      >         if (exception1)
      >           deal with exception1
      >
      >         if (exception2)
      >           deal with exception2
      >
      >         do normal stuff
      >
      > Which is as simple and straightforward as it gets.
      >
      > The new code otoh reads like:
      >
      >         if (!exception1) {
      >                 if (exception2)
      >                   deal with exception 2
      >                 else
      >                   do normal stuff
      >         }
      
      So restore the old form.
      
      Also fix the comment describing the logic, as it was confusing.
      Requested-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Gustavo A. R. Silva <garsilva@embeddedor.com>
      Cc: Frans Klaver <fransklaver@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 30 Jun, 2017 1 commit
  4. 27 Apr, 2017 1 commit
  5. 02 Mar, 2017 2 commits
  6. 01 Feb, 2017 11 commits
  7. 14 Jan, 2017 3 commits
  8. 15 Nov, 2016 3 commits
  9. 30 Sep, 2016 4 commits
  10. 18 Aug, 2016 3 commits
    • sched/cputime: Improve scalability by not accounting thread group tasks pending runtime · a1eb1411
      Committed by Stanislaw Gruszka
      Commit:
      
        d670ec13 ("posix-cpu-timers: Cure SMP wobbles")
      
      started accounting thread group tasks pending runtime in thread_group_cputime().
      
      Another commit:
      
        6e998916 ("sched/cputime: Fix clock_nanosleep()/clock_gettime() inconsistency")
      
      updated scheduler runtime statistics (by calling update_curr()) when reading task
      pending runtime. Those changes cause bad performance of the SYS_times() and
      SYS_clock_gettime(CLOCK_PROCESS_CPUTIME_ID) syscalls, especially on
      larger systems with many CPUs.
      
      While we want to keep cpuclock monotonicity, i.e. keep the problems
      fixed by the above commits fixed, we also want good performance.
      
      However, the change from commit d670ec13 is no longer needed to solve
      the problem addressed by that commit, because of the change from the
      second commit 6e998916, and this leaves room for optimization. Since
      we update the task while reading its pending runtime in
      task_sched_runtime(), clock_gettime(CLOCK_PROCESS_CPUTIME_ID) will
      see updated values, and on the testcase from d670ec13 the process
      cpuclock will not be smaller than the thread cpuclock.
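
The property described above can be observed from user space. The following is a minimal sketch, not the testcase from d670ec13: the helper names `cpu_clock_ns()` and `process_clock_not_behind_thread()` are hypothetical, and the check only samples the thread clock first and the process clock second, then compares them.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read a POSIX CPU-time clock and return its value in nanoseconds. */
static int64_t cpu_clock_ns(clockid_t id)
{
	struct timespec ts;
	int ret = clock_gettime(id, &ts);

	assert(ret == 0);
	(void)ret;
	return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Hypothetical check (illustration only): sample the calling thread's
 * CPU clock first, then the process CPU clock. Returns 1 if the process
 * clock is not behind the thread clock, 0 otherwise.
 */
static int process_clock_not_behind_thread(void)
{
	int64_t thread_ns = cpu_clock_ns(CLOCK_THREAD_CPUTIME_ID);
	int64_t process_ns = cpu_clock_ns(CLOCK_PROCESS_CPUTIME_ID);

	return process_ns >= thread_ns;
}
```

Calling `process_clock_not_behind_thread()` repeatedly from a busy thread is one way to look for the kind of inconsistency the commit message discusses; it is a spot check, not a proof of monotonicity.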
      
      I tested the patch on testcases from commits d670ec13,
      6e998916 and some other cpuclock/cputimers testcases and
      did not find cpuclock monotonicity problems or other malfunctions.
      
      This patch has the drawback that we will not provide thread group cputime
      up-to-date to the last moment. For example when arming cputime timer,
      we will arm it with possibly a bit outdated values and that timer will
      trigger earlier compared to behaviour without the patch. However, that
      was the behaviour before commit d670ec13 (kernel v3.1), so it's
      unlikely to affect applications.
      
      The patch improves related syscall performance, as measured by Giovanni's
      benchmarks described in commit:
      
        6075620b ("sched/cputime: Mitigate performance regression in times()/clock_gettime()")
      
      The benchmark results are:
      
      SYS_clock_gettime():
      
        threads    4.7-rc7     3.18-rc3              4.7-rc7 + prefetch    4.7-rc7 + patch
                               (pre-6e998916)
        2          3.48        2.23 ( 35.68%)        3.06 ( 11.83%)        1.08 ( 68.81%)
        5          3.33        2.83 ( 14.84%)        3.25 (  2.40%)        0.71 ( 78.55%)
        8          3.37        2.84 ( 15.80%)        3.26 (  3.30%)        0.56 ( 83.49%)
        12         3.32        3.09 (  6.69%)        3.37 ( -1.60%)        0.42 ( 87.28%)
        21         4.01        3.14 ( 21.70%)        3.90 (  2.74%)        0.35 ( 91.35%)
        30         3.63        3.28 (  9.75%)        3.36 (  7.41%)        0.28 ( 92.23%)
        48         3.71        3.02 ( 18.69%)        3.11 ( 16.27%)        0.39 ( 89.39%)
        79         3.75        2.88 ( 23.23%)        3.16 ( 15.74%)        0.46 ( 87.76%)
        110        3.81        2.95 ( 22.62%)        3.25 ( 14.80%)        0.56 ( 85.41%)
        128        3.88        3.05 ( 21.28%)        3.31 ( 14.76%)        0.62 ( 84.10%)
      
      SYS_times():
      
        threads    4.7-rc7     3.18-rc3              4.7-rc7 + prefetch    4.7-rc7 + patch
                               (pre-6e998916)
        2          3.65        2.27 ( 37.94%)        3.25 ( 11.03%)        1.62 ( 55.71%)
        5          3.45        2.78 ( 19.34%)        3.17 (  7.92%)        2.33 ( 32.28%)
        8          3.52        2.79 ( 20.66%)        3.22 (  8.69%)        2.06 ( 41.44%)
        12         3.29        3.02 (  8.33%)        3.36 ( -2.04%)        2.00 ( 39.18%)
        21         4.07        3.10 ( 23.86%)        3.92 (  3.78%)        2.07 ( 49.18%)
        30         3.87        3.33 ( 13.80%)        3.40 ( 12.17%)        1.89 ( 51.12%)
        48         3.79        2.96 ( 21.94%)        3.16 ( 16.61%)        1.69 ( 55.46%)
        79         3.88        2.88 ( 25.82%)        3.28 ( 15.42%)        1.60 ( 58.81%)
        110        3.90        2.98 ( 23.73%)        3.38 ( 13.35%)        1.73 ( 55.61%)
        128        4.00        3.10 ( 22.40%)        3.38 ( 15.45%)        1.66 ( 58.52%)
      Reported-and-tested-by: Giovanni Gherdovich <ggherdovich@suse.cz>
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <mgalbraith@suse.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/20160817093043.GA25206@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/cputime: Resync steal time when guest & host lose sync · 03cbc732
      Committed by Wanpeng Li
      Commit:
      
        57430218 ("sched/cputime: Count actually elapsed irq & softirq time")
      
      ... fixed a bug but also triggered a regression:
      
      On an i5 laptop with 4 pCPUs and 4 vCPUs for one full dynticks guest, four
      CPU hog processes (for loops) run in the guest. I hot-unplug the pCPUs
      on the host one by one until only one is left, then observe CPU utilization
      via 'top' in the guest, which shows:
      
        100% st for cpu0(housekeeping)
         75% st for other CPUs (nohz full mode)
      
      However, without that commit it shows the correct 75% for all four CPUs.
      
      When a guest is interrupted for a longer amount of time, missed clock ticks
      are not redelivered later. Because of that, we should not limit the amount
      of steal time accounted to the amount of time that the calling functions
      think has passed.

      However, the interval returned by account_other_time() is NOT rounded down
      to the nearest jiffy, while the base interval in get_vtime_delta() from
      which it is subtracted is, so the max cputime limit is required to avoid
      underflow.
      
      This patch fixes the regression by limiting the account_other_time() from
      get_vtime_delta() to avoid underflow, and lets the other three call sites
      (in account_other_time() and steal_account_process_time()) account however
      much steal time the host told us elapsed.
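
The underflow argument above can be illustrated with a small user-space sketch. This is not the kernel code: `TICK_NS`, `take_steal()` and `vtime_delta_sketch()` are hypothetical names, and plain nanosecond counters stand in for the kernel's cputime types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tick length in nanoseconds (as with HZ=250); illustration only. */
#define TICK_NS 4000000ULL

/* Account at most 'maxtime' of the pending steal time; return the amount taken. */
static uint64_t take_steal(uint64_t *pending_steal, uint64_t maxtime)
{
	uint64_t taken = *pending_steal < maxtime ? *pending_steal : maxtime;

	*pending_steal -= taken;
	return taken;
}

/*
 * Sketch of a get_vtime_delta()-style caller: the elapsed interval is
 * rounded down to whole ticks, then the steal time is subtracted from it.
 * Limiting the steal time to the rounded interval keeps the unsigned
 * subtraction from wrapping around.
 */
static uint64_t vtime_delta_sketch(uint64_t elapsed_ns, uint64_t *pending_steal)
{
	uint64_t delta = (elapsed_ns / TICK_NS) * TICK_NS; /* rounded down */
	uint64_t other = take_steal(pending_steal, delta); /* limited to delta */

	assert(other <= delta);
	return delta - other;
}
```

A caller that should account however much steal time the host reported can pass `UINT64_MAX` as the limit, which mirrors the unlimited call sites described above; whatever the limited caller leaves pending is picked up later.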
      Suggested-by: Rik van Riel <riel@redhat.com>
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm@vger.kernel.org
      Link: http://lkml.kernel.org/r/1471399546-4069-1-git-send-email-wanpeng.li@hotmail.com
      [ Improved the changelog. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/cputime: Fix NO_HZ_FULL getrusage() monotonicity regression · 173be9a1
      Committed by Peter Zijlstra
      Mike reports:
      
       Roughly 10% of the time, ltp testcase getrusage04 fails:
       getrusage04    0  TINFO  :  Expected timers granularity is 4000 us
       getrusage04    0  TINFO  :  Using 1 as multiply factor for max [us]time increment (1000+4000us)!
       getrusage04    0  TINFO  :  utime:           0us; stime:         179us
       getrusage04    0  TINFO  :  utime:        3751us; stime:           0us
       getrusage04    1  TFAIL  :  getrusage04.c:133: stime increased > 5000us:
      
      And tracked it down to the case where the task simply doesn't get
      _any_ [us]time ticks.
      
      Update the code to assume all rtime is utime when we lack information,
      thus ensuring a task that elides the tick gets time accounted.
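
The rule described above can be sketched in user space as follows. This is a simplified illustration, not the kernel's cputime_adjust(): `struct cputime_split` and `split_rtime()` are hypothetical names, the monotonicity handling is left out, and the plain multiplication can overflow for very large inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: how the precise runtime is reported. */
struct cputime_split {
	uint64_t utime;
	uint64_t stime;
};

/*
 * Split the precise runtime 'rtime' into user and system time, using the
 * tick-sampled values only as a ratio.
 */
static struct cputime_split split_rtime(uint64_t utime_ticks,
					uint64_t stime_ticks,
					uint64_t rtime)
{
	struct cputime_split out = { 0, 0 };

	/* No system ticks seen (including no ticks at all): assume userspace. */
	if (stime_ticks == 0) {
		out.utime = rtime;
		return out;
	}

	/* Only system ticks seen: account everything as system time. */
	if (utime_ticks == 0) {
		out.stime = rtime;
		return out;
	}

	/* Normal case: scale rtime by the observed stime/(stime + utime) ratio. */
	out.stime = rtime * stime_ticks / (stime_ticks + utime_ticks);
	out.utime = rtime - out.stime;
	return out;
}
```

With this ordering, a task that never receives a tick has its whole runtime reported as user time, and in every branch the two fields add up to `rtime`.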
      Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
      Tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Fredrik Markstrom <fredrik.markstrom@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: stable@vger.kernel.org # 4.3+
      Fixes: 9d7fb042 ("sched/cputime: Guarantee stime + utime == rtime")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  11. 11 Aug, 2016 2 commits
  12. 14 Jul, 2016 4 commits
    • sched/cputime: Drop local_irq_save/restore from irqtime_account_irq() · 553bf6bb
      Committed by Rik van Riel
      Paolo pointed out that irqs are already blocked when irqtime_account_irq()
      is called. That means there is no reason to call local_irq_save/restore()
      again.
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1468421405-20056-6-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/cputime: Clean up the old vtime gen irqtime accounting completely · 0cfdf9a1
      Committed by Frederic Weisbecker
      Vtime generic irqtime accounting has been removed but there are a few
      remnants to clean up:
      
      * The vtime_accounting_cpu_enabled() check in irq entry was only used
        by CONFIG_VIRT_CPU_ACCOUNTING_GEN. We can safely remove it.
      
      * Without the vtime_accounting_cpu_enabled(), we no longer need to
        have a vtime_common_account_irq_enter() indirect function.
      
      * Move vtime_account_irq_enter() implementation under
        CONFIG_VIRT_CPU_ACCOUNTING_NATIVE which is the last user.
      
      * The vtime_account_user() call was only used on irq entry for
        CONFIG_VIRT_CPU_ACCOUNTING_GEN. We can remove that too.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1468421405-20056-4-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/cputime: Replace VTIME_GEN irq time code with IRQ_TIME_ACCOUNTING code · b58c3584
      Committed by Rik van Riel
      The CONFIG_VIRT_CPU_ACCOUNTING_GEN irq time tracking code does not
      appear to currently work right.
      
      On CPUs without nohz_full=, only tick based irq time sampling is
      done, which breaks down when dealing with a nohz_idle CPU.
      
      On firewalls and similar systems, no ticks may happen on a CPU for a
      while, and the irq time spent may never get accounted properly. This
      can cause issues with capacity planning and power saving, which use
      the CPU statistics as inputs in decision making.
      
      Remove the VTIME_GEN vtime irq time code, and replace it with the
      IRQ_TIME_ACCOUNTING code, when selected as a config option by the user.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1468421405-20056-3-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/cputime: Count actually elapsed irq & softirq time · 57430218
      Committed by Rik van Riel
      Currently, if there was any irq or softirq time during 'ticks'
      jiffies, the entire period will be accounted as irq or softirq
      time.
      
      This is inaccurate if only a subset of the time was actually spent
      handling irqs, and could conceivably mis-count all of the ticks during
      a period as irq time, when there was some irq and some softirq time.
      
      This can actually happen when irqtime_account_process_tick is called
      from account_idle_ticks, which can pass a larger number of ticks down
      all at once.
      
      Fix this by changing irqtime_account_hi_update(), irqtime_account_si_update(),
      and steal_account_process_ticks() to work with cputime_t time units, and
      return the amount of time spent in each mode.
      
      Rename steal_account_process_ticks() to steal_account_process_time(), to
      reflect that time is now accounted in cputime_t, instead of ticks.
      
      Additionally, have irqtime_account_process_tick() take into account how
      much time was spent in each of steal, irq, and softirq time.
      
      The latter could help improve the accuracy of cputime
      accounting when returning from idle on a NO_HZ_IDLE CPU.
      
      Properly accounting how much time was spent in hardirq and
      softirq time will also allow the NO_HZ_FULL code to re-use
      these same functions for hardirq and softirq accounting.
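
The accounting idea described above can be sketched as follows. This is an illustrative user-space model, not the kernel implementation: `struct tick_accounting`, `take_pending()` and `account_period()` are hypothetical names, and plain integers stand in for cputime_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical breakdown of one accounting period. */
struct tick_accounting {
	uint64_t irq;
	uint64_t softirq;
	uint64_t other; /* whatever is left for user/system/idle */
};

/* Take at most 'max' from a pending counter and return the amount taken. */
static uint64_t take_pending(uint64_t *pending, uint64_t max)
{
	uint64_t taken = *pending < max ? *pending : max;

	*pending -= taken;
	return taken;
}

/*
 * Charge only the irq and softirq time that actually elapsed, instead of
 * attributing the entire period to whichever mode was seen. Each helper
 * returns how much it accounted, so the remainder can be passed on.
 */
static struct tick_accounting account_period(uint64_t period,
					     uint64_t *pending_irq,
					     uint64_t *pending_softirq)
{
	struct tick_accounting acct;

	acct.irq = take_pending(pending_irq, period);
	acct.softirq = take_pending(pending_softirq, period - acct.irq);
	acct.other = period - acct.irq - acct.softirq;
	return acct;
}
```

For a period of 10 units with 2 units of irq and 3 units of softirq time pending, this model reports 2, 3 and 5 units respectively, rather than charging all 10 units to a single mode.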
      Signed-off-by: Rik van Riel <riel@redhat.com>
      [ Make nsecs_to_cputime64() actually return cputime64_t. ]
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1468421405-20056-2-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  13. 06 Jul, 2016 1 commit
  14. 14 Jun, 2016 1 commit
  15. 08 Mar, 2016 1 commit