1. 14 1月, 2017 29 次提交
    • P
      sched/clock: Provide better clock continuity · 5680d809
      Peter Zijlstra 提交于
      When switching between the unstable and stable variants it is
      currently possible that clock discontinuities occur.
      
      And while these will mostly be 'small', attempt to do better.
      
      As observed on my IVB-EP, the sched_clock() is ~1.5s ahead of the
      ktime_get_ns() based timeline at the point of switchover
      (sched_clock_init_late()) after SMP bringup.
      
      Equally, when the TSC is later found to be unstable -- typically
      because SMM tries to hide its SMI latencies by mucking with the TSC --
      we want to avoid large jumps.
      
      Since the clocksource watchdog reports the issue after the fact we
      cannot exactly fix up time, but since SMI latencies are typically
      small (~10ns range), the discontinuity is mainly due to drift between
      sched_clock() and ktime_get_ns() (which on my desktop is ~79s over
      24days).
      
      I dislike this patch because it adds overhead to the good case in
      favour of dealing with badness. But given the widespread failure of
      TSC stability this is worth it.
      
      Note that in case the TSC makes drastic jumps after SMP bringup we're
      still hosed. There's just not much we can do in that case without
      stupid overhead.
      
      If we were to somehow expose tsc_clocksource_reliable (which is hard
      because this code is also used on ia64 and parisc) we could avoid some
      of the newly introduced overhead.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      5680d809
    • P
      sched/clock: Delay switching sched_clock to stable · 9881b024
      Peter Zijlstra 提交于
      Currently we switch to the stable sched_clock if we guess the TSC is
      usable, and then switch back to the unstable path if it turns out TSC
      isn't stable during SMP bringup after all.
      
      Delay switching to the stable path until after SMP bringup is
      complete. This way we'll avoid switching during the time we detect the
      worst of the TSC offences.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      9881b024
    • P
      sched/clock: Update static_key usage · 555570d7
      Peter Zijlstra 提交于
      sched_clock was still using the deprecated static_key interface.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      555570d7
    • T
      sched/clock, clocksource: Add optional cs::mark_unstable() method · 12907fbb
      Thomas Gleixner 提交于
      PeterZ reported that we'd fail to mark the TSC unstable when the
      clocksource watchdog finds it unsuitable.
      
      Allow a clocksource to run a custom action when its being marked
      unstable and hook up the TSC unstable code.
      Reported-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      12907fbb
    • M
      sched/core: Add debugging code to catch missing update_rq_clock() calls · cb42c9a3
      Matt Fleming 提交于
      There's no diagnostic checks for figuring out when we've accidentally
      missed update_rq_clock() calls. Let's add some by piggybacking on the
      rq_*pin_lock() wrappers.
      
      The idea behind the diagnostic checks is that upon pining rq lock the
      rq clock should be updated, via update_rq_clock(), before anybody
      reads the clock with rq_clock() or rq_clock_task().
      
      The exception to this rule is when updates have explicitly been
      disabled with the rq_clock_skip_update() optimisation.
      
      There are some functions that only unpin the rq lock in order to grab
      some other lock and avoid deadlock. In that case we don't need to
      update the clock again and the previous diagnostic state can be
      carried over in rq_repin_lock() by saving the state in the rq_flags
      context.
      
      Since this patch adds a new clock update flag and some already exist
      in rq::clock_skip_update, that field has now been renamed. An attempt
      has been made to keep the flag manipulation code small and fast since
      it's used in the heart of the __schedule() fast path.
      
      For the !CONFIG_SCHED_DEBUG case the only object code change (other
      than addresses) is the following change to reset RQCF_ACT_SKIP inside
      of __schedule(),
      
        -       c7 83 38 09 00 00 00    movl   $0x0,0x938(%rbx)
        -       00 00 00
        +       83 a3 38 09 00 00 fc    andl   $0xfffffffc,0x938(%rbx)
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@unitn.it>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yuyang Du <yuyang.du@intel.com>
      Link: http://lkml.kernel.org/r/20160921133813.31976-8-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cb42c9a3
    • P
      sched/core: Add missing update_rq_clock() call in set_user_nice() · 2fb8d367
      Peter Zijlstra 提交于
      Address this rq-clock update bug:
      
        WARNING: CPU: 30 PID: 195 at ../kernel/sched/sched.h:797 set_next_entity()
        rq->clock_update_flags < RQCF_ACT_SKIP
      
        Call Trace:
          dump_stack()
          __warn()
          warn_slowpath_fmt()
          set_next_entity()
          ? _raw_spin_lock()
          set_curr_task_fair()
          set_user_nice.part.85()
          set_user_nice()
          create_worker()
          worker_thread()
          kthread()
          ret_from_fork()
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      2fb8d367
    • P
      sched/core: Add missing update_rq_clock() call for task_hot() · 3bed5e21
      Peter Zijlstra 提交于
      Add the update_rq_clock() call at the top of the callstack instead of
      at the bottom where we find it missing, this to aid later effort to
      minimize the number of update_rq_lock() calls.
      
        WARNING: CPU: 30 PID: 194 at ../kernel/sched/sched.h:797 assert_clock_updated()
        rq->clock_update_flags < RQCF_ACT_SKIP
      
        Call Trace:
          dump_stack()
          __warn()
          warn_slowpath_fmt()
          assert_clock_updated.isra.63.part.64()
          can_migrate_task()
          load_balance()
          pick_next_task_fair()
          __schedule()
          schedule()
          worker_thread()
          kthread()
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      3bed5e21
    • P
      sched/core: Add missing update_rq_clock() in detach_task_cfs_rq() · 80f5c1b8
      Peter Zijlstra 提交于
      Instead of adding the update_rq_clock() all the way at the bottom of
      the callstack, add one at the top, this to aid later effort to
      minimize update_rq_lock() calls.
      
        WARNING: CPU: 0 PID: 1 at ../kernel/sched/sched.h:797 detach_task_cfs_rq()
        rq->clock_update_flags < RQCF_ACT_SKIP
      
        Call Trace:
          dump_stack()
          __warn()
          warn_slowpath_fmt()
          detach_task_cfs_rq()
          switched_from_fair()
          __sched_setscheduler()
          _sched_setscheduler()
          sched_set_stop_task()
          cpu_stop_create()
          __smpboot_create_thread.part.2()
          smpboot_register_percpu_thread_cpumask()
          cpu_stop_init()
          do_one_initcall()
          ? print_cpu_info()
          kernel_init_freeable()
          ? rest_init()
          kernel_init()
          ret_from_fork()
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      80f5c1b8
    • P
      sched/core: Add missing update_rq_clock() in post_init_entity_util_avg() · 4126bad6
      Peter Zijlstra 提交于
      Address this rq-clock update bug:
      
        WARNING: CPU: 0 PID: 0 at ../kernel/sched/sched.h:797 post_init_entity_util_avg()
        rq->clock_update_flags < RQCF_ACT_SKIP
      
        Call Trace:
          __warn()
          post_init_entity_util_avg()
          wake_up_new_task()
          _do_fork()
          kernel_thread()
          rest_init()
          start_kernel()
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      4126bad6
    • M
      sched/fair: Push rq lock pin/unpin into idle_balance() · 46f69fa3
      Matt Fleming 提交于
      Future patches will emit warnings if rq_clock() is called before
      update_rq_clock() inside a rq_pin_lock()/rq_unpin_lock() pair.
      
      Since there is only one caller of idle_balance() we can push the
      unpin/repin there.
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@unitn.it>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yuyang Du <yuyang.du@intel.com>
      Link: http://lkml.kernel.org/r/20160921133813.31976-7-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      46f69fa3
    • M
      sched/core: Reset RQCF_ACT_SKIP before unpinning rq->lock · 92509b73
      Matt Fleming 提交于
      rq_clock() is called from sched_info_{depart,arrive}() after resetting
      RQCF_ACT_SKIP but prior to a call to update_rq_clock().
      
      In preparation for pending patches that check whether the rq clock has
      been updated inside of a pin context before rq_clock() is called, move
      the reset of rq->clock_skip_update immediately before unpinning the rq
      lock.
      
      This will avoid the new warnings which check if update_rq_clock() is
      being actively skipped.
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@unitn.it>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yuyang Du <yuyang.du@intel.com>
      Link: http://lkml.kernel.org/r/20160921133813.31976-6-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      92509b73
    • M
      sched/core: Add wrappers for lockdep_(un)pin_lock() · d8ac8971
      Matt Fleming 提交于
      In preparation for adding diagnostic checks to catch missing calls to
      update_rq_clock(), provide wrappers for (re)pinning and unpinning
      rq->lock.
      
      Because the pending diagnostic checks allow state to be maintained in
      rq_flags across pin contexts, swap the 'struct pin_cookie' arguments
      for 'struct rq_flags *'.
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@unitn.it>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yuyang Du <yuyang.du@intel.com>
      Link: http://lkml.kernel.org/r/20160921133813.31976-5-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d8ac8971
    • F
      sched/cputime: Rename vtime_account_user() to vtime_flush() · c8d7dabf
      Frederic Weisbecker 提交于
      CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y used to accumulate user time and
      account it on ticks and context switches only through the
      vtime_account_user() function.
      
      Now this model has been generalized on the 3 archs for all kind of
      cputime (system, irq, ...) and all the cputime flushing happens under
      vtime_account_user().
      
      So let's rename this function to better reflect its new role.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-11-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c8d7dabf
    • M
      sched/cputime, s390: Implement delayed accounting of system time · b7394a5f
      Martin Schwidefsky 提交于
      The account_system_time() function is called with a cputime that
      occurred while running in the kernel. The function detects which
      context the CPU is currently running in and accounts the time to
      the correct bucket. This forces the arch code to account the
      cputime for hardirq and softirq immediately.
      
      Such accounting function can be costly and perform unwelcome divisions
      and multiplications, among others.
      
      The arch code can delay the accounting for system time. For s390
      the accounting is done once per timer tick and for each task switch.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      [ Rebase against latest linus tree and move account_system_index_scaled(). ]
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-10-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b7394a5f
    • F
      sched/cputime, ia64: Accumulate cputime and account only on tick/task switch · 7dd58230
      Frederic Weisbecker 提交于
      Currently CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y accounts the cputime on
      any context boundary: irq entry/exit, guest entry/exit, context switch,
      etc...
      
      Calling functions such as account_system_time(), account_user_time()
      and such can be costly, especially if they are called on many fastpath
      such as twice per IRQ. Those functions do more than just accounting to
      kcpustat and task cputime. Depending on the config, some subsystems can
      perform unpleasant multiplications and divisions, among other things.
      
      So lets accumulate the cputime instead and delay the accounting on ticks
      and context switches only.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-9-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7dd58230
    • F
      sched/cputime, powerpc/vtime: Accumulate cputime and account only on tick/task switch · a19ff1a2
      Frederic Weisbecker 提交于
      Currently CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y accounts the cputime on
      any context boundary: irq entry/exit, guest entry/exit, context switch,
      etc...
      
      Calling functions such as account_system_time(), account_user_time()
      and such can be costly, especially if they are called on many fastpath
      such as twice per IRQ. Those functions do more than just accounting to
      kcpustat and task cputime. Depending on the config, some subsystems can
      perform unpleasant multiplications and divisions, among other things.
      
      So lets accumulate the cputime instead and delay the accounting on ticks
      and context switches only.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-8-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a19ff1a2
    • F
      sched/cputime, powerpc: Migrate stolen_time field to the accounting structure · f828c3d0
      Frederic Weisbecker 提交于
      That in order to gather all cputime accumulation to the same place.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-7-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f828c3d0
    • F
      sched/cputime, powerpc: Prepare accounting structure for cputime flush on tick · 8c8b73c4
      Frederic Weisbecker 提交于
      In order to prepare for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y to delay
      cputime accounting to the tick, provide finegrained accumulators to
      powerpc in order to store the cputime until flushing.
      
      While at it, normalize the name of several fields according to common
      cputime naming.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-6-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8c8b73c4
    • F
      sched/cputime: Export account_guest_time() · 1213699a
      Frederic Weisbecker 提交于
      In order to prepare for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y to delay
      cputime accounting to the tick, let's allow archs to account cputime
      directly to gtime.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-5-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1213699a
    • F
      sched/cputime: Allow accounting system time using cpustat index · c31cc6a5
      Frederic Weisbecker 提交于
      In order to prepare for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y to delay
      cputime accounting to the tick, let's provide APIs to account system
      time to precise contexts: hardirq, softirq, pure system, ...
      Inspired-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-4-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c31cc6a5
    • F
      sched/cputime, ia64: Fix incorrect start cputime assignment on task switch · 8388d214
      Frederic Weisbecker 提交于
      On task switch we must initialize the current cputime of the next task
      using the value of the previous task which got freshly updated.
      
      But we are confusing that with doing the opposite, which should result
      in incorrect cputime accounting.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-3-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8388d214
    • F
      sched/cputime, powerpc32: Fix stale scaled stime on context switch · 90d08ba2
      Frederic Weisbecker 提交于
      On context switch with powerpc32, the cputime is accumulated in the
      thread_info struct. So the switching-in task must move forward its
      start time snapshot to the current time in order to later compute the
      delta spent in system mode.
      
      This is what we do for the normal cputime by initializing the starttime
      field to the value of the previous task's starttime which got freshly
      updated.
      
      But we are missing the update of the scaled cputime start time. As a
      result we may be accounting too much scaled cputime later.
      
      Fix this by initializing the scaled cputime the same way we do for
      normal cputime.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-2-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      90d08ba2
    • L
      Merge branch 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · e96f8f18
      Linus Torvalds 提交于
      Pull btrfs fixes from Chris Mason:
       "These are all over the place.
      
        The tracepoint part of the pull fixes a crash and adds a little more
        information to two tracepoints, while the rest are good old fashioned
        fixes"
      
      * 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        btrfs: make tracepoint format strings more compact
        Btrfs: add truncated_len for ordered extent tracepoints
        Btrfs: add 'inode' for extent map tracepoint
        btrfs: fix crash when tracepoint arguments are freed by wq callbacks
        Btrfs: adjust outstanding_extents counter properly when dio write is split
        Btrfs: fix lockdep warning about log_mutex
        Btrfs: use down_read_nested to make lockdep silent
        btrfs: fix locking when we put back a delayed ref that's too new
        btrfs: fix error handling when run_delayed_extent_op fails
        btrfs: return the actual error value from  from btrfs_uuid_tree_iterate
      e96f8f18
    • L
      Merge tag 'ceph-for-4.10-rc4' of git://github.com/ceph/ceph-client · 04e39627
      Linus Torvalds 提交于
      Pull ceph fixes from Ilya Dryomov:
       "Two small fixups for the filesystem changes that went into this merge
        window"
      
      * tag 'ceph-for-4.10-rc4' of git://github.com/ceph/ceph-client:
        ceph: fix get_oldest_context()
        ceph: fix mds cluster availability check
      04e39627
    • L
      Merge tag 'vfio-v4.10-rc4' of git://github.com/awilliam/linux-vfio · af54efa4
      Linus Torvalds 提交于
      Pull VFIO fixes from Alex Williamson:
      
       - Cleanups and bug fixes for the mtty sample driver (Dan Carpenter)
      
       - Export and make use of has_capability() to fix incorrect use of
         ns_capable() for testing task capabilities (Jike Song)
      
      * tag 'vfio-v4.10-rc4' of git://github.com/awilliam/linux-vfio:
        vfio/type1: Remove pid_namespace.h include
        vfio iommu type1: fix the testing of capability for remote task
        capability: export has_capability
        vfio-mdev: remove some dead code
        vfio-mdev: buffer overflow in ioctl()
        vfio-mdev: return -EFAULT if copy_to_user() fails
      af54efa4
    • L
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 406732c9
      Linus Torvalds 提交于
      Pull KVM fixes from Paolo Bonzini:
      
       - fix for module unload vs deferred jump labels (note: there might be
         other buggy modules!)
      
       - two NULL pointer dereferences from syzkaller
      
       - also syzkaller: fix emulation of fxsave/fxrstor/sgdt/sidt, problem
         made worse during this merge window, "just" kernel memory leak on
         releases
      
       - fix emulation of "mov ss" - somewhat serious on AMD, less so on Intel
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86: fix emulation of "MOV SS, null selector"
        KVM: x86: fix NULL deref in vcpu_scan_ioapic
        KVM: eventfd: fix NULL deref irqbypass consumer
        KVM: x86: Introduce segmented_write_std
        KVM: x86: flush pending lapic jump label updates on module unload
        jump_labels: API for flushing deferred jump label updates
      406732c9
    • L
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · a65c9259
      Linus Torvalds 提交于
      Pull arm64 fixes from Catalin Marinas:
      
       - Fix huge_ptep_set_access_flags() to return "changed" when any of the
         ptes in the contiguous range is changed, not just the last one
      
       - Fix the adr_l assembly macro to work in modules under KASLR
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: assembler: make adr_l work in modules under KASLR
        arm64: hugetlb: fix the wrong return value for huge_ptep_set_access_flags
      a65c9259
    • L
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · c79d47f1
      Linus Torvalds 提交于
      Pull SCSI fixes from James Bottomley:
       "The major fix is the bfa firmware, since the latest 10Gb cards fail
        probing with the current firmware.
      
        The rest is a set of minor fixes: one missed Kconfig dependency
        causing randconfig failures, a missed error return on an error leg, a
        change for how multiqueue waits on a blocked device and a don't reset
        while in reset fix"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: bfa: Increase requested firmware version to 3.2.5.1
        scsi: snic: Return error code on memory allocation failure
        scsi: fnic: Avoid sending reset to firmware when another reset is in progress
        scsi: qedi: fix build, depends on UIO
        scsi: scsi-mq: Wait for .queue_rq() if necessary
      c79d47f1
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 6d90b4f9
      Linus Torvalds 提交于
      Pull input updates from Dmitry Torokhov:
       "Small driver fixups"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: elants_i2c - avoid divide by 0 errors on bad touchscreen data
        Input: adxl34x - make it enumerable in ACPI environment
        Input: ALPS - fix TrackStick Y axis handling for SS5 hardware
        Input: synaptics-rmi4 - fix F03 build error when serio is module
        Input: xpad - use correct product id for x360w controllers
        Input: synaptics_i2c - change msleep to usleep_range for small msecs
        Input: i8042 - add Pegatron touchpad to noloop table
        Input: joydev - remove unused linux/miscdevice.h include
      6d90b4f9
  2. 13 1月, 2017 11 次提交