1. 14 Jan, 2017 24 commits
    • P
      sched/core: Add missing update_rq_clock() call in set_user_nice() · 2fb8d367
      Peter Zijlstra committed
      Address this rq-clock update bug:
      
        WARNING: CPU: 30 PID: 195 at ../kernel/sched/sched.h:797 set_next_entity()
        rq->clock_update_flags < RQCF_ACT_SKIP
      
        Call Trace:
          dump_stack()
          __warn()
          warn_slowpath_fmt()
          set_next_entity()
          ? _raw_spin_lock()
          set_curr_task_fair()
          set_user_nice.part.85()
          set_user_nice()
          create_worker()
          worker_thread()
          kthread()
          ret_from_fork()
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
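The warning condition quoted in the trace, `rq->clock_update_flags < RQCF_ACT_SKIP`, can be illustrated with a small userspace model. This is only a sketch: the `model_` names are hypothetical, the struct is a stand-in for the real runqueue, and the numeric flag values are assumed rather than taken from the commit text.

```c
#include <stdbool.h>

/* Flag values assumed for illustration; only the comparison against
 * RQCF_ACT_SKIP is taken from the warning text above. */
#define RQCF_REQ_SKIP 0x01
#define RQCF_ACT_SKIP 0x02
#define RQCF_UPDATED  0x04

/* Hypothetical stand-in for struct rq: just the fields this sketch needs. */
struct model_rq {
	unsigned int clock_update_flags;
	unsigned long long clock;
};

/* Models update_rq_clock(): honour an active skip, otherwise refresh the
 * clock and record that an update happened. */
static void model_update_rq_clock(struct model_rq *rq, unsigned long long now)
{
	if (rq->clock_update_flags & RQCF_ACT_SKIP)
		return;
	rq->clock_update_flags |= RQCF_UPDATED;
	rq->clock = now;
}

/* Returns true in the situation where the check from the trace would warn:
 * neither an update nor an active skip has been recorded. */
static bool model_clock_is_stale(const struct model_rq *rq)
{
	return rq->clock_update_flags < RQCF_ACT_SKIP;
}
```

In this model, a caller such as set_user_nice() that reads the clock without first calling the update helper would see `model_clock_is_stale()` return true, which corresponds to the warning this commit addresses.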
    • P
      sched/core: Add missing update_rq_clock() call for task_hot() · 3bed5e21
      Peter Zijlstra committed
      Add the update_rq_clock() call at the top of the callstack rather than
      at the bottom, where it was found to be missing; this aids the later
      effort to minimize the number of update_rq_clock() calls.
      
        WARNING: CPU: 30 PID: 194 at ../kernel/sched/sched.h:797 assert_clock_updated()
        rq->clock_update_flags < RQCF_ACT_SKIP
      
        Call Trace:
          dump_stack()
          __warn()
          warn_slowpath_fmt()
          assert_clock_updated.isra.63.part.64()
          can_migrate_task()
          load_balance()
          pick_next_task_fair()
          __schedule()
          schedule()
          worker_thread()
          kthread()
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • P
      sched/core: Add missing update_rq_clock() in detach_task_cfs_rq() · 80f5c1b8
      Peter Zijlstra committed
      Instead of adding the update_rq_clock() call all the way at the bottom
      of the callstack, add one at the top; this aids the later effort to
      minimize update_rq_clock() calls.
      
        WARNING: CPU: 0 PID: 1 at ../kernel/sched/sched.h:797 detach_task_cfs_rq()
        rq->clock_update_flags < RQCF_ACT_SKIP
      
        Call Trace:
          dump_stack()
          __warn()
          warn_slowpath_fmt()
          detach_task_cfs_rq()
          switched_from_fair()
          __sched_setscheduler()
          _sched_setscheduler()
          sched_set_stop_task()
          cpu_stop_create()
          __smpboot_create_thread.part.2()
          smpboot_register_percpu_thread_cpumask()
          cpu_stop_init()
          do_one_initcall()
          ? print_cpu_info()
          kernel_init_freeable()
          ? rest_init()
          kernel_init()
          ret_from_fork()
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • P
      sched/core: Add missing update_rq_clock() in post_init_entity_util_avg() · 4126bad6
      Peter Zijlstra committed
      Address this rq-clock update bug:
      
        WARNING: CPU: 0 PID: 0 at ../kernel/sched/sched.h:797 post_init_entity_util_avg()
        rq->clock_update_flags < RQCF_ACT_SKIP
      
        Call Trace:
          __warn()
          post_init_entity_util_avg()
          wake_up_new_task()
          _do_fork()
          kernel_thread()
          rest_init()
          start_kernel()
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • M
      sched/fair: Push rq lock pin/unpin into idle_balance() · 46f69fa3
      Matt Fleming committed
      Future patches will emit warnings if rq_clock() is called before
      update_rq_clock() inside a rq_pin_lock()/rq_unpin_lock() pair.
      
      Since there is only one caller of idle_balance() we can push the
      unpin/repin there.
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@unitn.it>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yuyang Du <yuyang.du@intel.com>
      Link: http://lkml.kernel.org/r/20160921133813.31976-7-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • M
      sched/core: Reset RQCF_ACT_SKIP before unpinning rq->lock · 92509b73
      Matt Fleming committed
      rq_clock() is called from sched_info_{depart,arrive}() after resetting
      RQCF_ACT_SKIP but prior to a call to update_rq_clock().
      
      In preparation for pending patches that check whether the rq clock has
      been updated inside of a pin context before rq_clock() is called, move
      the reset of rq->clock_skip_update immediately before unpinning the rq
      lock.
      
      This will avoid the new warnings which check if update_rq_clock() is
      being actively skipped.
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@unitn.it>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yuyang Du <yuyang.du@intel.com>
      Link: http://lkml.kernel.org/r/20160921133813.31976-6-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • M
      sched/core: Add wrappers for lockdep_(un)pin_lock() · d8ac8971
      Matt Fleming committed
      In preparation for adding diagnostic checks to catch missing calls to
      update_rq_clock(), provide wrappers for (re)pinning and unpinning
      rq->lock.
      
      Because the pending diagnostic checks allow state to be maintained in
      rq_flags across pin contexts, swap the 'struct pin_cookie' arguments
      for 'struct rq_flags *'.
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luca Abeni <luca.abeni@unitn.it>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yuyang Du <yuyang.du@intel.com>
      Link: http://lkml.kernel.org/r/20160921133813.31976-5-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
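The idea of carrying diagnostic state in `rq_flags` across pin contexts can be sketched in userspace as follows. This is a hedged model, not the kernel implementation: the `pin_model_` types and `model_rq_*` helpers are hypothetical, the flag values are assumed, and the exact way the flags are saved and restored is an assumption made for illustration.

```c
/* Flag values assumed for illustration. */
#define RQCF_REQ_SKIP 0x01
#define RQCF_ACT_SKIP 0x02
#define RQCF_UPDATED  0x04

/* Hypothetical stand-ins for struct rq and struct rq_flags. */
struct pin_model_rq {
	unsigned int clock_update_flags;
};

struct pin_model_rq_flags {
	unsigned int clock_update_flags;
};

/* Entering a pin context: any earlier clock update no longer counts. */
static void model_rq_pin_lock(struct pin_model_rq *rq,
			      struct pin_model_rq_flags *rf)
{
	rq->clock_update_flags &= (RQCF_REQ_SKIP | RQCF_ACT_SKIP);
	rf->clock_update_flags = 0;
}

/* Leaving a pin context: remember in rf whether the clock was updated. */
static void model_rq_unpin_lock(struct pin_model_rq *rq,
				struct pin_model_rq_flags *rf)
{
	if (rq->clock_update_flags > RQCF_ACT_SKIP)
		rf->clock_update_flags = RQCF_UPDATED;
}

/* Re-entering the same pin context: restore the remembered state. */
static void model_rq_repin_lock(struct pin_model_rq *rq,
				struct pin_model_rq_flags *rf)
{
	rq->clock_update_flags |= rf->clock_update_flags;
}
```

Passing a pointer to the flags structure, rather than a bare cookie, is what lets the unpin and repin helpers share state in this model.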
    • F
      sched/cputime: Rename vtime_account_user() to vtime_flush() · c8d7dabf
      Frederic Weisbecker committed
      CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y used to accumulate user time and
      account it on ticks and context switches only through the
      vtime_account_user() function.
      
      Now this model has been generalized on the 3 archs for all kinds of
      cputime (system, irq, ...) and all the cputime flushing happens under
      vtime_account_user().
      
      So let's rename this function to better reflect its new role.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-11-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • M
      sched/cputime, s390: Implement delayed accounting of system time · b7394a5f
      Martin Schwidefsky committed
      The account_system_time() function is called with a cputime that
      occurred while running in the kernel. The function detects which
      context the CPU is currently running in and accounts the time to
      the correct bucket. This forces the arch code to account the
      cputime for hardirq and softirq immediately.
      
      Such an accounting function can be costly and perform unwelcome
      divisions and multiplications, among other things.
      
      The arch code can delay the accounting for system time. For s390
      the accounting is done once per timer tick and for each task switch.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      [ Rebase against latest linus tree and move account_system_index_scaled(). ]
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-10-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • F
      sched/cputime, ia64: Accumulate cputime and account only on tick/task switch · 7dd58230
      Frederic Weisbecker committed
      Currently CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y accounts the cputime on
      any context boundary: irq entry/exit, guest entry/exit, context switch,
      etc.

      Calling functions such as account_system_time() and account_user_time()
      can be costly, especially when they are called on many fastpaths,
      such as twice per IRQ. Those functions do more than just accounting to
      kcpustat and task cputime. Depending on the config, some subsystems can
      perform unpleasant multiplications and divisions, among other things.

      So let's accumulate the cputime instead and delay the accounting to ticks
      and context switches only.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-9-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
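The accumulate-then-flush approach described in this commit can be sketched in plain C. This is a minimal userspace illustration under stated assumptions: the `model_` names, the bucket list, and the `accounted` array (a stand-in for kcpustat and task cputime) are all hypothetical and do not reflect the actual ia64 code.

```c
#include <stdint.h>

/* Hypothetical cputime buckets; the real set depends on the architecture. */
enum model_bucket {
	MODEL_USER,
	MODEL_SYSTEM,
	MODEL_HARDIRQ,
	MODEL_SOFTIRQ,
	MODEL_NR_BUCKETS
};

struct model_cpu_acct {
	uint64_t pending[MODEL_NR_BUCKETS];   /* cheap accumulators */
	uint64_t accounted[MODEL_NR_BUCKETS]; /* stand-in for kcpustat */
	unsigned int flushes;                 /* how often the slow path ran */
};

/* Fast path, e.g. at irq entry/exit: only add the elapsed delta. */
static void model_accumulate(struct model_cpu_acct *a,
			     enum model_bucket bucket, uint64_t delta)
{
	a->pending[bucket] += delta;
}

/* Slow path, run only on a tick or a context switch: hand the accumulated
 * time over to the (potentially expensive) accounting in one go. */
static void model_flush(struct model_cpu_acct *a)
{
	for (int i = 0; i < MODEL_NR_BUCKETS; i++) {
		if (a->pending[i]) {
			a->accounted[i] += a->pending[i];
			a->pending[i] = 0;
		}
	}
	a->flushes++;
}
```

With this split, two interrupts between ticks cost two additions on the fast path and a single flush, instead of two full accounting calls.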
    • F
      sched/cputime, powerpc/vtime: Accumulate cputime and account only on tick/task switch · a19ff1a2
      Frederic Weisbecker committed
      Currently CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y accounts the cputime on
      any context boundary: irq entry/exit, guest entry/exit, context switch,
      etc.

      Calling functions such as account_system_time() and account_user_time()
      can be costly, especially when they are called on many fastpaths,
      such as twice per IRQ. Those functions do more than just accounting to
      kcpustat and task cputime. Depending on the config, some subsystems can
      perform unpleasant multiplications and divisions, among other things.

      So let's accumulate the cputime instead and delay the accounting to ticks
      and context switches only.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-8-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • F
      sched/cputime, powerpc: Migrate stolen_time field to the accounting structure · f828c3d0
      Frederic Weisbecker committed
      This gathers all cputime accumulation in the same place.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-7-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • F
      sched/cputime, powerpc: Prepare accounting structure for cputime flush on tick · 8c8b73c4
      Frederic Weisbecker committed
      In order to prepare for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y to delay
      cputime accounting to the tick, provide finegrained accumulators to
      powerpc in order to store the cputime until flushing.
      
      While at it, normalize the name of several fields according to common
      cputime naming.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-6-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • F
      sched/cputime: Export account_guest_time() · 1213699a
      Frederic Weisbecker committed
      In order to prepare for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y to delay
      cputime accounting to the tick, let's allow archs to account cputime
      directly to gtime.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-5-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • F
      sched/cputime: Allow accounting system time using cpustat index · c31cc6a5
      Frederic Weisbecker committed
      In order to prepare for CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y to delay
      cputime accounting to the tick, let's provide APIs to account system
      time to precise contexts: hardirq, softirq, pure system, ...
      Inspired-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-4-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • F
      sched/cputime, ia64: Fix incorrect start cputime assignment on task switch · 8388d214
      Frederic Weisbecker committed
      On task switch we must initialize the current cputime of the next task
      using the value of the previous task which got freshly updated.
      
      But the current code does the opposite, which presumably results in
      incorrect cputime accounting.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-3-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • F
      sched/cputime, powerpc32: Fix stale scaled stime on context switch · 90d08ba2
      Frederic Weisbecker committed
      On context switch with powerpc32, the cputime is accumulated in the
      thread_info struct. So the switching-in task must move forward its
      start time snapshot to the current time in order to later compute the
      delta spent in system mode.
      
      This is what we do for the normal cputime by initializing the starttime
      field to the value of the previous task's starttime which got freshly
      updated.
      
      But we are missing the update of the scaled cputime start time. As a
      result we may be accounting too much scaled cputime later.
      
      Fix this by initializing the scaled cputime the same way we do for
      normal cputime.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Link: http://lkml.kernel.org/r/1483636310-6557-2-git-send-email-fweisbec@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
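The stale-snapshot problem described in this commit can be shown with a small sketch. The struct and field names below are hypothetical (they are not the powerpc32 thread_info fields), and the model assumes the previous task's snapshots were just refreshed to the current time before the switch.

```c
#include <stdint.h>

/* Hypothetical per-task accounting snapshots. */
struct model_acct {
	uint64_t starttime;        /* start of the current accounting period */
	uint64_t starttime_scaled; /* same, on the scaled time base */
};

/* Switching in 'next': inherit both freshly updated snapshots from 'prev'.
 * Copying only starttime would leave starttime_scaled stale, so the next
 * scaled delta would cover time the task did not actually run. */
static void model_switch_in(struct model_acct *next,
			    const struct model_acct *prev)
{
	next->starttime = prev->starttime;
	next->starttime_scaled = prev->starttime_scaled;
}

/* Scaled time spent since the snapshot was taken. */
static uint64_t model_scaled_delta(const struct model_acct *a,
				   uint64_t now_scaled)
{
	return now_scaled - a->starttime_scaled;
}
```

If the second assignment in `model_switch_in()` were left out, a task that last ran long ago would report a scaled delta measured from its old snapshot, which is the over-accounting the commit message describes.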
    • L
      Merge branch 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · e96f8f18
      Linus Torvalds committed
      Pull btrfs fixes from Chris Mason:
       "These are all over the place.
      
        The tracepoint part of the pull fixes a crash and adds a little more
        information to two tracepoints, while the rest are good old fashioned
        fixes"
      
      * 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
        btrfs: make tracepoint format strings more compact
        Btrfs: add truncated_len for ordered extent tracepoints
        Btrfs: add 'inode' for extent map tracepoint
        btrfs: fix crash when tracepoint arguments are freed by wq callbacks
        Btrfs: adjust outstanding_extents counter properly when dio write is split
        Btrfs: fix lockdep warning about log_mutex
        Btrfs: use down_read_nested to make lockdep silent
        btrfs: fix locking when we put back a delayed ref that's too new
        btrfs: fix error handling when run_delayed_extent_op fails
        btrfs: return the actual error value from btrfs_uuid_tree_iterate
    • L
      Merge tag 'ceph-for-4.10-rc4' of git://github.com/ceph/ceph-client · 04e39627
      Linus Torvalds committed
      Pull ceph fixes from Ilya Dryomov:
       "Two small fixups for the filesystem changes that went into this merge
        window"
      
      * tag 'ceph-for-4.10-rc4' of git://github.com/ceph/ceph-client:
        ceph: fix get_oldest_context()
        ceph: fix mds cluster availability check
    • L
      Merge tag 'vfio-v4.10-rc4' of git://github.com/awilliam/linux-vfio · af54efa4
      Linus Torvalds committed
      Pull VFIO fixes from Alex Williamson:
      
       - Cleanups and bug fixes for the mtty sample driver (Dan Carpenter)
      
       - Export and make use of has_capability() to fix incorrect use of
         ns_capable() for testing task capabilities (Jike Song)
      
      * tag 'vfio-v4.10-rc4' of git://github.com/awilliam/linux-vfio:
        vfio/type1: Remove pid_namespace.h include
        vfio iommu type1: fix the testing of capability for remote task
        capability: export has_capability
        vfio-mdev: remove some dead code
        vfio-mdev: buffer overflow in ioctl()
        vfio-mdev: return -EFAULT if copy_to_user() fails
    • L
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 406732c9
      Linus Torvalds committed
      Pull KVM fixes from Paolo Bonzini:
      
       - fix for module unload vs deferred jump labels (note: there might be
         other buggy modules!)
      
       - two NULL pointer dereferences from syzkaller
      
       - also syzkaller: fix emulation of fxsave/fxrstor/sgdt/sidt, problem
         made worse during this merge window, "just" kernel memory leak on
         releases
      
       - fix emulation of "mov ss" - somewhat serious on AMD, less so on Intel
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86: fix emulation of "MOV SS, null selector"
        KVM: x86: fix NULL deref in vcpu_scan_ioapic
        KVM: eventfd: fix NULL deref irqbypass consumer
        KVM: x86: Introduce segmented_write_std
        KVM: x86: flush pending lapic jump label updates on module unload
        jump_labels: API for flushing deferred jump label updates
    • L
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · a65c9259
      Linus Torvalds committed
      Pull arm64 fixes from Catalin Marinas:
      
       - Fix huge_ptep_set_access_flags() to return "changed" when any of the
         ptes in the contiguous range is changed, not just the last one
      
       - Fix the adr_l assembly macro to work in modules under KASLR
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: assembler: make adr_l work in modules under KASLR
        arm64: hugetlb: fix the wrong return value for huge_ptep_set_access_flags
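The huge_ptep_set_access_flags() fix summarized above concerns reporting "changed" when any entry in a contiguous range changes, not only the last one. A minimal sketch of that pattern, using a hypothetical helper over a plain array rather than real page table entries:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: OR 'flags' into each entry of a contiguous range and
 * report whether at least one entry was modified. */
static bool model_set_access_flags(unsigned long *ptes, size_t n,
				   unsigned long flags)
{
	bool changed = false;

	for (size_t i = 0; i < n; i++) {
		unsigned long updated = ptes[i] | flags;

		/* Accumulate with |= so that a change to an early entry is
		 * not overwritten when a later entry is already up to date. */
		changed |= (updated != ptes[i]);
		ptes[i] = updated;
	}
	return changed;
}
```

Using plain assignment instead of `|=` inside the loop would reproduce the described bug: the return value would reflect only the final entry.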
    • L
      Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · c79d47f1
      Linus Torvalds committed
      Pull SCSI fixes from James Bottomley:
       "The major fix is the bfa firmware, since the latest 10Gb cards fail
        probing with the current firmware.
      
        The rest is a set of minor fixes: one missed Kconfig dependency
        causing randconfig failures, a missed error return on an error leg, a
        change for how multiqueue waits on a blocked device and a don't reset
        while in reset fix"
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        scsi: bfa: Increase requested firmware version to 3.2.5.1
        scsi: snic: Return error code on memory allocation failure
        scsi: fnic: Avoid sending reset to firmware when another reset is in progress
        scsi: qedi: fix build, depends on UIO
        scsi: scsi-mq: Wait for .queue_rq() if necessary
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 6d90b4f9
      Linus Torvalds committed
      Pull input updates from Dmitry Torokhov:
       "Small driver fixups"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: elants_i2c - avoid divide by 0 errors on bad touchscreen data
        Input: adxl34x - make it enumerable in ACPI environment
        Input: ALPS - fix TrackStick Y axis handling for SS5 hardware
        Input: synaptics-rmi4 - fix F03 build error when serio is module
        Input: xpad - use correct product id for x360w controllers
        Input: synaptics_i2c - change msleep to usleep_range for small msecs
        Input: i8042 - add Pegatron touchpad to noloop table
        Input: joydev - remove unused linux/miscdevice.h include
  2. 13 Jan, 2017 12 commits
  3. 12 Jan, 2017 4 commits
    • P
      KVM: x86: fix emulation of "MOV SS, null selector" · 33ab9110
      Paolo Bonzini committed
      This is CVE-2017-2583.  On Intel this causes a failed vmentry because
      SS's type is neither 3 nor 7 (even though the manual says this check is
      only done for usable SS, and the dmesg splat says that SS is unusable!).
      On AMD it's worse: svm.c is confused and sets CPL to 0 in the vmcb.
      
      The fix fabricates a data segment descriptor when SS is set to a null
      selector, so that CPL and SS.DPL are set correctly in the VMCS/vmcb.
      Furthermore, only allow setting SS to a NULL selector if SS.RPL < 3;
      this in turn ensures CPL < 3 because RPL must be equal to CPL.
      
      Thanks to Andy Lutomirski and Willy Tarreau for help in analyzing
      the bug and deciphering the manuals.
      Reported-by: Xiaohan Zhang <zhangxiaohan1@huawei.com>
      Fixes: 79d5b4c3
      Cc: stable@nongnu.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
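The rule stated in this commit, that SS may be set to a NULL selector only if SS.RPL < 3, can be expressed as a small predicate. This is a sketch with hypothetical helper names; it covers only the selector check and ignores the other conditions the real emulator evaluates (CPU mode, descriptor fabrication, and so on).

```c
#include <stdbool.h>
#include <stdint.h>

/* A selector is null when its index and table-indicator bits are all zero,
 * i.e. only the two RPL bits may be set (values 0 to 3). */
static bool model_is_null_selector(uint16_t sel)
{
	return (sel & ~0x3u) == 0;
}

/* The requested privilege level is held in the low two bits. */
static unsigned int model_rpl(uint16_t sel)
{
	return sel & 0x3u;
}

/* Model of the rule from the commit message: a null selector may be loaded
 * into SS only when its RPL is below 3. */
static bool model_null_ss_allowed(uint16_t sel)
{
	return model_is_null_selector(sel) && model_rpl(sel) < 3;
}
```

Because RPL must equal CPL, rejecting RPL 3 in this check is what keeps CPL below 3 whenever SS holds a null selector.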
    • J
      capability: export has_capability · 19c816e8
      Jike Song committed
      has_capability() is sometimes needed by modules to test capabilities
      of a specified task other than current, so export it.
      
      Cc: Kirti Wankhede <kwankhede@nvidia.com>
      Signed-off-by: Jike Song <jike.song@intel.com>
      Acked-by: Serge Hallyn <serge@hallyn.com>
      Acked-by: James Morris <james.l.morris@oracle.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • W
      KVM: x86: fix NULL deref in vcpu_scan_ioapic · 546d87e5
      Wanpeng Li committed
      Reported by syzkaller:
      
          BUG: unable to handle kernel NULL pointer dereference at 00000000000001b0
          IP: _raw_spin_lock+0xc/0x30
          PGD 3e28eb067
          PUD 3f0ac6067
          PMD 0
          Oops: 0002 [#1] SMP
          CPU: 0 PID: 2431 Comm: test Tainted: G           OE   4.10.0-rc1+ #3
          Call Trace:
           ? kvm_ioapic_scan_entry+0x3e/0x110 [kvm]
           kvm_arch_vcpu_ioctl_run+0x10a8/0x15f0 [kvm]
           ? pick_next_task_fair+0xe1/0x4e0
           ? kvm_arch_vcpu_load+0xea/0x260 [kvm]
           kvm_vcpu_ioctl+0x33a/0x600 [kvm]
           ? hrtimer_try_to_cancel+0x29/0x130
           ? do_nanosleep+0x97/0xf0
           do_vfs_ioctl+0xa1/0x5d0
           ? __hrtimer_init+0x90/0x90
           ? do_nanosleep+0x5b/0xf0
           SyS_ioctl+0x79/0x90
           do_syscall_64+0x6e/0x180
           entry_SYSCALL64_slow_path+0x25/0x25
          RIP: _raw_spin_lock+0xc/0x30 RSP: ffffa43688973cc0
      
      The syzkaller folks reported a NULL pointer dereference due to
      ENABLE_CAP succeeding even without an irqchip.  The Hyper-V
      synthetic interrupt controller is activated, resulting in a
      wrong request to rescan the ioapic and a NULL pointer dereference.
      
          #include <sys/ioctl.h>
          #include <sys/mman.h>
          #include <sys/types.h>
          #include <linux/kvm.h>
          #include <pthread.h>
          #include <stddef.h>
          #include <stdint.h>
          #include <stdlib.h>
          #include <string.h>
          #include <fcntl.h>
          #include <unistd.h>
      
          #ifndef KVM_CAP_HYPERV_SYNIC
          #define KVM_CAP_HYPERV_SYNIC 123
          #endif
      
          void* thr(void* arg)
          {
      	struct kvm_enable_cap cap;
      	cap.flags = 0;
      	cap.cap = KVM_CAP_HYPERV_SYNIC;
      	ioctl((long)arg, KVM_ENABLE_CAP, &cap);
      	return 0;
          }
      
          int main()
          {
      	unsigned char *host_mem = mmap(0, 0x1000, PROT_READ|PROT_WRITE,
      			MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
      	int kvmfd = open("/dev/kvm", 0);
      	int vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0);
      	struct kvm_userspace_memory_region memreg;
      	memreg.slot = 0;
      	memreg.flags = 0;
      	memreg.guest_phys_addr = 0;
      	memreg.memory_size = 0x1000;
      	memreg.userspace_addr = (unsigned long)host_mem;
      	host_mem[0] = 0xf4;
      	ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &memreg);
      	int cpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0);
      	struct kvm_sregs sregs;
      	ioctl(cpufd, KVM_GET_SREGS, &sregs);
      	sregs.cr0 = 0;
      	sregs.cr4 = 0;
      	sregs.efer = 0;
      	sregs.cs.selector = 0;
      	sregs.cs.base = 0;
      	ioctl(cpufd, KVM_SET_SREGS, &sregs);
      	struct kvm_regs regs = { .rflags = 2 };
      	ioctl(cpufd, KVM_SET_REGS, &regs);
      	ioctl(vmfd, KVM_CREATE_IRQCHIP, 0);
      	pthread_t th;
      	pthread_create(&th, 0, thr, (void*)(long)cpufd);
      	usleep(rand() % 10000);
      	ioctl(cpufd, KVM_RUN, 0);
      	pthread_join(th, 0);
      	return 0;
          }
      
      This patch fixes it by failing ENABLE_CAP when there is no irqchip.
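
The shape of such a guard can be sketched as a small userspace model. Everything here is an assumption made for illustration: the struct and function names are hypothetical stand-ins rather than the kernel's own, and the -EINVAL return value is a guess, since the message does not name the error code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the VM and vCPU state; not the kernel structs. */
struct model_vm {
	bool irqchip_in_kernel;	/* set once an in-kernel irqchip exists */
};

struct model_vcpu {
	struct model_vm *vm;
	bool synic_active;
};

/* Model of the ENABLE_CAP path for the SynIC capability. */
int model_enable_synic(struct model_vcpu *vcpu)
{
	/* Refuse early: without an irqchip there is no ioapic to rescan. */
	if (!vcpu->vm->irqchip_in_kernel)
		return -EINVAL;	/* assumed error code */

	vcpu->synic_active = true;
	return 0;
}
```

With the guard in place, the capability can only become active after the irqchip exists, so a later rescan request never reaches a missing ioapic.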
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Fixes: 5c919412 (kvm/x86: Hyper-V synthetic interrupt controller)
      Cc: stable@vger.kernel.org # 4.5+
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      546d87e5
    • W
      KVM: eventfd: fix NULL deref irqbypass consumer · 4f3dbdf4
      Wanpeng Li committed
      Reported by syzkaller:
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
          IP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
          PGD 0
      
          Oops: 0002 [#1] SMP
          CPU: 1 PID: 125 Comm: kworker/1:1 Not tainted 4.9.0+ #1
          Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm]
          task: ffff9bbe0dfbb900 task.stack: ffffb61802014000
          RIP: 0010:irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
          Call Trace:
           irqfd_shutdown+0x66/0xa0 [kvm]
           process_one_work+0x16b/0x480
           worker_thread+0x4b/0x500
           kthread+0x101/0x140
           ? process_one_work+0x480/0x480
           ? kthread_create_on_node+0x60/0x60
           ret_from_fork+0x25/0x30
          RIP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] RSP: ffffb61802017e20
          CR2: 0000000000000008
      
      The syzkaller folks reported a NULL pointer dereference caused by
      unregistering a consumer whose registration had previously failed.
      syzkaller occasionally creates two VMs that use the same eventfd, so
      the second VM fails to register its irqbypass consumer. When the
      eventfd is closed, the irqfd is marked inactive and a work item is
      queued to shut down the irqfd and unregister the irqbypass consumer.
      However, the second consumer was initialized even though its
      registration failed, so the workqueue uses its token (the same as the
      first VM's) to unregister it. The first VM's consumer is found and
      unregistered instead, and a NULL dereference then occurs on the path
      that deletes the consumer from the consumers list.
      
      This patch fixes it by making irq_bypass_register/unregister_consumer()
      look for the consumer entry based on the consumer pointer itself instead
      of token matching.
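
The difference between the two lookups can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the irqbypass code: the fixed-size array, the struct layout and all function names are hypothetical, and only the matching rule follows the description above.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CONSUMERS 8

/* Hypothetical consumer: the token is shared by everyone using one eventfd. */
struct consumer {
	void *token;
};

struct consumer *consumers[MAX_CONSUMERS];

/* Registration fails with -EBUSY if the pointer or its token is already listed. */
int register_consumer(struct consumer *c)
{
	int free_slot = -1;

	for (int i = 0; i < MAX_CONSUMERS; i++) {
		if (!consumers[i]) {
			if (free_slot < 0)
				free_slot = i;
			continue;
		}
		if (consumers[i] == c || consumers[i]->token == c->token)
			return -EBUSY;
	}
	if (free_slot < 0)
		return -ENOMEM;
	consumers[free_slot] = c;
	return 0;
}

/* Old rule: match on token, which can remove another VM's entry. */
bool unregister_by_token(struct consumer *c)
{
	for (int i = 0; i < MAX_CONSUMERS; i++) {
		if (consumers[i] && consumers[i]->token == c->token) {
			consumers[i] = NULL;
			return true;
		}
	}
	return false;
}

/* Fixed rule: match on the consumer pointer, so an unregistered consumer is a no-op. */
bool unregister_by_pointer(struct consumer *c)
{
	for (int i = 0; i < MAX_CONSUMERS; i++) {
		if (consumers[i] == c) {
			consumers[i] = NULL;
			return true;
		}
	}
	return false;
}
```

In this model, a second consumer that shares the first one's token fails to register. Unregistering it by pointer leaves the first entry in place, whereas unregistering it by token removes the first VM's entry.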
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Suggested-by: Alex Williamson <alex.williamson@redhat.com>
      Cc: stable@vger.kernel.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4f3dbdf4