1. 08 1月, 2016 1 次提交
  2. 06 1月, 2016 7 次提交
    • P
      perf/core: Collapse more IPI loops · 7b648018
      Peter Zijlstra 提交于
      This patch collapses the two 'hard' cases, which are
      perf_event_{dis,en}able().
      
      I cannot seem to convince myself the current code is correct.
      
      So starting with perf_event_disable(); we don't strictly need to test
      for event->state == ACTIVE, ctx->is_active is enough. If the event is
      not scheduled while the ctx is, __perf_event_disable() still does the
      right thing.  Its a little less efficient to IPI in that case,
      over-all simpler.
      
      For perf_event_enable(); the same goes, but I think that's actually
      broken in its current form. The current condition is: ctx->is_active
      && event->state == OFF, that means it doesn't do anything when
      !ctx->active && event->state == OFF. This is wrong, it should still
      mark the event INACTIVE in that case, otherwise we'll still not try
      and schedule the event once the context becomes active again.
      
      This patch implements the two function using the new
      event_function_call() and does away with the tricky event->state
      tests.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NAlexander Shishkin <alexander.shishkin@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      7b648018
    • Y
      sched/fair: Fix new task's load avg removed from source CPU in wake_up_new_task() · 0905f04e
      Yuyang Du 提交于
      If a newly created task is selected to go to a different CPU in fork
      balance when it wakes up the first time, its load averages should
      not be removed from the source CPU since they are never added to
      it before. The same is also applicable to a never used group entity.
      
      Fix it in remove_entity_load_avg(): when entity's last_update_time
      is 0, simply return. This should precisely identify the case in
      question, because in other migrations, the last_update_time is set
      to 0 after remove_entity_load_avg().
      Reported-by: NSteve Muckle <steve.muckle@linaro.org>
      Signed-off-by: NYuyang Du <yuyang.du@intel.com>
      [peterz: cfs_rq_last_update_time]
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Juri Lelli <Juri.Lelli@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Link: http://lkml.kernel.org/r/20151216233427.GJ28098@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      0905f04e
    • W
      sched/deadline: Fix the earliest_dl.next logic · 7d92de3a
      Wanpeng Li 提交于
      earliest_dl.next should cache deadline of the earliest ready task that
      is also enqueued in the pushable rbtree, as pull algorithm uses this
      information to find candidates for migration: if the earliest_dl.next
      deadline of source rq is earlier than the earliest_dl.curr deadline of
      destination rq, the task from the source rq can be pulled.
      
      However, current implementation only guarantees that earliest_dl.next is
      the deadline of the next ready task instead of the next pushable task;
      which will result in potentially holding both rqs' lock and find nothing
      to migrate because of affinity constraints. In addition, current logic
      doesn't update the next candidate for pushing in pick_next_task_dl(),
      even if the running task is never eligible.
      
      This patch fixes both problems by updating earliest_dl.next when
      pushable dl task is enqueued/dequeued, similar to what we already do for
      RT.
      Tested-by: NLuca Abeni <luca.abeni@unitn.it>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NJuri Lelli <juri.lelli@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1449135730-27202-1-git-send-email-wanpeng.li@hotmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7d92de3a
    • S
      sched/core: Reset task's lockless wake-queues on fork() · 093e5840
      Sebastian Andrzej Siewior 提交于
      In the following commit:
      
        76751049 ("sched: Implement lockless wake-queues")
      
      we gained lockless wake-queues.
      
      The -RT kernel managed to lockup itself with those. There could be multiple
      attempts for task X to enqueue it for a wakeup _even_ if task X is already
      running.
      
      The reason is that task X could be runnable but not yet on CPU. The the
      task performing the wakeup did not leave the CPU it could performe
      multiple wakeups.
      
      With the proper timming task X could be running and enqueued for a
      wakeup. If this happens while X is performing a fork() then its its
      child will have a !NULL `wake_q` member copied.
      
      This is not a problem as long as the child task does not participate in
      lockless wakeups :)
      Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 76751049 ("sched: Implement lockless wake-queues")
      Link: http://lkml.kernel.org/r/20151221171710.GA5499@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      093e5840
    • A
      sched/fair: Fix multiplication overflow on 32-bit systems · 9e0e83a1
      Andrey Ryabinin 提交于
      Make 'r' 64-bit type to avoid overflow in 'r * LOAD_AVG_MAX'
      on 32-bit systems:
      
      	UBSAN: Undefined behaviour in kernel/sched/fair.c:2785:18
      	signed integer overflow:
      	87950 * 47742 cannot be represented in type 'int'
      
      The most likely effect of this bug are bad load average numbers
      resulting in weird scheduling. It's also likely that this can
      persist for a longer time - until the system goes idle for
      a long time so that all load avg numbers get reset.
      
      [ This is the CFS load average metric, not the procfs output, which
        is separate. ]
      Signed-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 9d89c257 ("sched/fair: Rewrite runnable load and utilization average tracking")
      Link: http://lkml.kernel.org/r/1450097243-30137-1-git-send-email-aryabinin@virtuozzo.com
      [ Improved the changelog. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      9e0e83a1
    • P
      perf: Fix race in swevent hash · 12ca6ad2
      Peter Zijlstra 提交于
      There's a race on CPU unplug where we free the swevent hash array
      while it can still have events on. This will result in a
      use-after-free which is BAD.
      
      Simply do not free the hash array on unplug. This leaves the thing
      around and no use-after-free takes place.
      
      When the last swevent dies, we do a for_each_possible_cpu() iteration
      anyway to clean these up, at which time we'll free it, so no leakage
      will occur.
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Tested-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      12ca6ad2
    • P
      perf: Fix race in perf_event_exec() · c1274499
      Peter Zijlstra 提交于
      I managed to tickle this warning:
      
        [ 2338.884942] ------------[ cut here ]------------
        [ 2338.890112] WARNING: CPU: 13 PID: 35162 at ../kernel/events/core.c:2702 task_ctx_sched_out+0x6b/0x80()
        [ 2338.900504] Modules linked in:
        [ 2338.903933] CPU: 13 PID: 35162 Comm: bash Not tainted 4.4.0-rc4-dirty #244
        [ 2338.911610] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
        [ 2338.923071]  ffffffff81f1468e ffff8807c6457cb8 ffffffff815c680c 0000000000000000
        [ 2338.931382]  ffff8807c6457cf0 ffffffff810c8a56 ffffe8ffff8c1bd0 ffff8808132ed400
        [ 2338.939678]  0000000000000286 ffff880813170380 ffff8808132ed400 ffff8807c6457d00
        [ 2338.947987] Call Trace:
        [ 2338.950726]  [<ffffffff815c680c>] dump_stack+0x4e/0x82
        [ 2338.956474]  [<ffffffff810c8a56>] warn_slowpath_common+0x86/0xc0
        [ 2338.963195]  [<ffffffff810c8b4a>] warn_slowpath_null+0x1a/0x20
        [ 2338.969720]  [<ffffffff811a49cb>] task_ctx_sched_out+0x6b/0x80
        [ 2338.976244]  [<ffffffff811a62d2>] perf_event_exec+0xe2/0x180
        [ 2338.982575]  [<ffffffff8121fb6f>] setup_new_exec+0x6f/0x1b0
        [ 2338.988810]  [<ffffffff8126de83>] load_elf_binary+0x393/0x1660
        [ 2338.995339]  [<ffffffff811dc772>] ? get_user_pages+0x52/0x60
        [ 2339.001669]  [<ffffffff8121e297>] search_binary_handler+0x97/0x200
        [ 2339.008581]  [<ffffffff8121f8b3>] do_execveat_common.isra.33+0x543/0x6e0
        [ 2339.016072]  [<ffffffff8121fcea>] SyS_execve+0x3a/0x50
        [ 2339.021819]  [<ffffffff819fc165>] stub_execve+0x5/0x5
        [ 2339.027469]  [<ffffffff819fbeb2>] ? entry_SYSCALL_64_fastpath+0x12/0x71
        [ 2339.034860] ---[ end trace ee1337c59a0ddeac ]---
      
      Which is a WARN_ON_ONCE() indicating that cpuctx->task_ctx is not
      what we expected it to be.
      
      This is because context switches can swap the task_struct::perf_event_ctxp[]
      pointer around. Therefore you have to either disable preemption when looking
      at current, or hold ctx->lock.
      
      Fix perf_event_enable_on_exec(), it loads current->perf_event_ctxp[]
      before disabling interrupts, therefore a preemption in the right place
      can swap contexts around and we're using the wrong one.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: syzkaller <syzkaller@googlegroups.com>
      Link: http://lkml.kernel.org/r/20151210195740.GG6357@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c1274499
  3. 05 1月, 2016 1 次提交
    • Q
      tracing: Fix setting of start_index in find_next() · f36d1be2
      Qiu Peiyang 提交于
      When we do cat /sys/kernel/debug/tracing/printk_formats, we hit kernel
      panic at t_show.
      
      general protection fault: 0000 [#1] PREEMPT SMP
      CPU: 0 PID: 2957 Comm: sh Tainted: G W  O 3.14.55-x86_64-01062-gd4acdc7 #2
      RIP: 0010:[<ffffffff811375b2>]
       [<ffffffff811375b2>] t_show+0x22/0xe0
      RSP: 0000:ffff88002b4ebe80  EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
      RDX: 0000000000000004 RSI: ffffffff81fd26a6 RDI: ffff880032f9f7b1
      RBP: ffff88002b4ebe98 R08: 0000000000001000 R09: 000000000000ffec
      R10: 0000000000000000 R11: 000000000000000f R12: ffff880004d9b6c0
      R13: 7365725f6d706400 R14: ffff880004d9b6c0 R15: ffffffff82020570
      FS:  0000000000000000(0000) GS:ffff88003aa00000(0063) knlGS:00000000f776bc40
      CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
      CR2: 00000000f6c02ff0 CR3: 000000002c2b3000 CR4: 00000000001007f0
      Call Trace:
       [<ffffffff811dc076>] seq_read+0x2f6/0x3e0
       [<ffffffff811b749b>] vfs_read+0x9b/0x160
       [<ffffffff811b7f69>] SyS_read+0x49/0xb0
       [<ffffffff81a3a4b9>] ia32_do_call+0x13/0x13
       ---[ end trace 5bd9eb630614861e ]---
      Kernel panic - not syncing: Fatal exception
      
      When the first time find_next calls find_next_mod_format, it should
      iterate the trace_bprintk_fmt_list to find the first print format of
      the module. However in current code, start_index is smaller than *pos
      at first, and code will not iterate the list. Latter container_of will
      get the wrong address with former v, which will cause mod_fmt be a
      meaningless object and so is the returned mod_fmt->fmt.
      
      This patch will fix it by correcting the start_index. After fixed,
      when the first time calls find_next_mod_format, start_index will be
      equal to *pos, and code will iterate the trace_bprintk_fmt_list to
      get the right module printk format, so is the returned mod_fmt->fmt.
      
      Link: http://lkml.kernel.org/r/5684B900.9000309@intel.com
      
      Cc: stable@vger.kernel.org # 3.12+
      Fixes: 102c9323 "tracing: Add __tracepoint_string() to export string pointers"
      Signed-off-by: NQiu Peiyang <peiyangx.qiu@intel.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      f36d1be2
  4. 29 12月, 2015 1 次提交
  5. 20 12月, 2015 6 次提交
  6. 19 12月, 2015 4 次提交
    • Y
      clocksource: Make clocksource validation work for all clocksources · 1f45f1f3
      Yang Yingliang 提交于
      The clocksource validation which makes sure that the newly read value
      is not smaller than the last value only works if the clocksource mask
      is 64bit, i.e. the counter is 64bit wide. But we want to use that
      mechanism also for clocksources which are less than 64bit wide.
      
      So instead of checking whether bit 63 is set, we check whether the
      most significant bit of the clocksource mask is set in the delta
      result. If it is set, we return 0.
      
      [ tglx: Simplified the implementation, added a comment and massaged
        	the commit message ]
      Suggested-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      Cc: <linux-arm-kernel@lists.infradead.org>
      Link: http://lkml.kernel.org/r/56349607.6070708@huawei.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      1f45f1f3
    • H
      kexec: Fix race between panic() and crash_kexec() · 7bbee5ca
      Hidehiro Kawai 提交于
      Currently, panic() and crash_kexec() can be called at the same time.
      For example (x86 case):
      
      CPU 0:
        oops_end()
          crash_kexec()
            mutex_trylock() // acquired
              nmi_shootdown_cpus() // stop other CPUs
      
      CPU 1:
        panic()
          crash_kexec()
            mutex_trylock() // failed to acquire
          smp_send_stop() // stop other CPUs
          infinite loop
      
      If CPU 1 calls smp_send_stop() before nmi_shootdown_cpus(), kdump
      fails.
      
      In another case:
      
      CPU 0:
        oops_end()
          crash_kexec()
            mutex_trylock() // acquired
              <NMI>
              io_check_error()
                panic()
                  crash_kexec()
                    mutex_trylock() // failed to acquire
                  infinite loop
      
      Clearly, this is an undesirable result.
      
      To fix this problem, this patch changes crash_kexec() to exclude others
      by using the panic_cpu atomic.
      Signed-off-by: NHidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: kexec@lists.infradead.org
      Cc: linux-doc@vger.kernel.org
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Minfei Huang <mnfhuang@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: x86-ml <x86@kernel.org>
      Link: http://lkml.kernel.org/r/20151210014630.25437.94161.stgit@softrsSigned-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      7bbee5ca
    • H
      panic, x86: Allow CPUs to save registers even if looping in NMI context · 58c5661f
      Hidehiro Kawai 提交于
      Currently, kdump_nmi_shootdown_cpus(), a subroutine of crash_kexec(),
      sends an NMI IPI to CPUs which haven't called panic() to stop them,
      save their register information and do some cleanups for crash dumping.
      However, if such a CPU is infinitely looping in NMI context, we fail to
      save its register information into the crash dump.
      
      For example, this can happen when unknown NMIs are broadcast to all
      CPUs as follows:
      
        CPU 0                             CPU 1
        ===========================       ==========================
        receive an unknown NMI
        unknown_nmi_error()
          panic()                         receive an unknown NMI
            spin_trylock(&panic_lock)     unknown_nmi_error()
            crash_kexec()                   panic()
                                              spin_trylock(&panic_lock)
                                              panic_smp_self_stop()
                                                infinite loop
              kdump_nmi_shootdown_cpus()
                issue NMI IPI -----------> blocked until IRET
                                                infinite loop...
      
      Here, since CPU 1 is in NMI context, the second NMI from CPU 0 is
      blocked until CPU 1 executes IRET. However, CPU 1 never executes IRET,
      so the NMI is not handled and the callback function to save registers is
      never called.
      
      In practice, this can happen on some servers which broadcast NMIs to all
      CPUs when the NMI button is pushed.
      
      To save registers in this case, we need to:
      
        a) Return from NMI handler instead of looping infinitely
        or
        b) Call the callback function directly from the infinite loop
      
      Inherently, a) is risky because NMI is also used to prevent corrupted
      data from being propagated to devices.  So, we chose b).
      
      This patch does the following:
      
      1. Move the infinite looping of CPUs which haven't called panic() in NMI
         context (actually done by panic_smp_self_stop()) outside of panic() to
         enable us to refer pt_regs. Please note that panic_smp_self_stop() is
         still used for normal context.
      
      2. Call a callback of kdump_nmi_shootdown_cpus() directly to save
         registers and do some cleanups after setting waiting_for_crash_ipi which
         is used for counting down the number of CPUs which handled the callback
      Signed-off-by: NHidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Gobinda Charan Maji <gobinda.cemk07@gmail.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Javi Merino <javi.merino@arm.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: kexec@lists.infradead.org
      Cc: linux-doc@vger.kernel.org
      Cc: lkml <linux-kernel@vger.kernel.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Stefan Lippers-Hollmann <s.l-h@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Link: http://lkml.kernel.org/r/20151210014628.25437.75256.stgit@softrs
      [ Cleanup comments, fixup formatting. ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      58c5661f
    • H
      panic, x86: Fix re-entrance problem due to panic on NMI · 1717f209
      Hidehiro Kawai 提交于
      If panic on NMI happens just after panic() on the same CPU, panic() is
      recursively called. Kernel stalls, as a result, after failing to acquire
      panic_lock.
      
      To avoid this problem, don't call panic() in NMI context if we've
      already entered panic().
      
      For that, introduce nmi_panic() macro to reduce code duplication. In
      the case of panic on NMI, don't return from NMI handlers if another CPU
      already panicked.
      Signed-off-by: NHidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Gobinda Charan Maji <gobinda.cemk07@gmail.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Javi Merino <javi.merino@arm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: kexec@lists.infradead.org
      Cc: linux-doc@vger.kernel.org
      Cc: lkml <linux-kernel@vger.kernel.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Link: http://lkml.kernel.org/r/20151210014626.25437.13302.stgit@softrs
      [ Cleanup comments, fixup formatting. ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      1717f209
  7. 18 12月, 2015 1 次提交
  8. 17 12月, 2015 4 次提交
    • J
      timekeeping: Cap adjustments so they don't exceed the maxadj value · ec02b076
      John Stultz 提交于
      Thus its been occasionally noted that users have seen
      confusing warnings like:
      
          Adjusting tsc more than 11% (5941981 vs 7759439)
      
      We try to limit the maximum total adjustment to 11% (10% tick
      adjustment + 0.5% frequency adjustment). But this is done by
      bounding the requested adjustment values, and the internal
      steering that is done by tracking the error from what was
      requested and what was applied, does not have any such limits.
      
      This is usually not problematic, but in some cases has a risk
      that an adjustment could cause the clocksource mult value to
      overflow, so its an indication things are outside of what is
      expected.
      
      It ends up most of the reports of this 11% warning are on systems
      using chrony, which utilizes the adjtimex() ADJ_TICK interface
      (which allows a +-10% adjustment). The original rational for
      ADJ_TICK unclear to me but my assumption it was originally added
      to allow broken systems to get a big constant correction at boot
      (see adjtimex userspace package for an example) which would allow
      the system to work w/ ntpd's 0.5% adjustment limit.
      
      Chrony uses ADJ_TICK to make very aggressive short term corrections
      (usually right at startup). Which push us close enough to the max
      bound that a few late ticks can cause the internal steering to push
      past the max adjust value (tripping the warning).
      
      Thus this patch adds some extra logic to enforce the max adjustment
      cap in the internal steering.
      
      Note: This has the potential to slow corrections when the ADJ_TICK
      value is furthest away from the default value. So it would be good to
      get some testing from folks using chrony, to make sure we don't
      cause any troubles there.
      
      Cc: Miroslav Lichvar <mlichvar@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Tested-by: NMiroslav Lichvar <mlichvar@redhat.com>
      Reported-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      ec02b076
    • D
      ntp: Fix second_overflow's input parameter type to be 64bits · c7963487
      DengChao 提交于
      The function "second_overflow" uses "unsign long"
      as its input parameter type which will overflow after
      year 2106 on 32bit systems.
      
      Thus this patch replaces it with time64_t type.
      
      While the 64-bit division is expensive, "next_ntp_leap_sec"
      has been calculated already, so we can just re-use it in the
      TIME_INS/DEL cases, allowing one expensive division per
      leapsecond instead of re-doing the divsion once a second after
      the leap flag has been set.
      Signed-off-by: NDengChao <chao.deng@linaro.org>
      [jstultz: Tweaked commit message]
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      c7963487
    • D
      ntp: Change time_reftime to time64_t and utilize 64bit __ktime_get_real_seconds · 0af86465
      DengChao 提交于
      The type of static variant "time_reftime" and the call of
      get_seconds in ntp are both not y2038 safe.
      
      So change the type of time_reftime to time64_t and replace
      get_seconds with __ktime_get_real_seconds.
      
      The local variant "secs" in ntp_update_offset represents
      seconds between now and last ntp adjustment, it seems impossible
      that this time will last more than 68 years, so keep its type as
      "long".
      Reviewed-by: NJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: NDengChao <chao.deng@linaro.org>
      [jstultz: Tweaked commit message]
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      0af86465
    • D
      timekeeping: Provide internal function __ktime_get_real_seconds · dee36654
      DengChao 提交于
      In order to fix Y2038 issues in the ntp code we will need replace
      get_seconds() with ktime_get_real_seconds() but as the ntp code uses
      the timekeeping lock which is also used by ktime_get_real_seconds(),
      we need a version without locking.
      Add a new function __ktime_get_real_seconds() in timekeeping to
      do this.
      Reviewed-by: NJohn Stultz <john.stultz@linaro.org>
      Signed-off-by: NDengChao <chao.deng@linaro.org>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      dee36654
  9. 14 12月, 2015 2 次提交
  10. 13 12月, 2015 1 次提交
    • C
      kernel: remove stop_machine() Kconfig dependency · 86fffe4a
      Chris Wilson 提交于
      Currently the full stop_machine() routine is only enabled on SMP if
      module unloading is enabled, or if the CPUs are hotpluggable.  This
      leads to configurations where stop_machine() is broken as it will then
      only run the callback on the local CPU with irqs disabled, and not stop
      the other CPUs or run the callback on them.
      
      For example, this breaks MTRR setup on x86 in certain configs since
      ea8596bb ("kprobes/x86: Remove unused text_poke_smp() and
      text_poke_smp_batch() functions") as the MTRR is only established on the
      boot CPU.
      
      This patch removes the Kconfig option for STOP_MACHINE and uses the SMP
      and HOTPLUG_CPU config options to compile the correct stop_machine() for
      the architecture, removing the false dependency on MODULE_UNLOAD in the
      process.
      
      Link: https://lkml.org/lkml/2014/10/8/124
      References: https://bugs.freedesktop.org/show_bug.cgi?id=84794Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Pranith Kumar <bobby.prani@gmail.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Iulia Manda <iulia.manda21@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      86fffe4a
  11. 11 12月, 2015 2 次提交
    • J
      time: Verify time values in adjtimex ADJ_SETOFFSET to avoid overflow · 37cf4dc3
      John Stultz 提交于
      For adjtimex()'s ADJ_SETOFFSET, make sure the tv_usec value is
      sane. We might multiply them later which can cause an overflow
      and undefined behavior.
      
      This patch introduces new helper functions to simplify the
      checking code and adds comments to clarify
      
      Orginally this patch was by Sasha Levin, but I've basically
      rewritten it, so he should get credit for finding the issue
      and I should get the blame for any mistakes made since.
      
      Also, credit to Richard Cochran for the phrasing used in the
      comment for what is considered valid here.
      
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      37cf4dc3
    • S
      ntp: Verify offset doesn't overflow in ntp_update_offset · 52d189f1
      Sasha Levin 提交于
      We need to make sure that the offset is valid before manipulating it,
      otherwise it might overflow on the multiplication.
      
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      [jstultz: Reworked one of the checks so it makes more sense]
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      52d189f1
  12. 08 12月, 2015 8 次提交
  13. 06 12月, 2015 2 次提交
    • P
      perf/core: Collapse common IPI pattern · 0017960f
      Peter Zijlstra 提交于
      Various functions implement the same pattern to send IPIs to an
      event's CPU. Collapse the easy ones in a common helper function to
      reduce duplication.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      0017960f
    • J
      perf: Do not send exit event twice · 4e93ad60
      Jiri Olsa 提交于
      In case we monitor events system wide, we get EXIT event
      (when configured) twice for each task that exited.
      
      Note doubled lines with same pid/tid in following example:
      
        $ sudo ./perf record -a
        ^C[ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.480 MB perf.data (2518 samples) ]
        $ sudo ./perf report -D | grep EXIT
      
        0 60290687567581 0x59910 [0x38]: PERF_RECORD_EXIT(1250:1250):(1250:1250)
        0 60290687568354 0x59948 [0x38]: PERF_RECORD_EXIT(1250:1250):(1250:1250)
        0 60290687988744 0x59ad8 [0x38]: PERF_RECORD_EXIT(1250:1250):(1250:1250)
        0 60290687989198 0x59b10 [0x38]: PERF_RECORD_EXIT(1250:1250):(1250:1250)
        1 60290692567895 0x62af0 [0x38]: PERF_RECORD_EXIT(1253:1253):(1253:1253)
        1 60290692568322 0x62b28 [0x38]: PERF_RECORD_EXIT(1253:1253):(1253:1253)
        2 60290692739276 0x69a18 [0x38]: PERF_RECORD_EXIT(1252:1252):(1252:1252)
        2 60290692739910 0x69a50 [0x38]: PERF_RECORD_EXIT(1252:1252):(1252:1252)
      
      The reason is that the cpu contexts are processes each time
      we call perf_event_task. I'm changing the perf_event_aux logic
      to serve task_ctx and cpu contexts separately, which ensure we
      don't get EXIT event generated twice on same cpu context.
      
      This does not affect other auxiliary events, as they don't
      use task_ctx at all.
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1446649205-5822-1-git-send-email-jolsa@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      4e93ad60