1. 08 Jan 2016 (1 commit)
  2. 06 Jan 2016 (7 commits)
    • perf/core: Collapse more IPI loops · 7b648018
      Committed by Peter Zijlstra
      This patch collapses the two 'hard' cases, which are
      perf_event_{dis,en}able().
      
      I cannot seem to convince myself the current code is correct.
      
      So starting with perf_event_disable(); we don't strictly need to test
      for event->state == ACTIVE, ctx->is_active is enough. If the event is
      not scheduled while the ctx is, __perf_event_disable() still does the
      right thing. It's a little less efficient to IPI in that case, but
      overall simpler.
      
      For perf_event_enable(); the same goes, but I think that's actually
      broken in its current form. The current condition is: ctx->is_active
      && event->state == OFF, that means it doesn't do anything when
      !ctx->is_active && event->state == OFF. This is wrong; it should still
      mark the event INACTIVE in that case, otherwise we still won't try to
      schedule the event once the context becomes active again.
      
      This patch implements the two functions using the new
      event_function_call() and does away with the tricky event->state
      tests.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Alexander Shishkin <alexander.shishkin@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      7b648018
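      As a reference point, here is a simplified sketch of what the reworked
      enable path looks like under this approach. It is reconstructed from the
      description above rather than quoted from the patch; the helper names
      (__perf_event_enable, ___perf_event_enable, event_function_call) follow
      the upstream naming but should be treated as assumptions.

      void perf_event_enable(struct perf_event *event)
      {
              struct perf_event_context *ctx = event->ctx;

              raw_spin_lock_irq(&ctx->lock);
              if (event->state >= PERF_EVENT_STATE_INACTIVE) {
                      /* Already at least INACTIVE: nothing to do. */
                      raw_spin_unlock_irq(&ctx->lock);
                      return;
              }
              /* Clear a stale error state so the event can be re-enabled. */
              if (event->state == PERF_EVENT_STATE_ERROR)
                      event->state = PERF_EVENT_STATE_OFF;
              raw_spin_unlock_irq(&ctx->lock);

              /*
               * event_function_call() IPIs the event's CPU when the context
               * is active; otherwise it runs the local helper, which still
               * marks the event INACTIVE so it is picked up when the context
               * is scheduled back in (the case the old "ctx->is_active &&
               * state == OFF" test missed).
               */
              event_function_call(event, __perf_event_enable,
                                  ___perf_event_enable, event);
      }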
    • sched/fair: Fix new task's load avg removed from source CPU in wake_up_new_task() · 0905f04e
      Committed by Yuyang Du
      If a newly created task is selected to go to a different CPU in fork
      balance when it wakes up for the first time, its load averages should
      not be removed from the source CPU, since they were never added to it
      in the first place. The same applies to a never-used group entity.
      
      Fix it in remove_entity_load_avg(): when the entity's last_update_time
      is 0, simply return. This should precisely identify the case in
      question, because in other migrations, the last_update_time is set
      to 0 after remove_entity_load_avg().
      Reported-by: Steve Muckle <steve.muckle@linaro.org>
      Signed-off-by: Yuyang Du <yuyang.du@intel.com>
      [peterz: cfs_rq_last_update_time]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Juri Lelli <Juri.Lelli@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Link: http://lkml.kernel.org/r/20151216233427.GJ28098@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0905f04e
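      A minimal sketch of the guard this change adds, simplified from
      kernel/sched/fair.c of that era; the surrounding calls are abbreviated
      and given as best understood, not verbatim.

      void remove_entity_load_avg(struct sched_entity *se)
      {
              struct cfs_rq *cfs_rq = cfs_rq_of(se);
              u64 last_update_time;

              /*
               * A newly forked task (or a never-used group entity) has
               * last_update_time == 0: its load was never attached to any
               * cfs_rq, so there is nothing to remove.
               */
              if (!se->avg.last_update_time)
                      return;

              last_update_time = cfs_rq_last_update_time(cfs_rq);

              __update_load_avg(last_update_time, cpu_of(rq_of(cfs_rq)),
                                &se->avg, 0, 0, NULL);
              atomic_long_add(se->avg.load_avg, &cfs_rq->removed_load_avg);
              atomic_long_add(se->avg.util_avg, &cfs_rq->removed_util_avg);
      }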
    • sched/deadline: Fix the earliest_dl.next logic · 7d92de3a
      Committed by Wanpeng Li
      earliest_dl.next should cache the deadline of the earliest ready task
      that is also enqueued in the pushable rbtree, as the pull algorithm uses
      this
      information to find candidates for migration: if the earliest_dl.next
      deadline of source rq is earlier than the earliest_dl.curr deadline of
      destination rq, the task from the source rq can be pulled.
      
      However, the current implementation only guarantees that earliest_dl.next
      is the deadline of the next ready task rather than the next pushable task,
      which can result in holding both rqs' locks and finding nothing to migrate
      because of affinity constraints. In addition, the current logic doesn't
      update the next candidate for pushing in pick_next_task_dl(), even if the
      running task will never be eligible.
      
      This patch fixes both problems by updating earliest_dl.next when a
      pushable DL task is enqueued/dequeued, similar to what we already do for
      RT.
      Tested-by: Luca Abeni <luca.abeni@unitn.it>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Juri Lelli <juri.lelli@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1449135730-27202-1-git-send-email-wanpeng.li@hotmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      7d92de3a
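      To illustrate the invariant the fix establishes, a small sketch follows.
      The helper name update_earliest_dl_next() is hypothetical (upstream
      open-codes this logic in enqueue_pushable_dl_task() and
      dequeue_pushable_dl_task()); the point is that earliest_dl.next tracks
      the leftmost task of the pushable rbtree, not merely the next ready task.

      /* Hypothetical helper; upstream inlines this logic. */
      static void update_earliest_dl_next(struct dl_rq *dl_rq)
      {
              struct rb_node *leftmost = dl_rq->pushable_dl_tasks_leftmost;
              struct task_struct *p;

              if (!leftmost)
                      return;

              /* Earliest deadline among tasks that can actually be pushed. */
              p = rb_entry(leftmost, struct task_struct, pushable_dl_tasks);
              dl_rq->earliest_dl.next = p->dl.deadline;
      }

      Refreshing this after every pushable enqueue/dequeue keeps the pull path
      from grabbing both rq locks only to find nothing migratable.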
    • sched/core: Reset task's lockless wake-queues on fork() · 093e5840
      Committed by Sebastian Andrzej Siewior
      In the following commit:
      
        76751049 ("sched: Implement lockless wake-queues")
      
      we gained lockless wake-queues.
      
      The -RT kernel managed to lock itself up with those. There can be
      multiple attempts to enqueue task X for a wakeup _even_ if task X is
      already running.

      The reason is that task X could be runnable but not yet on a CPU. As
      long as the task performing the wakeup has not left the CPU, it can
      perform multiple wakeups.

      With the proper timing, task X can be running while still enqueued for
      a wakeup. If this happens while X is performing a fork(), its child
      will have a non-NULL `wake_q` member copied.
      
      This is not a problem as long as the child task does not participate in
      lockless wakeups :)
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 76751049 ("sched: Implement lockless wake-queues")
      Link: http://lkml.kernel.org/r/20151221171710.GA5499@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      093e5840
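      The fix itself is essentially a one-liner in the fork path (copy_process()
      in kernel/fork.c, as best understood from the description above); shown
      here with an explanatory comment rather than as the literal hunk:

      /*
       * The parent may be sitting on somebody's wake_q at this very moment,
       * so its wake_q.next is non-NULL and was just duplicated into the
       * child. Reset it so the child does not appear to be enqueued for a
       * lockless wakeup it never asked for.
       */
      p->wake_q.next = NULL;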
    • sched/fair: Fix multiplication overflow on 32-bit systems · 9e0e83a1
      Committed by Andrey Ryabinin
      Make 'r' a 64-bit type to avoid overflow in 'r * LOAD_AVG_MAX'
      on 32-bit systems:
      
      	UBSAN: Undefined behaviour in kernel/sched/fair.c:2785:18
      	signed integer overflow:
      	87950 * 47742 cannot be represented in type 'int'
      
      The most likely effect of this bug is bad load average numbers,
      resulting in weird scheduling. It's also likely that this can
      persist for a long time - until the system goes idle long enough
      for all load avg numbers to get reset.
      
      [ This is the CFS load average metric, not the procfs output, which
        is separate. ]
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 9d89c257 ("sched/fair: Rewrite runnable load and utilization average tracking")
      Link: http://lkml.kernel.org/r/1450097243-30137-1-git-send-email-aryabinin@virtuozzo.com
      [ Improved the changelog. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9e0e83a1
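      The affected snippet lives in update_cfs_rq_load_avg() (sa here is
      &cfs_rq->avg). A sketch of the fixed version, with 'r' widened so the
      product with LOAD_AVG_MAX is computed in 64 bits even on 32-bit kernels;
      reconstructed from the description, not the literal hunk:

      if (atomic_long_read(&cfs_rq->removed_load_avg)) {
              /* 'r' used to be 'long' (32 bits here), overflowing below. */
              s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);

              sa->load_avg = max_t(long, sa->load_avg - r, 0);
              sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
      }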
    • perf: Fix race in swevent hash · 12ca6ad2
      Committed by Peter Zijlstra
      There's a race on CPU unplug where we free the swevent hash array
      while it can still have events on it. This will result in a
      use-after-free which is BAD.
      
      Simply do not free the hash array on unplug. This leaves the thing
      around and no use-after-free takes place.
      
      When the last swevent dies, we do a for_each_possible_cpu() iteration
      anyway to clean these up, at which time we'll free it, so no leakage
      will occur.
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Tested-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      12ca6ad2
    • perf: Fix race in perf_event_exec() · c1274499
      Committed by Peter Zijlstra
      I managed to tickle this warning:
      
        [ 2338.884942] ------------[ cut here ]------------
        [ 2338.890112] WARNING: CPU: 13 PID: 35162 at ../kernel/events/core.c:2702 task_ctx_sched_out+0x6b/0x80()
        [ 2338.900504] Modules linked in:
        [ 2338.903933] CPU: 13 PID: 35162 Comm: bash Not tainted 4.4.0-rc4-dirty #244
        [ 2338.911610] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
        [ 2338.923071]  ffffffff81f1468e ffff8807c6457cb8 ffffffff815c680c 0000000000000000
        [ 2338.931382]  ffff8807c6457cf0 ffffffff810c8a56 ffffe8ffff8c1bd0 ffff8808132ed400
        [ 2338.939678]  0000000000000286 ffff880813170380 ffff8808132ed400 ffff8807c6457d00
        [ 2338.947987] Call Trace:
        [ 2338.950726]  [<ffffffff815c680c>] dump_stack+0x4e/0x82
        [ 2338.956474]  [<ffffffff810c8a56>] warn_slowpath_common+0x86/0xc0
        [ 2338.963195]  [<ffffffff810c8b4a>] warn_slowpath_null+0x1a/0x20
        [ 2338.969720]  [<ffffffff811a49cb>] task_ctx_sched_out+0x6b/0x80
        [ 2338.976244]  [<ffffffff811a62d2>] perf_event_exec+0xe2/0x180
        [ 2338.982575]  [<ffffffff8121fb6f>] setup_new_exec+0x6f/0x1b0
        [ 2338.988810]  [<ffffffff8126de83>] load_elf_binary+0x393/0x1660
        [ 2338.995339]  [<ffffffff811dc772>] ? get_user_pages+0x52/0x60
        [ 2339.001669]  [<ffffffff8121e297>] search_binary_handler+0x97/0x200
        [ 2339.008581]  [<ffffffff8121f8b3>] do_execveat_common.isra.33+0x543/0x6e0
        [ 2339.016072]  [<ffffffff8121fcea>] SyS_execve+0x3a/0x50
        [ 2339.021819]  [<ffffffff819fc165>] stub_execve+0x5/0x5
        [ 2339.027469]  [<ffffffff819fbeb2>] ? entry_SYSCALL_64_fastpath+0x12/0x71
        [ 2339.034860] ---[ end trace ee1337c59a0ddeac ]---
      
      Which is a WARN_ON_ONCE() indicating that cpuctx->task_ctx is not
      what we expected it to be.
      
      This is because context switches can swap the task_struct::perf_event_ctxp[]
      pointer around. Therefore you have to either disable preemption when looking
      at current, or hold ctx->lock.
      
      Fix perf_event_enable_on_exec(): it loads current->perf_event_ctxp[]
      before disabling interrupts, so a preemption at the wrong moment can
      swap contexts around and leave us using the wrong one.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: syzkaller <syzkaller@googlegroups.com>
      Link: http://lkml.kernel.org/r/20151210195740.GG6357@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c1274499
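      A condensed sketch of the fixed ordering described above: the context
      pointer is only loaded once interrupts are off, so a context switch can
      no longer swap current->perf_event_ctxp[] underneath us. The body is
      heavily abbreviated relative to kernel/events/core.c.

      static void perf_event_enable_on_exec(int ctxn)
      {
              struct perf_event_context *ctx;
              unsigned long flags;

              local_irq_save(flags);
              /* No preemption from here on, so 'current' cannot switch. */
              ctx = current->perf_event_ctxp[ctxn];
              if (!ctx || !ctx->nr_events)
                      goto out;

              /* ... enable the events and reschedule the context ... */
      out:
              local_irq_restore(flags);
      }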
  3. 05 Jan 2016 (1 commit)
    • tracing: Fix setting of start_index in find_next() · f36d1be2
      Committed by Qiu Peiyang
      When we do cat /sys/kernel/debug/tracing/printk_formats, we hit a
      kernel panic in t_show().
      
      general protection fault: 0000 [#1] PREEMPT SMP
      CPU: 0 PID: 2957 Comm: sh Tainted: G W  O 3.14.55-x86_64-01062-gd4acdc7 #2
      RIP: 0010:[<ffffffff811375b2>]
       [<ffffffff811375b2>] t_show+0x22/0xe0
      RSP: 0000:ffff88002b4ebe80  EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
      RDX: 0000000000000004 RSI: ffffffff81fd26a6 RDI: ffff880032f9f7b1
      RBP: ffff88002b4ebe98 R08: 0000000000001000 R09: 000000000000ffec
      R10: 0000000000000000 R11: 000000000000000f R12: ffff880004d9b6c0
      R13: 7365725f6d706400 R14: ffff880004d9b6c0 R15: ffffffff82020570
      FS:  0000000000000000(0000) GS:ffff88003aa00000(0063) knlGS:00000000f776bc40
      CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
      CR2: 00000000f6c02ff0 CR3: 000000002c2b3000 CR4: 00000000001007f0
      Call Trace:
       [<ffffffff811dc076>] seq_read+0x2f6/0x3e0
       [<ffffffff811b749b>] vfs_read+0x9b/0x160
       [<ffffffff811b7f69>] SyS_read+0x49/0xb0
       [<ffffffff81a3a4b9>] ia32_do_call+0x13/0x13
       ---[ end trace 5bd9eb630614861e ]---
      Kernel panic - not syncing: Fatal exception
      
      The first time find_next() calls find_next_mod_format(), it should
      iterate trace_bprintk_fmt_list to find the first print format of the
      module. However, in the current code start_index is smaller than *pos
      to begin with, so the code does not iterate the list. The subsequent
      container_of() then computes a bogus address from the stale v, which
      makes mod_fmt a meaningless object, as is the returned mod_fmt->fmt.
      
      This patch fixes it by correcting start_index. With the fix, the first
      time find_next_mod_format() is called, start_index is equal to *pos, so
      the code iterates trace_bprintk_fmt_list and gets the right module
      printk format, and hence the right mod_fmt->fmt is returned.
      
      Link: http://lkml.kernel.org/r/5684B900.9000309@intel.com
      
      Cc: stable@vger.kernel.org # 3.12+
      Fixes: 102c9323 "tracing: Add __tracepoint_string() to export string pointers"
      Signed-off-by: Qiu Peiyang <peiyangx.qiu@intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      f36d1be2
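      The indexing rule the fix restores can be shown with a self-contained toy
      (this is an illustration, not the kernel's find_next()): a single seq_file
      position spans two tables, and the second table must be entered relative
      to a start_index equal to the number of entries already consumed.

      #include <stddef.h>

      static const char *core_fmts[] = { "core fmt 0", "core fmt 1" };
      static const char *mod_fmts[]  = { "mod fmt 0", "mod fmt 1", "mod fmt 2" };

      static const char *toy_find_next(size_t pos)
      {
              size_t start_index = sizeof(core_fmts) / sizeof(core_fmts[0]);

              if (pos < start_index)
                      return core_fmts[pos];

              /* Offsets into the module table are relative to start_index. */
              if (pos - start_index < sizeof(mod_fmts) / sizeof(mod_fmts[0]))
                      return mod_fmts[pos - start_index];

              return NULL;    /* end of iteration */
      }

      With a stale start_index the subtraction above lands outside the table,
      which is the kind of bogus pointer the panic trace shows.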
  4. 20 Dec 2015 (6 commits)
  5. 18 Dec 2015 (1 commit)
  6. 14 Dec 2015 (2 commits)
  7. 13 Dec 2015 (1 commit)
    • kernel: remove stop_machine() Kconfig dependency · 86fffe4a
      Committed by Chris Wilson
      Currently the full stop_machine() routine is only enabled on SMP if
      module unloading is enabled, or if the CPUs are hotpluggable.  This
      leads to configurations where stop_machine() is broken as it will then
      only run the callback on the local CPU with irqs disabled, and not stop
      the other CPUs or run the callback on them.
      
      For example, this breaks MTRR setup on x86 in certain configs since
      ea8596bb ("kprobes/x86: Remove unused text_poke_smp() and
      text_poke_smp_batch() functions") as the MTRR is only established on the
      boot CPU.
      
      This patch removes the Kconfig option for STOP_MACHINE and uses the SMP
      and HOTPLUG_CPU config options to compile the correct stop_machine() for
      the architecture, removing the false dependency on MODULE_UNLOAD in the
      process.
      
      Link: https://lkml.org/lkml/2014/10/8/124
      References: https://bugs.freedesktop.org/show_bug.cgi?id=84794
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Pranith Kumar <bobby.prani@gmail.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Vladimir Davydov <vdavydov@parallels.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Iulia Manda <iulia.manda21@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Chuck Ebbert <cebbert.lkml@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      86fffe4a
  8. 08 Dec 2015 (6 commits)
  9. 06 Dec 2015 (6 commits)
    • perf/core: Collapse common IPI pattern · 0017960f
      Committed by Peter Zijlstra
      Various functions implement the same pattern to send IPIs to an
      event's CPU. Collapse the easy ones into a common helper function to
      reduce duplication.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0017960f
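      A sketch of the helper pattern this change introduces, reconstructed from
      the description rather than quoted: route the modification either to the
      event's CPU or task via IPI, or perform it locally under ctx->lock when
      the context is not active. cpu_function_call() and task_function_call()
      are existing helpers in kernel/events/core.c; the exact upstream
      signature of the new helper may differ slightly.

      static void event_function_call(struct perf_event *event,
                                      int (*active)(void *),
                                      void (*inactive)(void *),
                                      void *data)
      {
              struct perf_event_context *ctx = event->ctx;
              struct task_struct *task = ctx->task;

              if (!task) {
                      /* Per-CPU event: run the update on the event's CPU. */
                      cpu_function_call(event->cpu, active, data);
                      return;
              }

      again:
              if (!task_function_call(task, active, data))
                      return;         /* ran remotely on the task's CPU */

              raw_spin_lock_irq(&ctx->lock);
              if (ctx->is_active) {
                      /* Raced with the context being scheduled in; retry. */
                      task = ctx->task;
                      raw_spin_unlock_irq(&ctx->lock);
                      goto again;
              }
              /* Context is inactive: safe to poke the event in place. */
              inactive(data);
              raw_spin_unlock_irq(&ctx->lock);
      }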
    • perf: Do not send exit event twice · 4e93ad60
      Committed by Jiri Olsa
      In case we monitor events system-wide, we get the EXIT event
      (when configured) twice for each task that exited.

      Note the doubled lines with the same pid/tid in the following example:
      
        $ sudo ./perf record -a
        ^C[ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.480 MB perf.data (2518 samples) ]
        $ sudo ./perf report -D | grep EXIT
      
        0 60290687567581 0x59910 [0x38]: PERF_RECORD_EXIT(1250:1250):(1250:1250)
        0 60290687568354 0x59948 [0x38]: PERF_RECORD_EXIT(1250:1250):(1250:1250)
        0 60290687988744 0x59ad8 [0x38]: PERF_RECORD_EXIT(1250:1250):(1250:1250)
        0 60290687989198 0x59b10 [0x38]: PERF_RECORD_EXIT(1250:1250):(1250:1250)
        1 60290692567895 0x62af0 [0x38]: PERF_RECORD_EXIT(1253:1253):(1253:1253)
        1 60290692568322 0x62b28 [0x38]: PERF_RECORD_EXIT(1253:1253):(1253:1253)
        2 60290692739276 0x69a18 [0x38]: PERF_RECORD_EXIT(1252:1252):(1252:1252)
        2 60290692739910 0x69a50 [0x38]: PERF_RECORD_EXIT(1252:1252):(1252:1252)
      
      The reason is that the cpu contexts are processed each time
      we call perf_event_task. I'm changing the perf_event_aux logic
      to serve task_ctx and cpu contexts separately, which ensures we
      don't get the EXIT event generated twice on the same cpu context.
      
      This does not affect other auxiliary events, as they don't
      use task_ctx at all.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1446649205-5822-1-git-send-email-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4e93ad60
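      An abbreviated sketch of the split described above; perf_event_aux_task_ctx()
      is the separate task-context path this change introduces, the rest of
      perf_event_aux() is elided, and the details are given as best understood
      rather than verbatim.

      static void
      perf_event_aux(perf_event_aux_output_cb output, void *data,
                     struct perf_event_context *task_ctx)
      {
              /*
               * When a task context is supplied (e.g. for an exiting task's
               * EXIT event), emit through that context only instead of also
               * walking every CPU context again; the double walk is what
               * produced the duplicated PERF_RECORD_EXIT records.
               */
              if (task_ctx) {
                      perf_event_aux_task_ctx(output, data, task_ctx);
                      return;
              }

              /* ... otherwise iterate the per-CPU contexts as before ... */
      }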
    • rcutorture: Print symbolic name for ->gp_state · 6b50e119
      Committed by Paul E. McKenney
      Currently, ->gp_state is printed as an integer, which slows debugging.
      This commit therefore prints a symbolic name in addition to the integer.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Updated to fix relational operator called out by Dan Carpenter. ]
      [ paulmck: More "const", as suggested by Josh Triplett. ]
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      6b50e119
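      The technique is simply a name table indexed by the state value, printed
      next to the raw integer. A sketch follows; the state names are
      abbreviated and the helper name is given as best understood, so treat
      the details as illustrative rather than the exact upstream code.

      static const char * const gp_state_names[] = {
              [0] = "RCU_GP_IDLE",
              [1] = "RCU_GP_WAIT_GPS",
              [2] = "RCU_GP_DONE_GPS",
              /* ... remaining grace-period states elided ... */
      };

      static const char *gp_state_getname(short gs)
      {
              if (gs < 0 || gs >= ARRAY_SIZE(gp_state_names) ||
                  !gp_state_names[gs])
                      return "???";
              return gp_state_names[gs];
      }

      /* The stall report then prints, e.g.: */
      pr_info("->gp_state: %s(%d)\n",
              gp_state_getname(rsp->gp_state), rsp->gp_state);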
    • rcutorture: Print symbolic name for rcu_torture_writer_state · 18aff33e
      Committed by Paul E. McKenney
      Currently, rcu_torture_writer_state is printed as an integer, which slows
      debugging.  This commit therefore prints a symbolic name in addition to
      the integer.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: More "const", as suggested by Josh Triplett. ]
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      18aff33e
    • rcutorture: Dump stack when GP kthread stalls · b1adb3e2
      Committed by Paul E. McKenney
      This commit increases the debug information available when the
      grace-period kthread is being prevented from running, by dumping
      that kthread's stack.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Split into prior commit and this commit, as suggested by
        Josh Triplett. ]
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      b1adb3e2
    • rcutorture: Flag nonexistent RCU GP kthread · a0e3a3aa
      Committed by Paul E. McKenney
      Currently, if the RCU grace-period kthread has not yet been created,
      the starvation-check code will print zero for the state, which maps
      to TASK_RUNNING. This could clearly be quite confusing, so this
      commit prints ~0 instead, which does not map to any legal ->state value.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      a0e3a3aa
  10. 05 Dec 2015 (9 commits)