    perf_event: Fix oops triggered by cpu offline/online · 220b140b
    Committed by Paul Mackerras
    Anton Blanchard found that he could reliably make the kernel hit a
    BUG_ON in the slab allocator by taking a cpu offline and then online
    while a system-wide perf record session was running.
    
    The reason is that when the cpu comes up, we completely reinitialize
    the ctx field of the struct perf_cpu_context for the cpu.  If there is
    a system-wide perf record session running, then there will be a struct
    perf_event that has a reference to the context, so its refcount will
    be 2.  (The perf_event has been removed from the context's group_entry
    and event_entry lists by perf_event_exit_cpu(), but that doesn't
    remove the perf_event's reference to the context and doesn't decrement
    the context's refcount.)
    
    When the cpu comes up, perf_event_init_cpu() gets called, and it calls
    __perf_event_init_context() on the cpu's context.  That resets the
    refcount to 1.  Then when the perf record session finishes and the
    perf_event is closed, the refcount gets decremented to 0 and the
    context gets kfreed after an RCU grace period.  Since the context
    wasn't kmalloced -- it's part of a per-cpu variable -- the kfree trips
    the BUG_ON in the slab allocator.
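
    The sequence above can be replayed in a small user-space model.  This
    is only a sketch: struct ctx_model and the ctx_model_*() helpers are
    hypothetical stand-ins, not kernel code, and the would_kfree flag
    merely records the point at which the real code would schedule the
    kfree.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-cpu context: only the refcount matters. */
struct ctx_model {
	int refcount;
	bool would_kfree;	/* set when the count drops to 0 */
};

/* Models the full reinitialization: the count is reset unconditionally. */
static void ctx_model_init(struct ctx_model *ctx)
{
	ctx->refcount = 1;
	ctx->would_kfree = false;
}

/* Models an event taking a reference to the context. */
static void ctx_model_get(struct ctx_model *ctx)
{
	ctx->refcount++;
}

/* Models an event dropping its reference; at 0 the context would be freed. */
static void ctx_model_put(struct ctx_model *ctx)
{
	if (--ctx->refcount == 0)
		ctx->would_kfree = true;
}

/* Returns true if closing the event would free the static context. */
static bool replay_buggy_sequence(void)
{
	/* Static storage, like a per-cpu variable: never heap-allocated. */
	static struct ctx_model cpu_ctx;

	ctx_model_init(&cpu_ctx);	/* first bring-up: refcount = 1 */
	ctx_model_get(&cpu_ctx);	/* system-wide event opens: refcount = 2 */
	/* cpu offline: event leaves the lists, its reference is kept */
	ctx_model_init(&cpu_ctx);	/* cpu online: refcount reset to 1 */
	ctx_model_put(&cpu_ctx);	/* event closed: refcount = 0 */

	return cpu_ctx.would_kfree;
}
```

    In this model replay_buggy_sequence() returns true: the single
    remaining put takes the count to zero even though the storage was
    never allocated from the heap.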
    
    In fact we don't need to completely reinitialize the context when the
    cpu comes up.  It's sufficient to initialize the context once at boot,
    but we need to do it for all possible cpus.
    
    This moves the context initialization to happen at boot time.  With
    this, we don't trash the refcount and the context never gets kfreed,
    and we don't hit the BUG_ON.
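
    The effect of the fix can be sketched in the same user-space style.
    The names below (MODEL_NR_CPUS, model_boot_init(), model_cpu_online())
    are hypothetical and only model the idea of initializing every
    possible cpu's context once at boot; they are not the functions
    touched by the patch.

```c
#define MODEL_NR_CPUS 4		/* hypothetical number of possible cpus */

/* Hypothetical stand-in for the per-cpu context: only the refcount matters. */
struct ctx_model {
	int refcount;
};

static struct ctx_model cpu_ctx[MODEL_NR_CPUS];

/* Boot-time pass over every possible cpu, whether or not it is online. */
static void model_boot_init(void)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		cpu_ctx[cpu].refcount = 1;
}

/* cpu online hook in the fixed scheme: the refcount is left untouched. */
static void model_cpu_online(int cpu)
{
	(void)cpu;
}

/* Returns the refcount left after the offline/online/close sequence. */
static int replay_fixed_sequence(int cpu)
{
	model_boot_init();		/* boot: refcount = 1 on every cpu */
	cpu_ctx[cpu].refcount++;	/* system-wide event opens: refcount = 2 */
	/* cpu offline: event leaves the lists, its reference is kept */
	model_cpu_online(cpu);		/* cpu online: refcount still 2 */
	cpu_ctx[cpu].refcount--;	/* event closed: refcount = 1 */

	return cpu_ctx[cpu].refcount;
}
```

    Here the count settles back at 1, the context's own base reference,
    so nothing would ever try to free the per-cpu storage.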

    Reported-by: Anton Blanchard <anton@samba.org>
    Signed-off-by: Paul Mackerras <paulus@samba.org>
    Tested-by: Anton Blanchard <anton@samba.org>
    Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: <stable@kernel.org>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
perf_event.c 126.0 KB