1. 03 Sep, 2016 · 1 commit
    • perf, bpf: add perf events core support for BPF_PROG_TYPE_PERF_EVENT programs · aa6a5f3c
      Committed by Alexei Starovoitov
      Allow attaching BPF_PROG_TYPE_PERF_EVENT programs to sw and hw perf events
      via overflow_handler mechanism.
      When a program is attached, the overflow handlers become stacked.
      The program acts as a filter: returning zero from the program means
      that the normal perf_event_output handler will not be called and the
      sampling event won't be stored in the ring buffer.
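
      As a rough illustration, a filter of this type could look like the
      sketch below (a minimal sketch; the "perf_event" section name and
      the period threshold are illustrative, not mandated):

        #include <linux/types.h>
        #include <linux/bpf_perf_event.h>

        /* Sketch of a BPF_PROG_TYPE_PERF_EVENT filter program: keep only
         * samples whose period crosses an arbitrary threshold. */
        __attribute__((section("perf_event"), used))
        int sample_filter(struct bpf_perf_event_data *ctx)
        {
                if (ctx->sample_period < 10000)
                        return 0;  /* drop: not written to the ring buffer */
                return 1;          /* pass: normal perf_event_output runs */
        }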
      
      The overflow_handler_context==NULL check is an additional safety net
      to make sure programs are not attached to hw breakpoints and the
      watchdog, in case other checks (which prevent that now anyway) get
      accidentally relaxed in the future.
      
      The program refcnt is incremented in case perf events are inherited
      when the target task is forked.
      As with kprobe and tracepoint programs, there is no ioctl to detach
      the program or to swap an already attached program; user space is
      expected to close(perf_event_fd), just as it does today for kprobe+bpf.
      That restriction simplifies the code quite a bit.
      
      The invocation of the overflow_handler in __perf_event_overflow() is
      now done via READ_ONCE, since that pointer can be replaced when the
      program is attached while the perf_event itself may already be active.
      There is no need for similar treatment of event->prog, since it is
      assigned only once, before it is accessed.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aa6a5f3c
  2. 24 Aug, 2016 · 1 commit
    • perf/core: Use this_cpu_ptr() when stopping AUX events · 8b6a3fe8
      Committed by Will Deacon
      When tearing down an AUX buf for an event via perf_mmap_close(),
      __perf_event_output_stop() is called on the event's CPU to ensure that
      trace generation is halted before the process of unmapping and
      freeing the buffer pages begins.
      
      The callback is performed via cpu_function_call(), which ensures that it
      runs with interrupts disabled and is therefore not preemptible.
      Unfortunately, the current code grabs the per-cpu context pointer using
      get_cpu_ptr(), which unnecessarily disables preemption and doesn't pair
      the call with put_cpu_ptr(), leading to a preempt_count() imbalance and
      a BUG when freeing the AUX buffer later on:
      
        WARNING: CPU: 1 PID: 2249 at kernel/events/ring_buffer.c:539 __rb_free_aux+0x10c/0x120
        Modules linked in:
        [...]
        Call Trace:
         [<ffffffff813379dd>] dump_stack+0x4f/0x72
         [<ffffffff81059ff6>] __warn+0xc6/0xe0
         [<ffffffff8105a0c8>] warn_slowpath_null+0x18/0x20
         [<ffffffff8112761c>] __rb_free_aux+0x10c/0x120
         [<ffffffff81128163>] rb_free_aux+0x13/0x20
         [<ffffffff8112515e>] perf_mmap_close+0x29e/0x2f0
         [<ffffffff8111da30>] ? perf_iterate_ctx+0xe0/0xe0
         [<ffffffff8115f685>] remove_vma+0x25/0x60
         [<ffffffff81161796>] exit_mmap+0x106/0x140
         [<ffffffff8105725c>] mmput+0x1c/0xd0
         [<ffffffff8105cac3>] do_exit+0x253/0xbf0
         [<ffffffff8105e32e>] do_group_exit+0x3e/0xb0
         [<ffffffff81068d49>] get_signal+0x249/0x640
         [<ffffffff8101c273>] do_signal+0x23/0x640
         [<ffffffff81905f42>] ? _raw_write_unlock_irq+0x12/0x30
         [<ffffffff81905f69>] ? _raw_spin_unlock_irq+0x9/0x10
         [<ffffffff81901896>] ? __schedule+0x2c6/0x710
         [<ffffffff810022a4>] exit_to_usermode_loop+0x74/0x90
         [<ffffffff81002a56>] prepare_exit_to_usermode+0x26/0x30
         [<ffffffff81906d1b>] retint_user+0x8/0x10
      
      This patch uses this_cpu_ptr() instead of get_cpu_ptr(), since preemption is
      already disabled by the caller.
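
      The pattern, sketched with a hypothetical per-CPU structure standing
      in for the perf CPU context:

        #include <linux/percpu.h>
        #include <linux/smp.h>

        struct aux_stop_ctx { int stopped; };          /* hypothetical */
        static DEFINE_PER_CPU(struct aux_stop_ctx, aux_stop_ctx);

        /* Runs via cpu_function_call(): interrupts are off, so preemption
         * is already disabled when we get here. */
        static int stop_aux_on_cpu(void *info)
        {
                /* Buggy: get_cpu_ptr() disables preemption a second time
                 * and was never paired with put_cpu_ptr(), leaking
                 * preempt_count:
                 *
                 *   struct aux_stop_ctx *ctx = get_cpu_ptr(&aux_stop_ctx);
                 *
                 * Fixed: a plain per-CPU dereference suffices here. */
                struct aux_stop_ctx *ctx = this_cpu_ptr(&aux_stop_ctx);

                ctx->stopped = 1;
                return 0;
        }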
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 95ff4ca2 ("perf/core: Free AUX pages in unmap path")
      Link: http://lkml.kernel.org/r/20160824091905.GA16944@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8b6a3fe8
  3. 18 Aug, 2016 · 6 commits
    • perf/core: Check return value of the perf_event_read() IPI · 71e7bc2b
      Committed by David Carrillo-Cisneros
      The call to smp_call_function_single() in perf_event_read() may fail
      if an invalid or offline CPU index is passed. Warn the user if such
      a bug is present and return the error.
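
      A sketch of the shape of the fix (function and struct names here are
      hypothetical stand-ins for the perf_event_read() internals):

        #include <linux/smp.h>
        #include <linux/bug.h>

        struct read_data { void *event; int ret; };     /* hypothetical */

        static void do_read_on_cpu(void *info)
        {
                struct read_data *rd = info;

                rd->ret = 0;    /* read the event's counters here */
        }

        static int read_event_on_cpu(void *event, int cpu)
        {
                struct read_data rd = { .event = event };
                int err = smp_call_function_single(cpu, do_read_on_cpu,
                                                   &rd, 1);

                /* An invalid or offline CPU indicates a bug: warn once
                 * and propagate the error instead of ignoring it. */
                if (WARN_ON_ONCE(err))
                        return err;
                return rd.ret;
        }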
      Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1471467307-61171-2-git-send-email-davidcc@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      71e7bc2b
    • perf/core: Enable mapping of the stop filters · 99f5bc9b
      Committed by Mathieu Poirier
      At this time the perf_addr_filter_needs_mmap() function will _not_
      return true on a user space 'stop' filter.  But stop filters need
      exactly the same kind of mapping that range and start filters get.
      Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1468860187-318-4-git-send-email-mathieu.poirier@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      99f5bc9b
    • perf/core: Update filters only on executable mmap · 12b40a23
      Committed by Mathieu Poirier
      The perf_event_mmap() function is called by the MM subsystem each
      time part of a binary is loaded in memory.  There can be several
      mappings for a binary, often unrelated to the code section.
      
      Each time a section of a binary is mapped, address filters are
      updated, even when the mapping doesn't pertain to the code section.
      The end result is that filters are configured based on the last map
      event that was received rather than the last mapping of the code
      segment.
      
      For example, suppose we have an executable 'main' that calls the
      library 'libcstest.so.1.0', and we want to collect traces on code
      that is in that library.  The perf command line for this scenario
      would be:
      
        perf record -e cs_etm// --filter 'filter 0x72c/0x40@/opt/lib/libcstest.so.1.0' --per-thread ./main
      
      Resulting in binaries being mapped this way:
      
        root@linaro-nano:~# cat /proc/1950/maps
        00400000-00401000 r-xp 00000000 08:02 33169     /home/linaro/main
        00410000-00411000 r--p 00000000 08:02 33169     /home/linaro/main
        00411000-00412000 rw-p 00001000 08:02 33169     /home/linaro/main
        7fa2464000-7fa2474000 rw-p 00000000 00:00 0
        7fa2474000-7fa25a4000 r-xp 00000000 08:02 543   /lib/aarch64-linux-gnu/libc-2.21.so
        7fa25a4000-7fa25b3000 ---p 00130000 08:02 543   /lib/aarch64-linux-gnu/libc-2.21.so
        7fa25b3000-7fa25b7000 r--p 0012f000 08:02 543   /lib/aarch64-linux-gnu/libc-2.21.so
        7fa25b7000-7fa25b9000 rw-p 00133000 08:02 543   /lib/aarch64-linux-gnu/libc-2.21.so
        7fa25b9000-7fa25bd000 rw-p 00000000 00:00 0
        7fa25bd000-7fa25be000 r-xp 00000000 08:02 38308 /opt/lib/libcstest.so.1.0
        7fa25be000-7fa25cd000 ---p 00001000 08:02 38308 /opt/lib/libcstest.so.1.0
        7fa25cd000-7fa25ce000 r--p 00000000 08:02 38308 /opt/lib/libcstest.so.1.0
        7fa25ce000-7fa25cf000 rw-p 00001000 08:02 38308 /opt/lib/libcstest.so.1.0
        7fa25cf000-7fa25eb000 r-xp 00000000 08:02 574   /lib/aarch64-linux-gnu/ld-2.21.so
        7fa25ef000-7fa25f2000 rw-p 00000000 00:00 0
        7fa25f7000-7fa25f9000 rw-p 00000000 00:00 0
        7fa25f9000-7fa25fa000 r--p 00000000 00:00 0     [vvar]
        7fa25fa000-7fa25fb000 r-xp 00000000 00:00 0     [vdso]
        7fa25fb000-7fa25fc000 r--p 0001c000 08:02 574   /lib/aarch64-linux-gnu/ld-2.21.so
        7fa25fc000-7fa25fe000 rw-p 0001d000 08:02 574   /lib/aarch64-linux-gnu/ld-2.21.so
        7ff2ea8000-7ff2ec9000 rw-p 00000000 00:00 0     [stack]
        root@linaro-nano:~#
      
      Before 'main()' can execute, 'libcstest.so.1.0' has to be loaded
      into memory.  Once that has been done, perf_event_mmap() has been
      called 4 times, with the last map starting at address 0x7fa25ce000
      and the address filter configured to start filtering when the
      IP has passed over address 0x7fa25ce72c (0x7fa25ce000 + 0x72c).
      
      But that is wrong, since the code segment for library 'libcstest.so.1.0'
      has been mapped at 0x7fa25bd000, resulting in traces not being
      collected.
      
      This patch corrects the situation by requesting that address
      filters be updated only if the mapped event is for a code
      segment.
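
      Sketched, the gate amounts to a VM_EXEC test on the mapping
      (assuming, as the description suggests, that only executable
      mappings should trigger filter updates):

        #include <linux/mm.h>

        /* Only code segments are relevant to address filters. */
        static bool vma_is_code_segment(const struct vm_area_struct *vma)
        {
                return vma->vm_flags & VM_EXEC;
        }

        /* In the perf_event_mmap() path, bail out early otherwise:
         *
         *   if (!vma_is_code_segment(vma))
         *           return;
         */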
      Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1468860187-318-3-git-send-email-mathieu.poirier@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      12b40a23
    • perf/core: Fix file name handling for start/stop filters · 4059ffd0
      Committed by Mathieu Poirier
      Binary file names have to be supplied for both range and start/stop
      filters but the current code only processes the filename if an
      address range filter is specified.  This code adds processing of
      the filename for start/stop filters.
      Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1468860187-318-2-git-send-email-mathieu.poirier@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4059ffd0
    • perf/core: Fix event_function_local() · cca20946
      Committed by Peter Zijlstra
      Vincent reported triggering the WARN_ON_ONCE() in event_function_local().
      
      While thinking through cases I noticed that by using event_function()
      directly, we miss the inactive case usually handled by
      event_function_call().
      
      Therefore construct a blend of event_function_call() and
      event_function() that handles the cases relevant to
      event_function_local().
      Reported-by: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org # 4.5+
      Fixes: fae3fde6 ("perf: Collapse and fix event_function_call() users")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      cca20946
    • uprobes: Fix the memcg accounting · 6c4687cc
      Committed by Oleg Nesterov
      __replace_page() wrongly calls mem_cgroup_cancel_charge() in the
      "success" path; it should only do this if page_check_address() fails.
      
      This means that every enable/disable leads to an unbalanced
      mem_cgroup_uncharge() from put_page(old_page); it is trivial to
      underflow page_counter->count and trigger OOM.
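
      Condensed sketch of the corrected flow (the real __replace_page()
      does considerably more; this only shows who cancels or commits
      the charge):

        #include <linux/rmap.h>
        #include <linux/memcontrol.h>
        #include <linux/errno.h>

        static int replace_page_sketch(struct mm_struct *mm,
                                       unsigned long addr,
                                       struct page *old_page,
                                       struct page *new_page,
                                       struct mem_cgroup *memcg)
        {
                spinlock_t *ptl;
                pte_t *ptep = page_check_address(old_page, mm, addr,
                                                 &ptl, 0);

                if (!ptep) {
                        /* failure path: the only place the charge
                         * may be cancelled */
                        mem_cgroup_cancel_charge(new_page, memcg, false);
                        return -EAGAIN;
                }

                /* success path: commit; the matching uncharge happens
                 * later via put_page(old_page) */
                mem_cgroup_commit_charge(new_page, memcg, false, false);
                pte_unmap_unlock(ptep, ptl);
                return 0;
        }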
      Reported-and-tested-by: Brenden Blanco <bblanco@plumgrid.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
      Cc: stable@vger.kernel.org # 3.17+
      Fixes: 00501b53 ("mm: memcontrol: rewrite charge API")
      Link: http://lkml.kernel.org/r/20160817153629.GB29724@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6c4687cc
  4. 10 Aug, 2016 · 2 commits
    • perf/core: Set cgroup in CPU contexts for new cgroup events · db4a8356
      Committed by David Carrillo-Cisneros
      There's a perf stat bug that is easy to observe on a machine with only one cgroup:
      
        $ perf stat -e cycles -I 1000 -C 0 -G /
        #          time             counts unit events
            1.000161699      <not counted>      cycles                    /
            2.000355591      <not counted>      cycles                    /
            3.000565154      <not counted>      cycles                    /
            4.000951350      <not counted>      cycles                    /
      
      We'd expect some output there.
      
      The underlying problem is that there is an optimization in
      perf_cgroup_sched_{in,out}() that skips the switch of cgroup events
      if the old and new cgroups in a task switch are the same.
      
      This optimization interacts with the current code in two ways
      that cause a CPU context's cgroup (cpuctx->cgrp) to be NULL even if a
      cgroup event matches the current task. These are:
      
        1. On creation of the first cgroup event in a CPU: In the current
        code, cpuctx->cgrp is only set in perf_cgroup_sched_in, but due to
        the aforementioned optimization, perf_cgroup_sched_in will not run
        until the next cgroup switch in that CPU. This may happen late or
        never happen, depending on the system's number of cgroups, CPU
        load, etc.
      
        2. On deletion of the last cgroup event in a cpuctx: In
        list_del_event, cpuctx->cgrp is set to NULL. Any new cgroup event
        will not be scheduled in, because cpuctx->cgrp == NULL until a
        cgroup switch occurs and perf_cgroup_sched_in is executed
        (updating cpuctx->cgrp).
      
      This patch fixes both problems by setting cpuctx->cgrp in list_add_event,
      mirroring what list_del_event does when removing a cgroup event from CPU
      context, as introduced in:
      
        commit 68cacd29 ("perf_events: Fix stale ->cgrp pointer in update_cgrp_time_from_cpuctx()")
      
      With this patch, cpuctx->cgrp is always set/clear when installing/removing
      the first/last cgroup event in/from the CPU context. With cpuctx->cgrp
      correctly set, event_filter_match works as intended when events are
      sched in/out.
      
      After the fix, the output is as expected:
      
        $ perf stat -e cycles -I 1000 -a -G /
        #         time             counts unit events
           1.004699159          627342882      cycles                    /
           2.007397156          615272690      cycles                    /
           3.010019057          616726074      cycles                    /
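
      A simplified sketch of the restored symmetry (the patch centralizes
      this in a helper; the counting shown here is schematic):

        /* Schematic: keep cpuctx->cgrp valid from the first cgroup
         * event added until the last one is removed. */
        static void update_cgrp_on_list_change(struct perf_event *event,
                                               struct perf_cpu_context *cpuctx,
                                               bool add)
        {
                if (add && cpuctx->ctx.nr_cgroups == 1)
                        cpuctx->cgrp = event->cgrp; /* first cgroup event in */
                else if (!add && cpuctx->ctx.nr_cgroups == 0)
                        cpuctx->cgrp = NULL;        /* last cgroup event out */
        }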
      Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1470124092-113192-1-git-send-email-davidcc@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      db4a8356
    • perf/core: Fix sideband list-iteration vs. event ordering NULL pointer dereference crash · 0b8f1e2e
      Committed by Peter Zijlstra
      Vegard Nossum reported that perf fuzzing generates a NULL
      pointer dereference crash:
      
      > Digging a bit deeper into this, it seems the event itself is getting
      > created by perf_event_open() and it gets added to the pmu_event_list
      > through:
      >
      > perf_event_open()
      >  - perf_event_alloc()
      >     - account_event()
      >        - account_pmu_sb_event()
      >           - attach_sb_event()
      >
      > so at this point the event is being attached but its ->ctx is still
      > NULL. It seems like ->ctx is set just a bit later in
      > perf_event_open(), though.
      >
      > But before that, __schedule() comes along and creates a stack trace
      > similar to the one above:
      >
      > __schedule()
      >  - __perf_event_task_sched_out()
      >    - perf_iterate_sb()
      >      - perf_iterate_sb_cpu()
      >         - event_filter_match()
      >           - perf_cgroup_match()
      >             - __get_cpu_context()
      >               - (dereference ctx which is NULL)
      >
      > So I guess the question is... should the event be attached (= put on
      > the list) before ->ctx gets set? Or should the cgroup code check for a
      > NULL ->ctx?
      
      The latter seems like the simplest solution. Moving the list-add later
      creates a bit of a mess.
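
      The guard, sketched (a hypothetical helper name; the point is simply
      to test ->ctx before __get_cpu_context() dereferences it):

        #include <linux/perf_event.h>

        static bool sb_event_matches(struct perf_event *event)
        {
                /* The event is already on the sideband list, but
                 * perf_event_open() may not have set ->ctx yet. */
                if (!event->ctx)
                        return false;

                return true;    /* cgroup/CPU matching would follow here */
        }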
      Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
      Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
      Tested-by: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Carrillo-Cisneros <davidcc@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: f2fb6bef ("perf/core: Optimize side-band event delivery")
      Link: http://lkml.kernel.org/r/20160804123724.GN6862@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0b8f1e2e
  5. 02 Aug, 2016 · 1 commit
    • perf/core: Change log level for duration warning to KERN_INFO · 0d87d7ec
      Committed by David Ahern
      When the perf interrupt handler exceeds a threshold warning messages
      are displayed on console:
      
        [12739.31793] perf interrupt took too long (2504 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
        [71340.165065] perf interrupt took too long (5005 > 5000), lowering kernel.perf_event_max_sample_rate to 25000
      
      Many customers and users are confused by the message, wondering if
      something is wrong or whether they need to take action to fix a
      problem. Since a user cannot do anything to fix the issue, the
      message is really more informational than a warning. Adjust the
      log level accordingly.
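
      The change, sketched (variable names are illustrative):

        #include <linux/printk.h>
        #include <linux/types.h>

        static void report_sample_rate_lowered(u64 took_ns, u64 allowed_ns,
                                               int new_rate)
        {
                /* informational, not a warning: the kernel already
                 * took the corrective action itself */
                printk_ratelimited(KERN_INFO
                        "perf: interrupt took too long (%llu > %llu), "
                        "lowering kernel.perf_event_max_sample_rate to %d\n",
                        took_ns, allowed_ns, new_rate);
        }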
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1470084569-438-1-git-send-email-dsa@cumulusnetworks.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0d87d7ec
  6. 26 Jul, 2016 · 1 commit
    • bpf, events: fix offset in skb copy handler · aa7145c1
      Committed by Daniel Borkmann
      This patch fixes the __output_custom() routine we currently use with
      bpf_skb_copy(). I missed that when len is larger than the size of the
      current handle, we can issue multiple invocations of copy_func, and
      __output_custom() advances destination but also source buffer by the
      written amount of bytes. When we have __output_custom(), this is actually
      wrong since in that case the source buffer points to a non-linear object,
      in our case an skb, which the copy_func helper is supposed to walk.
      Since the source is non-linear, we need to pass the offset into
      the helper, so that copy_func can use it to extract the data from
      the source object.
      
      Therefore, adjust the callback signatures properly and pass the
      offset into the skb_header_pointer() invoked from the bpf_skb_copy()
      callback. __DEFINE_OUTPUT_COPY_BODY() is adjusted to accommodate two
      things: i) whether we should advance the source buffer or not, which
      is a compile-time constant condition, and ii) the offset for
      __output_custom(), which we pass with the help of __VA_ARGS__ so
      that everything can stay inlined as it is currently. Both changes
      allow for adapting the __output_* fast-path helpers without extra
      overhead.
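
      The adjusted callback shape, sketched (the kernel's typedef is along
      these lines; the body is a schematic of what bpf_skb_copy() does
      with skb_header_pointer()):

        /* The copy callback now receives the source offset, so a
         * non-linear source (an skb) can be walked correctly. */
        typedef unsigned long (*perf_copy_f)(void *dst, const void *src,
                                             unsigned long off,
                                             unsigned long len);

        static unsigned long skb_copy_sketch(void *dst, const void *skb,
                                             unsigned long off,
                                             unsigned long len)
        {
                /* schematic for:
                 *   ptr = skb_header_pointer(skb, off, len, dst);
                 * copy into dst if needed; return 0 on success, len on
                 * failure (the __output_* convention). */
                return 0;
        }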
      
      Fixes: 555c8a86 ("bpf: avoid stack copy and use skb ctx for event output")
      Fixes: 7e3f977e ("perf, events: add non-linear data support for raw records")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      aa7145c1
  7. 16 Jul, 2016 · 1 commit
    • perf, events: add non-linear data support for raw records · 7e3f977e
      Committed by Daniel Borkmann
      This patch adds support for non-linear data on raw records. It
      extends raw records to have one or multiple fragments that will
      be written linearly into the ring slot, where each fragment can
      optionally have a custom callback handler to walk and extract
      complex, possibly non-linear data.
      
      If a callback handler is provided for a fragment, then the new
      __output_custom() will be used instead of __output_copy() for
      the perf_output_sample() part. perf_prepare_sample() does all
      the size calculation only once, so perf_output_sample() doesn't
      need to redo the same work anymore, meaning real_size and padding
      will be cached in the raw record. The raw record becomes 32 bytes
      in size without holes; to avoid increasing it further and to avoid
      doing unnecessary recalculations in the fast path, we reuse the
      next pointer of the last fragment, an idea borrowed from
      ZERO_OR_NULL_PTR(), which should keep the perf_output_sample()
      path for PERF_SAMPLE_RAW minimal.
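
      The resulting layout, roughly (as described above; the union lets
      the last fragment's 'next' double as cached padding state, keeping
      the record at 32 bytes):

        #include <linux/types.h>

        typedef unsigned long (*perf_copy_f)(void *dst, const void *src,
                                             unsigned long off,
                                             unsigned long len);

        struct perf_raw_frag {
                union {
                        struct perf_raw_frag *next; /* more fragments ... */
                        unsigned long        pad;   /* ... or, in the last
                                                       fragment, cached
                                                       padding */
                };
                perf_copy_f copy;   /* optional custom walker */
                void        *data;
                u32         size;
        } __attribute__((packed));

        struct perf_raw_record {
                struct perf_raw_frag frag;
                u32                  size;  /* cached real_size */
        };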
      
      This facility is needed for BPF's event output helper as a first
      user that will, in a follow-up, add an additional perf_raw_frag
      to its perf_raw_record in order to be able to more efficiently
      dump skb context after a linear head meta data related to it.
      skbs can be non-linear and thus need a custom output function to
      dump buffers. Currently, the skb data needs to be copied twice;
      with the help of __output_custom() this work only needs to be
      done once. Future users could be things like XDP/BPF programs
      that work on different context though and would thus also have
      a different callback function.
      
      The few users of raw records are adapted to initialize their frag
      data from the raw record itself, no change in behavior for them.
      The code is based upon a PoC diff provided by Peter Zijlstra [1].
      
        [1] http://thread.gmane.org/gmane.linux.network/421294
      
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7e3f977e
  8. 14 Jul, 2016 · 1 commit
    • perf/core: Convert to hotplug state machine · 00e16c3d
      Committed by Thomas Gleixner
      Actually a nice symmetric startup/teardown pair which fits properly into
      the state machine concept. In the long run we should be able to invoke
      the startup callback for the boot CPU via the state machine and get
      rid of the init function which invokes it on the boot CPU.
      
      Note: this actually comes before the perf hardware callbacks. In the
      notifier model the hardware callbacks have a higher priority than the
      core callback. But that's solely for CPU offline, so that hardware
      migration of events happens before the core is notified about the
      outgoing CPU.
      
      With the symmetric state array model we have the following ordering:
      
       UP:     core -> hardware
       DOWN:   hardware -> core
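
      Sketched against the state-machine registration API (assuming the
      CPUHP_PERF_PREPARE state and int (*)(unsigned int cpu) callbacks):

        #include <linux/cpuhotplug.h>

        static int __init perf_hotplug_sketch(void)
        {
                /* UP invokes the startup callback (core before hardware);
                 * DOWN invokes teardown in reverse (hardware before
                 * core). */
                return cpuhp_setup_state(CPUHP_PERF_PREPARE, "perf:prepare",
                                         perf_event_init_cpu,
                                         perf_event_exit_cpu);
        }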
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Reviewed-by: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20160713153333.587514098@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      00e16c3d
  9. 07 Jul, 2016 · 1 commit
    • perf/core: Fix pmu::filter_match for SW-led groups · 2c81a647
      Committed by Mark Rutland
      The following commit:
      
        66eb579e ("perf: allow for PMU-specific event filtering")
      
      added the pmu::filter_match() callback. This was intended to
      avoid HW constraints on events from resulting in extremely
      pessimistic scheduling.
      
      However, pmu::filter_match() is only called for the leader of each event
      group. When the leader is a SW event, we do not filter the groups, and
      may fail at pmu::add() time, and when this happens we'll give up on
      scheduling any event groups later in the list until they are rotated
      ahead of the failing group.
      
      This can result in extremely sub-optimal event scheduling behaviour,
      e.g. if running the following on a big.LITTLE platform:
      
      $ taskset -c 0 ./perf stat \
       -e 'a57{context-switches,armv8_cortex_a57/config=0x11/}' \
       -e 'a53{context-switches,armv8_cortex_a53/config=0x11/}' \
       ls
      
           <not counted>      context-switches                                              (0.00%)
           <not counted>      armv8_cortex_a57/config=0x11/                                 (0.00%)
                      24      context-switches                                              (37.36%)
                57589154      armv8_cortex_a53/config=0x11/                                 (37.36%)
      
      Here the 'a53' event group was always eligible to be scheduled, but
      the 'a57' group was never eligible, as the task was always affine to
      a Cortex-A53 CPU. The SW (group leader) event in the 'a57' group was
      eligible, but the HW event failed at pmu::add() time, resulting in
      ctx_flexible_sched_in() giving up on scheduling further groups with
      HW events.
      
      One way of avoiding this is to check pmu::filter_match() on siblings
      as well as the group leader. If any of these fail their
      pmu::filter_match() call, we must skip the entire group before
      attempting to add any events.
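
      Sketch of the group-wide check (this mirrors the helper the patch
      introduces):

        static int __pmu_filter_match(struct perf_event *event)
        {
                struct pmu *pmu = event->pmu;

                return pmu->filter_match ? pmu->filter_match(event) : 1;
        }

        /* Reject the whole group if the leader or any sibling fails its
         * PMU's filter, instead of checking the leader only. */
        static int pmu_filter_match(struct perf_event *event)
        {
                struct perf_event *child;

                if (!__pmu_filter_match(event))
                        return 0;

                list_for_each_entry(child, &event->sibling_list, group_entry) {
                        if (!__pmu_filter_match(child))
                                return 0;
                }
                return 1;
        }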
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Fixes: 66eb579e ("perf: allow for PMU-specific event filtering")
      Link: http://lkml.kernel.org/r/1465917041-15339-1-git-send-email-mark.rutland@arm.com
      [ Small readability edits. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2c81a647
  10. 02 Jul, 2016 · 1 commit
    • bpf: generally move prog destruction to RCU deferral · 1aacde3d
      Committed by Daniel Borkmann
      Jann Horn reported following analysis that could potentially result
      in a very hard to trigger (if not impossible) UAF race, to quote his
      event timeline:
      
       - Set up a process with threads T1, T2 and T3
       - Let T1 set up a socket filter F1 that invokes another filter F2
         through a BPF map [tail call]
       - Let T1 trigger the socket filter via a unix domain socket write,
         don't wait for completion
       - Let T2 call PERF_EVENT_IOC_SET_BPF with F2, don't wait for completion
       - Now T2 should be behind bpf_prog_get(), but before bpf_prog_put()
       - Let T3 close the file descriptor for F2, dropping the reference
         count of F2 to 2
       - At this point, T1 should have looked up F2 from the map, but not
         finished executing it
       - Let T3 remove F2 from the BPF map, dropping the reference count of
         F2 to 1
       - Now T2 should call bpf_prog_put() (wrong BPF program type), dropping
         the reference count of F2 to 0 and scheduling bpf_prog_free_deferred()
         via schedule_work()
       - At this point, the BPF program could be freed
       - BPF execution is still running in a freed BPF program
      
      While at PERF_EVENT_IOC_SET_BPF time it's only guaranteed that the
      perf event fd we're doing the syscall on doesn't disappear from
      underneath us for the whole syscall time, that may not be the case
      for the bpf fd used as an argument once we have done the put. It
      needs to be a valid fd pointing to a BPF program at the time of the
      call for bpf_prog_get() to succeed, and while T2 gets preempted, F2
      must have dropped its reference count to 1 on the other CPU. The
      fput() from the close() in T3 should also add additional delay to
      the reference drop via exit_task_work() when bpf_prog_release() gets
      called, as well as the scheduling of bpf_prog_free_deferred().
      
      That said, it nevertheless makes sense to generally move BPF prog
      destruction to after an RCU grace period, to guarantee that the
      scenario above, but also others, as recently fixed in ceb56070
      ("bpf, perf: delay release of BPF prog after grace period") with
      regard to tail calls, won't happen. Integrating
      bpf_prog_free_deferred() directly into the RCU callback is not
      allowed, since the invocation might happen from either softirq or
      process context, so we're not permitted to block. Reviewing all
      bpf_prog_put() invocations from the eBPF side (note, cBPF -> eBPF
      progs don't use this for their destruction) with call_rcu() looks
      good to me.
      
      Since we don't know, at the time of attaching the program, whether
      we're already part of a tail call map, we need to use the RCU
      variant. However, due to this, there won't be severely more stress
      on the RCU callback queue: situations with the above bpf_prog_get()
      and bpf_prog_put() combo in practice normally won't lead to
      releases, but even if they would, enough effort/cycles have to be
      put into loading a BPF program into the kernel already.
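
      The destruction path then takes this shape (condensed sketch; map
      and memlock release details elided):

        #include <linux/bpf.h>
        #include <linux/rcupdate.h>

        /* Defer the final free by one RCU grace period; the free itself
         * punts to a workqueue, since it may block and RCU callbacks
         * must not. */
        static void __bpf_prog_put_rcu(struct rcu_head *rcu)
        {
                struct bpf_prog_aux *aux =
                        container_of(rcu, struct bpf_prog_aux, rcu);

                bpf_prog_free(aux->prog); /* schedules deferred teardown */
        }

        void bpf_prog_put(struct bpf_prog *prog)
        {
                if (atomic_dec_and_test(&prog->aux->refcnt))
                        call_rcu(&prog->aux->rcu, __bpf_prog_put_rcu);
        }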
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1aacde3d
  11. 29 Jun, 2016 · 1 commit
    • bpf, perf: delay release of BPF prog after grace period · ceb56070
      Committed by Daniel Borkmann
      Commit dead9f29 ("perf: Fix race in BPF program unregister") moved
      destruction of the BPF program from the free_event_rcu() callback to
      __free_event(), which is problematic if used with tail calls: if
      prog A is attached as a trace event directly, but at the same time
      is present in a tail call map used by another trace event program
      elsewhere, then we need to delay destruction via an RCU grace
      period, since it can still be in use by the program doing the tail
      call (the prog first needs to be dropped from the tail call map, and
      only then can the trace event with prog A attached be destroyed with
      immediate effect).
      
      Fixes: dead9f29 ("perf: Fix race in BPF program unregister")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Cc: Jann Horn <jann@thejh.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ceb56070
  12. 08 Jun, 2016 · 2 commits
  13. 03 Jun, 2016 · 4 commits
    • perf/abi: Change the errno for sampling event not supported in hardware · a1396555
      Committed by Vineet Gupta
      Change the return code for sampling event not supported from -ENOTSUPP
      to -EOPNOTSUPP.
      
      This allows userspace to identify this case specifically, instead of
      printing the catch-all error message it did previously.
      
      Technically this is an ABI change, but we think we can get away
      with it.
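
      A sketch of the check in question (a minimal sketch, assuming the
      PERF_PMU_CAP_NO_INTERRUPT capability is what marks PMUs that cannot
      do sampling):

        #include <linux/perf_event.h>
        #include <linux/errno.h>

        static int check_sampling_supported(struct perf_event *event)
        {
                /* PMUs without overflow interrupts cannot do sampling;
                 * -EOPNOTSUPP lets userspace print a specific message
                 * instead of "Unknown error 524" (-ENOTSUPP). */
                if (is_sampling_event(event) &&
                    (event->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT))
                        return -EOPNOTSUPP;
                return 0;
        }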
      
      Old behavior:
       -------
       | # perf record ls
       | Error:
       | The sys_perf_event_open() syscall returned with 524 (Unknown error 524)
       | for event (cycles:ppp).
       | /bin/dmesg may provide additional information.
       | No CONFIG_PERF_EVENTS=y kernel support configured?
      
      New behavior:
       -------
       | # perf record ls
       | Error:
       | PMU Hardware doesn't support sampling/overflow-interrupts.
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <acme@redhat.com>
      Cc: <linux-snps-arc@lists.infradead.org>
      Cc: <vincent.weaver@maine.edu>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com>
      Link: http://lkml.kernel.org/r/1462786660-2900-3-git-send-email-vgupta@synopsys.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a1396555
    • perf/core: Fix implicitly enable dynamic interrupt throttle · ab7fdefb
      Committed by Kan Liang
      This patch fixes an issue which was introduced by commit:
      
        91a612ee ("perf/core: Fix dynamic interrupt throttle")
      
      ... which unconditionally sets the perf_sample_allowed_ns value
      to !0. But that can trigger a bug in the following corner case:
      
      The user disables the dynamic interrupt throttle mechanism by setting
      perf_cpu_time_max_percent to 0, and then changes perf_event_max_sample_rate.
      In this case the mechanism is enabled implicitly, because
      perf_sample_allowed_ns becomes !0 - which is not what we want.
      
      This patch only updates perf_sample_allowed_ns when the dynamic
      interrupt throttle mechanism is enabled.
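
      Sketch of the guarded update (names follow kernel/events/core.c;
      treat the wrapper itself as illustrative):

        static void maybe_update_perf_cpu_limits(void)
        {
                /* If the user disabled throttling via
                 * kernel.perf_cpu_time_max_percent = 0, changing
                 * perf_event_max_sample_rate must not re-enable it. */
                if (sysctl_perf_cpu_time_max_percent)
                        update_perf_cpu_limits();
        }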
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Link: http://lkml.kernel.org/r/1462260366-3160-1-git-send-email-kan.liang@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ab7fdefb
    • perf/core: Rename the perf_event_aux*() APIs to perf_event_sb*(), to separate them from AUX ring-buffer records · aab5b71e
      Committed by Peter Zijlstra
      
      There are now two different things called AUX in perf, the
      infrastructure to deliver the mmap/comm/task records and the
      AUX part in the mmap buffer (with associated AUX_RECORD).
      
      Since the former is internal, rename it to side-band to reduce
      the confusion factor.
      
      No change in functionality.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      aab5b71e
    • perf/core: Optimize side-band event delivery · f2fb6bef
      Committed by Kan Liang
      The perf_event_aux() function iterates all PMUs and all events in
      their respective per-CPU contexts to find the events to deliver
      side-band records to.
      
      For example, the brk test case in lkp triggers many mmap() operations,
      which, if we're also running perf, results in many perf_event_aux()
      invocations.
      
      If we enable uncore PMU support (even when uncore events are not used),
      dozens of uncore PMUs will be iterated, which can significantly
      decrease brk_test's throughput.
      
      For example, the brk throughput:
      
        without uncore PMUs: 2647573 ops_per_sec
        with    uncore PMUs: 1768444 ops_per_sec
      
      ... a 33% reduction.
      
      To get at the per-CPU events that need side-band records, this patch
      puts these events on a per-CPU list, this avoids iterating the PMUs
      and any events that do not need side-band records.
      
      Per task events are unchanged to avoid extra overhead on the context
      switch paths.
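
      The attach side, sketched (closely following the structure the
      patch describes; take field names as approximate):

        #include <linux/percpu.h>
        #include <linux/spinlock.h>
        #include <linux/list.h>
        #include <linux/perf_event.h>

        /* Per-CPU list of events that need side-band records, so
         * delivery no longer iterates every PMU's context. */
        struct pmu_event_list {
                raw_spinlock_t   lock;
                struct list_head list;
        };
        static DEFINE_PER_CPU(struct pmu_event_list, pmu_sb_events);

        /* Called only for CPU-bound side-band events; per-task events
         * are deliberately left on the old path. */
        static void attach_sb_event(struct perf_event *event)
        {
                struct pmu_event_list *pel =
                        per_cpu_ptr(&pmu_sb_events, event->cpu);

                raw_spin_lock(&pel->lock);
                list_add_rcu(&event->sb_list, &pel->list);
                raw_spin_unlock(&pel->lock);
        }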
      Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reported-by: Huang, Ying <ying.huang@linux.intel.com>
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1458757477-3781-1-git-send-email-kan.liang@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f2fb6bef
  14. 30 May, 2016 · 1 commit
    • perf core: Per event callchain limit · 97c79a38
      Committed by Arnaldo Carvalho de Melo
      In addition to being able to control the system-wide maximum depth
      via /proc/sys/kernel/perf_event_max_stack, we are now able to ask
      for different depths per event, using
      perf_event_attr.sample_max_stack.
      
      This uses a u16 hole at the end of perf_event_attr: when
      perf_event_attr.sample_type has PERF_SAMPLE_CALLCHAIN set, a
      sample_max_stack of zero means 'use perf_event_max_stack';
      otherwise the value is bounds-checked under callchain_mutex.
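
      Usage, sketched from the user-space side:

        #include <linux/perf_event.h>
        #include <string.h>

        static void init_attr(struct perf_event_attr *attr)
        {
                memset(attr, 0, sizeof(*attr));
                attr->size             = sizeof(*attr);
                attr->type             = PERF_TYPE_HARDWARE;
                attr->config           = PERF_COUNT_HW_CPU_CYCLES;
                attr->sample_type      = PERF_SAMPLE_CALLCHAIN;
                attr->sample_period    = 100000;
                /* 0 would mean: fall back to perf_event_max_stack */
                attr->sample_max_stack = 64;
        }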
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/n/tip-kolmn1yo40p7jhswxwrc7rrd@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      97c79a38
  15. 24 May, 2016 · 1 commit
  16. 23 May, 2016 · 1 commit
    • x86: remove more uaccess_32.h complexity · bd28b145
      Committed by Linus Torvalds
      I'm looking at trying to possibly merge the 32-bit and 64-bit versions
      of the x86 uaccess.h implementation, but first this needs to be cleaned
      up.
      
      For example, the 32-bit version of "__copy_from_user_inatomic()" is
      mostly the special cases for the constant size, and it's actually almost
      never relevant.  Most users aren't actually using a constant size
      anyway, and the few cases that do small constant copies are better off
      just using __get_user() instead.
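
      For instance (a sketch of the substitution being argued for):

        #include <linux/uaccess.h>
        #include <linux/types.h>
        #include <linux/errno.h>

        /* before: a 4-byte constant-size copy through the generic path */
        static int read_word_old(const u32 __user *uptr, u32 *out)
        {
                if (__copy_from_user_inatomic(out, uptr, sizeof(*out)))
                        return -EFAULT;
                return 0;
        }

        /* after: __get_user() is simpler and at least as fast here */
        static int read_word_new(const u32 __user *uptr, u32 *out)
        {
                if (__get_user(*out, uptr))
                        return -EFAULT;
                return 0;
        }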
      
      So get rid of the unnecessary complexity.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bd28b145
  17. 17 May, 2016 · 5 commits
    • perf core: Separate accounting of contexts and real addresses in a stack trace · c85b0334
      Committed by Arnaldo Carvalho de Melo
      The perf_sample->ip_callchain->nr value includes all the entries in
      the ip_callchain->ip[] array, both real addresses and
      PERF_CONTEXT_{KERNEL,USER,etc} markers, while what the user expects
      is that the kernel.perf_event_max_stack sysctl, or the upcoming
      per-event perf_event_attr.sample_max_stack knob, be honoured in
      terms of real IP addresses in the stack trace.
      
      So allocate a bunch of extra entries for contexts, and do the accounting
      via perf_callchain_entry_ctx struct members.
      
      A new sysctl, kernel.perf_event_max_contexts_per_stack, is also
      introduced for investigating possible bugs in some arch's callchain
      implementation.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/n/tip-3b4wnqk340c4sg4gwkfdi9yk@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      c85b0334
    • perf core: Add perf_callchain_store_context() helper · 3e4de4ec
      Committed by Arnaldo Carvalho de Melo
      We need to have different helpers to account for how many contexts
      are in the sample versus how many real addresses, so do it now as a
      prep patch, to ease review.
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-q964tnyuqrxw5gld18vizs3c@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      3e4de4ec
    • perf core: Add a 'nr' field to perf_event_callchain_context · 3b1fff08
      Committed by Arnaldo Carvalho de Melo
      We will use it to count how many addresses are in the entry->ip[]
      array, excluding PERF_CONTEXT_{KERNEL,USER,etc} entries, so that we
      can really return the number of entries specified by the user via
      the relevant sysctl, kernel.perf_event_max_stack, or via the
      per-event perf_event_attr.sample_max_stack knob.
      
      This way we keep the meaning of perf_sample->ip_callchain->nr, that
      is, the number of entries, be it real addresses or PERF_CONTEXT_
      entries, while honouring the max_stack knobs; i.e. the end result
      will be max_stack entries if we have at least that many entries in
      a given stack trace.
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-s8teto51tdqvlfhefndtat9r@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      3b1fff08
    • perf core: Pass max stack as a perf_callchain_entry context · cfbcf468
      Committed by Arnaldo Carvalho de Melo
      This makes perf_callchain_{user,kernel}() receive the max stack
      as context for the perf_callchain_entry, instead of accessing
      the global sysctl_perf_event_max_stack.
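
      The changed signatures, sketched (the context struct carries the
      entry plus its accounting; member details are elided here):

        /* The max-stack limit now travels with the entry context
         * instead of being read from a global sysctl. */
        struct perf_callchain_entry_ctx;
        struct pt_regs;

        void perf_callchain_kernel(struct perf_callchain_entry_ctx *ctx,
                                   struct pt_regs *regs);
        void perf_callchain_user(struct perf_callchain_entry_ctx *ctx,
                                 struct pt_regs *regs);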
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/n/tip-kolmn1yo40p7jhswxwrc7rrd@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      cfbcf468
    • perf core: Generalize max_stack sysctl handler · a831100a
      Committed by Arnaldo Carvalho de Melo
      So that it can be used for other stack-related knobs, such as the
      upcoming one to tweak the max number of contexts per stack sample.
      
      In all those cases we can only change the value if there are no perf
      sessions collecting stacks, so they need to grab that mutex, etc.
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-8t3fk94wuzp8m2z1n4gc0s17@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      a831100a
  18. 12 May, 2016 · 2 commits
  19. 10 May, 2016 · 1 commit
  20. 05 May, 2016 · 5 commits
  21. 28 Apr, 2016 · 1 commit
    • perf/core: Fix perf_event_open() vs. execve() race · 79c9ce57
      Committed by Peter Zijlstra
      Jann reported that the ptrace_may_access() check in
      find_lively_task_by_vpid() is racy against exec().
      
      Specifically:
      
        perf_event_open()		execve()
      
        ptrace_may_access()
      				commit_creds()
        ...				if (get_dumpable() != SUID_DUMP_USER)
      				  perf_event_exit_task();
        perf_install_in_context()
      
      would result in installing a counter across the creds boundary.
      
      Fix this by wrapping lots of perf_event_open() in cred_guard_mutex.
      This should be fine as perf_event_exit_task() is already called with
      cred_guard_mutex held, so all perf locks already nest inside it.
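
      Condensed sketch of the locking (the real code holds the mutex
      across the install as well; error handling simplified):

        #include <linux/sched.h>
        #include <linux/ptrace.h>
        #include <linux/errno.h>

        static int check_target_task(struct task_struct *task)
        {
                int err;

                err = mutex_lock_interruptible(
                        &task->signal->cred_guard_mutex);
                if (err)
                        return err;

                /* execve()'s commit_creds() also runs under this mutex,
                 * so creds cannot change between check and install. */
                err = ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS) ?
                        0 : -EACCES;

                mutex_unlock(&task->signal->cred_guard_mutex);
                return err;
        }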
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      79c9ce57