1. 14 Mar 2020, 1 commit
  2. 25 Feb 2020, 1 commit
  3. 29 Jan 2020, 2 commits
  4. 17 Jan 2020, 1 commit
  5. 14 Jan 2020, 1 commit
  6. 17 Dec 2019, 1 commit
  7. 09 Dec 2019, 1 commit
  8. 21 Nov 2019, 1 commit
  9. 18 Nov 2019, 2 commits
  10. 15 Nov 2019, 2 commits
  11. 13 Nov 2019, 8 commits
  12. 28 Oct 2019, 5 commits
    • perf/core: Optimize perf_init_event() for TYPE_SOFTWARE · d44f821b
      Committed by Liang, Kan
      Andi reported that he was hitting the linear search in
      perf_init_event() a lot. Now that all !TYPE_SOFTWARE events should hit
      the IDR, make sure the TYPE_SOFTWARE events are at the head of the
      list such that we'll quickly find the right PMU (provided a valid
      event was given).
      Signed-off-by: Liang, Kan <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d44f821b
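      A minimal sketch of the list placement this describes (assumed shape of the perf_pmu_register() change, not the exact upstream diff): software PMUs go to the head of the 'pmus' list so the linear fallback finds them immediately, everything else goes to the tail because it is reached through the IDR anyway.

        /* in perf_pmu_register(), once the PMU has its type */
        if (type == PERF_TYPE_SOFTWARE || !name)
                list_add_rcu(&pmu->entry, &pmus);       /* head: fast linear hit */
        else
                list_add_tail_rcu(&pmu->entry, &pmus);  /* tail: found via the IDR */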
    • perf/core: Optimize perf_init_event() · 66d258c5
      Committed by Peter Zijlstra
      Andi reported that he was hitting the linear search in
      perf_init_event() a lot. Make more aggressive use of the IDR lookup to
      avoid hitting the linear search.
      
      With the exception of PERF_TYPE_SOFTWARE (which relies on a hideous hack),
      we can put everything in the IDR. On top of that, we can alias
      TYPE_HARDWARE and TYPE_HW_CACHE to TYPE_RAW on the lookup side.
      
      This greatly reduces the chances of hitting the linear search.
      Reported-by: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan <kan.liang@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      66d258c5
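      A rough sketch of the lookup order described above (simplified; helper names such as perf_try_init_event() follow the kernel's, but treat the exact control flow as an assumption): alias HARDWARE/HW_CACHE to RAW, try the IDR first, and only then fall back to walking the 'pmus' list.

        type = event->attr.type;
        if (type == PERF_TYPE_HARDWARE || type == PERF_TYPE_HW_CACHE)
                type = PERF_TYPE_RAW;           /* same PMU serves all three */
        again:
        rcu_read_lock();
        pmu = idr_find(&pmu_idr, type);
        rcu_read_unlock();
        if (pmu) {
                ret = perf_try_init_event(pmu, event);
                if (ret == -ENOENT && event->attr.type != type) {
                        /* a dynamic PMU took over this type; retry verbatim */
                        type = event->attr.type;
                        goto again;
                }
                return ret ? ERR_PTR(ret) : pmu;
        }
        /* slow path: essentially only TYPE_SOFTWARE PMUs live here now */
        list_for_each_entry_rcu(pmu, &pmus, entry)
                if (!perf_try_init_event(pmu, event))
                        return pmu;
        return ERR_PTR(-ENOENT);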
    • perf/core: Optimize perf_install_in_context() · db0503e4
      Committed by Peter Zijlstra
      Andi reported that when creating a lot of events, a lot of time is
      spent in IPIs and asked if it would be possible to elide some of that.
      
      Now when events are created disabled, as the perf tool always does, they
      do not need to be scheduled when added to the context (they're still
      disabled) and therefore the IPI is not required -- except for the very
      first event, which will need to set ctx->is_active.
      
      ( It might be possible to set ctx->is_active remotely for cpu_ctx, but
        we really need the IPI for task_ctx, so let's not make that
        distinction. )
      
      Also use __perf_effective_state(), since group events depend on the
      state of the leader: if the leader is OFF, the whole group is OFF.
      
      So when sibling events are created enabled (XXX check tool), we only
      need a single IPI to create and enable the whole group (+ that
      initial IPI to initialize the context).
      Suggested-by: Andi Kleen <andi@firstfloor.org>
      Reported-by: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: kan.liang@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      db0503e4
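      A hedged sketch of the fast path this describes (simplified; the exact guard conditions in the real perf_install_in_context() are an assumption): an effectively-OFF event added to a context that already has events can be linked in under ctx->lock, without the cross-CPU call.

        /*
         * Disabled events need not be scheduled now, so skip the IPI --
         * except for the very first event, which must set ctx->is_active
         * through the remote call.
         */
        if (__perf_effective_state(event) == PERF_EVENT_STATE_OFF &&
            ctx->nr_events) {
                raw_spin_lock_irq(&ctx->lock);
                if (ctx->task == TASK_TOMBSTONE) {      /* context is dying */
                        raw_spin_unlock_irq(&ctx->lock);
                        return;
                }
                add_event_to_ctx(event, ctx);
                raw_spin_unlock_irq(&ctx->lock);
                return;
        }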
    • perf/x86: Synchronize PMU task contexts on optimized context switches · c2b98a86
      Committed by Alexey Budankov
      Install an Intel-specific PMU task-context synchronization adapter and
      extend the optimized context-switch path with PMU-specific task-context
      synchronization, to fix LBR callstack virtualization on context switches.
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: https://lkml.kernel.org/r/9c6445a9-bdba-ef03-3859-f1f91198f27a@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c2b98a86
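      A hedged sketch of the adapter shape the commit describes (callback and helper names follow the commit text; treat the exact call site as an assumption): the core gains an optional pmu::swap_task_ctx() used on the optimized switch path, and the Intel PMU forwards it to the LBR code so callstack state follows the swapped task contexts.

        /* core, optimized context-switch path */
        if (pmu->swap_task_ctx)
                pmu->swap_task_ctx(ctx, next_ctx);
        else
                swap(ctx->task_ctx_data, next_ctx->task_ctx_data);

        /* arch/x86 adapter */
        static void intel_pmu_swap_task_ctx(struct perf_event_context *prev,
                                            struct perf_event_context *next)
        {
                intel_pmu_lbr_swap_task_ctx(prev, next);
        }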
    • perf/core: Start rejecting the syscall with attr.__reserved_2 set · 8c7e9756
      Committed by Alexander Shishkin
      Commit:
      
        1a594131 ("perf: Add wakeup watermark control to the AUX area")
      
      added attr.__reserved_2 padding, but forgot to add an ABI check to reject
      attributes with this field set. Fix that.
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: adrian.hunter@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: https://lkml.kernel.org/r/20191025121636.75182-1-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8c7e9756
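      The missing guard is a one-liner; a sketch of the kind of check added in perf_copy_attr() (the placement is assumed from the ABI-check wording):

        /* reject set reserved bits so they remain usable for future ABI */
        if (attr->__reserved_1 || attr->__reserved_2)
                return -EINVAL;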
  13. 22 Oct 2019, 1 commit
    • perf/aux: Fix AUX output stopping · f3a519e4
      Committed by Alexander Shishkin
      Commit:
      
        8a58ddae ("perf/core: Fix exclusive events' grouping")
      
      allows CAP_EXCLUSIVE events to be grouped with other events. Since all
      of those also happen to be AUX events (which is not the case the other
      way around, because arch/s390), this changes the rules for stopping the
      output: the AUX event may not be on its PMU's context any more, if it's
      grouped with a HW event, in which case it will be on that HW event's
      context instead. If that's the case, munmap() of the AUX buffer can't
      find and stop the AUX event, potentially leaving the last reference with
      the atomic context, which will then end up freeing the AUX buffer. This
      will then trip warnings.
      
      Fix this by using the context's PMU context when looking for events
      to stop, instead of the event's PMU context.
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20191022073940.61814-1-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f3a519e4
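      A hedged illustration of the substitution being described (function and surrounding fields are assumptions): the stop path should resolve the PMU through the event's context, because a grouped AUX event may now live on its hardware group leader's context.

        static void __perf_pmu_output_stop(void *info)
        {
                struct perf_event *event = info;
                /* use the context's PMU: the AUX event may sit on a HW context */
                struct pmu *pmu = event->ctx->pmu;
                struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);

                /* ... iterate cpuctx->ctx and cpuctx->task_ctx, stopping
                 *     every event that writes to this AUX ring buffer ... */
        }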
  14. 21 Oct 2019, 1 commit
    • perf/aux: Fix tracking of auxiliary trace buffer allocation · 5e6c3c7b
      Committed by Thomas Richter
      The following commit from the v5.4 merge window:
      
        d44248a4 ("perf/core: Rework memory accounting in perf_mmap()")
      
      ... breaks auxiliary trace buffer tracking.
      
      If I run the command 'perf record -e rbd000' to record samples and save
      them in the **auxiliary** trace buffer, then the value of 'locked_vm'
      becomes negative after all trace buffers have been allocated and released:
      
      During allocation the values increase:
      
        [52.250027] perf_mmap user->locked_vm:0x87 pinned_vm:0x0 ret:0
        [52.250115] perf_mmap user->locked_vm:0x107 pinned_vm:0x0 ret:0
        [52.250251] perf_mmap user->locked_vm:0x188 pinned_vm:0x0 ret:0
        [52.250326] perf_mmap user->locked_vm:0x208 pinned_vm:0x0 ret:0
        [52.250441] perf_mmap user->locked_vm:0x289 pinned_vm:0x0 ret:0
        [52.250498] perf_mmap user->locked_vm:0x309 pinned_vm:0x0 ret:0
        [52.250613] perf_mmap user->locked_vm:0x38a pinned_vm:0x0 ret:0
        [52.250715] perf_mmap user->locked_vm:0x408 pinned_vm:0x2 ret:0
        [52.250834] perf_mmap user->locked_vm:0x408 pinned_vm:0x83 ret:0
        [52.250915] perf_mmap user->locked_vm:0x408 pinned_vm:0x103 ret:0
        [52.251061] perf_mmap user->locked_vm:0x408 pinned_vm:0x184 ret:0
        [52.251146] perf_mmap user->locked_vm:0x408 pinned_vm:0x204 ret:0
        [52.251299] perf_mmap user->locked_vm:0x408 pinned_vm:0x285 ret:0
        [52.251383] perf_mmap user->locked_vm:0x408 pinned_vm:0x305 ret:0
        [52.251544] perf_mmap user->locked_vm:0x408 pinned_vm:0x386 ret:0
        [52.251634] perf_mmap user->locked_vm:0x408 pinned_vm:0x406 ret:0
        [52.253018] perf_mmap user->locked_vm:0x408 pinned_vm:0x487 ret:0
        [52.253197] perf_mmap user->locked_vm:0x408 pinned_vm:0x508 ret:0
        [52.253374] perf_mmap user->locked_vm:0x408 pinned_vm:0x589 ret:0
        [52.253550] perf_mmap user->locked_vm:0x408 pinned_vm:0x60a ret:0
        [52.253726] perf_mmap user->locked_vm:0x408 pinned_vm:0x68b ret:0
        [52.253903] perf_mmap user->locked_vm:0x408 pinned_vm:0x70c ret:0
        [52.254084] perf_mmap user->locked_vm:0x408 pinned_vm:0x78d ret:0
        [52.254263] perf_mmap user->locked_vm:0x408 pinned_vm:0x80e ret:0
      
      The value of user->locked_vm increases up to a limit, after which the
      memory is tracked by pinned_vm.
      
      During deallocation the size is subtracted from pinned_vm until it hits
      a limit. Then a larger value is subtracted from locked_vm, leading to a
      huge number (because the type is unsigned):
      
        [64.267797] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x78d
        [64.267826] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x70c
        [64.267848] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x68b
        [64.267869] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x60a
        [64.267891] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x589
        [64.267911] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x508
        [64.267933] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x487
        [64.267952] perf_mmap_close mmap_user->locked_vm:0x408 pinned_vm:0x406
        [64.268883] perf_mmap_close mmap_user->locked_vm:0x307 pinned_vm:0x406
        [64.269117] perf_mmap_close mmap_user->locked_vm:0x206 pinned_vm:0x406
        [64.269433] perf_mmap_close mmap_user->locked_vm:0x105 pinned_vm:0x406
        [64.269536] perf_mmap_close mmap_user->locked_vm:0x4 pinned_vm:0x404
        [64.269797] perf_mmap_close mmap_user->locked_vm:0xffffffffffffff84 pinned_vm:0x303
        [64.270105] perf_mmap_close mmap_user->locked_vm:0xffffffffffffff04 pinned_vm:0x202
        [64.270374] perf_mmap_close mmap_user->locked_vm:0xfffffffffffffe84 pinned_vm:0x101
        [64.270628] perf_mmap_close mmap_user->locked_vm:0xfffffffffffffe04 pinned_vm:0x0
      
      This value sticks for the user until the system is rebooted, causing
      follow-on system calls that use the locked_vm resource limit to fail.
      
      Note: There is no issue using the normal trace buffer.
      
      In fact the issue is in perf_mmap_close(). During allocation, auxiliary
      trace buffer memory is either tracked as 'extra' and added to 'pinned_vm',
      or tracked as 'user_extra' and added to 'locked_vm'. This applies to both
      normal trace buffers and the auxiliary trace buffer.
      
      However, in perf_mmap_close() all auxiliary trace buffer memory is
      subtracted from 'locked_vm' and never from 'pinned_vm'. This breaks the
      balance.
      Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@kernel.org
      Cc: gor@linux.ibm.com
      Cc: hechaol@fb.com
      Cc: heiko.carstens@de.ibm.com
      Cc: linux-perf-users@vger.kernel.org
      Cc: songliubraving@fb.com
      Fixes: d44248a4 ("perf/core: Rework memory accounting in perf_mmap()")
      Link: https://lkml.kernel.org/r/20191021083354.67868-1-tmricht@linux.ibm.com
      [ Minor readability edits. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5e6c3c7b
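      A hedged sketch of the balanced teardown this calls for (field names follow the kernel's AUX ring-buffer naming; the exact expression is an assumption): only the part of the AUX buffer that was charged to locked_vm comes back out of locked_vm, while the pinned_vm share is returned to pinned_vm.

        /* perf_mmap_close(), AUX path: undo exactly what perf_mmap() charged */
        atomic_long_sub(rb->aux_nr_pages - rb->aux_mmap_locked,
                        &mmap_user->locked_vm);
        atomic64_sub(rb->aux_mmap_locked, &vma->vm_mm->pinned_vm);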
  15. 18 Oct 2019, 1 commit
    • perf_event: Add support for LSM and SELinux checks · da97e184
      Committed by Joel Fernandes (Google)
      In current mainline, the degree of access to perf_event_open(2) system
      call depends on the perf_event_paranoid sysctl.  This has a number of
      limitations:
      
      1. The sysctl is only a single value. Many types of accesses are controlled
         based on the single value thus making the control very limited and
         coarse grained.
      2. The sysctl is global, so if the sysctl is changed, then that means
         all processes get access to perf_event_open(2) opening the door to
         security issues.
      
      This patch adds LSM and SELinux access checking which will be used in
      Android to access perf_event_open(2) for the purposes of attaching BPF
      programs to tracepoints, perf profiling and other operations from
      userspace. These operations are intended for production systems.
      
      5 new LSM hooks are added:
      1. perf_event_open: This controls access during the perf_event_open(2)
         syscall itself. The hook is called from all the places that the
         perf_event_paranoid sysctl is checked, to keep it consistent with the
         sysctl. The hook gets passed a 'type' argument which controls CPU,
         kernel and tracepoint accesses (in this context, CPU, kernel and
         tracepoint have the same semantics as the perf_event_paranoid sysctl).
         Additionally, I added an 'open' type which is similar to the
         perf_event_paranoid sysctl == 3 patch carried in Android and several
         other distros but was rejected in mainline [1] in 2016.
      
      2. perf_event_alloc: This allocates a new security object for the event
         which stores the current SID within the event. It will be useful when
         the perf event's FD is passed through IPC to another process which may
         try to read the FD. Appropriate security checks will limit access.
      
      3. perf_event_free: Called when the event is closed.
      
      4. perf_event_read: Called from the read(2) and mmap(2) syscalls for the event.
      
      5. perf_event_write: Called from the ioctl(2) syscalls for the event.
      
      [1] https://lwn.net/Articles/696240/
      
      Since Peter had suggested LSM hooks in 2016 [1], I am adding his
      Suggested-by tag below.
      
      To use this patch, we set the perf_event_paranoid sysctl to -1 and then
      apply selinux checking as appropriate (default deny everything, and then
      add policy rules to give access to domains that need it). In the future
      we can remove the perf_event_paranoid sysctl altogether.
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Co-developed-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: James Morris <jmorris@namei.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: rostedt@goodmis.org
      Cc: Yonghong Song <yhs@fb.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: jeffv@google.com
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: primiano@google.com
      Cc: Song Liu <songliubraving@fb.com>
      Cc: rsavitski@google.com
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Matthew Garrett <matthewgarrett@google.com>
      Link: https://lkml.kernel.org/r/20191014170308.70668-1-joel@joelfernandes.org
      da97e184
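      A sketch of how the old sysctl checks and the new hook are meant to compose (based on the helpers this patch describes; exact names and signatures are assumptions):

        static inline int perf_allow_kernel(struct perf_event_attr *attr)
        {
                /* keep the existing perf_event_paranoid semantics ... */
                if (sysctl_perf_event_paranoid > 1 && !capable(CAP_SYS_ADMIN))
                        return -EACCES;

                /* ... and additionally ask the LSM (e.g. SELinux) */
                return security_perf_event_open(attr, PERF_SECURITY_KERNEL);
        }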
  16. 09 Oct 2019, 2 commits
    • perf/core: Fix corner case in perf_rotate_context() · 7fa343b7
      Committed by Song Liu
      In perf_rotate_context(), when the first cpu flexible event fails to
      schedule, cpu_rotate is 1, while cpu_event is NULL. Since cpu_event is
      NULL, perf_rotate_context will _NOT_ call cpu_ctx_sched_out(), thus
      cpuctx->ctx.is_active will have EVENT_FLEXIBLE set. Then, the next
      perf_event_sched_in() will skip all cpu flexible events because of the
      EVENT_FLEXIBLE bit.
      
      In the next call of perf_rotate_context(), cpu_rotate stays 1, and
      cpu_event stays NULL, so this process repeats. The end result is, flexible
      events on this cpu will not be scheduled (until another event being added
      to the cpuctx).
      
      Here is an easy repro of this issue. On Intel CPUs, where ref-cycles
      could only use one counter, run one pinned event for ref-cycles, one
      flexible event for ref-cycles, and one flexible event for cycles. The
      flexible ref-cycles is never scheduled, which is expected. However,
      because of this issue, the cycles event is never scheduled either.
      
       $ perf stat -e ref-cycles:D,ref-cycles,cycles -C 5 -I 1000
      
                 time             counts unit events
          1.000152973         15,412,480      ref-cycles:D
          1.000152973      <not counted>      ref-cycles     (0.00%)
          1.000152973      <not counted>      cycles         (0.00%)
          2.000486957         18,263,120      ref-cycles:D
          2.000486957      <not counted>      ref-cycles     (0.00%)
          2.000486957      <not counted>      cycles         (0.00%)
      
      To fix this, when the flexible_active list is empty, try to rotate the
      first event in the flexible_groups. Also, rename ctx_first_active() to
      ctx_event_to_rotate(), which is more accurate.
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <kernel-team@fb.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sasha Levin <sashal@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 8d5bce0c ("perf/core: Optimize perf_rotate_context() event scheduling")
      Link: https://lkml.kernel.org/r/20191008165949.920548-1-songliubraving@fb.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      7fa343b7
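      A sketch of the renamed helper as described (close to the shape implied by the text; field names are assumptions): prefer the first active flexible event, and fall back to the first flexible group when nothing is active yet.

        static inline struct perf_event *
        ctx_event_to_rotate(struct perf_event_context *ctx)
        {
                struct perf_event *event;

                /* pick the first active flexible event */
                event = list_first_entry_or_null(&ctx->flexible_active,
                                                 struct perf_event, active_list);

                /* if none is active, rotate the first flexible group instead */
                if (!event)
                        event = rb_entry_safe(rb_first(&ctx->flexible_groups.tree),
                                              typeof(*event), group_node);

                return event;
        }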
    • perf/core: Rework memory accounting in perf_mmap() · d44248a4
      Committed by Song Liu
      perf_mmap() always increases user->locked_vm. As a result, "extra" could
      grow bigger than "user_extra", which doesn't make sense. Here is an
      example case:
      
      (Note: Assume "user_lock_limit" is very small.)
      
        | # of perf_mmap calls |vma->vm_mm->pinned_vm|user->locked_vm|
        | 0                    | 0                   | 0             |
        | 1                    | user_extra          | user_extra    |
        | 2                    | 3 * user_extra      | 2 * user_extra|
        | 3                    | 6 * user_extra      | 3 * user_extra|
        | 4                    | 10 * user_extra     | 4 * user_extra|
      
      Fix this by maintaining proper user_extra and extra.
      Reviewed-by: Hechao Li <hechaol@fb.com>
      Reported-by: Hechao Li <hechaol@fb.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <kernel-team@fb.com>
      Cc: Jie Meng <jmeng@fb.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lkml.kernel.org/r/20190904214618.3795672-1-songliubraving@fb.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d44248a4
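      A hedged sketch of the split the table implies (variable names follow the perf_mmap() accounting described above; the exact form is an assumption): only what still fits under user_lock_limit is charged to user->locked_vm, the remainder moves to pinned_vm.

        user_locked = atomic_long_read(&user->locked_vm) + user_extra;

        if (user_locked > user_lock_limit) {
                /*
                 * Charge locked_vm until it hits user_lock_limit;
                 * charge the rest to pinned_vm.
                 */
                extra = user_locked - user_lock_limit;
                user_extra -= extra;
        }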
  17. 07 Oct 2019, 1 commit
  18. 01 Oct 2019, 1 commit
  19. 21 Sep 2019, 1 commit
  20. 28 Aug 2019, 1 commit
  21. 20 Aug 2019, 1 commit
  22. 02 Aug 2019, 1 commit
  23. 25 Jul 2019, 2 commits
  24. 13 Jul 2019, 1 commit
    • perf/core: Fix exclusive events' grouping · 8a58ddae
      Committed by Alexander Shishkin
      So far, we tried to disallow grouping exclusive events for the fear of
      complications they would cause with moving between contexts. Specifically,
      moving a software group to a hardware context would violate the exclusivity
      rules if both groups contain matching exclusive events.
      
      This attempt was, however, unsuccessful: the check that we have in the
      perf_event_open() syscall is both wrong (looks at wrong PMU) and
      insufficient (group leader may still be exclusive), as can be illustrated
      by running:
      
        $ perf record -e '{intel_pt//,cycles}' uname
        $ perf record -e '{cycles,intel_pt//}' uname
      
      ultimately successfully.
      
      Furthermore, we are completely free to trigger the exclusivity violation
      by:
      
         perf -e '{cycles,intel_pt//}' -e '{intel_pt//,instructions}'
      
      even though the helpful perf record will not allow that, the ABI will.
      
      The warning later in the perf_event_open() path will also not trigger, because
      it's also wrong.
      
      Fix all this by validating the original group before moving, getting rid
      of broken safeguards and placing a useful one to perf_install_in_context().
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: mathieu.poirier@linaro.org
      Cc: will.deacon@arm.com
      Fixes: bed5b25a ("perf: Add a pmu capability for "exclusive" events")
      Link: https://lkml.kernel.org/r/20190701110755.24646-1-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8a58ddae
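      A hedged sketch of the kind of safeguard described, validating every member of the group against the destination context before the move (the check location and loop are assumptions beyond what the text states):

        /* before moving a group into another context */
        if (!exclusive_event_installable(group_leader, ctx)) {
                err = -EBUSY;
                goto err_locked;
        }

        for_each_sibling_event(sibling, group_leader) {
                if (!exclusive_event_installable(sibling, ctx)) {
                        err = -EBUSY;
                        goto err_locked;
                }
        }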