1. 22 Jul 2020, 1 commit
  2. 21 Jul 2020, 1 commit
  3. 18 Jun 2020, 2 commits
  4. 18 May 2020, 1 commit
    • powerpc: Use a datatype for instructions · 94afd069
      Jordan Niethe authored
      Currently, unsigned ints are used to represent instructions on
      powerpc. This has worked well, as instructions have always been
      4-byte words.
      
      However, ISA v3.1 introduces a change to instructions that means this
      scheme will no longer work as well: Prefixed Instructions. A prefixed
      instruction is made up of a word prefix followed by a word suffix,
      forming an 8-byte doubleword instruction. Regardless of the system's
      endianness, the prefix always comes first. Prefixed instructions are
      only planned for powerpc64.
      
      Introduce a ppc_inst type to represent both prefixed and word
      instructions on powerpc64 while keeping it possible to exclusively
      have word instructions on powerpc32.
      Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
      [mpe: Fix compile error in emulate_spe()]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200506034050.24806-12-jniethe5@gmail.com
      94afd069
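      As a rough sketch of the idea (simplified from the upstream
      arch/powerpc/include/asm/inst.h; the accessor shown is illustrative),
      the type looks something like:

        struct ppc_inst {
                u32 val;                /* the word, or the prefix */
        #ifdef CONFIG_PPC64
                u32 suffix;             /* used only by prefixed instructions */
        #endif
        };

        static inline bool ppc_inst_prefixed(struct ppc_inst x)
        {
                /* Primary opcode 1 marks a prefixed instruction. */
                return IS_ENABLED(CONFIG_PPC64) && (x.val >> 26) == 1;
        }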
  5. 11 Feb 2020, 1 commit
    • perf/core: Add new branch sample type for HW index of raw branch records · bbfd5e4f
      Kan Liang authored
      The low-level index is the index, in the underlying hardware buffer,
      of the most recently captured taken branch, which is always saved in
      branch_entries[0]. It is very useful for reconstructing the call
      stack. For example, in Intel LBR call stack mode, the depth of the
      reconstructed LBR call stack is limited to the number of LBR
      registers. With the low-level index information, the perf tool can
      stitch the stacks of two samples, so the reconstructed LBR call stack
      is no longer bound by the HW limitation.
      
      Add a new branch sample type to retrieve the low-level index of raw
      branch records. The low-level index is between -1 (unknown) and the
      maximum depth, which can be read from /sys/devices/cpu/caps/branches.
      
      The low-level index information is dumped into the
      PERF_SAMPLE_BRANCH_STACK output only when the new branch sample type
      is set. The perf tool should check attr.branch_sample_type and apply
      the corresponding format to PERF_SAMPLE_BRANCH_STACK samples.
      Otherwise, some use cases may break. For example, users may parse a
      perf.data that includes the new branch sample type with an old perf
      tool (without the check) and get incorrect information without any
      warning.
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lkml.kernel.org/r/20200127165355.27495-2-kan.liang@linux.intel.com
      bbfd5e4f
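      A minimal user-space sketch of requesting the new information, using
      the PERF_SAMPLE_BRANCH_HW_INDEX flag this patch adds (the surrounding
      sampling setup is illustrative):

        #include <linux/perf_event.h>
        #include <string.h>

        static void setup_branch_sampling(struct perf_event_attr *attr)
        {
                memset(attr, 0, sizeof(*attr));
                attr->size = sizeof(*attr);
                attr->type = PERF_TYPE_HARDWARE;
                attr->config = PERF_COUNT_HW_CPU_CYCLES;
                attr->sample_period = 100000;
                attr->sample_type = PERF_SAMPLE_BRANCH_STACK;
                /* Ask for the HW index of branch_entries[0] as well. */
                attr->branch_sample_type = PERF_SAMPLE_BRANCH_USER |
                                           PERF_SAMPLE_BRANCH_CALL_STACK |
                                           PERF_SAMPLE_BRANCH_HW_INDEX;
        }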
  6. 25 Jan 2020, 1 commit
  7. 18 Oct 2019, 1 commit
    • perf_event: Add support for LSM and SELinux checks · da97e184
      Joel Fernandes (Google) authored
      In current mainline, the degree of access to the perf_event_open(2)
      system call depends on the perf_event_paranoid sysctl. This has a
      number of limitations:
      
      1. The sysctl is only a single value. Many types of accesses are
         controlled based on that single value, making the control very
         limited and coarse-grained.
      2. The sysctl is global, so if it is changed, all processes get
         access to perf_event_open(2), opening the door to security
         issues.
      
      This patch adds LSM and SELinux access checking which will be used in
      Android to access perf_event_open(2) for the purposes of attaching BPF
      programs to tracepoints, perf profiling and other operations from
      userspace. These operations are intended for production systems.
      
      5 new LSM hooks are added:
      1. perf_event_open: This controls access during the perf_event_open(2)
         syscall itself. The hook is called from all the places that check
         the perf_event_paranoid sysctl, to keep it consistent with the
         sysctl. The hook gets passed a 'type' argument which controls CPU,
         kernel and tracepoint accesses (in this context, CPU, kernel and
         tracepoint have the same semantics as the perf_event_paranoid
         sysctl). Additionally, I added an 'open' type, which is similar to
         the perf_event_paranoid sysctl == 3 patch carried in Android and
         several other distros but rejected in mainline [1] in 2016.
      
      2. perf_event_alloc: This allocates a new security object for the event
         which stores the current SID within the event. It will be useful when
         the perf event's FD is passed through IPC to another process which may
         try to read the FD. Appropriate security checks will limit access.
      
      3. perf_event_free: Called when the event is closed.
      
      4. perf_event_read: Called from the read(2) and mmap(2) syscalls for the event.
      
      5. perf_event_write: Called from the ioctl(2) syscalls for the event.
      
      [1] https://lwn.net/Articles/696240/
      
      Since Peter had suggested LSM hooks in 2016 [1], I am adding his
      Suggested-by tag below.
      
      To use this patch, we set the perf_event_paranoid sysctl to -1 and then
      apply selinux checking as appropriate (default deny everything, and then
      add policy rules to give access to domains that need it). In the future
      we can remove the perf_event_paranoid sysctl altogether.
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Co-developed-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: James Morris <jmorris@namei.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: rostedt@goodmis.org
      Cc: Yonghong Song <yhs@fb.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: jeffv@google.com
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: primiano@google.com
      Cc: Song Liu <songliubraving@fb.com>
      Cc: rsavitski@google.com
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Matthew Garrett <matthewgarrett@google.com>
      Link: https://lkml.kernel.org/r/20191014170308.70668-1-joel@joelfernandes.org
      da97e184
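      A sketch of how the sysctl and the new hook compose (simplified from
      include/linux/perf_event.h; the exact capability test has varied
      across kernel versions):

        static inline int perf_allow_kernel(struct perf_event_attr *attr)
        {
                /* The paranoid sysctl check stays... */
                if (sysctl_perf_event_paranoid > 1 && !capable(CAP_SYS_ADMIN))
                        return -EACCES;

                /* ...and the LSM (e.g. SELinux) may still deny on top of it. */
                return security_perf_event_open(attr, PERF_SECURITY_KERNEL);
        }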
  8. 31 May 2019, 1 commit
  9. 22 May 2019, 1 commit
    • powerpc/perf: Fix MMCRA corruption by bhrb_filter · 3202e35e
      Ravi Bangoria authored
      Consider a scenario where a user creates two events:
      
        1st event:
          attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
          attr.branch_sample_type = PERF_SAMPLE_BRANCH_ANY;
          fd = perf_event_open(attr, 0, 1, -1, 0);
      
        This sets cpuhw->bhrb_filter to 0 and returns a valid fd.
      
        2nd event:
          attr.sample_type |= PERF_SAMPLE_BRANCH_STACK;
          attr.branch_sample_type = PERF_SAMPLE_BRANCH_CALL;
          fd = perf_event_open(attr, 0, 1, -1, 0);
      
        It overrides cpuhw->bhrb_filter to -1 and returns with an error.
      
      Now if power_pmu_enable() gets called by any path other than
      power_pmu_add(), ppmu->config_bhrb(-1) will set MMCRA to -1.
      
      Fixes: 3925f46b ("powerpc/perf: Enable branch stack sampling framework")
      Cc: stable@vger.kernel.org # v3.10+
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      3202e35e
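      A sketch of the shape of the fix, assuming the filter is validated in
      a local variable before being cached per-CPU (helper name is
      hypothetical):

        static int check_and_cache_bhrb_filter(struct cpu_hw_events *cpuhw,
                                               u64 branch_sample_type)
        {
                int bhrb_filter = ppmu->bhrb_filter_map(branch_sample_type);

                if (bhrb_filter == -1)
                        return -EOPNOTSUPP;     /* cpuhw->bhrb_filter untouched */

                cpuhw->bhrb_filter = bhrb_filter;
                return 0;
        }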
  10. 03 May 2019, 2 commits
    • powerpc/perf: Add generic compat mode pmu driver · be80e758
      Madhavan Srinivasan authored
      Most of the PMU driver code for the various POWER processor
      generations is bundled into the kernel, and one of those drivers is
      enabled/registered based on the oprofile_cpu_type check at boot.
      
      But things get a little tricky in the case of a "compat" mode boot.
      Processors in IBM POWER System Servers have a compatibility mode
      feature: simply put, an Nth generation processor (say POWER8) will
      act and appear in a mode consistent with an earlier, (N-1) generation
      processor (that is, POWER7). In this "compat" mode boot, the kernel
      sets "oprofile_cpu_type" for the Nth generation (POWER8), and if the
      Nth generation PMU driver (POWER8) is bundled, it gets registered.
      
      The key dependency here is distro support for the latest processor's
      performance monitoring. This patch adds a generic "compat-mode"
      performance monitoring driver that is registered in the absence of a
      powernv platform-specific PMU driver.
      
      The driver supports only "cycles" and "instruction" events: 0x0001e
      is used as the event code for "cycles" and 0x00002 as the event code
      for "instruction". A new file, "generic-compat-pmu.c", contains the
      driver-specific code, and the base raw event code format is modeled
      on PPMU_ARCH_207S.
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      [mpe: Use SPDX tag for license]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      be80e758
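      A hedged user-space sketch of counting with the compat PMU through the
      raw event codes named above (assumes the codes are accepted via
      PERF_TYPE_RAW; error handling elided):

        #include <linux/perf_event.h>
        #include <string.h>
        #include <sys/syscall.h>
        #include <unistd.h>

        static int open_cycles_counter(void)
        {
                struct perf_event_attr attr;

                memset(&attr, 0, sizeof(attr));
                attr.size = sizeof(attr);
                attr.type = PERF_TYPE_RAW;
                attr.config = 0x0001e;          /* "cycles" event code */

                return syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
        }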
    • powerpc/perf: init pmu from core-book3s · 708597da
      Madhavan Srinivasan authored
      Currently, the PMU driver file for each ppc64 processor generation
      contains its own __init call. Refactor the code by moving the __init
      call to core-book3s.c. This also cleans up compat mode PMU driver
      registration.
      Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      [mpe: Use SPDX tag for license]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      708597da
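      A sketch of the resulting centralized registration in core-book3s.c
      (generation list abbreviated; init function names follow the
      per-generation driver files):

        static int __init init_ppc64_pmu(void)
        {
                /* Try each generation's init in turn... */
                if (!init_power8_pmu())
                        return 0;
                if (!init_power9_pmu())
                        return 0;
                /* ...falling back to the generic compat PMU. */
                return init_generic_compat_pmu();
        }
        early_initcall(init_ppc64_pmu);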
  11. 21 Dec 2018, 1 commit
  12. 20 Dec 2018, 2 commits
    • powerpc/perf: Add constraints for power9 l2/l3 bus events · 59029136
      Madhavan Srinivasan authored
      In previous generation processors, both bus events and direct events
      of the performance monitoring unit could be individually programmed
      and monitored in the PMCs.
      
      But in Power9, L2/L3 bus events are always available as a "bank" of 4
      events. To obtain the counts for any of the L2/L3 bus events in a
      given bank, the user has to program PMC4 with the corresponding L2/L3
      bus event for that bank.
      
      This patch enforces two constraints for L2/L3 bus events:
      
      1) Any L2/L3 event, when programmed, is also expected to program the
      corresponding PMC4 event from that group.
      2) The PMC4 event should always be programmed first, due to a
      limitation in the group constraint logic.
      
      For example, consider these L3 bus events:
      
      PM_L3_PF_ON_CHIP_MEM (0x460A0),
      PM_L3_PF_MISS_L3 (0x160A0),
      PM_L3_CO_MEM (0x260A0),
      PM_L3_PF_ON_CHIP_CACHE (0x360A0),
      
      1) This is an INVALID group for L3 Bus event monitoring,
      since it is missing PMC4 event.
      	perf stat -e "{r160A0,r260A0,r360A0}" < >
      
      And this is a VALID group for L3 Bus events:
      	perf stat -e "{r460A0,r160A0,r260A0,r360A0}" < >
      
      2) This is an INVALID group for L3 Bus event monitoring,
      since it is missing PMC4 event.
      	perf stat -e "{r260A0,r360A0}" < >
      
      And this is a VALID group for L3 Bus events:
      	perf stat -e "{r460A0,r260A0,r360A0}" < >
      
      3) This is an INVALID group for L3 Bus event monitoring,
      since it is missing PMC4 event.
      	perf stat -e "{r360A0}" < >
      
      And this is a VALID group for L3 Bus events:
      	perf stat -e "{r460A0,r360A0}" < >
      
      This patch implements the group constraint logic suggested by
      Michael Ellerman.
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      59029136
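      A user-space sketch of opening one of the VALID groups above, with
      the PMC4 event (r460A0) as group leader so it is programmed first
      (error handling elided):

        #include <linux/perf_event.h>
        #include <sys/syscall.h>
        #include <unistd.h>

        static void open_l3_bus_group(void)
        {
                struct perf_event_attr lead = {
                        .size = sizeof(struct perf_event_attr),
                        .type = PERF_TYPE_RAW,
                        .config = 0x460A0,      /* PM_L3_PF_ON_CHIP_MEM (PMC4) */
                };
                struct perf_event_attr sib = {
                        .size = sizeof(struct perf_event_attr),
                        .type = PERF_TYPE_RAW,
                        .config = 0x160A0,      /* PM_L3_PF_MISS_L3 */
                };
                int leader = syscall(SYS_perf_event_open, &lead, 0, -1, -1, 0);

                syscall(SYS_perf_event_open, &sib, 0, -1, leader, 0);
        }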
    • powerpc/perf: Update perf_regs structure to include SIER · 333804dc
      Madhavan Srinivasan authored
      On each sample, the Sample Instruction Event Register (SIER) content
      is saved in pt_regs. SIER does not have an entry of its own in
      pt_regs; instead, its content is saved in the "dar" field of pt_regs.
      
      This patch adds another entry to the perf_regs structure to expose
      "SIER", which internally maps to the "dar" of pt_regs.
      
      It also checks whether SIER is available on the platform and presents
      the value accordingly.
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      333804dc
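      A sketch of the mapping (simplified; is_sier_available() and the
      regs_get_register() fallback follow arch/powerpc/perf/perf_regs.c):

        u64 perf_reg_value(struct pt_regs *regs, int idx)
        {
                if (idx == PERF_REG_POWERPC_SIER) {
                        if (!is_sier_available())
                                return 0;       /* platform has no SIER */
                        return regs->dar;       /* SIER content lives in "dar" */
                }
                return regs_get_register(regs, pt_regs_offset[idx]);
        }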
  13. 16 Jul 2018, 2 commits
  14. 27 Mar 2018, 4 commits
  15. 17 Mar 2018, 1 commit
  16. 16 Mar 2018, 1 commit
  17. 12 Mar 2018, 1 commit
    • perf/core: Remove perf_event::group_entry · 8343aae6
      Peter Zijlstra authored
      Now that all the grouping is done with RB trees, we no longer need
      group_entry and can replace the whole thing with sibling_list.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Carrillo-Cisneros <davidcc@google.com>
      Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Valery Cherepennikov <valery.cherepennikov@intel.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8343aae6
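      A sketch of sibling iteration after this change: the leader's
      sibling_list heads the list and each sibling's sibling_list is its
      node (visit() is illustrative):

        static void visit_group_siblings(struct perf_event *leader,
                                         void (*visit)(struct perf_event *))
        {
                struct perf_event *sibling;

                list_for_each_entry(sibling, &leader->sibling_list, sibling_list)
                        visit(sibling);
        }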
  18. 19 Jan 2018, 2 commits
    • powerpc/64: Change soft_enabled from flag to bitmask · 01417c6c
      Madhavan Srinivasan authored
      "paca->soft_enabled" is used as a flag to mask some of interrupts.
      Currently supported flags values and their details:
      
      soft_enabled    MSR[EE]
      
      0               0       Disabled (PMI and HMI not masked)
      1               1       Enabled
      
      "paca->soft_enabled" is initialized to 1 to make the interripts as
      enabled. arch_local_irq_disable() will toggle the value when
      interrupts needs to disbled. At this point, the interrupts are not
      actually disabled, instead, interrupt vector has code to check for the
      flag and mask it when it occurs. By "mask it", it update interrupt
      paca->irq_happened and return. arch_local_irq_restore() is called to
      re-enable interrupts, which checks and replays interrupts if any
      occured.
      
      Now, as mentioned, current logic doesnot mask "performance monitoring
      interrupts" and PMIs are implemented as NMI. But this patchset depends
      on local_irq_* for a successful local_* update. Meaning, mask all
      possible interrupts during local_* update and replay them after the
      update.
      
      So the idea here is to reserve the "paca->soft_enabled" logic. New
      values and details:
      
      soft_enabled    MSR[EE]
      
      1               0       Disabled  (PMI and HMI not masked)
      0               1       Enabled
      
      Reason for the this change is to create foundation for a third mask
      value "0x2" for "soft_enabled" to add support to mask PMIs. When
      ->soft_enabled is set to a value "3", PMI interrupts are mask and when
      set to a value of "1", PMI are not mask. With this patch also extends
      soft_enabled as interrupt disable mask.
      
      Current flags are renamed from IRQ_[EN?DIS}ABLED to
      IRQS_ENABLED and IRQS_DISABLED.
      
      Patch also fixes the ptrace call to force the user to see the softe
      value to be alway 1. Reason being, even though userspace has no
      business knowing about softe, it is part of pt_regs. Like-wise in
      signal context.
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      01417c6c
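      A sketch of where the mask values land once PMI masking is added on
      top of this change (matching later kernels'
      arch/powerpc/include/asm/hw_irq.h):

        #define IRQS_ENABLED            0
        #define IRQS_DISABLED           1       /* local_irq_disable() */
        #define IRQS_PMI_DISABLED       2       /* additionally masks PMIs */
        #define IRQS_ALL_DISABLED       (IRQS_DISABLED | IRQS_PMI_DISABLED)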
    • powerpc/64: Add #defines for paca->soft_enabled flags · c2e480ba
      Madhavan Srinivasan authored
      Two #defines, IRQS_ENABLED and IRQS_DISABLED, are added to be used
      when updating paca->soft_enabled. Replace the hardcoded values used
      when updating paca->soft_enabled with the IRQS_(EN|DIS)ABLED
      #defines. No logic change.
      Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      c2e480ba
  19. 13 Dec 2017, 1 commit
    • powerpc/perf: Dereference BHRB entries safely · f41d84dd
      Ravi Bangoria authored
      It's theoretically possible that branch instructions recorded in
      BHRB (Branch History Rolling Buffer) entries have already been
      unmapped by the time they are processed by the kernel. Hence, trying
      to dereference such a memory location will result in a crash. eg:
      
          Unable to handle kernel paging request for data at address 0xd000000019c41764
          Faulting instruction address: 0xc000000000084a14
          NIP [c000000000084a14] branch_target+0x4/0x70
          LR [c0000000000eb828] record_and_restart+0x568/0x5c0
          Call Trace:
          [c0000000000eb3b4] record_and_restart+0xf4/0x5c0 (unreliable)
          [c0000000000ec378] perf_event_interrupt+0x298/0x460
          [c000000000027964] performance_monitor_exception+0x54/0x70
          [c000000000009ba4] performance_monitor_common+0x114/0x120
      
      Fix it by dereferencing the addresses safely.
      
      Fixes: 69123184 ("powerpc/perf: Fix setting of "to" addresses for BHRB")
      Cc: stable@vger.kernel.org # v3.10+
      Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      [mpe: Use probe_kernel_read() which is clearer, tweak change log]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      f41d84dd
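      A sketch of the safe dereference (the helper was later renamed
      copy_from_kernel_nofault(); the wrapper shown is illustrative):

        static unsigned long safe_branch_target(unsigned long addr)
        {
                unsigned int instr;

                /* Fails gracefully if the page has been unmapped meanwhile. */
                if (probe_kernel_read(&instr, (void *)addr, sizeof(instr)))
                        return 0;

                return branch_target(&instr);
        }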
  20. 04 Dec 2017, 1 commit
    • powerpc/perf: Fix oops when grouping different pmu events · 5aa04b3e
      Ravi Bangoria authored
      When a user tries to group an imc (In-Memory Collections) event with
      a normal event, the kernel (sometimes) crashes with the following log:
      
          Faulting instruction address: 0x00000000
          [link register   ] c00000000010ce88 power_check_constraints+0x128/0x980
          ...
          c00000000010e238 power_pmu_event_init+0x268/0x6f0
          c0000000002dc60c perf_try_init_event+0xdc/0x1a0
          c0000000002dce88 perf_event_alloc+0x7b8/0xac0
          c0000000002e92e0 SyS_perf_event_open+0x530/0xda0
          c00000000000b004 system_call+0x38/0xe0
      
      The 'event_base' field of 'struct hw_perf_event' is used as flags for
      normal hw events and as a memory address for imc events. While
      grouping these two types of events, collect_events() tries to
      interpret the imc 'event_base' as a flag, which causes a corruption
      resulting in a crash.
      
      Consider only those events which belong to 'perf_hw_context' in
      collect_events().
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      5aa04b3e
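      A sketch of the guard in collect_events() (simplified; field use
      follows the changelog above):

        /* Only core-PMU (perf_hw_context) events may be collected here:
         * imc events reuse event_base as an address, not as flags. */
        if (group->pmu->task_ctx_nr == perf_hw_context) {
                flags[n] = group->hw.event_base;
                events[n++] = group->hw.config;
        }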
  21. 20 Sep 2017, 1 commit
  22. 29 Aug 2017, 1 commit
    • perf/core, x86: Add PERF_SAMPLE_PHYS_ADDR · fc7ce9c7
      Kan Liang authored
      For understanding how the workload maps to memory channels and hardware
      behavior, it's very important to collect address maps with physical
      addresses. For example, 3D XPoint access can only be found by filtering
      the physical address.
      
      Add a new sample type for the physical address.
      
      perf already has a facility to collect data virtual addresses. This
      patch introduces a function to convert a virtual address to a
      physical address. The function is quite generic and can be extended
      to any architecture as long as a virtual address is provided.
      
       - For kernel direct mapping addresses, virt_to_phys is used to
         convert the virtual addresses to physical addresses.
      
       - For user virtual addresses, __get_user_pages_fast is used to walk
         the page tables for the user physical address.
      
       - This does not work for vmalloc addresses right now. These are not
         resolved, but code to do that could be added.
      
      The new sample type requires collecting the virtual address. The
      virtual address will not be output unless SAMPLE_ADDR is applied.
      
      For security, the physical address can only be exposed to root or a
      privileged user.
      Tested-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: mpe@ellerman.id.au
      Link: http://lkml.kernel.org/r/1503967969-48278-1-git-send-email-kan.liang@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      fc7ce9c7
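      A minimal user-space sketch of requesting physical addresses in
      samples (PERF_SAMPLE_PHYS_ADDR is the flag added here; the rest of
      the setup is illustrative):

        #include <linux/perf_event.h>
        #include <string.h>

        static void setup_phys_addr_sampling(struct perf_event_attr *attr)
        {
                memset(attr, 0, sizeof(*attr));
                attr->size = sizeof(*attr);
                attr->type = PERF_TYPE_HARDWARE;
                attr->config = PERF_COUNT_HW_CACHE_MISSES;
                attr->sample_period = 10000;
                /* SAMPLE_ADDR supplies the virtual address being translated. */
                attr->sample_type = PERF_SAMPLE_ADDR | PERF_SAMPLE_PHYS_ADDR;
        }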
  23. 19 Apr 2017, 2 commits
  24. 09 Mar 2017, 1 commit
  25. 17 Feb 2017, 2 commits
  26. 18 Jan 2017, 1 commit
  27. 25 Dec 2016, 1 commit
  28. 13 Sep 2016, 1 commit
  29. 14 Jul 2016, 1 commit
  30. 14 Jun 2016, 1 commit