1. 05 5月, 2018 2 次提交
    • P
      perf/x86: Fix possible Spectre-v1 indexing for x86_pmu::event_map() · 46b1b577
      Peter Zijlstra 提交于
      > arch/x86/events/intel/cstate.c:307 cstate_pmu_event_init() warn: potential spectre issue 'pkg_msr' (local cap)
      > arch/x86/events/intel/core.c:337 intel_pmu_event_map() warn: potential spectre issue 'intel_perfmon_event_map'
      > arch/x86/events/intel/knc.c:122 knc_pmu_event_map() warn: potential spectre issue 'knc_perfmon_event_map'
      > arch/x86/events/intel/p4.c:722 p4_pmu_event_map() warn: potential spectre issue 'p4_general_events'
      > arch/x86/events/intel/p6.c:116 p6_pmu_event_map() warn: potential spectre issue 'p6_perfmon_event_map'
      > arch/x86/events/amd/core.c:132 amd_pmu_event_map() warn: potential spectre issue 'amd_perfmon_event_map'
      
      Userspace controls @attr, sanitize @attr->config before passing it on
      to x86_pmu::event_map().
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      46b1b577
    • P
      perf/x86: Fix possible Spectre-v1 indexing for hw_perf_event cache_* · ef9ee4ad
      Peter Zijlstra 提交于
      > arch/x86/events/core.c:319 set_ext_hw_attr() warn: potential spectre issue 'hw_cache_event_ids[cache_type]' (local cap)
      > arch/x86/events/core.c:319 set_ext_hw_attr() warn: potential spectre issue 'hw_cache_event_ids' (local cap)
      > arch/x86/events/core.c:328 set_ext_hw_attr() warn: potential spectre issue 'hw_cache_extra_regs[cache_type]' (local cap)
      > arch/x86/events/core.c:328 set_ext_hw_attr() warn: potential spectre issue 'hw_cache_extra_regs' (local cap)
      
      Userspace controls @config which contains 3 (byte) fields used for a 3
      dimensional array deref.
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      ef9ee4ad
  2. 19 4月, 2018 1 次提交
    • D
      compat: Move compat_timespec/ timeval to compat_time.h · 0d55303c
      Deepa Dinamani 提交于
      All the current architecture specific defines for these
      are the same. Refactor these common defines to a common
      header file.
      
      The new common linux/compat_time.h is also useful as it
      will eventually be used to hold all the defines that
      are needed for compat time types that support non y2038
      safe types. New architectures need not have to define these
      new types as they will only use new y2038 safe syscalls.
      This file can be deleted after y2038 when we stop supporting
      non y2038 safe syscalls.
      
      The patch also requires an operation similar to:
      
      git grep "asm/compat\.h" | cut -d ":" -f 1 |  xargs -n 1 sed -i -e "s%asm/compat.h%linux/compat.h%g"
      
      Cc: acme@kernel.org
      Cc: benh@kernel.crashing.org
      Cc: borntraeger@de.ibm.com
      Cc: catalin.marinas@arm.com
      Cc: cmetcalf@mellanox.com
      Cc: cohuck@redhat.com
      Cc: davem@davemloft.net
      Cc: deller@gmx.de
      Cc: devel@driverdev.osuosl.org
      Cc: gerald.schaefer@de.ibm.com
      Cc: gregkh@linuxfoundation.org
      Cc: heiko.carstens@de.ibm.com
      Cc: hoeppner@linux.vnet.ibm.com
      Cc: hpa@zytor.com
      Cc: jejb@parisc-linux.org
      Cc: jwi@linux.vnet.ibm.com
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: linux-parisc@vger.kernel.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-s390@vger.kernel.org
      Cc: mark.rutland@arm.com
      Cc: mingo@redhat.com
      Cc: mpe@ellerman.id.au
      Cc: oberpar@linux.vnet.ibm.com
      Cc: oprofile-list@lists.sf.net
      Cc: paulus@samba.org
      Cc: peterz@infradead.org
      Cc: ralf@linux-mips.org
      Cc: rostedt@goodmis.org
      Cc: rric@kernel.org
      Cc: schwidefsky@de.ibm.com
      Cc: sebott@linux.vnet.ibm.com
      Cc: sparclinux@vger.kernel.org
      Cc: sth@linux.vnet.ibm.com
      Cc: ubraun@linux.vnet.ibm.com
      Cc: will.deacon@arm.com
      Cc: x86@kernel.org
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Acked-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Acked-by: NJames Hogan <jhogan@kernel.org>
      Acked-by: NHelge Deller <deller@gmx.de>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      0d55303c
  3. 27 3月, 2018 1 次提交
  4. 20 3月, 2018 2 次提交
  5. 17 3月, 2018 1 次提交
  6. 16 3月, 2018 1 次提交
  7. 12 3月, 2018 1 次提交
    • P
      perf/core: Remove perf_event::group_entry · 8343aae6
      Peter Zijlstra 提交于
      Now that all the grouping is done with RB trees, we no longer need
      group_entry and can replace the whole thing with sibling_list.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Carrillo-Cisneros <davidcc@google.com>
      Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Valery Cherepennikov <valery.cherepennikov@intel.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      8343aae6
  8. 09 3月, 2018 3 次提交
    • K
      perf/x86/intel: Disable userspace RDPMC usage for large PEBS · 1af22eba
      Kan Liang 提交于
      Userspace RDPMC cannot possibly work for large PEBS, which was introduced in:
      
        b8241d20699e ("perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold)")
      
      When the PEBS interrupt threshold is larger than one, there is no way
      to get exact auto-reload times and value for userspace RDPMC.  Disable
      the userspace RDPMC usage when large PEBS is enabled.
      
      The only exception is when the PEBS interrupt threshold is 1, in which
      case user-space RDPMC works well even with auto-reload events.
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Fixes: b8241d20699e ("perf/x86/intel: Implement batched PEBS interrupt handling (large PEBS interrupt threshold)")
      Link: http://lkml.kernel.org/r/1518474035-21006-6-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1af22eba
    • K
      perf/x86: Introduce a ->read() callback in 'struct x86_pmu' · bcfbe5c4
      Kan Liang 提交于
      Auto-reload needs to be specially handled when reading event counts.
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Link: http://lkml.kernel.org/r/1518474035-21006-3-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      bcfbe5c4
    • K
      perf/x86/intel: Fix event update for auto-reload · d31fc13f
      Kan Liang 提交于
      There is a bug when reading event->count with large PEBS enabled.
      
      Here is an example:
      
        # ./read_count
        0x71f0
        0x122c0
        0x1000000001c54
        0x100000001257d
        0x200000000bdc5
      
      In fixed period mode, the auto-reload mechanism could be enabled for
      PEBS events, but the calculation of event->count does not take the
      auto-reload values into account.
      
      Anyone who reads event->count will get the wrong result, e.g x86_pmu_read().
      
      This bug was introduced with the auto-reload mechanism enabled since
      commit:
      
        851559e3 ("perf/x86/intel: Use the PEBS auto reload mechanism when possible")
      
      Introduce intel_pmu_save_and_restart_reload() to calculate the
      event->count only for auto-reload.
      
      Since the counter increments a negative counter value and overflows on
      the sign switch, giving the interval:
      
              [-period, 0]
      
      the difference between two consequtive reads is:
      
       A) value2 - value1;
          when no overflows have happened in between,
       B) (0 - value1) + (value2 - (-period));
          when one overflow happened in between,
       C) (0 - value1) + (n - 1) * (period) + (value2 - (-period));
          when @n overflows happened in between.
      
      Here A) is the obvious difference, B) is the extension to the discrete
      interval, where the first term is to the top of the interval and the
      second term is from the bottom of the next interval and C) the extension
      to multiple intervals, where the middle term is the whole intervals
      covered.
      
      The equation for all cases is:
      
          value2 - value1 + n * period
      
      Previously the event->count is updated right before the sample output.
      But for case A, there is no PEBS record ready. It needs to be specially
      handled.
      
      Remove the auto-reload code from x86_perf_event_set_period() since
      we'll not longer call that function in this case.
      
      Based-on-code-from: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Fixes: 851559e3 ("perf/x86/intel: Use the PEBS auto reload mechanism when possible")
      Link: http://lkml.kernel.org/r/1518474035-21006-2-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d31fc13f
  9. 17 12月, 2017 1 次提交
  10. 25 10月, 2017 1 次提交
    • M
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns... · 6aa7de05
      Mark Rutland 提交于
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
      
      Please do not apply this to mainline directly, instead please re-run the
      coccinelle script shown below and apply its output.
      
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't harmful, and changing them results in
      churn.
      
      However, for some features, the read/write distinction is critical to
      correct operation. To distinguish these cases, separate read/write
      accessors must be used. This patch migrates (most) remaining
      ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
      coccinelle script:
      
      ----
      // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
      // WRITE_ONCE()
      
      // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
      
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6aa7de05
  11. 24 10月, 2017 1 次提交
  12. 29 8月, 2017 2 次提交
  13. 25 8月, 2017 1 次提交
    • A
      perf/x86: Export some PMU attributes in caps/ directory · b00233b5
      Andi Kleen 提交于
      It can be difficult to figure out for user programs what features
      the x86 CPU PMU driver actually supports. Currently it requires
      grepping in dmesg, but dmesg is not always available.
      
      This adds a caps directory to /sys/bus/event_source/devices/cpu/,
      similar to the caps already used on intel_pt, which can be used to
      discover the available capabilities cleanly.
      
      Three capabilities are defined:
      
       - pmu_name:	Underlying CPU name known to the driver
       - max_precise:	Max precise level supported
       - branches:	Known depth of LBR.
      
      Example:
      
        % grep . /sys/bus/event_source/devices/cpu/caps/*
        /sys/bus/event_source/devices/cpu/caps/branches:32
        /sys/bus/event_source/devices/cpu/caps/max_precise:3
        /sys/bus/event_source/devices/cpu/caps/pmu_name:skylake
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170822185201.9261-3-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b00233b5
  14. 10 8月, 2017 1 次提交
  15. 20 7月, 2017 1 次提交
    • A
      perf/x86: Shut up false-positive -Wmaybe-uninitialized warning · 11d8b058
      Arnd Bergmann 提交于
      The intialization function checks for various failure scenarios, but
      unfortunately the compiler gets a little confused about the possible
      combinations, leading to a false-positive build warning when
      -Wmaybe-uninitialized is set:
      
        arch/x86/events/core.c: In function ‘init_hw_perf_events’:
        arch/x86/events/core.c:264:3: warning: ‘reg_fail’ may be used uninitialized in this function [-Wmaybe-uninitialized]
        arch/x86/events/core.c:264:3: warning: ‘val_fail’ may be used uninitialized in this function [-Wmaybe-uninitialized]
           pr_err(FW_BUG "the BIOS has corrupted hw-PMU resources (MSR %x is %Lx)\n",
      
      We can't actually run into this case, so this shuts up the warning
      by initializing the variables to a known-invalid state.
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170719125310.2487451-2-arnd@arndb.de
      Link: https://patchwork.kernel.org/patch/9392595/Signed-off-by: NIngo Molnar <mingo@kernel.org>
      11d8b058
  16. 08 6月, 2017 1 次提交
  17. 05 6月, 2017 1 次提交
    • A
      x86/mm: Rework lazy TLB to track the actual loaded mm · 3d28ebce
      Andy Lutomirski 提交于
      Lazy TLB state is currently managed in a rather baroque manner.
      AFAICT, there are three possible states:
      
       - Non-lazy.  This means that we're running a user thread or a
         kernel thread that has called use_mm().  current->mm ==
         current->active_mm == cpu_tlbstate.active_mm and
         cpu_tlbstate.state == TLBSTATE_OK.
      
       - Lazy with user mm.  We're running a kernel thread without an mm
         and we're borrowing an mm_struct.  We have current->mm == NULL,
         current->active_mm == cpu_tlbstate.active_mm, cpu_tlbstate.state
         != TLBSTATE_OK (i.e. TLBSTATE_LAZY or 0).  The current cpu is set
         in mm_cpumask(current->active_mm).  CR3 points to
         current->active_mm->pgd.  The TLB is up to date.
      
       - Lazy with init_mm.  This happens when we call leave_mm().  We
         have current->mm == NULL, current->active_mm ==
         cpu_tlbstate.active_mm, but that mm is only relelvant insofar as
         the scheduler is tracking it for refcounting.  cpu_tlbstate.state
         != TLBSTATE_OK.  The current cpu is clear in
         mm_cpumask(current->active_mm).  CR3 points to swapper_pg_dir,
         i.e. init_mm->pgd.
      
      This patch simplifies the situation.  Other than perf, x86 stops
      caring about current->active_mm at all.  We have
      cpu_tlbstate.loaded_mm pointing to the mm that CR3 references.  The
      TLB is always up to date for that mm.  leave_mm() just switches us
      to init_mm.  There are no longer any special cases for mm_cpumask,
      and switch_mm() switches mms without worrying about laziness.
      
      After this patch, cpu_tlbstate.state serves only to tell the TLB
      flush code whether it may switch to init_mm instead of doing a
      normal flush.
      
      This makes fairly extensive changes to xen_exit_mmap(), which used
      to look a bit like black magic.
      
      Perf is unchanged.  With or without this change, perf may behave a bit
      erratically if it tries to read user memory in kernel thread context.
      We should build on this patch to teach perf to never look at user
      memory when cpu_tlbstate.loaded_mm != current->mm.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      3d28ebce
  18. 26 5月, 2017 1 次提交
  19. 23 5月, 2017 1 次提交
    • K
      perf/x86: Add sysfs entry to freeze counters on SMI · 6089327f
      Kan Liang 提交于
      Currently, the SMIs are visible to all performance counters, because
      many users want to measure everything including SMIs. But in some
      cases, the SMI cycles should not be counted - for example, to calculate
      the cost of an SMI itself. So a knob is needed.
      
      When setting FREEZE_WHILE_SMM bit in IA32_DEBUGCTL, all performance
      counters will be effected. There is no way to do per-counter freeze
      on SMI. So it should not use the per-event interface (e.g. ioctl or
      event attribute) to set FREEZE_WHILE_SMM bit.
      
      Adds sysfs entry /sys/device/cpu/freeze_on_smi to set FREEZE_WHILE_SMM
      bit in IA32_DEBUGCTL. When set, freezes perfmon and trace messages
      while in SMM.
      
      Value has to be 0 or 1. It will be applied to all processors.
      
      Also serialize the entire setting so we don't get multiple concurrent
      threads trying to update to different values.
      Signed-off-by: NKan Liang <Kan.liang@intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: bp@alien8.de
      Cc: jolsa@kernel.org
      Link: http://lkml.kernel.org/r/1494600673-244667-1-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6089327f
  20. 15 5月, 2017 1 次提交
    • P
      x86/tsc: Remodel cyc2ns to use seqcount_latch() · 59eaef78
      Peter Zijlstra 提交于
      Replace the custom multi-value scheme with the more regular
      seqcount_latch() scheme. Along with scrapping a lot of lines, the latch
      scheme is better documented and used in more places.
      
      The immediate benefit however is not being limited on the update side.
      The current code has a limit where the writers block which is hit by
      future changes.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      59eaef78
  21. 23 3月, 2017 1 次提交
  22. 17 3月, 2017 2 次提交
  23. 02 3月, 2017 2 次提交
  24. 14 1月, 2017 1 次提交
  25. 25 12月, 2016 1 次提交
  26. 11 12月, 2016 1 次提交
  27. 10 12月, 2016 1 次提交
    • T
      x86/ldt: Make all size computations unsigned · 990e9dc3
      Thomas Gleixner 提交于
      ldt->size can never be negative. The helper functions take 'unsigned int'
      arguments which are assigned from ldt->size. The related user space
      user_desc struct member entry_number is unsigned as well.
      
      But ldt->size itself and a few local variables which are related to
      ldt->size are type 'int' which makes no sense whatsoever and results in
      typecasts which make the eyes bleed.
      
      Clean it up and convert everything which is related to ldt->size to
      unsigned it.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      990e9dc3
  28. 06 12月, 2016 1 次提交
    • P
      perf/x86: Fix full width counter, counter overflow · 7f612a7f
      Peter Zijlstra (Intel) 提交于
      Lukasz reported that perf stat counters overflow handling is broken on KNL/SLM.
      
      Both these parts have full_width_write set, and that does indeed have
      a problem. In order to deal with counter wrap, we must sample the
      counter at at least half the counter period (see also the sampling
      theorem) such that we can unambiguously reconstruct the count.
      
      However commit:
      
        069e0c3c ("perf/x86/intel: Support full width counting")
      
      sets the sampling interval to the full period, not half.
      
      Fixing that exposes another issue, in that we must not sign extend the
      delta value when we shift it right; the counter cannot have
      decremented after all.
      
      With both these issues fixed, counter overflow functions correctly
      again.
      Reported-by: NLukasz Odzioba <lukasz.odzioba@intel.com>
      Tested-by: NLiang, Kan <kan.liang@intel.com>
      Tested-by: NOdzioba, Lukasz <lukasz.odzioba@intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: stable@vger.kernel.org
      Fixes: 069e0c3c ("perf/x86/intel: Support full width counting")
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      7f612a7f
  29. 22 11月, 2016 1 次提交
    • J
      perf/x86: Restore TASK_SIZE check on frame pointer · ae31fe51
      Johannes Weiner 提交于
      The following commit:
      
        75925e1a ("perf/x86: Optimize stack walk user accesses")
      
      ... switched from copy_from_user_nmi() to __copy_from_user_nmi() with a manual
      access_ok() check.
      
      Unfortunately, copy_from_user_nmi() does an explicit check against TASK_SIZE,
      whereas the access_ok() uses whatever the current address limit of the task is.
      
      We are getting NMIs when __probe_kernel_read() has switched to KERNEL_DS, and
      then see vmalloc faults when we access what looks like pointers into vmalloc
      space:
      
        [] WARNING: CPU: 3 PID: 3685731 at arch/x86/mm/fault.c:435 vmalloc_fault+0x289/0x290
        [] CPU: 3 PID: 3685731 Comm: sh Tainted: G        W       4.6.0-5_fbk1_223_gdbf0f40 #1
        [] Call Trace:
        []  <NMI>  [<ffffffff814717d1>] dump_stack+0x4d/0x6c
        []  [<ffffffff81076e43>] __warn+0xd3/0xf0
        []  [<ffffffff81076f2d>] warn_slowpath_null+0x1d/0x20
        []  [<ffffffff8104a899>] vmalloc_fault+0x289/0x290
        []  [<ffffffff8104b5a0>] __do_page_fault+0x330/0x490
        []  [<ffffffff8104b70c>] do_page_fault+0xc/0x10
        []  [<ffffffff81794e82>] page_fault+0x22/0x30
        []  [<ffffffff81006280>] ? perf_callchain_user+0x100/0x2a0
        []  [<ffffffff8115124f>] get_perf_callchain+0x17f/0x190
        []  [<ffffffff811512c7>] perf_callchain+0x67/0x80
        []  [<ffffffff8114e750>] perf_prepare_sample+0x2a0/0x370
        []  [<ffffffff8114e840>] perf_event_output+0x20/0x60
        []  [<ffffffff8114aee7>] ? perf_event_update_userpage+0xc7/0x130
        []  [<ffffffff8114ea01>] __perf_event_overflow+0x181/0x1d0
        []  [<ffffffff8114f484>] perf_event_overflow+0x14/0x20
        []  [<ffffffff8100a6e3>] intel_pmu_handle_irq+0x1d3/0x490
        []  [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10
        []  [<ffffffff81197191>] ? vunmap_page_range+0x1a1/0x2f0
        []  [<ffffffff811972f1>] ? unmap_kernel_range_noflush+0x11/0x20
        []  [<ffffffff814f2056>] ? ghes_copy_tofrom_phys+0x116/0x1f0
        []  [<ffffffff81040d1d>] ? x2apic_send_IPI_self+0x1d/0x20
        []  [<ffffffff8100411d>] perf_event_nmi_handler+0x2d/0x50
        []  [<ffffffff8101ea31>] nmi_handle+0x61/0x110
        []  [<ffffffff8101ef94>] default_do_nmi+0x44/0x110
        []  [<ffffffff8101f13b>] do_nmi+0xdb/0x150
        []  [<ffffffff81795187>] end_repeat_nmi+0x1a/0x1e
        []  [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10
        []  [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10
        []  [<ffffffff8147daf7>] ? copy_user_enhanced_fast_string+0x7/0x10
        []  <<EOE>>  <IRQ>  [<ffffffff8115d05e>] ? __probe_kernel_read+0x3e/0xa0
      
      Fix this by moving the valid_user_frame() check to before the uaccess
      that loads the return address and the pointer to the next frame.
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Fixes: 75925e1a ("perf/x86: Optimize stack walk user accesses")
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      ae31fe51
  30. 20 9月, 2016 1 次提交
  31. 15 9月, 2016 1 次提交
    • J
      x86/dumpstack: Add get_stack_info() interface · cb76c939
      Josh Poimboeuf 提交于
      valid_stack_ptr() is buggy: it assumes that all stacks are of size
      THREAD_SIZE, which is not true for exception stacks.  So the
      walk_stack() callbacks will need to know the location of the beginning
      of the stack as well as the end.
      
      Another issue is that in general the various features of a stack (type,
      size, next stack pointer, description string) are scattered around in
      various places throughout the stack dump code.
      
      Encapsulate all that information in a single place with a new stack_info
      struct and a get_stack_info() interface.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nilay Vaish <nilayvaish@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/8164dd0db96b7e6a279fa17ae5e6dc375eecb4a9.1473905218.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cb76c939
  32. 08 9月, 2016 1 次提交
  33. 10 8月, 2016 1 次提交
    • P
      perf/x86: Ensure perf_sched_cb_{inc,dec}() is only called from pmu::{add,del}() · 68f7082f
      Peter Zijlstra 提交于
      Currently perf_sched_cb_{inc,dec}() are called from
      pmu::{start,stop}(), which has the problem that this can happen from
      NMI context, this is making it hard to optimize perf_pmu_sched_task().
      
      Furthermore, we really only need this accounting on pmu::{add,del}(),
      so doing it from pmu::{start,stop}() is doing more work than we really
      need.
      
      Introduce x86_pmu::{add,del}() and wire up the LBR and PEBS.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      68f7082f