1. 02 Apr, 2015 2 commits
  2. 28 Jan, 2015 1 commit
  3. 29 Oct, 2014 1 commit
    • perf/x86/intel: Revert incomplete and undocumented Broadwell client support · 1776b106
      Ingo Molnar committed
      These patches:
      
        86a349a2 ("perf/x86/intel: Add Broadwell core support")
        c46e665f ("perf/x86: Add INST_RETIRED.ALL workarounds")
        fdda3c4a ("perf/x86/intel: Use Broadwell cache event list for Haswell")
      
      introduced magic constants and unexplained changes:
      
        https://lkml.org/lkml/2014/10/28/1128
        https://lkml.org/lkml/2014/10/27/325
        https://lkml.org/lkml/2014/8/27/546
        https://lkml.org/lkml/2014/10/28/546
      
      Peter Zijlstra has attempted to help out, to clean up the mess:
      
        https://lkml.org/lkml/2014/10/28/543
      
      But he has not received helpful and constructive replies, which makes
      me doubt whether it can all be finished in time before v3.18 is
      released.
      
      Despite various review feedback, the author (Andi Kleen) has answered
      only a few of the review questions and has generally been uncooperative,
      only giving replies when prompted repeatedly, and only giving minimal
      answers instead of constructively explaining and helping along the effort.
      
      That kind of behavior is not acceptable.
      
      There's also a boot crash on Intel E5-1630 v3 CPUs reported for another
      commit from Andi Kleen:
      
        e735b9db ("perf/x86/intel/uncore: Add Haswell-EP uncore support")
      
        https://lkml.org/lkml/2014/10/22/730
      
      Which is not yet resolved. The uncore driver is independent in theory,
      but the crash makes me worry about how well all these patches were
      tested and makes me uneasy about the level of intermingling that the
      Broadwell and Haswell code has received by the commits above.
      
      As a first step to resolve the mess, revert the Broadwell client commits
      back to the v3.17 version, before we run out of time and problematic
      code hits a stable upstream kernel.
      
      ( If the Haswell-EP crash is not resolved via a simple fix then we'll have
        to revert the Haswell-EP uncore driver as well. )
      
      The Broadwell client series has to be submitted in a clean fashion, with
      single, well documented changes per patch. If they are submitted in time
      and are accepted during review then they can possibly go into v3.19 but
      will need additional scrutiny due to the rocky history of this patch set.
      
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: eranian@google.com
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1409683455-29168-3-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. 24 Sep, 2014 5 commits
  5. 27 Aug, 2014 1 commit
    • x86: Replace __get_cpu_var uses · 89cbc767
      Christoph Lameter committed
      __get_cpu_var() is used for multiple purposes in the kernel source. One of
      them is address calculation via the form &__get_cpu_var(x).  This calculates
      the address for the instance of the percpu variable of the current processor
      based on an offset.
      
      Other use cases are for storing and retrieving data from the current
      processor's percpu area.  __get_cpu_var() can be used as an lvalue when
      writing data or on the right side of an assignment.
      
      __get_cpu_var() is defined as:
      
      #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
      
      __get_cpu_var() always only does an address determination. However, store
      and retrieve operations could use a segment prefix (or global register on
      other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into a
      percpu area and use optimized assembly code to read and write per cpu
      variables.
      
      This patch converts __get_cpu_var into either an explicit address
      calculation using this_cpu_ptr() or into a use of this_cpu operations that
      use the offset.  Thereby address calculations are avoided and fewer registers
      are used when code is generated.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1 but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processor's instance of a per cpu
      variable.
      
      	DEFINE_PER_CPU(int, y);
      	int x = __get_cpu_var(y);
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(&x, this_cpu_ptr(&y), sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	__this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++;
      
         Converts to
      
      	__this_cpu_inc(y);
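
The conversions listed above can be tried out in a small userspace model. This is only a sketch under stated assumptions: the `model_*` macros, `NR_CPUS`, `current_cpu` and the `bump_*` helpers are hypothetical stand-ins written for this illustration, not the kernel's definitions. A per-cpu variable is modelled as a plain array with one slot per CPU, so the segment-prefix optimisation described in the message is not reproduced; only the source-level equivalence of the old and new forms is shown.

```c
#include <assert.h>

/* Userspace model only: none of these definitions are the kernel's. */
#define NR_CPUS 4

static int current_cpu;	/* hypothetical stand-in for "the running CPU" */

/* A per-cpu variable is modelled as an array with one slot per CPU. */
#define MODEL_DEFINE_PER_CPU(type, name)	type name[NR_CPUS]
#define model_this_cpu_ptr(ptr)			(&(*(ptr))[current_cpu])
#define model_get_cpu_var(var)			(*model_this_cpu_ptr(&(var)))
#define model_this_cpu_read(var)		((var)[current_cpu])
#define model_this_cpu_write(var, val)		((var)[current_cpu] = (val))
#define model_this_cpu_inc(var)			((var)[current_cpu]++)

MODEL_DEFINE_PER_CPU(int, y);

/* Select which CPU's slot the accessors below operate on. */
static void model_set_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	current_cpu = cpu;
}

/* Old style (case 6, before): go through the address of the instance. */
static int bump_old(int cpu)
{
	model_set_cpu(cpu);
	model_get_cpu_var(y)++;
	return model_get_cpu_var(y);
}

/* New style (cases 6 and 3, after): name the variable directly. */
static int bump_new(int cpu)
{
	model_set_cpu(cpu);
	model_this_cpu_inc(y);
	return model_this_cpu_read(y);
}
```

In this model both helpers update the same slot, which is the property the conversion has to preserve.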
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86@kernel.org
      Acked-by: H. Peter Anvin <hpa@linux.intel.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  6. 13 Aug, 2014 2 commits
  7. 16 Jul, 2014 2 commits
  8. 02 Jul, 2014 1 commit
    • perf/x86/intel: ignore CondChgd bit to avoid false NMI handling · b292d7a1
      HATAYAMA Daisuke committed
      Currently, any NMI is falsely handled by the NMI handler of the NMI
      watchdog if the CondChgd bit in the MSR_CORE_PERF_GLOBAL_STATUS MSR is set.
      
      For example, we use an external NMI to make the system panic to get a
      crash dump, but in this case, the external NMI is falsely handled due to
      the issue.
      
      This commit deals with the issue simply by ignoring CondChgd bit.
      
      Here is the explanation in detail.
      
      On x86, the NMI watchdog uses the performance monitoring feature to
      periodically signal an NMI each time a performance counter overflows.
      
      intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI
      handler of NMI watchdog, perf_event_nmi_handler(). It identifies an
      owner of a given NMI by looking at overflow status bits in
      MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it
      handles the given NMI as its own NMI.
      
      The problem is that the intel_pmu_handle_irq() doesn't distinguish
      CondChgd bit from other bits. Unlike the other status bits, CondChgd
      bit doesn't represent overflow status for performance counters. Thus,
      CondChgd bit cannot be thought of as a mark indicating a given NMI is
      NMI watchdog's.
      
      As a result, if CondChgd bit is set, any NMI is falsely handled by the
      NMI handler of NMI watchdog. Also, if type of the falsely handled NMI
      is either NMI_UNKNOWN, NMI_SERR or NMI_IO_CHECK, the corresponding
      action is never performed until CondChgd bit is cleared.
      
      I noticed this behavior on systems with Ivy Bridge processors: Intel
      Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems,
      CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set
      in the beginning at boot. Then the CondChgd bit is immediately cleared
      by next wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain
      0.
      
      On the other hand, on older processors such as Nehalem, Xeon E7540,
      CondChgd bit is not set in the beginning at boot.
      
      I'm not sure about exact behavior of CondChgd bit, in particular when
      this bit is set. Although I read Intel System Programmer's Manual to
      figure out that, the descriptions I found are:
      
        In 18.9.1:
      
        "The MSR_PERF_GLOBAL_STATUS MSR also provides a 'sticky bit' to
         indicate changes to the state of performance monitoring hardware"
      
        In Table 35-2 IA-32 Architectural MSRs
      
        63 CondChg: status bits of this register has changed.
      
      These are different from the behaviour I see on the actual system as I
      explained above.
      
      At least, I think ignoring CondChgd bit should be enough for NMI
      watchdog perspective.
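
The idea of ignoring the bit can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of the patch: `pmu_owns_nmi()` and the macro name are hypothetical, and the only fact taken from the text above is that CondChgd is bit 63 of the global status value and does not indicate a counter overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CondChgd is bit 63 of the global status value (per the table quoted above). */
#define GLOBAL_STATUS_COND_CHGD	(UINT64_C(1) << 63)

static_assert(GLOBAL_STATUS_COND_CHGD == UINT64_C(0x8000000000000000),
	      "CondChgd must be bit 63");

/*
 * Hypothetical helper: decide whether an NMI belongs to the PMU by looking
 * only at bits that represent overflow status. CondChgd is masked out first,
 * so a status value with only CondChgd set is treated as "not ours".
 */
static bool pmu_owns_nmi(uint64_t status)
{
	status &= ~GLOBAL_STATUS_COND_CHGD;
	return status != 0;
}
```

With this check, an external NMI that arrives while only CondChgd is set falls through to the other NMI handlers instead of being consumed by the watchdog path.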
      Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Acked-by: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/20140625.103503.409316067.d.hatayama@jp.fujitsu.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  9. 07 May, 2014 1 commit
  10. 22 Feb, 2014 2 commits
  11. 04 Oct, 2013 1 commit
  12. 23 Sep, 2013 1 commit
  13. 13 Sep, 2013 5 commits
  14. 12 Sep, 2013 1 commit
    • perf/x86: Add constraint for IVB CYCLE_ACTIVITY:CYCLES_LDM_PENDING · 6113af14
      Stephane Eranian committed
      The IvyBridge event CYCLE_ACTIVITY:CYCLES_LDM_PENDING can only
      be measured on counters 0-3 when HT is off. When HT is on, you
      only have counters 0-3.
      
      If you program it on the eight counters for 1s on a 3GHz
      IVB laptop running a noploop, you see:
      
                 2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
                 2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
                 2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
                 2 747 527 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
             3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
             3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
             3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
             3 280 563 608 CYCLE_ACTIVITY:CYCLES_LDM_PENDING
      
      Clearly the last 4 values are bogus.
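
A counter constraint of this kind can be thought of as a bitmask of permitted counters. The sketch below is an assumption-laden illustration, not the kernel's scheduler: `pick_counter()`, `NUM_COUNTERS` and `LDM_PENDING_CNTMSK` are hypothetical names, and the only fact taken from the message is that the event must stay on counters 0-3.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_COUNTERS		8	/* eight counters, as in the run above */
#define LDM_PENDING_CNTMSK	0x0fu	/* bits 0-3: counters 0-3 only */

/*
 * Hypothetical helper: return the lowest-numbered counter that is both
 * permitted by the event's constraint mask and not already in use,
 * or -1 if no such counter exists.
 */
static int pick_counter(uint32_t allowed, uint32_t in_use)
{
	uint32_t candidates;
	int i;

	assert((allowed >> NUM_COUNTERS) == 0);

	candidates = allowed & ~in_use;
	for (i = 0; i < NUM_COUNTERS; i++) {
		if (candidates & (1u << i))
			return i;
	}
	return -1;
}
```

With the constraint applied, the event is refused once counters 0-3 are taken rather than being placed on counters 4-7, where the values were bogus.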
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Cc: peterz@infradead.org
      Cc: ak@linux.intel.com
      Cc: zheng.z.yan@intel.com
      Cc: dhsharp@google.com
      Link: http://lkml.kernel.org/r/20130911152222.GA28761@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  15. 02 Sep, 2013 2 commits
  16. 12 Aug, 2013 1 commit
  17. 27 Jun, 2013 1 commit
    • perf/x86: Fix shared register mutual exclusion enforcement · 2f7f73a5
      Stephane Eranian committed
      This patch fixes a problem with the shared registers mutual
      exclusion code and incremental event scheduling by the
      generic perf_event code.
      
      There was a bug whereby the mutual exclusion on the shared
      registers was not enforced because of incremental scheduling
      abort due to event constraints. As an example on Intel
      Nehalem, consider the following events:
      
      group1= L1D_CACHE_LD:E_STATE,OFFCORE_RESPONSE_0:PF_RFO,L1D_CACHE_LD:I_STATE
      group2= L1D_CACHE_LD:I_STATE
      
      The L1D_CACHE_LD event can only be measured by 2 counters. Yet, there
      are 3 instances here. The first group can be scheduled and is committed.
      Then, the generic code tries to schedule group2 and this fails (because
      there is no more counter to support the 3rd instance of L1D_CACHE_LD).
      But in x86_schedule_events() error path, put_event_constraints() is invoked
      on ALL the events and not just the ones that just failed. That causes the
      "lock" on the shared offcore_response MSR to be released. Yet the first group
      is actually scheduled and is exposed to reprogramming of that shared msr by
      the sibling HT thread. In other words, there is no guarantee on what is
      measured.
      
      This patch fixes the problem by tagging committed events with the
      PERF_X86_EVENT_COMMITTED tag. In the error path of x86_schedule_events(),
      only the events NOT tagged have their constraint released. The tag
      is eventually removed when the event is descheduled.
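
The tagging scheme can be sketched in a few lines. This is an illustration under stated assumptions, not the patch itself: `struct event`, `EVENT_COMMITTED`, `commit_events()` and `release_uncommitted()` are hypothetical names that model the PERF_X86_EVENT_COMMITTED tag and the error path described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EVENT_COMMITTED	0x1u	/* models the PERF_X86_EVENT_COMMITTED tag */

/* Hypothetical, simplified event: a flag word plus a held shared register. */
struct event {
	unsigned int flags;
	bool holds_shared_reg;
};

/* A successful scheduling pass tags every event it placed. */
static void commit_events(struct event *ev, size_t n)
{
	size_t i;

	assert(ev != NULL || n == 0);
	for (i = 0; i < n; i++)
		ev[i].flags |= EVENT_COMMITTED;
}

/*
 * Error path: release the constraint only for events that are NOT tagged,
 * so already-scheduled events keep their "lock" on the shared register.
 * Returns the number of events whose constraint was released.
 */
static size_t release_uncommitted(struct event *ev, size_t n)
{
	size_t i, released = 0;

	assert(ev != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (ev[i].flags & EVENT_COMMITTED)
			continue;
		ev[i].holds_shared_reg = false;
		released++;
	}
	return released;
}
```

In the group1/group2 example above, the three events of group1 would be tagged, so a failure while adding group2 releases only the constraint of the new event.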
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20130620164254.GA3556@quad
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  18. 26 Jun, 2013 1 commit
  19. 19 Jun, 2013 6 commits
  20. 04 May, 2013 1 commit
  21. 16 Apr, 2013 1 commit
    • perf/x86: Fix offcore_rsp valid mask for SNB/IVB · f1923820
      Stephane Eranian committed
      The valid mask for both offcore_response_0 and
      offcore_response_1 was wrong for SNB/SNB-EP,
      IVB/IVB-EP. It was possible to write to a
      reserved bit and cause a GP fault, crashing
      the kernel.
      
      This patch fixes the problem by correctly marking the
      reserved bits in the valid mask for all the processors
      mentioned above.
      
      A distinction between desktop and server parts is introduced
      because bits 24-30 are only available on the server parts.
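
How a valid mask guards the MSR write can be sketched as below. This is a sketch under stated assumptions: `check_offcore_rsp()` and `server_valid_mask()` are hypothetical helpers, and any client mask passed to them is a placeholder supplied by the caller, not the real SNB/IVB value. The only fact taken from the message is that bits 24-30 are valid on server parts only.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Bits 24-30: available on server parts only, per the text above. */
#define SERVER_ONLY_BITS	(UINT64_C(0x7f) << 24)

static_assert(SERVER_ONLY_BITS == UINT64_C(0x7f000000),
	      "bits 24-30 form a 7-bit field");

/*
 * Hypothetical check: reject a config that sets any bit outside the valid
 * mask, so a reserved bit never reaches the MSR write.
 */
static int check_offcore_rsp(uint64_t config, uint64_t valid_mask)
{
	return (config & ~valid_mask) ? -EINVAL : 0;
}

/* Derive a server mask from a client mask by adding the server-only bits. */
static uint64_t server_valid_mask(uint64_t client_valid_mask)
{
	return client_valid_mask | SERVER_ONLY_BITS;
}
```

The same config can therefore be accepted on a server part and rejected on a desktop part, which is why the two need separate masks.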
      
      This version of the patch is just a rebase to the perf/urgent tree
      and should apply to older kernels as well.
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Cc: peterz@infradead.org
      Cc: jolsa@redhat.com
      Cc: gregkh@linuxfoundation.org
      Cc: security@kernel.org
      Cc: ak@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  22. 10 Apr, 2013 1 commit