1. 02 September 2013: 3 commits
  2. 16 August 2013: 3 commits
  3. 13 August 2013: 1 commit
    • x86, microcode, AMD: Make cpu_has_amd_erratum() use the correct struct cpuinfo_x86 · 8c6b79bb
      Committed by Torsten Kaiser
      cpu_has_amd_erratum() is buggy, because it uses the per-cpu cpu_info
      before it is filled by smp_store_boot_cpu_info() / smp_store_cpu_info().
      
      If early microcode loading is enabled, its collect_cpu_info_amd_early()
      will fill ->x86, so the fallback to boot_cpu_data is not used. But
      ->x86_vendor was not filled and is still X86_VENDOR_INTEL, resulting in
      no errata fixes getting applied, and my system hangs on boot.
      
      Using cpu_info in cpu_has_amd_erratum() is wrong anyway: its only
      caller init_amd() will have a struct cpuinfo_x86 as parameter and the
      set_cpu_bug() that is controlled by cpu_has_amd_erratum() also only uses
      that struct.
      
      So pass the struct cpuinfo_x86 from init_amd() to cpu_has_amd_erratum()
      and the broken fallback can be dropped.
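      A minimal, self-contained user-space sketch of the resulting shape (simplified,
      hypothetical struct fields and family value; not the upstream diff): the erratum
      check inspects the cpuinfo_x86 it is handed rather than a per-cpu structure that
      may not be initialized yet.

      #include <stdbool.h>
      #include <stdio.h>

      #define X86_VENDOR_INTEL 0
      #define X86_VENDOR_AMD   2

      struct cpuinfo_x86 { int x86_vendor; int x86; };	/* reduced for illustration */

      /* Check the struct handed in by the caller (init_amd() upstream),
       * never a global that early boot code has not filled in yet. */
      static bool cpu_has_amd_erratum(struct cpuinfo_x86 *cpu, int erratum_family)
      {
      	if (cpu->x86_vendor != X86_VENDOR_AMD)
      		return false;
      	return cpu->x86 == erratum_family;
      }

      int main(void)
      {
      	struct cpuinfo_x86 c = { .x86_vendor = X86_VENDOR_AMD, .x86 = 0x10 };

      	printf("erratum applies: %d\n", cpu_has_amd_erratum(&c, 0x10));
      	return 0;
      }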
      
      [ Boris: Drop WARN_ON() since we're called only from init_amd() ]
      Signed-off-by: Torsten Kaiser <just.for.lkml@googlemail.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
  4. 12 August 2013: 1 commit
  5. 07 August 2013: 1 commit
  6. 05 August 2013: 1 commit
    • perf/x86: Fix intel QPI uncore event definitions · c9601247
      Committed by Vince Weaver
      John McCalpin reports that the "drs_data" and "ncb_data" QPI
      uncore events are missing the "extra bit" and always return zero
      values unless the bit is properly set.
      
      More details from him:
      
       According to the Xeon E5-2600 Product Family Uncore Performance
       Monitoring Guide, Table 2-94, about 1/2 of the QPI Link Layer events
       (including the ones that "perf" calls "drs_data" and "ncb_data") require
       that the "extra bit" be set.
      
       This was confusing for a while -- a note at the bottom of page 94 says
       that the "extra bit" is bit 16 of the control register.
       Unfortunately, Table 2-86 clearly says that bit 16 is reserved and must
       be zero.  Looking around a bit, I found that bit 21 appears to be the
       correct "extra bit", and further investigation shows that "perf" actually
       agrees with me:
      	[root@c560-003.stampede]# cat /sys/bus/event_source/devices/uncore_qpi_0/format/event
      	config:0-7,21
      
       So the command
      	# perf -e "uncore_qpi_0/event=drs_data/"
       Is the same as
      	# perf -e "uncore_qpi_0/event=0x02,umask=0x08/"
       While it should be
      	# perf -e "uncore_qpi_0/event=0x102,umask=0x08/"
      
       I confirmed that this last version gives results that agree with the
       amount of data that I expected the STREAM benchmark to move across the QPI
       link in the second (cross-chip) test of the original script.
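       To illustrate why event=0x102 differs from event=0x02, here is a small stand-alone
       sketch (not perf code) of how a format spec like "config:0-7,21" scatters the bits
       of the event value into the raw config register, with event bit 8 landing in config
       bit 21 (the "extra bit"):

       #include <stdint.h>
       #include <stdio.h>

       /* Scatter the low bits of 'event' into the config-register bit
        * positions listed in cfg_bits[] (mimicking "config:0-7,21"). */
       static uint64_t scatter_event_bits(uint64_t event, const int *cfg_bits, int n)
       {
       	uint64_t config = 0;
       	int i;

       	for (i = 0; i < n; i++)
       		if (event & (1ULL << i))
       			config |= 1ULL << cfg_bits[i];
       	return config;
       }

       int main(void)
       {
       	/* event bits 0-7 -> config bits 0-7, event bit 8 -> config bit 21 */
       	static const int cfg_bits[] = { 0, 1, 2, 3, 4, 5, 6, 7, 21 };

       	printf("event=0x02  -> config=0x%llx\n",
       	       (unsigned long long)scatter_event_bits(0x02, cfg_bits, 9));
       	printf("event=0x102 -> config=0x%llx (extra bit 21 set)\n",
       	       (unsigned long long)scatter_event_bits(0x102, cfg_bits, 9));
       	return 0;
       }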
      Reported-by: John McCalpin <mccalpin@tacc.utexas.edu>
      Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
      Cc: zheng.z.yan@intel.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: <stable@kernel.org>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1308021037280.26119@vincent-weaver-1.um.maine.edu
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  7. 30 July 2013: 1 commit
  8. 23 July 2013: 1 commit
  9. 15 July 2013: 1 commit
    • x86: delete __cpuinit usage from all x86 files · 148f9bb8
      Committed by Paul Gortmaker
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      Note that some harmless section mismatch warnings may result, since
      notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
      and are flagged as __cpuinit -- so if we remove the __cpuinit from the
      arch specific callers, we will also get section mismatch warnings.
      As an intermediate step, we intend to turn the linux/init.h cpuinit
      content into no-ops as early as possible, since that will get rid
      of these warnings.  In any case, they are temporary and harmless.
      
      This removes all the arch/x86 uses of the __cpuinit macros from
      all C files.  x86 only had the one __CPUINIT used in assembly files,
      and it wasn't paired off with a .previous or a __FINIT, so we can
      delete it directly w/o any corresponding additional change there.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
  10. 05 July 2013: 1 commit
  11. 04 July 2013: 1 commit
  12. 28 June 2013: 1 commit
  13. 27 June 2013: 2 commits
  14. 26 June 2013: 4 commits
    • perf/x86/intel: Support full width counting · 069e0c3c
      Committed by Andi Kleen
      Recent Intel CPUs like Haswell and IvyBridge have a new
      alternative MSR range for perfctrs that allows writing the full
      counter width. Enable this range if the hardware reports it
      using a new capability bit.
      
      Currently the perf code queries CPUID to get the counter width,
      and sign-extends the counter values as needed. The traditional
      PERFCTR MSRs always limit writes to 32 bits, even though the
      counter is internally larger (usually 48 bits on recent CPUs).

      When the new capability is set, use the alternative range, which
      does not have these restrictions.

      This lowers the overhead of perf stat slightly because it has to
      take fewer interrupts to accumulate the counter value. On Haswell
      it also avoids some problems with TSX aborting when the end of
      the counter range is reached.
      
      ( See the patch "perf/x86/intel: Avoid checkpointed counters
        causing excessive TSX aborts" for more details. )
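      To illustrate the sign-extension step mentioned above, here is a minimal
      user-space sketch (assuming a 48-bit counter width; not the perf code itself):

      #include <stdint.h>
      #include <stdio.h>

      /* Sign-extend an N-bit hardware counter value to 64 bits, as a perf
       * driver has to do when the counter is narrower than the register. */
      static int64_t sign_extend_counter(uint64_t raw, unsigned int width)
      {
      	unsigned int shift = 64 - width;

      	return (int64_t)(raw << shift) >> shift;
      }

      int main(void)
      {
      	uint64_t raw = 0xFFFF00000000ULL;	/* a 48-bit counter close to wrapping */

      	printf("raw=0x%llx  sign-extended=%lld\n",
      	       (unsigned long long)raw,
      	       (long long)sign_extend_counter(raw, 48));
      	return 0;
      }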
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Reviewed-by: Stephane Eranian <eranian@google.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1372173153-20215-1-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86, asm, cleanup: Replace open-coded control register values with symbolic · a3d7b7dd
      Committed by H. Peter Anvin
      Clean up unnecessary open-coded control register values.
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/n/tip-um7za1nzf6brb17o0h4om6e3@git.kernel.org
    • mce: acpi/apei: Add comments to clarify usage of the various bitfields in the MCA subsystem · 0644414e
      Committed by Naveen N. Rao
      There is some confusion about the 'mce_poll_banks' and 'mce_banks_owned'
      per-cpu bitmaps.  Provide comments so that we all know exactly what these
      are used for, and why.
      Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
    • x86: Fix /proc/mtrr with base/size more than 44bits · d5c78673
      Committed by Yinghai Lu
      On one system where the MTRR range is more than 44 bits, dmesg shows:
      [    0.000000] MTRR default type: write-back
      [    0.000000] MTRR fixed ranges enabled:
      [    0.000000]   00000-9FFFF write-back
      [    0.000000]   A0000-BFFFF uncachable
      [    0.000000]   C0000-DFFFF write-through
      [    0.000000]   E0000-FFFFF write-protect
      [    0.000000] MTRR variable ranges enabled:
      [    0.000000]   0 [000080000000-0000FFFFFFFF] mask 3FFF80000000 uncachable
      [    0.000000]   1 [380000000000-38FFFFFFFFFF] mask 3F0000000000 uncachable
      [    0.000000]   2 [000099000000-000099FFFFFF] mask 3FFFFF000000 write-through
      [    0.000000]   3 [00009A000000-00009AFFFFFF] mask 3FFFFF000000 write-through
      [    0.000000]   4 [381FFA000000-381FFBFFFFFF] mask 3FFFFE000000 write-through
      [    0.000000]   5 [381FFC000000-381FFC0FFFFF] mask 3FFFFFF00000 write-through
      [    0.000000]   6 [0000AD000000-0000ADFFFFFF] mask 3FFFFF000000 write-through
      [    0.000000]   7 [0000BD000000-0000BDFFFFFF] mask 3FFFFF000000 write-through
      [    0.000000]   8 disabled
      [    0.000000]   9 disabled
      
      but /proc/mtrr reports wrong values:
      reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
      reg01: base=0x80000000000 (8388608MB), size=1048576MB, count=1: uncachable
      reg02: base=0x099000000 ( 2448MB), size=   16MB, count=1: write-through
      reg03: base=0x09a000000 ( 2464MB), size=   16MB, count=1: write-through
      reg04: base=0x81ffa000000 (8519584MB), size=   32MB, count=1: write-through
      reg05: base=0x81ffc000000 (8519616MB), size=    1MB, count=1: write-through
      reg06: base=0x0ad000000 ( 2768MB), size=   16MB, count=1: write-through
      reg07: base=0x0bd000000 ( 3024MB), size=   16MB, count=1: write-through
      reg08: base=0x09b000000 ( 2480MB), size=   16MB, count=1: write-combining
      
      so bit 44 and bit 45 get cut off.
      
      We have problems in arch/x86/kernel/cpu/mtrr/generic.c::generic_get_mtrr().
      1. For base, we fail to cast base_lo to 64 bits before shifting.
      Fix that by adding a u64 cast.
      
      2. For size, it can only handle 44 bits, i.e. 32 bits + page_shift.
      Fix that by using a 64-bit mask instead of the 32-bit mask_lo, so the range
      can be more than 44 bits.
      At the same time, we need to update size_or_mask for old CPUs that do
      support cpuid 0x80000008 to get phys_addr: the high 32 bits need to be set
      to all 1s, otherwise we will not get the correct size for them.
      
      Also fix mtrr_add_page: it should check base and (base + size - 1)
      instead of base and size, as base and size could each be small but
      base + size could be big enough to be out of bounds. We can
      use boot_cpu_data.x86_phys_bits directly to avoid size_or_mask.
      
      So when are we going to have a size of more than 44 bits? That is 16 TiB.
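      The missing 64-bit cast described in item 1 above is the classic truncating-shift
      pitfall; a stand-alone illustration with hypothetical values (not the mtrr code itself):

      #include <stdint.h>
      #include <stdio.h>

      int main(void)
      {
      	uint32_t lo32 = 0x00099000;	/* hypothetical 32-bit register half */
      	unsigned int shift = 20;	/* hypothetical shift into a 64-bit result */

      	/* The shift is evaluated in 32-bit arithmetic, so bits above bit 31
      	 * are silently lost before the result widens to 64 bits. */
      	uint64_t wrong = lo32 << shift;

      	/* Casting to a 64-bit type first keeps all the bits. */
      	uint64_t right = (uint64_t)lo32 << shift;

      	printf("wrong=0x%llx right=0x%llx\n",
      	       (unsigned long long)wrong, (unsigned long long)right);
      	return 0;
      }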
      
      After the patch we have the right output:
      reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
      reg01: base=0x380000000000 (58720256MB), size=1048576MB, count=1: uncachable
      reg02: base=0x099000000 ( 2448MB), size=   16MB, count=1: write-through
      reg03: base=0x09a000000 ( 2464MB), size=   16MB, count=1: write-through
      reg04: base=0x381ffa000000 (58851232MB), size=   32MB, count=1: write-through
      reg05: base=0x381ffc000000 (58851264MB), size=    1MB, count=1: write-through
      reg06: base=0x0ad000000 ( 2768MB), size=   16MB, count=1: write-through
      reg07: base=0x0bd000000 ( 3024MB), size=   16MB, count=1: write-through
      reg08: base=0x09b000000 ( 2480MB), size=   16MB, count=1: write-combining
      
      -v2: simplify the checking in mtrr_add_page, as suggested by hpa.
      
      [ hpa: This probably wants to go into -stable only after having sat in
        mainline for a bit.  It is not a regression. ]
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1371162815-29931-1-git-send-email-yinghai@kernel.org
      Cc: <stable@vger.kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  15. 23 June 2013: 1 commit
    • perf: Drop sample rate when sampling is too slow · 14c63f17
      Committed by Dave Hansen
      This patch keeps track of how long perf's NMI handler is taking,
      and also calculates how many samples perf can take a second.  If
      the sample length times the expected max number of samples
      exceeds a configurable threshold, it drops the sample rate.
      
      This way, we don't have a runaway sampling process eating up the
      CPU.
      
      This patch can tend to drop the sample rate down to a level where
      perf doesn't work very well.  *BUT* the alternative is that my
      system hangs because it spends all of its time handling NMIs.
      
      I'll take a busted performance tool over an entire system that's
      busted and undebuggable any day.
      
      BTW, my suspicion is that there's still an underlying bug here.
      Using the HPET instead of the TSC is definitely a contributing
      factor, but I suspect there are some other things going on.
      But, I can't go dig down on a bug like that with my machine
      hanging all the time.
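      A minimal sketch of the throttling idea described above (illustrative names and
      threshold; the kernel's actual sysctl and accounting differ):

      #include <stdint.h>
      #include <stdio.h>

      /* If the average NMI handler time multiplied by the expected samples per
       * second exceeds the allowed percentage of CPU time, cut the rate. */
      static unsigned int maybe_throttle(uint64_t avg_handler_ns,
      				   unsigned int sample_rate_hz,
      				   unsigned int allowed_cpu_pct)
      {
      	uint64_t ns_per_sec = 1000000000ULL;
      	uint64_t spent = avg_handler_ns * sample_rate_hz;

      	if (spent * 100 > ns_per_sec * allowed_cpu_pct)
      		sample_rate_hz /= 2;	/* drop the sample rate */

      	return sample_rate_hz;
      }

      int main(void)
      {
      	/* 50 us per NMI at 100000 samples/s is far beyond 25% CPU: throttle */
      	printf("new rate = %u Hz\n", maybe_throttle(50000, 100000, 25));
      	return 0;
      }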
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org
      Cc: acme@ghostprotocols.net
      Cc: Dave Hansen <dave@sr71.net>
      [ Prettified it a bit. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  16. 21 June 2013: 6 commits
    • x86, trace: Add irq vector tracepoints · cf910e83
      Committed by Seiji Aguchi
      [Purpose of this patch]
      
      As Vaibhav explained in the thread below, tracepoints for irq vectors
      are useful.
      
      http://www.spinics.net/lists/mm-commits/msg85707.html
      
      <snip>
      The current interrupt traces from irq_handler_entry and irq_handler_exit
      provide when an interrupt is handled.  They provide good data about when
      the system has switched to kernel space and how it affects the currently
      running processes.
      
      There are some IRQ vectors which trigger the system into kernel space,
      which are not handled in generic IRQ handlers.  Tracing such events gives
      us the information about IRQ interaction with other system events.
      
      The trace also tells where the system is spending its time.  We want to
      know which cores are handling interrupts and how they are affecting other
      processes in the system.  Also, the trace provides information about when
      the cores are idle and which interrupts are changing that state.
      <snip>
      
      On the other hand, my use case is tracing just the local timer event and
      getting the value of the instruction pointer.
      
      I previously suggested adding an argument to the local timer event to get the
      instruction pointer, but it can also be obtained with an external module such as
      systemtap. So, I don't need to add any argument to the irq vector tracepoints now.
      
      [Patch Description]
      
      Vaibhav's patch shared one tracepoint, irq_vector_entry/irq_vector_exit, across all events.
      But the use case above is to trace a specific irq vector rather than all events,
      and in that case we are concerned about the overhead due to unwanted events.
      
      So, add the following tracepoints instead of introducing irq_vector_entry/exit,
      so that we can enable them independently:
         - local_timer_vector
         - reschedule_vector
         - call_function_vector
         - call_function_single_vector
         - irq_work_entry_vector
         - error_apic_vector
         - thermal_apic_vector
         - threshold_apic_vector
         - spurious_apic_vector
         - x86_platform_ipi_vector
      
      Also, introduce logic to switch the IDT at enable/disable time so that the time
      penalty is zero when the tracepoints are disabled. Detailed explanations follow.
       - Create trace irq handlers with entering_irq()/exiting_irq().
       - Create a new IDT, trace_idt_table, at boot time by adding logic to
         _set_gate(). It is just a copy of the original IDT.
       - Register the new handlers for the tracepoints in the new IDT by introducing
         macros to alloc_intr_gate(), called when the irq_vector handlers are registered.
       - Add a check of whether irq vector tracing is on/off to load_current_idt().
         This has to be done after the debug check, for these reasons:
         - Switching to the debug IDT may be triggered while tracing is enabled.
         - On the other hand, switching to the trace IDT is triggered only when debugging
           is disabled.
      
      In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being
      used for other purposes.
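      A user-space model of the ordering constraint described above (stub functions and
      hypothetical flags; not the upstream load_current_idt()):

      #include <stdbool.h>
      #include <stdio.h>

      static bool debug_idt_enabled;
      static bool trace_idt_enabled;

      /* The debug check comes first: the debug IDT may be needed while tracing
       * is enabled, whereas the trace IDT is only used when debugging is off. */
      static void load_current_idt(void)
      {
      	if (debug_idt_enabled)
      		puts("loading debug IDT");
      	else if (trace_idt_enabled)
      		puts("loading trace IDT");
      	else
      		puts("loading normal IDT");
      }

      int main(void)
      {
      	debug_idt_enabled = true;
      	trace_idt_enabled = true;
      	load_current_idt();	/* debug wins even though tracing is on */
      	return 0;
      }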
      Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
      Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
    • x86: Rename variables for debugging · 629f4f9d
      Committed by Seiji Aguchi
      Rename the debugging variables to describe their meaning precisely.
      
      Also, introduce a generic way to switch the IDT by checking the current state
      (debug on/off).
      Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
      Link: http://lkml.kernel.org/r/51C323A8.7050905@hds.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
    • x86, trace: Introduce entering/exiting_irq() · eddc0e92
      Committed by Seiji Aguchi
      When implementing tracepoints in interrupt handlers, if the tracepoints are
      simply added in the performance-sensitive path of the interrupt handlers,
      they may cause a performance problem due to the time penalty.
      
      To solve the problem, one idea is to prepare non-trace and trace irq handlers
      and switch between their IDTs at enable/disable time.
      
      So, let's introduce entering_irq()/exiting_irq() for pre-/post-
      processing of each irq handler.
      
      A way to use them is as follows.
      
      Non-trace irq handler:
      smp_irq_handler()
      {
      	entering_irq();		/* pre-processing of this handler */
      	__smp_irq_handler();	/*
      				 * common logic between non-trace and trace handlers
      				 * in a vector.
      				 */
      	exiting_irq();		/* post-processing of this handler */
      
      }
      
      Trace irq_handler:
      smp_trace_irq_handler()
      {
      	entering_irq();		/* pre-processing of this handler */
      	trace_irq_entry();	/* tracepoint for irq entry */
      	__smp_irq_handler();	/*
      				 * common logic between non-trace and trace handlers
      				 * in a vector.
      				 */
      	trace_irq_exit();	/* tracepoint for irq exit */
      	exiting_irq();		/* post-processing of this handler */
      
      }
      
      If the tracepoints could be placed outside entering_irq()/exiting_irq() as follows,
      it would look cleaner.
      
      smp_trace_irq_handler()
      {
      	trace_irq_entry();
      	smp_irq_handler();
      	trace_irq_exit();
      }
      
      But it doesn't work.
      The problem is with irq_enter/exit() being called: they must be called before
      trace_irq_entry/exit(), because rcu_irq_enter() must be called before any
      tracepoints are used, as tracepoints use RCU to synchronize.
      
      As a possible alternative, we may be able to call irq_enter() first as follows
      if irq_enter() can nest.
      
      smp_trace_irq_handler()
      {
      	irq_enter();
      	trace_irq_entry();
      	smp_irq_handler();
      	trace_irq_exit();
      	irq_exit();
      }
      
      But it doesn't work, either.
      If irq_enter() is nested, it may have a time penalty because it has to check if it
      was already called or not. The time penalty is not desired in performance sensitive
      paths even if it is tiny.
      Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
      Link: http://lkml.kernel.org/r/51C3238D.9040706@hds.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
    • x86: Add a static_cpu_has_safe variant · 4a90a99c
      Committed by Borislav Petkov
      We want to use this in early code where alternatives might not have run
      yet, and in that case we fall back to the dynamic boot_cpu_has.
      
      For that, force a 5-byte jump since the compiler could be generating
      differently sized jumps for each label.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: http://lkml.kernel.org/r/1370772454-6106-5-git-send-email-bp@alien8.de
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86: Sanity-check static_cpu_has usage · 5700f743
      Committed by Borislav Petkov
      static_cpu_has may be used only after alternatives have run. Before that
      it always returns false if constant folding with __builtin_constant_p()
      doesn't happen. And you don't want that.
      
      This patch is the result of me debugging an issue where I overzealously
      put static_cpu_has in code which executed before alternatives had run,
      and I had to spend some time scratching my head and cursing at the
      monitor.
      
      So add a jump to a warning which screams loudly when we use this
      function too early. The alternatives patching removes that check, in
      conjunction with patching the rest of the kernel image.
      
      [ hpa: factored this into its own configuration option.  If we want to
        have an overarching option, it should be an option which selects
        other options, not as a group option in the source code. ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: http://lkml.kernel.org/r/1370772454-6106-4-git-send-email-bp@alien8.de
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, cpu: Add a synthetic, always true, cpu feature · c3b83598
      Committed by Borislav Petkov
      This will be used in alternatives later as an always-replace flag.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: http://lkml.kernel.org/r/1370772454-6106-2-git-send-email-bp@alien8.de
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  17. 20 June 2013: 1 commit
    • x86/intel/cacheinfo: Shut up last long-standing warning · 719038de
      Committed by Borislav Petkov
      arch/x86/kernel/cpu/intel_cacheinfo.c: In function ‘init_intel_cacheinfo’:
      arch/x86/kernel/cpu/intel_cacheinfo.c:642:28: warning: ‘this_leaf.size’ may be used uninitialized in this function [-Wmaybe-uninitialized]
      arch/x86/kernel/cpu/intel_cacheinfo.c:643:29: warning: ‘this_leaf.eax.split.num_threads_sharing’ may be used uninitialized in this function [-Wmaybe-uninitialized]
      
      This keeps on happening during randbuilds and the compiler is
      wrong here:
      
      In the case where cpuid4_cache_lookup_regs() returns 0, both
      this_leaf.size and this_leaf.eax get initialized. In the case
      where the CPUID leaf doesn't contain valid cache info, we error
      out, which init_intel_cacheinfo() handles correctly without
      touching the above-mentioned fields.
      
      So shut up the warning by clearing out the struct which we hand
      down.
      
      While at it, reverse error handling and gain one indentation
      level.
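      An illustrative, stand-alone sketch of both changes (hypothetical struct and
      functions, not the intel_cacheinfo code): clear the struct that is handed down,
      and return early on error to drop an indentation level.

      #include <stdio.h>

      struct leaf { unsigned int size; unsigned int num_threads_sharing; };

      /* Stand-in for the lookup helper: fills the leaf on success,
       * leaves it untouched and returns an error otherwise. */
      static int lookup_leaf(struct leaf *l, int idx)
      {
      	if (idx < 0)
      		return -1;
      	l->size = 32 * 1024;
      	l->num_threads_sharing = 2;
      	return 0;
      }

      static void init_cacheinfo(int idx)
      {
      	struct leaf this_leaf = { 0 };	/* cleared up front: no -Wmaybe-uninitialized */

      	if (lookup_leaf(&this_leaf, idx) < 0)
      		return;			/* reversed error handling: bail out early */

      	printf("size=%u sharing=%u\n",
      	       this_leaf.size, this_leaf.num_threads_sharing);
      }

      int main(void)
      {
      	init_cacheinfo(0);
      	init_cacheinfo(-1);
      	return 0;
      }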
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: http://lkml.kernel.org/r/1370710095-20547-1-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  18. 19 June 2013: 10 commits