提交 · 6df1e7f2e96721dfdbfd8a034e52bc81916f978c · openanolis / cloud-kernel

07 11月, 2013 1 次提交

x86/apic: Disable I/O APIC before shutdown of the local APIC · 522e6646

由 Fenghua Yu 提交于 10月 23, 2013

In reboot and crash path, when we shut down the local APIC, the I/O APIC is
still active. This may cause issues because external interrupts
can still come in and disturb the local APIC during shutdown process.

To quiet external interrupts, disable I/O APIC before shutdown local APIC.
Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1382578212-4677-1-git-send-email-fenghua.yu@intel.com
Cc: <stable@kernel.org>
[ I suppose the 'issue' is a hang during shutdown. It's a fine change nevertheless. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

522e6646

06 11月, 2013 4 次提交

perf/x86/intel: Add Ivy Bridge-EP uncore IRP box support · f891d8cf

由 Yan, Zheng 提交于 10月 31, 2013

Unlike other uncore boxes, IRP boxes live in PCI buses with no UBOX
device. For PCI bus without UBOX device, we find the next bus that
has UBOX device and use its 'bus to socket' mapping.

Besides the counter/control registers in IRP boxes are not properly
aligned.
Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Cc: eranian@google.com
Cc: "Yan Zheng" <zheng.z.yan@intel.com>
Link: http://lkml.kernel.org/r/1383197815-17706-2-git-send-email-zheng.z.yan@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

f891d8cf

perf/x86/intel/uncore: Add filter support for IvyBridge-EP QPI boxes · d1e8f4a8

由 Yan, Zheng 提交于 10月 31, 2013

The encoding for filter registers of IvyBridge-EP uncore QPI boxes is
completely the same as SandyBridge-EP.
Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Cc: eranian@google.com
Cc: "Yan Zheng" <zheng.z.yan@intel.com>
Link: http://lkml.kernel.org/r/1383197815-17706-1-git-send-email-zheng.z.yan@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

d1e8f4a8

perf: Fix arch_perf_out_copy_user default · 0a196848

由 Peter Zijlstra 提交于 10月 30, 2013

The arch_perf_output_copy_user() default of
__copy_from_user_inatomic() returns bytes not copied, while all other
argument functions given DEFINE_OUTPUT_COPY() return bytes copied.

Since copy_from_user_nmi() is the odd duck out by returning bytes
copied where all other *copy_{to,from}* functions return bytes not
copied, change it over and ammend DEFINE_OUTPUT_COPY() to expect bytes
not copied.

Oddly enough DEFINE_OUTPUT_COPY() already returned bytes not copied
while expecting its worker functions to return bytes copied.
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Acked-by: will.deacon@arm.com
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20131030201622.GR16117@laptop.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

0a196848

x86/cpu: Always print SMP information in /proc/cpuinfo · a477c859

由 HATAYAMA Daisuke 提交于 11月 05, 2013

Currently show_cpuinfo_core() displays cpu core information only if
the number of threads per a whole cores is 2 or larger.

However, this condition doesn't care about the number of
sockets. For example, this condition doesn't hold on systems
with two logical cpus consisting of two sockets and a single
core on each socket - yet the topology information would be
interesting to see in that case as well.

I don't know whether or not there are processors in real world
by which such configurations are possible, but at least on
vitual machine environments, such configuration can occur,
typically when no explicit SMP information is provided in
advance.

For example, on qemu/KVM, SMP information is specified via -smp
command-line option, more specifically, its syntax is:

  -smp n[,cores=cores][,threads=threads][,sockets=sockets][,maxcpus=maxcpus]

If this is not specified, qemu tells configuration with
n-sockets, 1-core and 1-thread to the guest machine, on which
guest, MP information is not displayed in /proc/cpuinfo.

I saw this situation on VMWare guest environment, too.

To fix this issue, this patch simply removes the condition
because this information is useful even if there's only 1
thread.
Signed-off-by: NHATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/5277D644.4090707@jp.fujitsu.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

a477c859

30 10月, 2013 1 次提交

KVM: Fix modprobe failure for kvm_intel/kvm_amd · d780a312

由 Tim Gardner 提交于 10月 29, 2013

The x86 specific kvm init creates a new conflicting
debugfs directory which causes modprobe issues
with kvm_intel and kvm_amd. For example,

sudo modprobe kvm_amd
modprobe: ERROR: could not insert 'kvm_amd': Bad address

The simplest fix is to just rename the directory. The following
KVM config options are set:

CONFIG_KVM_GUEST=y
CONFIG_KVM_DEBUG_FS=y
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_HAVE_KVM_IRQ_ROUTING=y
CONFIG_HAVE_KVM_EVENTFD=y
CONFIG_KVM_APIC_ARCHITECTURE=y
CONFIG_KVM_MMIO=y
CONFIG_KVM_ASYNC_PF=y
CONFIG_HAVE_KVM_MSI=y
CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT=y
CONFIG_KVM=m
CONFIG_KVM_INTEL=m
CONFIG_KVM_AMD=m
CONFIG_KVM_DEVICE_ASSIGNMENT=y

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: NTim Gardner <tim.gardner@canonical.com>
[Change debugfs directory name. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d780a312

29 10月, 2013 1 次提交

perf/x86: Fix NMI measurements · e8a923cc

由 Peter Zijlstra 提交于 10月 17, 2013

OK, so what I'm actually seeing on my WSM is that sched/clock.c is
'broken' for the purpose we're using it for.

What triggered it is that my WSM-EP is broken :-(

  [    0.001000] tsc: Fast TSC calibration using PIT
  [    0.002000] tsc: Detected 2533.715 MHz processor
  [    0.500180] TSC synchronization [CPU#0 -> CPU#6]:
  [    0.505197] Measured 3 cycles TSC warp between CPUs, turning off TSC clock.
  [    0.004000] tsc: Marking TSC unstable due to check_tsc_sync_source failed

For some reason it consistently detects TSC skew, even though NHM+
should have a single clock domain for 'reasonable' systems.

This marks sched_clock_stable=0, which means that we do fancy stuff to
try and get a 'sane' clock. Part of this fancy stuff relies on the tick,
clearly that's gone when NOHZ=y. So for idle cpus time gets stuck, until
it either wakes up or gets kicked by another cpu.

While this is perfectly fine for the scheduler -- it only cares about
actually running stuff, and when we're running stuff we're obviously not
idle. This does somewhat break down for perf which can trigger events
just fine on an otherwise idle cpu.

So I've got NMIs get get 'measured' as taking ~1ms, which actually
don't last nearly that long:

          <idle>-0     [013] d.h.   886.311970: rcu_nmi_enter <-do_nmi
  ...
          <idle>-0     [013] d.h.   886.311997: perf_sample_event_took: HERE!!! : 1040990

So ftrace (which uses sched_clock(), not the fancy bits) only sees
~27us, but we measure ~1ms !!

Now since all this measurement stuff lives in x86 code, we can actually
fix it.
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Cc: mingo@kernel.org
Cc: dave.hansen@linux.intel.com
Cc: eranian@google.com
Cc: Don Zickus <dzickus@redhat.com>
Cc: jmario@redhat.com
Cc: acme@infradead.org
Link: http://lkml.kernel.org/r/20131017133350.GG3364@laptop.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

e8a923cc

26 10月, 2013 2 次提交

x86/time: Honor ACPI FADT flag indicating absence of a CMOS RTC · ee5872be

由 Jan Beulich 提交于 10月 21, 2013

Even though the omission was found only during code review
(originally in the Xen hypervisor, looking through ACPI v5 flags
and their meanings and uses), we shouldn't be creating a
corresponding platform device in that case.
Signed-off-by: NJan Beulich <jbeulich@suse.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/5265029D02000078000FC4D2@nat28.tlf.novell.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

ee5872be

x86/cpu: Track legacy CPU model data only on 32-bit kernels · 09dc68d9

由 Jan Beulich 提交于 10月 21, 2013

struct cpu_dev's c_models is only ever set inside CONFIG_X86_32
conditionals (or code that's being built for 32-bit only), so
there's no use of reserving the (empty) space for the model
names in a 64-bit kernel.

Similarly, c_size_cache is only used in the #else of a
CONFIG_X86_64 conditional, so reserving space for (and in one
case even initializing) that field is pointless for 64-bit
kernels too.

While moving both fields to the end of the structure, I also
noticed that:

 - the c_models array size was one too small, potentially causing
   table_lookup_model() to return garbage on Intel CPUs (intel.c's
   instance was lacking the sentinel with family being zero), so the
   patch bumps that by one,

 - c_models' vendor sub-field was unused (and anyway redundant
   with the base structure's c_x86_vendor field), so the patch deletes it.

Also rename the legacy fields so that their legacy nature stands out
and comment their declarations.
Signed-off-by: NJan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/5265036802000078000FC4DB@nat28.tlf.novell.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

09dc68d9

16 10月, 2013 1 次提交

perf/x86: Optimize intel_pmu_pebs_fixup_ip() · 9536c8d2

由 Peter Zijlstra 提交于 10月 15, 2013

There's been reports of high NMI handler overhead, highlighted by
such kernel messages:

  [ 3697.380195] perf samples too long (10009 > 10000), lowering kernel.perf_event_max_sample_rate to 13000
  [ 3697.389509] INFO: NMI handler (perf_event_nmi_handler) took too long to run: 9.331 msecs

Don Zickus analyzed the source of the overhead and reported:

 > While there are a few places that are causing latencies, for now I focused on
 > the longest one first.  It seems to be 'copy_user_from_nmi'
 >
 > intel_pmu_handle_irq ->
 >	intel_pmu_drain_pebs_nhm ->
 >		__intel_pmu_drain_pebs_nhm ->
 >			__intel_pmu_pebs_event ->
 >				intel_pmu_pebs_fixup_ip ->
 >					copy_from_user_nmi
 >
 > In intel_pmu_pebs_fixup_ip(), if the while-loop goes over 50, the sum of
 > all the copy_from_user_nmi latencies seems to go over 1,000,000 cycles
 > (there are some cases where only 10 iterations are needed to go that high
 > too, but in generall over 50 or so).  At this point copy_user_from_nmi
 > seems to account for over 90% of the nmi latency.

The solution to that is to avoid having to call copy_from_user_nmi() for
every instruction.

Since we already limit the max basic block size, we can easily
pre-allocate a piece of memory to copy the entire thing into in one
go.

Don reported this test result:

 > Your patch made a huge difference in improvement.  The
 > copy_from_user_nmi() no longer hits the million of cycles.  I still
 > have a batch of 100,000-300,000 cycles.  My longest NMI paths used
 > to be dominated by copy_from_user_nmi, now it is not (I have to dig
 > up the new hot path).
Reported-and-tested-by: NDon Zickus <dzickus@redhat.com>
Cc: jmario@redhat.com
Cc: acme@infradead.org
Cc: dave.hansen@linux.intel.com
Cc: eranian@google.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131016105755.GX10651@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

9536c8d2

15 10月, 2013 2 次提交

KVM: Enable pvspinlock after jump_label_init() to avoid VM hang · 3dbef3e3

由 Raghavendra K T 提交于 10月 09, 2013

We use jump label to enable pv-spinlock. With the changes in (442e0973
Merge branch 'x86/jumplabel'), the jump label behaviour has changed
that would result in eventual hang of the VM since we would end up in a
situation where slow path locks would halt the vcpus but we will not be
able to wakeup the vcpu by lock releaser using unlock kick.

Similar problem in Xen and more detailed description is available in
a945928e (xen: Do not enable spinlocks before jump_label_init()
has executed)

This patch splits kvm_spinlock_init to separate jump label changes with
pvops patching and also make jump label enabling after jump_label_init().
Signed-off-by: NRaghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NGleb Natapov <gleb@redhat.com>

3dbef3e3

x86: Update UV3 hub revision ID · dd3c9c4b

由 Russ Anderson 提交于 10月 14, 2013

The UV3 hub revision ID is different than expected. The first
revision was supposed to start at 1 but instead will start at 0.
Signed-off-by: NRuss Anderson <rja@sgi.com>
Cc: <stable@kernel.org> # v3.9, v3.10, v3.11
Link: http://lkml.kernel.org/r/20131014161733.GA6274@sgi.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

dd3c9c4b

09 10月, 2013 1 次提交

x86, msr: Use file_inode(), not f_mapping->host · 12249873

由 Andre Richter 提交于 10月 08, 2013

As discussed in [1], exchange f_mapping->host with file_inode().  This
is a bug, but happens to be non-manifest in this case.

[1] http://lkml.kernel.org/r/20131007190357.GA13318@ZenIV.linux.org.ukSigned-off-by: NAndre Richter <andre.o.richter@gmail.com>
Link: http://lkml.kernel.org/r/1381224142-3267-1-git-send-email-andre.o.richter@gmail.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

12249873

06 10月, 2013 1 次提交

x86/reboot: Add reboot quirk for Dell Latitude E5410 · 8412da75

由 Ville Syrjälä 提交于 10月 04, 2013

Dell Latitude E5410 needs reboot=pci to actually reboot.
Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
Link: http://lkml.kernel.org/r/1380888964-14517-1-git-send-email-ville.syrjala@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

8412da75

04 10月, 2013 3 次提交

perf/x86: Suppress duplicated abort LBR records · b7af41a1

由 Andi Kleen 提交于 9月 20, 2013

Haswell always give an extra LBR record after every TSX abort.
Suppress the extra record.

This only works when the abort is visible in the LBR
If the original abort has already left the 16 LBR entries
the extra entry will will stay.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1379688044-14173-7-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

b7af41a1

perf/x86: Add Haswell specific transaction flag reporting · a405bad5

由 Andi Kleen 提交于 9月 20, 2013

In the PEBS handler report the transaction flags using the new
generic transaction flags facility. Most of them come from
the "tsx_tuning" field in PEBSv2, but the abort code is derived
from the RAX register reported in the PEBS record.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1379688044-14173-3-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

a405bad5

perf/x86: Clean up cap_user_time* setting · d8b11a0c

由 Peter Zijlstra 提交于 10月 03, 2013

Currently the cap_user_time_zero capability has different tests than
cap_user_time; even though they expose the exact same data.

Switch from CONSTANT && NONSTOP to sched_clock_stable to also deal
with multi cabinet machines and drop the tsc_disabled() check.. non of
this will work sanely without tsc anyway.
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-nmgn0j0muo1r4c94vlfh23xy@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

d8b11a0c

03 10月, 2013 1 次提交

x86/simplefb: Mark framebuffer mem-resources as IORESOURCE_BUSY to avoid bootup warning · 29d274b8

由 David Herrmann 提交于 10月 02, 2013

IORESOURCE_BUSY is used to mark temporary driver mem-resources
instead of global regions. This suppresses warnings if regions
overlap with a region marked as BUSY.

This was always the case for VESA/VGA/EFI framebuffer regions so
do the same for simplefb regions. The reason we do this is to
allow device handover to real GPU drivers like
i915/radeon/nouveau which get the same regions via PCI BARs.

Maybe at some point we will be able to unregister platform
devices properly during the handover. In this case the simplefb
region would get removed before the new region is created.
However, this is currently not the case and would require rather
huge changes in remove_conflicting_framebuffers(). Add the BUSY
marker now and try to eventually rewrite the handover for a next release.

Also see kernel/resource.c for more information:

/*
* if a resource is "BUSY", it's not a hardware resource
* but a driver mapping of such a resource; we don't want
* to warn for those; some drivers legitimately map only
* partial hardware resources. (example: vesafb)
*/

This suppresses warnings like:

------------[ cut here ]------------
WARNING: CPU: 2 PID: 199 at arch/x86/mm/ioremap.c:171 __ioremap_caller+0x2e3/0x390()
Info: mapping multiple BARs. Your kernel is fine.
Call Trace:
dump_stack+0x54/0x8d
warn_slowpath_common+0x7d/0xa0
warn_slowpath_fmt+0x4c/0x50
iomem_map_sanity_check+0xac/0xe0
__ioremap_caller+0x2e3/0x390
ioremap_wc+0x32/0x40
i915_driver_load+0x670/0xf50 [i915]
...
Reported-by: NTom Gundersen <teg@jklm.no>
Tested-by: NTom Gundersen <teg@jklm.no>
Tested-by: NPavel Roskin <proski@gnu.org>
Signed-off-by: NDavid Herrmann <dh.herrmann@gmail.com>
Link: http://lkml.kernel.org/r/1380724864-1757-1-git-send-email-dh.herrmann@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

29d274b8

02 10月, 2013 1 次提交

x86/simplefb: Fix overflow causing bogus fall-back · e33a29a5

由 Tom Gundersen 提交于 10月 01, 2013

On my MacBook Air lfb_size is 4M, which makes the bitshit
overflow (to 256GB - larger than 32 bits), meaning we fall
back to efifb unnecessarily.

Cast to u64 to avoid the overflow.
Signed-off-by: NTom Gundersen <teg@jklm.no>
Reviewed-by: NDavid Herrmann <dh.herrmann@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Stephen Warren <swarren@nvidia.com>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Link: http://lkml.kernel.org/r/1380644320-1026-1-git-send-email-teg@jklm.noSigned-off-by: NIngo Molnar <mingo@kernel.org>

e33a29a5

01 10月, 2013 2 次提交

irq: Consolidate do_softirq() arch overriden implementations · 7d65f4a6

由 Frederic Weisbecker 提交于 9月 05, 2013

All arch overriden implementations of do_softirq() share the following
common code: disable irqs (to avoid races with the pending check),
check if there are softirqs pending, then execute __do_softirq() on
a specific stack.

Consolidate the common parts such that archs only worry about the
stack switch.
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@au1.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@au1.ibm.com>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Andrew Morton <akpm@linux-foundation.org>

7d65f4a6

x86/boot: Further compress CPUs bootup message · a17bce4d

由 Borislav Petkov 提交于 9月 30, 2013

Turn it into (for example):

[    0.073380] x86: Booting SMP configuration:
[    0.074005] .... node   #0, CPUs:          #1   #2   #3   #4   #5   #6   #7
[    0.603005] .... node   #1, CPUs:     #8   #9  #10  #11  #12  #13  #14  #15
[    1.200005] .... node   #2, CPUs:    #16  #17  #18  #19  #20  #21  #22  #23
[    1.796005] .... node   #3, CPUs:    #24  #25  #26  #27  #28  #29  #30  #31
[    2.393005] .... node   #4, CPUs:    #32  #33  #34  #35  #36  #37  #38  #39
[    2.996005] .... node   #5, CPUs:    #40  #41  #42  #43  #44  #45  #46  #47
[    3.600005] .... node   #6, CPUs:    #48  #49  #50  #51  #52  #53  #54  #55
[    4.202005] .... node   #7, CPUs:    #56  #57  #58  #59  #60  #61  #62  #63
[    4.811005] .... node   #8, CPUs:    #64  #65  #66  #67  #68  #69  #70  #71
[    5.421006] .... node   #9, CPUs:    #72  #73  #74  #75  #76  #77  #78  #79
[    6.032005] .... node  #10, CPUs:    #80  #81  #82  #83  #84  #85  #86  #87
[    6.648006] .... node  #11, CPUs:    #88  #89  #90  #91  #92  #93  #94  #95
[    7.262005] .... node  #12, CPUs:    #96  #97  #98  #99 #100 #101 #102 #103
[    7.865005] .... node  #13, CPUs:   #104 #105 #106 #107 #108 #109 #110 #111
[    8.466005] .... node  #14, CPUs:   #112 #113 #114 #115 #116 #117 #118 #119
[    9.073006] .... node  #15, CPUs:   #120 #121 #122 #123 #124 #125 #126 #127
[    9.679901] x86: Booted up 16 nodes, 128 CPUs

and drop useless elements.

Change num_digits() to hpa's division-avoiding, cell-phone-typed
version which he went at great lengths and pains to submit on a
Saturday evening.
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: huawei.libin@huawei.com
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20130930095624.GB16383@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>

a17bce4d

28 9月, 2013 2 次提交

perf/x86: Fix PMU detection printout when no PMU is detected · 8a3da6c7

由 Ingo Molnar 提交于 9月 28, 2013

Ran into this cryptic PMU bootup log recently:

[    0.124047] Performance Events:
[    0.125000] smpboot: ...

Turns out we print this if no PMU is detected. Fall back to
the right condition so that the following is printed:

[    0.122381] Performance Events: no PMU driver, software events only.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/n/tip-u2fwaUffakjp0qkpRfqljgsn@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

8a3da6c7

x86: Improve the printout of the SMP bootup CPU table · 646e29a1

由 Borislav Petkov 提交于 9月 27, 2013

As the new x86 CPU bootup printout format code maintainer, I am
taking immediate action to improve and clean (and thus indulge
my OCD) the reporting of the cores when coming up online.

Fix padding to a right-hand alignment, cleanup code and bind
reporting width to the max number of supported CPUs on the
system, like this:

 [    0.074509] smpboot: Booting Node   0, Processors:      #1  #2  #3  #4  #5  #6  #7 OK
 [    0.644008] smpboot: Booting Node   1, Processors:  #8  #9 #10 #11 #12 #13 #14 #15 OK
 [    1.245006] smpboot: Booting Node   2, Processors: #16 #17 #18 #19 #20 #21 #22 #23 OK
 [    1.864005] smpboot: Booting Node   3, Processors: #24 #25 #26 #27 #28 #29 #30 #31 OK
 [    2.489005] smpboot: Booting Node   4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK
 [    3.093005] smpboot: Booting Node   5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK
 [    3.698005] smpboot: Booting Node   6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK
 [    4.304005] smpboot: Booting Node   7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK
 [    4.961413] Brought up 64 CPUs

and this:

 [    0.072367] smpboot: Booting Node   0, Processors:    #1 #2 #3 #4 #5 #6 #7 OK
 [    0.686329] Brought up 8 CPUs
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: Libin <huawei.libin@huawei.com>
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Link: http://lkml.kernel.org/r/20130927143554.GF4422@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>

646e29a1

27 9月, 2013 1 次提交

x86/microcode/AMD: Fix patch level reporting for family 15h · accd1e82

由 Suravee Suthikulpanit 提交于 9月 29, 2010

On AMD family 14h, applying microcode patch on the a core (core0)
would also affect the other core (core1) in the same compute
unit. The driver would skip applying the patch on core1, but it
still need to update kernel structures to reflect the proper
patch level.

The current logic is not updating the struct
ucode_cpu_info.cpu_sig.rev of the skipped core. This causes the
/sys/devices/system/cpu/cpu1/microcode/version to report
incorrect patch level as shown below:

  $ grep . cpu?/microcode/version
  cpu0/microcode/version:0x600063d
  cpu1/microcode/version:0x6000626
  cpu2/microcode/version:0x600063d
  cpu3/microcode/version:0x6000626
  cpu4/microcode/version:0x600063d
Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Acked-by: NBorislav Petkov <bp@suse.de>
Cc: <bp@alien8.de>
Cc: <jacob.w.shin@gmail.com>
Cc: <herrmann.der.user@googlemail.com>
Link: http://lkml.kernel.org/r/1285806432-1995-1-git-send-email-suravee.suthikulpanit@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

accd1e82

25 9月, 2013 5 次提交

sched, x86: Optimize the preempt_schedule() call · 1a338ac3

由 Peter Zijlstra 提交于 8月 14, 2013

Remove the bloat of the C calling convention out of the
preempt_enable() sites by creating an ASM wrapper which allows us to
do an asm("call ___preempt_schedule") instead.

calling.h bits by Andi Kleen
Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-tk7xdi1cvvxewixzke8t8le1@git.kernel.org
[ Fixed build error. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

1a338ac3

sched, x86: Provide a per-cpu preempt_count implementation · c2daa3be

由 Peter Zijlstra 提交于 8月 14, 2013

Convert x86 to use a per-cpu preemption count. The reason for doing so
is that accessing per-cpu variables is a lot cheaper than accessing
thread_info variables.

We still need to save/restore the actual preemption count due to
PREEMPT_ACTIVE so we place the per-cpu __preempt_count variable in the
same cache-line as the other hot __switch_to() variables such as
current_task.

NOTE: this save/restore is required even for !PREEMPT kernels as
cond_resched() also relies on preempt_count's PREEMPT_ACTIVE to ignore
task_struct::state.

Also rename thread_info::preempt_count to ensure nobody is
'accidentally' still poking at it.
Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-gzn5rfsf8trgjoqx8hyayy3q@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

c2daa3be

sched: Extract the basic add/sub preempt_count modifiers · bdb43806

由 Peter Zijlstra 提交于 9月 10, 2013

Rewrite the preempt_count macros in order to extract the 3 basic
preempt_count value modifiers:

  __preempt_count_add()
  __preempt_count_sub()

and the new:

  __preempt_count_dec_and_test()

And since we're at it anyway, replace the unconventional
$op_preempt_count names with the more conventional preempt_count_$op.

Since these basic operators are equivalent to the previous _notrace()
variants, do away with the _notrace() versions.
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-ewbpdbupy9xpsjhg960zwbv8@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

bdb43806

sched, idle: Fix the idle polling state logic · ea811747

由 Peter Zijlstra 提交于 9月 11, 2013

Mike reported that commit 7d1a9417 ("x86: Use generic idle loop")
regressed several workloads and caused excessive reschedule
interrupts.

The patch in question failed to notice that the x86 code had an
inverted sense of the polling state versus the new generic code (x86:
default polling, generic: default !polling).

Fix the two prominent x86 mwait based idle drivers and introduce a few
new generic polling helpers (fixing the wrong smp_mb__after_clear_bit
usage).

Also switch the idle routines to using tif_need_resched() which is an
immediate TIF_NEED_RESCHED test as opposed to need_resched which will
end up being slightly different.
Reported-by: NMike Galbraith <bitbucket@online.de>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Cc: lenb@kernel.org
Cc: tglx@linutronix.de
Link: http://lkml.kernel.org/n/tip-nc03imb0etuefmzybzj7sprf@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

ea811747

x86/reboot: Fix apparent cut-n-paste mistake in Dell reboot workaround · 7a20c2fa

由 Dave Jones 提交于 9月 24, 2013

This seems to have been copied from the Optiplex 990 entry
above, but somoene forgot to change the ident text.
Signed-off-by: NDave Jones <davej@fedoraproject.org>
Link: http://lkml.kernel.org/r/20130925001344.GA13554@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

7a20c2fa

23 9月, 2013 2 次提交

x86/reboot: Add quirk to make Dell C6100 use reboot=pci automatically · 4f0acd31

由 Masoud Sharbiani 提交于 9月 20, 2013

Dell PowerEdge C6100 machines fail to completely reboot about 20% of the time.
Signed-off-by: NMasoud Sharbiani <msharbiani@twitter.com>
Signed-off-by: NVinson Lee <vlee@twitter.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1379717947-18042-1-git-send-email-vlee@freedesktop.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

4f0acd31

perf/x86/intel: Add model number for Avoton Silvermont · cf3b425d

由 Yan, Zheng 提交于 9月 22, 2013

Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
Cc: a.p.zijlstra@chello.nl
Cc: eranian@google.com
Cc: ak@linux.intel.com
Link: http://lkml.kernel.org/r/1379837953-17755-1-git-send-email-zheng.z.yan@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

cf3b425d

20 9月, 2013 4 次提交

perf/x86/intel: Fix build warning in intel_pmu_drain_pebs_nhm() · 92519bbc

由 Peter Zijlstra 提交于 9月 20, 2013

Fengguang Wu reported this build warning:

arch/x86/kernel/cpu/perf_event_intel_ds.c: In function 'intel_pmu_drain_pebs_nhm':
arch/x86/kernel/cpu/perf_event_intel_ds.c:964:2: warning: format '%ld' expects argument of type 'long int', but argument 4 has type 'int'

Because pointer arithmetics result type is bitness dependent there's no natural
type to use here, cast it to long.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-jbpauwxJqtf24luewcsdFith@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

92519bbc

perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page' · fa731587

由 Peter Zijlstra 提交于 9月 19, 2013

Solve the problems around the broken definition of perf_event_mmap_page::
cap_usr_time and cap_usr_rdpmc fields which used to overlap, partially
fixed by:

  860f085b ("perf: Fix broken union in 'struct perf_event_mmap_page'")

The problem with the fix (merged in v3.12-rc1 and not yet released
officially), noticed by Vince Weaver is that the new behavior is
not detectable by new user-space, and that due to the reuse of the
field names it's easy to mis-compile a binary if old headers are used
on a new kernel or new headers are used on an old kernel.

To solve all that make this change explicit, detectable and self-contained,
by iterating the ABI the following way:

 - Always clear bit 0, and rename it to usrpage->cap_bit0, to at least not
   confuse old user-space binaries. RDPMC will be marked as unavailable
   to old binaries but that's within the ABI, this is a capability bit.

 - Rename bit 1 to ->cap_bit0_is_deprecated and always set it to 1, so new
   libraries can reliably detect that bit 0 is deprecated and perma-zero
   without having to check the kernel version.

 - Use bits 2, 3, 4 for the newly defined, correct functionality:

	cap_user_rdpmc		: 1, /* The RDPMC instruction can be used to read counts */
	cap_user_time		: 1, /* The time_* fields are used */
	cap_user_time_zero	: 1, /* The time_zero field is used */

 - Rename all the bitfield names in perf_event.h to be different from the
   old names, to make sure it's not possible to mis-compile it
   accidentally with old assumptions.

The 'size' field can then be used in the future to add new fields and it
will act as a natural ABI version indicator as well.

Also adjust tools/perf/ userspace for the new definitions, noticed by
Adrian Hunter.
Reported-by: NVince Weaver <vincent.weaver@maine.edu>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Also-Fixed-by: NAdrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/n/tip-zr03yxjrpXesOzzupszqglbv@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

fa731587

perf/x86/intel/uncore: Don't use smp_processor_id() in validate_group() · 73c4427c

由 Yan, Zheng 提交于 9月 17, 2013

uncore_validate_group() can't call smp_processor_id() because it is
in preemptible context. Pass NUMA_NO_NODE to the allocator instead.
Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1379400493-11505-1-git-send-email-zheng.z.yan@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

73c4427c

perf/x86/intel: Remove division from the intel_pmu_drain_pebs_nhm() hot path · eb8417aa

由 Peter Zijlstra 提交于 9月 16, 2013

Only do the division in case we have to print the result out in a warning.
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-43nl31erfbajwpfj254f6zji@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

eb8417aa

14 9月, 2013 1 次提交

perf/x86/intel: Mark MEM_LOAD_UOPS_MISS_RETIRED as precise on SNB · 9d8e3f96

由 Stephane Eranian 提交于 9月 13, 2013

On Intel SNB (SNB, SNB-EP), the event MEM_LOAD_UOPS_MISS_RETIRED
supports PEBS. It was missing for the SNB PEBS event constraint
table thereby preventing any measurement with PEBS for it.

This patch adds the event to the PEBS table for SNB.

WARNING: it should be noted that this event like a few others
are subject to the erratum BT241 for Xeon E5 (SNB-EP). As such,
the event may undercount when used with PEBS unless the
workaround is implemented. But without this patch and just the
workaround, the kernel would not allow precise sampling on this
event. BT241 is documented in:

http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-family-spec-update.pdfSigned-off-by: NStephane Eranian <eranian@google.com>
Cc: peterz@infradead.org
Cc: ak@linux.intel.com
Cc: zheng.z.yan@intel.com
Link: http://lkml.kernel.org/r/20130913201646.GA23981@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

9d8e3f96

13 9月, 2013 4 次提交

perf/x86/intel: Clean up EVENT_ATTR_STR() muck · 7f2ee91f

由 Ingo Molnar 提交于 9月 12, 2013

Make the code a bit more readable by removing stray whitespaces et al.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/n/tip-lzEnychz1ylqy8zjenxOmeht@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

7f2ee91f

perf/x86/intel: Clean-up/reduce PEBS code · d2beea4a

由 Peter Zijlstra 提交于 9月 12, 2013

Get rid of some pointless duplication introduced by the Haswell code.
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-8q6y4davda9aawwv5yxe7klp@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

d2beea4a

perf/x86/intel: Clean up checkpoint-interrupt bits · 2b9e344d

由 Peter Zijlstra 提交于 9月 12, 2013

Clean up the weird CP interrupt exception code by keeping a CP mask.

Andi suggested this implementation but weirdly didn't actually
implement it himself, do so now because it removes the conditional in
the interrupt handler and avoids the assumption its only on cnt2.
Suggested-by: NAndi Kleen <andi@firstfloor.org>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-dvb4q0rydkfp00kqat4p5bah@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

2b9e344d

perf/x86/intel: Add Haswell TSX event aliases · 4b2c4f1f

由 Andi Kleen 提交于 9月 05, 2013

Add TSX event aliases, and export them from the kernel to perf.

These are used by perf stat -T and to allow
more user friendly access to events. The events are designed to
be fairly generic and may also apply to other architectures
implementing HTM. They all cover common situations that
happens during tuning of transactional code.

For Haswell we have to separate the HLE and RTM events,
as they are separate in the PMU.

This adds the following events:

tx-start Count start transaction (used by perf stat -T)
tx-commit Count commit of transaction
tx-abort Count all aborts
tx-conflict Count aborts due to conflict with another CPU.
tx-capacity Count capacity aborts (transaction too large)

Then matching el-* events for HLE

cycles-t Transactional cycles (used by perf stat -T)
* also exists on POWER8

cycles-ct Transactional cycles commited (used by perf stat -T)
* according to Michael Ellerman POWER8 has a cycles-transactional-committed,
* perf stat -T handles both cases

Note for useful abort profiling often precise has to be set,
as Haswell can only report the point inside the transaction
with precise=2.

For some classes of aborts, like conflicts, this is not needed,
as it makes more sense to look at the complete critical section.

This gives a clean set of generalized events to examine transaction
success and aborts. Haswell has additional events for TSX, but those
are more specialized for very specific situations.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1378438661-24765-4-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

4b2c4f1f

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功