1. 06 Jun 2009 (1 commit)
    • perf_counter: Separate out attr->type from attr->config · a21ca2ca
      Committed by Ingo Molnar
      Counter type is a frequently used value and we do a lot of
      bit juggling by encoding and decoding it from attr->config.
      
      Clean this up by creating a separate attr->type field.
      
      Also clean up the various similarly complex user-space bits
      all around counter attribute management.
      
      The net improvement is significant, and it will be easier
      to add a new major type (which is what triggered this cleanup).
      
      (This changes the ABI, all tools are adapted.)
      (PowerPC build-tested.)
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a21ca2ca
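
      The effect of the split is easiest to see from user space.  A minimal
      sketch of the before/after shape (the struct and enum names here are
      illustrative stand-ins, not the kernel's actual ABI headers):

      #include <stdint.h>
      #include <stdio.h>

      /* Assumed event-class values, modeled on the perf_counter ABI of this era. */
      enum counter_type { TYPE_HARDWARE = 0, TYPE_SOFTWARE = 1, TYPE_TRACEPOINT = 2 };

      struct counter_attr_old {
              uint64_t config;        /* type packed into high bits, event id in low bits */
      };

      struct counter_attr_new {
              uint32_t type;          /* event class selected directly */
              uint64_t config;        /* event id within that class, no bit juggling */
      };

      int main(void)
      {
              struct counter_attr_old before = { .config = ((uint64_t)TYPE_SOFTWARE << 32) | 3 };
              struct counter_attr_new after  = { .type = TYPE_SOFTWARE, .config = 3 };

              /* Old style: decode the type by shifting and masking. */
              printf("old: type=%llu event=%llu\n",
                     (unsigned long long)(before.config >> 32),
                     (unsigned long long)(before.config & 0xffffffffULL));

              /* New style: just read the field. */
              printf("new: type=%u event=%llu\n",
                     after.type, (unsigned long long)after.config);
              return 0;
      }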
  2. 05 Jun 2009 (1 commit)
  3. 04 Jun 2009 (1 commit)
    • perf_counter: Fix throttling lock-up · 128f048f
      Committed by Ingo Molnar
      Throttling logic is broken and we can lock up with too small
      hw sampling intervals.
      
      Make the throttling code more robust: disable counters even
      if we already disabled them.
      
      ( Also clean up whitespace damage I noticed while reading
        various pieces of code related to throttling. )
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      128f048f
  4. 03 Jun 2009 (5 commits)
    • perf_counter/x86: Remove the IRQ (non-NMI) handling bits · a3288106
      Committed by Yong Wang
      Remove the IRQ (non-NMI) handling bits as NMI will be used always.
      Signed-off-by: Yong Wang <yong.y.wang@intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <20090603051255.GA2791@ywang-moblin2.bj.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a3288106
    • perf_counter: Rename perf_counter_hw_event => perf_counter_attr · 0d48696f
      Committed by Peter Zijlstra
      The structure isn't hw only and when I read event, I think about those
      things that fall out the other end. Rename the thing.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Stephane Eranian <eranian@googlemail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      0d48696f
    • perf_counter: x86: Emulate longer sample periods · e4abb5d4
      Committed by Peter Zijlstra
      Do as Power already does, emulate sample periods up to 2^63-1 by
      composing them of smaller values limited by hardware capabilities.
      Only once we wrap the software period do we generate an overflow
      event.
      
      Just 10 lines of new code.
      Reported-by: Stephane Eranian <eranian@googlemail.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e4abb5d4
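
      A minimal userspace model of the emulation (the names and the assumed
      2^31 hardware limit are illustrative, not the kernel's code): keep a
      software count of how much of the period is left, program the hardware
      with capped chunks, and report an overflow only once the software
      period wraps:

      #include <stdint.h>
      #include <stdio.h>

      #define HW_PERIOD_MAX ((uint64_t)1 << 31)       /* assumed hardware limit */

      static uint64_t left;   /* events still to count in this software period */

      /* Next hardware period to program, capped at what the hardware can do. */
      static uint64_t next_hw_period(void)
      {
              return left < HW_PERIOD_MAX ? left : HW_PERIOD_MAX;
      }

      /* Hardware counter overflowed after 'counted' events; return 1 on a real overflow. */
      static int hw_overflow(uint64_t counted, uint64_t sample_period)
      {
              if (counted >= left) {
                      left = sample_period;   /* software period wrapped */
                      return 1;
              }
              left -= counted;                /* partial progress: just reprogram */
              return 0;
      }

      int main(void)
      {
              uint64_t sample_period = (uint64_t)5 << 31;     /* larger than the hw limit */
              int overflows = 0;

              left = sample_period;
              for (int i = 0; i < 16; i++)
                      if (hw_overflow(next_hw_period(), sample_period))
                              overflows++;

              printf("software overflows seen: %d\n", overflows);      /* 3 */
              return 0;
      }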
    • perf_counter: Remove the last nmi/irq bits · 8a016db3
      Committed by Peter Zijlstra
      IRQ (non-NMI) sampling is not used anymore - remove the last few bits.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      8a016db3
    • perf_counter: Rename various fields · b23f3325
      Committed by Peter Zijlstra
      A few renames:
      
        s/irq_period/sample_period/
        s/irq_freq/sample_freq/
        s/PERF_RECORD_/PERF_SAMPLE_/
        s/record_type/sample_type/
      
      And change both the new sample_type and read_format to u64.
      Reported-by: Stephane Eranian <eranian@googlemail.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b23f3325
  5. 29 May 2009 (2 commits)
    • x86: ignore VM_LOCKED when determining if hugetlb-backed page tables can be shared or not · 32b154c0
      Committed by Mel Gorman
      Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13302
      
      On x86 and x86-64, it is possible that page tables are shared between
      shared mappings backed by hugetlbfs.  As part of this,
      page_table_shareable() checks a pair of vma->vm_flags and they must match
      if they are to be shared.  All VMA flags are taken into account, including
      VM_LOCKED.
      
      The problem is that VM_LOCKED is cleared on fork().  When a process with a
      shared memory segment forks() to exec() a helper, there will be shared
      VMAs with different flags.  The impact is that the shared segment is
      sometimes considered shareable and other times not, depending on what
      process is checking.
      
      What happens is that the segment page tables are being shared but the
      count is inaccurate depending on the ordering of events.  As the page
      tables are freed with put_page(), bad pmd's are found when some of the
      children exit.  The hugepage counters also get corrupted and the Total and
      Free count will no longer match even when all the hugepage-backed regions
      are freed.  This requires a reboot of the machine to "fix".
      
      This patch addresses the problem by comparing all flags except VM_LOCKED
      when deciding if page tables should be shared or not for hugetlbfs-backed
      mappings.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: <stable@kernel.org>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <starlight@binnacle.cx>
      Cc: Eric B Munson <ebmunson@us.ibm.com>
      Cc: Adam Litke <agl@us.ibm.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      32b154c0
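
      A standalone sketch of the fixed comparison (the flag values are
      illustrative, not the kernel's real constants): mask VM_LOCKED out
      before comparing the two VMAs' flags, so a child that lost VM_LOCKED
      across fork() still matches its parent:

      #include <stdio.h>

      #define VM_READ    0x0001UL
      #define VM_WRITE   0x0002UL
      #define VM_SHARED  0x0008UL
      #define VM_LOCKED  0x2000UL     /* illustrative bit position */

      struct vma { unsigned long vm_flags; };

      /* Compare all flags except VM_LOCKED. */
      static int flags_shareable(const struct vma *a, const struct vma *b)
      {
              unsigned long mask = ~VM_LOCKED;

              return (a->vm_flags & mask) == (b->vm_flags & mask);
      }

      int main(void)
      {
              struct vma parent = { VM_READ | VM_WRITE | VM_SHARED | VM_LOCKED };
              struct vma child  = { VM_READ | VM_WRITE | VM_SHARED };  /* VM_LOCKED cleared by fork() */

              printf("shareable: %d\n", flags_shareable(&parent, &child));    /* 1 */
              return 0;
      }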
    • perf_counter/x86: Always use NMI for performance-monitoring interrupt · c323d95f
      Committed by Yong Wang
      Always use NMI for performance-monitoring interrupt as there could be
      racy situations if we switch between irq and nmi mode frequently.
      Signed-off-by: Yong Wang <yong.y.wang@intel.com>
      LKML-Reference: <20090529052835.GA13657@ywang-moblin2.bj.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      c323d95f
  6. 27 May 2009 (5 commits)
  7. 26 May 2009 (9 commits)
    • perf_counter, x86: Make NMI lockups more robust · aaba9801
      Committed by Ingo Molnar
      We have a debug check that detects stuck NMIs and returns with
      the PMU disabled in the global ctrl MSR - but I managed to trigger
      a situation where this was not enough to deassert the NMI.
      
      So clear/reset the full PMU and keep the disable count balanced when
      exiting from here. This way the box produces a debug warning but
      stays up and is more debuggable.
      
      [ Impact: in case of PMU related bugs, recover more gracefully ]
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      aaba9801
    • perf_counter, x86: Fix APIC NMI programming · 79202ba9
      Committed by Ingo Molnar
      My Nehalem box locks up in certain situations (with an
      always-asserted NMI causing a lockup) if the PMU LVT
      entry is programmed between NMI and IRQ mode with a
      high frequency.
      
      Standardize exclusively on NMIs instead.
      
      [ Impact: fix lockup ]
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      79202ba9
    • x86, relocs: ignore R_386_NONE in kernel relocation entries · 46176b4f
      Committed by Tejun Heo
      For relocatable 32bit kernels, boot/compressed/relocs.c processes
      relocation entries in the kernel image and appends them to the kernel
      image so that boot/compressed/head_32.S can relocate the kernel.
      The kernel image is one statically linked object and only uses two
      relocation types - R_386_PC32 and R_386_32; of the two, only the latter
      needs massaging during kernel relocation and thus is handled by relocs.
      R_386_PC32 is ignored and all other relocation types are considered
      errors.
      
      When the target of a relocation resides in a discarded section,
      binutils doesn't throw away the relocation record but nullifies it by
      changing it to R_386_NONE, which unfortunately makes relocs fail.
      
      The problem was triggered by yet out-of-tree x86 stack unwind patches
      but given the binutils behavior, ignoring R_386_NONE is the right
      thing to do.
      
      The problem has been tracked down to binutils behavior by Jan Beulich.
      
      [ Impact: fix build with certain binutils by ignoring R_386_NONE ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      LKML-Reference: <4A1B8150.40702@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      46176b4f
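
      A standalone mock of the relocation filtering (not relocs.c itself; the
      ELF constants come from <elf.h>): record R_386_32 entries, ignore
      R_386_PC32 and the newly tolerated R_386_NONE, and reject everything
      else:

      #include <elf.h>        /* R_386_NONE, R_386_32, R_386_PC32 */
      #include <stdio.h>

      /* 1: record for later fix-up, 0: ignore, -1: unexpected relocation type. */
      static int classify_reloc(unsigned int type)
      {
              switch (type) {
              case R_386_32:
                      return 1;       /* absolute: needs adjusting after relocation */
              case R_386_PC32:
                      return 0;       /* PC-relative: unaffected by relocation */
              case R_386_NONE:
                      return 0;       /* nullified entry from a discarded section */
              default:
                      return -1;
              }
      }

      int main(void)
      {
              printf("R_386_32:   %d\n", classify_reloc(R_386_32));
              printf("R_386_PC32: %d\n", classify_reloc(R_386_PC32));
              printf("R_386_NONE: %d\n", classify_reloc(R_386_NONE));
              return 0;
      }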
    • Revert "perf_counter, x86: speed up the scheduling fast-path" · 53b441a5
      Committed by Ingo Molnar
      This reverts commit b68f1d2e.
      
      It is causing problems (stuck/stuttering profiling) - when mixed
      NMI and non-NMI counters are used.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <20090525153931.703093461@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      53b441a5
    • perf_counter: Generic per counter interrupt throttle · a78ac325
      Committed by Peter Zijlstra
      Introduce a generic per counter interrupt throttle.
      
      This uses the perf_counter_overflow() quick disable to throttle a specific
      counter when it's going too fast, provided a pmu->unthrottle() method is
      implemented that can undo the quick disable.
      
      Power needs to implement both the quick disable and the unthrottle method.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <20090525153931.703093461@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a78ac325
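
      A much-simplified model of the throttle (the names and the threshold
      are made up for illustration): count interrupts per tick, quick-disable
      a counter that exceeds the limit, and undo that at the next tick where
      the real code would call pmu->unthrottle():

      #include <stdio.h>

      #define MAX_INTERRUPTS_PER_TICK 4       /* assumed throttle threshold */

      struct counter {
              int interrupts;         /* interrupts seen since the last tick */
              int throttled;          /* quick-disabled by the overflow path */
      };

      /* Overflow/interrupt path: returns 1 when the counter gets throttled. */
      static int counter_overflow(struct counter *c)
      {
              if (c->throttled)
                      return 0;       /* disabled: would not even fire */
              if (++c->interrupts > MAX_INTERRUPTS_PER_TICK) {
                      c->throttled = 1;       /* quick disable */
                      return 1;
              }
              return 0;
      }

      /* Timer tick: reset the count and undo the quick disable. */
      static void counter_tick(struct counter *c)
      {
              c->interrupts = 0;
              c->throttled = 0;       /* pmu->unthrottle() in the real code */
      }

      int main(void)
      {
              struct counter c = { 0, 0 };

              for (int i = 0; i < 10; i++)
                      if (counter_overflow(&c))
                              printf("throttled after %d interrupts\n", c.interrupts);

              counter_tick(&c);
              printf("after tick: throttled=%d\n", c.throttled);
              return 0;
      }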
    • perf_counter: x86: Remove interrupt throttle · 48e22d56
      Committed by Peter Zijlstra
      Remove the x86-specific interrupt throttle.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <20090525153931.616671838@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      48e22d56
    • perf_counter: x86: Expose INV and EDGE bits · ff99be57
      Committed by Peter Zijlstra
      Expose the INV and EDGE bits of the PMU to raw configs.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <20090525153931.494709027@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      ff99be57
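
      What this enables, sketched as a userspace snippet (the bit positions
      follow the x86 PERFEVTSEL layout; the exact mask the kernel accepts and
      the example event code are assumptions): a raw config can now carry the
      edge-detect and invert-counter-mask bits directly:

      #include <stdint.h>
      #include <stdio.h>

      #define EVTSEL_EVENT(e) ((uint64_t)(e) & 0xff)
      #define EVTSEL_UMASK(u) (((uint64_t)(u) & 0xff) << 8)
      #define EVTSEL_EDGE     ((uint64_t)1 << 18)     /* edge detect */
      #define EVTSEL_INV      ((uint64_t)1 << 23)     /* invert counter mask */
      #define EVTSEL_CMASK(c) (((uint64_t)(c) & 0xff) << 24)

      int main(void)
      {
              /* Illustrative raw event: an event/umask pair with INV and a counter mask. */
              uint64_t raw = EVTSEL_EVENT(0xc2) | EVTSEL_UMASK(0x01) |
                             EVTSEL_INV | EVTSEL_CMASK(2);

              printf("raw config: 0x%llx\n", (unsigned long long)raw);
              return 0;
      }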
    • KVM: Fix PDPTR reloading on CR4 writes · a2edf57f
      Committed by Avi Kivity
      The processor is documented to reload the PDPTRs while in PAE mode if any
      of the CR4 bits PSE, PGE, or PAE change.  Linux relies on this
      behaviour when zapping the low mappings of PAE kernels during boot.
      
      The code already handled changes to CR4.PAE; augment it to also notice changes
      to PSE and PGE.
      
      This triggered while booting an F11 PAE kernel; the futex initialization code
      runs before any CR3 reloads and writes to a NULL pointer; the futex subsystem
      ended up uninitialized, killing PI futexes and pulseaudio which uses them.
      
      Cc: stable@kernel.org
      Signed-off-by: Avi Kivity <avi@redhat.com>
      a2edf57f
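
      A standalone sketch of the check (the CR4 bit positions are the
      architectural ones; the function name is made up): a CR4 write needs a
      PDPTR reload when it changes any of PSE, PGE or PAE - in the real code
      this matters while the guest is running with PAE paging:

      #include <stdio.h>

      #define X86_CR4_PSE (1UL << 4)
      #define X86_CR4_PAE (1UL << 5)
      #define X86_CR4_PGE (1UL << 7)

      static int cr4_write_changes_pdptr_bits(unsigned long old_cr4,
                                              unsigned long new_cr4)
      {
              unsigned long pdptr_bits = X86_CR4_PSE | X86_CR4_PAE | X86_CR4_PGE;

              return ((old_cr4 ^ new_cr4) & pdptr_bits) != 0;
      }

      int main(void)
      {
              unsigned long cr4 = X86_CR4_PAE | X86_CR4_PGE;

              /* Toggling PGE: the PDPTRs must be reloaded. */
              printf("%d\n", cr4_write_changes_pdptr_bits(cr4, cr4 & ~X86_CR4_PGE));
              /* Rewriting the same value: no reload needed. */
              printf("%d\n", cr4_write_changes_pdptr_bits(cr4, cr4));
              return 0;
      }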
    • KVM: Make paravirt tlb flush also reload the PAE PDPTRs · a8cd0244
      Committed by Avi Kivity
      The paravirt tlb flush may be used not only to flush TLBs, but also
      to reload the four page-directory-pointer-table entries, as it is used
      as a replacement for reloading CR3.  Change the code to do the entire
      CR3 reloading dance instead of simply flushing the TLB.
      
      Cc: stable@kernel.org
      Signed-off-by: Avi Kivity <avi@redhat.com>
      a8cd0244
  8. 25 May 2009 (1 commit)
  9. 23 May 2009 (4 commits)
  10. 22 May 2009 (2 commits)
    • perf_counter: Dynamically allocate tasks' perf_counter_context struct · a63eaf34
      Committed by Paul Mackerras
      This replaces the struct perf_counter_context in the task_struct with
      a pointer to a dynamically allocated perf_counter_context struct.  The
      main reason for doing this is to allow us to transfer a
      perf_counter_context from one task to another when we do lazy PMU
      switching in a later patch.
      
      This has a few side-benefits: the task_struct becomes a little smaller,
      we save some memory because only tasks that have perf_counters attached
      get a perf_counter_context allocated for them, and we can remove the
      inclusion of <linux/perf_counter.h> in sched.h, meaning that we don't
      end up recompiling nearly everything whenever perf_counter.h changes.
      
      The perf_counter_context structures are reference-counted and freed
      when the last reference is dropped.  A context can have references
      from its task and the counters on its task.  Counters can outlive the
      task so it is possible that a context will be freed well after its
      task has exited.
      
      Contexts are allocated on fork if the parent had a context, or
      otherwise the first time that a per-task counter is created on a task.
      In the latter case, we set the context pointer in the task struct
      locklessly using an atomic compare-and-exchange operation in case we
      raced with some other task in creating a context for the subject task.
      
      This also removes the task pointer from the perf_counter struct.  The
      task pointer was not used anywhere and would make it harder to move a
      context from one task to another.  Anything that needed to know which
      task a counter was attached to was already using counter->ctx->task.
      
      The __perf_counter_init_context function moves up in perf_counter.c
      so that it can be called from find_get_context, and now initializes
      the refcount, but is otherwise unchanged.
      
      We were potentially calling list_del_counter twice: once from
      __perf_counter_exit_task when the task exits and once from
      __perf_counter_remove_from_context when the counter's fd gets closed.
      This adds a check in list_del_counter so it doesn't do anything if
      the counter has already been removed from the lists.
      
      Since perf_counter_task_sched_in doesn't do anything if the task doesn't
      have a context, and leaves cpuctx->task_ctx = NULL, this adds code to
      __perf_install_in_context to set cpuctx->task_ctx if necessary, i.e. in
      the case where the current task adds the first counter to itself and
      thus creates a context for itself.
      
      This also adds similar code to __perf_counter_enable to handle a
      similar situation which can arise when the counters have been disabled
      using prctl; that also leaves cpuctx->task_ctx = NULL.
      
      [ Impact: refactor counter context management to prepare for new feature ]
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <18966.10075.781053.231153@cargo.ozlabs.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a63eaf34
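
      A userspace sketch of the lockless install, with C11 atomics standing in
      for the kernel's cmpxchg() (the structure and function names are
      illustrative): allocate a context, try to install it atomically, and
      fall back to the winner's context if another task raced us:

      #include <stdatomic.h>
      #include <stdio.h>
      #include <stdlib.h>

      struct context {
              int refcount;
              /* ... counters, lock, task pointer in the real structure ... */
      };

      struct task {
              _Atomic(struct context *) ctx;  /* pointer kept in the task struct */
      };

      static struct context *get_task_context(struct task *t)
      {
              struct context *ctx = atomic_load(&t->ctx);
              struct context *expected = NULL;

              if (ctx)
                      return ctx;             /* already installed */

              ctx = calloc(1, sizeof(*ctx));
              ctx->refcount = 1;

              /* Install only if nobody beat us to it; otherwise use theirs. */
              if (!atomic_compare_exchange_strong(&t->ctx, &expected, ctx)) {
                      free(ctx);              /* lost the race */
                      ctx = expected;         /* the winner's context */
              }
              return ctx;
      }

      int main(void)
      {
              struct task t = { NULL };

              struct context *a = get_task_context(&t);
              struct context *b = get_task_context(&t);

              printf("same context both times: %d\n", a == b);  /* 1 */
              return 0;
      }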
    • x86: DMI match for the Sony VGN-Z540N as it needs BIOS reboot · 88dff493
      Committed by Zhang Rui
      x86: DMI match for the Sony VGN-Z540N as it needs BIOS reboot,
      see:
      
        http://bugzilla.kernel.org/show_bug.cgi?id=12901
      
      [ Impact: fix hung reboot on certain systems ]
      Signed-off-by: Zhang Rui <rui.zhang@intel.com>
      Cc: Len Brown <lenb@kernel.org>
      LKML-Reference: <1242963350.32574.53.camel@rzhang-dt>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      88dff493
  11. 21 May 2009 (1 commit)
    • perf_counter: Fix context removal deadlock · 34adc806
      Committed by Ingo Molnar
      Disable the PMU globally before removing a counter from a
      context. This fixes the following lockup:
      
      [22081.741922] ------------[ cut here ]------------
      [22081.746668] WARNING: at arch/x86/kernel/cpu/perf_counter.c:803 intel_pmu_handle_irq+0x9b/0x24e()
      [22081.755624] Hardware name: X8DTN
      [22081.758903] perfcounters: irq loop stuck!
      [22081.762985] Modules linked in:
      [22081.766136] Pid: 11082, comm: perf Not tainted 2.6.30-rc6-tip #226
      [22081.772432] Call Trace:
      [22081.774940]  <NMI>  [<ffffffff81019aed>] ? intel_pmu_handle_irq+0x9b/0x24e
      [22081.781993]  [<ffffffff81019aed>] ? intel_pmu_handle_irq+0x9b/0x24e
      [22081.788368]  [<ffffffff8104505c>] ? warn_slowpath_common+0x77/0xa3
      [22081.794649]  [<ffffffff810450d3>] ? warn_slowpath_fmt+0x40/0x45
      [22081.800696]  [<ffffffff81019aed>] ? intel_pmu_handle_irq+0x9b/0x24e
      [22081.807080]  [<ffffffff814d1a72>] ? perf_counter_nmi_handler+0x3f/0x4a
      [22081.813751]  [<ffffffff814d2d09>] ? notifier_call_chain+0x58/0x86
      [22081.819951]  [<ffffffff8105b250>] ? notify_die+0x2d/0x32
      [22081.825392]  [<ffffffff814d1414>] ? do_nmi+0x8e/0x242
      [22081.830538]  [<ffffffff814d0f0a>] ? nmi+0x1a/0x20
      [22081.835342]  [<ffffffff8117e102>] ? selinux_file_free_security+0x0/0x1a
      [22081.842105]  [<ffffffff81018793>] ? x86_pmu_disable_counter+0x15/0x41
      [22081.848673]  <<EOE>>  [<ffffffff81018f3d>] ? x86_pmu_disable+0x86/0x103
      [22081.855512]  [<ffffffff8108fedd>] ? __perf_counter_remove_from_context+0x0/0xfe
      [22081.862926]  [<ffffffff8108fcbc>] ? counter_sched_out+0x30/0xce
      [22081.868909]  [<ffffffff8108ff36>] ? __perf_counter_remove_from_context+0x59/0xfe
      [22081.876382]  [<ffffffff8106808a>] ? smp_call_function_single+0x6c/0xe6
      [22081.882955]  [<ffffffff81091b96>] ? perf_release+0x86/0x14c
      [22081.888600]  [<ffffffff810c4c84>] ? __fput+0xe7/0x195
      [22081.893718]  [<ffffffff810c213e>] ? filp_close+0x5b/0x62
      [22081.899107]  [<ffffffff81046a70>] ? put_files_struct+0x64/0xc2
      [22081.905031]  [<ffffffff8104841a>] ? do_exit+0x1e2/0x6ef
      [22081.910360]  [<ffffffff814d0a60>] ? _spin_lock_irqsave+0x9/0xe
      [22081.916292]  [<ffffffff8104898e>] ? do_group_exit+0x67/0x93
      [22081.921953]  [<ffffffff810489cc>] ? sys_exit_group+0x12/0x16
      [22081.927759]  [<ffffffff8100baab>] ? system_call_fastpath+0x16/0x1b
      [22081.934076] ---[ end trace 3a3936ce3e1b4505 ]---
      
      And could potentially also fix the lockup reported by Marcelo Tosatti.
      
      Also, print more debug info in case of a detected lockup.
      
      [ Impact: fix lockup ]
      Reported-by: Marcelo Tosatti <mtosatti@redhat.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      34adc806
  12. 18 May 2009 (1 commit)
    • perf_counter, x86: speed up the scheduling fast-path · b68f1d2e
      Committed by Ingo Molnar
      We have to set up the LVT entry only at counter init time, not at
      every switch-in time.
      
      There's friction between NMI and non-NMI use here - we'll probably
      remove the per counter configurability of it - but until then, don't
      slow down things ...
      
      [ Impact: micro-optimization ]
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b68f1d2e
  13. 17 May 2009 (1 commit)
    • perf_counter, x86: fix zero irq_period counters · d2517a49
      Committed by Ingo Molnar
      The quirk to irq_period unearthed a robustness problem we had in the
      hw_counter initialization sequence: we left irq_period at 0, which
      was then quirked up to 2 ... which then generated a _lot_ of
      interrupts during 'perf stat' runs, slowed them down and skewed
      the counter results in general.
      
      Initialize irq_period to the maximum instead.
      
      [ Impact: fix perf stat results ]
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      d2517a49
  14. 16 May 2009 (1 commit)
    • x86: Fix performance regression caused by paravirt_ops on native kernels · b4ecc126
      Committed by Jeremy Fitzhardinge
      Xiaohui Xin and some other folks at Intel have been looking into what's
      behind the performance hit of paravirt_ops when running native.
      
      It appears that the hit is entirely due to the paravirtualized
      spinlocks introduced by:
      
       | commit 8efcbab6
       | Date:   Mon Jul 7 12:07:51 2008 -0700
       |
       |     paravirt: introduce a "lock-byte" spinlock implementation
      
      The extra call/return in the spinlock path is somehow
      causing an increase in the cycles/instruction of somewhere around 2-7%
      (seems to vary quite a lot from test to test).  The working theory is
      that the CPU's pipeline is getting upset about the
      call->call->locked-op->return->return, and seems to be failing to
      speculate (though I haven't seen anything definitive about the precise
      reasons).  This doesn't entirely make sense, because the performance
      hit is also visible on unlock and other operations which don't involve
      locked instructions.  But spinlock operations clearly swamp all the
      other pvops operations, even though I can't imagine that they're
      nearly as common (there's only a .05% increase in instructions
      executed).
      
      If I disable just the pv-spinlock calls, my tests show that pvops is
      identical to non-pvops performance on native (my measurements show that
      it is actually about .1% faster, but Xiaohui shows a .05% slowdown).
      
      Summary of results, averaging 10 runs of the "mmperf" test, using a
      no-pvops build as baseline:
      
      		nopv		Pv-nospin	Pv-spin
      CPU cycles	100.00%		99.89%		102.18%
      instructions	100.00%		100.10%		100.15%
      CPI		100.00%		99.79%		102.03%
      cache ref	100.00%		100.84%		100.28%
      cache miss	100.00%		90.47%		88.56%
      cache miss rate	100.00%		89.72%		88.31%
      branches	100.00%		99.93%		100.04%
      branch miss	100.00%		103.66%		107.72%
      branch miss rt	100.00%		103.73%		107.67%
      wallclock	100.00%		99.90%		102.20%
      
      The clear effect here is that the 2% increase in CPI is
      directly reflected in the final wallclock time.
      
      (The other interesting effect is that the more ops are
      out of line calls via pvops, the lower the cache access
      and miss rates.  Not too surprising, but it suggests that
      the non-pvops kernel is over-inlined.  On the flipside,
      the branch misses go up correspondingly...)
      
      So, what's the fix?
      
      Paravirt patching turns all the pvops calls into direct calls, so
      _spin_lock etc do end up having direct calls.  For example, the compiler
      generated code for paravirtualized _spin_lock is:
      
      <_spin_lock+0>:		mov    %gs:0xb4c8,%rax
      <_spin_lock+9>:		incl   0xffffffffffffe044(%rax)
      <_spin_lock+15>:	callq  *0xffffffff805a5b30
      <_spin_lock+22>:	retq
      
      The indirect call will get patched to:
      <_spin_lock+0>:		mov    %gs:0xb4c8,%rax
      <_spin_lock+9>:		incl   0xffffffffffffe044(%rax)
      <_spin_lock+15>:	callq <__ticket_spin_lock>
      <_spin_lock+20>:	nop; nop		/* or whatever 2-byte nop */
      <_spin_lock+22>:	retq
      
      One possibility is to inline _spin_lock, etc, when building an
      optimised kernel (ie, when there's no spinlock/preempt
      instrumentation/debugging enabled).  That will remove the outer
      call/return pair, returning the instruction stream to a single
      call/return, which will presumably execute the same as the non-pvops
      case.  The downsides are: 1) it will replicate the
      preempt_disable/enable code at each lock/unlock callsite; this code is
      fairly small, but not nothing; and 2) the spinlock definitions are
      already a very heavily tangled mass of #ifdefs and other preprocessor
      magic, and making any changes will be non-trivial.
      
      The other obvious answer is to disable pv-spinlocks.  Making them a
      separate config option is fairly easy, and it would be trivial to
      enable them only when Xen is enabled (as the only non-default user).
      But it doesn't really address the common case of a distro build which
      is going to have Xen support enabled, and leaves the open question of
      whether the native performance cost of pv-spinlocks is worth the
      performance improvement on a loaded Xen system (10% saving of overall
      system CPU when guests block rather than spin).  Still it is a
      reasonable short-term workaround.
      
      [ Impact: fix pvops performance regression when running native ]
      Analysed-by: "Xin Xiaohui" <xiaohui.xin@intel.com>
      Analysed-by: "Li Xin" <xin.li@intel.com>
      Analysed-by: "Nakajima Jun" <jun.nakajima@intel.com>
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Xen-devel <xen-devel@lists.xensource.com>
      LKML-Reference: <4A0B62F7.5030802@goop.org>
      [ fixed the help text ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b4ecc126
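
      A userspace mock of the shape of the fix (not the kernel's pv_lock_ops;
      the names are illustrative and CONFIG_PARAVIRT_SPINLOCKS stands for the
      new config option): with pv-spinlocks enabled the lock path goes through
      a patchable indirection, with them disabled it calls the native routine
      directly, avoiding the extra call/return blamed above:

      #include <stdio.h>

      static void native_spin_lock(int *lock)
      {
              while (__sync_lock_test_and_set(lock, 1))
                      ;       /* spin */
      }

      #ifdef CONFIG_PARAVIRT_SPINLOCKS
      /* Indirection point; a Xen guest would install its blocking variant here. */
      static void (*pv_spin_lock)(int *lock) = native_spin_lock;
      #define arch_spin_lock(l) pv_spin_lock(l)
      #else
      /* No pvops in the lock path: a direct call to the native routine. */
      #define arch_spin_lock(l) native_spin_lock(l)
      #endif

      int main(void)
      {
              int lock = 0;

              arch_spin_lock(&lock);
              printf("lock taken: %d\n", lock);
              return 0;
      }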
  15. 15 May 2009 (5 commits)