提交 · 54c2f3fdb941204cad136024c7b854b7ad112ab6 · openeuler / raspberrypi-kernel

07 8月, 2013 9 次提交

x86, asmlinkage, apm: Make APM data structure used from assembler visible · 54c2f3fd

由 Andi Kleen 提交于 8月 05, 2013

Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-12-git-send-email-andi@firstfloor.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

54c2f3fd

x86, asmlinkage: Make syscall tables visible · e0e745e4

由 Andi Kleen 提交于 8月 05, 2013

They are referenced from entry*.S.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-11-git-send-email-andi@firstfloor.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

e0e745e4

x86, asmlinkage: Make several variables used from assembler/linker script visible · 277d5b40

由 Andi Kleen 提交于 8月 05, 2013

Plus one function, load_gs_index().
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-10-git-send-email-andi@firstfloor.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

277d5b40

x86, asmlinkage: Make kprobes code visible and fix assembler code · 04bb591c

由 Andi Kleen 提交于 8月 05, 2013

- Make all the external assembler template symbols __visible
- Move the templates inline assembler code into a top level
  assembler statement, not inside a function. This avoids it being
  optimized away or cloned.

Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-8-git-send-email-andi@firstfloor.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

04bb591c

x86, asmlinkage: Make various syscalls asmlinkage · ff49103f

由 Andi Kleen 提交于 8月 05, 2013

FWIW I suspect sys_rt_sigreturn/sys_sigreturn should use
standard SYSCALL wrappers. But I didn't do that change in this
patch.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-7-git-send-email-andi@firstfloor.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

ff49103f

x86, asmlinkage: Make 32bit/64bit __switch_to visible · 35ea7903

由 Andi Kleen 提交于 8月 05, 2013

This function is called from inline assembler, so has to be visible.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-6-git-send-email-andi@firstfloor.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

35ea7903

x86, asmlinkage: Make _*_start_kernel visible · a1ed4ddf

由 Andi Kleen 提交于 8月 05, 2013

Obviously these functions have to be visible, otherwise
the whole kernel could be optimized away.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-5-git-send-email-andi@firstfloor.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

a1ed4ddf

x86, asmlinkage: Make all interrupt handlers asmlinkage / __visible · 1d9090e2

由 Andi Kleen 提交于 8月 05, 2013

These handlers are all referenced from assembler stubs, so need
to be visible.

The handlers without arguments become asmlinkage, the others __visible
to not force regparms(0) on x86-32.

I put it all into a single patch, please let me know if you want
it it split up.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-4-git-send-email-andi@firstfloor.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

1d9090e2

x86: Fix sys_call_table type in asm/syscall.h · 1599e8fc

由 Andi Kleen 提交于 8月 05, 2013

Make the sys_call_table type defined in asm/syscall.h match
the definition in syscall_64.c

v2: include asm/syscall.h in syscall_64.c too. I left uml alone
because it doesn't have an syscall.h on its own and including
the native one leads to other errors.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1375740170-7446-2-git-send-email-andi@firstfloor.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Cc: Richard Weinberger <richard@nod.at>

1599e8fc

30 7月, 2013 1 次提交

x86/mce: Fix mce regression from recent cleanup · 1a7f0e3c

由 Tony Luck 提交于 7月 24, 2013

In commit 33d7885b
   x86/mce: Update MCE severity condition check

We simplified the rules to recognise each classification of recoverable
machine check combining the instruction and data fetch rules into a
single entry based on clarifications in the June 2013 SDM that all
recoverable events would be reported on the unaffected processor with
MCG_STATUS.EIPV=0 and MCG_STATUS.RIPV=1.  Unfortunately the simplified
rule has a couple of bugs.  Fix them here.
Acked-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: NTony Luck <tony.luck@intel.com>

1a7f0e3c

17 7月, 2013 1 次提交

x86: Make sure IDT is page aligned · 4df05f36

由 Kees Cook 提交于 7月 16, 2013

Since the IDT is referenced from a fixmap, make sure it is page aligned.
Merge with 32-bit one, since it was already aligned to deal with F00F
bug. Since bss is cleared before IDT setup, it can live there. This also
moves the other *_idt_table variables into common locations.

This avoids the risk of the IDT ever being moved in the bss and having
the mapping be offset, resulting in calling incorrect handlers. In the
current upstream kernel this is not a manifested bug, but heavily patched
kernels (such as those using the PaX patch series) did encounter this bug.

The tables other than idt_table technically do not need to be page
aligned, at least not at the current time, but using a common
declaration avoids mistakes. On 64 bits the table is exactly one page
long, anyway.
Signed-off-by: NKees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20130716183441.GA14232@www.outflux.netReported-by: NPaX Team <pageexec@gmail.com>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

4df05f36

16 7月, 2013 1 次提交

x86, suspend: Handle CPUs which fail to #GP on RDMSR · 5ff560fd

由 H. Peter Anvin 提交于 7月 12, 2013

There are CPUs which have errata causing RDMSR of a nonexistent MSR to
not fault.  We would then try to WRMSR to restore the value of that
MSR, causing a crash.  Specifically, some Pentium M variants would
have this problem trying to save and restore the non-existent EFER,
causing a crash on resume.

Work around this by making sure we can write back the result at
suspend time.

Huge thanks to Christian Sünkenberg for finding the offending erratum
that finally deciphered the mystery.
Reported-and-tested-by: NJohan Heinrich <onny@project-insanity.org>
Debugged-by: NChristian Sünkenberg <christian.suenkenberg@student.kit.edu>
Acked-by: NRafael J. Wysocki <rjw@sisk.pl>
Link: http://lkml.kernel.org/r/51DDC972.3010005@student.kit.edu
Cc: <stable@vger.kernel.org> # v3.7+

5ff560fd

15 7月, 2013 1 次提交

x86: delete __cpuinit usage from all x86 files · 148f9bb8

由 Paul Gortmaker 提交于 6月 18, 2013

The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications.  For example, the fix in
commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out.  Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.

Note that some harmless section mismatch warnings may result, since
notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
are flagged as __cpuinit  -- so if we remove the __cpuinit from
arch specific callers, we will also get section mismatch warnings.
As an intermediate step, we intend to turn the linux/init.h cpuinit
content into no-ops as early as possible, since that will get rid
of these warnings.  In any case, they are temporary and harmless.

This removes all the arch/x86 uses of the __cpuinit macros from
all C files.  x86 only had the one __CPUINIT used in assembly files,
and it wasn't paired off with a .previous or a __FINIT, so we can
delete it directly w/o any corresponding additional change there.

[1] https://lkml.org/lkml/2013/5/20/589

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Acked-by: NIngo Molnar <mingo@kernel.org>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NH. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>

148f9bb8

12 7月, 2013 1 次提交

perf/x86: Fix incorrect use of do_div() in NMI warning · baf64b85

由 Dave Hansen 提交于 7月 08, 2013

I completely botched understanding the calling conventions of
do_div(). I assumed that do_div() returned the result instead
of realizing that it modifies its argument and returns a
remainder. The side-effect from this would be bogus numbers
for the "msecs" value in the warning messages:

INFO: NMI handler (perf_event_nmi_handler) took too long to run: 0.114 msecs

Note, there was a second fix posted by Stephane Eranian for
a separate patch which I also botched:

http://lkml.kernel.org/r/20130704223010.GA30625@quadSigned-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20130708214404.B0B6EA66@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

baf64b85

10 7月, 2013 9 次提交

reboot: move arch/x86 reboot= handling to generic kernel · 1b3a5d02

由 Robin Holt 提交于 7月 08, 2013

Merge together the unicore32, arm, and x86 reboot= command line
parameter handling.
Signed-off-by: NRobin Holt <holt@sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Russ Anderson <rja@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Acked-by: NIngo Molnar <mingo@kernel.org>
Acked-by: NGuan Xuetao <gxt@mprc.pku.edu.cn>
Acked-by: NRussell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1b3a5d02

reboot: x86: prepare reboot_mode for moving to generic kernel code · edf2b139

由 Robin Holt 提交于 7月 08, 2013

Prepare for the moving the parsing of reboot= to the generic kernel code
by making reboot_mode into a more generic form.
Signed-off-by: NRobin Holt <holt@sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Miguel Boton <mboton.lkml@gmail.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Acked-by: NIngo Molnar <mingo@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

edf2b139

ptrace/x86: flush_ptrace_hw_breakpoint() shoule clear the virtual debug registers · f7da04c9

由 Oleg Nesterov 提交于 7月 08, 2013

flush_ptrace_hw_breakpoint() destroys the counters set by ptrace, but
"leaks" ->debugreg6 and ->ptrace_dr7.

The problem is minor, but still it doesn't look right and flush_thread()
did this until commit 66cb5917 ("hw-breakpoints: use the new wrapper
routines to access debug registers in process/thread code").  Now that
PTRACE_DETACH does flush_ too this makes even more sense.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f7da04c9

ptrace/x86: cleanup ptrace_set_debugreg() · 61e305c7

由 Oleg Nesterov 提交于 7月 08, 2013

ptrace_set_debugreg() is trivial but looks horrible.  Kill the unnecessary
goto's and return's to cleanup the code.

This matches ptrace_get_debugreg() which also needs the trivial whitespace
cleanups.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

61e305c7

ptrace/x86: ptrace_write_dr7() should create bp if !disabled · b87a95ad

由 Oleg Nesterov 提交于 7月 08, 2013

Commit 24f1e32c ("hw-breakpoints: Rewrite the hw-breakpoints layer
on top of perf events") introduced the minor regression.  Before this
commit

	PTRACE_POKEUSER DR7, enableDR0
	PTRACE_POKEUSER DR0, address

was perfectly valid, now PTRACE_POKEUSER(DR7) fails if DR0 was not
previously initialized by PTRACE_POKEUSER(DR0).

Change ptrace_write_dr7() to do ptrace_register_breakpoint(addr => 0) if
!bp && !disabled.

This fixes watchpoint-zeroaddr from ptrace-tests, see

    https://bugzilla.redhat.com/show_bug.cgi?id=660204.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Reported-by: NJan Kratochvil <jan.kratochvil@redhat.com>
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b87a95ad

ptrace/x86: introduce ptrace_register_breakpoint() · 9afe33ad

由 Oleg Nesterov 提交于 7月 08, 2013

No functional changes, preparation.

Extract the "register breakpoint" code from ptrace_get_debugreg() into
the new/generic helper, ptrace_register_breakpoint().  It will have more
users.

The patch also adds another simple helper, ptrace_fill_bp_fields(), to
factor out the arch_bp_generic_fields() logic in register/modify.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9afe33ad

ptrace/x86: dont delay "disable" till second pass in ptrace_write_dr7() · 29a55513

由 Oleg Nesterov 提交于 7月 08, 2013

ptrace_write_dr7() skips ptrace_modify_breakpoint(disabled => true)
unless second_pass, this buys nothing but complicates the code and means
that we always do the main loop twice even if "disabled" was never true.

The comment says:

	Don't unregister the breakpoints right-away,
	unless all register_user_hw_breakpoint()
	requests have succeeded.

Firstly, we do not do register_user_hw_breakpoint(), it was removed by
commit 24f1e32c ("hw-breakpoints: Rewrite the hw-breakpoints layer
on top of perf events").

We are going to restore register_user_hw_breakpoint() (see the next
patch) but this doesn't matter: after commit 44234adc
("hw-breakpoints: Modify breakpoints without unregistering them")
perf_event_disable() can not hurt, hw_breakpoint_del() does not free the
slot.

Remove the "second_pass" check from the main loop and simplify the code.
Since we have to check "bp != NULL" anyway, the patch also removes the
same check in ptrace_modify_breakpoint() and moves the comment into
ptrace_write_dr7().

With this patch the second pass is only needed to restore the saved
old_dr7.  This should never fail, so the patch adds WARN_ON() to catch
the potential problems as Frederic suggested.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

29a55513

ptrace/x86: simplify the "disable" logic in ptrace_write_dr7() · e6a7d607

由 Oleg Nesterov 提交于 7月 08, 2013

ptrace_write_dr7() looks unnecessarily overcomplicated.  We can factor
out ptrace_modify_breakpoint() and do not do "continue" twice, just we
need to pass the proper "disabled" argument to
ptrace_modify_breakpoint().
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e6a7d607

ptrace/x86: revert "hw_breakpoints: Fix racy access to ptrace breakpoints" · 02be46fb

由 Oleg Nesterov 提交于 7月 08, 2013

This reverts commit 87dc669b ("hw_breakpoints: Fix racy access to
ptrace breakpoints").

The patch was fine but we can no longer race with SIGKILL after commit
9899d11f ("ptrace: ensure arch_ptrace/ptrace_request can never race
with SIGKILL"), the __TASK_TRACED tracee can't be woken up and
->ptrace_bps[] can't go away.

The patch only removes ptrace_get_breakpoints/ptrace_put_breakpoints and
does a couple of "while at it" cleanups, it doesn't remove other changes
from the reverted commit.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NIngo Molnar <mingo@kernel.org>
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

02be46fb

05 7月, 2013 1 次提交

perf/x86/amd: Do not print an error when the device is not present · 100ac533

由 Peter Zijlstra 提交于 7月 03, 2013

As Linus said its not an error to not have an AMD IOMMU; esp.
when you're not even running on an AMD platform.
Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Acked-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
Link: http://lkml.kernel.org/r/20130703075542.GF23916@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

100ac533

04 7月, 2013 1 次提交

mm/x86: prepare for removing num_physpages and simplify mem_init() · 46a84132

由 Jiang Liu 提交于 7月 03, 2013

Prepare for removing num_physpages and simplify mem_init().
Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Jianguo Wu <wujianguo@huawei.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

46a84132

02 7月, 2013 1 次提交

x86/tracing: Add irq_enter/exit() in smp_trace_reschedule_interrupt() · 4787c368

由 Seiji Aguchi 提交于 6月 28, 2013

Reschedule vector tracepoints may be called in cpu idle state.
This causes lockdep check warning below.

The tracepoint requires rcu but for accuracy it also
requires irq_enter() (tracepoints record the irq context), thus,
the tracepoint interrupt handler should be calling irq_enter()
and not rcu_irq_enter() (irq_enter() calls rcu_irq_enter()).

So, add irq_enter/exit() to smp_trace_reschedule_interrupt()
with common pre/post processing functions, smp_entering_irq()
and exiting_irq() (exiting_irq() calls just irq_exit()
 in arch/x86/include/asm/apic.h),
because these can be shared among reschedule, call_function,
and call_function_single vectors.

[   50.720557] Testing event reschedule_exit:
[   50.721349]
[   50.721502] ===============================
[   50.721835] [ INFO: suspicious RCU usage. ]
[   50.722169] 3.10.0-rc6-00004-gcf910e83 #190 Not tainted
[   50.722582] -------------------------------
[   50.722915] /c/kernel-tests/src/linux/arch/x86/include/asm/trace/irq_vectors.h:50 suspicious rcu_dereference_check() usage!
[   50.723770]
[   50.723770] other info that might help us debug this:
[   50.723770]
[   50.724385]
[   50.724385] RCU used illegally from idle CPU!
[   50.724385] rcu_scheduler_active = 1, debug_locks = 0
[   50.725232] RCU used illegally from extended quiescent state!
[   50.725690] no locks held by swapper/0/0.
[   50.726010]
[   50.726010] stack backtrace:
[...]
Signed-off-by: NSeiji Aguchi <seiji.aguchi@hds.com>
Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/51CDCFA3.9080101@hds.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

4787c368

28 6月, 2013 2 次提交

x86/tboot: Provide debugfs interfaces to access TXT log · 13bfd47a

由 Qiaowei Ren 提交于 6月 24, 2013

These logs come from tboot (Trusted Boot, an open source,
pre-kernel/VMM module that uses Intel TXT to perform a
measured and verified launch of an OS kernel/VMM.).
Signed-off-by: NQiaowei Ren <qiaowei.ren@intel.com>
Acked-by: NH. Peter Anvin <hpa@zytor.com>
Cc: Gang Wei <gang.wei@intel.com>
Link: http://lkml.kernel.org/r/1372053333-21788-1-git-send-email-qiaowei.ren@intel.com
[ Beautified the code a bit. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

13bfd47a

x86/mce: Update MCE severity condition check · 33d7885b

由 Chen Gong 提交于 6月 20, 2013

Update some SRAR severity conditions check to make it clearer,
according to latest Intel SDM Vol 3(June 2013), table 15-20.
Signed-off-by: NChen Gong <gong.chen@linux.intel.com>
Acked-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: NTony Luck <tony.luck@intel.com>

33d7885b

27 6月, 2013 3 次提交

x86, microcode, amd: Another early loading fixup · 9608d33b

由 Jacob Shin 提交于 6月 20, 2013

commit cd1c32ca is an early premature
rendition of the patch. Augment it with this delta patch to:
  * correctly mark offset and size of the matching bin file
  * use __pa instead of __pa_nodebug during AP load
  * check for !initrd_start before using it
Signed-off-by: NJacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/20130620152414.GA6676@jshin-ToonieSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

9608d33b

perf/x86: Disable PEBS-LL in intel_pmu_pebs_disable() · 983433b5

由 Stephane Eranian 提交于 6月 21, 2013

Make sure intel_pmu_pebs_disable() and intel_pmu_pebs_enable()
are symmetrical w.r.t. PEBS-LL and precise store.
Signed-off-by: NStephane Eranian <eranian@google.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1371824448-7306-2-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

983433b5

perf/x86: Fix shared register mutual exclusion enforcement · 2f7f73a5

由 Stephane Eranian 提交于 6月 20, 2013

This patch fixes a problem with the shared registers mutual
exclusion code and incremental event scheduling by the
generic perf_event code.

There was a bug whereby the mutual exclusion on the shared
registers was not enforced because of incremental scheduling
abort due to event constraints. As an example on Intel
Nehalem, consider the following events:

group1= L1D_CACHE_LD:E_STATE,OFFCORE_RESPONSE_0:PF_RFO,L1D_CACHE_LD:I_STATE
group2= L1D_CACHE_LD:I_STATE

The L1D_CACHE_LD event can only be measured by 2 counters. Yet, there
are 3 instances here. The first group can be scheduled and is committed.
Then, the generic code tries to schedule group2 and this fails (because
there is no more counter to support the 3rd instance of L1D_CACHE_LD).
But in x86_schedule_events() error path, put_event_contraints() is invoked
on ALL the events and not just the ones that just failed. That causes the
"lock" on the shared offcore_response MSR to be released. Yet the first group
is actually scheduled and is exposed to reprogramming of that shared msr by
the sibling HT thread. In other words, there is no guarantee on what is
measured.

This patch fixes the problem by tagging committed events with the
PERF_X86_EVENT_COMMITTED tag. In the error path of x86_schedule_events(),
only the events NOT tagged have their constraint released. The tag
is eventually removed when the event in descheduled.
Signed-off-by: NStephane Eranian <eranian@google.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130620164254.GA3556@quadSigned-off-by: NIngo Molnar <mingo@kernel.org>

2f7f73a5

26 6月, 2013 5 次提交

perf/x86/intel: Support full width counting · 069e0c3c

由 Andi Kleen 提交于 6月 25, 2013

Recent Intel CPUs like Haswell and IvyBridge have a new
alternative MSR range for perfctrs that allows writing the full
counter width. Enable this range if the hardware reports it
using a new capability bit.

Currently the perf code queries CPUID to get the counter width,
and sign extends the counter values as needed. The traditional
PERFCTR MSRs always limit to 32bit, even though the counter
internally is larger (usually 48 bits on recent CPUs)

When the new capability is set use the alternative range which
do not have these restrictions.

This lowers the overhead of perf stat slightly because it has to
do less interrupts to accumulate the counter value. On Haswell
it also avoids some problems with TSX aborting when the end of
the counter range is reached.

( See the patch "perf/x86/intel: Avoid checkpointed counters
  causing excessive TSX aborts" for more details. )
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Reviewed-by: NStephane Eranian <eranian@google.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1372173153-20215-1-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

069e0c3c

x86, asm, cleanup: Replace open-coded control register values with symbolic · a3d7b7dd

由 H. Peter Anvin 提交于 4月 27, 2013

Clean up an unnecessary open-coded control register values.
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-um7za1nzf6brb17o0h4om6e3@git.kernel.org

a3d7b7dd

x86, flags: Rename X86_EFLAGS_BIT1 to X86_EFLAGS_FIXED · 1adfa76a

由 H. Peter Anvin 提交于 4月 27, 2013

Bit 1 in the x86 EFLAGS is always set.  Name the macro something that
actually tries to explain what it is all about, rather than being a
tautology.
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: http://lkml.kernel.org/n/tip-f10rx5vjjm6tfnt8o1wseb3v@git.kernel.org

1adfa76a

mce: acpi/apei: Add comments to clarify usage of the various bitfields in the MCA subsystem · 0644414e

由 Naveen N. Rao 提交于 6月 25, 2013

There is some confusion about the 'mce_poll_banks' and 'mce_banks_owned'
per-cpu bitmaps.  Provide comments so that we all know exactly what these
are used for, and why.
Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NTony Luck <tony.luck@intel.com>

0644414e

x86: Fix /proc/mtrr with base/size more than 44bits · d5c78673

由 Yinghai Lu 提交于 6月 13, 2013

On one sytem that mtrr range is more then 44bits, in dmesg we have
[    0.000000] MTRR default type: write-back
[    0.000000] MTRR fixed ranges enabled:
[    0.000000]   00000-9FFFF write-back
[    0.000000]   A0000-BFFFF uncachable
[    0.000000]   C0000-DFFFF write-through
[    0.000000]   E0000-FFFFF write-protect
[    0.000000] MTRR variable ranges enabled:
[    0.000000]   0 [000080000000-0000FFFFFFFF] mask 3FFF80000000 uncachable
[    0.000000]   1 [380000000000-38FFFFFFFFFF] mask 3F0000000000 uncachable
[    0.000000]   2 [000099000000-000099FFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   3 [00009A000000-00009AFFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   4 [381FFA000000-381FFBFFFFFF] mask 3FFFFE000000 write-through
[    0.000000]   5 [381FFC000000-381FFC0FFFFF] mask 3FFFFFF00000 write-through
[    0.000000]   6 [0000AD000000-0000ADFFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   7 [0000BD000000-0000BDFFFFFF] mask 3FFFFF000000 write-through
[    0.000000]   8 disabled
[    0.000000]   9 disabled

but /proc/mtrr report wrong:
reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
reg01: base=0x80000000000 (8388608MB), size=1048576MB, count=1: uncachable
reg02: base=0x099000000 ( 2448MB), size=   16MB, count=1: write-through
reg03: base=0x09a000000 ( 2464MB), size=   16MB, count=1: write-through
reg04: base=0x81ffa000000 (8519584MB), size=   32MB, count=1: write-through
reg05: base=0x81ffc000000 (8519616MB), size=    1MB, count=1: write-through
reg06: base=0x0ad000000 ( 2768MB), size=   16MB, count=1: write-through
reg07: base=0x0bd000000 ( 3024MB), size=   16MB, count=1: write-through
reg08: base=0x09b000000 ( 2480MB), size=   16MB, count=1: write-combining

so bit 44 and bit 45 get cut off.

We have problems in arch/x86/kernel/cpu/mtrr/generic.c::generic_get_mtrr().
1. for base, we miss cast base_lo to 64bit before shifting.
Fix that by adding u64 casting.

2. for size, it only can handle 44 bits aka 32bits + page_shift
Fix that with 64bit mask instead of 32bit mask_lo, then range could be
more than 44bits.
At the same time, we need to update size_or_mask for old cpus that does
support cpuid 0x80000008 to get phys_addr. Need to set high 32bits
to all 1s, otherwise will not get correct size for them.

Also fix mtrr_add_page: it should check base and (base + size - 1)
instead of base and size, as base and size could be small but
base + size could bigger enough to be out of boundary. We can
use boot_cpu_data.x86_phys_bits directly to avoid size_or_mask.

So When are we going to have size more than 44bits? that is 16TiB.

after patch we have right ouput:
reg00: base=0x080000000 ( 2048MB), size= 2048MB, count=1: uncachable
reg01: base=0x380000000000 (58720256MB), size=1048576MB, count=1: uncachable
reg02: base=0x099000000 ( 2448MB), size=   16MB, count=1: write-through
reg03: base=0x09a000000 ( 2464MB), size=   16MB, count=1: write-through
reg04: base=0x381ffa000000 (58851232MB), size=   32MB, count=1: write-through
reg05: base=0x381ffc000000 (58851264MB), size=    1MB, count=1: write-through
reg06: base=0x0ad000000 ( 2768MB), size=   16MB, count=1: write-through
reg07: base=0x0bd000000 ( 3024MB), size=   16MB, count=1: write-through
reg08: base=0x09b000000 ( 2480MB), size=   16MB, count=1: write-combining

-v2: simply checking in mtrr_add_page according to hpa.

[ hpa: This probably wants to go into -stable only after having sat in
  mainline for a bit.  It is not a regression. ]
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1371162815-29931-1-git-send-email-yinghai@kernel.org
Cc: <stable@vger.kernel.org>
Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>

d5c78673

24 6月, 2013 1 次提交

irqdomain: Refactor irq_domain_associate_many() · ddaf144c

由 Grant Likely 提交于 6月 10, 2013

Originally, irq_domain_associate_many() was designed to unwind the
mapped irqs on a failure of any individual association. However, that
proved to be a problem with certain IRQ controllers. Some of them only
support a subset of irqs, and will fail when attempting to map a
reserved IRQ. In those cases we want to map as many IRQs as possible, so
instead it is better for irq_domain_associate_many() to make a
best-effort attempt to map irqs, but not fail if any or all of them
don't succeed. If a caller really cares about how many irqs got
associated, then it should instead go back and check that all of the
irqs is cares about were mapped.

The original design open-coded the individual association code into the
body of irq_domain_associate_many(), but with no longer needing to
unwind associations, the code becomes simpler to split out
irq_domain_associate() to contain the bulk of the logic, and
irq_domain_associate_many() to be a simple loop wrapper.

This patch also adds a new error check to the associate path to make
sure it isn't called for an irq larger than the controller can handle,
and adds locking so that the irq_domain_mutex is held while setting up a
new association.

v3: Fixup missing change to irq_domain_add_tree()
v2: Fixup x86 warning. irq_domain_associate_many() no longer returns an
    error code, but reports errors to the printk log directly. In the
    majority of cases we don't actually want to fail if there is a
    problem, but rather log it and still try to boot the system.
Signed-off-by: NGrant Likely <grant.likely@linaro.org>

irqdomain: Fix flubbed irq_domain_associate_many refactoring

commit d39046ec72, "irqdomain: Refactor irq_domain_associate_many()" was
missing the following hunk which causes a boot failure on anything using
irq_domain_add_tree() to allocate an irq domain.
Signed-off-by: NGrant Likely <grant.likely@linaro.org>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
Cc: Thomas Gleixner <tglx@linutronix.de>,
Cc: Stephen Rothwell <sfr@canb.auug.org.au>

ddaf144c

23 6月, 2013 3 次提交

x86: Add NMI duration tracepoints · 0c4df02d

由 Dave Hansen 提交于 6月 21, 2013

This patch has been invaluable in my adventures finding
issues in the perf NMI handler.  I'm as big a fan of
printk() as anybody is, but using printk() in NMIs is
deadly when they're happening frequently.

Even hacking in trace_printk() ended up eating enough
CPU to throw off some of the measurements I was making.
Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Cc: Dave Hansen <dave@sr71.net>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

0c4df02d

perf: Drop sample rate when sampling is too slow · 14c63f17

由 Dave Hansen 提交于 6月 21, 2013

This patch keeps track of how long perf's NMI handler is taking,
and also calculates how many samples perf can take a second.  If
the sample length times the expected max number of samples
exceeds a configurable threshold, it drops the sample rate.

This way, we don't have a runaway sampling process eating up the
CPU.

This patch can tend to drop the sample rate down to level where
perf doesn't work very well.  *BUT* the alternative is that my
system hangs because it spends all of its time handling NMIs.

I'll take a busted performance tool over an entire system that's
busted and undebuggable any day.

BTW, my suspicion is that there's still an underlying bug here.
Using the HPET instead of the TSC is definitely a contributing
factor, but I suspect there are some other things going on.
But, I can't go dig down on a bug like that with my machine
hanging all the time.
Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Cc: Dave Hansen <dave@sr71.net>
[ Prettified it a bit. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

14c63f17

x86: Warn when NMI handlers take large amounts of time · 2ab00456

由 Dave Hansen 提交于 6月 21, 2013

I have a system which is causing all kinds of problems.  It has
8 NUMA nodes, and lots of cores that can fight over cachelines.
If things are not working _perfectly_, then NMIs can take longer
than expected.

If we get too many of them backed up to each other, we can
easily end up in a situation where we are doing nothing *but*
running NMIs.  The biggest problem, though, is that this happens
_silently_.  You might be lucky to get an hrtimer warning, but
most of the time system simply hangs.

This patch should at least give us some warning before we fall
off the cliff.  the warnings look like this:

	nmi_handle: perf_event_nmi_handler() took: 26095071 ns

The message is triggered whenever we notice the longest NMI
we've seen to date.  You can always view and reset this value
via the debugfs interface if you like.
Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
Cc: acme@ghostprotocols.net
Cc: Dave Hansen <dave@sr71.net>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

2ab00456