Commits · 18c20f373b76a64270a991396b06542abaf9f530 · openeuler / raspberrypi-kernel

07 Jun, 2012 5 commits

x86, MCE, AMD: Print decimal thresholding values · 18c20f37

Committed by Borislav Petkov on Apr 27, 2012

If one sets the threshold limit, say to 25:

$ echo 25 > machinecheck0/threshold_bank4/misc0/threshold_limit

and then reads it back again, it gives

$ cat machinecheck0/threshold_bank4/misc0/threshold_limit
19

which is actually 0x19 but we don't know that.

Make all output decimal.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>

18c20f37
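
The readback confusion described in this commit message comes down to the format specifier. A minimal sketch in plain C, assuming hypothetical helper names (`show_limit_hex`, `show_limit_dec`) that stand in for the sysfs show routine, which the message does not quote:

```c
#include <stdio.h>

/* Hypothetical stand-ins for a sysfs "show" routine; not the kernel's code. */

/* Old behaviour as described: the value is printed in hex without a 0x prefix. */
static int show_limit_hex(char *buf, size_t len, unsigned int limit)
{
	return snprintf(buf, len, "%x", limit);
}

/* New behaviour as described: the value is printed in decimal. */
static int show_limit_dec(char *buf, size_t len, unsigned int limit)
{
	return snprintf(buf, len, "%u", limit);
}
```

With a limit of 25, the first helper writes `19` and the second writes `25`, which matches the example in the message.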

x86, MCE, AMD: Move shared bank to node descriptor · 019f34fc

Committed by Borislav Petkov on May 02, 2012

Well, instead of having a real bank 4 on the BSP of each node and
symlinks on the remaining cores, we push it up into the amd_northbridge
descriptor which now contains a pointer to the northbridge bank 4
because the bank is one per northbridge and, as such, belongs in the NB
descriptor anyway.

Each time we hotplug CPUs, we use the northbridge pointer to copy the
shared bank into the per-CPU array of threshold_banks pointers, or
destroy it when the last CPU on the node goes offline, or create it when
the first comes online.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>

019f34fc

x86, MCE, AMD: Remove local_allocate_... wrapper · 26ab256e

Committed by Borislav Petkov on May 02, 2012

It is unneeded now so drop it.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>

26ab256e

x86, MCE, AMD: Remove shared banks sysfs linking · 92e26e2a

Committed by Borislav Petkov on May 02, 2012

The code used to create a symlink on all non-BSP cores of a node when
the MCi_MISCj bank is present once per node. (This is generally the
case with bank 4 on AMD). However, these sysfs links cause a bunch
of problems with cpu off-/onlining testing and are, as such, a bit
overengineered. IOW, there's nothing wrong with having normal sysfs
files for the shared banks since the corresponding MSRs are replicated
across each core anyway.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>

92e26e2a

x86, amd_nb: Export model 0x10 and later PCI id · 24214449

Committed by Borislav Petkov on May 04, 2012

Add the F3 PCI id of F15h, model 0x10 to pci_ids.h and to the amd_nb
code which generates the list of northbridges on an AMD box. Shorten
define name while at it so that it fits into pci_ids.h.
Acked-by: Clemens Ladisch <clemens@ladisch.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>

24214449

02 Jun, 2012 7 commits

x86, x32, ptrace: Remove PTRACE_ARCH_PRCTL for x32 · bad1a753

Committed by H.J. Lu on May 21, 2012

When I added x32 ptrace to the 3.4 kernel, I also included PTRACE_ARCH_PRCTL
support for x32 GDB. For ARCH_GET_FS/GS, it takes a pointer to int64. But
at user level, ARCH_GET_FS/GS takes a pointer to int32. So I have to add
x32 ptrace to glibc to handle it with a temporary int64 passed to kernel and
copy it back to GDB as int32. Roland suggested that PTRACE_ARCH_PRCTL
is obsolete and x32 GDB should use fs_base and gs_base fields of
user_regs_struct instead.

Accordingly, remove PTRACE_ARCH_PRCTL completely from the x32 code to
avoid possible memory overrun when pointer to int32 is passed to
kernel.

Link: http://lkml.kernel.org/r/CAMe9rOpDzHfS7NH7m1vmD9QRw8SSj4Sc%2BaNOgcWm_WJME2eRsQ@mail.gmail.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@vger.kernel.org> v3.4

bad1a753

x86: get rid of calling do_notify_resume() when returning to kernel mode · 44fbbb3d

Committed by Al Viro on Apr 30, 2012

If we end up calling do_notify_resume() with !user_mode(regs), it
does nothing (do_signal() explicitly bails out and we can't get there
with TIF_NOTIFY_RESUME in such situations).  Then we jump to
resume_userspace_sig, which rechecks the same thing and bails out
to resume_kernel, thus breaking the loop.

It's easier and cheaper to check *before* calling do_notify_resume()
and bail out to resume_kernel immediately.  And kill the check in
do_signal()...

Note that on amd64 we can't get there with !user_mode() at all - asm
glue takes care of that.
Acked-and-reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

44fbbb3d

new helper: signal_delivered() · efee984c

Committed by Al Viro on Apr 28, 2012

Does block_sigmask() + tracehook_signal_handler();  called when
sigframe has been successfully built.  All architectures converted
to it; block_sigmask() itself is gone now (merged into this one).

I'm still not too happy with the signature, but that's a separate
story (IMO we need a structure that would contain signal number +
siginfo + k_sigaction, so that get_signal_to_deliver() would fill one,
signal_delivered(), handle_signal() and probably setup...frame() -
take one).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

efee984c

most of set_current_blocked() callers want SIGKILL/SIGSTOP removed from set · 77097ae5

Committed by Al Viro on Apr 27, 2012

Only 3 out of 63 do not.  Renamed the current variant to __set_current_blocked(),
added set_current_blocked() that will exclude unblockable signals, switched
open-coded instances to it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

77097ae5
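
The idea of excluding unblockable signals before applying a mask can be illustrated from userspace. This is only a sketch under assumptions: it uses the POSIX `sigset_t` API rather than the kernel's internal helpers, and `strip_unblockable` is a hypothetical name, not something taken from the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

/*
 * Userspace illustration only: the kernel operates on its own sigset_t
 * with different helpers. strip_unblockable() is a hypothetical name.
 */
static void strip_unblockable(sigset_t *set)
{
	/* SIGKILL and SIGSTOP can never be blocked, so drop them from the set. */
	sigdelset(set, SIGKILL);
	sigdelset(set, SIGSTOP);
}
```

A caller would build the mask it wants, pass it through a helper like this, and only then install it, so every call site gets the same treatment without open-coding the two `sigdelset()` calls.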

pull clearing RESTORE_SIGMASK into block_sigmask() · a610d6e6

Committed by Al Viro on May 21, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a610d6e6

new helper: sigmask_to_save() · b7f9a11a

Committed by Al Viro on May 02, 2012

replace boilerplate "should we use ->saved_sigmask or ->blocked?"
with calls of obvious inlined helper...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b7f9a11a

new helper: restore_saved_sigmask() · 51a7b448

Committed by Al Viro on May 21, 2012

first fruits of ..._restore_sigmask() helpers: now we can take
boilerplate "signal didn't have a handler, clear RESTORE_SIGMASK
and restore the blocked mask from ->saved_mask" into a common
helper.  Open-coded instances switched...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

51a7b448

01 Jun, 2012 5 commits

ftrace/x86: Do not change stacks in DEBUG when calling lockdep · 5963e317

Committed by Steven Rostedt on May 30, 2012

When both DYNAMIC_FTRACE and LOCKDEP are set, the TRACE_IRQS_ON/OFF
will call into the lockdep code. The lockdep code can call lots of
functions that may be traced by ftrace. When ftrace is updating its
code and hits a breakpoint, the breakpoint handler will call into
lockdep. If lockdep happens to call a function that also has a breakpoint
attached, it will jump back into the breakpoint handler resetting
the stack to the debug stack and corrupt the contents currently on
that stack.

The 'do_sym' call that calls do_int3() is protected by modifying the
IST table to point to a different location if another breakpoint is
hit. But the TRACE_IRQS_OFF/ON are outside that protection, and if
a breakpoint is hit from those, the stack will get corrupted, and
the kernel will crash:

[ 1013.243754] BUG: unable to handle kernel NULL pointer dereference at 0000000000000002
[ 1013.272665] IP: [<ffff880145cc0000>] 0xffff880145cbffff
[ 1013.285186] PGD 1401b2067 PUD 14324c067 PMD 0
[ 1013.298832] Oops: 0010 [#1] PREEMPT SMP
[ 1013.310600] CPU 2
[ 1013.317904] Modules linked in: ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables crc32c_intel ghash_clmulni_intel microcode usb_debug serio_raw pcspkr iTCO_wdt i2c_i801 iTCO_vendor_support e1000e nfsd nfs_acl auth_rpcgss lockd sunrpc i915 video i2c_algo_bit drm_kms_helper drm i2c_core [last unloaded: scsi_wait_scan]
[ 1013.401848]
[ 1013.407399] Pid: 112, comm: kworker/2:1 Not tainted 3.4.0+ #30
[ 1013.437943] RIP: 8eb8:[<ffff88014630a000>]  [<ffff88014630a000>] 0xffff880146309fff
[ 1013.459871] RSP: ffffffff8165e919:ffff88014780f408  EFLAGS: 00010046
[ 1013.477909] RAX: 0000000000000001 RBX: ffffffff81104020 RCX: 0000000000000000
[ 1013.499458] RDX: ffff880148008ea8 RSI: ffffffff8131ef40 RDI: ffffffff82203b20
[ 1013.521612] RBP: ffffffff81005751 R08: 0000000000000000 R09: 0000000000000000
[ 1013.543121] R10: ffffffff82cdc318 R11: 0000000000000000 R12: ffff880145cc0000
[ 1013.564614] R13: ffff880148008eb8 R14: 0000000000000002 R15: ffff88014780cb40
[ 1013.586108] FS:  0000000000000000(0000) GS:ffff880148000000(0000) knlGS:0000000000000000
[ 1013.609458] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1013.627420] CR2: 0000000000000002 CR3: 0000000141f10000 CR4: 00000000001407e0
[ 1013.649051] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1013.670724] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1013.692376] Process kworker/2:1 (pid: 112, threadinfo ffff88013fe0e000, task ffff88014020a6a0)
[ 1013.717028] Stack:
[ 1013.724131]  ffff88014780f570 ffff880145cc0000 0000400000004000 0000000000000000
[ 1013.745918]  cccccccccccccccc ffff88014780cca8 ffffffff811072bb ffffffff81651627
[ 1013.767870]  ffffffff8118f8a7 ffffffff811072bb ffffffff81f2b6c5 ffffffff81f11bdb
[ 1013.790021] Call Trace:
[ 1013.800701] Code: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a <e7> d7 64 81 ff ff ff ff 01 00 00 00 00 00 00 00 65 d9 64 81 ff
[ 1013.861443] RIP  [<ffff88014630a000>] 0xffff880146309fff
[ 1013.884466]  RSP <ffff88014780f408>
[ 1013.901507] CR2: 0000000000000002

The solution was to reuse the NMI functions that change the IDT table to make the debug
stack keep its current stack (in kernel mode) when hitting a breakpoint:

  call debug_stack_set_zero
  TRACE_IRQS_ON
  call debug_stack_reset

If the TRACE_IRQS_ON happens to hit a breakpoint then it will keep the current stack
and not crash the box.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

5963e317

x86: Allow nesting of the debug stack IDT setting · f8988175

Committed by Steven Rostedt on May 30, 2012

When the NMI handler runs, it checks if it preempted a debug handler
and if that handler is using the debug stack. If it is, it changes the
IDT table not to update the stack, otherwise it will reset the debug
stack and corrupt the debug handler it preempted.

Now that ftrace uses breakpoints to change functions from nops to
callers, many more places may hit a breakpoint. Unfortunately this
includes some of the calls that lockdep performs. Which causes issues
with the debug stack. It too needs to change the debug stack before
tracing (if called from the debug handler).

Allow the debug_stack_set_zero() and debug_stack_reset() to be nested
so that the debug handlers can take advantage of them too.

[ Used this_cpu_*() over __get_cpu_var() as suggested by H. Peter Anvin ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

f8988175
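
The nesting described in this commit message can be modelled with a use counter. The sketch below is a simplified single-CPU model under assumptions: the names `nest_count`, `alt_idt_active`, `model_set_zero` and `model_reset` are hypothetical, and a boolean flag stands in for the IDT switch that the real code performs with per-CPU state.

```c
#include <stdbool.h>

/*
 * Simplified single-CPU model. The names below are hypothetical; the real
 * code keeps per-CPU state and switches the IDT instead of flipping a flag.
 */
static int nest_count;        /* how many callers asked for "no stack switch" */
static bool alt_idt_active;   /* true while the alternate IDT would be loaded */

static void model_set_zero(void)
{
	nest_count++;
	alt_idt_active = true;
}

static void model_reset(void)
{
	if (nest_count == 0)
		return;                 /* unbalanced call: ignored in this model */
	if (--nest_count == 0)
		alt_idt_active = false; /* only the outermost reset restores */
}
```

With a counter, an inner set/reset pair issued from a debug handler no longer undoes the protection that an outer caller, such as the NMI handler, is still relying on.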

x86: Reset the debug_stack update counter · c0525a69

Committed by Steven Rostedt on May 30, 2012

When an NMI goes off and it sees that it preempted the debug stack,
to keep the debug stack safe, it changes the IDT to point to one that
does not modify the stack on breakpoint (to allow breakpoints in NMIs).

But the variable that gets set to know to undo it on exit never gets
cleared on exit. Thus every NMI will reset it on exit the first time
it is done even if it does not need to be reset.

[ Added H. Peter Anvin's suggestion to use this_cpu_read/write ]

Cc: <stable@vger.kernel.org> # v3.3
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

c0525a69

ftrace: Use breakpoint method to update ftrace caller · 8a4d0a68

Committed by Steven Rostedt on May 30, 2012

On boot up and module load, it is fine to modify the code directly,
without the use of breakpoints. This is because boot up modification
is done before SMP is initialized, thus the modification is serial,
and module load is done before the module executes.

But after that we must use a SMP safe method to modify running code.
Otherwise, if we are running the function tracer and update its
function (by starting off the stack tracer, or perf tracing)
the change of the function called by the ftrace trampoline is done
directly. If this is being executed on another CPU, that CPU may
take a GPF and crash the kernel.

The breakpoint method is used to change the nops at all the functions, but
the change of the ftrace callback handler itself was still using a
direct modification. If tracing was enabled and the function callback
was changed then another CPU could fault if it was currently calling
the original callback. This modification must use the breakpoint method
too.

Note, the direct method is still used for boot up and module load.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

8a4d0a68

ftrace: Synchronize variable setting with breakpoints · a192cd04

Committed by Steven Rostedt on May 30, 2012

When the function tracer starts modifying the code via breakpoints
it sets a variable (modifying_ftrace_code) to inform the breakpoint
handler to call the ftrace int3 code.

But there's no synchronization between setting this code and the
handler, thus it is possible for the handler to be called on another
CPU before it sees the variable. This will cause a kernel crash as
the int3 handler will not know what to do with it.

I originally added smp_mb()'s to force the visibility of the variable
but H. Peter Anvin suggested that I just make it atomic.

[ Added comments as suggested by Peter Zijlstra ]
Suggested-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

a192cd04

31 May, 2012 2 commits

x86/mce: Cleanup timer mess · 82f7af09

Committed by Thomas Gleixner on May 24, 2012

Use unsigned long for dealing with jiffies not int. Rename the
callback to something sensible. Use __this_cpu_read/write for
accessing per cpu data.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>

82f7af09

x86, mtrr: Fix a type overflow in range_to_mtrr func · 2da06af8

Committed by zhenzhong.duan on May 30, 2012

When booting on a Sun G5+ with 4T of memory, an overflow shows up in mtrr cleanup, as below.

*BAD*gran_size: 2G      chunk_size: 2G  num_reg: 10     lose cover RAM: -18014398505283592M

This is because 1<<31 is sign extended. Use an unsigned long constant to
fix it.  Useful for memory sizes larger than or equal to 4T.

-v2: Use 64bit constant instead of explicit type conversion as suggested
by Yinghai. Description updated too.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
Link: http://lkml.kernel.org/r/4FC5A77F.6060505@oracle.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

2da06af8
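
The sign-extension effect mentioned in this commit message can be reproduced in isolation. The sketch below is an illustration under assumptions: it does not quote the expression from range_to_mtrr(), the function names are hypothetical, and `INT32_MIN` is used to represent the bit pattern that a 32-bit signed `1<<31` yields on common compilers.

```c
#include <stdint.h>

/*
 * Illustration only; the expression in range_to_mtrr() is not quoted here.
 * INT32_MIN has the bit pattern 0x80000000, which is what a 32-bit signed
 * 1<<31 yields on common compilers.
 */
static uint64_t widen_signed_bit31(void)
{
	int32_t v = INT32_MIN;
	return (uint64_t)(int64_t)v;   /* sign extension fills the upper 32 bits */
}

static uint64_t widen_unsigned_bit31(void)
{
	return 1ULL << 31;             /* 64-bit constant: upper bits stay zero */
}
```

The first function returns 0xffffffff80000000 and the second returns 0x80000000. Size arithmetic done with the first value goes badly wrong, which is consistent with the huge negative "lose cover RAM" figure quoted in the message.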

30 May, 2012 2 commits

x86: print physical addresses consistently with other parts of kernel · 365811d6

Committed by Bjorn Helgaas on May 29, 2012

Print physical address info in a style consistent with the %pR style used
elsewhere in the kernel.  For example:

    -found SMP MP-table at [ffff8800000fce90] fce90
    +found SMP MP-table at [mem 0x000fce90-0x000fce9f] mapped at [ffff8800000fce90]
    -initial memory mapped : 0 - 20000000
    +initial memory mapped: [mem 0x00000000-0x1fffffff]
    -Base memory trampoline at [ffff88000009c000] 9c000 size 8192
    +Base memory trampoline [mem 0x0009c000-0x0009dfff] mapped at [ffff88000009c000]
    -SRAT: Node 0 PXM 0 0-80000000
    +SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

365811d6

x86: print e820 physical addresses consistently with other parts of kernel · 91eb0f67

Committed by Bjorn Helgaas on May 29, 2012

Print physical address info in a style consistent with the %pR style used
elsewhere in the kernel.  For example:

    -BIOS-provided physical RAM map:
    +e820: BIOS-provided physical RAM map:
    - BIOS-e820: 0000000000000100 - 000000000009e000 (usable)
    +BIOS-e820: [mem 0x0000000000000100-0x000000000009dfff] usable
    -Allocating PCI resources starting at 90000000 (gap: 90000000:6ed1c000)
    +e820: [mem 0x90000000-0xfed1bfff] available for PCI devices
    -reserve RAM buffer: 000000000009e000 - 000000000009ffff
    +e820: reserve RAM buffer [mem 0x0009e000-0x0009ffff]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

91eb0f67

25 May, 2012 1 commit

x86: hpet: Fix copy-and-paste mistake in earlier change · 1b38a3a1

Committed by Jan Beulich on May 25, 2012

This fixes an oversight in 396e2c6f
("x86: Clear HPET configuration registers on startup"), noticed by
Thomas Gleixner.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/4FBF7DA902000078000861EE@nat28.tlf.novell.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

1b38a3a1

24 May, 2012 4 commits

move key_repace_session_keyring() into tracehook_notify_resume() · a42c6ded

Committed by Al Viro on May 23, 2012

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a42c6ded

x86/mce: Add instruction recovery signatures to mce-severity table · 37c3459b

Committed by Tony Luck on May 10, 2012

Instruction recovery cases are very similar to the data recovery one
we already have. Just trade out for a new MCACOD value.
Signed-off-by: Tony Luck <tony.luck@intel.com>

37c3459b

x86/mce: Fix check for processor context when machine check was taken. · 875e2664

Committed by Tony Luck on May 23, 2012

Linus pointed out that there was no value in checking whether m->ip
was zero - because zero is a legitimate value.  If we have a reliable
(or faked in the VM86 case) "m->cs" we can use it to tell whether we
were in user mode or kernel when the machine check hit.
Reported-by: Linus Torvalds <torvalds@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>

875e2664
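
Using the saved code segment selector to tell user mode from kernel mode relies on the privilege bits in the selector. A minimal sketch, assuming a hypothetical helper name (`cs_is_user_mode`); the exact check used in the patch is not shown in the message:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * The two low bits of an x86 code segment selector hold the privilege
 * level; for a loaded CS this matches the ring the CPU was running in.
 * cs_is_user_mode() is a hypothetical name used for illustration.
 */
static bool cs_is_user_mode(uint16_t cs)
{
	return (cs & 3) == 3;
}
```

A kernel selector such as 0x10 has both low bits clear and is reported as kernel mode, while a selector such as 0x33 has both set and is reported as user mode. Unlike an instruction pointer of zero, no selector value is ambiguous under this test.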

MCE: Fix vm86 handling for 32bit mce handler · a129a7c8

Committed by Andi Kleen on Nov 19, 2010

When running on 32bit the mce handler could misinterpret
vm86 mode as ring 0. This can affect whether it does recovery
or not; it was possible to panic when recovery was actually
possible.

Fix this by always forcing vm86 to look like ring 3.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>

a129a7c8

23 May, 2012 1 commit

x86/mce: Fix 32-bit build · 80f03361

Committed by Borislav Petkov on May 22, 2012

Got bitten again by the BIT() macro:

 arch/x86/kernel/cpu/mcheck/mce.c: In function '__mcheck_cpu_apply_quirks':
 arch/x86/kernel/cpu/mcheck/mce.c:1453:6: warning: left shift count >= width of type
 arch/x86/kernel/cpu/mcheck/mce.c:1454:7: warning: left shift count >= width of type

Fix it already.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Frank Arnold <frank.arnold@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1337684026-19740-2-git-send-email-bp@amd64.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

80f03361

22 May, 2012 2 commits

new helper: sigsuspend() · 68f3f16d

Committed by Al Viro on May 21, 2012

guts of saved_sigmask-based sigsuspend/rt_sigsuspend.  Takes
kernel sigset_t *.

Open-coded instances replaced with calling it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

68f3f16d

x86, printk: Add missing KERN_CONT to NMI selftest · 29d679ff

Committed by Sasha Levin on May 08, 2012

Fix this behaviour:

----------------
| NMI testsuite:
--------------------
  remote IPI:
  ok  |

   local IPI:
  ok  |

Revealed due to a new modification to printk().
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Link: http://lkml.kernel.org/r/1336492573-17530-3-git-send-email-levinsasha928@gmail.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

29d679ff

21 May, 2012 2 commits

X86: integrate CMA with DMA-mapping subsystem · 0a2b9a6e

Committed by Marek Szyprowski on Dec 29, 2011

This patch adds support for CMA to dma-mapping subsystem for x86
architecture that uses common pci-dma/pci-nommu implementation. This
allows testing CMA on KVM/QEMU and a lot of common x86 boxes.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
CC: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>

0a2b9a6e

x86/pci-calgary_64.c: Remove obsoleted simple_strtoul() usage · 74bc4917

Committed by Shuah Khan on May 20, 2012

Change calgary_parse_options() to call kstrtoul() instead of
calling obsoleted simple_strtoul().
Signed-off-by: Shuah Khan <shuahkhan@gmail.com>
Acked-by: Muli Ben-Yehuda <muli@cs.technion.ac.il>
Cc: jdmason@kudzu.us
Link: http://lkml.kernel.org/r/1337556268.3126.5.camel@lorien2
Signed-off-by: Ingo Molnar <mingo@kernel.org>

74bc4917
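
The practical difference between a lenient parser and a strict one can be sketched in userspace. This is an approximation under assumptions: `parse_ulong_strict` is a hypothetical name, it is built on the C library's `strtoul()`, and it is not the kernel's kstrtoul(), whose exact rules may differ in details.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Userspace approximation of strict parsing; parse_ulong_strict() is a
 * hypothetical name and this is not the kernel's kstrtoul().
 * Returns 0 on success, -EINVAL on malformed input, -ERANGE on overflow.
 */
static int parse_ulong_strict(const char *s, unsigned long *out)
{
	char *end;
	unsigned long v;

	if (!isdigit((unsigned char)s[0]))
		return -EINVAL;         /* rejects "", leading whitespace and signs */

	errno = 0;
	v = strtoul(s, &end, 10);
	if (errno == ERANGE)
		return -ERANGE;         /* value does not fit in unsigned long */
	if (*end != '\0')
		return -EINVAL;         /* trailing garbage such as "42abc" */

	*out = v;
	return 0;
}
```

A parser that returns an error code lets an option handler reject input like `42abc` instead of silently accepting the leading digits.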

18 May, 2012 4 commits

perf/x86: Update event scheduling constraints for AMD family 15h models · 5bcdf5e4

Committed by Robert Richter on May 18, 2012

This update is for newer family 15h cpu models from 0x02 to 0x1f.
Signed-off-by: Robert Richter <robert.richter@amd.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: stable@vger.kernel.org # v2.6.39+
Link: http://lkml.kernel.org/r/1337337642-1621-1-git-send-email-robert.richter@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

5bcdf5e4

x86/apic: Implement EIO micro-optimization · 0ab711ae

Committed by Michael S. Tsirkin on May 16, 2012

We know both register and value for eoi beforehand,
so there's no need to check it and no need to do math
to calculate the msr. Saves instructions/branches
on each EOI when using x2apic.

I looked at the objdump output to verify that the
generated code looks right and actually is shorter.

The real improvements will be on the KVM guest side
though, those come in a later patch.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: gleb@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/e019d1a125316f10d3e3a4b2f6bda41473f4fb72.1337184153.git.mst@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

0ab711ae

x86/apic: Add apic->eoi_write() callback · 2a43195d

Committed by Michael S. Tsirkin on May 16, 2012

Add eoi_write callback so that kvm can override
eoi accesses without touching the rest of the apic.
As a side-effect, this will enable a micro-optimization
for apics using msr.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: gleb@redhat.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/0df425d746c49ac2ecc405174df87752869629d2.1337184153.git.mst@redhat.com
[ tidied it up a bit ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

2a43195d

MCA: delete all remaining traces of microchannel bus support. · bb8187d3

Committed by Paul Gortmaker on May 17, 2012

Hardware with MCA bus is limited to 386 and 486 class machines
that are now 20+ years old and typically with less than 32MB
of memory.  A quick search on the internet, and you see that
even the MCA hobbyist/enthusiast community has lost interest
in the early 2000 era and never really even moved ahead from
the 2.4 kernels to the 2.6 series.

This deletes anything remaining related to CONFIG_MCA from core
kernel code and from the x86 architecture.  There is no point in
carrying this any further into the future.

One complication to watch for is inadvertently scooping up
stuff relating to machine check, since there is overlap in
the TLA name space (e.g. arch/x86/boot/mca.c).

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: x86@kernel.org
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

bb8187d3

17 May, 2012 5 commits

sched: Remove stale power aware scheduling remnants and dysfunctional knobs · 8e7fbcbc

Committed by Peter Zijlstra on Jan 09, 2012

It's been broken forever (i.e. it's not scheduling in a power
aware fashion), as reported by Suresh and others sending
patches, and nobody cares enough to fix it properly ...
so remove it to make space free for something better.

There's various problems with the code as it stands today, first
and foremost the user interface which is bound to topology
levels and has multiple values per level. This results in a
state explosion which the administrator or distro needs to
master and almost nobody does.

Furthermore large configuration state spaces aren't good, it
means the thing doesn't just work right because it's either
under so many impossible to meet constraints, or even if
there's an achievable state workloads have to be aware of
it precisely and can never meet it for dynamic workloads.

So pushing this kind of decision to user-space was a bad idea
even with a single knob - it's exponentially worse with knobs
on every node of the topology.

There is a proposal to replace the user interface with a single
3 state knob:

 sched_balance_policy := { performance, power, auto }

where 'auto' would be the preferred default which looks at things
like Battery/AC mode and possible cpufreq state or whatever the hw
exposes to show us power use expectations - but there's been no
progress on it in the past many months.

Aside from that, the actual implementation of the various knobs
is known to be broken. There have been sporadic attempts at
fixing things but these always stop short of reaching a mergeable
state.

Therefore this wholesale removal with the hopes of spurring
people who care to come forward once again and work on a
coherent replacement.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1326104915.2442.53.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>

8e7fbcbc

ftrace/x86: Have x86 ftrace use the ftrace_modify_all_code() · e4f5d544

Committed by Steven Rostedt on Apr 27, 2012

To remove duplicate code, have the ftrace arch_ftrace_update_code()
use the generic ftrace_modify_all_code(). This requires that the
default ftrace_replace_code() becomes a weak function so that an
arch may override it.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

e4f5d544

x86, fpu: drop the fpu state during thread exit · 1dcc8d7b

Committed by Suresh Siddha on May 16, 2012

There is no need to save any active fpu state to the task structure
memory if the task is dead. Just drop the state instead.

For example, this saved some 1770 xsave's during the system boot
of a two socket Xeon system.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1336692811-30576-4-git-send-email-suresh.b.siddha@intel.com
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

1dcc8d7b

x86, xsave: remove thread_has_fpu() bug check in __sanitize_i387_state() · d75f1b39

Committed by Suresh Siddha on May 16, 2012

Code paths like fork(), exit() and signal handling flush the fpu
state explicitly to the structures in memory.

BUG_ON() in __sanitize_i387_state() is checking that the fpu state
is not live any more. But for preempt kernels, task can be scheduled
out and in at any place and the preload_fpu logic during context switch
can make the fpu registers live again.

For example, consider a 64-bit Task which uses fpu frequently and as such
you will find its fpu_counter mostly non-zero. During its time slice, kernel
used fpu by doing kernel_fpu_begin/kernel_fpu_end(). After this, in the same
scheduling slice, task-A got a signal to handle. Then during the signal
setup path we got preempted when we are just before the sanitize_i387_state()
in arch/x86/kernel/xsave.c:save_i387_xstate(). And when we come back we
will have the fpu registers live that can hit the bug_on.

Similarly during core dump, other threads can context-switch in and out
(because of spurious wakeups while waiting for the coredump to finish in
kernel/exit.c:exit_mm()) and the main thread dumping core can run into this
bug when it finds some other thread with its fpu registers live on some other cpu.

So remove the paranoid check for now, even though it caught a bug in the
multi-threaded core dump case (fixed in the previous patch).
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1336692811-30576-3-git-send-email-suresh.b.siddha@intel.com
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

d75f1b39

fork: move the real prepare_to_copy() users to arch_dup_task_struct() · 55ccf3fe

Committed by Suresh Siddha on May 16, 2012

Historical prepare_to_copy() is mostly a no-op, duplicated for majority of
the architectures and the rest following the x86 model of flushing the extended
register state like fpu there.

Remove it and use the arch_dup_task_struct() instead.
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1336692811-30576-1-git-send-email-suresh.b.siddha@intel.com
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Chris Zankel <chris@zankel.net>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chen Liqin <liqin.chen@sunplusct.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>

55ccf3fe