提交 · 4648da7cb4079e263eaf4dcd3b10fdb2409d4ad6 · openeuler / Kernel

20 7月, 2012 8 次提交

xen: remove cast from HYPERVISOR_shared_info assignment · 4648da7c

由 Olaf Hering 提交于 7月 17, 2012

Both have type struct shared_info so no cast is needed.
Signed-off-by: NOlaf Hering <olaf@aepfle.de>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

4648da7c

xen/x86: avoid updating TLS descriptors if they haven't changed · 1c32cdc6

由 David Vrabel 提交于 7月 09, 2012

When switching tasks in a Xen PV guest, avoid updating the TLS
descriptors if they haven't changed.  This improves the speed of
context switches by almost 10% as much of the time the descriptors are
the same or only one is different.

The descriptors written into the GDT by Xen are modified from the
values passed in the update_descriptor hypercall so we keep shadow
copies of the three TLS descriptors to compare against.

lmbench3 test     Before  After  Improvement
--------------------------------------------
lat_ctx -s 32 24   7.19    6.52  9%
lat_pipe          12.56   11.66  7%
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

1c32cdc6

xen/x86: add desc_equal() to compare GDT descriptors · 59290362

由 David Vrabel 提交于 7月 09, 2012

Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
[v1: Moving it to the Xen file]
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

59290362

xen/mm: zero PTEs for non-present MFNs in the initial page table · 66a27dde

由 David Vrabel 提交于 7月 09, 2012

When constructing the initial page tables, if the MFN for a usable PFN
is missing in the p2m then that frame is initially ballooned out.  In
this case, zero the PTE (as in decrease_reservation() in
drivers/xen/balloon.c).

This is obviously safe instead of having an valid PTE with an MFN of
INVALID_P2M_ENTRY (~0).
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

66a27dde

xen/mm: do direct hypercall in xen_set_pte() if batching is unavailable · d095d43e

由 David Vrabel 提交于 7月 09, 2012

In xen_set_pte() if batching is unavailable (because the caller is in
an interrupt context such as handling a page fault) it would fall back
to using native_set_pte() and trapping and emulating the PTE write.

On 32-bit guests this requires two traps for each PTE write (one for
each dword of the PTE).  Instead, do one mmu_update hypercall
directly.

During construction of the initial page tables, continue to use
native_set_pte() because most of the PTEs being set are in writable
and unpinned pages (see phys_pmd_init() in arch/x86/mm/init_64.c) and
using a hypercall for this is very expensive.

This significantly improves page fault performance in 32-bit PV
guests.

lmbench3 test  Before    After     Improvement
----------------------------------------------
lat_pagefault  3.18 us   2.32 us   27%
lat_proc fork  356 us    313.3 us  11%
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

d095d43e

xen/mce: Register native mce handler as vMCE bounce back point · 05e36006

由 Liu, Jinsong 提交于 6月 07, 2012

When Xen hypervisor inject vMCE to guest, use native mce handler
to handle it
Signed-off-by: NKe, Liping <liping.ke@intel.com>
Signed-off-by: NJiang, Yunhong <yunhong.jiang@intel.com>
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: NLiu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

05e36006

x86, MCE, AMD: Adjust initcall sequence for xen · a8fccdb0

由 Liu, Jinsong 提交于 6月 07, 2012

there are 3 funcs which need to be _initcalled in a logic sequence:
1. xen_late_init_mcelog
2. mcheck_init_device
3. threshold_init_device

xen_late_init_mcelog must register xen_mce_chrdev_device before
native mce_chrdev_device registration if running under xen platform;

mcheck_init_device should be inited before threshold_init_device to
initialize mce_device, otherwise a a NULL ptr dereference will cause panic.

so we use following _initcalls
1. device_initcall(xen_late_init_mcelog);
2. device_initcall_sync(mcheck_init_device);
3. late_initcall(threshold_init_device);

when running under xen, the initcall order is 1,2,3;
on baremetal, we skip 1 and we do only 2 and 3.
Acked-and-tested-by: NBorislav Petkov <bp@amd64.org>
Suggested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NLiu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

a8fccdb0

xen/mce: Add mcelog support for Xen platform · cef12ee5

由 Liu, Jinsong 提交于 6月 07, 2012

When MCA error occurs, it would be handled by Xen hypervisor first,
and then the error information would be sent to initial domain for logging.

This patch gets error information from Xen hypervisor and convert
Xen format error into Linux format mcelog. This logic is basically
self-contained, not touching other kernel components.

By using tools like mcelog tool users could read specific error information,
like what they did under native Linux.

To test follow directions outlined in Documentation/acpi/apei/einj.txt
Acked-and-tested-by: NBorislav Petkov <borislav.petkov@amd.com>
Signed-off-by: NKe, Liping <liping.ke@intel.com>
Signed-off-by: NJiang, Yunhong <yunhong.jiang@intel.com>
Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: NLiu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

cef12ee5

15 6月, 2012 1 次提交

xen: mark local pages as FOREIGN in the m2p_override · b9e0d95c

由 Stefano Stabellini 提交于 5月 23, 2012

When the frontend and the backend reside on the same domain, even if we
add pages to the m2p_override, these pages will never be returned by
mfn_to_pfn because the check "get_phys_to_machine(pfn) != mfn" will
always fail, so the pfn of the frontend will be returned instead
(resulting in a deadlock because the frontend pages are already locked).

INFO: task qemu-system-i38:1085 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
qemu-system-i38 D ffff8800cfc137c0     0  1085      1 0x00000000
 ffff8800c47ed898 0000000000000282 ffff8800be4596b0 00000000000137c0
 ffff8800c47edfd8 ffff8800c47ec010 00000000000137c0 00000000000137c0
 ffff8800c47edfd8 00000000000137c0 ffffffff82213020 ffff8800be4596b0
Call Trace:
 [<ffffffff81101ee0>] ? __lock_page+0x70/0x70
 [<ffffffff81a0fdd9>] schedule+0x29/0x70
 [<ffffffff81a0fe80>] io_schedule+0x60/0x80
 [<ffffffff81101eee>] sleep_on_page+0xe/0x20
 [<ffffffff81a0e1ca>] __wait_on_bit_lock+0x5a/0xc0
 [<ffffffff81101ed7>] __lock_page+0x67/0x70
 [<ffffffff8106f750>] ? autoremove_wake_function+0x40/0x40
 [<ffffffff811867e6>] ? bio_add_page+0x36/0x40
 [<ffffffff8110b692>] set_page_dirty_lock+0x52/0x60
 [<ffffffff81186021>] bio_set_pages_dirty+0x51/0x70
 [<ffffffff8118c6b4>] do_blockdev_direct_IO+0xb24/0xeb0
 [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
 [<ffffffff8118ca95>] __blockdev_direct_IO+0x55/0x60
 [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
 [<ffffffff811e91c8>] ext3_direct_IO+0xf8/0x390
 [<ffffffff811e71a0>] ? ext3_get_blocks_handle+0xe00/0xe00
 [<ffffffff81004b60>] ? xen_mc_flush+0xb0/0x1b0
 [<ffffffff81104027>] generic_file_aio_read+0x737/0x780
 [<ffffffff813bedeb>] ? gnttab_map_refs+0x15b/0x1e0
 [<ffffffff811038f0>] ? find_get_pages+0x150/0x150
 [<ffffffff8119736c>] aio_rw_vect_retry+0x7c/0x1d0
 [<ffffffff811972f0>] ? lookup_ioctx+0x90/0x90
 [<ffffffff81198856>] aio_run_iocb+0x66/0x1a0
 [<ffffffff811998b8>] do_io_submit+0x708/0xb90
 [<ffffffff81199d50>] sys_io_submit+0x10/0x20
 [<ffffffff81a18d69>] system_call_fastpath+0x16/0x1b

The explanation is in the comment within the code:

We need to do this because the pages shared by the frontend
(xen-blkfront) can be already locked (lock_page, called by
do_read_cache_page); when the userspace backend tries to use them
with direct_IO, mfn_to_pfn returns the pfn of the frontend, so
do_blockdev_direct_IO is going to try to lock the same pages
again resulting in a deadlock.

A simplified call graph looks like this:

pygrub                          QEMU
-----------------------------------------------
do_read_cache_page              io_submit
  |                              |
lock_page                       ext3_direct_IO
                                 |
                                bio_add_page
                                 |
                                lock_page

Internally the xen-blkback uses m2p_add_override to swizzle (temporarily)
a 'struct page' to have a different MFN (so that it can point to another
guest). It also can easily find out whether another pfn corresponding
to the mfn exists in the m2p, and can set the FOREIGN bit
in the p2m, making sure that mfn_to_pfn returns the pfn of the backend.

This allows the backend to perform direct_IO on these pages, but as a
side effect prevents the frontend from using get_user_pages_fast on
them while they are being shared with the backend.
Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>

b9e0d95c

14 6月, 2012 1 次提交

x86: dma-mapping: fix broken allocation when dma_mask has been provided · c080e26e

由 Marek Szyprowski 提交于 6月 14, 2012

Commit 0a2b9a6e ("X86: integrate CMA with DMA-mapping subsystem")
broke memory allocation with dma_mask. This patch fixes possible kernel
ops caused by lack of resetting page variable when jumping to 'again' label.
Reported-by: NKonrad Rzeszutek Wilk <konrad@darnok.org>
Signed-off-by: NMarek Szyprowski <m.szyprowski@samsung.com>
Acked-by: NMichal Nazarewicz <mina86@mina86.com>

c080e26e

13 6月, 2012 2 次提交

perf/x86: Fix broken LBR fixup code · 25f42985

由 Stephane Eranian 提交于 6月 11, 2012

I noticed that the LBR fixups were not working anymore
on programs where they used to. I tracked this down to
a recent change to copy_from_user_nmi():

 db0dc75d ("perf/x86: Check user address explicitly in copy_from_user_nmi()")

This commit added a call to __range_not_ok() to the
copy_from_user_nmi() routine. The problem is that the logic
of the test must be reversed. __range_not_ok() returns 0 if the
range is VALID. We want to return early from copy_from_user_nmi()
if the range is NOT valid.
Signed-off-by: NStephane Eranian <eranian@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: NArun Sharma <asharma@fb.com>
Link: http://lkml.kernel.org/r/20120611134426.GA7542@quadSigned-off-by: NIngo Molnar <mingo@kernel.org>

25f42985

x86/smp: Fix topology checks on AMD MCM CPUs · 161270fc

由 Borislav Petkov 提交于 6月 06, 2012

The warning below triggers on AMD MCM packages because physical package
IDs on the cores of a _physical_ socket are the same. I.e., this field
says which CPUs belong to the same physical package.

However, the same two CPUs belong to two different internal, i.e.
"logical" nodes in the same physical socket which is reflected in the
CPU-to-node map on x86 with NUMA.

Which makes this check wrong on the above topologies so circumvent it.

[    0.444413] Booting Node   0, Processors  #1 #2 #3 #4 #5 Ok.
[    0.461388] ------------[ cut here ]------------
[    0.465997] WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81()
[    0.473960] Hardware name: Dinar
[    0.477170] sched: CPU #6's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
[    0.486860] Booting Node   1, Processors  #6
[    0.491104] Modules linked in:
[    0.494141] Pid: 0, comm: swapper/6 Not tainted 3.4.0+ #1
[    0.499510] Call Trace:
[    0.501946]  [<ffffffff8144bf92>] ? topology_sane.clone.1+0x6e/0x81
[    0.508185]  [<ffffffff8102f1fc>] warn_slowpath_common+0x85/0x9d
[    0.514163]  [<ffffffff8102f2b7>] warn_slowpath_fmt+0x46/0x48
[    0.519881]  [<ffffffff8144bf92>] topology_sane.clone.1+0x6e/0x81
[    0.525943]  [<ffffffff8144c234>] set_cpu_sibling_map+0x251/0x371
[    0.532004]  [<ffffffff8144c4ee>] start_secondary+0x19a/0x218
[    0.537729] ---[ end trace 4eaa2a86a8e2da22 ]---
[    0.628197]  #7 #8 #9 #10 #11 Ok.
[    0.807108] Booting Node   3, Processors  #12 #13 #14 #15 #16 #17 Ok.
[    0.897587] Booting Node   2, Processors  #18 #19 #20 #21 #22 #23 Ok.
[    0.917443] Brought up 24 CPUs

We ran a topology sanity check test we have here on it and
it all looks ok... hopefully :).
Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120529135442.GE29157@aftab.osrc.amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

161270fc

12 6月, 2012 1 次提交

x86: kvmclock: remove check_and_clear_guest_paused warning · e32025a5

由 Marcelo Tosatti 提交于 6月 11, 2012

CPU offline path calls the hrtimer interrupt handler with interrupts
disabled, without touching preempt_count, triggering this warning.

Remove the warning since it is supposed to be used from hrtimer
interrupt context only.
Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>

e32025a5

11 6月, 2012 1 次提交

x86/mm: Fix some kernel-doc warnings · 9efc31b8

由 Wanpeng Li 提交于 6月 10, 2012

Fix kernel-doc warnings in arch/x86/mm/ioremap.c and
arch/x86/mm/pageattr.c, just like this one:

  Warning(arch/x86/mm/ioremap.c:204):
     No description found for parameter 'phys_addr'
  Warning(arch/x86/mm/ioremap.c:204):
     Excess function parameter 'offset' description in 'ioremap_nocache'
Signed-off-by: NWanpeng Li <liwp@linux.vnet.ibm.com>
Cc: Gavin Shan <shangw@linux.vnet.ibm.com>
Cc: Wanpeng Li <liwp.linux@gmail.com>
Link: http://lkml.kernel.org/r/1339296652-2935-1-git-send-email-liwp.linux@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

9efc31b8

10 6月, 2012 1 次提交

x86, um: Correct syscall table type attributes breaking gcc 4.8 · 9271b0b4

由 Martin Pelikan 提交于 6月 09, 2012

The latest GCC 4.8 does some more checking on type attributes that
break the build for ARCH=um -> fill them in. Specifically, the
"asmlinkage" attributes is now tested for consistency.
Signed-off-by: NMartin Pelikan <pelikan@storkhole.cz>
Link: http://lkml.kernel.org/r/1339269731-10772-1-git-send-email-pelikan@storkhole.czAcked-by: NRichard Weinberger <richard@nod.at>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

9271b0b4

08 6月, 2012 4 次提交

x86/nmi: Fix section mismatch warnings on 32-bit · eeaaa96a

由 Don Zickus 提交于 6月 06, 2012

It was reported that compiling for 32-bit caused a bunch of
section mismatch warnings:

 VDSOSYM arch/x86/vdso/vdso32-syms.lds
  LD      arch/x86/vdso/built-in.o
  LD      arch/x86/built-in.o

 WARNING: arch/x86/built-in.o(.data+0x5af0): Section mismatch in
 reference from the variable test_nmi_ipi_callback_na.10451 to
 the function .init.text:test_nmi_ipi_callback() [...]

 WARNING: arch/x86/built-in.o(.data+0x5b04): Section mismatch in
 reference from the variable nmi_unk_cb_na.10399 to the function
 .init.text:nmi_unk_cb() The variable nmi_unk_cb_na.10399
 references the function __init nmi_unk_cb() [...]

Both of these are attributed to the internal representation of
the nmiaction struct created during register_nmi_handler.  The
reason for this is that those structs are not defined in the
init section whereas the rest of the code in nmi_selftest.c is.

To resolve this, I created a new #define,
register_nmi_handler_initonly, that tags the struct as
__initdata to resolve the mismatch.  This #define should only be
used in rare situations where the register/unregister is called
during init of the kernel.

Big thanks to Jan Beulich for decoding this for me as I didn't
have a clue what was going on.
Reported-by: NWitold Baryluk <baryluk@smp.if.uj.edu.pl>
Tested-by: NWitold Baryluk <baryluk@smp.if.uj.edu.pl>
Cc: Jan Beulich <JBeulich@suse.com>
Signed-off-by: NDon Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/r/1338991542-23000-1-git-send-email-dzickus@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

eeaaa96a

x86/uv: Fix UV2 BAU legacy mode · d5d2d2ee

由 Cliff Wickman 提交于 6月 07, 2012

The SGI Altix UV2 BAU (Broadcast Assist Unit) as used for
tlb-shootdown (selective broadcast mode) always uses UV2
broadcast descriptor format. There is no need to clear the
'legacy' (UV1) mode, because the hardware always uses UV2 mode
for selective broadcast.

But the BIOS uses general broadcast and legacy mode, and the
hardware pays attention to the legacy mode bit for general
broadcast. So the kernel must not clear that mode bit.
Signed-off-by: NCliff Wickman <cpw@sgi.com>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/r/E1SccoO-0002Lh-Cb@eag09.americas.sgi.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

d5d2d2ee

x86/mm: Only add extra pages count for the first memory range during... · bd2753b2

由 Yinghai Lu 提交于 6月 06, 2012

x86/mm: Only add extra pages count for the first memory range during pre-allocation early page table space

Robin found this regression:

| I just tried to boot an 8TB system.  It fails very early in boot with:
| Kernel panic - not syncing: Cannot find space for the kernel page tables

git bisect commit 722bc6b1.

A git revert of that commit does boot past that point on the 8TB
configuration.

That commit will add up extra pages for all memory range even
above 4g.

Try to limit that extra page count adding to first entry only.
Bisected-by: NRobin Holt <holt@sgi.com>
Tested-by: NRobin Holt <holt@sgi.com>
Signed-off-by: NYinghai Lu <yinghai@kernel.org>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/CAE9FiQUj3wyzQxtq9yzBNc9u220p8JZ1FYHG7t%3DMOzJ%3D9BZMYA@mail.gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

bd2753b2

x86, efi stub: Add .reloc section back into image · 743628e8

由 Jordan Justen 提交于 6月 07, 2012

Some UEFI firmware will not load a .efi with a .reloc section
with a size of 0.

Therefore, we create a .efi image with 4 main areas and 3 sections.
1. PE/COFF file header
2. .setup section (covers all setup code following the first sector)
3. .reloc section (contains 1 dummy reloc entry, created in build.c)
4. .text section (covers the remaining kernel image)

To make room for the new .setup section data, the header
bugger_off_msg had to be shortened.
Reported-by: NHenrik Rydberg <rydberg@euromail.se>
Signed-off-by: NJordan Justen <jordan.l.justen@intel.com>
Link: http://lkml.kernel.org/r/1339085121-12760-1-git-send-email-jordan.l.justen@intel.comTested-by: NLee G Rosenbaum <lee.g.rosenbaum@intel.com>
Tested-by: NHenrik Rydberg <rydberg@euromail.se>
Cc: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: NH. Peter Anvin <hpa@zytor.com>

743628e8

06 6月, 2012 17 次提交

perf/x86: Check user address explicitly in copy_from_user_nmi() · db0dc75d

由 Arun Sharma 提交于 4月 20, 2012

Signed-off-by: NArun Sharma <asharma@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1334961696-19580-5-git-send-email-asharma@fb.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

db0dc75d

perf/x86: Check if user fp is valid · bc6ca7b3

由 Arun Sharma 提交于 4月 20, 2012

Signed-off-by: NArun Sharma <asharma@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1334961696-19580-4-git-send-email-asharma@fb.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

bc6ca7b3

perf/x86: Allow multiple stacks · 302fa4b5

由 Arun Sharma 提交于 4月 20, 2012

Without this patch, applications with two different stack
regions (eg: native stack vs JIT stack) get truncated
callchains even when RBP chaining is present. GDB shows proper
stack traces and the frame pointer chaining is intact.

This patch disables the (fp < RSP) check, hoping that other checks
in the code save the day for us. In our limited testing, this
didn't seem to break anything.

In the long term, we could potentially have userspace advise
the kernel on the range of valid stack addresses, so we don't
spend a lot of time unwinding from bogus addresses.
Signed-off-by: NArun Sharma <asharma@fb.com>
CC: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1334961696-19580-2-git-send-email-asharma@fb.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

302fa4b5

perf/x86: Update SNB PEBS constraints · 8440ccb4

由 Peter Zijlstra 提交于 6月 05, 2012

Afaict there's no need to (incompletely) iterate the
MEM_UOPS_RETIRED.* umask state.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1338884803.28282.153.camel@twinsSigned-off-by: NIngo Molnar <mingo@kernel.org>

8440ccb4

perf/x86: Enable/Add IvyBridge hardware support · b6db437b

由 Peter Zijlstra 提交于 6月 05, 2012

Implement rudimentary IVB perf support. The SDM states its identical
to SNB with exception of the exact event tables, but a quick look
suggests they're similar enough.

Also mark SNB-EP as broken for now.
Requested-and-tested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1338884803.28282.153.camel@twinsSigned-off-by: NIngo Molnar <mingo@kernel.org>

b6db437b

perf/x86: Implement cycles:p for SNB/IVB · cccb9ba9

由 Peter Zijlstra 提交于 6月 05, 2012

Now that there's finally a chip with working PEBS (IvyBridge), we can
enable the hardware and implement cycles:p for SNB/IVB.

Cc: Stephane Eranian <eranian@google.com>
Requested-and-tested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1338884803.28282.153.camel@twinsSigned-off-by: NIngo Molnar <mingo@kernel.org>

cccb9ba9

perf/x86: Fix Intel shared extra MSR allocation · b430f7c4

由 Peter Zijlstra 提交于 6月 05, 2012

Zheng Yan reported that event group validation can wreck event state
when Intel extra_reg allocation changes event state.

Validation shouldn't change any persistent state. Cloning events in
validate_{event,group}() isn't really pretty either, so add a few
special cases to avoid modifying the event state.

The code is restructured to minimize the special case impact.
Reported-by: NZheng Yan <zheng.z.yan@linux.intel.com>
Acked-by: NStephane Eranian <eranian@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1338903031.28282.175.camel@twinsSigned-off-by: NIngo Molnar <mingo@kernel.org>

b430f7c4

sched/x86: Calculate booted cores after construction of sibling_mask · ceb1cbac

由 Kamalesh Babulal 提交于 5月 31, 2012

Commit 316ad248 ("sched/x86: Rewrite set_cpu_sibling_map()")
broke the booted_cores accounting.

The problem is that the booted_cores accounting needs all the
sibling links set up. So restore the second loop and add a comment as
to why its needed.

On qemu booted with -smp sockets=1,cores=2,threads=2;
Before:
 $ grep cores /proc/cpuinfo
 cpu cores       : 2
 cpu cores       : 1
 cpu cores       : 4
 cpu cores       : 3

With the patch:
 $ grep cores /proc/cpuinfo
 cpu cores       : 2
 cpu cores       : 2
 cpu cores       : 2
 cpu cores       : 2
Reported-by: NPrarit Bhargava <prarit@redhat.com>
Reported-by: NBorislav Petkov <bp@amd64.org>
Signed-off-by: NKamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20120531073738.GH7511@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

ceb1cbac

x86/ioapic: Fix NULL pointer dereference on CPU hotplug after disabling irqs · f6175f5b

由 Tomoki Sekiyama 提交于 5月 28, 2012

In current Linux, percpu variable `vector_irq' is not cleared on
offlined cpus while disabling devices' irqs. If the cpu that has
the disabled irqs in vector_irq is hotplugged,
__setup_vector_irq() hits invalid irq vector and may crash.

This bug can be reproduced as following;

  # echo 0 > /sys/devices/system/cpu/cpu7/online
  # modprobe -r some_driver_using_interrupts      # vector_irq@cpu7 uncleared
  # echo 1 > /sys/devices/system/cpu/cpu7/online  # kernel may crash

This patch fixes this bug by clearing vector_irq in
__clear_irq_vector() even if the cpu is offlined.
Signed-off-by: NTomoki Sekiyama <tomoki.sekiyama.qu@hitachi.com>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: ltc-kernel@ml.yrl.intra.hitachi.co.jp
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Alexander Gordeev <agordeev@redhat.com>
Link: http://lkml.kernel.org/r/4FC340BE.7080101@hitachi.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

f6175f5b

x86/reboot: Fix a warning message triggered by stop_other_cpus() · 55c844a4

由 Feng Tang 提交于 5月 30, 2012

When rebooting our 24 CPU Westmere servers with 3.4-rc6, we
always see this warning msg:

Restarting system.
machine restart
------------[ cut here ]------------
WARNING: at arch/x86/kernel/smp.c:125
native_smp_send_reschedule+0x74/0xa7() Hardware name: X8DTN
Modules linked in: igb [last unloaded: scsi_wait_scan]
Pid: 1, comm: systemd-shutdow Not tainted 3.4.0-rc6+ #22
Call Trace:
 <IRQ>  [<ffffffff8102a41f>] warn_slowpath_common+0x7e/0x96
 [<ffffffff8102a44c>] warn_slowpath_null+0x15/0x17
 [<ffffffff81018cf7>] native_smp_send_reschedule+0x74/0xa7
 [<ffffffff810561c1>] trigger_load_balance+0x279/0x2a6
 [<ffffffff81050112>] scheduler_tick+0xe0/0xe9
 [<ffffffff81036768>] update_process_times+0x60/0x70
 [<ffffffff81062f2f>] tick_sched_timer+0x68/0x92
 [<ffffffff81046e33>] __run_hrtimer+0xb3/0x13c
 [<ffffffff81062ec7>] ? tick_nohz_handler+0xd0/0xd0
 [<ffffffff810474f2>] hrtimer_interrupt+0xdb/0x198
 [<ffffffff81019a35>] smp_apic_timer_interrupt+0x81/0x94
 [<ffffffff81655187>] apic_timer_interrupt+0x67/0x70
 <EOI>  [<ffffffff8101a3c4>] ? default_send_IPI_mask_allbutself_phys+0xb4/0xc4
 [<ffffffff8101c680>] physflat_send_IPI_allbutself+0x12/0x14
 [<ffffffff81018db4>] native_nmi_stop_other_cpus+0x8a/0xd6
 [<ffffffff810188ba>] native_machine_shutdown+0x50/0x67
 [<ffffffff81018926>] machine_shutdown+0xa/0xc
 [<ffffffff8101897e>] native_machine_restart+0x20/0x32
 [<ffffffff810189b0>] machine_restart+0xa/0xc
 [<ffffffff8103b196>] kernel_restart+0x47/0x4c
 [<ffffffff8103b2e6>] sys_reboot+0x13e/0x17c
 [<ffffffff8164e436>] ? _raw_spin_unlock_bh+0x10/0x12
 [<ffffffff810fcac9>] ? bdi_queue_work+0xcf/0xd8
 [<ffffffff810fe82f>] ? __bdi_start_writeback+0xae/0xb7
 [<ffffffff810e0d64>] ? iterate_supers+0xa3/0xb7
 [<ffffffff816547a2>] system_call_fastpath+0x16/0x1b
---[ end trace 320af5cb1cb60c5b ]---

The root cause seems to be the
default_send_IPI_mask_allbutself_phys() takes quite some time (I
measured it could be several ms) to complete sending NMIs to all
the other 23 CPUs, and for HZ=250/1000 system, the time is long
enough for a timer interrupt to happen, which will in turn
trigger to kick load balance to a stopped CPU and cause this
warning in native_smp_send_reschedule().

So disabling the local irq before stop_other_cpu() can fix this
problem (tested 25 times reboot ok), and it is fine as there
should be nobody caring the timer interrupt in such reboot
stage.

The latest 3.4 kernel slightly changes this behavior by sending
REBOOT_VECTOR first and only send NMI_VECTOR if the REBOOT_VCTOR
fails, and this patch is still needed to prevent the problem.
Signed-off-by: NFeng Tang <feng.tang@intel.com>
Acked-by: NDon Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20120530231541.4c13433a@feng-i7Signed-off-by: NIngo Molnar <mingo@kernel.org>

55c844a4

x86/intel/moorestown: Change intel_scu_devices_create() to __devinit · 7071f6b2

由 Sebastian Andrzej Siewior 提交于 5月 31, 2012

The allmodconfig hits:

 WARNING: vmlinux.o(.text+0x6553d): Section mismatch in
          reference from the function intel_scu_devices_create() to the
          function .devinit.text: spi_register_board_info()
	  [...]

This patch marks intel_scu_devices_create() as devinit because
it only calls a devinit function, spi_register_board_info().
Signed-off-by: NSebastian Andrzej Siewior <sebastian@breakpoint.cc>
Cc: Alan Cox <alan@linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Feng Tang <feng.tang@intel.com>
Link: http://lkml.kernel.org/r/20120531212025.GA8519@breakpoint.ccSigned-off-by: NIngo Molnar <mingo@kernel.org>

7071f6b2

x86/numa: Set numa_nodes_parsed at acpi_numa_memory_affinity_init() · 4af463d2

由 Yasuaki Ishimatsu 提交于 6月 04, 2012

When hot-adding a CPU, the system outputs following messages
since node_to_cpumask_map[2] was not allocated memory.

Booting Node 2 Processor 32 APIC 0xc0
node_to_cpumask_map[2] NULL
Pid: 0, comm: swapper/32 Tainted: G       A     3.3.5-acd #21
Call Trace:
 [<ffffffff81048845>] debug_cpumask_set_cpu+0x155/0x160
 [<ffffffff8105e28a>] ? add_timer_on+0xaa/0x120
 [<ffffffff8150665f>] numa_add_cpu+0x1e/0x22
 [<ffffffff815020bb>] identify_cpu+0x1df/0x1e4
 [<ffffffff815020d6>] identify_econdary_cpu+0x16/0x1d
 [<ffffffff81504614>] smp_store_cpu_info+0x3c/0x3e
 [<ffffffff81505263>] smp_callin+0x139/0x1be
 [<ffffffff815052fb>] start_secondary+0x13/0xeb

The reason is that the bit of node 2 was not set at
numa_nodes_parsed. numa_nodes_parsed is set by only
acpi_numa_processor_affinity_init /
acpi_numa_x2apic_affinity_init. Thus even if hot-added memory
which is same PXM as hot-added CPU is written in ACPI SRAT
Table, if the hot-added CPU is not written in ACPI SRAT table,
numa_nodes_parsed is not set.

But according to ACPI Spec Rev 5.0, it says about ACPI SRAT
table as follows: This optional table provides information that
allows OSPM to associate processors and memory ranges, including
ranges of memory provided by hot-added memory devices, with
system localities / proximity domains and clock domains.

It means that ACPI SRAT table only provides information for CPUs
present at boot time and for memory including hot-added memory.
So hot-added memory is written in ACPI SRAT table, but hot-added
CPU is not written in it. Thus numa_nodes_parsed should be set
by not only acpi_numa_processor_affinity_init /
acpi_numa_x2apic_affinity_init but also
acpi_numa_memory_affinity_init for the case.

Additionally, if system has cpuless memory node,
acpi_numa_processor_affinity_init /
acpi_numa_x2apic_affinity_init cannot set numa_nodes_parseds
since these functions cannot find cpu description for the node.
In this case, numa_nodes_parsed needs to be set by
acpi_numa_memory_affinity_init.
Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: liuj97@gmail.com
Cc: kosaki.motohiro@gmail.com
Link: http://lkml.kernel.org/r/4FCC2098.4030007@jp.fujitsu.com
[ merged it ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

4af463d2

x86/gart: Fix kmemleak warning · aff5a62d

由 Xiaotian Feng 提交于 6月 05, 2012

aperture_64.c now is using memblock, the previous
kmemleak_ignore() for alloc_bootmem() should be removed then.

Otherwise, with kmemleak enabled, kernel will throw warnings
like:

[    0.000000] kmemleak: Trying to color unknown object at 0xffff8800c4000000 as Black
[    0.000000] Pid: 0, comm: swapper/0 Not tainted 3.5.0-rc1-next-20120605+ #130
[    0.000000] Call Trace:
[    0.000000]  [<ffffffff811b27e6>] paint_ptr+0x66/0xc0
[    0.000000]  [<ffffffff816b90fb>] kmemleak_ignore+0x2b/0x60
[    0.000000]  [<ffffffff81ef7bc0>] kmemleak_init+0x217/0x2c1
[    0.000000]  [<ffffffff81ed2b97>] start_kernel+0x32d/0x3eb
[    0.000000]  [<ffffffff81ed25e4>] ? repair_env_string+0x5a/0x5a
[    0.000000]  [<ffffffff81ed2356>] x86_64_start_reservations+0x131/0x135
[    0.000000]  [<ffffffff81ed2120>] ? early_idt_handlers+0x120/0x120
[    0.000000]  [<ffffffff81ed245c>] x86_64_start_kernel+0x102/0x111
[    0.000000] kmemleak: Early log backtrace:
[    0.000000]    [<ffffffff816b911b>] kmemleak_ignore+0x4b/0x60
[    0.000000]    [<ffffffff81ee6a38>] gart_iommu_hole_init+0x3e7/0x547
[    0.000000]    [<ffffffff81edb20b>] pci_iommu_alloc+0x44/0x6f
[    0.000000]    [<ffffffff81ee81ad>] mem_init+0x19/0xec
[    0.000000]    [<ffffffff81ed2a54>] start_kernel+0x1ea/0x3eb
[    0.000000]    [<ffffffff81ed2356>] x86_64_start_reservations+0x131/0x135
[    0.000000]    [<ffffffff81ed245c>] x86_64_start_kernel+0x102/0x111
[    0.000000]    [<ffffffffffffffff>] 0xffffffffffffffff
Signed-off-by: NXiaotian Feng <dannyfeng@tencent.com>
Cc: Xiaotian Feng <xtfeng@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/1338922831-2847-1-git-send-email-xtfeng@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

aff5a62d

x86: mce: Add the dropped timer interval init back · 1a87fc1e

由 Thomas Gleixner 提交于 6月 06, 2012

commit 82f7af09 ("x86/mce: Cleanup timer mess) dropped the
initialization of the per cpu timer interval. Duh :(

Restore the previous behaviour.
Reported-by: NChen Gong <gong.chen@linux.intel.com>
Cc: bp@amd64.org
Cc: tony.luck@intel.com
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

1a87fc1e

x86/decoder: Fix bsr/bsf/jmpe decoding with operand-size prefix · 436d03fa

由 Masami Hiramatsu 提交于 6月 05, 2012

Fix the x86 instruction decoder to decode bsr/bsf/jmpe with
operand-size prefix (66h). This fixes the test case failure
reported by Linus, attached below.

bsf/bsr/jmpe have a special encoding. Opcode map in
Intel Software Developers Manual vol2 says they have
TZCNT/LZCNT variants if it has F3h prefix. However, there
is no information if it has other 66h or F2h prefixes.
Current instruction decoder supposes that those are
bad instructions, but it actually accepts at least
operand-size prefixes.

H. Peter Anvin further explains:

 " TZCNT/LZCNT are F3 + BSF/BSR exactly because the F2 and
   F3 prefixes have historically been no-ops with most instructions.
   This allows software to unconditionally use the prefixed versions
   and get TZCNT/LZCNT on the processors that have them if they don't
   care about the difference. "

This fixes errors reported by test_get_len:

  Warning: arch/x86/tools/test_get_len found difference at <em_bsf>:ffffffff81036d87
  Warning: ffffffff81036de5:	66 0f bc c2          	bsf    %dx,%ax
  Warning: objdump says 4 bytes, but insn_get_length() says 3
  Warning: arch/x86/tools/test_get_len found difference at <em_bsr>:ffffffff81036ea6
  Warning: ffffffff81036f04:	66 0f bd c2          	bsr    %dx,%ax
  Warning: objdump says 4 bytes, but insn_get_length() says 3
  Warning: decoded and checked 13298882 instructions with 2 warnings
Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
Reported-by: NPekka Enberg <penberg@kernel.org>
Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <yrl.pp-manager.tt@hitachi.com>
Link: http://lkml.kernel.org/r/20120604150911.22338.43296.stgit@localhost.localdomainSigned-off-by: NIngo Molnar <mingo@kernel.org>

436d03fa

x86/mce: Fix the MCE poll timer logic · 958fb3c5

由 Chen Gong 提交于 6月 05, 2012

In commit 82f7af09 ("x86/mce: Cleanup timer mess), Thomas just
forgot the "/ 2" there while cleaning up.
Signed-off-by: NChen Gong <gong.chen@linux.intel.com>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Cc: bp@amd64.org
Cc: tony.luck@intel.com
Link: http://lkml.kernel.org/r/1338863702-9245-1-git-send-email-gong.chen@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

958fb3c5

x86/mce: Fix the MCE poll timer logic · c2238f10

由 Chen Gong 提交于 6月 05, 2012

In commit 82f7af09 (x86/mce: Cleanup timer mess), Thomas just forgot
the "/ 2" there while cleaning up.
Signed-off-by: NChen Gong <gong.chen@linux.intel.com>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NTony Luck <tony.luck@intel.com>

c2238f10

02 6月, 2012 4 次提交

x86, x32, ptrace: Remove PTRACE_ARCH_PRCTL for x32 · bad1a753

由 H.J. Lu 提交于 5月 21, 2012

When I added x32 ptrace to 3.4 kernel, I also include PTRACE_ARCH_PRCTL
support for x32 GDB For ARCH_GET_FS/GS, it takes a pointer to int64. But
at user level, ARCH_GET_FS/GS takes a pointer to int32. So I have to add
x32 ptrace to glibc to handle it with a temporary int64 passed to kernel and
copy it back to GDB as int32. Roland suggested that PTRACE_ARCH_PRCTL
is obsolete and x32 GDB should use fs_base and gs_base fields of
user_regs_struct instead.

Accordingly, remove PTRACE_ARCH_PRCTL completely from the x32 code to
avoid possible memory overrun when pointer to int32 is passed to
kernel.

Link: http://lkml.kernel.org/r/CAMe9rOpDzHfS7NH7m1vmD9QRw8SSj4Sc%2BaNOgcWm_WJME2eRsQ@mail.gmail.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
Cc: <stable@vger.kernel.org> v3.4

bad1a753

x86: get rid of calling do_notify_resume() when returning to kernel mode · 44fbbb3d

由 Al Viro 提交于 4月 30, 2012

If we end up calling do_notify_resume() with !user_mode(refs), it
does nothing (do_signal() explicitly bails out and we can't get there
with TIF_NOTIFY_RESUME in such situations).  Then we jump to
resume_userspace_sig, which rechecks the same thing and bails out
to resume_kernel, thus breaking the loop.

It's easier and cheaper to check *before* calling do_notify_resume()
and bail out to resume_kernel immediately.  And kill the check in
do_signal()...

Note that on amd64 we can't get there with !user_mode() at all - asm
glue takes care of that.
Acked-and-reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

44fbbb3d

new helper: signal_delivered() · efee984c

由 Al Viro 提交于 4月 28, 2012

Does block_sigmask() + tracehook_signal_handler();  called when
sigframe has been successfully built.  All architectures converted
to it; block_sigmask() itself is gone now (merged into this one).

I'm still not too happy with the signature, but that's a separate
story (IMO we need a structure that would contain signal number +
siginfo + k_sigaction, so that get_signal_to_deliver() would fill one,
signal_delivered(), handle_signal() and probably setup...frame() -
take one).
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

efee984c

most of set_current_blocked() callers want SIGKILL/SIGSTOP removed from set · 77097ae5

由 Al Viro 提交于 4月 27, 2012

Only 3 out of 63 do not.  Renamed the current variant to __set_current_blocked(),
added set_current_blocked() that will exclude unblockable signals, switched
open-coded instances to it.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

77097ae5

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功