提交 · d974baa398f34393db76be45f7d4d04fbdbb4a0a · openanolis / cloud-kernel

19 10月, 2014 1 次提交

x86,kvm,vmx: Preserve CR4 across VM entry · d974baa3

由 Andy Lutomirski 提交于 10月 08, 2014

CR4 isn't constant; at least the TSD and PCE bits can vary.

TBH, treating CR0 and CR3 as constant scares me a bit, too, but it looks
like it's correct.

This adds a branch and a read from cr4 to each vm entry.  Because it is
extremely likely that consecutive entries into the same vcpu will have
the same host cr4 value, this fixes up the vmcs instead of restoring cr4
after the fact.  A subsequent patch will add a kernel-wide cr4 shadow,
reducing the overhead in the common case to just two memory reads and a
branch.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Gleb Natapov <gleb@kernel.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d974baa3

15 10月, 2014 1 次提交

x86: bpf_jit: fix two bugs in eBPF JIT compiler · e0ee9c12

由 Alexei Starovoitov 提交于 10月 10, 2014

1.
JIT compiler using multi-pass approach to converge to final image size,
since x86 instructions are variable length. It starts with large
gaps between instructions (so some jumps may use imm32 instead of imm8)
and iterates until total program size is the same as in previous pass.
This algorithm works only if program size is strictly decreasing.
Programs that use LD_ABS insn need additional code in prologue, but it
was not emitted during 1st pass, so there was a chance that 2nd pass would
adjust imm32->imm8 jump offsets to the same number of bytes as increase in
prologue, which may cause algorithm to erroneously decide that size converged.
Fix it by always emitting largest prologue in the first pass which
is detected by oldproglen==0 check.
Also change error check condition 'proglen != oldproglen' to fail gracefully.

2.
while staring at the code realized that 64-byte buffer may not be enough
when 1st insn is large, so increase it to 128 to avoid buffer overflow
(theoretical maximum size of prologue+div is 109) and add runtime check.

Fixes: 62258278 ("net: filter: x86: internal BPF JIT")
Reported-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Tested-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e0ee9c12

14 10月, 2014 7 次提交

kvm: ensure hard lockup detection is disabled by default · 9919e39a

由 Ulrich Obergfell 提交于 10月 13, 2014

Use watchdog_enable_hardlockup_detector() to set hard lockup detection's
default value to false.  It's risky to run this detection in a guest, as
false positives are easy to trigger, especially if the host is
overcommitted.
Signed-off-by: NUlrich Obergfell <uobergfe@redhat.com>
Signed-off-by: NAndrew Jones <drjones@redhat.com>
Signed-off-by: NDon Zickus <dzickus@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9919e39a

arch/x86/mm/numa.c: fix boot failure when all nodes are hotpluggable · bd5cfb89

由 Xishi Qiu 提交于 10月 13, 2014

If all the nodes are marked hotpluggable, alloc node data will fail.
Because __next_mem_range_rev() will skip the hotpluggable memory
regions.  numa_clear_kernel_node_hotplug() is called after alloc node
data.

numa_init()
    ...
    ret = init_func();  // this will mark hotpluggable flag from SRAT
    ...
    memblock_set_bottom_up(false);
    ...
    ret = numa_register_memblks(&numa_meminfo);  // this will alloc node data(pglist_data)
    ...
    numa_clear_kernel_node_hotplug();  // in case all the nodes are hotpluggable
    ...

numa_register_memblks()
    setup_node_data()
        memblock_find_in_range_node()
            __memblock_find_range_top_down()
                for_each_mem_range_rev()
                    __next_mem_range_rev()

This patch moves numa_clear_kernel_node_hotplug() into
numa_register_memblks(), clear kernel node hotpluggable flag before
alloc node data, then alloc node data won't fail even all the nodes
are hotpluggable.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: NXishi Qiu <qiuxishi@huawei.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bd5cfb89

arch/x86/kernel/cpu/common.c: fix unused symbol warning · e48510f4

由 Andrew Morton 提交于 10月 13, 2014

x86_64 allnoconfig:

arch/x86/kernel/cpu/common.c:968: warning: 'syscall32_cpu_init' defined but not used

Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e48510f4

x86: use optimized ioresource lookup in ioremap function · 906e36c5

由 Mike Travis 提交于 10月 13, 2014

Use the optimized ioresource lookup, "region_is_ram", for the ioremap
function.  If the region is not found, it falls back to the
"page_is_ram" function.  If it is found and it is RAM, then the usual
warning message is issued, and the ioremap operation is aborted.
Otherwise, the ioremap operation continues.
Signed-off-by: NMike Travis <travis@sgi.com>
Acked-by: NAlex Thorlton <athorlton@sgi.com>
Reviewed-by: NCliff Wickman <cpw@sgi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

906e36c5

kexec-bzimage64: fix sparse warnings · f8da964d

由 Vivek Goyal 提交于 10月 13, 2014

David Howells brought to my attention the mails generated by kbuild test
bot and following sparse warnings were present. This patch fixes these
warnings.

arch/x86/kernel/kexec-bzimage64.c:270:5: warning: symbol 'bzImage64_probe' was not declared. Should it be static?
arch/x86/kernel/kexec-bzimage64.c:328:6: warning: symbol 'bzImage64_load' was not declared. Should it be static?
arch/x86/kernel/kexec-bzimage64.c:517:5: warning: symbol 'bzImage64_cleanup' was not declared. Should it be static?
arch/x86/kernel/kexec-bzimage64.c:531:5: warning: symbol 'bzImage64_verify_sig' was not declared. Should it be static?
arch/x86/kernel/kexec-bzimage64.c:546:23: warning: symbol 'kexec_bzImage64_ops' was not declared. Should it be static?
Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
Reported-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f8da964d

kexec: check if crashk_res_low exists when exclude it from crash mem ranges · a2d6aa8f

由 Baoquan He 提交于 10月 13, 2014

Add a check if crashk_res_low exists just like GART region does.  If
crashk_res_low doesn't exist, calling exclude_mem_range is unnecessary.

Meanwhile, since crashk_res_low has been initialized at definition, it's
safe just use "if (crashk_low_res.end)" to check if it's exist.  And this
can make it consistent with other places of check.
Signed-off-by: NBaoquan He <bhe@redhat.com>
Acked-by: NVivek Goyal <vgoyal@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a2d6aa8f

arch/x86/purgatory/Makefile: try to use automatic variable in kexec purgatory makefile · 887f4f86

由 Baoquan He 提交于 10月 13, 2014

Make the Makefile of kexec purgatory be consistent with others in linux
src tree, and make it look generic and simple.
Signed-off-by: NBaoquan He <bhe@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

887f4f86

13 10月, 2014 1 次提交

um: remove csum_partial_copy_generic_i386 to clean up exception table · 67131230

由 Honggang Li 提交于 6月 05, 2014

arch/x86/um/checksum_32.S had been copy & paste from x86. When build
x86 uml, csum_partial_copy_generic_i386 mess up the exception table.
In fact, exception table dose not work in uml kernel.

And csum_partial_copy_generic_i386 never been called. So, delete it.
Signed-off-by: NHonggang Li <enjoymindful@gmail.com>
Signed-off-by: NRichard Weinberger <richard@nod.at>

67131230

10 10月, 2014 2 次提交

nosave: consolidate __nosave_{begin,end} in <asm/sections.h> · 7f8998c7

由 Geert Uytterhoeven 提交于 10月 09, 2014

The different architectures used their own (and different) declarations:

    extern __visible const void __nosave_begin, __nosave_end;
    extern const void __nosave_begin, __nosave_end;
    extern long __nosave_begin, __nosave_end;

Consolidate them using the first variant in <asm/sections.h>.
Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7f8998c7

mm: remove misleading ARCH_USES_NUMA_PROT_NONE · 6a33979d

由 Mel Gorman 提交于 10月 09, 2014

ARCH_USES_NUMA_PROT_NONE was defined for architectures that implemented
_PAGE_NUMA using _PROT_NONE.  This saved using an additional PTE bit and
relied on the fact that PROT_NONE vmas were skipped by the NUMA hinting
fault scanner.  This was found to be conceptually confusing with a lot of
implicit assumptions and it was asked that an alternative be found.

Commit c46a7c81 "x86: define _PAGE_NUMA by reusing software bits on the
PMD and PTE levels" redefined _PAGE_NUMA on x86 to be one of the swap PTE
bits and shrunk the maximum possible swap size but it did not go far
enough.  There are no architectures that reuse _PROT_NONE as _PROT_NUMA
but the relics still exist.

This patch removes ARCH_USES_NUMA_PROT_NONE and removes some unnecessary
duplication in powerpc vs the generic implementation by defining the types
the core NUMA helpers expected to exist from x86 with their ppc64
equivalent.  This necessitated that a PTE bit mask be created that
identified the bits that distinguish present from NUMA pte entries but it
is expected this will only differ between arches based on _PAGE_PROTNONE.
The naming for the generic helpers was taken from x86 originally but ppc64
has types that are equivalent for the purposes of the helper so they are
mapped instead of duplicating code.
Signed-off-by: NMel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6a33979d

09 10月, 2014 2 次提交

x86/build: Add arch/x86/purgatory/ make generated files to gitignore · 4ea48a01

由 Shuah Khan 提交于 9月 29, 2014

The following generated files are missing from gitignore
and show up in git status after x86_64 build. Add them
to gitignore.

    arch/x86/purgatory/kexec-purgatory.c
    arch/x86/purgatory/purgatory.ro
Signed-off-by: NShuah Khan <shuahkh@osg.samsung.com>
Link: http://lkml.kernel.org/r/1412016116-7213-1-git-send-email-shuahkh@osg.samsung.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

4ea48a01

handle suicide on late failure exits in execve() in search_binary_handler() · 19d860a1

由 Al Viro 提交于 5月 04, 2014

... rather than doing that in the guts of ->load_binary().
[updated to fix the bug spotted by Shentino - for SIGSEGV we really need
something stronger than send_sig_info(); again, better do that in one place]
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

19d860a1

08 10月, 2014 7 次提交

x86: Unwind-annotate thunk_32.S · f74954f0

由 Jan Beulich 提交于 9月 24, 2014

Signed-off-by: NJan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/542291CA0200007800038085@mail.emea.novell.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

f74954f0

x86: Fix section conflict for numachip · 2dee5c43

由 Andi Kleen 提交于 9月 24, 2014

A variable cannot be both __read_mostly and const. This
is a meaningless combination.

Just make it only const.

This fixes the LTO build with numachip enabled.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1411533139-25708-1-git-send-email-andi@firstfloor.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

2dee5c43

x86: Reject x32 executables if x32 ABI not supported · 0e6d3112

由 Ben Hutchings 提交于 9月 07, 2014

It is currently possible to execve() an x32 executable on an x86_64
kernel that has only ia32 compat enabled.  However all its syscalls
will fail, even _exit().  This usually causes it to segfault.

Change the ELF compat architecture check so that x32 executables are
rejected if we don't support the x32 ABI.
Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
Link: http://lkml.kernel.org/r/1410120305.6822.9.camel@decadent.org.uk
Cc: stable@vger.kernel.org
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

0e6d3112

x86: Add cpu_detect_cache_sizes to init_intel() add Quark legacy_cache() · aece118e

由 Bryan O'Donoghue 提交于 10月 07, 2014

Intel processors which don't report cache information via cpuid(2)
or cpuid(4) need quirk code in the legacy_cache_size callback to
report this data. For Intel that callback is is intel_size_cache().

This patch enables calling of cpu_detect_cache_sizes() inside of
init_intel() and hence the calling of the legacy_cache callback in
intel_size_cache(). Adding this call will ensure that PIII Tualatin
currently in intel_size_cache() and Quark SoC X1000 being added to
intel_size_cache() in this patch will report their respective cache
sizes.

This model of calling cpu_detect_cache_sizes() is consistent with
AMD/Via/Cirix/Transmeta and Centaur.

Also added is a string to idenitfy the Quark as Quark SoC X1000
giving better and more descriptive output via /proc/cpuinfo

Adding cpu_detect_cache_sizes to init_intel() will enable calling
of intel_size_cache() on Intel processors which currently no code
can reach. Therefore this patch will also re-enable reporting
of PIII Tualatin cache size information as well as add
Quark SoC X1000 support.

Comment text and cache flow logic suggested by Thomas Gleixner
Signed-off-by: NBryan O'Donoghue <pure.logic@nexus-software.ie>
Cc: davej@redhat.com
Cc: hmh@hmh.eng.br
Link: http://lkml.kernel.org/r/1412641189-12415-3-git-send-email-pure.logic@nexus-software.ieSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

aece118e

x86: Quark: Comment setup_arch() to document TLB/PGE bug · 2075244f

由 Bryan O'Donoghue 提交于 10月 07, 2014

Quark SoC X1000 advertises Page Global Enable for it's
Translation Lookaside Buffer via cpuid. The silicon does not
in fact support PGE and hence will not flush the TLB when CR4.PGE
is rewritten. The Quark documentation makes clear the necessity to
instead rewrite CR3 in order to flush any TLB entries, irrespective
of the state of CR4.PGE or an individual PTE.PGE

See Intel Quark Core DevMan_001.pdf section 6.4.11

In setup.c setup_arch() the code will load_cr3() and then do a
__flush_tlb_all().

On Quark the entire TLB will be flushed at the load_cr3().
The __flush_tlb_all() have no effect and can be safely ignored.

Later on in the boot process we switch off the flag for cpu_has_pge()
which means that subsequent calls to __flush_tlb_all() will
call __flush_tlb() not __flush_tlb_global() flushing the TLB in the
correct way via load_cr3() not CR4.PGE rewrite

This patch documents the behaviour of flushing the TLB for Quark in
setup_arch()

Comment text suggested by Thomas Gleixner
Signed-off-by: NBryan O'Donoghue <pure.logic@nexus-software.ie>
Cc: davej@redhat.com
Cc: hmh@hmh.eng.br
Link: http://lkml.kernel.org/r/1412641189-12415-2-git-send-email-pure.logic@nexus-software.ieSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

2075244f

x86: Improve cmpxchg8b_emu.S · 5f1d919a

由 Jan Beulich 提交于 9月 24, 2014

- don't include unneeded headers
- drop redundant entry point label
- complete unwind annotations
- use .L prefix on local labels to not clutter the symbol table
Signed-off-by: NJan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/5422917E0200007800038081@mail.emea.novell.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

5f1d919a

x86: Improve cmpxchg16b_emu.S · 3f635721

由 Jan Beulich 提交于 9月 24, 2014

- don't include unneeded headers
- don't open-code PER_CPU_VAR()
- drop redundant entry point label
- complete unwind annotations
- use .L prefix on local label to not clutter the symbol table
Signed-off-by: NJan Beulich <jbeulich@suse.com>
Link: http://lkml.kernel.org/r/542290BC020000780003807D@mail.emea.novell.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

3f635721

07 10月, 2014 1 次提交

x86_64, entry: Filter RFLAGS.NT on entry from userspace · 8c7aa698

由 Andy Lutomirski 提交于 10月 01, 2014

The NT flag doesn't do anything in long mode other than causing IRET
to #GP.  Oddly, CPL3 code can still set NT using popf.

Entry via hardware or software interrupt clears NT automatically, so
the only relevant entries are fast syscalls.

If user code causes kernel code to run with NT set, then there's at
least some (small) chance that it could cause trouble.  For example,
user code could cause a call to EFI code with NT set, and who knows
what would happen?  Apparently some games on Wine sometimes do
this (!), and, if an IRET return happens, they will segfault.  That
segfault cannot be handled, because signal delivery fails, too.

This patch programs the CPU to clear NT on entry via SYSCALL (both
32-bit and 64-bit, by my reading of the AMD APM), and it clears NT
in software on entry via SYSENTER.

To save a few cycles, this borrows a trick from Jan Beulich in Xen:
it checks whether NT is set before trying to clear it.  As a result,
it seems to have very little effect on SYSENTER performance on my
machine.

There's another minor bug fix in here: it looks like the CFI
annotations were wrong if CONFIG_AUDITSYSCALL=n.

Testers beware: on Xen, SYSENTER with NT set turns into a GPF.

I haven't touched anything on 32-bit kernels.

The syscall mask change comes from a variant of this patch by Anish
Bhatt.

Note to stable maintainers: there is no known security issue here.
A misguided program can set NT and cause the kernel to try and fail
to deliver SIGSEGV, crashing the program.  This patch fixes Far Cry
on Wine: https://bugs.winehq.org/show_bug.cgi?id=33275

Cc: <stable@vger.kernel.org>
Reported-by: NAnish Bhatt <anish@chelsio.com>
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/395749a5d39a29bd3e4b35899cf3a3c1340e5595.1412189265.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@zytor.com>

8c7aa698

06 10月, 2014 1 次提交

x86/xen: Set EFER.NX and EFER.SCE in PVH guests · a2ef5dc2

由 Mukesh Rathor 提交于 9月 10, 2014

This fixes two bugs in PVH guests:

  - Not setting EFER.NX means the NX bit in page table entries is
    ignored on Intel processors and causes reserved bit page faults on
    AMD processors.

  - After the Xen commit 7645640d6ff1 ("x86/PVH: don't set EFER_SCE for
    pvh guest") PVH guests are required to set EFER.SCE to enable the
    SYSCALL instruction.

Secondary VCPUs are started with pagetables with the NX bit set so
EFER.NX must be set before using any stack or data segment.
xen_pvh_cpu_early_init() is the new secondary VCPU entry point that
sets EFER before jumping to cpu_bringup_and_idle().
Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com>
Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

a2ef5dc2

03 10月, 2014 6 次提交

xen: eliminate scalability issues from initrd handling · d1e9abd6

由 Juergen Gross 提交于 9月 17, 2014

Size restrictions native kernels wouldn't have resulted from the initrd
getting mapped into the initial mapping. The kernel doesn't really need
the initrd to be mapped, so use infrastructure available in Xen to avoid
the mapping and hence the restriction.
Signed-off-by: NJuergen Gross <jgross@suse.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

d1e9abd6

locking,arch: Use ACCESS_ONCE() instead of cast to volatile in atomic_read() · 2291059c

由 Pranith Kumar 提交于 9月 23, 2014

Use the much more reader friendly ACCESS_ONCE() instead of the cast to volatile.
This is purely a stylistic change.
Signed-off-by: NPranith Kumar <bobby.prani@gmail.com>
Acked-by: NJesper Nilsson <jesper.nilsson@axis.com>
Acked-by: NHans-Christian Egtvedt <egtvedt@samfundet.no>
Acked-by: NMax Filippov <jcmvbkbc@gmail.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Link: http://lkml.kernel.org/r/1411482607-20948-1-git-send-email-bobby.prani@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

2291059c

perf/x86: Tone down kernel messages when the PMU check fails in a virtual environment · cc6cd47e

由 Wei Huang 提交于 9月 24, 2014

PMU checking can fail due to various reasons. On native machine, this
is mostly caused by faulty hardware and it is reasonable to use
KERN_ERR in reporting. However, when kernel is running on virtualized
environment, this checking can fail if virtual PMU is not supported
(e.g. KVM on AMD host). It is annoying to see an error message on
splash screen, even though we know such failure is benign on
virtualized environment.

This patch checks if the kernel is running in a virtualized environment.
If so, it will use KERN_INFO in reporting, which reduces the syslog
priority of them. This patch was tested successfully on KVM.
Signed-off-by: NWei Huang <wei@redhat.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1411617314-24659-1-git-send-email-wei@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

cc6cd47e

perf/x86/intel/uncore: Fix minor race in box set up · 4f971248

由 Andi Kleen 提交于 9月 22, 2014

I was looking for the trinity oops cause in the uncore driver.
(so far didn't found it)

However I found this tiny race: when a box is set up two threads on the
same CPU, they may be setting up the box in parallel (e.g. with kernel
preemption). This could lead to the reference count being increasing
too much. Always recheck there is no existing cpu reference inside the lock.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: eranian@google.com
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Link: http://lkml.kernel.org/r/1411424826-15629-1-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

4f971248

sched/x86: Fix up typo in topology detection · 728e5653

由 Dave Hansen 提交于 9月 30, 2014

Commit:

  cebf15eb ("x86, sched: Add new topology for multi-NUMA-node CPUs")

some code to try to detect the situation where we have a NUMA node
inside of the "DIE" sched domain.

It detected this by looking for cpus which match_die() but do not match
NUMA nodes via topology_same_node().

I wrote it up as:

	if (match_die(c, o) == !topology_same_node(c, o))

which actually seemed to work some of the time, albiet
accidentally.

It should have been doing an &&, not an ==.

This code essentially chopped off the "DIE" domain on one of
Andrew Morton's systems.  He reported that this patch fixed his
issue.
Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Reported-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Lan Tianyu <tianyu.lan@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Toshi Kani <toshi.kani@hp.com>
Link: http://lkml.kernel.org/r/20140930214546.FD481CFF@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

728e5653

kvm: do not handle APIC access page if in-kernel irqchip is not in use · f439ed27

由 Paolo Bonzini 提交于 10月 02, 2014

This fixes the following OOPS:

   loaded kvm module (v3.17-rc1-168-gcec26bc3)
   BUG: unable to handle kernel paging request at fffffffffffffffe
   IP: [<ffffffff81168449>] put_page+0x9/0x30
   PGD 1e15067 PUD 1e17067 PMD 0
   Oops: 0000 [#1] PREEMPT SMP
    [<ffffffffa063271d>] ? kvm_vcpu_reload_apic_access_page+0x5d/0x70 [kvm]
    [<ffffffffa013b6db>] vmx_vcpu_reset+0x21b/0x470 [kvm_intel]
    [<ffffffffa0658816>] ? kvm_pmu_reset+0x76/0xb0 [kvm]
    [<ffffffffa064032a>] kvm_vcpu_reset+0x15a/0x1b0 [kvm]
    [<ffffffffa06403ac>] kvm_arch_vcpu_setup+0x2c/0x50 [kvm]
    [<ffffffffa062e540>] kvm_vm_ioctl+0x200/0x780 [kvm]
    [<ffffffff81212170>] do_vfs_ioctl+0x2d0/0x4b0
    [<ffffffff8108bd99>] ? __mmdrop+0x69/0xb0
    [<ffffffff812123d1>] SyS_ioctl+0x81/0xa0
    [<ffffffff8112a6f6>] ? __audit_syscall_exit+0x1f6/0x2a0
    [<ffffffff817229e9>] system_call_fastpath+0x16/0x1b
   Code: c6 78 ce a3 81 4c 89 e7 e8 d9 80 ff ff 0f 0b 4c 89 e7 e8 8f f6 ff ff e9 fa fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <48> f7 07 00 c0 00 00 55 48 89 e5 75 1e 8b 47 1c 85 c0 74 27 f0
   RIP  [<ffffffff81193045>] put_page+0x5/0x50

when not using the in-kernel irqchip ("-machine kernel_irqchip=off"
with QEMU).  The fix is to make the same check in
kvm_vcpu_reload_apic_access_page that we already have
in vmx.c's vm_need_virtualize_apic_accesses().
Reported-by: NJan Kiszka <jan.kiszka@siemens.com>
Tested-by: NJan Kiszka <jan.kiszka@siemens.com>
Fixes: 4256f43fSigned-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f439ed27

02 10月, 2014 4 次提交

Revert "crypto: aesni - disable "by8" AVX CTR optimization" · 5cfed7b3

由 Mathias Krause 提交于 9月 28, 2014

This reverts commit 7da4b29d.

Now, that the issue is fixed, we can re-enable the code.
Signed-off-by: NMathias Krause <minipli@googlemail.com>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

5cfed7b3

crypto: aesni - remove unused defines in "by8" variant · e3b3bb5a

由 Mathias Krause 提交于 9月 28, 2014

The defines for xkey3, xkey6 and xkey9 are not used in the code. They're
probably left overs from merging the three source files for 128, 192 and
256 bit AES. They can safely be removed.
Signed-off-by: NMathias Krause <minipli@googlemail.com>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

e3b3bb5a

crypto: aesni - fix counter overflow handling in "by8" variant · 80dca473

由 Mathias Krause 提交于 9月 28, 2014

The "by8" CTR AVX implementation fails to propperly handle counter
overflows. That was the reason it got disabled in commit 7da4b29d
("crypto: aesni - disable "by8" AVX CTR optimization").

Fix the overflow handling by incrementing the counter block as a double
quad word, i.e. a 128 bit, and testing for overflows afterwards. We need
to use VPTEST to do so as VPADD* does not set the flags itself and
silently drops the carry bit.

As this change adds branches to the hot path, minor performance
regressions  might be a side effect. But, OTOH, we now have a conforming
implementation -- the preferable goal.

A tcrypt test on a SandyBridge system (i7-2620M) showed almost identical
numbers for the old and this version with differences within the noise
range. A dm-crypt test with the fixed version gave even slightly better
results for this version. So the performance impact might not be as big
as expected.
Tested-by: NRomain Francoise <romain@orebokech.com>
Signed-off-by: NMathias Krause <minipli@googlemail.com>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

80dca473

x86, boot, kaslr: Fix nuisance warning on 32-bit builds · 20cc2888

由 Kees Cook 提交于 10月 01, 2014

Building 32-bit threw a warning on kASLR enabled builds:

arch/x86/boot/compressed/aslr.c: In function ‘mem_avoid_overlap’:
arch/x86/boot/compressed/aslr.c:198:17: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
   avoid.start = (u64)ptr;
                 ^

This fixes the warning; unsigned long should have been used here.
Signed-off-by: NKees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20141001183632.GA11431@www.outflux.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>

20cc2888

27 9月, 2014 1 次提交

bpf: enable bpf syscall on x64 and i386 · 749730ce

由 Alexei Starovoitov 提交于 9月 26, 2014

done as separate commit to ease conflict resolution
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

749730ce

25 9月, 2014 1 次提交

x86/efi: Truncate 64-bit values when calling 32-bit OutputString() · 115c6628

由 Matt Fleming 提交于 9月 24, 2014

If we're executing the 32-bit efi_char16_printk() code path (i.e.
running on top of 32-bit firmware) we know that efi_early->text_output
will be a 32-bit value, even though ->text_output has type u64.

Unfortunately, we currently pass ->text_output directly to
efi_early->call() so for CONFIG_X86_32 the compiler will push a 64-bit
value onto the stack, causing the other parameters to be misaligned.

The way we handle this in the rest of the EFI boot stub is to pass
pointers as arguments to efi_early->call(), which automatically do the
right thing (pointers are 32-bit on CONFIG_X86_32, and we simply ignore
the upper 32-bits of the argument register if running in 64-bit mode
with 32-bit firmware).

This fixes a corruption bug when printing strings from the 32-bit EFI
boot stub.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=84241Signed-off-by: NMatt Fleming <matt.fleming@intel.com>

115c6628

24 9月, 2014 5 次提交

x86/relocs: Make per_cpu_load_addr static · eeeda4cd

由 Ben Hutchings 提交于 9月 24, 2014

per_cpu_load_addr is only used for 64-bit relocations, but is
declared in both configurations of relocs.c - with different
types. This has undefined behaviour in general. GNU ld is
documented to use the larger size in this case, but other tools
may differ and some warn about this.

References: https://bugs.debian.org/748577Reported-by: NMichael Tautschnig <mt@debian.org>
Signed-off-by: NBen Hutchings <ben@decadent.org.uk>
Cc: 748577@bugs.debian.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1411561812.3659.23.camel@decadent.org.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>

eeeda4cd

x86/lib/Makefile: Remove the unnecessary "+= thunk_64.o" · 212be3b2

由 Oleg Nesterov 提交于 9月 21, 2014

Trivial. We have "lib-y += thunk_$(BITS).o" at the start, no
need to add thunk_64.o if !CONFIG_X86_32.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NAndy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140921184232.GB23727@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

212be3b2

x86: Speed up ___preempt_schedule*() by using THUNK helpers · 0ad6e3c5

由 Oleg Nesterov 提交于 9月 21, 2014

___preempt_schedule() does SAVE_ALL/RESTORE_ALL but this is
suboptimal, we do not need to save/restore the callee-saved
register. And we already have arch/x86/lib/thunk_*.S which
implements the similar asm wrappers, so it makes sense to
redefine ___preempt_schedule() as "THUNK ..." and remove
preempt.S altogether.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Reviewed-by: NAndy Lutomirski <luto@amacapital.net>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140921184153.GA23727@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

0ad6e3c5

crypto: aesni - disable "by8" AVX CTR optimization · 7da4b29d

由 Mathias Krause 提交于 9月 23, 2014

The "by8" implementation introduced in commit 22cddcc7 ("crypto: aes
- AES CTR x86_64 "by8" AVX optimization") is failing crypto tests as it
handles counter block overflows differently. It only accounts the right
most 32 bit as a counter -- not the whole block as all other
implementations do. This makes it fail the cryptomgr test #4 that
specifically tests this corner case.

As we're quite late in the release cycle, just disable the "by8" variant
for now.
Reported-by: NRomain Francoise <romain@orebokech.com>
Signed-off-by: NMathias Krause <minipli@googlemail.com>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

7da4b29d

sched: Fix unreleased llc_shared_mask bit during CPU hotplug · 03bd4e1f

由 Wanpeng Li 提交于 9月 24, 2014

The following bug can be triggered by hot adding and removing a large number of
xen domain0's vcpus repeatedly:

	BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [..] find_busiest_group
	PGD 5a9d5067 PUD 13067 PMD 0
	Oops: 0000 [#3] SMP
	[...]
	Call Trace:
	load_balance
	? _raw_spin_unlock_irqrestore
	idle_balance
	__schedule
	schedule
	schedule_timeout
	? lock_timer_base
	schedule_timeout_uninterruptible
	msleep
	lock_device_hotplug_sysfs
	online_store
	dev_attr_store
	sysfs_write_file
	vfs_write
	SyS_write
	system_call_fastpath

Last level cache shared mask is built during CPU up and the
build_sched_domain() routine takes advantage of it to setup
the sched domain CPU topology.

However, llc_shared_mask is not released during CPU disable,
which leads to an invalid sched domainCPU topology.

This patch fix it by releasing the llc_shared_mask correctly
during CPU disable.

Yasuaki also reported that this can happen on real hardware:

  https://lkml.org/lkml/2014/7/22/1018

His case is here:

	==
	Here is an example on my system.
	My system has 4 sockets and each socket has 15 cores and HT is
	enabled. In this case, each core of sockes is numbered as
	follows:

		 | CPU#
	Socket#0 | 0-14 , 60-74
	Socket#1 | 15-29, 75-89
	Socket#2 | 30-44, 90-104
	Socket#3 | 45-59, 105-119

	Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000.

	It means that last level cache of Socket#2 is shared with
	CPU#30-44 and 90-104.

	When hot-removing socket#2 and #3, each core of sockets is
	numbered as follows:

		 | CPU#
	Socket#0 | 0-14 , 60-74
	Socket#1 | 15-29, 75-89

	But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30
	remains having 0x3fff80000001fffc0000000.

	After that, when hot-adding socket#2 and #3, each core of
	sockets is numbered as follows:

		 | CPU#
	Socket#0 | 0-14 , 60-74
	Socket#1 | 15-29, 75-89
	Socket#2 | 30-59
	Socket#3 | 90-119

	Then llc_shared_mask of CPU#30 becomes
	0x3fff8000fffffffc0000000. It means that last level cache of
	Socket#2 is shared with CPU#30-59 and 90-104. So the mask has
	the wrong value.
Signed-off-by: NWanpeng Li <wanpeng.li@linux.intel.com>
Tested-by: NLinn Crosetto <linn@hp.com>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NToshi Kani <toshi.kani@hp.com>
Reviewed-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: <stable@vger.kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1411547885-48165-1-git-send-email-wanpeng.li@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

03bd4e1f

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功