1. 17 July 2015 (4 commits)
    • crypto: poly1305 - Add a SSE2 SIMD variant for x86_64 · c70f4abe
      Committed by Martin Willi
      Implements an x86_64 assembler driver for the Poly1305 authenticator. This
      single block variant holds the 130-bit integer in 5 32-bit words, but uses
      SSE to do two multiplications/additions in parallel.
      
      When calling updates with small blocks, the overhead of
      kernel_fpu_begin()/kernel_fpu_end() negates the performance gain. We
      therefore fall back to poly1305-generic for small updates (the dispatch
      is sketched after this entry).
      
      For large messages, throughput increases by ~5-10% compared to
      poly1305-generic:
      
      testing speed of poly1305 (poly1305-generic)
      test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 4080026 opers/sec,  391682496 bytes/sec
      test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 6221094 opers/sec,  597225024 bytes/sec
      test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9609750 opers/sec,  922536057 bytes/sec
      test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1459379 opers/sec,  420301267 bytes/sec
      test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2115179 opers/sec,  609171609 bytes/sec
      test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3729874 opers/sec, 1074203856 bytes/sec
      test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  593000 opers/sec,  626208000 bytes/sec
      test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1081536 opers/sec, 1142102332 bytes/sec
      test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  302077 opers/sec,  628320576 bytes/sec
      test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates):  554384 opers/sec, 1153120176 bytes/sec
      test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  278715 opers/sec, 1150536345 bytes/sec
      test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  140202 opers/sec, 1153022070 bytes/sec
      
      testing speed of poly1305 (poly1305-simd)
      test  0 (   96 byte blocks,   16 bytes per update,   6 updates): 3790063 opers/sec,  363846076 bytes/sec
      test  1 (   96 byte blocks,   32 bytes per update,   3 updates): 5913378 opers/sec,  567684355 bytes/sec
      test  2 (   96 byte blocks,   96 bytes per update,   1 updates): 9352574 opers/sec,  897847104 bytes/sec
      test  3 (  288 byte blocks,   16 bytes per update,  18 updates): 1362145 opers/sec,  392297990 bytes/sec
      test  4 (  288 byte blocks,   32 bytes per update,   9 updates): 2007075 opers/sec,  578037628 bytes/sec
      test  5 (  288 byte blocks,  288 bytes per update,   1 updates): 3709811 opers/sec, 1068425798 bytes/sec
      test  6 ( 1056 byte blocks,   32 bytes per update,  33 updates):  566272 opers/sec,  597984182 bytes/sec
      test  7 ( 1056 byte blocks, 1056 bytes per update,   1 updates): 1111657 opers/sec, 1173910108 bytes/sec
      test  8 ( 2080 byte blocks,   32 bytes per update,  65 updates):  288857 opers/sec,  600823808 bytes/sec
      test  9 ( 2080 byte blocks, 2080 bytes per update,   1 updates):  590746 opers/sec, 1228751888 bytes/sec
      test 10 ( 4128 byte blocks, 4128 bytes per update,   1 updates):  301825 opers/sec, 1245936902 bytes/sec
      test 11 ( 8224 byte blocks, 8224 bytes per update,   1 updates):  153075 opers/sec, 1258896201 bytes/sec
      
      Benchmark results from a Core i5-4670T.
      Signed-off-by: Martin Willi <martin@strongswan.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      c70f4abe
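      A minimal sketch of the update path described above, assuming an
      illustrative SIMD_THRESHOLD cut-off and a poly1305_simd_blocks() helper
      for the SSE2 inner loop; kernel_fpu_begin()/kernel_fpu_end() and
      irq_fpu_usable() are the real x86 FPU-context helpers, and
      crypto_poly1305_update() stands for the exported generic update path:

        #include <crypto/internal/hash.h>
        #include <asm/fpu/api.h>

        #define POLY1305_BLOCK_SIZE 16
        #define SIMD_THRESHOLD      (8 * POLY1305_BLOCK_SIZE)  /* assumed cut-off */

        /* assumed SSE2 assembly entry point */
        asmlinkage void poly1305_simd_blocks(void *ctx, const u8 *src,
                                             unsigned int len);

        static int poly1305_simd_update(struct shash_desc *desc,
                                        const u8 *src, unsigned int len)
        {
                /* Small update, or SIMD unusable in this context: take the
                 * portable poly1305-generic path and avoid the FPU state
                 * save/restore overhead mentioned above. */
                if (len < SIMD_THRESHOLD || !irq_fpu_usable())
                        return crypto_poly1305_update(desc, src, len);

                kernel_fpu_begin();
                poly1305_simd_blocks(shash_desc_ctx(desc), src, len); /* SSE2 loop */
                kernel_fpu_end();
                return 0;
        }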
    • crypto: chacha20 - Add an eight block AVX2 variant for x86_64 · 3d1e93cd
      Committed by Martin Willi
      Extends the x86_64 ChaCha20 implementation with a function that
      processes eight ChaCha20 blocks in parallel using AVX2 (the width-based
      dispatch is sketched after this entry).
      
      For large messages, throughput increases by ~55-70% compared to the
      four-block SSSE3 variant:
      
      testing speed of chacha20 (chacha20-simd) encryption
      test 0 (256 bit key, 16 byte blocks): 42249230 operations in 10 seconds (675987680 bytes)
      test 1 (256 bit key, 64 byte blocks): 46441641 operations in 10 seconds (2972265024 bytes)
      test 2 (256 bit key, 256 byte blocks): 33028112 operations in 10 seconds (8455196672 bytes)
      test 3 (256 bit key, 1024 byte blocks): 11568759 operations in 10 seconds (11846409216 bytes)
      test 4 (256 bit key, 8192 byte blocks): 1448761 operations in 10 seconds (11868250112 bytes)
      
      testing speed of chacha20 (chacha20-simd) encryption
      test 0 (256 bit key, 16 byte blocks): 41999675 operations in 10 seconds (671994800 bytes)
      test 1 (256 bit key, 64 byte blocks): 45805908 operations in 10 seconds (2931578112 bytes)
      test 2 (256 bit key, 256 byte blocks): 32814947 operations in 10 seconds (8400626432 bytes)
      test 3 (256 bit key, 1024 byte blocks): 19777167 operations in 10 seconds (20251819008 bytes)
      test 4 (256 bit key, 8192 byte blocks): 2279321 operations in 10 seconds (18672197632 bytes)
      
      Benchmark results from a Core i5-4670T.
      Signed-off-by: Martin Willi <martin@strongswan.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      3d1e93cd
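      A sketch of the length-based dispatch described above: consume eight
      64-byte blocks at a time with the AVX2 routine while enough input
      remains, then let the narrower SSSE3 paths handle the tail. The
      chacha20_*_xor_* symbol names follow the pattern used by the x86_64
      glue code but should be treated as assumptions here:

        #define CHACHA20_BLOCK_SIZE 64

        static void chacha20_dosimd(u32 *state, u8 *dst, const u8 *src,
                                    unsigned int bytes, bool use_avx2)
        {
                while (use_avx2 && bytes >= CHACHA20_BLOCK_SIZE * 8) {
                        chacha20_8block_xor_avx2(state, dst, src); /* 8 blocks */
                        state[12] += 8;           /* advance the block counter */
                        bytes -= CHACHA20_BLOCK_SIZE * 8;
                        src   += CHACHA20_BLOCK_SIZE * 8;
                        dst   += CHACHA20_BLOCK_SIZE * 8;
                }
                while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
                        chacha20_4block_xor_ssse3(state, dst, src); /* 4 blocks */
                        state[12] += 4;
                        bytes -= CHACHA20_BLOCK_SIZE * 4;
                        src   += CHACHA20_BLOCK_SIZE * 4;
                        dst   += CHACHA20_BLOCK_SIZE * 4;
                }
                /* the remaining (< 4 blocks) input goes through the
                 * single-block SSSE3 routine, partial blocks via a bounce
                 * buffer */
        }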
    • crypto: chacha20 - Add a four block SSSE3 variant for x86_64 · 274f938e
      Committed by Martin Willi
      Extends the x86_64 SSSE3 ChaCha20 implementation with a function that
      processes four ChaCha20 blocks in parallel. This avoids the word
      shuffling needed in the single-block variant, further increasing
      throughput (see the quarter-round sketch after this entry).
      
      For large messages, throughput increases by ~110% compared to the
      single-block SSSE3 variant:
      
      testing speed of chacha20 (chacha20-simd) encryption
      test 0 (256 bit key, 16 byte blocks): 43141886 operations in 10 seconds (690270176 bytes)
      test 1 (256 bit key, 64 byte blocks): 46845874 operations in 10 seconds (2998135936 bytes)
      test 2 (256 bit key, 256 byte blocks): 18458512 operations in 10 seconds (4725379072 bytes)
      test 3 (256 bit key, 1024 byte blocks): 5360533 operations in 10 seconds (5489185792 bytes)
      test 4 (256 bit key, 8192 byte blocks): 692846 operations in 10 seconds (5675794432 bytes)
      
      testing speed of chacha20 (chacha20-simd) encryption
      test 0 (256 bit key, 16 byte blocks): 42249230 operations in 10 seconds (675987680 bytes)
      test 1 (256 bit key, 64 byte blocks): 46441641 operations in 10 seconds (2972265024 bytes)
      test 2 (256 bit key, 256 byte blocks): 33028112 operations in 10 seconds (8455196672 bytes)
      test 3 (256 bit key, 1024 byte blocks): 11568759 operations in 10 seconds (11846409216 bytes)
      test 4 (256 bit key, 8192 byte blocks): 1448761 operations in 10 seconds (11868250112 bytes)
      
      Benchmark results from a Core i5-4670T.
      Signed-off-by: Martin Willi <martin@strongswan.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      274f938e
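      To make "four blocks in parallel, no word shuffling" concrete, here is
      a user-space intrinsics sketch (not the kernel's hand-written assembly)
      of one ChaCha quarter-round applied to four independent blocks at once:
      each __m128i holds the same state word from four different blocks, so
      the rotates are plain shifts rather than in-register lane shuffles:

        #include <emmintrin.h>  /* SSE2 intrinsics */

        /* Rotate each 32-bit lane left by n bits. */
        static inline __m128i rotl32(__m128i v, int n)
        {
                return _mm_or_si128(_mm_slli_epi32(v, n),
                                    _mm_srli_epi32(v, 32 - n));
        }

        /* One ChaCha quarter-round over four blocks at once: a, b, c, d each
         * hold the same state word taken from four different blocks
         * (structure-of-arrays layout). */
        static inline void quarterround_x4(__m128i *a, __m128i *b,
                                           __m128i *c, __m128i *d)
        {
                *a = _mm_add_epi32(*a, *b); *d = rotl32(_mm_xor_si128(*d, *a), 16);
                *c = _mm_add_epi32(*c, *d); *b = rotl32(_mm_xor_si128(*b, *c), 12);
                *a = _mm_add_epi32(*a, *b); *d = rotl32(_mm_xor_si128(*d, *a),  8);
                *c = _mm_add_epi32(*c, *d); *b = rotl32(_mm_xor_si128(*b, *c),  7);
        }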
    • crypto: chacha20 - Add a SSSE3 SIMD variant for x86_64 · c9320b6d
      Committed by Martin Willi
      Implements an x86_64 assembler driver for the ChaCha20 stream cipher.
      This single-block variant works on a single state matrix using SSE
      instructions. It requires SSSE3 due to its use of pshufb for efficient
      8- and 16-bit rotate operations (sketched after this entry).
      
      For large messages, throughput increases by ~65% compared to
      chacha20-generic:
      
      testing speed of chacha20 (chacha20-generic) encryption
      test 0 (256 bit key, 16 byte blocks): 45089207 operations in 10 seconds (721427312 bytes)
      test 1 (256 bit key, 64 byte blocks): 43839521 operations in 10 seconds (2805729344 bytes)
      test 2 (256 bit key, 256 byte blocks): 12702056 operations in 10 seconds (3251726336 bytes)
      test 3 (256 bit key, 1024 byte blocks): 3371173 operations in 10 seconds (3452081152 bytes)
      test 4 (256 bit key, 8192 byte blocks): 422468 operations in 10 seconds (3460857856 bytes)
      
      testing speed of chacha20 (chacha20-simd) encryption
      test 0 (256 bit key, 16 byte blocks): 43141886 operations in 10 seconds (690270176 bytes)
      test 1 (256 bit key, 64 byte blocks): 46845874 operations in 10 seconds (2998135936 bytes)
      test 2 (256 bit key, 256 byte blocks): 18458512 operations in 10 seconds (4725379072 bytes)
      test 3 (256 bit key, 1024 byte blocks): 5360533 operations in 10 seconds (5489185792 bytes)
      test 4 (256 bit key, 8192 byte blocks): 692846 operations in 10 seconds (5675794432 bytes)
      
      Benchmark results from a Core i5-4670T.
      Signed-off-by: Martin Willi <martin@strongswan.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      c9320b6d
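      A short intrinsics illustration of the pshufb trick mentioned above:
      rotating every 32-bit lane left by 16 (or 8) bits is a fixed byte
      permutation, so SSSE3's pshufb (_mm_shuffle_epi8) does it in one
      instruction instead of the shift/shift/or sequence plain SSE2 needs.
      This is a user-space sketch, not the kernel assembly itself:

        #include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 (pshufb) */

        /* Rotate each 32-bit lane left by 16 bits via a byte shuffle. */
        static inline __m128i rotl32_by16(__m128i v)
        {
                const __m128i m = _mm_set_epi8(13, 12, 15, 14,  9,  8, 11, 10,
                                                5,  4,  7,  6,  1,  0,  3,  2);
                return _mm_shuffle_epi8(v, m);
        }

        /* Rotate each 32-bit lane left by 8 bits via a byte shuffle. */
        static inline __m128i rotl32_by8(__m128i v)
        {
                const __m128i m = _mm_set_epi8(14, 13, 12, 15, 10,  9,  8, 11,
                                                6,  5,  4,  7,  2,  1,  0,  3);
                return _mm_shuffle_epi8(v, m);
        }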
  2. 14 July 2015 (1 commit)
  3. 04 July 2015 (6 commits)
  4. 01 July 2015 (3 commits)
  5. 30 June 2015 (3 commits)
    • perf/x86: Fix 'active_events' imbalance · 93472aff
      Committed by Peter Zijlstra
      Commit 1b7b938f ("perf/x86/intel: Fix PMI handling for Intel PT") conditionally
      increments active_events in x86_add_exclusive() but unconditionally decrements in
      x86_del_exclusive().
      
      These extra decrements can lead to the situation where
      active_events is zero and thus the PMI handler is 'disabled'
      while we have active events on the PMU generating PMIs.
      
      This leads to a truckload of:
      
        Uhhuh. NMI received for unknown reason 21 on CPU 28.
        Do you have a strange power saving mode enabled?
        Dazed and confused, but trying to continue
      
      messages and generally messes up perf.
      
      Remove the condition on the increment; a double increment balanced by a
      double decrement is perfectly fine.
      
      Restructure the code a little to make the unconditional increment read
      more naturally (the shape of the imbalance is sketched after this
      entry).
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: alexander.shishkin@linux.intel.com
      Cc: brgerst@gmail.com
      Cc: dvlasenk@redhat.com
      Cc: luto@amacapital.net
      Cc: oleg@redhat.com
      Fixes: 1b7b938f ("perf/x86/intel: Fix PMI handling for Intel PT")
      Link: http://lkml.kernel.org/r/20150624144750.GJ18673@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      93472aff
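      The shape of the imbalance, reduced to a toy example (illustrative
      only, not the perf code itself): an add path that bumps a counter only
      for some callers, paired with a del path that always drops it, drives
      the counter to zero while users are still active.

        #include <linux/types.h>
        #include <linux/atomic.h>

        static atomic_t active_events = ATOMIC_INIT(0);

        static void buggy_add(bool counted_case)
        {
                if (counted_case)               /* conditional increment ... */
                        atomic_inc(&active_events);
        }

        static void buggy_del(void)
        {
                atomic_dec(&active_events);     /* ... unconditional decrement */
        }

        /* Fix: increment unconditionally in buggy_add() so every del has a
         * matching inc; a double inc balanced by a double dec is harmless. */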
    • x86/fpu: Fix FPU related boot regression when CPUID masking BIOS feature is enabled · db52ef74
      Committed by Ingo Molnar
      Mike Galbraith reported:
      
        " My i7-4790 box is having one hell of a time with this merge
          window, dead in the water.
      
          BIOS setting "Limit CPUID Maximum" upsets new fpu code
          mightily. "
      
      It turns out that Linux does a double workaround here, as per:
      
        066941bd ("x86: unmask CPUID levels on Intel CPUs")
      
      it undoes the BIOS workaround - but as a side effect the CPUID
      state is not completely constant during early init anymore,
      and the new FPU init code did not take this into account.
      
      So what happened is that the xstate init code did not have full
      CPUID available, which broke subsequent attempts to use xstate
      features.
      
      Fix this by ordering the early FPU init code to run after we've
      stabilized the CPUID state.
      Reported-bisected-and-tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20150627082514.GA10894@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      db52ef74
    • intel_pmc_ipc: Add Intel Apollo Lake PMC IPC driver · 0a8b8353
      Committed by qipeng.zha
      This driver provides support for PMC control on Apollo Lake platforms.
      The PMC is an ARC processor which defines some IPC commands for
      communication with other entities in the CPU.
      Signed-off-by: qipeng.zha <qipeng.zha@intel.com>
      [fengguang.wu@intel.com: Fix Sparse and Coccinelle warnings]
      Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Darren Hart <dvhart@linux.intel.com>
      0a8b8353
  6. 29 June 2015 (1 commit)
  7. 26 June 2015 (4 commits)
  8. 25 June 2015 (6 commits)
    • libnvdimm, pmem: add libnvdimm support to the pmem driver · 9f53f9fa
      Committed by Dan Williams
      nd_pmem attaches to persistent memory regions and namespaces emitted by
      the libnvdimm subsystem and, like the original pmem driver, presents the
      system-physical-address range as a block device.
      
      The existing e820-type-12 to pmem setup is converted to an nvdimm_bus
      that emits an nd_namespace_io device.
      
      Note that the X in 'pmemX' is now derived from the parent region.  This
      provides some stability to pmem device names from boot to boot.  The
      minor numbers are also made more predictable by passing 0 to
      alloc_disk().
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Boaz Harrosh <boaz@plexistor.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Acked-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Toshi Kani <toshi.kani@hp.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      9f53f9fa
    • x86, mirror: x86 enabling - find mirrored memory ranges · b05b9f5f
      Committed by Tony Luck
      UEFI GetMemoryMap() uses a new attribute bit to mark mirrored memory
      address ranges.  See UEFI 2.5 spec pages 157-158:
      
        http://www.uefi.org/sites/default/files/resources/UEFI%202_5.pdf
      
      On EFI-enabled systems, scan the memory map and tell memblock about any
      mirrored ranges (sketched after this entry).
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Cc: Xiexiuqi <xiexiuqi@huawei.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b05b9f5f
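      A sketch of the scan described above. EFI_MEMORY_MORE_RELIABLE is the
      UEFI 2.5 attribute bit for mirrored ranges and memblock_mark_mirror()
      is the memblock helper this series relies on; the descriptor walk is
      simplified, so treat the iteration details and the function name as
      assumptions:

        #include <linux/efi.h>
        #include <linux/memblock.h>

        static void __init find_mirrored_memory(efi_memory_desc_t *map,
                                                int nr_entries)
        {
                int i;

                for (i = 0; i < nr_entries; i++) {
                        efi_memory_desc_t *md = &map[i];
                        u64 start = md->phys_addr;
                        u64 size  = md->num_pages << EFI_PAGE_SHIFT;

                        /* UEFI 2.5 marks mirrored ranges with this bit */
                        if (md->attribute & EFI_MEMORY_MORE_RELIABLE)
                                memblock_mark_mirror(start, size);
                }
        }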
    • mm/memblock: add extra "flags" to memblock to allow selection of memory based on attribute · fc6daaf9
      Committed by Tony Luck
      Some high end Intel Xeon systems report uncorrectable memory errors as a
      recoverable machine check.  Linux has included code for some time to
      process these and just signal the affected processes (or even recover
      completely if the error was in a read only page that can be replaced by
      reading from disk).
      
      But we have no recovery path for errors encountered during kernel code
      execution.  Except for some very specific cases, we are unlikely to ever
      be able to recover.
      
      Enter memory mirroring. This is actually the third generation of memory mirroring:
      
      Gen1: All memory is mirrored
      	Pro: No s/w enabling - h/w just gets good data from other side of the
      	     mirror
      	Con: Halves effective memory capacity available to OS/applications
      
      Gen2: Partial memory mirror - just mirror memory behind some memory controllers
      	Pro: Keep more of the capacity
      	Con: Nightmare to enable. Have to choose between allocating from
      	     mirrored memory for safety vs. NUMA local memory for performance
      
      Gen3: Address range partial memory mirror - some mirror on each memory
            controller
      	Pro: Can tune the amount of mirror and keep NUMA performance
      	Con: I have to write memory management code to implement
      
      The current plan is just to use mirrored memory for kernel allocations.
      This has been broken into two phases:
      
      1) This patch series - find the mirrored memory, use it for boot time
         allocations
      
      2) Wade into mm/page_alloc.c and define a ZONE_MIRROR to pick up the
         unused mirrored memory from mm/memblock.c and only give it out to
         select kernel allocations (this is still being scoped because
         page_alloc.c is scary).
      
      This patch (of 3):
      
      Add extra "flags" to memblock to allow selection of memory based on
      attribute.  No functional changes.  (A usage sketch follows this entry.)
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Cc: Xiexiuqi <xiexiuqi@huawei.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fc6daaf9
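      A sketch of the kind of caller this enables: ask memblock for a range
      carrying a given attribute first, then fall back to any free memory.
      MEMBLOCK_NONE is introduced by this patch and MEMBLOCK_MIRROR by a
      later patch in the series; the call follows the shape of
      memblock_find_in_range_node() with the new trailing flags argument, but
      treat the exact argument order as an assumption:

        #include <linux/memblock.h>

        static phys_addr_t __init alloc_pref_mirrored(phys_addr_t size,
                                                      phys_addr_t align)
        {
                phys_addr_t addr;

                /* Prefer a range the platform marked as mirrored ... */
                addr = memblock_find_in_range_node(size, align, 0,
                                                   MEMBLOCK_ALLOC_ACCESSIBLE,
                                                   NUMA_NO_NODE,
                                                   MEMBLOCK_MIRROR);
                if (addr)
                        return addr;

                /* ... and fall back to any free memory otherwise. */
                return memblock_find_in_range_node(size, align, 0,
                                                   MEMBLOCK_ALLOC_ACCESSIBLE,
                                                   NUMA_NO_NODE,
                                                   MEMBLOCK_NONE);
        }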
    • mm: clarify that the function operates on hugepage pte · 8809aa2d
      Committed by Aneesh Kumar K.V
      We have confusing functions to clear a pmd, pmd_clear_* and pmd_clear.
      Add _huge_ to the pmdp_clear functions so that it is clear they operate
      on hugepage PTEs.
      
      We don't bother with other functions like pmdp_set_wrprotect and
      pmdp_clear_flush_young, because they operate on PTE bits and hence
      already indicate that they operate on hugepage PTEs.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8809aa2d
    • mm/hugetlb: reduce arch dependent code about hugetlb_prefault_arch_hook · a67a31fa
      Committed by Zhang Zhen
      Currently we have many duplicate definitions of
      hugetlb_prefault_arch_hook; in all architectures this function is empty.
      Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a67a31fa
    • mm: new mm hook framework · 2ae416b1
      Committed by Laurent Dufour
      CRIU is recreating the process memory layout by remapping the checkpointee
      memory area on top of the current process (criu).  This includes remapping
      the vDSO to the place it has at checkpoint time.
      
      However some architectures, like powerpc, keep a reference to the vDSO
      base address to build the signal return stack frame by calling the vDSO
      sigreturn service.  So once the vDSO has been moved, this reference is
      no longer valid and the signal frames built later are not usable.
      
      This patch series introduces a new mm hook framework, and a new
      arch_remap hook which is called when mremap is done while the mm lock is
      still held.  The next patch adds vDSO remap and unmap tracking to the
      powerpc architecture.
      
      This patch (of 3):
      
      This patch introduces a new set of header files to manage mm hooks:
      - per architecture empty header file (arch/x/include/asm/mm-arch-hooks.h)
      - a generic header (include/linux/mm-arch-hooks.h)
      
      An architecture which needs to override a hook has to redefine it in its
      header file, while architectures which don't need it have nothing to do.

      The default hooks are defined in the generic header and are used when
      the architecture does not define its own (the pattern is sketched after
      this entry).
      
      As a next step, the mm hooks defined in include/asm-generic/mm_hooks.h
      should be moved here.
      Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Suggested-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2ae416b1
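      The plumbing described above follows a common default-override pattern;
      a sketch of the generic header (the arch_remap() signature is shown as
      an assumption):

        /* include/linux/mm-arch-hooks.h (sketch) */
        #include <asm/mm-arch-hooks.h>  /* per-arch header, may define hooks */

        #ifndef arch_remap
        /* Default hook: used only when the architecture's header did not
         * provide its own arch_remap() definition. */
        static inline void arch_remap(struct mm_struct *mm,
                                      unsigned long old_start,
                                      unsigned long old_end,
                                      unsigned long new_start,
                                      unsigned long new_end)
        {
        }
        #define arch_remap arch_remap
        #endif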
  9. 23 June 2015 (4 commits)
  10. 22 June 2015 (1 commit)
  11. 21 June 2015 (1 commit)
  12. 20 June 2015 (1 commit)
  13. 19 June 2015 (5 commits)