Commits · f5caf621ee357279e759c0911daf6d55c7d36f03 · openanolis / cloud-kernel

23 Sep, 2017 1 commit

x86/asm: Fix inline asm call constraints for Clang · f5caf621

Authored by Josh Poimboeuf on Sep 20, 2017

For inline asm statements which have a CALL instruction, we list the
stack pointer as a constraint to convince GCC to ensure the frame
pointer is set up first:

  static inline void foo()
  {
	register void *__sp asm(_ASM_SP);
	asm("call bar" : "+r" (__sp));
  }

Unfortunately, that pattern causes Clang to corrupt the stack pointer.

The fix is easy: convert the stack pointer register variable to a global
variable.

It should be noted that the end result is different based on the GCC
version.  With GCC 6.4, this patch has exactly the same result as
before:

	defconfig	defconfig-nofp	distro		distro-nofp
 before	9820389		9491555		8816046		8516940
 after	9820389		9491555		8816046		8516940

With GCC 7.2, however, GCC's behavior has changed.  It now changes its
behavior based on the conversion of the register variable to a global.
That somehow convinces it to *always* set up the frame pointer before
inserting *any* inline asm.  (Therefore, listing the variable as an
output constraint is a no-op and is no longer necessary.)  It's a bit
overkill, but the performance impact should be negligible.  And in fact,
there's a nice improvement with frame pointers disabled:

	defconfig	defconfig-nofp	distro		distro-nofp
 before	9796316		9468236		9076191		8790305
 after	9796957		9464267		9076381		8785949

So in summary, while listing the stack pointer as an output constraint
is no longer necessary for newer versions of GCC, it's still needed for
older versions.

Suggested-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/3db862e970c432ae823cf515c52b54fec8270e0e.1505942196.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

f5caf621

18 Sep, 2017 4 commits

x86/mm/32: Load a sane CR3 before cpu_init() on secondary CPUs · 4ba55e65

Authored by Andy Lutomirski on Sep 17, 2017

For unknown historical reasons (i.e. Borislav doesn't recall),
32-bit kernels invoke cpu_init() on secondary CPUs with
initial_page_table loaded into CR3.  Then they set
current->active_mm to &init_mm and call enter_lazy_tlb() before
fixing CR3.  This means that the x86 TLB code gets invoked while CR3
is inconsistent, and, with the improved PCID sanity checks I added,
we warn.

Fix it by loading swapper_pg_dir (i.e. init_mm.pgd) earlier.

Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 72c0098d ("x86/mm: Reinitialize TLB state on hotplug and resume")
Link: http://lkml.kernel.org/r/30cdfea504682ba3b9012e77717800a91c22097f.1505663533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

4ba55e65

x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier · b8b7abae

Authored by Andy Lutomirski on Sep 17, 2017

Otherwise we might have the PCID feature bit set during cpu_init().

This is just for robustness. I haven't seen any actual bugs here.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: cba4671a ("x86/mm: Disable PCID on 32-bit kernels")
Link: http://lkml.kernel.org/r/b16dae9d6b0db5d9801ddbebbfd83384097c61f3.1505663533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

b8b7abae

x86/mm/64: Stop using CR3.PCID == 0 in ASID-aware code · 52a2af40

Authored by Andy Lutomirski on Sep 17, 2017

Putting the logical ASID into CR3's PCID bits directly means that we
have two cases to consider separately: ASID == 0 and ASID != 0.
This means that bugs that only hit in one of these cases trigger
nondeterministically.

There were some bugs like this in the past, and I think there's
still one in current kernels.  In particular, we have a number of
ASID-unaware code paths that save CR3, write some special value, and
then restore CR3.  This includes suspend/resume, hibernate, kexec,
EFI, and maybe other things I've missed.  This is currently
dangerous: if ASID != 0, then this code sequence will leave garbage
in the TLB tagged for ASID 0.  We could potentially see corruption
when switching back to ASID 0.  In principle, an
initialize_tlbstate_and_flush() call after these sequences would
solve the problem, but EFI, at least, does not call this.  (And it
probably shouldn't -- initialize_tlbstate_and_flush() is rather
expensive.)

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/cdc14bbe5d3c3ef2a562be09a6368ffe9bd947a6.1505663533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

52a2af40

x86/mm: Factor out CR3-building code · 47061a24

Authored by Andy Lutomirski on Sep 17, 2017

Currently, the code that assembles a value to load into CR3 is
open-coded everywhere.  Factor it out into helpers build_cr3() and
build_cr3_noflush().

This makes one semantic change: __get_current_cr3_fast() was wrong
on SME systems.  No one noticed because the only caller is in the
VMX code, and there are no CPUs with both SME and VMX.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
Link: http://lkml.kernel.org/r/ce350cf11e93e2842d14d0b95b0199c7d881f527.1505663533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

47061a24
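
The idea behind the two CR3 commits above (52a2af40 and 47061a24) can be pictured with a small user-space sketch. This is not the kernel's code: the `sketch_` names are hypothetical, and the `+ 1` offset used to keep PCID 0 out of ASID-aware paths is an assumption made for illustration, since the commit messages do not spell out the mapping. The 12-bit PCID field and the bit-63 no-flush flag follow the x86 definition of CR3 with CR4.PCIDE=1.

```c
#include <assert.h>
#include <stdint.h>

/* Low 12 bits of CR3 hold the PCID when CR4.PCIDE=1. */
#define SKETCH_CR3_PCID_MASK 0xFFFull
/* Bit 63 set on a CR3 write asks the CPU not to flush that PCID. */
#define SKETCH_CR3_NOFLUSH   (1ull << 63)

/* Combine a page-table physical address with a logical ASID (hypothetical helper). */
uint64_t sketch_build_cr3(uint64_t pgd_phys, uint16_t asid)
{
	/* The page-table base must be 4 KiB aligned, and asid + 1 must fit in 12 bits. */
	assert((pgd_phys & SKETCH_CR3_PCID_MASK) == 0);
	assert(asid < SKETCH_CR3_PCID_MASK);
	/* Assumed mapping: ASID n uses PCID n + 1, so PCID 0 is never used here. */
	return pgd_phys | ((uint64_t)asid + 1);
}

/* Same value, but with the no-flush bit set (hypothetical helper). */
uint64_t sketch_build_cr3_noflush(uint64_t pgd_phys, uint16_t asid)
{
	return sketch_build_cr3(pgd_phys, asid) | SKETCH_CR3_NOFLUSH;
}
```

With a single place that builds the value, the ASID == 0 and ASID != 0 cases go through the same code path, which is the property the commit message asks for.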

15 Sep, 2017 9 commits

kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly · 4f350c6d

Authored by Jim Mattson on Sep 14, 2017

When emulating a nested VM-entry from L1 to L2, several control field
validation checks are deferred to the hardware. Should one of these
validation checks fail, vcpu_vmx_run will set the vmx->fail flag. When
this happens, the L2 guest state is not loaded (even in part), and
execution should continue in L1 with the next instruction after the
VMLAUNCH/VMRESUME.

The VMCS12 is not modified (except for the VM-instruction error
field), the VMCS12 MSR save/load lists are not processed, and the CPU
state is not loaded from the VMCS12 host area. Moreover, the vmcs02
exit reason is stale, so it should not be consulted for any reason.

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

4f350c6d

kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly · b060ca3b

Authored by Jim Mattson on Sep 14, 2017

On an early VMLAUNCH/VMRESUME failure (i.e. one which sets the
VM-instruction error field of the current VMCS), the launch state of
the current VMCS is not set to "launched," and the VM-exit information
fields of the current VMCS (including IDT-vectoring information and
exit reason) are stale.

On a late VMLAUNCH/VMRESUME failure (i.e. one which sets the high bit
of the exit reason field), the launch state of the current VMCS is not
set to "launched," and only two of the VM-exit information fields of
the current VMCS are modified (exit reason and exit
qualification). The remaining VM-exit information fields of the
current VMCS (including IDT-vectoring information, in particular) are
stale.

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b060ca3b

kvm: nVMX: Remove nested_vmx_succeed after successful VM-entry · 7881f96c

Authored by Jim Mattson on Sep 14, 2017

After a successful VM-entry, RFLAGS is cleared, with the exception of
bit 1, which is always set. This is handled by load_vmcs12_host_state.

Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7881f96c

kvm,x86: Fix apf_task_wake_one() wq serialization · a0cff57b

Authored by Davidlohr Bueso on Sep 13, 2017

During code inspection, the following potential race was seen:

CPU0					CPU1
kvm_async_pf_task_wait			apf_task_wake_one
					  [L] swait_active(&n->wq)
  [S] prepare_to_swait(&n.wq)
  [L] if (!hlist_unhashed(&n.link))
	schedule()			  [S] hlist_del_init(&n->link);

Properly serialize swait_active() checks such that a wakeup is
not missed.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a0cff57b
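
The race in a0cff57b is a store-then-load pattern on both CPUs: each side stores its own state and then loads the other's. A minimal user-space sketch with C11 atomics follows, assuming that a full barrier between each side's store and load is what the serialization amounts to. The struct and function names are hypothetical and are not the kernel's swait API.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_node {
	atomic_bool waiter_queued; /* stands in for "someone is on n->wq" */
	atomic_bool linked;        /* stands in for "n->link is still hashed" */
};

/* Waiter side: [S] queue ourselves, full barrier, [L] re-check the condition. */
bool sketch_waiter_must_sleep(struct sketch_node *n)
{
	atomic_store_explicit(&n->waiter_queued, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&n->linked, memory_order_relaxed);
}

/* Waker side: [S] unlink the node, full barrier, [L] check for a waiter. */
bool sketch_waker_must_wake(struct sketch_node *n)
{
	atomic_store_explicit(&n->linked, false, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&n->waiter_queued, memory_order_relaxed);
}
```

With a full fence on both sides, it cannot happen that the waiter decides to sleep while the waker also decides that nobody needs waking, which is the missed wakeup the commit describes.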

kvm,lapic: Justify use of swait_active() · cc1b4680

Authored by Davidlohr Bueso on Sep 13, 2017

A comment might serve future readers.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

cc1b4680

KVM: VMX: Do not BUG() on out-of-bounds guest IRQ · 3a8b0677

Authored by Jan H. Schönherr on Sep 07, 2017

The value of the guest_irq argument to vmx_update_pi_irte() is
ultimately coming from a KVM_IRQFD API call. Do not BUG() in
vmx_update_pi_irte() if the value is out-of-bounds. (Especially,
since KVM as a whole seems to hang after that.)

Instead, print a message only once if we find that we don't have a
route for a certain IRQ (which can be out-of-bounds or within the
array).

This fixes CVE-2017-1000252.

Fixes: efc64404 ("KVM: x86: Update IRTE for posted-interrupts")
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

3a8b0677
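
The change described in 3a8b0677 replaces a fatal assertion with a checked lookup that reports the problem once. A minimal user-space sketch of that pattern follows, assuming a simple routing table. `sketch_irq_table`, `sketch_lookup_route` and the `-1` return value are hypothetical and are not KVM's data structures.

```c
#include <stdio.h>
#include <stddef.h>

struct sketch_irq_table {
	size_t nr_routes;      /* number of valid entries in routes[] */
	const int *routes;     /* routes[irq] < 0 means "no route" */
};

/* Return the route for guest_irq, or -1 (with a one-time message) if none. */
int sketch_lookup_route(const struct sketch_irq_table *t, unsigned int guest_irq)
{
	static int warned; /* print the message only once */

	if (guest_irq >= t->nr_routes || t->routes[guest_irq] < 0) {
		if (!warned) {
			fprintf(stderr, "no route for guest_irq %u/%zu\n",
				guest_irq, t->nr_routes);
			warned = 1;
		}
		return -1; /* report the problem to the caller instead of crashing */
	}
	return t->routes[guest_irq];
}
```

Because the IRQ number ultimately comes from user space, an out-of-range value is treated as ordinary bad input rather than as an internal invariant violation.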

kvm: nVMX: Don't allow L2 to access the hardware CR8 · 51aa68e7

Authored by Jim Mattson on Sep 12, 2017

If L1 does not specify the "use TPR shadow" VM-execution control in
vmcs12, then L0 must specify the "CR8-load exiting" and "CR8-store
exiting" VM-execution controls in vmcs02. Failure to do so will give
the L2 VM unrestricted read/write access to the hardware CR8.

This fixes CVE-2017-12154.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

51aa68e7
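
The rule stated in 51aa68e7 can be written as a small pure function. This is a sketch under assumptions, not KVM's implementation: the `SKETCH_` names and the function are hypothetical, the bit positions follow the Intel SDM's numbering of the primary processor-based VM-execution controls, and the real merge of vmcs12 into vmcs02 handles many more controls.

```c
#include <stdint.h>

/* Primary processor-based VM-execution control bits (Intel SDM numbering). */
#define SKETCH_CR8_LOAD_EXITING  (1u << 19)
#define SKETCH_CR8_STORE_EXITING (1u << 20)
#define SKETCH_USE_TPR_SHADOW    (1u << 21)

/* Derive the vmcs02 controls L0 would run L2 with from L1's vmcs12 controls. */
uint32_t sketch_vmcs02_exec_control(uint32_t vmcs12_exec_control)
{
	uint32_t ctl = vmcs12_exec_control;

	/* Without a TPR shadow, every CR8 access by L2 must exit to L0. */
	if (!(ctl & SKETCH_USE_TPR_SHADOW))
		ctl |= SKETCH_CR8_LOAD_EXITING | SKETCH_CR8_STORE_EXITING;
	return ctl;
}
```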

x86/cpu/AMD: Fix erratum 1076 (CPB bit) · f7f3dc00

Authored by Borislav Petkov on Sep 07, 2017

CPUID Fn8000_0007_EDX[CPB] is wrongly 0 on models up to B1. But they do
support CPB (AMD's Core Performance Boosting cpufreq CPU feature), so fix that.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170907170821.16021-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>

f7f3dc00

KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously · 9a6e7c39

Authored by Wanpeng Li on Sep 14, 2017

qemu-system-x86-8600 [004] d..1 7205.687530: kvm_entry: vcpu 2
qemu-system-x86-8600 [004] .... 7205.687532: kvm_exit: reason EXCEPTION_NMI rip 0xffffffffa921297d info ffffeb2c0e44e018 80000b0e
qemu-system-x86-8600 [004] .... 7205.687532: kvm_page_fault: address ffffeb2c0e44e018 error_code 0
qemu-system-x86-8600 [004] .... 7205.687620: kvm_try_async_get_page: gva = 0xffffeb2c0e44e018, gfn = 0x427e4e
qemu-system-x86-8600 [004] .N.. 7205.687628: kvm_async_pf_not_present: token 0x8b002 gva 0xffffeb2c0e44e018
kworker/4:2-7814 [004] .... 7205.687655: kvm_async_pf_completed: gva 0xffffeb2c0e44e018 address 0x7fcc30c4e000
qemu-system-x86-8600 [004] .... 7205.687703: kvm_async_pf_ready: token 0x8b002 gva 0xffffeb2c0e44e018
qemu-system-x86-8600 [004] d..1 7205.687711: kvm_entry: vcpu 2

After running a memory-intensive workload in the guest, I caught the kworker
completing the GUP too quickly and queueing a "Page Ready" #PF exception
right after the "Page not Present" exception, before the next vmentry (see
the trace above). This results in a #DF being injected into the guest.

This patch fixes it by clearing the queued "Page not Present" exception if
"Page Ready" occurs before the next vmentry, since the GUP has already got
the required page and the shadow page table has already been fixed by the
"Page Ready" handler.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Fixes: 7c90705b ("KVM: Inject asynchronous page fault into a PV guest if page is swapped out.")
[Changed indentation and added clearing of injected. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

9a6e7c39

14 Sep, 2017 7 commits

KVM: X86: Don't block vCPU if there is pending exception · a5f01f8e

Authored by Wanpeng Li on Sep 13, 2017

Don't block vCPU if there is pending exception.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

a5f01f8e

KVM: SVM: Add irqchip_split() checks before enabling AVIC · 67034bb9

Authored by Suravee Suthikulpanit on Sep 12, 2017

SVM AVIC hardware accelerates guest writes to the APIC_EOI register
(for edge-triggered interrupts), which means they do not trap to KVM.

So, enable SVM AVIC only in split irqchip mode
(e.g. launching qemu w/ option '-machine kernel_irqchip=split').

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Fixes: 44a95dae ("KVM: x86: Detect and Initialize AVIC support")
[Removed pr_debug - Radim.]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

67034bb9

dmi: Mark all struct dmi_system_id instances const · 6faadbbb

Authored by Christoph Hellwig on Sep 14, 2017

... and __initconst if applicable.

Based on similar work for an older kernel in the Grsecurity patch.

[JD: fix toshiba-wmi build]
[JD: add htcpen]
[JD: move __initconst where checkscript wants it]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jean Delvare <jdelvare@suse.de>

6faadbbb

um: remove a stray tab · 7b24afbf

Authored by Dan Carpenter on Aug 25, 2017

Static checkers would urge us to add curly braces to this code, but
actually the code works correctly.  It just isn't indented right.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Richard Weinberger <richard@nod.at>

7b24afbf

um: Fix FP register size for XSTATE/XSAVE · 6f602afd

Authored by Thomas Meyer on Jul 29, 2017

Hard code max size. Taken from
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/common/x86-xstate.h

Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Richard Weinberger <richard@nod.at>

6f602afd

KVM: Add struct kvm_vcpu pointer parameter to get_enable_apicv() · b2a05fef

Authored by Suravee Suthikulpanit on Sep 12, 2017

Modify struct kvm_x86_ops.arch.apicv_active() to take struct kvm_vcpu
pointer as parameter in preparation for subsequent changes.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

b2a05fef

KVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu() · dfa20099

Authored by Suravee Suthikulpanit on Sep 12, 2017

Preparing the base code for subsequent changes. This does not change
existing logic.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

dfa20099

13 Sep, 2017 11 commits

KVM: x86: fix clang build · 51537233

Authored by Radim Krčmář on Sep 13, 2017

Clang resolves __builtin_constant_p() to false even if the expression is
constant in the end.  The only purpose of that expression was to
differentiate a case where the following expression couldn't be checked
at compile-time, so we can just remove the check.

Clang handles the following two correctly.  Turn it into BUG_ON if there
are any more problems with this.

Fixes: d6321d49 ("KVM: x86: generalize guest_cpuid_has_ helpers")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

51537233

KVM: x86: Fix immediate_exit handling for uninitialized AP · 2f173d26

Authored by Jan H. Schönherr on Sep 06, 2017

When user space sets kvm_run->immediate_exit, KVM is supposed to
return quickly. However, when a vCPU is in KVM_MP_STATE_UNINITIALIZED,
the value is not considered and the vCPU blocks.

Fix that oversight.

Fixes: 460df4c1 ("KVM: race-free exit from KVM_RUN without POSIX signals")
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

2f173d26

KVM: x86: Fix handling of pending signal on uninitialized AP · a0595000

Authored by Jan H. Schönherr on Sep 06, 2017

KVM API says that KVM_RUN will return with -EINTR when a signal is
pending. However, if a vCPU is in KVM_MP_STATE_UNINITIALIZED, then
the return value is unconditionally -EAGAIN.

Copy over some code from vcpu_run(), so that the case of a pending
signal results in the expected return value.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

a0595000

KVM: SVM: Add a missing 'break' statement · 49a8afca

Authored by Jan H. Schönherr on Sep 05, 2017

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Fixes: f6511935 ("KVM: SVM: Add checks for IO instructions")
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

49a8afca

KVM: x86: Remove .get_pkru() from kvm_x86_ops · 98152b83

Authored by Joerg Roedel on Aug 28, 2017

The commit

	9dd21e104bc ('KVM: x86: simplify handling of PKRU')

removed all users and providers of that call-back, but
didn't remove it. Remove it now.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>

98152b83

x86/hyper-v: Remove duplicated HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED definition · 1278f58c

Authored by Vitaly Kuznetsov on Sep 11, 2017

Commits:

  7dcf90e9 ("PCI: hv: Use vPCI protocol version 1.2")
  628f54cc ("x86/hyper-v: Support extended CPU ranges for TLB flush hypercalls")

added the same definition and they came in through different trees.
Fix the duplication.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: devel@linuxdriverproject.org
Link: http://lkml.kernel.org/r/20170911150620.3998-1-vkuznets@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

1278f58c

x86/hyper-V: Allocate the IDT entry early in boot · 213ff44a

Authored by K. Y. Srinivasan on Sep 08, 2017

Allocate the hypervisor callback IDT entry early in the boot sequence.

The previous code would allocate the entry as part of registering the handler
when the vmbus driver loaded, and this caused a problem for the IDT cleanup
that Thomas is working on for v4.15.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: apw@canonical.com
Cc: devel@linuxdriverproject.org
Cc: gregkh@linuxfoundation.org
Cc: jasowang@redhat.com
Cc: olaf@aepfle.de
Link: http://lkml.kernel.org/r/20170908231557.2419-1-kys@exchange.microsoft.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

213ff44a

x86/paravirt: Remove no longer used paravirt functions · 87930019

Authored by Juergen Gross on Sep 04, 2017

With removal of lguest some of the paravirt functions are no longer
needed:

	->read_cr4()
	->store_idt()
	->set_pmd_at()
	->set_pud_at()
	->pte_update()

Remove them.

Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akataria@vmware.com
Cc: boris.ostrovsky@oracle.com
Cc: chrisw@sous-sol.org
Cc: jeremy@goop.org
Cc: rusty@rustcorp.com.au
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170904102527.25409-1-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

87930019

x86/mm/64: Initialize CR4.PCIDE early · c7ad5ad2

Authored by Andy Lutomirski on Sep 10, 2017

cpu_init() is weird: it's called rather late (after early
identification and after most MMU state is initialized) on the boot
CPU but is called extremely early (before identification) on secondary
CPUs.  It's called just late enough on the boot CPU that its CR4 value
isn't propagated to mmu_cr4_features.

Even if we put CR4.PCIDE into mmu_cr4_features, we'd hit two
problems.  First, we'd crash in the trampoline code.  That's
fixable, and I tried that.  It turns out that mmu_cr4_features is
totally ignored by secondary_start_64(), though, so even with the
trampoline code fixed, it wouldn't help.

This means that we don't currently have CR4.PCIDE reliably initialized
before we start playing with cpu_tlbstate.  This is very fragile and
tends to cause boot failures if I make even small changes to the TLB
handling code.

Make it more robust: initialize CR4.PCIDE earlier on the boot CPU
and propagate it to secondary CPUs in start_secondary().

( Yes, this is ugly.  I think we should have improved mmu_cr4_features
  to actually control CR4 during secondary bootup, but that would be
  fairly intrusive at this stage. )

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reported-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: 660da7c9 ("x86/mm: Enable CR4.PCIDE on supported systems")
Signed-off-by: Ingo Molnar <mingo@kernel.org>

c7ad5ad2

x86/hibernate/64: Mask off CR3's PCID bits in the saved CR3 · f34902c5

Authored by Andy Lutomirski on Sep 07, 2017

Jiri reported a resume-from-hibernation failure triggered by PCID.
The root cause appears to be rather odd.  The hibernation asm
restores a CR3 value that comes from the image header.  If the image
kernel has PCID on, it's entirely reasonable for this CR3 value to
have one of the low 12 bits set.  The restore code restores it with
CR4.PCIDE=0, which means that those low 12 bits are accepted by the
CPU but are either ignored or interpreted as a caching mode.  This
is odd, but still works.  We blow up later when the image kernel
restores CR4, though, since changing CR4.PCIDE with CR3[11:0] != 0
is illegal.  Boom!

FWIW, it's entirely unclear to me what's supposed to happen if a PAE
kernel restores a non-PAE image or vice versa.  Ditto for LA57.

Reported-by: Jiri Kosina <jikos@kernel.org>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 660da7c9 ("x86/mm: Enable CR4.PCIDE on supported systems")
Link: http://lkml.kernel.org/r/18ca57090651a6341e97083883f9e814c4f14684.1504847163.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

f34902c5
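
The fix named in the title of f34902c5 amounts to clearing CR3[11:0] in the saved value before it is reloaded. A minimal sketch follows, assuming the saved CR3 is held in a plain 64-bit integer. The macro and function names are hypothetical and are not the kernel's hibernation code.

```c
#include <stdint.h>

#define SKETCH_CR3_PCID_MASK 0xFFFull /* CR3[11:0] */

/* Drop the PCID bits from a saved CR3 before it is reloaded with CR4.PCIDE=0. */
uint64_t sketch_sanitize_saved_cr3(uint64_t saved_cr3)
{
	return saved_cr3 & ~SKETCH_CR3_PCID_MASK;
}
```

With the low 12 bits cleared, the later CR4 restore no longer changes CR4.PCIDE while CR3[11:0] != 0, which is the condition the commit message identifies as the cause of the failure.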

x86/mm: Get rid of VM_BUG_ON in switch_tlb_irqs_off() · a376e7f9

Authored by Andy Lutomirski on Sep 07, 2017

If we hit the VM_BUG_ON(), we're detecting a genuinely bad situation,
but we're very unlikely to get a useful call trace.

Make it a warning instead.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/3b4e06bbb382ca54a93218407c93925ff5871546.1504847163.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

a376e7f9

11 Sep, 2017 2 commits

x86/cpu: Remove unused and undefined __generic_processor_info() declaration · e2329b42

Authored by Dou Liyang on Sep 11, 2017

The following revert:

  2b85b3d2 ("x86/acpi: Restore the order of CPU IDs")

... got rid of __generic_processor_info(), but forgot to remove its
declaration in mpspec.h.

Remove the declaration and update the comments as well.

Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: lenb@kernel.org
Link: http://lkml.kernel.org/r/1505101403-29100-1-git-send-email-douly.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

e2329b42

x86/mm/64: Fix an incorrect warning with CONFIG_DEBUG_VM=y, !PCID · 7898f796

Authored by Andy Lutomirski on Sep 10, 2017

I've been staring at the word PCID too long.

Fixes: f13c8e8c58ba ("x86/mm: Reinitialize TLB state on hotplug and resume")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

7898f796

09 Sep, 2017 6 commits

treewide: make "nr_cpu_ids" unsigned · 9b130ad5

Authored by Alexey Dobriyan on Sep 08, 2017

First, the number of CPUs can't be negative.

Second, different signedness leads to suboptimal code in the following
cases:

1)
	kmalloc(nr_cpu_ids * sizeof(X));

"int" has to be sign extended to size_t.

2)
	while (loff_t *pos < nr_cpu_ids)

MOVSXD is 1 byte longer than the same MOV.

Other cases exist as well. Basically, the compiler is told that nr_cpu_ids
can't be negative, which can't be deduced if it is "int".

Code savings on allyesconfig kernel: -3KB

	add/remove: 0/0 grow/shrink: 25/264 up/down: 261/-3631 (-3370)
	function                                     old     new   delta
	coretemp_cpu_online                          450     512     +62
	rcu_init_one                                1234    1272     +38
	pci_device_probe                             374     399     +25

				...

	pgdat_reclaimable_pages                      628     556     -72
	select_fallback_rq                           446     369     -77
	task_numa_find_cpu                          1923    1807    -116

Link: http://lkml.kernel.org/r/20170819114959.GA30580@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9b130ad5
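
The first case in 9b130ad5 (a CPU count multiplied by an element size) can be reproduced outside the kernel. This is a sketch for comparing generated code, with hypothetical function names. The exact instructions depend on the compiler and target, so none are claimed here.

```c
#include <stddef.h>

/* With a signed count, the int is sign-extended to size_t before the multiply. */
size_t sketch_bytes_signed(int nr, size_t elem_size)
{
	return (size_t)nr * elem_size;
}

/* With an unsigned count, the conversion to size_t is a zero-extension. */
size_t sketch_bytes_unsigned(unsigned int nr, size_t elem_size)
{
	return (size_t)nr * elem_size;
}
```

Compiling both functions to assembly for a 64-bit x86 target (for example with `-O2 -S`) and comparing them is one way to see the difference the commit message describes.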

x86: implement memset16, memset32 & memset64 · 4c512485

Authored by Matthew Wilcox on Sep 08, 2017

These are single instructions on x86.  There's no 64-bit instruction for
x86-32, but we don't yet have any user for memset64() on 32-bit
architectures, so don't bother to implement it.

Link: http://lkml.kernel.org/r/20170720184539.31609-4-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: David Miller <davem@davemloft.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

4c512485
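
For readers unfamiliar with the memset16/32/64 family from 4c512485, the semantics are "fill `count` elements, not bytes, with a fixed-width value". A portable sketch of the 32-bit variant follows. It is a plain loop under a hypothetical name, not the x86 implementation the commit adds.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill count 32-bit elements (not bytes) starting at s with the value v. */
uint32_t *sketch_memset32(uint32_t *s, uint32_t v, size_t count)
{
	uint32_t *p = s;

	while (count--)
		*p++ = v;
	return s;
}
```

The 16-bit and 64-bit variants differ only in the element type.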

mm/memory_hotplug: introduce add_pages · 3072e413

Authored by Michal Hocko on Sep 08, 2017

There are new users of memory hotplug emerging.  Some of them require a
different subset of arch_add_memory.  Some only require
allocation of struct pages without mapping those pages to the kernel
address space.  We currently have __add_pages for that purpose.  But this
is rather low-level and not very suitable for code outside of the
memory hotplug.  E.g.  x86_64 wants to update max_pfn, which should be done
by the caller.  Introduce add_pages(), which should take care of those
details if they are needed.  Each architecture should define its
implementation and select CONFIG_ARCH_HAS_ADD_PAGES.  All others use the
currently existing __add_pages.

Link: http://lkml.kernel.org/r/20170817000548.32038-7-jglisse@redhat.com
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Evgeny Baskakov <ebaskakov@nvidia.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mark Hairgrove <mhairgrove@nvidia.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Sherry Cheung <SCheung@nvidia.com>
Cc: Subhash Gutti <sgutti@nvidia.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Bob Liu <liubo95@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3072e413

mm: soft-dirty: keep soft-dirty bits over thp migration · ab6e3d09

Authored by Naoya Horiguchi on Sep 08, 2017

The soft-dirty bit is designed to stay tracked across page migration.  This
patch makes it work in the same manner for thp migration too.

Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ab6e3d09

mm: thp: enable thp migration in generic path · 616b8371

Authored by Zi Yan on Sep 08, 2017

Add thp migration's core code, including conversions between a PMD entry
and a swap entry, setting PMD migration entry, removing PMD migration
entry, and waiting on PMD migration entries.

This patch makes it possible to support thp migration.  If you fail to
allocate a destination page as a thp, you just split the source thp as
we do now, and then enter the normal page migration.  If you succeed in
allocating a destination thp, you enter thp migration.  Subsequent patches
actually enable thp migration for each caller of page migration by
allowing its get_new_page() callback to allocate thps.

[zi.yan@cs.rutgers.edu: fix gcc-4.9.0 -Wmissing-braces warning]
  Link: http://lkml.kernel.org/r/A0ABA698-7486-46C3-B209-E95A9048B22C@cs.rutgers.edu
[akpm@linux-foundation.org: fix x86_64 allnoconfig warning]
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

616b8371

mm: thp: introduce CONFIG_ARCH_ENABLE_THP_MIGRATION · 9c670ea3

Authored by Naoya Horiguchi on Sep 08, 2017

Introduce CONFIG_ARCH_ENABLE_THP_MIGRATION to limit thp migration
functionality to x86_64, which should be safer at the first step.

Link: http://lkml.kernel.org/r/20170717193955.20207-5-zi.yan@sent.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Zi Yan <zi.yan@cs.rutgers.edu>
Reviewed-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Nellans <dnellans@nvidia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9c670ea3

openanolis / cloud-kernel · synced successfully about 1 year ago