提交 · 7318413077a5141a50a753b1fab687b7907eef16 · openanolis / cloud-kernel

15 9月, 2017 13 次提交

kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly · 4f350c6d

由 Jim Mattson 提交于 9月 14, 2017

When emulating a nested VM-entry from L1 to L2, several control field
validation checks are deferred to the hardware. Should one of these
validation checks fail, vcpu_vmx_run will set the vmx->fail flag. When
this happens, the L2 guest state is not loaded (even in part), and
execution should continue in L1 with the next instruction after the
VMLAUNCH/VMRESUME.

The VMCS12 is not modified (except for the VM-instruction error
field), the VMCS12 MSR save/load lists are not processed, and the CPU
state is not loaded from the VMCS12 host area. Moreover, the vmcs02
exit reason is stale, so it should not be consulted for any reason.
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4f350c6d

kvm: vmx: Handle VMLAUNCH/VMRESUME failure properly · b060ca3b

由 Jim Mattson 提交于 9月 14, 2017

On an early VMLAUNCH/VMRESUME failure (i.e. one which sets the
VM-instruction error field of the current VMCS), the launch state of
the current VMCS is not set to "launched," and the VM-exit information
fields of the current VMCS (including IDT-vectoring information and
exit reason) are stale.

On a late VMLAUNCH/VMRESUME failure (i.e. one which sets the high bit
of the exit reason field), the launch state of the current VMCS is not
set to "launched," and only two of the VM-exit information fields of
the current VMCS are modified (exit reason and exit
qualification). The remaining VM-exit information fields of the
current VMCS (including IDT-vectoring information, in particular) are
stale.
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b060ca3b

kvm: nVMX: Remove nested_vmx_succeed after successful VM-entry · 7881f96c

由 Jim Mattson 提交于 9月 14, 2017

After a successful VM-entry, RFLAGS is cleared, with the exception of
bit 1, which is always set. This is handled by load_vmcs12_host_state.
Signed-off-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7881f96c

kvm,mips: Fix potential swait_active() races · 4c0b4bc6

由 Davidlohr Bueso 提交于 9月 13, 2017

For example, the following could occur, making us miss a wakeup:

CPU0					CPU1
kvm_vcpu_block				kvm_mips_comparecount_func
					  [L] swait_active(&vcpu->wq)
  [S] prepare_to_swait(&vcpu->wq)
  [L] if (!kvm_vcpu_has_pending_timer(vcpu))
         schedule()                       [S] queue_timer_int(vcpu)

Ensure that the swait_active() check is not hoisted over the interrupt.
Signed-off-by: NDavidlohr Bueso <dbueso@suse.de>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4c0b4bc6

kvm,powerpc: Serialize wq active checks in ops->vcpu_kick · 267ad7bc

由 Davidlohr Bueso 提交于 9月 13, 2017

Particularly because kvmppc_fast_vcpu_kick_hv() is a callback,
ensure that we properly serialize wq active checks in order to
avoid potentially missing a wakeup due to racing with the waiter
side.
Signed-off-by: NDavidlohr Bueso <dbueso@suse.de>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

267ad7bc

kvm,x86: Fix apf_task_wake_one() wq serialization · a0cff57b

由 Davidlohr Bueso 提交于 9月 13, 2017

During code inspection, the following potential race was seen:

CPU0   	    		    	     	CPU1
kvm_async_pf_task_wait			apf_task_wake_one
					  [L] swait_active(&n->wq)
  [S] prepare_to_swait(&n.wq)
  [L] if (!hlist_unhahed(&n.link))
	schedule()			  [S] hlist_del_init(&n->link);

Properly serialize swait_active() checks such that a wakeup is
not missed.
Signed-off-by: NDavidlohr Bueso <dbueso@suse.de>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a0cff57b

kvm,lapic: Justify use of swait_active() · cc1b4680

由 Davidlohr Bueso 提交于 9月 13, 2017

A comment might serve future readers.
Signed-off-by: NDavidlohr Bueso <dbueso@suse.de>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cc1b4680

KVM: VMX: Do not BUG() on out-of-bounds guest IRQ · 3a8b0677

由 Jan H. Schönherr 提交于 9月 07, 2017

The value of the guest_irq argument to vmx_update_pi_irte() is
ultimately coming from a KVM_IRQFD API call. Do not BUG() in
vmx_update_pi_irte() if the value is out-of bounds. (Especially,
since KVM as a whole seems to hang after that.)

Instead, print a message only once if we find that we don't have a
route for a certain IRQ (which can be out-of-bounds or within the
array).

This fixes CVE-2017-1000252.

Fixes: efc64404 ("KVM: x86: Update IRTE for posted-interrupts")
Signed-off-by: NJan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3a8b0677

nios2: time: Read timer in get_cycles only if initialized · 65d1e3dd

由 Guenter Roeck 提交于 9月 11, 2017

Mainline crashes as follows when running nios2 images.

On node 0 totalpages: 65536
free_area_init_node: node 0, pgdat c8408fa0, node_mem_map c8726000
  Normal zone: 512 pages used for memmap
  Normal zone: 0 pages reserved
  Normal zone: 65536 pages, LIFO batch:15
Unable to handle kernel NULL pointer dereference at virtual address 00000000
ea = c8003cb0, ra = c81cbf40, cause = 15
Kernel panic - not syncing: Oops

Problem is seen because get_cycles() is called before the timer it depends
on is initialized. Returning 0 in that situation fixes the problem.

Fixes: 33d72f38 ("init/main.c: extract early boot entropy from the ..")
Cc: Laura Abbott <labbott@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: NGuenter Roeck <linux@roeck-us.net>

65d1e3dd

nios2: add earlycon support to 3c120 devboard DTS · 8993d5e4

由 Tobias Klauser 提交于 6月 20, 2017

Allow earlycon to be used on the JTAG UART present in the 3c120 GHRD.
Signed-off-by: NTobias Klauser <tklauser@distanz.ch>

8993d5e4

kvm: nVMX: Don't allow L2 to access the hardware CR8 · 51aa68e7

由 Jim Mattson 提交于 9月 12, 2017

If L1 does not specify the "use TPR shadow" VM-execution control in
vmcs12, then L0 must specify the "CR8-load exiting" and "CR8-store
exiting" VM-execution controls in vmcs02. Failure to do so will give
the L2 VM unrestricted read/write access to the hardware CR8.

This fixes CVE-2017-12154.
Signed-off-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

51aa68e7

powerpc: Fix handling of alignment interrupt on dcbz instruction · 1bc944ce

由 Paul Mackerras 提交于 9月 13, 2017

This fixes the emulation of the dcbz instruction in the alignment
interrupt handler.  The error was that we were comparing just the
instruction type field of op.type rather than the whole thing,
and therefore the comparison "type != CACHEOP + DCBZ" was always
true.

Fixes: 31bfdb03 ("powerpc: Use instruction emulation infrastructure to handle alignment faults")
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
Tested-by: NMichal Sojka <sojkam1@fel.cvut.cz>
Tested-by: NChristian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>

1bc944ce

KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously · 9a6e7c39

由 Wanpeng Li 提交于 9月 14, 2017

qemu-system-x86-8600 [004] d..1 7205.687530: kvm_entry: vcpu 2
qemu-system-x86-8600 [004] .... 7205.687532: kvm_exit: reason EXCEPTION_NMI rip 0xffffffffa921297d info ffffeb2c0e44e018 80000b0e
qemu-system-x86-8600 [004] .... 7205.687532: kvm_page_fault: address ffffeb2c0e44e018 error_code 0
qemu-system-x86-8600 [004] .... 7205.687620: kvm_try_async_get_page: gva = 0xffffeb2c0e44e018, gfn = 0x427e4e
qemu-system-x86-8600 [004] .N.. 7205.687628: kvm_async_pf_not_present: token 0x8b002 gva 0xffffeb2c0e44e018
kworker/4:2-7814 [004] .... 7205.687655: kvm_async_pf_completed: gva 0xffffeb2c0e44e018 address 0x7fcc30c4e000
qemu-system-x86-8600 [004] .... 7205.687703: kvm_async_pf_ready: token 0x8b002 gva 0xffffeb2c0e44e018
qemu-system-x86-8600 [004] d..1 7205.687711: kvm_entry: vcpu 2

After running some memory intensive workload in guest, I catch the kworker
which completes the GUP too quickly, and queues an "Page Ready" #PF exception
after the "Page not Present" exception before the next vmentry as the above
trace which will result in #DF injected to guest.

This patch fixes it by clearing the queue for "Page not Present" if "Page Ready"
occurs before the next vmentry since the GUP has already got the required page
and shadow page table has already been fixed by "Page Ready" handler.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Fixes: 7c90705b ("KVM: Inject asynchronous page fault into a PV guest if page is swapped out.")
[Changed indentation and added clearing of injected. - Radim]
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

9a6e7c39

14 9月, 2017 7 次提交

KVM: X86: Don't block vCPU if there is pending exception · a5f01f8e

由 Wanpeng Li 提交于 9月 13, 2017

Don't block vCPU if there is pending exception.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

a5f01f8e

KVM: SVM: Add irqchip_split() checks before enabling AVIC · 67034bb9

由 Suravee Suthikulpanit 提交于 9月 12, 2017

SVM AVIC hardware accelerates guest write to APIC_EOI register
(for edge-trigger interrupt), which means it does not trap to KVM.

So, only enable SVM AVIC only in split irqchip mode.
(e.g. launching qemu w/ option '-machine kernel_irqchip=split').
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Fixes: 44a95dae ("KVM: x86: Detect and Initialize AVIC support")
[Removed pr_debug - Radim.]
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

67034bb9

dmi: Mark all struct dmi_system_id instances const · 6faadbbb

由 Christoph Hellwig 提交于 9月 14, 2017

... and __initconst if applicable.

Based on similar work for an older kernel in the Grsecurity patch.

[JD: fix toshiba-wmi build]
[JD: add htcpen]
[JD: move __initconst where checkscript wants it]
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJean Delvare <jdelvare@suse.de>

6faadbbb

arm64: stacktrace: avoid listing stacktrace functions in stacktrace · bb53c820

由 Prakash Gupta 提交于 9月 13, 2017

The stacktraces always begin as follows:

  [<c00117b4>] save_stack_trace_tsk+0x0/0x98
  [<c0011870>] save_stack_trace+0x24/0x28
  ...

This is because the stack trace code includes the stack frames for
itself.  This is incorrect behaviour, and also leads to "skip" doing the
wrong thing (which is the number of stack frames to avoid recording.)

Perversely, it does the right thing when passed a non-current thread.
Fix this by ensuring that we have a known constant number of frames
above the main stack trace function, and always skip these.

This was fixed for arch arm by commit 3683f44c ("ARM: stacktrace:
avoid listing stacktrace functions in stacktrace")

Link: http://lkml.kernel.org/r/1504078343-28754-1-git-send-email-guptap@codeaurora.orgSigned-off-by: NPrakash Gupta <guptap@codeaurora.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bb53c820

mm: treewide: remove GFP_TEMPORARY allocation flag · 0ee931c4

由 Michal Hocko 提交于 9月 13, 2017

GFP_TEMPORARY was introduced by commit e12ba74d ("Group short-lived
and reclaimable kernel allocations") along with __GFP_RECLAIMABLE.  It's
primary motivation was to allow users to tell that an allocation is
short lived and so the allocator can try to place such allocations close
together and prevent long term fragmentation.  As much as this sounds
like a reasonable semantic it becomes much less clear when to use the
highlevel GFP_TEMPORARY allocation flag.  How long is temporary? Can the
context holding that memory sleep? Can it take locks? It seems there is
no good answer for those questions.

The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
__GFP_RECLAIMABLE which in itself is tricky because basically none of
the existing caller provide a way to reclaim the allocated memory.  So
this is rather misleading and hard to evaluate for any benefits.

I have checked some random users and none of them has added the flag
with a specific justification.  I suspect most of them just copied from
other existing users and others just thought it might be a good idea to
use without any measuring.  This suggests that GFP_TEMPORARY just
motivates for cargo cult usage without any reasoning.

I believe that our gfp flags are quite complex already and especially
those with highlevel semantic should be clearly defined to prevent from
confusion and abuse.  Therefore I propose dropping GFP_TEMPORARY and
replace all existing users to simply use GFP_KERNEL.  Please note that
SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
so they will be placed properly for memory fragmentation prevention.

I can see reasons we might want some gfp flag to reflect shorterm
allocations but I propose starting from a clear semantic definition and
only then add users with proper justification.

This was been brought up before LSF this year by Matthew [1] and it
turned out that GFP_TEMPORARY really doesn't have a clear semantic.  It
seems to be a heuristic without any measured advantage for most (if not
all) its current users.  The follow up discussion has revealed that
opinions on what might be temporary allocation differ a lot between
developers.  So rather than trying to tweak existing users into a
semantic which they haven't expected I propose to simply remove the flag
and start from scratch if we really need a semantic for short term
allocations.

[1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org

[akpm@linux-foundation.org: fix typo]
[akpm@linux-foundation.org: coding-style fixes]
[sfr@canb.auug.org.au: drm/i915: fix up]
  Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Acked-by: NMel Gorman <mgorman@suse.de>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0ee931c4

KVM: Add struct kvm_vcpu pointer parameter to get_enable_apicv() · b2a05fef

由 Suravee Suthikulpanit 提交于 9月 12, 2017

Modify struct kvm_x86_ops.arch.apicv_active() to take struct kvm_vcpu
pointer as parameter in preparation to subsequent changes.
Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

b2a05fef

KVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu() · dfa20099

由 Suravee Suthikulpanit 提交于 9月 12, 2017

Preparing the base code for subsequent changes. This does not change
existing logic.
Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

dfa20099

13 9月, 2017 11 次提交

KVM: x86: fix clang build · 51537233

由 Radim Krčmář 提交于 9月 13, 2017

Clang resolves __builtin_constant_p() to false even if the expression is
constant in the end.  The only purpose of that expression was to
differentiate a case where the following expression couldn't be checked
at compile-time, so we can just remove the check.

Clang handles the following two correctly.  Turn it into BUG_ON if there
are any more problems with this.

Fixes: d6321d49 ("KVM: x86: generalize guest_cpuid_has_ helpers")
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

51537233

KVM: x86: Fix immediate_exit handling for uninitialized AP · 2f173d26

由 Jan H. Schönherr 提交于 9月 06, 2017

When user space sets kvm_run->immediate_exit, KVM is supposed to
return quickly. However, when a vCPU is in KVM_MP_STATE_UNINITIALIZED,
the value is not considered and the vCPU blocks.

Fix that oversight.

Fixes: 460df4c1 ("KVM: race-free exit from KVM_RUN without POSIX signals")
Signed-off-by: NJan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

2f173d26

KVM: x86: Fix handling of pending signal on uninitialized AP · a0595000

由 Jan H. Schönherr 提交于 9月 06, 2017

KVM API says that KVM_RUN will return with -EINTR when a signal is
pending. However, if a vCPU is in KVM_MP_STATE_UNINITIALIZED, then
the return value is unconditionally -EAGAIN.

Copy over some code from vcpu_run(), so that the case of a pending
signal results in the expected return value.
Signed-off-by: NJan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

a0595000

KVM: SVM: Add a missing 'break' statement · 49a8afca

由 Jan H. Schönherr 提交于 9月 05, 2017

Signed-off-by: NJan H. Schönherr <jschoenh@amazon.de>
Fixes: f6511935 ("KVM: SVM: Add checks for IO instructions")
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

49a8afca

KVM: x86: Remove .get_pkru() from kvm_x86_ops · 98152b83

由 Joerg Roedel 提交于 8月 28, 2017

The commit

	9dd21e104bc ('KVM: x86: simplify handling of PKRU')

removed all users and providers of that call-back, but
didn't remove it. Remove it now.
Signed-off-by: NJoerg Roedel <jroedel@suse.de>
Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>

98152b83

x86/hyper-v: Remove duplicated HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED definition · 1278f58c

由 Vitaly Kuznetsov 提交于 9月 11, 2017

Commits:

  7dcf90e9 ("PCI: hv: Use vPCI protocol version 1.2")
  628f54cc ("x86/hyper-v: Support extended CPU ranges for TLB flush hypercalls")

added the same definition and they came in through different trees.
Fix the duplication.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: devel@linuxdriverproject.org
Link: http://lkml.kernel.org/r/20170911150620.3998-1-vkuznets@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

1278f58c

x86/hyper-V: Allocate the IDT entry early in boot · 213ff44a

由 K. Y. Srinivasan 提交于 9月 08, 2017

Allocate the hypervisor callback IDT entry early in the boot sequence.

The previous code would allocate the entry as part of registering the handler
when the vmbus driver loaded, and this caused a problem for the IDT cleanup
that Thomas is working on for v4.15.
Signed-off-by: NK. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: apw@canonical.com
Cc: devel@linuxdriverproject.org
Cc: gregkh@linuxfoundation.org
Cc: jasowang@redhat.com
Cc: olaf@aepfle.de
Link: http://lkml.kernel.org/r/20170908231557.2419-1-kys@exchange.microsoft.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

213ff44a

x86/paravirt: Remove no longer used paravirt functions · 87930019

由 Juergen Gross 提交于 9月 04, 2017

With removal of lguest some of the paravirt functions are no longer
needed:

	->read_cr4()
	->store_idt()
	->set_pmd_at()
	->set_pud_at()
	->pte_update()

Remove them.
Signed-off-by: NJuergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akataria@vmware.com
Cc: boris.ostrovsky@oracle.com
Cc: chrisw@sous-sol.org
Cc: jeremy@goop.org
Cc: rusty@rustcorp.com.au
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170904102527.25409-1-jgross@suse.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

87930019

x86/mm/64: Initialize CR4.PCIDE early · c7ad5ad2

由 Andy Lutomirski 提交于 9月 10, 2017

cpu_init() is weird: it's called rather late (after early
identification and after most MMU state is initialized) on the boot
CPU but is called extremely early (before identification) on secondary
CPUs.  It's called just late enough on the boot CPU that its CR4 value
isn't propagated to mmu_cr4_features.

Even if we put CR4.PCIDE into mmu_cr4_features, we'd hit two
problems.  First, we'd crash in the trampoline code.  That's
fixable, and I tried that.  It turns out that mmu_cr4_features is
totally ignored by secondary_start_64(), though, so even with the
trampoline code fixed, it wouldn't help.

This means that we don't currently have CR4.PCIDE reliably initialized
before we start playing with cpu_tlbstate.  This is very fragile and
tends to cause boot failures if I make even small changes to the TLB
handling code.

Make it more robust: initialize CR4.PCIDE earlier on the boot CPU
and propagate it to secondary CPUs in start_secondary().

( Yes, this is ugly.  I think we should have improved mmu_cr4_features
  to actually control CR4 during secondary bootup, but that would be
  fairly intrusive at this stage. )
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Reported-by: NSai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Tested-by: NSai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: 660da7c9 ("x86/mm: Enable CR4.PCIDE on supported systems")
Signed-off-by: NIngo Molnar <mingo@kernel.org>

c7ad5ad2

x86/hibernate/64: Mask off CR3's PCID bits in the saved CR3 · f34902c5

由 Andy Lutomirski 提交于 9月 07, 2017

Jiri reported a resume-from-hibernation failure triggered by PCID.
The root cause appears to be rather odd.  The hibernation asm
restores a CR3 value that comes from the image header.  If the image
kernel has PCID on, it's entirely reasonable for this CR3 value to
have one of the low 12 bits set.  The restore code restores it with
CR4.PCIDE=0, which means that those low 12 bits are accepted by the
CPU but are either ignored or interpreted as a caching mode.  This
is odd, but still works.  We blow up later when the image kernel
restores CR4, though, since changing CR4.PCIDE with CR3[11:0] != 0
is illegal.  Boom!

FWIW, it's entirely unclear to me what's supposed to happen if a PAE
kernel restores a non-PAE image or vice versa.  Ditto for LA57.
Reported-by: NJiri Kosina <jikos@kernel.org>
Tested-by: NJiri Kosina <jkosina@suse.cz>
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 660da7c9 ("x86/mm: Enable CR4.PCIDE on supported systems")
Link: http://lkml.kernel.org/r/18ca57090651a6341e97083883f9e814c4f14684.1504847163.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

f34902c5

x86/mm: Get rid of VM_BUG_ON in switch_tlb_irqs_off() · a376e7f9

由 Andy Lutomirski 提交于 9月 07, 2017

If we hit the VM_BUG_ON(), we're detecting a genuinely bad situation,
but we're very unlikely to get a useful call trace.

Make it a warning instead.
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/3b4e06bbb382ca54a93218407c93925ff5871546.1504847163.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

a376e7f9

12 9月, 2017 3 次提交

KVM: PPC: Book3S HV: Fix bug causing host SLB to be restored incorrectly · 67f8a8c1

由 Paul Mackerras 提交于 9月 12, 2017

Aneesh Kumar reported seeing host crashes when running recent kernels
on POWER8.  The symptom was an oops like this:

Unable to handle kernel paging request for data at address 0xf00000000786c620
Faulting instruction address: 0xc00000000030e1e4
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=2048 NUMA PowerNV
Modules linked in: powernv_op_panel
CPU: 24 PID: 6663 Comm: qemu-system-ppc Tainted: G        W 4.13.0-rc7-43932-gfc36c59 #2
task: c000000fdeadfe80 task.stack: c000000fdeb68000
NIP:  c00000000030e1e4 LR: c00000000030de6c CTR: c000000000103620
REGS: c000000fdeb6b450 TRAP: 0300   Tainted: G        W        (4.13.0-rc7-43932-gfc36c59)
MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24044428  XER: 20000000
CFAR: c00000000030e134 DAR: f00000000786c620 DSISR: 40000000 SOFTE: 0
GPR00: 0000000000000000 c000000fdeb6b6d0 c0000000010bd000 000000000000e1b0
GPR04: c00000000115e168 c000001fffa6e4b0 c00000000115d000 c000001e1b180386
GPR08: f000000000000000 c000000f9a8913e0 f00000000786c600 00007fff587d0000
GPR12: c000000fdeb68000 c00000000fb0f000 0000000000000001 00007fff587cffff
GPR16: 0000000000000000 c000000000000000 00000000003fffff c000000fdebfe1f8
GPR20: 0000000000000004 c000000fdeb6b8a8 0000000000000001 0008000000000040
GPR24: 07000000000000c0 00007fff587cffff c000000fdec20bf8 00007fff587d0000
GPR28: c000000fdeca9ac0 00007fff587d0000 00007fff587c0000 00007fff587d0000
NIP [c00000000030e1e4] __get_user_pages_fast+0x434/0x1070
LR [c00000000030de6c] __get_user_pages_fast+0xbc/0x1070
Call Trace:
[c000000fdeb6b6d0] [c00000000139dab8] lock_classes+0x0/0x35fe50 (unreliable)
[c000000fdeb6b7e0] [c00000000030ef38] get_user_pages_fast+0xf8/0x120
[c000000fdeb6b830] [c000000000112318] kvmppc_book3s_hv_page_fault+0x308/0xf30
[c000000fdeb6b960] [c00000000010e10c] kvmppc_vcpu_run_hv+0xfdc/0x1f00
[c000000fdeb6bb20] [c0000000000e915c] kvmppc_vcpu_run+0x2c/0x40
[c000000fdeb6bb40] [c0000000000e5650] kvm_arch_vcpu_ioctl_run+0x110/0x300
[c000000fdeb6bbe0] [c0000000000d6468] kvm_vcpu_ioctl+0x528/0x900
[c000000fdeb6bd40] [c0000000003bc04c] do_vfs_ioctl+0xcc/0x950
[c000000fdeb6bde0] [c0000000003bc930] SyS_ioctl+0x60/0x100
[c000000fdeb6be30] [c00000000000b96c] system_call+0x58/0x6c
Instruction dump:
7ca81a14 2fa50000 41de0010 7cc8182a 68c60002 78c6ffe2 0b060000 3cc2000a
794a3664 390610d8 e9080000 7d485214 <e90a0020> 7d435378 790507e1 408202f0
---[ end trace fad4a342d0414aa2 ]---

It turns out that what has happened is that the SLB entry for the
vmmemap region hasn't been reloaded on exit from a guest, and it has
the wrong page size.  Then, when the host next accesses the vmemmap
region, it gets a page fault.

Commit a25bd72b ("powerpc/mm/radix: Workaround prefetch issue with
KVM", 2017-07-24) modified the guest exit code so that it now only clears
out the SLB for hash guest.  The code tests the radix flag and puts the
result in a non-volatile CR field, CR2, and later branches based on CR2.

Unfortunately, the kvmppc_save_tm function, which gets called between
those two points, modifies all the user-visible registers in the case
where the guest was in transactional or suspended state, except for a
few which it restores (namely r1, r2, r9 and r13).  Thus the hash/radix indication in CR2 gets corrupted.

This fixes the problem by re-doing the comparison just before the
result is needed.  For good measure, this also adds comments next to
the call sites of kvmppc_save_tm and kvmppc_restore_tm pointing out
that non-volatile register state will be lost.

Cc: stable@vger.kernel.org # v4.13
Fixes: a25bd72b ("powerpc/mm/radix: Workaround prefetch issue with KVM")
Tested-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

67f8a8c1

KVM: PPC: Book3S HV: Hold kvm->lock around call to kvmppc_update_lpcr · cf5f6f31

由 Paul Mackerras 提交于 9月 11, 2017

Commit 468808bd ("KVM: PPC: Book3S HV: Set process table for HPT
guests on POWER9", 2017-01-30) added a call to kvmppc_update_lpcr()
which doesn't hold the kvm->lock mutex around the call, as required.
This adds the lock/unlock pair, and for good measure, includes
the kvmppc_setup_partition_table() call in the locked region, since
it is altering global state of the VM.

This error appears not to have any fatal consequences for the host;
the consequences would be that the VCPUs could end up running with
different LPCR values, or an update to the LPCR value by userspace
using the one_reg interface could get overwritten, or the update
done by kvmhv_configure_mmu() could get overwritten.

Cc: stable@vger.kernel.org # v4.10+
Fixes: 468808bd ("KVM: PPC: Book3S HV: Set process table for HPT guests on POWER9")
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

cf5f6f31

KVM: PPC: Book3S HV: Don't access XIVE PIPR register using byte accesses · d222af07

由 Benjamin Herrenschmidt 提交于 9月 06, 2017

The XIVE interrupt controller on POWER9 machines doesn't support byte
accesses to any register in the thread management area other than the
CPPR (current processor priority register).  In particular, when
reading the PIPR (pending interrupt priority register), we need to
do a 32-bit or 64-bit load.

Cc: stable@vger.kernel.org # v4.13
Fixes: 2c4fb78f ("KVM: PPC: Book3S HV: Workaround POWER9 DD1.0 bug causing IPB bit loss")
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>

d222af07

11 9月, 2017 4 次提交

x86/cpu: Remove unused and undefined __generic_processor_info() declaration · e2329b42

由 Dou Liyang 提交于 9月 11, 2017

The following revert:

  2b85b3d2 ("x86/acpi: Restore the order of CPU IDs")

... got rid of __generic_processor_info(), but forgot to remove its
declaration in mpspec.h.

Remove the declaration and update the comments as well.
Signed-off-by: NDou Liyang <douly.fnst@cn.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: lenb@kernel.org
Link: http://lkml.kernel.org/r/1505101403-29100-1-git-send-email-douly.fnst@cn.fujitsu.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

e2329b42

openrisc: add forward declaration for struct vm_area_struct · 56ce2f25

由 Tobias Klauser 提交于 9月 08, 2017

After removing linux/vmalloc.h from asm-generic/io.h, the following
warning occurs on openrisc:

   In file included from arch/openrisc/include/asm/io.h:33:0,
                    from include/linux/io.h:25,
                    from drivers/tty/serial/earlycon.c:19:
   arch/openrisc/include/asm/pgtable.h:424:2: warning: 'struct vm_area_struct' declared inside parameter list
     unsigned long address, pte_t *pte)
     ^
   arch/openrisc/include/asm/pgtable.h:424:2: warning: its scope is only this definition or declaration, which is probably not what you want
Reported-by: Nkbuild test robot <lkp@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: NTobias Klauser <tklauser@distanz.ch>
Signed-off-by: NStafford Horne <shorne@gmail.com>

56ce2f25

m68k: Add braces to __pmd(x) initializer to kill compiler warning · 4c2b5e0f

由 Geert Uytterhoeven 提交于 9月 10, 2017

With gcc 4.1.2:

    include/linux/swapops.h: In function ‘swp_entry_to_pmd’:
    include/linux/swapops.h:294: warning: missing braces around initializer
    include/linux/swapops.h:294: warning: (near initialization for ‘(anonymous).pmd’)

Due to a GCC zero initializer bug (#53119), the standard "(pmd_t){ 0 }"
initializer is not accepted by all GCC versions.
In addition, on m68k pmd_t is an array instead of a single value, so we
need "(pmd_t){ { 0 }, }" instead of "(pmd_t){ 0 }".

Based on commit 9157259d ("mm: add pmd_t initializer __pmd() to
work around a GCC bug.") for sparc32.

Fixes: 616b8371 ("mm: thp: enable thp migration in generic path")
Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4c2b5e0f

x86/mm/64: Fix an incorrect warning with CONFIG_DEBUG_VM=y, !PCID · 7898f796

由 Andy Lutomirski 提交于 9月 10, 2017

I've been staring at the word PCID too long.

Fixes: f13c8e8c58ba ("x86/mm: Reinitialize TLB state on hotplug and resume")
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NAndy Lutomirski <luto@kernel.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7898f796

10 9月, 2017 2 次提交

sparc64: Handle additional cases of no fault loads · b6fe1089

由 Rob Gardner 提交于 9月 08, 2017

Load instructions using ASI_PNF or other no-fault ASIs should not
cause a SIGSEGV or SIGBUS.

A garden variety unmapped address follows the TSB miss path, and when
no valid mapping is found in the process page tables, the miss handler
checks to see if the access was via a no-fault ASI.  It then fixes up
the target register with a zero, and skips the no-fault load
instruction.

But different paths are taken for data access exceptions and alignment
traps, and these do not respect the no-fault ASI. We add checks in
these paths for the no-fault ASI, and fix up the target register and
TPC just like in the TSB miss case.
Signed-off-by: NRob Gardner <rob.gardner@oracle.com>
Acked-by: NSam Ravnborg <sam@ravnborg.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b6fe1089

sparc64: speed up etrap/rtrap on NG2 and later processors · a7159a87

由 Anthony Yznaga 提交于 8月 18, 2017

For many sun4v processor types, reading or writing a privileged register
has a latency of 40 to 70 cycles.  Use a combination of the low-latency
allclean, otherw, normalw, and nop instructions in etrap and rtrap to
replace 2 rdpr and 5 wrpr instructions and improve etrap/rtrap
performance.  allclean, otherw, and normalw are available on NG2 and
later processors.

The average ticks to execute the flush windows trap ("ta 0x3") with and
without this patch on select platforms:

 CPU            Not patched     Patched    % Latency Reduction

 NG2            1762            1558            -11.58
 NG4            3619            3204            -11.47
 M7             3015            2624            -12.97
 SPARC64-X      829             770              -7.12
Signed-off-by: NAnthony Yznaga <anthony.yznaga@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a7159a87

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功