1. 21 Sep 2018 (1 commit)
  2. 20 Sep 2018 (16 commits)
    • KVM: x86: Control guest reads of MSR_PLATFORM_INFO · 6fbbde9a
      Authored by Drew Schmitt
      Add KVM_CAP_MSR_PLATFORM_INFO so that userspace can disable guest reads
      of MSR_PLATFORM_INFO.
      
      Disabling guest reads of this MSR gives userspace explicit control over
      whether this platform-dependent information is exposed to guests. As it
      stands today, guests that read this MSR get unpopulated information if
      userspace hasn't already set it (and prior to this patch series, only the
      CPUID faulting information could have been populated). This existing
      interface can be confusing if guests don't handle the potential for
      incorrect/incomplete information gracefully (e.g. a base frequency
      reported as zero).
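      As a hedged illustration (not from the original message): userspace could
      disable guest reads with KVM_ENABLE_CAP on the VM fd; the fd name below
      is assumed.

          struct kvm_enable_cap cap = {
              .cap  = KVM_CAP_MSR_PLATFORM_INFO,
              .args = { 0 },  /* 0 = guest reads of MSR_PLATFORM_INFO raise #GP */
          };
          ioctl(vm_fd, KVM_ENABLE_CAP, &cap);  /* vm_fd: assumed VM file descriptor */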
      Signed-off-by: Drew Schmitt <dasch@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Turbo bits in MSR_PLATFORM_INFO · d84f1cff
      Authored by Drew Schmitt
      Allow userspace to set turbo bits in MSR_PLATFORM_INFO. Previously, only
      the CPUID faulting bit was settable; now any bit in MSR_PLATFORM_INFO is
      settable. This can be used, for example, to convey frequency information
      about the platform on which the guest is running.
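      A minimal sketch of setting the MSR from userspace (vcpu_fd and plat_info
      are assumed names; MSR_PLATFORM_INFO is index 0xce):

          struct {
              struct kvm_msrs hdr;
              struct kvm_msr_entry entry;
          } msrs = {
              .hdr   = { .nmsrs = 1 },
              .entry = { .index = 0xce /* MSR_PLATFORM_INFO */, .data = plat_info },
          };
          ioctl(vcpu_fd, KVM_SET_MSRS, &msrs.hdr);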
      Signed-off-by: Drew Schmitt <dasch@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • nVMX x86: Check VPID value on vmentry of L2 guests · ba8e23db
      Authored by Krish Sadhukhan
      According to section "Checks on VMX Controls" in Intel SDM vol 3C, the
      following check needs to be enforced on vmentry of L2 guests:
      
          If the 'enable VPID' VM-execution control is 1, the value of the
          VPID VM-execution control field must not be 0000H.
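      A hedged sketch of how such a check can be expressed in the nested
      vmentry prereq path (helper names as used elsewhere in the VMX code):

          if (nested_cpu_has_vpid(vmcs12) && vmcs12->virtual_processor_id == 0)
              return -EINVAL;  /* fails the VMX-controls check */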
      Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • nVMX x86: check posted-interrupt descriptor address on vmentry of L2 · 6de84e58
      Authored by Krish Sadhukhan
      According to section "Checks on VMX Controls" in Intel SDM vol 3C,
      the following check needs to be enforced on vmentry of L2 guests:
      
         - Bits 5:0 of the posted-interrupt descriptor address are all 0.
         - The posted-interrupt descriptor address does not set any bits
           beyond the processor's physical-address width.
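      A hedged sketch of the corresponding check (cpuid_maxphyaddr() is KVM's
      helper for the guest's physical-address width):

          if (nested_cpu_has_posted_intr(vmcs12) &&
              ((vmcs12->posted_intr_desc_addr & 0x3f) ||  /* bits 5:0 must be 0 */
               (vmcs12->posted_intr_desc_addr >> cpuid_maxphyaddr(vcpu))))
              return -EINVAL;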
      Signed-off-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Reviewed-by: Mark Kanda <mark.kanda@oracle.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
      Reviewed-by: Karl Heubaum <karl.heubaum@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Wake blocked vCPU in guest-mode if pending interrupt in virtual APICv · e6c67d8c
      Authored by Liran Alon
      If L1 does not intercept L2 HLT or enters L2 in the HLT activity-state,
      it is possible for a vCPU to be blocked while it is in guest-mode.
      
      According to Intel SDM 26.6.5 Interrupt-Window Exiting and
      Virtual-Interrupt Delivery: "These events wake the logical processor
      if it just entered the HLT state because of a VM entry".
      Therefore, if L1 enters L2 in the HLT activity-state and L2 has a pending
      deliverable interrupt in vmcs12->guest_intr_status.RVI, then the vCPU
      should be woken from the HLT state and injected with the interrupt.
      
      In addition, if the vCPU receives a nested posted-interrupt while it is
      blocked (while it is in guest-mode), it should also be woken and injected
      with the posted interrupt.
      
      To handle these cases, this patch enhances kvm_vcpu_has_events() to also
      check if there is a pending interrupt in L2 virtual APICv provided by
      L1. That is, it evaluates if there is a pending virtual interrupt for L2
      by checking RVI[7:4] > VPPR[7:4] as specified in Intel SDM 29.2.1
      Evaluation of Pending Interrupts.
      
      Note that this also handles the case of nested posted-interrupt by the
      fact RVI is updated in vmx_complete_nested_posted_interrupt() which is
      called from kvm_vcpu_check_block() -> kvm_arch_vcpu_runnable() ->
      kvm_vcpu_running() -> vmx_check_nested_events() ->
      vmx_complete_nested_posted_interrupt().
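      A hedged sketch of the RVI/VPPR evaluation (vapic_page is the mapped
      virtual-APIC page; names follow the VMX code):

          u8 rvi   = vmcs_read16(GUEST_INTR_STATUS) & 0xff;  /* Requesting Virtual Interrupt */
          u32 vppr = *(u32 *)(vapic_page + APIC_PROCPRI);    /* virtual PPR */
          return (rvi & 0xf0) > (vppr & 0xf0);  /* SDM 29.2.1 priority compare */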
      Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com>
      Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: check nested state and CR4.VMXE against SMM · 5bea5123
      Authored by Paolo Bonzini
      VMX cannot be enabled in SMM; check this both when CR4 is set and when
      nested virtualization state is restored.
      
      This should fix some WARNs reported by syzkaller, mostly around
      alloc_shadow_vmcs.
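      A hedged sketch of the CR4 side of the check:

          if ((cr4 & X86_CR4_VMXE) && is_smm(vcpu))
              return 1;  /* reject CR4.VMXE while in SMM */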
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86: make kvm_{load|put}_guest_fpu() static · 822f312d
      Authored by Sebastian Andrzej Siewior
      The functions
      	kvm_load_guest_fpu()
      	kvm_put_guest_fpu()
      
      are only used locally, so make them static. This also requires moving
      both functions, because they are used before their definition.
      These functions were exported (via EXPORT_SYMBOL) before commit
      e5bb4025 ("KVM: Drop kvm_{load,put}_guest_fpu() exports").
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/hyper-v: rename ipi_arg_{ex,non_ex} structures · a1efa9b7
      Authored by Vitaly Kuznetsov
      These structures are going to be used from KVM code, so let's make
      their names reflect their Hyper-V origin.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
      Acked-by: K. Y. Srinivasan <kys@microsoft.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: use preemption timer to force immediate VMExit · d264ee0c
      Authored by Sean Christopherson
      A VMX preemption timer value of '0' is guaranteed to cause a VMExit
      prior to the CPU executing any instructions in the guest.  Use the
      preemption timer (if it's supported) to trigger immediate VMExit
      in place of the current method of sending a self-IPI.  This ensures
      that pending VMExit injection to L1 occurs prior to executing any
      instructions in the guest (regardless of nesting level).
      
      When deferring VMExit injection, KVM generates an immediate VMExit
      from the (possibly nested) guest by sending itself an IPI.  Because
      hardware interrupts are blocked prior to VMEnter and are unblocked
      (in hardware) after VMEnter, this results in taking a VMExit(INTR)
      before any guest instruction is executed.  But, as this approach
      relies on the IPI being received before VMEnter executes, it only
      works as intended when KVM is running as L0.  Because there are no
      architectural guarantees regarding when IPIs are delivered, when running
      nested the INTR may "arrive" long after L2 is running, e.g. if L0 KVM
      doesn't force an immediate switch to L1 to deliver the INTR.
      
      For the most part, this unintended delay is not an issue since the
      events being injected to L1 also do not have architectural guarantees
      regarding their timing.  The notable exception is the VMX preemption
      timer[1], which is architecturally guaranteed to cause a VMExit prior
      to executing any instructions in the guest if the timer value is '0'
      at VMEnter.  Specifically, the delay in injecting the VMExit causes
      the preemption timer KVM unit test to fail when run in a nested guest.
      
      Note: this approach is viable even on CPUs with a broken preemption
      timer, as broken in this context only means the timer counts at the
      wrong rate.  There are no known errata affecting timer value of '0'.
      
      [1] I/O SMIs also have guarantees on when they arrive, but I have
          no idea if/how those are emulated in KVM.
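      A hedged, simplified sketch of the resulting hook and its use just
      before VMEnter:

          static void vmx_request_immediate_exit(struct kvm_vcpu *vcpu)
          {
              to_vmx(vcpu)->req_immediate_exit = true;
          }

          /* ...in the VMEnter path: a timer value of 0 guarantees a VMExit
           * before any guest instruction executes. */
          if (vmx->req_immediate_exit)
              vmx_arm_hv_timer(vmx, 0);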
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      [Use a hook for SVM instead of leaving the default in x86.c - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: modify preemption timer bit only when arming timer · f459a707
      Authored by Sean Christopherson
      Provide a singular location where the VMX preemption timer bit is
      set/cleared so that future usages of the preemption timer can ensure
      the VMCS bit is up-to-date without having to modify unrelated code
      paths.  For example, the preemption timer can be used to force an
      immediate VMExit.  Cache the status of the timer to avoid redundant
      VMREAD and VMWRITE, e.g. if the timer stays armed across multiple
      VMEnters/VMExits.
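      A hedged sketch of the singular helper with the cached armed state:

          static void vmx_arm_hv_timer(struct vcpu_vmx *vmx, u32 val)
          {
              vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, val);
              if (!vmx->loaded_vmcs->hv_timer_armed)  /* skip redundant VMWRITE */
                  vmcs_set_bits(PIN_BASED_VM_EXEC_CONTROL,
                                PIN_BASED_VMX_PREEMPTION_TIMER);
              vmx->loaded_vmcs->hv_timer_armed = true;
          }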
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: immediately mark preemption timer expired only for zero value · 4c008127
      Authored by Sean Christopherson
      A VMX preemption timer value of '0' at the time of VMEnter is
      architecturally guaranteed to cause a VMExit prior to the CPU
      executing any instructions in the guest.  This architectural
      definition is in place to ensure that a previously expired timer
      is correctly recognized by the CPU as it is possible for the timer
      to reach zero and not trigger a VMexit due to a higher priority
      VMExit being signalled instead, e.g. a pending #DB that morphs into
      a VMExit.
      
      Whether by design or coincidence, commit f4124500 ("KVM: nVMX:
      Fully emulate preemption timer") special-cased timer values of '0'
      and '1' to ensure prompt delivery of the VMExit.  Unlike '0', a
      timer value of '1' has no architectural guarantees regarding when
      it is delivered.
      
      Modify the timer emulation to trigger immediate VMExit if and only
      if the timer value is '0', and document precisely why '0' is special.
      Do this even if calibration of the virtual TSC failed, i.e. VMExit
      will occur immediately regardless of the frequency of the timer.
      Making only '0' a special case gives KVM leeway to be more aggressive
      in ensuring the VMExit is injected prior to executing instructions in
      the nested guest, and also eliminates any ambiguity as to why '1' is
      a special case, e.g. why wasn't the threshold for a "short timeout"
      set to 10, 100, 1000, etc...
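      A hedged sketch of the special case in the timer-emulation path:

          if (preemption_timeout == 0) {
              vmx->nested.preemption_timer_expired = true;
              kvm_make_request(KVM_REQ_EVENT, &vmx->vcpu);  /* exit before any L2 insn */
              return;
          }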
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Switch to bitmap_zalloc() · a101c9d6
      Authored by Andy Shevchenko
      Switch to bitmap_zalloc() to show clearly what we are allocating.
      Besides that, it returns a pointer of bitmap type (unsigned long *)
      instead of an opaque void *.
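      A rough before/after sketch (using the SVM SEV ASID bitmap as the
      example):

          /* before */
          sev_asid_bitmap = kcalloc(BITS_TO_LONGS(max_sev_asid), sizeof(long),
                                    GFP_KERNEL);
          /* after */
          sev_asid_bitmap = bitmap_zalloc(max_sev_asid, GFP_KERNEL);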
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM/MMU: Fix comment in walk_shadow_page_lockless_end() · 9a984586
      Authored by Tianyu Lan
      kvm_commit_zap_page() has been renamed to kvm_mmu_commit_zap_page().
      Update the comment to match.
      Signed-off-by: Lan Tianyu <Tianyu.Lan@microsoft.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: don't reset root in kvm_mmu_setup() · 83b20b28
      Authored by Wei Yang
      Here is the code path showing that kvm_mmu_setup() is invoked after
      kvm_mmu_create(). Since kvm_mmu_setup() is only invoked on this path,
      root_hpa and prev_roots are guaranteed to be invalid, and it is not
      necessary to reset them again.
      
          kvm_vm_ioctl_create_vcpu()
              kvm_arch_vcpu_create()
                  vmx_create_vcpu()
                      kvm_vcpu_init()
                          kvm_arch_vcpu_init()
                              kvm_mmu_create()
              kvm_arch_vcpu_setup()
                  kvm_mmu_setup()
                      kvm_init_mmu()
      
      This patch sets reset_roots to false in kvm_mmu_setup().
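      A hedged sketch of the resulting call:

          void kvm_mmu_setup(struct kvm_vcpu *vcpu)
          {
              /* roots are guaranteed invalid here, so skip resetting them */
              kvm_init_mmu(vcpu, false);
          }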
      
      Fixes: 50c28f21
      Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: mmu: Don't read PDPTEs when paging is not enabled · d35b34a9
      Authored by Junaid Shahid
      KVM should not attempt to read guest PDPTEs when CR0.PG = 0 and
      CR4.PAE = 1.
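      A hedged sketch of the guard (PDPTEs only exist under PAE paging, i.e.
      CR0.PG = 1, CR4.PAE = 1 and not long mode):

          if (is_pae(vcpu) && is_paging(vcpu) && !is_long_mode(vcpu))
              load_pdptrs(vcpu, vcpu->arch.walk_mmu, kvm_read_cr3(vcpu));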
      Signed-off-by: Junaid Shahid <junaids@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/kvm/lapic: always disable MMIO interface in x2APIC mode · d1766202
      Authored by Vitaly Kuznetsov
      When VMX is used with flexpriority disabled (because of missing support
      or because it was disabled with a module parameter), the MMIO interface
      to the lAPIC is still available in x2APIC mode while it shouldn't be
      (kvm-unit-tests):
      
      PASS: apic_disable: Local apic enabled in x2APIC mode
      PASS: apic_disable: CPUID.1H:EDX.APIC[bit 9] is set
      FAIL: apic_disable: *0xfee00030: 50014
      
      The issue appears because we basically do nothing while switching to
      x2APIC mode when the APIC access page is not used: apic_mmio_{read,write}
      only check whether the lAPIC is disabled before proceeding to the actual
      access.
      
      When APIC access is virtualized we correctly manipulate the VMX controls
      in vmx_set_virtual_apic_mode() and we don't get vmexits from memory
      writes in x2APIC mode, so there's no issue.
      
      Disabling the MMIO interface seems to be easy. The question is: what do
      we do with these reads and writes? If we add an apic_x2apic_mode() check
      to apic_mmio_in_range() and return -EOPNOTSUPP, these reads and writes
      will go to userspace. When the lAPIC is in kernel, Qemu uses this
      interface to inject MSIs only (see kvm_apic_mem_write() in
      hw/i386/kvm/apic.c). This somehow works with a disabled lAPIC, but when
      we're in xAPIC mode we would get a real injected MSI from every write to
      the lAPIC. Not good.
      
      The simplest solution seems to be to just ignore writes to the region
      and return ~0 for all reads when we're in x2APIC mode. This is what this
      patch does. However, this approach is inconsistent with what currently
      happens when flexpriority is enabled: we allocate the APIC access page
      and create a KVM memory region, so in x2APIC mode all reads and writes
      go to this pre-allocated page which is, btw, the same for all vCPUs.
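      A hedged, simplified sketch of the chosen behavior in the MMIO handlers
      (the write flag is illustrative):

          if (apic_x2apic_mode(apic)) {
              if (!write)
                  memset(data, 0xff, len);  /* reads return all-ones */
              return 0;                     /* writes are silently ignored */
          }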
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 15 Sep 2018 (1 commit)
  4. 14 Sep 2018 (1 commit)
    • Revert "x86/mm/legacy: Populate the user page-table with user pgd's" · 61a6bd83
      Authored by Joerg Roedel
      This reverts commit 1f40a46c.
      
      It turned out that this patch is not sufficient to enable PTI on 32-bit
      systems with legacy 2-level page-tables. In this paging mode the
      huge-page PTEs are in the top-level page-table directory, which is also
      where the mirroring to the user-space page-table happens. So every huge
      PTE exists twice, in the kernel and in the user page-table.
      
      That means that accessed/dirty bits need to be fetched from two PTEs in
      this mode to be safe, but this is not trivial to implement because it
      needs changes to generic code just for the sake of enabling PTI with
      32-bit legacy paging. As all systems that need PTI should support PAE
      anyway, remove support for PTI when 32-bit legacy paging is used.
      
      Fixes: 7757d607 ('x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32')
      Reported-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: hpa@zytor.com
      Cc: linux-mm@kvack.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Link: https://lkml.kernel.org/r/1536922754-31379-1-git-send-email-joro@8bytes.org
  5. 13 Sep 2018 (2 commits)
  6. 12 Sep 2018 (8 commits)
  7. 11 Sep 2018 (4 commits)
    • arm64: kernel: arch_crash_save_vmcoreinfo() should depend on CONFIG_CRASH_CORE · 84c57dbd
      Authored by James Morse
      Since commit 23c85094 ("proc/kcore: add vmcoreinfo note to /proc/kcore")
      the kernel has exported the vmcoreinfo PT_NOTE on /proc/kcore as well
      as /proc/vmcore.
      
      arm64 only exposes its additional arch information via
      arch_crash_save_vmcoreinfo() if built with CONFIG_KEXEC, as kdump was
      previously the only user of vmcoreinfo.
      
      Move this weak function to a separate file that is built at the same
      time as its caller in kernel/crash_core.c. This ensures values like
      'kimage_voffset' are always present in the vmcoreinfo PT_NOTE.
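      A hedged sketch of the moved hook, now in its own file that is built
      whenever CRASH_CORE is enabled (rather than only with KEXEC):

          /* arch/arm64/kernel/crash_core.c */
          void arch_crash_save_vmcoreinfo(void)
          {
              VMCOREINFO_NUMBER(VA_BITS);
              vmcoreinfo_append_str("NUMBER(kimage_voffset)=0x%llx\n",
                                    kimage_voffset);
              vmcoreinfo_append_str("NUMBER(PHYS_OFFSET)=0x%llx\n",
                                    PHYS_OFFSET);
          }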
      
      CC: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Reviewed-by: Bhupesh Sharma <bhsharma@redhat.com>
      Signed-off-by: James Morse <james.morse@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • arm64: jump_label.h: use asm_volatile_goto macro instead of "asm goto" · 13aceef0
      Authored by Miguel Ojeda
      All other uses of "asm goto" go through asm_volatile_goto, which avoids
      a miscompile when using GCC < 4.8.2. Replace our open-coded "asm goto"
      statements with the asm_volatile_goto macro to avoid issues with older
      toolchains.
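      For reference, the macro is roughly the following; the trailing empty
      asm statement is the workaround:

          /* include/linux/compiler-gcc.h: avoid GCC < 4.8.2 "asm goto" miscompiles */
          #define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0)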
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • hexagon: modify ffs() and fls() to return int · 5c41aaad
      Authored by Randy Dunlap
      Building drivers/mtd/nand/raw/nandsim.c on arch/hexagon/ produces a
      printk format build warning.  This is due to hexagon's ffs() being
      coded as returning long instead of int.
      
      Fix the printk format warning by changing all of hexagon's ffs() and
      fls() functions to return int instead of long.  The variables that
      they return are already int instead of long.  This return type
      matches the return type in <asm-generic/bitops/>.
      
      ../drivers/mtd/nand/raw/nandsim.c: In function 'init_nandsim':
      ../drivers/mtd/nand/raw/nandsim.c:760:2: warning: format '%u' expects argument of type 'unsigned int', but argument 2 has type 'long int' [-Wformat]
      
      There are no ffs() or fls() allmodconfig build errors after making this
      change.
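      A sketch of the signature change (function bodies unchanged):

          /* before */
          static inline long ffs(int x)
          /* after: matches <asm-generic/bitops/> */
          static inline int ffs(int x)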
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: linux-hexagon@vger.kernel.org
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Patch-mainline: linux-kernel @ 07/22/2018, 16:03
      Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
    • arch/hexagon: fix kernel/dma.c build warning · 200f351e
      Authored by Randy Dunlap
      Fix build warning in arch/hexagon/kernel/dma.c by casting a void *
      to unsigned long to match the function parameter type.
      
      ../arch/hexagon/kernel/dma.c: In function 'arch_dma_alloc':
      ../arch/hexagon/kernel/dma.c:51:5: warning: passing argument 2 of 'gen_pool_add' makes integer from pointer without a cast [enabled by default]
      ../include/linux/genalloc.h:112:19: note: expected 'long unsigned int' but argument is of type 'void *'
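      A hedged, illustrative sketch of the fix (variable names assumed; the
      point is the cast):

          /* gen_pool_add() takes an unsigned long address, not a void * */
          gen_pool_add(coherent_pool, (unsigned long)start_addr, size, -1);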
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: linux-sh@vger.kernel.org
      Patch-mainline: linux-kernel @ 07/20/2018, 20:17
      [rkuo@codeaurora.org: fixed architecture name]
      Signed-off-by: Richard Kuo <rkuo@codeaurora.org>
  8. 10 Sep 2018 (1 commit)
    • perf/x86/intel: Add support/quirk for the MISPREDICT bit on Knights Landing CPUs · 16160c19
      Authored by Jacek Tomaka
      Problem: perf did not show the branch predicted/mispredicted bit in brstack.
      
      Output of perf -F brstack for a collected profile
      
      Before:
      
       0x4fdbcd/0x4fdc03/-/-/-/0
       0x45f4c1/0x4fdba0/-/-/-/0
       0x45f544/0x45f4bb/-/-/-/0
       0x45f555/0x45f53c/-/-/-/0
       0x7f66901cc24b/0x45f555/-/-/-/0
       0x7f66901cc22e/0x7f66901cc23d/-/-/-/0
       0x7f66901cc1ff/0x7f66901cc20f/-/-/-/0
       0x7f66901cc1e8/0x7f66901cc1fc/-/-/-/0
      
      After:
      
       0x4fdbcd/0x4fdc03/P/-/-/0
       0x45f4c1/0x4fdba0/P/-/-/0
       0x45f544/0x45f4bb/P/-/-/0
       0x45f555/0x45f53c/P/-/-/0
       0x7f66901cc24b/0x45f555/P/-/-/0
       0x7f66901cc22e/0x7f66901cc23d/P/-/-/0
       0x7f66901cc1ff/0x7f66901cc20f/P/-/-/0
       0x7f66901cc1e8/0x7f66901cc1fc/P/-/-/0
      
      Cause:
      
      As mentioned in the Software Developer's Manual vol 3, 17.4.8.1,
      IA32_PERF_CAPABILITIES[5:0] indicates the format of the address that is
      stored in the LBR stack. Knights Landing reports 1 (LBR_FORMAT_LIP) as
      its format. Despite that, the registers containing the FROM address of
      the branch do have a MISPREDICT bit, but because of the format indicated
      in IA32_PERF_CAPABILITIES[5:0], the LBR code did not read it.
      
      Solution:
      
      Teach LBR about above Knights Landing quirk and make it read MISPREDICT bit.
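      A hedged, illustrative sketch (the quirk flag name is hypothetical;
      LBR_FROM_FLAG_MISPRED is the existing bit-63 mask in the LBR code):

          if (x86_pmu.lbr_mispred_quirk)  /* hypothetical KNL quirk flag */
              mis = !!(from & LBR_FROM_FLAG_MISPRED);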
      Signed-off-by: Jacek Tomaka <jacek.tomaka@poczta.fm>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20180802013830.10600-1-jacekt@dugeo.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  9. 08 Sep 2018 (4 commits)
    • x86/mm: Use WRITE_ONCE() when setting PTEs · 9bc4f28a
      Authored by Nadav Amit
      When page-table entries are set, the compiler might optimize their
      assignment by using multiple instructions to set the PTE. This might
      turn into a security hazard if the user somehow manages to use the
      interim PTE. L1TF does not make our lives easier, making even an interim
      non-present PTE a security hazard.
      
      Using WRITE_ONCE() to set PTEs and friends should prevent this potential
      security hazard.
      
      I skimmed the differences in the binary with and without this patch. The
      differences are (obviously) greater when CONFIG_PARAVIRT=n, as more code
      optimizations are possible. For better or worse, the impact on the
      binary with this patch is pretty small. Skimming the code did not cause
      anything to jump out as a security hazard, but it seems that at least
      move_soft_dirty_pte() caused set_pte_at() to use multiple writes.
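      A hedged sketch of the pattern, applied to one of the PTE setters:

          static inline void native_set_pte(pte_t *ptep, pte_t pte)
          {
              WRITE_ONCE(*ptep, pte);  /* was: *ptep = pte; forces a single store */
          }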
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Sean Christopherson <sean.j.christopherson@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20180902181451.80520-1-namit@vmware.com
    • x86/apic/vector: Make error return value negative · 47b7360c
      Authored by Thomas Gleixner
      activate_managed() returns EINVAL instead of -EINVAL in case of
      error. While this is unlikely to happen, the positive return value would
      cause further malfunction at the call site.
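      The fix is the small change to the kernel's negative-errno convention:

          /* before */
          return EINVAL;
          /* after */
          return -EINVAL;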
      
      Fixes: 2db1f959 ("x86/vector: Handle managed interrupts proper")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
    • KVM: LAPIC: Fix pv ipis out-of-bounds access · bdf7ffc8
      Authored by Wanpeng Li
      Dan Carpenter reported that the untrusted data returned from
      kvm_register_read() results in the following static checker warning:
        arch/x86/kvm/lapic.c:576 kvm_pv_send_ipi()
        error: buffer underflow 'map->phys_map' 's32min-s32max'
      
      A KVM guest can easily trigger this by executing the following assembly
      sequence in Ring0:
      
      mov $10, %rax
      mov $0xFFFFFFFF, %rbx
      mov $0xFFFFFFFF, %rdx
      mov $0, %rsi
      vmcall
      
      As this will cause KVM to execute the following code-path:
      vmx_handle_exit() -> handle_vmcall() -> kvm_emulate_hypercall() -> kvm_pv_send_ipi()
      which will reach out-of-bounds access.
      
      This patch fixes it by adding a check against map->max_apic_id in
      kvm_pv_send_ipi(), ignoring destinations that are not present and
      delivering to the rest. We also check whether map->phys_map[min + i] is
      NULL, since max_apic_id is set to the highest APIC ID and some phys_map
      entries may be NULL when APIC IDs are sparse; in particular, kvm
      unconditionally sets max_apic_id to 255 to reserve enough space for any
      xAPIC ID.
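      A hedged, simplified sketch of the bounds handling in kvm_pv_send_ipi():

          if (min > map->max_apic_id)
              goto out;

          for_each_set_bit(i, &ipi_bitmap_low, BITS_PER_LONG) {
              if (min + i > map->max_apic_id)
                  break;                       /* ignore absent destinations */
              if (map->phys_map[min + i])      /* sparse APIC IDs leave NULL slots */
                  count += kvm_apic_set_irq(map->phys_map[min + i]->vcpu,
                                            &irq, NULL);
          }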
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Liran Alon <liran.alon@oracle.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      [Add second "if (min > map->max_apic_id)" to complete the fix. -Radim]
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • KVM: nVMX: Fix loss of pending IRQ/NMI before entering L2 · b5861e5c
      Authored by Liran Alon
      Consider the case where L1 had a pending IRQ/NMI event when it executed
      VMLAUNCH/VMRESUME, and the event wasn't delivered because it was
      disallowed (e.g. interrupts disabled). When L1 executes
      VMLAUNCH/VMRESUME, L0 needs to evaluate whether this pending event
      should cause an exit from L2 to L1 or be delivered directly to L2
      (e.g. in case L1 doesn't intercept EXTERNAL_INTERRUPT).
      
      Usually this would be handled by L0 requesting an IRQ/NMI window
      by setting the VMCS accordingly. However, this setting was done on
      VMCS01 and now VMCS02 is active instead. Thus, when L1 executes
      VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by
      requesting a KVM_REQ_EVENT.
      
      Note that the above scenario arises when L1 KVM is about to enter L2 but
      requests an "immediate-exit", since in this case L1 will disable
      interrupts and then send a self-IPI before entering L2.
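      A hedged, simplified sketch of the fix in the nested VMLAUNCH/VMRESUME
      path:

          /* Re-evaluate pending IRQ/NMI windows now that VMCS02 is active,
           * since the window-request bits were programmed into VMCS01. */
          if (unlikely(evaluate_pending_interrupts))
              kvm_make_request(KVM_REQ_EVENT, vcpu);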
      Reviewed-by: Nikita Leshchenko <nikita.leshchenko@oracle.com>
      Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
  10. 07 Sep 2018 (2 commits)