- 22 August 2020, 2 commits
-
-
Submitted by Will Deacon
When an MMU notifier call results in unmapping a range that spans multiple PGDs, we end up calling into cond_resched_lock() when crossing a PGD boundary, since this avoids running into RCU stalls during VM teardown. Unfortunately, if the VM is destroyed as a result of OOM, then blocking is not permitted and the call to the scheduler triggers the following BUG():

| BUG: sleeping function called from invalid context at arch/arm64/kvm/mmu.c:394
| in_atomic(): 1, irqs_disabled(): 0, non_block: 1, pid: 36, name: oom_reaper
| INFO: lockdep is turned off.
| CPU: 3 PID: 36 Comm: oom_reaper Not tainted 5.8.0 #1
| Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015
| Call trace:
|  dump_backtrace+0x0/0x284
|  show_stack+0x1c/0x28
|  dump_stack+0xf0/0x1a4
|  ___might_sleep+0x2bc/0x2cc
|  unmap_stage2_range+0x160/0x1ac
|  kvm_unmap_hva_range+0x1a0/0x1c8
|  kvm_mmu_notifier_invalidate_range_start+0x8c/0xf8
|  __mmu_notifier_invalidate_range_start+0x218/0x31c
|  mmu_notifier_invalidate_range_start_nonblock+0x78/0xb0
|  __oom_reap_task_mm+0x128/0x268
|  oom_reap_task+0xac/0x298
|  oom_reaper+0x178/0x17c
|  kthread+0x1e4/0x1fc
|  ret_from_fork+0x10/0x30

Use the new 'flags' argument to kvm_unmap_hva_range() to ensure that we only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is set in the notifier flags.

Cc: <stable@vger.kernel.org>
Fixes: 8b3405e3 ("kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd")
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Message-Id: <20200811102725.7121-3-will@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
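A minimal sketch of the resulting shape of the unmap loop, assuming helper names from the arm64 stage-2 code of that era (illustrative, not the exact patch):

    static void __unmap_stage2_range(struct kvm *kvm, phys_addr_t start,
                                     u64 size, bool may_block)
    {
            phys_addr_t addr = start, end = start + size;
            phys_addr_t next;

            assert_spin_locked(&kvm->mmu_lock);
            do {
                    next = stage2_pgd_addr_end(kvm, addr, end);
                    /* ... tear down the stage-2 entries covering [addr, next) ... */

                    /*
                     * Only drop mmu_lock to reschedule when the MMU notifier
                     * caller allows blocking; the OOM reaper does not.
                     */
                    if (may_block && next != end)
                            cond_resched_lock(&kvm->mmu_lock);
            } while ((addr = next) != end);
    }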
-
Submitted by Will Deacon
The 'flags' field of 'struct mmu_notifier_range' is used to indicate whether invalidate_range_{start,end}() are permitted to block. In the case of kvm_mmu_notifier_invalidate_range_start(), this field is not forwarded on to the architecture-specific implementation of kvm_unmap_hva_range() and therefore the backend cannot sensibly decide whether or not to block. Add an extra 'flags' parameter to kvm_unmap_hva_range() so that architectures are aware as to whether or not they are permitted to block.

Cc: <stable@vger.kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Message-Id: <20200811102725.7121-2-will@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
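In outline, the interface change looks like this (a sketch rather than the full diff), with the notifier's range->flags now threaded through to the arch hook:

    /* Arch hook now receives the MMU notifier flags. */
    int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start,
                            unsigned long end, unsigned int flags);

    static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
                            const struct mmu_notifier_range *range)
    {
            struct kvm *kvm = mmu_notifier_to_kvm(mn);

            /* ... take kvm->mmu_lock, bump the notifier counters ... */
            kvm_unmap_hva_range(kvm, range->start, range->end, range->flags);
            /* ... */
            return 0;
    }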
-
- 30 July 2020, 4 commits
-
-
Submitted by Will Deacon
To allow for re-injection of stage-2 faults on stage-1 page-table walks due to either a missing or read-only memslot, move the triage logic out of io_mem_abort() and into kvm_handle_guest_abort(), where these aborts can be handled before anything else.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200729102821.23392-5-will@kernel.org
-
Submitted by Will Deacon
If a guest performs cache maintenance on a read-only memslot, we should inform userspace rather than skip the instruction altogether.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200729102821.23392-4-will@kernel.org
-
Submitted by Will Deacon
If the guest generates a synchronous external abort which is not handled by the host, we inject it back into the guest as a virtual SError, but only if the original fault was reported on the data side. Instruction faults are reported as "Unsupported FSC", causing the vCPU run loop to bail with -EFAULT. Although synchronous external aborts from a guest are pretty unusual, treat them the same regardless of whether they are taken as data or instruction aborts by EL2.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200729102821.23392-3-will@kernel.org
-
Submitted by Will Deacon
kvm_vcpu_dabt_isextabt() is not specific to data aborts and, unlike kvm_vcpu_dabt_issext(), has nothing to do with sign extension. Rename it to 'kvm_vcpu_abt_issea()'.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Quentin Perret <qperret@google.com>
Link: https://lore.kernel.org/r/20200729102821.23392-2-will@kernel.org
-
- 28 July 2020, 1 commit
-
-
Submitted by Will Deacon
If a stage-2 page-table contains an executable, read-only mapping at the pte level (e.g. due to dirty logging being enabled), a subsequent write fault to the same page which tries to install a larger block mapping (e.g. due to dirty logging having been disabled) will erroneously inherit the exec permission and consequently skip I-cache invalidation for the rest of the block. Ensure that exec permission is only inherited by write faults when the new mapping is of the same size as the existing one. A subsequent instruction abort will result in I-cache invalidation for the entire block mapping.

Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Quentin Perret <qperret@google.com>
Reviewed-by: Quentin Perret <qperret@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20200723101714.15873-1-will@kernel.org
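Conceptually, the guard in the PMD fault path ends up looking something like this (a hedged sketch; the real change teaches the stage2_is_exec() helper to take the new mapping size so it can reject a level mismatch):

    /*
     * Inherit exec on a permission fault only if the existing stage-2
     * mapping is the same size as the one being installed; otherwise a
     * later instruction abort will invalidate the I-cache for the block.
     */
    if (fault_status == FSC_PERM &&
        stage2_is_exec(kvm, fault_ipa, vma_pagesize))
            new_pmd = kvm_s2pmd_mkexec(new_pmd);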
-
- 10 July 2020, 4 commits
-
-
Submitted by Tianjia Zhang
In the current KVM code, 'kvm_run' is already embedded in the 'kvm_vcpu' structure. For historical reasons, many KVM-related functions still take both 'kvm_vcpu' and 'kvm_run' parameters. This patch does a unified cleanup of these remaining redundant parameters.

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200623131418.31473-3-tianjia.zhang@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson
Move to the common MMU memory cache implementation now that the common code and arm64's existing code are semantically compatible. No functional change intended.

Cc: Marc Zyngier <maz@kernel.org>
Suggested-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703023545.8771-19-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson
Add a "gfp_zero" member to arm64's 'struct kvm_mmu_memory_cache' to make the struct and its usage compatible with the common 'struct kvm_mmu_memory_cache' in linux/kvm_host.h. This will minimize code churn when arm64 moves to the common implementation in a future patch, at the cost of temporarily having somewhat silly code. Note, GFP_PGTABLE_USER is equivalent to GFP_KERNEL_ACCOUNT | GFP_ZERO: #define GFP_PGTABLE_USER (GFP_PGTABLE_KERNEL | __GFP_ACCOUNT) | -> #define GFP_PGTABLE_KERNEL (GFP_KERNEL | __GFP_ZERO) == GFP_KERNEL | __GFP_ACCOUNT | __GFP_ZERO versus #define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT) with __GFP_ZERO explicitly OR'd in == GFP_KERNEL | __GFP_ACCOUNT | __GFP_ZERO No functional change intended. Tested-by: NMarc Zyngier <maz@kernel.org> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200703023545.8771-18-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson
Drop the @max param from mmu_topup_memory_cache() and instead use ARRAY_SIZE() to terminate the loop that fills the cache. This removes a BUG_ON() and sets the stage for moving arm64 to the common memory cache implementation. No functional change intended.

Tested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703023545.8771-17-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
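The topup loop then takes roughly this shape (illustrative sketch), bounded by the size of the objects array rather than a caller-supplied maximum:

    static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache, int min)
    {
            void *page;

            if (cache->nobjs >= min)
                    return 0;
            while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
                    page = (void *)__get_free_page(GFP_PGTABLE_USER);
                    if (!page)
                            return -ENOMEM;
                    cache->objects[cache->nobjs++] = page;
            }
            return 0;
    }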
-
- 07 July 2020, 2 commits
-
-
Submitted by Marc Zyngier
Since we often have a precise idea of the level we're dealing with when invalidating TLBs, we can provide it as a hint to our invalidation helper.

Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
-
Submitted by Christoffer Dall
As we are about to reuse our stage 2 page table manipulation code for shadow stage 2 page tables in the context of nested virtualization, we are going to manage multiple stage 2 page tables for a single VM. This requires some pretty invasive changes to our data structures, which move the vmid and pgd pointers into a separate structure and change pretty much all of our mmu code to operate on this structure instead. The new structure is called struct kvm_s2_mmu. There is no intended functional change by this patch alone.

Reviewed-by: James Morse <james.morse@arm.com>
Reviewed-by: Alexandru Elisei <alexandru.elisei@arm.com>
[Designed data structure layout in collaboration]
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Co-developed-by: Marc Zyngier <maz@kernel.org>
[maz: Moved the last_vcpu_ran down to the S2 MMU structure as well]
Signed-off-by: Marc Zyngier <maz@kernel.org>
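In outline, the new structure groups the per-stage-2 state described above (a sketch of the layout, not the exact upstream definition):

    struct kvm_s2_mmu {
            struct kvm_vmid vmid;           /* stage-2 VMID for this table */

            pgd_t           *pgd;           /* stage-2 page-table root ... */
            phys_addr_t     pgd_phys;       /* ... and its physical address */

            /* last vCPU that ran on each physical CPU, for TLB maintenance */
            int __percpu    *last_vcpu_ran;

            struct kvm      *kvm;           /* back-pointer to the VM */
    };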
-
- 06 July 2020, 1 commit
-
-
Submitted by Gavin Shan
kvm/arm32 hasn't been supported since commit 541ad015 ("arm: Remove 32bit KVM host support"), so the name HSR isn't meaningful any more. This renames HSR to ESR accordingly. It shouldn't cause any functional changes:

* Rename kvm_vcpu_get_hsr() to kvm_vcpu_get_esr() to make the function names self-explanatory.
* Rename variables from @hsr to @esr to make them self-explanatory.

Note that renaming in uapi and tracepoints would cause ABI changes, which we should avoid. Specifically, there are 4 related source files in this regard:

* arch/arm64/include/uapi/asm/kvm.h (struct kvm_debug_exit_arch::hsr)
* arch/arm64/kvm/handle_exit.c (struct kvm_debug_exit_arch::hsr)
* arch/arm64/kvm/trace_arm.h (tracepoints)
* arch/arm64/kvm/trace_handle_exit.h (tracepoints)

Signed-off-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Andrew Scull <ascull@google.com>
Link: https://lore.kernel.org/r/20200630015705.103366-1-gshan@redhat.com
-
- 10 June 2020, 1 commit
-
-
Submitted by Michel Lespinasse
Convert the last few remaining mmap_sem rwsem calls to use the new mmap locking API. These were missed by coccinelle for some reason (I think coccinelle does not support some of the preprocessor constructs in these files?).

[akpm@linux-foundation.org: convert linux-next leftovers]
[akpm@linux-foundation.org: more linux-next leftovers]
[akpm@linux-foundation.org: more linux-next leftovers]

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-6-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 05 June 2020, 1 commit
-
-
Submitted by Mike Rapoport
Implement primitives necessary for the 4th level folding, add walks of p4d level where appropriate, replace 5level-fixup.h with pgtable-nop4d.h and remove __ARCH_USE_5LEVEL_HACK.

[arnd@arndb.de: fix gcc-10 shift warning]
Link: http://lkml.kernel.org/r/20200429185657.4085975-1-arnd@arndb.de

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: James Morse <james.morse@arm.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200414153455.21744-4-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 25 May 2020, 1 commit
-
-
Submitted by Andrew Scull
This abstraction was introduced to hide the difference between arm and arm64 but, with the former no longer supported, this abstraction can be removed and the canonical kernel API used directly instead.

Signed-off-by: Andrew Scull <ascull@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
CC: Marc Zyngier <maz@kernel.org>
CC: James Morse <james.morse@arm.com>
CC: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20200519104036.259917-1-ascull@google.com
-
- 16 May 2020, 6 commits
-
-
Submitted by Keqian Zhu
There is already support for enabling the dirty log gradually in small chunks for x86 in commit 3c9bd400 ("KVM: x86: enable dirty log gradually in small chunks"). This adds support for arm64. x86 still write-protects all huge pages when DIRTY_LOG_INITIALLY_ALL_SET is enabled; for arm64, however, both huge pages and normal pages can be write-protected gradually by userspace.

On a Huawei Kunpeng 920 2.6GHz platform, I did some tests on 128G Linux VMs with different page sizes. The memory pressure is 127G in each case. The time taken by memory_global_dirty_log_start in QEMU is listed below:

  Page Size    Before    After Optimization
  4K           650ms     1.8ms
  2M           4ms       1.8ms
  1G           2ms       1.8ms

Besides the time reduction, the biggest improvement is that we will minimize the performance side effect (because of dissolving huge pages and marking memslots dirty) on the guest after enabling the dirty log.

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200413122023.52583-1-zhukeqian1@huawei.com
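For reference, userspace opts into the gradual mode through the manual dirty-log-protect capability; a hedged sketch of the ioctl usage (see the KVM API documentation for the authoritative details):

    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    /* With KVM_DIRTY_LOG_INITIALLY_SET, dirty bitmaps start out all-set and
     * pages are write-protected lazily, as userspace clears bitmap ranges
     * with KVM_CLEAR_DIRTY_LOG. */
    static int enable_gradual_dirty_log(int vm_fd)
    {
            struct kvm_enable_cap cap = {
                    .cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2,
                    .args[0] = KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
                               KVM_DIRTY_LOG_INITIALLY_SET,
            };

            return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
    }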
-
Submitted by Suzuki K Poulose
We support mapping host memory backed by PMD transparent hugepages at stage2 as huge pages. However the checks are now spread across two different places. Let us unify the handling of the THPs to keep the code cleaner (and future-proof for PUD THP support). This patch moves transparent_hugepage_adjust() closer to the caller to avoid a forward declaration for fault_supports_stage2_huge_mappings(). Also, since we already handle the case where the host VA and the guest PA may not be aligned, the explicit VM_BUG_ON() is not required.

Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200507123546.1875-3-yuzenghui@huawei.com
-
Submitted by Suzuki K Poulose
If we are checking whether the stage2 can map PAGE_SIZE, we don't have to do the boundary checks, as both the host VMA and the guest memslots are page aligned. Bail out of that case early. While we're at it, fix up a typo in the comment below.

Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200507123546.1875-2-yuzenghui@huawei.com
-
Submitted by Jiang Yi
Do cond_resched_lock() in stage2_flush_memslot(), like what is done in unmap_stage2_range() and other places holding mmu_lock while processing a possibly large range of memory.

Signed-off-by: Jiang Yi <giangyi@amazon.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20200415084229.29992-1-giangyi@amazon.com
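A sketch of the idea (illustrative; the actual stage-2 walkers are elided):

    static void stage2_flush_memslot(struct kvm *kvm,
                                     struct kvm_memory_slot *memslot)
    {
            phys_addr_t addr = memslot->base_gfn << PAGE_SHIFT;
            phys_addr_t end = addr + PAGE_SIZE * memslot->npages;
            phys_addr_t next;

            do {
                    next = stage2_pgd_addr_end(kvm, addr, end);
                    /* ... flush the stage-2 entries covering [addr, next) ... */

                    /* Give the scheduler a chance on very large memslots. */
                    if (next != end)
                            cond_resched_lock(&kvm->mmu_lock);
            } while ((addr = next) != end);
    }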
-
Submitted by Fuad Tabba
Fix spelling and typos (e.g., repeated words) in comments.

Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200401140310.29701-1-tabba@google.com
-
Submitted by Marc Zyngier
Now that the 32bit KVM/arm host is a distant memory, let's move the whole of the KVM/arm64 code into the arm64 tree. As they said in the song: Welcome Home (Sanitarium).

Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20200513104034.74741-1-maz@kernel.org
-
- 17 March 2020, 4 commits
-
-
Submitted by Sean Christopherson
Refactor memslot handling to treat the number of used slots as the de facto size of the memslot array, e.g. return NULL from id_to_memslot() when an invalid index is provided instead of relying on npages==0 to detect an invalid memslot. Rework the sorting and walking of memslots in advance of dynamically sizing memslots to aid bisection and debug, e.g. with luck, a bug in the refactoring will bisect here and/or hit a WARN instead of randomly corrupting memory.

Alternatively, a global null/invalid memslot could be returned, i.e. so callers of id_to_memslot() don't have to explicitly check for a NULL memslot, but that approach runs the risk of introducing difficult-to-debug issues, e.g. if the global null slot is modified. Constifying the return from id_to_memslot() to combat such issues is possible, but would require a massive refactoring of arch specific code and would still be susceptible to casting shenanigans.

Add function comments to update_memslots() and search_memslots() to explicitly (and loudly) state how memslots are sorted. Opportunistically stuff @hva with a non-canonical value when deleting a private memslot on x86 to detect bogus usage of the freed slot.

No functional change intended.

Tested-by: Christoffer Dall <christoffer.dall@arm.com>
Tested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson
Now that all callers of kvm_free_memslot() pass NULL for @dont, remove the param from the top-level routine and all arch's implementations. No functional change intended.

Tested-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson
Drop the "const" attribute from @old in kvm_arch_commit_memory_region() to allow arch specific code to free arch specific resources in the old memslot without having to cast away the attribute. Freeing resources in kvm_arch_commit_memory_region() paves the way for simplifying kvm_free_memslot() by eliminating the last usage of its @dont param.

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Sean Christopherson
Remove kvm_arch_create_memslot() now that all arch implementations are effectively nops. Removing kvm_arch_create_memslot() eliminates the possibility for arch specific code to allocate memory prior to setting a memslot, which sets the stage for simplifying kvm_free_memslot().

Cc: Janosch Frank <frankja@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 28 January 2020, 1 commit
-
-
Submitted by Sean Christopherson
Add a helper, is_transparent_hugepage(), to explicitly check whether a compound page is a THP and use it when populating KVM's secondary MMU. The explicit check fixes a bug where a remapped compound page, e.g. for an XDP Rx socket, is mapped into a KVM guest and is mistaken for a THP, which results in KVM incorrectly creating a huge page in its secondary MMU.

Fixes: 936a5fe6 ("thp: kvm mmu transparent hugepage support")
Reported-by: syzbot+c9d1fb51ac9d0d10c39d@syzkaller.appspotmail.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
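The helper boils down to something like the sketch below (hedged; the check lives in mm and is exported so KVM can use it when sizing secondary-MMU mappings):

    /* A remapped compound page (e.g. an XDP Rx ring) is compound but not a
     * THP, so checking for a compound page alone is not sufficient. */
    bool is_transparent_hugepage(struct page *page)
    {
            if (!PageCompound(page))
                    return false;

            page = compound_head(page);
            return is_huge_zero_page(page) ||
                   page[1].compound_dtor == TRANSHUGE_PAGE_DTOR;
    }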
-
- 23 January 2020, 2 commits
-
-
Submitted by Gavin Shan
kvm_test_age_hva() is called upon mmu_notifier_test_young(), but the wrong address range has been passed to handle_hva_to_gpa(). With the wrong address range, no young bits will be checked in handle_hva_to_gpa(), meaning zero is always returned from mmu_notifier_test_young(). This fixes the issue by passing the correct address range to the underlying handle_hva_to_gpa(), so that the hardware young (access) bit is actually visited.

Fixes: 35307b9a ("arm/arm64: KVM: Implement Stage-2 page aging")
Signed-off-by: Gavin Shan <gshan@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200121055659.19560-1-gshan@redhat.com
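In essence, the fix widens a previously empty range so that it covers the page at the given hva (sketch; the tracepoint call is omitted):

    int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
    {
            if (!kvm->arch.pgd)
                    return 0;

            /* The end used to be 'hva' as well, i.e. an empty range, so the
             * young bit was never examined. */
            return handle_hva_to_gpa(kvm, hva, hva + PAGE_SIZE,
                                     kvm_test_age_hva_handler, NULL);
    }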
-
Submitted by Marc Zyngier
Our MMIO handling is a bit odd, in the sense that it uses an intermediate per-vcpu structure to store the various decoded pieces of information that describe the access. But the same information is readily available in the HSR/ESR_EL2 field, and we actually use this field to populate the structure. Let's simplify the whole thing by getting rid of the superfluous structure and save a (tiny) bit of space in the vcpu structure.

[32bit fix courtesy of Olof Johansson <olof@lixom.net>]
Signed-off-by: Marc Zyngier <maz@kernel.org>
-
- 20 January 2020, 1 commit
-
-
Submitted by James Morse
When we check for a poisoned page, we use the VMA to tell userspace about the looming disaster. But we pass a pointer to this VMA after having released the mmap_sem, which isn't a good idea. Instead, stash the shift value that goes with this pfn while we are holding the mmap_sem.

Reported-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Link: https://lore.kernel.org/r/20191211165651.7889-3-maz@kernel.org
Link: https://lore.kernel.org/r/20191217123809.197392-1-james.morse@arm.com
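The shape of the fix, sketched (variable and helper names assumed from the fault path of that era): record the granule shift before dropping the lock, and report with it instead of dereferencing the stale VMA pointer:

    /* While still holding mmap_sem: remember the mapping granule. */
    if (is_vm_hugetlb_page(vma))
            vma_shift = huge_page_shift(hstate_vma(vma));
    else
            vma_shift = PAGE_SHIFT;
    up_read(&current->mm->mmap_sem);

    /* ... later, after translating hva to pfn ... */
    if (pfn == KVM_PFN_ERR_HWPOISON) {
            /* Use the stashed shift; the VMA may no longer be valid. */
            kvm_send_hwpoison_signal(hva, vma_shift);
            return 0;
    }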
-
- 13 December 2019, 1 commit
-
-
Submitted by Marc Zyngier
A device mapping is normally always mapped at Stage-2, since there is very little gain in having it faulted in. Nonetheless, it is possible to end up in a situation where the device mapping has been removed from Stage-2 (userspace munmap'd the VFIO region, and the MMU notifier did its job), but is still present in a userspace mapping (userspace has mapped it back at the same address). In such a situation, the device mapping will be demand-paged as the guest performs memory accesses. This requires us to be careful when dealing with mapping size and cache management, and to handle potential execution of a device mapping.

Reported-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Tested-by: Alexandru Elisei <alexandru.elisei@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191211165651.7889-2-maz@kernel.org
-
- 07 December 2019, 1 commit
-
-
Submitted by Jia He
In kvm_arch_prepare_memory_region(), arm KVM regards the memory region as writable if the flags do not contain KVM_MEM_READONLY, and then rejects it if the backing VMA lacks VM_WRITE. But there is a common pattern for setting up a KVM memory region, e.g. on the QEMU side (note the PROT_NONE flag):

1. mmap(NULL, size, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
   memory_region_init_ram_ptr()
2. re-mmap the above area with read/write permissions.

This pattern is used by the virtio-fs QEMU code, which hasn't been upstreamed yet [1], but it doesn't seem like something we can forbid. Without this patch, it causes an EPERM during kvm_set_memory_region() and crashes QEMU at boot.

As told by Ard, "the underlying assumption is incorrect, i.e., that the value of vm_flags at this point in time defines how the VMA is used during its lifetime. There may be other cases where a VMA is created with VM_READ vm_flags that are changed to VM_READ|VM_WRITE later, and we are currently rejecting this use case as well."

[1] https://gitlab.com/virtio-fs/qemu/blob/5a356e/hw/virtio/vhost-user-fs.c#L488

Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Link: https://lore.kernel.org/r/20191206020802.196108-1-justin.he@arm.com
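A hypothetical, self-contained sketch of the userspace pattern described above (function name and sequencing are illustrative):

    #include <stddef.h>
    #include <sys/mman.h>

    /* Reserve guest RAM with PROT_NONE (so the VMA has no VM_WRITE),
     * register it as a KVM memslot, and only later remap it read/write. */
    static void *reserve_then_enable(size_t size)
    {
            void *ram = mmap(NULL, size, PROT_NONE,
                             MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
            if (ram == MAP_FAILED)
                    return NULL;

            /* ... KVM_SET_USER_MEMORY_REGION is issued here ... */

            /* Re-mmap the same range with read/write permissions. */
            return mmap(ram, size, PROT_READ | PROT_WRITE,
                        MAP_ANONYMOUS | MAP_PRIVATE | MAP_FIXED, -1, 0);
    }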
-
- 13 July 2019, 1 commit
-
-
Submitted by Mike Rapoport
The PTE allocations in arm64 are identical to the generic ones modulo the GFP flags. Using the generic pte_alloc_one() functions ensures that the user page tables are allocated with __GFP_ACCOUNT set. The arm64 definition of PGALLOC_GFP is removed and replaced with GFP_PGTABLE_USER for p[gum]d_alloc_one() for the user page tables and GFP_PGTABLE_KERNEL for the kernel page tables. The KVM memory cache is now using GFP_PGTABLE_USER. The mappings created with create_pgd_mapping() are now using GFP_PGTABLE_KERNEL. The conversion to the generic version of pte_free_kernel() removes the NULL check for pte. The pte_free() version on arm64 is identical to the generic one and can be simply dropped.

[cai@lca.pw: fix a bogus GFP flag in pgd_alloc()]
Link: https://lore.kernel.org/r/1559656836-24940-1-git-send-email-cai@lca.pw/
[and fix it more]
Link: https://lore.kernel.org/linux-mm/20190617151252.GF16810@rapoport-lnx/
Link: http://lkml.kernel.org/r/1557296232-15361-5-git-send-email-rppt@linux.ibm.com

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Guo Ren <ren_guo@c-sky.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Sam Creasey <sammy@sammy.net>
Cc: Vincent Chen <deanbo422@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 05 June 2019, 1 commit
-
-
Submitted by Thomas Gleixner
Based on 1 normalized pattern(s):

  this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation 51 franklin street fifth floor boston ma 02110 1301 usa

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-only

has been chosen to replace the boilerplate/reference in 67 file(s).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Allison Randal <allison@lohutok.net>
Reviewed-by: Richard Fontana <rfontana@redhat.com>
Reviewed-by: Alexios Zavras <alexios.zavras@intel.com>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190529141333.953658117@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 25 April 2019, 1 commit
-
-
Submitted by Suzuki K Poulose
With commit a80868f3, we no longer ensure that the THP page is properly aligned in the guest IPA. Skip the stage2 huge mapping for unaligned IPA backed by transparent hugepages.

Fixes: a80868f3 ("KVM: arm/arm64: Enforce PTE mappings at stage2 when needed")
Reported-by: Eric Auger <eric.auger@redhat.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Cc: Zheng Xiang <zhengxiang9@huawei.com>
Cc: Andrew Murray <andrew.murray@arm.com>
Cc: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
-
- 09 April 2019, 1 commit
-
-
Submitted by Anshuman Khandual
ARM64 standard pgtable functions are going to use pgtable_page_[ctor|dtor] or pgtable_pmd_page_[ctor|dtor] constructs. At present, KVM guest stage-2 PUD|PMD|PTE level page table pages are allocated with __get_free_page() via mmu_memory_cache_alloc() but released with the standard pud|pmd_free() or pte_free_kernel(). These will fail once they start calling into pgtable_[pmd]_page_dtor() for pages which never originally went through the respective constructor functions. Hence convert all stage-2 page table page release functions to call the buddy allocator directly while freeing pages.

Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Acked-by: Yu Zhao <yuzhao@google.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
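A sketch of the pattern the commit switches to (the wrapper name here is purely illustrative): release stage-2 table pages straight to the page allocator rather than via pte_free_kernel()/pmd_free():

    /* Illustrative helper: these pages came from __get_free_page() through
     * the KVM memory cache and never went through pgtable_pte_page_ctor(),
     * so hand them back to the buddy allocator directly. */
    static void stage2_table_page_free(void *table)
    {
            free_page((unsigned long)table);
    }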
-
- 28 March 2019, 1 commit
-
-
Submitted by Zenghui Yu
Some comments in virt/kvm/arm/mmu.c are outdated. Update them to reflect the current state of the code.

Signed-off-by: Zenghui Yu <yuzenghui@huawei.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
[maz: commit message tidy-up]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
-
- 21 March 2019, 1 commit
-
-
Submitted by Suzuki K Poulose
We rely on the mmu_notifier callbacks to handle the split/merge of huge pages, and thus we are guaranteed that, while creating a block mapping, either the entire block is unmapped at stage2 or it is missing permission. However, we miss the case where a block mapping is split for dirty logging and could later be turned back into a block mapping if dirty logging is cancelled. This not only creates inconsistent TLB entries for the pages in the block, but also leaks the PMD-level table pages. Handle this corner case for the huge mappings at stage2 by unmapping the non-huge mapping for the block. This could potentially release the upper-level table, so we need to restart the table walk once we unmap the range.

Fixes: ad361f09 ("KVM: ARM: Support hugetlbfs backed huge pages")
Reported-by: Zheng Xiang <zhengxiang9@huawei.com>
Cc: Zheng Xiang <zhengxiang9@huawei.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
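The handling amounts to a retry pattern along these lines (a hedged sketch of the PMD path; names follow the stage-2 code of that era, and the NULL/cache checks are elided):

    static int stage2_set_pmd_huge(struct kvm *kvm,
                                   struct kvm_mmu_memory_cache *cache,
                                   phys_addr_t addr, const pmd_t *new_pmd)
    {
            pmd_t *pmd, old_pmd;

    retry:
            pmd = stage2_get_pmd(kvm, cache, addr);
            old_pmd = *pmd;

            if (pmd_present(old_pmd) && !pmd_thp_or_huge(old_pmd)) {
                    /*
                     * A PTE table still maps this block (left over from dirty
                     * logging). Unmap it, which may release the unused
                     * upper-level table, and restart the walk before
                     * installing the block mapping.
                     */
                    unmap_stage2_range(kvm, addr & S2_PMD_MASK, S2_PMD_SIZE);
                    goto retry;
            }

            /* ... install *new_pmd and perform TLB maintenance ... */
            return 0;
    }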
-
- 20 March 2019, 1 commit
-
-
Submitted by Suzuki K Poulose
Commit 6794ad54 ("KVM: arm/arm64: Fix unintended stage 2 PMD mappings") made the checks to skip huge mappings stricter. However, it introduced a bug where we still use huge mappings, ignoring the flag to use PTE mappings, by not resetting the vma_pagesize to PAGE_SIZE. Also, the checks do not cover PUD huge pages, which were under review during the same period. This patch fixes both issues.

Fixes: 6794ad54 ("KVM: arm/arm64: Fix unintended stage 2 PMD mappings")
Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Cc: Zenghui Yu <yuzenghui@huawei.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
-