Commits · 615099b01eb7127fb2f4bc956171a6a0accf688b · openeuler / Kernel

21 Jan, 2021 1 commit

KVM: Forbid the use of tagged userspace addresses for memslots · 139bc8a6

Authored by Marc Zyngier on Jan 21, 2021

The use of a tagged address could be pretty confusing for the
whole memslot infrastructure as well as the MMU notifiers.

Forbid it altogether, as it never quite worked in the first place.

Cc: stable@vger.kernel.org
Reported-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>

139bc8a6
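
The restriction described in this commit can be pictured as rejecting any userspace address that still carries tag bits. The following is a minimal userspace sketch in C, not the kernel's code: it assumes arm64-style tags in the top byte (bits 63:56), and `untag_addr()` and `memslot_addr_ok()` are hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: tags live in the top byte (bits 63:56), as with arm64 TBI. */
#define TAG_SHIFT 56
#define TAG_MASK  (UINT64_C(0xff) << TAG_SHIFT)

/* Hypothetical helper: clear the tag byte of a user address. */
static uint64_t untag_addr(uint64_t addr)
{
	return addr & ~TAG_MASK;
}

/* Hypothetical check: accept only addresses that carry no tag. */
static bool memslot_addr_ok(uint64_t userspace_addr)
{
	return userspace_addr == untag_addr(userspace_addr);
}
```

Comparing the address with its untagged form keeps the check independent of which tag value was used; any non-zero tag makes the two differ.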

08 Jan, 2021 1 commit

kvm: check tlbs_dirty directly · 88bf56d0

Authored by Lai Jiangshan on Dec 17, 2020

In kvm_mmu_notifier_invalidate_range_start(), tlbs_dirty is used as:
        need_tlb_flush |= kvm->tlbs_dirty;
with need_tlb_flush's type being int and tlbs_dirty's type being long.

It means that tlbs_dirty is always used as an int and its higher 32 bits
are ignored.  We need to check tlbs_dirty in a correct way, and this
change checks it directly without propagating it to need_tlb_flush.

Note: it's _extremely_ unlikely that neglecting the higher 32 bits can
cause problems in practice.  It would require encountering tlbs_dirty
on a 4 billion count boundary, and KVM would need to be using shadow
paging or be running a nested guest.

Cc: stable@vger.kernel.org
Fixes: a4ee1ca4 ("KVM: MMU: delay flush all tlbs on sync_page path")
Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20201217154118.16497-1-jiangshanlai@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

88bf56d0
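
The truncation this commit describes is easy to reproduce outside the kernel. The sketch below is an illustration under stated assumptions, not the kernel code: fixed-width types stand in for the kernel's `int` and `long`, and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the old pattern: a 64-bit counter folded into a 32-bit flag. */
static bool need_flush_truncating(uint64_t tlbs_dirty)
{
	uint32_t need_tlb_flush = 0;

	/* The explicit cast shows the narrowing that "|=" performed implicitly. */
	need_tlb_flush |= (uint32_t)tlbs_dirty;
	return need_tlb_flush != 0;
}

/* Mirrors the fix: test the full-width counter directly. */
static bool need_flush_direct(uint64_t tlbs_dirty)
{
	return tlbs_dirty != 0;
}
```

With a counter value of exactly 2^32, the truncating variant reports that no flush is needed while the direct check reports that one is, which is the boundary case the note in the commit message refers to.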

20 Dec, 2020 1 commit

mm, kvm: account kvm_vcpu_mmap to kmemcg · 93bb59ca

Authored by Shakeel Butt on Dec 18, 2020

A VCPU of a VM can allocate a couple of pages which can be mmap'ed by the
user space application. At the moment this memory is not charged to the
memcg of the VMM. On a large machine running a large number of VMs, or a
small number of VMs having a large number of VCPUs, this unaccounted
memory can be very significant. So, charge this memory to the memcg of
the VMM. Please note that the lifetime of these allocations corresponds to
the lifetime of the VMM.

Link: https://lkml.kernel.org/r/20201106202923.2087414-1-shakeelb@google.com
Signed-off-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

93bb59ca

15 Nov, 2020 5 commits

KVM: Don't allocate dirty bitmap if dirty ring is enabled · 044c59c4

Authored by Peter Xu on Sep 30, 2020

Because kvm dirty rings and the kvm dirty log are used in an exclusive way,
let's avoid creating the dirty_bitmap when the kvm dirty ring is enabled.
In the meantime, since the dirty_bitmap will be conditionally created
now, we can't use it as a sign of "whether this memory slot enabled
dirty tracking".  Change users like that to check against the kvm
memory slot flags.

Note that a kvm memory slot can still end up with its dirty_bitmap
allocated, _if_ the memory slot is created before the dirty ring is
enabled and with the dirty tracking capability enabled at the same
time; such a slot will still have the dirty_bitmap.
However it should not hurt much (e.g., the bitmaps will always be
freed if they are there), and real users normally won't trigger
this because the dirty bit tracking flag should in most cases be
applied to kvm slots only right before migration starts, which should
be far later than kvm initialization (VM start).

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201001012226.5868-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

044c59c4
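
The change of "sign" described in this commit, from the presence of the bitmap to the slot flags, can be sketched as follows. This is a simplified illustration rather than the kernel's code: `struct memslot` is a hypothetical, trimmed-down stand-in for `struct kvm_memory_slot`, and `MEM_LOG_DIRTY_PAGES` uses the same bit value as `KVM_MEM_LOG_DIRTY_PAGES`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same bit value as KVM_MEM_LOG_DIRTY_PAGES in the KVM UAPI header. */
#define MEM_LOG_DIRTY_PAGES (UINT32_C(1) << 0)

/* Hypothetical, trimmed-down stand-in for struct kvm_memory_slot. */
struct memslot {
	uint32_t flags;
	unsigned long *dirty_bitmap; /* may be NULL when the dirty ring is in use */
};

/* Old heuristic: infer dirty tracking from the presence of the bitmap. */
static bool tracks_dirty_by_bitmap(const struct memslot *slot)
{
	return slot->dirty_bitmap != NULL;
}

/* Approach described in the commit: consult the slot flags instead. */
static bool tracks_dirty_by_flags(const struct memslot *slot)
{
	return (slot->flags & MEM_LOG_DIRTY_PAGES) != 0;
}
```

For a slot that has dirty tracking enabled but no bitmap (the dirty-ring case), only the flag-based check gives the right answer.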

KVM: Make dirty ring exclusive to dirty bitmap log · b2cc64c4

Authored by Peter Xu on Sep 30, 2020

There's no good reason to use both the dirty bitmap logging and the
new dirty ring buffer to track dirty bits.  It should even be possible
to support both of them at the same time, but that would complicate
things for little benefit.  Let's simply make it the rule,
before we enable dirty ring on any arch, that we don't allow these two
interfaces to be used together.

The big switch is the enablement of the KVM_CAP_DIRTY_LOG_RING
capability.  That's where we switch from the default dirty logging
way to the dirty ring way.  As long as kvm->dirty_ring_size is set up
correctly, we switch once and for all to the dirty ring buffer mode
for the current virtual machine.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201001012224.5818-1-peterx@redhat.com>
[Change errno from EINVAL to ENXIO. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b2cc64c4

KVM: X86: Implement ring-based dirty memory tracking · fb04a1ed

Authored by Peter Xu on Sep 30, 2020

This patch is heavily based on previous work from Lei Cao
<lei.cao@stratus.com> and Paolo Bonzini <pbonzini@redhat.com>. [1]

KVM currently uses large bitmaps to track dirty memory. These bitmaps
are copied to userspace when userspace queries KVM for its dirty page
information. The use of bitmaps is mostly sufficient for live
migration, as large parts of memory are dirtied from one log-dirty
pass to another. However, in a checkpointing system, the number of
dirty pages is small and in fact it is often bounded---the VM is
paused when it has dirtied a pre-defined number of pages. Traversing a
large, sparsely populated bitmap to find set bits is time-consuming,
as is copying the bitmap to user-space.

A similar issue arises for live migration when the guest memory
is huge while only few pages get dirtied. In that case, for
each dirty sync we need to pull the whole dirty bitmap to userspace
and analyse every bit even if it's mostly zeros.

The preferred data structure for the above scenarios is a dense list of
guest frame numbers (GFN). This patch series stores the dirty list in
kernel memory that can be memory mapped into userspace to allow speedy
harvesting.

This patch enables dirty ring for X86 only. However it should be
easily extended to other archs as well.

[1] https://patchwork.kernel.org/patch/10471409/

Signed-off-by: Lei Cao <lei.cao@stratus.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201001012222.5767-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fb04a1ed
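
To make the contrast with a sparse bitmap concrete, here is a minimal single-producer, single-consumer ring of dirty GFNs. It is a sketch of the idea only and does not reproduce the kernel's layout or its userspace ABI; `struct dirty_ring`, `RING_SIZE`, `ring_push()` and `ring_harvest()` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical; must be a power of two */

/* Hypothetical ring of dirty guest frame numbers (GFNs). */
struct dirty_ring {
	uint64_t gfns[RING_SIZE];
	uint32_t head; /* next entry the consumer will harvest */
	uint32_t tail; /* next entry the producer will fill */
};

/* Producer side: record one dirty GFN; returns false if the ring is full. */
static bool ring_push(struct dirty_ring *r, uint64_t gfn)
{
	if (r->tail - r->head == RING_SIZE)
		return false;
	r->gfns[r->tail++ & (RING_SIZE - 1)] = gfn;
	return true;
}

/* Consumer side: copy out up to max entries in the order they were pushed. */
static size_t ring_harvest(struct dirty_ring *r, uint64_t *out, size_t max)
{
	size_t n = 0;

	while (n < max && r->head != r->tail)
		out[n++] = r->gfns[r->head++ & (RING_SIZE - 1)];
	return n;
}
```

Harvesting costs time proportional to the number of dirty pages rather than to the size of guest memory, which is the property the commit message is after. A full ring has to be handled by the producer, for example by pausing until the consumer catches up; that policy is left out of this sketch.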

KVM: Pass in kvm pointer into mark_page_dirty_in_slot() · 28bd726a

Authored by Peter Xu on Sep 30, 2020

The context will be needed to implement the kvm dirty ring.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201001012044.5151-5-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

28bd726a

KVM: remove kvm_clear_guest_page · 2f541442

Authored by Paolo Bonzini on Nov 06, 2020

kvm_clear_guest_page is not used anymore after "KVM: X86: Don't track dirty
for KVM_SET_[TSS_ADDR|IDENTITY_MAP_ADDR]", except from kvm_clear_guest.
We can just inline it in its sole user.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2f541442

23 Oct, 2020 1 commit

kvm: x86/mmu: Support dirty logging for the TDP MMU · a6a0b05d

Authored by Ben Gardon on Oct 14, 2020

Dirty logging is a key feature of the KVM MMU and must be supported by
the TDP MMU. Add support for both the write protection and PML dirty
logging modes.

Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.

This series can be viewed in Gerrit at:
	https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-16-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a6a0b05d

22 Oct, 2020 1 commit

KVM: Cache as_id in kvm_memory_slot · 9e9eb226

Authored by Peter Xu on Oct 14, 2020

Cache the address space ID just like the slot ID.  It will be used in
order to fill in the dirty ring entries.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201014182700.2888246-7-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

9e9eb226

28 Sep, 2020 1 commit

KVM: use struct_size() and flex_array_size() helpers in kvm_io_bus_unregister_dev() · 871c433b

Authored by Rustam Kovhaev on Sep 18, 2020

Make use of the struct_size() helper to avoid any potential type
mistakes and protect against potential integer overflows.
Make use of the flex_array_size() helper to calculate the size of a
flexible array member within an enclosing structure.

Suggested-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com>
Message-Id: <20200918120500.954436-1-rkovhaev@gmail.com>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

871c433b
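
The overflow protection mentioned in this commit can be illustrated in plain C. The sketch below is a userspace approximation of the idea behind `struct_size()`, not the kernel macro itself; `struct io_bus` and `bus_alloc_size()` are hypothetical, and saturating to `SIZE_MAX` is used here so that an allocator fails instead of returning a short buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure with a trailing flexible array member. */
struct io_bus {
	int dev_count;
	uint64_t range[]; /* dev_count entries follow the header */
};

/*
 * Hypothetical helper: size of the header plus count array elements,
 * saturating to SIZE_MAX instead of wrapping around on overflow.
 */
static size_t bus_alloc_size(size_t count)
{
	const size_t elem = sizeof(uint64_t);
	const size_t base = sizeof(struct io_bus);

	if (count > (SIZE_MAX - base) / elem)
		return SIZE_MAX;
	return base + count * elem;
}
```

An open-coded `sizeof(*bus) + count * sizeof(bus->range[0])` would silently wrap for a huge `count`; checking before multiplying avoids that.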

12 Sep, 2020 1 commit

KVM: fix memory leak in kvm_io_bus_unregister_dev() · f6588660

Authored by Rustam Kovhaev on Sep 07, 2020

When kmalloc() fails in kvm_io_bus_unregister_dev(), before removing
the bus, we should iterate over all other devices linked to it and call
kvm_iodevice_destructor() for them.

Fixes: 90db1043 ("KVM: kvm_io_bus_unregister_dev() should never fail")
Cc: stable@vger.kernel.org
Reported-and-tested-by: syzbot+f196caa45793d6374707@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=f196caa45793d6374707
Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200907185535.233114-1-rkovhaev@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

f6588660

22 Aug, 2020 1 commit

KVM: Pass MMU notifier range flags to kvm_unmap_hva_range() · fdfe7cbd

Authored by Will Deacon on Aug 11, 2020

The 'flags' field of 'struct mmu_notifier_range' is used to indicate
whether invalidate_range_{start,end}() are permitted to block. In the
case of kvm_mmu_notifier_invalidate_range_start(), this field is not
forwarded on to the architecture-specific implementation of
kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
whether or not to block.

Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
architectures are aware as to whether or not they are permitted to block.

Cc: <stable@vger.kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
Message-Id: <20200811102725.7121-2-will@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

fdfe7cbd

13 Aug, 2020 1 commit

mm/gup: remove task_struct pointer for all gup code · 64019a2e

Authored by Peter Xu on Aug 11, 2020

After the cleanup of page fault accounting, gup does not need to pass
task_struct around any more. Remove that parameter in the whole gup
stack.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Link: http://lkml.kernel.org/r/20200707225021.200906-26-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

64019a2e

10 Jul, 2020 1 commit

KVM: Move x86's MMU memory cache helpers to common KVM code · 6926f95a

Authored by Sean Christopherson on Jul 02, 2020

Move x86's memory cache helpers to common KVM code so that they can be
reused by arm64 and MIPS in future patches.

Suggested-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703023545.8771-16-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

6926f95a

09 Jul, 2020 1 commit

KVM: x86: take as_id into account when checking PGD · 995decb6

Authored by Vitaly Kuznetsov on Jul 08, 2020

An OVMF-booted guest running on shadow pages crashes with a TRIPLE FAULT
after enabling paging from SMM. The crash is triggered from mmu_check_root()
and is caused by kvm_is_visible_gfn() searching through memslots with
as_id = 0 while the vCPU may be in a different context (address space).

Introduce kvm_vcpu_is_visible_gfn() and use it from mmu_check_root().

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200708140023.1476020-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

995decb6

02 Jul, 2020 1 commit

kvm: use more precise cast and do not drop __user · 1393b4aa

Authored by Paolo Bonzini on Jul 02, 2020

Sparse complains about a call to get_compat_sigset; fix it.  The "if"
right above explains that sigmask_arg->sigset is basically a
compat_sigset_t.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1393b4aa

10 Jun, 2020 2 commits

mmap locking API: use coccinelle to convert mmap_sem rwsem call sites · d8ed45c5

Authored by Michel Lespinasse on Jun 08, 2020

This change converts the existing mmap_sem rwsem calls to use the new mmap
locking API instead.

The change is generated using coccinelle with the following rule:

// spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir .

@@
expression mm;
@@
(
-init_rwsem
+mmap_init_lock
|
-down_write
+mmap_write_lock
|
-down_write_killable
+mmap_write_lock_killable
|
-down_write_trylock
+mmap_write_trylock
|
-up_write
+mmap_write_unlock
|
-downgrade_write
+mmap_write_downgrade
|
-down_read
+mmap_read_lock
|
-down_read_killable
+mmap_read_lock_killable
|
-down_read_trylock
+mmap_read_trylock
|
-up_read
+mmap_read_unlock
)
-(&mm->mmap_sem)
+(mm)

Signed-off-by: Michel Lespinasse <walken@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ying Han <yinghan@google.com>
Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

d8ed45c5

mm: don't include asm/pgtable.h if linux/mm.h is already included · e31cf2f4

Authored by Mike Rapoport on Jun 08, 2020

Patch series "mm: consolidate definitions of page table accessors", v2.

The low level page table accessors (pXY_index(), pXY_offset()) are
duplicated across all architectures and sometimes more than once.  For
instance, we have 31 definitions of pgd_offset() for 25 supported
architectures.

Most of these definitions are actually identical and typically it boils
down to, e.g.

static inline unsigned long pmd_index(unsigned long address)
{
        return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
}

static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
{
        return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
}

These definitions can be shared among 90% of the arches provided
XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.

For architectures that really need a custom version there is always the
possibility to override the generic version with the usual ifdefs magic.

These patches introduce include/linux/pgtable.h that replaces
include/asm-generic/pgtable.h and add the definitions of the page table
accessors to the new header.

This patch (of 12):

The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
functions involving page table manipulations, e.g.  pte_alloc() and
pmd_alloc().  So, there is no point in explicitly including <asm/pgtable.h>
in the files that include <linux/mm.h>.

The include statements in such cases are removed with a simple loop:

	for f in $(git grep -l "include <linux/mm.h>") ; do
		sed -i -e '/include <asm\/pgtable.h>/ d' $f
	done

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@codeaurora.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ley Foon Tan <ley.foon.tan@intel.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nick Hu <nickhu@andestech.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vincent Chen <deanbo422@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

e31cf2f4

09 Jun, 2020 1 commit

mm/gup.c: convert to use get_user_{page|pages}_fast_only() · dadbb612

Authored by Souptick Joarder on Jun 07, 2020

The __get_user_pages_fast() API is renamed to get_user_pages_fast_only()
to align with pin_user_pages_fast_only().

As part of this we get rid of the write parameter.  Instead, the caller
passes FOLL_WRITE to get_user_pages_fast_only().  This does not change any
existing functionality of the API.

All the callers are changed to pass FOLL_WRITE.

Also introduce get_user_page_fast_only(), and use it in a few places
that hard-code nr_pages to 1.

Updated the documentation of the API.

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Paul Mackerras <paulus@ozlabs.org>		[arch/powerpc/kvm]
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Michal Suchanek <msuchanek@suse.de>
Link: http://lkml.kernel.org/r/1590396812-31277-1-git-send-email-jrdr.linux@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dadbb612

08 Jun, 2020 1 commit

KVM: x86: Fix APIC page invalidation race · e649b3f0

Authored by Eiichi Tsukata on Jun 06, 2020

Commit b1394e74 ("KVM: x86: fix APIC page invalidation") tried
to fix inappropriate APIC page invalidation by re-introducing arch
specific kvm_arch_mmu_notifier_invalidate_range() and calling it from
kvm_mmu_notifier_invalidate_range_start. However, the patch left a
possible race where the VMCS APIC address cache is updated *before*
it is unmapped:

  (Invalidator) kvm_mmu_notifier_invalidate_range_start()
  (Invalidator) kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD)
  (KVM VCPU) vcpu_enter_guest()
  (KVM VCPU) kvm_vcpu_reload_apic_access_page()
  (Invalidator) actually unmap page

Because of the above race, there can be a mismatch between the
host physical address stored in the APIC_ACCESS_PAGE VMCS field and
the host physical address stored in the EPT entry for the APIC GPA
(0xfee00000).  When this happens, the processor will not trap APIC
accesses, and will instead show the raw contents of the APIC-access page.
Because Windows OS periodically checks for unexpected modifications to
the LAPIC register, this will show up as a BSOD crash with BugCheck
CRITICAL_STRUCTURE_CORRUPTION (109), which we are currently seeing in
https://bugzilla.redhat.com/show_bug.cgi?id=1751017.

The root cause of the issue is that kvm_arch_mmu_notifier_invalidate_range()
cannot guarantee that no additional references are taken to the pages in
the range before kvm_mmu_notifier_invalidate_range_end().  Fortunately,
this case is supported by the MMU notifier API, as documented in
include/linux/mmu_notifier.h:

	 * If the subsystem
	 * can't guarantee that no additional references are taken to
	 * the pages in the range, it has to implement the
	 * invalidate_range() notifier to remove any references taken
	 * after invalidate_range_start().

The fix therefore is to reload the APIC-access page field in the VMCS
from kvm_mmu_notifier_invalidate_range() instead of ..._range_start().

Cc: stable@vger.kernel.org
Fixes: b1394e74 ("KVM: x86: fix APIC page invalidation")
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=197951
Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
Message-Id: <20200606042627.61070-1-eiichi.tsukata@nutanix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e649b3f0

05 Jun, 2020 1 commit

KVM: Use vmemdup_user() · 7ec28e26

Authored by Denis Efremov on Jun 03, 2020

Replace the open-coded alloc and copy with vmemdup_user().

Signed-off-by: Denis Efremov <efremov@linux.com>
Message-Id: <20200603101131.2107303-1-efremov@linux.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

7ec28e26

04 Jun, 2020 1 commit

KVM: let kvm_destroy_vm_debugfs clean up vCPU debugfs directories · d56f5136

Authored by Paolo Bonzini on Jun 04, 2020

After commit 63d04348 ("KVM: x86: move kvm_create_vcpu_debugfs after
last failure point") we are creating the per-vCPU debugfs files
after the creation of the vCPU file descriptor.  This makes it
possible for userspace to reach kvm_vcpu_release before
kvm_create_vcpu_debugfs has finished.  The vcpu->debugfs_dentry
then does not have any associated inode anymore, and this causes
a NULL-pointer dereference in debugfs_create_file.

The solution is simply to avoid removing the files; they are
cleaned up when the VM file descriptor is closed (and that must be
after KVM_CREATE_VCPU returns).  We can stop storing the dentry
in struct kvm_vcpu too, because it is not needed anywhere after
kvm_create_vcpu_debugfs returns.

Reported-by: syzbot+705f4401d5a93a59b87d@syzkaller.appspotmail.com
Fixes: 63d04348 ("KVM: x86: move kvm_create_vcpu_debugfs after last failure point")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

d56f5136

01 Jun, 2020 4 commits

KVM: check userspace_addr for all memslots · 09d952c9

Authored by Paolo Bonzini on Jun 01, 2020

The userspace_addr alignment and range checks are not performed for private
memory slots that are prepared by KVM itself. This is unnecessary and makes
it questionable to use __*_user functions to access memory later on. We also
rely on the userspace address being aligned since we have an entire family
of functions to map gfn to pfn.

Fortunately skipping the check is completely unnecessary. Only x86 uses
private memslots and their userspace_addr is obtained from vm_mmap,
therefore it must be below PAGE_OFFSET. In fact, any attempt to pass
an address above PAGE_OFFSET would have failed because such an address
would return true for kvm_is_error_hva.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

09d952c9

KVM: introduce kvm_read_guest_offset_cached() · 0958f0ce

Authored by Vitaly Kuznetsov on May 25, 2020

We already have kvm_write_guest_offset_cached(); introduce the read analogue.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200525144125.143875-5-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

0958f0ce

Revert "KVM: No need to retry for hva_to_pfn_remapped()" · a8387d0b

Authored by Paolo Bonzini on May 29, 2020

This reverts commit 5b494aea.
If unlocked==true then the vma pointer could be invalidated, so the 2nd
follow_pfn() is potentially racy: we do need to get out and redo
find_vma_intersection().

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

a8387d0b

KVM: check userspace_addr for all memslots · 45f08f4c

Authored by Paolo Bonzini on Jun 01, 2020

45f08f4c

16 May, 2020 4 commits

KVM: Fix spelling in code comments · 656012c7

Authored by Fuad Tabba on Apr 01, 2020

Fix spelling and typos (e.g., repeated words) in comments.

Signed-off-by: Fuad Tabba <tabba@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200401140310.29701-1-tabba@google.com

656012c7

kvm: add halt-polling cpu usage stats · cb953129

Authored by David Matlack on May 08, 2020

Add two new stats for exposing halt-polling cpu usage:
halt_poll_success_ns
halt_poll_fail_ns

The sum of these two stats is the total cpu time spent polling. "success"
means the VCPU polled until a virtual interrupt was delivered. "fail"
means the VCPU had to schedule out (either because the maximum poll time
was reached or it needed to yield the CPU).

To avoid touching every arch's kvm_vcpu_stat struct, only update and
export halt-polling cpu usage stats if we're on x86.

Exporting cpu usage as a u64 and in nanoseconds means we will overflow at
~500 years, which seems reasonably large.

Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Jon Cargille <jcargill@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20200508182240.68440-1-jcargill@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

cb953129
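
The "~500 years" estimate in this commit can be checked with a few lines of arithmetic. The helper below is a hypothetical sketch that converts the largest value a nanosecond counter can hold into years, assuming a 365.25-day year.

```c
#include <stdint.h>

/* Years until a nanosecond counter with the given maximum value wraps. */
static double years_until_wrap(uint64_t max_ns)
{
	const double ns_per_year = 365.25 * 24.0 * 3600.0 * 1e9;

	return (double)max_ns / ns_per_year;
}
```

Calling `years_until_wrap(UINT64_MAX)` gives a little over 584 years, the same order of magnitude as the figure quoted in the commit message.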

KVM: VMX: Optimize posted-interrupt delivery for timer fastpath · 379a3c8e

Authored by Wanpeng Li on Apr 28, 2020

While optimizing posted-interrupt delivery especially for the timer
fastpath scenario, I measured kvm_x86_ops.deliver_posted_interrupt()
to introduce substantial latency because the processor has to perform
all vmentry tasks, ack the posted interrupt notification vector,
read the posted-interrupt descriptor etc.

This is not only slow, it is also unnecessary when delivering an
interrupt to the current CPU (as is the case for the LAPIC timer) because
PIR->IRR and IRR->RVI synchronization is already performed on vmentry.
Therefore skip kvm_vcpu_trigger_posted_interrupt in this case, and
instead do vmx_sync_pir_to_irr() on the EXIT_FASTPATH_REENTER_GUEST
fastpath as well.

Tested-by: Haiwei Li <lihaiwei@tencent.com>
Cc: Haiwei Li <lihaiwei@tencent.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1588055009-12677-6-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

379a3c8e

KVM: No need to retry for hva_to_pfn_remapped() · 5b494aea

Authored by Peter Xu on Apr 16, 2020

hva_to_pfn_remapped() calls fixup_user_fault(), which has already
handled the retry gracefully.  Even if "unlocked" is set to true, it
means that we've got a VM_FAULT_RETRY inside fixup_user_fault(),
however the page fault has already retried and we should have the pfn
set correctly.  No need to do that again.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200416155906.267462-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

5b494aea

14 May, 2020 1 commit

kvm: Replace vcpu->swait with rcuwait · da4ad88c

Authored by Davidlohr Bueso on Apr 23, 2020

The use of any sort of waitqueue (simple or regular) for
waiting/waking vcpus has always been overkill and semantically
wrong. Because this is per-vcpu (which is blocked) there is
only ever a single waiting vcpu, thus no need for any sort of
queue.

As such, make use of the rcuwait primitive, with the following
considerations:

  - rcuwait already provides the proper barriers that serialize
  concurrent waiter and waker.

  - Task wakeup is done in rcu read critical region, with a
  stable task pointer.

  - Because there is no concurrency among waiters, we need
  not worry about rcuwait_wait_event() calls corrupting
  the wait->task. As a consequence, this saves the locking
  done in swait when modifying the queue. This also applies
  to per-vcore wait for powerpc kvm-hv.

The x86 tscdeadline_latency test mentioned in 8577370f
("KVM: Use simple waitqueue for vcpu->wq") shows that, on avg,
latency is reduced by around 15-20% with this change.

Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: kvmarm@lists.cs.columbia.edu
Cc: linux-mips@vger.kernel.org
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Message-Id: <20200424054837.5138-6-dave@stgolabs.net>
[Avoid extra logic changes. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

da4ad88c

08 May, 2020 1 commit

KVM: Introduce kvm_make_all_cpus_request_except() · 54163a34

Authored by Suravee Suthikulpanit on May 06, 2020

This allows making a request to all vcpus except the one
specified in the parameter.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <1588771076-73790-2-git-send-email-suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

54163a34

25 Apr, 2020 1 commit

kvm: add capability for halt polling · acd05785

Authored by David Matlack on Apr 17, 2020

KVM_CAP_HALT_POLL is a per-VM capability that lets userspace
control the halt-polling time, allowing halt-polling to be tuned or
disabled on particular VMs.

With dynamic halt-polling, a VM's VCPUs can poll for anywhere in the
range [0, halt_poll_ns] on each halt. KVM_CAP_HALT_POLL sets the
upper limit on the poll time.

Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Jon Cargille <jcargill@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20200417221446.108733-1-jcargill@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

acd05785
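
From userspace, a per-VM capability such as this one is typically enabled through the `KVM_ENABLE_CAP` ioctl on the VM file descriptor. The sketch below shows what that could look like; it assumes kernel headers new enough to define `KVM_CAP_HALT_POLL`, assumes `vm_fd` was obtained from `KVM_CREATE_VM`, and uses the hypothetical wrapper name `set_halt_poll_max_ns()`.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/*
 * Set the per-VM upper bound on halt-polling time, in nanoseconds.
 * Assumption: vm_fd is a VM file descriptor returned by KVM_CREATE_VM.
 * Returns the ioctl() result: 0 on success, -1 with errno set on failure.
 */
static int set_halt_poll_max_ns(int vm_fd, uint64_t max_ns)
{
	struct kvm_enable_cap cap;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_HALT_POLL;
	cap.args[0] = max_ns; /* upper limit of the [0, max] polling window */
	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}
```

Since the polling window is [0, limit], passing a limit of 0 should leave no room for polling on that VM, which matches the "tuned or disabled" wording in the commit message.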

21 Apr, 2020 4 commits

KVM: SVM: avoid infinite loop on NPF from bad address · e72436bc

Authored by Paolo Bonzini on Apr 17, 2020

When a nested page fault is taken from an address that does not have
a memslot associated to it, kvm_mmu_do_page_fault returns RET_PF_EMULATE
(via mmu_set_spte) and kvm_mmu_page_fault then invokes svm_need_emulation_on_page_fault.

The default answer there is to return false, but in this case this just
causes the page fault to be retried ad libitum.  Since this is not a
fast path, and the only other case where it is taken is an erratum,
just stick a kvm_vcpu_gfn_to_memslot check in there to detect the
common case where the erratum is not happening.

This fixes an infinite loop in the new set_memory_region_test.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

e72436bc

KVM: Remove redundant argument to kvm_arch_vcpu_ioctl_run · 1b94f6f8

Authored by Tianjia Zhang on Apr 16, 2020

In earlier versions of kvm, 'kvm_run' was an independent structure
and was not included in the vcpu structure. At present, 'kvm_run'
is already included in the vcpu structure, so the parameter
'kvm_run' is redundant.

This patch simplifies the function definition, removes the extra
'kvm_run' parameter, and extracts it from the 'kvm_vcpu' structure
if necessary.

Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Message-Id: <20200416051057.26526-1-tianjia.zhang@linux.alibaba.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

1b94f6f8

KVM: x86/mmu: Avoid an extra memslot lookup in try_async_pf() for L2 · c36b7150

Authored by Paolo Bonzini on Apr 16, 2020

Create a new function kvm_is_visible_memslot() and use it from
kvm_is_visible_gfn(); use the new function in try_async_pf() too,
to avoid an extra memslot lookup.

Opportunistically squish a multi-line comment into a single-line comment.

Note, the end result, KVM_PFN_NOSLOT, is unchanged.

Cc: Jim Mattson <jmattson@google.com>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

c36b7150

KVM: x86: move kvm_create_vcpu_debugfs after last failure point · 63d04348

Authored by Paolo Bonzini on Apr 01, 2020

The placement of kvm_create_vcpu_debugfs is more or less irrelevant, since
it cannot fail and userspace should not care about the debugfs entries until
it knows the vcpu has been created. Moving it after the last failure
point removes the need to remove the directory when unwinding the creation.

Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20200331224222.393439-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

63d04348

16 Apr, 2020 1 commit

KVM: remove redundant assignment to variable r · 788109c1

Authored by Colin Ian King on Apr 10, 2020

The variable r is being assigned a value that is never read
and it is being updated later with a new value.  The initialization is
redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Message-Id: <20200410113526.13822-1-colin.king@canonical.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

788109c1

31 Mar, 2020 1 commit

KVM: Pass kvm_init()'s opaque param to additional arch funcs · b9904085

Authored by Sean Christopherson on Mar 21, 2020

Pass @opaque to kvm_arch_hardware_setup() and
kvm_arch_check_processor_compat() to allow architecture specific code to
reference @opaque without having to stash it away in a temporary global
variable.  This will enable x86 to separate its vendor specific callback
ops, which are passed via @opaque, into "init" and "runtime" ops without
having to stash away the "init" ops.

No functional change intended.

Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Tested-by: Cornelia Huck <cohuck@redhat.com> #s390
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200321202603.19355-2-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

b9904085

openeuler / Kernel: last synced successfully more than 1 year ago