  1. 18 Jun 2021, 1 commit
  2. 27 May 2021, 3 commits
    • KVM: VMX: update vcpu posted-interrupt descriptor when assigning device · a2486020
      Marcelo Tosatti committed
      For VMX, when a vcpu enters HLT emulation, pi_post_block will:
      
      1) Add vcpu to per-cpu list of blocked vcpus.
      
      2) Program the posted-interrupt descriptor "notification vector"
      to POSTED_INTR_WAKEUP_VECTOR
      
      With interrupt remapping, an interrupt will set the PIR bit for the
      vector programmed for the device on the CPU, test-and-set the
      ON bit on the posted interrupt descriptor, and if the ON bit is clear
      generate an interrupt for the notification vector.
      
      This way, the target CPU wakes upon a device interrupt and wakes up
      the target vcpu.
      
      The problem is that pi_post_block only programs the notification vector
      if kvm_arch_has_assigned_device() is true. It's possible for the
      following to happen:
      
      1) vcpu V HLTs on pcpu P, kvm_arch_has_assigned_device is false,
      notification vector is not programmed
      2) device is assigned to VM
      3) device interrupts vcpu V, sets ON bit
      (notification vector not programmed, so pcpu P remains in idle)
      4) vcpu 0 IPIs vcpu V (in guest), but since pi descriptor ON bit is set,
      kvm_vcpu_kick is skipped
      5) vcpu 0 busy spins on vcpu V's response for several seconds, until
      RCU watchdog NMIs all vCPUs.
      
      To fix this, use the start_assignment kvm_x86_ops callback to kick
      vcpus out of the halt loop, so the notification vector is
      properly reprogrammed to the wakeup vector.
      Reported-by: Pei Zhang <pezhang@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Message-Id: <20210526172014.GA29007@fuller.cnet>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
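To make the posting sequence above concrete, here is a minimal userspace model of it. The structure and function names (`struct pi_desc_model`, `pi_desc_init()`, `pi_post_interrupt()`, `pi_test_pir()`) are hypothetical simplifications, not the kernel's `struct pi_desc` or the VT-d hardware; C11 atomics stand in for the atomic updates the IOMMU performs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified posted-interrupt descriptor (not the real layout). */
struct pi_desc_model {
    _Atomic uint64_t pir[4]; /* 256-bit Posted-Interrupt Requests bitmap */
    atomic_bool on;          /* Outstanding Notification bit */
    uint8_t nv;              /* notification vector currently programmed */
};

static void pi_desc_init(struct pi_desc_model *pi, uint8_t nv)
{
    for (int i = 0; i < 4; i++)
        atomic_init(&pi->pir[i], 0);
    atomic_init(&pi->on, false);
    pi->nv = nv;
}

/*
 * Model of posting a remapped device interrupt: set the PIR bit for
 * @vector, test-and-set ON, and return true only if ON was previously
 * clear, i.e. only then must a notification interrupt on pi->nv be sent.
 */
static bool pi_post_interrupt(struct pi_desc_model *pi, uint8_t vector)
{
    atomic_fetch_or(&pi->pir[vector / 64], UINT64_C(1) << (vector % 64));
    return !atomic_exchange(&pi->on, true);
}

/* Check whether @vector is pending in the PIR bitmap. */
static bool pi_test_pir(struct pi_desc_model *pi, uint8_t vector)
{
    return (atomic_load(&pi->pir[vector / 64]) >> (vector % 64)) & 1;
}
```

In this model only the first post returns true; later posts merely set PIR bits. That is why, in step 3 above, a notification aimed at a vector that does not wake the idle pCPU leaves ON set, and a later sender that sees ON set skips the kick.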
    • KVM: rename KVM_REQ_PENDING_TIMER to KVM_REQ_UNBLOCK · 084071d5
      Marcelo Tosatti committed
      KVM_REQ_UNBLOCK will be used to exit a vcpu from
      its inner vcpu halt emulation loop.
      
      Rename KVM_REQ_PENDING_TIMER to KVM_REQ_UNBLOCK, switch
      PowerPC to arch specific request bit.
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Message-Id: <20210525134321.303768132@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: PPC: exit halt polling on need_resched() · 6bd5b743
      Wanpeng Li committed
      This is inspired by commit 262de410 (kvm: exit halt polling on
      need_resched() as well). Because PPC implements arch-specific halt
      polling logic, we have to add the need_resched() check there as well.
      This patch adds a helper function that can be shared between the
      book3s and generic halt-polling loops.
      Reviewed-by: David Matlack <dmatlack@google.com>
      Reviewed-by: Venkatesh Srinivas <venkateshs@chromium.org>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Venkatesh Srinivas <venkateshs@chromium.org>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: David Matlack <dmatlack@google.com>
      Cc: Paul Mackerras <paulus@ozlabs.org>
      Cc: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1621339235-11131-1-git-send-email-wanpengli@tencent.com>
      [Make the function inline. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
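The shared helper amounts to a single polling predicate that both the generic and the book3s loops consult. The sketch below models it in userspace; `struct poll_env`, `vcpu_can_poll()` and `halt_poll()` are hypothetical names, the "only runnable task" condition is an assumption about the existing generic loop, and a simulated nanosecond clock replaces ktime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the host scheduler state a poll loop queries. */
struct poll_env {
    unsigned int nr_running; /* runnable tasks on this CPU */
    bool need_resched;       /* the scheduler wants this CPU back */
};

/* Shared predicate: keep polling only while this is the only runnable
 * task, no reschedule is pending, and the polling window is still open. */
static inline bool vcpu_can_poll(const struct poll_env *env,
                                 int64_t now_ns, int64_t stop_ns)
{
    return env->nr_running == 1 && !env->need_resched && now_ns < stop_ns;
}

/* Hypothetical poll loop (either flavour): advance a simulated clock by
 * @step_ns until a wakeup arrives at @wakeup_ns (-1 means never) or the
 * predicate says stop. Returns true if the wakeup was seen while polling. */
static bool halt_poll(const struct poll_env *env, int64_t start_ns,
                      int64_t stop_ns, int64_t step_ns, int64_t wakeup_ns)
{
    for (int64_t now = start_ns; vcpu_can_poll(env, now, stop_ns); now += step_ns) {
        if (wakeup_ns >= 0 && now >= wakeup_ns)
            return true;
    }
    return false;
}
```

Putting the condition in one inline helper keeps the two loops from drifting apart again, which is how the PPC loop missed the need_resched() check in the first place.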
  3. 07 May 2021, 1 commit
  4. 03 May 2021, 1 commit
  5. 22 Apr 2021, 2 commits
    • KVM: Boost vCPU candidate in user mode which is delivering interrupt · 52acd22f
      Wanpeng Li committed
      Both lock holder vCPUs and IPI receivers that have halted are candidates
      for boost. However, the PLE handler was originally designed to deal with
      the lock holder preemption problem. The Intel PLE occurs when the
      spinlock waiter is in kernel mode. This assumption doesn't hold for IPI
      receivers, which can be in either kernel or user mode, so a vCPU
      candidate in user mode will not be boosted even if it should respond to
      IPIs. Some benchmarks like pbzip2, swaptions etc. do the TLB shootdown
      in kernel mode but spend most of their time in user mode. This can lead
      to a large number of continuous PLE events, because the IPI sender
      causes PLE events repeatedly until the receiver is scheduled, while the
      receiver is not a candidate for a boost.
      
      This patch boosts a vCPU candidate in user mode that is delivering an
      interrupt. We observe that pbzip2 runs 10% faster in a 96-vCPU VM in an
      over-subscribed scenario (the host machine is a 2-socket, 48-core,
      96-HT Intel CLX box). There is no performance regression for other
      benchmarks like Unixbench spawn (which mostly contends on a read/write
      lock in kernel mode) and ebizzy (which mostly contends on a read/write
      semaphore and does TLB shootdowns in kernel mode).
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1618542490-14756-1-git-send-email-wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
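The selection rule described above can be sketched as a filter over candidate vCPUs. This is a simplified model, not the kernel's directed-yield code; `struct vcpu_model`, its fields, `is_boost_candidate()` and `pick_yield_target()` are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-vCPU state relevant to directed-yield selection. */
struct vcpu_model {
    bool preempted;             /* preempted while still runnable */
    bool in_kernel;             /* preempted while in guest kernel mode */
    bool has_pending_interrupt; /* e.g. an IPI waiting to be delivered */
};

/* Return true if @v may be boosted by the PLE handler. */
static bool is_boost_candidate(const struct vcpu_model *v, bool yield_to_kernel_mode)
{
    if (!v->preempted)
        return false;
    /* Previously every user-mode vCPU was skipped here; with the change, a
     * user-mode vCPU that has an interrupt pending is still considered. */
    if (yield_to_kernel_mode && !v->in_kernel && !v->has_pending_interrupt)
        return false;
    return true;
}

/* Pick the index of the first candidate other than @self, or -1. */
static int pick_yield_target(const struct vcpu_model *vcpus, size_t n,
                             size_t self, bool yield_to_kernel_mode)
{
    for (size_t i = 0; i < n; i++) {
        if (i == self)
            continue;
        if (is_boost_candidate(&vcpus[i], yield_to_kernel_mode))
            return (int)i;
    }
    return -1;
}
```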
    • KVM: x86: Support KVM VMs sharing SEV context · 54526d1f
      Nathan Tempelman committed
      Add a capability for userspace to mirror SEV encryption context from
      one vm to another. On our side, this is intended to support a
      Migration Helper vCPU, but it can also be used generically to support
      other in-guest workloads scheduled by the host. The intention is for
      the primary guest and the mirror to have nearly identical memslots.
      
      The primary benefits of this are that:
      1) The VMs do not share KVM contexts (think APIC/MSRs/etc), so they
      can't accidentally clobber each other.
      2) The VMs can have different memory-views, which is necessary for post-copy
      migration (the migration vCPUs on the target need to read and write to
      pages, when the primary guest would VMEXIT).
      
      This does not change the threat model for AMD SEV. Any memory involved
      is still owned by the primary guest and its initial state is still
      attested to through the normal SEV_LAUNCH_* flows. If userspace wanted
      to circumvent SEV, they could achieve the same effect by simply attaching
      a vCPU to the primary VM.
      This patch deliberately leaves userspace in charge of the memslots for the
      mirror, as it already has the power to mess with them in the primary guest.
      
      This patch does not support SEV-ES (much less SNP), as it does not
      handle handing off attested VMSAs to the mirror.
      
      For additional context, we need a Migration Helper because SEV PSP
      migration is far too slow for our live migration on its own. Using
      an in-guest migrator lets us speed this up significantly.
      Signed-off-by: Nathan Tempelman <natet@google.com>
      Message-Id: <20210408223214.2582277-1-natet@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  6. 20 Apr 2021, 3 commits
  7. 17 Apr 2021, 7 commits
    • KVM: Take mmu_lock when handling MMU notifier iff the hva hits a memslot · 8931a454
      Sean Christopherson committed
      Defer acquiring mmu_lock in the MMU notifier paths until a "hit" has been
      detected in the memslots, i.e. don't take the lock for notifications that
      don't affect the guest.
      
      For small VMs, spurious locking is a minor annoyance.  And for "volatile"
      setups where the majority of notifications _are_ relevant, this barely
      qualifies as an optimization.
      
      But, for large VMs (hundreds of threads) with static setups, e.g. no
      page migration, no swapping, etc..., the vast majority of MMU notifier
      callbacks will be unrelated to the guest, e.g. will often be in response
      to the userspace VMM adjusting its own virtual address space.  In such
      large VMs, acquiring mmu_lock can be painful as it blocks vCPUs from
      handling page faults.  In some scenarios it can even be "fatal" in the
      sense that it causes unacceptable brownouts, e.g. when rebuilding huge
      pages after live migration, a significant percentage of vCPUs will be
      attempting to handle page faults.
      
      x86's TDP MMU implementation is especially susceptible to spurious
      locking because it takes mmu_lock for read when handling page faults.
      Because rwlock is fair, a single writer will stall future readers, while
      the writer is itself stalled waiting for in-progress readers to complete.
      This is exacerbated by the MMU notifiers often firing multiple times in
      quick succession, e.g. moving a page will (always?) invoke three separate
      notifiers: .invalidate_range_start(), .invalidate_range_end(), and
      .change_pte().  Unnecessarily taking mmu_lock each time means even a
      single spurious sequence can be problematic.
      
      Note, this optimizes only the unpaired callbacks.  Optimizing the
      .invalidate_range_{start,end}() pairs is more complex and will be done in
      a future patch.
      Suggested-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210402005658.3024832-9-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
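A minimal model of "lock only on a hit": walk the memslots and acquire the lock lazily at the first overlapping slot, releasing it once at the end. `struct slot_model`, `struct lock_model`, `lock_acquire()`, `lock_release()` and `handle_hva_range()` are hypothetical stand-ins; a counter replaces the real mmu_lock so the behavior is observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memslot: a host-virtual range [hva_start, hva_end). */
struct slot_model {
    uint64_t hva_start;
    uint64_t hva_end;
};

/* Hypothetical stand-in for mmu_lock that counts acquisitions. */
struct lock_model {
    bool held;
    unsigned int acquisitions;
};

static void lock_acquire(struct lock_model *l)
{
    assert(!l->held);
    l->held = true;
    l->acquisitions++;
}

static void lock_release(struct lock_model *l)
{
    assert(l->held);
    l->held = false;
}

/*
 * Handle a notifier for [start, end): take the lock only when the first
 * overlapping slot is found, drop it once after the walk. Returns the
 * number of slots that overlapped.
 */
static size_t handle_hva_range(const struct slot_model *slots, size_t nslots,
                               uint64_t start, uint64_t end, struct lock_model *lock)
{
    bool locked = false;
    size_t hits = 0;

    for (size_t i = 0; i < nslots; i++) {
        if (start >= slots[i].hva_end || end <= slots[i].hva_start)
            continue; /* no overlap: this slot is unaffected */
        if (!locked) {
            lock_acquire(lock);
            locked = true;
        }
        hits++; /* the per-slot handler would run here, under the lock */
    }
    if (locked)
        lock_release(lock);
    return hits;
}
```

A notification that misses every memslot, such as the VMM adjusting its own address space, never touches the lock in this model.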
    • KVM: Move MMU notifier's mmu_lock acquisition into common helper · f922bd9b
      Sean Christopherson committed
      Acquire and release mmu_lock in the __kvm_handle_hva_range() helper
      instead of requiring the caller to do the same.  This paves the way for
      future patches to take mmu_lock if and only if an overlapping memslot is
      found, without also having to introduce the on_lock() shenanigans used
      to manipulate the notifier count and sequence.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210402005658.3024832-8-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Kill off the old hva-based MMU notifier callbacks · b4c5936c
      Sean Christopherson committed
      Yank out the hva-based MMU notifier APIs now that all architectures that
      use the notifiers have moved to the gfn-based APIs.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210402005658.3024832-7-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Move x86's MMU notifier memslot walkers to generic code · 3039bcc7
      Sean Christopherson committed
      Move the hva->gfn lookup for MMU notifiers into common code.  Every arch
      does a similar lookup, and some arch code is all but identical across
      multiple architectures.
      
      In addition to consolidating code, this will allow introducing
      optimizations that will benefit all architectures without incurring
      multiple walks of the memslots, e.g. by taking mmu_lock if and only if a
      relevant range exists in the memslots.
      
      The use of __always_inline to avoid indirect call retpolines, as done by
      x86, may also benefit other architectures.
      
      Consolidating the lookups also fixes a wart in x86, where the legacy MMU
      and TDP MMU each do their own memslot walks.
      
      Lastly, future enhancements to the memslot implementation, e.g. to add an
      interval tree to track host address, will need to touch far less arch
      specific code.
      
      MIPS, PPC, and arm64 will be converted one at a time in future patches.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210402005658.3024832-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
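The per-slot hva->gfn lookup that this commit moves into common code can be sketched as: clamp the notifier's hva range to the slot, then translate both ends. The struct, the helper names and the 4 KiB page size are assumptions for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumed 4 KiB pages */

/* Hypothetical memslot: @npages guest pages starting at @base_gfn, backed
 * by host memory starting at @userspace_addr. */
struct memslot_model {
    uint64_t base_gfn;
    uint64_t npages;
    uint64_t userspace_addr;
};

static uint64_t hva_to_gfn(const struct memslot_model *s, uint64_t hva)
{
    return s->base_gfn + ((hva - s->userspace_addr) >> MODEL_PAGE_SHIFT);
}

/*
 * Clamp [start, end) to the slot and convert it to the gfn range
 * [*gfn_start, *gfn_end). Returns false if the ranges don't overlap.
 */
static bool hva_range_to_gfn_range(const struct memslot_model *s,
                                   uint64_t start, uint64_t end,
                                   uint64_t *gfn_start, uint64_t *gfn_end)
{
    uint64_t slot_end = s->userspace_addr + (s->npages << MODEL_PAGE_SHIFT);
    uint64_t hva_start = start > s->userspace_addr ? start : s->userspace_addr;
    uint64_t hva_end = end < slot_end ? end : slot_end;

    if (hva_start >= hva_end)
        return false;
    *gfn_start = hva_to_gfn(s, hva_start);
    /* Round the exclusive end up so a partially covered last page counts. */
    *gfn_end = hva_to_gfn(s, hva_end + (UINT64_C(1) << MODEL_PAGE_SHIFT) - 1);
    return true;
}
```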
    • KVM: Assert that notifier count is elevated in .change_pte() · c13fda23
      Sean Christopherson committed
      In KVM's .change_pte() notification callback, replace the notifier
      sequence bump with a WARN_ON assertion that the notifier count is
      elevated.  An elevated count provides stricter protections than bumping
      the sequence, and the sequence is guaranteed to be bumped before the
      count hits zero.
      
      When .change_pte() was added by commit 828502d3 ("ksm: add
      mmu_notifier set_pte_at_notify()"), bumping the sequence was necessary
      as .change_pte() would be invoked without any surrounding notifications.
      
      However, since commit 6bdb913f ("mm: wrap calls to set_pte_at_notify
      with invalidate_range_start and invalidate_range_end"), all calls to
      .change_pte() are guaranteed to be surrounded by start() and end(), and
      so are guaranteed to run with an elevated notifier count.
      
      Note, wrapping .change_pte() with .invalidate_range_{start,end}() is a
      bug of sorts, as invalidating the secondary MMU's (KVM's) PTE defeats
      the purpose of .change_pte().  Every arch's kvm_set_spte_hva() assumes
      .change_pte() is called when the relevant SPTE is present in KVM's MMU,
      as the original goal was to accelerate Kernel Samepage Merging (KSM) by
      updating KVM's SPTEs without requiring a VM-Exit (due to invalidating
      the SPTE).  I.e. it means that .change_pte() is effectively dead code
      on _all_ architectures.
      
      x86 and MIPS are clearcut nops if the old SPTE is not-present, and that
      is guaranteed due to the prior invalidation.  PPC simply unmaps the SPTE,
      which again should be a nop due to the invalidation.  arm64 is a bit
      murky, but it's also likely a nop because kvm_pgtable_stage2_map() is
      called without a cache pointer, which means it will map an entry if and
      only if an existing PTE was found.
      
      For now, take advantage of the bug to simplify future consolidation of
      KVM's MMU notifier code.  Doing so will not greatly complicate fixing
      .change_pte(), assuming it's even worth fixing.  .change_pte() has been
      broken for 8+ years and no one has complained.  Even if there are
      KSM+KVM users that care deeply about its performance, the benefits of
      avoiding VM-Exits via .change_pte() need to be reevaluated to justify
      the added complexity and testing burden.  Ripping out .change_pte()
      entirely would be a lot easier.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
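The count/sequence protocol that the new assertion relies on can be modeled in a few lines. This is a single-threaded sketch with hypothetical names (`struct notifier_state` and the four functions); the real code additionally needs memory barriers between the sequence and count accesses.

```c
#include <assert.h>

/* Hypothetical model of KVM's notifier bookkeeping. */
struct notifier_state {
    unsigned long seq; /* bumped when an invalidation completes */
    long count;        /* number of in-progress invalidations */
};

static void invalidate_range_start(struct notifier_state *s)
{
    s->count++;
}

static void invalidate_range_end(struct notifier_state *s)
{
    /* Bump the sequence before dropping the count, so a fault that
     * observes count == 0 also observes the new sequence. */
    s->seq++;
    s->count--;
    assert(s->count >= 0);
}

/* After the change: no sequence bump, only an assertion that the count is
 * elevated, which holds because .change_pte() now always runs between
 * .invalidate_range_start() and .invalidate_range_end(). */
static void change_pte(struct notifier_state *s)
{
    assert(s->count > 0);
}

/* A page fault that snapshotted @mmu_seq must retry if an invalidation is
 * in progress or one has completed since the snapshot. */
static int fault_must_retry(const struct notifier_state *s, unsigned long mmu_seq)
{
    return s->count != 0 || s->seq != mmu_seq;
}
```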
    • KVM: Explicitly use GFP_KERNEL_ACCOUNT for 'struct kvm_vcpu' allocations · 85f47930
      Sean Christopherson committed
      Use GFP_KERNEL_ACCOUNT when allocating vCPUs to make it more obvious that
      the allocations are accounted, to make it easier to audit KVM's
      allocations in the future, and to be consistent with other cache usage in
      KVM.
      
      When using SLAB/SLUB, this is a nop as the cache itself is created with
      SLAB_ACCOUNT.
      
      When using SLOB, there are caveats within caveats.  SLOB doesn't honor
      SLAB_ACCOUNT, so passing GFP_KERNEL_ACCOUNT will result in vCPU
      allocations now being accounted.   But, even that depends on internal
      SLOB details as SLOB will only go to the page allocator when its cache is
      depleted.  That just happens to be extremely likely for vCPUs because the
      size of kvm_vcpu is larger than a page for almost all combinations of
      architecture and page size.  Whether or not the SLOB behavior is by
      design is unknown; it's just as likely that no SLOB users care about
      accounting and so no one has bothered to implement support in SLOB.
      Regardless, accounting vCPU allocations will not break SLOB+KVM+cgroup
      users, if any exist.
      Reviewed-by: Wanpeng Li <kernellwp@gmail.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210406190740.4055679-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: Move arm64's MMU notifier trace events to generic code · 501b9185
      Sean Christopherson committed
      Move arm64's MMU notifier trace events into common code in preparation
      for doing the hva->gfn lookup in common code.  The alternative would be
      to trace the gfn instead of hva, but that's not obviously better and
      could also be done in common code.  Tracing the notifiers is also quite
      handy for debug regardless of architecture.
      
      Remove a completely redundant tracepoint from PPC e500.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210326021957.1424875-10-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  8. 23 Feb 2021, 1 commit
    • KVM: x86/mmu: Consider the hva in mmu_notifier retry · 4a42d848
      David Stevens committed
      Track the range being invalidated by mmu_notifier and skip page fault
      retries if the fault address is not affected by the in-progress
      invalidation. Handle concurrent invalidations by finding the minimal
      range which includes all ranges being invalidated. Although the combined
      range may include unrelated addresses and cannot be shrunk as individual
      invalidation operations complete, it is unlikely the marginal gains of
      proper range tracking are worth the additional complexity.
      
      The primary benefit of this change is the reduction in the likelihood of
      extreme latency when handling a page fault due to another thread having
      been preempted while modifying host virtual addresses.
      Signed-off-by: David Stevens <stevensd@chromium.org>
      Message-Id: <20210222024522.1751719-3-stevensd@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
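The range-aware retry check can be modeled as follows: concurrent invalidations are merged into one minimal covering range, and a fault only retries when its hva falls inside that range or when an invalidation completed after the fault snapshotted the sequence. `struct inval_tracker` and the three functions are hypothetical names; the model is single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: in-progress invalidations tracked as one merged range. */
struct inval_tracker {
    long count;           /* in-progress invalidations */
    unsigned long seq;    /* bumped as each invalidation completes */
    uint64_t range_start; /* union of in-progress ranges: [start, end) */
    uint64_t range_end;
};

static void inval_start(struct inval_tracker *t, uint64_t start, uint64_t end)
{
    if (t->count++ == 0) {
        t->range_start = start;
        t->range_end = end;
    } else {
        /* Grow to the minimal range covering all in-progress invalidations;
         * it is not shrunk until every one of them has completed. */
        if (start < t->range_start)
            t->range_start = start;
        if (end > t->range_end)
            t->range_end = end;
    }
}

static void inval_end(struct inval_tracker *t)
{
    t->seq++;
    t->count--;
    assert(t->count >= 0);
}

/* Retry only if an in-progress invalidation may cover @hva, or if any
 * invalidation completed since the fault snapshotted @mmu_seq. */
static bool fault_retry_hva(const struct inval_tracker *t,
                            unsigned long mmu_seq, uint64_t hva)
{
    if (t->count && hva >= t->range_start && hva < t->range_end)
        return true;
    return t->seq != mmu_seq;
}
```

The third assertion in the usage below shows the accepted imprecision: an unrelated hva that lies between two in-progress ranges still retries, because the merged range cannot be shrunk.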
  9. 09 Feb 2021, 2 commits
    • KVM: Use kvm_pfn_t for local PFN variable in hva_to_pfn_remapped() · a9545779
      Sean Christopherson committed
      Use kvm_pfn_t, a.k.a. u64, for the local 'pfn' variable when retrieving
      a so called "remapped" hva/pfn pair.  In theory, the hva could resolve to
      a pfn in high memory on a 32-bit kernel.
      
      This bug was inadvertently exposed by commit bd2fae8d ("KVM: do not
      assume PTE is writable after follow_pfn"), which added an error PFN value
      to the mix, causing gcc to complain about overflowing the unsigned long.
      
        arch/x86/kvm/../../../virt/kvm/kvm_main.c: In function ‘hva_to_pfn_remapped’:
        include/linux/kvm_host.h:89:30: error: conversion from ‘long long unsigned int’
                                        to ‘long unsigned int’ changes value from
                                        ‘9218868437227405314’ to ‘2’ [-Werror=overflow]
         89 | #define KVM_PFN_ERR_RO_FAULT (KVM_PFN_ERR_MASK + 2)
            |                              ^
      virt/kvm/kvm_main.c:1935:9: note: in expansion of macro ‘KVM_PFN_ERR_RO_FAULT’
      
      Cc: stable@vger.kernel.org
      Fixes: add6a0cd ("KVM: MMU: try to fix up page faults before giving up")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210208201940.1258328-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
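The overflow in the compiler message can be reproduced with fixed-width types. The mask below is assumed to match `KVM_PFN_ERR_MASK` as defined at the time (it reproduces the value 9218868437227405314 quoted in the error), a `uint32_t` stands in for a 32-bit `unsigned long`, and all `model`-suffixed names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the kvm_host.h encoding: error PFNs live in the top
 * bits of a 64-bit value. */
typedef uint64_t kvm_pfn_model_t;
#define PFN_ERR_MASK     (0x7ffULL << 52)
#define PFN_ERR_RO_FAULT (PFN_ERR_MASK + 2)

/* Same test as is_error_pfn(): any bit of the error mask set. */
static int pfn_is_error(kvm_pfn_model_t pfn)
{
    return (pfn & PFN_ERR_MASK) != 0;
}

/* What a 32-bit 'unsigned long' local would hold after the assignment. */
static uint32_t store_in_32bit_ulong(kvm_pfn_model_t pfn)
{
    return (uint32_t)pfn; /* keeps only the low 32 bits */
}
```

After truncation the error PFN collapses to 2, which looks like a valid frame number, so the error would go unnoticed; keeping the local as `kvm_pfn_t` preserves the error bits.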
    • mm: provide a saner PTE walking API for modules · 9fd6dad1
      Paolo Bonzini committed
      Currently, the follow_pfn function is exported for modules but
      follow_pte is not.  However, follow_pfn is very easy to misuse,
      because it does not provide protections (so most of its callers
      assume the page is writable!) and because it returns after having
      already unlocked the page table lock.
      
      Provide instead a simplified version of follow_pte that does
      not have the pmdpp and range arguments.  The older version
      survives as follow_invalidate_pte() for use by fs/dax.c.
      Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  10. 04 Feb 2021, 2 commits
    • KVM: x86/mmu: Use an rwlock for the x86 MMU · 531810ca
      Ben Gardon committed
      Add a read / write lock to be used in place of the MMU spinlock on x86.
      The rwlock will enable the TDP MMU to handle page faults, and other
      operations in parallel in future commits.
      Reviewed-by: Peter Feiner <pfeiner@google.com>
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20210202185734.1680553-19-bgardon@google.com>
      [Introduce virt/kvm/mmu_lock.h - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: do not assume PTE is writable after follow_pfn · bd2fae8d
      Paolo Bonzini committed
      In order to convert an HVA to a PFN, KVM usually tries to use
      the get_user_pages family of functions.  This however is not
      possible for VM_IO vmas; in that case, KVM instead uses follow_pfn.
      
      In doing this, however, KVM loses the information on whether the
      PFN is writable.  That is usually not a problem because the main
      use of VM_IO vmas with KVM is for BARs in PCI device assignment;
      however, it is a bug.  To fix it, use follow_pte and check pte_write
      while under the protection of the PTE lock.  The information can
      be used to fail hva_to_pfn_remapped or passed back to the
      caller via *writable.
      
      Usage of follow_pfn was introduced in commit add6a0cd ("KVM: MMU: try to fix
      up page faults before giving up", 2016-07-05); however, even older versions
      have the same issue, all the way back to commit 2e2e3738 ("KVM:
      Handle vma regions with no backing page", 2008-07-20), as they also did
      not check whether the PFN was writable.
      
      Fixes: 2e2e3738 ("KVM: Handle vma regions with no backing page")
      Reported-by: David Stevens <stevensd@google.com>
      Cc: 3pvd@google.com
      Cc: Jann Horn <jannh@google.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
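The fixed decision logic can be sketched as a function over a snapshot of the host PTE taken under the PTE lock. `struct pte_model` and `remapped_hva_to_pfn()` are hypothetical, and the plain `-EFAULT` return is a simplification: the sketch does not claim to reproduce how the kernel actually reports a write fault on a read-only PTE.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the host PTE for an hva. */
struct pte_model {
    bool present;
    bool writable;
    uint64_t pfn;
};

/*
 * Model of the fixed logic: refuse a write fault on a read-only PTE, and
 * report writability to the caller instead of assuming it.
 * Returns 0 on success or a negative errno (simplified).
 */
static int remapped_hva_to_pfn(const struct pte_model *pte, bool write_fault,
                               uint64_t *pfn, bool *writable)
{
    if (!pte->present)
        return -EFAULT;
    if (write_fault && !pte->writable)
        return -EFAULT;
    if (writable)
        *writable = pte->writable; /* no longer assumed to be true */
    *pfn = pte->pfn;
    return 0;
}
```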
  11. 21 Jan 2021, 1 commit
  12. 08 Jan 2021, 1 commit
    • kvm: check tlbs_dirty directly · 88bf56d0
      Lai Jiangshan committed
      In kvm_mmu_notifier_invalidate_range_start(), tlbs_dirty is used as:
              need_tlb_flush |= kvm->tlbs_dirty;
      with need_tlb_flush's type being int and tlbs_dirty's type being long.
      
      This means that tlbs_dirty is always used as an int and its higher 32
      bits are ignored.  We need to check tlbs_dirty correctly, and this
      change checks it directly without propagating it to need_tlb_flush.
      
      Note: it's _extremely_ unlikely this neglecting of higher 32 bits can
      cause problems in practice.  It would require encountering tlbs_dirty
      on a 4 billion count boundary, and KVM would need to be using shadow
      paging or be running a nested guest.
      
      Cc: stable@vger.kernel.org
      Fixes: a4ee1ca4 ("KVM: MMU: delay flush all tlbs on sync_page path")
      Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
      Message-Id: <20201217154118.16497-1-jiangshanlai@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
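A minimal reproduction of the truncation. Fixed-width unsigned types are used so that the narrowing conversion is well defined, whereas the kernel code used `int` and `long`; both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Buggy pattern: OR-ing a 64-bit counter into a 32-bit flag keeps only the
 * low 32 bits (the cast makes the implicit narrowing visible). */
static bool needs_flush_buggy(bool need_tlb_flush, uint64_t tlbs_dirty)
{
    uint32_t flush = need_tlb_flush;
    flush |= (uint32_t)tlbs_dirty; /* high 32 bits silently dropped */
    return flush != 0;
}

/* Fixed pattern: test the counter directly instead of folding it in. */
static bool needs_flush_fixed(bool need_tlb_flush, uint64_t tlbs_dirty)
{
    return need_tlb_flush || tlbs_dirty != 0;
}
```

The two only disagree when the counter's low 32 bits are all zero, which matches the "4 billion count boundary" caveat above.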
  13. 20 Dec 2020, 1 commit
  14. 15 Nov 2020, 5 commits
    • KVM: Don't allocate dirty bitmap if dirty ring is enabled · 044c59c4
      Peter Xu committed
      Because the kvm dirty ring and the kvm dirty log are used exclusively,
      avoid creating the dirty_bitmap when the kvm dirty ring is enabled.
      Meanwhile, since the dirty_bitmap will now be created conditionally,
      we can't use it as a sign of "whether this memory slot enabled
      dirty tracking".  Change such users to check against the kvm
      memory slot flags.
      
      Note that a kvm memory slot can still get its dirty_bitmap allocated:
      _if_ the memory slots are created before the dirty ring is enabled
      and with the dirty tracking capability enabled, they will still get a
      dirty_bitmap.  However, it should not hurt much (e.g., the bitmaps will
      always be freed if they are there), and real users normally won't
      trigger this, because in most cases the dirty tracking flag is only
      applied to kvm slots just before migration starts, which should be
      far later than kvm initialization (VM start).
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20201001012226.5868-1-peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
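The allocation rule can be sketched as two predicates: "is dirty tracking on?" is answered from the slot flags, and a bitmap is only needed when tracking is on and the dirty ring is not in use. The flag value is assumed to match `KVM_MEM_LOG_DIRTY_PAGES`; the structs and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_MEM_LOG_DIRTY_PAGES (1u << 0) /* assumed KVM_MEM_LOG_DIRTY_PAGES */

struct vm_model {
    uint32_t dirty_ring_size; /* 0 means the dirty ring is not enabled */
};

struct slot_flags_model {
    uint32_t flags;
};

/* Dirty tracking is derived from the slot flags, not from whether a
 * bitmap happens to be allocated. */
static bool slot_dirty_track_enabled(const struct slot_flags_model *s)
{
    return (s->flags & MODEL_MEM_LOG_DIRTY_PAGES) != 0;
}

/* Allocate a bitmap only if tracking is on and the ring is not in use. */
static bool slot_needs_dirty_bitmap(const struct vm_model *vm,
                                    const struct slot_flags_model *s)
{
    return slot_dirty_track_enabled(s) && vm->dirty_ring_size == 0;
}
```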
    • KVM: Make dirty ring exclusive to dirty bitmap log · b2cc64c4
      Peter Xu committed
      There's no good reason to use both the dirty bitmap logging and the
      new dirty ring buffer to track dirty bits.  We could even support both
      of them at the same time, but that would complicate things for little
      benefit.  Let's simply make it the rule, before we enable the dirty
      ring on any arch, that these two interfaces cannot be used together.
      
      The big world switch would be KVM_CAP_DIRTY_LOG_RING capability
      enablement.  That's where we'll switch from the default dirty logging
      way to the dirty ring way.  As long as kvm->dirty_ring_size is set up
      correctly, we'll once and for all switch to the dirty ring buffer mode
      for the current virtual machine.
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20201001012224.5818-1-peterx@redhat.com>
      [Change errno from EINVAL to ENXIO. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: X86: Implement ring-based dirty memory tracking · fb04a1ed
      Peter Xu committed
      This patch is heavily based on previous work from Lei Cao
      <lei.cao@stratus.com> and Paolo Bonzini <pbonzini@redhat.com>. [1]
      
      KVM currently uses large bitmaps to track dirty memory.  These bitmaps
      are copied to userspace when userspace queries KVM for its dirty page
      information.  The use of bitmaps is mostly sufficient for live
      migration, as large parts of memory are dirtied from one log-dirty
      pass to another.  However, in a checkpointing system, the number of
      dirty pages is small and in fact it is often bounded---the VM is
      paused when it has dirtied a pre-defined number of pages. Traversing a
      large, sparsely populated bitmap to find set bits is time-consuming,
      as is copying the bitmap to user-space.
      
      A similar issue arises for live migration when guest memory is huge
      but only a small fraction of it is dirtied.  In that case, for each
      dirty sync we need to pull the whole dirty bitmap to userspace and
      analyse every bit even if it's mostly zeros.
      
      The preferred data structure for above scenarios is a dense list of
      guest frame numbers (GFN).  This patch series stores the dirty list in
      kernel memory that can be memory mapped into userspace to allow speedy
      harvesting.
      
      This patch enables dirty ring for X86 only.  However it should be
      easily extended to other archs as well.
      
      [1] https://patchwork.kernel.org/patch/10471409/

      Signed-off-by: Lei Cao <lei.cao@stratus.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20201001012222.5767-1-peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
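The core idea, a dense list of dirty GFNs produced on the kernel side and harvested by userspace, can be sketched as a power-of-two ring with free-running indices. The entry layout, the ring size and all names are hypothetical simplifications; the real shared-memory interface also carries per-entry flags and needs memory ordering between the two sides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical; must be a power of two for the mask */

/* Hypothetical, simplified ring entry. */
struct dirty_gfn_model {
    uint32_t slot;
    uint64_t offset; /* page offset within the memslot */
};

struct dirty_ring_model {
    struct dirty_gfn_model entries[RING_SIZE];
    uint32_t dirty_index; /* next entry the producer writes */
    uint32_t reset_index; /* next entry the consumer harvests */
};

/* Producer side: record one dirty page; returns false if the ring is full,
 * in which case the vCPU would have to exit so userspace can harvest. */
static bool ring_push(struct dirty_ring_model *r, uint32_t slot, uint64_t offset)
{
    if (r->dirty_index - r->reset_index >= RING_SIZE)
        return false;
    struct dirty_gfn_model *e = &r->entries[r->dirty_index & (RING_SIZE - 1)];
    e->slot = slot;
    e->offset = offset;
    r->dirty_index++;
    return true;
}

/* Consumer side: copy out up to @max entries and free their space. */
static uint32_t ring_harvest(struct dirty_ring_model *r,
                             struct dirty_gfn_model *out, uint32_t max)
{
    uint32_t n = 0;
    while (r->reset_index != r->dirty_index && n < max) {
        out[n++] = r->entries[r->reset_index & (RING_SIZE - 1)];
        r->reset_index++;
    }
    return n;
}
```

Harvesting cost in this model scales with the number of dirty entries rather than with guest memory size, which is the property the commit message is after for sparse dirtying.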
    • KVM: Pass in kvm pointer into mark_page_dirty_in_slot() · 28bd726a
      Peter Xu committed
      The context will be needed to implement the kvm dirty ring.
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20201001012044.5151-5-peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: remove kvm_clear_guest_page · 2f541442
      Paolo Bonzini committed
      kvm_clear_guest_page is not used anymore after "KVM: X86: Don't track dirty
      for KVM_SET_[TSS_ADDR|IDENTITY_MAP_ADDR]", except from kvm_clear_guest.
      We can just inline it in its sole user.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  15. 23 Oct 2020, 1 commit
  16. 22 Oct 2020, 1 commit
  17. 28 Sep 2020, 1 commit
  18. 12 Sep 2020, 1 commit
  19. 22 Aug 2020, 1 commit
    • KVM: Pass MMU notifier range flags to kvm_unmap_hva_range() · fdfe7cbd
      Will Deacon committed
      The 'flags' field of 'struct mmu_notifier_range' is used to indicate
      whether invalidate_range_{start,end}() are permitted to block. In the
      case of kvm_mmu_notifier_invalidate_range_start(), this field is not
      forwarded on to the architecture-specific implementation of
      kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
      whether or not to block.
      
      Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
      architectures are aware as to whether or not they are permitted to block.
      
      Cc: <stable@vger.kernel.org>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
      Message-Id: <20200811102725.7121-2-will@kernel.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  20. 13 Aug 2020, 1 commit
  21. 10 Jul 2020, 1 commit
  22. 09 Jul 2020, 1 commit
  23. 02 Jul 2020, 1 commit