1. 17 April 2021, 1 commit
  2. 09 March 2021, 1 commit
  3. 09 February 2021, 1 commit
    • KVM: Raise the maximum number of user memslots · 4fc096a9
      Committed by Vitaly Kuznetsov
      The current KVM_USER_MEM_SLOTS limits are arch specific (512 on Power, 509 on x86,
      32 on s390, 16 on MIPS) but they don't really need to be. Memory slots are
      allocated dynamically in KVM when added, so the only real limitation is the
      'id_to_index' array, which is a 'short'. We don't have any other
      KVM_MEM_SLOTS_NUM/KVM_USER_MEM_SLOTS-sized statically defined structures.
      
      A low KVM_USER_MEM_SLOTS can be a limiting factor for some configurations.
      In particular, when QEMU tries to start a Windows guest with Hyper-V SynIC
      enabled and e.g. 256 vCPUs, the limit is hit: SynIC requires two pages per
      vCPU, and the guest is free to pick any GFN for each of them. This fragments
      memslots, as QEMU wants to have a separate memslot for each of these pages
      (which are supposed to act as 'overlay' pages). A back-of-the-envelope sketch
      of the arithmetic follows this entry.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210127175731.2020089-3-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4fc096a9
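      A minimal user-space sketch (not kernel code) of the numbers quoted above: why the
      old x86 limit of 509 user memslots cannot accommodate a 256-vCPU SynIC guest, and
      what ceiling a 'short' id_to_index array actually imposes. The constants are taken
      from the commit message; the program itself is purely illustrative.

      #include <limits.h>
      #include <stdio.h>

      int main(void)
      {
              const int old_x86_user_slots = 509;   /* historical x86 KVM_USER_MEM_SLOTS */
              const int vcpus = 256;
              const int synic_pages_per_vcpu = 2;   /* each may land in its own memslot */

              int overlay_slots = vcpus * synic_pages_per_vcpu;
              printf("SynIC overlay memslots needed: %d (old limit: %d)\n",
                     overlay_slots, old_x86_user_slots);

              /* 'id_to_index' is a 'short', so the hard ceiling is SHRT_MAX slots. */
              printf("ceiling imposed by a 'short' index: %d\n", SHRT_MAX);
              return 0;
      }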
  4. 10 December 2020, 1 commit
  5. 23 June 2020, 1 commit
  6. 18 June 2020, 1 commit
    • KVM: s390: reduce number of IO pins to 1 · 77491129
      Committed by Christian Borntraeger
      The current value of KVM_IRQCHIP_NUM_PINS results in an order-3
      allocation (32 KiB) for each guest start/restart. This can result in OOM
      killer activity even with free swap when memory is fragmented
      enough:
      
      kernel: qemu-system-s39 invoked oom-killer: gfp_mask=0x440dc0(GFP_KERNEL_ACCOUNT|__GFP_COMP|__GFP_ZERO), order=3, oom_score_adj=0
      kernel: CPU: 1 PID: 357274 Comm: qemu-system-s39 Kdump: loaded Not tainted 5.4.0-29-generic #33-Ubuntu
      kernel: Hardware name: IBM 8562 T02 Z06 (LPAR)
      kernel: Call Trace:
      kernel: ([<00000001f848fe2a>] show_stack+0x7a/0xc0)
      kernel:  [<00000001f8d3437a>] dump_stack+0x8a/0xc0
      kernel:  [<00000001f8687032>] dump_header+0x62/0x258
      kernel:  [<00000001f8686122>] oom_kill_process+0x172/0x180
      kernel:  [<00000001f8686abe>] out_of_memory+0xee/0x580
      kernel:  [<00000001f86e66b8>] __alloc_pages_slowpath+0xd18/0xe90
      kernel:  [<00000001f86e6ad4>] __alloc_pages_nodemask+0x2a4/0x320
      kernel:  [<00000001f86b1ab4>] kmalloc_order+0x34/0xb0
      kernel:  [<00000001f86b1b62>] kmalloc_order_trace+0x32/0xe0
      kernel:  [<00000001f84bb806>] kvm_set_irq_routing+0xa6/0x2e0
      kernel:  [<00000001f84c99a4>] kvm_arch_vm_ioctl+0x544/0x9e0
      kernel:  [<00000001f84b8936>] kvm_vm_ioctl+0x396/0x760
      kernel:  [<00000001f875df66>] do_vfs_ioctl+0x376/0x690
      kernel:  [<00000001f875e304>] ksys_ioctl+0x84/0xb0
      kernel:  [<00000001f875e39a>] __s390x_sys_ioctl+0x2a/0x40
      kernel:  [<00000001f8d55424>] system_call+0xd8/0x2c8
      
      As far as I can tell, s390x does not use the IO pins, as we bail out for
      anything other than KVM_IRQ_ROUTING_S390_ADAPTER, and the chip/pin pair is
      only used for KVM_IRQ_ROUTING_IRQCHIP. So let us use a small number to
      reduce the memory footprint (the allocation-order arithmetic is sketched
      after this entry).
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Link: https://lore.kernel.org/r/20200617083620.5409-1-borntraeger@de.ibm.com
      77491129
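      A back-of-the-envelope sketch (user space, not kernel code) of how the routing-table
      allocation scales with the number of pins. The per-pin entry size of 8 bytes and the
      4096-pin figure are assumptions chosen only to be consistent with the 32 KiB order-3
      allocation quoted above; the real kernel structure layout is more involved.

      #include <stdio.h>

      #define PAGE_SIZE 4096UL

      /* Smallest order such that (PAGE_SIZE << order) >= size. */
      static unsigned int alloc_order(unsigned long size)
      {
              unsigned int order = 0;

              while ((PAGE_SIZE << order) < size)
                      order++;
              return order;
      }

      int main(void)
      {
              const unsigned long entry_size = 8;          /* assumed bytes per pin */
              const unsigned long pins[] = { 4096, 1 };    /* large value vs. 1 pin */

              for (int i = 0; i < 2; i++) {
                      unsigned long size = pins[i] * entry_size;

                      printf("%lu pins -> %lu bytes -> order-%u allocation\n",
                             pins[i], size, alloc_order(size));
              }
              return 0;
      }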
  7. 12 June 2020, 1 commit
  8. 01 June 2020, 2 commits
    • KVM: x86: acknowledgment mechanism for async pf page ready notifications · 557a961a
      Committed by Vitaly Kuznetsov
      If two 'page ready' notifications happen back to back, the second one is not
      delivered, and the only mechanism we currently have is the
      kvm_check_async_pf_completion() check in the vcpu_run() loop. The check will
      only be performed on the next vmexit, whenever that happens, and in some cases
      it may take a while. With interrupt-based 'page ready' notification delivery
      the situation is even worse: unlike exceptions, interrupts are not handled
      immediately, so we must check if the slot is empty. This is slow and
      unnecessary. Introduce a dedicated MSR_KVM_ASYNC_PF_ACK MSR to communicate
      the fact that the slot is free and the host should check its notification
      queue. Mandate using it for interrupt-based 'page ready' APF event
      delivery (a guest-side sketch of the ack flow follows this entry).
      
      As kvm_check_async_pf_completion() is going away from vcpu_run() we need
      a way to communicate the fact that vcpu->async_pf.done queue has
      transitioned from empty to non-empty state. Introduce
      kvm_arch_async_page_present_queued() and KVM_REQ_APF_READY to do the job.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200525144125.143875-7-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      557a961a
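      A guest-side sketch of the acknowledgment flow described above, written as a small
      stand-alone C program. The MSR index, the simplified slot layout, and the
      wrmsr_stub() helper are illustrative assumptions; a real guest kernel would execute
      a privileged WRMSR instead of printing.

      #include <stdint.h>
      #include <stdio.h>

      #define MSR_KVM_ASYNC_PF_ACK 0x4b564d07u   /* assumed MSR index */

      /* Shared guest/host notification slot, simplified for illustration. */
      struct apf_slot {
              uint32_t flags;
              uint32_t token;    /* non-zero while a 'page ready' event is pending */
      };

      /* Stand-in for the privileged WRMSR a real guest kernel would execute. */
      static void wrmsr_stub(uint32_t msr, uint64_t val)
      {
              printf("wrmsr(0x%x, %llu)\n", msr, (unsigned long long)val);
      }

      /* Handler for the 'page ready' notification interrupt. */
      static void async_pf_ready_interrupt(struct apf_slot *slot)
      {
              uint32_t token = slot->token;

              (void)token;       /* ... wake whoever was waiting on this token ... */
              slot->token = 0;   /* the slot is free again */

              /* Ack: let the host deliver the next queued 'page ready' event. */
              wrmsr_stub(MSR_KVM_ASYNC_PF_ACK, 1);
      }

      int main(void)
      {
              struct apf_slot slot = { .flags = 0, .token = 42 };

              async_pf_ready_interrupt(&slot);
              return 0;
      }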
    • KVM: rename kvm_arch_can_inject_async_page_present() to kvm_arch_can_dequeue_async_page_present() · 7c0ade6c
      Committed by Vitaly Kuznetsov
      An innocent reader of the following x86 KVM code:
      
      bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu)
      {
              if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED))
                      return true;
      ...
      
      may get very confused: if the APF mechanism is not enabled, why do we report
      that we 'can inject async page present'? In reality, upon injection
      kvm_arch_async_page_present() will check the same condition again and,
      in case APF is disabled, will just drop the item. This is fine, as a
      guest which deliberately disabled APF doesn't expect to get any APF
      notifications.
      
      Rename kvm_arch_can_inject_async_page_present() to
      kvm_arch_can_dequeue_async_page_present() to make it clear what we are
      checking: whether the item can be dequeued (meaning either injected or just
      dropped). A simplified restatement of this check follows this entry.
      
      On s390 kvm_arch_can_inject_async_page_present() always returns 'true' so
      the rename doesn't matter much.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200525144125.143875-4-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      7c0ade6c
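      A simplified restatement of the semantics the rename is meant to convey: "can we
      take an item off the done queue?", not "can we inject it?". The types are trimmed
      to the bare minimum and the "slot is free" check is stubbed out; only the shape of
      the check mirrors the snippet quoted above.

      #include <stdbool.h>
      #include <stdint.h>

      #define KVM_ASYNC_PF_ENABLED (1ULL << 0)

      struct kvm_vcpu {
              struct {
                      struct {
                              uint64_t msr_val;
                      } apf;
              } arch;
      };

      static bool kvm_arch_can_dequeue_async_page_present(struct kvm_vcpu *vcpu)
      {
              /*
               * If the guest disabled APF, a queued 'page ready' event will simply
               * be dropped on "presentation", so dequeuing is always allowed.
               */
              if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED))
                      return true;

              /* ... otherwise, allow dequeuing only if the notification slot is free ... */
              return false;
      }

      int main(void)
      {
              struct kvm_vcpu vcpu = { .arch.apf.msr_val = 0 };

              return kvm_arch_can_dequeue_async_page_present(&vcpu) ? 0 : 1;
      }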
  9. 16 May 2020, 1 commit
    • kvm: add halt-polling cpu usage stats · cb953129
      Committed by David Matlack
      Two new stats for exposing halt-polling cpu usage:
      halt_poll_success_ns
      halt_poll_fail_ns
      
      The sum of these two stats is thus the total CPU time spent polling. "success"
      means the VCPU polled until a virtual interrupt was delivered. "fail"
      means the VCPU had to schedule out (either because the maximum poll time
      was reached or it needed to yield the CPU).
      
      To avoid touching every arch's kvm_vcpu_stat struct, only update and
      export halt-polling cpu usage stats if we're on x86.
      
      Exporting cpu usage as a u64 in nanoseconds means we will overflow after
      ~500 years, which seems reasonably large. (A sketch of how the two counters
      are accumulated follows this entry.)
      Signed-off-by: David Matlack <dmatlack@google.com>
      Signed-off-by: Jon Cargille <jcargill@google.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      
      Message-Id: <20200508182240.68440-1-jcargill@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      cb953129
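      A plain C sketch of how the two counters relate (this is not the kernel's actual
      accounting code; now_ns(), the stat struct, and the callback are illustrative
      stand-ins). The point is simply that every polled nanosecond lands in exactly one
      of the two buckets, so their sum is the total time spent polling.

      #include <stdbool.h>
      #include <stdint.h>
      #include <time.h>

      struct vcpu_stat {
              uint64_t halt_poll_success_ns;
              uint64_t halt_poll_fail_ns;
      };

      static uint64_t now_ns(void)
      {
              struct timespec ts;

              clock_gettime(CLOCK_MONOTONIC, &ts);
              return (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;
      }

      /* Poll for a pending interrupt for at most max_poll_ns before giving up the CPU. */
      static void halt_poll(struct vcpu_stat *stat, bool (*interrupt_pending)(void),
                            uint64_t max_poll_ns)
      {
              uint64_t start = now_ns();

              while (now_ns() - start < max_poll_ns) {
                      if (interrupt_pending()) {
                              /* Poll succeeded: the time spent counts as "success". */
                              stat->halt_poll_success_ns += now_ns() - start;
                              return;
                      }
              }

              /* Timed out: the vCPU schedules out, so the time counts as "fail". */
              stat->halt_poll_fail_ns += now_ns() - start;
              /* ... schedule() ... */
      }

      static bool no_interrupt(void) { return false; }

      int main(void)
      {
              struct vcpu_stat stat = { 0, 0 };

              halt_poll(&stat, no_interrupt, 1000);   /* poll for 1 microsecond */
              return stat.halt_poll_fail_ns > 0 ? 0 : 1;
      }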
  10. 24 March 2020, 1 commit
  11. 17 March 2020, 1 commit
  12. 28 February 2020, 9 commits
  13. 31 January 2020, 1 commit
  14. 28 January 2020, 1 commit
  15. 10 October 2019, 1 commit
  16. 02 July 2019, 1 commit
    • s390: ap: kvm: add PQAP interception for AQIC · e5282de9
      Committed by Pierre Morel
      We prepare the interception of the PQAP/AQIC instruction for
      the case where the AQIC facility is enabled in the guest.
      
      First of all, we do not want to change the existing behavior when
      intercepting AP instructions while the SIE does not allow the guest
      to use AP instructions.
      
      In this patch we only handle the AQIC interception allowed by
      facility 65, which will be enabled once the complete interception
      infrastructure is present.
      
      We add a callback inside the s390 KVM arch structure so that a
      VFIO driver can handle the response to the PQAP instruction
      with the AQIC command, and only this command.
      
      But we want to be able to return a correct answer to the guest
      even if there is no VFIO AP driver in the kernel.
      Therefore, we inject the correct exceptions from inside KVM for the
      case where the callback is not initialized, which happens when the
      vfio_ap driver is not loaded.
      
      We consider it the responsibility of the driver to always initialize
      the PQAP callback if it defines queues by initializing the CRYCB for
      a guest.
      If the callback has been set up, we call it.
      If not, we set up an answer indicating that no queue is available
      for the guest (a dispatch sketch of this logic follows this entry).
      Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
      Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
      Acked-by: Harald Freudenberger <freude@linux.ibm.com>
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      e5282de9
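      A dispatch sketch of the callback-or-fallback logic described above. The hook
      structure, the function names, and the error code are illustrative; the real hook
      lives in the s390 KVM arch structure and is filled in by the vfio_ap driver.

      #include <errno.h>
      #include <stdio.h>

      struct kvm_vcpu;   /* opaque here; only passed through to the driver */

      struct pqap_hook {
              /* Set by the vfio_ap driver when it configures queues in the CRYCB. */
              int (*aqic)(struct kvm_vcpu *vcpu);
      };

      static int handle_pqap_aqic(struct kvm_vcpu *vcpu, struct pqap_hook *hook)
      {
              if (hook->aqic)
                      return hook->aqic(vcpu);   /* the driver owns the response */

              /*
               * No vfio_ap driver loaded: answer the guest from inside KVM,
               * reporting that no queue is available (details omitted here).
               */
              return -EOPNOTSUPP;
      }

      int main(void)
      {
              struct pqap_hook hook = { .aqic = NULL };
              int rc = handle_pqap_aqic(NULL, &hook);

              printf("no callback installed -> rc=%d\n", rc);
              return 0;
      }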
  17. 05 June 2019, 1 commit
    • KVM: Directly return result from kvm_arch_check_processor_compat() · f257d6dc
      Committed by Sean Christopherson
      Add a wrapper to invoke kvm_arch_check_processor_compat() so that the
      boilerplate ugliness of checking virtualization support on all CPUs is
      hidden from the arch specific code.  x86's implementation in particular
      is quite heinous, as it unnecessarily propagates the out-param pattern
      into kvm_x86_ops.
      
      While the x86 specific issue could be resolved solely by changing
      kvm_x86_ops, make the change for all architectures as returning a value
      directly is prettier and technically more robust, e.g. s390 doesn't set
      the out param, which could lead to subtle breakage in the (highly
      unlikely) scenario where the out-param was not pre-initialized by the
      caller.
      
      Opportunistically annotate svm_check_processor_compat() with __init.
      (A before/after sketch of the out-param vs. direct-return pattern follows
      this entry.)
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: Cornelia Huck <cohuck@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f257d6dc
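      A generic before/after sketch of the pattern being removed (the names below are
      illustrative, not the actual KVM prototypes). The out-param style forces every
      implementation to remember to store a result; returning the value directly makes
      that impossible to forget.

      #include <stdio.h>

      /* Before: the result is handed back through a pointer and can be left unset. */
      static void check_compat_outparam(void *opaque)
      {
              int *ret = opaque;

              *ret = 0;   /* forgetting this line leaves *ret holding garbage */
      }

      /* After: the result is the return value, so it can never be left unset. */
      static int check_compat_direct(void)
      {
              return 0;
      }

      int main(void)
      {
              int r = -1;

              check_compat_outparam(&r);
              printf("out-param style: %d, direct style: %d\n", r, check_compat_direct());
              return 0;
      }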
  18. 20 May 2019, 1 commit
  19. 26 April 2019, 1 commit
  20. 25 April 2019, 1 commit
  21. 22 February 2019, 1 commit
  22. 21 February 2019, 1 commit
    • KVM: Call kvm_arch_memslots_updated() before updating memslots · 15248258
      Committed by Sean Christopherson
      kvm_arch_memslots_updated() is at this point in time an x86-specific
      hook for handling MMIO generation wraparound.  x86 stashes 19 bits of
      the memslots generation number in its MMIO sptes in order to avoid
      full page fault walks for repeat faults on emulated MMIO addresses.
      Because only 19 bits are used, wrapping the MMIO generation number is
      possible, if unlikely.  kvm_arch_memslots_updated() alerts x86 that
      the generation has changed so that it can invalidate all MMIO sptes in
      case the effective MMIO generation has wrapped so as to avoid using a
      stale spte, e.g. a (very) old spte that was created with generation==0.
      
      Given that the purpose of kvm_arch_memslots_updated() is to prevent
      consuming stale entries, it needs to be called before the new generation
      is propagated to memslots.  Invalidating the MMIO sptes after updating
      memslots means that there is a window where a vCPU could dereference
      the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO
      spte that was created with (pre-wrap) generation==0. (The required ordering
      is sketched after this entry.)
      
      Fixes: e59dbe09 ("KVM: Introduce kvm_arch_memslots_updated()")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      15248258
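      An ordering sketch of the fix, boiled down to a stand-alone C program. The helper
      names and the single generation field are illustrative stand-ins for installing the
      new memslots; only the before/after ordering matters.

      #include <stdio.h>

      struct kvm {
              unsigned long memslots_generation;
      };

      /* Stand-in for the arch hook: zap MMIO sptes tagged with old generations. */
      static void arch_memslots_updated(struct kvm *kvm, unsigned long gen)
      {
              printf("zap stale MMIO sptes before generation %lu becomes visible\n", gen);
      }

      static void install_new_memslots(struct kvm *kvm, unsigned long new_gen)
      {
              /*
               * Run the arch hook first: if it ran after the store below, a vCPU
               * could already observe the wrapped generation and reuse a stale
               * MMIO spte created with the same (e.g. 0) generation.
               */
              arch_memslots_updated(kvm, new_gen);

              /* Only now publish the new generation (stands in for the memslots update). */
              kvm->memslots_generation = new_gen;
      }

      int main(void)
      {
              struct kvm vm = { .memslots_generation = 0 };

              install_new_memslots(&vm, 1);
              return 0;
      }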
  23. 05 February 2019, 7 commits
  24. 05 October 2018, 1 commit
  25. 01 October 2018, 1 commit