1. 01 Dec, 2022 1 commit
  2. 30 Nov, 2022 2 commits
    • D
      KVM: x86/xen: Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured · d8ba8ba4
      David Woodhouse committed
      Closer inspection of the Xen code shows that we aren't supposed to be
      using the XEN_RUNSTATE_UPDATE flag unconditionally. It should be
      explicitly enabled by guests through the HYPERVISOR_vm_assist hypercall.
      If we randomly set the top bit of ->state_entry_time for a guest that
      hasn't asked for it and doesn't expect it, that could make the runtimes
      fail to add up and confuse the guest. Without the flag it's perfectly
      safe for a vCPU to read its own vcpu_runstate_info; just not for one
      vCPU to read *another's*.
      
      I briefly pondered adding a word for the whole set of VMASST_TYPE_*
      flags but the only one we care about for HVM guests is this, so it
      seemed a bit pointless.
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <20221127122210.248427-3-dwmw2@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
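The opt-in behaviour described in this commit can be sketched as below. This is a minimal illustration, not the KVM code: the helper name `entry_time_during_update` and the `guest_opted_in` parameter are hypothetical, and only the `XEN_RUNSTATE_UPDATE` bit position (the top bit of the 64-bit field) comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Top bit of ->state_entry_time, as described in the commit message. */
#define XEN_RUNSTATE_UPDATE (UINT64_C(1) << 63)

/*
 * Hypothetical helper: the value to publish in ->state_entry_time while a
 * runstate update is in progress. The flag is set only when the guest has
 * asked for it (in Xen, through HYPERVISOR_vm_assist).
 */
static uint64_t entry_time_during_update(uint64_t entry_time, bool guest_opted_in)
{
	/* Assumption: real timestamps leave the top bit clear. */
	assert((entry_time & XEN_RUNSTATE_UPDATE) == 0);

	return guest_opted_in ? (entry_time | XEN_RUNSTATE_UPDATE) : entry_time;
}
```

A guest that never enabled the flag sees the timestamp unchanged, so its runtimes still add up.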
    • D
      KVM: x86/xen: Compatibility fixes for shared runstate area · 5ec3289b
      David Woodhouse committed
      The guest runstate area can be arbitrarily byte-aligned. In fact, even
      when a sane 32-bit guest aligns the overall structure nicely, the 64-bit
      fields in the structure end up being unaligned due to the fact that the
      32-bit ABI only aligns them to 32 bits.
      
      So setting the ->state_entry_time field to something|XEN_RUNSTATE_UPDATE
      is buggy, because if it's unaligned then we can't update the whole field
      atomically; the low bytes might be observable before the _UPDATE bit is.
      Xen actually updates the *byte* containing that top bit, on its own. KVM
      should do the same.
      
      In addition, we cannot assume that the runstate area fits within a single
      page. One option might be to make the gfn_to_pfn cache cope with regions
      that cross a page — but getting a contiguous virtual kernel mapping of a
      discontiguous set of IOMEM pages is a distinctly non-trivial exercise,
      and it seems this is the *only* current use case for the GPC which would
      benefit from it.
      
      An earlier version of the runstate code did use a gfn_to_hva cache for
      this purpose, but it still had the single-page restriction because it
      used the uhva directly — because it needs to be able to do so atomically
      when the vCPU is being scheduled out, so it used pagefault_disable()
      around the accesses and didn't just use kvm_write_guest_cached() which
      has a fallback path.
      
      So... use a pair of GPCs for the first and potential second page covering
      the runstate area. We can get away with locking both at once because
      nothing else takes more than one GPC lock at a time so we can invent
      a trivial ordering rule.
      
      The common case where it's all in the same page is kept as a fast path,
      but in both cases, the actual guest structure (compat or not) is built
      up from the fields in @vx, following preset pointers to the state and
      times fields. The only difference is whether those pointers point to
      the kernel stack (in the split case) or to guest memory directly via
      the GPC.  The fast path is also fixed to use a byte access for the
      XEN_RUNSTATE_UPDATE bit, then the only real difference is the dual
      memcpy.
      
      Finally, Xen also does write the runstate area immediately when it's
      configured. Flip the kvm_xen_update_runstate() and …_guest() functions
      and call the latter directly when the runstate area is set. This means
      that other ioctls which modify the runstate also write it immediately
      to the guest when they do so, which is also intended.
      
      Update the xen_shinfo_test to exercise the pathological case where the
      XEN_RUNSTATE_UPDATE flag in the top byte of the state_entry_time is
      actually in a different page to the rest of the 64-bit word.
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
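Two ideas from this commit message lend themselves to a small sketch: splitting a region that may cross a page boundary, and flipping the update flag through a single byte access. The code below is an illustration under stated assumptions (4 KiB pages, a little-endian guest layout) and every name in it is hypothetical; it is not the KVM implementation and does not model the GPC locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u      /* assumption: 4 KiB pages */
#define UPDATE_FLAG_IN_TOP_BYTE 0x80u /* bit 63 of a u64 is bit 7 of its top byte */

/*
 * Number of bytes of [gpa, gpa + len) that fall in the first page.
 * Anything beyond that spills into the following page.
 */
static size_t first_page_len(uint64_t gpa, size_t len)
{
	size_t room = SKETCH_PAGE_SIZE - (size_t)(gpa % SKETCH_PAGE_SIZE);

	assert(len > 0 && len <= SKETCH_PAGE_SIZE);
	return len < room ? len : room;
}

/*
 * Set or clear the update flag by touching only the most significant byte
 * of the 8-byte field. On a little-endian layout that is byte 7, so the
 * write is a single byte store even if the field is unaligned.
 */
static void set_update_flag(uint8_t *entry_time_field, int on)
{
	uint8_t *top = entry_time_field + 7;

	if (on)
		*top |= UPDATE_FLAG_IN_TOP_BYTE;
	else
		*top &= (uint8_t)~UPDATE_FLAG_IN_TOP_BYTE;
}
```

When `first_page_len()` returns less than the structure size, the remainder belongs to the second page, which corresponds to the split case the message describes.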
  3. 29 Nov, 2022 12 commits
    • P
      Merge tag 'kvm-s390-next-6.2-1' of... · 1e79a9e3
      Paolo Bonzini committed
      Merge tag 'kvm-s390-next-6.2-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
      
      - Second batch of the lazy destroy patches
      - First batch of KVM changes for kernel virtual != physical address support
      - Removal of an unused function
    • J
      KVM: x86: Advertise PREFETCHIT0/1 CPUID to user space · 29c46979
      Jiaxi Chen committed
      Latest Intel platform Granite Rapids has introduced a new instruction -
      PREFETCHIT0/1, which moves code to memory (cache) closer to the
      processor depending on specific hints.
      
      The bit definition:
      CPUID.(EAX=7,ECX=1):EDX[bit 14]
      
      PREFETCHIT0/1 is on a KVM-only subleaf, so add an X86_FEATURE definition
      for this feature bit to direct it to the KVM entry.
      
      Advertise PREFETCHIT0/1 to KVM userspace. This is safe because there are
      no new VMX controls or additional host enabling required for guests to
      use this feature.
      Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
      Message-Id: <20221125125845.1182922-9-jiaxi.chen@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
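The bit definition quoted in this commit, CPUID.(EAX=7,ECX=1):EDX[bit 14], can be read as in the sketch below. The struct and helper names are hypothetical and the register values are passed in by the caller, so the example does not execute CPUID itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Register values returned by one CPUID invocation (hypothetical container). */
struct cpuid_regs {
	uint32_t eax, ebx, ecx, edx;
};

/* CPUID.(EAX=7,ECX=1):EDX[bit 14], per the commit message. */
#define PREFETCHIT_EDX_BIT 14

/* True if the supplied leaf 7, subleaf 1 output reports PREFETCHIT0/1. */
static bool has_prefetchit(const struct cpuid_regs *leaf7_subleaf1)
{
	assert(leaf7_subleaf1 != NULL);
	return (leaf7_subleaf1->edx >> PREFETCHIT_EDX_BIT) & 1u;
}
```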
    • J
      KVM: x86: Advertise AVX-NE-CONVERT CPUID to user space · 9977f087
      Jiaxi Chen committed
      AVX-NE-CONVERT is a new set of instructions which can convert low
      precision floating point like BF16/FP16 to high precision floating point
      FP32, and can also convert FP32 elements to BF16. This instruction
      allows the platform to have improved AI capabilities and better
      compatibility.
      
      The bit definition:
      CPUID.(EAX=7,ECX=1):EDX[bit 5]
      
      AVX-NE-CONVERT is on a KVM-only subleaf, so add an X86_FEATURE definition
      for this feature bit to direct it to the KVM entry.
      
      Advertise AVX-NE-CONVERT to KVM userspace. This is safe because there
      are no new VMX controls or additional host enabling required for guests
      to use this feature.
      Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
      Message-Id: <20221125125845.1182922-8-jiaxi.chen@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
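As a rough scalar picture of the BF16 to FP32 conversion mentioned above: a BF16 value has the same layout as the upper 16 bits of an IEEE-754 binary32, so widening it is a 16-bit shift. The helper below is a hypothetical illustration of that one conversion for a single element; it does not model the vector instructions or their operand forms.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Widen one BF16 value to FP32. BF16 keeps the sign, the 8 exponent bits
 * and the top 7 mantissa bits of a binary32, so the conversion is exact.
 */
static float bf16_to_fp32(uint16_t bf16)
{
	uint32_t bits = (uint32_t)bf16 << 16;
	float out;

	/* Assumption: float is a 32-bit IEEE-754 binary32 on this platform. */
	assert(sizeof(out) == sizeof(bits));
	memcpy(&out, &bits, sizeof(out));
	return out;
}
```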
    • J
      KVM: x86: Advertise AVX-VNNI-INT8 CPUID to user space · 24d74b9f
      Jiaxi Chen committed
      AVX-VNNI-INT8 is a new set of instructions in the latest Intel platform
      Sierra Forest, intended to give the platform superior AI capabilities.
      These instructions multiply the individual bytes of two signed or
      unsigned source operands, then add and accumulate the results into the
      destination dword element size operand.
      
      The bit definition:
      CPUID.(EAX=7,ECX=1):EDX[bit 4]
      
      AVX-VNNI-INT8 is on a new and sparse CPUID leaf, and no bits on this
      leaf have a real kernel use case for now. Given that, and to save space
      for kernel feature bits, move this new leaf to a KVM-only subleaf and add
      an X86_FEATURE definition for AVX-VNNI-INT8 to direct it to the KVM
      entry.
      
      Advertise AVX-VNNI-INT8 to KVM userspace. This is safe because there are
      no new VMX controls or additional host enabling required for guests to
      use this feature.
      Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
      Message-Id: <20221125125845.1182922-7-jiaxi.chen@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
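The byte-wise multiply and accumulate described in this commit can be modelled for a single dword lane as follows. This is a scalar sketch only: the function name is hypothetical, it covers just the signed-by-signed case, and it assumes the caller keeps the accumulator far enough from the `int32_t` limits that the sum does not overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * One dword lane: multiply four pairs of signed bytes and add the four
 * products to a 32-bit accumulator. No saturation is modelled.
 */
static int32_t dot4_s8_accumulate(int32_t acc, const int8_t a[4], const int8_t b[4])
{
	assert(a != NULL && b != NULL);

	for (int i = 0; i < 4; i++)
		acc += (int32_t)a[i] * (int32_t)b[i];
	return acc;
}
```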
    • J
      x86: KVM: Advertise AVX-IFMA CPUID to user space · 5e85c4eb
      Jiaxi Chen committed
      AVX-IFMA is a new instruction in the latest Intel platform Sierra
      Forest. This instruction performs packed multiplication of unsigned
      52-bit integers and adds the low/high 52-bit products to qword accumulators.
      
      The bit definition:
      CPUID.(EAX=7,ECX=1):EAX[bit 23]
      
      AVX-IFMA is on an expected-dense CPUID leaf and some other bits on this
      leaf have kernel usages. Given that, define this feature bit like
      X86_FEATURE_<name> in kernel. Considering that AVX-IFMA itself has no real
      kernel usage and /proc/cpuinfo already has too many unreadable flags, hide
      this one in /proc/cpuinfo.
      
      Advertise AVX-IFMA to KVM userspace. This is safe because there are no
      new VMX controls or additional host enabling required for guests to use
      this feature.
      Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Message-Id: <20221125125845.1182922-6-jiaxi.chen@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
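For one qword lane, the 52-bit multiply-add described above can be modelled as in this sketch. The function names are hypothetical, and `unsigned __int128` is a GCC/Clang extension used here only to hold the 104-bit product; the code illustrates the arithmetic and is not a description of the instruction encoding.

```c
#include <assert.h>
#include <stdint.h>

#define MASK52 ((UINT64_C(1) << 52) - 1)

/* Add the low 52 bits of the 52x52-bit product to the accumulator. */
static uint64_t madd52_lo(uint64_t acc, uint64_t a, uint64_t b)
{
	unsigned __int128 prod = (unsigned __int128)(a & MASK52) * (b & MASK52);

	return acc + ((uint64_t)prod & MASK52);
}

/* Add the high 52 bits (bits 103:52) of the product to the accumulator. */
static uint64_t madd52_hi(uint64_t acc, uint64_t a, uint64_t b)
{
	unsigned __int128 prod = (unsigned __int128)(a & MASK52) * (b & MASK52);

	/* Two 52-bit inputs give at most a 104-bit product. */
	assert((prod >> 104) == 0);
	return acc + (uint64_t)(prod >> 52);
}
```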
    • C
      x86: KVM: Advertise AMX-FP16 CPUID to user space · af2872f6
      Chang S. Bae committed
      Latest Intel platform Granite Rapids has introduced a new instruction -
      AMX-FP16, which performs dot-products of two FP16 tiles and accumulates
      the results into a packed single precision tile. AMX-FP16 adds FP16
      capability and also allows an FP16 GPU-trained model to run faster
      without loss of accuracy or added SW overhead.
      
      The bit definition:
      CPUID.(EAX=7,ECX=1):EAX[bit 21]
      
      AMX-FP16 is on an expected-dense CPUID leaf and some other bits on this
      leaf have kernel usages. Given that, define this feature bit like
      X86_FEATURE_<name> in kernel. Considering that AMX-FP16 itself has no real
      kernel usage and /proc/cpuinfo already has too many unreadable flags, hide
      this one in /proc/cpuinfo.
      
      Advertise AMX-FP16 to KVM userspace. This is safe because there are no
      new VMX controls or additional host enabling required for guests to use
      this feature.
      Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
      Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Message-Id: <20221125125845.1182922-5-jiaxi.chen@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • J
      x86: KVM: Advertise CMPccXADD CPUID to user space · 6a19d7aa
      Jiaxi Chen committed
      CMPccXADD is a new set of instructions in the latest Intel platform
      Sierra Forest. This new instruction set includes a semaphore operation
      that can compare and add the operands if the condition is met, which can
      improve database performance.
      
      The bit definition:
      CPUID.(EAX=7,ECX=1):EAX[bit 7]
      
      CMPccXADD is on an expected-dense CPUID leaf and some other bits on this
      leaf have kernel usages. Given that, define this feature bit like
      X86_FEATURE_<name> in kernel. Considering that CMPccXADD itself has no real
      kernel usage and /proc/cpuinfo already has too many unreadable flags, hide
      this one in /proc/cpuinfo.
      
      Advertise CMPccXADD to KVM userspace. This is safe because there are no
      new VMX controls or additional host enabling required for guests to use
      this feature.
      Signed-off-by: Jiaxi Chen <jiaxi.chen@linux.intel.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Message-Id: <20221125125845.1182922-4-jiaxi.chen@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
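The "compare and add if the condition is met" behaviour can be pictured with the scalar model below. It is a loose sketch with several assumptions: the names and the three-condition subset are hypothetical, the model is not atomic, and returning the original memory value reflects my reading of the instruction rather than anything stated in this commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative subset of conditions; the real instruction family has more. */
enum cmp_cond { COND_EQUAL, COND_LESS, COND_GREATER };

/*
 * Compare *mem with cmp_val; if the condition holds, add addend to *mem.
 * Returns the value *mem held before the call (assumed behaviour).
 * Non-atomic: a real semaphore operation would be a single atomic step.
 */
static int32_t cmp_xadd_model(int32_t *mem, int32_t cmp_val, int32_t addend,
			      enum cmp_cond cond)
{
	int32_t old;
	bool met = false;

	assert(mem != NULL);
	old = *mem;

	switch (cond) {
	case COND_EQUAL:   met = (old == cmp_val); break;
	case COND_LESS:    met = (old < cmp_val);  break;
	case COND_GREATER: met = (old > cmp_val);  break;
	}

	if (met)
		*mem = old + addend;
	return old;
}
```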
    • S
      KVM: x86: Update KVM-only leaf handling to allow for 100% KVM-only leafs · 047c7229
      Sean Christopherson committed
      Rename kvm_cpu_cap_init_scattered() to kvm_cpu_cap_init_kvm_defined() in
      anticipation of adding KVM-only CPUID leafs that aren't recognized by the
      kernel and thus not scattered, i.e. for leafs that are 100% KVM-defined.
      
      Adjust/add comments to kvm_only_cpuid_leafs and KVM_X86_FEATURE to
      document how to create new kvm_only_cpuid_leafs entries for scattered
      features as well as features that are entirely unknown to the kernel.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20221125125845.1182922-3-jiaxi.chen@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • S
      KVM: x86: Add BUILD_BUG_ON() to detect bad usage of "scattered" flags · c4690d01
      Sean Christopherson committed
      Add a compile-time assert in the SF() macro to detect improper usage,
      i.e. to detect passing in an X86_FEATURE_* flag that isn't actually
      scattered by the kernel.  Upcoming feature flags will be 100% KVM-only
      and will have X86_FEATURE_* macros that point at a kvm_only_cpuid_leafs
      word, not a kernel-defined word.  Using SF() and thus boot_cpu_has() for
      such feature flags would access memory beyond x86_capability[NCAPINTS]
      and at best incorrectly hide a feature, and at worst leak kernel state to
      userspace.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20221125125845.1182922-2-jiaxi.chen@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
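The idea of rejecting out-of-range feature flags at build time can be sketched in plain C as follows. All names here are hypothetical stand-ins (the kernel uses `BUILD_BUG_ON()` and its own feature encoding); the sketch only shows how a constant-expression check can stop an access beyond a fixed-size capability array from compiling.

```c
#include <assert.h>
#include <stdint.h>

#define NCAPINTS_SKETCH 4u /* hypothetical number of kernel-defined words */

/* Hypothetical encoding: a feature is word * 32 + bit. */
#define FEATURE_SKETCH(word, bit) ((word) * 32u + (bit))
#define FEATURE_WORD(f) ((f) / 32u)

/*
 * Evaluates to f, but fails to compile (negative array size) when f's word
 * lies outside the kernel-defined range. f must be a constant expression.
 */
#define CHECK_KERNEL_WORD(f) \
	((void)sizeof(char[1 - 2 * (FEATURE_WORD(f) >= NCAPINTS_SKETCH)]), (f))

/* Stand-in for the kernel's capability words. */
static uint32_t caps_sketch[NCAPINTS_SKETCH];

/* Reads the feature bit; the index is proven in range at compile time. */
#define KERNEL_HAS(f) \
	((caps_sketch[FEATURE_WORD(CHECK_KERNEL_WORD(f))] >> ((f) % 32u)) & 1u)
```

With these definitions, `KERNEL_HAS(FEATURE_SKETCH(4u, 0u))` is rejected by the compiler instead of reading past the end of `caps_sketch`.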
    • D
      MAINTAINERS: Add KVM x86/xen maintainer list · 7927e275
      David Woodhouse committed
      Adding Paul as co-maintainer of Xen support to help ensure that things
      don't fall through the cracks when I spend three months at a time
      travelling...
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Reviewed-by: Paul Durrant <paul@xen.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • D
      c3f37199
    • P
      KVM: always declare prototype for kvm_arch_irqchip_in_kernel · 3ca9d84e
      Paolo Bonzini committed
      Architecture code might want to use it even if CONFIG_HAVE_KVM_IRQ_ROUTING
      is false; for example PPC XICS has KVM_IRQ_LINE and wants to use
      kvm_arch_irqchip_in_kernel from there, but it does not have
      KVM_SET_GSI_ROUTING so the prototype was not provided.
      
      Fixes: d663b8a2 ("KVM: replace direct irq.h inclusion")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. 24 Nov, 2022 4 commits
  5. 23 Nov, 2022 8 commits
  6. 21 Nov, 2022 12 commits
  7. 19 Nov, 2022 1 commit