1. 21 10月, 2019 1 次提交
    • F
      KVM: PPC: Report single stepping capability · 1a9167a2
      Fabiano Rosas 提交于
      When calling the KVM_SET_GUEST_DEBUG ioctl, userspace might request
      the next instruction to be single stepped via the
      KVM_GUESTDBG_SINGLESTEP control bit of the kvm_guest_debug structure.
      
      This patch adds the KVM_CAP_PPC_GUEST_DEBUG_SSTEP capability in order
      to inform userspace about the state of single stepping support.
      
      We currently don't have support for guest single stepping implemented
      in Book3S HV so the capability is only present for Book3S PR and
      BookE.
      Signed-off-by: NFabiano Rosas <farosas@linux.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
      1a9167a2
  2. 24 9月, 2019 1 次提交
  3. 11 9月, 2019 1 次提交
  4. 09 9月, 2019 1 次提交
    • M
      KVM: arm/arm64: vgic: Allow more than 256 vcpus for KVM_IRQ_LINE · 92f35b75
      Marc Zyngier 提交于
      While parts of the VGIC support a large number of vcpus (we
      bravely allow up to 512), other parts are more limited.
      
      One of these limits is visible in the KVM_IRQ_LINE ioctl, which
      only allows 256 vcpus to be signalled when using the CPU or PPI
      types. Unfortunately, we've cornered ourselves badly by allocating
      all the bits in the irq field.
      
      Since the irq_type subfield (8 bit wide) is currently only taking
      the values 0, 1 and 2 (and we have been careful not to allow anything
      else), let's reduce this field to only 4 bits, and allocate the
      remaining 4 bits to a vcpu2_index, which acts as a multiplier:
      
        vcpu_id = 256 * vcpu2_index + vcpu_index
      
      With that, and a new capability (KVM_CAP_ARM_IRQ_LINE_LAYOUT_2)
      allowing this to be discovered, it becomes possible to inject
      PPIs to up to 4096 vcpus. But please just don't.
      
      Whilst we're there, add a clarification about the use of KVM_IRQ_LINE
      on arm, which is not completely conditionned by KVM_CAP_IRQCHIP.
      Reported-by: NZenghui Yu <yuzenghui@huawei.com>
      Reviewed-by: NEric Auger <eric.auger@redhat.com>
      Reviewed-by: NZenghui Yu <yuzenghui@huawei.com>
      Signed-off-by: NMarc Zyngier <maz@kernel.org>
      92f35b75
  5. 29 8月, 2019 1 次提交
  6. 24 7月, 2019 1 次提交
  7. 20 7月, 2019 1 次提交
  8. 11 7月, 2019 1 次提交
    • E
      KVM: x86: PMU Event Filter · 66bb8a06
      Eric Hankland 提交于
      Some events can provide a guest with information about other guests or the
      host (e.g. L3 cache stats); providing the capability to restrict access
      to a "safe" set of events would limit the potential for the PMU to be used
      in any side channel attacks. This change introduces a new VM ioctl that
      sets an event filter. If the guest attempts to program a counter for
      any blacklisted or non-whitelisted event, the kernel counter won't be
      created, so any RDPMC/RDMSR will show 0 instances of that event.
      Signed-off-by: NEric Hankland <ehankland@google.com>
      [Lots of changes. All remaining bugs are probably mine. - Paolo]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      66bb8a06
  9. 19 6月, 2019 1 次提交
    • L
      KVM: x86: Modify struct kvm_nested_state to have explicit fields for data · 6ca00dfa
      Liran Alon 提交于
      Improve the KVM_{GET,SET}_NESTED_STATE structs by detailing the format
      of VMX nested state data in a struct.
      
      In order to avoid changing the ioctl values of
      KVM_{GET,SET}_NESTED_STATE, there is a need to preserve
      sizeof(struct kvm_nested_state). This is done by defining the data
      struct as "data.vmx[0]". It was the most elegant way I found to
      preserve struct size while still keeping struct readable and easy to
      maintain. It does have a misfortunate side-effect that now it has to be
      accessed as "data.vmx[0]" rather than just "data.vmx".
      
      Because we are already modifying these structs, I also modified the
      following:
      * Define the "format" field values as macros.
      * Rename vmcs_pa to vmcs12_pa for better readability.
      Signed-off-by: NLiran Alon <liran.alon@oracle.com>
      [Remove SVM stubs, add KVM_STATE_NESTED_VMX_VMCS12_SIZE. - Paolo]
      Reviewed-by: NLiran Alon <liran.alon@oracle.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      6ca00dfa
  10. 18 6月, 2019 1 次提交
  11. 15 6月, 2019 1 次提交
  12. 05 6月, 2019 2 次提交
  13. 08 5月, 2019 1 次提交
  14. 01 5月, 2019 3 次提交
  15. 30 4月, 2019 2 次提交
  16. 29 4月, 2019 1 次提交
  17. 24 4月, 2019 2 次提交
    • A
      KVM: arm64: Add capability to advertise ptrauth for guest · a243c16d
      Amit Daniel Kachhap 提交于
      This patch advertises the capability of two cpu feature called address
      pointer authentication and generic pointer authentication. These
      capabilities depend upon system support for pointer authentication and
      VHE mode.
      
      The current arm64 KVM partially implements pointer authentication and
      support of address/generic authentication are tied together. However,
      separate ABI requirements for both of them is added so that any future
      isolated implementation will not require any ABI changes.
      Signed-off-by: NAmit Daniel Kachhap <amit.kachhap@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Christoffer Dall <christoffer.dall@arm.com>
      Cc: kvmarm@lists.cs.columbia.edu
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      a243c16d
    • A
      KVM: arm64: Add userspace flag to enable pointer authentication · a22fa321
      Amit Daniel Kachhap 提交于
      Now that the building blocks of pointer authentication are present, lets
      add userspace flags KVM_ARM_VCPU_PTRAUTH_ADDRESS and
      KVM_ARM_VCPU_PTRAUTH_GENERIC. These flags will enable pointer
      authentication for the KVM guest on a per-vcpu basis through the ioctl
      KVM_ARM_VCPU_INIT.
      
      This features will allow the KVM guest to allow the handling of
      pointer authentication instructions or to treat them as undefined
      if not set.
      
      Necessary documentations are added to reflect the changes done.
      Reviewed-by: NDave Martin <Dave.Martin@arm.com>
      Signed-off-by: NAmit Daniel Kachhap <amit.kachhap@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Christoffer Dall <christoffer.dall@arm.com>
      Cc: kvmarm@lists.cs.columbia.edu
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      a22fa321
  18. 19 4月, 2019 4 次提交
  19. 16 4月, 2019 1 次提交
  20. 29 3月, 2019 7 次提交
    • D
      KVM: arm64/sve: Document KVM API extensions for SVE · 50036ad0
      Dave Martin 提交于
      This patch adds sections to the KVM API documentation describing
      the extensions for supporting the Scalable Vector Extension (SVE)
      in guests.
      Signed-off-by: NDave Martin <Dave.Martin@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      50036ad0
    • D
      KVM: Document errors for KVM_GET_ONE_REG and KVM_SET_ONE_REG · 395f562f
      Dave Martin 提交于
      KVM_GET_ONE_REG and KVM_SET_ONE_REG return some error codes that
      are not documented (but hopefully not surprising either).  To give
      an indication of what these may mean, this patch adds brief
      documentation.
      Signed-off-by: NDave Martin <Dave.Martin@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      395f562f
    • D
      KVM: Documentation: Document arm64 core registers in detail · fd3bc912
      Dave Martin 提交于
      Since the the sizes of individual members of the core arm64
      registers vary, the list of register encodings that make sense is
      not a simple linear sequence.
      
      To clarify which encodings to use, this patch adds a brief list
      to the documentation.
      Signed-off-by: NDave Martin <Dave.Martin@arm.com>
      Reviewed-by: NJulien Grall <julien.grall@arm.com>
      Reviewed-by: NPeter Maydell <peter.maydell@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      fd3bc912
    • P
      Documentation: kvm: clarify KVM_SET_USER_MEMORY_REGION · e2788c4a
      Paolo Bonzini 提交于
      The documentation does not mention how to delete a slot, add the
      information.
      Reported-by: NNathaniel McCallum <npmccallum@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e2788c4a
    • S
      KVM: doc: Document the life cycle of a VM and its resources · 919f6cd8
      Sean Christopherson 提交于
      The series to add memcg accounting to KVM allocations[1] states:
      
        There are many KVM kernel memory allocations which are tied to the
        life of the VM process and should be charged to the VM process's
        cgroup.
      
      While it is correct to account KVM kernel allocations to the cgroup of
      the process that created the VM, it's technically incorrect to state
      that the KVM kernel memory allocations are tied to the life of the VM
      process.  This is because the VM itself, i.e. struct kvm, is not tied to
      the life of the process which created it, rather it is tied to the life
      of its associated file descriptor.  In other words, kvm_destroy_vm() is
      not invoked until fput() decrements its associated file's refcount to
      zero.  A simple example is to fork() in Qemu and have the child sleep
      indefinitely; kvm_destroy_vm() isn't called until Qemu closes its file
      descriptor *and* the rogue child is killed.
      
      The allocations are guaranteed to be *accounted* to the process which
      created the VM, but only because KVM's per-{VM,vCPU} ioctls reject the
      ioctl() with -EIO if kvm->mm != current->mm.  I.e. the child can keep
      the VM "alive" but can't do anything useful with its reference.
      
      Note that because 'struct kvm' also holds a reference to the mm_struct
      of its owner, the above behavior also applies to userspace allocations.
      
      Given that mucking with a VM's file descriptor can lead to subtle and
      undesirable behavior, e.g. memcg charges persisting after a VM is shut
      down, explicitly document a VM's lifecycle and its impact on the VM's
      resources.
      
      Alternatively, KVM could aggressively free resources when the creating
      process exits, e.g. via mmu_notifier->release().  However, mmu_notifier
      isn't guaranteed to be available, and freeing resources when the creator
      exits is likely to be error prone and fragile as KVM would need to
      ensure that it only freed resources that are truly out of reach. In
      practice, the existing behavior shouldn't be problematic as a properly
      configured system will prevent a child process from being moved out of
      the appropriate cgroup hierarchy, i.e. prevent hiding the process from
      the OOM killer, and will prevent an unprivileged user from being able to
      to hold a reference to struct kvm via another method, e.g. debugfs.
      
      [1]https://patchwork.kernel.org/patch/10806707/Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      919f6cd8
    • S
      KVM: Reject device ioctls from processes other than the VM's creator · ddba9180
      Sean Christopherson 提交于
      KVM's API requires thats ioctls must be issued from the same process
      that created the VM.  In other words, userspace can play games with a
      VM's file descriptors, e.g. fork(), SCM_RIGHTS, etc..., but only the
      creator can do anything useful.  Explicitly reject device ioctls that
      are issued by a process other than the VM's creator, and update KVM's
      API documentation to extend its requirements to device ioctls.
      
      Fixes: 852b6d57 ("kvm: add device control API")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      ddba9180
    • S
      KVM: doc: Fix incorrect word ordering regarding supported use of APIs · 5e124900
      Sean Christopherson 提交于
      Per Paolo[1], instantiating multiple VMs in a single process is legal;
      but this conflicts with KVM's API documentation, which states:
      
        The only supported use is one virtual machine per process, and one
        vcpu per thread.
      
      However, an earlier section in the documentation states:
      
         Only run VM ioctls from the same process (address space) that was used
         to create the VM.
      
      and:
      
         Only run vcpu ioctls from the same thread that was used to create the
         vcpu.
      
      This suggests that the conflicting documentation is simply an incorrect
      ordering of of words, i.e. what's really meant is that a virtual machine
      can't be shared across multiple processes and a vCPU can't be shared
      across multiple threads.
      
      Tweak the blurb on issuing ioctls to use a more assertive tone, and
      rewrite the "supported use" sentence to reference said blurb instead of
      poorly restating it in different terms.
      
      Opportunistically add missing punctuation.
      
      [1] https://lkml.kernel.org/r/f23265d4-528e-3bd4-011f-4d7b8f3281db@redhat.com
      
      Fixes: 9c1b96e3 ("KVM: Document basic API")
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      [Improve notes on asynchronous ioctl]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      5e124900
  21. 16 3月, 2019 1 次提交
    • S
      KVM: doc: Document the life cycle of a VM and its resources · eca6be56
      Sean Christopherson 提交于
      The series to add memcg accounting to KVM allocations[1] states:
      
        There are many KVM kernel memory allocations which are tied to the
        life of the VM process and should be charged to the VM process's
        cgroup.
      
      While it is correct to account KVM kernel allocations to the cgroup of
      the process that created the VM, it's technically incorrect to state
      that the KVM kernel memory allocations are tied to the life of the VM
      process.  This is because the VM itself, i.e. struct kvm, is not tied to
      the life of the process which created it, rather it is tied to the life
      of its associated file descriptor.  In other words, kvm_destroy_vm() is
      not invoked until fput() decrements its associated file's refcount to
      zero.  A simple example is to fork() in Qemu and have the child sleep
      indefinitely; kvm_destroy_vm() isn't called until Qemu closes its file
      descriptor *and* the rogue child is killed.
      
      The allocations are guaranteed to be *accounted* to the process which
      created the VM, but only because KVM's per-{VM,vCPU} ioctls reject the
      ioctl() with -EIO if kvm->mm != current->mm.  I.e. the child can keep
      the VM "alive" but can't do anything useful with its reference.
      
      Note that because 'struct kvm' also holds a reference to the mm_struct
      of its owner, the above behavior also applies to userspace allocations.
      
      Given that mucking with a VM's file descriptor can lead to subtle and
      undesirable behavior, e.g. memcg charges persisting after a VM is shut
      down, explicitly document a VM's lifecycle and its impact on the VM's
      resources.
      
      Alternatively, KVM could aggressively free resources when the creating
      process exits, e.g. via mmu_notifier->release().  However, mmu_notifier
      isn't guaranteed to be available, and freeing resources when the creator
      exits is likely to be error prone and fragile as KVM would need to
      ensure that it only freed resources that are truly out of reach. In
      practice, the existing behavior shouldn't be problematic as a properly
      configured system will prevent a child process from being moved out of
      the appropriate cgroup hierarchy, i.e. prevent hiding the process from
      the OOM killer, and will prevent an unprivileged user from being able to
      to hold a reference to struct kvm via another method, e.g. debugfs.
      
      [1]https://patchwork.kernel.org/patch/10806707/Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      eca6be56
  22. 15 12月, 2018 1 次提交
    • V
      x86/kvm/hyper-v: Introduce KVM_GET_SUPPORTED_HV_CPUID · 2bc39970
      Vitaly Kuznetsov 提交于
      With every new Hyper-V Enlightenment we implement we're forced to add a
      KVM_CAP_HYPERV_* capability. While this approach works it is fairly
      inconvenient: the majority of the enlightenments we do have corresponding
      CPUID feature bit(s) and userspace has to know this anyways to be able to
      expose the feature to the guest.
      
      Add KVM_GET_SUPPORTED_HV_CPUID ioctl (backed by KVM_CAP_HYPERV_CPUID, "one
      cap to rule them all!") returning all Hyper-V CPUID feature leaves.
      
      Using the existing KVM_GET_SUPPORTED_CPUID doesn't seem to be possible:
      Hyper-V CPUID feature leaves intersect with KVM's (e.g. 0x40000000,
      0x40000001) and we would probably confuse userspace in case we decide to
      return these twice.
      
      KVM_CAP_HYPERV_CPUID's number is interim: we're intended to drop
      KVM_CAP_HYPERV_STIMER_DIRECT and use its number instead.
      Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      2bc39970
  23. 14 12月, 2018 2 次提交
    • P
      kvm: introduce manual dirty log reprotect · 2a31b9db
      Paolo Bonzini 提交于
      There are two problems with KVM_GET_DIRTY_LOG.  First, and less important,
      it can take kvm->mmu_lock for an extended period of time.  Second, its user
      can actually see many false positives in some cases.  The latter is due
      to a benign race like this:
      
        1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects
           them.
        2. The guest modifies the pages, causing them to be marked ditry.
        3. Userspace actually copies the pages.
        4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though
           they were not written to since (3).
      
      This is especially a problem for large guests, where the time between
      (1) and (3) can be substantial.  This patch introduces a new
      capability which, when enabled, makes KVM_GET_DIRTY_LOG not
      write-protect the pages it returns.  Instead, userspace has to
      explicitly clear the dirty log bits just before using the content
      of the page.  The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a
      64-page granularity rather than requiring to sync a full memslot;
      this way, the mmu_lock is taken for small amounts of time, and
      only a small amount of time will pass between write protection
      of pages and the sending of their content.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      2a31b9db
    • P
      kvm: make KVM_CAP_ENABLE_CAP_VM architecture agnostic · e5d83c74
      Paolo Bonzini 提交于
      The first such capability to be handled in virt/kvm/ will be manual
      dirty page reprotection.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      e5d83c74
  24. 18 10月, 2018 2 次提交
    • J
      kvm: x86: Introduce KVM_CAP_EXCEPTION_PAYLOAD · c4f55198
      Jim Mattson 提交于
      This is a per-VM capability which can be enabled by userspace so that
      the faulting linear address will be included with the information
      about a pending #PF in L2, and the "new DR6 bits" will be included
      with the information about a pending #DB in L2. With this capability
      enabled, the L1 hypervisor can now intercept #PF before CR2 is
      modified. Under VMX, the L1 hypervisor can now intercept #DB before
      DR6 and DR7 are modified.
      
      When userspace has enabled KVM_CAP_EXCEPTION_PAYLOAD, it should
      generally provide an appropriate payload when injecting a #PF or #DB
      exception via KVM_SET_VCPU_EVENTS. However, to support restoring old
      checkpoints, this payload is not required.
      
      Note that bit 16 of the "new DR6 bits" is set to indicate that a debug
      exception (#DB) or a breakpoint exception (#BP) occurred inside an RTM
      region while advanced debugging of RTM transactional regions was
      enabled. This is the reverse of DR6.RTM, which is cleared in this
      scenario.
      
      This capability also enables exception.pending in struct
      kvm_vcpu_events, which allows userspace to distinguish between pending
      and injected exceptions.
      Reported-by: NJim Mattson <jmattson@google.com>
      Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      c4f55198
    • J
      kvm: x86: Add exception payload fields to kvm_vcpu_events · 59073aaf
      Jim Mattson 提交于
      The per-VM capability KVM_CAP_EXCEPTION_PAYLOAD (to be introduced in a
      later commit) adds the following fields to struct kvm_vcpu_events:
      exception_has_payload, exception_payload, and exception.pending.
      
      With this capability set, all of the details of vcpu->arch.exception,
      including the payload for a pending exception, are reported to
      userspace in response to KVM_GET_VCPU_EVENTS.
      
      With this capability clear, the original ABI is preserved, and the
      exception.injected field is set for either pending or injected
      exceptions.
      
      When userspace calls KVM_SET_VCPU_EVENTS with
      KVM_CAP_EXCEPTION_PAYLOAD clear, exception.injected is no longer
      translated to exception.pending. KVM_SET_VCPU_EVENTS can now only
      establish a pending exception when KVM_CAP_EXCEPTION_PAYLOAD is set.
      Reported-by: NJim Mattson <jmattson@google.com>
      Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      59073aaf