1. 01 2月, 2018 3 次提交
    • V
      x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested · d391f120
      Vitaly Kuznetsov 提交于
      I was investigating an issue with seabios >= 1.10 which stopped working
      for nested KVM on Hyper-V. The problem appears to be in
      handle_ept_violation() function: when we do fast mmio we need to skip
      the instruction so we do kvm_skip_emulated_instruction(). This, however,
      depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
      However, this is not the case.
      
      Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
      EPT MISCONFIG occurs. While on real hardware it was observed to be set,
      some hypervisors follow the spec and don't set it; we end up advancing
      IP with some random value.
      
      I checked with Microsoft and they confirmed they don't fill
      VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
      
      Fix the issue by doing instruction skip through emulator when running
      nested.
      
      Fixes: 68c3b4d1Suggested-by: NRadim Krčmář <rkrcmar@redhat.com>
      Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      d391f120
    • M
      kvm: embed vcpu id to dentry of vcpu anon inode · e46b4692
      Masatake YAMATO 提交于
      All d-entries for vcpu have the same, "anon_inode:kvm-vcpu". That means
      it is impossible to know the mapping between fds for vcpu and vcpu
      from userland.
      
          # LC_ALL=C ls -l /proc/617/fd | grep vcpu
          lrwx------. 1 qemu qemu 64 Jan  7 16:50 18 -> anon_inode:kvm-vcpu
          lrwx------. 1 qemu qemu 64 Jan  7 16:50 19 -> anon_inode:kvm-vcpu
      
      It is also impossible to know the mapping between vma for kvm_run
      structure and vcpu from userland.
      
          # LC_ALL=C grep vcpu /proc/617/maps
          7f9d842d0000-7f9d842d3000 rw-s 00000000 00:0d 20393                      anon_inode:kvm-vcpu
          7f9d842d3000-7f9d842d6000 rw-s 00000000 00:0d 20393                      anon_inode:kvm-vcpu
      
      This change adds vcpu id to d-entries for vcpu. With this change
      you can get the following output:
      
          # LC_ALL=C ls -l /proc/617/fd | grep vcpu
          lrwx------. 1 qemu qemu 64 Jan  7 16:50 18 -> anon_inode:kvm-vcpu:0
          lrwx------. 1 qemu qemu 64 Jan  7 16:50 19 -> anon_inode:kvm-vcpu:1
      
          # LC_ALL=C grep vcpu /proc/617/maps
          7f9d842d0000-7f9d842d3000 rw-s 00000000 00:0d 20393                      anon_inode:kvm-vcpu:0
          7f9d842d3000-7f9d842d6000 rw-s 00000000 00:0d 20393                      anon_inode:kvm-vcpu:1
      
      With the mappings known from the output, a tool like strace can report more details
      of qemu-kvm process activities. Here is the strace output of my local prototype:
      
          # ./strace -KK -f -p 617 2>&1 | grep 'KVM_RUN\| K'
          ...
          [pid   664] ioctl(18, KVM_RUN, 0)       = 0 (KVM_EXIT_MMIO)
           K ready_for_interrupt_injection=1, if_flag=0, flags=0, cr8=0000000000000000, apic_base=0x000000fee00d00
           K phys_addr=0, len=1634035803, [33, 0, 0, 0, 0, 0, 0, 0], is_write=112
          [pid   664] ioctl(18, KVM_RUN, 0)       = 0 (KVM_EXIT_MMIO)
           K ready_for_interrupt_injection=1, if_flag=1, flags=0, cr8=0000000000000000, apic_base=0x000000fee00d00
           K phys_addr=0, len=1634035803, [33, 0, 0, 0, 0, 0, 0, 0], is_write=112
          ...
      Signed-off-by: NMasatake YAMATO <yamato@redhat.com>
      Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      e46b4692
    • K
      kvm: Map PFN-type memory regions as writable (if possible) · a340b3e2
      KarimAllah Ahmed 提交于
      For EPT-violations that are triggered by a read, the pages are also mapped with
      write permissions (if their memory region is also writable). That would avoid
      getting yet another fault on the same page when a write occurs.
      
      This optimization only happens when you have a "struct page" backing the memory
      region. So also enable it for memory regions that do not have a "struct page".
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NKarimAllah Ahmed <karahmed@amazon.de>
      Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      a340b3e2
  2. 31 1月, 2018 9 次提交
  3. 26 1月, 2018 12 次提交
  4. 25 1月, 2018 4 次提交
  5. 24 1月, 2018 5 次提交
  6. 23 1月, 2018 3 次提交
  7. 17 1月, 2018 1 次提交
  8. 16 1月, 2018 3 次提交