1. 24 Mar 2018, 3 commits
  2. 21 Mar 2018, 1 commit
    • KVM: nVMX: Do not load EOI-exitmap while running L2 · e40ff1d6
      Liran Alon authored
      When the L1 IOAPIC redirection table is written, a
      KVM_REQ_SCAN_IOAPIC request is set on all vCPUs, so that every
      vCPU recalculates its set of IOAPIC-handled vectors and loads it
      into its EOI-exitmap.
      
      However, one of the vCPUs may currently be running L2. In that
      case, load_eoi_exitmap() is called and writes to
      vmcs02->eoi_exit_bitmap, which is wrong because
      vmcs02->eoi_exit_bitmap should always equal
      vmcs12->eoi_exit_bitmap. Furthermore, at this point
      KVM_REQ_SCAN_IOAPIC has already been consumed, so
      vmcs01->eoi_exit_bitmap is never updated. This can leave the
      remote_irr of an IOAPIC level-triggered entry set forever.
      
      Fix this issue by delaying the load of the EOI-exitmap until the
      vCPU is running L1.
      
      One may wonder why the entire KVM_REQ_SCAN_IOAPIC processing is
      not simply delayed until the vCPU runs L1. The split is needed to
      correctly handle the case where L1's LAPIC & IO-APIC are passed
      through into L2. In that case, vmcs12->virtual_interrupt_delivery
      should be 0; in the current nVMX implementation this makes
      vmcs02->virtual_interrupt_delivery 0 as well, so
      vmcs02->eoi_exit_bitmap is not used. Every L2 EOI therefore causes
      a #VMExit into L0 (either an MSR_WRITE to an x2APIC MSR or an
      APIC_ACCESS/APIC_WRITE/EPT_MISCONFIG on the APIC MMIO page). For
      such an L2 EOI to be broadcast, if needed, from the LAPIC to the
      IO-APIC, vcpu->arch.ioapic_handled_vectors must be kept up to date
      while L2 is running. The patch therefore delays only the loading
      of the EOI-exitmap, not the update of
      vcpu->arch.ioapic_handled_vectors; see the sketch after the
      sign-offs below.
      Reviewed-by: Arbel Moshe <arbel.moshe@oracle.com>
      Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
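      
      A minimal sketch of the resulting split, with names modeled on the
      commit description above (illustrative, not the verbatim patch):
      
        /* Sketch: defer the EOI-exitmap load while the vCPU runs L2. */
        static void vcpu_load_eoi_exitmap(struct kvm_vcpu *vcpu)
        {
                if (is_guest_mode(vcpu)) {
                        /* Replay the load on the next L1 entry. */
                        kvm_make_request(KVM_REQ_LOAD_EOI_EXITMAP, vcpu);
                        return;
                }
                kvm_x86_ops->load_eoi_exitmap(vcpu,
                                (u64 *)vcpu->arch.ioapic_handled_vectors);
        }
        
        /* The vector rescan itself always runs, even while in L2. */
        static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
        {
                /* ... scan of the IOAPIC redirection table elided ... */
                kvm_scan_ioapic_routes(vcpu, vcpu->arch.ioapic_handled_vectors);
                vcpu_load_eoi_exitmap(vcpu);
        }
      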
  3. 17 Mar 2018, 11 commits
  4. 07 Mar 2018, 3 commits
    • KVM: Introduce paravirtualization hints and KVM_HINTS_DEDICATED · a4429e53
      Wanpeng Li authored
      This patch introduces kvm_para_has_hint() to query hints about the
      guest's configuration.  The first hint, KVM_HINTS_DEDICATED, is
      set if the guest has a dedicated physical CPU for each vCPU (i.e.
      pinning and no over-commitment).  This allows optimizing spinlocks
      and tells the guest to avoid the PV TLB flush; see the sketch
      after the sign-offs below.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
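      
      A guest-side sketch of consuming the hint (kvm_para_has_hint() and
      KVM_HINTS_DEDICATED are the interfaces this patch adds; the
      wrapper function is illustrative):
      
        #include <linux/kvm_para.h>
        
        /* True when every vCPU is pinned to its own physical CPU. */
        static bool __init guest_cpus_dedicated(void)
        {
                /* The hint is exposed via the KVM CPUID feature leaves. */
                return kvm_para_available() &&
                       kvm_para_has_hint(KVM_HINTS_DEDICATED);
        }
      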
    • KVM: x86: KVM_CAP_SYNC_REGS · 01643c51
      Ken Hofsass authored
      This commit implements an enhanced x86 version of the S390
      KVM_CAP_SYNC_REGS functionality. KVM_CAP_SYNC_REGS "allow[s]
      userspace to access certain guest registers without having
      to call SET/GET_*REGS". This reduces ioctl overhead, which
      is particularly important when userspace is making synchronous
      guest state modifications (e.g. when emulating and/or intercepting
      instructions).
      
      Originally implemented upstream for S390, with the following x86
      differences:
      - userspace selects the register sets to be synchronized with
      kvm_run using bit-flags in the kvm_valid_regs and kvm_dirty_regs
      fields.
      - vcpu_events is available in addition to the regs and sregs
      register sets.
      A usage sketch follows the sign-offs below.
      Signed-off-by: Ken Hofsass <hofsass@google.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      [Removed wrapper around check for reserved kvm_valid_regs. - Radim]
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
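      
      A userspace sketch of the intended flow (vcpu_fd, the mmap'ed
      kvm_run and insn_len are assumed to exist; error handling
      omitted):
      
        #include <linux/kvm.h>
        #include <sys/ioctl.h>
        
        static void step_over_insn(int vcpu_fd, struct kvm_run *run,
                                   int insn_len)
        {
                /* Ask KVM to mirror the GPRs into kvm_run on each exit. */
                run->kvm_valid_regs = KVM_SYNC_X86_REGS;
                ioctl(vcpu_fd, KVM_RUN, 0);
        
                /* Patch guest state in place, no KVM_GET/SET_REGS pair. */
                run->s.regs.regs.rip += insn_len;
        
                /* Mark the set dirty so KVM loads it on the next entry. */
                run->kvm_dirty_regs = KVM_SYNC_X86_REGS;
                ioctl(vcpu_fd, KVM_RUN, 0);
        }
      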
    • kvm: x86: hyperv: guest->host event signaling via eventfd · faeb7833
      Roman Kagan authored
      In Hyper-V, the fast guest->host notification mechanism is the
      SIGNAL_EVENT hypercall, whose single parameter is the connection
      ID to signal.
      
      Currently this hypercall incurs a user exit and requires userspace
      to decode the parameter and trigger the notification in the
      potentially different I/O context.
      
      To avoid the costly user exit, process this hypercall and signal
      the corresponding eventfd in KVM, similar to ioeventfd.  The
      association between the connection ID and the eventfd is
      established via the newly introduced KVM_HYPERV_EVENTFD ioctl
      (sketched after the sign-offs below) and maintained in an
      (SRCU-protected) IDR.
      Signed-off-by: Roman Kagan <rkagan@virtuozzo.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      [asm/hyperv.h changes approved by KY Srinivasan. - Radim]
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
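      
      Userspace would bind a connection ID to an eventfd roughly as
      follows (a sketch; vm_fd and evt_fd are assumed to exist):
      
        #include <linux/kvm.h>
        #include <sys/ioctl.h>
        
        static int bind_hv_conn(int vm_fd, int evt_fd, __u32 conn_id)
        {
                struct kvm_hyperv_eventfd hvevfd = {
                        .conn_id = conn_id, /* ID passed to SIGNAL_EVENT */
                        .fd      = evt_fd,  /* signaled instead of a user exit */
                        /* .flags = KVM_HYPERV_EVENTFD_DEASSIGN unbinds */
                };
        
                return ioctl(vm_fd, KVM_HYPERV_EVENTFD, &hvevfd);
        }
      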
  5. 02 Mar 2018, 2 commits
  6. 01 Mar 2018, 1 commit
    • x86/cpu_entry_area: Sync cpu_entry_area to initial_page_table · 945fd17a
      Thomas Gleixner authored
      The separation of the cpu_entry_area from the fixmap missed the
      fact that on 32-bit non-PAE kernels the cpu_entry_area mapping
      might not be covered in initial_page_table by the previous
      synchronizations.
      
      This results in suspend/resume failures because 32-bit resume uses
      the initial page table. The absence of the cpu_entry_area mapping
      results in a triple fault, i.e. an instant reboot.
      
      With PAE enabled this works by chance because the PGD entry which
      covers the fixmap and other parts incidentally provides the
      cpu_entry_area mapping as well.
      
      Synchronize the initial page table after setting up the cpu entry
      area. Instead of adding yet another copy of the same code, move it
      to a function and invoke it from the various places; see the
      sketch after the tags below.
      
      It needs to be investigated whether the existing calls in
      setup_arch() and setup_per_cpu_areas() can be replaced by the
      later invocation from setup_cpu_entry_areas(), but that is beyond
      the scope of this fix.
      
      Fixes: 92a0f81d ("x86/cpu_entry_area: Move it out of the fixmap")
      Reported-by: Woody Suwalski <terraluna977@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Woody Suwalski <terraluna977@gmail.com>
      Cc: William Grant <william.grant@canonical.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1802282137290.1392@nanos.tec.linutronix.de
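      
      The shape of the fix, sketched with the existing 32-bit helpers
      (per the description, the real function also has to cover the
      !PAE user mappings):
      
        void __init sync_initial_page_table(void)
        {
                /*
                 * Copy the kernel half of swapper_pg_dir, which now
                 * covers the cpu_entry_area, into initial_page_table.
                 */
                clone_pgd_range(initial_page_table + KERNEL_PGD_BOUNDARY,
                                swapper_pg_dir    + KERNEL_PGD_BOUNDARY,
                                KERNEL_PGD_PTRS);
        }
      
      Per the description above, it is invoked from setup_arch(),
      setup_per_cpu_areas() and setup_cpu_entry_areas() rather than
      keeping three open-coded copies.
      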
  7. 28 Feb 2018, 2 commits
  8. 24 Feb 2018, 2 commits
  9. 23 Feb 2018, 1 commit
    • bpf, x64: implement retpoline for tail call · a493a87f
      Daniel Borkmann authored
      Implement a retpoline [0] for the BPF tail-call JITing that
      converts the indirect jump via jmp %rax used to make the long jump
      into another JITed BPF image. Since this is subject to speculative
      execution, we need to control the transient instruction sequence
      here as well when CONFIG_RETPOLINE is set, and direct it into a
      pause + lfence loop. The latter also aligns with what gcc/clang
      emit (e.g. [1]). An annotated sketch of the construct follows the
      sign-offs at the end of this entry.
      
      JIT dump after patch:
      
        # bpftool p d x i 1
         0: (18) r2 = map[id:1]
         2: (b7) r3 = 0
         3: (85) call bpf_tail_call#12
         4: (b7) r0 = 2
         5: (95) exit
      
      With CONFIG_RETPOLINE:
      
        # bpftool p d j i 1
        [...]
        33:	cmp    %edx,0x24(%rsi)
        36:	jbe    0x0000000000000072  |*
        38:	mov    0x24(%rbp),%eax
        3e:	cmp    $0x20,%eax
        41:	ja     0x0000000000000072  |
        43:	add    $0x1,%eax
        46:	mov    %eax,0x24(%rbp)
        4c:	mov    0x90(%rsi,%rdx,8),%rax
        54:	test   %rax,%rax
        57:	je     0x0000000000000072  |
        59:	mov    0x28(%rax),%rax
        5d:	add    $0x25,%rax
        61:	callq  0x000000000000006d  |+
        66:	pause                      |
        68:	lfence                     |
        6b:	jmp    0x0000000000000066  |
        6d:	mov    %rax,(%rsp)         |
        71:	retq                       |
        72:	mov    $0x2,%eax
        [...]
      
        * relative fall-through jumps in error case
        + retpoline for indirect jump
      
      Without CONFIG_RETPOLINE:
      
        # bpftool p d j i 1
        [...]
        33:	cmp    %edx,0x24(%rsi)
        36:	jbe    0x0000000000000063  |*
        38:	mov    0x24(%rbp),%eax
        3e:	cmp    $0x20,%eax
        41:	ja     0x0000000000000063  |
        43:	add    $0x1,%eax
        46:	mov    %eax,0x24(%rbp)
        4c:	mov    0x90(%rsi,%rdx,8),%rax
        54:	test   %rax,%rax
        57:	je     0x0000000000000063  |
        59:	mov    0x28(%rax),%rax
        5d:	add    $0x25,%rax
        61:	jmpq   *%rax               |-
        63:	mov    $0x2,%eax
        [...]
      
        * relative fall-through jumps in error case
        - plain indirect jump as before
      
        [0] https://support.google.com/faqs/answer/7625886
        [1] https://github.com/gcc-mirror/gcc/commit/a31e654fa107be968b802786d747e962c2fcdb2b
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
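      
      An annotated sketch of the retpoline construct emitted at offsets
      61..71 in the dump above (inline-asm form for illustration; %rax
      must already hold the jump target):
      
        asm volatile("call   1f\n\t"        /* pushes the address of 2: */
                     "2: pause\n\t"         /* speculation lands here ..*/
                     "lfence\n\t"           /* ... and is serialized    */
                     "jmp    2b\n\t"        /* trapped in the loop      */
                     "1: mov %%rax, (%%rsp)\n\t" /* replace return addr */
                     "ret\n\t"              /* arch. jump to *%rax      */
                     ::: "memory");
      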
  10. 21 Feb 2018, 3 commits
  11. 20 Feb 2018, 6 commits
  12. 17 Feb 2018, 3 commits
  13. 15 Feb 2018, 2 commits