1. 01 Jun, 2020 — 2 commits
    • KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info · 68fd66f1
      By Vitaly Kuznetsov
      Currently, APF mechanism relies on the #PF abuse where the token is being
      passed through CR2. If we switch to using interrupts to deliver page-ready
      notifications we need a different way to pass the data. Extend the existing
      'struct kvm_vcpu_pv_apf_data' with token information for page-ready
      notifications.
      
      While at it, rename 'reason' to 'flags'. This doesn't change the semantics,
      as we only have reasons '1' and '2' and these can be treated as bit flags,
      but KVM_PV_REASON_PAGE_READY is going away with interrupt-based delivery,
      making the name 'reason' misleading.
      
      The newly introduced apf_put_user_ready() temporarily puts both flags and
      token information; this will be changed to put the token only when we
      switch to interrupt-based notifications.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200525144125.143875-3-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: MMU: pass arbitrary CR0/CR4/EFER to kvm_init_shadow_mmu · 929d1cfa
      By Paolo Bonzini
      This allows fetching the registers from the hsave area when setting
      up the NPT shadow MMU, and is needed for KVM_SET_NESTED_STATE (which
      runs long after the CR0, CR4 and EFER values in vcpu have been switched
      to hold L2 guest state).
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 28 May, 2020 — 3 commits
  3. 19 May, 2020 — 2 commits
    • x86/kvm: Sanitize kvm_async_pf_task_wait() · 6bca69ad
      By Thomas Gleixner
      While working on the entry consolidation I stumbled over the KVM async page
      fault handler and kvm_async_pf_task_wait() in particular. It took me a
      while to realize that the randomly sprinkled-around rcu_irq_enter()/exit()
      invocations are just cargo-cult programming. Several patches "fixed" RCU
      splats by curing the symptoms without noticing that the code is flawed
      from a design perspective.
      
      The main problem is that this async injection is not based on a proper
      handshake mechanism and only respects the minimal requirement, i.e. the
      guest is not in a state where it has interrupts disabled.
      
      Aside from that, the actual code is a convoluted one-fits-it-all Swiss army
      knife. It is invoked from different places with different RCU constraints:
      
        1) Host side:
      
           vcpu_enter_guest()
             kvm_x86_ops->handle_exit()
               kvm_handle_page_fault()
                 kvm_async_pf_task_wait()
      
           The invocation happens from fully preemptible context.
      
        2) Guest side:
      
           The async page fault interrupted:
      
               a) user space
      
           b) preemptible kernel code which is not in an RCU read-side
              critical section

           c) non-preemptible kernel code or an RCU read-side critical section
              or kernel code with CONFIG_PREEMPTION=n, which allows us not to
              differentiate between #2b and #2c.
      
      RCU is watching for:
      
        #1  The vCPU exited and current is definitely not the idle task
      
        #2a The #PF entry code on the guest went through enter_from_user_mode()
            which reactivates RCU
      
        #2b There is no preemptible, interrupts enabled code in the kernel
            which can run with RCU looking away. (The idle task is always
            non preemptible).
      
      I.e. all schedulable states (#1, #2a, #2b) do not need any of this RCU
      voodoo at all.
      
      In #2c RCU is possibly not watching, but as that state cannot schedule
      anyway there is no point in worrying about it, so it has to invoke
      rcu_irq_enter() before running that code. This can be optimized, but this
      will be done as an extra step in the course of the entry code consolidation
      work.
      
      So the proper solution for this is to:
      
        - Split kvm_async_pf_task_wait() into schedule and halt based waiting
          interfaces which share the enqueueing code.
      
        - Add comments (condensed form of this changelog) to spare others the
          time waste and pain of reverse engineering all of this with the help of
          incomprehensible changelogs and code history.
      
        - Invoke kvm_async_pf_task_wait_schedule() from kvm_handle_page_fault(),
          user mode and schedulable kernel side async page faults (#1, #2a, #2b)
      
        - Invoke kvm_async_pf_task_wait_halt() for the non-schedulable kernel
          case (#2c).
      
          For this case also remove the rcu_irq_exit()/enter() pair around the
          halt as it is just a pointless exercise:
      
             - vCPUs can VMEXIT at any random point and can be scheduled out for
               an arbitrary amount of time by the host, and this is not any
               different except that it voluntarily triggers the exit via halt.
      
             - The interrupted context could have RCU watching already. So the
               rcu_irq_exit() before the halt is not gaining anything aside from
               confusing the reader. Claiming that this might prevent RCU stalls
               is just an illusion.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200505134059.262701431@linutronix.de
      
    • KVM: x86: only do L1TF workaround on affected processors · d43e2675
      By Paolo Bonzini
      KVM stores the gfn in MMIO SPTEs as a caching optimization.  These are split
      in two parts, as in "[high 11111 low]", to thwart any attempt to use these bits
      in an L1TF attack.  This works as long as there are 5 free bits between
      MAXPHYADDR and bit 50 (inclusive), leaving bit 51 free so that the MMIO
      access triggers a reserved-bit-set page fault.
      
      The bit positions however were computed wrongly for AMD processors that have
      encryption support.  In this case, x86_phys_bits is reduced (for example
      from 48 to 43, to account for the C bit at position 47 and four bits used
      internally to store the SEV ASID and other stuff) while x86_cache_bits
      would remain set to 48, and _all_ bits between the reduced MAXPHYADDR
      and bit 51 are set.  Then low_phys_bits would also cover some of the
      bits that are set in the shadow_mmio_value, terribly confusing the gfn
      caching mechanism.
      
      To fix this, avoid splitting gfns as long as the processor does not have
      the L1TF bug (which includes all AMD processors).  When there is no
      splitting, low_phys_bits can be set to the reduced MAXPHYADDR removing
      the overlap.  This fixes "npt=0" operation on EPYC processors.
      
      Thanks to Maxim Levitsky for bisecting this bug.
      
      Cc: stable@vger.kernel.org
      Fixes: 52918ed5 ("KVM: SVM: Override default MMIO mask if memory encryption is enabled")
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. 16 May, 2020 — 3 commits
  5. 14 May, 2020 — 1 commit
  6. 21 Apr, 2020 — 13 commits
  7. 31 Mar, 2020 — 1 commit
  8. 17 Mar, 2020 — 15 commits