1. 01 6月, 2020 12 次提交
  2. 28 5月, 2020 23 次提交
  3. 20 5月, 2020 2 次提交
  4. 19 5月, 2020 3 次提交
    • T
      x86/kvm: Restrict ASYNC_PF to user space · 3a7c8faf
      Thomas Gleixner 提交于
      The async page fault injection into kernel space creates more problems than
      it solves. The host has absolutely no knowledge about the state of the
      guest if the fault happens in CPL0. The only restriction for the host is
      interrupt disabled state. If interrupts are enabled in the guest then the
      exception can hit arbitrary code. The HALT based wait in non-preemotible
      code is a hacky replacement for a proper hypercall.
      
      For the ongoing work to restrict instrumentation and make the RCU idle
      interaction well defined the required extra work for supporting async
      pagefault in CPL0 is just not justified and creates complexity for a
      dubious benefit.
      
      The CPL3 injection is well defined and does not cause any issues as it is
      more or less the same as a regular page fault from CPL3.
      Suggested-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NAlexandre Chartre <alexandre.chartre@oracle.com>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200505134059.369802541@linutronix.de
      
      3a7c8faf
    • T
      x86/kvm: Sanitize kvm_async_pf_task_wait() · 6bca69ad
      Thomas Gleixner 提交于
      While working on the entry consolidation I stumbled over the KVM async page
      fault handler and kvm_async_pf_task_wait() in particular. It took me a
      while to realize that the randomly sprinkled around rcu_irq_enter()/exit()
      invocations are just cargo cult programming. Several patches "fixed" RCU
      splats by curing the symptoms without noticing that the code is flawed 
      from a design perspective.
      
      The main problem is that this async injection is not based on a proper
      handshake mechanism and only respects the minimal requirement, i.e. the
      guest is not in a state where it has interrupts disabled.
      
      Aside of that the actual code is a convoluted one fits it all swiss army
      knife. It is invoked from different places with different RCU constraints:
      
        1) Host side:
      
           vcpu_enter_guest()
             kvm_x86_ops->handle_exit()
               kvm_handle_page_fault()
                 kvm_async_pf_task_wait()
      
           The invocation happens from fully preemptible context.
      
        2) Guest side:
      
           The async page fault interrupted:
      
               a) user space
      
      	 b) preemptible kernel code which is not in a RCU read side
      	    critical section
      
           	 c) non-preemtible kernel code or a RCU read side critical section
      	    or kernel code with CONFIG_PREEMPTION=n which allows not to
      	    differentiate between #2b and #2c.
      
      RCU is watching for:
      
        #1  The vCPU exited and current is definitely not the idle task
      
        #2a The #PF entry code on the guest went through enter_from_user_mode()
            which reactivates RCU
      
        #2b There is no preemptible, interrupts enabled code in the kernel
            which can run with RCU looking away. (The idle task is always
            non preemptible).
      
      I.e. all schedulable states (#1, #2a, #2b) do not need any of this RCU
      voodoo at all.
      
      In #2c RCU is eventually not watching, but as that state cannot schedule
      anyway there is no point to worry about it so it has to invoke
      rcu_irq_enter() before running that code. This can be optimized, but this
      will be done as an extra step in course of the entry code consolidation
      work.
      
      So the proper solution for this is to:
      
        - Split kvm_async_pf_task_wait() into schedule and halt based waiting
          interfaces which share the enqueueing code.
      
        - Add comments (condensed form of this changelog) to spare others the
          time waste and pain of reverse engineering all of this with the help of
          uncomprehensible changelogs and code history.
      
        - Invoke kvm_async_pf_task_wait_schedule() from kvm_handle_page_fault(),
          user mode and schedulable kernel side async page faults (#1, #2a, #2b)
      
        - Invoke kvm_async_pf_task_wait_halt() for the non schedulable kernel
          case (#2c).
      
          For this case also remove the rcu_irq_exit()/enter() pair around the
          halt as it is just a pointless exercise:
      
             - vCPUs can VMEXIT at any random point and can be scheduled out for
               an arbitrary amount of time by the host and this is not any
               different except that it voluntary triggers the exit via halt.
      
             - The interrupted context could have RCU watching already. So the
      	 rcu_irq_exit() before the halt is not gaining anything aside of
      	 confusing the reader. Claiming that this might prevent RCU stalls
      	 is just an illusion.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NAlexandre Chartre <alexandre.chartre@oracle.com>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200505134059.262701431@linutronix.de
      
      6bca69ad
    • A
      x86/kvm: Handle async page faults directly through do_page_fault() · ef68017e
      Andy Lutomirski 提交于
      KVM overloads #PF to indicate two types of not-actually-page-fault
      events.  Right now, the KVM guest code intercepts them by modifying
      the IDT and hooking the #PF vector.  This makes the already fragile
      fault code even harder to understand, and it also pollutes call
      traces with async_page_fault and do_async_page_fault for normal page
      faults.
      
      Clean it up by moving the logic into do_page_fault() using a static
      branch.  This gets rid of the platform trap_init override mechanism
      completely.
      
      [ tglx: Fixed up 32bit, removed error code from the async functions and
        	massaged coding style ]
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NAlexandre Chartre <alexandre.chartre@oracle.com>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20200505134059.169270470@linutronix.de
      
      ef68017e