1. 06 Aug, 2018 7 commits
  2. 25 May, 2018 1 commit
  3. 15 May, 2018 3 commits
  4. 25 Apr, 2018 1 commit
    •
      signal: Ensure every siginfo we send has all bits initialized · 3eb0f519
      Eric W. Biederman authored
      Call clear_siginfo to ensure every stack allocated siginfo is properly
      initialized before being passed to the signal sending functions.
      
      Note: It is not safe to depend on C initializers to initialize struct
      siginfo on the stack because C is allowed to skip holes when
      initializing a structure.
      
      The initialization of struct siginfo in tracehook_report_syscall_exit
      was moved from the helper user_single_step_siginfo into
      tracehook_report_syscall_exit itself, to make it clear that the local
      variable siginfo gets fully initialized.
      
      In a few cases the scope of struct siginfo has been reduced to make it
      clear that siginfo is not used on other paths in the function
      in which it is declared.
      
      Instances of using memset to initialize siginfo have been replaced
      with calls to clear_siginfo for clarity.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      3eb0f519
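The padding problem described in the commit above can be illustrated in user space. This is a minimal sketch, not kernel code: `struct demo_info` and the `demo_info_*` helpers are hypothetical stand-ins for `struct siginfo` and `clear_siginfo`, and the check assumes a typical ABI where padding follows the `char` member. Strictly speaking the C standard leaves padding unspecified after a member store; the sketch relies on the common GCC/Clang behavior of leaving cleared padding untouched.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct siginfo: padding usually follows 'code'. */
struct demo_info {
	char code;
	long value;
};

/* Analogue of clear_siginfo(): zero every byte, padding included. */
static void demo_info_clear(struct demo_info *info)
{
	memset(info, 0, sizeof(*info));
}

/* Clear the whole object first, then fill in the individual fields. */
static void demo_info_fill(struct demo_info *info, char code, long value)
{
	demo_info_clear(info);
	info->code = code;
	info->value = value;
}

/* Count nonzero bytes in the hole between 'code' and 'value'. */
static size_t demo_info_dirty_padding(const struct demo_info *info)
{
	const unsigned char *p = (const unsigned char *)info;
	size_t dirty = 0;

	for (size_t i = sizeof(info->code); i < offsetof(struct demo_info, value); i++)
		if (p[i])
			dirty++;
	return dirty;
}
```

An initializer such as `struct demo_info info = { .code = 1 };` sets the named members but is not required to write the hole, which is why the explicit clear comes first.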
  5. 05 Apr, 2018 1 commit
    •
      Revert "KVM: X86: Fix SMRAM accessing even if VM is shutdown" · 2c151b25
      Sean Christopherson authored
      The bug that led to commit 95e057e2
      was a benign warning (no adverse effects other than the warning
      itself) that was detected by syzkaller.  Further inspection shows
      that the WARN_ON in question, in handle_ept_misconfig(), is
      unnecessary and flawed (this was also briefly discussed in the
      original patch: https://patchwork.kernel.org/patch/10204649).
      
        * The WARN_ON is unnecessary as kvm_mmu_page_fault() will WARN
          if reserved bits are set in the SPTEs, i.e. it covers the case
          where an EPT misconfig occurred because of a KVM bug.
      
        * The WARN_ON is flawed because it will fire on any system error
          code that is hit while handling the fault, e.g. -ENOMEM can be
          returned by mmu_topup_memory_caches() while handling a legitimate
          MMIO EPT misconfig.
      
      The original behavior of returning -EFAULT when userspace munmaps
      an HVA without first removing the memslot is correct and desirable,
      i.e. KVM is letting userspace know it has generated a bad address.
      Returning RET_PF_EMULATE masks the WARN_ON in the EPT misconfig path,
      but does not fix the underlying bug, i.e. the WARN_ON is bogus.
      
      Furthermore, returning RET_PF_EMULATE has the unwanted side effect of
      causing KVM to attempt to emulate an instruction on any page fault
      with an invalid HVA translation, e.g. a not-present EPT violation
      on a VM_PFNMAP VMA whose fault handler failed to insert a PFN.
      
        * There is no guarantee that the fault is directly related to the
          instruction, i.e. the fault could have been triggered by a side
          effect memory access in the guest, e.g. while vectoring a #DB or
          writing a tracing record.  This could cause KVM to effectively
          mask the fault if KVM doesn't model the behavior leading to the
          fault, i.e. emulation could succeed and resume the guest.
      
        * If emulation does fail, KVM will return EMULATION_FAILED instead
          of -EFAULT, which is a red herring as the user will either debug
          a bogus emulation attempt or scratch their head wondering why we
          were attempting emulation in the first place.
      
      TL;DR: revert to returning -EFAULT and remove the bogus WARN_ON in
      handle_ept_misconfig in a future patch.
      
      This reverts commit 95e057e2.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      2c151b25
  6. 16 Mar, 2018 1 commit
  7. 24 Feb, 2018 1 commit
    •
      KVM: X86: Fix SMRAM accessing even if VM is shutdown · 95e057e2
      Wanpeng Li authored
      Reported by syzkaller:
      
         WARNING: CPU: 6 PID: 2434 at arch/x86/kvm/vmx.c:6660 handle_ept_misconfig+0x54/0x1e0 [kvm_intel]
         CPU: 6 PID: 2434 Comm: repro_test Not tainted 4.15.0+ #4
         RIP: 0010:handle_ept_misconfig+0x54/0x1e0 [kvm_intel]
         Call Trace:
          vmx_handle_exit+0xbd/0xe20 [kvm_intel]
          kvm_arch_vcpu_ioctl_run+0xdaf/0x1d50 [kvm]
          kvm_vcpu_ioctl+0x3e9/0x720 [kvm]
          do_vfs_ioctl+0xa4/0x6a0
          SyS_ioctl+0x79/0x90
          entry_SYSCALL_64_fastpath+0x25/0x9c
      
      The testcase creates a first thread to issue KVM_SMI ioctl, and then creates
      a second thread to mmap and operate on the same vCPU.  This triggers a race
      condition when running the testcase with multiple threads. Sometimes one thread
      exits with a triple fault while another thread mmaps and operates on the same
      vCPU.  Because CS=0x3000/IP=0x8000 is not mapped, accessing the SMI handler
      results in an EPT misconfig. This patch fixes it by returning RET_PF_EMULATE
      in kvm_handle_bad_page(), which will go on to cause an emulation failure and an
      exit with KVM_EXIT_INTERNAL_ERROR.
      
      Reported-by: syzbot+c1d9517cab094dae65e446c0c5b4de6c40f4dc58@syzkaller.appspotmail.com
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      95e057e2
  8. 13 Feb, 2018 1 commit
  9. 16 Jan, 2018 1 commit
  10. 11 Jan, 2018 2 commits
  11. 14 Dec, 2017 2 commits
    •
      x86: kvm: mmu: make kvm_mmu_clear_all_pte_masks static · 858ac87f
      Gimcuan Hui authored
      The kvm_mmu_clear_all_pte_masks interface is only used locally by
      kvm_mmu_module_init and does not need to be called by other modules, so
      make it static.
      
      This patch cleans up the sparse warning:
      symbol 'kvm_mmu_clear_all_pte_masks' was not declared. Should it be static?
      Signed-off-by: Gimcuan Hui <gimcuan@gmail.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      858ac87f
    •
      KVM: MMU: Fix infinite loop when there is no available mmu page · ed52870f
      Wanpeng Li authored
      The test case below can cause an infinite loop in KVM when ept=0.
      
          #include <unistd.h>
          #include <sys/syscall.h>
          #include <string.h>
          #include <stdint.h>
          #include <linux/kvm.h>
          #include <fcntl.h>
          #include <sys/ioctl.h>
      
          long r[5];
          int main()
          {
          	r[2] = open("/dev/kvm", O_RDONLY);
          	r[3] = ioctl(r[2], KVM_CREATE_VM, 0);
          	r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7);
          	ioctl(r[4], KVM_RUN, 0);
          }
      
      The test case does not set up any memory regions, and
      mmu_alloc_shadow/direct_roots() in KVM return 1 when KVM fails to
      allocate the root page table, which can result in the infinite loop
      below:
      
          vcpu_run() {
          	for (;;) {
          		r = vcpu_enter_guest()::kvm_mmu_reload() returns 1
          		if (r <= 0)
          			break;
          		if (need_resched())
          			cond_resched();
          	}
          }
      
      This patch fixes it by returning -ENOSPC when there is no available kvm mmu
      page for root page table.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: stable@vger.kernel.org
      Fixes: 26eeb53c (KVM: MMU: Bail out immediately if there is no available mmu page)
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ed52870f
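The loop behavior described in the commit above can be modeled in a few lines of user-space C. This is only a sketch: `model_vcpu_run`, the two `enter_*` callbacks, and the `max_iters` cap are hypothetical and exist so that the model terminates; the real `vcpu_run()` has no such cap.

```c
#include <errno.h>

/* Hypothetical stand-in for vcpu_enter_guest(): returns a fixed value. */
typedef int (*enter_guest_fn)(void);

/* Pre-fix behavior: a failed root allocation is reported as 1. */
static int enter_returns_one(void)
{
	return 1;
}

/* Post-fix behavior: the failure is reported as -ENOSPC. */
static int enter_returns_enospc(void)
{
	return -ENOSPC;
}

/*
 * Simplified model of the run loop. Returns the number of iterations
 * executed and stores the final return value of enter() in *last.
 */
static int model_vcpu_run(enter_guest_fn enter, int max_iters, int *last)
{
	int iters = 0;
	int r = 0;

	while (iters < max_iters) {
		r = enter();
		iters++;
		if (r <= 0)
			break;	/* only zero or an error code leaves the loop */
	}
	*last = r;
	return iters;
}
```

With a positive return value the model runs until the artificial cap, while a negative error code ends the loop after a single iteration.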
  12. 05 Dec, 2017 1 commit
    •
      KVM: X86: Restart the guest when insn_len is zero and SEV is enabled · 00b10fe1
      Brijesh Singh authored
      On AMD platforms, under certain conditions insn_len may be zero on #NPF.
      This can happen if a guest gets a page fault on a data access but the HW
      table walker is not able to read the instruction page (e.g. the
      instruction page is not present in memory).
      
      Typically, when insn_len is zero, x86_emulate_instruction() walks the
      guest page table and fetches the instruction bytes from guest memory.
      When SEV is enabled, guest memory is encrypted with a guest-specific
      key, so the hypervisor will not be able to fetch the instruction bytes.
      In those cases we simply restart the guest.
      
      I have encountered this issue when running kernbench inside the guest.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: x86@kernel.org
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      00b10fe1
  13. 25 Oct, 2017 1 commit
    •
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns... · 6aa7de05
      Mark Rutland authored
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
      
      Please do not apply this to mainline directly, instead please re-run the
      coccinelle script shown below and apply its output.
      
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't harmful, and changing them results in
      churn.
      
      However, for some features, the read/write distinction is critical to
      correct operation. To distinguish these cases, separate read/write
      accessors must be used. This patch migrates (most) remaining
      ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
      coccinelle script:
      
      ----
      // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
      // WRITE_ONCE()
      
      // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
      
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6aa7de05
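The two rewrite rules in the coccinelle script above map onto C source as shown in this sketch. The macro definitions here are simplified user-space approximations built on a volatile cast and the GNU `__typeof__` extension; they are an assumption for illustration and are not the kernel's actual definitions. `shared_flag`, `publish_flag`, and `read_flag` are hypothetical names.

```c
/* Simplified approximations; the kernel's real macros are more involved. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

static int shared_flag;

/* Before the conversion: ACCESS_ONCE(shared_flag) = v; */
static void publish_flag(int v)
{
	WRITE_ONCE(shared_flag, v);
}

/* Before the conversion: return ACCESS_ONCE(shared_flag); */
static int read_flag(void)
{
	return READ_ONCE(shared_flag);
}
```

Separating the two accessors makes the direction of each access explicit at the call site, which is the read/write distinction the commit message refers to.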
  14. 12 Oct, 2017 6 commits
  15. 10 Oct, 2017 2 commits
    •
      KVM: MMU: always terminate page walks at level 1 · 829ee279
      Ladi Prosek authored
      is_last_gpte() is not equivalent to the pseudo-code given in commit
      6bb69c9b ("KVM: MMU: simplify last_pte_bitmap") because an incorrect
      value of last_nonleaf_level may override the result even if level == 1.
      
      It is critical for is_last_gpte() to return true on level == 1 to
      terminate page walks. Otherwise memory corruption may occur as level
      is used as an index to various data structures throughout the page
      walking code.  Even though the actual bug would be wherever the MMU is
      initialized (as in the previous patch), be defensive and ensure here
      that is_last_gpte() returns the correct value.
      
      This patch is also enough to fix CVE-2017-12188.
      
      Fixes: 6bb69c9b
      Cc: stable@vger.kernel.org
      Cc: Andy Honig <ahonig@google.com>
      Signed-off-by: Ladi Prosek <lprosek@redhat.com>
      [Panic if walk_addr_generic gets an incorrect level; this is a serious
       bug and it's not worth a WARN_ON where the recovery path might hide
       further exploitable issues; suggested by Andrew Honig. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      829ee279
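The rule that level 1 must always terminate a walk, regardless of last_nonleaf_level, can be sketched as follows. This is a user-space model and not the kernel function: the names `is_last_plain`, `is_last_branchless`, and `count_mismatches` are hypothetical, the constants are assumptions (PS taken as bit 7 of the guest PTE), and the branchless form is one way to express the rule with unsigned wraparound.

```c
#include <stdbool.h>

#define PT_PAGE_TABLE_LEVEL 1u
#define PT_PAGE_SIZE_MASK   (1u << 7)	/* PS bit of a guest PTE (assumed) */

/* Plain statement of the intended rule. */
static bool is_last_plain(unsigned last_nonleaf_level, unsigned level,
			  unsigned gpte)
{
	if (level == PT_PAGE_TABLE_LEVEL)
		return true;	/* level 1 always terminates the walk */
	return level < last_nonleaf_level && (gpte & PT_PAGE_SIZE_MASK);
}

/* Branchless form of the same rule. */
static bool is_last_branchless(unsigned last_nonleaf_level, unsigned level,
			       unsigned gpte)
{
	/* Bit 7 of the difference is set iff level < last_nonleaf_level. */
	gpte &= level - last_nonleaf_level;
	/* Bit 7 of the difference is set iff level == 1, forcing PS on. */
	gpte |= level - PT_PAGE_TABLE_LEVEL - 1;
	return gpte & PT_PAGE_SIZE_MASK;
}

/* Count inputs on which the two forms disagree (small levels only). */
static int count_mismatches(void)
{
	int bad = 0;

	for (unsigned last = 0; last <= 5; last++)
		for (unsigned level = 1; level <= 5; level++)
			for (unsigned ps = 0; ps <= 1; ps++) {
				unsigned gpte = ps ? PT_PAGE_SIZE_MASK : 0;

				if (is_last_plain(last, level, gpte) !=
				    is_last_branchless(last, level, gpte))
					bad++;
			}
	return bad;
}
```

Note that even a bogus `last_nonleaf_level` of zero cannot stop the walk from terminating at level 1 in either form.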
    •
      KVM: nVMX: update last_nonleaf_level when initializing nested EPT · fd19d3b4
      Ladi Prosek authored
      The function updated context->root_level but did not call
      update_last_nonleaf_level, so the previous and potentially wrong value
      was used for page walks.  For example, a zero value of last_nonleaf_level
      would allow a potential out-of-bounds access in arch/x86/mmu/paging_tmpl.h's
      walk_addr_generic function (CVE-2017-12188).
      
      Fixes: 155a97a3
      Signed-off-by: Ladi Prosek <lprosek@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      fd19d3b4
  16. 05 Oct, 2017 1 commit
    •
      kvm/x86: Avoid async PF preempting the kernel incorrectly · a2b7861b
      Boqun Feng authored
      Currently, in a PREEMPT_COUNT=n kernel, kvm_async_pf_task_wait() could call
      schedule() to reschedule in some cases.  This could result in
      accidentally ending the current RCU read-side critical section early,
      causing random memory corruption in the guest, or otherwise preempting
      the currently running task between preempt_disable and
      preempt_enable.
      
      This is difficult to handle well because, for PREEMPT_COUNT=n, we cannot
      tell whether an async PF was delivered inside a preempt-disabled section
      or an RCU read-side critical section: preempt_disable()/enable() and
      rcu_read_lock/unlock() are both no-ops in that case.
      
      To cure this, we treat any async PF interrupting a kernel context as one
      that cannot be preempted, preventing kvm_async_pf_task_wait() from choosing
      the schedule() path in that case.
      
      To do so, a second parameter for kvm_async_pf_task_wait() is introduced,
      so that we know whether it's called from a context interrupting the
      kernel, and the parameter is set properly in all the callsites.
      
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      a2b7861b
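The decision described in the commit above can be sketched as a pure function. This is a hypothetical model, not the kernel code: `must_not_schedule` and its parameters are invented for illustration, and it assumes that when PREEMPT_COUNT is enabled the tracked atomic state is still what decides the outcome.

```c
#include <stdbool.h>

/*
 * Returns true when the waiter must not take the schedule() path.
 *
 * preempt_count_enabled: models CONFIG_PREEMPT_COUNT=y
 * in_atomic_section:     tracked state, only meaningful when counting is on
 * interrupt_kernel:      the new second parameter, true when the async PF
 *                        interrupted kernel context
 */
static bool must_not_schedule(bool preempt_count_enabled,
			      bool in_atomic_section,
			      bool interrupt_kernel)
{
	if (preempt_count_enabled)
		return in_atomic_section;	/* state is tracked, so use it */

	/* PREEMPT_COUNT=n: nothing is tracked, so be conservative. */
	return interrupt_kernel;
}
```

In the PREEMPT_COUNT=n case the function never allows scheduling for a fault that interrupted the kernel, which is the conservative treatment the commit message describes.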
  17. 26 Aug, 2017 1 commit
  18. 25 Aug, 2017 3 commits
    •
      KVM: MMU: speedup update_permission_bitmask · 09f037aa
      Paolo Bonzini authored
      update_permission_bitmask currently does a 128-iteration loop to,
      essentially, compute a constant array.  Computing the 8 bits in parallel
      reduces it to 16 iterations, and is enough to speed it up substantially
      because many boolean operations in the inner loop become constants or
      simplify noticeably.
      
      Because update_permission_bitmask is actually the top item in the profile
      for nested vmexits, this speeds up an L2->L1 vmexit by about ten thousand
      clock cycles, or up to 30%:
      
                                               before     after
         cpuid                                 35173      25954
         vmcall                                35122      27079
         inl_from_pmtimer                      52635      42675
         inl_from_qemu                         53604      44599
         inl_from_kernel                       38498      30798
         outl_to_kernel                        34508      28816
         wr_tsc_adjust_msr                     34185      26818
         rd_tsc_adjust_msr                     37409      27049
         mmio-no-eventfd:pci-mem               50563      45276
         mmio-wildcard-eventfd:pci-mem         34495      30823
         mmio-datamatch-eventfd:pci-mem        35612      31071
         portio-no-eventfd:pci-io              44925      40661
         portio-wildcard-eventfd:pci-io        29708      27269
         portio-datamatch-eventfd:pci-io       31135      27164
      
      (I wrote a small C program to compare the tables for all values of CR0.WP,
      CR4.SMAP and CR4.SMEP, and they match).
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      09f037aa
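The idea of computing the 8 bits in parallel can be shown on a deliberately simplified permission model. In this sketch each bit of the result byte corresponds to one combination of exec/write/user rights, and a bit is set when that combination faults. The model ignores CR0.WP, SMEP, SMAP and NX, and every name in it (`faults_bitwise`, `faults_parallel`, the mask constants) is hypothetical rather than taken from the kernel.

```c
#include <stdint.h>

#define ACC_EXEC  1u
#define ACC_WRITE 2u
#define ACC_USER  4u

/* Bit i is set iff access combination i (0..7) includes the right. */
static const uint8_t X_MASK = 0xAA;	/* i & ACC_EXEC  */
static const uint8_t W_MASK = 0xCC;	/* i & ACC_WRITE */
static const uint8_t U_MASK = 0xF0;	/* i & ACC_USER  */

/* One bit per inner iteration: 8 iterations per fault type. */
static uint8_t faults_bitwise(int wf, int uf, int ff)
{
	uint8_t map = 0;

	for (unsigned acc = 0; acc < 8; acc++) {
		int fault = (wf && !(acc & ACC_WRITE)) ||
			    (uf && !(acc & ACC_USER)) ||
			    (ff && !(acc & ACC_EXEC));
		if (fault)
			map |= (uint8_t)(1u << acc);
	}
	return map;
}

/* All 8 bits at once: the inner loop collapses into byte operations. */
static uint8_t faults_parallel(int wf, int uf, int ff)
{
	uint8_t map = 0;

	if (wf)
		map |= (uint8_t)~W_MASK;
	if (uf)
		map |= (uint8_t)~U_MASK;
	if (ff)
		map |= (uint8_t)~X_MASK;
	return map;
}

/* Compare both versions for every fault type, as the author's check did. */
static int count_table_mismatches(void)
{
	int bad = 0;

	for (int wf = 0; wf <= 1; wf++)
		for (int uf = 0; uf <= 1; uf++)
			for (int ff = 0; ff <= 1; ff++)
				if (faults_bitwise(wf, uf, ff) !=
				    faults_parallel(wf, uf, ff))
					bad++;
	return bad;
}
```

Because the masks are compile-time constants, most of the boolean work in the parallel version reduces to constant byte operations, which is the kind of simplification the commit message credits for the speedup.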
    •
      KVM: MMU: Add 5 level EPT & Shadow page table support. · 855feb67
      Yu Zhang authored
      Extend the shadow paging code so that a 5-level shadow page
      table can be constructed if the VM is running in 5-level paging
      mode.
      
      Also extend the EPT code so that a 5-level EPT table can be
      constructed if the maxphysaddr of the VM exceeds 48 bits. Unlike the
      shadow logic, KVM should still use a 4-level EPT table for a VM
      whose physical address width is less than 48 bits, even when
      the VM is running in 5-level paging mode.
      Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
      [Unconditionally reset the MMU context in kvm_cpuid_update.
       Changing MAXPHYADDR invalidates the reserved bit bitmasks.
       - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      855feb67
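The level selection described in the commit above comes down to two independent choices, sketched here. Both function names are hypothetical, the shadow case is limited to 64-bit long mode, and treating a width of exactly 48 bits as the 4-level case is an assumption based on the "exceeds 48 bits" wording.

```c
/* EPT depth depends only on the guest's physical address width. */
static int ept_levels(int guest_maxphyaddr)
{
	return guest_maxphyaddr > 48 ? 5 : 4;
}

/* Shadow page table depth follows the guest's own paging mode. */
static int shadow_levels(int guest_uses_5level_paging)
{
	return guest_uses_5level_paging ? 5 : 4;
}
```

A guest running in 5-level paging mode with a 46-bit physical address width would therefore get a 5-level shadow table but only a 4-level EPT table.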
    •
      KVM: MMU: Rename PT64_ROOT_LEVEL to PT64_ROOT_4LEVEL. · 2a7266a8
      Yu Zhang authored
      Now that we have both 4-level and 5-level page tables in 64-bit
      long mode, rename PT64_ROOT_LEVEL to PT64_ROOT_4LEVEL so that
      PT64_ROOT_5LEVEL can be used for the 5-level page table. This
      makes the code clearer.
      
      Also, PT64_ROOT_MAX_LEVEL is defined as 4, so that we can just
      redefine it to 5 whenever a replacement is needed for 5-level
      paging.
      Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      2a7266a8
  19. 18 Aug, 2017 3 commits
  20. 12 Aug, 2017 1 commit