1. 12 Oct, 2017 24 commits
  2. 10 Oct, 2017 2 commits
    • KVM: MMU: always terminate page walks at level 1 · 829ee279
      Committed by Ladi Prosek
      is_last_gpte() is not equivalent to the pseudo-code given in commit
      6bb69c9b ("KVM: MMU: simplify last_pte_bitmap") because an incorrect
      value of last_nonleaf_level may override the result even if level == 1.
      
      It is critical for is_last_gpte() to return true on level == 1 to
      terminate page walks. Otherwise memory corruption may occur as level
      is used as an index to various data structures throughout the page
      walking code.  Even though the actual bug would be wherever the MMU is
      initialized (as in the previous patch), be defensive and ensure here
      that is_last_gpte() returns the correct value.
      
      This patch is also enough to fix CVE-2017-12188.
      
      Fixes: 6bb69c9b
      Cc: stable@vger.kernel.org
      Cc: Andy Honig <ahonig@google.com>
      Signed-off-by: Ladi Prosek <lprosek@redhat.com>
      [Panic if walk_addr_generic gets an incorrect level; this is a serious
       bug and it's not worth a WARN_ON where the recovery path might hide
       further exploitable issues; suggested by Andrew Honig. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
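The ordering problem described in the commit above can be illustrated with a small stand-alone model. This is only a sketch: `struct mmu_model` and both function names are hypothetical, and the "old" variant assumes the level-1 override was applied before the `last_nonleaf_level` mask, which is one way an incorrect `last_nonleaf_level` could cancel the override.

```c
#include <stdbool.h>

#define PT_PAGE_TABLE_LEVEL 1u
#define PT_PAGE_SIZE_MASK   (1u << 7)   /* PS bit of a guest PTE */

/* Hypothetical stand-in for the single MMU field used by the check. */
struct mmu_model {
	unsigned last_nonleaf_level;
};

/*
 * Assumed old ordering: the level-1 override comes first, so a bad
 * last_nonleaf_level (for example 0) can clear bit 7 again.
 */
static bool is_last_gpte_old(const struct mmu_model *mmu,
			     unsigned level, unsigned gpte)
{
	/* All ones (bit 7 set) only when level == 1, via unsigned wraparound. */
	gpte |= level - PT_PAGE_TABLE_LEVEL - 1;
	/* Bit 7 survives only when level < last_nonleaf_level. */
	gpte &= level - mmu->last_nonleaf_level;
	return gpte & PT_PAGE_SIZE_MASK;
}

/* Defensive ordering: the level-1 override is applied last and always wins. */
static bool is_last_gpte_fixed(const struct mmu_model *mmu,
			       unsigned level, unsigned gpte)
{
	gpte &= level - mmu->last_nonleaf_level;
	gpte |= level - PT_PAGE_TABLE_LEVEL - 1;
	return gpte & PT_PAGE_SIZE_MASK;
}
```

With `last_nonleaf_level == 0`, the old ordering reports that a level-1 entry is not the last one, so a walker would continue to level 0; the defensive ordering terminates the walk regardless of that field.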
    • KVM: nVMX: update last_nonleaf_level when initializing nested EPT · fd19d3b4
      Committed by Ladi Prosek
      The function updates context->root_level but did not call
      update_last_nonleaf_level(), so the previous and potentially wrong value
      was used for page walks.  For example, a zero value of last_nonleaf_level
      would allow a potential out-of-bounds access in the walk_addr_generic()
      function in arch/x86/kvm/paging_tmpl.h (CVE-2017-12188).
      
      Fixes: 155a97a3
      Signed-off-by: Ladi Prosek <lprosek@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 06 Oct, 2017 1 commit
  4. 05 Oct, 2017 2 commits
    • x86/kvm: Move kvm_fastop_exception to .fixup section · f26e6016
      Committed by Josh Poimboeuf
      When compiling the kernel with the '-frecord-gcc-switches' flag, objtool
      complains:
      
        arch/x86/kvm/emulate.o: warning: objtool: .GCC.command.line+0x0: special: can't find new instruction
      
      And also the kernel fails to link.
      
      The problem is that the 'kvm_fastop_exception' code gets placed into the
      throwaway '.GCC.command.line' section instead of '.text'.
      
      Exception fixup code is conventionally placed in the '.fixup' section,
      so put it there where it belongs.

      Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
    • kvm/x86: Avoid async PF preempting the kernel incorrectly · a2b7861b
      Committed by Boqun Feng
      Currently, in a PREEMPT_COUNT=n kernel, kvm_async_pf_task_wait() could call
      schedule() to reschedule in some cases.  This could result in
      accidentally ending the current RCU read-side critical section early,
      causing random memory corruption in the guest, or otherwise preempting
      the currently running task between preempt_disable() and
      preempt_enable().

      This is hard to handle well because, for PREEMPT_COUNT=n, we cannot tell
      whether an async PF was delivered inside a preemption-disabled section or
      an RCU read-side critical section: preempt_disable()/preempt_enable() and
      rcu_read_lock()/rcu_read_unlock() are both no-ops in that case.
      
      To cure this, we treat any async PF interrupting a kernel context as one
      that cannot be preempted, preventing kvm_async_pf_task_wait() from choosing
      the schedule() path in that case.
      
      To do so, a second parameter for kvm_async_pf_task_wait() is introduced,
      so that we know whether it's called from a context interrupting the
      kernel, and the parameter is set properly in all the callsites.
      
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
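The decision described in the commit above, whether the waiter may call schedule() or must take the halt path, can be modelled as follows. This is a simplified sketch, not the kernel's code: `struct apf_ctx`, its fields, and `apf_must_halt()` are hypothetical names, and the depth counters stand in for whatever the kernel consults when PREEMPT_COUNT is enabled.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state available when the async PF arrives. */
struct apf_ctx {
	bool idle_task;          /* the interrupted task is the idle task */
	bool preempt_count_cfg;  /* models a PREEMPT_COUNT=y build */
	int  preempt_depth;      /* only meaningful when preempt_count_cfg */
	int  rcu_depth;          /* only meaningful when preempt_count_cfg */
};

/* Returns true when the waiter must halt instead of calling schedule(). */
static bool apf_must_halt(const struct apf_ctx *c, bool interrupt_kernel)
{
	if (c->idle_task)
		return true;
	if (c->preempt_count_cfg)
		return c->preempt_depth > 0 || c->rcu_depth > 0;
	/*
	 * Without counters the context cannot be inspected, so every fault
	 * that interrupted kernel code is treated as non-preemptible.
	 */
	return interrupt_kernel;
}
```

The extra `interrupt_kernel` argument mirrors the second parameter the commit adds: it is the only information available when the counters are compiled out.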
  5. 04 Oct, 2017 1 commit
  6. 30 Sep, 2017 2 commits
  7. 29 Sep, 2017 3 commits
    • kvm/x86: Handle async PF in RCU read-side critical sections · b862789a
      Committed by Boqun Feng
      Sasha Levin reported a WARNING:
      
      | WARNING: CPU: 0 PID: 6974 at kernel/rcu/tree_plugin.h:329
      | rcu_preempt_note_context_switch kernel/rcu/tree_plugin.h:329 [inline]
      | WARNING: CPU: 0 PID: 6974 at kernel/rcu/tree_plugin.h:329
      | rcu_note_context_switch+0x16c/0x2210 kernel/rcu/tree.c:458
      ...
      | CPU: 0 PID: 6974 Comm: syz-fuzzer Not tainted 4.13.0-next-20170908+ #246
      | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
      | 1.10.1-1ubuntu1 04/01/2014
      | Call Trace:
      ...
      | RIP: 0010:rcu_preempt_note_context_switch kernel/rcu/tree_plugin.h:329 [inline]
      | RIP: 0010:rcu_note_context_switch+0x16c/0x2210 kernel/rcu/tree.c:458
      | RSP: 0018:ffff88003b2debc8 EFLAGS: 00010002
      | RAX: 0000000000000001 RBX: 1ffff1000765bd85 RCX: 0000000000000000
      | RDX: 1ffff100075d7882 RSI: ffffffffb5c7da20 RDI: ffff88003aebc410
      | RBP: ffff88003b2def30 R08: dffffc0000000000 R09: 0000000000000001
      | R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003b2def08
      | R13: 0000000000000000 R14: ffff88003aebc040 R15: ffff88003aebc040
      | __schedule+0x201/0x2240 kernel/sched/core.c:3292
      | schedule+0x113/0x460 kernel/sched/core.c:3421
      | kvm_async_pf_task_wait+0x43f/0x940 arch/x86/kernel/kvm.c:158
      | do_async_page_fault+0x72/0x90 arch/x86/kernel/kvm.c:271
      | async_page_fault+0x22/0x30 arch/x86/entry/entry_64.S:1069
      | RIP: 0010:format_decode+0x240/0x830 lib/vsprintf.c:1996
      | RSP: 0018:ffff88003b2df520 EFLAGS: 00010283
      | RAX: 000000000000003f RBX: ffffffffb5d1e141 RCX: ffff88003b2df670
      | RDX: 0000000000000001 RSI: dffffc0000000000 RDI: ffffffffb5d1e140
      | RBP: ffff88003b2df560 R08: dffffc0000000000 R09: 0000000000000000
      | R10: ffff88003b2df718 R11: 0000000000000000 R12: ffff88003b2df5d8
      | R13: 0000000000000064 R14: ffffffffb5d1e140 R15: 0000000000000000
      | vsnprintf+0x173/0x1700 lib/vsprintf.c:2136
      | sprintf+0xbe/0xf0 lib/vsprintf.c:2386
      | proc_self_get_link+0xfb/0x1c0 fs/proc/self.c:23
      | get_link fs/namei.c:1047 [inline]
      | link_path_walk+0x1041/0x1490 fs/namei.c:2127
      ...
      
      This happened when the host hit a page fault and delivered it as an
      async page fault while the guest was in an RCU read-side critical
      section.  The guest then tries to reschedule in kvm_async_pf_task_wait(),
      but rcu_preempt_note_context_switch() would treat the reschedule as a
      sleep in RCU read-side critical section, which is not allowed (even in
      preemptible RCU).  Thus the WARN.
      
      To cure this, make kvm_async_pf_task_wait() go to the halt path if the
      PF happens in an RCU read-side critical section.

      Reported-by: Sasha Levin <levinsasha928@gmail.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nVMX: Fix nested #PF intends to break L1's vmlaunch/vmresume · 305d0ab4
      Committed by Wanpeng Li
      ------------[ cut here ]------------
       WARNING: CPU: 4 PID: 5280 at /home/kernel/linux/arch/x86/kvm//vmx.c:11394 nested_vmx_vmexit+0xc2b/0xd70 [kvm_intel]
       CPU: 4 PID: 5280 Comm: qemu-system-x86 Tainted: G        W  OE   4.13.0+ #17
       RIP: 0010:nested_vmx_vmexit+0xc2b/0xd70 [kvm_intel]
       Call Trace:
        ? emulator_read_emulated+0x15/0x20 [kvm]
        ? segmented_read+0xae/0xf0 [kvm]
        vmx_inject_page_fault_nested+0x60/0x70 [kvm_intel]
        ? vmx_inject_page_fault_nested+0x60/0x70 [kvm_intel]
        x86_emulate_instruction+0x733/0x810 [kvm]
        vmx_handle_exit+0x2f4/0xda0 [kvm_intel]
        ? kvm_arch_vcpu_ioctl_run+0xd2f/0x1c60 [kvm]
        kvm_arch_vcpu_ioctl_run+0xdab/0x1c60 [kvm]
        ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
        kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? __fget+0xfc/0x210
        do_vfs_ioctl+0xa4/0x6a0
        ? __fget+0x11d/0x210
        SyS_ioctl+0x79/0x90
        entry_SYSCALL_64_fastpath+0x23/0xc2
      
      A nested #PF is triggered while L0 emulates an instruction for L2. However,
      the current code does not consider that it must not break L1's
      vmlaunch/vmresume. This patch fixes it by queuing the #PF exception instead,
      requesting an immediate VM exit from L2 and keeping the exception for L1
      pending for a subsequent nested VM exit.

      This should actually work all the time, making vmx_inject_page_fault_nested
      totally unnecessary.  However, that's not working yet, so this patch can work
      around the issue in the meantime.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/asm: Fix inline asm call constraints for GCC 4.4 · 520a13c5
      Committed by Josh Poimboeuf
      The kernel test bot (run by Xiaolong Ye) reported that the following commit:
      
        f5caf621 ("x86/asm: Fix inline asm call constraints for Clang")
      
      is causing double faults in a kernel compiled with GCC 4.4.
      
      Linus subsequently diagnosed the crash pattern and the buggy commit and found that
      the issue is with this code:
      
        register unsigned int __asm_call_sp asm("esp");
        #define ASM_CALL_CONSTRAINT "+r" (__asm_call_sp)
      
      Even on a 64-bit kernel, it's using ESP instead of RSP.  That causes GCC
      to produce the following bogus code:
      
        ffffffff8147461d:       89 e0                   mov    %esp,%eax
        ffffffff8147461f:       4c 89 f7                mov    %r14,%rdi
        ffffffff81474622:       4c 89 fe                mov    %r15,%rsi
        ffffffff81474625:       ba 20 00 00 00          mov    $0x20,%edx
        ffffffff8147462a:       89 c4                   mov    %eax,%esp
        ffffffff8147462c:       e8 bf 52 05 00          callq  ffffffff814c98f0 <copy_user_generic_unrolled>
      
      Despite the absurdity of it backing up and restoring the stack pointer
      for no reason, the bug is actually the fact that it's only backing up
      and restoring the lower 32 bits of the stack pointer.  The upper 32 bits
      are getting cleared out, corrupting the stack pointer.
      
      So change the '__asm_call_sp' register variable to be associated with
      the actual full-size stack pointer.
      
      This also requires changing the __ASM_SEL() macro to be based on the
      actual compiled arch size, rather than the CONFIG value, because
      CONFIG_X86_64 compiles some files with '-m32' (e.g., realmode and vdso).
      Otherwise Clang fails to build the kernel because it complains about the
      use of a 64-bit register (RSP) in a 32-bit file.

      Reported-and-Bisected-and-Tested-by: kernel test robot <xiaolong.ye@intel.com>
      Diagnosed-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: LKP <lkp@01.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthias Kaehlcke <mka@chromium.org>
      Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: f5caf621 ("x86/asm: Fix inline asm call constraints for Clang")
      Link: http://lkml.kernel.org/r/20170928215826.6sdpmwtkiydiytim@treble
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
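The stack pointer corruption described in the commit above comes down to plain integer truncation, which can be shown without any inline assembly. The sketch below is a model only: both function names are hypothetical, and the sample address in the usage note is an arbitrary kernel-style value chosen for illustration.

```c
#include <stdint.h>

/*
 * Models "mov %esp,%eax" followed by "mov %eax,%esp" on x86-64:
 * a write to a 32-bit register zero-extends into the full 64-bit register.
 */
static uint64_t roundtrip_through_esp(uint64_t rsp)
{
	uint32_t eax = (uint32_t)rsp;  /* only the low 32 bits are saved */
	return (uint64_t)eax;          /* bits 63..32 come back as zero */
}

/* Models the fixed register variable: the full-width value is preserved. */
static uint64_t roundtrip_through_rsp(uint64_t rsp)
{
	uint64_t rax = rsp;
	return rax;
}
```

For an input such as 0xffff88003b2debc8, the first function returns 0x3b2debc8, which no longer points into the kernel stack, while the second returns the input unchanged.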
  8. 28 Sep, 2017 2 commits
    • KVM: VMX: use cmpxchg64 · c0a1666b
      Committed by Paolo Bonzini
      This fixes a compilation failure on 32-bit systems.

      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • xen/mmu: Call xen_cleanhighmap() with 4MB aligned for page tables mapping · 0d805ee7
      Committed by Zhenzhong Duan
      When booting a PVM guest with large memory (e.g. 240GB), the initial
      mapping provided by XEN overlaps with the kernel module virtual space. When
      the mapping in this space is cleared by xen_cleanhighmap(), in certain cases
      a 2MB mapping can be left behind. This is because XEN initializes a 4MB
      aligned mapping but xen_cleanhighmap() finishes at a 2MB boundary.

      When a module is loaded just on top of that 2MB space, the following
      warning is printed:
      
      WARNING: at mm/vmalloc.c:106 vmap_pte_range+0x14e/0x190()
      Call Trace:
       [<ffffffff81117083>] warn_alloc_failed+0xf3/0x160
       [<ffffffff81146022>] __vmalloc_area_node+0x182/0x1c0
       [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
       [<ffffffff81145df7>] __vmalloc_node_range+0xa7/0x110
       [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
       [<ffffffff8103ca54>] module_alloc+0x64/0x70
       [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
       [<ffffffff810ac91e>] module_alloc_update_bounds+0x1e/0x80
       [<ffffffff810ac9a7>] move_module+0x27/0x150
       [<ffffffff810aefa0>] layout_and_allocate+0x120/0x1b0
       [<ffffffff810af0a8>] load_module+0x78/0x640
       [<ffffffff811ff90b>] ? security_file_permission+0x8b/0x90
       [<ffffffff810af6d2>] sys_init_module+0x62/0x1e0
       [<ffffffff815154c2>] system_call_fastpath+0x16/0x1b
      
      The 2MB mapping is then cleared, and the kernel finally oopses when a page in
      that space is accessed.
      
      BUG: unable to handle kernel paging request at ffff880022600000
      IP: [<ffffffff81260877>] clear_page_c_e+0x7/0x10
      PGD 1788067 PUD 178c067 PMD 22434067 PTE 0
      Oops: 0002 [#1] SMP
      Call Trace:
       [<ffffffff81116ef7>] ? prep_new_page+0x127/0x1c0
       [<ffffffff81117d42>] get_page_from_freelist+0x1e2/0x550
       [<ffffffff81133010>] ? ii_iovec_copy_to_user+0x90/0x140
       [<ffffffff81119c9d>] __alloc_pages_nodemask+0x12d/0x230
       [<ffffffff81155516>] alloc_pages_vma+0xc6/0x1a0
       [<ffffffff81006ffd>] ? pte_mfn_to_pfn+0x7d/0x100
       [<ffffffff81134cfb>] do_anonymous_page+0x16b/0x350
       [<ffffffff81139c34>] handle_pte_fault+0x1e4/0x200
       [<ffffffff8100712e>] ? xen_pmd_val+0xe/0x10
       [<ffffffff810052c9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
       [<ffffffff81139dab>] handle_mm_fault+0x15b/0x270
       [<ffffffff81510c10>] do_page_fault+0x140/0x470
       [<ffffffff8150d7d5>] page_fault+0x25/0x30
      
      Fix this by calling xen_cleanhighmap() with a 4MB aligned range for the
      page tables mapping. The unnecessary call of xen_cleanhighmap() in DEBUG mode
      is also removed.
      
      -v2: add comment about XEN alignment from Juergen.
      
      References: https://lists.xen.org/archives/html/xen-devel/2012-07/msg01562.html
      Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      
      [boris: added 'xen/mmu' tag to commit subject]
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
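The alignment mismatch described in the commit above is simple arithmetic. The following sketch assumes the mapping is created up to a 4MB boundary while cleanup stops at the next 2MB boundary; `round_up_to()` and `leftover_bytes()` are hypothetical helpers, and the end addresses in the usage note are made-up values, not taken from the report.

```c
#include <stdint.h>

#define SZ_2M (2ULL << 20)   /* size of one 2MB mapping */
#define SZ_4M (4ULL << 20)   /* alignment assumed for the initial mapping */

/* Rounds addr up to the next multiple of align (align must be non-zero). */
static uint64_t round_up_to(uint64_t addr, uint64_t align)
{
	return (addr + align - 1) / align * align;
}

/*
 * Bytes still mapped after a cleanup that stops at the 2MB boundary,
 * given that the mapping itself extends to the 4MB boundary.
 */
static uint64_t leftover_bytes(uint64_t end)
{
	return round_up_to(end, SZ_4M) - round_up_to(end, SZ_2M);
}
```

For a hypothetical end offset of 0x22500000 the cleanup stops at 0x22600000 while the mapping reaches 0x22800000, leaving exactly one 2MB mapping behind. For 0x22700000 both boundaries coincide and nothing is left, which fits the commit's remark that the problem only shows up in certain cases.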
  9. 27 Sep, 2017 3 commits
    • KVM: VMX: simplify and fix vmx_vcpu_pi_load · 31afb2ea
      Committed by Paolo Bonzini
      The simplify part: do not touch pi_desc.nv, we can set it when the
      VCPU is first created.  Likewise, pi_desc.sn is only handled by
      vmx_vcpu_pi_load, do not touch it in __pi_post_block.
      
      The fix part: do not check kvm_arch_has_assigned_device, instead
      check the SN bit to figure out whether vmx_vcpu_pi_put ran before.
      This matches what the previous patch did in pi_post_block.
      
      Cc: Huangweidong <weidong.huang@huawei.com>
      Cc: Gonglei <arei.gonglei@huawei.com>
      Cc: wangxin <wangxinxin.wang@huawei.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Tested-by: Longpeng (Mike) <longpeng2@huawei.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: VMX: avoid double list add with VT-d posted interrupts · 8b306e2f
      Committed by Paolo Bonzini
      In some cases, for example involving hot-unplug of assigned
      devices, pi_post_block can forget to remove the vCPU from the
      blocked_vcpu_list.  When this happens, the next call to
      pi_pre_block corrupts the list.
      
      Fix this in two ways.  First, check vcpu->pre_pcpu in pi_pre_block
      and WARN instead of adding the element twice in the list.  Second,
      always do the list removal in pi_post_block if vcpu->pre_pcpu is
      set (not -1).
      
      The new code keeps interrupts disabled for the whole duration of
      pi_pre_block/pi_post_block.  This is not strictly necessary, but
      easier to follow.  For the same reason, PI.ON is checked only
      after the cmpxchg, and to handle it we just call the post-block
      code.  This removes duplication of the list removal code.
      
      Cc: Huangweidong <weidong.huang@huawei.com>
      Cc: Gonglei <arei.gonglei@huawei.com>
      Cc: wangxin <wangxinxin.wang@huawei.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Tested-by: Longpeng (Mike) <longpeng2@huawei.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
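The two guards described in the commit above can be sketched with a toy model in which the blocked list is reduced to a membership counter, so that a double add becomes observable. All names here are hypothetical, and the model ignores locking, interrupt state and the posted-interrupt descriptor entirely.

```c
#include <stdbool.h>

/* Hypothetical per-vCPU state; -1 means "not on any blocked list". */
struct vcpu_model {
	int pre_pcpu;      /* CPU whose blocked list holds this vCPU, or -1 */
	int list_entries;  /* times the vCPU is linked; must stay 0 or 1 */
};

/* First guard: refuse to add a vCPU that is already on a list. */
static bool pi_pre_block_model(struct vcpu_model *v, int cpu)
{
	if (v->pre_pcpu != -1)
		return false;   /* the commit describes a WARN at this point */
	v->pre_pcpu = cpu;
	v->list_entries++;
	return true;
}

/* Second guard: always remove the vCPU whenever pre_pcpu is set. */
static void pi_post_block_model(struct vcpu_model *v)
{
	if (v->pre_pcpu == -1)
		return;
	v->list_entries--;
	v->pre_pcpu = -1;
}
```

Keying both operations on `pre_pcpu` alone means list membership no longer depends on external conditions such as whether a device is still assigned, which is the situation the hot-unplug case in the commit message runs into.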
    • KVM: VMX: extract __pi_post_block · cd39e117
      Committed by Paolo Bonzini
      Simple code movement patch, preparing for the next one.
      
      Cc: Huangweidong <weidong.huang@huawei.com>
      Cc: Gonglei <arei.gonglei@huawei.com>
      Cc: wangxin <wangxinxin.wang@huawei.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Tested-by: Longpeng (Mike) <longpeng2@huawei.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>