1. 05 Oct, 2017 1 commit
  2. 30 Sep, 2017 2 commits
  3. 29 Sep, 2017 3 commits
    • B
      kvm/x86: Handle async PF in RCU read-side critical sections · b862789a
      Boqun Feng committed
      Sasha Levin reported a WARNING:
      
      | WARNING: CPU: 0 PID: 6974 at kernel/rcu/tree_plugin.h:329
      | rcu_preempt_note_context_switch kernel/rcu/tree_plugin.h:329 [inline]
      | WARNING: CPU: 0 PID: 6974 at kernel/rcu/tree_plugin.h:329
      | rcu_note_context_switch+0x16c/0x2210 kernel/rcu/tree.c:458
      ...
      | CPU: 0 PID: 6974 Comm: syz-fuzzer Not tainted 4.13.0-next-20170908+ #246
      | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
      | 1.10.1-1ubuntu1 04/01/2014
      | Call Trace:
      ...
      | RIP: 0010:rcu_preempt_note_context_switch kernel/rcu/tree_plugin.h:329 [inline]
      | RIP: 0010:rcu_note_context_switch+0x16c/0x2210 kernel/rcu/tree.c:458
      | RSP: 0018:ffff88003b2debc8 EFLAGS: 00010002
      | RAX: 0000000000000001 RBX: 1ffff1000765bd85 RCX: 0000000000000000
      | RDX: 1ffff100075d7882 RSI: ffffffffb5c7da20 RDI: ffff88003aebc410
      | RBP: ffff88003b2def30 R08: dffffc0000000000 R09: 0000000000000001
      | R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003b2def08
      | R13: 0000000000000000 R14: ffff88003aebc040 R15: ffff88003aebc040
      | __schedule+0x201/0x2240 kernel/sched/core.c:3292
      | schedule+0x113/0x460 kernel/sched/core.c:3421
      | kvm_async_pf_task_wait+0x43f/0x940 arch/x86/kernel/kvm.c:158
      | do_async_page_fault+0x72/0x90 arch/x86/kernel/kvm.c:271
      | async_page_fault+0x22/0x30 arch/x86/entry/entry_64.S:1069
      | RIP: 0010:format_decode+0x240/0x830 lib/vsprintf.c:1996
      | RSP: 0018:ffff88003b2df520 EFLAGS: 00010283
      | RAX: 000000000000003f RBX: ffffffffb5d1e141 RCX: ffff88003b2df670
      | RDX: 0000000000000001 RSI: dffffc0000000000 RDI: ffffffffb5d1e140
      | RBP: ffff88003b2df560 R08: dffffc0000000000 R09: 0000000000000000
      | R10: ffff88003b2df718 R11: 0000000000000000 R12: ffff88003b2df5d8
      | R13: 0000000000000064 R14: ffffffffb5d1e140 R15: 0000000000000000
      | vsnprintf+0x173/0x1700 lib/vsprintf.c:2136
      | sprintf+0xbe/0xf0 lib/vsprintf.c:2386
      | proc_self_get_link+0xfb/0x1c0 fs/proc/self.c:23
      | get_link fs/namei.c:1047 [inline]
      | link_path_walk+0x1041/0x1490 fs/namei.c:2127
      ...
      
      This happened when the host hit a page fault, and delivered it as an
      async page fault, while the guest was in an RCU read-side critical
      section.  The guest then tries to reschedule in kvm_async_pf_task_wait(),
      but rcu_preempt_note_context_switch() would treat the reschedule as a
      sleep in an RCU read-side critical section, which is not allowed (even in
      preemptible RCU).  Thus the WARN.
      
      To cure this, make kvm_async_pf_task_wait() go to the halt path if the
      PF happens in an RCU read-side critical section.
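The decision described above can be modelled in user space as a small predicate. This is only a sketch under stated assumptions: `async_pf_must_halt`, `other_halt_reason`, and `rcu_read_depth` are hypothetical names chosen for illustration, not the kernel's own identifiers, and the real function considers kernel state that is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the wait-path decision: return true if the task
 * must halt in place (wait without calling schedule()) while the host
 * brings the page in.
 *
 * other_halt_reason: any pre-existing condition that already forbids
 *                    sleeping (assumption, not spelled out in the text).
 * rcu_read_depth:    nesting depth of RCU read-side critical sections,
 *                    0 when the task is not inside a reader.
 */
static bool async_pf_must_halt(bool other_halt_reason, int rcu_read_depth)
{
    assert(rcu_read_depth >= 0);

    /* Sleeping inside an RCU reader is not allowed, so halt instead. */
    return other_halt_reason || rcu_read_depth > 0;
}
```

With this model, a fault taken inside a reader (`rcu_read_depth > 0`) always selects the halt path, which is the behaviour the commit message asks for.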
      Reported-by: Sasha Levin <levinsasha928@gmail.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b862789a
    • W
      KVM: nVMX: Fix nested #PF intends to break L1's vmlauch/vmresume · 305d0ab4
      Wanpeng Li committed
      ------------[ cut here ]------------
       WARNING: CPU: 4 PID: 5280 at /home/kernel/linux/arch/x86/kvm//vmx.c:11394 nested_vmx_vmexit+0xc2b/0xd70 [kvm_intel]
       CPU: 4 PID: 5280 Comm: qemu-system-x86 Tainted: G        W  OE   4.13.0+ #17
       RIP: 0010:nested_vmx_vmexit+0xc2b/0xd70 [kvm_intel]
       Call Trace:
        ? emulator_read_emulated+0x15/0x20 [kvm]
        ? segmented_read+0xae/0xf0 [kvm]
        vmx_inject_page_fault_nested+0x60/0x70 [kvm_intel]
        ? vmx_inject_page_fault_nested+0x60/0x70 [kvm_intel]
        x86_emulate_instruction+0x733/0x810 [kvm]
        vmx_handle_exit+0x2f4/0xda0 [kvm_intel]
        ? kvm_arch_vcpu_ioctl_run+0xd2f/0x1c60 [kvm]
        kvm_arch_vcpu_ioctl_run+0xdab/0x1c60 [kvm]
        ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
        kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? __fget+0xfc/0x210
        do_vfs_ioctl+0xa4/0x6a0
        ? __fget+0x11d/0x210
        SyS_ioctl+0x79/0x90
        entry_SYSCALL_64_fastpath+0x23/0xc2
      
      A nested #PF is triggered while L0 is emulating an instruction for L2.
      However, the injection path does not take into account that it must not
      break L1's vmlaunch/vmresume. This patch fixes it by queuing the #PF
      exception instead, requesting an immediate VM exit from L2 and keeping
      the exception for L1 pending for a subsequent nested VM exit.
      
      This should actually work all the time, making vmx_inject_page_fault_nested
      totally unnecessary.  However, that's not working yet, so this patch can work
      around the issue in the meantime.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      305d0ab4
    • J
      x86/asm: Fix inline asm call constraints for GCC 4.4 · 520a13c5
      Josh Poimboeuf committed
      The kernel test bot (run by Xiaolong Ye) reported that the following commit:
      
        f5caf621 ("x86/asm: Fix inline asm call constraints for Clang")
      
      is causing double faults in a kernel compiled with GCC 4.4.
      
      Linus subsequently diagnosed the crash pattern and the buggy commit and found that
      the issue is with this code:
      
        register unsigned int __asm_call_sp asm("esp");
        #define ASM_CALL_CONSTRAINT "+r" (__asm_call_sp)
      
      Even on a 64-bit kernel, it's using ESP instead of RSP.  That causes GCC
      to produce the following bogus code:
      
        ffffffff8147461d:       89 e0                   mov    %esp,%eax
        ffffffff8147461f:       4c 89 f7                mov    %r14,%rdi
        ffffffff81474622:       4c 89 fe                mov    %r15,%rsi
        ffffffff81474625:       ba 20 00 00 00          mov    $0x20,%edx
        ffffffff8147462a:       89 c4                   mov    %eax,%esp
        ffffffff8147462c:       e8 bf 52 05 00          callq  ffffffff814c98f0 <copy_user_generic_unrolled>
      
      Despite the absurdity of it backing up and restoring the stack pointer
      for no reason, the bug is actually the fact that it's only backing up
      and restoring the lower 32 bits of the stack pointer.  The upper 32 bits
      are getting cleared out, corrupting the stack pointer.
      
      So change the '__asm_call_sp' register variable to be associated with
      the actual full-size stack pointer.
      
      This also requires changing the __ASM_SEL() macro to be based on the
      actual compiled arch size, rather than the CONFIG value, because
      CONFIG_X86_64 compiles some files with '-m32' (e.g., realmode and vdso).
      Otherwise Clang fails to build the kernel because it complains about the
      use of a 64-bit register (RSP) in a 32-bit file.
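The corruption described above comes from a 32-bit save and restore of a 64-bit value. On x86-64 a write to a 32-bit register zero-extends into the full 64-bit register, so the upper half is lost. The following user-space sketch models that with plain integers; the function names are hypothetical and the sample address is just a typical kernel stack address used for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of "mov %esp,%eax ... mov %eax,%esp": only bits 31..0 are saved,
 * and the restore zero-extends them into the 64-bit register.
 */
static uint64_t roundtrip_via_32bit(uint64_t sp)
{
    uint32_t low = (uint32_t)sp;   /* save: upper 32 bits are dropped */
    return (uint64_t)low;          /* restore: upper 32 bits become zero */
}

/* Model of a full-width save/restore through a 64-bit register. */
static uint64_t roundtrip_via_64bit(uint64_t sp)
{
    uint64_t saved = sp;
    return saved;
}

/* Small self-check showing the difference on one sample address. */
static void demo_truncation(void)
{
    uint64_t sp = 0xffff88003b2debc8ULL;

    assert(roundtrip_via_32bit(sp) == 0x000000003b2debc8ULL);
    assert(roundtrip_via_64bit(sp) == sp);
}
```

The 32-bit round trip turns a valid kernel address into a low address, which is consistent with the double faults reported for the GCC 4.4 build.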
      Reported-and-Bisected-and-Tested-by: kernel test robot <xiaolong.ye@intel.com>
      Diagnosed-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Dmitriy Vyukov <dvyukov@google.com>
      Cc: LKP <lkp@01.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthias Kaehlcke <mka@chromium.org>
      Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: f5caf621 ("x86/asm: Fix inline asm call constraints for Clang")
      Link: http://lkml.kernel.org/r/20170928215826.6sdpmwtkiydiytim@treble
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      520a13c5
  4. 28 Sep, 2017 2 commits
    • P
      KVM: VMX: use cmpxchg64 · c0a1666b
      Paolo Bonzini committed
      This fixes a compilation failure on 32-bit systems.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c0a1666b
    • Z
      xen/mmu: Call xen_cleanhighmap() with 4MB aligned for page tables mapping · 0d805ee7
      Zhenzhong Duan committed
      When booting a PVM guest with large memory (e.g. 240GB), the initial
      mapping provided by Xen overlaps with the kernel module virtual space. When
      the mapping in this space is cleared by xen_cleanhighmap(), in certain cases
      a 2MB mapping can be left behind. This is because Xen initializes a 4MB-aligned
      mapping but xen_cleanhighmap() finishes at a 2MB boundary.
      
      When a module is loaded just on top of that 2MB space, the following warning
      is triggered:
      
      WARNING: at mm/vmalloc.c:106 vmap_pte_range+0x14e/0x190()
      Call Trace:
       [<ffffffff81117083>] warn_alloc_failed+0xf3/0x160
       [<ffffffff81146022>] __vmalloc_area_node+0x182/0x1c0
       [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
       [<ffffffff81145df7>] __vmalloc_node_range+0xa7/0x110
       [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
       [<ffffffff8103ca54>] module_alloc+0x64/0x70
       [<ffffffff810ac91e>] ? module_alloc_update_bounds+0x1e/0x80
       [<ffffffff810ac91e>] module_alloc_update_bounds+0x1e/0x80
       [<ffffffff810ac9a7>] move_module+0x27/0x150
       [<ffffffff810aefa0>] layout_and_allocate+0x120/0x1b0
       [<ffffffff810af0a8>] load_module+0x78/0x640
       [<ffffffff811ff90b>] ? security_file_permission+0x8b/0x90
       [<ffffffff810af6d2>] sys_init_module+0x62/0x1e0
       [<ffffffff815154c2>] system_call_fastpath+0x16/0x1b
      
      The 2MB mapping is then cleared, and the kernel finally oopses when a page
      in that space is accessed.
      
      BUG: unable to handle kernel paging request at ffff880022600000
      IP: [<ffffffff81260877>] clear_page_c_e+0x7/0x10
      PGD 1788067 PUD 178c067 PMD 22434067 PTE 0
      Oops: 0002 [#1] SMP
      Call Trace:
       [<ffffffff81116ef7>] ? prep_new_page+0x127/0x1c0
       [<ffffffff81117d42>] get_page_from_freelist+0x1e2/0x550
       [<ffffffff81133010>] ? ii_iovec_copy_to_user+0x90/0x140
       [<ffffffff81119c9d>] __alloc_pages_nodemask+0x12d/0x230
       [<ffffffff81155516>] alloc_pages_vma+0xc6/0x1a0
       [<ffffffff81006ffd>] ? pte_mfn_to_pfn+0x7d/0x100
       [<ffffffff81134cfb>] do_anonymous_page+0x16b/0x350
       [<ffffffff81139c34>] handle_pte_fault+0x1e4/0x200
       [<ffffffff8100712e>] ? xen_pmd_val+0xe/0x10
       [<ffffffff810052c9>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
       [<ffffffff81139dab>] handle_mm_fault+0x15b/0x270
       [<ffffffff81510c10>] do_page_fault+0x140/0x470
       [<ffffffff8150d7d5>] page_fault+0x25/0x30
      
      Call xen_cleanhighmap() with 4MB alignment for the page tables mapping to fix it.
      The unnecessary call of xen_cleanhighmap() in DEBUG mode is also removed.
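The alignment mismatch can be checked with a little arithmetic. The sketch below is a user-space model, not the kernel code: `round_up_pow2` and `leftover_after_cleanup` are hypothetical helpers, and the end addresses are made-up values chosen only to show when a 2MB remainder appears.

```c
#include <assert.h>
#include <stdint.h>

#define MB(x) ((uint64_t)(x) << 20)

/* Round addr up to the next multiple of align (align is a power of two). */
static uint64_t round_up_pow2(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/*
 * Bytes of mapping left behind when the mapping was created up to a
 * map_align boundary but the cleanup stops at a cleanup_align boundary.
 */
static uint64_t leftover_after_cleanup(uint64_t end, uint64_t map_align,
                                       uint64_t cleanup_align)
{
    assert(map_align >= cleanup_align);
    return round_up_pow2(end, map_align) - round_up_pow2(end, cleanup_align);
}
```

For an end address of 5MB, a 4MB-aligned mapping extends to 8MB while a 2MB-aligned cleanup stops at 6MB, leaving 2MB mapped; cleaning up with 4MB alignment leaves nothing. For an end address of 7MB both boundaries coincide at 8MB, which matches the "in certain cases" wording in the message.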
      
      -v2: add comment about XEN alignment from Juergen.
      
      References: https://lists.xen.org/archives/html/xen-devel/2012-07/msg01562.html
      Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      
      [boris: added 'xen/mmu' tag to commit subject]
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      0d805ee7
  5. 27 Sep, 2017 3 commits
    • P
      KVM: VMX: simplify and fix vmx_vcpu_pi_load · 31afb2ea
      Paolo Bonzini committed
      The simplify part: do not touch pi_desc.nv, we can set it when the
      VCPU is first created.  Likewise, pi_desc.sn is only handled by
      vmx_vcpu_pi_load, do not touch it in __pi_post_block.
      
      The fix part: do not check kvm_arch_has_assigned_device, instead
      check the SN bit to figure out whether vmx_vcpu_pi_put ran before.
      This matches what the previous patch did in pi_post_block.
      
      Cc: Huangweidong <weidong.huang@huawei.com>
      Cc: Gonglei <arei.gonglei@huawei.com>
      Cc: wangxin <wangxinxin.wang@huawei.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Tested-by: Longpeng (Mike) <longpeng2@huawei.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      31afb2ea
    • P
      KVM: VMX: avoid double list add with VT-d posted interrupts · 8b306e2f
      Paolo Bonzini committed
      In some cases, for example involving hot-unplug of assigned
      devices, pi_post_block can forget to remove the vCPU from the
      blocked_vcpu_list.  When this happens, the next call to
      pi_pre_block corrupts the list.
      
      Fix this in two ways.  First, check vcpu->pre_pcpu in pi_pre_block
      and WARN instead of adding the element twice in the list.  Second,
      always do the list removal in pi_post_block if vcpu->pre_pcpu is
      set (not -1).
      
      The new code keeps interrupts disabled for the whole duration of
      pi_pre_block/pi_post_block.  This is not strictly necessary, but
      easier to follow.  For the same reason, PI.ON is checked only
      after the cmpxchg, and to handle it we just call the post-block
      code.  This removes duplication of the list removal code.
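The two-part fix described above (refuse a second add, always remove when the marker is set) can be illustrated with a small user-space model. This is a sketch under assumptions: it uses one singly linked list instead of the kernel's per-CPU lists and locking, and `vcpu_model`, `blocked_list`, `model_pre_block`, and `model_post_block` are hypothetical names. Only the `pre_pcpu` field and its -1 sentinel come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct vcpu_model {
    int pre_pcpu;               /* -1 when the vCPU is on no blocked list */
    struct vcpu_model *next;
};

struct blocked_list {
    struct vcpu_model *head;
    size_t len;
};

/* Pre-block: add the vCPU once; refuse if it is already queued. */
static bool model_pre_block(struct blocked_list *l, struct vcpu_model *v,
                            int cpu)
{
    assert(cpu >= 0);
    if (v->pre_pcpu != -1)
        return false;           /* the kernel patch WARNs at this point */
    v->pre_pcpu = cpu;
    v->next = l->head;
    l->head = v;
    l->len++;
    return true;
}

/* Post-block: always unlink when pre_pcpu is set, whatever else happened. */
static void model_post_block(struct blocked_list *l, struct vcpu_model *v)
{
    if (v->pre_pcpu == -1)
        return;
    for (struct vcpu_model **pp = &l->head; *pp; pp = &(*pp)->next) {
        if (*pp == v) {
            *pp = v->next;
            l->len--;
            break;
        }
    }
    v->next = NULL;
    v->pre_pcpu = -1;
}
```

Keying both operations off the same marker means a forgotten removal can no longer lead to the element being linked twice.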
      
      Cc: Huangweidong <weidong.huang@huawei.com>
      Cc: Gonglei <arei.gonglei@huawei.com>
      Cc: wangxin <wangxinxin.wang@huawei.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Tested-by: Longpeng (Mike) <longpeng2@huawei.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      8b306e2f
    • P
      KVM: VMX: extract __pi_post_block · cd39e117
      Paolo Bonzini committed
      Simple code movement patch, preparing for the next one.
      
      Cc: Huangweidong <weidong.huang@huawei.com>
      Cc: Gonglei <arei.gonglei@huawei.com>
      Cc: wangxin <wangxinxin.wang@huawei.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Tested-by: Longpeng (Mike) <longpeng2@huawei.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      cd39e117
  6. 26 Sep, 2017 17 commits
    • E
      x86/fpu: Use using_compacted_format() instead of open coded X86_FEATURE_XSAVES · 738f48cb
      Eric Biggers committed
      This is the canonical method to use.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20170924105913.9157-11-mingo@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      738f48cb
    • E
      x86/fpu: Use validate_xstate_header() to validate the xstate_header in copy_user_to_xstate() · 98c0fad9
      Eric Biggers committed
      Tighten the checks in copy_user_to_xstate().
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20170924105913.9157-10-mingo@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      98c0fad9
    • E
      x86/fpu: Eliminate the 'xfeatures' local variable in copy_user_to_xstate() · 3d703477
      Eric Biggers committed
      We now have this field in hdr.xfeatures.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20170924105913.9157-9-mingo@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      3d703477
    • E
      x86/fpu: Copy the full header in copy_user_to_xstate() · af2c4322
      Eric Biggers committed
      This is in preparation to verify the full xstate header as supplied by user-space.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20170924105913.9157-8-mingo@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      af2c4322
    • E
      x86/fpu: Use validate_xstate_header() to validate the xstate_header in copy_kernel_to_xstate() · af95774b
      Eric Biggers committed
      Tighten the checks in copy_kernel_to_xstate().
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20170924105913.9157-7-mingo@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      af95774b
    • E
      x86/fpu: Eliminate the 'xfeatures' local variable in copy_kernel_to_xstate() · b89eda48
      Eric Biggers committed
      We have this information in the xstate_header.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20170924105913.9157-6-mingo@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b89eda48
    • E
      x86/fpu: Copy the full state_header in copy_kernel_to_xstate() · 80d8ae86
      Eric Biggers committed
      This is in preparation to verify the full xstate header as supplied by user-space.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20170924105913.9157-5-mingo@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      80d8ae86
    • E
      x86/fpu: Use validate_xstate_header() to validate the xstate_header in __fpu__restore_sig() · b11e2e18
      Eric Biggers committed
      Tighten the checks in __fpu__restore_sig() and update comments.
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20170924105913.9157-4-mingo@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b11e2e18
    • E
      x86/fpu: Use validate_xstate_header() to validate the xstate_header in xstateregs_set() · cf9df81b
      Eric Biggers committed
      Tighten the checks in xstateregs_set().
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20170924105913.9157-3-mingo@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      cf9df81b
    • E
      x86/fpu: Introduce validate_xstate_header() · e63e5d5c
      Eric Biggers committed
      Move validation of user-supplied xstate_header into a helper function,
      in preparation of calling it from both the ptrace and sigreturn syscall
      paths.
      
      The new function also considers it to be an error if *any* reserved bits
      are set, whereas before we were just clearing most of them silently.
      
      This should reduce the chance of bugs that fail to correctly validate
      user-supplied XSAVE areas.  It also will expose any broken userspace
      programs that set the other reserved bits; this is desirable because
      such programs will lose compatibility with future CPUs and kernels if
      those bits are ever used for anything.  (There shouldn't be any such
      programs, and in fact in the case where the compacted format is in use
      we were already validating xfeatures.  But you never know...)
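A helper of the kind described above might look roughly like the following user-space sketch. It is not the kernel's implementation: the struct and function names are hypothetical, the supported-feature mask is passed in as a parameter, and rejecting a non-zero `xcomp_bv` is an assumption made here (that user-space supplies the non-compacted format). The 64-byte header layout (two 64-bit fields followed by 48 reserved bytes) follows the XSAVE area definition.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the 64-byte XSAVE header. */
struct xstate_header_model {
    uint64_t xfeatures;     /* bitmap of state components present */
    uint64_t xcomp_bv;      /* compaction bitmap, zero in standard format */
    uint64_t reserved[6];   /* 48 reserved bytes */
};

/* Return 0 if the user-supplied header is acceptable, -EINVAL otherwise. */
static int validate_header_model(const struct xstate_header_model *hdr,
                                 uint64_t supported_mask)
{
    assert(hdr != NULL);

    /* Reject feature bits that are not supported. */
    if (hdr->xfeatures & ~supported_mask)
        return -EINVAL;

    /* Assumption: user-space must supply the non-compacted format. */
    if (hdr->xcomp_bv != 0)
        return -EINVAL;

    /* Any reserved bit being set is an error, not silently cleared. */
    for (size_t i = 0; i < sizeof(hdr->reserved) / sizeof(hdr->reserved[0]); i++) {
        if (hdr->reserved[i] != 0)
            return -EINVAL;
    }
    return 0;
}
```

Returning an error rather than masking the bits is the design choice the message argues for: broken callers are exposed immediately instead of failing later when the bits gain a meaning.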
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20170924105913.9157-2-mingo@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e63e5d5c
    • I
      x86/fpu: Rename fpu__activate_fpstate_read/write() to fpu__prepare_[read|write]() · 369a036d
      Ingo Molnar committed
      As per the new nomenclature we don't 'activate' the FPU state
      anymore, we initialize it. So drop the _activate_fpstate name
      from these functions, which were a bit of a mouthful anyway,
      and name them:
      
      	fpu__prepare_read()
      	fpu__prepare_write()
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      369a036d
    • I
      x86/fpu: Rename fpu__activate_curr() to fpu__initialize() · 2ce03d85
      Ingo Molnar committed
      Rename this function to better express that it's all about
      initializing the FPU state of a task which goes hand in hand
      with the fpu::initialized field.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-33-mingo@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2ce03d85
    • I
      x86/fpu: Simplify and speed up fpu__copy() · e10078eb
      Ingo Molnar committed
      fpu__copy() has a preempt_disable()/enable() pair, which it had to do to
      be able to atomically unlazy the current task when doing an FNSAVE.
      
      But we don't unlazy tasks anymore, we always do direct saves/restores of
      FPU context.
      
      So remove the unnecessary critical section and update the comments.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-32-mingo@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e10078eb
    • I
      x86/fpu: Fix stale comments about lazy FPU logic · 7f1487c5
      Ingo Molnar committed
      We don't do any lazy restore anymore; what we have are two optimizations:
      
       - no-FPU tasks that don't save/restore the FPU context (kernel threads are such)
      
       - cached FPU registers maintained via the fpu->last_cpu field. This means that
         if an FPU task context switches to a non-FPU task then we can keep the
         FPU registers as an in-FPU copy (cache), and skip restoring them
         once we switch back to the original FPU-using task.
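The second optimization can be pictured with a tiny user-space model. This is a sketch under assumptions, not the kernel code: only the `last_cpu` field is taken from the text, while the per-CPU owner table, the CPU count, and the function names are hypothetical additions needed to make the model self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCPUS 4

struct fpu_model {
    int last_cpu;   /* CPU whose registers last held this context, -1 if none */
};

/* Hypothetical per-CPU record of whose context the registers hold. */
static const struct fpu_model *fpregs_owner[NCPUS];

/* Record that this CPU's FPU registers now hold fpu's context. */
static void fpregs_loaded(struct fpu_model *fpu, int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS);
    fpu->last_cpu = cpu;
    fpregs_owner[cpu] = fpu;
}

/* True if the registers on this CPU still hold fpu's context. */
static bool fpregs_still_cached(const struct fpu_model *fpu, int cpu)
{
    assert(cpu >= 0 && cpu < NCPUS);
    return fpu->last_cpu == cpu && fpregs_owner[cpu] == fpu;
}
```

In this model, switching to a non-FPU task loads nothing, so the previous task's registers stay cached and the restore can be skipped when it runs again on the same CPU. Loading another FPU task's context on that CPU, or migrating to a different CPU, invalidates the cache.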
      
      Update all the comments that still referred to old 'lazy' and 'unlazy' concepts.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-31-mingo@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      7f1487c5
    • I
      x86/fpu: Rename fpu::fpstate_active to fpu::initialized · e4a81bfc
      Ingo Molnar committed
      The x86 FPU code used to have a complex state machine where both the FPU
      registers and the FPU state context could be 'active' (or inactive)
      independently of each other - which enabled features like lazy FPU restore.
      
      Much of this complexity is gone in the current code: now we basically can
      have FPU-less tasks (kernel threads) that don't use (and save/restore) FPU
      state at all, plus full FPU users that save/restore directly with no laziness
      whatsoever.
      
      But the fpu::fpstate_active still carries bits of the old complexity - meanwhile
      this flag has become a simple flag that shows whether the FPU context saving
      area in the thread struct is initialized and used, or not.
      
      Rename it to fpu::initialized to express this simplicity in the name as well.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-30-mingo@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e4a81bfc
    • I
      x86/fpu: Remove fpu__current_fpstate_write_begin/end() · 685c930d
      Ingo Molnar committed
      These functions are not used anymore, so remove them.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-29-mingo@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      685c930d
    • I
      x86/fpu: Fix fpu__activate_fpstate_read() and update comments · 4618e909
      Ingo Molnar committed
      fpu__activate_fpstate_read() can be called for the current task
      when coredumping - or for stopped tasks when ptrace-ing.
      
      Implement this properly in the code and update the comments.
      
      This also fixes an incorrect (but harmless) warning introduced by
      one of the earlier patches.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Link: http://lkml.kernel.org/r/20170923130016.21448-28-mingo@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4618e909
  7. 25 Sep, 2017 9 commits
    • K
      perf/x86/intel/uncore: Correct num_boxes for IIO and IRP · 29b46dfb
      Kan Liang committed
      There are 6 IIO/IRP boxes: one each for CBDMA, PCIe0-2, MCP 0 and MCP 1.
      Correct num_boxes accordingly.
      Signed-off-by: Kan Liang <Kan.liang@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: ak@linux.intel.com
      Cc: peterz@infradead.org
      Cc: eranian@google.com
      Cc: acme@kernel.org
      Link: http://lkml.kernel.org/r/1505149816-12580-1-git-send-email-kan.liang@intel.com
      29b46dfb
    • K
      perf/x86/intel/rapl: Add missing CPU IDs · 450a9789
      Kan Liang committed
      DENVERTON and GEMINI_LAKE support the same RAPL counters as Apollo Lake.
      Signed-off-by: Kan Liang <Kan.liang@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: ak@linux.intel.com
      Cc: peterz@infradead.org
      Cc: piotr.luc@intel.com
      Cc: harry.pan@intel.com
      Cc: srinivas.pandruvada@linux.intel.com
      Link: http://lkml.kernel.org/r/20170908213449.6224-3-kan.liang@intel.com
      450a9789
    • K
      perf/x86/msr: Add missing CPU IDs · 1aaccc40
      Kan Liang committed
      Goldmont, Goldmont Plus and Xeon Phi have MSR_SMI_COUNT as well.
      Signed-off-by: Kan Liang <Kan.liang@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: ak@linux.intel.com
      Cc: peterz@infradead.org
      Cc: piotr.luc@intel.com
      Cc: harry.pan@intel.com
      Cc: srinivas.pandruvada@linux.intel.com
      Link: http://lkml.kernel.org/r/20170908213449.6224-2-kan.liang@intel.com
      1aaccc40
    • K
      perf/x86/intel/cstate: Add missing CPU IDs · b09c146f
      Kan Liang committed
      Skylake server uses the same C-state residency events as Sandy Bridge.
      
      Denverton and Gemini Lake use the same C-state residency events as
      Apollo Lake.
      Signed-off-by: Kan Liang <Kan.liang@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: ak@linux.intel.com
      Cc: peterz@infradead.org
      Cc: piotr.luc@intel.com
      Cc: harry.pan@intel.com
      Cc: srinivas.pandruvada@linux.intel.com
      Link: http://lkml.kernel.org/r/20170908213449.6224-1-kan.liang@intel.com
      b09c146f
    • V
      x86: Don't cast away the __user in __get_user_asm_u64() · 5ac751d9
      Ville Syrjälä committed
      Don't cast away the __user in __get_user_asm_u64() on x86-32.
      This keeps sparse from complaining.
      Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20170912164000.13745-1-ville.syrjala@linux.intel.com
      5ac751d9
    • S
      x86/sysfs: Fix off-by-one error in loop termination · 7d709943
      Sean Fu committed
      An off-by-one error in the loop termination condition in
      create_setup_data_nodes() leads to a memory leak when
      create_setup_data_node() fails.
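
      The message does not quote the loop itself, so the following is only a
      sketch of the general pattern, assuming the bug is of the kind where an
      error-path unwind loop stops one element short. All names (sketch_node,
      sketch_create_all() and so on) are hypothetical and the code is plain
      user-space C, not the kernel's.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type standing in for a per-entry object. */
struct sketch_node {
    int id;
};

/* Number of nodes currently allocated; lets a caller observe leaks. */
int sketch_live_nodes;

/* Create one node; creation of node `fail_at` is made to fail on purpose. */
struct sketch_node *sketch_node_create(int id, int fail_at)
{
    struct sketch_node *n;

    if (id == fail_at)
        return NULL;
    n = malloc(sizeof(*n));
    if (n) {
        n->id = id;
        sketch_live_nodes++;
    }
    return n;
}

void sketch_node_destroy(struct sketch_node *n)
{
    free(n);
    sketch_live_nodes--;
}

/*
 * Create `count` nodes. On failure, destroy every node created so far and
 * return -1; return 0 on success.
 */
int sketch_create_all(struct sketch_node **nodes, int count, int fail_at)
{
    int i, j;

    assert(nodes != NULL && count >= 0);
    for (i = 0; i < count; i++) {
        nodes[i] = sketch_node_create(i, fail_at);
        if (!nodes[i])
            goto out_clean;
    }
    return 0;

out_clean:
    /* The bound must be j >= 0; with j > 0, nodes[0] would be leaked. */
    for (j = i - 1; j >= 0; j--)
        sketch_node_destroy(nodes[j]);
    return -1;
}
```

      With the wrong bound the unwind loop would free every node except the
      first, which is easy to miss because the failure path is rarely taken.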
      Signed-off-by: Sean Fu <fxinrong@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1505090001-1157-1-git-send-email-fxinrong@gmail.com
      7d709943
    • L
      x86/mm: Fix fault error path using unsafe vma pointer · a3c4fb7c
      Laurent Dufour committed
      Commit 7b2d0dba ("x86/mm/pkeys: Pass VMA down in to fault signal
      generation code") passes a vma pointer down to the error path, but when
      mm_fault_error() is called from __do_page_fault() this happens after the
      mmap_sem has been released.
      
      This is dangerous because the vma structure is no longer safe to use once
      the mmap_sem has been released. Since only the protection key value is
      needed for error processing, we can pass down just this value.
      
      Fix it by passing a pointer to a protection key value down to the fault
      signal generation code. Using a pointer keeps the check in
      fill_sig_info_pkey() that generates a warning message when the vma was not
      known. If the pointer is valid, the protection key value can be accessed by
      dereferencing the pointer.
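
      The idea can be sketched in user-space C as follows. This is not the
      kernel code: struct sketch_vma, struct sketch_siginfo and both sketch_*
      functions are hypothetical stand-ins, and the lock is only marked by
      comments. The sketch copies the key while the object is still protected
      and hands only a pointer to that copy to the error path, with NULL meaning
      "no vma was known".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a mapping that carries a protection key. */
struct sketch_vma {
    uint32_t pkey;
};

/* Hypothetical stand-in for the signal info filled in on the error path. */
struct sketch_siginfo {
    uint32_t pkey;
    int pkey_valid;
};

/*
 * Fill in the key from a pointer. Returns 0 on success, or -1 when no key
 * was available (the point where the caller would emit a warning).
 */
int sketch_fill_sig_info_pkey(struct sketch_siginfo *info, const uint32_t *pkey)
{
    assert(info != NULL);
    if (pkey == NULL) {
        info->pkey_valid = 0;
        return -1;
    }
    info->pkey = *pkey;
    info->pkey_valid = 1;
    return 0;
}

/* Simulated fault path: only the copied key crosses the unlock boundary. */
int sketch_fault_error_path(const struct sketch_vma *vma,
                            struct sketch_siginfo *info)
{
    uint32_t pkey;
    const uint32_t *pkey_ptr = NULL;

    /* --- lock held: the vma may be read here --- */
    if (vma != NULL) {
        pkey = vma->pkey;
        pkey_ptr = &pkey;
    }
    /* --- lock released: the vma must not be touched below --- */

    return sketch_fill_sig_info_pkey(info, pkey_ptr);
}
```

      Passing a pointer rather than a plain value preserves the distinction
      between "key is 0" and "no key known", which a bare integer could not
      express without a separate flag.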
      
      [ tglx: Made *pkey u32 as that's the type which is passed in siginfo ]
      
      Fixes: 7b2d0dba ("x86/mm/pkeys: Pass VMA down in to fault signal generation code")
      Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1504513935-12742-1-git-send-email-ldufour@linux.vnet.ibm.com
      a3c4fb7c
    • E
      x86/fpu: Reinitialize FPU registers if restoring FPU state fails · d5c8028b
      Eric Biggers committed
      Userspace can change the FPU state of a task using the ptrace() or
      rt_sigreturn() system calls.  Because reserved bits in the FPU state can
      cause the XRSTOR instruction to fail, the kernel has to carefully
      validate that no reserved bits or other invalid values are being set.
      
      Unfortunately, there have been bugs in this validation code.  For
      example, we were not checking that the 'xcomp_bv' field in the
      xstate_header was 0.  As-is, such bugs are exploitable to read the FPU
      registers of other processes on the system.  To do so, an attacker can
      create a task, assign to it an invalid FPU state, then spin in a loop
      and monitor the values of the FPU registers.  Because the task's FPU
      registers are not being restored, sometimes the FPU registers will have
      the values from another process.
      
      This is likely to continue to be a problem in the future because the
      validation done by the CPU instructions like XRSTOR is not immediately
      visible to kernel developers.  Nor will invalid FPU states ever be
      encountered during ordinary use --- they will only be seen during
      fuzzing or exploits.  There can even be reserved bits outside the
      xstate_header which are easy to forget about.  For example, the MXCSR
      register contains reserved bits, which were not validated by the
      KVM_SET_XSAVE ioctl until commit a575813b ("KVM: x86: Fix load
      damaged SSEx MXCSR register").
      
      Therefore, mitigate this class of vulnerability by restoring the FPU
      registers from init_fpstate if restoring from the task's state fails.
      
      We actually used to do this, but it was (perhaps unwisely) removed by
      commit 9ccc27a5 ("x86/fpu: Remove error return values from
      copy_kernel_to_*regs() functions").  This new patch is also a bit
      different.  First, it only clears the registers, not also the bad
      in-memory state; this is simpler and makes it easier to make the
      mitigation cover all callers of __copy_kernel_to_fpregs().  Second, it
      does the register clearing in an exception handler so that no extra
      instructions are added to context switches.  In fact, we *remove*
      instructions, since previously we were always zeroing the register
      containing 'err' even if CONFIG_X86_DEBUG_FPU was disabled.
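
      The fallback can be modeled in user-space C as below. This is a sketch
      under simplifying assumptions, not the patch: the patch performs the
      fallback in an exception handler, whereas this model uses an ordinary
      return code, and the "registers" are a plain array. All sketch_* names are
      hypothetical; only init_fpstate is a name taken from the message.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily reduced in-memory FPU state. */
struct sketch_fpstate {
    uint64_t xcomp_bv;   /* any nonzero value makes the restore fail here */
    uint32_t xmm0[4];
};

/* Simulated CPU register; may still hold another task's values. */
uint32_t sketch_cpu_xmm0[4];

/* Known-good initial state, modeled after the init_fpstate idea. */
const struct sketch_fpstate sketch_init_fpstate = { 0, { 0, 0, 0, 0 } };

/* Simulated restore: on invalid state it fails and leaves registers as-is. */
int sketch_try_restore(const struct sketch_fpstate *st)
{
    if (st->xcomp_bv != 0)
        return -1;
    memcpy(sketch_cpu_xmm0, st->xmm0, sizeof(sketch_cpu_xmm0));
    return 0;
}

/* Restore the task's state; if that fails, load the initial state instead. */
void sketch_restore_or_init(const struct sketch_fpstate *st)
{
    if (sketch_try_restore(st) != 0) {
        int err = sketch_try_restore(&sketch_init_fpstate);

        assert(err == 0);   /* the initial state must always be loadable */
        (void)err;
    }
}
```

      Note that, as in the description above, only the registers are reset; the
      bad in-memory state is left untouched.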
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Link: http://lkml.kernel.org/r/20170922174156.16780-4-ebiggers3@gmail.com
      Link: http://lkml.kernel.org/r/20170923130016.21448-27-mingo@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d5c8028b
    • E
      x86/fpu: Don't let userspace set bogus xcomp_bv · 814fb7bb
      Eric Biggers committed
      On x86, userspace can use the ptrace() or rt_sigreturn() system calls to
      set a task's extended state (xstate) or "FPU" registers.  ptrace() can
      set them for another task using the PTRACE_SETREGSET request with
      NT_X86_XSTATE, while rt_sigreturn() can set them for the current task.
      In either case, registers can be set to any value, but the kernel
      assumes that the XSAVE area itself remains valid in the sense that the
      CPU can restore it.
      
      However, in the case where the kernel is using the uncompacted xstate
      format (which it does whenever the XSAVES instruction is unavailable),
      it was possible for userspace to set the xcomp_bv field in the
      xstate_header to an arbitrary value.  However, all bits in that field
      are reserved in the uncompacted case, so when switching to a task with
      nonzero xcomp_bv, the XRSTOR instruction failed with a #GP fault.  This
      caused the WARN_ON_FPU(err) in copy_kernel_to_xregs() to be hit.  In
      addition, since the error is otherwise ignored, the FPU registers from
      the task previously executing on the CPU were leaked.
      
      Fix the bug by checking that the user-supplied value of xcomp_bv is 0 in
      the uncompacted case, and returning an error otherwise.
      
      The reason for validating xcomp_bv rather than simply overwriting it
      with 0 is that we want userspace to see an error if it (incorrectly)
      provides an XSAVE area in compacted format rather than in uncompacted
      format.
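
      A minimal sketch of such a check, in user-space C, might look like the
      following. The struct and function names are hypothetical, and the
      `compacted` flag stands in for "the kernel is using the compacted format";
      the sketch does not model what validation the compacted case needs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the 64-byte header that follows the legacy area. */
struct sketch_xstate_header {
    uint64_t xfeatures;
    uint64_t xcomp_bv;
    uint64_t reserved[6];
};

/*
 * Reject a user-supplied header whose xcomp_bv is nonzero when the
 * uncompacted format is in use. Returns 0 if acceptable, -EINVAL otherwise.
 */
int sketch_validate_header(const struct sketch_xstate_header *hdr,
                           bool compacted)
{
    assert(hdr != NULL);
    if (!compacted && hdr->xcomp_bv != 0)
        return -EINVAL;   /* report the error rather than silently zeroing */
    return 0;
}
```

      Returning an error instead of overwriting the field with 0 matches the
      rationale above: a caller that supplied a compacted-format area by mistake
      finds out about it.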
      
      Note that as before, in case of error we clear the task's FPU state.
      This is perhaps non-ideal, especially for PTRACE_SETREGSET; it might be
      better to return an error before changing anything.  But it seems the
      "clear on error" behavior is fine for now, and it's a little tricky to
      do otherwise because it would mean we couldn't simply copy the full
      userspace state into kernel memory in one __copy_from_user().
      
      This bug was found by syzkaller, which hit the above-mentioned
      WARN_ON_FPU():
      
          WARNING: CPU: 1 PID: 0 at ./arch/x86/include/asm/fpu/internal.h:373 __switch_to+0x5b5/0x5d0
          CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.13.0 #453
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
          task: ffff9ba2bc8e42c0 task.stack: ffffa78cc036c000
          RIP: 0010:__switch_to+0x5b5/0x5d0
          RSP: 0000:ffffa78cc08bbb88 EFLAGS: 00010082
          RAX: 00000000fffffffe RBX: ffff9ba2b8bf2180 RCX: 00000000c0000100
          RDX: 00000000ffffffff RSI: 000000005cb10700 RDI: ffff9ba2b8bf36c0
          RBP: ffffa78cc08bbbd0 R08: 00000000929fdf46 R09: 0000000000000001
          R10: 0000000000000000 R11: 0000000000000000 R12: ffff9ba2bc8e42c0
          R13: 0000000000000000 R14: ffff9ba2b8bf3680 R15: ffff9ba2bf5d7b40
          FS:  00007f7e5cb10700(0000) GS:ffff9ba2bf400000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 00000000004005cc CR3: 0000000079fd5000 CR4: 00000000001406e0
          Call Trace:
          Code: 84 00 00 00 00 00 e9 11 fd ff ff 0f ff 66 0f 1f 84 00 00 00 00 00 e9 e7 fa ff ff 0f ff 66 0f 1f 84 00 00 00 00 00 e9 c2 fa ff ff <0f> ff 66 0f 1f 84 00 00 00 00 00 e9 d4 fc ff ff 66 66 2e 0f 1f
      
      Here is a C reproducer.  The expected behavior is that the program spins
      forever with no output.  However, on a buggy kernel running on a
      processor with the "xsave" feature but without the "xsaves" feature
      (e.g. Sandy Bridge through Broadwell for Intel), within a second or two
      the program reports that the xmm registers were corrupted, i.e. were not
      restored correctly.  With CONFIG_X86_DEBUG_FPU=y it also hits the above
      kernel warning.
      
          #define _GNU_SOURCE
          #include <stdbool.h>
          #include <inttypes.h>
          #include <linux/elf.h>
          #include <stdio.h>
          #include <sys/ptrace.h>
          #include <sys/uio.h>
          #include <sys/wait.h>
          #include <unistd.h>
      
          int main(void)
          {
              int pid = fork();
              uint64_t xstate[512];
              struct iovec iov = { .iov_base = xstate, .iov_len = sizeof(xstate) };
      
              if (pid == 0) {
                  bool tracee = true;
                  for (int i = 0; i < sysconf(_SC_NPROCESSORS_ONLN) && tracee; i++)
                      tracee = (fork() != 0);
                  uint32_t xmm0[4] = { [0 ... 3] = tracee ? 0x00000000 : 0xDEADBEEF };
                  asm volatile("   movdqu %0, %%xmm0\n"
                               "   mov %0, %%rbx\n"
                               "1: movdqu %%xmm0, %0\n"
                               "   mov %0, %%rax\n"
                               "   cmp %%rax, %%rbx\n"
                               "   je 1b\n"
                               : "+m" (xmm0) : : "rax", "rbx", "xmm0");
                  printf("BUG: xmm registers corrupted!  tracee=%d, xmm0=%08X%08X%08X%08X\n",
                         tracee, xmm0[0], xmm0[1], xmm0[2], xmm0[3]);
              } else {
                  usleep(100000);
                  ptrace(PTRACE_ATTACH, pid, 0, 0);
                  wait(NULL);
                  ptrace(PTRACE_GETREGSET, pid, NT_X86_XSTATE, &iov);
                  xstate[65] = -1;
                  ptrace(PTRACE_SETREGSET, pid, NT_X86_XSTATE, &iov);
                  ptrace(PTRACE_CONT, pid, 0, 0);
                  wait(NULL);
              }
              return 1;
          }
      
      Note: the program only tests for the bug using the ptrace() system call.
      The bug can also be reproduced using the rt_sigreturn() system call, but
      only when called from a 32-bit program, since for 64-bit programs the
      kernel restores the FPU state from the signal frame by doing XRSTOR
      directly from userspace memory (with proper error checking).
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: <stable@vger.kernel.org> [v3.17+]
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Fixes: 0b29643a ("x86/xsaves: Change compacted format xsave area header")
      Link: http://lkml.kernel.org/r/20170922174156.16780-2-ebiggers3@gmail.com
      Link: http://lkml.kernel.org/r/20170923130016.21448-25-mingo@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      814fb7bb
  8. 24 Sep, 2017 3 commits