1. 08 3月, 2016 2 次提交
    • K
      perf/x86/intel: Fix PEBS warning by only restoring active PMU in pmi · c3d266c8
      Kan Liang 提交于
      This patch tries to fix a PEBS warning found in my stress test. The
      following perf command can easily trigger the pebs warning or spurious
      NMI error on Skylake/Broadwell/Haswell platforms:
      
        sudo perf record -e 'cpu/umask=0x04,event=0xc4/pp,cycles,branches,ref-cycles,cache-misses,cache-references' --call-graph fp -b -c1000 -a
      
      Also the NMI watchdog must be enabled.
      
      For this case, the events number is larger than counter number. So
      perf has to do multiplexing.
      
      In perf_mux_hrtimer_handler, it does perf_pmu_disable(), schedule out
      old events, rotate_ctx, schedule in new events and finally
      perf_pmu_enable().
      
      If the old events include precise event, the MSR_IA32_PEBS_ENABLE
      should be cleared when perf_pmu_disable().  The MSR_IA32_PEBS_ENABLE
      should keep 0 until the perf_pmu_enable() is called and the new event is
      precise event.
      
      However, there is a corner case which could restore PEBS_ENABLE to
      stale value during the above period. In perf_pmu_disable(), GLOBAL_CTRL
      will be set to 0 to stop overflow and followed PMI. But there may be
      pending PMI from an earlier overflow, which cannot be stopped. So even
      GLOBAL_CTRL is cleared, the kernel still be possible to get PMI. At
      the end of the PMI handler, __intel_pmu_enable_all() will be called,
      which will restore the stale values if old events haven't scheduled
      out.
      
      Once the stale pebs value is set, it's impossible to be corrected if
      the new events are non-precise. Because the pebs_enabled will be set
      to 0. x86_pmu.enable_all() will ignore the MSR_IA32_PEBS_ENABLE
      setting. As a result, the following NMI with stale PEBS_ENABLE
      trigger pebs warning.
      
      The pending PMI after enabled=0 will become harmless if the NMI handler
      does not change the state. This patch checks cpuc->enabled in pmi and
      only restore the state when PMU is active.
      
      Here is the dump:
      
        Call Trace:
         <NMI>  [<ffffffff813c3a2e>] dump_stack+0x63/0x85
         [<ffffffff810a46f2>] warn_slowpath_common+0x82/0xc0
         [<ffffffff810a483a>] warn_slowpath_null+0x1a/0x20
         [<ffffffff8100fe2e>] intel_pmu_drain_pebs_nhm+0x2be/0x320
         [<ffffffff8100caa9>] intel_pmu_handle_irq+0x279/0x460
         [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
         [<ffffffff811f290d>] ? vunmap_page_range+0x20d/0x330
         [<ffffffff811f2f11>] ?  unmap_kernel_range_noflush+0x11/0x20
         [<ffffffff8148379f>] ? ghes_copy_tofrom_phys+0x10f/0x2a0
         [<ffffffff814839c8>] ? ghes_read_estatus+0x98/0x170
         [<ffffffff81005a7d>] perf_event_nmi_handler+0x2d/0x50
         [<ffffffff810310b9>] nmi_handle+0x69/0x120
         [<ffffffff810316f6>] default_do_nmi+0xe6/0x100
         [<ffffffff810317f2>] do_nmi+0xe2/0x130
         [<ffffffff817aea71>] end_repeat_nmi+0x1a/0x1e
         [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
         [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
         [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40
         <<EOE>>  <IRQ>  [<ffffffff81006df8>] ?  x86_perf_event_set_period+0xd8/0x180
         [<ffffffff81006eec>] x86_pmu_start+0x4c/0x100
         [<ffffffff8100722d>] x86_pmu_enable+0x28d/0x300
         [<ffffffff811994d7>] perf_pmu_enable.part.81+0x7/0x10
         [<ffffffff8119cb70>] perf_mux_hrtimer_handler+0x200/0x280
         [<ffffffff8119c970>] ?  __perf_install_in_context+0xc0/0xc0
         [<ffffffff8110f92d>] __hrtimer_run_queues+0xfd/0x280
         [<ffffffff811100d8>] hrtimer_interrupt+0xa8/0x190
         [<ffffffff81199080>] ?  __perf_read_group_add.part.61+0x1a0/0x1a0
         [<ffffffff81051bd8>] local_apic_timer_interrupt+0x38/0x60
         [<ffffffff817af01d>] smp_apic_timer_interrupt+0x3d/0x50
         [<ffffffff817ad15c>] apic_timer_interrupt+0x8c/0xa0
         <EOI>  [<ffffffff81199080>] ?  __perf_read_group_add.part.61+0x1a0/0x1a0
         [<ffffffff81123de5>] ?  smp_call_function_single+0xd5/0x130
         [<ffffffff81123ddb>] ?  smp_call_function_single+0xcb/0x130
         [<ffffffff81199080>] ?  __perf_read_group_add.part.61+0x1a0/0x1a0
         [<ffffffff8119765a>] event_function_call+0x10a/0x120
         [<ffffffff8119c660>] ? ctx_resched+0x90/0x90
         [<ffffffff811971e0>] ? cpu_clock_event_read+0x30/0x30
         [<ffffffff811976d0>] ? _perf_event_disable+0x60/0x60
         [<ffffffff8119772b>] _perf_event_enable+0x5b/0x70
         [<ffffffff81197388>] perf_event_for_each_child+0x38/0xa0
         [<ffffffff811976d0>] ? _perf_event_disable+0x60/0x60
         [<ffffffff811a0ffd>] perf_ioctl+0x12d/0x3c0
         [<ffffffff8134d855>] ? selinux_file_ioctl+0x95/0x1e0
         [<ffffffff8124a3a1>] do_vfs_ioctl+0xa1/0x5a0
         [<ffffffff81036d29>] ? sched_clock+0x9/0x10
         [<ffffffff8124a919>] SyS_ioctl+0x79/0x90
         [<ffffffff817ac4b2>] entry_SYSCALL_64_fastpath+0x1a/0xa4
        ---[ end trace aef202839fe9a71d ]---
        Uhhuh. NMI received for unknown reason 2d on CPU 2.
        Do you have a strange power saving mode enabled?
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1457046448-6184-1-git-send-email-kan.liang@intel.com
      [ Fixed various typos and other small details. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      c3d266c8
    • J
      perf/x86/intel: Use PAGE_SIZE for PEBS buffer size on Core2 · e72daf3f
      Jiri Olsa 提交于
      Using PAGE_SIZE buffers makes the WRMSR to PERF_GLOBAL_CTRL in
      intel_pmu_enable_all() mysteriously hang on Core2. As a workaround, we
      don't do this.
      
      The hard lockup is easily triggered by running 'perf test attr'
      repeatedly. Most of the time it gets stuck on sample session with
      small periods.
      
        # perf test attr -vv
        14: struct perf_event_attr setup                             :
        --- start ---
        ...
          'PERF_TEST_ATTR=/tmp/tmpuEKz3B /usr/bin/perf record -o /tmp/tmpuEKz3B/perf.data -c 123 kill >/dev/null 2>&1' ret 1
      Reported-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NAndi Kleen <ak@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20160301190352.GA8355@krava.redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      e72daf3f
  2. 03 3月, 2016 1 次提交
    • I
      perf/x86/uncore: Fix build on UP-IOAPIC configs · 6f6e1516
      Ingo Molnar 提交于
      Commit:
      
        cf6d445f ("perf/x86/uncore: Track packages, not per CPU data")
      
      reorganized the uncore code to track packages, and introduced a dependency
      on MAX_APIC_ID. This constant is not available on UP-IOAPIC builds:
      
        arch/x86/events/intel/uncore.c:1350:44: error: 'MAX_LOCAL_APIC' undeclared here (not in a function)
      
      Include asm/apicdef.h explicitly to pick it up.
      
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Harish Chegondi <harish.chegondi@intel.com>
      Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      6f6e1516
  3. 29 2月, 2016 23 次提交
  4. 28 2月, 2016 1 次提交
    • D
      mm: ASLR: use get_random_long() · 5ef11c35
      Daniel Cashman 提交于
      Replace calls to get_random_int() followed by a cast to (unsigned long)
      with calls to get_random_long().  Also address shifting bug which, in
      case of x86 removed entropy mask for mmap_rnd_bits values > 31 bits.
      Signed-off-by: NDaniel Cashman <dcashman@android.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Nick Kralevich <nnk@google.com>
      Cc: Jeff Vander Stoep <jeffv@google.com>
      Cc: Mark Salyzyn <salyzyn@android.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5ef11c35
  5. 27 2月, 2016 2 次提交
  6. 26 2月, 2016 1 次提交
  7. 25 2月, 2016 2 次提交
    • M
      KVM: x86: MMU: fix ubsan index-out-of-range warning · 17e4bce0
      Mike Krinkin 提交于
      Ubsan reports the following warning due to a typo in
      update_accessed_dirty_bits template, the patch fixes
      the typo:
      
      [  168.791851] ================================================================================
      [  168.791862] UBSAN: Undefined behaviour in arch/x86/kvm/paging_tmpl.h:252:15
      [  168.791866] index 4 is out of range for type 'u64 [4]'
      [  168.791871] CPU: 0 PID: 2950 Comm: qemu-system-x86 Tainted: G           O L  4.5.0-rc5-next-20160222 #7
      [  168.791873] Hardware name: LENOVO 23205NG/23205NG, BIOS G2ET95WW (2.55 ) 07/09/2013
      [  168.791876]  0000000000000000 ffff8801cfcaf208 ffffffff81c9f780 0000000041b58ab3
      [  168.791882]  ffffffff82eb2cc1 ffffffff81c9f6b4 ffff8801cfcaf230 ffff8801cfcaf1e0
      [  168.791886]  0000000000000004 0000000000000001 0000000000000000 ffffffffa1981600
      [  168.791891] Call Trace:
      [  168.791899]  [<ffffffff81c9f780>] dump_stack+0xcc/0x12c
      [  168.791904]  [<ffffffff81c9f6b4>] ? _atomic_dec_and_lock+0xc4/0xc4
      [  168.791910]  [<ffffffff81da9e81>] ubsan_epilogue+0xd/0x8a
      [  168.791914]  [<ffffffff81daafa2>] __ubsan_handle_out_of_bounds+0x15c/0x1a3
      [  168.791918]  [<ffffffff81daae46>] ? __ubsan_handle_shift_out_of_bounds+0x2bd/0x2bd
      [  168.791922]  [<ffffffff811287ef>] ? get_user_pages_fast+0x2bf/0x360
      [  168.791954]  [<ffffffffa1794050>] ? kvm_largepages_enabled+0x30/0x30 [kvm]
      [  168.791958]  [<ffffffff81128530>] ? __get_user_pages_fast+0x360/0x360
      [  168.791987]  [<ffffffffa181b818>] paging64_walk_addr_generic+0x1b28/0x2600 [kvm]
      [  168.792014]  [<ffffffffa1819cf0>] ? init_kvm_mmu+0x1100/0x1100 [kvm]
      [  168.792019]  [<ffffffff8129e350>] ? debug_check_no_locks_freed+0x350/0x350
      [  168.792044]  [<ffffffffa1819cf0>] ? init_kvm_mmu+0x1100/0x1100 [kvm]
      [  168.792076]  [<ffffffffa181c36d>] paging64_gva_to_gpa+0x7d/0x110 [kvm]
      [  168.792121]  [<ffffffffa181c2f0>] ? paging64_walk_addr_generic+0x2600/0x2600 [kvm]
      [  168.792130]  [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
      [  168.792178]  [<ffffffffa17d9a4a>] emulator_read_write_onepage+0x27a/0x1150 [kvm]
      [  168.792208]  [<ffffffffa1794d44>] ? __kvm_read_guest_page+0x54/0x70 [kvm]
      [  168.792234]  [<ffffffffa17d97d0>] ? kvm_task_switch+0x160/0x160 [kvm]
      [  168.792238]  [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
      [  168.792263]  [<ffffffffa17daa07>] emulator_read_write+0xe7/0x6d0 [kvm]
      [  168.792290]  [<ffffffffa183b620>] ? em_cr_write+0x230/0x230 [kvm]
      [  168.792314]  [<ffffffffa17db005>] emulator_write_emulated+0x15/0x20 [kvm]
      [  168.792340]  [<ffffffffa18465f8>] segmented_write+0xf8/0x130 [kvm]
      [  168.792367]  [<ffffffffa1846500>] ? em_lgdt+0x20/0x20 [kvm]
      [  168.792374]  [<ffffffffa14db512>] ? vmx_read_guest_seg_ar+0x42/0x1e0 [kvm_intel]
      [  168.792400]  [<ffffffffa1846d82>] writeback+0x3f2/0x700 [kvm]
      [  168.792424]  [<ffffffffa1846990>] ? em_sidt+0xa0/0xa0 [kvm]
      [  168.792449]  [<ffffffffa185554d>] ? x86_decode_insn+0x1b3d/0x4f70 [kvm]
      [  168.792474]  [<ffffffffa1859032>] x86_emulate_insn+0x572/0x3010 [kvm]
      [  168.792499]  [<ffffffffa17e71dd>] x86_emulate_instruction+0x3bd/0x2110 [kvm]
      [  168.792524]  [<ffffffffa17e6e20>] ? reexecute_instruction.part.110+0x2e0/0x2e0 [kvm]
      [  168.792532]  [<ffffffffa14e9a81>] handle_ept_misconfig+0x61/0x460 [kvm_intel]
      [  168.792539]  [<ffffffffa14e9a20>] ? handle_pause+0x450/0x450 [kvm_intel]
      [  168.792546]  [<ffffffffa15130ea>] vmx_handle_exit+0xd6a/0x1ad0 [kvm_intel]
      [  168.792572]  [<ffffffffa17f6a6c>] ? kvm_arch_vcpu_ioctl_run+0xbdc/0x6090 [kvm]
      [  168.792597]  [<ffffffffa17f6bcd>] kvm_arch_vcpu_ioctl_run+0xd3d/0x6090 [kvm]
      [  168.792621]  [<ffffffffa17f6a6c>] ? kvm_arch_vcpu_ioctl_run+0xbdc/0x6090 [kvm]
      [  168.792627]  [<ffffffff8293b530>] ? __ww_mutex_lock_interruptible+0x1630/0x1630
      [  168.792651]  [<ffffffffa17f5e90>] ? kvm_arch_vcpu_runnable+0x4f0/0x4f0 [kvm]
      [  168.792656]  [<ffffffff811eeb30>] ? preempt_notifier_unregister+0x190/0x190
      [  168.792681]  [<ffffffffa17e0447>] ? kvm_arch_vcpu_load+0x127/0x650 [kvm]
      [  168.792704]  [<ffffffffa178e9a3>] kvm_vcpu_ioctl+0x553/0xda0 [kvm]
      [  168.792727]  [<ffffffffa178e450>] ? vcpu_put+0x40/0x40 [kvm]
      [  168.792732]  [<ffffffff8129e350>] ? debug_check_no_locks_freed+0x350/0x350
      [  168.792735]  [<ffffffff82946087>] ? _raw_spin_unlock+0x27/0x40
      [  168.792740]  [<ffffffff8163a943>] ? handle_mm_fault+0x1673/0x2e40
      [  168.792744]  [<ffffffff8129daa8>] ? trace_hardirqs_on_caller+0x478/0x6c0
      [  168.792747]  [<ffffffff8129dcfd>] ? trace_hardirqs_on+0xd/0x10
      [  168.792751]  [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90
      [  168.792756]  [<ffffffff81725a80>] do_vfs_ioctl+0x1b0/0x12b0
      [  168.792759]  [<ffffffff817258d0>] ? ioctl_preallocate+0x210/0x210
      [  168.792763]  [<ffffffff8174aef3>] ? __fget+0x273/0x4a0
      [  168.792766]  [<ffffffff8174acd0>] ? __fget+0x50/0x4a0
      [  168.792770]  [<ffffffff8174b1f6>] ? __fget_light+0x96/0x2b0
      [  168.792773]  [<ffffffff81726bf9>] SyS_ioctl+0x79/0x90
      [  168.792777]  [<ffffffff82946880>] entry_SYSCALL_64_fastpath+0x23/0xc1
      [  168.792780] ================================================================================
      Signed-off-by: NMike Krinkin <krinkin.m.u@gmail.com>
      Reviewed-by: NXiao Guangrong <guangrong.xiao@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      17e4bce0
    • A
      x86/entry/compat: Add missing CLAC to entry_INT80_32 · 3d44d51b
      Andy Lutomirski 提交于
      This doesn't seem to fix a regression -- I don't think the CLAC was
      ever there.
      
      I double-checked in a debugger: entries through the int80 gate do
      not automatically clear AC.
      
      Stable maintainers: I can provide a backport to 4.3 and earlier if
      needed.  This needs to be backported all the way to 3.10.
      Reported-by: NBrian Gerst <brgerst@gmail.com>
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: <stable@vger.kernel.org> # v3.10 and later
      Fixes: 63bcff2a ("x86, smap: Add STAC and CLAC instructions to control user space access")
      Link: http://lkml.kernel.org/r/b02b7e71ae54074be01fc171cbd4b72517055c0e.1456345086.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3d44d51b
  8. 24 2月, 2016 4 次提交
    • P
      KVM: x86: fix conversion of addresses to linear in 32-bit protected mode · 0c1d77f4
      Paolo Bonzini 提交于
      Commit e8dd2d2d ("Silence compiler warning in arch/x86/kvm/emulate.c",
      2015-09-06) broke boot of the Hurd.  The bug is that the "default:"
      case actually could modify "la", but after the patch this change is
      not reflected in *linear.
      
      The bug is visible whenever a non-zero segment base causes the linear
      address to wrap around the 4GB mark.
      
      Fixes: e8dd2d2d
      Cc: stable@vger.kernel.org
      Reported-by: NAurelien Jarno <aurelien@aurel32.net>
      Tested-by: NAurelien Jarno <aurelien@aurel32.net>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0c1d77f4
    • P
      KVM: x86: fix missed hardware breakpoints · 172b2386
      Paolo Bonzini 提交于
      Sometimes when setting a breakpoint a process doesn't stop on it.
      This is because the debug registers are not loaded correctly on
      VCPU load.
      
      The following simple reproducer from Oleg Nesterov tries using debug
      registers in two threads.  To see the bug, run a 2-VCPU guest with
      "taskset -c 0" and run "./bp 0 1" inside the guest.
      
          #include <unistd.h>
          #include <signal.h>
          #include <stdlib.h>
          #include <stdio.h>
          #include <sys/wait.h>
          #include <sys/ptrace.h>
          #include <sys/user.h>
          #include <asm/debugreg.h>
          #include <assert.h>
      
          #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
      
          unsigned long encode_dr7(int drnum, int enable, unsigned int type, unsigned int len)
          {
              unsigned long dr7;
      
              dr7 = ((len | type) & 0xf)
                  << (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE);
              if (enable)
                  dr7 |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE));
      
              return dr7;
          }
      
          int write_dr(int pid, int dr, unsigned long val)
          {
              return ptrace(PTRACE_POKEUSER, pid,
                      offsetof (struct user, u_debugreg[dr]),
                      val);
          }
      
          void set_bp(pid_t pid, void *addr)
          {
              unsigned long dr7;
              assert(write_dr(pid, 0, (long)addr) == 0);
              dr7 = encode_dr7(0, 1, DR_RW_EXECUTE, DR_LEN_1);
              assert(write_dr(pid, 7, dr7) == 0);
          }
      
          void *get_rip(int pid)
          {
              return (void*)ptrace(PTRACE_PEEKUSER, pid,
                      offsetof(struct user, regs.rip), 0);
          }
      
          void test(int nr)
          {
              void *bp_addr = &&label + nr, *bp_hit;
              int pid;
      
              printf("test bp %d\n", nr);
              assert(nr < 16); // see 16 asm nops below
      
              pid = fork();
              if (!pid) {
                  assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
                  kill(getpid(), SIGSTOP);
                  for (;;) {
                      label: asm (
                          "nop; nop; nop; nop;"
                          "nop; nop; nop; nop;"
                          "nop; nop; nop; nop;"
                          "nop; nop; nop; nop;"
                      );
                  }
              }
      
              assert(pid == wait(NULL));
              set_bp(pid, bp_addr);
      
              for (;;) {
                  assert(ptrace(PTRACE_CONT, pid, 0, 0) == 0);
                  assert(pid == wait(NULL));
      
                  bp_hit = get_rip(pid);
                  if (bp_hit != bp_addr)
                      fprintf(stderr, "ERR!! hit wrong bp %ld != %d\n",
                          bp_hit - &&label, nr);
              }
          }
      
          int main(int argc, const char *argv[])
          {
              while (--argc) {
                  int nr = atoi(*++argv);
                  if (!fork())
                      test(nr);
              }
      
              while (wait(NULL) > 0)
                  ;
              return 0;
          }
      
      Cc: stable@vger.kernel.org
      Suggested-by: NNadav Amit <namit@cs.technion.ac.il>
      Reported-by: NAndrey Wagin <avagin@gmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      172b2386
    • A
      x86/entry/32: Add an ASM_CLAC to entry_SYSENTER_32 · 04d1d281
      Andy Lutomirski 提交于
      Both before and after 5f310f73 ("x86/entry/32: Re-implement
      SYSENTER using the new C path"), we relied on a uaccess very early
      in the SYSENTER path to clear AC.  After that change, though, we can
      potentially make it all the way into C code with AC set, which
      enlarges the attack surface for SMAP bypass by doing SYSENTER with
      AC set.
      
      Strengthen the SMAP protection by addding the missing ASM_CLAC right
      at the beginning.
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/3e36be110724896e32a4a1fe73bacb349d3cba94.1456262295.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      04d1d281
    • L
      x86: fix SMAP in 32-bit environments · de9e478b
      Linus Torvalds 提交于
      In commit 11f1a4b9 ("x86: reorganize SMAP handling in user space
      accesses") I changed how the stac/clac instructions were generated
      around the user space accesses, which then made it possible to do
      batched accesses efficiently for user string copies etc.
      
      However, in doing so, I completely spaced out, and didn't even think
      about the 32-bit case.  And nobody really even seemed to notice, because
      SMAP doesn't even exist until modern Skylake processors, and you'd have
      to be crazy to run 32-bit kernels on a modern CPU.
      
      Which brings us to Andy Lutomirski.
      
      He actually tested the 32-bit kernel on new hardware, and noticed that
      it doesn't work.  My bad.  The trivial fix is to add the required
      uaccess begin/end markers around the raw accesses in <asm/uaccess_32.h>.
      
      I feel a bit bad about this patch, just because that header file really
      should be cleaned up to avoid all the duplicated code in it, and this
      commit just expands on the problem.  But this just fixes the bug without
      any bigger cleanup surgery.
      Reported-and-tested-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      de9e478b
  9. 23 2月, 2016 1 次提交
  10. 19 2月, 2016 1 次提交
  11. 18 2月, 2016 2 次提交
    • T
      x86/mm: Fix vmalloc_fault() to handle large pages properly · f4eafd8b
      Toshi Kani 提交于
      A kernel page fault oops with the callstack below was observed
      when a read syscall was made to a pmem device after a huge amount
      (>512GB) of vmalloc ranges was allocated by ioremap() on a x86_64
      system:
      
           BUG: unable to handle kernel paging request at ffff880840000ff8
           IP: vmalloc_fault+0x1be/0x300
           PGD c7f03a067 PUD 0
           Oops: 0000 [#1] SM
           Call Trace:
              __do_page_fault+0x285/0x3e0
              do_page_fault+0x2f/0x80
              ? put_prev_entity+0x35/0x7a0
              page_fault+0x28/0x30
              ? memcpy_erms+0x6/0x10
              ? schedule+0x35/0x80
              ? pmem_rw_bytes+0x6a/0x190 [nd_pmem]
              ? schedule_timeout+0x183/0x240
              btt_log_read+0x63/0x140 [nd_btt]
               :
              ? __symbol_put+0x60/0x60
              ? kernel_read+0x50/0x80
              SyS_finit_module+0xb9/0xf0
              entry_SYSCALL_64_fastpath+0x1a/0xa4
      
      Since v4.1, ioremap() supports large page (pud/pmd) mappings in
      x86_64 and PAE.  vmalloc_fault() however assumes that the vmalloc
      range is limited to pte mappings.
      
      vmalloc faults do not normally happen in ioremap'd ranges since
      ioremap() sets up the kernel page tables, which are shared by
      user processes.  pgd_ctor() sets the kernel's PGD entries to
      user's during fork().  When allocation of the vmalloc ranges
      crosses a 512GB boundary, ioremap() allocates a new pud table
      and updates the kernel PGD entry to point it.  If user process's
      PGD entry does not have this update yet, a read/write syscall
      to the range will cause a vmalloc fault, which hits the Oops
      above as it does not handle a large page properly.
      
      Following changes are made to vmalloc_fault().
      
      64-bit:
      
       - No change for the PGD sync operation as it handles large
         pages already.
       - Add pud_huge() and pmd_huge() to the validation code to
         handle large pages.
       - Change pud_page_vaddr() to pud_pfn() since an ioremap range
         is not directly mapped (while the if-statement still works
         with a bogus addr).
       - Change pmd_page() to pmd_pfn() since an ioremap range is not
         backed by struct page (while the if-statement still works
         with a bogus addr).
      
      32-bit:
       - No change for the sync operation since the index3 PGD entry
         covers the entire vmalloc range, which is always valid.
         (A separate change to sync PGD entry is necessary if this
          memory layout is changed regardless of the page size.)
       - Add pmd_huge() to the validation code to handle large pages.
         This is for completeness since vmalloc_fault() won't happen
         in ioremap'd ranges as its PGD entry is always valid.
      Reported-by: NHenning Schild <henning.schild@siemens.com>
      Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
      Acked-by: NBorislav Petkov <bp@alien8.de>
      Cc: <stable@vger.kernel.org> # 4.1+
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-mm@kvack.org
      Cc: linux-nvdimm@lists.01.org
      Link: http://lkml.kernel.org/r/1455758214-24623-1-git-send-email-toshi.kani@hpe.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f4eafd8b
    • B
      Revert "PCI: Add helpers to manage pci_dev->irq and pci_dev->irq_managed" · 67b4eab9
      Bjorn Helgaas 提交于
      Revert 811a4e6f ("PCI: Add helpers to manage pci_dev->irq and
      pci_dev->irq_managed").
      
      This is part of reverting 991de2e5 ("PCI, x86: Implement
      pcibios_alloc_irq() and pcibios_free_irq()") to fix regressions it
      introduced.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=111211
      Fixes: 991de2e5 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()")
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Acked-by: NRafael J. Wysocki <rafael@kernel.org>
      CC: Jiang Liu <jiang.liu@linux.intel.com>
      67b4eab9