- 08 3月, 2016 2 次提交
-
-
由 Kan Liang 提交于
This patch tries to fix a PEBS warning found in my stress test. The following perf command can easily trigger the pebs warning or spurious NMI error on Skylake/Broadwell/Haswell platforms: sudo perf record -e 'cpu/umask=0x04,event=0xc4/pp,cycles,branches,ref-cycles,cache-misses,cache-references' --call-graph fp -b -c1000 -a Also the NMI watchdog must be enabled. For this case, the events number is larger than counter number. So perf has to do multiplexing. In perf_mux_hrtimer_handler, it does perf_pmu_disable(), schedule out old events, rotate_ctx, schedule in new events and finally perf_pmu_enable(). If the old events include precise event, the MSR_IA32_PEBS_ENABLE should be cleared when perf_pmu_disable(). The MSR_IA32_PEBS_ENABLE should keep 0 until the perf_pmu_enable() is called and the new event is precise event. However, there is a corner case which could restore PEBS_ENABLE to stale value during the above period. In perf_pmu_disable(), GLOBAL_CTRL will be set to 0 to stop overflow and followed PMI. But there may be pending PMI from an earlier overflow, which cannot be stopped. So even GLOBAL_CTRL is cleared, the kernel still be possible to get PMI. At the end of the PMI handler, __intel_pmu_enable_all() will be called, which will restore the stale values if old events haven't scheduled out. Once the stale pebs value is set, it's impossible to be corrected if the new events are non-precise. Because the pebs_enabled will be set to 0. x86_pmu.enable_all() will ignore the MSR_IA32_PEBS_ENABLE setting. As a result, the following NMI with stale PEBS_ENABLE trigger pebs warning. The pending PMI after enabled=0 will become harmless if the NMI handler does not change the state. This patch checks cpuc->enabled in pmi and only restore the state when PMU is active. Here is the dump: Call Trace: <NMI> [<ffffffff813c3a2e>] dump_stack+0x63/0x85 [<ffffffff810a46f2>] warn_slowpath_common+0x82/0xc0 [<ffffffff810a483a>] warn_slowpath_null+0x1a/0x20 [<ffffffff8100fe2e>] intel_pmu_drain_pebs_nhm+0x2be/0x320 [<ffffffff8100caa9>] intel_pmu_handle_irq+0x279/0x460 [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40 [<ffffffff811f290d>] ? vunmap_page_range+0x20d/0x330 [<ffffffff811f2f11>] ? unmap_kernel_range_noflush+0x11/0x20 [<ffffffff8148379f>] ? ghes_copy_tofrom_phys+0x10f/0x2a0 [<ffffffff814839c8>] ? ghes_read_estatus+0x98/0x170 [<ffffffff81005a7d>] perf_event_nmi_handler+0x2d/0x50 [<ffffffff810310b9>] nmi_handle+0x69/0x120 [<ffffffff810316f6>] default_do_nmi+0xe6/0x100 [<ffffffff810317f2>] do_nmi+0xe2/0x130 [<ffffffff817aea71>] end_repeat_nmi+0x1a/0x1e [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40 [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40 [<ffffffff810639b6>] ? native_write_msr_safe+0x6/0x40 <<EOE>> <IRQ> [<ffffffff81006df8>] ? x86_perf_event_set_period+0xd8/0x180 [<ffffffff81006eec>] x86_pmu_start+0x4c/0x100 [<ffffffff8100722d>] x86_pmu_enable+0x28d/0x300 [<ffffffff811994d7>] perf_pmu_enable.part.81+0x7/0x10 [<ffffffff8119cb70>] perf_mux_hrtimer_handler+0x200/0x280 [<ffffffff8119c970>] ? __perf_install_in_context+0xc0/0xc0 [<ffffffff8110f92d>] __hrtimer_run_queues+0xfd/0x280 [<ffffffff811100d8>] hrtimer_interrupt+0xa8/0x190 [<ffffffff81199080>] ? __perf_read_group_add.part.61+0x1a0/0x1a0 [<ffffffff81051bd8>] local_apic_timer_interrupt+0x38/0x60 [<ffffffff817af01d>] smp_apic_timer_interrupt+0x3d/0x50 [<ffffffff817ad15c>] apic_timer_interrupt+0x8c/0xa0 <EOI> [<ffffffff81199080>] ? __perf_read_group_add.part.61+0x1a0/0x1a0 [<ffffffff81123de5>] ? smp_call_function_single+0xd5/0x130 [<ffffffff81123ddb>] ? smp_call_function_single+0xcb/0x130 [<ffffffff81199080>] ? __perf_read_group_add.part.61+0x1a0/0x1a0 [<ffffffff8119765a>] event_function_call+0x10a/0x120 [<ffffffff8119c660>] ? ctx_resched+0x90/0x90 [<ffffffff811971e0>] ? cpu_clock_event_read+0x30/0x30 [<ffffffff811976d0>] ? _perf_event_disable+0x60/0x60 [<ffffffff8119772b>] _perf_event_enable+0x5b/0x70 [<ffffffff81197388>] perf_event_for_each_child+0x38/0xa0 [<ffffffff811976d0>] ? _perf_event_disable+0x60/0x60 [<ffffffff811a0ffd>] perf_ioctl+0x12d/0x3c0 [<ffffffff8134d855>] ? selinux_file_ioctl+0x95/0x1e0 [<ffffffff8124a3a1>] do_vfs_ioctl+0xa1/0x5a0 [<ffffffff81036d29>] ? sched_clock+0x9/0x10 [<ffffffff8124a919>] SyS_ioctl+0x79/0x90 [<ffffffff817ac4b2>] entry_SYSCALL_64_fastpath+0x1a/0xa4 ---[ end trace aef202839fe9a71d ]--- Uhhuh. NMI received for unknown reason 2d on CPU 2. Do you have a strange power saving mode enabled? Signed-off-by: NKan Liang <kan.liang@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1457046448-6184-1-git-send-email-kan.liang@intel.com [ Fixed various typos and other small details. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jiri Olsa 提交于
Using PAGE_SIZE buffers makes the WRMSR to PERF_GLOBAL_CTRL in intel_pmu_enable_all() mysteriously hang on Core2. As a workaround, we don't do this. The hard lockup is easily triggered by running 'perf test attr' repeatedly. Most of the time it gets stuck on sample session with small periods. # perf test attr -vv 14: struct perf_event_attr setup : --- start --- ... 'PERF_TEST_ATTR=/tmp/tmpuEKz3B /usr/bin/perf record -o /tmp/tmpuEKz3B/perf.data -c 123 kill >/dev/null 2>&1' ret 1 Reported-by: NArnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: NJiri Olsa <jolsa@kernel.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NAndi Kleen <ak@linux.intel.com> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/r/20160301190352.GA8355@krava.redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 03 3月, 2016 1 次提交
-
-
由 Ingo Molnar 提交于
Commit: cf6d445f ("perf/x86/uncore: Track packages, not per CPU data") reorganized the uncore code to track packages, and introduced a dependency on MAX_APIC_ID. This constant is not available on UP-IOAPIC builds: arch/x86/events/intel/uncore.c:1350:44: error: 'MAX_LOCAL_APIC' undeclared here (not in a function) Include asm/apicdef.h explicitly to pick it up. Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 29 2月, 2016 23 次提交
-
-
由 Thomas Gleixner 提交于
RAPL is a per package facility and we already have a mechanism for a dedicated per package reader. So there is no point to have multiple CPUs doing the same. The current implementation actually starts two timers on two CPUs if one does: perf stat -C1,2 -e -e power/energy-pkg .... which makes the whole concept of 1 reader per package moot. What's worse is that the above returns the double of the actual energy consumption, but that's a different problem to address and cannot be solved by removing the pointless per cpuness of that mechanism. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221012.845369524@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
Store the PMU pointer in event->pmu_private and use it instead of the per CPU data. Preparatory step to get rid of the per CPU allocations. The usage sites are the perf fast path, so we keep that even after the conversion to per package storage as a CPU to package lookup involves 3 loads versus 1 with the pmu_private pointer. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221012.748151799@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
This lock is taken in hard interrupt context even on Preempt-RT. Make it raw so RT does not have to patch it. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221012.669411833@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
Split out code from init into seperate functions. Tidy up the code and get rid of pointless comments. I wish there would be comments for code which is not obvious.... Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221012.588544679@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
The output is inconsistent. Use a proper pr_fmt prefix and split out the advertisement into a seperate function. Remove the WARN_ON() in the failure case. It's pointless as we already know where it failed. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221012.504551295@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
No point in doing the same calculation over and over. Do it once in rapl_check_hw_unit(). Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221012.409238136@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
There is no point in having a quirk machinery for a single possible function. Get rid of it and move the quirk to a place where it actually makes sense. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221012.311639465@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
Like uncore the rapl driver lacks error handling. It leaks memory and leaves the hotplug notifier registered. Add the proper error checks, cleanup the memory and register the hotplug notifier only on success. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221012.231222076@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
The Knights Landings support added the events and the detection case, but then returns 0 without actually initializing the driver. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Fixes: 3a2a7797 "perf/x86/intel/rapl: Add support for Knights Landing (KNL)" Link: http://lkml.kernel.org/r/20160222221012.149331888@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
CQM is a strict per package facility. Use the proper cpumasks to lookup the readers. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221012.054916179@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
Andi wanted to do this before, but the patch fell down the cracks. Implement it with the proper error handling. Requested-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221011.799159968@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
The only missing bit is to completely clear the hardware state on failure exit. This is now a pretty simple exercise. Undo the box->init_box() setup on all packages which have been initialized so far. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221011.702452407@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
Uncore is a per package facility, but the code tries to mimick a per CPU facility with completely convoluted constructs. Simplify the whole machinery by tracking per package information. While at it, avoid the kfree/alloc dance when a CPU goes offline and online again. There is no point in freeing the box after it was allocated. We just keep proper refcounting and the first CPU which comes online in a package does the initialization/activation of the box. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221011.622258933@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
For per package oriented services we must be able to rely on the number of CPU packages to be within bounds. Create a tracking facility, which - calculates the number of possible packages depending on nr_cpu_ids after boot - makes sure that the package id is within the number of possible packages. If the apic id is outside we map it to a logical package id if there is enough space available. Provide interfaces for drivers to query the mapping and do translations from physcial to logical ids. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221011.541071755@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
Store the PMU pointer in event->pmu_private, so we can get rid of the per CPU data storage. We keep it after converting to per package data, because a CPU to package lookup will be 3 loads versus one and these usage sites are in the perf fast path. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221011.460851335@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
No users outside of this file. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221011.285504825@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
Clean up the code a bit before reworking it completely. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221011.204771538@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
When tearing down the boxes nothing undoes the hardware state which was setup by box->init_box(). Add a box->exit_box() callback and implement it for the uncores which have an init_box() callback. This misses the cleanup in the error exit pathes, but I cannot be bothered to implement it before cleaning up the rest of the driver, which makes that task way simpler. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221011.023930023@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
The storage array is size limited, but misses a sanity check Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221010.929967806@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
This driver lacks any form of proper error handling. If initialization fails or hotplug prepare fails, it lets the facility with half initialized stuff around. Fix the state and memory leaks in a first step. As a second step we need to undo the hardware state which is set via uncore_box_init() on some of the uncore implementations. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221010.848880559@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
No point in doing partial rollbacks. Robustify uncore_exit_type() so it does not dereference type->pmus unconditionally and remove all the partial rollback hackery. Preparatory patch for proper error handling. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221010.751077467@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
uncore_cpumask_init() is only ever called from intel_uncore_init() where the mask is guaranteed to be empty. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20160222221010.657326866@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
BDX-DE and BDX-EP share the same uncore code path. But there is no sbox in BDX-DE. This patch remove SBOX support for BDX-DE. Signed-off-by: NKan Liang <kan.liang@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: <tonyb@cybernetics.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Battersby <tonyb@cybernetics.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/37D7C6CF3E00A74B8858931C1DB2F0770589D336@SHSMSX103.ccr.corp.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 28 2月, 2016 1 次提交
-
-
由 Daniel Cashman 提交于
Replace calls to get_random_int() followed by a cast to (unsigned long) with calls to get_random_long(). Also address shifting bug which, in case of x86 removed entropy mask for mmap_rnd_bits values > 31 bits. Signed-off-by: NDaniel Cashman <dcashman@android.com> Acked-by: NKees Cook <keescook@chromium.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: David S. Miller <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Nick Kralevich <nnk@google.com> Cc: Jeff Vander Stoep <jeffv@google.com> Cc: Mark Salyzyn <salyzyn@android.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 27 2月, 2016 2 次提交
-
-
由 Bjorn Helgaas 提交于
991de2e5 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()") appeared in v4.3 and helps support IOAPIC hotplug. Олег reported that the Elcus-1553 TA1-PCI driver worked in v4.2 but not v4.3 and bisected it to 991de2e5. Sunjin reported that the RocketRAID 272x driver worked in v4.2 but not v4.3. In both cases booting with "pci=routirq" is a workaround. I think the problem is that after 991de2e5, we no longer call pcibios_enable_irq() for upstream bridges. Prior to 991de2e5, when a driver called pci_enable_device(), we recursively called pcibios_enable_irq() for upstream bridges via pci_enable_bridge(). After 991de2e5, we call pcibios_enable_irq() from pci_device_probe() instead of the pci_enable_device() path, which does *not* call pcibios_enable_irq() for upstream bridges. Revert 991de2e5 to fix these driver regressions. Link: https://bugzilla.kernel.org/show_bug.cgi?id=111211 Fixes: 991de2e5 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()") Reported-and-tested-by: NОлег Мороз <oleg.moroz@mcc.vniiem.ru> Reported-by: NSunjin Yang <fan4326@gmail.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Acked-by: NRafael J. Wysocki <rafael@kernel.org> CC: Jiang Liu <jiang.liu@linux.intel.com>
-
由 Colin Ian King 提交于
In the unlikely event that regno == nr_registers then we get an array overrun on regoff because the invalid register check is currently off-by-one. Fix this with a check that regno is >= nr_registers instead. Detected with static analysis using CoverityScan. Fixes: fcc7ffd6 "x86, mpx: Decode MPX instruction to get bound violation information" Signed-off-by: NColin Ian King <colin.king@canonical.com> Acked-by: NDave Hansen <dave.hansen@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1456512931-3388-1-git-send-email-colin.king@canonical.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 26 2月, 2016 1 次提交
-
-
由 Dexuan Cui 提交于
"d1cd1210: x86, pageattr: Prevent overflow in slow_virt_to_phys() for X86_PAE" was unintentionally removed by the recent "34437e67: x86/mm: Fix slow_virt_to_phys() to handle large PAT bit". And, the variable 'phys_addr' was defined as "unsigned long" by mistake -- it should be "phys_addr_t". As a result, Hyper-V network driver in 32-PAE Linux guest can't work again. Fixes: commit 34437e67: "x86/mm: Fix slow_virt_to_phys() to handle large PAT bit" Signed-off-by: NDexuan Cui <decui@microsoft.com> Reviewed-by: NToshi Kani <toshi.kani@hpe.com> Cc: olaf@aepfle.de Cc: gregkh@linuxfoundation.org Cc: jasowang@redhat.com Cc: driverdev-devel@linuxdriverproject.org Cc: linux-mm@kvack.org Cc: apw@canonical.com Cc: Andrew Morton <akpm@linux-foundation.org> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Link: http://lkml.kernel.org/r/1456394292-9030-1-git-send-email-decui@microsoft.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 25 2月, 2016 2 次提交
-
-
由 Mike Krinkin 提交于
Ubsan reports the following warning due to a typo in update_accessed_dirty_bits template, the patch fixes the typo: [ 168.791851] ================================================================================ [ 168.791862] UBSAN: Undefined behaviour in arch/x86/kvm/paging_tmpl.h:252:15 [ 168.791866] index 4 is out of range for type 'u64 [4]' [ 168.791871] CPU: 0 PID: 2950 Comm: qemu-system-x86 Tainted: G O L 4.5.0-rc5-next-20160222 #7 [ 168.791873] Hardware name: LENOVO 23205NG/23205NG, BIOS G2ET95WW (2.55 ) 07/09/2013 [ 168.791876] 0000000000000000 ffff8801cfcaf208 ffffffff81c9f780 0000000041b58ab3 [ 168.791882] ffffffff82eb2cc1 ffffffff81c9f6b4 ffff8801cfcaf230 ffff8801cfcaf1e0 [ 168.791886] 0000000000000004 0000000000000001 0000000000000000 ffffffffa1981600 [ 168.791891] Call Trace: [ 168.791899] [<ffffffff81c9f780>] dump_stack+0xcc/0x12c [ 168.791904] [<ffffffff81c9f6b4>] ? _atomic_dec_and_lock+0xc4/0xc4 [ 168.791910] [<ffffffff81da9e81>] ubsan_epilogue+0xd/0x8a [ 168.791914] [<ffffffff81daafa2>] __ubsan_handle_out_of_bounds+0x15c/0x1a3 [ 168.791918] [<ffffffff81daae46>] ? __ubsan_handle_shift_out_of_bounds+0x2bd/0x2bd [ 168.791922] [<ffffffff811287ef>] ? get_user_pages_fast+0x2bf/0x360 [ 168.791954] [<ffffffffa1794050>] ? kvm_largepages_enabled+0x30/0x30 [kvm] [ 168.791958] [<ffffffff81128530>] ? __get_user_pages_fast+0x360/0x360 [ 168.791987] [<ffffffffa181b818>] paging64_walk_addr_generic+0x1b28/0x2600 [kvm] [ 168.792014] [<ffffffffa1819cf0>] ? init_kvm_mmu+0x1100/0x1100 [kvm] [ 168.792019] [<ffffffff8129e350>] ? debug_check_no_locks_freed+0x350/0x350 [ 168.792044] [<ffffffffa1819cf0>] ? init_kvm_mmu+0x1100/0x1100 [kvm] [ 168.792076] [<ffffffffa181c36d>] paging64_gva_to_gpa+0x7d/0x110 [kvm] [ 168.792121] [<ffffffffa181c2f0>] ? paging64_walk_addr_generic+0x2600/0x2600 [kvm] [ 168.792130] [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90 [ 168.792178] [<ffffffffa17d9a4a>] emulator_read_write_onepage+0x27a/0x1150 [kvm] [ 168.792208] [<ffffffffa1794d44>] ? __kvm_read_guest_page+0x54/0x70 [kvm] [ 168.792234] [<ffffffffa17d97d0>] ? kvm_task_switch+0x160/0x160 [kvm] [ 168.792238] [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90 [ 168.792263] [<ffffffffa17daa07>] emulator_read_write+0xe7/0x6d0 [kvm] [ 168.792290] [<ffffffffa183b620>] ? em_cr_write+0x230/0x230 [kvm] [ 168.792314] [<ffffffffa17db005>] emulator_write_emulated+0x15/0x20 [kvm] [ 168.792340] [<ffffffffa18465f8>] segmented_write+0xf8/0x130 [kvm] [ 168.792367] [<ffffffffa1846500>] ? em_lgdt+0x20/0x20 [kvm] [ 168.792374] [<ffffffffa14db512>] ? vmx_read_guest_seg_ar+0x42/0x1e0 [kvm_intel] [ 168.792400] [<ffffffffa1846d82>] writeback+0x3f2/0x700 [kvm] [ 168.792424] [<ffffffffa1846990>] ? em_sidt+0xa0/0xa0 [kvm] [ 168.792449] [<ffffffffa185554d>] ? x86_decode_insn+0x1b3d/0x4f70 [kvm] [ 168.792474] [<ffffffffa1859032>] x86_emulate_insn+0x572/0x3010 [kvm] [ 168.792499] [<ffffffffa17e71dd>] x86_emulate_instruction+0x3bd/0x2110 [kvm] [ 168.792524] [<ffffffffa17e6e20>] ? reexecute_instruction.part.110+0x2e0/0x2e0 [kvm] [ 168.792532] [<ffffffffa14e9a81>] handle_ept_misconfig+0x61/0x460 [kvm_intel] [ 168.792539] [<ffffffffa14e9a20>] ? handle_pause+0x450/0x450 [kvm_intel] [ 168.792546] [<ffffffffa15130ea>] vmx_handle_exit+0xd6a/0x1ad0 [kvm_intel] [ 168.792572] [<ffffffffa17f6a6c>] ? kvm_arch_vcpu_ioctl_run+0xbdc/0x6090 [kvm] [ 168.792597] [<ffffffffa17f6bcd>] kvm_arch_vcpu_ioctl_run+0xd3d/0x6090 [kvm] [ 168.792621] [<ffffffffa17f6a6c>] ? kvm_arch_vcpu_ioctl_run+0xbdc/0x6090 [kvm] [ 168.792627] [<ffffffff8293b530>] ? __ww_mutex_lock_interruptible+0x1630/0x1630 [ 168.792651] [<ffffffffa17f5e90>] ? kvm_arch_vcpu_runnable+0x4f0/0x4f0 [kvm] [ 168.792656] [<ffffffff811eeb30>] ? preempt_notifier_unregister+0x190/0x190 [ 168.792681] [<ffffffffa17e0447>] ? kvm_arch_vcpu_load+0x127/0x650 [kvm] [ 168.792704] [<ffffffffa178e9a3>] kvm_vcpu_ioctl+0x553/0xda0 [kvm] [ 168.792727] [<ffffffffa178e450>] ? vcpu_put+0x40/0x40 [kvm] [ 168.792732] [<ffffffff8129e350>] ? debug_check_no_locks_freed+0x350/0x350 [ 168.792735] [<ffffffff82946087>] ? _raw_spin_unlock+0x27/0x40 [ 168.792740] [<ffffffff8163a943>] ? handle_mm_fault+0x1673/0x2e40 [ 168.792744] [<ffffffff8129daa8>] ? trace_hardirqs_on_caller+0x478/0x6c0 [ 168.792747] [<ffffffff8129dcfd>] ? trace_hardirqs_on+0xd/0x10 [ 168.792751] [<ffffffff812e848b>] ? debug_lockdep_rcu_enabled+0x7b/0x90 [ 168.792756] [<ffffffff81725a80>] do_vfs_ioctl+0x1b0/0x12b0 [ 168.792759] [<ffffffff817258d0>] ? ioctl_preallocate+0x210/0x210 [ 168.792763] [<ffffffff8174aef3>] ? __fget+0x273/0x4a0 [ 168.792766] [<ffffffff8174acd0>] ? __fget+0x50/0x4a0 [ 168.792770] [<ffffffff8174b1f6>] ? __fget_light+0x96/0x2b0 [ 168.792773] [<ffffffff81726bf9>] SyS_ioctl+0x79/0x90 [ 168.792777] [<ffffffff82946880>] entry_SYSCALL_64_fastpath+0x23/0xc1 [ 168.792780] ================================================================================ Signed-off-by: NMike Krinkin <krinkin.m.u@gmail.com> Reviewed-by: NXiao Guangrong <guangrong.xiao@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Andy Lutomirski 提交于
This doesn't seem to fix a regression -- I don't think the CLAC was ever there. I double-checked in a debugger: entries through the int80 gate do not automatically clear AC. Stable maintainers: I can provide a backport to 4.3 and earlier if needed. This needs to be backported all the way to 3.10. Reported-by: NBrian Gerst <brgerst@gmail.com> Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> # v3.10 and later Fixes: 63bcff2a ("x86, smap: Add STAC and CLAC instructions to control user space access") Link: http://lkml.kernel.org/r/b02b7e71ae54074be01fc171cbd4b72517055c0e.1456345086.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 24 2月, 2016 4 次提交
-
-
由 Paolo Bonzini 提交于
Commit e8dd2d2d ("Silence compiler warning in arch/x86/kvm/emulate.c", 2015-09-06) broke boot of the Hurd. The bug is that the "default:" case actually could modify "la", but after the patch this change is not reflected in *linear. The bug is visible whenever a non-zero segment base causes the linear address to wrap around the 4GB mark. Fixes: e8dd2d2d Cc: stable@vger.kernel.org Reported-by: NAurelien Jarno <aurelien@aurel32.net> Tested-by: NAurelien Jarno <aurelien@aurel32.net> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Sometimes when setting a breakpoint a process doesn't stop on it. This is because the debug registers are not loaded correctly on VCPU load. The following simple reproducer from Oleg Nesterov tries using debug registers in two threads. To see the bug, run a 2-VCPU guest with "taskset -c 0" and run "./bp 0 1" inside the guest. #include <unistd.h> #include <signal.h> #include <stdlib.h> #include <stdio.h> #include <sys/wait.h> #include <sys/ptrace.h> #include <sys/user.h> #include <asm/debugreg.h> #include <assert.h> #define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER) unsigned long encode_dr7(int drnum, int enable, unsigned int type, unsigned int len) { unsigned long dr7; dr7 = ((len | type) & 0xf) << (DR_CONTROL_SHIFT + drnum * DR_CONTROL_SIZE); if (enable) dr7 |= (DR_GLOBAL_ENABLE << (drnum * DR_ENABLE_SIZE)); return dr7; } int write_dr(int pid, int dr, unsigned long val) { return ptrace(PTRACE_POKEUSER, pid, offsetof (struct user, u_debugreg[dr]), val); } void set_bp(pid_t pid, void *addr) { unsigned long dr7; assert(write_dr(pid, 0, (long)addr) == 0); dr7 = encode_dr7(0, 1, DR_RW_EXECUTE, DR_LEN_1); assert(write_dr(pid, 7, dr7) == 0); } void *get_rip(int pid) { return (void*)ptrace(PTRACE_PEEKUSER, pid, offsetof(struct user, regs.rip), 0); } void test(int nr) { void *bp_addr = &&label + nr, *bp_hit; int pid; printf("test bp %d\n", nr); assert(nr < 16); // see 16 asm nops below pid = fork(); if (!pid) { assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0); kill(getpid(), SIGSTOP); for (;;) { label: asm ( "nop; nop; nop; nop;" "nop; nop; nop; nop;" "nop; nop; nop; nop;" "nop; nop; nop; nop;" ); } } assert(pid == wait(NULL)); set_bp(pid, bp_addr); for (;;) { assert(ptrace(PTRACE_CONT, pid, 0, 0) == 0); assert(pid == wait(NULL)); bp_hit = get_rip(pid); if (bp_hit != bp_addr) fprintf(stderr, "ERR!! hit wrong bp %ld != %d\n", bp_hit - &&label, nr); } } int main(int argc, const char *argv[]) { while (--argc) { int nr = atoi(*++argv); if (!fork()) test(nr); } while (wait(NULL) > 0) ; return 0; } Cc: stable@vger.kernel.org Suggested-by: NNadav Amit <namit@cs.technion.ac.il> Reported-by: NAndrey Wagin <avagin@gmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Andy Lutomirski 提交于
Both before and after 5f310f73 ("x86/entry/32: Re-implement SYSENTER using the new C path"), we relied on a uaccess very early in the SYSENTER path to clear AC. After that change, though, we can potentially make it all the way into C code with AC set, which enlarges the attack surface for SMAP bypass by doing SYSENTER with AC set. Strengthen the SMAP protection by addding the missing ASM_CLAC right at the beginning. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/3e36be110724896e32a4a1fe73bacb349d3cba94.1456262295.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Linus Torvalds 提交于
In commit 11f1a4b9 ("x86: reorganize SMAP handling in user space accesses") I changed how the stac/clac instructions were generated around the user space accesses, which then made it possible to do batched accesses efficiently for user string copies etc. However, in doing so, I completely spaced out, and didn't even think about the 32-bit case. And nobody really even seemed to notice, because SMAP doesn't even exist until modern Skylake processors, and you'd have to be crazy to run 32-bit kernels on a modern CPU. Which brings us to Andy Lutomirski. He actually tested the 32-bit kernel on new hardware, and noticed that it doesn't work. My bad. The trivial fix is to add the required uaccess begin/end markers around the raw accesses in <asm/uaccess_32.h>. I feel a bit bad about this patch, just because that header file really should be cleaned up to avoid all the duplicated code in it, and this commit just expands on the problem. But this just fixes the bug without any bigger cleanup surgery. Reported-and-tested-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 2月, 2016 1 次提交
-
-
由 Bryan O'Donoghue 提交于
Currently when setting up an IMR around the kernel's .text section we lock that IMR, preventing further modification. While superficially this appears to be the right thing to do, in fact this doesn't account for a legitimate change in the memory map such as when executing a new kernel via kexec. In such a scenario a second kernel can have a different size and location to it's predecessor and can view some of the memory occupied by it's predecessor as legitimately usable DMA RAM. If this RAM were then subsequently allocated to DMA agents within the system it could conceivably trigger an IMR violation. This patch fixes the this potential situation by keeping the kernel's .text section IMR lock bit false by default. Suggested-by: NIngo Molnar <mingo@kernel.org> Reported-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: NBryan O'Donoghue <pure.logic@nexus-software.ie> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boon.leong.ong@intel.com Cc: paul.gortmaker@windriver.com Link: http://lkml.kernel.org/r/1456190999-12685-2-git-send-email-pure.logic@nexus-software.ieSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 19 2月, 2016 1 次提交
-
-
由 Hugh Dickins 提交于
Commit 3565fce3 ("mm, x86: get_user_pages() for dax mappings") has moved up the pte_page(pte) in x86's fast gup_pte_range(), for no discernible reason: put it back where it belongs, after the pte_flags check and the pfn_valid cross-check. That may be the cause of the NULL pointer dereference in gup_pte_range(), seen when vfio called vaddr_get_pfn() when starting a qemu-kvm based VM. Signed-off-by: NHugh Dickins <hughd@google.com> Reported-by: NMichael Long <Harn-Solo@gmx.de> Tested-by: NMichael Long <Harn-Solo@gmx.de> Acked-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 2月, 2016 2 次提交
-
-
由 Toshi Kani 提交于
A kernel page fault oops with the callstack below was observed when a read syscall was made to a pmem device after a huge amount (>512GB) of vmalloc ranges was allocated by ioremap() on a x86_64 system: BUG: unable to handle kernel paging request at ffff880840000ff8 IP: vmalloc_fault+0x1be/0x300 PGD c7f03a067 PUD 0 Oops: 0000 [#1] SM Call Trace: __do_page_fault+0x285/0x3e0 do_page_fault+0x2f/0x80 ? put_prev_entity+0x35/0x7a0 page_fault+0x28/0x30 ? memcpy_erms+0x6/0x10 ? schedule+0x35/0x80 ? pmem_rw_bytes+0x6a/0x190 [nd_pmem] ? schedule_timeout+0x183/0x240 btt_log_read+0x63/0x140 [nd_btt] : ? __symbol_put+0x60/0x60 ? kernel_read+0x50/0x80 SyS_finit_module+0xb9/0xf0 entry_SYSCALL_64_fastpath+0x1a/0xa4 Since v4.1, ioremap() supports large page (pud/pmd) mappings in x86_64 and PAE. vmalloc_fault() however assumes that the vmalloc range is limited to pte mappings. vmalloc faults do not normally happen in ioremap'd ranges since ioremap() sets up the kernel page tables, which are shared by user processes. pgd_ctor() sets the kernel's PGD entries to user's during fork(). When allocation of the vmalloc ranges crosses a 512GB boundary, ioremap() allocates a new pud table and updates the kernel PGD entry to point it. If user process's PGD entry does not have this update yet, a read/write syscall to the range will cause a vmalloc fault, which hits the Oops above as it does not handle a large page properly. Following changes are made to vmalloc_fault(). 64-bit: - No change for the PGD sync operation as it handles large pages already. - Add pud_huge() and pmd_huge() to the validation code to handle large pages. - Change pud_page_vaddr() to pud_pfn() since an ioremap range is not directly mapped (while the if-statement still works with a bogus addr). - Change pmd_page() to pmd_pfn() since an ioremap range is not backed by struct page (while the if-statement still works with a bogus addr). 32-bit: - No change for the sync operation since the index3 PGD entry covers the entire vmalloc range, which is always valid. (A separate change to sync PGD entry is necessary if this memory layout is changed regardless of the page size.) - Add pmd_huge() to the validation code to handle large pages. This is for completeness since vmalloc_fault() won't happen in ioremap'd ranges as its PGD entry is always valid. Reported-by: NHenning Schild <henning.schild@siemens.com> Signed-off-by: NToshi Kani <toshi.kani@hpe.com> Acked-by: NBorislav Petkov <bp@alien8.de> Cc: <stable@vger.kernel.org> # 4.1+ Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: linux-mm@kvack.org Cc: linux-nvdimm@lists.01.org Link: http://lkml.kernel.org/r/1455758214-24623-1-git-send-email-toshi.kani@hpe.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Bjorn Helgaas 提交于
Revert 811a4e6f ("PCI: Add helpers to manage pci_dev->irq and pci_dev->irq_managed"). This is part of reverting 991de2e5 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()") to fix regressions it introduced. Link: https://bugzilla.kernel.org/show_bug.cgi?id=111211 Fixes: 991de2e5 ("PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()") Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Acked-by: NRafael J. Wysocki <rafael@kernel.org> CC: Jiang Liu <jiang.liu@linux.intel.com>
-