- 27 5月, 2015 3 次提交
-
-
由 Toshi Kani 提交于
mtrr_type_lookup() returns verbatim 0xFF when MTRRs are disabled. This patch defines MTRR_TYPE_INVALID to clarify the meaning of this value, and documents its usage. Document the return values of the kernel virtual address mapping helpers pud_set_huge(), pmd_set_huge, pud_clear_huge() and pmd_clear_huge(). There is no functional change in this patch. Signed-off-by: NToshi Kani <toshi.kani@hp.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave.hansen@intel.com Cc: linux-mm <linux-mm@kvack.org> Cc: pebolle@tiscali.nl Link: http://lkml.kernel.org/r/1431714237-880-5-git-send-email-toshi.kani@hp.com Link: http://lkml.kernel.org/r/1432628901-18044-5-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Toshi Kani 提交于
'mtrr_state.enabled' contains the FE (fixed MTRRs enabled) and E (MTRRs enabled) flags in MSR_MTRRdefType. Intel SDM, section 11.11.2.1, defines these flags as follows: - All MTRRs are disabled when the E flag is clear. The FE flag has no affect when the E flag is clear. - The default type is enabled when the E flag is set. - MTRR variable ranges are enabled when the E flag is set. - MTRR fixed ranges are enabled when both E and FE flags are set. MTRR state checks in __mtrr_type_lookup() do not match with SDM. Hence, this patch makes the following changes: - The current code detects MTRRs disabled when both E and FE flags are clear in mtrr_state.enabled. Fix to detect MTRRs disabled when the E flag is clear. - The current code does not check if the FE bit is set in mtrr_state.enabled when looking at the fixed entries. Fix to check the FE flag. - The current code returns the default type when the E flag is clear in mtrr_state.enabled. However, the default type is UC when the E flag is clear. Remove the code as this case is handled as MTRR disabled with the 1st change. In addition, this patch defines the E and FE flags in mtrr_state.enabled as follows. - FE flag: MTRR_STATE_MTRR_FIXED_ENABLED - E flag: MTRR_STATE_MTRR_ENABLED print_mtrr_state() and x86_get_mtrr_mem_range() are also updated accordingly. Signed-off-by: NToshi Kani <toshi.kani@hp.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave.hansen@intel.com Cc: linux-mm <linux-mm@kvack.org> Cc: pebolle@tiscali.nl Link: http://lkml.kernel.org/r/1431714237-880-4-git-send-email-toshi.kani@hp.com Link: http://lkml.kernel.org/r/1432628901-18044-4-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Toshi Kani 提交于
When an MTRR entry is inclusive to a requested range, i.e. the start and end of the request are not within the MTRR entry range but the range contains the MTRR entry entirely: range_start ... [mtrr_start ... mtrr_end] ... range_end __mtrr_type_lookup() ignores such a case because both start_state and end_state are set to zero. This bug can cause the following issues: 1) reserve_memtype() tracks an effective memory type in case a request type is WB (ex. /dev/mem blindly uses WB). Missing to track with its effective type causes a subsequent request to map the same range with the effective type to fail. 2) pud_set_huge() and pmd_set_huge() check if a requested range has any overlap with MTRRs. Missing to detect an overlap may cause a performance penalty or undefined behavior. This patch fixes the bug by adding a new flag, 'inclusive', to detect the inclusive case. This case is then handled in the same way as end_state:1 since the first region is the same. With this fix, __mtrr_type_lookup() handles the inclusive case properly. Signed-off-by: NToshi Kani <toshi.kani@hp.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave.hansen@intel.com Cc: linux-mm <linux-mm@kvack.org> Cc: pebolle@tiscali.nl Link: http://lkml.kernel.org/r/1431714237-880-3-git-send-email-toshi.kani@hp.com Link: http://lkml.kernel.org/r/1432628901-18044-3-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 11 5月, 2015 2 次提交
-
-
由 Stephane Eranian 提交于
This patch enables RAPL counters (energy consumption counters) support for Intel Broadwell-U processors (Model 61): To use: $ perf stat -a -I 1000 -e power/energy-cores/,power/energy-pkg/,power/energy-ram/ sleep 10 Signed-off-by: NStephane Eranian <eranian@google.com> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jacob.jun.pan@linux.intel.com Cc: kan.liang@intel.com Cc: peterz@infradead.org Cc: sonnyrao@chromium.org Link: http://lkml.kernel.org/r/20150423070709.GA4970@thinkpadSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Toshi Kani 提交于
__mtrr_type_lookup() checks MTRR fixed ranges when mtrr_state.have_fixed is set and start is less than 0x100000. However, the 'else if (start < 0x1000000)' in the code checks with an incorrect address as it has an extra-zero in the address. The code still runs correctly as this check is meaningless, though. This patch replaces the incorrect address check with 'else' with no condition. Signed-off-by: NToshi Kani <toshi.kani@hp.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dave.hansen@intel.com Cc: linux-mm <linux-mm@kvack.org> Cc: pebolle@tiscali.nl Link: http://lkml.kernel.org/r/1427234921-19737-4-git-send-email-toshi.kani@hp.com Link: http://lkml.kernel.org/r/1431332153-18566-8-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 08 5月, 2015 1 次提交
-
-
由 Kan Liang 提交于
iTLB-load-misses and LLC-load-misses count incorrectly on SLM. There is no ITLB.MISSES support on SLM. Event PAGE_WALKS.I_SIDE_WALK should be used to count iTLB-load-misses. This event counts when an instruction (I) page walk is completed or started. Since a page walk implies a TLB miss, the number of TLB misses can be counted by counting the number of pagewalks. DMND_DATA_RD counts both demand and DCU prefetch data reads. However, LLC-load-misses should only count demand reads. There is no way to not include prefetches with a single counter on SLM. So the LLC-load-misses support should be removed on SLM. Signed-off-by: NKan Liang <kan.liang@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1429608881-5055-1-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 06 5月, 2015 1 次提交
-
-
由 Boris Ostrovsky 提交于
Commit 61f01dd9 ("x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue") makes AMD processors set SS to __KERNEL_DS in __switch_to() to deal with cases when SS is NULL. This breaks Xen PV guests who do not want to load SS with__KERNEL_DS. Since the problem that the commit is trying to address would have to be fixed in the hypervisor (if it in fact exists under Xen) there is no reason to set X86_BUG_SYSRET_SS_ATTRS flag for PV VPCUs here. This can be easily achieved by adding x86_hyper_xen_hvm.set_cpu_features op which will clear this flag. (And since this structure is no longer HVM-specific we should do some renaming). Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Reported-by: NSander Eikelenboom <linux@eikelenboom.it> Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
-
- 27 4月, 2015 1 次提交
-
-
由 Andy Lutomirski 提交于
AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET with SS == 0 results in an invalid usermode state in which SS is apparently equal to __USER_DS but causes #SS if used. Work around the issue by setting SS to __KERNEL_DS __switch_to, thus ensuring that SYSRET never happens with SS set to NULL. This was exposed by a recent vDSO cleanup. Fixes: e7d6eefa x86/vdso32/syscall.S: Do not load __USER32_DS to %ss Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: Brian Gerst <brgerst@gmail.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 4月, 2015 3 次提交
-
-
由 Sonny Rao 提交于
This keeps all the related PCI IDs together in the driver where they are used. Signed-off-by: NSonny Rao <sonnyrao@chromium.org> Acked-by: NBjorn Helgaas <bhelgaas@google.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1429644791-25724-1-git-send-email-sonnyrao@chromium.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Sonny Rao 提交于
perf/x86/intel/uncore: Add support for Intel Haswell ULT (lower power Mobile Processor) IMC uncore PMUs This uncore is the same as the Haswell desktop part but uses a different PCI ID. Signed-off-by: NSonny Rao <sonnyrao@chromium.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1429569247-16697-1-git-send-email-sonnyrao@chromium.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jiri Olsa 提交于
The core_pmu does not define cpu_* callbacks, which handles allocation of 'struct cpu_hw_events::shared_regs' data, initialization of debug store and PMU_FL_EXCL_CNTRS counters. While this probably won't happen on bare metal, virtual CPU can define x86_pmu.extra_regs together with PMU version 1 and thus be using core_pmu -> using shared_regs data without it being allocated. That could could leave to following panic: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8152cd4f>] _spin_lock_irqsave+0x1f/0x40 SNIP [<ffffffff81024bd9>] __intel_shared_reg_get_constraints+0x69/0x1e0 [<ffffffff81024deb>] intel_get_event_constraints+0x9b/0x180 [<ffffffff8101e815>] x86_schedule_events+0x75/0x1d0 [<ffffffff810586dc>] ? check_preempt_curr+0x7c/0x90 [<ffffffff810649fe>] ? try_to_wake_up+0x24e/0x3e0 [<ffffffff81064ba2>] ? default_wake_function+0x12/0x20 [<ffffffff8109eb16>] ? autoremove_wake_function+0x16/0x40 [<ffffffff810577e9>] ? __wake_up_common+0x59/0x90 [<ffffffff811a9517>] ? __d_lookup+0xa7/0x150 [<ffffffff8119db5f>] ? do_lookup+0x9f/0x230 [<ffffffff811a993a>] ? dput+0x9a/0x150 [<ffffffff8119c8f5>] ? path_to_nameidata+0x25/0x60 [<ffffffff8119e90a>] ? __link_path_walk+0x7da/0x1000 [<ffffffff8101d8f9>] ? x86_pmu_add+0xb9/0x170 [<ffffffff8101d7a7>] x86_pmu_commit_txn+0x67/0xc0 [<ffffffff811b07b0>] ? mntput_no_expire+0x30/0x110 [<ffffffff8119c731>] ? path_put+0x31/0x40 [<ffffffff8107c297>] ? current_fs_time+0x27/0x30 [<ffffffff8117d170>] ? mem_cgroup_get_reclaim_stat_from_page+0x20/0x70 [<ffffffff8111b7aa>] group_sched_in+0x13a/0x170 [<ffffffff81014a29>] ? sched_clock+0x9/0x10 [<ffffffff8111bac8>] ctx_sched_in+0x2e8/0x330 [<ffffffff8111bb7b>] perf_event_sched_in+0x6b/0xb0 [<ffffffff8111bc36>] perf_event_context_sched_in+0x76/0xc0 [<ffffffff8111eb3b>] perf_event_comm+0x1bb/0x2e0 [<ffffffff81195ee9>] set_task_comm+0x69/0x80 [<ffffffff81195fe1>] setup_new_exec+0xe1/0x2e0 [<ffffffff811ea68e>] load_elf_binary+0x3ce/0x1ab0 Adding cpu_(prepare|starting|dying) for core_pmu to have shared_regs data allocated for core_pmu. AFAICS there's no harm to initialize debug store and PMU_FL_EXCL_CNTRS either for core_pmu. Signed-off-by: NJiri Olsa <jolsa@kernel.org> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/20150421152623.GC13169@krava.redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 18 4月, 2015 1 次提交
-
-
由 Ingo Molnar 提交于
Dan Carpenter reported that pt_event_add() has buggy error handling logic: it returns 0 instead of -EBUSY when it fails to start a newly added event. Furthermore, the control flow in this function is messy, with cleanup labels mixed with direct returns. Fix the bug and clean up the code by converting it to a straight fast path for the regular non-failing case, plus a clear sequence of cascading goto labels to do all cleanup. NOTE: I materially changed the existing clean up logic in the pt_event_start() failure case to use the direct perf_aux_output_end() path, not pt_event_del(), because perf_aux_output_end() is enough here. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Acked-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Julia Lawall <julia.lawall@lip6.fr> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20150416103830.GB7847@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 17 4月, 2015 4 次提交
-
-
由 Kan Liang 提交于
Same as Haswell, Broadwell also support the LBR callstack. Signed-off-by: NKan Liang <kan.liang@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NAndi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1427962377-40955-1-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jacob Pan 提交于
RAPL energy hardware unit can vary within a single CPU package, e.g. HSW server DRAM has a fixed energy unit of 15.3 uJ (2^-16) whereas the unit on other domains can be enumerated from power unit MSR. There might be other variations in the future, this patch adds per cpu model quirk to allow special handling of certain cpus. hw_unit is also removed from per cpu data since it is not per cpu and the sampling rate for energy counter is typically not high. Without this patch, DRAM domain on HSW servers will be counted 4x higher than the real energy counter. Signed-off-by: NJacob Pan <jacob.jun.pan@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NStephane Eranian <eranian@google.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1427405325-780-1-git-send-email-jacob.jun.pan@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Ingo reported that cycles:pp didn't work for him on some machines. It turns out that in this commit: af4bdcf6 perf/x86/intel: Disallow flags for most Core2/Atom/Nehalem/Westmere events Andi forgot to explicitly allow that event when he disabled event flags for PEBS on those uarchs. Reported-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Fixes: af4bdcf6 ("perf/x86/intel: Disallow flags for most Core2/Atom/Nehalem/Westmere events") Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Somehow we ended up with overlapping flags when merging the RDPMC control flag - this is bad, fix it. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 16 4月, 2015 1 次提交
-
-
由 Joe Perches 提交于
The seq_printf return value, because it's frequently misused, will eventually be converted to void. See: commit 1f33c41c ("seq_file: Rename seq_overflow() to seq_has_overflowed() and make public") Signed-off-by: NJoe Perches <joe@perches.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 4月, 2015 1 次提交
-
-
由 Ingo Molnar 提交于
Dan Carpenter pointed out that the control flow in pt_pmu_hw_init() is a bit messy: for example the kfree(de_attrs) is entirely superfluous. Another problem is the inconsistent mixing of label based and direct return error handling. Add modern, label based error handling instead and clean up the code a bit as well. Note that we'll still do a kfree(NULL) in the normal case - this does not matter as this is an init path and kfree() returns early if it sees a NULL. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20150409090805.GG17605@mwandaSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 03 4月, 2015 4 次提交
-
-
由 Borislav Petkov 提交于
... instead of a naked number, for better readability. Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428054130-25847-1-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Borislav Petkov 提交于
Commit: d56fe4bf ("x86/asm/entry/64: Always set up SYSENTER MSRs") missed to add "ULL" to the 0 and wrmsrl_safe() complains: arch/x86/kernel/cpu/common.c: In function ‘syscall_init’: arch/x86/kernel/cpu/common.c:1226:2: warning: right shift count >= width of type wrmsrl_safe(MSR_IA32_SYSENTER_CS, 0); Fix it. Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428054130-25847-1-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Aravind Gopalakrishnan 提交于
Dan reported compiler warnings about missing curly braces in mce_severity_amd(). Reindent the catch-all "return MCE_AR_SEVERITY" correctly to single tab. While at it, chain ctx == IN_KERNEL check with mcgstatus check to make it cleaner, as suggested by Boris. No functional changes are introduced by this patch. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NAravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1427814281-18192-1-git-send-email-Aravind.Gopalakrishnan@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andy Lutomirski 提交于
We write a stack pointer to MSR_IA32_SYSENTER_ESP exactly once, and we unnecessarily cache the value in tss.sp1. We never read the cached value. Remove all of the caching. It serves no purpose. Suggested-by: NDenys Vlasenko <dvlasenk@redhat.com> Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/05a0163eb33ef5208363f0015496855da7cebadd.1428002830.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 02 4月, 2015 18 次提交
-
-
由 Ingo Molnar 提交于
On a 32-bit build I got: arch/x86/kernel/cpu/perf_event_intel_pt.c:413:5: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] arch/x86/kernel/cpu/perf_event_intel_bts.c:162:24: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast] Fix it. The code should probably be (re-)tested on 32-bit systems to make sure all is fine. Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kaixu Xia <kaixu.xia@linaro.org> Cc: linux-kernel@vger.kernel.org Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: kan.liang@intel.com Cc: markus.t.metzger@intel.com Cc: mathieu.poirier@linaro.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andi Kleen 提交于
perf with LBRs on has a tendency to rewrite the DEBUGCTL MSR with the same value. Add a little optimization to skip the unnecessary write. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1426871484-21285-2-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andi Kleen 提交于
The perf PMI currently does unnecessary MSR accesses when LBRs are enabled. We use LBR freezing, or when in callstack mode force the LBRs to only filter on ring 3. So there is no need to disable the LBRs explicitely in the PMI handler. Also we always unnecessarily rewrite LBR_SELECT in the LBR handler, even though it can never change. 5) | /* write_msr: MSR_LBR_SELECT(1c8), value 0 */ 5) | /* read_msr: MSR_IA32_DEBUGCTLMSR(1d9), value 1801 */ 5) | /* write_msr: MSR_IA32_DEBUGCTLMSR(1d9), value 1801 */ 5) | /* write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 70000000f */ 5) | /* write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0 */ 5) | /* write_msr: MSR_LBR_SELECT(1c8), value 0 */ 5) | /* read_msr: MSR_IA32_DEBUGCTLMSR(1d9), value 1801 */ 5) | /* write_msr: MSR_IA32_DEBUGCTLMSR(1d9), value 1801 */ This patch: - Avoids disabling already frozen LBRs unnecessarily in the PMI - Avoids changing LBR_SELECT in the PMI Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1426871484-21285-1-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andi Kleen 提交于
Technically PEBS_ENABLED is only guaranteed to exist when we detected PEBS. So add a check for this to the PMU dump function. I don't think it can happen on a real CPU, but could in a VM. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1425059312-18217-4-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andi Kleen 提交于
LBRs and LBR freezing are controlled through the DEBUGCTL MSR. So dump the state of DEBUGCTL too when dumping the PMU state. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1425059312-18217-3-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andi Kleen 提交于
The PMU reset code didn't quite keep up with newer PMU features. Improve it a bit to really reset a modern PMU: - Clear all overflow status - Clear LBRs and freezing state - Disable fixed counters too Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1425059312-18217-2-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Stephane Eranian 提交于
This patch disables the PMU HT bug when Hyperthreading (HT) is disabled. We cannot do this test immediately when perf_events is initialized. We need to wait until the topology information is setup properly. As such, we register a later initcall, check the topology and potentially disable the workaround. To do this, we need to ensure there is no user of the PMU. At this point of the boot, the only user is the NMI watchdog, thus we disable it during the switch and re-enable it right after. Having the workaround disabled when it is not needed provides some benefits by limiting the overhead is time and space. The workaround still ensures correct scheduling of the corrupting memory events (0xd0, 0xd1, 0xd2) when HT is off. Those events can only be measured on counters 0-3. Something else the current kernel did not handle correctly. Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: maria.n.dimakopoulou@gmail.com Link: http://lkml.kernel.org/r/1416251225-17721-13-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Stephane Eranian 提交于
perf/x86/intel: Limit to half counters when the HT workaround is enabled, to avoid exclusive mode starvation This patch limits the number of counters available to each CPU when the HT bug workaround is enabled. This is necessary to avoid situation of counter starvation. Such can arise from configuration where one HT thread, HT0, is using all 4 counters with corrupting events which require exclusion the the sibling HT, HT1. In such case, HT1 would not be able to schedule any event until HT0 is done. To mitigate this problem, this patch artificially limits the number of counters to 2. That way, we can gurantee that at least 2 counters are not in exclusive mode and therefore allow the sibling thread to schedule events of the same type (system vs. per-thread). The 2 counters are not determined in advance. We simply set the limit to two events per HT. This helps mitigate starvation in case of events with specific counter constraints such a PREC_DIST. Note that this does not elimintate the starvation is all cases. But it is better than not having it. (Solution suggested by Peter Zjilstra.) Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: maria.n.dimakopoulou@gmail.com Link: http://lkml.kernel.org/r/1416251225-17721-11-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Stephane Eranian 提交于
With dynamic constraint, we need to restart from the static constraints each time the intel_get_event_constraints() is called. Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NMaria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-10-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Maria Dimakopoulou 提交于
This patch modifies the PEBS constraint tables for SNB/IVB/HSW such that corrupting events supporting PEBS activate the HT workaround. Signed-off-by: NMaria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NStephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-9-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Maria Dimakopoulou 提交于
This patches activates the HT bug workaround for the SNB/IVB/HSW processors. This covers non-PEBS mode. Activation is done thru the constraint tables. Both client and server processors needs this workaround. Signed-off-by: NMaria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NStephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-8-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Maria Dimakopoulou 提交于
This patch implements a software workaround for a HW erratum on Intel SandyBridge, IvyBridge and Haswell processors with Hyperthreading enabled. The errata are documented for each processor in their respective specification update documents: - SandyBridge: BJ122 - IvyBridge: BV98 - Haswell: HSD29 The bug causes silent counter corruption across hyperthreads only when measuring certain memory events (0xd0, 0xd1, 0xd2, 0xd3). Counters measuring those events may leak counts to the sibling counter. For instance, counter 0, thread 0 measuring event 0xd0, may leak to counter 0, thread 1, regardless of the event measured there. The size of the leak is not predictible. It all depends on the workload and the state of each sibling hyper-thread. The corrupting events do undercount as a consequence of the leak. The leak is compensated automatically only when the sibling counter measures the exact same corrupting event AND the workload is on the two threads is the same. Given, there is no way to guarantee this, a work-around is necessary. Furthermore, there is a serious problem if the leaked count is added to a low-occurrence event. In that case the corruption on the low occurrence event can be very large, e.g., orders of magnitude. There is no HW or FW workaround for this problem. The bug is very easy to reproduce on a loaded system. Here is an example on a Haswell client, where CPU0, CPU4 are siblings. We load the CPUs with a simple triad app streaming large floating-point vector. We use 0x81d0 corrupting event (MEM_UOPS_RETIRED:ALL_LOADS) and 0x20cc (ROB_MISC_EVENTS:LBR_INSERTS). Given we are not using the LBR, the 0x20cc event should be zero. $ taskset -c 0 triad & $ taskset -c 4 triad & $ perf stat -a -C 0 -e r81d0 sleep 100 & $ perf stat -a -C 4 -r20cc sleep 10 Performance counter stats for 'system wide': 139 277 291 r20cc 10,000969126 seconds time elapsed In this example, 0x81d0 and r20cc ar eusing sinling counters on CPU0 and CPU4. 0x81d0 leaks into 0x20cc and corrupts it from 0 to 139 millions occurrences. This patch provides a software workaround to this problem by modifying the way events are scheduled onto counters by the kernel. The patch forces cross-thread mutual exclusion between counters in case a corrupting event is measured by one of the hyper-threads. If thread 0, counter 0 is measuring event 0xd0, then nothing can be measured on counter 0, thread 1. If no corrupting event is measured on any hyper-thread, event scheduling proceeds as before. The same example run with the workaround enabled, yield the correct answer: $ taskset -c 0 triad & $ taskset -c 4 triad & $ perf stat -a -C 0 -e r81d0 sleep 100 & $ perf stat -a -C 4 -r20cc sleep 10 Performance counter stats for 'system wide': 0 r20cc 10,000969126 seconds time elapsed The patch does provide correctness for all non-corrupting events. It does not "repatriate" the leaked counts back to the leaking counter. This is planned for a second patch series. This patch series makes this repatriation more easy by guaranteeing the sibling counter is not measuring any useful event. The patch introduces dynamic constraints for events. That means that events which did not have constraints, i.e., could be measured on any counters, may now be constrained to a subset of the counters depending on what is going on the sibling thread. The algorithm is similar to a cache coherency protocol. We call it XSU in reference to Exclusive, Shared, Unused, the 3 possible states of a PMU counter. As a consequence of the workaround, users may see an increased amount of event multiplexing, even in situtations where there are fewer events than counters measured on a CPU. Patch has been tested on all three impacted processors. Note that when HT is off, there is no corruption. However, the workaround is still enabled, yet not costing too much. Adding a dynamic detection of HT on turned out to be complex are requiring too much to code to be justified. This patch addresses the issue when PEBS is not used. A subsequent patch fixes the problem when PEBS is used. Signed-off-by: NMaria Dimakopoulou <maria.n.dimakopoulou@gmail.com> [spinlock_t -> raw_spinlock_t] Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NStephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-7-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Maria Dimakopoulou 提交于
This patch adds a new shared_regs style structure to the per-cpu x86 state (cpuc). It is used to coordinate access between counters which must be used with exclusion across HyperThreads on Intel processors. This new struct is not needed on each PMU, thus is is allocated on demand. Signed-off-by: NMaria Dimakopoulou <maria.n.dimakopoulou@gmail.com> [peterz: spinlock_t -> raw_spinlock_t] Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NStephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-6-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Stephane Eranian 提交于
This patch adds an index parameter to the get_event_constraint() x86_pmu callback. It is expected to represent the index of the event in the cpuc->event_list[] array. When the callback is used for fake_cpuc (evnet validation), then the index must be -1. The motivation for passing the index is to use it to index into another cpuc array. Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: maria.n.dimakopoulou@gmail.com Link: http://lkml.kernel.org/r/1416251225-17721-5-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Maria Dimakopoulou 提交于
This patch adds 3 new PMU model specific callbacks during the event scheduling done by x86_schedule_events(). ->start_scheduling(): invoked when entering the schedule routine. ->stop_scheduling(): invoked at the end of the schedule routine ->commit_scheduling(): invoked for each committed event To be used optionally by model-specific code. Signed-off-by: NMaria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NStephane Eranian <eranian@google.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-4-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Stephane Eranian 提交于
Make the cpuc->kfree_on_online a vector to accommodate more than one entry and add the second entry to be used by a later patch. Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NMaria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Link: http://lkml.kernel.org/r/1416251225-17721-3-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Stephane Eranian 提交于
Because it will be used for more than just tracking the presence of extra registers. Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: bp@alien8.de Cc: jolsa@redhat.com Cc: kan.liang@intel.com Cc: maria.n.dimakopoulou@gmail.com Link: http://lkml.kernel.org/r/1416251225-17721-2-git-send-email-eranian@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Alexander Shishkin 提交于
Add support for Branch Trace Store (BTS) via kernel perf event infrastructure. The difference with the existing implementation of BTS support is that this one is a separate PMU that exports events' trace buffers to userspace by means of AUX area of the perf buffer, which is zero-copy mapped into userspace. The immediate benefit is that the buffer size can be much bigger, resulting in fewer interrupts and no kernel side copying is involved and little to no trace data loss. Also, kernel code can be traced with this driver. The old way of collecting BTS traces still works. Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kaixu Xia <kaixu.xia@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Robert Richter <rric@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: acme@infradead.org Cc: adrian.hunter@intel.com Cc: kan.liang@intel.com Cc: markus.t.metzger@intel.com Cc: mathieu.poirier@linaro.org Link: http://lkml.kernel.org/r/1422614435-114702-1-git-send-email-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-