- 23 10月, 2014 1 次提交
-
-
由 Andy Lutomirski 提交于
If the TSC is unusable or disabled, then this patch fixes: - Confusion while trying to clear old APIC interrupts. - Division by zero and incorrect programming of the TSC deadline timer. This fixes boot if the CPU has a TSC deadline timer but a missing or broken TSC. The failure to boot can be observed with qemu using -cpu qemu64,-tsc,+tsc-deadline This also happens to me in nested KVM for unknown reasons. With this patch, I can boot cleanly (although without a TSC). Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Cc: Bandan Das <bsd@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/e2fa274e498c33988efac0ba8b7e3120f7f92d78.1413393027.git.luto@amacapital.netSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 20 10月, 2014 1 次提交
-
-
由 Jiang Liu 提交于
When IOAPIC is disabled, acpi_gsi_to_irq() should return gsi directly instead of calling mp_map_gsi_to_irq() to translate gsi to IRQ by IOAPIC. It fixes https://bugzilla.kernel.org/show_bug.cgi?id=84381. This regression was introduced with commit 6b9fb708 "x86, ACPI, irq: Consolidate algorithm of mapping (ioapic, pin) to IRQ number" Reported-and-Tested-by: NThomas Richter <thor@math.tu-berlin.de> Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Thomas Richter <thor@math.tu-berlin.de> Cc: rui.zhang@intel.com Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: <stable@vger.kernel.org> # 3.17 Link: http://lkml.kernel.org/r/1413816327-12850-1-git-send-email-jiang.liu@linux.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 19 10月, 2014 1 次提交
-
-
由 Ingo Molnar 提交于
Makes the code more readable by moving variable and usage closer to each other, which also avoids this build warning in the !CONFIG_HOTPLUG_CPU case: arch/x86/kernel/smpboot.c:105:42: warning: ‘die_complete’ defined but not used [-Wunused-variable] Cc: Prarit Bhargava <prarit@redhat.com> Cc: Lan Tianyu <tianyu.lan@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: srostedt@redhat.com Cc: toshi.kani@hp.com Cc: imammedo@redhat.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1409039025-32310-1-git-send-email-tianyu.lan@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 08 10月, 2014 3 次提交
-
-
由 Andi Kleen 提交于
A variable cannot be both __read_mostly and const. This is a meaningless combination. Just make it only const. This fixes the LTO build with numachip enabled. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1411533139-25708-1-git-send-email-andi@firstfloor.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Bryan O'Donoghue 提交于
Intel processors which don't report cache information via cpuid(2) or cpuid(4) need quirk code in the legacy_cache_size callback to report this data. For Intel that callback is is intel_size_cache(). This patch enables calling of cpu_detect_cache_sizes() inside of init_intel() and hence the calling of the legacy_cache callback in intel_size_cache(). Adding this call will ensure that PIII Tualatin currently in intel_size_cache() and Quark SoC X1000 being added to intel_size_cache() in this patch will report their respective cache sizes. This model of calling cpu_detect_cache_sizes() is consistent with AMD/Via/Cirix/Transmeta and Centaur. Also added is a string to idenitfy the Quark as Quark SoC X1000 giving better and more descriptive output via /proc/cpuinfo Adding cpu_detect_cache_sizes to init_intel() will enable calling of intel_size_cache() on Intel processors which currently no code can reach. Therefore this patch will also re-enable reporting of PIII Tualatin cache size information as well as add Quark SoC X1000 support. Comment text and cache flow logic suggested by Thomas Gleixner Signed-off-by: NBryan O'Donoghue <pure.logic@nexus-software.ie> Cc: davej@redhat.com Cc: hmh@hmh.eng.br Link: http://lkml.kernel.org/r/1412641189-12415-3-git-send-email-pure.logic@nexus-software.ieSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Bryan O'Donoghue 提交于
Quark SoC X1000 advertises Page Global Enable for it's Translation Lookaside Buffer via cpuid. The silicon does not in fact support PGE and hence will not flush the TLB when CR4.PGE is rewritten. The Quark documentation makes clear the necessity to instead rewrite CR3 in order to flush any TLB entries, irrespective of the state of CR4.PGE or an individual PTE.PGE See Intel Quark Core DevMan_001.pdf section 6.4.11 In setup.c setup_arch() the code will load_cr3() and then do a __flush_tlb_all(). On Quark the entire TLB will be flushed at the load_cr3(). The __flush_tlb_all() have no effect and can be safely ignored. Later on in the boot process we switch off the flag for cpu_has_pge() which means that subsequent calls to __flush_tlb_all() will call __flush_tlb() not __flush_tlb_global() flushing the TLB in the correct way via load_cr3() not CR4.PGE rewrite This patch documents the behaviour of flushing the TLB for Quark in setup_arch() Comment text suggested by Thomas Gleixner Signed-off-by: NBryan O'Donoghue <pure.logic@nexus-software.ie> Cc: davej@redhat.com Cc: hmh@hmh.eng.br Link: http://lkml.kernel.org/r/1412641189-12415-2-git-send-email-pure.logic@nexus-software.ieSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 07 10月, 2014 1 次提交
-
-
由 Andy Lutomirski 提交于
The NT flag doesn't do anything in long mode other than causing IRET to #GP. Oddly, CPL3 code can still set NT using popf. Entry via hardware or software interrupt clears NT automatically, so the only relevant entries are fast syscalls. If user code causes kernel code to run with NT set, then there's at least some (small) chance that it could cause trouble. For example, user code could cause a call to EFI code with NT set, and who knows what would happen? Apparently some games on Wine sometimes do this (!), and, if an IRET return happens, they will segfault. That segfault cannot be handled, because signal delivery fails, too. This patch programs the CPU to clear NT on entry via SYSCALL (both 32-bit and 64-bit, by my reading of the AMD APM), and it clears NT in software on entry via SYSENTER. To save a few cycles, this borrows a trick from Jan Beulich in Xen: it checks whether NT is set before trying to clear it. As a result, it seems to have very little effect on SYSENTER performance on my machine. There's another minor bug fix in here: it looks like the CFI annotations were wrong if CONFIG_AUDITSYSCALL=n. Testers beware: on Xen, SYSENTER with NT set turns into a GPF. I haven't touched anything on 32-bit kernels. The syscall mask change comes from a variant of this patch by Anish Bhatt. Note to stable maintainers: there is no known security issue here. A misguided program can set NT and cause the kernel to try and fail to deliver SIGSEGV, crashing the program. This patch fixes Far Cry on Wine: https://bugs.winehq.org/show_bug.cgi?id=33275 Cc: <stable@vger.kernel.org> Reported-by: NAnish Bhatt <anish@chelsio.com> Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/395749a5d39a29bd3e4b35899cf3a3c1340e5595.1412189265.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 03 10月, 2014 3 次提交
-
-
由 Wei Huang 提交于
PMU checking can fail due to various reasons. On native machine, this is mostly caused by faulty hardware and it is reasonable to use KERN_ERR in reporting. However, when kernel is running on virtualized environment, this checking can fail if virtual PMU is not supported (e.g. KVM on AMD host). It is annoying to see an error message on splash screen, even though we know such failure is benign on virtualized environment. This patch checks if the kernel is running in a virtualized environment. If so, it will use KERN_INFO in reporting, which reduces the syslog priority of them. This patch was tested successfully on KVM. Signed-off-by: NWei Huang <wei@redhat.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Link: http://lkml.kernel.org/r/1411617314-24659-1-git-send-email-wei@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andi Kleen 提交于
I was looking for the trinity oops cause in the uncore driver. (so far didn't found it) However I found this tiny race: when a box is set up two threads on the same CPU, they may be setting up the box in parallel (e.g. with kernel preemption). This could lead to the reference count being increasing too much. Always recheck there is no existing cpu reference inside the lock. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Link: http://lkml.kernel.org/r/1411424826-15629-1-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dave Hansen 提交于
Commit: cebf15eb ("x86, sched: Add new topology for multi-NUMA-node CPUs") some code to try to detect the situation where we have a NUMA node inside of the "DIE" sched domain. It detected this by looking for cpus which match_die() but do not match NUMA nodes via topology_same_node(). I wrote it up as: if (match_die(c, o) == !topology_same_node(c, o)) which actually seemed to work some of the time, albiet accidentally. It should have been doing an &&, not an ==. This code essentially chopped off the "DIE" domain on one of Andrew Morton's systems. He reported that this patch fixed his issue. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Reported-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Dave Hansen <dave@sr71.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Rientjes <rientjes@google.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Lan Tianyu <tianyu.lan@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Toshi Kani <toshi.kani@hp.com> Link: http://lkml.kernel.org/r/20140930214546.FD481CFF@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 24 9月, 2014 16 次提交
-
-
由 Oleg Nesterov 提交于
___preempt_schedule() does SAVE_ALL/RESTORE_ALL but this is suboptimal, we do not need to save/restore the callee-saved register. And we already have arch/x86/lib/thunk_*.S which implements the similar asm wrappers, so it makes sense to redefine ___preempt_schedule() as "THUNK ..." and remove preempt.S altogether. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Reviewed-by: NAndy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140921184153.GA23727@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Wanpeng Li 提交于
The following bug can be triggered by hot adding and removing a large number of xen domain0's vcpus repeatedly: BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [..] find_busiest_group PGD 5a9d5067 PUD 13067 PMD 0 Oops: 0000 [#3] SMP [...] Call Trace: load_balance ? _raw_spin_unlock_irqrestore idle_balance __schedule schedule schedule_timeout ? lock_timer_base schedule_timeout_uninterruptible msleep lock_device_hotplug_sysfs online_store dev_attr_store sysfs_write_file vfs_write SyS_write system_call_fastpath Last level cache shared mask is built during CPU up and the build_sched_domain() routine takes advantage of it to setup the sched domain CPU topology. However, llc_shared_mask is not released during CPU disable, which leads to an invalid sched domainCPU topology. This patch fix it by releasing the llc_shared_mask correctly during CPU disable. Yasuaki also reported that this can happen on real hardware: https://lkml.org/lkml/2014/7/22/1018 His case is here: == Here is an example on my system. My system has 4 sockets and each socket has 15 cores and HT is enabled. In this case, each core of sockes is numbered as follows: | CPU# Socket#0 | 0-14 , 60-74 Socket#1 | 15-29, 75-89 Socket#2 | 30-44, 90-104 Socket#3 | 45-59, 105-119 Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000. It means that last level cache of Socket#2 is shared with CPU#30-44 and 90-104. When hot-removing socket#2 and #3, each core of sockets is numbered as follows: | CPU# Socket#0 | 0-14 , 60-74 Socket#1 | 15-29, 75-89 But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30 remains having 0x3fff80000001fffc0000000. After that, when hot-adding socket#2 and #3, each core of sockets is numbered as follows: | CPU# Socket#0 | 0-14 , 60-74 Socket#1 | 15-29, 75-89 Socket#2 | 30-59 Socket#3 | 90-119 Then llc_shared_mask of CPU#30 becomes 0x3fff8000fffffffc0000000. It means that last level cache of Socket#2 is shared with CPU#30-59 and 90-104. So the mask has the wrong value. Signed-off-by: NWanpeng Li <wanpeng.li@linux.intel.com> Tested-by: NLinn Crosetto <linn@hp.com> Reviewed-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NToshi Kani <toshi.kani@hp.com> Reviewed-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: <stable@vger.kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1411547885-48165-1-git-send-email-wanpeng.li@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Bryan O'Donoghue 提交于
Quark x1000 advertises PGE via the standard CPUID method PGE bits exist in Quark X1000's PTEs. In order to flush an individual PTE it is necessary to reload CR3 irrespective of the PTE.PGE bit. See Quark Core_DevMan_001.pdf section 6.4.11 This bug was fixed in Galileo kernels, unfixed vanilla kernels are expected to crash and burn on this platform. Signed-off-by: NBryan O'Donoghue <pure.logic@nexus-software.ie> Cc: Borislav Petkov <bp@alien8.de> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/1411514784-14885-1-git-send-email-pure.logic@nexus-software.ieSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Lan Tianyu 提交于
With certain kernel configurations, CPU offline consumes more than 100ms during S3. It's a timing related issue: native_cpu_die() would occasionally fall into a 100ms sleep when the CPU idle loop thread marked the CPU state to DEAD too slowly. What native_cpu_die() does is that it polls the CPU state and waits for 100ms if CPU state hasn't been marked to DEAD. The 100ms sleep doesn't make sense and is purely historic. To avoid such long sleeping, this patch adds a 'struct completion' to each CPU, waits for the completion in native_cpu_die() and wakes up the completion when the CPU state is marked to DEAD. Tested on an Intel Xeon server with 48 cores, Ivybridge and on Haswell laptops. The CPU offlining cost on these machines is reduced from more than 100ms to less than 5ms. The system suspend time is reduced by 2.3s on the servers. Borislav and Prarit also helped to test the patch on an AMD machine and a few systems of various sizes and configurations (multi-socket, single-socket, no hyper threading, etc.). No issues were seen. Tested-by: NPrarit Bhargava <prarit@redhat.com> Signed-off-by: NLan Tianyu <tianyu.lan@intel.com> Acked-by: NBorislav Petkov <bp@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: srostedt@redhat.com Cc: toshi.kani@hp.com Cc: imammedo@redhat.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1409039025-32310-1-git-send-email-tianyu.lan@intel.com [ Improved a few minor details in the code, cleaned up the changelog. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Stephane Eranian 提交于
This patch restructures the memory controller (IMC) uncore PMU support for client SNB/IVB/HSW processors. The main change is that it can now cope with more than one PCI device ID per processor model. There are many flavors of memory controllers for each processor. They have different PCI device ID, yet they behave the same w.r.t. the memory controller PMU that we are interested in. The patch now supports two distinct memory controllers for IVB processors: one for mobile, one for desktop. Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140917090616.GA11281@quad Cc: ak@linux.intel.com Cc: kan.liang@intel.com Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andi Kleen 提交于
The PCU frequency band filters use 8 bit each in a register. When setting up the value the shift value was not correctly scaled, which resulted in all filters except for band 0 to be zero. Fix the scaling. This allows to correctly monitor multiple uncore frequency bands. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1409872109-31645-5-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andi Kleen 提交于
The IvyBridge-EP uncore driver was missing three filter flags: NC, ISOC, C6 which are useful in some cases. Support them in the same way as the Haswell EP driver, by allowing to set them and exposing them in the sysfs formats. Also fix a typo in a define. Relies on the Haswell EP driver to be applied earlier. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1409872109-31645-4-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Yan, Zheng 提交于
Current code registers PMUs for all possible uncore pci devices. This is not good because, on some machines, one or more uncore pci devices can be missing. The missing pci device make corresponding PMU unusable. Register the PMU only if the uncore device exists. Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com> Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1409872109-31645-3-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Yan, Zheng 提交于
The uncore subsystem in Haswell-EP is similar to Sandy/Ivy Bridge-EP. There are some differences in config register encoding and pci device IDs. The Haswell-EP uncore also supports a few new events. Add the Haswell-EP driver to the snbep split driver. Signed-off-by: NYan, Zheng <zheng.z.yan@intel.com> [ Add missing break. Add imc events. Add cbox nc/isoc/c6. ] Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1409872109-31645-2-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andi Kleen 提交于
Use the newly added Broadwell cache event list for Haswell too. All Haswell and Broadwell events and offcore masks used in these lists are identical. However Haswell is very different from the Sandy Bridge list that was used previously. That fixes a wide range of mis-counting cache events. The node events are now only for retired memory events, so prefetching and speculative memory accesses are not included. They are PEBS capable now, which makes it much easier to sample for them, plus it's possible to create address maps with -d. The prefetch events are gone now. They way the hardware counts them is very misleading (some prefetches included, others not), so it seemed best to leave them out. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1409683455-29168-5-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andi Kleen 提交于
On Broadwell INST_RETIRED.ALL cannot be used with any period that doesn't have the lowest 6 bits cleared. And the period should not be smaller than 128. Add a new callback to enforce this, and set it for Broadwell. This is erratum BDM57 and BDM11. How does this handle the case when an app requests a specific period with some of the bottom bits set The apps thinks it is sampling at X occurences per sample, when it is in fact at X - 63 (worst case). Short answer: Any useful instruction sampling period needs to be 4-6 orders of magnitude larger than 128, as an PMI every 128 instructions would instantly overwhelm the system and be throttled. So the +-64 error from this is really small compared to the period, much smaller than normal system jitter. Long answer: <write up by Peter:> IFF we guarantee perf_event_attr::sample_period >= 128. Suppose we start out with sample_period=192; then we'll set period_left to 192, we'll end up with left = 128 (we truncate the lower bits). We get an interrupt, find that period_left = 64 (>0 so we return 0 and don't get an overflow handler), up that to 128. Then we trigger again, at n=256. Then we find period_left = -64 (<=0 so we return 1 and do get an overflow). We increment with sample_period so we get left = 128. We fire again, at n=384, period_left = 0 (<=0 so we return 1 and get an overflow). And on and on. So while the individual interrupts are 'wrong' we get then with interval=256,128 in exactly the right ratio to average out at 192. And this works for everything >=128. So the num_samples*fixed_period thing is still entirely correct +- 127, which is good enough I'd say, as you already have that error anyhow. So no need to 'fix' the tools, al we need to do is refuse to create INST_RETIRED:ALL events with sample_period < 128. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Maria Dimakopoulou <maria.n.dimakopoulou@gmail.com> Cc: Mark Davies <junk@eslaf.co.uk> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1409683455-29168-4-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andi Kleen 提交于
Add Broadwell support for Broadwell Client to perf. This is very similar to Haswell. It uses a new cache event table, because there were various changes there. The constraint list has one new event that needs to be handled over Haswell. The PEBS event list is the same, so we reuse Haswell's. [fengguang.wu: make intel_bdw_event_constraints[] static] Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1409683455-29168-3-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andi Kleen 提交于
Add names for each Haswell model as requested by Peter. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Link: http://lkml.kernel.org/r/1409683455-29168-2-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andi Kleen 提交于
71 is a Broadwell, not a Haswell. The model number was added by mistake earlier. Remove it for now, until it can be re-added later with real Broadwell support. In practice it does not cause a lot of issues because the Broadwell PMU is very similar to Haswell, but some details were wrong, and it's better to handle it correctly. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: eranian@google.com Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Link: http://lkml.kernel.org/r/1409683455-29168-1-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dave Hansen 提交于
I'm getting the spew below when booting with Haswell (Xeon E5-2699 v3) CPUs and the "Cluster-on-Die" (CoD) feature enabled in the BIOS. It seems similar to the issue that some folks from AMD ran in to on their systems and addressed in this commit: 161270fc ("x86/smp: Fix topology checks on AMD MCM CPUs") Both these Intel and AMD systems break an assumption which is being enforced by topology_sane(): a socket may not contain more than one NUMA node. AMD special-cased their system by looking for a cpuid flag. The Intel mode is dependent on BIOS options and I do not know of a way which it is enumerated other than the tables being parsed during the CPU bringup process. In other words, we have to trust the ACPI tables <shudder>. This detects the situation where a NUMA node occurs at a place in the middle of the "CPU" sched domains. It replaces the default topology with one that relies on the NUMA information from the firmware (SRAT table) for all levels of sched domains above the hyperthreads. This also fixes a sysfs bug. We used to freak out when we saw the "mc" group cross a node boundary, so we stopped building the MC group. MC gets exported as the 'core_siblings_list' in /sys/devices/system/cpu/cpu*/topology/ and this caused CPUs with the same 'physical_package_id' to not be listed together in 'core_siblings_list'. This violates a statement from Documentation/ABI/testing/sysfs-devices-system-cpu: core_siblings: internal kernel map of cpu#'s hardware threads within the same physical_package_id. core_siblings_list: human-readable list of the logical CPU numbers within the same physical_package_id as cpu#. The sysfs effects here cause an issue with the hwloc tool where it gets confused and thinks there are more sockets than are physically present. Before this patch, there are two packages: # cd /sys/devices/system/cpu/ # cat cpu*/topology/physical_package_id | sort | uniq -c 18 0 18 1 But 4 _sets_ of core siblings: # cat cpu*/topology/core_siblings_list | sort | uniq -c 9 0-8 9 18-26 9 27-35 9 9-17 After this set, there are only 2 sets of core siblings, which is what we expect for a 2-socket system. # cat cpu*/topology/physical_package_id | sort | uniq -c 18 0 18 1 # cat cpu*/topology/core_siblings_list | sort | uniq -c 18 0-17 18 18-35 Example spew: ... NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter. #2 #3 #4 #5 #6 #7 #8 .... node #1, CPUs: #9 ------------[ cut here ]------------ WARNING: CPU: 9 PID: 0 at /home/ak/hle/linux-hle-2.6/arch/x86/kernel/smpboot.c:306 topology_sane.isra.2+0x74/0x90() sched: CPU #9's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency. Modules linked in: CPU: 9 PID: 0 Comm: swapper/9 Not tainted 3.17.0-rc1-00293-g8e01c4d-dirty #631 Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS GRNDSDP1.86B.0036.R05.1407140519 07/14/2014 0000000000000009 ffff88046ddabe00 ffffffff8172e485 ffff88046ddabe48 ffff88046ddabe38 ffffffff8109691d 000000000000b001 0000000000000009 ffff88086fc12580 000000000000b020 0000000000000009 ffff88046ddabe98 Call Trace: [<ffffffff8172e485>] dump_stack+0x45/0x56 [<ffffffff8109691d>] warn_slowpath_common+0x7d/0xa0 [<ffffffff8109698c>] warn_slowpath_fmt+0x4c/0x50 [<ffffffff81074f94>] topology_sane.isra.2+0x74/0x90 [<ffffffff8107530e>] set_cpu_sibling_map+0x31e/0x4f0 [<ffffffff8107568d>] start_secondary+0x1ad/0x240 ---[ end trace 3fe5f587a9fcde61 ]--- #10 #11 #12 #13 #14 #15 #16 #17 .... node #2, CPUs: #18 #19 #20 #21 #22 #23 #24 #25 #26 .... node #3, CPUs: #27 #28 #29 #30 #31 #32 #33 #34 #35 Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> [ Added LLC domain and s/match_mc/match_die/ ] Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: David Rientjes <rientjes@google.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: brice.goglin@gmail.com Cc: "H. Peter Anvin" <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/20140918193334.C065EBCE@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Paolo Bonzini 提交于
On x86_64, kernel text mappings are mapped read-only with CONFIG_DEBUG_RODATA. In that case, KVM will fail to patch VMCALL instructions to VMMCALL as required on AMD processors. The failure mode is currently a divide-by-zero exception, which obviously is a KVM bug that has to be fixed. However, picking the right instruction between VMCALL and VMMCALL will be faster and will help if you cannot upgrade the hypervisor. Reported-by: NChris Webb <chris@arachsys.com> Tested-by: NChris Webb <chris@arachsys.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Acked-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 19 9月, 2014 4 次提交
-
-
由 David E. Box 提交于
Makes the IOSF sideband available through debugfs. Allows developers to experiment with using the sideband to provide debug and analytical tools for units on the SoC. Signed-off-by: NDavid E. Box <david.e.box@linux.intel.com> Link: http://lkml.kernel.org/r/1411017231-20807-4-git-send-email-david.e.box@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 David E. Box 提交于
Add Braswell PCI ID to list of supported ID's for the IOSF driver. Signed-off-by: NDavid E. Box <david.e.box@linux.intel.com> Link: http://lkml.kernel.org/r/1411017231-20807-2-git-send-email-david.e.box@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Martin Kelly 提交于
When compiling with CONFIG_DEBUG_FS=n, GCC emits an unused variable warning for pmc_atom.c because "ret" is used only within the CONFIG_DEBUG_FS block. This patch adds a dummy #ifdef for pmc_dbgfs_register() when CONFIG_DEBUG_FS=n to simplify the code and remove the warning. Signed-off-by: NMartin Kelly <martkell@amazon.com> Acked-by: N"Li, Aubrey" <aubrey.li@linux.intel.com> Cc: vishwesh.m.rudramuni@intel.com Link: http://lkml.kernel.org/r/1410963476-8360-1-git-send-email-martin@martingkelly.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rakib Mullick 提交于
intel_init_thermal() is called from a) at the time of system initializing and b) at the time of system resume to initialize thermal monitoring. In case when thermal monitoring is handled by SMI, we get to know it via printk(). Currently it gives the message at both cases, but its okay if we get it only once and no need to get the same message at every time system resumes. So, limit showing this message only at system boot time by avoid showing at system resume and reduce abusing kernel log buffer. Signed-off-by: NRakib Mullick <rakib.mullick@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1411068135.5121.10.camel@localhost.localdomainSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 16 9月, 2014 3 次提交
-
-
由 Igor Mammedov 提交于
Hang is observed on virtual machines during CPU hotplug, especially in big guests with many CPUs. (It reproducible more often if host is over-committed). It happens because master CPU gives up waiting on secondary CPU and allows it to run wild. As result AP causes locking or crashing system. For example as described here: https://lkml.org/lkml/2014/3/6/257 If master CPU have sent STARTUP IPI successfully, and AP signalled to master CPU that it's ready to start initialization, make master CPU wait indefinitely till AP is onlined. To ensure that AP won't ever run wild, make it wait at early startup till master CPU confirms its intention to wait for AP. If AP doesn't respond in 10 seconds, the master CPU will timeout and cancel AP onlining. Signed-off-by: NIgor Mammedov <imammedo@redhat.com> Acked-by: NToshi Kani <toshi.kani@hp.com> Tested-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1403266991-12233-1-git-send-email-imammedo@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Lee, Chun-Yi 提交于
In arch/x86/kernel/setup.c::trim_bios_range(), the codes introduced by 1b5576e6 (base on d8a9e6a5), it updates the first 4Kb of memory to be E820_RESERVED region. That's because it's a BIOS owned area but generally not listed in the E820 table: e820: BIOS-provided physical RAM map: BIOS-e820: [mem 0x0000000000000000-0x0000000000096fff] usable BIOS-e820: [mem 0x0000000000097000-0x0000000000097fff] reserved ... e820: update [mem 0x00000000-0x00000fff] usable ==> reserved e820: remove [mem 0x000a0000-0x000fffff] usable But the region of first 4Kb didn't register to nosave memory: PM: Registered nosave memory: [mem 0x00097000-0x00097fff] PM: Registered nosave memory: [mem 0x000a0000-0x000fffff] The code in e820_mark_nosave_regions() assumes the first e820 area to be RAM, so it causes the first 4Kb E820_RESERVED region ignored when register to nosave. This patch removed assumption of the first e820 area. Signed-off-by: NLee, Chun-Yi <jlee@suse.com> Acked-by: NPavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Len Brown <len.brown@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Takashi Iwai <tiwai@suse.de> Link: http://lkml.kernel.org/r/1410491038-17576-1-git-send-email-jlee@suse.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Neubauer 提交于
As the Soekris net6501 and other e6xx based systems do not have any ACPI implementation, HPET won't get enabled. This patch enables HPET on such platforms. [ 0.430149] pci 0000:00:01.0: Force enabled HPET at 0xfed00000 [ 0.644838] HPET: 3 timers in total, 0 timers will be used for per-cpu timer Original patch by Peter Neubauer (http://www.mail-archive.com/soekris-tech@lists.soekris.com/msg06462.html) slightly modified by Conrad Kostecki <ck@conrad-kostecki.de> and massaged accoring to Thomas Gleixners <tglx@linutronix.de> by me. Suggested-by: NConrad Kostecki <ck@conrad-kostecki.de> Signed-off-by: NEric Sesterhenn <eric.sesterhenn@lsexperts.de> Cc: Peter Neubauer <pneubauer@bluerwhite.org> Link: http://lkml.kernel.org/r/5412D3A5.2030909@lsexperts.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 14 9月, 2014 1 次提交
-
-
由 Frederic Weisbecker 提交于
x86 supports irq work self-IPIs when local apic is available. This is partly known on runtime so lets implement arch_irq_work_has_interrupt() accordingly. This should be safely called after setup_arch(). Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
-
- 12 9月, 2014 2 次提交
-
-
由 Dave Hansen 提交于
The original motivation for these patches was for an Intel CPU feature called MPX. The patch to add a disabled feature for it will go in with the other parts of the support. But, in the meantime, there are a few other features than MPX that we can make assumptions about at compile-time based on compile options. Add them to disabled-features.h and check them with cpu_feature_enabled(). Note that this gets rid of the last things that needed an #ifdef CONFIG_X86_64 in cpufeature.h. Yay! Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/r/20140911211524.C0EC332A@viggo.jf.intel.comAcked-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Dave Hansen 提交于
cpu_has_pae is only referenced in one place: the X86_32 kexec code (in a file not even built on 64-bit). It hardly warrants its own macro, or the trouble we go to ensuring that it can't be called in X86_64 code. Axe the macro and replace it with a direct cpu feature check. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/r/20140911211511.AD76E774@viggo.jf.intel.comAcked-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 09 9月, 2014 4 次提交
-
-
由 Andi Kleen 提交于
The new split Intel uncore driver code that recently went into tip added a section mismatch, which the build process complains about. uncore_pmu_register() can be called from uncore_pci_probe,() which is not __init and can be called from pci driver ->probe. I'm not fully sure if it's actually possible to call the probe function later, but it seems safer to mark uncore_pmu_register not __init. This also fixes the warning. Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1409332858-29039-1-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Mathias Krause 提交于
A few of the initialization functions are missing the __init annotation. Fix this and thereby allow ~680 additional bytes of code to be released after initialization. Signed-off-by: NMathias Krause <minipli@googlemail.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/1409071785-26015-1-git-send-email-minipli@googlemail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andy Lutomirski 提交于
On KVM on my box, this reduces the overhead from an always-accept seccomp filter from ~130ns to ~17ns. Most of that comes from avoiding IRET on every syscall when seccomp is enabled. In extremely approximate hacked-up benchmarking, just bypassing IRET saves about 80ns, so there's another 43ns of savings here from simplifying the seccomp path. The diffstat is also rather nice :) Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/a3dbd267ee990110478d349f78cccfdac5497a84.1409954077.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Andy Lutomirski 提交于
For slowpath syscalls, we initialize regs->ax to -ENOSYS and stick the syscall number into regs->orig_ax prior to any possible tracing and syscall execution. This is user-visible ABI used by ptrace syscall emulation and seccomp. For fastpath syscalls, there's no good reason not to do the same thing. It's even slightly simpler than what we're currently doing. It probably has no measureable performance impact. It should have no user-visible effect. The purpose of this patch is to prepare for two-phase syscall tracing, in which the first phase might modify the saved RAX without leaving the fast path. This change is just subtle enough that I'm keeping it separate. Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/01218b493f12ae2f98034b78c9ae085e38e94350.1409954077.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-