- 23 7月, 2014 1 次提交
-
-
由 Peter Zijlstra 提交于
P4 systems with cpuid level < 4 can have SMT, but the cache topology description available (cpuid2) does not include SMP information. Now we know that SMT shares all cache levels, and therefore we can mark all available cache levels as shared. We do this by setting cpu_llc_id to ->phys_proc_id, since that's the same for each SMT thread. We can do this unconditional since if there's no SMT its still true, the one CPU shares cache with only itself. This fixes a problem where such CPUs report an incorrect LLC CPU mask. This in turn fixes a crash in the scheduler where the topology was build wrong, it assumes the LLC mask to include at least the SMT CPUs. Cc: Josh Boyer <jwboyer@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: NBruno Wolff III <bruno@wolff.to> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140722133514.GM12054@laptop.lanSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 22 7月, 2014 2 次提交
-
-
由 Sven Wegener 提交于
Commit 554086d8 ("x86_32, entry: Do syscall exit work on badsys (CVE-2014-4508)") introduced a regression in the x86_32 syscall entry code, resulting in syscall() not returning proper errors for undefined syscalls on CPUs supporting the sysenter feature. The following code: > int result = syscall(666); > printf("result=%d errno=%d error=%s\n", result, errno, strerror(errno)); results in: > result=666 errno=0 error=Success Obviously, the syscall return value is the called syscall number, but it should have been an ENOSYS error. When run under ptrace it behaves correctly, which makes it hard to debug in the wild: > result=-1 errno=38 error=Function not implemented The %eax register is the return value register. For debugging via ptrace the syscall entry code stores the complete register context on the stack. The badsys handlers only store the ENOSYS error code in the ptrace register set and do not set %eax like a regular syscall handler would. The old resume_userspace call chain contains code that clobbers %eax and it restores %eax from the ptrace registers afterwards. The same goes for the ptrace-enabled call chain. When ptrace is not used, the syscall return value is the passed-in syscall number from the untouched %eax register. Use %eax as the return value register in syscall_badsys and sysenter_badsys, like a real syscall handler does, and have the caller push the value onto the stack for ptrace access. Signed-off-by: NSven Wegener <sven.wegener@stealer.net> Link: http://lkml.kernel.org/r/alpine.LNX.2.11.1407221022380.31021@titan.int.lan.stealer.netReviewed-and-tested-by: NAndy Lutomirski <luto@amacapital.net> Cc: <stable@vger.kernel.org> # If 554086d8 is backported Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
由 Borislav Petkov 提交于
BorisO reports that misc_register() fails often on xen. The current code unregisters the CPU hotplug notifier in that case. If then a CPU is offlined and onlined back again, we end up with a second timer running on that CPU, leading to soft lockups and system hangs. So let's leave the hotcpu notifier always registered - even if mce_device_create failed for some cores and never unreg it so that we can deal with the timer handling accordingly. Reported-and-Tested-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1403274493-1371-1-git-send-email-boris.ostrovsky@oracle.comSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 16 7月, 2014 1 次提交
-
-
由 Paul Bolle 提交于
Compile tested. "polling" is unused since commit f80c5b39 ("sched/idle, x86: Switch from TS_POLLING to TIF_POLLING_NRFLAG"). Signed-off-by: NPaul Bolle <pebolle@tiscali.nl> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Jiri Kosina <jkosina@suse.cz> Link: http://lkml.kernel.org/r/1404138749.2978.6.camel@x41Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 15 7月, 2014 1 次提交
-
-
由 Boris Ostrovsky 提交于
init_espfix_ap() is currently off by one level when informing hypervisor that allocated pages will be used for ministacks' page tables. The most immediate effect of this on a PV guest is that if 'stack_page = __get_free_page()' returns a non-zeroed-out page the hypervisor will refuse to use it for a page table (which it shouldn't be anyway). This will result in warnings by both Xen and Linux. More importantly, a subsequent write to that page (again, by a PV guest) is likely to result in fatal page fault. Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1404926298-5565-1-git-send-email-boris.ostrovsky@oracle.comReviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org>
-
- 02 7月, 2014 2 次提交
-
-
由 HATAYAMA Daisuke 提交于
Currently, any NMI is falsely handled by a NMI handler of NMI watchdog if CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR is set. For example, we use external NMI to make system panic to get crash dump, but in this case, the external NMI is falsely handled do to the issue. This commit deals with the issue simply by ignoring CondChgd bit. Here is explanation in detail. On x86 NMI watchdog uses performance monitoring feature to periodically signal NMI each time performance counter gets overflowed. intel_pmu_handle_irq() is called as a NMI_LOCAL handler from a NMI handler of NMI watchdog, perf_event_nmi_handler(). It identifies an owner of a given NMI by looking at overflow status bits in MSR_CORE_PERF_GLOBAL_STATUS MSR. If some of the bits are set, then it handles the given NMI as its own NMI. The problem is that the intel_pmu_handle_irq() doesn't distinguish CondChgd bit from other bits. Unlike the other status bits, CondChgd bit doesn't represent overflow status for performance counters. Thus, CondChgd bit cannot be thought of as a mark indicating a given NMI is NMI watchdog's. As a result, if CondChgd bit is set, any NMI is falsely handled by the NMI handler of NMI watchdog. Also, if type of the falsely handled NMI is either NMI_UNKNOWN, NMI_SERR or NMI_IO_CHECK, the corresponding action is never performed until CondChgd bit is cleared. I noticed this behavior on systems with Ivy Bridge processors: Intel Xeon CPU E5-2630 v2 and Intel Xeon CPU E7-8890 v2. On both systems, CondChgd bit in MSR_CORE_PERF_GLOBAL_STATUS MSR has already been set in the beginning at boot. Then the CondChgd bit is immediately cleared by next wrmsr to MSR_CORE_PERF_GLOBAL_CTRL MSR and appears to remain 0. On the other hand, on older processors such as Nehalem, Xeon E7540, CondChgd bit is not set in the beginning at boot. I'm not sure about exact behavior of CondChgd bit, in particular when this bit is set. Although I read Intel System Programmer's Manual to figure out that, the descriptions I found are: In 18.9.1: "The MSR_PERF_GLOBAL_STATUS MSR also provides a ¡sticky bit¢ to indicate changes to the state of performancmonitoring hardware" In Table 35-2 IA-32 Architectural MSRs 63 CondChg: status bits of this register has changed. These are different from the bahviour I see on the actual system as I explained above. At least, I think ignoring CondChgd bit should be enough for NMI watchdog perspective. Signed-off-by: NHATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Acked-by: NDon Zickus <dzickus@redhat.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20140625.103503.409316067.d.hatayama@jp.fujitsu.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Mauro reported that his AMD X2 using the powernow-k8 cpufreq driver locked up when doing cpu hotplug. Because we called set_cyc2ns_scale() from the time_cpufreq_notifier() unconditionally, it gets called multiple times for each freq change, instead of only the once, when the tsc_khz value actually changes. Because it gets called more than once, we run out of cyc2ns data slots and stall, waiting for a free one, but because we're half way offline, there's no consumers to free slots. By placing the call inside the condition that actually changes tsc_khz we avoid superfluous calls and avoid the problem. Reported-by: NMauro <registosites@hotmail.com> Tested-by: NMauro <registosites@hotmail.com> Fixes: 20d1c86a ("sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs") Cc: <stable@vger.kernel.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Bin Gao <bin.gao@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Stefani Seibold <stefani@seibold.net> Cc: linux-kernel@vger.kernel.org Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 24 6月, 2014 3 次提交
-
-
由 Aaron Tomlin 提交于
Sometimes it is preferred not to use the trigger_all_cpu_backtrace() routine when one wants to avoid capturing a back trace for current. For instance if one was previously captured recently. This patch provides a new routine namely trigger_allbutself_cpu_backtrace() which offers the flexibility to issue an NMI to every cpu but current and capture a back trace accordingly. Patch x86 and sparc to support new routine. [dzickus@redhat.com: add stub in #else clause] [dzickus@redhat.com: don't print message in single processor case, wrap with get/put_cpu based on Oleg's suggestion] [sfr@canb.auug.org.au: undo C99ism] Signed-off-by: NAaron Tomlin <atomlin@redhat.com> Signed-off-by: NDon Zickus <dzickus@redhat.com> Acked-by: NDavid S. Miller <davem@davemloft.net> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andy Lutomirski 提交于
This commit: commit 6f121e54 Author: Andy Lutomirski <luto@amacapital.net> Date: Mon May 5 12:19:34 2014 -0700 x86, vdso: Reimplement vdso.so preparation in build-time C Contained this obvious typo: - restorer = VDSO32_SYMBOL(current->mm->context.vdso, rt_sigreturn); + restorer = current->mm->context.vdso + + selected_vdso32->sym___kernel_sigreturn; Note the missing 'rt_' in the new code. Fix it. Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/1eb40ad923acde2e18357ef2832867432e70ac42.1403361010.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Andy Lutomirski 提交于
The bad syscall nr paths are their own incomprehensible route through the entry control flow. Rearrange them to work just like syscalls that return -ENOSYS. This fixes an OOPS in the audit code when fast-path auditing is enabled and sysenter gets a bad syscall nr (CVE-2014-4508). This has probably been broken since Linux 2.6.27: af0575bb i386 syscall audit fast-path Cc: stable@vger.kernel.org Cc: Roland McGrath <roland@redhat.com> Reported-by: NToralf Förster <toralf.foerster@gmx.de> Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/e09c499eade6fc321266dd6b54da7beb28d6991c.1403558229.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 14 6月, 2014 1 次提交
-
-
由 Masami Hiramatsu 提交于
This essentially reverts commit: ecd50f71 ("kprobes, x86: Call exception_enter after kprobes handled") since it causes build errors with CONFIG_CONTEXT_TRACKING and that has been made from misunderstandings; context_track_user_*() don't involve much in interrupt context, it just returns if in_interrupt() is true. Instead of changing the do_debug/int3(), this just adds context_track_user_*() to kprobes blacklist, since those are still can be called right before kprobes handles int3 and debug exceptions, and probing those will cause an infinite loop. Reported-by: NFrederic Weisbecker <fweisbec@gmail.com> Signed-off-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Borislav Petkov <bp@suse.de> Cc: Kees Cook <keescook@chromium.org> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Link: http://lkml.kernel.org/r/20140614064711.7865.45957.stgit@kbuild-fedora.novalocalSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 09 6月, 2014 1 次提交
-
-
由 Linus Torvalds 提交于
This reverts commit 3e1a878b. It came in very late, and already has one reported failure: Sitsofe reports that the current tree fails to boot on his EeePC, and bisected it down to this. Rather than waste time trying to figure out what's wrong, just revert it. Reported-by: NSitsofe Wheeler <sitsofe@gmail.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 05 6月, 2014 12 次提交
-
-
由 Igor Mammedov 提交于
Hang is observed on virtual machines during CPU hotplug, especially in big guests with many CPUs. (It reproducible more often if host is over-committed). It happens because master CPU gives up waiting on secondary CPU and allows it to run wild. As result AP causes locking or crashing system. For example as described here: https://lkml.org/lkml/2014/3/6/257 If master CPU have sent STARTUP IPI successfully, and AP signalled to master CPU that it's ready to start initialization, make master CPU wait indefinitely till AP is onlined. To ensure that AP won't ever run wild, make it wait at early startup till master CPU confirms its intention to wait for AP. If AP doesn't respond in 10 seconds, the master CPU will timeout and cancel AP onlining. Signed-off-by: NIgor Mammedov <imammedo@redhat.com> Acked-by: NToshi Kani <toshi.kani@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1401975765-22328-4-git-send-email-imammedo@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Igor Mammedov 提交于
If system is running without debug level logging, it will not log error if do_boot_cpu() failed to wakeup AP. It may lead to silent AP bringup failures at boot time. Change message level to KERN_ERR to make error visible to user as it's done on other architectures. Signed-off-by: NIgor Mammedov <imammedo@redhat.com> Acked-by: NToshi Kani <toshi.kani@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1401975765-22328-3-git-send-email-imammedo@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Igor Mammedov 提交于
currently if AP wake up is failed, master CPU marks AP as not present in do_boot_cpu() by calling set_cpu_present(cpu, false). That leads to following list corruption on the next physical CPU hotplug: [ 418.107336] WARNING: CPU: 1 PID: 45 at lib/list_debug.c:33 __list_add+0xbe/0xd0() [ 418.115268] list_add corruption. prev->next should be next (ffff88003dc57600), but was ffff88003e20c3a0. (prev=ffff88003e20c3a0). [ 418.123693] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT ipt_REJECT cfg80211 xt_conntrack rfkill ee [ 418.138979] CPU: 1 PID: 45 Comm: kworker/u10:1 Not tainted 3.14.0-rc6+ #387 [ 418.149989] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 [ 418.165750] Workqueue: kacpi_hotplug acpi_hotplug_work_fn [ 418.166433] 0000000000000021 ffff880038ca7988 ffffffff8159b22d 0000000000000021 [ 418.176460] ffff880038ca79d8 ffff880038ca79c8 ffffffff8106942c ffff880038ca79e8 [ 418.177453] ffff88003e20c3a0 ffff88003dc57600 ffff88003e20c3a0 00000000ffffffea [ 418.178445] Call Trace: [ 418.185811] [<ffffffff8159b22d>] dump_stack+0x49/0x5c [ 418.186440] [<ffffffff8106942c>] warn_slowpath_common+0x8c/0xc0 [ 418.187192] [<ffffffff81069516>] warn_slowpath_fmt+0x46/0x50 [ 418.191231] [<ffffffff8136ef51>] ? acpi_ns_get_node+0xb7/0xc7 [ 418.193889] [<ffffffff812f796e>] __list_add+0xbe/0xd0 [ 418.196649] [<ffffffff812e2aa9>] kobject_add_internal+0x79/0x200 [ 418.208610] [<ffffffff812e2e18>] kobject_add_varg+0x38/0x60 [ 418.213831] [<ffffffff812e2ef4>] kobject_add+0x44/0x70 [ 418.229961] [<ffffffff813e2c60>] device_add+0xd0/0x550 [ 418.234991] [<ffffffff813f0e95>] ? pm_runtime_init+0xe5/0xf0 [ 418.250226] [<ffffffff813e32be>] device_register+0x1e/0x30 [ 418.255296] [<ffffffff813e82a3>] register_cpu+0xe3/0x130 [ 418.266539] [<ffffffff81592be5>] arch_register_cpu+0x65/0x150 [ 418.285845] [<ffffffff81355c0d>] acpi_processor_hotadd_init+0x5a/0x9b ... Which is caused by the fact that generic_processor_info() allocates logical CPU id by calling: cpu = cpumask_next_zero(-1, cpu_present_mask); which returns id of previously failed to wake up CPU, since its bit is cleared by do_boot_cpu() and as result register_cpu() tries to register another CPU with the same id as already present but failed to be onlined CPU. Taking in account that AP will not do anything if master CPU failed to wake it up, there is no reason to mark that AP as not present and break next cpu hotplug attempts. As a side effect of not marking AP as not present, user would be allowed to online it again later. Also fix memory corruption in acpi_unmap_lsapic() if during CPU hotplug master CPU failed to wake up AP it set percpu x86_cpu_to_apicid to BAD_APICID=0xFFFF for AP. However following attempt to unplug that CPU will lead to out of bound write access to __apicid_to_node[] which is 32768 items long on x86_64 kernel. So with above fix of cpu_present_mask make sure that a present CPU has a valid APIC ID by not setting x86_cpu_to_apicid to BAD_APICID in do_boot_cpu() on failure and allow acpi_processor_remove()->acpi_unmap_lsapic() cleanly remove CPU. Signed-off-by: NIgor Mammedov <imammedo@redhat.com> Acked-by: NToshi Kani <toshi.kani@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1401975765-22328-2-git-send-email-imammedo@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Oleg Nesterov 提交于
Purely cosmetic, no changes in .o, 1. As Jim pointed out arch_uprobe->def looks ambiguous, rename it to ->defparam. 2. Add the comment into default_post_xol_op() to explain "regs->sp +=". 3. Remove the stale part of the comment in arch_uprobe_analyze_insn(). Suggested-by: NJim Keniston <jkenisto@us.ibm.com> Reviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: NOleg Nesterov <oleg@redhat.com>
-
由 Anshuman Khandual 提交于
This patch adds conditional branch filtering support, enabling it for PERF_SAMPLE_BRANCH_COND in perf branch stack sampling framework by utilizing an available software filter X86_BR_JCC. Signed-off-by: NAnshuman Khandual <khandual@linux.vnet.ibm.com> Reviewed-by: NStephane Eranian <eranian@google.com> Reviewed-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: mpe@ellerman.id.au Cc: benh@kernel.crashing.org Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1400743210-32289-3-git-send-email-khandual@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Vince Weaver 提交于
Make the x86 perf code use the new common PMU interrupt disabled code. Typically most x86 machines have working PMU interrupts, although some older p6-class machines had this problem. Signed-off-by: NVince Weaver <vincent.weaver@maine.edu> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1405161715560.11099@vincent-weaver-1.umelst.maine.eduSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Borislav Petkov 提交于
... instead of naked numbers. Stuff in sysrq.c used to set it to 8 which is supposed to mean above default level so set it to DEBUG instead as we're terminating/killing all tasks and we want to be verbose there. Also, correct the check in x86_64_start_kernel which should be >= as we're clearly issuing the string there for all debug levels, not only the magical 10. Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NKees Cook <keescook@chromium.org> Acked-by: NRandy Dunlap <rdunlap@infradead.org> Cc: Joe Perches <joe@perches.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Chen Yucong 提交于
Remove an unused global variable mce_entry and relative operations in do_machine_check(). Signed-off-by: NChen Yucong <slaoub@gmail.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Akinobu Mita 提交于
dma_generic_alloc_coherent() firstly attempts to allocate by dma_alloc_from_contiguous() if CONFIG_DMA_CMA is enabled. But the memory region allocated by it may not fit within the device's DMA mask. This change makes it fall back to usual alloc_pages_node() allocation for such cases. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Don Dutile <ddutile@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Akinobu Mita 提交于
Currently, "cma=" kernel parameter is used to specify the size of CMA, but we can't specify where it is located. We want to locate CMA below 4GB for devices only supporting 32-bit addressing on 64-bit systems without iommu. This enables to specify the placement of CMA by extending "cma=" kernel parameter. Examples: 1. locate 64MB CMA below 4GB by "cma=64M@0-4G" 2. locate 64MB CMA exact at 512MB by "cma=64M@512M" Note that the DMA contiguous memory allocator on x86 assumes that page_address() works for the pages to allocate. So this change requires to limit end address of contiguous memory area upto max_pfn_mapped to prevent from locating it on highmem area by the argument of dma_contiguous_reserve(). Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Don Dutile <ddutile@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Akinobu Mita 提交于
The DMA Contiguous Memory Allocator support on x86 is disabled when swiotlb config option is enabled. So DMA CMA is always disabled on x86_64 because swiotlb is always enabled. This attempts to support for DMA CMA with enabling swiotlb config option. The contiguous memory allocator on x86 is integrated in the function dma_generic_alloc_coherent() which is .alloc callback in nommu_dma_ops for dma_alloc_coherent(). x86_swiotlb_alloc_coherent() which is .alloc callback in swiotlb_dma_ops tries to allocate with dma_generic_alloc_coherent() firstly and then swiotlb_alloc_coherent() is called as a fallback. The main part of supporting DMA CMA with swiotlb is that changing x86_swiotlb_free_coherent() which is .free callback in swiotlb_dma_ops for dma_free_coherent() so that it can distinguish memory allocated by dma_generic_alloc_coherent() from one allocated by swiotlb_alloc_coherent() and release it with dma_generic_free_coherent() which can handle contiguous memory. This change requires making is_swiotlb_buffer() global function. This also needs to change .free callback in the dma_map_ops for amd_gart and sta2x11, because these dma_ops are also using dma_generic_alloc_coherent(). Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Acked-by: NMarek Szyprowski <m.szyprowski@samsung.com> Acked-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Don Dutile <ddutile@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Akinobu Mita 提交于
This patchset enhances the DMA Contiguous Memory Allocator on x86. Currently the DMA CMA is only supported with pci-nommu dma_map_ops and furthermore it can't be enabled on x86_64. But I would like to allocate big contiguous memory with dma_alloc_coherent() and tell it to the device that requires it, regardless of which dma mapping implementation is actually used in the system. So this makes it work with swiotlb and intel-iommu dma_map_ops, too. And this also extends "cma=" kernel parameter to specify placement constraint by the physical address range of memory allocations. For example, CMA allocates memory below 4GB by "cma=64M@0-4G", it is required for the devices only supporting 32-bit addressing on 64-bit systems without iommu. This patch (of 5): Calling dma_alloc_coherent() with __GFP_ZERO must return zeroed memory. But when the contiguous memory allocator (CMA) is enabled on x86 and the memory region is allocated by dma_alloc_from_contiguous(), it doesn't return zeroed memory. Because dma_generic_alloc_coherent() forgot to fill the memory region with zero if it was allocated by dma_alloc_from_contiguous() Most implementations of dma_alloc_coherent() return zeroed memory regardless of whether __GFP_ZERO is specified. So this fixes it by unconditionally zeroing the allocated memory region. Alternatively, we could fix dma_alloc_from_contiguous() to return zeroed out memory and remove memset() from all caller of it. But we can't simply remove the memset on arm because __dma_clear_buffer() is used there for ensuring cache flushing and it is used in many places. Of course we can do redundant memset in dma_alloc_from_contiguous(), but I think this patch is less impact for fixing this problem. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Don Dutile <ddutile@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 6月, 2014 2 次提交
-
-
由 Yinghai Lu 提交于
check_irq_vectors_for_cpu_disable() can overestimate the number of available interrupt vectors, so the check for cpu down succeeds, but the actual cpu removal fails. It iterates from FIRST_EXTERNAL_VECTOR to NR_VECTORS, which is wrong because the systems vectors are not taken into account. Limit the search to first_system_vector instead of NR_VECTORS. The second indicator for vector availability the used_vectors bitmap is not taken into account at all. So system vectors, e.g. IA32_SYSCALL_VECTOR (0x80) and IRQ_MOVE_CLEANUP_VECTOR (0x20), are accounted as available. Add a check for the used_vectors bitmap and do not account vectors which are marked there. [ tglx: Simplified code. Rewrote changelog and code comments. ] Signed-off-by: NYinghai Lu <yinghai@kernel.org> Acked-by: NPrarit Bhargava <prarit@redhat.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Steven Rostedt (Red Hat) <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Elliott, Robert (Server Storage)" <Elliott@hp.com> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/1400160305-17774-2-git-send-email-prarit@redhat.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Petr Mladek 提交于
I just went over this when looking at some Xen-related ftrace initialization problems. They were related to Xen code that is not upstream but this clean up would make sense here. I think that this was already the intention when text_ip_addr() was introduced in the commit 87fbb2ac (ftrace/x86: Use breakpoints for converting function graph caller). Anyway, better do it now before it shots people into their leg ;-) Link: http://lkml.kernel.org/p/1401812601-2359-1-git-send-email-pmladek@suse.czSigned-off-by: NPetr Mladek <pmladek@suse.cz> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 31 5月, 2014 2 次提交
-
-
由 Borislav Petkov 提交于
There is very little and maybe practically nothing we can do to recover from a system where at least one core has reached a timeout during the whole monarch cores gathering. So panic when that happens. Link: http://lkml.kernel.org/r/20140523091041.GA21332@pd.tnicSigned-off-by: NBorislav Petkov <bp@suse.de>
-
由 Mathieu Souchaud 提交于
Check return code of every function called by mcheck_init_device(). Signed-off-by: NMathieu Souchaud <mattieu.souchaud@free.fr> Link: http://lkml.kernel.org/r/1399151031-19905-1-git-send-email-mattieu.souchaud@free.frSigned-off-by: NBorislav Petkov <bp@suse.de>
-
- 24 5月, 2014 2 次提交
-
-
由 Bjorn Helgaas 提交于
Print the AGP bridge info the same way as the rest of the kernel, e.g., "0000:00:04.0" instead of "00:04:00". Also print the AGP aperture address range the same way we print resources, and label it explicitly as a bus address range. No functional change except the message changes. Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
-
由 Bjorn Helgaas 提交于
Replace printk() with pr_info(), pr_err(), etc. Define pr_fmt() to prefix output with "AGP: ". No functional change except the addition of "AGP: " prefix in dmesg output. Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
-
- 22 5月, 2014 5 次提交
-
-
由 Dave Hansen 提交于
I noticed on some of my systems that page fault tracing doesn't work: cd /sys/kernel/debug/tracing echo 1 > events/exceptions/enable cat trace; # nothing shows up I eventually traced it down to CONFIG_KVM_GUEST. At least in a KVM VM, enabling that option breaks page fault tracing, and disabling fixes it. I tried on some old kernels and this does not appear to be a regression: it never worked. There are two page-fault entry functions today. One when tracing is on and another when it is off. The KVM code calls do_page_fault() directly instead of calling the traced version: > dotraplinkage void __kprobes > do_async_page_fault(struct pt_regs *regs, unsigned long > error_code) > { > enum ctx_state prev_state; > > switch (kvm_read_and_reset_pf_reason()) { > default: > do_page_fault(regs, error_code); > break; > case KVM_PV_REASON_PAGE_NOT_PRESENT: I'm also having problems with the page fault tracing on bare metal (same symptom of no trace output). I'm unsure if it's related. Steven had an alternative to this which has zero overhead when tracing is off where this includes the standard noops even when tracing is disabled. I'm unconvinced that the extra complexity of his apporach: http://lkml.kernel.org/r/20140508194508.561ed220@gandalf.local.home is worth it, expecially considering that the KVM code is already making page fault entry slower here. This solution is dirt-simple. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Gleb Natapov <gleb@redhat.com> Cc: kvm@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Acked-by: N"H. Peter Anvin" <hpa@zytor.com> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Andy Lutomirski 提交于
One more specialized entry function is now gone. Again, this seems to only change line numbers in entry_64.o. Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/f54854f07ff3be8162b166124dbead23feeefe10.1400709717.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Andy Lutomirski 提交于
I haven't touched the device interrupt code, which is different enough that it's probably not worth merging, and I haven't done anything about paranoidzeroentry_ist yet. This appears to produce an entry_64.o file that differs only in the debug info line numbers. Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/e7a6acfb130471700370e77af9e4b4b6ed46f5ef.1400709717.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Andy Lutomirski 提交于
The paranoidzeroentry macros were missing them. I'm not at all convinced that these annotations are correct and/or necessary, but this makes the macros more consistent with each other. Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/10ad65f534f8bc62e77f74fe15f68e8d4a59d8b3.1400709717.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 H. Peter Anvin 提交于
This reverts commit fa81511b in preparation of merging in the proper fix (espfix64). Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 21 5月, 2014 1 次提交
-
-
由 Borislav Petkov 提交于
Add a cmdline param which disables the microcode loader. This is useful mostly in debugging situations where we want to turn off microcode loading, both early from the initrd and late, as a means to be able to rule out its influence on the machine. Signed-off-by: NBorislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1400525957-11525-3-git-send-email-bp@alien8.deSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 19 5月, 2014 1 次提交
-
-
由 Stephane Eranian 提交于
This patch fixes a bug in precise_store_data_hsw() whereby it would set the data source memory level to the wrong value. As per the the SDM Vol 3b Table 18-41 (Layout of Data Linear Address Information in PEBS Record), when status bit 0 is set this is a L1 hit, otherwise this is a L1 miss. This patch encodes the memory level according to the specification. In V2, we added the filtering on the store events. Only the following events produce L1 information: * MEM_UOPS_RETIRED.STLB_MISS_STORES * MEM_UOPS_RETIRED.LOCK_STORES * MEM_UOPS_RETIRED.SPLIT_STORES * MEM_UOPS_RETIRED.ALL_STORES Cc: mingo@elte.hu Cc: acme@ghostprotocols.net Cc: jolsa@redhat.com Cc: jmario@redhat.com Cc: ak@linux.intel.com Tested-and-Reviewed-by: NDon Zickus <dzickus@redhat.com> Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140515155644.GA3884@quadSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 16 5月, 2014 3 次提交
-
-
由 Thomas Gleixner 提交于
That's a leftover from the time where x86 supported SPARSE_IRQ=n. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NGrant Likely <grant.likely@linaro.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/20140507154338.967285614@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
No more users. Remove the cruft Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NGrant Likely <grant.likely@linaro.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/20140507154336.760446122@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
No need to expose this outside of the ioapic code. The dynamic allocations are guaranteed not to happen in the gsi space. See commit 62a08ae2. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NGrant Likely <grant.likely@linaro.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: x86@kernel.org Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20140507154335.959870037@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-