- 31 7月, 2014 3 次提交
-
-
由 Dave Hansen 提交于
If we take the if (end == TLB_FLUSH_ALL || vmflag & VM_HUGETLB) { local_flush_tlb(); goto out; } path out of flush_tlb_mm_range(), we will have flushed the tlb, but not incremented NR_TLB_LOCAL_FLUSH_ALL. This unifies the way out of the function so that we always take a single path when doing a full tlb flush. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/r/20140731154056.FF763B76@viggo.jf.intel.comAcked-by: NRik van Riel <riel@redhat.com> Acked-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Dave Hansen 提交于
I think the flush_tlb_mm_range() code that tries to tune the flush sizes based on the CPU needs to get ripped out for several reasons: 1. It is obviously buggy. It uses mm->total_vm to judge the task's footprint in the TLB. It should certainly be using some measure of RSS, *NOT* ->total_vm since only resident memory can populate the TLB. 2. Haswell, and several other CPUs are missing from the intel_tlb_flushall_shift_set() function. Thus, it has been demonstrated to bitrot quickly in practice. 3. It is plain wrong in my vm: [ 0.037444] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0 [ 0.037444] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0 [ 0.037444] tlb_flushall_shift: 6 Which leads to it to never use invlpg. 4. The assumptions about TLB refill costs are wrong: http://lkml.kernel.org/r/1337782555-8088-3-git-send-email-alex.shi@intel.com (more on this in later patches) 5. I can not reproduce the original data: https://lkml.org/lkml/2012/5/17/59 I believe the sample times were too short. Running the benchmark in a loop yields times that vary quite a bit. Note that this leaves us with a static ceiling of 1 page. This is a conservative, dumb setting, and will be revised in a later patch. This also removes the code which attempts to predict whether we are flushing data or instructions. We expect instruction flushes to be relatively rare and not worth tuning for explicitly. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/r/20140731154055.ABC88E89@viggo.jf.intel.comAcked-by: NRik van Riel <riel@redhat.com> Acked-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Dave Hansen 提交于
The if (cpumask_any_but(mm_cpumask(mm), smp_processor_id()) < nr_cpu_ids) line of code is not exactly the easiest to audit, especially when it ends up at two different indentation levels. This eliminates one of the the copy-n-paste versions. It also gives us a unified exit point for each path through this function. We need this in a minute for our tracepoint. Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/r/20140731154054.44F1CDDC@viggo.jf.intel.comAcked-by: NRik van Riel <riel@redhat.com> Acked-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 12 6月, 2014 1 次提交
-
-
由 Jiri Kosina 提交于
If pagefault triggers due to SMEP triggering, it can't be really easily distinguished from any other oops-causing pagefault, which might lead to quite some confusion when trying to understand the reason for the oops. Print an explanatory message in case the fault happened during instruction fetch for _PAGE_USER page which is present and executable on SMEP-enabled CPUs. This is consistent with what we are doing for NX already; in addition to immediately seeing from the oops what might be happening, it can even easily give a good indication to sysadmins who are carefully monitoring their kernel logs that someone might be trying to pwn them. Signed-off-by: NJiri Kosina <jkosina@suse.cz> Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1406102248490.1321@pobox.suse.czSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
- 09 6月, 2014 1 次提交
-
-
由 Linus Torvalds 提交于
This reverts commit 3e1a878b. It came in very late, and already has one reported failure: Sitsofe reports that the current tree fails to boot on his EeePC, and bisected it down to this. Rather than waste time trying to figure out what's wrong, just revert it. Reported-by: NSitsofe Wheeler <sitsofe@gmail.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 6月, 2014 1 次提交
-
-
由 Matt Fleming 提交于
commit 7d453eee ("x86/efi: Wire up CONFIG_EFI_MIXED") introduced a regression for the functionality to load kernels above 4G. The relevant (incorrect) reasoning behind this change can be seen in the commit message, "The xloadflags field in the bzImage header is also updated to reflect that the kernel supports both entry points by setting both of XLF_EFI_HANDOVER_32 and XLF_EFI_HANDOVER_64 when CONFIG_EFI_MIXED=y. XLF_CAN_BE_LOADED_ABOVE_4G is disabled so that the kernel text is guaranteed to be addressable with 32-bits." This is obviously bogus since 32-bit EFI loaders will never place the kernel above the 4G mark. So this restriction is entirely unnecessary. But things are worse than that - since we want to encourage people to always compile with CONFIG_EFI_MIXED=y so that their kernels work out of the box for both 32-bit and 64-bit firmware, commit 7d453eee effectively disables XLF_CAN_BE_LOADED_ABOVE_4G completely. Remove the overzealous and superfluous restriction and restore the XLF_CAN_BE_LOADED_ABOVE_4G functionality. Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: NMatt Fleming <matt.fleming@intel.com> Link: http://lkml.kernel.org/r/1402140380-15377-1-git-send-email-matt@console-pimps.orgSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
-
- 07 6月, 2014 2 次提交
-
-
由 Oleg Nesterov 提交于
It has no users and it doesn't look useful. I do not know why/when it was introduced, I can't even find any user in the git history. Signed-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Richard Weinberger <richard@nod.at> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 H. Peter Anvin 提交于
There are no standard functions for littleendian data (unlike bigendian data.) Thus, use <tools/le_byteshift.h> to access littleendian data members. Those are fairly inefficient, but it doesn't matter for this purpose (and can be optimized later.) This avoids portability problems. Reported-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com> Tested-by: NAndy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/20140606140017.afb7f91142f66cb3dd13c186@linux-foundation.org
-
- 05 6月, 2014 17 次提交
-
-
由 Igor Mammedov 提交于
Hang is observed on virtual machines during CPU hotplug, especially in big guests with many CPUs. (It reproducible more often if host is over-committed). It happens because master CPU gives up waiting on secondary CPU and allows it to run wild. As result AP causes locking or crashing system. For example as described here: https://lkml.org/lkml/2014/3/6/257 If master CPU have sent STARTUP IPI successfully, and AP signalled to master CPU that it's ready to start initialization, make master CPU wait indefinitely till AP is onlined. To ensure that AP won't ever run wild, make it wait at early startup till master CPU confirms its intention to wait for AP. If AP doesn't respond in 10 seconds, the master CPU will timeout and cancel AP onlining. Signed-off-by: NIgor Mammedov <imammedo@redhat.com> Acked-by: NToshi Kani <toshi.kani@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1401975765-22328-4-git-send-email-imammedo@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Igor Mammedov 提交于
If system is running without debug level logging, it will not log error if do_boot_cpu() failed to wakeup AP. It may lead to silent AP bringup failures at boot time. Change message level to KERN_ERR to make error visible to user as it's done on other architectures. Signed-off-by: NIgor Mammedov <imammedo@redhat.com> Acked-by: NToshi Kani <toshi.kani@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1401975765-22328-3-git-send-email-imammedo@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Igor Mammedov 提交于
currently if AP wake up is failed, master CPU marks AP as not present in do_boot_cpu() by calling set_cpu_present(cpu, false). That leads to following list corruption on the next physical CPU hotplug: [ 418.107336] WARNING: CPU: 1 PID: 45 at lib/list_debug.c:33 __list_add+0xbe/0xd0() [ 418.115268] list_add corruption. prev->next should be next (ffff88003dc57600), but was ffff88003e20c3a0. (prev=ffff88003e20c3a0). [ 418.123693] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT ipt_REJECT cfg80211 xt_conntrack rfkill ee [ 418.138979] CPU: 1 PID: 45 Comm: kworker/u10:1 Not tainted 3.14.0-rc6+ #387 [ 418.149989] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 [ 418.165750] Workqueue: kacpi_hotplug acpi_hotplug_work_fn [ 418.166433] 0000000000000021 ffff880038ca7988 ffffffff8159b22d 0000000000000021 [ 418.176460] ffff880038ca79d8 ffff880038ca79c8 ffffffff8106942c ffff880038ca79e8 [ 418.177453] ffff88003e20c3a0 ffff88003dc57600 ffff88003e20c3a0 00000000ffffffea [ 418.178445] Call Trace: [ 418.185811] [<ffffffff8159b22d>] dump_stack+0x49/0x5c [ 418.186440] [<ffffffff8106942c>] warn_slowpath_common+0x8c/0xc0 [ 418.187192] [<ffffffff81069516>] warn_slowpath_fmt+0x46/0x50 [ 418.191231] [<ffffffff8136ef51>] ? acpi_ns_get_node+0xb7/0xc7 [ 418.193889] [<ffffffff812f796e>] __list_add+0xbe/0xd0 [ 418.196649] [<ffffffff812e2aa9>] kobject_add_internal+0x79/0x200 [ 418.208610] [<ffffffff812e2e18>] kobject_add_varg+0x38/0x60 [ 418.213831] [<ffffffff812e2ef4>] kobject_add+0x44/0x70 [ 418.229961] [<ffffffff813e2c60>] device_add+0xd0/0x550 [ 418.234991] [<ffffffff813f0e95>] ? pm_runtime_init+0xe5/0xf0 [ 418.250226] [<ffffffff813e32be>] device_register+0x1e/0x30 [ 418.255296] [<ffffffff813e82a3>] register_cpu+0xe3/0x130 [ 418.266539] [<ffffffff81592be5>] arch_register_cpu+0x65/0x150 [ 418.285845] [<ffffffff81355c0d>] acpi_processor_hotadd_init+0x5a/0x9b ... Which is caused by the fact that generic_processor_info() allocates logical CPU id by calling: cpu = cpumask_next_zero(-1, cpu_present_mask); which returns id of previously failed to wake up CPU, since its bit is cleared by do_boot_cpu() and as result register_cpu() tries to register another CPU with the same id as already present but failed to be onlined CPU. Taking in account that AP will not do anything if master CPU failed to wake it up, there is no reason to mark that AP as not present and break next cpu hotplug attempts. As a side effect of not marking AP as not present, user would be allowed to online it again later. Also fix memory corruption in acpi_unmap_lsapic() if during CPU hotplug master CPU failed to wake up AP it set percpu x86_cpu_to_apicid to BAD_APICID=0xFFFF for AP. However following attempt to unplug that CPU will lead to out of bound write access to __apicid_to_node[] which is 32768 items long on x86_64 kernel. So with above fix of cpu_present_mask make sure that a present CPU has a valid APIC ID by not setting x86_cpu_to_apicid to BAD_APICID in do_boot_cpu() on failure and allow acpi_processor_remove()->acpi_unmap_lsapic() cleanly remove CPU. Signed-off-by: NIgor Mammedov <imammedo@redhat.com> Acked-by: NToshi Kani <toshi.kani@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1401975765-22328-2-git-send-email-imammedo@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Borislav Petkov 提交于
... instead of naked numbers. Stuff in sysrq.c used to set it to 8 which is supposed to mean above default level so set it to DEBUG instead as we're terminating/killing all tasks and we want to be verbose there. Also, correct the check in x86_64_start_kernel which should be >= as we're clearly issuing the string there for all debug levels, not only the magical 10. Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NKees Cook <keescook@chromium.org> Acked-by: NRandy Dunlap <rdunlap@infradead.org> Cc: Joe Perches <joe@perches.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Fabian Frederick 提交于
sys_sgetmask and sys_ssetmask are obsolete system calls no longer supported in libc. This patch replaces architecture related __ARCH_WANT_SYS_SGETMAX by expert mode configuration.That option is enabled by default for those architectures. Signed-off-by: NFabian Frederick <fabf@skynet.be> Cc: Steven Miao <realmz6@gmail.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Michal Simek <monstr@monstr.eu> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Ungerer <gerg@uclinux.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Chen Yucong 提交于
Remove an unused global variable mce_entry and relative operations in do_machine_check(). Signed-off-by: NChen Yucong <slaoub@gmail.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Emil Medve 提交于
Signed-off-by: NEmil Medve <Emilian.Medve@Freescale.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cyrill Gorcunov 提交于
Tracking dirty status on 2 level pages requires very ugly macros and taking into account how old the machines who can operate without PAE mode only are, lets drop soft dirty tracker from them for code simplicity (note I can't drop all the macros from 2 level pages by now since _PAGE_BIT_PROTNONE and _PAGE_BIT_FILE are still used even without tracker). Linus proposed to completely rip off softdirty support on x86-32 (even with PAE) and since for CRIU we're not planning to support native x86-32 mode, lets do that. (Softdirty tracker is relatively new feature which is mostly used by CRIU so I don't expect if such API change would cause problems for userspace). Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Noonan <steven@uplinklabs.net> Cc: Rik van Riel <riel@redhat.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Cyrill Gorcunov 提交于
_PAGE_BIT_FILE (bit 6) is always less than _PAGE_BIT_PROTNONE (bit 8), so drop redundant #ifdef. Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Steven Noonan <steven@uplinklabs.net> Cc: Rik van Riel <riel@redhat.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Akinobu Mita 提交于
dma_generic_alloc_coherent() firstly attempts to allocate by dma_alloc_from_contiguous() if CONFIG_DMA_CMA is enabled. But the memory region allocated by it may not fit within the device's DMA mask. This change makes it fall back to usual alloc_pages_node() allocation for such cases. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Don Dutile <ddutile@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <andi@firstfloor.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Akinobu Mita 提交于
Currently, "cma=" kernel parameter is used to specify the size of CMA, but we can't specify where it is located. We want to locate CMA below 4GB for devices only supporting 32-bit addressing on 64-bit systems without iommu. This enables to specify the placement of CMA by extending "cma=" kernel parameter. Examples: 1. locate 64MB CMA below 4GB by "cma=64M@0-4G" 2. locate 64MB CMA exact at 512MB by "cma=64M@512M" Note that the DMA contiguous memory allocator on x86 assumes that page_address() works for the pages to allocate. So this change requires to limit end address of contiguous memory area upto max_pfn_mapped to prevent from locating it on highmem area by the argument of dma_contiguous_reserve(). Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Don Dutile <ddutile@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Akinobu Mita 提交于
The DMA Contiguous Memory Allocator support on x86 is disabled when swiotlb config option is enabled. So DMA CMA is always disabled on x86_64 because swiotlb is always enabled. This attempts to support for DMA CMA with enabling swiotlb config option. The contiguous memory allocator on x86 is integrated in the function dma_generic_alloc_coherent() which is .alloc callback in nommu_dma_ops for dma_alloc_coherent(). x86_swiotlb_alloc_coherent() which is .alloc callback in swiotlb_dma_ops tries to allocate with dma_generic_alloc_coherent() firstly and then swiotlb_alloc_coherent() is called as a fallback. The main part of supporting DMA CMA with swiotlb is that changing x86_swiotlb_free_coherent() which is .free callback in swiotlb_dma_ops for dma_free_coherent() so that it can distinguish memory allocated by dma_generic_alloc_coherent() from one allocated by swiotlb_alloc_coherent() and release it with dma_generic_free_coherent() which can handle contiguous memory. This change requires making is_swiotlb_buffer() global function. This also needs to change .free callback in the dma_map_ops for amd_gart and sta2x11, because these dma_ops are also using dma_generic_alloc_coherent(). Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Acked-by: NMarek Szyprowski <m.szyprowski@samsung.com> Acked-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Don Dutile <ddutile@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Akinobu Mita 提交于
This patchset enhances the DMA Contiguous Memory Allocator on x86. Currently the DMA CMA is only supported with pci-nommu dma_map_ops and furthermore it can't be enabled on x86_64. But I would like to allocate big contiguous memory with dma_alloc_coherent() and tell it to the device that requires it, regardless of which dma mapping implementation is actually used in the system. So this makes it work with swiotlb and intel-iommu dma_map_ops, too. And this also extends "cma=" kernel parameter to specify placement constraint by the physical address range of memory allocations. For example, CMA allocates memory below 4GB by "cma=64M@0-4G", it is required for the devices only supporting 32-bit addressing on 64-bit systems without iommu. This patch (of 5): Calling dma_alloc_coherent() with __GFP_ZERO must return zeroed memory. But when the contiguous memory allocator (CMA) is enabled on x86 and the memory region is allocated by dma_alloc_from_contiguous(), it doesn't return zeroed memory. Because dma_generic_alloc_coherent() forgot to fill the memory region with zero if it was allocated by dma_alloc_from_contiguous() Most implementations of dma_alloc_coherent() return zeroed memory regardless of whether __GFP_ZERO is specified. So this fixes it by unconditionally zeroing the allocated memory region. Alternatively, we could fix dma_alloc_from_contiguous() to return zeroed out memory and remove memset() from all caller of it. But we can't simply remove the memset on arm because __dma_clear_buffer() is used there for ensuring cache flushing and it is used in many places. Of course we can do redundant memset in dma_alloc_from_contiguous(), but I think this patch is less impact for fixing this problem. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: Marek Szyprowski <m.szyprowski@samsung.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Don Dutile <ddutile@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yinghai Lu 提交于
On system with 2TiB ram, current x86_64 have 128M as section size, and one memory_block only include one section. So will have 16400 entries under /sys/devices/system/memory/. Current code try to use block id to find block pointer in /sys for any section, and reuse that block pointer. that finding will take some time even after commit 7c243c71 ("mm: speedup in __early_pfn_to_nid") that will skip the search in that case during booting up. So solution could be increase block size just like SGI UV system did. (harded code to 2g). This patch is trying to probe the block size to make it match mmio remap size. for example, Intel Nehalem later system will have memory range [0, TOML), [4g, TOMH]. If the memory hole is 2g and total is 128g, TOM will be 2g, and TOM2 will be 130g. We could use 2g as block size instead of default 128M. That will reduce number of entries in /sys/devices/system/memory/ On system 6TiB system will reduce boot time by 35 seconds. Signed-off-by: NYinghai Lu <yinghai@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
_PAGE_NUMA is currently an alias of _PROT_PROTNONE to trap NUMA hinting faults on x86. Care is taken such that _PAGE_NUMA is used only in situations where the VMA flags distinguish between NUMA hinting faults and prot_none faults. This decision was x86-specific and conceptually it is difficult requiring special casing to distinguish between PROTNONE and NUMA ptes based on context. Fundamentally, we only need the _PAGE_NUMA bit to tell the difference between an entry that is really unmapped and a page that is protected for NUMA hinting faults as if the PTE is not present then a fault will be trapped. Swap PTEs on x86-64 use the bits after _PAGE_GLOBAL for the offset. This patch shrinks the maximum possible swap size and uses the bit to uniquely distinguish between NUMA hinting ptes and swap ptes. Signed-off-by: NMel Gorman <mgorman@suse.de> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Noonan <steven@uplinklabs.net> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
32-bit support for NUMA is an oddity on its own but with automatic NUMA balancing on top there is a reasonable risk that the CPUPID information cannot be stored in the page flags. This patch removes support for automatic NUMA support on 32-bit x86. Signed-off-by: NMel Gorman <mgorman@suse.de> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Noonan <steven@uplinklabs.net> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Naoya Horiguchi 提交于
Currently hugepage migration is available for all archs which support pmd-level hugepage, but testing is done only for x86_64 and there're bugs for other archs. So to avoid breaking such archs, this patch limits the availability strictly to x86_64 until developers of other archs get interested in enabling this feature. Simply disabling hugepage migration on non-x86_64 archs is not enough to fix the reported problem where sys_move_pages() hits the BUG_ON() in follow_page(FOLL_GET), so let's fix this by checking if hugepage migration is supported in vma_migratable(). Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Reported-by: NMichael Ellerman <mpe@ellerman.id.au> Tested-by: NMichael Ellerman <mpe@ellerman.id.au> Acked-by: NHugh Dickins <hughd@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: <stable@vger.kernel.org> [3.12+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 6月, 2014 1 次提交
-
-
由 Yinghai Lu 提交于
check_irq_vectors_for_cpu_disable() can overestimate the number of available interrupt vectors, so the check for cpu down succeeds, but the actual cpu removal fails. It iterates from FIRST_EXTERNAL_VECTOR to NR_VECTORS, which is wrong because the systems vectors are not taken into account. Limit the search to first_system_vector instead of NR_VECTORS. The second indicator for vector availability the used_vectors bitmap is not taken into account at all. So system vectors, e.g. IA32_SYSCALL_VECTOR (0x80) and IRQ_MOVE_CLEANUP_VECTOR (0x20), are accounted as available. Add a check for the used_vectors bitmap and do not account vectors which are marked there. [ tglx: Simplified code. Rewrote changelog and code comments. ] Signed-off-by: NYinghai Lu <yinghai@kernel.org> Acked-by: NPrarit Bhargava <prarit@redhat.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Steven Rostedt (Red Hat) <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Elliott, Robert (Server Storage)" <Elliott@hp.com> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/1400160305-17774-2-git-send-email-prarit@redhat.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 02 6月, 2014 1 次提交
-
-
由 Dave Young 提交于
For ioremapped efi memory aka old_map the virt addresses are not persistant across kexec reboot. kexec-tools will read the runtime maps from sysfs then pass them to 2nd kernel and assuming kexec efi boot is ok. This will cause kexec boot failure. To address this issue do not export runtime maps in case efi old_map so userspace can use no efi boot instead. Signed-off-by: NDave Young <dyoung@redhat.com> Acked-by: NBorislav Petkov <bp@suse.de> Acked-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
-
- 31 5月, 2014 6 次提交
-
-
由 H. Peter Anvin 提交于
Make it a little clearer what the littleendian access macros in vdso2c.[ch] actually do. This way they can probably also be moved to a central location (e.g. tools/include) for the benefit of other host tools. We should avoid implementation namespace symbols when writing code that is compiling for the compiler host, so avoid names starting with double underscore or underscore-capital. Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/2cf258df123cb24bad63c274c8563c050547d99d.1401464755.git.luto@amacapital.net
-
由 Andy Lutomirski 提交于
This adds a macro GET(x) to convert x from big-endian to little-endian. Hopefully I put it everywhere it needs to go and got all the cases needed for everyone's linux/elf.h. Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/2cf258df123cb24bad63c274c8563c050547d99d.1401464755.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Andy Lutomirski 提交于
This avoids bizarre failures if make is run again. Signed-off-by: NAndy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/1764385fe9931e8940b9d001132515448ea89523.1401464755.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
-
由 Borislav Petkov 提交于
There is very little and maybe practically nothing we can do to recover from a system where at least one core has reached a timeout during the whole monarch cores gathering. So panic when that happens. Link: http://lkml.kernel.org/r/20140523091041.GA21332@pd.tnicSigned-off-by: NBorislav Petkov <bp@suse.de>
-
由 Mathieu Souchaud 提交于
Check return code of every function called by mcheck_init_device(). Signed-off-by: NMathieu Souchaud <mattieu.souchaud@free.fr> Link: http://lkml.kernel.org/r/1399151031-19905-1-git-send-email-mattieu.souchaud@free.frSigned-off-by: NBorislav Petkov <bp@suse.de>
-
由 Minchan Kim 提交于
While I play inhouse patches with much memory pressure on qemu-kvm, 3.14 kernel was randomly crashed. The reason was kernel stack overflow. When I investigated the problem, the callstack was a little bit deeper by involve with reclaim functions but not direct reclaim path. I tried to diet stack size of some functions related with alloc/reclaim so did a hundred of byte but overflow was't disappeard so that I encounter overflow by another deeper callstack on reclaim/allocator path. Of course, we might sweep every sites we have found for reducing stack usage but I'm not sure how long it saves the world(surely, lots of developer start to add nice features which will use stack agains) and if we consider another more complex feature in I/O layer and/or reclaim path, it might be better to increase stack size( meanwhile, stack usage on 64bit machine was doubled compared to 32bit while it have sticked to 8K. Hmm, it's not a fair to me and arm64 already expaned to 16K. ) So, my stupid idea is just let's expand stack size and keep an eye toward stack consumption on each kernel functions via stacktrace of ftrace. For example, we can have a bar like that each funcion shouldn't exceed 200K and emit the warning when some function consumes more in runtime. Of course, it could make false positive but at least, it could make a chance to think over it. I guess this topic was discussed several time so there might be strong reason not to increase kernel stack size on x86_64, for me not knowing so Ccing x86_64 maintainers, other MM guys and virtio maintainers. Here's an example call trace using up the kernel stack: Depth Size Location (51 entries) ----- ---- -------- 0) 7696 16 lookup_address 1) 7680 16 _lookup_address_cpa.isra.3 2) 7664 24 __change_page_attr_set_clr 3) 7640 392 kernel_map_pages 4) 7248 256 get_page_from_freelist 5) 6992 352 __alloc_pages_nodemask 6) 6640 8 alloc_pages_current 7) 6632 168 new_slab 8) 6464 8 __slab_alloc 9) 6456 80 __kmalloc 10) 6376 376 vring_add_indirect 11) 6000 144 virtqueue_add_sgs 12) 5856 288 __virtblk_add_req 13) 5568 96 virtio_queue_rq 14) 5472 128 __blk_mq_run_hw_queue 15) 5344 16 blk_mq_run_hw_queue 16) 5328 96 blk_mq_insert_requests 17) 5232 112 blk_mq_flush_plug_list 18) 5120 112 blk_flush_plug_list 19) 5008 64 io_schedule_timeout 20) 4944 128 mempool_alloc 21) 4816 96 bio_alloc_bioset 22) 4720 48 get_swap_bio 23) 4672 160 __swap_writepage 24) 4512 32 swap_writepage 25) 4480 320 shrink_page_list 26) 4160 208 shrink_inactive_list 27) 3952 304 shrink_lruvec 28) 3648 80 shrink_zone 29) 3568 128 do_try_to_free_pages 30) 3440 208 try_to_free_pages 31) 3232 352 __alloc_pages_nodemask 32) 2880 8 alloc_pages_current 33) 2872 200 __page_cache_alloc 34) 2672 80 find_or_create_page 35) 2592 80 ext4_mb_load_buddy 36) 2512 176 ext4_mb_regular_allocator 37) 2336 128 ext4_mb_new_blocks 38) 2208 256 ext4_ext_map_blocks 39) 1952 160 ext4_map_blocks 40) 1792 384 ext4_writepages 41) 1408 16 do_writepages 42) 1392 96 __writeback_single_inode 43) 1296 176 writeback_sb_inodes 44) 1120 80 __writeback_inodes_wb 45) 1040 160 wb_writeback 46) 880 208 bdi_writeback_workfn 47) 672 144 process_one_work 48) 528 112 worker_thread 49) 416 240 kthread 50) 176 176 ret_from_fork [ Note: the problem is exacerbated by certain gcc versions that seem to generate much bigger stack frames due to apparently bad coalescing of temporaries and generating too many spills. Rusty saw gcc-4.6.4 using 35% more stack on the virtio path than 4.8.2 does, for example. Minchan not only uses such a bad gcc version (4.6.3 in his case), but some of the stack use is due to debugging (CONFIG_DEBUG_PAGEALLOC is what causes that kernel_map_pages() frame, for example). But we're clearly getting too close. The VM code also seems to have excessive stack frames partly for the same compiler reason, triggered by excessive inlining and lots of function arguments. We need to improve on our stack use, but in the meantime let's do this simple stack increase too. Unlike most earlier reports, there is nothing simple that stands out as being really horribly wrong here, apart from the fact that the stack frames are just bigger than they should need to be. - Linus ] Signed-off-by: NMinchan Kim <minchan@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Jones <davej@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S Tsirkin <mst@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: PJ Waskiewicz <pjwaskiewicz@gmail.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 5月, 2014 1 次提交
-
-
由 Greg Kroah-Hartman 提交于
Now that CONFIG_USB_DEBUG is gone, remove it from a number of defconfig files that were enabling it. Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 28 5月, 2014 5 次提交
-
-
由 Hanjun Guo 提交于
pcibios_penalize_isa_irq() is only implemented by x86 now, and legacy ISA is not used by some architectures. Make pcibios_penalize_isa_irq() a __weak function to simplify the code. This removes the need for new platforms to add stub implementations of pcibios_penalize_isa_irq(). [bhelgaas: changelog, comments] Signed-off-by: NHanjun Guo <hanjun.guo@linaro.org> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Acked-by: NArnd Bergmann <arnd@arndb.de>
-
由 Yijing Wang 提交于
Use pci_is_bridge() to simplify code. No functional change. Requires: 326c1cda PCI: Rename pci_is_bridge() to pci_has_subordinate() Requires: 1c86438c PCI: Add new pci_is_bridge() interface Signed-off-by: NYijing Wang <wangyijing@huawei.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
-
由 Suravee Suthikulpanit 提交于
early_root_info_init() is now deprecated in favor of info in ACPI. Add a note to that effect. Also, clean up the code a bit. There is no functional change. Signed-off-by: NSuravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
-
由 Lv Zheng 提交于
Since mis-order issues have been solved, we can cleanup redundant definitions that already have defaults in <acpi/platform/acenv.h>. This patch removes redudant environments for __KERNEL__ surrounded code. Signed-off-by: NLv Zheng <lv.zheng@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Lv Zheng 提交于
There is a mis-order inclusion for <asm/acpi.h>. As we will enforce including <linux/acpi.h> for all Linux ACPI users, we can find the inclusion order is as follows: <linux/acpi.h> <acpi/acpi.h> <acpi/platform/acenv.h> (acenv.h before including aclinux.h) <acpi/platform/aclinux.h> ........................................................................... (aclinux.h before including asm/acpi.h) <asm/acpi.h> @Redundant@ (ACPICA specific stuff) ........................................................................... ........................................................................... (Linux ACPI specific stuff) ? - - - - - - - - - - - - + (aclinux.h after including asm/acpi.h) @Invisible@ | (acenv.h after including aclinux.h) @Invisible@ | other ACPICA headers @Invisible@ | ............................................................|.............. <acpi/acpi_bus.h> | <acpi/acpi_drivers.h> | <asm/acpi.h> (Excluded) | (Linux ACPI specific stuff) ! <- - - - - - - - - - - - - + NOTE that, in ACPICA, <acpi/platform/acenv.h> is more like Kconfig generated <generated/autoconf.h> for Linux, it is meant to be included before including any ACPICA code. In the above figure, there is a question mark for "Linux ACPI specific stuff" in <asm/acpi.h> which should be included after including all other ACPICA header files. Thus they really need to be moved to the position marked with exclaimation mark or the definitions in the blocks marked with "@Invisible@" will be invisible to such architecture specific "Linux ACPI specific stuff" header blocks. This leaves 2 issues: 1. All environmental definitions in these blocks should have a copy in the area marked with "@Redundant@" if they are required by the "Linux ACPI specific stuff". 2. We cannot use any ACPICA defined types in <asm/acpi.h>. This patch splits architecture specific ACPICA stuff from <asm/acpi.h> to fix this issue. Signed-off-by: NLv Zheng <lv.zheng@intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 27 5月, 2014 1 次提交
-
-
由 Mukesh Rathor 提交于
When running as a dom0 in PVH mode, foreign pfns that are accessed must be added to our p2m which is managed by xen. This is done via XENMEM_add_to_physmap_range hypercall. This is needed for toolstack building guests and mapping guest memory, xentrace mapping xen pages, etc. Signed-off-by: NMukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
-