- 01 7月, 2015 6 次提交
-
-
由 Josh Triplett 提交于
For 32-bit userspace on a 64-bit kernel, this requires modifying stub32_clone to actually swap the appropriate arguments to match CONFIG_CLONE_BACKWARDS, rather than just leaving the C argument for tls broken. Patch co-authored by Josh Triplett and Thiago Macieira. Signed-off-by: NJosh Triplett <josh@joshtriplett.org> Acked-by: NAndy Lutomirski <luto@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thiago Macieira <thiago.macieira@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Masanari Iida 提交于
Signed-off-by: NMasanari Iida <standby24x7@gmail.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Akinobu Mita 提交于
This replaces the plain loop over the sglist array with for_each_sg() macro which consists of sg_next() function calls. Since arc doesn't select ARCH_HAS_SG_CHAIN, it is not necessary to use for_each_sg() in order to loop over each sg element. But this can help find problems with drivers that do not properly initialize their sg tables when CONFIG_DEBUG_SG is enabled. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Acked-by: NVineet Gupta <vgupta@synopsys.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KarimAllah Ahmed 提交于
Any parameter passed after '--' in the kernel command-line will not be parsed by the kernel at all, instead it will be passed directly to init process. Currently the kernel appends elfcorehdr=<paddr> to the cmdline passed from kexec load, and if this command-line is used to pass parameters to init process this means that 'elfcorehdr' will not be parsed as a kernel parameter at all which will be a problem for vmcore subsystem since it will know nothing about the location of the ELF structure! Prepending 'elfcorehdr' instead of appending it fixes this problem since it ensures that it always comes before '--' and so it's always parsed as a kernel command-line parameter. Even with this patch things can still go wrong if 'CONFIG_CMDLINE' was also used to embedd a command-line to the crash dump kernel and this command-line contains '--' since the current behavior of the kernel is to actually append the boot loader command-line to the embedded command-line. Signed-off-by: NKarimAllah Ahmed <karahmed@amazon.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Acked-by: NVivek Goyal <vgoyal@redhat.com> Cc: Haren Myneni <hbabu@us.ibm.com> Cc: Eric Biederman <ebiederm@xmission.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
Subject says it all. Other architectures may enable on a case-by-case basis after auditing early_pfn_to_nid and testing. Signed-off-by: NMel Gorman <mgorman@suse.de> Tested-by: NNate Zimmer <nzimmer@sgi.com> Tested-by: NWaiman Long <waiman.long@hp.com> Tested-by: NDaniel J Blueman <daniel@numascale.com> Acked-by: NPekka Enberg <penberg@kernel.org> Cc: Robin Holt <robinmholt@gmail.com> Cc: Nate Zimmer <nzimmer@sgi.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Waiman Long <waiman.long@hp.com> Cc: Scott Norton <scott.norton@hp.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
__early_pfn_to_nid() use static variables to cache recent lookups as memblock lookups are very expensive but it assumes that memory initialisation is single-threaded. Parallel initialisation of struct pages will break that assumption so this patch makes __early_pfn_to_nid() SMP-safe by requiring the caller to cache recent search information. early_pfn_to_nid() keeps the same interface but is only safe to use early in boot due to the use of a global static variable. meminit_pfn_in_nid() is an SMP-safe version that callers must maintain their own state for. Signed-off-by: NMel Gorman <mgorman@suse.de> Tested-by: NNate Zimmer <nzimmer@sgi.com> Tested-by: NWaiman Long <waiman.long@hp.com> Tested-by: NDaniel J Blueman <daniel@numascale.com> Acked-by: NPekka Enberg <penberg@kernel.org> Cc: Robin Holt <robinmholt@gmail.com> Cc: Nate Zimmer <nzimmer@sgi.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Waiman Long <waiman.long@hp.com> Cc: Scott Norton <scott.norton@hp.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 6月, 2015 15 次提交
-
-
由 Ross Zwisler 提交于
Based on an original patch by Ross Zwisler [1]. Writes to persistent memory have the potential to be posted to cpu cache, cpu write buffers, and platform write buffers (memory controller) before being committed to persistent media. Provide apis, memcpy_to_pmem(), wmb_pmem(), and memremap_pmem(), to write data to pmem and assert that it is durable in PMEM (a persistent linear address range). A '__pmem' attribute is added so sparse can track proper usage of pointers to pmem. This continues the status quo of pmem being x86 only for 4.2, but reworks to ioremap, and wider implementation of memremap() will enable other archs in 4.3. [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-May/000932.html Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com> [djbw: various reworks] Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Toshi Kani 提交于
ACPI NFIT table has System Physical Address Range Structure entries that describe a proximity ID of each range when ACPI_NFIT_PROXIMITY_VALID is set in the flags. Change acpi_nfit_register_region() to map a proximity ID to its node ID, and set it to a new numa_node field of nd_region_desc, which is then conveyed to the nd_region device. The device core arranges for btt and namespace devices to inherit their node from their parent region. Signed-off-by: NToshi Kani <toshi.kani@hp.com> [djbw: move set_dev_node() from region.c to bus.c] Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Josh Triplett 提交于
clone has some of the quirkiest syscall handling in the kernel, with a pile of special cases, historical curiosities, and architecture-specific calling conventions. In particular, clone with CLONE_SETTLS accepts a parameter "tls" that the C entry point completely ignores and some assembly entry points overwrite; instead, the low-level arch-specific code pulls the tls parameter out of the arch-specific register captured as part of pt_regs on entry to the kernel. That's a massive hack, and it makes the arch-specific code only work when called via the specific existing syscall entry points; because of this hack, any new clone-like system call would have to accept an identical tls argument in exactly the same arch-specific position, rather than providing a unified system call entry point across architectures. The first patch allows architectures to handle the tls argument via normal C parameter passing, if they opt in by selecting HAVE_COPY_THREAD_TLS. The second patch makes 32-bit and 64-bit x86 opt into this. These two patches came out of the clone4 series, which isn't ready for this merge window, but these first two cleanup patches were entirely uncontroversial and have acks. I'd like to go ahead and submit these two so that other architectures can begin building on top of this and opting into HAVE_COPY_THREAD_TLS. However, I'm also happy to wait and send these through the next merge window (along with v3 of clone4) if anyone would prefer that. This patch (of 2): clone with CLONE_SETTLS accepts an argument to set the thread-local storage area for the new thread. sys_clone declares an int argument tls_val in the appropriate point in the argument list (based on the various CLONE_BACKWARDS variants), but doesn't actually use or pass along that argument. Instead, sys_clone calls do_fork, which calls copy_process, which calls the arch-specific copy_thread, and copy_thread pulls the corresponding syscall argument out of the pt_regs captured at kernel entry (knowing what argument of clone that architecture passes tls in). Apart from being awful and inscrutable, that also only works because only one code path into copy_thread can pass the CLONE_SETTLS flag, and that code path comes from sys_clone with its architecture-specific argument-passing order. This prevents introducing a new version of the clone system call without propagating the same architecture-specific position of the tls argument. However, there's no reason to pull the argument out of pt_regs when sys_clone could just pass it down via C function call arguments. Introduce a new CONFIG_HAVE_COPY_THREAD_TLS for architectures to opt into, and a new copy_thread_tls that accepts the tls parameter as an additional unsigned long (syscall-argument-sized) argument. Change sys_clone's tls argument to an unsigned long (which does not change the ABI), and pass that down to copy_thread_tls. Architectures that don't opt into copy_thread_tls will continue to ignore the C argument to sys_clone in favor of the pt_regs captured at kernel entry, and thus will be unable to introduce new versions of the clone syscall. Patch co-authored by Josh Triplett and Thiago Macieira. Signed-off-by: NJosh Triplett <josh@joshtriplett.org> Acked-by: NAndy Lutomirski <luto@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thiago Macieira <thiago.macieira@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Akinobu Mita 提交于
This replaces the plain loop over the sglist array with for_each_sg() macro which consists of sg_next() function calls. Since avr32 doesn't select ARCH_HAS_SG_CHAIN, it is not necessary to use for_each_sg() in order to loop over each sg element. But this can help find problems with drivers that do not properly initialize their sg tables when CONFIG_DEBUG_SG is enabled. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Acked-by: NHans-Christian Egtvedt <egtvedt@samfundet.no> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Akinobu Mita 提交于
This replaces the plain loop over the sglist array with for_each_sg() macro which consists of sg_next() function calls. Since frv doesn't select ARCH_HAS_SG_CHAIN, it is not necessary to use for_each_sg() in order to loop over each sg element. But this can help find problems with drivers that do not properly initialize their sg tables when CONFIG_DEBUG_SG is enabled. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: David Howells <dhowells@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tobias Klauser 提交于
The function is not used anywhere in the tree (anymore) and this is the last remaining instance, so remove it. Signed-off-by: NTobias Klauser <tklauser@distanz.ch> Cc: David Howells <dhowells@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dominik Dingel 提交于
With making HPAGE_SHIFT an unsigned integer we also accidentally changed pageblock_order. In order to avoid compiler warnings we make HPAGE_SHFIT an int again. Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com> Suggested-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dominik Dingel 提交于
We already do the check in pmd_large, so we can just forward the call. Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dominik Dingel 提交于
We now support only hugepages on hardware with EDAT1 support. So we remove the prepare/release_hugepage hooks and simplify set_huge_pte_at and huge_ptep_get. Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dominik Dingel 提交于
Nobody used these hooks so they were removed from common code, and can now be removed from the architectures. Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Acked-by: NRalf Baechle <ralf@linux-mips.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dominik Dingel 提交于
There is a potential bug with KVM and hugetlbfs if the hardware does not support hugepages (EDAT1). We fix this by making EDAT1 a hard requirement for hugepages and therefore removing and simplifying code. As s390, with the sw-emulated hugepages, was the only user of arch_prepare/release_hugepage I also removed theses calls from common and other architecture code. This patch (of 5): By dropping support for hugepages on machines which do not have the hardware feature EDAT1, we fix a potential s390 KVM bug. The bug would happen if a guest is backed by hugetlbfs (not supported currently), but does not get pagetables with PGSTE. This would lead to random memory overwrites. Signed-off-by: NDominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Richard Weinberger 提交于
Don't include ptrace uapi stuff in arch headers, it will pollute the kernel namespace and conflict with existing stuff. In this case it fixes clashes with common names like R8. Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Hans-Werner Hilse 提交于
The functions in question are not part of the POSIX standard, documentation however hints that the corresponding header shall be sys/types.h. C libraries other than glibc, namely musl, did not include that header via other ways and complained. Signed-off-by: NHans-Werner Hilse <hwhilse@gmail.com> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Hans-Werner Hilse 提交于
stdin, stdout and stderr are macros according to C89/C99. Thus do not use them as struct member identifiers to avoid bad results from macro expansion. Signed-off-by: NHans-Werner Hilse <hwhilse@gmail.com> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Hans-Werner Hilse 提交于
__ptr_t type is a glibc-specific type, while the generally documented type is a void*. That's what other C libraries use, too. Signed-off-by: NHans-Werner Hilse <hwhilse@gmail.com> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
- 25 6月, 2015 19 次提交
-
-
由 David Ahern 提交于
perf walks userspace callchains by following frame pointers. Use the UREG_FP macro to make it clearer that the %fp is being used. Signed-off-by: NDavid Ahern <david.ahern@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Processes are getting killed (sigbus or segv) while walking userspace callchains when using perf. In some instances I have seen ufp = 0x7ff which does not seem like a proper stack address. This patch adds a function to run validity checks against the address before attempting the copy_from_user. The checks are copied from the x86 version as a start point with the addition of a 4-byte alignment check. Signed-off-by: NDavid Ahern <david.ahern@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Pagefault handling has a BUG_ON path that panics the system. Convert it to a warning instead. There is no need to bring down the system for this kind of failure. The following was hit while running: perf sched record -g -- make -j 16 [3609412.782801] kernel BUG at /opt/dahern/linux.git/arch/sparc/mm/fault_64.c:416! [3609412.782833] \|/ ____ \|/ [3609412.782833] "@'/ .. \`@" [3609412.782833] /_| \__/ |_\ [3609412.782833] \__U_/ [3609412.782870] cat(4516): Kernel bad sw trap 5 [#1] [3609412.782889] CPU: 0 PID: 4516 Comm: cat Tainted: G E 4.1.0-rc8+ #6 [3609412.782909] task: fff8000126e31f80 ti: fff8000110d90000 task.ti: fff8000110d90000 [3609412.782931] TSTATE: 0000004411001603 TPC: 000000000096b164 TNPC: 000000000096b168 Y: 0000004e Tainted: G E [3609412.782964] TPC: <do_sparc64_fault+0x5e4/0x6a0> [3609412.782979] g0: 000000000096abe0 g1: 0000000000d314c4 g2: 0000000000000000 g3: 0000000000000001 [3609412.783009] g4: fff8000126e31f80 g5: fff80001302d2000 g6: fff8000110d90000 g7: 00000000000000ff [3609412.783045] o0: 0000000000aff6a8 o1: 00000000000001a0 o2: 0000000000000001 o3: 0000000000000054 [3609412.783080] o4: fff8000100026820 o5: 0000000000000001 sp: fff8000110d935f1 ret_pc: 000000000096b15c [3609412.783117] RPC: <do_sparc64_fault+0x5dc/0x6a0> [3609412.783137] l0: 000007feff996000 l1: 0000000000030001 l2: 0000000000000004 l3: fff8000127bd0120 [3609412.783174] l4: 0000000000000054 l5: fff8000127bd0188 l6: 0000000000000000 l7: fff8000110d9dba8 [3609412.783210] i0: fff8000110d93f60 i1: fff8000110ca5530 i2: 000000000000003f i3: 0000000000000054 [3609412.783244] i4: fff800010000081a i5: fff8000100000398 i6: fff8000110d936a1 i7: 0000000000407c6c [3609412.783286] I7: <sparc64_realfault_common+0x10/0x20> [3609412.783308] Call Trace: [3609412.783329] [0000000000407c6c] sparc64_realfault_common+0x10/0x20 [3609412.783353] Disabling lock debugging due to kernel taint [3609412.783379] Caller[0000000000407c6c]: sparc64_realfault_common+0x10/0x20 [3609412.783449] Caller[fff80001002283e4]: 0xfff80001002283e4 [3609412.783471] Instruction DUMP: 921021a0 7feaff91 901222a8 <91d02005> 82086100 02f87f7b 808a2873 81cfe008 01000000 [3609412.783542] Kernel panic - not syncing: Fatal exception [3609412.784605] Press Stop-A (L1-A) to return to the boot prom [3609412.784615] ---[ end Kernel panic - not syncing: Fatal exception With this patch rather than a panic I occasionally get something like this: perf sched record -g -m 1024 -- make -j N where N is based on number of cpus (128 to 1024 for a T7-4 and 8 for an 8 cpu VM on a T5-2). WARNING: CPU: 211 PID: 52565 at /opt/dahern/linux.git/arch/sparc/mm/fault_64.c:417 do_sparc64_fault+0x340/0x70c() address (7feffcd6000) != regs->tpc (fff80001004873c0) Modules linked in: ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6 xt_tcpudp nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables x_tables ipv6 cdc_ether usbnet mii ixgbe mdio igb i2c_algo_bit i2c_core ptp crc32c_sparc64 camellia_sparc64 des_sparc64 des_generic md5_sparc64 sha512_sparc64 sha1_sparc64 uio_pdrv_genirq uio usb_storage mpt3sas scsi_transport_sas raid_class aes_sparc64 sunvnet sunvdc sha256_sparc64(E) sha256_generic(E) CPU: 211 PID: 52565 Comm: ld Tainted: G W E 4.1.0-rc8+ #19 Call Trace: [000000000045ce30] warn_slowpath_common+0x7c/0xa0 [000000000045ceec] warn_slowpath_fmt+0x30/0x40 [000000000098ad64] do_sparc64_fault+0x340/0x70c [0000000000407c2c] sparc64_realfault_common+0x10/0x20 ---[ end trace 62ee02065a01a049 ]--- ld[52565]: segfault at fff80001004873c0 ip fff80001004873c0 (rpc fff8000100158868) sp 000007feffcd70e1 error 30002 in libc-2.12.so[fff8000100410000+184000] The segfault is horrible, but better than a system panic. An 8-cpu VM on a T5-2 also showed the above traces from time to time, so it is a general problem and not specific to the T7 or baremetal. Signed-off-by: NDavid Ahern <david.ahern@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Page faults generated walking userspace stacks can call schedule to switch out the task. When collecting callchains for scheduler tracepoints this causes a deadlock as the tracepoints can be hit with the runqueue lock held: [ 8138.159054] WARNING: CPU: 758 PID: 12488 at /opt/dahern/linux.git/arch/sparc/kernel/nmi.c:80 perfctr_irq+0x1f8/0x2b4() [ 8138.203152] Watchdog detected hard LOCKUP on cpu 758 [ 8138.410969] CPU: 758 PID: 12488 Comm: perf Not tainted 4.0.0-rc6+ #6 [ 8138.437146] Call Trace: [ 8138.447193] [000000000045cdd4] warn_slowpath_common+0x7c/0xa0 [ 8138.471238] [000000000045ce90] warn_slowpath_fmt+0x30/0x40 [ 8138.494189] [0000000000983e38] perfctr_irq+0x1f8/0x2b4 [ 8138.515716] [00000000004209f4] tl0_irq15+0x14/0x20 [ 8138.535791] [00000000009839ec] _raw_spin_trylock_bh+0x68/0x108 [ 8138.560180] [0000000000980018] __schedule+0xcc/0x710 [ 8138.580981] [00000000009806dc] preempt_schedule_common+0x10/0x3c [ 8138.606082] [000000000098077c] _cond_resched+0x34/0x44 [ 8138.627603] [0000000000565990] kmem_cache_alloc_node+0x24/0x1a0 [ 8138.652345] [0000000000450b60] tsb_grow+0xac/0x488 [ 8138.672429] [0000000000985040] do_sparc64_fault+0x4dc/0x6e4 [ 8138.695736] [0000000000407c2c] sparc64_realfault_common+0x10/0x20 [ 8138.721202] [00000000006f2e24] NG4copy_from_user+0xa4/0x3c0 [ 8138.744510] [000000000044f900] perf_callchain_user+0x5c/0x6c [ 8138.768182] [0000000000517b5c] perf_callchain+0x16c/0x19c [ 8138.790774] [0000000000515f84] perf_prepare_sample+0x68/0x218 [ 8138.814801] ---[ end trace 42ca6294b1ff7573 ]--- As with PowerPC (b59a1bfc, "powerpc/perf: Disable pagefaults during callchain stack read") disable pagefaults while walking userspace stacks. Signed-off-by: NDavid Ahern <david.ahern@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Dan Williams 提交于
nd_pmem attaches to persistent memory regions and namespaces emitted by the libnvdimm subsystem, and, same as the original pmem driver, presents the system-physical-address range as a block device. The existing e820-type-12 to pmem setup is converted to an nvdimm_bus that emits an nd_namespace_io device. Note that the X in 'pmemX' is now derived from the parent region. This provides some stability to the pmem devices names from boot-to-boot. The minor numbers are also more predictable by passing 0 to alloc_disk(). Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boaz Harrosh <boaz@plexistor.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jens Axboe <axboe@fb.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Acked-by: NChristoph Hellwig <hch@lst.de> Tested-by: NToshi Kani <toshi.kani@hp.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Tony Luck 提交于
UEFI GetMemoryMap() uses a new attribute bit to mark mirrored memory address ranges. See UEFI 2.5 spec pages 157-158: http://www.uefi.org/sites/default/files/resources/UEFI%202_5.pdf On EFI enabled systems scan the memory map and tell memblock about any mirrored ranges. Signed-off-by: NTony Luck <tony.luck@intel.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Xiexiuqi <xiexiuqi@huawei.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tony Luck 提交于
Some high end Intel Xeon systems report uncorrectable memory errors as a recoverable machine check. Linux has included code for some time to process these and just signal the affected processes (or even recover completely if the error was in a read only page that can be replaced by reading from disk). But we have no recovery path for errors encountered during kernel code execution. Except for some very specific cases were are unlikely to ever be able to recover. Enter memory mirroring. Actually 3rd generation of memory mirroing. Gen1: All memory is mirrored Pro: No s/w enabling - h/w just gets good data from other side of the mirror Con: Halves effective memory capacity available to OS/applications Gen2: Partial memory mirror - just mirror memory begind some memory controllers Pro: Keep more of the capacity Con: Nightmare to enable. Have to choose between allocating from mirrored memory for safety vs. NUMA local memory for performance Gen3: Address range partial memory mirror - some mirror on each memory controller Pro: Can tune the amount of mirror and keep NUMA performance Con: I have to write memory management code to implement The current plan is just to use mirrored memory for kernel allocations. This has been broken into two phases: 1) This patch series - find the mirrored memory, use it for boot time allocations 2) Wade into mm/page_alloc.c and define a ZONE_MIRROR to pick up the unused mirrored memory from mm/memblock.c and only give it out to select kernel allocations (this is still being scoped because page_alloc.c is scary). This patch (of 3): Add extra "flags" to memblock to allow selection of memory based on attribute. No functional changes Signed-off-by: NTony Luck <tony.luck@intel.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Xiexiuqi <xiexiuqi@huawei.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Aneesh Kumar K.V 提交于
We have confusing functions to clear pmd, pmd_clear_* and pmd_clear. Add _huge_ to pmdp_clear functions so that we are clear that they operate on hugepage pte. We don't bother about other functions like pmdp_set_wrprotect, pmdp_clear_flush_young, because they operate on PTE bits and hence indicate they are operating on hugepage ptes Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Aneesh Kumar K.V 提交于
Also move the pmd_trans_huge check to generic code. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Aneesh Kumar K.V 提交于
Architectures like ppc64 [1] need to do special things while clearing pmd before a collapse. For them this operation is largely different from a normal hugepage pte clear. Hence add a separate function to clear pmd before collapse. After this patch pmdp_* functions operate only on hugepage pte, and not on regular pmd_t values pointing to page table. [1] ppc64 needs to invalidate all the normal page pte mappings we already have inserted in the hardware hash page table. But before doing that we need to make sure there are no parallel hash page table insert going on. So we need to do a kick_all_cpus_sync() before flushing the older hash table entries. By moving this to a separate function we capture these details and mention how it is different from a hugepage pte clear. This patch is a cleanup and only does code movement for clarity. There should not be any change in functionality. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Zhang Zhen 提交于
Currently we have many duplicates in definitions of hugetlb_prefault_arch_hook. In all architectures this function is empty. Signed-off-by: NZhang Zhen <zhenzhang.zhang@huawei.com> Acked-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Laurent Dufour 提交于
Some processes (CRIU) are moving the vDSO area using the mremap system call. As a consequence the kernel reference to the vDSO base address is no more valid and the signal return frame built once the vDSO has been moved is not pointing to the new sigreturn address. This patch handles vDSO remapping and unmapping. Signed-off-by: NLaurent Dufour <ldufour@linux.vnet.ibm.com> Reviewed-by: NIngo Molnar <mingo@kernel.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Laurent Dufour 提交于
CRIU is recreating the process memory layout by remapping the checkpointee memory area on top of the current process (criu). This includes remapping the vDSO to the place it has at checkpoint time. However some architectures like powerpc are keeping a reference to the vDSO base address to build the signal return stack frame by calling the vDSO sigreturn service. So once the vDSO has been moved, this reference is no more valid and the signal frame built later are not usable. This patch serie is introducing a new mm hook framework, and a new arch_remap hook which is called when mremap is done and the mm lock still hold. The next patch is adding the vDSO remap and unmap tracking to the powerpc architecture. This patch (of 3): This patch introduces a new set of header file to manage mm hooks: - per architecture empty header file (arch/x/include/asm/mm-arch-hooks.h) - a generic header (include/linux/mm-arch-hooks.h) The architecture which need to overwrite a hook as to redefine it in its header file, while architecture which doesn't need have nothing to do. The default hooks are defined in the generic header and are used in the case the architecture is not defining it. In a next step, mm hooks defined in include/asm-generic/mm_hooks.h should be moved here. Signed-off-by: NLaurent Dufour <ldufour@linux.vnet.ibm.com> Suggested-by: NAndrew Morton <akpm@linux-foundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Zhang Zhen 提交于
Currently we have many duplicates in definitions of huge_pmd_unshare. In all architectures this function just returns 0 when CONFIG_ARCH_WANT_HUGE_PMD_SHARE is N. This patch puts the default implementation in mm/hugetlb.c and lets these architectures use the common code. Signed-off-by: NZhang Zhen <zhenzhang.zhang@huawei.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: David Rientjes <rientjes@google.com> Cc: James Yang <James.Yang@freescale.com> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Akinobu Mita 提交于
This replaces the plain loop over the sglist array with for_each_sg() macro which consists of sg_next() function calls. Since xtensa doesn't select ARCH_HAS_SG_CHAIN, it is not necessary to use for_each_sg() in order to loop over each sg element. But this can help find problems with drivers that do not properly initialize their sg tables when CONFIG_DEBUG_SG is enabled. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: Chris Zankel <chris@zankel.net> Cc: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Akinobu Mita 提交于
This replaces the plain loop over the sglist array with for_each_sg() macro which consists of sg_next() function calls. Since sparc does select ARCH_HAS_SG_CHAIN, it is necessary to use for_each_sg() in order to loop over each sg element. This also help find problems with drivers that do not properly initialize their sg tables when CONFIG_DEBUG_SG is enabled. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Akinobu Mita 提交于
This replaces the plain loop over the sglist array with for_each_sg() macro which consists of sg_next() function calls. Since parisc doesn't select ARCH_HAS_SG_CHAIN, it is not necessary to use for_each_sg() in order to loop over each sg element. But this can help find problems with drivers that do not properly initialize their sg tables when CONFIG_DEBUG_SG is enabled. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Akinobu Mita 提交于
This replaces the plain loop over the sglist array with for_each_sg() macro which consists of sg_next() function calls. Since powerpc does select ARCH_HAS_SG_CHAIN, it is necessary to use for_each_sg() in order to loop over each sg element. This also help find problems with drivers that do not properly initialize their sg tables when CONFIG_DEBUG_SG is enabled. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Akinobu Mita 提交于
This replaces the plain loop over the sglist array with for_each_sg() macro which consists of sg_next() function calls. Since metag doesn't select ARCH_HAS_SG_CHAIN, it is not necessary to use for_each_sg() in order to loop over each sg element. But this can help find problems with drivers that do not properly initialize their sg tables when CONFIG_DEBUG_SG is enabled. Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Cc: James Hogan <james.hogan@imgtec.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-