- 03 11月, 2011 2 次提交
-
-
由 Andrea Arcangeli 提交于
This avoids duplicating the function in every arch gup_fast. Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrea Arcangeli 提交于
Michel while working on the working set estimation code, noticed that calling get_page_unless_zero() on a random pfn_to_page(random_pfn) wasn't safe, if the pfn ended up being a tail page of a transparent hugepage under splitting by __split_huge_page_refcount(). He then found the problem could also theoretically materialize with page_cache_get_speculative() during the speculative radix tree lookups that uses get_page_unless_zero() in SMP if the radix tree page is freed and reallocated and get_user_pages is called on it before page_cache_get_speculative has a chance to call get_page_unless_zero(). So the best way to fix the problem is to keep page_tail->_count zero at all times. This will guarantee that get_page_unless_zero() can never succeed on any tail page. page_tail->_mapcount is guaranteed zero and is unused for all tail pages of a compound page, so we can simply account the tail page references there and transfer them to tail_page->_count in __split_huge_page_refcount() (in addition to the head_page->_mapcount). While debugging this s/_count/_mapcount/ change I also noticed get_page is called by direct-io.c on pages returned by get_user_pages. That wasn't entirely safe because the two atomic_inc in get_page weren't atomic. As opposed to other get_user_page users like secondary-MMU page fault to establish the shadow pagetables would never call any superflous get_page after get_user_page returns. It's safer to make get_page universally safe for tail pages and to use get_page_foll() within follow_page (inside get_user_pages()). get_page_foll() is safe to do the refcounting for tail pages without taking any locks because it is run within PT lock protected critical sections (PT lock for pte and page_table_lock for pmd_trans_huge). The standard get_page() as invoked by direct-io instead will now take the compound_lock but still only for tail pages. The direct-io paths are usually I/O bound and the compound_lock is per THP so very finegrined, so there's no risk of scalability issues with it. A simple direct-io benchmarks with all lockdep prove locking and spinlock debugging infrastructure enabled shows identical performance and no overhead. So it's worth it. Ideally direct-io should stop calling get_page() on pages returned by get_user_pages(). The spinlock in get_page() is already optimized away for no-THP builds but doing get_page() on tail pages returned by GUP is generally a rare operation and usually only run in I/O paths. This new refcounting on page_tail->_mapcount in addition to avoiding new RCU critical sections will also allow the working set estimation code to work without any further complexity associated to the tail page refcounting with THP. Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Reported-by: NMichel Lespinasse <walken@google.com> Reviewed-by: NMichel Lespinasse <walken@google.com> Reviewed-by: NMinchan Kim <minchan.kim@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: <stable@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 11月, 2011 24 次提交
-
-
由 Dave Jones 提交于
kmalloc size is 1st arg, not second. Signed-off-by: NDave Jones <davej@redhat.com> Signed-off-by: NRichard Weinberger <richard@nod.at> Cc: <stable@kernel.org> # 3.0.x [richard@nod.at: on 3.0 the to be patched file is arch/um/sys-x86_64/vdso/vma.c]
-
由 Richard Weinberger 提交于
Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Richard Weinberger 提交于
Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Al Viro 提交于
most of it belonged in irqflags.h, actually Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Al Viro 提交于
... and switch get_thread_register() to HOST_... for register numbers Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Al Viro 提交于
a) exports in gmon_syms.c duplicate kernel/gcov/* ones b) excluding -pg in vdso compile is not enough - -fprofile-arcs and -ftest-coverage also needs to be excluded Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Al Viro 提交于
... so take it to arch/um/x86/asm. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Al Viro 提交于
it's x86-only and we have no business playing with it in asm/mmu.h; make the latter have struct uml_arch_mm_context arch; instead of struct uml_ldt ldt; and let arch/<subarch>/um/asm/mm_context.h decide what'll be in there. While we are at it, kill host_ldt.h - it's not needed in part of places that include it (we want asm/ldt.h in those) and it can be trivially expanded into the single remaining one. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Al Viro 提交于
analog of [PATCH] i386: let usermode execute the "enter" instruction from circa 2006. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Al Viro 提交于
it's i386-specific; moreover, analogs on other targets have incompatible interface - PTRACE_GET_THREAD_AREA does exist elsewhere, but struct user_desc does *not* Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Al Viro 提交于
its only purpose is to shadow the x86 asm/desc.h Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NRichard Weinberger <richard@nod.at>
-
- 01 11月, 2011 3 次提交
-
-
由 Borislav Petkov 提交于
Remove edac_mce pieces and use the normal MCE decoder notifier chain by retaining the same functionality with considerably less code. Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com> Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
-
由 Christopher Yeoh 提交于
The basic idea behind cross memory attach is to allow MPI programs doing intra-node communication to do a single copy of the message rather than a double copy of the message via shared memory. The following patch attempts to achieve this by allowing a destination process, given an address and size from a source process, to copy memory directly from the source process into its own address space via a system call. There is also a symmetrical ability to copy from the current process's address space into a destination process's address space. - Use of /proc/pid/mem has been considered, but there are issues with using it: - Does not allow for specifying iovecs for both src and dest, assuming preadv or pwritev was implemented either the area read from or written to would need to be contiguous. - Currently mem_read allows only processes who are currently ptrace'ing the target and are still able to ptrace the target to read from the target. This check could possibly be moved to the open call, but its not clear exactly what race this restriction is stopping (reason appears to have been lost) - Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix domain socket is a bit ugly from a userspace point of view, especially when you may have hundreds if not (eventually) thousands of processes that all need to do this with each other - Doesn't allow for some future use of the interface we would like to consider adding in the future (see below) - Interestingly reading from /proc/pid/mem currently actually involves two copies! (But this could be fixed pretty easily) As mentioned previously use of vmsplice instead was considered, but has problems. Since you need the reader and writer working co-operatively if the pipe is not drained then you block. Which requires some wrapping to do non blocking on the send side or polling on the receive. In all to all communication it requires ordering otherwise you can deadlock. And in the example of many MPI tasks writing to one MPI task vmsplice serialises the copying. There are some cases of MPI collectives where even a single copy interface does not get us the performance gain we could. For example in an MPI_Reduce rather than copy the data from the source we would like to instead use it directly in a mathops (say the reduce is doing a sum) as this would save us doing a copy. We don't need to keep a copy of the data from the source. I haven't implemented this, but I think this interface could in the future do all this through the use of the flags - eg could specify the math operation and type and the kernel rather than just copying the data would apply the specified operation between the source and destination and store it in the destination. Although we don't have a "second user" of the interface (though I've had some nibbles from people who may be interested in using it for intra process messaging which is not MPI). This interface is something which hardware vendors are already doing for their custom drivers to implement fast local communication. And so in addition to this being useful for OpenMPI it would mean the driver maintainers don't have to fix things up when the mm changes. There was some discussion about how much faster a true zero copy would go. Here's a link back to the email with some testing I did on that: http://marc.info/?l=linux-mm&m=130105930902915&w=2 There is a basic man page for the proposed interface here: http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt This has been implemented for x86 and powerpc, other architecture should mainly (I think) just need to add syscall numbers for the process_vm_readv and process_vm_writev. There are 32 bit compatibility versions for 64-bit kernels. For arch maintainers there are some simple tests to be able to quickly verify that the syscalls are working correctly here: http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgzSigned-off-by: NChris Yeoh <yeohc@au1.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Howells <dhowells@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: <linux-man@vger.kernel.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Borislav Petkov 提交于
Drop the edac_mce custom hook in favor of the generic notifier mechanism. Also, do not log the error to mcelog if the notified agent was able to decode it. Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com> Acked-by: NIngo Molnar <mingo@elte.hu> Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
-
- 30 10月, 2011 1 次提交
-
-
由 Jan Kiszka 提交于
AMD processors apparently have a bug in the hardware task switching support when NPT is enabled. If the task switch triggers a NPF, we can get wrong EXITINTINFO along with that fault. On resume, spurious exceptions may then be injected into the guest. We were able to reproduce this bug when our guest triggered #SS and the handler were supposed to run over a separate task with not yet touched stack pages. Work around the issue by continuing to emulate task switches even in NPT mode. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
-
- 28 10月, 2011 1 次提交
-
-
由 Eric W. Biederman 提交于
This was found by inspection while tracking a similar bug in compat_statfs64, that has been fixed in mainline since decemeber. - This fixes a bug where not all of the f_spare fields were cleared on mips and s390. - Add the f_flags field to struct compat_statfs - Copy f_flags to userspace in case someone cares. - Use __clear_user to copy the f_spare field to userspace to ensure that all of the elements of f_spare are cleared. On some architectures f_spare is has 5 ints and on some architectures f_spare only has 4 ints. Which makes the previous technique of clearing each int individually broken. I don't expect anyone actually uses the old statfs system call anymore but if they do let them benefit from having the compat and the native version working the same. Signed-off-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 27 10月, 2011 1 次提交
-
-
由 Rusty Russell 提交于
Host might be running under KVM, but we shouldn't allow Guest to think it can use KVM hypercalls (it can't, and it will embarrass itself if it tries). Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
- 25 10月, 2011 1 次提交
-
-
由 Josh Stone 提交于
When compiling an i386_defconfig kernel with gcc-4.6.1-9.fc15.i686, I noticed a warning about the asm operand for test_bit in kprobes' can_boost. I discovered that this caused only the first long of twobyte_is_boostable[] to be output. Jakub filed and fixed gcc PR50571 to correct the warning and this output issue. But to solve it for less current gcc, we can make kprobes' twobyte_is_boostable[] non-const, and it won't be optimized out. Before: CC arch/x86/kernel/kprobes.o In file included from include/linux/bitops.h:22:0, from include/linux/kernel.h:17, from [...]/arch/x86/include/asm/percpu.h:44, from [...]/arch/x86/include/asm/current.h:5, from [...]/arch/x86/include/asm/processor.h:15, from [...]/arch/x86/include/asm/atomic.h:6, from include/linux/atomic.h:4, from include/linux/mutex.h:18, from include/linux/notifier.h:13, from include/linux/kprobes.h:34, from arch/x86/kernel/kprobes.c:43: [...]/arch/x86/include/asm/bitops.h: In function ‘can_boost.part.1’: [...]/arch/x86/include/asm/bitops.h:319:2: warning: use of memory input without lvalue in asm operand 1 is deprecated [enabled by default] $ objdump -rd arch/x86/kernel/kprobes.o | grep -A1 -w bt 551: 0f a3 05 00 00 00 00 bt %eax,0x0 554: R_386_32 .rodata.cst4 $ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o arch/x86/kernel/kprobes.o: file format elf32-i386 Contents of section .data: 0000 48000000 00000000 00000000 00000000 H............... Contents of section .rodata.cst4: 0000 4c030000 L... Only a single long of twobyte_is_boostable[] is in the object file. After, without the const on twobyte_is_boostable: $ objdump -rd arch/x86/kernel/kprobes.o | grep -A1 -w bt 551: 0f a3 05 20 00 00 00 bt %eax,0x20 554: R_386_32 .data $ objdump -s -j .rodata.cst4 -j .data arch/x86/kernel/kprobes.o arch/x86/kernel/kprobes.o: file format elf32-i386 Contents of section .data: 0000 48000000 00000000 00000000 00000000 H............... 0010 00000000 00000000 00000000 00000000 ................ 0020 4c030000 0f000200 ffff0000 ffcff0c0 L............... 0030 0000ffff 3bbbfff8 03ff2ebb 26bb2e77 ....;.......&..w Now all 32 bytes are output into .data instead. Signed-off-by: NJosh Stone <jistone@redhat.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: stable@kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 24 10月, 2011 3 次提交
-
-
由 Mika Westerberg 提交于
The MSIC MFD driver creates platform devices for MSIC device drivers so we don't need to create them in platform code anymore. This patch adds a new runtime check which determines whether we are running on a Medfield platform and enables the MSIC MFD driver accordingly. Signed-off-by: NMika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>
-
由 Alan Cox 提交于
Add a notifier so that drivers can hook into SCU availability in order to take actions post initialisation when/if the SCU becomes available. In the ideal world we wouldn't need this and we could avoid any init dependancies of this form, but in practice we can't do it for some cases. Signed-off-by: NAlan Cox <alan@linux.intel.com> Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>
-
由 Takashi Iwai 提交于
Commit 4b239f45 ("x86-64, mm: Put early page table high") causes a S4 regression since 2.6.39, namely the machine reboots occasionally at S4 resume. It doesn't happen always, overall rate is about 1/20. But, like other bugs, once when this happens, it continues to happen. This patch fixes the problem by essentially reverting the memory assignment in the older way. Signed-off-by: NTakashi Iwai <tiwai@suse.de> Cc: <stable@kernel.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Yinghai Lu <yinghai.lu@oracle.com> [ We'll hopefully find the real fix, but that's too late for 3.1 now ] Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 10月, 2011 4 次提交
-
-
由 Joerg Roedel 提交于
With per-bus iommu_ops the iommu_found function needs to work on a bus_type too. This patch adds a bus_type parameter to that function and converts all call-places. The function is also renamed to iommu_present because the function now checks if an iommu is present for a given bus and does not check for a global iommu anymore. Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
-
由 Jussi Kivilinna 提交于
Signed-off-by: NJussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Jussi Kivilinna 提交于
Signed-off-by: NJussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Jussi Kivilinna 提交于
Patch adds 3-way parallel x86_64 assembly implementation of twofish as new module. New assembler functions crypt data in three blocks chunks, improving cipher performance on out-of-order CPUs. Patch has been tested with tcrypt and automated filesystem tests. Summary of the tcrypt benchmarks: Twofish 3-way-asm vs twofish asm (128bit 8kb block ECB) encrypt: 1.3x speed decrypt: 1.3x speed Twofish 3-way-asm vs twofish asm (128bit 8kb block CBC) encrypt: 1.07x speed decrypt: 1.4x speed Twofish 3-way-asm vs twofish asm (128bit 8kb block CTR) encrypt: 1.4x speed Twofish 3-way-asm vs AES asm (128bit 8kb block ECB) encrypt: 1.0x speed decrypt: 1.0x speed Twofish 3-way-asm vs AES asm (128bit 8kb block CBC) encrypt: 0.84x speed decrypt: 1.09x speed Twofish 3-way-asm vs AES asm (128bit 8kb block CTR) encrypt: 1.15x speed Full output: http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-twofish-3way-asm-x86_64.txt http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-twofish-asm-x86_64.txt http://koti.mbnet.fi/axh/kernel/crypto/tcrypt-speed-aes-asm-x86_64.txt Tests were run on: vendor_id : AuthenticAMD cpu family : 16 model : 10 model name : AMD Phenom(tm) II X6 1055T Processor Also userspace test were run on: vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Xeon(R) CPU E7330 @ 2.40GHz stepping : 11 Userspace test results: Encryption/decryption of twofish 3-way vs x86_64-asm on AMD Phenom II: encrypt: 1.27x decrypt: 1.25x Encryption/decryption of twofish 3-way vs x86_64-asm on Intel Xeon E7330: encrypt: 1.36x decrypt: 1.36x Signed-off-by: NJussi Kivilinna <jussi.kivilinna@mbnet.fi> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-