- 21 Mar 2012, 2 commits
Committed by Al Viro
same story - nobody uses it and it's been pointless since "mm: Remove i_mmap_lock lockbreak" went in.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Committed by Al Viro
no point, really - the only instance that cares about those arguments of tlb_finish_mmu() is itanic, and there we explicitly check whether it's called from exit_mmap() (i.e. that ->fullmm is set), in which case we ignore those arguments completely.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
- 07 Mar 2012, 2 commits
Committed by Mikulas Patocka
Commit 6bd4837d ("mm: simplify find_vma_prev()") broke memory management on PA-RISC. After application of the patch, programs that allocate big arrays on the stack crash with a segfault; for example, this will crash if compiled without optimization:

	int main()
	{
		char array[200000];
		array[199999] = 0;
		return 0;
	}

The reason is that PA-RISC has an up-growing stack, and the stack is usually the last memory area. In the above example, a page fault happens above the stack.

Previously, if we passed too high an address to find_vma_prev, it returned NULL and stored the last VMA in *pprev. After the "simplify find_vma_prev" change, it stores NULL in *pprev. Consequently, the stack area is not found and it is not expanded, as it used to be before the change.

This patch restores the old behavior and makes it return the last VMA in *pprev if the requested address is higher than the address of any other VMA.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
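The restored lookup rule can be sketched in user space. This is a simplified model, not the kernel code: struct vma, model_find_vma() and model_find_vma_prev() are hypothetical names, and the linear list walk stands in for the kernel's rbtree search. The important part is the else branch: when no VMA ends above addr, the function still returns NULL, but the last VMA is handed back through *pprev, which is what an up-growing stack needs in order to be expanded.

```c
#include <stddef.h>

/* Hypothetical, simplified VMA: covers [vm_start, vm_end) and is linked both ways. */
struct vma {
	unsigned long vm_start, vm_end;
	struct vma *vm_next, *vm_prev;
};

/* First VMA whose vm_end lies above addr, or NULL (the find_vma() rule). */
struct vma *model_find_vma(struct vma *mmap, unsigned long addr)
{
	for (struct vma *v = mmap; v; v = v->vm_next)
		if (addr < v->vm_end)
			return v;
	return NULL;
}

/*
 * Like model_find_vma(), but also reports the preceding VMA. If addr is
 * above every VMA, return NULL and store the LAST VMA in *pprev, so a
 * caller can still find and grow an up-growing stack.
 */
struct vma *model_find_vma_prev(struct vma *mmap, unsigned long addr,
				struct vma **pprev)
{
	struct vma *v = model_find_vma(mmap, addr);

	if (v) {
		*pprev = v->vm_prev;
	} else {
		struct vma *last = mmap;

		while (last && last->vm_next)
			last = last->vm_next;
		*pprev = last;
	}
	return v;
}
```

With two VMAs at 0x1000-0x2000 and 0x3000-0x4000, a lookup of 0x9000 returns NULL while *pprev points at the second VMA.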
Committed by Hugh Dickins
Currently the error is -ENOMEM when rejecting VM_GROWSDOWN|VM_GROWSUP from shared anonymous mappings: hoist the file case's -EINVAL up so that both cases return it.

Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- 06 Mar 2012, 1 commit
Committed by Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- 11 Jan 2012, 2 commits
Committed by KOSAKI Motohiro
Commit 297c5eee ("mm: make the vma list be doubly linked") added the vm_prev member to vm_area_struct. We can simplify find_vma_prev() by using it. This change also helps to improve page fault performance because it has stronger locality of reference.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Shaohua Li <shaohua.li@intel.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Committed by Andrea Arcangeli
migrate was doing an rmap_walk with speculative lock-less access on pagetables. That could lead it to not serialize properly against mremap PT locks. But a second problem remains in the order of vmas in the same_anon_vma list used by the rmap_walk.

If vma_merge succeeds in copy_vma, the src vma could be placed after the dst vma in the same_anon_vma list. That could still lead to migrate missing some pte.

This patch adds an anon_vma_moveto_tail() function to force the dst vma to the end of the list before mremap starts, to solve the problem.

If the mremap is very large and there are a lot of parents or children sharing the anon_vma root lock, this should still scale better than taking the anon_vma root lock around every pte copy, practically for the whole duration of mremap.

Update: Hugh noticed special care is needed in the error path where move_page_tables goes in the reverse direction; a second anon_vma_moveto_tail() call is needed in the error path.

This program exercises the anon_vma_moveto_tail:

===
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/time.h>

/* Assumed value: the original definition of SIZE did not survive. */
#define SIZE (4UL*1024*1024)

int main()
{
	static struct timeval oldstamp, newstamp;
	long diffsec;
	char *p, *p2, *p3, *p4;
	if (posix_memalign((void **)&p, 2*1024*1024, SIZE))
		perror("memalign"), exit(1);
	if (posix_memalign((void **)&p2, 2*1024*1024, SIZE))
		perror("memalign"), exit(1);
	if (posix_memalign((void **)&p3, 2*1024*1024, SIZE))
		perror("memalign"), exit(1);

	memset(p, 0xff, SIZE);
	printf("%p\n", p);
	memset(p2, 0xff, SIZE);
	memset(p3, 0x77, 4096);
	if (memcmp(p, p2, SIZE))
		printf("error\n");
	p4 = mremap(p+SIZE/2, SIZE/2, SIZE/2, MREMAP_FIXED|MREMAP_MAYMOVE, p3);
	if (p4 != p3)
		perror("mremap"), exit(1);
	p4 = mremap(p4, SIZE/2, SIZE/2, MREMAP_FIXED|MREMAP_MAYMOVE, p+SIZE/2);
	if (p4 != p+SIZE/2)
		perror("mremap"), exit(1);
	if (memcmp(p, p2, SIZE))
		printf("error\n");
	printf("ok\n");

	return 0;
}
===

$ perf probe -a anon_vma_moveto_tail
Add new event:
  probe:anon_vma_moveto_tail (on anon_vma_moveto_tail)

You can now use it on all perf tools, such as:

	perf record -e probe:anon_vma_moveto_tail -aR sleep 1

$ perf record -e probe:anon_vma_moveto_tail -aR ./anon_vma_moveto_tail
0x7f2ca2800000
ok
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.043 MB perf.data (~1860 samples) ]
$ perf report --stdio
   100.00%  anon_vma_moveto  [kernel.kallsyms]  [k] anon_vma_moveto_tail

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Nai Xia <nai.xia@gmail.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Pawel Sikora <pluto@agmk.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- 01 Nov 2011, 1 commit
Committed by Kautuk Consul
The ret variable is really not needed in mm_take_all_locks().

Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- 31 Oct 2011, 1 commit
Committed by Paul Gortmaker
The files changed within are only using the EXPORT_SYMBOL macro variants. They are not using core modular infrastructure and hence don't need module.h, only the export.h header.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
- 26 Jul 2011, 1 commit
Committed by Dmitry Fink
- shmem pages are not immediately available, but they are not potentially available either: even if we swap them out, they will just relocate from memory into swap, so the total amount of immediately and potentially available memory is not going to be affected. We therefore shouldn't count them as potentially free in the first place.
- nr_free_pages() is not an expensive operation anymore, so there is no need to split the decision making into two halves and repeat code.

Signed-off-by: Dmitry Fink <dmitry.fink@palm.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
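A simplified model of the heuristic overcommit check after this change could look as follows. The struct page_counts type, its field names and guess_enough_memory() are hypothetical; the kernel reads the equivalent values from its global page counters and applies further adjustments that are not shown here. What the sketch illustrates is the first point above: shmem pages are subtracted from the page-cache count instead of being treated as potentially free, and a single computation makes the decision.

```c
#include <stdbool.h>

/* Hypothetical snapshot of memory counters, all in pages. */
struct page_counts {
	unsigned long free_pages;
	unsigned long file_pages;       /* page cache, shmem included */
	unsigned long shmem_pages;      /* part of file_pages that cannot simply be dropped */
	unsigned long swap_free;
	unsigned long slab_reclaimable;
	unsigned long reserve;          /* pages held back for the system */
};

/* Return true if a request for `pages` pages looks satisfiable. */
bool guess_enough_memory(const struct page_counts *c, unsigned long pages)
{
	/* shmem only moves between RAM and swap, so it is not counted as free. */
	unsigned long avail = c->free_pages + c->file_pages - c->shmem_pages
			      + c->swap_free + c->slab_reclaimable;

	if (avail <= c->reserve)
		return false;
	avail -= c->reserve;
	return avail > pages;
}
```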
- 16 Jun 2011, 1 commit
Committed by Linus Torvalds
We have some users of this function that date back to before the vma list was doubly linked, and just are silly. These days, you can find the previous vma by just following the vma->vm_prev pointer. In some cases you don't need any find_vma() lookup at all, and in other cases you're better off with the regular "find_vma()" that uses the vma cache front-end lookup.

Some "find_vma_prev()" users are still valid, though. For example, in the case of a stack that grows up, it can be the case that we don't find any 'vma' at all (because we're looking up an address that is past the last vma), and that the stack that we want to grow is the 'prev' vma.

But that kind of special case aside, we generally should prefer to use 'find_vma()'.

Noticed due to a totally unrelated POWER memory corruption bug that just happened to hit in 'find_vma_prev()' and made me go "Hmm - why are we using that function here?".

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- 27 May 2011, 1 commit
Committed by KOSAKI Motohiro
The type of vma->vm_flags is 'unsigned long', neither 'int' nor 'unsigned int'. This patch fixes such misuse.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
[ Changed to use a typedef - we'll extend it to cover more cases later, since there has been discussion about making it a 64-bit type.. - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
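A short sketch of why the narrower types are a problem. TOP_FLAG is a hypothetical flag placed in the most significant bit of vm_flags_t, and the two helpers are illustrative only. On platforms where unsigned long is wider than unsigned int (LP64, for example), passing the flags through an unsigned int silently drops that bit, while keeping the vm_flags_t type preserves it.

```c
#include <limits.h>
#include <stdbool.h>

typedef unsigned long vm_flags_t;   /* the type vma->vm_flags really has */

/* Hypothetical flag in the top bit of vm_flags_t. */
#define TOP_FLAG ((vm_flags_t)1 << (sizeof(vm_flags_t) * CHAR_BIT - 1))

/* Buggy pattern: on LP64 the conversion to unsigned int keeps only the low 32 bits. */
bool has_flag_narrow(unsigned int flags, vm_flags_t flag)
{
	return (flags & flag) != 0;
}

/* Fixed pattern: the flags keep their full width. */
bool has_flag(vm_flags_t flags, vm_flags_t flag)
{
	return (flags & flag) != 0;
}
```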
- 25 May 2011, 9 commits
Committed by Peter Zijlstra
Straightforward conversion of anon_vma->lock to a mutex.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Tony Luck <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Committed by Peter Zijlstra
Straightforward conversion of i_mmap_lock to a mutex.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Tony Luck <tony.luck@intel.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Committed by Peter Zijlstra
Hugh says: "The only significant loser, I think, would be page reclaim (when concurrent with truncation): could spin for a long time waiting for the i_mmap_mutex it expects would soon be dropped?"

Counter points:
- cpu contention makes the spin stop (need_resched())
- zap pages should be freeing pages at a higher rate than reclaim ever can

I think the simplification of the truncate code is definitely worth it.

Effectively reverts: 2aa15890 ("mm: prevent concurrent unmap_mapping_range() on the same inode") and takes out the code that caused its problem.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Committed by Peter Zijlstra
Rework the existing mmu_gather infrastructure.

The direct purpose of these patches was to allow preemptible mmu_gather, but even without that I think these patches provide an improvement to the status quo.

The first 9 patches rework the mmu_gather infrastructure. For review purposes I've split them into generic and per-arch patches, with the last of those a generic cleanup.

The next patch provides generic RCU page-table freeing, and the followup is a patch converting s390 to use this. I've also got 4 patches from DaveM lined up (not included in this series) that use this to implement gup_fast() for sparc64.

Then there is one patch that extends the generic mmu_gather batching.

After that follow the mm preemptibility patches; these make part of the mm a lot more preemptible. They convert i_mmap_lock and anon_vma->lock to mutexes, which together with the mmu_gather rework makes mmu_gather preemptible as well.

Making i_mmap_lock a mutex also enables a clean-up of the truncate code.

This also allows for preemptible mmu_notifiers, something that XPMEM I think wants.

Furthermore, it removes the new and universally detested unmap_mutex.

This patch:

Remove the first obstacle towards a fully preemptible mmu_gather.

The current scheme assumes mmu_gather is always done with preemption disabled and uses per-cpu storage for the page batches. Change this to try and allocate a page for batching and, in case of failure, use a small on-stack array to make some progress.

Preemptible mmu_gather is desired in general and usable once i_mmap_lock becomes a mutex. Doing it before the mutex conversion saves us from having to rework the code by moving the mmu_gather bits inside the pte_lock.

Also avoid flushing the tlb batches from under the pte lock; this is useful even without the i_mmap_lock conversion, as it significantly reduces pte lock hold times.

[akpm@linux-foundation.org: fix comment tpyo]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Miller <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Tony Luck <tony.luck@intel.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
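The batching fallback described above might look roughly like this in user space. Everything here is a hypothetical model rather than the kernel's mmu_gather: malloc() stands in for allocating a page, both batch sizes are arbitrary, and gather_flush() only counts flushes where the kernel would flush the TLB and then free the batched pages. If the allocation fails, the small array embedded in struct gather (which a caller would typically keep on the stack) still lets progress be made, just with more frequent flushes.

```c
#include <stddef.h>
#include <stdlib.h>

#define LOCAL_BATCH 8    /* embedded fallback capacity (arbitrary) */
#define PAGE_BATCH 510   /* capacity of an allocated batch (arbitrary) */

struct gather {
	void **pages;               /* active batch storage */
	size_t nr;                  /* entries currently batched */
	size_t max;                 /* capacity of the active storage */
	size_t flushes;             /* number of flushes performed */
	void *local[LOCAL_BATCH];   /* fallback storage inside the struct */
};

/* Try to allocate a large batch; fall back to the embedded array on failure. */
void gather_init(struct gather *g)
{
	void **big = malloc(PAGE_BATCH * sizeof(*big));

	g->nr = 0;
	g->flushes = 0;
	if (big) {
		g->pages = big;
		g->max = PAGE_BATCH;
	} else {
		g->pages = g->local;
		g->max = LOCAL_BATCH;
	}
}

/* Stand-in for "flush the TLB, then free the batched pages". */
void gather_flush(struct gather *g)
{
	g->nr = 0;
	g->flushes++;
}

/* Queue one page, flushing first when the batch is full. */
void gather_add(struct gather *g, void *page)
{
	if (g->nr == g->max)
		gather_flush(g);
	g->pages[g->nr++] = page;
}

/* Flush what is left and release the allocated batch, if there was one. */
void gather_finish(struct gather *g)
{
	if (g->nr)
		gather_flush(g);
	if (g->pages != g->local)
		free(g->pages);
	g->pages = g->local;
	g->max = LOCAL_BATCH;
}
```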
Committed by Michal Hocko
Currently we have expand_upwards exported while expand_downwards is accessible only via expand_stack or expand_stack_downwards.

check_stack_guard_page is a nice example of the asymmetry. It uses expand_stack for VM_GROWSDOWN while expand_upwards is called for the VM_GROWSUP case.

Let's clean this up by exporting both functions and making those names consistent. Let's use expand_{upwards,downwards} because expanding doesn't always involve stack manipulation (an example is ia64_do_page_fault, which uses expand_upwards for register backing store expansion). expand_downwards has to be defined for both CONFIG_STACK_GROWS{UP,DOWN} because get_arg_page calls the downwards version in the early process initialization phase for the growsup configuration.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Committed by Namhyung Kim
When I was reading nommu code, I found that it handles the vma list/tree in an unusual way. IIUC, because there can be more than one identical/overlapped vma in the list/tree, it sorts the tree more strictly and does a linear search on the tree. But this isn't applied to the list (i.e. the list could be constructed in a different order than the tree, so we can't use the list when finding the first vma in that order).

Since inserting/sorting a vma in the tree and the list is done at the same time, we can easily construct both of them in the same order. And since linear searching on the tree could be more costly than doing it on the list, it can be converted to use the list.

Also, after commit 297c5eee ("mm: make the vma list be doubly linked") made the list doubly linked, there were a couple of places that needed to be fixed to construct the list properly.

Patch 1/6 is a preparation. It keeps the list sorted the same way as the tree and constructs the doubly-linked list properly. Patch 2/6 is a simple optimization for vma deletion. Patches 3/6 and 4/6 convert tree traversal to list traversal, and the rest are simple fixes and cleanups.

This patch:

@vma added into @mm should be sorted by start addr, end addr and VMA struct addr, in that order, because we may get identical VMAs in the @mm. However, this was true only for the rbtree, not for the list.

This patch fixes this by remembering 'rb_prev' during the tree traversal like find_vma_prepare() does and linking the @vma via __vma_link_list(). After this patch, we can iterate over all VMAs in the correct order simply by using the @mm->mmap list.

[akpm@linux-foundation.org: avoid duplicating __vma_link_list()]
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
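The three-key ordering described above can be illustrated with a user-space comparator. The struct vma type and the vma_cmp(), vma_ptr_cmp() and sort_vmas() helpers are hypothetical, and qsort() stands in for the kernel's rbtree insertion. The final tie-breaker on the struct's own address gives identical ranges a deterministic order, so a tree and a list built with the same rule agree.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct vma {
	unsigned long vm_start, vm_end;
};

/* Order by vm_start, then vm_end, then the address of the struct itself. */
int vma_cmp(const struct vma *a, const struct vma *b)
{
	if (a->vm_start != b->vm_start)
		return a->vm_start < b->vm_start ? -1 : 1;
	if (a->vm_end != b->vm_end)
		return a->vm_end < b->vm_end ? -1 : 1;
	if (a != b)   /* identical ranges: fall back to the struct address */
		return (uintptr_t)a < (uintptr_t)b ? -1 : 1;
	return 0;
}

/* qsort() adapter for an array of struct vma pointers. */
int vma_ptr_cmp(const void *pa, const void *pb)
{
	return vma_cmp(*(struct vma *const *)pa, *(struct vma *const *)pb);
}

void sort_vmas(struct vma **v, size_t n)
{
	qsort(v, n, sizeof(v[0]), vma_ptr_cmp);
}
```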
Committed by Shaohua Li
Avoid merging a VMA with another VMA which is cloned from the parent process.

The cloned VMA shares the anon_vma lock with the parent process's VMA. If we do the merge, more vmas (even though the new range is only for the current process) use the parent process's anon_vma lock. This introduces scalability issues. find_mergeable_anon_vma() already considers this case.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Committed by Shaohua Li
If we only change vma->vm_end, we can avoid taking the anon_vma lock even if 'insert' isn't NULL, which is the case for split_vma.

As I understand it, we needed the lock before because rmap must get the 'insert' VMA when we adjust the old VMA's vm_end (the 'insert' VMA used to be linked to the anon_vma list in __insert_vm_struct).

But now this isn't true any more. The 'insert' VMA is already linked to the anon_vma list in __split_vma (with anon_vma_clone()) instead of __insert_vm_struct. There is no race in which rmap can't get the required VMAs. So the anon_vma lock is unnecessary; dropping it removes one lock acquisition in the brk case and improves scalability.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Committed by Shaohua Li
Give some variables the correct alignment/section to avoid cache issues. In a workload that does mmap/munmap heavily, these variables are used frequently.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- 10 May 2011, 1 commit
Committed by Hugh Dickins
Commit a626ca6a ("vm: fix vm_pgoff wrap in stack expansion") fixed the case of an expanding mapping causing vm_pgoff wrapping when you had downward stack expansion. But there was another case where IA64 and PA-RISC expand mappings: upward expansion.

This fixes that case too.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
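Both overflow checks can be sketched as plain predicates on page offsets. The function names are hypothetical and the arguments are simplified so that both count pages. Growing down by grow pages moves vm_pgoff down, so grow must not exceed vm_pgoff; growing up to size_pages pages must not make vm_pgoff + size_pages wrap around, which unsigned arithmetic lets us detect by comparing the sum against vm_pgoff.

```c
#include <stdbool.h>

/* Downward growth: the new vm_pgoff (vm_pgoff - grow) must not go below zero. */
bool can_grow_down(unsigned long vm_pgoff, unsigned long grow)
{
	return grow <= vm_pgoff;
}

/* Upward growth: the end offset vm_pgoff + size_pages must not wrap around. */
bool can_grow_up(unsigned long vm_pgoff, unsigned long size_pages)
{
	return vm_pgoff + size_pages >= vm_pgoff;
}
```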
- 15 Apr 2011, 1 commit
Committed by Jiri Kosina
5520e894 ("brk: fix min_brk lower bound computation for COMPAT_BRK") tried to get the whole logic of brk randomization for legacy (libc5-based) applications finally right.

It turns out that the way to detect whether brk has actually been randomized in the end or not, introduced by that patch, still doesn't work for those binaries, as reported by Geert:

: /sbin/init from my old m68k ramdisk exits prematurely.
:
: Before the patch:
:
: | brk(0x80005c8e) = 0x80006000
:
: After the patch:
:
: | brk(0x80005c8e) = 0x80005c8e
:
: Old libc5 considers brk() to have failed if the return value is not
: identical to the requested value.

I don't like it, but currently see no better option than a bit flag in task_struct to catch the CONFIG_COMPAT_BRK && randomize_va_space == 2 case.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- 13 Apr 2011, 1 commit
Committed by Linus Torvalds
Commit 982134ba ("mm: avoid wrapping vm_pgoff in mremap()") fixed the case of an expanding mapping causing vm_pgoff wrapping when you used mremap. But there was another case where we expand mappings hiding in plain sight: the automatic stack expansion.

This fixes that case too.

This one also found by Robert Święcki, using his nasty system call fuzzer tool. Good job.

Reported-and-tested-by: Robert Święcki <robert@swiecki.net>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- 14 Jan 2011, 3 commits
Committed by Jiri Kosina
Even if CONFIG_COMPAT_BRK is set in the kernel configuration, it can still be overridden by the randomize_va_space sysctl.

If this is the case, the min_brk computation in the sys_brk() implementation is wrong, as it solely takes into account the COMPAT_BRK setting, assuming that brk start is not randomized. But that might not be the case if the randomize_va_space sysctl has been set to '2' at the time the binary was loaded from disk.

In such a case, the check has to be done in the same way as in the !CONFIG_COMPAT_BRK case.

In addition to that, the check for the COMPAT_BRK case introduced back in a5b4592c ("brk: make sys_brk() honor COMPAT_BRK when computing lower bound") is slightly wrong: the lower bound shouldn't be mm->end_code, but mm->end_data instead, as that's where legacy applications expect the brk section to start (i.e. immediately after the last global variable).

[akpm@linux-foundation.org: fix comment]
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
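A minimal sketch of the lower-bound rule described above. The struct mm_model type, min_brk() and brk_request_ok() are hypothetical, and brk_randomized is an assumed boolean recording whether brk was randomized when the binary was loaded (this commit states the condition but not how it is tracked). Legacy COMPAT_BRK binaries whose brk was not randomized may shrink brk down to end_data; in every other case the bound is start_brk.

```c
#include <stdbool.h>

/* Hypothetical subset of the mm_struct fields involved. */
struct mm_model {
	unsigned long start_brk;
	unsigned long end_data;
};

/* Lowest address a brk() request may move the break to. */
unsigned long min_brk(const struct mm_model *mm, bool compat_brk, bool brk_randomized)
{
	/* Legacy layout: brk may start right after the last global variable. */
	if (compat_brk && !brk_randomized)
		return mm->end_data;
	/* Randomized or non-COMPAT_BRK: same check as !CONFIG_COMPAT_BRK. */
	return mm->start_brk;
}

bool brk_request_ok(const struct mm_model *mm, unsigned long brk,
		    bool compat_brk, bool brk_randomized)
{
	return brk >= min_brk(mm, compat_brk, brk_randomized);
}
```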
Committed by Andrea Arcangeli
A huge pmd can only be mapped if the corresponding 2M virtual range is fully contained in the vma. At times the VM calls split_vma twice; if the first split_vma succeeds and the second fails, the first split_vma remains in effect and is not rolled back. For split_vma or vma_adjust to fail, an allocation failure is needed, so it's a very unlikely event (the out of memory killer would normally fire before any allocation failure is visible to kernel and userland, and if an out of memory condition happens it's unlikely to happen exactly here). Nevertheless, it's safer to ensure that no huge pmd can be left around if the vma is adjusted in a way that can't fit hugepages anymore at the new vm_start/vm_end address.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
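The containment rule in the first sentence can be written as a small predicate. MODEL_HPAGE_SIZE, MODEL_HPAGE_MASK and huge_pmd_fits() are names made up for this sketch, with 2M taken from the commit text. After vm_start or vm_end moves, a huge pmd whose aligned range no longer passes this test is the kind of mapping the patch guards against leaving in place.

```c
#include <stdbool.h>

#define MODEL_HPAGE_SIZE (2UL * 1024 * 1024)        /* 2M, per the commit text */
#define MODEL_HPAGE_MASK (~(MODEL_HPAGE_SIZE - 1))

/* True if the 2M-aligned range containing addr lies entirely inside [vm_start, vm_end). */
bool huge_pmd_fits(unsigned long vm_start, unsigned long vm_end, unsigned long addr)
{
	unsigned long haddr = addr & MODEL_HPAGE_MASK;

	return haddr >= vm_start && haddr + MODEL_HPAGE_SIZE <= vm_end;
}
```

For a vma spanning 0x200000-0x600000, the range around 0x300000 fits; moving vm_start to 0x201000 makes the same range no longer fit.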
Committed by Andrea Arcangeli
Register in khugepaged if the vma grows.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- 16 Dec 2010, 1 commit
Committed by Tavis Ormandy
The install_special_mapping routine (used, for example, to set up the vdso) skips the security check before insert_vm_struct, allowing a local attacker to bypass the mmap_min_addr security restriction by limiting the available pages for special mappings.

bprm_mm_init() also skips the check, and although I don't think this can be used to bypass any restrictions, I don't see any reason not to have the security check.

$ uname -m
x86_64
$ cat /proc/sys/vm/mmap_min_addr
65536
$ cat install_special_mapping.s
section .bss
    resb BSS_SIZE
section .text
    global _start
    _start:
        mov     eax, __NR_pause
        int     0x80
$ nasm -D__NR_pause=29 -DBSS_SIZE=0xfffed000 -f elf -o install_special_mapping.o install_special_mapping.s
$ ld -m elf_i386 -Ttext=0x10000 -Tbss=0x11000 -o install_special_mapping install_special_mapping.o
$ ./install_special_mapping &
[1] 14303
$ cat /proc/14303/maps
0000f000-00010000 r-xp 00000000 00:00 0          [vdso]
00010000-00011000 r-xp 00001000 00:19 2453665    /home/taviso/install_special_mapping
00011000-ffffe000 rwxp 00000000 00:00 0          [stack]

It's worth noting that Red Hat are shipping with mmap_min_addr set to 4096.

Signed-off-by: Tavis Ormandy <taviso@google.com>
Acked-by: Kees Cook <kees@ubuntu.com>
Acked-by: Robert Swiecki <swiecki@google.com>
[ Changed to not drop the error code - akpm ]
Reviewed-by: James Morris <jmorris@namei.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
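The maps listing in this report already shows the bypass: the [vdso] mapping starts at 0xf000, below the configured mmap_min_addr of 65536. A small user-space helper like the following can flag such lines; it is a hypothetical illustration, not part of the patch, and it only parses the start address at the beginning of a /proc/<pid>/maps line.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Return true if the mapping described by one /proc/<pid>/maps line starts
 * below mmap_min_addr. Lines that cannot be parsed are reported as false.
 */
bool maps_line_below_min(const char *line, unsigned long mmap_min_addr)
{
	unsigned long start, end;

	if (sscanf(line, "%lx-%lx", &start, &end) != 2)
		return false;
	return start < mmap_min_addr;
}
```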
- 30 Oct 2010, 1 commit
Committed by Al Viro
Normal syscall audit doesn't catch the 5th argument of a syscall. It also doesn't catch the contents of userland structures pointed to by syscall arguments, so for both the old and new mmap(2) ABI it doesn't record the descriptor we are mapping. For the old one it also misses the flags.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
- 23 Sep 2010, 1 commit
Committed by Andrea Arcangeli
If __split_vma fails because of an out of memory condition, the anon_vma_chain isn't torn down and freed, potentially leading to rmap walks accessing freed vma information; there's also a memleak.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Johannes Weiner <jweiner@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- 25 Aug 2010, 1 commit
Committed by Luck, Tony
pa-risc and ia64 have stacks that grow upwards. Check that they do not run into other mappings. By making VM_GROWSUP 0x0 on architectures that do not ever use it, we can avoid some unpleasant #ifdefs in check_stack_guard_page().

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- 21 Aug 2010, 1 commit
Committed by Linus Torvalds
It's a really simple list, and several of the users want to go backwards in it to find the previous vma. So rather than have to look up the previous entry with 'find_vma_prev()' or something similar, just make it doubly linked instead.

Tested-by: Ian Campbell <ijc@hellion.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
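With both links in place, insertion has to keep vm_next and vm_prev consistent. The following is a sketch of that idea rather than the kernel source: struct vma and vma_link_list() are simplified, hypothetical stand-ins that link a node after prev, or at the head of the list when prev is NULL.

```c
#include <stddef.h>

struct vma {
	unsigned long vm_start, vm_end;
	struct vma *vm_next, *vm_prev;
};

/* Insert vma after prev (or at the head when prev is NULL), fixing both directions. */
void vma_link_list(struct vma **head, struct vma *vma, struct vma *prev)
{
	struct vma *next;

	vma->vm_prev = prev;
	if (prev) {
		next = prev->vm_next;
		prev->vm_next = vma;
	} else {
		next = *head;
		*head = vma;
	}
	vma->vm_next = next;
	if (next)
		next->vm_prev = vma;   /* the back link that the doubly linked list adds */
}
```

Once the list is linked this way, the previous vma is simply vma->vm_prev, with no lookup needed.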
- 10 Aug 2010, 4 commits
Committed by Andrea Arcangeli
There's no anon-vma related mangling happening inside __vma_link anymore, so there's no need for anon_vma locking there.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Committed by Rik van Riel
Always (and only) lock the root (oldest) anon_vma whenever we do something in an anon_vma. The recently introduced anon_vma scalability is due to the rmap code scanning only the VMAs that need to be scanned. Many common operations still took the anon_vma lock on the root anon_vma, so always taking that lock is not expected to introduce any scalability issues.

However, always taking the same lock does mean we only need to take one lock, which means rmap_walk on pages from any anon_vma in the vma is excluded from occurring during an munmap, expand_stack or other operation that needs to exclude rmap_walk and similar functions.

Also add the proper locking to vma_adjust.

Signed-off-by: Rik van Riel <riel@redhat.com>
Tested-by: Larry Woodman <lwoodman@redhat.com>
Acked-by: Larry Woodman <lwoodman@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Committed by Rik van Riel
Substitute a direct call of spin_lock(anon_vma->lock) with an inline function doing exactly the same. This makes it easier to do the substitution to the root anon_vma lock in a following patch.

We will deal with the handful of special locks (nested, dec_and_lock, etc) separately.

Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Larry Woodman <lwoodman@redhat.com>
Acked-by: Larry Woodman <lwoodman@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Committed by Rik van Riel
Rename anon_vma_lock to vma_lock_anon_vma. This matches the naming style used in page_lock_anon_vma and will come in really handy further down in this patch series.

Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: Larry Woodman <lwoodman@redhat.com>
Acked-by: Larry Woodman <lwoodman@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- 09 Jun 2010, 1 commit
Committed by Eric B Munson
Add the capability to track data mmap()s. This can be used together with PERF_SAMPLE_ADDR for data profiling.

Signed-off-by: Anton Blanchard <anton@samba.org>
[Updated code for stable perf ABI]
Signed-off-by: Eric B Munson <ebmunson@us.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1274193049-25997-1-git-send-email-ebmunson@us.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
- 27 Apr 2010, 1 commit
Committed by Rik van Riel
Check whether the VMA has a vm_ops before calling close, just like we check vm_ops before calling open a few dozen lines higher up in the function.

Signed-off-by: Rik van Riel <riel@redhat.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- 13 Apr 2010, 2 commits
Committed by Linus Torvalds
When we move the boundaries between two vma's due to things like mprotect, we need to make sure that the anon_vma of the pages that got moved from one vma to another gets properly copied around. And that was not always the case, in this rather hard-to-follow code sequence.

Clarify the code, and fix it so that it copies the anon_vma from the right source.

Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Borislav Petkov <bp@alien8.de> [ "Yeah, not so much this one either" ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Committed by Linus Torvalds
This changes the anon_vma reuse case to require that we only reuse simple anon_vma's - ie the case when the vma only has a single anon_vma associated with it.

This means that a reuse of an anon_vma from an adjacent vma will always guarantee that both vma's are associated not only with the same anon_vma, they will also have the same anon_vma chain (of just a single entry in this case).

And since anon_vma re-use was the only case where the same anon_vma might be associated with different chains of anon_vma's, we now have the case that every vma that shares the same anon_vma will always also have the same chain. That makes it much easier to think about merging vma's that share the same anon_vma's: you can always just drop the other anon_vma chain in anon_vma_merge() since you know that they are always identical.

This also splits up the function to validate the anon_vma re-use, and adds a lot of commentary about the possible races.

Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Tested-by: Borislav Petkov <bp@alien8.de> [ "That didn't fix it" ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>