- 11 1月, 2012 5 次提交
-
-
由 Konstantin Khlebnikov 提交于
It not exported and now nobody uses it. Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: NMinchan Kim <minchan.kim@gmail.com> Acked-by: NHugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Konstantin Khlebnikov 提交于
This patch adds helper free_hot_cold_page_list() to free list of 0-order pages. It frees pages directly from list without temporary page-vector. It also calls trace_mm_pagevec_free() to simulate pagevec_free() behaviour. bloat-o-meter: add/remove: 1/1 grow/shrink: 1/3 up/down: 267/-295 (-28) function old new delta free_hot_cold_page_list - 264 +264 get_page_from_freelist 2129 2132 +3 __pagevec_free 243 239 -4 split_free_page 380 373 -7 release_pages 606 510 -96 free_page_list 188 - -188 Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: NMinchan Kim <minchan.kim@gmail.com> Acked-by: NHugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Konstantin Khlebnikov 提交于
Logic added in commit 8cab4754 ("vmscan: make mapped executable pages the first class citizen") was noticeably weakened in commit 64574746 ("vmscan: detect mapped file pages used only once"). Currently these pages can become "first class citizens" only after second usage. After this patch page_check_references() will activate they after first usage, and executable code gets yet better chance to stay in memory. Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Shaohua Li <shaohua.li@intel.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Konstantin Khlebnikov 提交于
Commit 64574746 ("vmscan: detect mapped file pages used only once") greatly decreases lifetime of single-used mapped file pages. Unfortunately it also decreases life time of all shared mapped file pages. Because after commit bf3f3bc5 ("mm: don't mark_page_accessed in fault path") page-fault handler does not mark page active or even referenced. Thus page_check_references() activates file page only if it was used twice while it stays in inactive list, meanwhile it activates anon pages after first access. Inactive list can be small enough, this way reclaimer can accidentally throw away any widely used page if it wasn't used twice in short period. After this patch page_check_references() also activate file mapped page at first inactive list scan if this page is already used multiple times via several ptes. I found this while trying to fix degragation in rhel6 (~2.6.32) from rhel5 (~2.6.18). There a complete mess with >100 web/mail/spam/ftp containers, they share all their files but there a lot of anonymous pages: ~500mb shared file mapped memory and 15-20Gb non-shared anonymous memory. In this situation major-pagefaults are very costly, because all containers share the same page. In my load kernel created a disproportionate pressure on the file memory, compared with the anonymous, they equaled only if I raise swappiness up to 150 =) These patches actually wasn't helped a lot in my problem, but I saw noticable (10-20 times) reduce in count and average time of major-pagefault in file-mapped areas. Actually both patches are fixes for commit v2.6.33-5448-g64574746, because it was aimed at one scenario (singly used pages), but it breaks the logic in other scenarios (shared and/or executable pages) Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org> Acked-by: NPekka Enberg <penberg@kernel.org> Acked-by: NMinchan Kim <minchan.kim@gmail.com> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Shaohua Li <shaohua.li@intel.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
The tracing ring-buffer used this function briefly, but not anymore. Make it local to the writeback code again. Also, move the function so that no forward declaration needs to be reintroduced. Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NMel Gorman <mgorman@suse.de> Reviewed-by: NMichal Hocko <mhocko@suse.cz> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 1月, 2012 1 次提交
-
-
由 Steven Rostedt 提交于
Including trace/events/*.h TRACE_EVENT() macro headers in other headers can cause strange side effects if another trace/event/*.h header includes that header. Having trace/events/kmem.h inside slab_def.h caused a compile error in sparc64 when changes were done to some header files. Moving the kmem.h trace header out of slab.h and into slab.c fixes the problem. Note, both slub.c and slob.c already include the trace/events/kmem.h file. Only slab.c had it missing. Link: http://lkml.kernel.org/r/20120105190405.1e3191fb5a43b2a0f1655e1f@canb.auug.org.auReported-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 1月, 2012 1 次提交
-
-
由 Glauber Costa 提交于
Sockets can also be created through sock_clone. Because it copies all data in the sock structure, it also copies the memcg-related pointer, and all should be fine. However, since we now use reference counts in socket creation, we are left with some sockets that have no reference counts. It matters when we destroy them, since it leads to a mismatch. Signed-off-by: NGlauber Costa <glommer@parallels.com> CC: David S. Miller <davem@davemloft.net> CC: Greg Thelen <gthelen@google.com> CC: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: Laurent Chavey <chavey@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 1月, 2012 1 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 04 1月, 2012 9 次提交
-
-
由 Jan Beulich 提交于
Just like the per-CPU ones they had several problems/shortcomings: Only the first memory operand was mentioned in the asm() operands, and the 2x64-bit version didn't have a memory clobber while the 2x32-bit one did. The former allowed the compiler to not recognize the need to re-load the data in case it had it cached in some register, while the latter was overly destructive. The types of the local copies of the old and new values were incorrect (the types of the pointed-to variables should be used here, to make sure the respective old/new variable types are compatible). The __dummy/__junk variables were pointless, given that local copies of the inputs already existed (and can hence be used for discarded outputs). The 32-bit variant of cmpxchg_double_local() referenced cmpxchg16b_local(). At once also: - change the return value type to what it really is: 'bool' - unify 32- and 64-bit variants - abstract out the common part of the 'normal' and 'local' variants Signed-off-by: NJan Beulich <jbeulich@suse.com> Cc: Christoph Lameter <cl@linux.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/4F01F12A020000780006A19B@nat28.tlf.novell.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
vfs_create() ignores everything outside of 16bit subset of its mode argument; switching it to umode_t is obviously equivalent and it's the only caller of the method Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
vfs_mkdir() gets int, but immediately drops everything that might not fit into umode_t and that's the only caller of ->mkdir()... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Move invalidate_bdev, block_sync_page into fs/block_dev.c. Export kill_bdev as well, so brd doesn't have to open code it. Reduce buffer_head.h requirement accordingly. Removed a rather large comment from invalidate_bdev, as it looked a bit obsolete to bother moving. The small comment replacing it says enough. Signed-off-by: NNick Piggin <npiggin@suse.de> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Seeing that just about every destructor got that INIT_LIST_HEAD() copied into it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once(); the cost of taking it into inode_init_always() will be negligible for pipes and sockets and negative for everything else. Not to mention the removal of boilerplate code from ->destroy_inode() instances... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 30 12月, 2011 2 次提交
-
-
由 Hillf Danton 提交于
If a huge page is enqueued under the protection of hugetlb_lock, then the operation is atomic and safe. Signed-off-by: NHillf Danton <dhillf@gmail.com> Reviewed-by: NMichal Hocko <mhocko@suse.cz> Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: <stable@vger.kernel.org> [2.6.37+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KOSAKI Motohiro 提交于
commit 8aacc9f5 ("mm/mempolicy.c: fix pgoff in mbind vma merge") is the slightly incorrect fix. Why? Think following case. 1. map 4 pages of a file at offset 0 [0123] 2. map 2 pages just after the first mapping of the same file but with page offset 2 [0123][23] 3. mbind() 2 pages from the first mapping at offset 2. mbind_range() should treat new vma is, [0123][23] |23| mbind vma but it does [0123][23] |01| mbind vma Oops. then, it makes wrong vma merge and splitting ([01][0123] or similar). This patch fixes it. [testcase] test result - before the patch case4: 126: test failed. expect '2,4', actual '2,2,2' case5: passed case6: passed case7: passed case8: passed case_n: 246: test failed. expect '4,2', actual '1,4' ------------[ cut here ]------------ kernel BUG at mm/filemap.c:135! invalid opcode: 0000 [#4] SMP DEBUG_PAGEALLOC (snip long bug on messages) test result - after the patch case4: passed case5: passed case6: passed case7: passed case8: passed case_n: passed source: mbind_vma_test.c ============================================================ #include <numaif.h> #include <numa.h> #include <sys/mman.h> #include <stdio.h> #include <unistd.h> #include <stdlib.h> #include <string.h> static unsigned long pagesize; void* mmap_addr; struct bitmask *nmask; char buf[1024]; FILE *file; char retbuf[10240] = ""; int mapped_fd; char *rubysrc = "ruby -e '\ pid = %d; \ vstart = 0x%llx; \ vend = 0x%llx; \ s = `pmap -q #{pid}`; \ rary = []; \ s.each_line {|line|; \ ary=line.split(\" \"); \ addr = ary[0].to_i(16); \ if(vstart <= addr && addr < vend) then \ rary.push(ary[1].to_i()/4); \ end; \ }; \ print rary.join(\",\"); \ '"; void init(void) { void* addr; char buf[128]; nmask = numa_allocate_nodemask(); numa_bitmask_setbit(nmask, 0); pagesize = getpagesize(); sprintf(buf, "%s", "mbind_vma_XXXXXX"); mapped_fd = mkstemp(buf); if (mapped_fd == -1) perror("mkstemp "), exit(1); unlink(buf); if (lseek(mapped_fd, pagesize*8, SEEK_SET) < 0) perror("lseek "), exit(1); if (write(mapped_fd, "\0", 1) < 0) perror("write "), exit(1); addr = mmap(NULL, pagesize*8, PROT_NONE, MAP_SHARED, mapped_fd, 0); if (addr == MAP_FAILED) perror("mmap "), exit(1); if (mprotect(addr+pagesize, pagesize*6, PROT_READ|PROT_WRITE) < 0) perror("mprotect "), exit(1); mmap_addr = addr + pagesize; /* make page populate */ memset(mmap_addr, 0, pagesize*6); } void fin(void) { void* addr = mmap_addr - pagesize; munmap(addr, pagesize*8); memset(buf, 0, sizeof(buf)); memset(retbuf, 0, sizeof(retbuf)); } void mem_bind(int index, int len) { int err; err = mbind(mmap_addr+pagesize*index, pagesize*len, MPOL_BIND, nmask->maskp, nmask->size, 0); if (err) perror("mbind "), exit(err); } void mem_interleave(int index, int len) { int err; err = mbind(mmap_addr+pagesize*index, pagesize*len, MPOL_INTERLEAVE, nmask->maskp, nmask->size, 0); if (err) perror("mbind "), exit(err); } void mem_unbind(int index, int len) { int err; err = mbind(mmap_addr+pagesize*index, pagesize*len, MPOL_DEFAULT, NULL, 0, 0); if (err) perror("mbind "), exit(err); } void Assert(char *expected, char *value, char *name, int line) { if (strcmp(expected, value) == 0) { fprintf(stderr, "%s: passed\n", name); return; } else { fprintf(stderr, "%s: %d: test failed. expect '%s', actual '%s'\n", name, line, expected, value); // exit(1); } } /* AAAA PPPPPPNNNNNN might become PPNNNNNNNNNN case 4 below */ void case4(void) { init(); sprintf(buf, rubysrc, getpid(), mmap_addr, mmap_addr+pagesize*6); mem_bind(0, 4); mem_unbind(2, 2); file = popen(buf, "r"); fread(retbuf, sizeof(retbuf), 1, file); Assert("2,4", retbuf, "case4", __LINE__); fin(); } /* AAAA PPPPPPNNNNNN might become PPPPPPPPPPNN case 5 below */ void case5(void) { init(); sprintf(buf, rubysrc, getpid(), mmap_addr, mmap_addr+pagesize*6); mem_bind(0, 2); mem_bind(2, 2); file = popen(buf, "r"); fread(retbuf, sizeof(retbuf), 1, file); Assert("4,2", retbuf, "case5", __LINE__); fin(); } /* AAAA PPPPNNNNXXXX might become PPPPPPPPPPPP 6 */ void case6(void) { init(); sprintf(buf, rubysrc, getpid(), mmap_addr, mmap_addr+pagesize*6); mem_bind(0, 2); mem_bind(4, 2); mem_bind(2, 2); file = popen(buf, "r"); fread(retbuf, sizeof(retbuf), 1, file); Assert("6", retbuf, "case6", __LINE__); fin(); } /* AAAA PPPPNNNNXXXX might become PPPPPPPPXXXX 7 */ void case7(void) { init(); sprintf(buf, rubysrc, getpid(), mmap_addr, mmap_addr+pagesize*6); mem_bind(0, 2); mem_interleave(4, 2); mem_bind(2, 2); file = popen(buf, "r"); fread(retbuf, sizeof(retbuf), 1, file); Assert("4,2", retbuf, "case7", __LINE__); fin(); } /* AAAA PPPPNNNNXXXX might become PPPPNNNNNNNN 8 */ void case8(void) { init(); sprintf(buf, rubysrc, getpid(), mmap_addr, mmap_addr+pagesize*6); mem_bind(0, 2); mem_interleave(4, 2); mem_interleave(2, 2); file = popen(buf, "r"); fread(retbuf, sizeof(retbuf), 1, file); Assert("2,4", retbuf, "case8", __LINE__); fin(); } void case_n(void) { init(); sprintf(buf, rubysrc, getpid(), mmap_addr, mmap_addr+pagesize*6); /* make redundunt mappings [0][1234][34][7] */ mmap(mmap_addr + pagesize*4, pagesize*2, PROT_READ|PROT_WRITE, MAP_FIXED|MAP_SHARED, mapped_fd, pagesize*3); /* Expect to do nothing. */ mem_unbind(2, 2); file = popen(buf, "r"); fread(retbuf, sizeof(retbuf), 1, file); Assert("4,2", retbuf, "case_n", __LINE__); fin(); } int main(int argc, char** argv) { case4(); case5(); case6(); case7(); case8(); case_n(); return 0; } ============================================================= Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Caspar Zhang <caspar@casparzhang.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: <stable@vger.kernel.org> [3.1.x] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 12月, 2011 2 次提交
-
-
由 Glauber Costa 提交于
This reverts commit e5671dfa. After a follow up discussion with Michal, it was agreed it would be better to leave the kmem controller with just the tcp files, deferring the behavior of the other general memory.kmem.* files for a later time, when more caches are controlled. This is because generic kmem files are not used by tcp accounting and it is not clear how other slab caches would fit into the scheme. We are reverting the original commit so we can track the reference. Part of the patch is kept, because it was used by the later tcp code. Conflicts are shown in the bottom. init/Kconfig is removed from the revert entirely. Signed-off-by: NGlauber Costa <glommer@parallels.com> Acked-by: NMichal Hocko <mhocko@suse.cz> CC: Kirill A. Shutemov <kirill@shutemov.name> CC: Paul Menage <paul@paulmenage.org> CC: Greg Thelen <gthelen@google.com> CC: Johannes Weiner <jweiner@redhat.com> CC: David S. Miller <davem@davemloft.net> Conflicts: Documentation/cgroups/memory.txt mm/memcontrol.c Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Christoph Lameter 提交于
We simply say that regular this_cpu use must be safe regardless of preemption and interrupt state. That has no material change for x86 and s390 implementations of this_cpu operations. However, arches that do not provide their own implementation for this_cpu operations will now get code generated that disables interrupts instead of preemption. -tj: This is part of on-going percpu API cleanup. For detailed discussion of the subject, please refer to the following thread. http://thread.gmane.org/gmane.linux.kernel/1222078Signed-off-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NTejun Heo <tj@kernel.org> LKML-Reference: <alpine.DEB.2.00.1112221154380.11787@router.home>
-
- 22 12月, 2011 2 次提交
-
-
由 Dave Kleikamp 提交于
lockdep reports a deadlock in jfs because a special inode's rw semaphore is taken recursively. The mapping's gfp mask is GFP_NOFS, but is not used when __read_cache_page() calls add_to_page_cache_lru(). Signed-off-by: NDave Kleikamp <dave.kleikamp@oracle.com> Acked-by: NHugh Dickins <hughd@google.com> Acked-by: NAl Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kay Sievers 提交于
This moves the 'memory sysdev_class' over to a regular 'memory' subsystem and converts the devices to regular devices. The sysdev drivers are implemented as subsystem interfaces now. After all sysdev classes are ported to regular driver core entities, the sysdev implementation will be entirely removed from the kernel. Signed-off-by: NKay Sievers <kay.sievers@vrfy.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
- 21 12月, 2011 3 次提交
-
-
由 Kautuk Consul 提交于
Static storage is not required for the struct vmap_area in __get_vm_area_node. Removing "static" to store this variable on the stack instead. Signed-off-by: NKautuk Consul <consul.kautuk@gmail.com> Acked-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Frantisek Hrbata 提交于
An integer overflow will happen on 64bit archs if task's sum of rss, swapents and nr_ptes exceeds (2^31)/1000 value. This was introduced by commit f755a042 oom: use pte pages in OOM score where the oom score computation was divided into several steps and it's no longer computed as one expression in unsigned long(rss, swapents, nr_pte are unsigned long), where the result value assigned to points(int) is in range(1..1000). So there could be an int overflow while computing 176 points *= 1000; and points may have negative value. Meaning the oom score for a mem hog task will be one. 196 if (points <= 0) 197 return 1; For example: [ 3366] 0 3366 35390480 24303939 5 0 0 oom01 Out of memory: Kill process 3366 (oom01) score 1 or sacrifice child Here the oom1 process consumes more than 24303939(rss)*4096~=92GB physical memory, but it's oom score is one. In this situation the mem hog task is skipped and oom killer kills another and most probably innocent task with oom score greater than one. The points variable should be of type long instead of int to prevent the int overflow. Signed-off-by: NFrantisek Hrbata <fhrbata@redhat.com> Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> [2.6.36+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hillf Danton 提交于
If the request is to create non-root group and we fail to meet it, we should leave the root unchanged. Signed-off-by: NHillf Danton <dhillf@gmail.com> Acked-by: NHugh Dickins <hughd@google.com> Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: NMichal Hocko <mhocko@suse.cz> Cc: Balbir Singh <bsingharora@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 12月, 2011 1 次提交
-
-
由 Eugene Surovegin 提交于
per_cpu_ptr_to_phys() incorrectly rounds up its result for non-kmalloc case to the page boundary, which is bogus for any non-page-aligned address. This affects the only in-tree user of this function - sysfs handler for per-cpu 'crash_notes' physical address. The trouble is that the crash_notes per-cpu variable is not page-aligned: crash_notes = 0xc08e8ed4 PER-CPU OFFSET VALUES: CPU 0: 3711f000 CPU 1: 37129000 CPU 2: 37133000 CPU 3: 3713d000 So, the per-cpu addresses are: crash_notes on CPU 0: f7a07ed4 => phys 36b57ed4 crash_notes on CPU 1: f7a11ed4 => phys 36b4ded4 crash_notes on CPU 2: f7a1bed4 => phys 36b43ed4 crash_notes on CPU 3: f7a25ed4 => phys 36b39ed4 However, /sys/devices/system/cpu/cpu*/crash_notes says: /sys/devices/system/cpu/cpu0/crash_notes: 36b57000 /sys/devices/system/cpu/cpu1/crash_notes: 36b4d000 /sys/devices/system/cpu/cpu2/crash_notes: 36b43000 /sys/devices/system/cpu/cpu3/crash_notes: 36b39000 As you can see, all values are rounded down to a page boundary. Consequently, this is where kexec sets up the NOTE segments, and thus where the secondary kernel is looking for them. However, when the first kernel crashes, it saves the notes to the unaligned addresses, where they are not found. Fix it by adding offset_in_page() to the translated page address. -tj: Combined Eugene's and Petr's commit messages. Signed-off-by: NEugene Surovegin <ebs@ebshome.net> Signed-off-by: NTejun Heo <tj@kernel.org> Reported-by: NPetr Tesarik <ptesarik@suse.cz> Cc: stable@kernel.org
-
- 13 12月, 2011 4 次提交
-
-
由 Tejun Heo 提交于
Currently, there's no way to pass multiple tasks to cgroup_subsys methods necessitating the need for separate per-process and per-task methods. This patch introduces cgroup_taskset which can be used to pass multiple tasks and their associated cgroups to cgroup_subsys methods. Three methods - can_attach(), cancel_attach() and attach() - are converted to use cgroup_taskset. This unifies passed parameters so that all methods have access to all information. Conversions in this patchset are identical and don't introduce any behavior change. -v2: documentation updated as per Paul Menage's suggestion. Signed-off-by: NTejun Heo <tj@kernel.org> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com> Acked-by: NPaul Menage <paul@paulmenage.org> Acked-by: NLi Zefan <lizf@cn.fujitsu.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: James Morris <jmorris@namei.org>
-
由 Glauber Costa 提交于
This patch introduces memory pressure controls for the tcp protocol. It uses the generic socket memory pressure code introduced in earlier patches, and fills in the necessary data in cg_proto struct. Signed-off-by: NGlauber Costa <glommer@parallels.com> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Glauber Costa 提交于
The goal of this work is to move the memory pressure tcp controls to a cgroup, instead of just relying on global conditions. To avoid excessive overhead in the network fast paths, the code that accounts allocated memory to a cgroup is hidden inside a static_branch(). This branch is patched out until the first non-root cgroup is created. So when nobody is using cgroups, even if it is mounted, no significant performance penalty should be seen. This patch handles the generic part of the code, and has nothing tcp-specific. Signed-off-by: NGlauber Costa <glommer@parallels.com> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtsu.com> CC: Kirill A. Shutemov <kirill@shutemov.name> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Glauber Costa 提交于
This patch lays down the foundation for the kernel memory component of the Memory Controller. As of today, I am only laying down the following files: * memory.independent_kmem_limit * memory.kmem.limit_in_bytes (currently ignored) * memory.kmem.usage_in_bytes (always zero) Signed-off-by: NGlauber Costa <glommer@parallels.com> CC: Kirill A. Shutemov <kirill@shutemov.name> CC: Paul Menage <paul@paulmenage.org> CC: Greg Thelen <gthelen@google.com> CC: Johannes Weiner <jweiner@redhat.com> CC: Michal Hocko <mhocko@suse.cz> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 12月, 2011 9 次提交
-
-
由 Mel Gorman 提交于
Commit f5252e00 ("mm: avoid null pointer access in vm_struct via /proc/vmallocinfo") adds newly allocated vm_structs to the vmlist after it is fully initialised. Unfortunately, it did not check that __vmalloc_area_node() successfully populated the area. In the event of allocation failure, the vmalloc area is freed but the pointer to freed memory is inserted into the vmlist leading to a a crash later in get_vmalloc_info(). This patch adds a check for ____vmalloc_area_node() failure within __vmalloc_node_range. It does not use "goto fail" as in the previous error path as a warning was already displayed by __vmalloc_area_node() before it called vfree in its failure path. Credit goes to Luciano Chavez for doing all the real work of identifying exactly where the problem was. Signed-off-by: NMel Gorman <mgorman@suse.de> Reported-by: NLuciano Chavez <lnx1138@linux.vnet.ibm.com> Tested-by: NLuciano Chavez <lnx1138@linux.vnet.ibm.com> Reviewed-by: NRik van Riel <riel@redhat.com> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: <stable@vger.kernel.org> [3.1.x+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
setup_zone_migrate_reserve() expects that zone->start_pfn starts at pageblock_nr_pages aligned pfn otherwise we could access beyond an existing memblock resulting in the following panic if CONFIG_HOLES_IN_ZONE is not configured and we do not check pfn_valid: IP: [<c02d331d>] setup_zone_migrate_reserve+0xcd/0x180 *pdpt = 0000000000000000 *pde = f000ff53f000ff53 Oops: 0000 [#1] SMP Pid: 1, comm: swapper Not tainted 3.0.7-0.7-pae #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform EIP: 0060:[<c02d331d>] EFLAGS: 00010006 CPU: 0 EIP is at setup_zone_migrate_reserve+0xcd/0x180 EAX: 000c0000 EBX: f5801fc0 ECX: 000c0000 EDX: 00000000 ESI: 000c01fe EDI: 000c01fe EBP: 00140000 ESP: f2475f58 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 Process swapper (pid: 1, ti=f2474000 task=f2472cd0 task.ti=f2474000) Call Trace: [<c02d389c>] __setup_per_zone_wmarks+0xec/0x160 [<c02d3a1f>] setup_per_zone_wmarks+0xf/0x20 [<c08a771c>] init_per_zone_wmark_min+0x27/0x86 [<c020111b>] do_one_initcall+0x2b/0x160 [<c086639d>] kernel_init+0xbe/0x157 [<c05cae26>] kernel_thread_helper+0x6/0xd Code: a5 39 f5 89 f7 0f 46 fd 39 cf 76 40 8b 03 f6 c4 08 74 32 eb 91 90 89 c8 c1 e8 0e 0f be 80 80 2f 86 c0 8b 14 85 60 2f 86 c0 89 c8 <2b> 82 b4 12 00 00 c1 e0 05 03 82 ac 12 00 00 8b 00 f6 c4 08 0f EIP: [<c02d331d>] setup_zone_migrate_reserve+0xcd/0x180 SS:ESP 0068:f2475f58 CR2: 00000000000012b4 We crashed in pageblock_is_reserved() when accessing pfn 0xc0000 because highstart_pfn = 0x36ffe. The issue was introduced in 3.0-rc1 by 6d3163ce ("mm: check if any page in a pageblock is reserved before marking it MIGRATE_RESERVE"). Make sure that start_pfn is always aligned to pageblock_nr_pages to ensure that pfn_valid s always called at the start of each pageblock. Architectures with holes in pageblocks will be correctly handled by pfn_valid_within in pageblock_is_reserved. Signed-off-by: NMichal Hocko <mhocko@suse.cz> Signed-off-by: NMel Gorman <mgorman@suse.de> Tested-by: NDang Bo <bdang@vmware.com> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Arve Hjnnevg <arve@android.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> [3.0+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hillf Danton 提交于
Avoid unlocking and unlocked page if we failed to lock it. Signed-off-by: NHillf Danton <dhillf@gmail.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Youquan Song 提交于
Commit 70b50f94 ("mm: thp: tail page refcounting fix") keeps all page_tail->_count zero at all times. But the current kernel does not set page_tail->_count to zero if a 1GB page is utilized. So when an IOMMU 1GB page is used by KVM, it wil result in a kernel oops because a tail page's _count does not equal zero. kernel BUG at include/linux/mm.h:386! invalid opcode: 0000 [#1] SMP Call Trace: gup_pud_range+0xb8/0x19d get_user_pages_fast+0xcb/0x192 ? trace_hardirqs_off+0xd/0xf hva_to_pfn+0x119/0x2f2 gfn_to_pfn_memslot+0x2c/0x2e kvm_iommu_map_pages+0xfd/0x1c1 kvm_iommu_map_memslots+0x7c/0xbd kvm_iommu_map_guest+0xaa/0xbf kvm_vm_ioctl_assigned_device+0x2ef/0xa47 kvm_vm_ioctl+0x36c/0x3a2 do_vfs_ioctl+0x49e/0x4e4 sys_ioctl+0x5a/0x7c system_call_fastpath+0x16/0x1b RIP gup_huge_pud+0xf2/0x159 Signed-off-by: NYouquan Song <youquan.song@intel.com> Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrea Arcangeli 提交于
khugepaged can sometimes cause suspend to fail, requiring that the user retry the suspend operation. Use wait_event_freezable_timeout() instead of schedule_timeout_interruptible() to avoid missing freezer wakeups. A try_to_freeze() would have been needed in the khugepaged_alloc_hugepage tight loop too in case of the allocation failing repeatedly, and wait_event_freezable_timeout will provide it too. khugepaged would still freeze just fine by trying again the next minute but it's better if it freezes immediately. Reported-by: NJiri Slaby <jslaby@suse.cz> Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Tested-by: NJiri Slaby <jslaby@suse.cz> Cc: Tejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com> Cc: "Rafael J. Wysocki" <rjw@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Konstantin Khlebnikov 提交于
Use atomic-long operations instead of looping around cmpxchg(). [akpm@linux-foundation.org: massage atomic.h inclusions] Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Konstantin Khlebnikov 提交于
A shrinker function can return -1, means that it cannot do anything without a risk of deadlock. For example prune_super() does this if it cannot grab a superblock refrence, even if nr_to_scan=0. Currently we interpret this -1 as a ULONG_MAX size shrinker and evaluate `total_scan' according to this. So the next time around this shrinker can cause really big pressure. Let's skip such shrinkers instead. Also make total_scan signed, otherwise the check (total_scan < 0) below never works. Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tejun Heo 提交于
Now that all early memory information is in memblock when enabled, we can implement reverse free area iterator and use it to implement NUMA aware allocator which is then wrapped for simpler variants instead of the confusing and inefficient mending of information in separate NUMA aware allocator. Implement for_each_free_mem_range_reverse(), use it to reimplement memblock_find_in_range_node() which in turn is used by all allocators. The visible allocator interface is inconsistent and can probably use some cleanup too. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org>
-
由 Tejun Heo 提交于
Now all ARCH_POPULATES_NODE_MAP archs select HAVE_MEBLOCK_NODE_MAP - there's no user of early_node_map[] left. Kill early_node_map[] and replace ARCH_POPULATES_NODE_MAP with HAVE_MEMBLOCK_NODE_MAP. Also, relocate for_each_mem_pfn_range() and helper from mm.h to memblock.h as page_alloc.c would no longer host an alternative implementation. This change is ultimately one to one mapping and shouldn't cause any observable difference; however, after the recent changes, there are some functions which now would fit memblock.c better than page_alloc.c and dependency on HAVE_MEMBLOCK_NODE_MAP instead of HAVE_MEMBLOCK doesn't make much sense on some of them. Further cleanups for functions inside HAVE_MEMBLOCK_NODE_MAP in mm.h would be nice. -v2: Fix compile bug introduced by mis-spelling CONFIG_HAVE_MEMBLOCK_NODE_MAP to CONFIG_MEMBLOCK_HAVE_NODE_MAP in mmzone.h. Reported by Stephen Rothwell. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Chen Liqin <liqin.chen@sunplusct.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: "H. Peter Anvin" <hpa@zytor.com>
-