- 24 12月, 2010 2 次提交
-
-
由 Paul Mundt 提交于
Now that these have been introduced in to the vmalloc API, sync up the nommu side of things. At present we don't deal with VMAs as such, so for the time being these will simply BUG() out. In the future it should be possible to support this interface by layering on top of the vm_regions. Signed-off-by: NPaul Mundt <lethal@linux-sh.org>
-
由 Paul Mundt 提交于
Commit e1ca7788 ("mm: add vzalloc() and vzalloc_node() helpers") ended up accidentally deleting the vmalloc_node() symbol export, resulting in: "vmalloc_node" [net/core/pktgen.ko] undefined! "vmalloc_node" [net/netfilter/x_tables.ko] undefined! regressions. Signed-off-by: NPaul Mundt <lethal@linux-sh.org>
-
- 23 12月, 2010 3 次提交
-
-
由 Michal Nazarewicz 提交于
GCC complained about update_mmu_cache() not being defined in migrate.c. Including <asm/tlbflush.h> seems to solve the problem. Signed-off-by: NMichal Nazarewicz <m.nazarewicz@samsung.com> Signed-off-by: NKyungmin Park <kyungmin.park@samsung.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wu Fengguang 提交于
Using TASK_INTERRUPTIBLE in balance_dirty_pages() seems wrong. If it's going to do that then it must break out if signal_pending(), otherwise it's pretty much guaranteed to degenerate into a busywait loop. Plus we *do* want these processes to appear in D state and to contribute to load average. So it should be TASK_UNINTERRUPTIBLE. -- Andrew Morton Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Minchan Kim 提交于
del_page_from_lru_list() already called mem_cgroup_del_lru(). So we must not call it again. It adds unnecessary overhead. It was not a runtime bug because the TestClearPageCgroupAcctLRU() early in mem_cgroup_del_lru_list() will prevent any double-deletion, etc. Signed-off-by: NMinchan Kim <minchan.kim@gmail.com> Acked-by: NBalbir Singh <balbir@linux.vnet.ibm.com> Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: NMel Gorman <mel@csn.ul.ie> Reviewed-by: NJohannes Weiner <hannes@cmpxchg.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 12月, 2010 1 次提交
-
-
由 Tavis Ormandy 提交于
The install_special_mapping routine (used, for example, to setup the vdso) skips the security check before insert_vm_struct, allowing a local attacker to bypass the mmap_min_addr security restriction by limiting the available pages for special mappings. bprm_mm_init() also skips the check, and although I don't think this can be used to bypass any restrictions, I don't see any reason not to have the security check. $ uname -m x86_64 $ cat /proc/sys/vm/mmap_min_addr 65536 $ cat install_special_mapping.s section .bss resb BSS_SIZE section .text global _start _start: mov eax, __NR_pause int 0x80 $ nasm -D__NR_pause=29 -DBSS_SIZE=0xfffed000 -f elf -o install_special_mapping.o install_special_mapping.s $ ld -m elf_i386 -Ttext=0x10000 -Tbss=0x11000 -o install_special_mapping install_special_mapping.o $ ./install_special_mapping & [1] 14303 $ cat /proc/14303/maps 0000f000-00010000 r-xp 00000000 00:00 0 [vdso] 00010000-00011000 r-xp 00001000 00:19 2453665 /home/taviso/install_special_mapping 00011000-ffffe000 rwxp 00000000 00:00 0 [stack] It's worth noting that Red Hat are shipping with mmap_min_addr set to 4096. Signed-off-by: NTavis Ormandy <taviso@google.com> Acked-by: NKees Cook <kees@ubuntu.com> Acked-by: NRobert Swiecki <swiecki@google.com> [ Changed to not drop the error code - akpm ] Reviewed-by: NJames Morris <jmorris@namei.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 07 12月, 2010 1 次提交
-
-
由 Rafael J. Wysocki 提交于
There is a problem that swap pages allocated before the creation of a hibernation image can be released and used for storing the contents of different memory pages while the image is being saved. Since the kernel stored in the image doesn't know of that, it causes memory corruption to occur after resume from hibernation, especially on systems with relatively small RAM that need to swap often. This issue can be addressed by keeping the GFP_IOFS bits clear in gfp_allowed_mask during the entire hibernation, including the saving of the image, until the system is finally turned off or the hibernation is aborted. Unfortunately, for this purpose it's necessary to rework the way in which the hibernate and suspend code manipulates gfp_allowed_mask. This change is based on an earlier patch from Hugh Dickins. Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl> Reported-by: NOndrej Zary <linux@rainbow-software.org> Acked-by: NHugh Dickins <hughd@google.com> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: stable@kernel.org
-
- 04 12月, 2010 1 次提交
-
-
由 Tero Roponen 提交于
Commit f7cb1933 ("SLUB: Pass active and inactive redzone flags instead of boolean to debug functions") missed two instances of check_object(). This caused a lot of warnings during 'slabinfo -v' finally leading to a crash: BUG ext4_xattr: Freepointer corrupt ... BUG buffer_head: Freepointer corrupt ... BUG ext4_alloc_context: Freepointer corrupt ... ... BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 IP: [<ffffffff810a291f>] file_sb_list_del+0x1c/0x35 PGD 79d78067 PUD 79e67067 PMD 0 Oops: 0002 [#1] SMP last sysfs file: /sys/kernel/slab/:t-0000192/validate This patch fixes the problem by converting the two missed instances. Acked-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NTero Roponen <tero.roponen@gmail.com> Signed-off-by: NPekka Enberg <penberg@kernel.org>
-
- 03 12月, 2010 6 次提交
-
-
由 KOSAKI Motohiro 提交于
commit 62b61f61 ("ksm: memory hotremove migration only") caused the following new lockdep warning. ======================================================= [ INFO: possible circular locking dependency detected ] ------------------------------------------------------- bash/1621 is trying to acquire lock: ((memory_chain).rwsem){.+.+.+}, at: [<ffffffff81079339>] __blocking_notifier_call_chain+0x69/0xc0 but task is already holding lock: (ksm_thread_mutex){+.+.+.}, at: [<ffffffff8113a3aa>] ksm_memory_callback+0x3a/0xc0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (ksm_thread_mutex){+.+.+.}: [<ffffffff8108b70a>] lock_acquire+0xaa/0x140 [<ffffffff81505d74>] __mutex_lock_common+0x44/0x3f0 [<ffffffff81506228>] mutex_lock_nested+0x48/0x60 [<ffffffff8113a3aa>] ksm_memory_callback+0x3a/0xc0 [<ffffffff8150c21c>] notifier_call_chain+0x8c/0xe0 [<ffffffff8107934e>] __blocking_notifier_call_chain+0x7e/0xc0 [<ffffffff810793a6>] blocking_notifier_call_chain+0x16/0x20 [<ffffffff813afbfb>] memory_notify+0x1b/0x20 [<ffffffff81141b7c>] remove_memory+0x1cc/0x5f0 [<ffffffff813af53d>] memory_block_change_state+0xfd/0x1a0 [<ffffffff813afd62>] store_mem_state+0xe2/0xf0 [<ffffffff813a0bb0>] sysdev_store+0x20/0x30 [<ffffffff811bc116>] sysfs_write_file+0xe6/0x170 [<ffffffff8114f398>] vfs_write+0xc8/0x190 [<ffffffff8114fc14>] sys_write+0x54/0x90 [<ffffffff810028b2>] system_call_fastpath+0x16/0x1b -> #0 ((memory_chain).rwsem){.+.+.+}: [<ffffffff8108b5ba>] __lock_acquire+0x155a/0x1600 [<ffffffff8108b70a>] lock_acquire+0xaa/0x140 [<ffffffff81506601>] down_read+0x51/0xa0 [<ffffffff81079339>] __blocking_notifier_call_chain+0x69/0xc0 [<ffffffff810793a6>] blocking_notifier_call_chain+0x16/0x20 [<ffffffff813afbfb>] memory_notify+0x1b/0x20 [<ffffffff81141f1e>] remove_memory+0x56e/0x5f0 [<ffffffff813af53d>] memory_block_change_state+0xfd/0x1a0 [<ffffffff813afd62>] store_mem_state+0xe2/0xf0 [<ffffffff813a0bb0>] sysdev_store+0x20/0x30 [<ffffffff811bc116>] sysfs_write_file+0xe6/0x170 [<ffffffff8114f398>] vfs_write+0xc8/0x190 [<ffffffff8114fc14>] sys_write+0x54/0x90 [<ffffffff810028b2>] system_call_fastpath+0x16/0x1b But it's a false positive. Both memory_chain.rwsem and ksm_thread_mutex have an outer lock (mem_hotplug_mutex). So they cannot deadlock. Thus, This patch annotate ksm_thread_mutex is not deadlock source. [akpm@linux-foundation.org: update comment, from Hugh] Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Acked-by: NHugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KOSAKI Motohiro 提交于
Presently hwpoison is using lock_system_sleep() to prevent a race with memory hotplug. However lock_system_sleep() is a no-op if CONFIG_HIBERNATION=n. Therefore we need a new lock. Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Suggested-by: NHugh Dickins <hughd@google.com> Acked-by: NHugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jeremy Fitzhardinge 提交于
On stock 2.6.37-rc4, running: # mount lilith:/export /mnt/lilith # find /mnt/lilith/ -type f -print0 | xargs -0 file crashes the machine fairly quickly under Xen. Often it results in oops messages, but the couple of times I tried just now, it just hung quietly and made Xen print some rude messages: (XEN) mm.c:2389:d80 Bad type (saw 7400000000000001 != exp 3000000000000000) for mfn 1d7058 (pfn 18fa7) (XEN) mm.c:964:d80 Attempt to create linear p.t. with write perms (XEN) mm.c:2389:d80 Bad type (saw 7400000000000010 != exp 1000000000000000) for mfn 1d2e04 (pfn 1d1fb) (XEN) mm.c:2965:d80 Error while pinning mfn 1d2e04 Which means the domain tried to map a pagetable page RW, which would allow it to map arbitrary memory, so Xen stopped it. This is because vm_unmap_ram() left some pages mapped in the vmalloc area after NFS had finished with them, and those pages got recycled as pagetable pages while still having these RW aliases. Removing those mappings immediately removes the Xen-visible aliases, and so it has no problem with those pages being reused as pagetable pages. Deferring the TLB flush doesn't upset Xen because it can flush the TLB itself as needed to maintain its invariants. When unmapping a region in the vmalloc space, clear the ptes immediately. There's no point in deferring this because there's no amortization benefit. The TLBs are left dirty, and they are flushed lazily to amortize the cost of the IPIs. This specific motivation for this patch is an oops-causing regression since 2.6.36 when using NFS under Xen, triggered by the NFS client's use of vm_map_ram() introduced in 56e4ebf8 ("NFS: readdir with vmapped pages") . XFS also uses vm_map_ram() and could cause similar problems. Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Bryan Schumaker <bjschuma@netapp.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Alex Elder <aelder@sgi.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wu Fengguang 提交于
The nr_dirty_[background_]threshold fields are misplaced before the numa_* fields, and users will read strange values. This is the right order. Before patch, nr_dirty_background_threshold will read as 0 (the value from numa_miss). numa_hit 128501 numa_miss 0 numa_foreign 0 numa_interleave 7388 numa_local 128501 numa_other 0 nr_dirty_threshold 144291 nr_dirty_background_threshold 72145 Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Cc: Michael Rubin <mrubin@google.com> Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: NMinchan Kim <minchan.kim@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Zeng Zhaoming 提交于
find_task_by_vpid() should be protected by rcu_read_lock(), to prevent free_pid() reclaiming pid. Signed-off-by: NZeng Zhaoming <zengzm.kernel@gmail.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dean Nelson 提交于
Have hugetlb_fault() call unlock_page(page) only if it had previously called lock_page(page). Setting CONFIG_DEBUG_VM=y and then running the libhugetlbfs test suite, resulted in the tripping of VM_BUG_ON(!PageLocked(page)) in unlock_page() having been called by hugetlb_fault() when page == pagecache_page. This patch remedied the problem. Signed-off-by: NDean Nelson <dnelson@redhat.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 12月, 2010 1 次提交
-
-
由 Linus Torvalds 提交于
NFS needs to be able to release objects that are stored in the page cache once the page itself is no longer visible from the page cache. This patch adds a callback to the address space operations that allows filesystems to perform page cleanups once the page has been removed from the page cache. Original patch by: Linus Torvalds <torvalds@linux-foundation.org> [trondmy: cover the cases of invalidate_inode_pages2() and truncate_inode_pages()] Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
- 25 11月, 2010 6 次提交
-
-
由 David Sterba 提交于
Commit d33b9f45 ("mm: hugetlb: fix hugepage memory leak in walk_page_range()") introduces a check if a vma is a hugetlbfs one and later in 5dc37642 ("mm hugetlb: add hugepage support to pagemap") it is moved under #ifdef CONFIG_HUGETLB_PAGE but a needless find_vma call is left behind and its result is not used anywhere else in the function. The side-effect of caching vma for @addr inside walk->mm is neither utilized in walk_page_range() nor in called functions. Signed-off-by: NDavid Sterba <dsterba@suse.cz> Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: NAndi Kleen <ak@linux.intel.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Matt Mackall <mpm@selenic.com> Acked-by: NMel Gorman <mel@csn.ul.ie> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KAMEZAWA Hiroyuki 提交于
mm/page_alloc.c: fix build_all_zonelist() where percpu_alloc() is wrongly called under stop_machine_run() During memory hotplug, build_allzonelists() may be called under stop_machine_run(). In this function, setup_zone_pageset() is called. But it's bug because it will do page allocation under stop_machine_run(). Here is a report from Alok Kataria. BUG: sleeping function called from invalid context at kernel/mutex.c:94 in_atomic(): 0, irqs_disabled(): 1, pid: 4, name: migration/0 Pid: 4, comm: migration/0 Not tainted 2.6.35.6-45.fc14.x86_64 #1 Call Trace: [<ffffffff8103d12b>] __might_sleep+0xeb/0xf0 [<ffffffff81468245>] mutex_lock+0x24/0x50 [<ffffffff8110eaa6>] pcpu_alloc+0x6d/0x7ee [<ffffffff81048888>] ? load_balance+0xbe/0x60e [<ffffffff8103a1b3>] ? rt_se_boosted+0x21/0x2f [<ffffffff8103e1cf>] ? dequeue_rt_stack+0x18b/0x1ed [<ffffffff8110f237>] __alloc_percpu+0x10/0x12 [<ffffffff81465e22>] setup_zone_pageset+0x38/0xbe [<ffffffff810d6d81>] ? build_zonelists_node.clone.58+0x79/0x8c [<ffffffff81452539>] __build_all_zonelists+0x419/0x46c [<ffffffff8108ef01>] ? cpu_stopper_thread+0xb2/0x198 [<ffffffff8108f075>] stop_machine_cpu_stop+0x8e/0xc5 [<ffffffff8108efe7>] ? stop_machine_cpu_stop+0x0/0xc5 [<ffffffff8108ef57>] cpu_stopper_thread+0x108/0x198 [<ffffffff81467a37>] ? schedule+0x5b2/0x5cc [<ffffffff8108ee4f>] ? cpu_stopper_thread+0x0/0x198 [<ffffffff81065f29>] kthread+0x7f/0x87 [<ffffffff8100aae4>] kernel_thread_helper+0x4/0x10 [<ffffffff81065eaa>] ? kthread+0x0/0x87 [<ffffffff8100aae0>] ? kernel_thread_helper+0x0/0x10 Built 5 zonelists in Node order, mobility grouping on. Total pages: 289456 Policy zone: Normal This patch tries to fix the issue by moving setup_zone_pageset() out from stop_machine_run(). It's obviously not necessary to be called under stop_machine_run(). [akpm@linux-foundation.org: remove unneeded local] Reported-by: NAlok Kataria <akataria@vmware.com> Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Petr Vandrovec <petr@vmware.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Reviewed-by: NChristoph Lameter <cl@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
Swap accounting can be configured by CONFIG_CGROUP_MEM_RES_CTLR_SWAP configuration option and then it is turned on by default. There is a boot option (noswapaccount) which can disable this feature. This makes it hard for distributors to enable the configuration option as this feature leads to a bigger memory consumption and this is a no-go for general purpose distribution kernel. On the other hand swap accounting may be very usuful for some workloads. This patch adds a new configuration option which controls the default behavior (CGROUP_MEM_RES_CTLR_SWAP_ENABLED). If the option is selected then the feature is turned on by default. It also adds a new boot parameter swapaccount[=1|0] which enhances the original noswapaccount parameter semantic by means of enable/disable logic (defaults to 1 if no value is provided to be still consistent with noswapaccount). The default behavior is unchanged (if CONFIG_CGROUP_MEM_RES_CTLR_SWAP is enabled then CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED is enabled as well) Signed-off-by: NMichal Hocko <mhocko@suse.cz> Acked-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Daisuke Nishimura 提交于
__mem_cgroup_try_charge() can be called under down_write(&mmap_sem)(e.g. mlock does it). This means it can cause deadlock if it races with move charge: Ex.1) move charge | try charge --------------------------------------+------------------------------ mem_cgroup_can_attach() | down_write(&mmap_sem) mc.moving_task = current | .. mem_cgroup_precharge_mc() | __mem_cgroup_try_charge() mem_cgroup_count_precharge() | prepare_to_wait() down_read(&mmap_sem) | if (mc.moving_task) -> cannot aquire the lock | -> true | schedule() Ex.2) move charge | try charge --------------------------------------+------------------------------ mem_cgroup_can_attach() | mc.moving_task = current | mem_cgroup_precharge_mc() | mem_cgroup_count_precharge() | down_read(&mmap_sem) | .. | up_read(&mmap_sem) | | down_write(&mmap_sem) mem_cgroup_move_task() | .. mem_cgroup_move_charge() | __mem_cgroup_try_charge() down_read(&mmap_sem) | prepare_to_wait() -> cannot aquire the lock | if (mc.moving_task) | -> true | schedule() To avoid this deadlock, we do all the move charge works (both can_attach() and attach()) under one mmap_sem section. And after this patch, we set/clear mc.moving_task outside mc.lock, because we use the lock only to check mc.from/to. Signed-off-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@linux.vnet.ibm.com> Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
Fix this: kernel BUG at mm/memcontrol.c:2155! invalid opcode: 0000 [#1] last sysfs file: Pid: 18, comm: sh Not tainted 2.6.37-rc3 #3 /Bochs EIP: 0060:[<c10731b2>] EFLAGS: 00000246 CPU: 0 EIP is at mem_cgroup_move_account+0xe2/0xf0 EAX: 00000004 EBX: c6f931d4 ECX: c681c300 EDX: c681c000 ESI: c681c300 EDI: ffffffea EBP: c681c000 ESP: c46f3e30 DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068 Process sh (pid: 18, ti=c46f2000 task=c6826e60 task.ti=c46f2000) Stack: 00000155 c681c000 0805f000 c46ee180 c46f3e5c c7058820 c1074d37 00000000 08060000 c46db9a0 c46ec080 c7058820 0805f000 08060000 c46f3e98 c1074c50 c106c75e c46f3e98 c46ec080 08060000 0805ffff c46db9a0 c46f3e98 c46e0340 Call Trace: [<c1074d37>] ? mem_cgroup_move_charge_pte_range+0xe7/0x130 [<c1074c50>] ? mem_cgroup_move_charge_pte_range+0x0/0x130 [<c106c75e>] ? walk_page_range+0xee/0x1d0 [<c10725d6>] ? mem_cgroup_move_task+0x66/0x90 [<c1074c50>] ? mem_cgroup_move_charge_pte_range+0x0/0x130 [<c1072570>] ? mem_cgroup_move_task+0x0/0x90 [<c1042616>] ? cgroup_attach_task+0x136/0x200 [<c1042878>] ? cgroup_tasks_write+0x48/0xc0 [<c1041e9e>] ? cgroup_file_write+0xde/0x220 [<c101398d>] ? do_page_fault+0x17d/0x3f0 [<c108a79d>] ? alloc_fd+0x2d/0xd0 [<c1041dc0>] ? cgroup_file_write+0x0/0x220 [<c1077ba2>] ? vfs_write+0x92/0xc0 [<c1077c81>] ? sys_write+0x41/0x70 [<c1140e3d>] ? syscall_call+0x7/0xb Code: 03 00 74 09 8b 44 24 04 e8 1c f1 ff ff 89 73 04 8d 86 b0 00 00 00 b9 01 00 00 00 89 da 31 ff e8 65 f5 ff ff e9 4d ff ff ff 0f 0b <0f> 0b 0f 0b 0f 0b 90 8d b4 26 00 00 00 00 83 ec 10 8b 0d f4 e3 EIP: [<c10731b2>] mem_cgroup_move_account+0xe2/0xf0 SS:ESP 0068:c46f3e30 ---[ end trace 7daa1582159b6532 ]--- lock_page_cgroup and unlock_page_cgroup are implemented using bit_spinlock. bit_spinlock doesn't touch the bit if we are on non-SMP machine, so we can't use the bit to check whether the lock was taken. Let's introduce is_page_cgroup_locked based on bit_spin_is_locked instead of PageCgroupLocked to fix it. [akpm@linux-foundation.org: s/is_page_cgroup_locked/page_is_cgroup_locked/] Signed-off-by: NKirill A. Shutemov <kirill@shutemov.name> Reviewed-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Steven J. Magnani 提交于
Depending on processor speed, page size, and the amount of memory a process is allowed to amass, cleanup of a large VM may freeze the system for many seconds. This can result in a watchdog timeout. Make sure other tasks receive some service when cleaning up large VMs. Signed-off-by: NSteven J. Magnani <steve@digidescorp.com> Cc: Greg Ungerer <gerg@snapgear.com> Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 11月, 2010 1 次提交
-
-
由 Pavel Emelyanov 提交于
There are two places, that do not release the slub_lock. Respective bugs were introduced by sysfs changes ab4d5ed5 (slub: Enable sysfs support for !CONFIG_SLUB_DEBUG) and 2bce6485 ( slub: Allow removal of slab caches during boot). Acked-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NPavel Emelyanov <xemul@openvz.org> Signed-off-by: NPekka Enberg <penberg@kernel.org>
-
- 12 11月, 2010 4 次提交
-
-
由 Nick Piggin 提交于
Salman Qazi describes the following radix-tree bug: In the following case, we get can get a deadlock: 0. The radix tree contains two items, one has the index 0. 1. The reader (in this case find_get_pages) takes the rcu_read_lock. 2. The reader acquires slot(s) for item(s) including the index 0 item. 3. The non-zero index item is deleted, and as a consequence the other item is moved to the root of the tree. The place where it used to be is queued for deletion after the readers finish. 3b. The zero item is deleted, removing it from the direct slot, it remains in the rcu-delayed indirect node. 4. The reader looks at the index 0 slot, and finds that the page has 0 ref count 5. The reader looks at it again, hoping that the item will either be freed or the ref count will increase. This never happens, as the slot it is looking at will never be updated. Also, this slot can never be reclaimed because the reader is holding rcu_read_lock and is in an infinite loop. The fix is to re-use the same "indirect" pointer case that requires a slot lookup retry into a general "retry the lookup" bit. Signed-off-by: NNick Piggin <npiggin@kernel.dk> Reported-by: NSalman Qazi <sqazi@google.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Shaohua Li 提交于
nr_dirty and nr_congested are increased only when the page is dirty. So if all pages are clean, both them will be zero. In this case, we should not mark the zone congested. Signed-off-by: NShaohua Li <shaohua.li@intel.com> Reviewed-by: NJohannes Weiner <hannes@cmpxchg.org> Reviewed-by: NMinchan Kim <minchan.kim@gmail.com> Acked-by: NMel Gorman <mel@csn.ul.ie> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dave Hansen 提交于
70 hours into some stress tests of a 2.6.32-based enterprise kernel, we ran into a NULL dereference in here: int block_is_partially_uptodate(struct page *page, read_descriptor_t *desc, unsigned long from) { ----> struct inode *inode = page->mapping->host; It looks like page->mapping was the culprit. (xmon trace is below). After closer examination, I realized that do_generic_file_read() does a find_get_page(), and eventually locks the page before calling block_is_partially_uptodate(). However, it doesn't revalidate the page->mapping after the page is locked. So, there's a small window between the find_get_page() and ->is_partially_uptodate() where the page could get truncated and page->mapping cleared. We _have_ a reference, so it can't get reclaimed, but it certainly can be truncated. I think the correct thing is to check page->mapping after the trylock_page(), and jump out if it got truncated. This patch has been running in the test environment for a month or so now, and we have not seen this bug pop up again. xmon info: 1f:mon> e cpu 0x1f: Vector: 300 (Data Access) at [c0000002ae36f770] pc: c0000000001e7a6c: .block_is_partially_uptodate+0xc/0x100 lr: c000000000142944: .generic_file_aio_read+0x1e4/0x770 sp: c0000002ae36f9f0 msr: 8000000000009032 dar: 0 dsisr: 40000000 current = 0xc000000378f99e30 paca = 0xc000000000f66300 pid = 21946, comm = bash 1f:mon> r R00 = 0025c0500000006d R16 = 0000000000000000 R01 = c0000002ae36f9f0 R17 = c000000362cd3af0 R02 = c000000000e8cd80 R18 = ffffffffffffffff R03 = c0000000031d0f88 R19 = 0000000000000001 R04 = c0000002ae36fa68 R20 = c0000003bb97b8a0 R05 = 0000000000000000 R21 = c0000002ae36fa68 R06 = 0000000000000000 R22 = 0000000000000000 R07 = 0000000000000001 R23 = c0000002ae36fbb0 R08 = 0000000000000002 R24 = 0000000000000000 R09 = 0000000000000000 R25 = c000000362cd3a80 R10 = 0000000000000000 R26 = 0000000000000002 R11 = c0000000001e7b60 R27 = 0000000000000000 R12 = 0000000042000484 R28 = 0000000000000001 R13 = c000000000f66300 R29 = c0000003bb97b9b8 R14 = 0000000000000001 R30 = c000000000e28a08 R15 = 000000000000ffff R31 = c0000000031d0f88 pc = c0000000001e7a6c .block_is_partially_uptodate+0xc/0x100 lr = c000000000142944 .generic_file_aio_read+0x1e4/0x770 msr = 8000000000009032 cr = 22000488 ctr = c0000000001e7a60 xer = 0000000020000000 trap = 300 dar = 0000000000000000 dsisr = 40000000 1f:mon> t [link register ] c000000000142944 .generic_file_aio_read+0x1e4/0x770 [c0000002ae36f9f0] c000000000142a14 .generic_file_aio_read+0x2b4/0x770 (unreliable) [c0000002ae36fb40] c0000000001b03e4 .do_sync_read+0xd4/0x160 [c0000002ae36fce0] c0000000001b153c .vfs_read+0xec/0x1f0 [c0000002ae36fd80] c0000000001b1768 .SyS_read+0x58/0xb0 [c0000002ae36fe30] c00000000000852c syscall_exit+0x0/0x40 --- Exception: c00 (System Call) at 00000080a840bc54 SP (fffca15df30) is in userspace 1f:mon> di c0000000001e7a6c c0000000001e7a6c e9290000 ld r9,0(r9) c0000000001e7a70 418200c0 beq c0000000001e7b30 # .block_is_partially_uptodate+0xd0/0x100 c0000000001e7a74 e9440008 ld r10,8(r4) c0000000001e7a78 78a80020 clrldi r8,r5,32 c0000000001e7a7c 3c000001 lis r0,1 c0000000001e7a80 812900a8 lwz r9,168(r9) c0000000001e7a84 39600001 li r11,1 c0000000001e7a88 7c080050 subf r0,r8,r0 c0000000001e7a8c 7f805040 cmplw cr7,r0,r10 c0000000001e7a90 7d6b4830 slw r11,r11,r9 c0000000001e7a94 796b0020 clrldi r11,r11,32 c0000000001e7a98 419d00a8 bgt cr7,c0000000001e7b40 # .block_is_partially_uptodate+0xe0/0x100 c0000000001e7a9c 7fa55840 cmpld cr7,r5,r11 c0000000001e7aa0 7d004214 add r8,r0,r8 c0000000001e7aa4 79080020 clrldi r8,r8,32 c0000000001e7aa8 419c0078 blt cr7,c0000000001e7b20 # .block_is_partially_uptodate+0xc0/0x100 Signed-off-by: NDave Hansen <dave@linux.vnet.ibm.com> Reviewed-by: NMinchan Kim <minchan.kim@gmail.com> Reviewed-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NRik van Riel <riel@redhat.com> Cc: <arunabal@in.ibm.com> Cc: <sbest@us.ibm.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dan Carpenter 提交于
The original code had a null dereference if alloc_percpu() failed. This was introduced in commit 711d3d2c ("memcg: cpu hotplug aware percpu count updates") Signed-off-by: NDan Carpenter <error27@gmail.com> Reviewed-by: NBalbir Singh <balbir@linux.vnet.ibm.com> Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 10 11月, 2010 1 次提交
-
-
由 Pekka Enberg 提交于
As pointed out by Linus, commit dab5855b ("perf_counter: Add mmap event hooks to mprotect()") is fundamentally wrong as mprotect_fixup() can free 'vma' due to merging. Fix the problem by moving perf_event_mmap() hook to mprotect_fixup(). Note: there's another successful return path from mprotect_fixup() if old flags equal to new flags. We don't, however, need to call perf_event_mmap() there because 'perf' already knows the VMA is executable. Reported-by: NDave Jones <davej@redhat.com> Analyzed-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Reviewed-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NPekka Enberg <penberg@kernel.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 11月, 2010 1 次提交
-
-
由 Wu Fengguang 提交于
Fix regression introduced by commit 79da826a ("writeback: report dirty thresholds in /proc/vmstat"). The incorrect pointer arithmetic can result in problems like this: BUG: unable to handle kernel paging request at 07c06d16 IP: [<c050c336>] strnlen+0x6/0x20 Call Trace: [<c050a249>] ? string+0x39/0xe0 [<c042be6b>] ? __wake_up_common+0x4b/0x80 [<c050afcc>] ? vsnprintf+0x1ec/0x380 [<c04b380e>] ? seq_printf+0x2e/0x60 [<c04829a6>] ? vmstat_show+0x26/0x30 [<c04b3bb6>] ? seq_read+0xa6/0x380 [<c04b3b10>] ? seq_read+0x0/0x380 [<c04d5d2f>] ? proc_reg_read+0x5f/0x90 [<c049c4a1>] ? vfs_read+0xa1/0x140 [<c04d5cd0>] ? proc_reg_read+0x0/0x90 [<c049c981>] ? sys_read+0x41/0x70 [<c0402bd0>] ? sysenter_do_call+0x12/0x26 Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Michael Rubin <mrubin@google.com> Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 11月, 2010 1 次提交
-
-
由 Michel Lespinasse 提交于
This slipped by when unifying the filemap and swap versions of lock_page_or_retry()... Signed-off-by: NMichel Lespinasse <walken@google.com> Acked-by: NRik van Riel <riel@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 30 10月, 2010 1 次提交
-
-
由 Al Viro 提交于
Normal syscall audit doesn't catch 5th argument of syscall. It also doesn't catch the contents of userland structures pointed to be syscall argument, so for both old and new mmap(2) ABI it doesn't record the descriptor we are mapping. For old one it also misses flags. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 29 10月, 2010 2 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Eric Dumazet 提交于
When a node contains only HighMem memory, slab_node(MPOL_BIND) dereferences a NULL pointer. [ This code seems to go back all the way to commit 19770b32: "mm: filter based on a nodemask as well as a gfp_mask". Which was back in April 2008, and it got merged into 2.6.26. - Linus ] Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Christoph Lameter <cl@linux.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 28 10月, 2010 8 次提交
-
-
由 Miklos Szeredi 提交于
Replace iterated page_cache_release() with release_pages(), which is faster and shorter. Needs release_pages() to be exported to modules. Suggested-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KAMEZAWA Hiroyuki 提交于
This patch extracts the core logic from mem_cgroup_update_file_mapped() as mem_cgroup_update_file_stat() and adds a wrapper. As a planned future update, memory cgroup has to count dirty pages to implement dirty_ratio/limit. And more, the number of dirty pages is required to kick flusher thread to start writeback. (Now, no kick.) This patch is preparation for it and makes other statistics implementation clearer. Just a clean up. Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: NBalbir Singh <balbir@linux.vnet.ibm.com> Reviewed-by: NGreg Thelen <gthelen@google.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KAMEZAWA Hiroyuki 提交于
An event counter MEM_CGROUP_ON_MOVE is used for quick check whether file stat update can be done in async manner or not. Now, it use percpu counter and for_each_possible_cpu to update. This patch replaces for_each_possible_cpu to for_each_online_cpu and adds necessary synchronization logic at CPU HOTPLUG. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KAMEZAWA Hiroyuki 提交于
Now, memcgroup's per cpu coutner uses for_each_possible_cpu() to get the value. It's better to use for_each_online_cpu() and a cpu hotplug handler. This patch only handles statistics counter. MEM_CGROUP_ON_MOVE will be handled in another patch. Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KAMEZAWA Hiroyuki 提交于
In memory cgroup management, we sometimes have to walk through subhierarchy of cgroup to gather informaiton, or lock something, etc. Now, to do that, mem_cgroup_walk_tree() function is provided. It calls given callback function per cgroup found. But the bad thing is that it has to pass a fixed style function and argument, "void*" and it adds much type casting to memcontrol.c. To make the code clean, this patch replaces walk_tree() with for_each_mem_cgroup_tree(iter, root) An iterator style call. The good point is that iterator call doesn't have to assume what kind of function is called under it. A bad point is that it may cause reference-count leak if a caller use "break" from the loop by mistake. I think the benefit is larger. The modified code seems straigtforward and easy to read because we don't have misterious callbacks and pointer cast. Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KAMEZAWA Hiroyuki 提交于
At accounting file events per memory cgroup, we need to find memory cgroup via page_cgroup->mem_cgroup. Now, we use lock_page_cgroup() for guarantee pc->mem_cgroup is not overwritten while we make use of it. But, considering the context which page-cgroup for files are accessed, we can use alternative light-weight mutual execusion in the most case. At handling file-caches, the only race we have to take care of is "moving" account, IOW, overwriting page_cgroup->mem_cgroup. (See comment in the patch) Unlike charge/uncharge, "move" happens not so frequently. It happens only when rmdir() and task-moving (with a special settings.) This patch adds a race-checker for file-cache-status accounting v.s. account moving. The new per-cpu-per-memcg counter MEM_CGROUP_ON_MOVE is added. The routine for account move 1. Increment it before start moving 2. Call synchronize_rcu() 3. Decrement it after the end of moving. By this, file-status-counting routine can check it needs to call lock_page_cgroup(). In most case, I doesn't need to call it. Following is a perf data of a process which mmap()/munmap 32MB of file cache in a minute. Before patch: 28.25% mmap mmap [.] main 22.64% mmap [kernel.kallsyms] [k] page_fault 9.96% mmap [kernel.kallsyms] [k] mem_cgroup_update_file_mapped 3.67% mmap [kernel.kallsyms] [k] filemap_fault 3.50% mmap [kernel.kallsyms] [k] unmap_vmas 2.99% mmap [kernel.kallsyms] [k] __do_fault 2.76% mmap [kernel.kallsyms] [k] find_get_page After patch: 30.00% mmap mmap [.] main 23.78% mmap [kernel.kallsyms] [k] page_fault 5.52% mmap [kernel.kallsyms] [k] mem_cgroup_update_file_mapped 3.81% mmap [kernel.kallsyms] [k] unmap_vmas 3.26% mmap [kernel.kallsyms] [k] find_get_page 3.18% mmap [kernel.kallsyms] [k] __do_fault 3.03% mmap [kernel.kallsyms] [k] filemap_fault 2.40% mmap [kernel.kallsyms] [k] handle_mm_fault 2.40% mmap [kernel.kallsyms] [k] do_page_fault This patch reduces memcg's cost to some extent. (mem_cgroup_update_file_mapped is called by both of map/unmap) Note: It seems some more improvements are required..but no idea. maybe removing set/unset flag is required. Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Greg Thelen <gthelen@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 KAMEZAWA Hiroyuki 提交于
Presently memory cgroup accounts file-mapped by counter and flag. counter is working in the same way with zone_stat but FileMapped flag only exists in memcg (for helping move_account). This flag can be updated wrongly in a case. Assume CPU0 and CPU1 and a thread mapping a page on CPU0, another thread unmapping it on CPU1. CPU0 CPU1 rmv rmap (mapcount 1->0) add rmap (mapcount 0->1) lock_page_cgroup() memcg counter+1 (some delay) set MAPPED FLAG. unlock_page_cgroup() lock_page_cgroup() memcg counter-1 clear MAPPED flag In the above sequence counter is properly updated but FLAG is not. This means that representing a state by a flag which is maintained by counter needs some special care. To handle this, when clearing a flag, this patch check mapcount directly and clear the flag only when mapcount == 0. (if mapcount >0, someone will make it to zero later and flag will be cleared.) Reverse case, dec-after-inc cannot be a problem because page_table_lock() works well for it. (IOW, to make above sequence, 2 processes should touch the same page at once with map/unmap.) Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Cc: Greg Thelen <gthelen@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Peter Zijlstra 提交于
It appears i386 uses kmap_atomic infrastructure regardless of CONFIG_HIGHMEM which results in a compile error when highmem is disabled. Cure this by providing the needed few bits for both CONFIG_HIGHMEM and CONFIG_X86_32. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Reported-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-