1. 17 Dec, 2008: 1 commit
  2. 16 Dec, 2008: 1 commit
  3. 11 Dec, 2008: 5 commits
  4. 03 Dec, 2008: 2 commits
  5. 02 Dec, 2008: 2 commits
    • memcg: memory hotplug fix for notifier callback · dc19f9db
      Committed by KAMEZAWA Hiroyuki
      Fixes for memcg/memory hotplug.
      
      While memory hotplug allocates and frees the memmap, the page_cgroup
      code does not free page_cgroup at OFFLINE when the page_cgroup was
      allocated via bootmem, because freeing bootmem requires special care.
      
      As a result, if page_cgroup was allocated from bootmem and the memmap
      is freed and reallocated by memory hotplug, page_cgroup->page == page
      no longer holds.
      
      The current MEM_ONLINE handler neither checks for this case nor
      updates page_cgroup->page when allocating a new page_cgroup is
      unnecessary.  (This went unnoticed because the memmap is never freed
      when SPARSEMEM_VMEMMAP=y.)
      
      I also noticed that MEM_ONLINE can be called against only part of a
      section, so freeing page_cgroup at CANCEL_ONLINE would cause trouble
      (it could free page_cgroup entries that are still in use).  Don't
      roll back at CANCEL.
      
      Finally, the memory hotplug notifier chain is currently stopped by
      slub, because slub sets NOTIFY_STOP_MASK in its return value; as a
      result page_cgroup's callback, which now has lower priority than
      slub's, is never called.
      
      This slub behavior does not look intentional (it is a bug), so this
      patch fixes it as well.
      
      Another approach to page_cgroup allocation that could be considered:
        - free page_cgroup at OFFLINE even if it came from bootmem, and
          remove the special-case handling.  But that requires more
          changes.
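      To make the shape of the fixed callback concrete, here is a minimal
      sketch in the spirit of mm/page_cgroup.c at the time.  The helper
      names online_page_cgroup()/offline_page_cgroup() follow that file,
      but the body is simplified and illustrative, not a verbatim copy of
      the patch:
      
        /* Sketch: no rollback at CANCEL_ONLINE, and errors are reported
         * via notifier_from_errno() rather than a bare NOTIFY_STOP_MASK. */
        static int __meminit page_cgroup_callback(struct notifier_block *self,
                                                  unsigned long action, void *arg)
        {
                struct memory_notify *mn = arg;
                int ret = 0;
      
                switch (action) {
                case MEM_GOING_ONLINE:
                        ret = online_page_cgroup(mn->start_pfn, mn->nr_pages,
                                                 mn->status_change_nid);
                        break;
                case MEM_OFFLINE:
                        offline_page_cgroup(mn->start_pfn, mn->nr_pages,
                                            mn->status_change_nid);
                        break;
                case MEM_CANCEL_ONLINE:
                        /* MEM_ONLINE may cover only part of a section;
                         * freeing here could hit page_cgroup entries that
                         * are still in use, so deliberately do nothing. */
                case MEM_GOING_OFFLINE:
                        break;
                }
                return notifier_from_errno(ret);
        }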
      
      Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12041
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Tested-by: Badari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: vmalloc fix lazy unmapping cache aliasing · b29acbdc
      Committed by Nick Piggin
      Jim Radford has reported that the vmap subsystem rewrite was sometimes
      causing his VIVT ARM system to behave strangely (seemed like going into
      infinite loops trying to fault in pages to userspace).
      
      We determined that the problem was most likely a cache aliasing
      issue.  flush_cache_vunmap was only being called at the moment the
      page tables were taken down; with lazy unmapping, however, that can
      happen after the page has already been freed and reallocated for
      something else, and the dangling alias may still have dirty data
      attached to it.
      
      The fix for this problem is to do the cache flushing when the caller has
      called vunmap -- it would be a bug for them to write anything else to the
      mapping at that point.
      
      That appeared to solve Jim's problems.
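      
      A minimal sketch of where the flush moves, loosely following
      mm/vmalloc.c of that era (the _noflush variant is the lazy-unmap
      path; details abbreviated):
      
        static void free_unmap_vmap_area(struct vmap_area *va)
        {
                /* Flush virtually-tagged caches while the mapping still
                 * belongs to the caller.  Waiting until the lazy unmap
                 * tears down the page tables would be too late: the pages
                 * may have been reused by then. */
                flush_cache_vunmap(va->va_start, va->va_end);
                free_unmap_vmap_area_noflush(va);
        }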
      Reported-by: Jim Radford <radford@blackbean.org>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 01 Dec, 2008: 2 commits
  7. 20 Nov, 2008: 7 commits
  8. 17 Nov, 2008: 1 commit
  9. 16 Nov, 2008: 1 commit
  10. 13 Nov, 2008: 5 commits
    • memcg: bugfix for memory hotplug · 33c5d3d6
      Committed by KAMEZAWA Hiroyuki
      The start pfn calculation in page_cgroup's memory hotplug notifier chain
      is wrong.
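      
      The message does not spell the calculation out; for context, an
      illustrative (not verbatim) example of the kind of alignment such a
      notifier must get right, rounding a pfn range to SPARSEMEM section
      boundaries:
      
        /* Illustrative only: PAGES_PER_SECTION is a power of two, so
         * masking with the complement rounds down; ALIGN() rounds up. */
        unsigned long start = start_pfn & ~(unsigned long)(PAGES_PER_SECTION - 1);
        unsigned long end = ALIGN(start_pfn + nr_pages, PAGES_PER_SECTION);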
      Tested-by: Badari Pulavarty <pbadari@us.ibm.com>
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: remove lru_add_drain_all() from the munlock path · 8891d6da
      Committed by KOSAKI Motohiro
      lockdep emits the warning quoted below at boot time on one of my test
      machines.  The lesson: schedule_on_each_cpu() should not be called
      while the task holds mmap_sem.
      
      lru_add_drain_all() exists to keep unevictable pages from staying on
      a reclaimable LRU list, but the current unevictable code can rescue
      unevictable pages even while they sit on a reclaimable list.  So
      removing the call is the better option.
      
      In addition, this patch adds lru_add_drain_all() to sys_mlock() and
      sys_mlockall().  This is not strictly required, but it reduces
      failures to move pages to the unevictable list; such failures can be
      rescued by vmscan later, so reducing them is simply an improvement.
      
      Note: when such a rescue happens, the Mlocked and Unevictable fields
      in /proc/meminfo mismatch, but this causes no real trouble.
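      
      Before the lockdep report, a condensed sketch of the resulting
      ordering in sys_mlock(), loosely following mm/mlock.c of that era
      (abbreviated; permission checks elided):
      
        asmlinkage long sys_mlock(unsigned long start, size_t len)
        {
                unsigned long locked, lock_limit;
                int error = -ENOMEM;
      
                /* Drain pagevecs before taking mmap_sem, so that
                 * schedule_on_each_cpu() -> get_online_cpus() never
                 * nests inside mmap_sem. */
                lru_add_drain_all();
      
                down_write(&current->mm->mmap_sem);
                len = PAGE_ALIGN(len + (start & ~PAGE_MASK));
                start &= PAGE_MASK;
                locked = (len >> PAGE_SHIFT) + current->mm->locked_vm;
                lock_limit = current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur
                                >> PAGE_SHIFT;
                if (locked <= lock_limit || capable(CAP_IPC_LOCK))
                        error = do_mlock(start, len, 1);
                up_write(&current->mm->mmap_sem);
                return error;
        }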
      
      =======================================================
      [ INFO: possible circular locking dependency detected ]
      2.6.28-rc2-mm1 #2
      -------------------------------------------------------
      lvm/1103 is trying to acquire lock:
       (&cpu_hotplug.lock){--..}, at: [<c0130789>] get_online_cpus+0x29/0x50
      
      but task is already holding lock:
       (&mm->mmap_sem){----}, at: [<c01878ae>] sys_mlockall+0x4e/0xb0
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #3 (&mm->mmap_sem){----}:
             [<c0153da2>] check_noncircular+0x82/0x110
             [<c0185e6a>] might_fault+0x4a/0xa0
             [<c0156161>] validate_chain+0xb11/0x1070
             [<c0185e6a>] might_fault+0x4a/0xa0
             [<c0156923>] __lock_acquire+0x263/0xa10
             [<c015714c>] lock_acquire+0x7c/0xb0			(*) grab mmap_sem
             [<c0185e6a>] might_fault+0x4a/0xa0
             [<c0185e9b>] might_fault+0x7b/0xa0
             [<c0185e6a>] might_fault+0x4a/0xa0
             [<c0294dd0>] copy_to_user+0x30/0x60
             [<c01ae3ec>] filldir+0x7c/0xd0
             [<c01e3a6a>] sysfs_readdir+0x11a/0x1f0			(*) grab sysfs_mutex
             [<c01ae370>] filldir+0x0/0xd0
             [<c01ae370>] filldir+0x0/0xd0
             [<c01ae4c6>] vfs_readdir+0x86/0xa0			(*) grab i_mutex
             [<c01ae75b>] sys_getdents+0x6b/0xc0
             [<c010355a>] syscall_call+0x7/0xb
             [<ffffffff>] 0xffffffff
      
      -> #2 (sysfs_mutex){--..}:
             [<c0153da2>] check_noncircular+0x82/0x110
             [<c01e3d2c>] sysfs_addrm_start+0x2c/0xc0
             [<c0156161>] validate_chain+0xb11/0x1070
             [<c01e3d2c>] sysfs_addrm_start+0x2c/0xc0
             [<c0156923>] __lock_acquire+0x263/0xa10
             [<c015714c>] lock_acquire+0x7c/0xb0			(*) grab sysfs_mutex
             [<c01e3d2c>] sysfs_addrm_start+0x2c/0xc0
             [<c04f8b55>] mutex_lock_nested+0xa5/0x2f0
             [<c01e3d2c>] sysfs_addrm_start+0x2c/0xc0
             [<c01e3d2c>] sysfs_addrm_start+0x2c/0xc0
             [<c01e3d2c>] sysfs_addrm_start+0x2c/0xc0
             [<c01e422f>] create_dir+0x3f/0x90
             [<c01e42a9>] sysfs_create_dir+0x29/0x50
             [<c04faaf5>] _spin_unlock+0x25/0x40
             [<c028f21d>] kobject_add_internal+0xcd/0x1a0
             [<c028f37a>] kobject_set_name_vargs+0x3a/0x50
             [<c028f41d>] kobject_init_and_add+0x2d/0x40
             [<c019d4d2>] sysfs_slab_add+0xd2/0x180
             [<c019d580>] sysfs_add_func+0x0/0x70
             [<c019d5dc>] sysfs_add_func+0x5c/0x70			(*) grab slub_lock
             [<c01400f2>] run_workqueue+0x172/0x200
             [<c014008f>] run_workqueue+0x10f/0x200
             [<c0140bd0>] worker_thread+0x0/0xf0
             [<c0140c6c>] worker_thread+0x9c/0xf0
             [<c0143c80>] autoremove_wake_function+0x0/0x50
             [<c0140bd0>] worker_thread+0x0/0xf0
             [<c0143972>] kthread+0x42/0x70
             [<c0143930>] kthread+0x0/0x70
             [<c01042db>] kernel_thread_helper+0x7/0x1c
             [<ffffffff>] 0xffffffff
      
      -> #1 (slub_lock){----}:
             [<c0153d2d>] check_noncircular+0xd/0x110
             [<c04f650f>] slab_cpuup_callback+0x11f/0x1d0
             [<c0156161>] validate_chain+0xb11/0x1070
             [<c04f650f>] slab_cpuup_callback+0x11f/0x1d0
             [<c015433d>] mark_lock+0x35d/0xd00
             [<c0156923>] __lock_acquire+0x263/0xa10
             [<c015714c>] lock_acquire+0x7c/0xb0
             [<c04f650f>] slab_cpuup_callback+0x11f/0x1d0
             [<c04f93a3>] down_read+0x43/0x80
             [<c04f650f>] slab_cpuup_callback+0x11f/0x1d0		(*) grab slub_lock
             [<c04f650f>] slab_cpuup_callback+0x11f/0x1d0
             [<c04fd9ac>] notifier_call_chain+0x3c/0x70
             [<c04f5454>] _cpu_up+0x84/0x110
             [<c04f552b>] cpu_up+0x4b/0x70				(*) grab cpu_hotplug.lock
             [<c06d1530>] kernel_init+0x0/0x170
             [<c06d15e5>] kernel_init+0xb5/0x170
             [<c06d1530>] kernel_init+0x0/0x170
             [<c01042db>] kernel_thread_helper+0x7/0x1c
             [<ffffffff>] 0xffffffff
      
      -> #0 (&cpu_hotplug.lock){--..}:
             [<c0155bff>] validate_chain+0x5af/0x1070
             [<c040f7e0>] dev_status+0x0/0x50
             [<c0156923>] __lock_acquire+0x263/0xa10
             [<c015714c>] lock_acquire+0x7c/0xb0
             [<c0130789>] get_online_cpus+0x29/0x50
             [<c04f8b55>] mutex_lock_nested+0xa5/0x2f0
             [<c0130789>] get_online_cpus+0x29/0x50
             [<c0130789>] get_online_cpus+0x29/0x50
             [<c017bc30>] lru_add_drain_per_cpu+0x0/0x10
             [<c0130789>] get_online_cpus+0x29/0x50			(*) grab cpu_hotplug.lock
             [<c0140cf2>] schedule_on_each_cpu+0x32/0xe0
             [<c0187095>] __mlock_vma_pages_range+0x85/0x2c0
             [<c0156945>] __lock_acquire+0x285/0xa10
             [<c0188f09>] vma_merge+0xa9/0x1d0
             [<c0187450>] mlock_fixup+0x180/0x200
             [<c0187548>] do_mlockall+0x78/0x90			(*) grab mmap_sem
             [<c01878e1>] sys_mlockall+0x81/0xb0
             [<c010355a>] syscall_call+0x7/0xb
             [<ffffffff>] 0xffffffff
      
      other info that might help us debug this:
      
      1 lock held by lvm/1103:
       #0:  (&mm->mmap_sem){----}, at: [<c01878ae>] sys_mlockall+0x4e/0xb0
      
      stack backtrace:
      Pid: 1103, comm: lvm Not tainted 2.6.28-rc2-mm1 #2
      Call Trace:
       [<c01555fc>] print_circular_bug_tail+0x7c/0xd0
       [<c0155bff>] validate_chain+0x5af/0x1070
       [<c040f7e0>] dev_status+0x0/0x50
       [<c0156923>] __lock_acquire+0x263/0xa10
       [<c015714c>] lock_acquire+0x7c/0xb0
       [<c0130789>] get_online_cpus+0x29/0x50
       [<c04f8b55>] mutex_lock_nested+0xa5/0x2f0
       [<c0130789>] get_online_cpus+0x29/0x50
       [<c0130789>] get_online_cpus+0x29/0x50
       [<c017bc30>] lru_add_drain_per_cpu+0x0/0x10
       [<c0130789>] get_online_cpus+0x29/0x50
       [<c0140cf2>] schedule_on_each_cpu+0x32/0xe0
       [<c0187095>] __mlock_vma_pages_range+0x85/0x2c0
       [<c0156945>] __lock_acquire+0x285/0xa10
       [<c0188f09>] vma_merge+0xa9/0x1d0
       [<c0187450>] mlock_fixup+0x180/0x200
       [<c0187548>] do_mlockall+0x78/0x90
       [<c01878e1>] sys_mlockall+0x81/0xb0
       [<c010355a>] syscall_call+0x7/0xb
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Tested-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cpusets: update mems allowed in page allocator · e33c3b5e
      Committed by David Rientjes
      If all allowable memory is unreclaimable, it is possible to loop
      forever in the page allocator for allocations without __GFP_NORETRY.
      
      During this time, it is also possible for a task's cpuset to expand its
      set of allowable nodes so that it now includes free memory.  The cached
      copy of this set, current->mems_allowed, is stale, however, since there
      has not been a subsequent call to cpuset_update_task_memory_state().
      
      The cached copy of the set of allowable nodes is now updated in the page
      allocator's slow path so the additional memory is available to
      get_page_from_freelist().
      
      [akpm@linux-foundation.org: add comment]
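      
      A minimal, hypothetical sketch of that control flow; the real slow
      path in mm/page_alloc.c has different signatures, and
      try_freelist_scan() below is an invented placeholder for the
      get_page_from_freelist() retry:
      
        static struct page *alloc_slowpath_sketch(gfp_t gfp_mask,
                                                  unsigned int order)
        {
                struct page *page;
      
                do {
                        /* The task's cpuset may have been expanded while
                         * we looped on unreclaimable memory; refresh the
                         * cached current->mems_allowed so the freelist
                         * scan can reach newly allowed nodes. */
                        cpuset_update_task_memory_state();
      
                        page = try_freelist_scan(gfp_mask, order);
                        /* ... reclaim and OOM handling elided ... */
                } while (!page && !(gfp_mask & __GFP_NORETRY));
      
                return page;
        }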
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: make unmap_ref_private multi-size-aware · 7526674d
      Committed by Adam Litke
      Oops.  Part of the hugetlb private reservation code was not fully
      converted to use hstates.
      
      When a huge page must be unmapped from VMAs due to a failed COW,
      HPAGE_SIZE is used in the call to unmap_hugepage_range() regardless of
      the page size being used.  This works if the VMA is using the default
      huge page size.  Otherwise we might unmap too much, too little, or
      trigger a BUG_ON.  Rare but serious -- fix it.
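      
      The essence of the fix, as a sketch against the hstate API of the
      time (surrounding logic abbreviated): derive the size from the VMA's
      hstate instead of assuming the default HPAGE_SIZE.
      
        static void unmap_ref_private_sketch(struct vm_area_struct *vma,
                                             struct page *page,
                                             unsigned long address)
        {
                struct hstate *h = hstate_vma(vma);  /* per-VMA page size */
      
                /* was: address + HPAGE_SIZE, which is wrong whenever the
                 * mapping uses a non-default huge page size */
                unmap_hugepage_range(vma, address,
                                     address + huge_page_size(h), page);
        }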
      Signed-off-by: Adam Litke <agl@us.ibm.com>
      Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • parisc: fix find_extend_vma() breakage · 1c127185
      Committed by Denys Vlasenko
      The STACK_GROWSUP case of stack expansion was missing a test for 'prev',
      which got removed by commit cb8f488c
      ("mmap.c: deinline a few functions") by mistake.
      
      I found my original email in my "sent" folder; the patch in that mail
      does NOT remove the !prev test.  That change was added by someone
      else.
      
      OK, we are not much interested in who did it; let's fix it for good.
      
      [ "It looks like this was caused by me fixing rejects.  That was the
        fancy include-lots-of-context-so-it-wont-apply patch." - akpm ]
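      
      A sketch of the STACK_GROWSUP variant with the test restored, close
      to mm/mmap.c after the fix (mlocked-VMA handling elided):
      
        #ifdef CONFIG_STACK_GROWSUP
        struct vm_area_struct *find_extend_vma(struct mm_struct *mm,
                                               unsigned long addr)
        {
                struct vm_area_struct *vma, *prev;
      
                addr &= PAGE_MASK;
                vma = find_vma_prev(mm, addr, &prev);
                if (vma && vma->vm_start <= addr)
                        return vma;
                /* The restored '!prev' test: with no VMA below addr,
                 * there is nothing to grow upward into this range. */
                if (!prev || expand_stack(prev, addr))
                        return NULL;
                /* ... VM_LOCKED handling elided ... */
                return prev;
        }
        #endif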
      Reported-and-bisected-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 07 Nov, 2008: 9 commits
  12. 31 Oct, 2008: 3 commits
  13. 23 Oct, 2008: 1 commit
    • memcg: fix page_cgroup allocation · 94b6da5a
      Committed by KAMEZAWA Hiroyuki
      page_cgroup_init() is called from mem_cgroup_init(), but at that
      point we can no longer call alloc_bootmem() -- and this caused a
      panic at boot.
      
      This patch moves page_cgroup_init() to init/main.c.
      
      The time table is as follows:
      ==
        parse_args(). # we can trust mem_cgroup_subsys.disabled bit after this.
        ....
        cgroup_init_early()  # "early" init of cgroup.
        ....
        setup_arch()         # memmap is allocated.
        ...
        page_cgroup_init();
        mem_init();   # we cannot call alloc_bootmem after this.
        ....
        cgroup_init() # mem_cgroup is initialized.
      ==
      
      mem_map must be initialized before page_cgroup_init() runs, so I
      added the page_cgroup_init() call to init/main.c directly.
      
      (*) Maybe this is not very clean, but:
          - cgroup_init_early() is too early, and
          - in cgroup_init() we would have to use vmalloc instead of
            alloc_bootmem().  vmalloc address space is precious on x86-32,
            and very large vmalloc() allocations there should be avoided.
          So we want to use alloc_bootmem(), and the page_cgroup_init()
          call was added directly to init/main.c.
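      
      A loose sketch of page_cgroup_init() itself (after mm/page_cgroup.c
      of that era, simplified), showing why the ordering above matters:
      the disabled check needs parse_args() to have run, and the
      per-section setup may still fall back on bootmem, so the call must
      precede mem_init():
      
        void __init page_cgroup_init(void)
        {
                unsigned long pfn;
                int fail = 0;
      
                /* valid only after parse_args(), e.g. when booting with
                 * cgroup_disable=memory */
                if (mem_cgroup_subsys.disabled)
                        return;
      
                /* may allocate with alloc_bootmem(), hence the call site
                 * before mem_init() */
                for (pfn = 0; !fail && pfn < max_pfn; pfn += PAGES_PER_SECTION)
                        fail = init_section_page_cgroup(pfn);
                if (fail)
                        panic("Out of memory - try the cgroup_disable=memory boot option");
        }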
      
      [akpm@linux-foundation.org: remove unneeded/bad mem_cgroup_subsys declaration]
      [akpm@linux-foundation.org: fix build]
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>