1. 05 3月, 2008 12 次提交
    • H
      memcg: fix oops on NULL lru list · fb59e9f1
      Hugh Dickins 提交于
      While testing force_empty, during an exit_mmap, __mem_cgroup_remove_list
      called from mem_cgroup_uncharge_page oopsed on a NULL pointer in the lru list.
       I couldn't see what racing tasks on other cpus were doing, but surmise that
      another must have been in mem_cgroup_charge_common on the same page, between
      its unlock_page_cgroup and spin_lock_irqsave near done (thanks to that kzalloc
      which I'd almost changed to a kmalloc).
      
      Normally such a race cannot happen, the ref_cnt prevents it, the final
      uncharge cannot race with the initial charge.  But force_empty buggers the
      ref_cnt, that's what it's all about; and thereafter forced pages are
      vulnerable to races such as this (just think of a shared page also mapped into
      an mm of another mem_cgroup than that just emptied).  And remain vulnerable
      until they're freed indefinitely later.
      
      This patch just fixes the oops by moving the unlock_page_cgroups down below
      adding to and removing from the list (only possible given the previous patch);
      and while we're at it, we might as well make it an invariant that
      page->page_cgroup is always set while pc is on lru.
      
      But this behaviour of force_empty seems highly unsatisfactory to me: why have
      a ref_cnt if we always have to cope with it being violated (as in the earlier
      page migration patch).  We may prefer force_empty to move pages to an orphan
      mem_cgroup (could be the root, but better not), from which other cgroups could
      recover them; we might need to reverse the locking again; but no time now for
      such concerns.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fb59e9f1
    • H
      memcg: simplify force_empty and move_lists · 9b3c0a07
      Hirokazu Takahashi 提交于
      As for force_empty, though this may not be the main topic here,
      mem_cgroup_force_empty_list() can be implemented simpler.  It is possible to
      make the function just call mem_cgroup_uncharge_page() instead of releasing
      page_cgroups by itself.  The tip is to call get_page() before invoking
      mem_cgroup_uncharge_page(), so the page won't be released during this
      function.
      
      Kamezawa-san points out that by the time mem_cgroup_uncharge_page() uncharges,
      the page might have been reassigned to an lru of a different mem_cgroup, and
      now be emptied from that; but Hugh claims that's okay, the end state is the
      same as when it hasn't gone to another list.
      
      And once force_empty stops taking lock_page_cgroup within mz->lru_lock,
      mem_cgroup_move_lists() can be simplified to take mz->lru_lock directly while
      holding page_cgroup lock (but still has to use try_lock_page_cgroup).
      Signed-off-by: NHirokazu Takahashi <taka@valinux.co.jp>
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9b3c0a07
    • H
      memcg: fix mem_cgroup_move_lists locking · 2680eed7
      Hugh Dickins 提交于
      Ever since the VM_BUG_ON(page_get_page_cgroup(page)) (now Bad page state) went
      into page freeing, I've hit it from time to time in testing on some machines,
      sometimes only after many days.  Recently found a machine which could usually
      produce it within a few hours, which got me there at last.
      
      The culprit is mem_cgroup_move_lists, whose locking is inadequate; and the
      arrangement of structures was such that you got page_cgroups from the lru list
      neatly put on to SLUB's freelist.  Kamezawa-san identified the same hole
      independently.
      
      The main problem was that it was missing the lock_page_cgroup it needs to
      safely page_get_page_cgroup; but it's tricky to go beyond that too, and I
      couldn't do it with SLAB_DESTROY_BY_RCU as I'd expected.  See the code for
      comments on the constraints.
      
      This patch immediately gets replaced by a simpler one from Hirokazu-san; but
      is it just foolish pride that tells me to put this one on record, in case we
      need to come back to it later?
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2680eed7
    • H
      memcg: css_put after remove_list · 6d48ff8b
      Hugh Dickins 提交于
      mem_cgroup_uncharge_page does css_put on the mem_cgroup before uncharging from
      it, and before removing page_cgroup from one of its lru lists: isn't there a
      danger that struct mem_cgroup memory could be freed and reused before
      completing that, so corrupting something?  Never seen it, and for all I know
      there may be other constraints which make it impossible; but let's be
      defensive and reverse the ordering there.
      
      mem_cgroup_force_empty_list is safe because there's an extra css_get around
      all its works; but even so, change its ordering the same way round, to help
      get in the habit of doing it like this.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6d48ff8b
    • H
      memcg: remove clear_page_cgroup and atomics · b9c565d5
      Hugh Dickins 提交于
      Remove clear_page_cgroup: it's an unhelpful helper, see for example how
      mem_cgroup_uncharge_page had to unlock_page_cgroup just in order to call it
      (serious races from that?  I'm not sure).
      
      Once that's gone, you can see it's pointless for page_cgroup's ref_cnt to be
      atomic: it's always manipulated under lock_page_cgroup, except where
      force_empty unilaterally reset it to 0 (and how does uncharge's
      atomic_dec_and_test protect against that?).
      
      Simplify this page_cgroup locking: if you've got the lock and the pc is
      attached, then the ref_cnt must be positive: VM_BUG_ONs to check that, and to
      check that pc->page matches page (we're on the way to finding why sometimes it
      doesn't, but this patch doesn't fix that).
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b9c565d5
    • H
      memcg: memcontrol uninlined and static · d5b69e38
      Hugh Dickins 提交于
      More cleanup to memcontrol.c, this time changing some of the code generated.
      Let the compiler decide what to inline (except for page_cgroup_locked which is
      only used when CONFIG_DEBUG_VM): the __always_inline on lock_page_cgroup etc.
      was quite a waste since bit_spin_lock etc.  are inlines in a header file; made
      mem_cgroup_force_empty and mem_cgroup_write_strategy static.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d5b69e38
    • H
      memcg: memcontrol whitespace cleanups · 8869b8f6
      Hugh Dickins 提交于
      Sorry, before getting down to more important changes, I'd like to do some
      cleanup in memcontrol.c.  This patch doesn't change the code generated, but
      cleans up whitespace, moves up a double declaration, removes an unused enum,
      removes void returns, removes misleading comments, that kind of thing.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8869b8f6
    • H
      memcg: remove mem_cgroup_uncharge · 8289546e
      Hugh Dickins 提交于
      Nothing uses mem_cgroup_uncharge apart from mem_cgroup_uncharge_page, (a
      trivial wrapper around it) and mem_cgroup_end_migration (which does the same
      as mem_cgroup_uncharge_page).  And it often ends up having to lock just to let
      its caller unlock.  Remove it (but leave the silly locking until a later
      patch).
      
      Moved mem_cgroup_cache_charge next to mem_cgroup_charge in memcontrol.h.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Acked-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8289546e
    • H
      memcg: mem_cgroup_charge never NULL · 7e924aaf
      Hugh Dickins 提交于
      My memcgroup patch to fix hang with shmem/tmpfs added NULL page handling to
      mem_cgroup_charge_common.  It seemed convenient at the time, but hard to
      justify now: there's a perfectly appropriate swappage to charge and uncharge
      instead, this is not on any hot path through shmem_getpage, and no performance
      hit was observed from the slight extra overhead.
      
      So revert that NULL page handling from mem_cgroup_charge_common; and make it
      clearer by bringing page_cgroup_assign_new_page_cgroup into its body - that
      was a helper I found more of a hindrance to understanding.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Acked-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7e924aaf
    • H
      memcg: bad page if page_cgroup when free · 9442ec9d
      Hugh Dickins 提交于
      Replace free_hot_cold_page's VM_BUG_ON(page_get_page_cgroup(page)) by a "Bad
      page state" and clear: most users don't have CONFIG_DEBUG_VM on, and if it
      were set here, it'd likely cause corruption when the page is reused.
      
      Don't use page_assign_page_cgroup to clear it: that should be private to
      memcontrol.c, and always called with the lock taken; and memmap_init_zone
      doesn't need it either - like page->mapping and other pointers throughout the
      kernel, Linux assumes pointers in zeroed structures are NULL pointers.
      
      Instead use page_reset_bad_cgroup, added to memcontrol.h for this only.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Acked-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9442ec9d
    • H
      memcg: move_lists on page not page_cgroup · 427d5416
      Hugh Dickins 提交于
      Each caller of mem_cgroup_move_lists is having to use page_get_page_cgroup:
      it's more convenient if it acts upon the page itself not the page_cgroup; and
      in a later patch this becomes important to handle within memcontrol.c.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Acked-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      427d5416
    • H
      memcg: mm_match_cgroup not vm_match_cgroup · bd845e38
      Hugh Dickins 提交于
      vm_match_cgroup is a perverse name for a macro to match mm with cgroup: rename
      it mm_match_cgroup, matching mm_init_cgroup and mm_free_cgroup.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NBalbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bd845e38
  2. 24 2月, 2008 2 次提交
  3. 10 2月, 2008 1 次提交
  4. 08 2月, 2008 25 次提交