1. 27 Mar, 2008 1 commit
  2. 25 Mar, 2008 2 commits
  3. 20 Mar, 2008 7 commits
  4. 19 Mar, 2008 1 commit
  5. 18 Mar, 2008 1 commit
    • slub page alloc fallback: Enable interrupts for GFP_WAIT. · caeab084
      Christoph Lameter committed
      The fallback path needs to enable interrupts like done for
      the other page allocator calls. This was not necessary with
      the alternate fast path since we handled irq enable/disable in
      the slow path. The regular fastpath handles irq enable/disable
      around calls to the slow path so we need to restore the proper
      status before calling the page allocator from the slowpath.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
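The pattern described above can be sketched in userspace C. This is only a mock under stated assumptions: `SK_GFP_WAIT`, `sk_irqs_enabled`, `sk_page_alloc()` and `sk_fallback_alloc()` are hypothetical stand-ins for the kernel's GFP flag, interrupt state, page allocator and SLUB fallback path, not the real kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define SK_GFP_WAIT 0x10u  /* hypothetical "caller may sleep" flag bit */

static bool sk_irqs_enabled;           /* mock interrupt state */
static bool sk_irqs_seen_by_allocator; /* state observed inside the allocator */

/* Stand-in for the page allocator: records the interrupt state it ran with. */
static void *sk_page_alloc(size_t size, unsigned flags)
{
    (void)flags;
    sk_irqs_seen_by_allocator = sk_irqs_enabled;
    return malloc(size);
}

/* Fallback called from the slow path, which is entered with interrupts off. */
static void *sk_fallback_alloc(size_t size, unsigned flags)
{
    void *obj;

    assert(!sk_irqs_enabled);
    if (flags & SK_GFP_WAIT)
        sk_irqs_enabled = true;   /* restore the state the allocator expects */
    obj = sk_page_alloc(size, flags);
    if (flags & SK_GFP_WAIT)
        sk_irqs_enabled = false;  /* return to the state the fast path expects */
    return obj;
}
```

Calling `sk_fallback_alloc(32, SK_GFP_WAIT)` runs the mock allocator with interrupts enabled and leaves them disabled again on return; without the flag the state is never touched.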
  6. 11 Mar, 2008 3 commits
    • iov_iter_advance() fix · f7009264
      Nick Piggin committed
      iov_iter_advance() skips over zero-length iovecs, however it does not properly
      terminate at the end of the iovec array.  Fix this by checking against
      i->count before we skip a zero-length iov.
      
      The bug was reproduced with a test program that continually creates random
      iovs and passes them to writev.  The fix was verified with the same program,
      which also checked that the file contained the correct data after each
      writev.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Tested-by: "Kevin Coffman" <kwc@citi.umich.edu>
      Cc: "Alexey Dobriyan" <adobriyan@gmail.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
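A minimal userspace sketch of the termination check described in this commit. The types `sk_iovec` and `sk_iter` and the function `sk_iter_advance()` are simplified, hypothetical stand-ins for the kernel's `struct iovec`, `struct iov_iter` and `iov_iter_advance()`; the point is only that zero-length segments are skipped while `count` says data remains, so the walk does not run past the last segment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct sk_iovec {
    void *iov_base;
    size_t iov_len;
};

struct sk_iter {
    const struct sk_iovec *iov; /* current segment */
    size_t iov_offset;          /* offset into the current segment */
    size_t count;               /* bytes remaining in the whole iterator */
};

static void sk_iter_advance(struct sk_iter *i, size_t bytes)
{
    const struct sk_iovec *iov = i->iov;
    size_t base = i->iov_offset;

    assert(bytes <= i->count);

    /*
     * Loop while bytes remain to be consumed, or while we sit on a
     * zero-length segment AND the iterator still has data. Testing
     * i->count first means no segment is read once the data is exhausted.
     */
    while (bytes || (i->count && iov->iov_len == 0)) {
        size_t copy = iov->iov_len - base;

        if (copy > bytes)
            copy = bytes;
        bytes -= copy;
        i->count -= copy;
        base += copy;
        if (iov->iov_len == base) { /* segment finished: move to the next */
            iov++;
            base = 0;
        }
    }
    i->iov = iov;
    i->iov_offset = base;
}
```

With segments of length 4, 0, 0 and `count == 4`, advancing by 4 stops right after the first segment instead of inspecting the trailing empty ones.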
    • hugetlb: correct page count for surplus huge pages · 2668db91
      Adam Litke committed
      Free pages in the hugetlb pool are free and as such have a reference count of
      zero.  Regular allocations into the pool from the buddy are "freed" into the
      pool which results in their page_count dropping to zero.  However, surplus
      pages can be directly utilized by the caller without first being freed to the
      pool.  Therefore, a call to put_page_testzero() is in order so that such a
      page will be handed to the caller with a correct count.
      
      This has not affected end users because the bad page count is reset before the
      page is handed off.  However, under CONFIG_DEBUG_VM this triggers a BUG when
      the page count is validated.
      
      Thanks go to Mel for first spotting this issue and providing an initial fix.
      Signed-off-by: Adam Litke <agl@us.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mempolicy: fix reference counting bugs · 69682d85
      Lee Schermerhorn committed
      Address 3 known bugs in the current memory policy reference counting method.
      I have a series of patches to rework the reference counting to reduce overhead
      in the allocation path.  However, that series will require testing in -mm once
      I repost it.
      
      1) alloc_page_vma() does not release the extra reference taken for
         vma/shared mempolicy when the mode == MPOL_INTERLEAVE.  This can result in
         leaking mempolicy structures.  This is probably occurring, but not being
         noticed.
      
         Fix:  add the conditional release of the reference.
      
      2) hugezonelist unconditionally releases a reference on the mempolicy when
         mode == MPOL_INTERLEAVE.  This can result in decrementing the reference
         count for system default policy [should have no ill effect] or premature
         freeing of task policy.  If this occurred, the next allocation using task
         mempolicy would use the freed structure and probably BUG out.
      
         Fix:  add the necessary check to the release.
      
      3) The current reference counting method assumes that vma 'get_policy()'
         methods automatically add an extra reference to a non-NULL returned
         mempolicy.  This is true for shmem_get_policy() used by tmpfs mappings,
         including regular page shm segments.  However, SHM_HUGETLB shm's, backed by
         hugetlbfs, just use the vma policy without the extra reference.  This
         results in freeing of the vma policy on the first allocation, with reuse of
         the freed mempolicy structure on subsequent allocations.
      
         Fix: Rather than add another condition to the conditional reference
         release, which occurs in the allocation path, just add a reference when
         returning the vma policy in shm_get_policy() to match the assumptions.
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: David Rientjes <rientjes@google.com>
      Cc: <eric.whitney@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 10 Mar, 2008 1 commit
  8. 07 Mar, 2008 5 commits
  9. 05 Mar, 2008 19 commits
    • hugetlb: fix pool shrinking while in restricted cpuset · 348e1e04
      Nishanth Aravamudan committed
      Adam Litke noticed that currently we grow the hugepage pool independent of any
      cpuset the running process may be in, but when shrinking the pool, the cpuset
      is checked.  This leads to inconsistency when shrinking the pool in a
      restricted cpuset -- an administrator may have been able to grow the pool on a
      node restricted by a containing cpuset, but they cannot shrink it there.
      
      There are two options: either prevent growing of the pool outside of the
      cpuset or allow shrinking outside of the cpuset.  From previous discussions
      on linux-mm, /proc/sys/vm/nr_hugepages is an administrative interface that
      should not be restricted by cpusets.  So allow shrinking the pool by removing
      pages from nodes outside of current's cpuset.
      Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
      Acked-by: Adam Litke <agl@us.ibm.com>
      Cc: William Irwin <wli@holomorphy.com>
      Cc: Lee Schermerhorn <Lee.Schermerhonr@hp.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlb: close a difficult to trigger reservation race · ac09b3a1
      Adam Litke committed
      A hugetlb reservation may be inadequately backed in the event of racing
      allocations and frees when utilizing surplus huge pages.  Consider the
      following series of events in processes A and B:
      
       A) Allocates some surplus pages to satisfy a reservation
       B) Frees some huge pages
       A) Notices the extra free pages and drops hugetlb_lock to free some of
          its surplus pages back to the buddy allocator.
       B) Allocates some huge pages
       A) Reacquires hugetlb_lock and returns from gather_surplus_huge_pages()
      
      Avoid this by committing the reservation after pages have been allocated but
      before dropping the lock to free excess pages.  For parity, release the
      reservation in return_unused_surplus_pages().
      
      This patch also corrects the cpuset_mems_nr() error path in
      hugetlb_acct_memory().  If the cpuset check fails, uncommit the
      reservation, but also be sure to return any surplus huge pages that may
      have been allocated to back the failed reservation.
      
      Thanks to Andy Whitcroft for discovering this.
      Signed-off-by: Adam Litke <agl@us.ibm.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: fix oops on NULL lru list · fb59e9f1
      Hugh Dickins committed
      While testing force_empty, during an exit_mmap, __mem_cgroup_remove_list
      called from mem_cgroup_uncharge_page oopsed on a NULL pointer in the lru list.
      I couldn't see what racing tasks on other cpus were doing, but surmise that
      another must have been in mem_cgroup_charge_common on the same page, between
      its unlock_page_cgroup and spin_lock_irqsave near done (thanks to that kzalloc
      which I'd almost changed to a kmalloc).
      
      Normally such a race cannot happen, the ref_cnt prevents it, the final
      uncharge cannot race with the initial charge.  But force_empty buggers the
      ref_cnt, that's what it's all about; and thereafter forced pages are
      vulnerable to races such as this (just think of a shared page also mapped into
      an mm of another mem_cgroup than that just emptied).  And remain vulnerable
      until they're freed indefinitely later.
      
      This patch just fixes the oops by moving the unlock_page_cgroups down below
      adding to and removing from the list (only possible given the previous patch);
      and while we're at it, we might as well make it an invariant that
      page->page_cgroup is always set while pc is on lru.
      
      But this behaviour of force_empty seems highly unsatisfactory to me: why have
      a ref_cnt if we always have to cope with it being violated (as in the earlier
      page migration patch).  We may prefer force_empty to move pages to an orphan
      mem_cgroup (could be the root, but better not), from which other cgroups could
      recover them; we might need to reverse the locking again; but no time now for
      such concerns.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: simplify force_empty and move_lists · 9b3c0a07
      Hirokazu Takahashi committed
      As for force_empty, though this may not be the main topic here,
      mem_cgroup_force_empty_list() can be implemented more simply.  It is possible
      to make the function just call mem_cgroup_uncharge_page() instead of releasing
      page_cgroups by itself.  The trick is to call get_page() before invoking
      mem_cgroup_uncharge_page(), so the page won't be released during this
      function.
      
      Kamezawa-san points out that by the time mem_cgroup_uncharge_page() uncharges,
      the page might have been reassigned to an lru of a different mem_cgroup, and
      now be emptied from that; but Hugh claims that's okay, the end state is the
      same as when it hasn't gone to another list.
      
      And once force_empty stops taking lock_page_cgroup within mz->lru_lock,
      mem_cgroup_move_lists() can be simplified to take mz->lru_lock directly while
      holding page_cgroup lock (but still has to use try_lock_page_cgroup).
      Signed-off-by: Hirokazu Takahashi <taka@valinux.co.jp>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: fix mem_cgroup_move_lists locking · 2680eed7
      Hugh Dickins committed
      Ever since the VM_BUG_ON(page_get_page_cgroup(page)) (now Bad page state) went
      into page freeing, I've hit it from time to time in testing on some machines,
      sometimes only after many days.  Recently found a machine which could usually
      produce it within a few hours, which got me there at last.
      
      The culprit is mem_cgroup_move_lists, whose locking is inadequate; and the
      arrangement of structures was such that you got page_cgroups from the lru list
      neatly put on to SLUB's freelist.  Kamezawa-san identified the same hole
      independently.
      
      The main problem was that it was missing the lock_page_cgroup it needs to
      safely page_get_page_cgroup; but it's tricky to go beyond that too, and I
      couldn't do it with SLAB_DESTROY_BY_RCU as I'd expected.  See the code for
      comments on the constraints.
      
      This patch immediately gets replaced by a simpler one from Hirokazu-san; but
      is it just foolish pride that tells me to put this one on record, in case we
      need to come back to it later?
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: css_put after remove_list · 6d48ff8b
      Hugh Dickins committed
      mem_cgroup_uncharge_page does css_put on the mem_cgroup before uncharging from
      it, and before removing page_cgroup from one of its lru lists: isn't there a
      danger that struct mem_cgroup memory could be freed and reused before
      completing that, so corrupting something?  Never seen it, and for all I know
      there may be other constraints which make it impossible; but let's be
      defensive and reverse the ordering there.
      
      mem_cgroup_force_empty_list is safe because there's an extra css_get around
      all its works; but even so, change its ordering the same way round, to help
      get in the habit of doing it like this.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
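The ordering argued for in this commit (finish using the object, then drop the reference) can be illustrated with a toy reference count. Everything here is hypothetical: `sk_group`, `sk_group_put()` and `sk_uncharge()` merely stand in for struct mem_cgroup, css_put and the uncharge path, and the `freed` flag replaces an actual free.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy refcounted object standing in for struct mem_cgroup. */
struct sk_group {
    int refs;
    long charged;
    bool freed; /* set where a real implementation would free the memory */
};

static void sk_group_put(struct sk_group *g)
{
    assert(g->refs > 0);
    if (--g->refs == 0)
        g->freed = true;
}

static void sk_uncharge(struct sk_group *g, long amount)
{
    /* 1. Do all work that touches the group while we still hold a reference. */
    assert(!g->freed);
    g->charged -= amount;

    /* 2. Only then drop the reference; the group may go away after this line. */
    sk_group_put(g);
}
```

With the two steps swapped, the assertion in step 1 would fire whenever the caller held the last reference, which is the toy equivalent of touching freed memory.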
    • memcg: remove clear_page_cgroup and atomics · b9c565d5
      Hugh Dickins committed
      Remove clear_page_cgroup: it's an unhelpful helper, see for example how
      mem_cgroup_uncharge_page had to unlock_page_cgroup just in order to call it
      (serious races from that?  I'm not sure).
      
      Once that's gone, you can see it's pointless for page_cgroup's ref_cnt to be
      atomic: it's always manipulated under lock_page_cgroup, except where
      force_empty unilaterally reset it to 0 (and how does uncharge's
      atomic_dec_and_test protect against that?).
      
      Simplify this page_cgroup locking: if you've got the lock and the pc is
      attached, then the ref_cnt must be positive: VM_BUG_ONs to check that, and to
      check that pc->page matches page (we're on the way to finding why sometimes it
      doesn't, but this patch doesn't fix that).
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: memcontrol uninlined and static · d5b69e38
      Hugh Dickins committed
      More cleanup to memcontrol.c, this time changing some of the code generated.
      Let the compiler decide what to inline (except for page_cgroup_locked which is
      only used when CONFIG_DEBUG_VM): the __always_inline on lock_page_cgroup etc.
      was quite a waste since bit_spin_lock etc.  are inlines in a header file; made
      mem_cgroup_force_empty and mem_cgroup_write_strategy static.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: memcontrol whitespace cleanups · 8869b8f6
      Hugh Dickins committed
      Sorry, before getting down to more important changes, I'd like to do some
      cleanup in memcontrol.c.  This patch doesn't change the code generated, but
      cleans up whitespace, moves up a double declaration, removes an unused enum,
      removes void returns, removes misleading comments, that kind of thing.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: remove mem_cgroup_uncharge · 8289546e
      Hugh Dickins committed
      Nothing uses mem_cgroup_uncharge apart from mem_cgroup_uncharge_page, (a
      trivial wrapper around it) and mem_cgroup_end_migration (which does the same
      as mem_cgroup_uncharge_page).  And it often ends up having to lock just to let
      its caller unlock.  Remove it (but leave the silly locking until a later
      patch).
      
      Moved mem_cgroup_cache_charge next to mem_cgroup_charge in memcontrol.h.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: mem_cgroup_charge never NULL · 7e924aaf
      Hugh Dickins committed
      My memcgroup patch to fix hang with shmem/tmpfs added NULL page handling to
      mem_cgroup_charge_common.  It seemed convenient at the time, but hard to
      justify now: there's a perfectly appropriate swappage to charge and uncharge
      instead, this is not on any hot path through shmem_getpage, and no performance
      hit was observed from the slight extra overhead.
      
      So revert that NULL page handling from mem_cgroup_charge_common; and make it
      clearer by bringing page_cgroup_assign_new_page_cgroup into its body - that
      was a helper I found more of a hindrance to understanding.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: bad page if page_cgroup when free · 9442ec9d
      Hugh Dickins committed
      Replace free_hot_cold_page's VM_BUG_ON(page_get_page_cgroup(page)) by a "Bad
      page state" and clear: most users don't have CONFIG_DEBUG_VM on, and if it
      were set here, it'd likely cause corruption when the page is reused.
      
      Don't use page_assign_page_cgroup to clear it: that should be private to
      memcontrol.c, and always called with the lock taken; and memmap_init_zone
      doesn't need it either - like page->mapping and other pointers throughout the
      kernel, Linux assumes pointers in zeroed structures are NULL pointers.
      
      Instead use page_reset_bad_cgroup, added to memcontrol.h for this only.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: fix VM_BUG_ON from page migration · 98837c7f
      Hugh Dickins committed
      Page migration gave me free_hot_cold_page's VM_BUG_ON page->page_cgroup.
      remove_migration_pte was calling mem_cgroup_charge on the new page whenever it
      found a swap pte, before it had determined it to be a migration entry.  That
      left a surplus reference count on the page_cgroup, so it was still attached
      when the page was later freed.
      
      Move that mem_cgroup_charge down to where we're sure it's a migration entry.
      We were already under i_mmap_lock or anon_vma->lock, so its GFP_KERNEL was
      already inappropriate: change that to GFP_ATOMIC.
      
      It's essential that remove_migration_pte removes all the migration entries,
      other crashes follow if not.  So proceed even when the charge fails: normally
      it cannot, but after a mem_cgroup_force_empty it might - comment in the code.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: when do_swap's do_wp_page fails · 61469f1d
      Hugh Dickins committed
      Don't uncharge when do_swap_page's call to do_wp_page fails: the page which
      was charged for is there in the pagetable, and will be correctly uncharged
      when that area is unmapped - it was only its COWing which failed.
      
      And while we're here, remove earlier XXX comment: yes, OR in do_wp_page's
      return value (maybe VM_FAULT_WRITE) with do_swap_page's there; but if it
      fails, mask out success bits, which might confuse some arches e.g.  sparc.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
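The return-value handling described in the second paragraph of this commit can be sketched as plain bit manipulation. The `SK_FAULT_*` constants and `sk_combine_fault()` are hypothetical names with illustrative values; they stand in for the kernel's VM_FAULT_* bits and are not the kernel's definitions.

```c
/* Illustrative fault bits (hypothetical values, not the kernel's headers). */
#define SK_FAULT_OOM    0x01u  /* error: out of memory */
#define SK_FAULT_SIGBUS 0x02u  /* error: bad access */
#define SK_FAULT_MAJOR  0x04u  /* success detail: fault needed I/O */
#define SK_FAULT_WRITE  0x08u  /* success detail: COW was broken */
#define SK_FAULT_ERROR  (SK_FAULT_OOM | SK_FAULT_SIGBUS)

/*
 * Combine the swap-in result with the result of the write-protect handler:
 * OR the two together, but if any error bit is set, keep only the error
 * bits so that callers never see "success" and "failure" bits at once.
 */
static unsigned sk_combine_fault(unsigned swap_ret, unsigned wp_ret)
{
    unsigned ret = swap_ret | wp_ret;

    if (ret & SK_FAULT_ERROR)
        ret &= SK_FAULT_ERROR;
    return ret;
}
```

For example, a major fault followed by a successful COW yields both success bits, while a major fault followed by an out-of-memory COW yields only the error bit.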
    • memcg: page_cache_release not __free_page · 6dbf6d3b
      Hugh Dickins committed
      There's nothing wrong with mem_cgroup_charge failure in do_wp_page and
      do_anonymous_page using __free_page, but it does look odd when nearby code
      uses page_cache_release: use that instead (while turning a blind eye to
      ancient inconsistencies of page_cache_release versus put_page).
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: move_lists on page not page_cgroup · 427d5416
      Hugh Dickins committed
      Each caller of mem_cgroup_move_lists is having to use page_get_page_cgroup:
      it's more convenient if it acts upon the page itself not the page_cgroup; and
      in a later patch this becomes important to handle within memcontrol.c.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: David Rientjes <rientjes@google.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: mm_match_cgroup not vm_match_cgroup · bd845e38
      Hugh Dickins committed
      vm_match_cgroup is a perverse name for a macro to match mm with cgroup: rename
      it mm_match_cgroup, matching mm_init_cgroup and mm_free_cgroup.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Memory controller: rename to Memory Resource Controller · 00f0b825
      Balbir Singh committed
      Rename Memory Controller to Memory Resource Controller.  Reflect the same
      changes in the CONFIG definition for the Memory Resource Controller.  Group
      together the config options for Resource Counters and Memory Resource
      Controller.
      Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • alloc_percpu() fails to allocate percpu data · be852795
      Eric Dumazet committed
      Some oprofile results obtained while using tbench on a 2x2 cpu machine were
      very surprising.
      
      For example, the loopback_xmit() function was using a high number of cpu
      cycles to perform the statistic updates, which are supposed to be really
      cheap since they use percpu data:
      
              pcpu_lstats = netdev_priv(dev);
              lb_stats = per_cpu_ptr(pcpu_lstats, smp_processor_id());
              lb_stats->packets++;  /* HERE : serious contention */
              lb_stats->bytes += skb->len;
      
      struct pcpu_lstats is a small structure containing two longs.  It appears
      that on my 32-bit platform, alloc_percpu(8) allocates a single cache line,
      instead of giving each cpu a separate cache line.
      
      Using the following patch gave me an impressive boost in various benchmarks
      (6% in tbench).  All percpu_counters hit this bug too.
      
      A long-term fix (i.e. >= 2.6.26) would be to let each CPU allocate its own
      block of memory, so that we don't need to round up sizes to L1_CACHE_BYTES, or
      to merge the SGI stuff of course...
      
      Note: SLUB vs SLAB is important here to *show* the improvement, since they
      don't have the same minimum allocation sizes (8 bytes vs 32 bytes).  This
      could very well explain regressions some guys reported when they switched
      to SLUB.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
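The rounding mentioned in this commit message can be sketched as follows. `SK_CACHE_LINE` is an assumed 64-byte line size and both helpers are hypothetical; the sketch only shows why an 8-byte per-cpu block shares a cache line with its neighbour unless the size is rounded up to a full line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_CACHE_LINE 64u  /* assumed cache-line size in bytes */

/* Round a per-cpu allocation size up to a whole number of cache lines. */
static size_t sk_percpu_size(size_t size)
{
    assert(size <= SIZE_MAX - (SK_CACHE_LINE - 1)); /* guard against overflow */
    return (size + SK_CACHE_LINE - 1) / SK_CACHE_LINE * SK_CACHE_LINE;
}

/* Do two addresses (as offsets from a line-aligned base) share a cache line? */
static int sk_same_line(uintptr_t a, uintptr_t b)
{
    return a / SK_CACHE_LINE == b / SK_CACHE_LINE;
}
```

Two 8-byte blocks placed back to back at offsets 0 and 8 land on the same line, so counters updated by different cpus contend; at offsets 0 and `sk_percpu_size(8)` they do not.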