1. 28 4月, 2008 9 次提交
    • M
      mm: have zonelist contains structs with both a zone pointer and zone_idx · dd1a239f
      Mel Gorman 提交于
      Filtering zonelists requires very frequent use of zone_idx().  This is costly
      as it involves a lookup of another structure and a substraction operation.  As
      the zone_idx is often required, it should be quickly accessible.  The node idx
      could also be stored here if it was found that accessing zone->node is
      significant which may be the case on workloads where nodemasks are heavily
      used.
      
      This patch introduces a struct zoneref to store a zone pointer and a zone
      index.  The zonelist then consists of an array of these struct zonerefs which
      are looked up as necessary.  Helpers are given for accessing the zone index as
      well as the node index.
      
      [kamezawa.hiroyu@jp.fujitsu.com: Suggested struct zoneref instead of embedding information in pointers]
      [hugh@veritas.com: mm-have-zonelist: fix memcg ooms]
      [hugh@veritas.com: just return do_try_to_free_pages]
      [hugh@veritas.com: do_try_to_free_pages gfp_mask redundant]
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Acked-by: NChristoph Lameter <clameter@sgi.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dd1a239f
    • M
      mm: use two zonelist that are filtered by GFP mask · 54a6eb5c
      Mel Gorman 提交于
      Currently a node has two sets of zonelists, one for each zone type in the
      system and a second set for GFP_THISNODE allocations.  Based on the zones
      allowed by a gfp mask, one of these zonelists is selected.  All of these
      zonelists consume memory and occupy cache lines.
      
      This patch replaces the multiple zonelists per-node with two zonelists.  The
      first contains all populated zones in the system, ordered by distance, for
      fallback allocations when the target/preferred node has no free pages.  The
      second contains all populated zones in the node suitable for GFP_THISNODE
      allocations.
      
      An iterator macro is introduced called for_each_zone_zonelist() that interates
      through each zone allowed by the GFP flags in the selected zonelist.
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Acked-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      54a6eb5c
    • M
      mm: remember what the preferred zone is for zone_statistics · 18ea7e71
      Mel Gorman 提交于
      On NUMA, zone_statistics() is used to record events like numa hit, miss and
      foreign.  It assumes that the first zone in a zonelist is the preferred zone.
      When multiple zonelists are replaced by one that is filtered, this is no
      longer the case.
      
      This patch records what the preferred zone is rather than assuming the first
      zone in the zonelist is it.  This simplifies the reading of later patches in
      this set.
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: NChristoph Lameter <clameter@sgi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      18ea7e71
    • M
      mm: introduce node_zonelist() for accessing the zonelist for a GFP mask · 0e88460d
      Mel Gorman 提交于
      Introduce a node_zonelist() helper function.  It is used to lookup the
      appropriate zonelist given a node and a GFP mask.  The patch on its own is a
      cleanup but it helps clarify parts of the two-zonelist-per-node patchset.  If
      necessary, it can be merged with the next patch in this set without problems.
      Reviewed-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0e88460d
    • M
      mm: use zonelists instead of zones when direct reclaiming pages · dac1d27b
      Mel Gorman 提交于
      The following patches replace multiple zonelists per node with two zonelists
      that are filtered based on the GFP flags.  The patches as a set fix a bug with
      regard to the use of MPOL_BIND and ZONE_MOVABLE.  With this patchset, the
      MPOL_BIND will apply to the two highest zones when the highest zone is
      ZONE_MOVABLE.  This should be considered as an alternative fix for the
      MPOL_BIND+ZONE_MOVABLE in 2.6.23 to the previously discussed hack that filters
      only custom zonelists.
      
      The first patch cleans up an inconsistency where direct reclaim uses
      zonelist->zones where other places use zonelist.
      
      The second patch introduces a helper function node_zonelist() for looking up
      the appropriate zonelist for a GFP mask which simplifies patches later in the
      set.
      
      The third patch defines/remembers the "preferred zone" for numa statistics, as
      it is no longer always the first zone in a zonelist.
      
      The forth patch replaces multiple zonelists with two zonelists that are
      filtered.  The two zonelists are due to the fact that the memoryless patchset
      introduces a second set of zonelists for __GFP_THISNODE.
      
      The fifth patch introduces helper macros for retrieving the zone and node
      indices of entries in a zonelist.
      
      The final patch introduces filtering of the zonelists based on a nodemask.
      Two zonelists exist per node, one for normal allocations and one for
      __GFP_THISNODE.
      
      Performance results varied depending on the machine configuration.  In real
      workloads the gain/loss will depend on how much the userspace portion of the
      benchmark benefits from having more cache available due to reduced referencing
      of zonelists.
      
      These are the range of performance losses/gains when running against
      2.6.24-rc4-mm1.  The set and these machines are a mix of i386, x86_64 and
      ppc64 both NUMA and non-NUMA.
      			     loss   to  gain
      Total CPU time on Kernbench: -0.86% to  1.13%
      Elapsed   time on Kernbench: -0.79% to  0.76%
      page_test from aim9:         -4.37% to  0.79%
      brk_test  from aim9:         -0.71% to  4.07%
      fork_test from aim9:         -1.84% to  4.60%
      exec_test from aim9:         -0.71% to  1.08%
      
      This patch:
      
      The allocator deals with zonelists which indicate the order in which zones
      should be targeted for an allocation.  Similarly, direct reclaim of pages
      iterates over an array of zones.  For consistency, this patch converts direct
      reclaim to use a zonelist.  No functionality is changed by this patch.  This
      simplifies zonelist iterators in the next patch.
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Acked-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dac1d27b
    • N
      mm: remove nopage · 3c18ddd1
      Nick Piggin 提交于
      Nothing in the tree uses nopage any more.  Remove support for it in the
      core mm code and documentation (and a few stray references to it in
      comments).
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3c18ddd1
    • H
      remove sparse warning for mmzone.h · ddc81ed2
      Harvey Harrison 提交于
      include/linux/mmzone.h:640:22: warning: potentially expensive pointer subtraction
      
      Calculate the offset into the node_zones array rather than the index
      using casts to (char *) and comparing against the index * sizeof(struct zone).
      
      On X86_32 this saves a sar, but code size increases by one byte per
      is_highmem() use due to 32-bit cmps rather than 16 bit cmps.
      
      Before:
       207:   2b 80 8c 07 00 00       sub    0x78c(%eax),%eax
       20d:   c1 f8 0b                sar    $0xb,%eax
       210:   83 f8 02                cmp    $0x2,%eax
       213:   74 16                   je     22b <kmap_atomic_prot+0x144>
       215:   83 f8 03                cmp    $0x3,%eax
       218:   0f 85 8f 00 00 00       jne    2ad <kmap_atomic_prot+0x1c6>
       21e:   83 3d 00 00 00 00 02    cmpl   $0x2,0x0
       225:   0f 85 82 00 00 00       jne    2ad <kmap_atomic_prot+0x1c6>
       22b:   64 a1 00 00 00 00       mov    %fs:0x0,%eax
      
      After:
       207:   2b 80 8c 07 00 00       sub    0x78c(%eax),%eax
       20d:   3d 00 10 00 00          cmp    $0x1000,%eax
       212:   74 18                   je     22c <kmap_atomic_prot+0x145>
       214:   3d 00 18 00 00          cmp    $0x1800,%eax
       219:   0f 85 8f 00 00 00       jne    2ae <kmap_atomic_prot+0x1c7>
       21f:   83 3d 00 00 00 00 02    cmpl   $0x2,0x0
       226:   0f 85 82 00 00 00       jne    2ae <kmap_atomic_prot+0x1c7>
       22c:   64 a1 00 00 00 00       mov    %fs:0x0,%eax
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ddc81ed2
    • C
      Remove set_migrateflags() · 488514d1
      Christoph Lameter 提交于
      Migrate flags must be set on slab creation as agreed upon when the antifrag
      logic was reviewed.  Otherwise some slabs of a slabcache will end up in the
      unmovable and others in the reclaimable section depending on which flag was
      active when a new slab page was allocated.
      
      This likely slid in somehow when antifrag was merged. Remove it.
      
      The buffer_heads are always allocated with __GFP_RECLAIMABLE because the
      SLAB_RECLAIM_ACCOUNT option is set.  The set_migrateflags() never had any
      effect there.
      
      Radix tree allocations are not directly reclaimable but they are allocated
      with __GFP_RECLAIMABLE set on each allocation.  We now set
      SLAB_RECLAIM_ACCOUNT on radix tree slab creation making sure that radix
      tree slabs are consistently placed in the reclaimable section.  Radix tree
      slabs will also be accounted as such.
      
      There is then no user left of set_migratepages. So remove it.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      488514d1
    • B
      hotplug memory remove: generic __remove_pages() support · ea01ea93
      Badari Pulavarty 提交于
      Generic helper function to remove section mappings and sysfs entries for the
      section of the memory we are removing.  offline_pages() correctly adjusted
      zone and marked the pages reserved.
      
      TODO: Yasunori Goto is working on patches to free up allocations from bootmem.
      Signed-off-by: NBadari Pulavarty <pbadari@us.ibm.com>
      Acked-by: NYasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ea01ea93
  2. 27 4月, 2008 31 次提交