1. 28 3月, 2006 5 次提交
    • K
      [PATCH] uninline zone helpers · 95144c78
      KAMEZAWA Hiroyuki 提交于
      Helper functions for for_each_online_pgdat/for_each_zone look too big to be
      inlined.  Speed of these helper macro itself is not very important.  (inner
      loops are tend to do more work than this)
      
      This patch make helper function to be out-of-lined.
      
      	inline		out-of-line
      .text   005c0680        005bf6a0
      
      005c0680 - 005bf6a0 = FE0 = 4Kbytes.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      95144c78
    • K
      [PATCH] for_each_online_pgdat: remove pgdat_list · ae0f15fb
      KAMEZAWA Hiroyuki 提交于
      By using for_each_online_pgdat(), pgdat_list is not necessary now.  This patch
      removes it.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ae0f15fb
    • K
      [PATCH] define for_each_online_pgdat · 8357f869
      KAMEZAWA Hiroyuki 提交于
      This patch defines for_each_online_pgdat() as a replacement of
      for_each_pgdat()
      
      Now, online nodes are managed by node_online_map.  But for_each_pgdat()
      uses pgdat_link to iterate over all nodes(pgdat).  This means management
      structure for online pgdat is duplicated.
      
      I think using node_online_map for for_each_pgdat() is simple and sane
      rather ather than pgdat_link.  New macro is named as
      for_each_online_pgdat().  Following patch will fix callers of
      for_each_pgdat().
      
      The bootmem allocater uses for_each_pgdat() before pgdat initialization.  I
      don't think it's sane.  Following patch will fix it.
      Signed-off-by: NYasunori Goto     <y-goto@jp.fujitsu.com>
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8357f869
    • K
      [PATCH] remove zone_mem_map · a0140c1d
      KAMEZAWA Hiroyuki 提交于
      This patch removes zone_mem_map.
      
      pfn_to_page uses pgdat, page_to_pfn uses zone.  page_to_pfn can use pgdat
      instead of zone, which is only one user of zone_mem_map.  By modifing it,
      we can remove zone_mem_map.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Christoph Lameter <christoph@lameter.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a0140c1d
    • K
      [PATCH] unify pfn_to_page: generic functions · a117e66e
      KAMEZAWA Hiroyuki 提交于
      There are 3 memory models, FLATMEM, DISCONTIGMEM, SPARSEMEM.
      Each arch has its own page_to_pfn(), pfn_to_page() for each models.
      But most of them can use the same arithmetic.
      
      This patch adds asm-generic/memory_model.h, which includes generic
      page_to_pfn(), pfn_to_page() definitions for each memory model.
      
      When CONFIG_OUT_OF_LINE_PFN_TO_PAGE=y, out-of-line functions are
      used instead of macro. This is enabled by some archs and  reduces
      text size.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ian Molton <spyro@f2s.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
      Cc: Richard Curnow <rc@rc0.org.uk>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a117e66e
  2. 02 2月, 2006 2 次提交
  3. 19 1月, 2006 1 次提交
  4. 12 1月, 2006 1 次提交
  5. 09 1月, 2006 2 次提交
  6. 07 1月, 2006 8 次提交
  7. 23 11月, 2005 1 次提交
    • L
      Fix up GFP_ZONEMASK for GFP_DMA32 usage · ac3461ad
      Linus Torvalds 提交于
      There was some confusion about the different zone usage, this should fix
      up the resulting mess in the GFP zonemask handling.
      
      The different zone usage is still confusing (it's very easy to mix up
      the individual zone numbers with the GFP zone _list_ numbers), so we
      might want to clean up some of this in the future, but in the meantime
      this should fix the actual problems.
      Acked-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ac3461ad
  8. 15 11月, 2005 3 次提交
    • A
      [PATCH] x86_64: Speed up numa_node_id by putting it directly into the PDA · 69d81fcd
      Andi Kleen 提交于
      Not go from the CPU number to an mapping array.
      Mode number is often used now in fast paths.
      
      This also adds a generic numa_node_id to all the topology includes
      
      Suggested by Eric Dumazet
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      69d81fcd
    • A
      [PATCH] x86_64: Remove obsolete ARCH_HAS_ATOMIC_UNSIGNED and page_flags_t · 07808b74
      Andi Kleen 提交于
      Has been introduced for x86-64 at some point to save memory
      in struct page, but has been obsolete for some time. Just
      remove it.
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      07808b74
    • A
      [PATCH] x86_64: Add 4GB DMA32 zone · a2f1b424
      Andi Kleen 提交于
      Add a new 4GB GFP_DMA32 zone between the GFP_DMA and GFP_NORMAL zones.
      
      As a bit of historical background: when the x86-64 port
      was originally designed we had some discussion if we should
      use a 16MB DMA zone like i386 or a 4GB DMA zone like IA64 or
      both. Both was ruled out at this point because it was in early
      2.4 when VM is still quite shakey and had bad troubles even
      dealing with one DMA zone.  We settled on the 16MB DMA zone mainly
      because we worried about older soundcards and the floppy.
      
      But this has always caused problems since then because
      device drivers had trouble getting enough DMA able memory. These days
      the VM works much better and the wide use of NUMA has proven
      it can deal with many zones successfully.
      
      So this patch adds both zones.
      
      This helps drivers who need a lot of memory below 4GB because
      their hardware is not accessing more (graphic drivers - proprietary
      and free ones, video frame buffer drivers, sound drivers etc.).
      Previously they could only use IOMMU+16MB GFP_DMA, which
      was not enough memory.
      
      Another common problem is that hardware who has full memory
      addressing for >4GB misses it for some control structures in memory
      (like transmit rings or other metadata).  They tended to allocate memory
      in the 16MB GFP_DMA or the IOMMU/swiotlb then using pci_alloc_consistent,
      but that can tie up a lot of precious 16MB GFPDMA/IOMMU/swiotlb memory
      (even on AMD systems the IOMMU tends to be quite small) especially if you have
      many devices.  With the new zone pci_alloc_consistent can just put
      this stuff into memory below 4GB which works better.
      
      One argument was still if the zone should be 4GB or 2GB. The main
      motivation for 2GB would be an unnamed not so unpopular hardware
      raid controller (mostly found in older machines from a particular four letter
      company) who has a strange 2GB restriction in firmware. But
      that one works ok with swiotlb/IOMMU anyways, so it doesn't really
      need GFP_DMA32. I chose 4GB to be compatible with IA64 and because
      it seems to be the most common restriction.
      
      The new zone is so far added only for x86-64.
      
      For other architectures who don't set up this
      new zone nothing changes. Architectures can set a compatibility
      define in Kconfig CONFIG_DMA_IS_DMA32 that will define GFP_DMA32
      as GFP_DMA. Otherwise it's a nop because on 32bit architectures
      it's normally not needed because GFP_NORMAL (=0) is DMA able
      enough.
      
      One problem is still that GFP_DMA means different things on different
      architectures. e.g. some drivers used to have #ifdef ia64  use GFP_DMA
      (trusting it to be 4GB) #elif __x86_64__ (use other hacks like
      the swiotlb because 16MB is not enough) ... . This was quite
      ugly and is now obsolete.
      
      These should be now converted to use GFP_DMA32 unconditionally. I haven't done
      this yet. Or best only use pci_alloc_consistent/dma_alloc_coherent
      which will use GFP_DMA32 transparently.
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a2f1b424
  9. 14 11月, 2005 1 次提交
  10. 30 10月, 2005 3 次提交
  11. 28 10月, 2005 1 次提交
  12. 05 9月, 2005 3 次提交
    • D
      [PATCH] sparsemem extreme: hotplug preparation · 28ae55c9
      Dave Hansen 提交于
      This splits up sparse_index_alloc() into two pieces.  This is needed
      because we'll allocate the memory for the second level in a different place
      from where we actually consume it to keep the allocation from happening
      underneath a lock
      Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NBob Picco <bob.picco@hp.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      28ae55c9
    • B
      [PATCH] sparsemem extreme implementation · 3e347261
      Bob Picco 提交于
      With cleanups from Dave Hansen <haveblue@us.ibm.com>
      
      SPARSEMEM_EXTREME makes mem_section a one dimensional array of pointers to
      mem_sections.  This two level layout scheme is able to achieve smaller
      memory requirements for SPARSEMEM with the tradeoff of an additional shift
      and load when fetching the memory section.  The current SPARSEMEM
      implementation is a one dimensional array of mem_sections which is the
      default SPARSEMEM configuration.  The patch attempts isolates the
      implementation details of the physical layout of the sparsemem section
      array.
      
      SPARSEMEM_EXTREME requires bootmem to be functioning at the time of
      memory_present() calls.  This is not always feasible, so architectures
      which do not need it may allocate everything statically by using
      SPARSEMEM_STATIC.
      Signed-off-by: NAndy Whitcroft <apw@shadowen.org>
      Signed-off-by: NBob Picco <bob.picco@hp.com>
      Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3e347261
    • B
      [PATCH] SPARSEMEM EXTREME · 802f192e
      Bob Picco 提交于
      A new option for SPARSEMEM is ARCH_SPARSEMEM_EXTREME.  Architecture
      platforms with a very sparse physical address space would likely want to
      select this option.  For those architecture platforms that don't select the
      option, the code generated is equivalent to SPARSEMEM currently in -mm.
      I'll be posting a patch on ia64 ml which uses this new SPARSEMEM feature.
      
      ARCH_SPARSEMEM_EXTREME makes mem_section a one dimensional array of
      pointers to mem_sections.  This two level layout scheme is able to achieve
      smaller memory requirements for SPARSEMEM with the tradeoff of an
      additional shift and load when fetching the memory section.  The current
      SPARSEMEM -mm implementation is a one dimensional array of mem_sections
      which is the default SPARSEMEM configuration.  The patch attempts isolates
      the implementation details of the physical layout of the sparsemem section
      array.
      
      ARCH_SPARSEMEM_EXTREME depends on 64BIT and is by default boolean false.
      
      I've boot tested under aim load ia64 configured for ARCH_SPARSEMEM_EXTREME.
       I've also boot tested a 4 way Opteron machine with !ARCH_SPARSEMEM_EXTREME
      and tested with aim.
      Signed-off-by: NAndy Whitcroft <apw@shadowen.org>
      Signed-off-by: NBob Picco <bob.picco@hp.com>
      Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      802f192e
  13. 24 6月, 2005 7 次提交
    • A
      [PATCH] sparsemem hotplug base · 29751f69
      Andy Whitcroft 提交于
      Make sparse's initalization be accessible at runtime.  This allows sparse
      mappings to be created after boot in a hotplug situation.
      
      This patch is separated from the previous one just to give an indication how
      much of the sparse infrastructure is *just* for hotplug memory.
      
      The section_mem_map doesn't really store a pointer.  It stores something that
      is convenient to do some math against to get a pointer.  It isn't valid to
      just do *section_mem_map, so I don't think it should be stored as a pointer.
      
      There are a couple of things I'd like to store about a section.  First of all,
      the fact that it is !NULL does not mean that it is present.  There could be
      such a combination where section_mem_map *is* NULL, but the math gets you
      properly to a real mem_map.  So, I don't think that check is safe.
      
      Since we're storing 32-bit-aligned structures, we have a few bits in the
      bottom of the pointer to play with.  Use one bit to encode whether there's
      really a mem_map there, and the other one to tell whether there's a valid
      section there.  We need to distinguish between the two because sometimes
      there's a gap between when a section is discovered to be present and when we
      can get the mem_map for it.
      Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NAndy Whitcroft <apw@shadowen.org>
      Signed-off-by: NJack Steiner <steiner@sgi.com>
      Signed-off-by: NBob Picco <bob.picco@hp.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      29751f69
    • A
      [PATCH] sparsemem swiss cheese numa layouts · 641c7673
      Andy Whitcroft 提交于
      The part of the sparsemem patch which modifies memmap_init_zone() has recently
      become a problem.  It changes behavior so that there is a call to
      pfn_to_page() for each individual page inside of a node's range:
      node_start_pfn through node_end_pfn.  It used to simply do this once, at the
      beginning of the node, but having sparsemem's non-contiguous mem_map[]s inside
      of a node made it necessary to change.
      
      Mike Kravetz recently wrote a patch which made the NUMA code accept some new
      kinds of layouts.  The system's memory was laid out like this, with node 0's
      memory in two pieces: one before and one after node 1's memory:
      
      	Node 0: +++++     +++++
      	Node 1:      +++++
      
      Previous behavior before Mike's patch was to assign nodes like this:
      
      	Node 0: 00000     XXXXX
      	Node 1:      11111
      
      Where the 'X' areas were simply thrown away.  The new behavior was to make the
      pg_data_t span node 0 across all of its areas, including areas that are really
      node 1's: Node 0: 000000000000000 Node 1: 11111
      
      This wastes a little bit of mem_map space, but ends up being OK, and more
      fully utilizes the system's memory.  memmap_init_zone() initializes all of the
      "struct page"s for node 0, even for the "hole", but those never get used,
      because there is no pfn_to_page() that resolves to those pages.  However, only
      calling pfn_to_page() once, memmap_init_zone() always uses the pages that were
      allocated for node0->node_mem_map because:
      
      	struct page *start = pfn_to_page(start_pfn);
      	// effectively start = &node->node_mem_map[0]
      	for (page = start; page < (start + size); page++) {
      		init_page_here();...
      		page++;
      	}
      
      Slow, and wasteful, but generally harmless.
      
      But, modify that to call pfn_to_page() for each loop iteration (like sparsemem
      does):
      
      	for (pfn = start_pfn; pfn < < (start_pfn + size); pfn++++) {
      		page = pfn_to_page(pfn);
      	}
      
      And you end up trying to initialize node 1's pages too early, along with bogus
      data from node 0.  This patch checks for those weird layouts and declines to
      touch the pages, making the more frequent pfn_to_page() calls OK to do.
      Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NAndy Whitcroft <apw@shadowen.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      641c7673
    • A
      [PATCH] sparsemem memory model · d41dee36
      Andy Whitcroft 提交于
      Sparsemem abstracts the use of discontiguous mem_maps[].  This kind of
      mem_map[] is needed by discontiguous memory machines (like in the old
      CONFIG_DISCONTIGMEM case) as well as memory hotplug systems.  Sparsemem
      replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually
      become a complete replacement.
      
      A significant advantage over DISCONTIGMEM is that it's completely separated
      from CONFIG_NUMA.  When producing this patch, it became apparent in that NUMA
      and DISCONTIG are often confused.
      
      Another advantage is that sparse doesn't require each NUMA node's ranges to be
      contiguous.  It can handle overlapping ranges between nodes with no problems,
      where DISCONTIGMEM currently throws away that memory.
      
      Sparsemem uses an array to provide different pfn_to_page() translations for
      each SECTION_SIZE area of physical memory.  This is what allows the mem_map[]
      to be chopped up.
      
      In order to do quick pfn_to_page() operations, the section number of the page
      is encoded in page->flags.  Part of the sparsemem infrastructure enables
      sharing of these bits more dynamically (at compile-time) between the
      page_zone() and sparsemem operations.  However, on 32-bit architectures, the
      number of bits is quite limited, and may require growing the size of the
      page->flags type in certain conditions.  Several things might force this to
      occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of
      memory), an increase in the physical address space, or an increase in the
      number of used page->flags.
      
      One thing to note is that, once sparsemem is present, the NUMA node
      information no longer needs to be stored in the page->flags.  It might provide
      speed increases on certain platforms and will be stored there if there is
      room.  But, if out of room, an alternate (theoretically slower) mechanism is
      used.
      
      This patch introduces CONFIG_FLATMEM.  It is used in almost all cases where
      there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM
      often have to compile out the same areas of code.
      Signed-off-by: NAndy Whitcroft <apw@shadowen.org>
      Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NMartin Bligh <mbligh@aracnet.com>
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      Signed-off-by: NYasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: NBob Picco <bob.picco@hp.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d41dee36
    • A
      [PATCH] generify early_pfn_to_nid · b159d43f
      Andy Whitcroft 提交于
      Provide a default implementation for early_pfn_to_nid returning node 0.  Allow
      architectures to override this with their own implementation out of
      asm/mmzone.h.
      Signed-off-by: NAndy Whitcroft <apw@shadowen.org>
      Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NMartin Bligh <mbligh@aracnet.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      b159d43f
    • D
      [PATCH] Introduce new Kconfig option for NUMA or DISCONTIG · 93b7504e
      Dave Hansen 提交于
      There is some confusion that arose when working on SPARSEMEM patch between
      what is needed for DISCONTIG vs. NUMA.
      
      Multiple pg_data_t's are needed for DISCONTIGMEM or NUMA, independently.
      All of the current NUMA implementations require an implementation of
      DISCONTIG.  Because of this, quite a lot of code which is really needed for
      NUMA is actually under DISCONTIG #ifdefs.  For SPARSEMEM, we changed some
      of these #ifdefs to CONFIG_NUMA, but that broke the DISCONTIG=y and NUMA=n
      case.
      
      Introducing this new NEED_MULTIPLE_NODES config option allows code that is
      needed for both NUMA or DISCONTIG to be separated out from code that is
      specific to DISCONTIG.
      
      One great advantage of this approach is that it doesn't require every
      architecture to be converted over.  All of the current implementations
      should "just work", only the ones implementing SPARSEMEM will have to be
      fixed up.
      
      The change to free_area_init() makes it work inside, or out of the new
      config option.
      Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      93b7504e
    • D
      [PATCH] sparsemem base: reorganize page->flags bit operations · 348f8b6c
      Dave Hansen 提交于
      Generify the value fields in the page_flags.  The aim is to allow the location
      and size of these fields to be varied.  Additionally we want to move away from
      fixed allocations per field whilst still enforcing the overall bit utilisation
      limits.  We rely on the compiler to spot and optimise the accessor functions.
      Signed-off-by: NAndy Whitcroft <apw@shadowen.org>
      Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      348f8b6c
    • D
      [PATCH] remove non-DISCONTIG use of pgdat->node_mem_map · 408fde81
      Dave Hansen 提交于
      This patch effectively eliminates direct use of pgdat->node_mem_map outside
      of the DISCONTIG code.  On a flat memory system, these fields aren't
      currently used, neither are they on a sparsemem system.
      
      There was also a node_mem_map(nid) macro on many architectures.  Its use
      along with the use of ->node_mem_map itself was not consistent.  It has
      been removed in favor of two new, more explicit, arch-independent macros:
      
      	pgdat_page_nr(pgdat, pagenr)
      	nid_page_nr(nid, pagenr)
      
      I called them "pgdat" and "nid" because we overload the term "node" to mean
      "NUMA node", "DISCONTIG node" or "pg_data_t" in very confusing ways.  I
      believe the newer names are much clearer.
      
      These macros can be overridden in the sparsemem case with a theoretically
      slower operation using node_start_pfn and pfn_to_page(), instead.  We could
      make this the only behavior if people want, but I don't want to change too
      much at once.  One thing at a time.
      
      This patch removes more code than it adds.
      
      Compile tested on alpha, alpha discontig, arm, arm-discontig, i386, i386
      generic, NUMAQ, Summit, ppc64, ppc64 discontig, and x86_64.  Full list
      here: http://sr71.net/patches/2.6.12/2.6.12-rc1-mhp2/configs/
      
      Boot tested on NUMAQ, x86 SMP and ppc64 power4/5 LPARs.
      Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NMartin J. Bligh <mbligh@aracnet.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      408fde81
  14. 22 6月, 2005 2 次提交
    • C
      [PATCH] node local per-cpu-pages · e7c8d5c9
      Christoph Lameter 提交于
      This patch modifies the way pagesets in struct zone are managed.
      
      Each zone has a per-cpu array of pagesets.  So any particular CPU has some
      memory in each zone structure which belongs to itself.  Even if that CPU is
      not local to that zone.
      
      So the patch relocates the pagesets for each cpu to the node that is nearest
      to the cpu instead of allocating the pagesets in the (possibly remote) target
      zone.  This means that the operations to manage pages on remote zone can be
      done with information available locally.
      
      We play a macro trick so that non-NUMA pmachines avoid the additional
      pointer chase on the page allocator fastpath.
      
      AIM7 benchmark on a 32 CPU SGI Altix
      
      w/o patches:
      Tasks    jobs/min  jti  jobs/min/task      real       cpu
          1      484.68  100       484.6769     12.01      1.97   Fri Mar 25 11:01:42 2005
        100    27140.46   89       271.4046     21.44    148.71   Fri Mar 25 11:02:04 2005
        200    30792.02   82       153.9601     37.80    296.72   Fri Mar 25 11:02:42 2005
        300    32209.27   81       107.3642     54.21    451.34   Fri Mar 25 11:03:37 2005
        400    34962.83   78        87.4071     66.59    588.97   Fri Mar 25 11:04:44 2005
        500    31676.92   75        63.3538     91.87    742.71   Fri Mar 25 11:06:16 2005
        600    36032.69   73        60.0545     96.91    885.44   Fri Mar 25 11:07:54 2005
        700    35540.43   77        50.7720    114.63   1024.28   Fri Mar 25 11:09:49 2005
        800    33906.70   74        42.3834    137.32   1181.65   Fri Mar 25 11:12:06 2005
        900    34120.67   73        37.9119    153.51   1325.26   Fri Mar 25 11:14:41 2005
       1000    34802.37   74        34.8024    167.23   1465.26   Fri Mar 25 11:17:28 2005
      
      with slab API changes and pageset patch:
      
      Tasks    jobs/min  jti  jobs/min/task      real       cpu
          1      485.00  100       485.0000     12.00      1.96   Fri Mar 25 11:46:18 2005
        100    28000.96   89       280.0096     20.79    150.45   Fri Mar 25 11:46:39 2005
        200    32285.80   79       161.4290     36.05    293.37   Fri Mar 25 11:47:16 2005
        300    40424.15   84       134.7472     43.19    438.42   Fri Mar 25 11:47:59 2005
        400    39155.01   79        97.8875     59.46    590.05   Fri Mar 25 11:48:59 2005
        500    37881.25   82        75.7625     76.82    730.19   Fri Mar 25 11:50:16 2005
        600    39083.14   78        65.1386     89.35    872.79   Fri Mar 25 11:51:46 2005
        700    38627.83   77        55.1826    105.47   1022.46   Fri Mar 25 11:53:32 2005
        800    39631.94   78        49.5399    117.48   1169.94   Fri Mar 25 11:55:30 2005
        900    36903.70   79        41.0041    141.94   1310.78   Fri Mar 25 11:57:53 2005
       1000    36201.23   77        36.2012    160.77   1458.31   Fri Mar 25 12:00:34 2005
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NShobhit Dayal <shobhit@calsoftinc.com>
      Signed-off-by: NShai Fultheim <Shai@Scalex86.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e7c8d5c9
    • M
      [PATCH] VM: rate limit early reclaim · 1e7e5a90
      Martin Hicks 提交于
      When early zone reclaim is turned on the LRU is scanned more frequently when a
      zone is low on memory.  This limits when the zone reclaim can be called by
      skipping the scan if another thread (either via kswapd or sync reclaim) is
      already reclaiming from the zone.
      Signed-off-by: NMartin Hicks <mort@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      1e7e5a90