1. 29 October 2010 (1 commit)
  2. 21 September 2010 (1 commit)
  3. 28 August 2010 (3 commits)
    • x86: Remove old bootmem code · 774ea0bc
      Authored by Yinghai Lu
      Requested by Ingo, Thomas and HPA.
      
      The old bootmem code is no longer necessary, and the transition is
      complete.  Remove it.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      774ea0bc
    • x86, memblock: Replace e820_/_early string with memblock_ · a9ce6bc1
      Authored by Yinghai Lu
      1. Include linux/memblock.h directly, so later patches can reduce references to e820.h.
      2. This patch was done mainly with sed scripts.
      
      -v2: use MEMBLOCK_ERROR instead of -1ULL or -1UL
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      a9ce6bc1
    • x86: Use memblock to replace early_res · 72d7c3b3
      Authored by Yinghai Lu
      1. Replace find_e820_area with memblock_find_in_range (see the sketch after
         this entry).
      2. Replace reserve_early with memblock_x86_reserve_range.
      3. Replace free_early with memblock_x86_free_range.
      4. NO_BOOTMEM will switch to memblock as well.
      5. Keep the _e820/_early wrappers in this patch; a following patch will
         replace them all.
      6. Because memblock_x86_free_range supports partial frees, some special-case
         handling can be removed.
      7. memblock_find_in_range() must be called after memblock_x86_fill(), so some
         calls in setup.c::setup_arch() are moved later
         -- corruption_check and mptable_update
      
      -v2: Move reserve_brk() earlier, before fill_memblock_area(), to avoid an
          overlap between brk and memblock_find_in_range().  That can happen when
          there are more than 128 RAM entries in the E820 tables and
          memblock_x86_fill() uses memblock_find_in_range() to find a new place
          for the memblock.memory.region array.  We don't need extend_brk() after
          fill_memblock_area(), so reserve_brk() can safely move before it.
      -v3: Move find_smp_config earlier, to make sure memblock_find_in_range()
          does not pick a wrong place if the BIOS doesn't put the mptable in the
          right place.
      -v4: Treat RESERVED_KERN as RAM in memblock.memory; those ranges are already
          in memblock.reserved.  Use __NOT_KEEP_MEMBLOCK to make sure
          memblock-related code can be freed later.
      -v5: The generic __memblock_find_in_range() searches from high to low, and
          on 32-bit the active_region does include high pages, so replace the
          limit with memblock.default_alloc_limit, aka get_max_mapped().
      -v6: Use current_limit instead.
      -v7: Check against MEMBLOCK_ERROR instead of -1ULL or -1UL.
      -v8: Set memblock_can_resize early to handle EFI with more RAM entries.
      -v9: Update after the kmemleak changes in mainline.
      Suggested-by: David S. Miller <davem@davemloft.net>
      Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      72d7c3b3
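      A minimal sketch of the substitution in items 1-3 above, with simplified
      types; the exact in-tree prototypes may differ, and the helper shown here
      is illustrative rather than the real allocation site.

      	/* Old early_res style:
      	 *   addr = find_e820_area(start, end, size, align);
      	 *   reserve_early(addr, addr + size, "NODE DATA");
      	 * memblock-based equivalent after this patch: */
      	static u64 __init alloc_node_data_sketch(u64 start, u64 end, u64 size)
      	{
      		u64 addr;

      		addr = memblock_find_in_range(start, end, size, PAGE_SIZE);
      		if (addr == MEMBLOCK_ERROR)	/* was: a check against -1ULL */
      			return 0;

      		memblock_x86_reserve_range(addr, addr + size, "NODE DATA");
      		return addr;
      	}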
  4. 28 May 2010 (1 commit)
  5. 16 February 2010 (3 commits)
    • x86, numa: Remove configurable node size support for numa emulation · ca2107c9
      Authored by David Rientjes
      Now that numa=fake=<size>[MG] is implemented, it is possible to remove
      configurable node size support.  The command-line parsing was already
      broken (numa=fake=*128, for example, would not work) and since fake nodes
      are now interleaved over physical nodes, this support is no longer
      required.
      Signed-off-by: David Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1002151343080.26927@chino.kir.corp.google.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      ca2107c9
    • x86, numa: Add fixed node size option for numa emulation · 8df5bb34
      Authored by David Rientjes
      numa=fake=N specifies the number of fake nodes, N, to partition the
      system into and then allocates them by interleaving over physical nodes.
      This requires knowledge of the system capacity when attempting to
      allocate nodes of a certain size: either very large nodes to benchmark
      scalability of code that operates on individual nodes, or very small
      nodes to find bugs in the VM.
      
      This patch introduces numa=fake=<size>[MG] so it is possible to specify
      the size of each node to allocate.  When used, nodes of the size
      specified will be allocated and interleaved over the set of physical
      nodes.
      
      FAKE_NODE_MIN_SIZE was also moved to the more-appropriate
      include/asm/numa_64.h.
      Signed-off-by: David Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1002151342510.26927@chino.kir.corp.google.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      8df5bb34
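      A hedged sketch of how the two numa=fake= forms described above could be
      told apart when parsing the command line.  The variables and the helper
      name are hypothetical; the real parser lives in arch/x86/mm/numa_64.c and
      is structured differently.

      	static unsigned long long fake_node_size;	/* hypothetical: numa=fake=<size>[MG] */
      	static unsigned long nr_fake_nodes;		/* hypothetical: numa=fake=N */

      	static int __init numa_emu_cmdline_sketch(char *opt)
      	{
      		char *end;
      		unsigned long long val = memparse(opt, &end);	/* consumes an M/G suffix */

      		if (end > opt && (*(end - 1) == 'M' || *(end - 1) == 'G'))
      			fake_node_size = val;		/* fixed node size, interleaved over physical nodes */
      		else
      			nr_fake_nodes = simple_strtoul(opt, NULL, 0);	/* plain node count */
      		return 0;
      	}

      For example, numa=fake=8 creates eight interleaved nodes, while numa=fake=512M
      carves the system into 512 MB nodes interleaved over the physical nodes.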
    • x86, numa: Fix numa emulation calculation of big nodes · 68fd111e
      Authored by David Rientjes
      numa=fake=N uses split_nodes_interleave() to partition the system into N
      fake nodes.  Each node size must be a multiple of
      FAKE_NODE_MIN_SIZE, otherwise it is possible to get strange alignments.
      Because of this, the remaining memory from each node when rounded to
      FAKE_NODE_MIN_SIZE is consolidated into a number of "big nodes" that are
      bigger than the rest.
      
      The calculation of the number of big nodes is incorrect because it uses
      a logical AND operator where it should multiply the rounded-off
      portion of each node by N (see the sketch after this entry).
      Signed-off-by: David Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1002151342230.26927@chino.kir.corp.google.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      68fd111e
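      A hedged sketch of the corrected arithmetic (the helper name and exact
      expression are illustrative, not the in-tree code): the remainder of each
      node below FAKE_NODE_MIN_SIZE, summed over all N nodes, determines how many
      nodes must be made one minimum-size unit bigger.

      	static int count_big_nodes_sketch(u64 node_size, int nr_nodes)
      	{
      		u64 remainder = node_size % FAKE_NODE_MIN_SIZE;

      		/* the bug: the old code effectively used '&' instead of '*' here,
      		 * so almost no big nodes were ever counted */
      		return (remainder * nr_nodes) / FAKE_NODE_MIN_SIZE;
      	}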
  6. 13 February 2010 (1 commit)
  7. 11 February 2010 (2 commits)
    • x86: Make early_node_mem get mem > 4 GB if possible · cef625ee
      Authored by Yinghai Lu
      So we can put the pgdat for the node high in memory, and later the sparse
      vmemmap code will get the section numbers it needs.

      With this patch, RAM below 4 GB is not used up for the sparse vmemmap.

      Before this patch we get the following (before swiotlb tries to get bootmem):
      [    0.000000] nid=1 start=0 end=2080000 aligned=1
      [    0.000000]   free [10 - 96]
      [    0.000000]   free [b12 - 1000]
      [    0.000000]   free [359f - 38a3]
      [    0.000000]   free [38b5 - 3a00]
      [    0.000000]   free [41e01 - 42000]
      [    0.000000]   free [73dde - 73e00]
      [    0.000000]   free [73fdd - 74000]
      [    0.000000]   free [741dd - 74200]
      [    0.000000]   free [743dd - 74400]
      [    0.000000]   free [745dd - 74600]
      [    0.000000]   free [747dd - 74800]
      [    0.000000]   free [749dd - 74a00]
      [    0.000000]   free [74bdd - 74c00]
      [    0.000000]   free [74ddd - 74e00]
      [    0.000000]   free [74fdd - 75000]
      [    0.000000]   free [751dd - 75200]
      [    0.000000]   free [753dd - 75400]
      [    0.000000]   free [755dd - 75600]
      [    0.000000]   free [757dd - 75800]
      [    0.000000]   free [759dd - 75a00]
      [    0.000000]   free [75bdd - 7bf5f]
      [    0.000000]   free [7f730 - 7f750]
      [    0.000000]   free [100000 - 2080000]
      [    0.000000]   total free 1f87170
      [   93.301474] Placing 64MB software IO TLB between ffff880075bdd000 - ffff880079bdd000
      [   93.311814] software IO TLB at phys 0x75bdd000 - 0x79bdd000
      
      With this patch we get the following (before swiotlb tries to get bootmem):
      [    0.000000] nid=1 start=0 end=2080000 aligned=1
      [    0.000000]   free [a - 96]
      [    0.000000]   free [702 - 1000]
      [    0.000000]   free [359f - 3600]
      [    0.000000]   free [37de - 3800]
      [    0.000000]   free [39dd - 3a00]
      [    0.000000]   free [3bdd - 3c00]
      [    0.000000]   free [3ddd - 3e00]
      [    0.000000]   free [3fdd - 4000]
      [    0.000000]   free [41dd - 4200]
      [    0.000000]   free [43dd - 4400]
      [    0.000000]   free [45dd - 4600]
      [    0.000000]   free [47dd - 4800]
      [    0.000000]   free [49dd - 4a00]
      [    0.000000]   free [4bdd - 4c00]
      [    0.000000]   free [4ddd - 4e00]
      [    0.000000]   free [4fdd - 5000]
      [    0.000000]   free [51dd - 5200]
      [    0.000000]   free [53dd - 5400]
      [    0.000000]   free [55dd - 7bf5f]
      [    0.000000]   free [7f730 - 7f750]
      [    0.000000]   free [100428 - 100600]
      [    0.000000]   free [13ea01 - 13ec00]
      [    0.000000]   free [170800 - 2080000]
      [    0.000000]   total free 1f87170
      
      [   92.689485] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
      [   92.699799] Placing 64MB software IO TLB between ffff8800055dd000 - ffff8800095dd000
      [   92.710916] software IO TLB at phys 0x55dd000 - 0x95dd000
      
      So there is enough space left below 4 GB, i.e. below pfn 0x100000.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <1265793639-15071-15-git-send-email-yinghai@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      cef625ee
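      A hedged sketch of the allocation policy described above.  The helper name
      and the 4 GB bound handling are illustrative; the real early_node_mem() in
      arch/x86/mm/numa_64.c has additional fallbacks.

      	static void * __init early_node_mem_sketch(unsigned long start, unsigned long end,
      						   unsigned long size, unsigned long align)
      	{
      		unsigned long hi_start = max(start, 4UL << 30);	/* prefer memory above 4 GB */
      		unsigned long mem;

      		if (hi_start < end) {
      			mem = find_e820_area(hi_start, end, size, align);
      			if (mem != -1UL)
      				return __va(mem);
      		}

      		/* fall back to the whole node range, including memory below 4 GB */
      		mem = find_e820_area(start, end, size, align);
      		return mem != -1UL ? __va(mem) : NULL;
      	}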
    • x86: Call early_res_to_bootmem one time · 1842f90c
      Authored by Yinghai Lu
      Simplify setup_node_mem: don't use bootmem from another node; instead,
      just use find_e820_area in early_node_mem.

      This keeps the boundary between early_res and bootmem clearer, and lets
      us call early_res_to_bootmem() only once instead of once per node.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <1265793639-15071-12-git-send-email-yinghai@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      1842f90c
  8. 23 November 2009 (2 commits)
    • x86, numa: Use near(er) online node instead of roundrobin for NUMA · d9c2d5ac
      Authored by Yinghai Lu
      The CPU-to-node mapping is set via the following sequence:

       1. numa_init_array(): set up a round-robin mapping from CPU to online node.

       2. init_cpu_to_node(): set the mapping according to apicid_to_node[]
          (built from the SRAT).  This only handles nodes that are online, and
          leaves CPUs on a node without RAM (i.e. not online) with the
          round-robin assignment.

       3. Later, srat_detect_node() for Intel/AMD uses the first online node or
          a nearby node.

      The problem is that setup_per_cpu_areas() is not called between steps 2
      and 3, so the per_cpu area for such a CPU ends up on a different node,
      possibly two hops away.

      So optimize this: add find_near_online_node() and use it from
      init_cpu_to_node() (a sketch follows this entry).
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4B07A739.3030104@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      d9c2d5ac
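      A hedged sketch of the helper introduced above: for a node that is not
      online, pick the online node with the smallest SLIT distance.  The body is
      illustrative and may differ from the in-tree version in detail.

      	static int __init find_near_online_node(int node)
      	{
      		int n, val, best = INT_MAX, best_node = -1;

      		for_each_online_node(n) {
      			val = node_distance(node, n);
      			if (val < best) {
      				best = val;
      				best_node = n;
      			}
      		}
      		return best_node;
      	}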
    • x86, numa, bootmem: Only free bootmem on NUMA failure path · 021428ad
      Authored by Yinghai Lu
      In the NUMA bootmem setup failure path we freed nodedata_phys
      incorrectly.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4B07A739.3030104@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      021428ad
  9. 13 October 2009 (3 commits)
    • x86: Interleave emulated nodes over physical nodes · adc19389
      Authored by David Rientjes
      Add interleaved NUMA emulation support
      
      This patch interleaves emulated nodes over the system's physical
      nodes. This is required for interleave optimizations since
      mempolicies, for example, operate by iterating over a nodemask and
      act without knowledge of node distances.  It can also be used for
      testing memory latencies and NUMA bugs in the kernel.
      
      There are a couple of ways to do this:
      
       - divide the number of emulated nodes by the number of physical
         nodes and allocate the result on each physical node, or
      
       - allocate each successive emulated node on a different physical
         node until all memory is exhausted.
      
      The disadvantage of the first option is, depending on the asymmetry
      in node capacities of each physical node, emulated nodes may
      substantially differ in size on a particular physical node compared
      to another.
      
      The disadvantage of the second option is that, again depending on the
      asymmetry in node capacities of each physical node, more emulated nodes
      may be allocated on one physical node than on another.
      
      This patch implements the second option: we accept that a particular
      physical node may end up with slightly more emulated nodes than another,
      in exchange for avoiding node size asymmetry.
      
       [ Note that "node capacity" of a physical node is not only a
         function of its addressable range, but also is affected by
         subtracting out the amount of reserved memory over that range.
         NUMA emulation only deals with available, non-reserved memory
         quantities. ]
      
      We ensure there is at least a minimal amount of available memory
      allocated to each node.  We also make sure that at least this
      amount of available memory is available in ZONE_DMA32 for any node
      that includes both ZONE_DMA32 and ZONE_NORMAL.
      
      This patch also cleans the emulation code up by no longer passing the
      statically allocated struct bootnode array among the various functions.
      Since this init.data array may be very large, it is not allocated on the
      stack but is instead accessed at file scope.
      
      The WARN_ON() for nodes_cover_memory() when faking proximity
      domains is removed since it relies on successive nodes always
      having greater start addresses than previous nodes; with
      interleaving this is no longer always true.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Ankita Garg <ankita@in.ibm.com>
      Cc: Len Brown <len.brown@intel.com>
      LKML-Reference: <alpine.DEB.1.00.0909251519150.14754@chino.kir.corp.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      adc19389
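      A hedged sketch of the chosen (second) strategy: walk the physical nodes
      round-robin and carve each successive emulated node out of the next
      physical node that still has memory.  struct bootnode (start/end) is real;
      the function name and the size bookkeeping are illustrative.

      	static int split_nodes_interleave_sketch(struct bootnode *phys, int nr_phys,
      						 struct bootnode *emu, int nr_emu, u64 size)
      	{
      		int i, p = 0;

      		for (i = 0; i < nr_emu; i++) {
      			int tries = nr_phys;

      			/* find the next physical node with memory left */
      			while (tries-- && phys[p].start >= phys[p].end)
      				p = (p + 1) % nr_phys;
      			if (phys[p].start >= phys[p].end)
      				break;			/* all physical memory consumed */

      			emu[i].start = phys[p].start;
      			emu[i].end = min(emu[i].start + size, phys[p].end);
      			phys[p].start = emu[i].end;	/* carve this slice out */
      			p = (p + 1) % nr_phys;		/* next emulated node uses the next physical node */
      		}
      		return i;				/* emulated nodes actually created */
      	}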
    • x86: Export srat physical topology · 8716273c
      Authored by David Rientjes
      This is the counterpart to "x86: export k8 physical topology" for
      SRAT. It is not as invasive because the acpi code already separates
      node setup into detection and registration steps, with the
      exception of registering e820 active regions in
      acpi_numa_memory_affinity_init().  This is now moved to
      acpi_scan_nodes() if NUMA emulation is disabled or deferred.
      
      acpi_numa_init() now returns a value which specifies whether an
      underlying SRAT was located.  If so, that topology can be used by
      the emulation code to interleave emulated nodes over physical nodes
      or to register the nodes for ACPI.
      
      acpi_get_nodes() may now be used to export the srat physical
      topology of the machine for NUMA emulation.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Ankita Garg <ankita@in.ibm.com>
      Cc: Len Brown <len.brown@intel.com>
      LKML-Reference: <alpine.DEB.1.00.0909251518580.14754@chino.kir.corp.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      8716273c
    • x86: Export k8 physical topology · 8ee2debc
      Authored by David Rientjes
      To eventually interleave emulated nodes over physical nodes, we
      need to know the physical topology of the machine without actually
      registering it.  This does the k8 node setup in two parts:
      detection and registration.  NUMA emulation can then use the detected
      physical topology to set up the address ranges of emulated nodes
      accordingly.  If emulation isn't used, the k8 nodes are
      registered as normal.
      
      Two formals are added to the x86 NUMA setup functions: `acpi' and
      `k8'. These represent whether ACPI or K8 NUMA has been detected;
      both cannot be true at the same time.  This specifies to the NUMA
      emulation code whether an underlying physical NUMA topology exists
      and which interface to use.
      
      This patch deals solely with separating the k8 setup path into
      Northbridge detection and registration steps and leaves the ACPI
      changes for a subsequent patch.  The `acpi' formal is added here,
      however, to avoid touching all the header files again in the next
      patch.
      
      This approach also ensures emulated nodes will not span physical
      nodes so the true memory latency is not misrepresented.
      
      k8_get_nodes() may now be used to export the k8 physical topology
      of the machine for NUMA emulation.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Ankita Garg <ankita@in.ibm.com>
      Cc: Len Brown <len.brown@intel.com>
      LKML-Reference: <alpine.DEB.1.00.0909251518400.14754@chino.kir.corp.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      8ee2debc
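      A hedged sketch of the detection/registration split described above.  Only
      k8_get_nodes() is named by the commit itself; every other identifier here
      is a hypothetical placeholder for the real arch/x86 code.

      	int __init k8_numa_init_sketch(unsigned long start_pfn, unsigned long end_pfn)
      	{
      		/* Step 1: detection only - read the Northbridge registers and
      		 * record the physical node ranges without registering them. */
      		if (k8_detect_topology_sketch() < 0)		/* hypothetical */
      			return -1;

      		/* NUMA emulation, if requested, consumes the detected layout via
      		 * k8_get_nodes() and never registers the physical nodes. */
      		if (numa_emulation_requested_sketch())		/* hypothetical */
      			return 0;

      		/* Step 2: registration - set up the detected nodes as before. */
      		k8_register_nodes_sketch();			/* hypothetical */
      		return 0;
      	}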
  10. 18 May 2009 (2 commits)
    • x86, mm: Fix node_possible_map logic · 7c43769a
      Authored by Yinghai Lu
      Recently there were some changes to the meaning of node_possible_map, and
      the result is quite strange:

      - a node without memory would be set in node_possible_map
      - but a node with less than NODE_MIN_SIZE of memory would be kicked out of
        node_possible_map.

      Fix this by adding strict_setup_node_bootmem().

      Also, remove unparse_node().

      The result is:

      1. cpu_to_node() returns online nodes only (the nearest one).
      2. apicid_to_node() still returns a node that may not be online but is set
         in node_possible_map.
      3. node_possible_map includes nodes whose memory is less than NODE_MIN_SIZE.

      v2: updated after the move_cpus_to_node change.
      
      [ Impact: get node_possible_map right ]
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Tested-by: Jack Steiner <steiner@sgi.com>
      LKML-Reference: <4A0C49BE.6080800@kernel.org>
      [ v3: various small cleanups and comment clarifications ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      7c43769a
    • mm, x86: remove MEMORY_HOTPLUG_RESERVE related code · 888a589f
      Authored by Yinghai Lu
      after:
      
       | commit b263295d
       | Author: Christoph Lameter <clameter@sgi.com>
       | Date:   Wed Jan 30 13:30:47 2008 +0100
       |
       |    x86: 64-bit, make sparsemem vmemmap the only memory model
      
      we don't have MEMORY_HOTPLUG_RESERVE anymore.
      
      Historically, x86-64 had an architecture-specific method for memory hotplug
      whereby it scanned the SRAT for physical memory ranges that could be
      potentially used for memory hot-add later. By reserving those ranges
      without physical memory, the memmap would be allocated and left dormant
      until needed. This depended on the DISCONTIG memory model which has been
      removed so the code implementing HOTPLUG_RESERVE is now dead.
      
      This patch removes the dead code used by MEMORY_HOTPLUG_RESERVE.
      
      (Changelog authored by Mel.)
      
      v2: updated the changelog, and removed hotadd= from the documentation
      
      [ Impact: remove dead code ]
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
      Reviewed-by: Mel Gorman <mel@csn.ul.ie>
      Workflow-found-OK-by: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <4A0C4910.7090508@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      888a589f
  11. 11 May 2009 (1 commit)
  12. 23 April 2009 (1 commit)
    • x86: check boundary in setup_node_bootmem() · 4c31e92b
      Authored by Yinghai Lu
      Commit dc098551 ("x86/uv: fix init of memory-less nodes") causes a
      two-socket system (where node 1 has no RAM installed) to crash.

      That commit makes node_possible include CPU nodes that do not have memory,
      so check the boundary in setup_node_bootmem() (a sketch follows this entry).
      
      [ Impact: fix boot crash on RAM-less NUMA node system ]
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Jack Steiner <steiner@sgi.com>
      LKML-Reference: <49EF89DF.9090404@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      4c31e92b
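      A hedged sketch of the added check (the surrounding function body is
      omitted and the exact condition in the tree may differ): bail out early
      when a possible node has no memory behind it.

      	void __init setup_node_bootmem(int nodeid, unsigned long start, unsigned long end)
      	{
      		if (start >= end)	/* RAM-less node: nothing to set up */
      			return;

      		/* ... normal per-node bootmem setup continues here ... */
      	}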
  13. 13 March 2009 (4 commits)
  14. 19 February 2009 (1 commit)
    • mm: clean up for early_pfn_to_nid() · f2dbcfa7
      Authored by KAMEZAWA Hiroyuki
      What's happening is that the assertion in mm/page_alloc.c:move_freepages()
      is triggering:
      
      	BUG_ON(page_zone(start_page) != page_zone(end_page));
      
      Once I knew this is what was happening, I added some annotations:
      
      	if (unlikely(page_zone(start_page) != page_zone(end_page))) {
      		printk(KERN_ERR "move_freepages: Bogus zones: "
      		       "start_page[%p] end_page[%p] zone[%p]\n",
      		       start_page, end_page, zone);
      		printk(KERN_ERR "move_freepages: "
      		       "start_zone[%p] end_zone[%p]\n",
      		       page_zone(start_page), page_zone(end_page));
      		printk(KERN_ERR "move_freepages: "
      		       "start_pfn[0x%lx] end_pfn[0x%lx]\n",
      		       page_to_pfn(start_page), page_to_pfn(end_page));
      		printk(KERN_ERR "move_freepages: "
      		       "start_nid[%d] end_nid[%d]\n",
      		       page_to_nid(start_page), page_to_nid(end_page));
       ...
      
      And here's what I got:
      
      	move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00]
      	move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00]
      	move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
      	move_freepages: start_nid[1] end_nid[0]
      
      My memory layout on this box is:
      
      [    0.000000] Zone PFN ranges:
      [    0.000000]   Normal   0x00000000 -> 0x0081ff5d
      [    0.000000] Movable zone start PFN for each node
      [    0.000000] early_node_map[8] active PFN ranges
      [    0.000000]     0: 0x00000000 -> 0x00020000
      [    0.000000]     1: 0x00800000 -> 0x0081f7ff
      [    0.000000]     1: 0x0081f800 -> 0x0081fe50
      [    0.000000]     1: 0x0081fed1 -> 0x0081fed8
      [    0.000000]     1: 0x0081feda -> 0x0081fedb
      [    0.000000]     1: 0x0081fedd -> 0x0081fee5
      [    0.000000]     1: 0x0081fee7 -> 0x0081ff51
      [    0.000000]     1: 0x0081ff59 -> 0x0081ff5d
      
      So it's a block move in that 0x81f600-->0x81f7ff region which triggers
      the problem.
      
      This patch:
      
      The declaration of early_pfn_to_nid() is scattered over per-arch include
      files, and it is complicated to know which declaration is actually used.
      That makes the fix for memmap_init harder than it should be.
      
      This patch moves all declaration to include/linux/mm.h
      
      After this,
        if !CONFIG_NODES_POPULATES_NODE_MAP && !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
           -> Use static definition in include/linux/mm.h
        else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
           -> Use generic definition in mm/page_alloc.c
        else
           -> per-arch back end function will be called.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reported-by: David Miller <davem@davemloft.net>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f2dbcfa7
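      A hedged sketch of the arrangement described above, as it might look after
      the cleanup (simplified; the real include/linux/mm.h carries more context
      around these declarations).

      	#if !defined(CONFIG_NODES_POPULATES_NODE_MAP) && \
      	    !defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID)
      	static inline int early_pfn_to_nid(unsigned long pfn)
      	{
      		return 0;	/* single static definition: everything is node 0 */
      	}
      	#else
      	/* generic definition in mm/page_alloc.c, or an arch back end when
      	 * CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID is set */
      	extern int early_pfn_to_nid(unsigned long pfn);
      	#endif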
  15. 09 February 2009 (1 commit)
  16. 27 January 2009 (1 commit)
  17. 17 December 2008 (1 commit)
  18. 26 July 2008 (1 commit)
  19. 25 July 2008 (1 commit)
  20. 22 July 2008 (1 commit)
  21. 08 July 2008 (6 commits)
    • x86: remove end_pfn in 64bit · c987d12f
      Authored by Yinghai Lu
      Use max_pfn directly instead.
      Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      c987d12f
    • x86: introduce initmem_init for 64 bit · 1f75d7e3
      Authored by Yinghai Lu
      Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      1f75d7e3
    • x86: numa_64.c fix shadowed variable · 886533a3
      Authored by Thomas Gleixner
      sparse mutters:
      arch/x86/mm/numa_64.c:195:27: warning: symbol 'end_pfn' shadows an earlier one
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      886533a3
    • x86: numa_64.c make local variables static · 864fc31e
      Authored by Thomas Gleixner
      plat_node_bdata, cmdline, nodemap_addr and nodemap_size are local to
      numa_64.c.  Make them static.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      864fc31e
    • x86: remove the static 256k node_to_cpumask_map · 9f248bde
      Authored by Mike Travis
        * Consolidate node_to_cpumask operations and remove the 256k
          byte node_to_cpumask_map.  This is done by allocating the
          node_to_cpumask_map array after the number of possible nodes
          (nr_node_ids) is known.
      
        * Debug printouts when CONFIG_DEBUG_PER_CPU_MAPS is active have
          been increased.  It now shows faults when calling node_to_cpumask()
          and node_to_cpumask_ptr().
      
      For inclusion into sched-devel/latest tree.
      
      Based on:
      	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
          +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      9f248bde
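      A hedged sketch of the dynamic allocation described above: rather than a
      static MAX_NUMNODES-sized table, the map is sized by nr_node_ids once that
      value is known.  The allocation site is illustrative.

      	static cpumask_t *node_to_cpumask_map;

      	void __init setup_node_to_cpumask_map_sketch(void)
      	{
      		/* nr_node_ids is final by now, so allocate only what is needed;
      		 * bootmem allocations come back zeroed */
      		node_to_cpumask_map = alloc_bootmem_pages(nr_node_ids * sizeof(cpumask_t));
      	}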
    • x86: cleanup early per cpu variables/accesses v4 · 23ca4bba
      Authored by Mike Travis
        * Introduce a new PER_CPU macro called "EARLY_PER_CPU".  This is
          used by some per_cpu variables that are initialized and accessed
          before there are per_cpu areas allocated.
      
          ["Early" in respect to per_cpu variables is "earlier than the per_cpu
          areas have been setup".]
      
          This patchset adds these new macros:
      
      	DEFINE_EARLY_PER_CPU(_type, _name, _initvalue)
      	EXPORT_EARLY_PER_CPU_SYMBOL(_name)
      	DECLARE_EARLY_PER_CPU(_type, _name)
      
      	early_per_cpu_ptr(_name)
      	early_per_cpu_map(_name, _idx)
      	early_per_cpu(_name, _cpu)
      
          The DEFINE macro defines the per_cpu variable as well as the early
          map and pointer.  It also initializes the per_cpu variable and map
          elements to "_initvalue".  The early_* macros provide access to
          the initial map (usually setup during system init) and the early
          pointer.  This pointer is initialized to point to the early map
          but is then NULL'ed when the actual per_cpu areas are setup.  After
          that the per_cpu variable is the correct access to the variable.
      
          The early_per_cpu() macro is not very efficient but does show how to
          access the variable if you have a function that can be called both
          "early" and "late".  It tests the early ptr to be NULL, and if not
          then it's still valid.  Otherwise, the per_cpu variable is used
          instead:
      
      	#define early_per_cpu(_name, _cpu) 			\
      		(early_per_cpu_ptr(_name) ?			\
      			early_per_cpu_ptr(_name)[_cpu] :	\
      			per_cpu(_name, _cpu))
      
          A better method is to actually check the pointer manually.  In the
          case below, numa_set_node can be called both "early" and "late":
      
      	void __cpuinit numa_set_node(int cpu, int node)
      	{
      	    int *cpu_to_node_map = early_per_cpu_ptr(x86_cpu_to_node_map);
      
      	    if (cpu_to_node_map)
      		    cpu_to_node_map[cpu] = node;
      	    else
      		    per_cpu(x86_cpu_to_node_map, cpu) = node;
      	}
      
        * Add a flag "arch_provides_topology_pointers" that indicates pointers
          to topology cpumask_t maps are available.  Otherwise, use the function
          returning the cpumask_t value.  This is useful if cpumask_t set size
          is very large to avoid copying data on to/off of the stack.
      
        * The coverage of CONFIG_DEBUG_PER_CPU_MAPS has been increased while
          the non-debug case has been optimized a bit.
      
        * Remove an unreferenced compiler warning in drivers/base/topology.c
      
        * Clean up #ifdef in setup.c
      
      For inclusion into sched-devel/latest tree.
      
      Based on:
      	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
          +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      23ca4bba
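      A hedged usage sketch of the macros listed above, modeled on the
      x86_cpu_to_node_map example from the commit; the accessor function name is
      illustrative.

      	DEFINE_EARLY_PER_CPU(int, x86_cpu_to_node_map, NUMA_NO_NODE);
      	EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_node_map);

      	int cpu_to_node_sketch(int cpu)
      	{
      		/* before the per_cpu areas exist, read the early map ... */
      		if (early_per_cpu_ptr(x86_cpu_to_node_map))
      			return early_per_cpu_map(x86_cpu_to_node_map, cpu);

      		/* ... afterwards the real per_cpu variable is authoritative */
      		return per_cpu(x86_cpu_to_node_map, cpu);
      	}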
  22. 25 May 2008 (1 commit)
  23. 27 April 2008 (1 commit)
    • x86_64: fix setup_node_bootmem to support big mem excluding with memmap · 1a27fc0a
      Authored by Yinghai Lu
      Typical case: a four-socket system where every node has 4 GB of RAM, booted
      with:

      	memmap=10g$4g

      to mask out the memory on node 1 and node 2.

      When NUMA is enabled, early_node_mem is used to get node_data and
      node_bootmap.

      If it cannot get memory from the same node with find_e820_area(), it will
      use alloc_bootmem to get a buffer from a previous node.

      So check for this case and print out some information about it.

      early_res_to_bootmem also needs to move into every setup_node_bootmem call,
      and it takes the range that the node owns; otherwise alloc_bootmem could
      return an address that was reserved early.

      Depends on "mm: make reserve_bootmem can crossed the nodes".
      Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      1a27fc0a