1. 30 1月, 2008 12 次提交
  2. 20 10月, 2007 1 次提交
    • M
      x86: convert cpu_to_apicid to be a per cpu variable · 71fff5e6
      Mike Travis 提交于
      This patch converts the x86_cpu_to_apicid array to be a per cpu
      variable. This saves sizeof(apicid) * NR unused cpus.  Access is mostly
      from startup and CPU HOTPLUG functions.
      
      MP_processor_info() is one of the functions that require access to the
      x86_cpu_to_apicid array before the per_cpu data area is setup.  For this
      case, a pointer to the __initdata array is initialized in setup_arch()
      and removed in smp_prepare_cpus() after the per_cpu data area is
      initialized.
      
      A second change is included to change the initial array value of ARCH
      i386 from 0xff to BAD_APICID to be consistent with ARCH x86_64.
      Signed-off-by: NMike Travis <travis@sgi.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      71fff5e6
  3. 18 10月, 2007 2 次提交
    • M
      x86: fix cpu_to_node references · 98c9e27a
      Mike Travis 提交于
      In x86_64 and i386 architectures most arrays that are sized using
      NR_CPUS lay in local memory on node 0.  Not only will most (99%?) of the
      systems not use all the slots in these arrays, particularly when NR_CPUS
      is increased to accommodate future very high cpu count systems, but a
      number of cache lines are passed unnecessarily on the system bus when
      these arrays are referenced by cpus on other nodes.
      
      Typically, the values in these arrays are referenced by the cpu
      accessing it's own values, though when passing IPI interrupts, the cpu
      does access the data relevant to the targeted cpu/node.  Of course, if
      the referencing cpu is not on node 0, then the reference will still
      require cross node exchanges of cache lines.  A common use of this is
      for an interrupt service routine to pass the interrupt to other cpus
      local to that node.
      
      Ideally, all the elements in these arrays should be moved to the per_cpu
      data area.  In some cases (such as x86_cpu_to_apicid) the array is
      referenced before the per_cpu data areas are setup.  In this case, a
      static array is declared in the __initdata area and initialized by the
      booting cpu (BSP).  The values are then moved to the per_cpu area after
      it is initialized and the original static array is freed with the rest
      of the __initdata.
      
      This patch:
      
      Fix four instances where cpu_to_node is referenced by array instead of
      via the cpu_to_node macro.  This is preparation to moving it to the
      per_cpu data area.
      Signed-off-by: NMike Travis <travis@sgi.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      98c9e27a
    • Y
      x86: 0 -> NULL, for arch/x86_64 · 83e83d54
      Yoann Padioleau 提交于
      When comparing a pointer, it's clearer to compare it to NULL than to 0.
      
      [ tglx: arch/x86 adaptation ]
      Signed-off-by: NYoann Padioleau <padator@wanadoo.fr>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: ak@suse.de
      Cc: discuss@x86-64.org
      Cc: akpm@linux-foundation.org
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      83e83d54
  4. 11 10月, 2007 2 次提交
  5. 22 7月, 2007 3 次提交
  6. 03 5月, 2007 4 次提交
    • S
      [PATCH] x86-64: set node_possible_map at runtime - try 2 · e3f1caee
      Suresh Siddha 提交于
      Set the node_possible_map at runtime on x86_64.  On a non NUMA system,
      num_possible_nodes() will now say '1'.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Eric Dumazet <dada1@cosmosbay.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      e3f1caee
    • D
      [PATCH] x86-64: fixed size remaining fake nodes · 382591d5
      David Rientjes 提交于
      Extends the numa=fake x86_64 command-line option to split the remaining system
      memory into nodes of fixed size.  Any leftover memory is allocated to a final
      node unless the command-line ends with a comma.
      
      For example:
        numa=fake=2*512,*128	gives two 512M nodes and the remaining system
      			memory is split into nodes of 128M each.
      
      This is beneficial for systems where the exact size of RAM is unknown or not
      necessarily relevant, but the size of the remaining nodes to be allocated is
      known based on their capacity for resource management.
      
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      382591d5
    • D
      [PATCH] x86-64: split remaining fake nodes equally · 14694d73
      David Rientjes 提交于
      Extends the numa=fake x86_64 command-line option to split the remaining
      system memory into equal-sized nodes.
      
      For example:
      numa=fake=2*512,4*	gives two 512M nodes and the remaining system
      			memory is split into four approximately equal
      			chunks.
      
      This is beneficial for systems where the exact size of RAM is unknown or not
      necessarily relevant, but the granularity with which nodes shall be allocated
      is known.
      
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      14694d73
    • D
      [PATCH] x86-64: configurable fake numa node sizes · 8b8ca80e
      David Rientjes 提交于
      Extends the numa=fake x86_64 command-line option to allow for configurable
      node sizes.  These nodes can be used in conjunction with cpusets for coarse
      memory resource management.
      
      The old command-line option is still supported:
        numa=fake=32	gives 32 fake NUMA nodes, ignoring the NUMA setup of the
      		actual machine.
      
      But now you may configure your system for the node sizes of your choice:
        numa=fake=2*512,1024,2*256
      		gives two 512M nodes, one 1024M node, two 256M nodes, and
      		the rest of system memory to a sixth node.
      
      The existing hash function is maintained to support the various node sizes
      that are possible with this implementation.
      
      Each node of the same size receives roughly the same amount of available
      pages, regardless of any reserved memory with its address range.  The total
      available pages on the system is calculated and divided by the number of equal
      nodes to allocate.  These nodes are then dynamically allocated and their
      borders extended until such time as their number of available pages reaches
      the required size.
      
      Configurable node sizes are recommended when used in conjunction with cpusets
      for memory control because it eliminates the overhead associated with scanning
      the zonelists of many smaller full nodes on page_alloc().
      
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      8b8ca80e
  7. 13 2月, 2007 4 次提交
  8. 12 10月, 2006 1 次提交
    • M
      [PATCH] mm: use symbolic names instead of indices for zone initialisation · 6391af17
      Mel Gorman 提交于
      Arch-independent zone-sizing is using indices instead of symbolic names to
      offset within an array related to zones (max_zone_pfns).  The unintended
      impact is that ZONE_DMA and ZONE_NORMAL is initialised on powerpc instead
      of ZONE_DMA and ZONE_HIGHMEM when CONFIG_HIGHMEM is set.  As a result, the
      the machine fails to boot but will boot with CONFIG_HIGHMEM turned off.
      
      The following patch properly initialises the max_zone_pfns[] array and uses
      symbolic names instead of indices in each architecture using
      arch-independent zone-sizing.  Two users have successfully booted their
      powerpcs with it (one an ibook G4).  It has also been boot tested on x86,
      x86_64, ppc64 and ia64.  Please merge for 2.6.19-rc2.
      
      Credit to Benjamin Herrenschmidt for identifying the bug and rolling the
      first fix.  Additional credit to Johannes Berg and Andreas Schwab for
      reporting the problem and testing on powerpc.
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      6391af17
  9. 27 9月, 2006 1 次提交
  10. 26 9月, 2006 2 次提交
  11. 23 4月, 2006 1 次提交
  12. 10 4月, 2006 2 次提交
    • A
      [PATCH] x86_64: Handle empty PXMs that only contain hotplug memory · a8062231
      Andi Kleen 提交于
      The node setup code would try to allocate the node metadata in the node
      itself, but that fails if there is no memory in there.
      
      This can happen with memory hotplug when the hotplug area defines an so
      far empty node.
      
      Now use bootmem to try to allocate the mem_map in other nodes.
      
      And if it fails don't panic, but just ignore the node.
      
      To make this work I added a new __alloc_bootmem_nopanic function that
      does what its name implies.
      
      TBD should try to use nearby nodes here.  Currently we just use any.
      It's hard to do it better because bootmem doesn't have proper fallback
      lists yet.
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a8062231
    • A
      [PATCH] x86_64: Reserve SRAT hotadd memory on x86-64 · 68a3a7fe
      Andi Kleen 提交于
      From: Keith Mannthey, Andi Kleen
      
      Implement memory hotadd without sparsemem. The memory in the SRAT
      hotadd area is just preserved instead and can be activated later.
      
      There are a few restrictions:
      - Only one continuous hotadd area allowed per node
      
      The main problem is dealing with the many buggy SRAT tables
      that are out there. The strategy here is to reject anything
      suspicious.
      
      Originally from Keith Mannthey, with several hacks and changes by AK
      and also contributions from Andrew Morton
      
      [ TBD: Problems pointed out by KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>:
      
       1) Goto's rebuild_zonelist patch will not work if CONFIG_MEMORY_HOTPLUG=n.
      
          Rebuilding zonelist is necessary when the system has just memory <
          4G at boot, and hot add memory > 4G.  because x86_64 has DMA32,
          ZONE_NORAML is not included into zonelist at boot time if system
          doesn't have memory >4G at boot.
      
          [AK: should just force the higher zones at boot time when SRAT tells us]
      
       2) zone and node's spanned_pages and present_pages are not incremented.
          They should be.
      
          For example, our server (ia64/Fujitsu PrimeQuest) can equip memory
          from 4G to 1T(maybe 2T in future), and SRAT will *always* say we have
          possible 1T +memory.  (Microsoft requires "write all possible memory
          in SRAT") When we reserve memmap for possible 1T memory, Linux will
          not work well in +minimum 4G configuraion ;)
      
          [AK: needs limiting to 5-10% of max memory]
       ]
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      68a3a7fe
  13. 28 3月, 2006 1 次提交
  14. 26 3月, 2006 3 次提交
  15. 16 2月, 2006 1 次提交