1. 24 12月, 2010 5 次提交
    • D
      x86, numa: Fix cpu to node mapping for sparse node ids · a387e95a
      David Rientjes 提交于
      NUMA boot code assumes that physical node ids start at 0, but the DIMMs
      that the apic id represents may not be reachable.  If this is the case,
      node 0 is never online and cpus never end up getting appropriately
      assigned to a node.  This causes the cpumask of all online nodes to be
      empty and machines crash with kernel code assuming online nodes have
      valid cpus.
      
      The fix is to appropriately map all the address ranges for physical nodes
      and ensure the cpu to node mapping function checks all possible nodes (up
      to MAX_NUMNODES) instead of simply checking nodes 0-N, where N is the
      number of physical nodes, for valid address ranges.
      
      This requires no longer "compressing" the address ranges of nodes in the
      physical node map from 0-N, but rather leave indices in physnodes[] to
      represent the actual node id of the physical node.  Accordingly, the
      topology exported by both amd_get_nodes() and acpi_get_nodes() no longer
      must return the number of nodes to iterate through; all such iterations
      will now be to MAX_NUMNODES.
      
      This change also passes the end address of system RAM (which may be
      different from normal operation if mem= is specified on the command line)
      before the physnodes[] array is populated.  ACPI parsed nodes are
      truncated to fit within the address range that respect the mem=
      boundaries and even some physical nodes may become unreachable in such
      cases.
      
      When NUMA emulation does succeed, any apicid to node mapping that exists
      for unreachable nodes are given default values so that proximity domains
      can still be assigned.  This is important for node_distance() to
      function as desired.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1012221702090.3701@chino.kir.corp.google.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      a387e95a
    • D
      x86, numa: Fake node-to-cpumask for NUMA emulation · c1c3443c
      David Rientjes 提交于
      It's necessary to fake the node-to-cpumask mapping so that an emulated
      node ID returns a cpumask that includes all cpus that have affinity to
      the memory it represents.
      
      This is a little intrusive because it requires knowledge of the physical
      topology of the system.  setup_physnodes() gives us that information, but
      since NUMA emulation ends up altering the physnodes array, it's necessary
      to reset it before cpus are brought online.
      
      Accordingly, the physnodes array is moved out of init.data and into
      cpuinit.data since it will be needed on cpuup callbacks.
      
      This works regardless of whether numa=fake is used on the command line,
      or the setup of the fake node succeeds or fails.  The physnodes array
      always contains the physical topology of the machine if CONFIG_NUMA_EMU
      is enabled and can be used to setup the correct node-to-cpumask mappings
      in all cases since setup_physnodes() is called whenever the array needs
      to be repopulated with the correct data.
      
      To fake the actual mappings, numa_add_cpu() and numa_remove_cpu() are
      rewritten for CONFIG_NUMA_EMU so that we first find the physical node to
      which each cpu has local affinity, then iterate through all online nodes
      to find the emulated nodes that have local affinity to that physical
      node, and then finally map the cpu to each of those emulated nodes.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1012221701520.3701@chino.kir.corp.google.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      c1c3443c
    • D
      x86, numa: Fake apicid and pxm mappings for NUMA emulation · f51bf307
      David Rientjes 提交于
      This patch adds the equivalent of acpi_fake_nodes() for AMD Northbridge
      platforms.  The goal is to fake the apicid-to-node mappings for NUMA
      emulation so the physical topology of the machine is correctly maintained
      within the kernel.
      
      This change also fakes proximity domains for both ACPI and k8 code so the
      physical distance between emulated nodes is maintained via
      node_distance().  This exports the correct distances via
      /sys/devices/system/node/.../distance based on the underlying topology.
      
      A new helper function, fake_physnodes(), is introduced to correctly
      invoke the correct NUMA code to fake these two mappings based on the
      system type.  If there is no underlying NUMA configuration, all cpus are
      mapped to node 0 for local distance.
      
      Since acpi_fake_nodes() is no longer called with CONFIG_ACPI_NUMA, it's
      prototype can be removed from the header file for such a configuration.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1012221701360.3701@chino.kir.corp.google.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      f51bf307
    • D
      x86, numa: Avoid compiling NUMA emulation functions without CONFIG_NUMA_EMU · 4e76f4e6
      David Rientjes 提交于
      Both acpi_get_nodes() and amd_get_nodes() are only necessary when
      CONFIG_NUMA_EMU is enabled, so avoid compiling them when the option is
      disabled.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1012221701210.3701@chino.kir.corp.google.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      4e76f4e6
    • D
      x86, numa: Reduce minimum fake node size to 32M · 34dc9e74
      David Rientjes 提交于
      This patch changes the minimum fake node size from 64MB to 32MB so it is
      possible to test NUMA code at a greater scale on smaller machines
      (64 nodes on a 2G machine, 1024 nodes on 32G machine with
      CONFIG_NODES_SHIFT=10).
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1012221700590.3701@chino.kir.corp.google.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      34dc9e74
  2. 18 11月, 2010 3 次提交
  3. 16 11月, 2010 27 次提交
  4. 15 11月, 2010 5 次提交