1. 02 May, 2011 25 commits
    • x86, NUMA: Make 32bit use common NUMA init path · bd6709a9
      Tejun Heo committed
      With both _numa_init() methods converted and the rest of init code
      adjusted, numa_32.c now can switch from the 32bit only init code to
      the common one in numa.c.
      
      * Shim get_memcfg_*()'s are dropped and initmem_init() calls
        x86_numa_init(), which is updated to handle NUMAQ.
      
      * All boilerplate operations including node range limiting, pgdat
        alloc/init are handled by numa_init().  32bit only implementation is
        removed.
      
      * 32bit numa_add_memblk(), numa_set_distance() and
        memory_add_physaddr_to_nid() removed and common versions in
        numa.c enabled for 32bit.
      
      This change causes the following behavior changes.
      
      * NODE_DATA()->node_start_pfn/node_spanned_pages properly initialized
        for 32bit too.
      
      * Many more sanity checks and configuration cleanups.
      
      * Proper handling of node distances.
      
      * The same NUMA init messages as 64bit.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
    • x86, NUMA: Initialize and use remap allocator from setup_node_bootmem() · 7888e96b
      Tejun Heo committed
      setup_node_bootmem() is taken from 64bit and doesn't use remap
      allocator.  It's about to be shared with 32bit so add support for it.
      If NODE_DATA is remapped, it's noted in the debug message and node
      locality check is skipped as the __pa() of the remapped address
      doesn't reflect the actual physical address.
      
      On 64bit, remap allocator becomes noop and doesn't affect the
      behavior.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
    • x86-32, NUMA: Add @start and @end to init_alloc_remap() · 99cca492
      Tejun Heo committed
      Instead of dereferencing node_start/end_pfn[] directly, make
      init_alloc_remap() take @start and @end and let the caller be
      responsible for making sure the range is sane.  This is to prepare for
      use from unified NUMA init code.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
    • x86, NUMA: Remove long 64bit assumption from numa.c · 38f3e1ca
      Tejun Heo committed
      Code moved from numa_64.c assumes in several places that long is
      64bit.  This patch removes the assumption by using {s|u}64_t
      explicitly, using PFN_PHYS() for page number -> addr conversions and
      adjusting printf formats.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
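The kind of breakage this commit guards against can be sketched in user space. This is a minimal illustration, not the kernel code: `ulong32`, `pfn_to_phys_truncating()`, `pfn_to_phys()` and `format_range()` are hypothetical names, a 4 KiB page size is assumed, and `uint32_t` stands in for a 32bit `unsigned long`.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12		/* assumption: 4 KiB pages */

typedef uint32_t ulong32;	/* stand-in for a 32bit unsigned long */

/* Problem pattern: the shift happens in 32 bits, so high bits are lost. */
static uint64_t pfn_to_phys_truncating(ulong32 pfn)
{
	return (ulong32)(pfn << PAGE_SHIFT);
}

/* Safer pattern: widen to 64 bits first, then shift. */
static uint64_t pfn_to_phys(ulong32 pfn)
{
	return (uint64_t)pfn << PAGE_SHIFT;
}

/* Print 64bit addresses with a width-independent format macro. */
static int format_range(char *buf, size_t len, uint64_t start, uint64_t end)
{
	return snprintf(buf, len, "[%#" PRIx64 "-%#" PRIx64 "]", start, end);
}
```

With these definitions, page frame 0x100000 (the 4 GiB boundary) converts to 0 in the truncating variant and to 0x100000000 in the widened one.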
    • x86, NUMA: Enable build of generic NUMA init code on 32bit · 744baba0
      Tejun Heo committed
      Generic NUMA init code was moved to numa.c from numa_64.c but is still
      guarded by CONFIG_X86_64.  This patch removes the compile guard and
      enables compiling on 32bit.
      
      * numa_add_memblk() and numa_set_distance() clash with the shim
        implementation in numa_32.c and are left out.
      
      * memory_add_physaddr_to_nid() clashes with 32bit implementation and
        is left out.
      
      * MAX_DMA_PFN definition in dma.h moved out of !CONFIG_X86_32.
      
      * node_data definition in numa_32.c removed in favor of the one in
        numa.c.
      
      There are places where ulong is assumed to be 64bit.  The next patch
      will fix them up.  Note that although the code is compiled it isn't
      used yet and this patch doesn't cause any functional change.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
    • x86, NUMA: Move NUMA init logic from numa_64.c to numa.c · a4106eae
      Tejun Heo committed
      Move the generic 64bit NUMA init machinery from numa_64.c to numa.c.
      
      * node_data[], numa_meminfo and numa_distance
      * numa_add_memblk[_to](), numa_remove_memblk[_from]()
      * numa_set_distance() and friends
      * numa_init() and all the numa_meminfo handling helpers called from it
      * dummy_numa_init()
      * memory_add_physaddr_to_nid()
      
      A new function x86_numa_init() is added and the content of
      numa_64.c::initmem_init() is moved into it.  initmem_init() now simply
      calls x86_numa_init().
      
      Constants and numa_off declaration are moved from numa_{32|64}.h to
      numa.h.
      
      This is code reorganization and doesn't involve any functional change.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
    • x86-32, NUMA: Update numaq to use new NUMA init protocol · 299a180a
      Tejun Heo committed
      Update numaq such that it calls numa_add_memblk() and sets
      numa_nodes_parsed instead of directly diddling with NUMA states.  The
      original get_memcfg_numaq() is renamed to numaq_numa_init() and new
      get_memcfg_numaq() is created in numa_32.c.
      
      The shim numa_add_memblk() implementation handles node_start/end_pfn[]
      and node_set_online() for nodes with memory.  The new
      get_memcfg_numaq() is exactly the same as get_memcfg_from_srat() other
      than calling the numaq init function.  The things get_memcfg_numaq()
      does are not strictly necessary for numaq but are added for consistency
      and to help unify NUMA init handling.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
    • x86-32, NUMA: Replace srat_32.c with srat.c · 5acd91ab
      Tejun Heo committed
      The SRAT support implementations in srat_32.c and srat.c are generally
      similar; however, there are some differences.
      
      First of all, 64bit implementation supports more types of SRAT
      entries.  64bit supports x2apic, affinity, memory and SLIT.  32bit
      only supports processor and memory.
      
      Most other differences stem from different initialization protocols
      employed by 64bit and 32bit NUMA init paths.
      
      On 64bit,
      
      * Mappings among PXM, node and apicid are directly done in each SRAT
        entry callback.
      
      * Memory affinity information is passed to numa_add_memblk() which
        takes care of all interfacing with NUMA init.
      
      * Doesn't directly initialize NUMA configurations.  All the
        information is recorded in numa_nodes_parsed and memblks.
      
      On 32bit,
      
      * Checks numa_off.
      
      * Things go through one more level of indirection via private tables
        but eventually end up initializing the same mappings.
      
      * node_start/end_pfn[] are initialized and
        memblock_x86_register_active_regions() is called for each memory
        chunk.
      
      * node_set_online() is called for each online node.
      
      * sort_node_map() is called.
      
      There are also other minor differences in sanity checking and messages
      but taking 64bit version should be good enough.
      
      This patch drops the 32bit specific implementation and makes the 64bit
      implementation common for both 32 and 64bit.
      
      The init protocol differences are dealt with in two places - the
      numa_add_memblk() shim added in the previous patch and new temporary
      numa_32.c:get_memcfg_from_srat() which wraps invocation of
      x86_acpi_numa_init().
      
      The shim numa_add_memblk() handles the following.
      
      * node_start/end_pfn[] initialization.
      
      * node_set_online() for memory nodes.
      
      * Invocation of memblock_x86_register_active_regions().
      
      The shim get_memcfg_from_srat() handles the following.
      
      * numa_off check.
      
      * node_set_online() for CPU nodes.
      
      * sort_node_map() invocation.
      
      * Clearing of numa_nodes_parsed and active_ranges on failure.
      
      The shims are temporary and will be removed as the generic NUMA init
      path in 32bit is replaced with 64bit one.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
    • x86-32, NUMA: implement temporary NUMA init shims · b0d31080
      Tejun Heo committed
      To help transition to common NUMA init, implement temporary 32bit
      shims for numa_add_memblk() and numa_set_distance().
      numa_add_memblk() registers the memblk and adjusts
      node_start/end_pfn[].  numa_set_distance() is noop.
      
      These shims will allow using 64bit NUMA init functions on 32bit and
      gradual transition to common NUMA init path.
      
      For detailed description, please read description of commits which
      make use of the shim functions.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
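As a rough user-space sketch of what such shims could look like: a memblk registration that only widens a per-node pfn span, and a distance setter that does nothing. The names `shim_add_memblk()`, `shim_set_distance()`, `node_has_mem[]` and the `MAX_NODES` limit are hypothetical, a 4 KiB page size is assumed, and the real shim also deals with kernel state not modeled here.

```c
#include <stdint.h>

#define MAX_NODES  4	/* hypothetical limit for this sketch */
#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

static uint64_t node_start_pfn[MAX_NODES];
static uint64_t node_end_pfn[MAX_NODES];
static int node_has_mem[MAX_NODES];

/* Record [start, end) for nid and widen the node's pfn span to cover it. */
static int shim_add_memblk(int nid, uint64_t start, uint64_t end)
{
	uint64_t start_pfn = start >> PAGE_SHIFT;
	uint64_t end_pfn = end >> PAGE_SHIFT;

	if (nid < 0 || nid >= MAX_NODES || start_pfn >= end_pfn)
		return -1;

	if (!node_has_mem[nid]) {
		/* First chunk for this node defines the initial span. */
		node_start_pfn[nid] = start_pfn;
		node_end_pfn[nid] = end_pfn;
		node_has_mem[nid] = 1;
	} else {
		/* Later chunks can only grow the span. */
		if (start_pfn < node_start_pfn[nid])
			node_start_pfn[nid] = start_pfn;
		if (end_pfn > node_end_pfn[nid])
			node_end_pfn[nid] = end_pfn;
	}
	return 0;
}

/* The shim accepts distance updates and ignores them. */
static void shim_set_distance(int from, int to, int distance)
{
	(void)from;
	(void)to;
	(void)distance;
}
```

Because both functions keep the signatures a common caller would expect, the caller does not need to know whether it is talking to the shim or to a full implementation.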
    • x86, NUMA: Move numa_nodes_parsed to numa.[hc] · e6df595b
      Tejun Heo committed
      Move numa_nodes_parsed from numa_64.[hc] to numa.[hc] to prepare for
      NUMA init path unification.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
    • x86-32, NUMA: Move get_memcfg_numa() into numa_32.c · daf4f480
      Tejun Heo committed
      There's no reason for get_memcfg_numa() to be implemented inline in
      mmzone_32.h.  Move it to numa_32.c and also make
      get_memcfg_numa_flag() static.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
    • x86, NUMA: make srat.c 32bit safe · eca9ad31
      Tejun Heo committed
      Make srat.c 32bit safe by removing the assumption that unsigned long
      is 64bit.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
    • x86, NUMA: rename srat_64.c to srat.c · 7b2600f8
      Tejun Heo committed
      Rename srat_64.c to srat.c.  This is to prepare for unification of
      NUMA init paths between 32 and 64bit.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
    • x86, NUMA: trivial cleanups · 1201e10a
      Tejun Heo committed
      * Kill no longer used struct bootnode.
      
      * Kill dangling declaration of pxm_to_nid() in numa_32.h.
      
      * Make setup_node_bootmem() static.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
    • x86-32, NUMA: use sparse_memory_present_with_active_regions() · 797390d8
      Tejun Heo committed
      Instead of calling memory_present() for each region from NUMA init,
      call sparse_memory_present_with_active_regions() from paging_init()
      similarly to x86-64.
      
      For flat and numaq, this results in exactly the same memory_present()
      calls.  For srat, if there are multiple memory chunks for a node,
      after this change, memory_present() will be called separately for each
      chunk instead of being called once to encompass the whole range, which
      doesn't cause any harm and actually is the better behavior.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
    • x86-32, NUMA: Make apic->x86_32_numa_cpu_node() optional · 84914ed0
      Tejun Heo committed
      NUMAQ is the only meaningful user of this callback and
      setup_local_APIC() the only callsite.  Stop torturing everyone else by
      making the callback optional and removing all the boilerplate
      implementations and assignments.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
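The optional-callback pattern described here can be sketched as follows. This is a simplified user-space model under stated assumptions: `struct apic_ops`, `setup_cpu()`, `lookup_node()`, `demo_cpu_node()` and the table size are hypothetical and only mirror the idea that a driver without the callback leaves the apicid -> node table alone, while a driver with one refreshes the table at CPU setup time.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_APICID 8	/* hypothetical table size */

struct apic_ops {
	int (*cpu_node)(int cpu);	/* optional; NULL when unused */
};

/* -1 means "no node known yet" in this sketch. */
static int apicid_to_node[MAX_APICID] = { -1, -1, -1, -1, -1, -1, -1, -1 };

/* Called once per CPU: sync the table only if the driver has a callback. */
static void setup_cpu(const struct apic_ops *apic, int cpu, int apicid)
{
	assert(apicid >= 0 && apicid < MAX_APICID);
	if (apic->cpu_node)
		apicid_to_node[apicid] = apic->cpu_node(cpu);
}

/* Single lookup path: always consult the table. */
static int lookup_node(int apicid)
{
	assert(apicid >= 0 && apicid < MAX_APICID);
	return apicid_to_node[apicid];
}

/* Hypothetical driver callback: two CPUs per node. */
static int demo_cpu_node(int cpu)
{
	return cpu / 2;
}
```

Drivers that do not need the hook simply leave the pointer NULL, so no boilerplate stub is required, and every reader goes through the same table lookup.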
    • x86, NUMA: Unify 32/64bit numa_cpu_node() implementation · 6bd26273
      Tejun Heo committed
      Currently, the only meaningful user of apic->x86_32_numa_cpu_node() is
      NUMAQ which returns valid mapping only after CPU is initialized during
      SMP bringup; thus, the previous patch to set apicid -> node in
      setup_local_APIC() makes __apicid_to_node[] always contain the correct
      mapping whether custom apic->x86_32_numa_cpu_node() is used or not.
      
      So, there is no reason to keep separate 32bit implementation.  We can
      always consult __apicid_to_node[].  Move 64bit implementation from
      numa_64.c to numa.c and remove 32bit implementation from numa_32.c.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
    • x86-32, NUMA: Automatically set apicid -> node in setup_local_APIC() · c4b90c11
      Tejun Heo committed
      Some x86-32 NUMA implementations (NUMAQ) don't initialize apicid ->
      node mapping using set_apicid_to_node() during NUMA init but implement
      custom apic->x86_32_numa_cpu_node() instead.
      
      This patch automatically initializes the default apic -> node mapping
      table from apic->x86_32_numa_cpu_node() from setup_local_APIC() such
      that the mapping table is in sync with the actual mapping.
      
      As the table isn't used by custom implementations, this doesn't make
      any difference at this point.  This is in preparation of unifying
      numa_cpu_node() between x86-32 and 64.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
    • x86-64, NUMA: simplify nodedata allocation · acd26d61
      Tejun Heo committed
      With top-down memblock allocation, the allocation range limits in
      early_node_mem() can be simplified - try node-local first, then any
      node but in any case don't allocate below DMA limit.
      
      Remove early_node_mem() and implement simplified allocation directly
      in setup_node_bootmem().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
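The "node-local first, then any node, never below the DMA limit" policy can be illustrated with a toy top-down allocator. This is a sketch under assumptions, not the memblock API: `struct range`, `find_top_down()` and `alloc_nodedata()` are hypothetical, alignment is ignored, and 0 is used as the failure value, which presumes a non-zero DMA limit.

```c
#include <stddef.h>
#include <stdint.h>

struct range {
	uint64_t start, end;	/* free range [start, end) */
};

/*
 * Return the highest address in [lo, hi) at which size bytes fit inside
 * one of the free ranges, or 0 if nothing fits.
 */
static uint64_t find_top_down(const struct range *free_r, size_t n,
			      uint64_t lo, uint64_t hi, uint64_t size)
{
	uint64_t best = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t s = free_r[i].start > lo ? free_r[i].start : lo;
		uint64_t e = free_r[i].end < hi ? free_r[i].end : hi;

		if (e > s && e - s >= size && e - size > best)
			best = e - size;
	}
	return best;
}

/* Try the node's own range first, then any memory above the DMA limit. */
static uint64_t alloc_nodedata(const struct range *free_r, size_t n,
			       uint64_t node_start, uint64_t node_end,
			       uint64_t mem_end, uint64_t dma_limit,
			       uint64_t size)
{
	uint64_t lo = node_start > dma_limit ? node_start : dma_limit;
	uint64_t addr = find_top_down(free_r, n, lo, node_end, size);

	if (!addr)
		addr = find_top_down(free_r, n, dma_limit, mem_end, size);
	return addr;
}
```

Both attempts share the same lower bound logic, so the fallback can never hand out memory below the DMA limit even when the node itself lies entirely beneath it.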
    • x86-64, NUMA: trivial cleanups for setup_node_bootmem() · ebe685f2
      Tejun Heo committed
      Make the following trivial changes in preparation for further updates.
      
      * nodeid -> nid, nid -> tnid
      * use nd_ prefix for nodedata related variables
      * remove start/end_pfn and use start/end directly
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
    • x86-64, NUMA: Simplify hotadd memory handling · 9688678a
      Tejun Heo committed
      The only special handling NUMA needs to do for hotadd memory is
      determining the node for the hotadd memory given its address, and
      nothing about that is specific to the config method used.
      
      srat_64.c does somewhat elaborate error checking on
      ACPI_SRAT_MEM_HOT_PLUGGABLE regions, remembers them and implements
      memory_add_physaddr_to_nid() which determines the node for given
      hotadd address.
      
      This is almost completely redundant.  All the information is already
      available to the generic NUMA code which already performs all the
      sanity checking and merging.  All that's necessary is not using
      __initdata from numa_meminfo and providing a function which uses it to
      map address to node.
      
      Drop the specific implementation from srat_64.c and add generic
      memory_add_physaddr_to_nid() in numa_64.c, which is enabled if
      CONFIG_MEMORY_HOTPLUG is set.  Other than dropping the code, srat_64.c
      doesn't need any change as it already calls numa_add_memblk() for hot
      pluggable regions which is enough.
      
      While at it, change CONFIG_MEMORY_HOTPLUG_SPARSE in srat_64.c to
      CONFIG_MEMORY_HOTPLUG; for NUMA on x86-64, the two are always the
      same.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
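A generic address-to-node lookup of the kind described above can be sketched as a scan over the recorded memory blocks. This is a minimal illustration with hypothetical names (`struct memblk`, `addr_to_nid()`); falling back to the first block's node when no block matches is an assumption of the sketch.

```c
#include <stddef.h>
#include <stdint.h>

struct memblk {
	uint64_t start, end;	/* physical range [start, end) */
	int nid;		/* node owning the range */
};

/* Map a hot-added physical address to the node whose block contains it. */
static int addr_to_nid(const struct memblk *blk, size_t n, uint64_t addr)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (blk[i].start <= addr && addr < blk[i].end)
			return blk[i].nid;

	/* Assumed fallback: the first block's node, or node 0 if empty. */
	return n ? blk[0].nid : 0;
}
```

Since the lookup needs nothing beyond the block table, it works the same way whichever config method filled the table in.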
    • Merge branch 'x86/urgent' into x86-mm · ba67cf5c
      Tejun Heo committed
      Merge reason: Pick up the following two fix commits.
      
        2be19102: x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo()
        765af22d: x86-32, NUMA: Fix ACPI NUMA init broken by recent x86-64 change
      
      Scheduled NUMA init 32/64bit unification changes depend on these.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • Merge branch 'x86/numa' into x86-mm · aff36486
      Tejun Heo committed
      Merge reason: Pick up x86-32 remap allocator cleanup changes - 14
      commits, 3fe14ab5^..993ba158.
      
        3fe14ab5: x86-32, numa: Fix failure condition check in alloc_remap()
        993ba158: x86-32, numa: Update remap allocator comments
      
      Scheduled NUMA init 32/64bit unification changes depend on them.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • x86, NUMA: Fix empty memblk detection in numa_cleanup_meminfo() · 2be19102
      Yinghai Lu committed
      numa_cleanup_meminfo() trims each memblk between low (0) and
      high (max_pfn) limits and discards empty ones.  However, the
      emptiness detection incorrectly used equality test.  If the
      start of a memblk is higher than max_pfn, it is empty but fails
      the equality test and doesn't get discarded.
      
      The condition triggers when max_pfn is lower than start of a
      NUMA node and results in memory misconfiguration - leading to
      WARN_ON()s and other funnies.  The bug was discovered in devel
      branch where 32bit too uses this code path for NUMA init.  If a
      node is above the addressing limit, max_pfn ends up lower than
      the node triggering this problem.
      
      The failure hasn't been observed on x86-64 but is still possible
      with broken hardware e820/NUMA info.  As the fix is very low
      risk, it would be better to apply it even for 64bit.
      
      Fix it by using >= instead of ==.
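
Why the comparison matters can be seen in a small user-space sketch of the trim-and-discard step. The names `struct memblk` and `cleanup_blocks()` are hypothetical and the kernel's bookkeeping is not modeled; the point is only that a block lying entirely above the high limit ends up with start greater than end after trimming.

```c
#include <stddef.h>
#include <stdint.h>

struct memblk {
	uint64_t start, end;	/* range [start, end) */
	int nid;
};

/* Trim each block to [low, high) and drop empty ones; returns new count. */
static size_t cleanup_blocks(struct memblk *blk, size_t n,
			     uint64_t low, uint64_t high)
{
	size_t i, kept = 0;

	for (i = 0; i < n; i++) {
		struct memblk b = blk[i];

		if (b.start < low)
			b.start = low;
		if (b.end > high)
			b.end = high;

		/*
		 * A block entirely above high now has start > end, so an
		 * equality test would keep it; ">=" discards it.
		 */
		if (b.start >= b.end)
			continue;

		blk[kept++] = b;
	}
	return kept;
}
```

For example, with a high limit of 4096, a block spanning 5000 to 6000 is trimmed to start 5000 and end 4096, which only the `>=` test recognizes as empty.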
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      [ Extracted the actual fix from the original patch and rewrote patch description. ]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Link: http://lkml.kernel.org/r/20110501171204.GO29280@htj.dyndns.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, AMD: Fix APIC timer erratum 400 affecting K8 Rev.A-E processors · e20a2d20
      Boris Ostrovsky committed
      Older AMD K8 processors (Revisions A-E) are affected by erratum
      400 (APIC timer interrupts don't occur in C states greater than
      C1). This, for example, means that X86_FEATURE_ARAT flag should
      not be set for these parts.
      
      This addresses a regression introduced by commit b87cf80a ("x86, AMD:
      Set ARAT feature on AMD processors") where the system may become
      unresponsive until an external interrupt (such as keyboard input)
      occurs. This results, for example, in time not being reported
      correctly, lack of progress on the system and other lockups.
      Reported-by: Joerg-Volker Peetz <jvpeetz@web.de>
      Tested-by: Joerg-Volker Peetz <jvpeetz@web.de>
      Acked-by: Borislav Petkov <borislav.petkov@amd.com>
      Signed-off-by: Boris Ostrovsky <Boris.Ostrovsky@amd.com>
      Cc: stable@kernel.org
      Link: http://lkml.kernel.org/r/1304113663-6586-1-git-send-email-ostr@amd64.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 01 May, 2011 1 commit
  3. 30 Apr, 2011 6 commits
  4. 29 Apr, 2011 8 commits