1. 15 Jan, 2011 1 commit
    • J
      x86: tsc: Fix calibration refinement conditionals to avoid divide by zero · 62627bec
      Committed by John Stultz
      Konrad Wilk reported that the new delayed calibration crashes with a
      divide by zero on Xen. The reason is that Xen sets the pmtimer
      address, but reading from it returns 0xffffff. That results in the
      ref_start and ref_stop value being the same, so the delta is zero
      which causes the divide by zero later in the calculation.
      
      The conditional (!hpet && !ref_start && !ref_stop) which sanity checks
      the calibration reference values doesn't really make sense. If the
      refs are null, but hpet is on, we still want to break out.
      
      The div by zero would be possible to trigger by chance if both reads
      from the hardware provided the exact same value (due to hardware
      wrapping).
      
      So checking if both the ref values are the same should handle if we
      don't have hardware (both null) or if they are the same value (either by
      invalid hardware, or by chance), avoiding the div by zero issue.
      
      [ tglx: Applied the same fix to native_calibrate_tsc() where this
        	check was copied from ]
      Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      LKML-Reference: <1295024788-15619-1-git-send-email-johnstul@us.ibm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
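      The check described in this commit can be illustrated with a minimal
      user-space sketch. This is not the kernel code: sketch_calc_khz() and
      its parameters are hypothetical names, and it assumes the reference
      counter did not wrap between the two reads. The point is only that
      comparing the two reference reads for equality covers both the "no
      hardware" case (both zero) and the "same value" case before any
      division takes place.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the kernel source): derive a frequency in
 * kHz from a TSC delta and two reads of a reference counter running at
 * ref_hz. Returns 0 when the reference reads are identical, which covers
 * both missing hardware (0 and 0) and hardware that returns a constant.
 */
static uint64_t sketch_calc_khz(uint64_t tsc_delta, uint64_t ref_start,
                                uint64_t ref_stop, uint64_t ref_hz)
{
    uint64_t ref_delta;

    if (ref_start == ref_stop)
        return 0;                       /* unusable reference, never divide */

    ref_delta = ref_stop - ref_start;   /* assumes no counter wrap */
    return tsc_delta * ref_hz / (ref_delta * 1000);
}
```

      A caller would treat a return value of 0 as "calibration against this
      reference failed" and fall back to another method.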
  2. 11 Jan, 2011 5 commits
  3. 10 Jan, 2011 1 commit
    • P
      x86, lapic-timer: Increase the max_delta to 31 bits · 4aed89d6
      Committed by Pierre Tardy
      The latest Atom SoCs (Penwell) do not have an HPET timer.
      
      As their local APIC timer is clocked at 400 kHz, and the current
      code limits their Initial Counter register to 23 bits, they
      cannot sleep for more than 1.34 seconds, which leads to ~2 spurious
      wakeups per second (1 per thread).
      
      These SoCs support a 32-bit timer, so we change max_delta to
      31 bits, which lets them sleep for at least 300 seconds.
      
      We could not find any previous chip erratum where the lapic would
      only have 23-bit precision. As powertop suggests activating
      HPET to "sleep longer", this could mean the problem is already
      known.
      
      The problem has been present since the very first implementation of
      the lapic timer as a clock event: e9e2cdb4 [PATCH] clockevents: i386 drivers.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Pierre Tardy <pierre.tardy@intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Adrian Bunk <bunk@stusta.de>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Andi Kleen <ak@suse.de>
      LKML-Reference: <1294327409-19426-1-git-send-email-pierre.tardy@intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  4. 09 Jan, 2011 1 commit
  5. 07 Jan, 2011 1 commit
    • D
      x86, numa: Fix CONFIG_DEBUG_PER_CPU_MAPS without NUMA emulation · d906f0eb
      Committed by David Rientjes
      "x86, numa: Fake node-to-cpumask for NUMA emulation" broke the
      build when CONFIG_DEBUG_PER_CPU_MAPS is set and CONFIG_NUMA_EMU
      is not.  This is because it is possible to map a cpu to multiple
      nodes when NUMA emulation is used; the patch required a physical
      node address table to find those nodes that was only available
      when CONFIG_NUMA_EMU was enabled.
      
      This extracts the common debug functionality to its own function
      for CONFIG_DEBUG_PER_CPU_MAPS and uses it regardless of whether
      CONFIG_NUMA_EMU is set or not.
      
      NUMA emulation will now iterate over the set of possible nodes
      for each cpu and call the new debug function whereas only the
      cpu's node will be used without NUMA emulation enabled.
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Acked-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <alpine.DEB.2.00.1012301053590.12995@chino.kir.corp.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  6. 05 Jan, 2011 4 commits
    • H
      x86, NMI: Add touch_nmi_watchdog to io_check_error delay · 74d91e3c
      Committed by Huang Ying
      Prevent the long delay in io_check_error from triggering an NMI
      watchdog timeout.
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      LKML-Reference: <1294198689-15447-3-git-send-email-dzickus@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • D
      x86: Avoid calling arch_trigger_all_cpu_backtrace() at the same time · 554ec063
      Committed by Dongdong Deng
      The spin_lock_debug/rcu_cpu_stall detector uses
      trigger_all_cpu_backtrace() to dump cpu backtraces.
      Therefore it is possible that trigger_all_cpu_backtrace()
      could be called at the same time on different CPUs, which
      triggers an 'unknown reason NMI' warning. The following case
      illustrates the problem:
      
            CPU1                    CPU2                     ...   CPU N
                             trigger_all_cpu_backtrace()
                             set "backtrace_mask" to cpu mask
                                     |
      generate NMI interrupts  generate NMI interrupts       ...
          \                          |                               /
           \                         |                              /
      
      The "backtrace_mask" will be cleared by the first NMI interrupt
      at nmi_watchdog_tick(); the following NMI interrupts
      generated by other cpus' arch_trigger_all_cpu_backtrace() calls will
      then be taken as unknown reason NMI interrupts.
      
      This patch uses a test_and_set to avoid the problem: it stops
      arch_trigger_all_cpu_backtrace() from running, and so avoids
      dumping duplicate cpu backtrace info, when there is already a
      trigger_all_cpu_backtrace() in progress.
      Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
      Reviewed-by: Bruce Ashfield <bruce.ashfield@windriver.com>
      Cc: fweisbec@gmail.com
      LKML-Reference: <1294198689-15447-2-git-send-email-dzickus@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Don Zickus <dzickus@redhat.com>
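      The test_and_set guard described above can be sketched in portable C11.
      This is an illustration only, not the kernel implementation: the kernel
      uses its own bit operations, while this sketch substitutes
      atomic_flag from <stdatomic.h>, and all sketch_* names are hypothetical.
      Only the first caller proceeds; concurrent callers back off until the
      dump in progress has finished.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Set while one caller is dumping backtraces (hypothetical flag). */
static atomic_flag sketch_backtrace_in_progress = ATOMIC_FLAG_INIT;

/* Counts how many dumps were actually started, for demonstration. */
static int sketch_dumps_started;

/*
 * Returns true if this caller won the race and started a dump, false if
 * another dump was already in progress and this call was skipped.
 */
static bool sketch_try_trigger_backtrace(void)
{
    if (atomic_flag_test_and_set(&sketch_backtrace_in_progress))
        return false;               /* someone else is already dumping */

    sketch_dumps_started++;         /* the real code would send the NMIs here */
    return true;
}

/* Called once the dump has completed, so later requests can proceed. */
static void sketch_finish_backtrace(void)
{
    atomic_flag_clear(&sketch_backtrace_in_progress);
}
```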
    • D
      x86: Only call smp_processor_id in non-preempt cases · 9ab181fa
      Committed by Don Zickus
      There are some paths that walk the die_chain with preemption on.
      Make sure we are in an NMI call before we start doing anything.
      
      This was triggered by do_general_protection calling notify_die
      with DIE_GPF.
      Reported-by: Jan Kiszka <jan.kiszka@web.de>
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      LKML-Reference: <1294198689-15447-1-git-send-email-dzickus@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • Y
      x86: Fix APIC ID sizing bug on larger systems, clean up MAX_APICS confusion · cb2ded37
      Committed by Yinghai Lu
      On one x2apic pre-enabled system, x2apic_mode suddenly got
      corrupted after registering some cpus when the kernel was compiled
      with CONFIG_NR_CPUS=255 instead of 512.
      
      It turns out that generic_processor_info() ==> phyid_set(apicid,
      phys_cpu_present_map) causes the problem.
      
      phys_cpu_present_map is sized by MAX_APICS bits, and on a pre-enabled
      system some cpus have an apic id > 255.
      
      The variable after phys_cpu_present_map may get corrupted
      silently:
      
       ffffffff828e8420 B phys_cpu_present_map
       ffffffff828e8440 B apic_verbosity
       ffffffff828e8444 B local_apic_timer_c2_ok
       ffffffff828e8448 B disable_apic
       ffffffff828e844c B x2apic_mode
       ffffffff828e8450 B x2apic_disabled
       ffffffff828e8454 B num_processors
       ...
      
      phys_cpu_present_map is actually indexed by apic id rather than by
      cpu index. We should use MAX_LOCAL_APIC instead of MAX_APICS.
      
      For 64-bit it will be 32768 in all cases. BSS will increase by 4k bytes
      on 64-bit:
      
      	text		data		bss		dec		filename
      	21696943	4193748		12787712	38678403	vmlinux.before
      	21696943	4193748		12791808	38682499	vmlinux.after
      
      No change on 32bit.
      
      Finally we can remove MAX_APICS, which was rather confusing.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      LKML-Reference: <4D23BD9C.3070102@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
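      The sizes quoted in this commit can be checked with a small sketch.
      The symbol listing shows phys_cpu_present_map occupying 0x20 bytes,
      i.e. 256 bits, and a bitmap of 32768 bits needs 4096 bytes, which
      matches the reported 4k BSS increase. The helpers below are
      hypothetical (they are not kernel functions); the second one shows the
      kind of bounds check whose absence lets an out-of-range apic id write
      past the end of the map.

```c
#include <limits.h>
#include <stddef.h>

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Bytes needed for a bitmap of nbits bits, rounded up to whole longs. */
static size_t sketch_bitmap_bytes(size_t nbits)
{
    size_t longs = (nbits + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG;

    return longs * sizeof(unsigned long);
}

/*
 * Set bit id in map, refusing ids that do not fit in nbits.
 * Returns 0 on success and -1 if id is out of range.
 */
static int sketch_checked_set_bit(unsigned long *map, size_t nbits, size_t id)
{
    if (id >= nbits)
        return -1;                  /* would corrupt whatever follows map */

    map[id / SKETCH_BITS_PER_LONG] |= 1UL << (id % SKETCH_BITS_PER_LONG);
    return 0;
}
```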
  7. 04 Jan, 2011 5 commits
  8. 03 Jan, 2011 5 commits
    • A
      ARM: pxa: fix page table corruption on resume · 24c78557
      Committed by Aric D. Blumer
      Before this patch, the following error would sometimes occur after a
      resume on pxa3xx:
      
          /path/to/mm/memory.c:144: bad pmd 8040542e.
      
      The problem was that a temporary page table mapping was being improperly
      restored.
      
      The PXA3xx resume code creates a temporary mapping of resume_turn_on_mmu
      to avoid a prefetch abort.  The pxa3xx_resume_after_mmu code requires
      that the r1 register holding the address of this mapping not be
      modified, however, resume_turn_on_mmu does modify it. It is mostly
      correct in that r1 receives the base table address, but it may also
      get other bits in 13:0.  This results in pxa3xx_resume_after_mmu
      restoring the original mapping to the wrong place, corrupting memory
      and leaving the temporary mapping in place.
      Signed-off-by: Matt Reimer <mreimer@sdgsystems.com>
      Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
    • M
      ARM: it8152: add IT8152_LAST_IRQ definition to fix build error · 823a2df2
      Committed by Mike Rapoport
      The commit 6ac6b817 (ARM: pxa: encode
      IRQ number into .nr_irqs) removed definition of ITE_LAST_IRQ which
      caused the following build error:
      
      CC      arch/arm/common/it8152.o
      arch/arm/common/it8152.c: In function 'it8152_init_irq':
      arch/arm/common/it8152.c:86: error: 'IT8152_LAST_IRQ' undeclared (first use in this function)
      arch/arm/common/it8152.c:86: error: (Each undeclared identifier is reported only once
      arch/arm/common/it8152.c:86: error: for each function it appears in.)
      make[2]: *** [arch/arm/common/it8152.o] Error 1
      
      Defining the IT8152_LAST_IRQ in the arch/arm/include/hardware/it8152.c
      fixes the build.
      Signed-off-by: Mike Rapoport <mike@compulab.co.il>
      Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
    • L
      ARM: pxa: PXA_ESERIES depends on FB_W100. · 82427de2
      Committed by Lennert Buytenhek
      As arch/arm/mach-pxa/eseries.c references w100fb_gpio_{read,write}()
      directly.
      Signed-off-by: Lennert Buytenhek <buytenh@secretlab.ca>
      Signed-off-by: Eric Miao <eric.y.miao@gmail.com>
    • R
      arch/x86/oprofile/op_model_amd.c: Perform initialisation on a single CPU · c7c25802
      Committed by Robert Richter
      Disable preemption in init_ibs(). The function only checks the
      ibs capabilities and sets up pci devices (if necessary). It runs
      only on one cpu but operates with the local APIC and some MSRs,
      thus it is better to disable preemption.
      
      [    7.034377] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/483
      [    7.034385] caller is setup_APIC_eilvt+0x155/0x180
      [    7.034389] Pid: 483, comm: modprobe Not tainted 2.6.37-rc1-20101110+ #1
      [    7.034392] Call Trace:
      [    7.034400]  [<ffffffff812a2b72>] debug_smp_processor_id+0xd2/0xf0
      [    7.034404]  [<ffffffff8101e985>] setup_APIC_eilvt+0x155/0x180
      [ ... ]
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=22812
      
      Reported-by: <atswartz@gmail.com>
      Signed-off-by: Robert Richter <robert.richter@amd.com>
      Cc: oprofile-list@lists.sourceforge.net <oprofile-list@lists.sourceforge.net>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Dan Carpenter <error27@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: <stable@kernel.org>         [2.6.37.x]
      LKML-Reference: <20110103111514.GM4739@erda.amd.com>
      [ small cleanups ]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • A
      ARM: 6605/1: Add missing include "asm/memory.h" · 7c0ab43e
      Committed by Axel Lin
      This patch fixes the build error below by adding the missing
      asm/memory.h include, which is needed for arch_is_coherent().
      
      $ make pxa3xx_defconfig; make
        CC      init/do_mounts_rd.o
      In file included from include/linux/list_bl.h:5,
                       from include/linux/rculist_bl.h:7,
                       from include/linux/dcache.h:7,
                       from include/linux/fs.h:381,
                       from init/do_mounts_rd.c:3:
      include/linux/bit_spinlock.h: In function 'bit_spin_unlock':
      include/linux/bit_spinlock.h:61: error: implicit declaration of function 'arch_is_coherent'
      make[1]: *** [init/do_mounts_rd.o] Error 1
      make: *** [init] Error 2
      Signed-off-by: Axel Lin <axel.lin@gmail.com>
      Acked-by: Peter Huewe <peterhuewe@gmx.de>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  9. 02 Jan, 2011 1 commit
  10. 29 Dec, 2010 2 commits
  11. 28 Dec, 2010 1 commit
    • C
      x86, paravirt: Use native_halt on a halt, not native_safe_halt · c8217b83
      Committed by Cliff Wickman
      halt() should use native_halt();
      safe_halt() should use native_safe_halt().
      
      If CONFIG_PARAVIRT=y, halt() is defined in arch/x86/include/asm/paravirt.h as
      
      static inline void halt(void)
      {
              PVOP_VCALL0(pv_irq_ops.safe_halt);
      }
      
      Otherwise (no CONFIG_PARAVIRT) halt() in arch/x86/include/asm/irqflags.h is
      
      static inline void halt(void)
      {
              native_halt();
      }
      
      So it looks to me like the CONFIG_PARAVIRT case of using native_safe_halt()
      for a halt() is an oversight.
      Am I missing something?
      
      It probably hasn't shown up as a problem because the local apic is disabled
      on a shutdown or restart.  But if we disable interrupts and call halt()
      we shouldn't expect that the halt() will re-enable interrupts.
      Signed-off-by: Cliff Wickman <cpw@sgi.com>
      LKML-Reference: <E1PSBcz-0001g1-FM@eag09.americas.sgi.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
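      The distinction this commit relies on can be modelled in user space.
      The sketch below is an illustration under stated assumptions, not the
      kernel code: a plain bool stands in for the CPU interrupt flag, and
      every sketch_* name is hypothetical. It shows an ops table in which
      halt() dispatches to the plain halt entry, so that a caller who
      disabled interrupts does not find them re-enabled afterwards.

```c
#include <stdbool.h>

/* Stands in for the CPU interrupt-enable flag in this model. */
static bool sketch_irqs_enabled;

/* Plain halt: leaves the interrupt flag as the caller set it. */
static void sketch_native_halt(void)
{
}

/* Safe halt: enables interrupts before halting. */
static void sketch_native_safe_halt(void)
{
    sketch_irqs_enabled = true;
}

/* Hypothetical ops table with one entry per halt flavour. */
struct sketch_irq_ops {
    void (*safe_halt)(void);
    void (*halt)(void);
};

static const struct sketch_irq_ops sketch_ops = {
    .safe_halt = sketch_native_safe_halt,
    .halt      = sketch_native_halt,
};

/* halt() must dispatch to the halt entry, not to safe_halt. */
static void sketch_halt(void)
{
    sketch_ops.halt();
}

static void sketch_safe_halt(void)
{
    sketch_ops.safe_halt();
}
```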
  12. 27 Dec, 2010 1 commit
    • J
      x86/microcode: Fix double vfree() and remove redundant pointer checks before vfree() · 5cdd2de0
      Committed by Jesper Juhl
      In arch/x86/kernel/microcode_intel.c::generic_load_microcode()
      we have this:
      
      	while (leftover) {
      		...
      		if (get_ucode_data(mc, ucode_ptr, mc_size) ||
      		    microcode_sanity_check(mc) < 0) {
      			vfree(mc);
      			break;
      		}
      		...
      	}
      
      	if (mc)
      		vfree(mc);
      
      This will cause a double free of 'mc'. This patch fixes that by
      just removing the vfree() call in the loop since 'mc' will be
      freed nicely just after we break out of the loop.
      
      There's also a second change in the patch. I noticed a lot of
      checks for pointers being NULL before passing them to vfree().
      That's completely redundant since vfree() deals gracefully with
      being passed a NULL pointer. Removing the redundant checks
      yields a nice size decrease for the object file.
      
      Size before the patch:
         text    data     bss     dec     hex filename
         4578     240    1032    5850    16da arch/x86/kernel/microcode_intel.o
      Size after the patch:
         text    data     bss     dec     hex filename
         4489     240     984    5713    1651 arch/x86/kernel/microcode_intel.o
      Signed-off-by: Jesper Juhl <jj@chaosbits.net>
      Acked-by: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
      Cc: Shaohua Li <shaohua.li@intel.com>
      LKML-Reference: <alpine.LNX.2.00.1012251946100.10759@swampdragon.chaosbits.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
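      The corrected pattern can be sketched in user space, with malloc() and
      free() standing in for vmalloc() and vfree(). This is a simplified
      model, not the driver code: sketch_load(), the chunk counts and the
      counter are hypothetical. Like vfree(), free() accepts NULL, so the
      loop can simply break on error and leave the single free after the
      loop to release the buffer exactly once, without a NULL check.

```c
#include <stdlib.h>

/* Counts frees of real (non-NULL) buffers, for demonstration only. */
static int sketch_live_frees;

static void sketch_free(void *p)
{
    if (p)
        sketch_live_frees++;
    free(p);                    /* free(NULL) is a no-op */
}

/*
 * Process up to chunks buffers; bad_chunk is the index that fails its
 * sanity check (-1 for none). Returns the number of chunks processed.
 */
static int sketch_load(int chunks, int bad_chunk)
{
    void *mc = NULL;
    int i;

    for (i = 0; i < chunks; i++) {
        mc = malloc(16);
        if (!mc || i == bad_chunk)
            break;              /* no free here: the one below handles it */

        sketch_free(mc);        /* chunk consumed successfully */
        mc = NULL;
    }

    sketch_free(mc);            /* single release point, NULL-safe */
    return i;
}
```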
  13. 24 Dec, 2010 12 commits
    • T
      ARM: 6540/1: Stop irqsoff trace on return to user · d13e5edd
      Committed by Todd Android Poynor
      If the irqsoff tracer is in use, stop tracing the interrupt disable
      interval when returning to userspace.  Tracing userspace execution time
      as interrupts disabled time is not helpful for kernel performance
      analysis purposes.  Only do so if the irqsoff tracer is enabled, to
      avoid overhead for lockdep, which doesn't care.
      Signed-off-by: Todd Poynor <toddpoynor@google.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • S
      ARM: 6536/1: Add missing SZ_{32,64,128} · 537de3a6
      Committed by Stephen Warren
      ... and also remove misleading comment stating that this header is
      auto-generated.
      Signed-off-by: Stephen Warren <swarren@nvidia.com>
      Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • P
      sh: Fix up SH7201 clkfwk build. · 27f1accc
      Committed by Paul Mundt
      The master clock initialization for SH7201 was wholly bogus. Users of the
      legacy API must initialize the clock rate through the struct clk itself
      rather than returning the clock frequency. Given that the init function
      itself is void, returning the frequency isn't terribly effective.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
    • P
      sh: mach-se: Fix up SE7206 build. · 27434f0a
      Committed by Paul Mundt
      With some recent tidying of duplicate register definitions the se7206 IRQ
      code broke:
      
      arch/sh/boards/mach-se/7206/irq.c: error: 'INTC_ICR' undeclared (first use in this function)
      arch/sh/boards/mach-se/7206/irq.c: error: (Each undeclared identifier is reported only once
      arch/sh/boards/mach-se/7206/irq.c: error: for each function it appears in.)
      
      Fix it up.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
    • P
      sh: Fix up SH4-202 clkfwk build. · 638fa4aa
      Committed by Paul Mundt
      Some of the SH4-202 code was overlooked in the set_rate() API conversion,
      resulting in:
      
      arch/sh/kernel/cpu/sh4/clock-sh4-202.c: error: too many arguments to function 'clk->ops->set_rate'
      
      Fix it up.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
    • D
      x86, numa: Fix cpu to node mapping for sparse node ids · a387e95a
      Committed by David Rientjes
      NUMA boot code assumes that physical node ids start at 0, but the DIMMs
      that the apic id represents may not be reachable.  If this is the case,
      node 0 is never online and cpus never end up getting appropriately
      assigned to a node.  This causes the cpumask of all online nodes to be
      empty and machines crash with kernel code assuming online nodes have
      valid cpus.
      
      The fix is to appropriately map all the address ranges for physical nodes
      and ensure the cpu to node mapping function checks all possible nodes (up
      to MAX_NUMNODES) instead of simply checking nodes 0-N, where N is the
      number of physical nodes, for valid address ranges.
      
      This requires no longer "compressing" the address ranges of nodes in the
      physical node map from 0-N, but rather leave indices in physnodes[] to
      represent the actual node id of the physical node.  Accordingly, the
      topology exported by both amd_get_nodes() and acpi_get_nodes() no longer
      must return the number of nodes to iterate through; all such iterations
      will now be to MAX_NUMNODES.
      
      This change also passes the end address of system RAM (which may be
      different from normal operation if mem= is specified on the command line)
      before the physnodes[] array is populated.  ACPI parsed nodes are
      truncated to fit within the address range that respect the mem=
      boundaries and even some physical nodes may become unreachable in such
      cases.
      
      When NUMA emulation does succeed, any apicid to node mapping that exists
      for unreachable nodes are given default values so that proximity domains
      can still be assigned.  This is important for node_distance() to
      function as desired.
      Signed-off-by: David Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1012221702090.3701@chino.kir.corp.google.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
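      The lookup change described above can be sketched as follows. This is
      a simplified model rather than the kernel code: the struct, the
      constant and the function name are hypothetical, and the node limit is
      kept tiny for readability. The table is indexed by the actual physical
      node id and may have gaps, so the search walks every possible slot
      instead of stopping after the first N entries.

```c
#include <stdint.h>

#define SKETCH_MAX_NUMNODES 8       /* hypothetical, small for illustration */
#define SKETCH_NO_NODE      (-1)

/* Address range of one physical node; start == end marks an unused slot. */
struct sketch_range {
    uint64_t start;
    uint64_t end;                   /* exclusive */
};

/*
 * Find the node whose range contains addr. physnodes[] is indexed by the
 * real node id, so sparse ids (e.g. nodes 1 and 3 only) are found too.
 */
static int sketch_addr_to_node(const struct sketch_range *physnodes,
                               uint64_t addr)
{
    int nid;

    for (nid = 0; nid < SKETCH_MAX_NUMNODES; nid++) {
        if (physnodes[nid].start == physnodes[nid].end)
            continue;               /* node id not populated */
        if (addr >= physnodes[nid].start && addr < physnodes[nid].end)
            return nid;
    }
    return SKETCH_NO_NODE;
}
```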
    • D
      x86, numa: Fake node-to-cpumask for NUMA emulation · c1c3443c
      Committed by David Rientjes
      It's necessary to fake the node-to-cpumask mapping so that an emulated
      node ID returns a cpumask that includes all cpus that have affinity to
      the memory it represents.
      
      This is a little intrusive because it requires knowledge of the physical
      topology of the system.  setup_physnodes() gives us that information, but
      since NUMA emulation ends up altering the physnodes array, it's necessary
      to reset it before cpus are brought online.
      
      Accordingly, the physnodes array is moved out of init.data and into
      cpuinit.data since it will be needed on cpuup callbacks.
      
      This works regardless of whether numa=fake is used on the command line,
      or the setup of the fake node succeeds or fails.  The physnodes array
      always contains the physical topology of the machine if CONFIG_NUMA_EMU
      is enabled and can be used to setup the correct node-to-cpumask mappings
      in all cases since setup_physnodes() is called whenever the array needs
      to be repopulated with the correct data.
      
      To fake the actual mappings, numa_add_cpu() and numa_remove_cpu() are
      rewritten for CONFIG_NUMA_EMU so that we first find the physical node to
      which each cpu has local affinity, then iterate through all online nodes
      to find the emulated nodes that have local affinity to that physical
      node, and then finally map the cpu to each of those emulated nodes.
      Signed-off-by: David Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1012221701520.3701@chino.kir.corp.google.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • D
      x86, numa: Fake apicid and pxm mappings for NUMA emulation · f51bf307
      Committed by David Rientjes
      This patch adds the equivalent of acpi_fake_nodes() for AMD Northbridge
      platforms.  The goal is to fake the apicid-to-node mappings for NUMA
      emulation so the physical topology of the machine is correctly maintained
      within the kernel.
      
      This change also fakes proximity domains for both ACPI and k8 code so the
      physical distance between emulated nodes is maintained via
      node_distance().  This exports the correct distances via
      /sys/devices/system/node/.../distance based on the underlying topology.
      
      A new helper function, fake_physnodes(), is introduced to correctly
      invoke the correct NUMA code to fake these two mappings based on the
      system type.  If there is no underlying NUMA configuration, all cpus are
      mapped to node 0 for local distance.
      
      Since acpi_fake_nodes() is no longer called with CONFIG_ACPI_NUMA, its
      prototype can be removed from the header file for such a configuration.
      Signed-off-by: David Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1012221701360.3701@chino.kir.corp.google.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • D
      x86, numa: Avoid compiling NUMA emulation functions without CONFIG_NUMA_EMU · 4e76f4e6
      Committed by David Rientjes
      Both acpi_get_nodes() and amd_get_nodes() are only necessary when
      CONFIG_NUMA_EMU is enabled, so avoid compiling them when the option is
      disabled.
      Signed-off-by: David Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1012221701210.3701@chino.kir.corp.google.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • D
      x86, numa: Reduce minimum fake node size to 32M · 34dc9e74
      Committed by David Rientjes
      This patch changes the minimum fake node size from 64MB to 32MB so it is
      possible to test NUMA code at a greater scale on smaller machines
      (64 nodes on a 2G machine, 1024 nodes on 32G machine with
      CONFIG_NODES_SHIFT=10).
      Signed-off-by: David Rientjes <rientjes@google.com>
      LKML-Reference: <alpine.DEB.2.00.1012221700590.3701@chino.kir.corp.google.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
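      The node counts quoted above follow from simple division, which the
      sketch below reproduces. The helper name and its parameters are
      hypothetical; it only assumes that the number of fake nodes is bounded
      both by memory divided by the minimum node size and by
      2^CONFIG_NODES_SHIFT.

```c
#include <stdint.h>

#define SKETCH_MB (1024ULL * 1024ULL)

/*
 * Upper bound on fake nodes: limited by how many minimum-sized nodes fit
 * in RAM and by the configured node id space (1 << nodes_shift).
 */
static uint64_t sketch_max_fake_nodes(uint64_t ram_bytes,
                                      uint64_t min_node_bytes,
                                      unsigned int nodes_shift)
{
    uint64_t by_memory = ram_bytes / min_node_bytes;
    uint64_t by_config = 1ULL << nodes_shift;

    return by_memory < by_config ? by_memory : by_config;
}
```

      With a 32MB minimum, 2G of RAM gives 64 nodes and 32G gives 1024; with
      the previous 64MB minimum the same machines would top out at half
      those counts.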
    • Y
      x86, acpi: Parse all SRAT cpu entries even above the cpu number limitation · d3bd0588
      Committed by Yinghai Lu
      Recent Intel systems use a different order in the MADT: they list all
      thread0 entries first, then all thread1 entries.
      But the SRAT table still uses the old order and lists the cpus of one socket all together.
      
      If the user has compiled with a limited NR_CPUS or boots with nr_cpus=, the apic id to
      node mappings of some cpus could be missing from apicid_to_node[].
      
      For example, a 4-socket system with 64 cpus booted with nr_cpus=32 will crash...
      
      [    9.106288] Total of 32 processors activated (136190.88 BogoMIPS).
      [    9.235021] divide error: 0000 [#1] SMP
      [    9.235315] last sysfs file:
      [    9.235481] CPU 1
      [    9.235592] Modules linked in:
      [    9.245398]
      [    9.245478] Pid: 2, comm: kthreadd Not tainted 2.6.37-rc1-tip-yh-01782-ge92ef79-dirty #274      /Sun Fire x4800
      [    9.265415] RIP: 0010:[<ffffffff81075a8f>]  [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
      ...
      [    9.645938] RIP  [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
      [    9.665356]  RSP <ffff88103f8d1c40>
      [    9.665568] ---[ end trace 2296156d35fdfc87 ]---
      
      So let's just parse all cpu entries in SRAT.
      
      Also check the apicid against MAX_LOCAL_APIC, in case we would go out of the bounds of
      apicid_to_node[].
      
      This also fixes the following bug:
      https://bugzilla.kernel.org/show_bug.cgi?id=22662
      
      -v2: expand to 32bit according to hpa
         need to add MAX_LOCAL_APIC for 32bit
      Reported-and-Tested-by: Wu Fengguang <fengguang.wu@intel.com>
      Reported-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Tested-by: Myron Stowe <myron.stowe@hp.com>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4D0AD486.9020704@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • Y
      x86, acpi: Add MAX_LOCAL_APIC for 32bit · 56d91f13
      Committed by Yinghai Lu
      We should use MAX_LOCAL_APIC for the maximum apic id and MAX_APICS for
      the number of local apics.
      
      Also, the apic_version[] array should be sized by MAX_LOCAL_APIC.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4D0AD464.2020408@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>