1. 05 1月, 2011 4 次提交
    • H
      x86, NMI: Add touch_nmi_watchdog to io_check_error delay · 74d91e3c
      Huang Ying 提交于
      Prevent the long delay in io_check_error making NMI watchdog
      timeout.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NDon Zickus <dzickus@redhat.com>
      LKML-Reference: <1294198689-15447-3-git-send-email-dzickus@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      74d91e3c
    • D
      x86: Avoid calling arch_trigger_all_cpu_backtrace() at the same time · 554ec063
      Dongdong Deng 提交于
      The spin_lock_debug/rcu_cpu_stall detector uses
      trigger_all_cpu_backtrace() to dump cpu backtrace.
      Therefore it is possible that trigger_all_cpu_backtrace()
      could be called at the same time on different CPUs, which
      triggers and 'unknown reason NMI' warning. The following case
      illustrates the problem:
      
            CPU1                    CPU2                     ...   CPU N
                             trigger_all_cpu_backtrace()
                             set "backtrace_mask" to cpu mask
                                     |
      generate NMI interrupts  generate NMI interrupts       ...
          \                          |                               /
           \                         |                              /
      
      The "backtrace_mask" will be cleaned by the first NMI interrupt
      at nmi_watchdog_tick(), then the following NMI interrupts
      generated by other cpus's arch_trigger_all_cpu_backtrace() will
      be taken as unknown reason NMI interrupts.
      
      This patch uses a test_and_set to avoid the problem, and stop
      the arch_trigger_all_cpu_backtrace() from calling to avoid
      dumping a double cpu backtrace info when there is already a
      trigger_all_cpu_backtrace() in progress.
      Signed-off-by: NDongdong Deng <dongdong.deng@windriver.com>
      Reviewed-by: NBruce Ashfield <bruce.ashfield@windriver.com>
      Cc: fweisbec@gmail.com
      LKML-Reference: <1294198689-15447-2-git-send-email-dzickus@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NDon Zickus <dzickus@redhat.com>
      554ec063
    • D
      x86: Only call smp_processor_id in non-preempt cases · 9ab181fa
      Don Zickus 提交于
      There are some paths that walk the die_chain with preemption on.
      Make sure we are in an NMI call before we start doing anything.
      
      This was triggered by do_general_protection calling notify_die
      with DIE_GPF.
      Reported-by: NJan Kiszka <jan.kiszka@web.de>
      Signed-off-by: NDon Zickus <dzickus@redhat.com>
      LKML-Reference: <1294198689-15447-1-git-send-email-dzickus@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9ab181fa
    • Y
      x86: Fix APIC ID sizing bug on larger systems, clean up MAX_APICS confusion · cb2ded37
      Yinghai Lu 提交于
      Found one x2apic pre-enabled system, x2apic_mode suddenly get
      corrupted after register some cpus, when compiled
      CONFIG_NR_CPUS=255 instead of 512.
      
      It turns out that generic_processor_info() ==> phyid_set(apicid,
      phys_cpu_present_map) causes the problem.
      
      phys_cpu_present_map is sized by MAX_APICS bits, and pre-enabled
      system some cpus have an apic id > 255.
      
      The variable after phys_cpu_present_map may get corrupted
      silently:
      
       ffffffff828e8420 B phys_cpu_present_map
       ffffffff828e8440 B apic_verbosity
       ffffffff828e8444 B local_apic_timer_c2_ok
       ffffffff828e8448 B disable_apic
       ffffffff828e844c B x2apic_mode
       ffffffff828e8450 B x2apic_disabled
       ffffffff828e8454 B num_processors
       ...
      
      Actually phys_cpu_present_map is referenced via apic id, instead
      index. We should use MAX_LOCAL_APIC instead MAX_APICS.
      
      For 64-bit it will be 32768 in all cases. BSS will increase by 4k bytes
      on 64-bit:
      
      	text		data		bss		dec		filename
      	21696943	4193748		12787712	38678403	vmlinux.before
      	21696943	4193748		12791808	38682499	vmlinux.after
      
      No change on 32bit.
      
      Finally we can remove MAX_APCIS that was rather confusing.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      LKML-Reference: <4D23BD9C.3070102@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cb2ded37
  2. 04 1月, 2011 3 次提交
    • R
      x86, mm: Initialize initial_page_table before paravirt jumps · d50d8fe1
      Rusty Russell 提交于
      v2.6.36-rc8-54-gb40827fa (x86-32, mm: Add an initial page table
      for core bootstrapping) made x86 boot using initial_page_table
      and broke lguest.
      
      For 2.6.37 we simply cut & paste the initialization code into
      lguest (da32dac1 "lguest: populate initial_page_table"), now
      we fix it properly by doing that initialization before the
      paravirt jump.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Acked-by: NJeremy Fitzhardinge <jeremy@goop.org>
      Cc: lguest <lguest@ozlabs.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <201101041720.54535.rusty@rustcorp.com.au>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d50d8fe1
    • T
      perf: Clean up power events by introducing new, more generic ones · 25e41933
      Thomas Renninger 提交于
      Add these new power trace events:
      
       power:cpu_idle
       power:cpu_frequency
       power:machine_suspend
      
      The old C-state/idle accounting events:
        power:power_start
        power:power_end
      
      Have now a replacement (but we are still keeping the old
      tracepoints for compatibility):
      
        power:cpu_idle
      
      and
        power:power_frequency
      
      is replaced with:
        power:cpu_frequency
      
      power:machine_suspend is newly introduced.
      
      Jean Pihet has a patch integrated into the generic layer
      (kernel/power/suspend.c) which will make use of it.
      
      the type= field got removed from both, it was never
      used and the type is differed by the event type itself.
      
      perf timechart userspace tool gets adjusted in a separate patch.
      Signed-off-by: NThomas Renninger <trenn@suse.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NArjan van de Ven <arjan@linux.intel.com>
      Acked-by: NJean Pihet <jean.pihet@newoldbits.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: rjw@sisk.pl
      LKML-Reference: <1294073445-14812-3-git-send-email-trenn@suse.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
      25e41933
    • R
      x86, hwmon: Add core threshold notification to therm_throt.c · 9e76a97e
      R, Durgadoss 提交于
      This patch adds code to therm_throt.c to notify core thermal threshold
      events. These thresholds are supported by the IA32_THERM_INTERRUPT register.
      The status/log for the same is monitored using the IA32_THERM_STATUS register.
      The necessary #defines are in msr-index.h. A call back is added to mce.h, to
      further notify the thermal stack, about the threshold events.
      Signed-off-by: NDurgadoss R <durgadoss.r@intel.com>
      LKML-Reference: <D6D887BA8C9DFF48B5233887EF04654105C1251710@bgsmsx502.gar.corp.intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      9e76a97e
  3. 30 12月, 2010 3 次提交
  4. 27 12月, 2010 1 次提交
    • J
      x86/microcode: Fix double vfree() and remove redundant pointer checks before vfree() · 5cdd2de0
      Jesper Juhl 提交于
      In arch/x86/kernel/microcode_intel.c::generic_load_microcode()
      we have  this:
      
      	while (leftover) {
      		...
      		if (get_ucode_data(mc, ucode_ptr, mc_size) ||
      		    microcode_sanity_check(mc) < 0) {
      			vfree(mc);
      			break;
      		}
      		...
      	}
      
      	if (mc)
      		vfree(mc);
      
      This will cause a double free of 'mc'. This patch fixes that by
      just  removing the vfree() call in the loop since 'mc' will be
      freed nicely just  after we break out of the loop.
      
      There's also a second change in the patch. I noticed a lot of
      checks for  pointers being NULL before passing them to vfree().
      That's completely  redundant since vfree() deals gracefully with
      being passed a NULL pointer.  Removing the redundant checks
      yields a nice size decrease for the object  file.
      
      Size before the patch:
         text    data     bss     dec     hex filename
         4578     240    1032    5850    16da arch/x86/kernel/microcode_intel.o
      Size after the patch:
         text    data     bss     dec     hex filename
         4489     240     984    5713    1651 arch/x86/kernel/microcode_intel.o
      Signed-off-by: NJesper Juhl <jj@chaosbits.net>
      Acked-by: NTigran Aivazian <tigran@aivazian.fsnet.co.uk>
      Cc: Shaohua Li <shaohua.li@intel.com>
      LKML-Reference: <alpine.LNX.2.00.1012251946100.10759@swampdragon.chaosbits.net>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5cdd2de0
  5. 24 12月, 2010 2 次提交
    • Y
      x86, acpi: Parse all SRAT cpu entries even above the cpu number limitation · d3bd0588
      Yinghai Lu 提交于
      Recent Intel new system have different order in MADT, aka will list all thread0
      at first, then all thread1.
      But SRAT table still old order, it will list cpus in one socket all together.
      
      If the user have compiled limited NR_CPUS or boot with nr_cpus=, could have missed
      to put some cpus apic id to node mapping into apicid_to_node[].
      
      for example for 4 sockets system with 64 cpus with nr_cpus=32 will get crash...
      
      [    9.106288] Total of 32 processors activated (136190.88 BogoMIPS).
      [    9.235021] divide error: 0000 [#1] SMP
      [    9.235315] last sysfs file:
      [    9.235481] CPU 1
      [    9.235592] Modules linked in:
      [    9.245398]
      [    9.245478] Pid: 2, comm: kthreadd Not tainted 2.6.37-rc1-tip-yh-01782-ge92ef79-dirty #274      /Sun Fire x4800
      [    9.265415] RIP: 0010:[<ffffffff81075a8f>]  [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
      ...
      [    9.645938] RIP  [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
      [    9.665356]  RSP <ffff88103f8d1c40>
      [    9.665568] ---[ end trace 2296156d35fdfc87 ]---
      
      So let just parse all cpu entries in SRAT.
      
      Also add apicid checking with MAX_LOCAL_APIC, in case We could out of boundaries of
      apicid_to_node[].
      
      it fixes following bug too.
      https://bugzilla.kernel.org/show_bug.cgi?id=22662
      
      -v2: expand to 32bit according to hpa
         need to add MAX_LOCAL_APIC for 32bit
      Reported-and-Tested-by: NWu Fengguang <fengguang.wu@intel.com>
      Reported-by: NBjorn Helgaas <bjorn.helgaas@hp.com>
      Tested-by: NMyron Stowe <myron.stowe@hp.com>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4D0AD486.9020704@kernel.org>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      d3bd0588
    • Y
      x86, acpi: Add MAX_LOCAL_APIC for 32bit · 56d91f13
      Yinghai Lu 提交于
      We should use MAX_LOCAL_APIC for max apic ids and MAX_APICS as number
      of local apics.
      
      Also apic_version[] array should use MAX_LOCAL_APICs.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4D0AD464.2020408@kernel.org>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      56d91f13
  6. 23 12月, 2010 1 次提交
    • D
      x86, nmi_watchdog: Remove ARCH_HAS_NMI_WATCHDOG and rely on CONFIG_HARDLOCKUP_DETECTOR · 4a7863cc
      Don Zickus 提交于
      The x86 arch has shifted its use of the nmi_watchdog from a
      local implementation to the global one provide by
      kernel/watchdog.c.  This shift has caused a whole bunch of
      compile problems under different config options.  I attempt to
      simplify things with the patch below.
      
      In order to simplify things, I had to come to terms with the
      meaning of two terms ARCH_HAS_NMI_WATCHDOG and
      CONFIG_HARDLOCKUP_DETECTOR.  Basically they mean the same thing,
      the former on a local level and the latter on a global level.
      
      With the old x86 nmi watchdog gone, there is no need to rely on
      defining the ARCH_HAS_NMI_WATCHDOG variable because it doesn't
      make sense any more.  x86 will now use the global
      implementation.
      
      The changes below do a few things.  First it changes the few
      places that relied on ARCH_HAS_NMI_WATCHDOG to use
      CONFIG_X86_LOCAL_APIC (the former was an alias for the latter
      anyway, so nothing unusual here).  Those pieces of code were
      relying more on local apic functionality the nmi watchdog
      functionality, so the change should make sense.
      
      Second, I removed the x86 implementation of
      touch_nmi_watchdog().  It isn't need now, instead x86 will rely
      on kernel/watchdog.c's implementation.
      
      Third, I removed the #define ARCH_HAS_NMI_WATCHDOG itself from
      x86.  And tweaked the include/linux/nmi.h file to tell users to
      look for an externally defined touch_nmi_watchdog in the case of
      ARCH_HAS_NMI_WATCHDOG _or_ CONFIG_HARDLOCKUP_DETECTOR. This
      changes removes some of the ugliness in that file.
      
      Finally, I added a Kconfig dependency for
      CONFIG_HARDLOCKUP_DETECTOR that said you can't have
      ARCH_HAS_NMI_WATCHDOG _and_ CONFIG_HARDLOCKUP_DETECTOR.  You can
      only have one nmi_watchdog.
      
      Tested with
      ARCH=i386: allnoconfig, defconfig, allyesconfig, (various broken
      configs) ARCH=x86_64: allnoconfig, defconfig, allyesconfig,
      (various broken configs)
      
      Hopefully, after this patch I won't get any more compile broken
      emails. :-)
      
      v3:
        changed a couple of 'linux/nmi.h' -> 'asm/nmi.h' to pick-up correct function
        prototypes when CONFIG_HARDLOCKUP_DETECTOR is not set.
      Signed-off-by: NDon Zickus <dzickus@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: fweisbec@gmail.com
      LKML-Reference: <1293044403-14117-1-git-send-email-dzickus@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4a7863cc
  7. 22 12月, 2010 2 次提交
  8. 18 12月, 2010 5 次提交
  9. 17 12月, 2010 2 次提交
  10. 16 12月, 2010 4 次提交
    • P
      perf, x86: Provide a PEBS capable cycle event · 7639dae0
      Peter Zijlstra 提交于
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7639dae0
    • P
      perf: Dynamic pmu types · 2e80a82a
      Peter Zijlstra 提交于
      Extend the perf_pmu_register() interface to allow for named and
      dynamic pmu types.
      
      Because we need to support the existing static types we cannot use
      dynamic types for everything, hence provide a type argument.
      
      If we want to enumerate the PMUs they need a name, provide one.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20101117222056.259707703@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2e80a82a
    • P
      perf, x86: Detect broken BIOSes that corrupt the PMU · 4407204c
      Peter Zijlstra 提交于
      Some BIOSes use PMU resources, which can cause various bugs:
      
       - Non-working or erratic PMU based statistics - the PMU can end up
         counting the wrong thing, resulting in misleading statistics
      
       - Profiling can stop working or it can profile the wrong thing
      
       - A non-working or erratic NMI watchdog that cannot be relied on
      
       - The kernel may disturb whatever thing the BIOS tries to use the
         PMU for - possibly causing hardware malfunction in extreme cases.
      
       - ... and other forms of potential misbehavior
      
      Various forms of such misbehavior has been observed in practice - there are
      BIOSes that just corrupt the PMU state, consequences be damned.
      
      The PMU is a CPU resource that is handled by the kernel and the BIOS
      stealing+corrupting it is not acceptable nor robust, so we detect it,
      warn about it and further refuse to touch the PMU ourselves.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4407204c
    • R
      lguest: populate initial_page_table · da32dac1
      Rusty Russell 提交于
      Two x86 patches broke lguest:
      1) v2.6.35-492-g72d7c3b3, which changed x86 to use the memblock allocator.
      
      In lguest, the host places linear page tables at the top of mem, which
      used to be enough to get us up to the swapper_pg_dir page tables.  With
      the first patch, the direct mapping tables used that memory:
      
      Before: kernel direct mapping tables up to 4000000 @ 7000-1a000
      After: kernel direct mapping tables up to 4000000 @ 3fed000-4000000
      
      I initially fixed this by lying about the amount of memory we had, so
      the kernel wouldn't blatt the lguest boot pagetables (yuk!), but then...
      
      2) v2.6.36-rc8-54-gb40827fa, which made x86 boot use initial_page_table.
      
      This was initialized in a part of head_32.S which isn't executed by
      lguest; it is then copied into swapper_pg_dir.  So we have to initialize
      it; and anyway we switch to it before we blatt the old tables, so that
      fixes the previous damage as well.
      
      For the moment, I cut & pasted the code into lguest's boot code, but
      next merge window I will merge them.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      To: x86@kernel.org
      da32dac1
  11. 14 12月, 2010 5 次提交
  12. 13 12月, 2010 2 次提交
    • T
      x86: HPET: Chose a paranoid safe value for the ETIME check · f1c18071
      Thomas Gleixner 提交于
      commit 995bd3bb (x86: Hpet: Avoid the comparator readback penalty)
      chose 8 HPET cycles as a safe value for the ETIME check, as we had the
      confirmation that the posted write to the comparator register is
      delayed by two HPET clock cycles on Intel chipsets which showed
      readback problems.
      
      After that patch hit mainline we got reports from machines with newer
      AMD chipsets which seem to have an even longer delay. See
      http://thread.gmane.org/gmane.linux.kernel/1054283 and
      http://thread.gmane.org/gmane.linux.kernel/1069458 for further
      information.
      
      Boris tried to come up with an ACPI based selection of the minimum
      HPET cycles, but this failed on a couple of test machines. And of
      course we did not get any useful information from the hardware folks.
      
      For now our only option is to chose a paranoid high and safe value for
      the minimum HPET cycles used by the ETIME check. Adjust the minimum ns
      value for the HPET clockevent accordingly.
      Reported-Bistected-and-Tested-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <alpine.LFD.2.00.1012131222420.2653@localhost6.localdomain6>
      Cc: Simon Kirby <sim@hostway.ca>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andreas Herrmann <Andreas.Herrmann3@amd.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      f1c18071
    • T
      x86: Check tsc available/disabled in the delayed init function · a8760eca
      Thomas Gleixner 提交于
      The delayed TSC init function does not check whether the system has no
      TSC or TSC is disabled at the kernel command line, which results in a
      crash in the work queue based extended calibration due to division by
      zero because the basic calibration never happened.
      
      Add the missing checks and do not touch TSC when not available or
      disabled.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: John Stultz <johnstul@us.ibm.com>
      a8760eca
  13. 10 12月, 2010 6 次提交