1. 21 Feb, 2009 1 commit
    • H
      x86, mce: remove incorrect __cpuinit for mce_cpu_features() · cc3ca220
      H. Peter Anvin committed
      Impact: Bug fix on UP
      
      Checkin 6ec68bff:
          x86, mce: reinitialize per cpu features on resume
      
      introduced a call to mce_cpu_features() in the resume path, in order
      for the MCE machinery to get properly reinitialized after a resume.
      However, this function (and its successors) was flagged __cpuinit,
      which becomes __init on UP configurations (on SMP, suspend/resume
      requires CPU hotplug, so this would not be seen).
      
      Remove the offending __cpuinit annotations for mce_cpu_features() and
      its successor functions.
      
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      cc3ca220
  2. 20 Feb, 2009 2 commits
    • I
      x86: use the right protections for split-up pagetables · 07a66d7c
      Ingo Molnar committed
      Steven Rostedt found a bug where, in his modified kernel,
      ftrace was unable to modify the kernel text, due to the PMD
      itself having been marked read-only as well in
      split_large_page().
      
      The fix, suggested by Linus, is to not try to 'clone' the
      reference protection of a huge-page, but to use the standard
      (and permissive) page protection bits of KERNPG_TABLE.
      
      The 'cloning' makes sense for the ptes but it's a confused and
      incorrect concept at the page table level - because the
      pagetable entry is a set of all ptes and hence cannot
      'clone' any single protection attribute - the ptes can be any
      mixture of protections.
      
      With the permissive KERNPG_TABLE, even if the pte protections
      get changed after this point (due to ftrace doing code-patching
      or other similar activities like kprobes), the resulting combined
      protections will still be correct and the pte's restrictive
      (or permissive) protections will control it.
      
      Also update the comment.
      
      This bug was there for a long time but has not caused visible
      problems before as it needs a rather large read-only area to
      trigger. Steve possibly hacked his kernel with some really
      large arrays or so. Anyway, the bug is definitely worth fixing.
      
      [ Huang Ying also experienced problems in this area when writing
        the EFI code, but the real bug in split_large_page() was not
        realized back then. ]
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Reported-by: Huang Ying <ying.huang@intel.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      07a66d7c
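Why a permissive upper-level entry is safe can be shown with a small user-space model. On x86 a write is allowed only if every level of the page-table walk allows it, so the effective permission is the AND of the per-level writable bits. The names below (`struct walk_prot`, `effective_writable`) are hypothetical and only model the rule; they are not kernel code.

```c
#include <stdbool.h>

/* Hypothetical model: one writable flag per page-table level. */
struct walk_prot {
	bool pmd_writable;	/* upper-level entry pointing at the pte page */
	bool pte_writable;	/* leaf entry for the individual 4K page */
};

/* A write succeeds only if every level of the walk allows it. */
static bool effective_writable(struct walk_prot w)
{
	return w.pmd_writable && w.pte_writable;
}
```

With a read-only upper-level entry, no pte below it can ever be made writable, which matches the ftrace symptom described above. With a permissive upper-level entry, each pte alone decides the outcome.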
    • A
      x86, vmi: TSC going backwards check in vmi clocksource · 48ffc70b
      Alok N Kataria committed
      Impact: fix time warps under vmware
      
      Similar to the check for TSC going backwards in the TSC clocksource,
      we also need this check for VMI clocksource.
      Signed-off-by: Alok N Kataria <akataria@vmware.com>
      Cc: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: stable@kernel.org
      48ffc70b
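A "going backwards" check of this kind usually amounts to clamping each raw reading to the last value handed out. The sketch below is a user-space illustration under that assumption; `cycle_last` and `read_monotonic` are hypothetical names, and updating `cycle_last` inside the read function is a simplification made here, not a description of the VMI clocksource.

```c
#include <stdint.h>

/* Hypothetical stand-in for the last cycle value seen by the clocksource. */
static uint64_t cycle_last;

/*
 * Return the raw counter value unless it is older than the last value
 * returned, in which case return the last value again so that time
 * never appears to run backwards.
 */
static uint64_t read_monotonic(uint64_t raw)
{
	uint64_t ret = raw >= cycle_last ? raw : cycle_last;

	cycle_last = ret;	/* simplification: remember what we returned */
	return ret;
}
```

A reading that is lower than its predecessor is reported as the predecessor, so callers see a stalled clock for a moment rather than a time warp.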
  3. 19 Feb, 2009 1 commit
    • K
      mm: clean up for early_pfn_to_nid() · f2dbcfa7
      KAMEZAWA Hiroyuki committed
      What's happening is that the assertion in mm/page_alloc.c:move_freepages()
      is triggering:
      
      	BUG_ON(page_zone(start_page) != page_zone(end_page));
      
      Once I knew this is what was happening, I added some annotations:
      
      	if (unlikely(page_zone(start_page) != page_zone(end_page))) {
      		printk(KERN_ERR "move_freepages: Bogus zones: "
      		       "start_page[%p] end_page[%p] zone[%p]\n",
      		       start_page, end_page, zone);
      		printk(KERN_ERR "move_freepages: "
      		       "start_zone[%p] end_zone[%p]\n",
      		       page_zone(start_page), page_zone(end_page));
      		printk(KERN_ERR "move_freepages: "
      		       "start_pfn[0x%lx] end_pfn[0x%lx]\n",
      		       page_to_pfn(start_page), page_to_pfn(end_page));
      		printk(KERN_ERR "move_freepages: "
      		       "start_nid[%d] end_nid[%d]\n",
      		       page_to_nid(start_page), page_to_nid(end_page));
       ...
      
      And here's what I got:
      
      	move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00]
      	move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00]
      	move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
      	move_freepages: start_nid[1] end_nid[0]
      
      My memory layout on this box is:
      
      [    0.000000] Zone PFN ranges:
      [    0.000000]   Normal   0x00000000 -> 0x0081ff5d
      [    0.000000] Movable zone start PFN for each node
      [    0.000000] early_node_map[8] active PFN ranges
      [    0.000000]     0: 0x00000000 -> 0x00020000
      [    0.000000]     1: 0x00800000 -> 0x0081f7ff
      [    0.000000]     1: 0x0081f800 -> 0x0081fe50
      [    0.000000]     1: 0x0081fed1 -> 0x0081fed8
      [    0.000000]     1: 0x0081feda -> 0x0081fedb
      [    0.000000]     1: 0x0081fedd -> 0x0081fee5
      [    0.000000]     1: 0x0081fee7 -> 0x0081ff51
      [    0.000000]     1: 0x0081ff59 -> 0x0081ff5d
      
      So it's a block move in that 0x81f600-->0x81f7ff region which triggers
      the problem.
      
      This patch:
      
      The declarations of early_pfn_to_nid() are scattered over per-arch
      include files, and it is hard to tell which declaration is in use.
      I think this makes the memmap-init fix harder than it needs to be.
      
      This patch moves all declarations to include/linux/mm.h.
      
      After this,
        if !CONFIG_NODES_POPULATES_NODE_MAP && !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
           -> Use static definition in include/linux/mm.h
        else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
           -> Use generic definition in mm/page_alloc.c
        else
           -> per-arch back end function will be called.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reported-by: David Miller <davem@davemlloft.net>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f2dbcfa7
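The reported PFNs can be checked against the boot-time ranges with a small lookup. This is a user-space sketch only: `boot_ranges` and `lookup_nid` are hypothetical names, the table copies the ranges from the log above, and it assumes each range is half-open (the end PFN is excluded). How the kernel treats a PFN that falls in no range is not modelled; the sketch returns -1.

```c
#include <stddef.h>

struct pfn_range {
	int nid;
	unsigned long start_pfn;	/* first PFN in the range */
	unsigned long end_pfn;		/* assumed exclusive */
};

/* Ranges copied from the early_node_map[] boot log quoted above. */
static const struct pfn_range boot_ranges[] = {
	{ 0, 0x00000000, 0x00020000 },
	{ 1, 0x00800000, 0x0081f7ff },
	{ 1, 0x0081f800, 0x0081fe50 },
	{ 1, 0x0081fed1, 0x0081fed8 },
	{ 1, 0x0081feda, 0x0081fedb },
	{ 1, 0x0081fedd, 0x0081fee5 },
	{ 1, 0x0081fee7, 0x0081ff51 },
	{ 1, 0x0081ff59, 0x0081ff5d },
};

/* Return the node id owning pfn, or -1 if pfn lies in a hole. */
static int lookup_nid(unsigned long pfn)
{
	size_t i;

	for (i = 0; i < sizeof(boot_ranges) / sizeof(boot_ranges[0]); i++) {
		if (pfn >= boot_ranges[i].start_pfn &&
		    pfn < boot_ranges[i].end_pfn)
			return boot_ranges[i].nid;
	}
	return -1;
}
```

Under the half-open assumption, the start of the reported block (0x81f600) belongs to node 1, while its end (0x81f7ff) falls in the one-page gap between the second and third ranges.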
  4. 18 Feb, 2009 4 commits
    • A
      x86, mce: fix ifdef for 64bit thermal apic vector clear on shutdown · 07db1c14
      Andi Kleen committed
      Impact: Bugfix
      
      The ifdef for the apic clear on shutdown for the 64bit intel thermal
      vector was incorrect and never triggered. Fix that.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      07db1c14
    • A
      x86, mce: use force_sig_info to kill process in machine check · 380851bc
      Andi Kleen committed
      Impact: bug fix (with tolerant == 3)
      
      do_exit cannot be called directly from the exception handler because
      it can sleep and the exception handler runs on the exception stack.
      Use force_sig() instead.
      
      Based on an earlier patch by Ying Huang, who debugged the problem.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      380851bc
    • A
      x86, mce: reinitialize per cpu features on resume · 6ec68bff
      Andi Kleen committed
      Impact: Bug fix
      
      This fixes a long standing bug in the machine check code. On resume the
      boot CPU wouldn't get its vendor specific state like thermal handling
      reinitialized. This means the boot cpu wouldn't ever get any thermal
      events reported again.
      
      Call the respective initialization functions on resume.
      
      v2: Remove ancient init because they don't have a resume device anyways.
          Pointed out by Thomas Gleixner.
      v3: Now fix the Subject too to reflect v2 change
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      6ec68bff
    • P
      x86, rcu: fix strange load average and ksoftirqd behavior · bf51935f
      Paul E. McKenney committed
      Damien Wyart reported high ksoftirqd CPU usage (20%) on an
      otherwise idle system.
      
      The function-graph trace Damien provided:
      
      >   799.521187 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.521371 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.521555 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.521738 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.521934 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.522068 |   1)  ksoftir-2324  |               |                rcu_check_callbacks() {
      >   799.522208 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.522392 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.522575 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.522759 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.522956 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.523074 |   1)  ksoftir-2324  |               |                  rcu_check_callbacks() {
      >   799.523214 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.523397 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.523579 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.523762 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.523960 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.524079 |   1)  ksoftir-2324  |               |                  rcu_check_callbacks() {
      >   799.524220 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.524403 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.524587 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.524770 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      > [ . . . ]
      
      Shows rcu_check_callbacks() being invoked way too often. It should be called
      once per jiffy, and here it is called no less than 22 times in about
      3.5 milliseconds, meaning one call every 160 microseconds or so.
      
      Why do we need to call rcu_pending() and rcu_check_callbacks() from the
      idle loop of 32-bit x86, especially given that no other architecture does
      this?
      
      The following patch removes the call to rcu_pending() and
      rcu_check_callbacks() from the x86 32-bit idle loop in order to
      reduce the softirq load on idle systems.
      Reported-by: Damien Wyart <damien.wyart@free.fr>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      bf51935f
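The call-rate estimate in the message can be reproduced from the first and last timestamps of the quoted trace. The helper below is a hypothetical illustration (`avg_interval_us` is not a kernel function); it divides the elapsed time by the number of calls, the same way the message does.

```c
/*
 * Average spacing, in microseconds, of `calls` events observed between
 * timestamps first_s and last_s (both in seconds).
 */
static double avg_interval_us(double first_s, double last_s, int calls)
{
	return (last_s - first_s) * 1e6 / calls;
}
```

Calling `avg_interval_us(799.521187, 799.524770, 22)` gives a value of roughly 160 microseconds, in line with the figure quoted above and far shorter than one jiffy.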
  5. 16 Feb, 2009 2 commits
    • R
      cpumask: fix powernow-k8: partial revert of 2fdf66b4 · a0abd520
      Rusty Russell committed
      Impact: fix powernow-k8 when acpi=off (or other error).
      
      There was a spurious change introduced into powernow-k8 in this patch:
      so that we try to "restore" the cpus_allowed we never saved.  We revert
      that file.
      
      See lkml "[PATCH] x86/powernow: fix cpus_allowed brokage when
      acpi=off" from Yinghai for the bug report.
      
      Cc: Mike Travis <travis@sgi.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      a0abd520
    • P
      trace: mmiotrace to the tracer menu in Kconfig · 6bc5c366
      Pekka Paalanen committed
      Impact: cosmetic change in Kconfig menu layout
      
      This patch was originally suggested by Peter Zijlstra, but it seems
      to have been forgotten.
      
      CONFIG_MMIOTRACE and CONFIG_MMIOTRACE_TEST were selectable
      directly under the Kernel hacking / debugging menu in the kernel
      configuration system. They were present only for x86 and x86_64.
      
      Other tracers that use the ftrace tracing framework are in their own
      sub-menu. This patch moves the mmiotrace configuration options there.
      Since the Kconfig file, where the tracer menu is, is not architecture
      specific, HAVE_MMIOTRACE_SUPPORT is introduced and provided only by
      x86/x86_64. CONFIG_MMIOTRACE now depends on it.
      Signed-off-by: Pekka Paalanen <pq@iki.fi>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      6bc5c366
  6. 15 Feb, 2009 10 commits
  7. 13 Feb, 2009 4 commits
    • J
      x86, hpet: fix for LS21 + HPET = boot hang · b13e2464
      john stultz committed
      Between 2.6.23 and 2.6.24-rc1 a change was made that broke IBM LS21
      systems that had the HPET enabled in the BIOS, resulting in boot hangs
      for x86_64.
      
      Specifically commit b8ce3359, which
      merges the i386 and x86_64 HPET code.
      
      Prior to this commit, when we set up the HPET timers in x86_64, we did
      the following:
      
      	hpet_writel(HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL |
                          HPET_TN_32BIT, HPET_T0_CFG);
      
      However after the i386/x86_64 HPET merge, we do the following:
      
      	cfg = hpet_readl(HPET_Tn_CFG(timer));
      	cfg |= HPET_TN_ENABLE | HPET_TN_PERIODIC |
      			HPET_TN_SETVAL | HPET_TN_32BIT;
      	hpet_writel(cfg, HPET_Tn_CFG(timer));
      
      However on LS21s with HPET enabled in the BIOS, the HPET_T0_CFG register
      boots with Level triggered interrupts (HPET_TN_LEVEL) enabled. This
      causes the periodic interrupt to be not so periodic, and that results in
      the boot time hang I reported earlier in the delay calibration.
      
      My fix: Always disable HPET_TN_LEVEL when setting up periodic mode.
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b13e2464
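The fix described above can be sketched as a pure function on the configuration word: clear the level-trigger bit first, then OR in the periodic-mode bits, so that whatever the BIOS left in the register cannot leak into periodic mode. This is an illustration, not the patch itself; `periodic_cfg` is a hypothetical helper, and the bit values are assumptions that should be checked against the kernel's HPET header.

```c
#include <stdint.h>

/* Bit values are assumptions for illustration; verify against asm/hpet.h. */
#define HPET_TN_LEVEL		0x0002u
#define HPET_TN_ENABLE		0x0004u
#define HPET_TN_PERIODIC	0x0008u
#define HPET_TN_SETVAL		0x0040u
#define HPET_TN_32BIT		0x0100u

/*
 * Given the current timer configuration word, return the value to write
 * back for periodic mode: level triggering is always cleared, other bits
 * that were already set are preserved.
 */
static uint32_t periodic_cfg(uint32_t cfg)
{
	cfg &= ~HPET_TN_LEVEL;
	cfg |= HPET_TN_ENABLE | HPET_TN_PERIODIC |
	       HPET_TN_SETVAL | HPET_TN_32BIT;
	return cfg;
}
```

The read-modify-write style of the merged code is kept, which preserves unrelated bits, while the one bit known to break periodic delivery on the LS21 is masked out explicitly.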
    • T
      x86: CPA avoid repeated lazy mmu flush · 7ad9de6a
      Thomas Gleixner committed
      Impact: Flush the lazy MMU only once
      
      Pending mmu updates only need to be flushed once to bring the
      in-memory pagetable state up to date.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      7ad9de6a
    • T
      x86: warn if arch_flush_lazy_mmu_cpu is called in preemptible context · 34b0900d
      Thomas Gleixner committed
      Impact: Catch cases where lazy MMU state is active in a preemptible context
      
      arch_flush_lazy_mmu_cpu() has been changed to disable preemption so
      the checks in enter/leave will never trigger. Put the preemptible()
      check into arch_flush_lazy_mmu_cpu() to catch such cases.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      34b0900d
    • J
      x86/paravirt: make arch_flush_lazy_mmu/cpu disable preemption · d85cf93d
      Jeremy Fitzhardinge committed
      Impact: avoid access to percpu vars in preemptible context
      
      They are intended to be used whenever there's the possibility
      that there's some stale state which is going to be overwritten
      with a queued update, or to force a state change when we may be
      in lazy mode.  Either way, we could end up calling it with
      preemption enabled, so wrap the functions in their own little
      preempt-disable section so they can be safely called in any
      context (though preemption should never be enabled if we're actually
      in a lazy state).
      
      (Move out of line to avoid #include dependencies.)
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      d85cf93d
  8. 12 Feb, 2009 2 commits
  9. 11 Feb, 2009 3 commits
  10. 10 Feb, 2009 3 commits
  11. 09 Feb, 2009 5 commits
  12. 07 Feb, 2009 1 commit
    • R
      x86-64: fix int $0x80 -ENOSYS return · c09249f8
      Roland McGrath committed
      One of my past fixes to this code introduced a different new bug.
      When using 32-bit "int $0x80" entry for a bogus syscall number,
      the return value is not correctly set to -ENOSYS.  This only happens
      when neither syscall-audit nor syscall tracing is enabled (i.e., never
      seen if auditd ever started).  Test program:
      
      	/* gcc -o int80-badsys -m32 -g int80-badsys.c
      	   Run on x86-64 kernel.
      	   Note to reproduce the bug you need auditd never to have started.  */
      
      	#include <errno.h>
      	#include <stdio.h>
      
      	int
      	main (void)
      	{
      	  long res;
      	  asm ("int $0x80" : "=a" (res) : "0" (99999));
      	  printf ("bad syscall returns %ld\n", res);
      	  return res != -ENOSYS;
      	}
      
      The fix makes the int $0x80 path match the sysenter and syscall paths.
      Reported-by: Dmitry V. Levin <ldv@altlinux.org>
      Signed-off-by: Roland McGrath <roland@redhat.com>
      c09249f8
  13. 06 Feb, 2009 2 commits