1. 19 Aug, 2010 2 commits
    • x86-32: Separate 1:1 pagetables from swapper_pg_dir · fd89a137
      Joerg Roedel committed
      This patch fixes machine crashes which occur when heavily exercising the
      CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by
      AMD Erratum 383 and result in a fatal machine check exception. Here's
      the scenario:
      
      1. On 32-bit, the swapper_pg_dir page table is used as the initial page
      table for booting a secondary CPU.
      
      2. To make this work, swapper_pg_dir needs a direct mapping of physical
      memory in it (the low mappings). By adding those low, large page (2M)
      mappings (PAE kernel), we create the necessary conditions for Erratum
      383 to occur.
      
      3. Other CPUs which do not participate in the off- and onlining game may
      use swapper_pg_dir while the low mappings are present (when leave_mm is
      called). For all steps below, the CPU referred to is a CPU that is using
      swapper_pg_dir, and not the CPU which is being onlined.
      
      4. The presence of the low mappings in swapper_pg_dir can result
      in TLB entries for addresses below __PAGE_OFFSET being established
      speculatively. These TLB entries are marked global and large.
      
      5. When the CPU with such a TLB entry switches to another page table, this
      TLB entry remains because it is global.
      
      6. The process then generates an access to an address covered by the
      above TLB entry but there is a permission mismatch - the TLB entry
      covers a large global page not accessible to userspace.
      
      7. Due to this permission mismatch a new 4kb, user TLB entry gets
      established. Further, Erratum 383 provides for a small window of time
      where both TLB entries are present. This results in an uncorrectable
      machine check exception signalling a TLB multimatch which panics the
      machine.
      
      There are two ways to fix this issue:
      
              1. Always do a global TLB flush when a new cr3 is loaded and the
              old page table was swapper_pg_dir. I consider this a hack that is
              hard to understand and has performance implications.
      
              2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit
              does.
      
      This patch implements solution 2. It introduces a trampoline_pg_dir
      which has the same layout as swapper_pg_dir with low_mappings. This page
      table is used as the initial page table of the booting CPU. Later in the
      bringup process, it switches to swapper_pg_dir and does a global TLB
      flush. This fixes the crashes in our test cases.
      
      -v2: switch to swapper_pg_dir right after entering start_secondary() so
      that we are able to access percpu data which might not be mapped in the
      trampoline page table.
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
      LKML-Reference: <20100816123833.GB28147@aftab>
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
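Steps 4 and 5 of the scenario in the commit above hinge on one property: a global TLB entry is not dropped when a new page table is loaded. The following is a toy user-space model of that property only, not kernel code. The names `toy_tlb`, `toy_load_cr3` and `toy_flush_tlb_all` are made up for illustration, and the model ignores the erratum's timing window entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not kernel code: a tiny TLB whose entries may be global. */
#define TLB_SLOTS 8

struct tlb_entry {
    bool valid;
    bool global;          /* global entries survive a cr3 reload */
    unsigned long vaddr;
};

struct toy_tlb {
    struct tlb_entry slot[TLB_SLOTS];
};

/* Insert an entry into the first free slot; returns false if the TLB is full. */
bool tlb_insert(struct toy_tlb *tlb, unsigned long vaddr, bool global)
{
    for (size_t i = 0; i < TLB_SLOTS; i++) {
        if (!tlb->slot[i].valid) {
            tlb->slot[i] = (struct tlb_entry){ true, global, vaddr };
            return true;
        }
    }
    return false;
}

/* Model of loading a new cr3: only non-global entries are dropped. */
void toy_load_cr3(struct toy_tlb *tlb)
{
    for (size_t i = 0; i < TLB_SLOTS; i++)
        if (!tlb->slot[i].global)
            tlb->slot[i].valid = false;
}

/* Model of a global flush: every entry is dropped. */
void toy_flush_tlb_all(struct toy_tlb *tlb)
{
    for (size_t i = 0; i < TLB_SLOTS; i++)
        tlb->slot[i].valid = false;
}

/* Count the entries that are still cached. */
size_t tlb_count(const struct toy_tlb *tlb)
{
    size_t n = 0;
    for (size_t i = 0; i < TLB_SLOTS; i++)
        if (tlb->slot[i].valid)
            n++;
    return n;
}
```

In this model, a global entry inserted for a low address is still counted after `toy_load_cr3()` and only disappears after `toy_flush_tlb_all()`, which mirrors why the patch pairs the switch to swapper_pg_dir with a global TLB flush.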
    • x86, cpu: Fix regression in AMD errata checking code · 07a7795c
      Hans Rosenfeld committed
      A bug in the family-model-stepping matching code caused the presence of
      errata to go undetected when OSVW was not used. This causes hangs on
      some K8 systems because the E400 workaround is not enabled.
      Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
      LKML-Reference: <1282141190-930137-1-git-send-email-hans.rosenfeld@amd.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
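As a rough illustration of what family-model-stepping matching involves, the sketch below checks whether a CPU falls inside an affected range. It is not the kernel's code: `struct fms_range`, `fms_key` and `fms_in_range` are hypothetical names, and the encoding is chosen here for clarity rather than taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of an affected CPU range (not the kernel's encoding). */
struct fms_range {
    uint8_t family;
    uint8_t model_start, stepping_start;  /* inclusive lower bound */
    uint8_t model_end, stepping_end;      /* inclusive upper bound */
};

/* Combine model and stepping so that ranges compare as one ordered key. */
uint16_t fms_key(uint8_t model, uint8_t stepping)
{
    return (uint16_t)((model << 8) | stepping);
}

/* True if the CPU (family, model, stepping) falls inside the range. */
bool fms_in_range(const struct fms_range *r,
                  uint8_t family, uint8_t model, uint8_t stepping)
{
    uint16_t key = fms_key(model, stepping);

    if (family != r->family)
        return false;
    return key >= fms_key(r->model_start, r->stepping_start) &&
           key <= fms_key(r->model_end, r->stepping_end);
}
```

The point of routing both the CPU and the range bounds through the same `fms_key()` helper is that the comparison only works when both sides pack their fields identically; if they differ, an affected CPU can silently fall outside every range.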
  2. 15 Aug, 2010 2 commits
  3. 14 Aug, 2010 2 commits
  4. 13 Aug, 2010 4 commits
  5. 12 Aug, 2010 2 commits
  6. 11 Aug, 2010 4 commits
  7. 10 Aug, 2010 3 commits
    • x86, ia64, smp: use workqueues unconditionally during do_boot_cpu() · d7a7c573
      Suresh Siddha committed
      Workqueues are now initialized as part of the early_initcall(), so they
      are available for use during the cold boot process as well.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • gcc-4.6: mm: fix unused but set warnings · 4e60c86b
      Andi Kleen committed
      No real bugs, just some dead code and some fixups.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kmap_atomic: make kunmap_atomic() harder to misuse · 597781f3
      Cesar Eduardo Barros committed
      kunmap_atomic() is currently at level -4 on Rusty's "Hard To Misuse"
      list[1] ("Follow common convention and you'll get it wrong"), except in
      some architectures when CONFIG_DEBUG_HIGHMEM is set[2][3].
      
      kunmap() takes a pointer to a struct page; kunmap_atomic(), however,
      takes a pointer to within the page itself.  This seems to once in a while
      trip people up (the convention they are following is the one from
      kunmap()).
      
      Make it much harder to misuse, by moving it to level 9 on Rusty's list[4]
      ("The compiler/linker won't let you get it wrong").  This is done by
      refusing to build if the type of its first argument is a pointer to a
      struct page.
      
      The real kunmap_atomic() is renamed to kunmap_atomic_notypecheck()
      (which is what you would call in case for some strange reason calling it
      with a pointer to a struct page is not incorrect in your code).
      
      The previous version of this patch was compile tested on x86-64.
      
      [1] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html
      [2] In these cases, it is at level 5, "Do it right or it will always
          break at runtime."
      [3] At least mips and powerpc look very similar, and sparc also seems to
          share a common ancestor with both; there seems to be quite some
          degree of copy-and-paste coding here. The include/asm/highmem.h file
          for these three archs mentions x86 CPUs at its top.
      [4] http://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html
      [5] As an aside, could someone tell me why mn10300 uses unsigned long as
          the first parameter of kunmap_atomic() instead of void *?
      Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net>
      Cc: Russell King <linux@arm.linux.org.uk> (arch/arm)
      Cc: Ralf Baechle <ralf@linux-mips.org> (arch/mips)
      Cc: David Howells <dhowells@redhat.com> (arch/frv, arch/mn10300)
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> (arch/mn10300)
      Cc: Kyle McMartin <kyle@mcmartin.ca> (arch/parisc)
      Cc: Helge Deller <deller@gmx.de> (arch/parisc)
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> (arch/parisc)
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> (arch/powerpc)
      Cc: Paul Mackerras <paulus@samba.org> (arch/powerpc)
      Cc: "David S. Miller" <davem@davemloft.net> (arch/sparc)
      Cc: Thomas Gleixner <tglx@linutronix.de> (arch/x86)
      Cc: Ingo Molnar <mingo@redhat.com> (arch/x86)
      Cc: "H. Peter Anvin" <hpa@zytor.com> (arch/x86)
      Cc: Arnd Bergmann <arnd@arndb.de> (include/asm-generic)
      Cc: Rusty Russell <rusty@rustcorp.com.au> ("Hard To Misuse" list)
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
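The compile-time check described in the kunmap_atomic commit above can be sketched in user space roughly as follows. This is a simplified illustration, not the patch itself: the `toy_` names, the single-argument signature and the stand-in `struct page` are assumptions made here, and the type test relies on the GCC/Clang builtin `__builtin_types_compatible_p`.

```c
#include <assert.h>

struct page { int dummy; };   /* stand-in for the kernel's struct page */

/* Nonzero if expr has the type struct page * (GCC/Clang builtin). */
#define IS_PAGE_PTR(expr) \
    __builtin_types_compatible_p(__typeof__(expr), struct page *)

/* Fails the build when cond is true: a negative array size is illegal. */
#define TOY_BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/* Hypothetical unchecked variant; here it only records the last address. */
void *toy_last_unmapped;

void toy_kunmap_atomic_notypecheck(void *addr)
{
    toy_last_unmapped = addr;
}

/* Checked wrapper: refuses to compile when handed a struct page *. */
#define toy_kunmap_atomic(addr) do {                  \
        TOY_BUILD_BUG_ON(IS_PAGE_PTR(addr));          \
        toy_kunmap_atomic_notypecheck(addr);          \
    } while (0)
```

Calling `toy_kunmap_atomic()` with a `void *` or `char *` compiles normally, while passing a `struct page *` makes the array size negative and stops the build, which is the "compiler won't let you get it wrong" behaviour the commit message aims for.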
  8. 09 Aug, 2010 2 commits
  9. 08 Aug, 2010 1 commit
  10. 07 Aug, 2010 1 commit
  11. 06 Aug, 2010 1 commit
    • x86, apic: Map the local apic when parsing the MP table. · 5989cd6a
      Eric W. Biederman committed
      This fixes a regression in 2.6.35 from 2.6.34, that is
      present for select models of Intel cpus when people are
      using an MP table.
      
      The commit cf7500c0
      "x86, ioapic: In mpparse use mp_register_ioapic" started
      calling mp_register_ioapic from MP_ioapic_info.  An extremely
      simple change that was obviously correct.  Unfortunately
      mp_register_ioapic did just a little more than the previous
      hand crafted code and so we gained this call path.
      
      The problem call path is:
      MP_ioapic_info()
        mp_register_ioapic()
         io_apic_unique_id()
           io_apic_get_unique_id()
             get_physical_broadcast()
               modern_apic()
                 lapic_get_version()
                   apic_read(APIC_LVR)
      
      Which turned out to be a problem because the local apic
      was not mapped, at that point, unlike the similar point
      in the ACPI parsing code.
      
      This problem is fixed by mapping the local apic when
      parsing the mptable as soon as we reasonably can.
      
      Looking at the number of places we set up the fixmap for
      the local apic, I see some serious simplification opportunities.
      For the moment, except for not duplicating the setting up of the
      fixmap in init_apic_mappings, I have not acted on them.
      
      The regression from 2.6.34 is tracked in bug
      https://bugzilla.kernel.org/show_bug.cgi?id=16173
      
      Cc: <stable@kernel.org> 2.6.35
      Reported-by: David Hill <hilld@binarystorm.net>
      Reported-by: Tvrtko Ursulin <tvrtko.ursulin@sophos.com>
      Tested-by: Tvrtko Ursulin <tvrtko.ursulin@sophos.com>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      LKML-Reference: <m1eiee86jg.fsf_-_@fess.ebiederm.org>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  12. 05 Aug, 2010 8 commits
  13. 04 Aug, 2010 8 commits
    • x86, hwmon: Package Level Thermal/Power: power limit · 0199114c
      Fenghua Yu committed
      The power limit notification feature is documented in the Intel 64 and IA-32
      Architectures SDM Vol 3A, section 14.5.6, Power Limit Notification.
      
      It is first implemented on the Intel Sandy Bridge platform.
      
      The patch handles the notification interrupt. The interrupt handler dumps power
      limit information in log_buf, logs the event in the MCE log, and increments the
      event counters (core_power_limit and package_power_limit). Upper level
      applications could use the data to detect system health or diagnose
      functionality/performance issues.
      
      In the future, the event could be handled in a more sophisticated way.
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      LKML-Reference: <1280448826-12004-5-git-send-email-fenghua.yu@intel.com>
      Reviewed-by: Len Brown <len.brown@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, hwmon: Package Level Thermal/Power: thermal throttling handler · 55d435a2
      Fenghua Yu committed
      Add package level thermal throttle interrupt support. The interrupt handler
      increments the package level thermal throttle count. It also logs the event
      in the MCE log.
      
      The package level thermal throttle interrupt occurs on every thread in a
      package. Each thread handles the interrupt individually. A user level
      application is expected to derive the correct event count and log based on
      the package/thread topology. The same applies to the core level interrupt
      handler. In the future, the interrupt may be reported only per package or
      per core.
      
      core_throttle_count and package_throttle_count are used for the user interface.
      Previously only throttle_count was used for the core throttle count. If you think
      the new core_throttle_count name breaks the user interface, I can change this part.
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      LKML-Reference: <1280448826-12004-4-git-send-email-fenghua.yu@intel.com>
      Reviewed-by: Len Brown <len.brown@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
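Because each thread in a package handles the package level interrupt itself, a user level tool that simply sums the per-thread counters would count the same event several times. One possible way to derive a per-package figure is sketched below. The `struct thread_sample` layout and `package_events` function are hypothetical, and the sketch assumes every thread in a package counts the same package level events, which the commit message implies but does not guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-thread sample, as a user level tool might collect it. */
struct thread_sample {
    int package_id;                        /* physical package of this thread */
    unsigned long package_throttle_count;  /* counter reported by this thread */
};

/*
 * Report one count for the given package instead of summing over threads.
 * Assumption: all threads in a package see the same package level events,
 * so the largest per-thread value is taken as the package's count.
 */
unsigned long package_events(const struct thread_sample *s, size_t n,
                             int package_id)
{
    unsigned long max = 0;

    for (size_t i = 0; i < n; i++)
        if (s[i].package_id == package_id &&
            s[i].package_throttle_count > max)
            max = s[i].package_throttle_count;
    return max;
}
```

Taking the maximum rather than the sum is a design choice for this sketch; a tool that trusts a single designated thread per package could read that thread's counter directly instead.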
    • x86, hwmon: Package Level Thermal/Power: pkgtemp hwmon driver · cb84b194
      Fenghua Yu committed
      This patch adds a hwmon driver for package level thermal control. The driver
      exposes package level thermal information through the sysfs interface so that
      upper level applications (e.g. lm-sensors) can retrieve the information.
      
      Instead of putting the package level hwmon code in coretemp, I wrote a separate
      driver, pkgtemp, because:
      
      First, package level thermal sensors include not only sensors for each core,
      but also sensors for the uncore, memory controller or other components in the
      package. Logically it is clearer to have a separate hwmon driver for package
      level hwmon that monitors a wider range of sensors in a package. Merging the
      package thermal driver into the core thermal driver doesn't make sense and may
      mislead.
      
      Secondly, merging the two drivers together may cause a coding mess. It's easier
      to include various package level sensors info if more sensor information is
      implemented. Coretemp code needs to consider a lot of legacy machine cases.
      Pkgtemp code only considers platforms starting from Sandy Bridge.
      
      On a 1Sx4Cx2T Sandy Bridge platform, lm-sensors dumps the pkgtemp and coretemp:
      
      pkgtemp-isa-0000
      Adapter: ISA adapter
      physical id 0: +33.0°C  (high = +79.0°C, crit = +99.0°C)
      
      coretemp-isa-0000
      Adapter: ISA adapter
      Core 0:      +32.0°C  (high = +79.0°C, crit = +99.0°C)
      
      coretemp-isa-0001
      Adapter: ISA adapter
      Core 1:      +32.0°C  (high = +79.0°C, crit = +99.0°C)
      
      coretemp-isa-0002
      Adapter: ISA adapter
      Core 2:      +32.0°C  (high = +79.0°C, crit = +99.0°C)
      
      coretemp-isa-0003
      Adapter: ISA adapter
      Core 3:      +32.0°C  (high = +79.0°C, crit = +99.0°C)
      
      [ hpa: folded v3 patch removing improper global variable "SHOW" ]
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      LKML-Reference: <1280448826-12004-3-git-send-email-fenghua.yu@intel.com>
      Reviewed-by: Len Brown <len.brown@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
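hwmon temperature attributes in sysfs report millidegrees Celsius, so a reader such as lm-sensors has to scale the raw value before printing figures like the ones in the pkgtemp commit above. A minimal sketch of that conversion follows. The helper names `parse_millideg` and `format_celsius` are made up here, and the commit message does not say which attribute files pkgtemp exposes, so no file path is assumed.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse a sysfs-style temperature string in millidegrees Celsius,
 * e.g. "33000\n". Returns 0 on success, -1 if no number was found.
 */
int parse_millideg(const char *text, long *millideg)
{
    char *end;
    long v = strtol(text, &end, 10);

    if (end == text)
        return -1;
    *millideg = v;
    return 0;
}

/* Format a value in the style of the sensors output, e.g. "+33.0°C". */
int format_celsius(long millideg, char *buf, size_t len)
{
    return snprintf(buf, len, "%+.1f°C", millideg / 1000.0);
}
```

A caller would read the attribute file into a buffer, pass it to `parse_millideg()`, and hand the result to `format_celsius()` for display.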
    • [CPUFREQ] Remove pointless printk from p4-clockmod. · 9d1f44ee
      Dave Jones committed
      The only machines this is triggering on should be supported by
      acpi-cpufreq or acpi's internal throttling.
      Signed-off-by: Dave Jones <davej@redhat.com>
    • [CPUFREQ] Fix section mismatch for powernow_cpu_init in powernow-k7.c · 307069cf
      Holger Freyther committed
      Use __cpuinit instead of __init for the cpufreq_driver
      init function like it is done in powernow-k8.c.
      
      This removes the warning generated when compiling with
      the CONFIG_DEBUG_SECTION_MISMATCH=y option.
      Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
      Signed-off-by: Dave Jones <davej@redhat.com>
    • [CPUFREQ] Fix section mismatch for longhaul_cpu_init. · 2530573e
      Holger Freyther committed
      Use __cpuinit instead of __init for the cpufreq_driver
      init function like it is done in powernow-k8.c. Use
      __cpuinitdata for data used by the routines marked as __cpuinit.
      
      This removes the warning generated when compiling with
      the CONFIG_DEBUG_SECTION_MISMATCH=y option.
      Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
      Signed-off-by: Dave Jones <davej@redhat.com>
    • [CPUFREQ] Fix section mismatch for longrun_cpu_init. · 7e2d8112
      Holger Freyther committed
      Use __cpuinit instead of __init for the cpufreq_driver
      init function like it is done in powernow-k8.c.
      
      This removes the warning generated when compiling with
      the CONFIG_DEBUG_SECTION_MISMATCH=y option.
      Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
      Signed-off-by: Dave Jones <davej@redhat.com>
    • [CPUFREQ] powernow-k8: Fix misleading variable naming · b30d3304
      Borislav Petkov committed
      rdmsr() takes the lower 32 bits as a second argument and the high 32 as
      a third. Fix the names accordingly since they were swapped.
      
      There should be no functionality change resulting from this patch.
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
      Signed-off-by: Dave Jones <davej@redhat.com>
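The argument order described in the powernow-k8 commit above (low 32 bits second, high 32 bits third) can be illustrated with a user-space stand-in. `toy_rdmsr` and `toy_msr_join` are hypothetical helpers: a real rdmsr() reads a hardware register, whereas this sketch merely splits a value passed in by the caller.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy stand-in for rdmsr(msr, lo, hi): the low half goes to the second
 * argument and the high half to the third, as the commit message states.
 */
void toy_rdmsr(uint64_t msr_value, uint32_t *lo, uint32_t *hi)
{
    *lo = (uint32_t)(msr_value & 0xffffffffu);
    *hi = (uint32_t)(msr_value >> 32);
}

/* Reassemble the two halves into the original 64-bit value. */
uint64_t toy_msr_join(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}
```

Naming the output variables `lo` and `hi` in the same order as the parameters keeps the call site readable, which is the kind of confusion the renaming in the commit addresses.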