1. 08 Oct 2010: 1 commit
  2. 24 Sep 2010: 2 commits
  3. 15 Sep 2010: 4 commits
    • x86-64, compat: Retruncate rax after ia32 syscall entry tracing · eefdca04
      Roland McGrath authored
      In commit d4d67150, we reopened an old hole for a 64-bit ptracer touching a
      32-bit tracee in system call entry.  A %rax value set via ptrace at the
      entry tracing stop gets used whole as a 32-bit syscall number, while we
      only check the low 32 bits for validity.
      
      Fix it by truncating %rax back to 32 bits after syscall_trace_enter,
      in addition to testing the full 64 bits as has already been added.
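
      A minimal C-level sketch of the idea (illustrative only; the real fix
      lives in arch/x86/ia32/ia32entry.S and relies on a 32-bit move
      zero-extending %rax, and the helper name below is made up):

        /*
         * After returning from syscall_trace_enter() at the entry tracing
         * stop, a 64-bit ptracer may have stored a full 64-bit value in the
         * tracee's %rax.  Truncate it back to 32 bits before it is
         * range-checked and used to index the 32-bit syscall table.
         */
        static unsigned long ia32_sanitize_syscall_nr(struct pt_regs *regs)
        {
                regs->orig_ax &= 0xffffffffUL;  /* what "movl %eax,%eax" achieves */
                return regs->orig_ax;           /* caller still checks < IA32_NR_syscalls */
        }
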
      Reported-by: Ben Hawkes <hawkes@sota.gen.nz>
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86-64, compat: Test %rax for the syscall number, not %eax · 36d001c7
      H. Peter Anvin authored
      On 64 bits, we always, by necessity, jump through the system call
      table via %rax.  For 32-bit system calls, in theory the system call
      number is stored in %eax, and the code was testing %eax for a valid
      system call number.  At one point we loaded the stored value back from
      the stack to enforce zero-extension, but that was removed in checkin
      d4d67150.  An actual 32-bit process
      will not be able to introduce a non-zero-extended number, but it can
      happen via ptrace.
      
      Instead of re-introducing the zero-extension, test what we are
      actually going to use, i.e. %rax.  This only adds a handful of REX
      prefixes to the code.
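
      In C terms the change amounts to validating the value that is actually
      used for the indirect jump (a sketch; the real patch swaps 32-bit
      compares on %eax for 64-bit compares on %rax in ia32entry.S, and the
      helper below is hypothetical):

        static int ia32_syscall_nr_valid(struct pt_regs *regs)
        {
                /* Check the full 64-bit register, because the jump through
                 * the syscall table uses all of %rax, not just the low half. */
                return regs->orig_ax < IA32_NR_syscalls;
        }
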
      Reported-by: Ben Hawkes <hawkes@sota.gen.nz>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: <stable@kernel.org>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
    • compat: Make compat_alloc_user_space() incorporate the access_ok() · c41d68a5
      H. Peter Anvin authored
      compat_alloc_user_space() expects the caller to independently call
      access_ok() to verify the returned area.  A missing call could
      introduce problems on some architectures.
      
      This patch incorporates the access_ok() check into
      compat_alloc_user_space() and also adds a sanity check on the length.
      The existing compat_alloc_user_space() implementations are renamed
      arch_compat_alloc_user_space() and are used as part of the
      implementation of the new global function.
      
      This patch assumes NULL will cause __get_user()/__put_user() to either
      fail or access userspace on all architectures.  This should be
      followed by checking the return value of compat_alloc_user_space()
      for NULL in the callers, at which time the access_ok() in the callers
      can also be removed.
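
      A minimal sketch of the new wrapper, assuming the 2.6.36-era
      access_ok(VERIFY_WRITE, ...) calling convention; the exact length bound
      used for the sanity check is illustrative:

        void __user *compat_alloc_user_space(unsigned long len)
        {
                void __user *ptr;

                /* Sanity check on the length: reject absurd requests. */
                if (unlikely(len > (((compat_uptr_t)~0) >> 1)))
                        return NULL;

                ptr = arch_compat_alloc_user_space(len);

                /* Fold in the access_ok() check so callers cannot forget it. */
                if (unlikely(!access_ok(VERIFY_WRITE, ptr, len)))
                        return NULL;

                return ptr;
        }
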
      Reported-by: Ben Hawkes <hawkes@sota.gen.nz>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Chris Metcalf <cmetcalf@tilera.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Tony Luck <tony.luck@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: James Bottomley <jejb@parisc-linux.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: <stable@kernel.org>
    • x86: hpet: Work around hardware stupidity · 54ff7e59
      Thomas Gleixner authored
      This more or less reverts commits 08be9796 (x86: Force HPET
      readback_cmp for all ATI chipsets) and 30a564be (x86, hpet: Restrict
      read back to affected ATI chipsets) to the status of commit 8da854cb
      (x86, hpet: Erratum workaround for read after write of HPET
      comparator).
      
      The delta to commit 8da854cb is mostly comments and the change from
      WARN_ONCE to printk_once as we know the call path of this function
      already.
      
      This really needs an in-depth explanation:
      
      First of all the HPET design is a complete failure. Having a counter
      compare register which generates an interrupt on matching values
      forces the software to do at least one superfluous readback of the
      counter register.
      
      While it is nice in theory to program "absolute" time events it is
      practically useless because the timer runs at some absurd frequency
      which can never be matched to real world units. So we are forced to
      calculate a relative delta and this forces a readout of the actual
      counter value, adding the delta and programming the compare
      register. When the delta is small enough we run into the danger that
      we program a compare value which is already in the past. Due to the
      compare for equal nature of HPET we need to read back the counter
      value after writing the compare register (btw. this is necessary for
      absolute timeouts as well) to make sure that we did not miss the timer
      event. We try to work around that by setting the minimum delta to a
      value which is larger than the theoretical time which elapses between
      the counter readout and the compare register write, but that's only
      true in theory. A NMI or SMI which hits between the readout and the
      write can easily push us beyond that limit. This would result in
      waiting for the next HPET timer interrupt until the 32bit wraparound
      of the counter happens which takes about 306 seconds.
      
      So we designed the next event function to look like:
      
         match = read_cnt() + delta;
         write_compare_ref(match);
         return read_cnt() < match ? 0 : -ETIME;
      
      At some point we got into trouble with certain ATI chipsets. Even the
      above "safe" procedure failed. The reason was that the write to the
      compare register was delayed probably for performance reasons. The
      theory was that they wanted to avoid the synchronization of the write
      with the HPET clock, which is understandable. So the write does not
      hit the compare register directly instead it goes to some intermediate
      register which is copied to the real compare register in sync with the
      HPET clock. That opens another window for hitting the dreaded "wait
      for a wraparound" problem.
      
      To work around that "optimization" we added a read back of the compare
      register which either enforced the update of the just written value or
      just delayed the readout of the counter enough to avoid the issue. We
      unfortunately never got any affirmative info from ATI/AMD about this.
      
      One thing is sure, that we nuked the performance "optimization" that
      way completely and I'm pretty sure that the result is worse than
      before some HW folks came up with those.
      
      Just for paranoia reasons I added a check whether the read back
      compare register value was the same as the value we wrote right
      before. That paranoia check triggered a couple of years after it was
      added on an Intel ICH9 chipset. Venki added a workaround (commit
      8da854cb) which was reading the compare register twice when the first
      check failed. We considered this to be a penalty in general and
      restricted the readback (thus the wasted CPU cycles) to the known to
      be affected ATI chipsets.
      
      This turned out to be an utterly wrong decision. 2.6.35 testers
      experienced massive problems and finally one of them bisected it down
      to commit 30a564be, which spurred some further investigation.
      
      Finally we got confirmation that the write to the compare register can
      be delayed by up to two HPET clock cycles which explains the problems
      nicely. All we can do about this is to go back to Venki's initial
      workaround in a slightly modified version.
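
      A sketch of the next-event sequence with the readback workaround put
      back in (hpet_readl()/hpet_writel() and the HPET_* register offsets are
      the existing helpers; the function name and exact shape are
      illustrative, not the verbatim hpet_next_event()):

        static int hpet_next_event_sketch(unsigned long delta, int timer)
        {
                u32 cnt;

                cnt  = hpet_readl(HPET_COUNTER);
                cnt += (u32) delta;
                hpet_writel(cnt, HPET_Tn_CMP(timer));

                /*
                 * Read the comparator back.  The write may be delayed by up
                 * to two HPET clock cycles; the readback either forces the
                 * new value to take effect or burns enough cycles to get
                 * past the window.  If it still shows the stale value, read
                 * once more.
                 */
                if (hpet_readl(HPET_Tn_CMP(timer)) != cnt)
                        (void) hpet_readl(HPET_Tn_CMP(timer));

                /* Finally verify the counter has not already passed us. */
                return (s32)(hpet_readl(HPET_COUNTER) - cnt) >= 0 ? -ETIME : 0;
        }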
      
      Just for the record I need to say, that all of this could have been
      avoided if hardware designers and of course the HPET committee would
      have thought about the consequences for a split second. It's out of my
      comprehension why designing a working timer is so hard. There are two
      ways to achieve it:
      
       1) Use a counter wrap around aware compare_reg <= counter_reg
          implementation instead of the easy compare_reg == counter_reg
      
          Downsides:
      
      	- It needs more silicon.
      
      	- It needs a readout of the counter to apply a relative
      	  timeout. This is necessary as the counter does not run in
      	  any useful (and adjustable) frequency and there is no
      	  guarantee that the counter which is used for timer events is
      	  the same which is used for reading the actual time (and
          therefore for calculating the delta)
      
          Upsides:
      
      	- None
      
        2) Use a simple down counter for relative timer events
      
          Downsides:
      
      	- Absolute timeouts are not possible, which is not a problem
      	  at all in the context of an OS and the expected
      	  max. latencies/jitter (also see Downsides of #1)
      
         Upsides:
      
      	- It needs less or equal silicon.
      
      	- It works ALWAYS
      
      	- It is way faster than a compare register based solution (One
      	  write versus one write plus at least one and up to four
      	  reads)
      
      I would not be so grumpy about all of this, if I would not have been
      ignored for many years when pointing out these flaws to various
      hardware folks. I really hate timers (at least those which seem to be
      designed by janitors).
      
      Though finally we got a reasonable explanation plus a solution and I
      want to thank all the folks involved in chasing it down and providing
      valuable input to this.
      Bisected-by: Nix <nix@esperi.org.uk>
      Reported-by: Artur Skawina <art.08.09@gmail.com>
      Reported-by: Damien Wyart <damien.wyart@free.fr>
      Reported-by: John Drescher <drescherjm@gmail.com>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: stable@kernel.org
      Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  4. 14 Sep 2010: 2 commits
  5. 11 Sep 2010: 2 commits
  6. 10 Sep 2010: 1 commit
  7. 09 Sep 2010: 3 commits
  8. 05 Sep 2010: 2 commits
    • x86, mcheck: Avoid duplicate sysfs links/files for thresholding banks · 1389298f
      Andreas Herrmann authored
      kobject_add_internal failed for threshold_bank2 with -EEXIST;
      don't try to register things with the same name in the same
      directory:
      
        Pid: 1, comm: swapper Tainted: G        W  2.6.31 #1
        Call Trace:
        [<ffffffff81161b07>] ? kobject_add_internal+0x156/0x180
        [<ffffffff81161cc0>] ? kobject_add+0x66/0x6b
        [<ffffffff81161793>] ? kobject_init+0x42/0x82
        [<ffffffff81161cf9>] ? kobject_create_and_add+0x34/0x63
        [<ffffffff81393963>] ? threshold_create_bank+0x14f/0x259
        [<ffffffff8139310a>] ? mce_create_device+0x8d/0x1b8
        [<ffffffff81646497>] ? threshold_init_device+0x3f/0x80
        [<ffffffff81646458>] ? threshold_init_device+0x0/0x80
        [<ffffffff81009050>] ? do_one_initcall+0x4f/0x143
        [<ffffffff816413a0>] ? kernel_init+0x14c/0x1a2
        [<ffffffff8100c8da>] ? child_rip+0xa/0x20
        [<ffffffff81641254>] ? kernel_init+0x0/0x1a2
        [<ffffffff8100c8d0>] ? child_rip+0x0/0x20
        kobject_create_and_add: kobject_add error: -17
      
      (Probably the for_each_cpu loop should be entirely removed.)
      Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
      LKML-Reference: <20100827092006.GB5348@loge.amd.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: Fix the address space annotations of iomap_atomic_prot_pfn() · cc1a8e52
      Francisco Jerez authored
      This patch fixes the sparse warnings when the return pointer of
      iomap_atomic_prot_pfn() is used as an argument of iowrite32()
      and friends.
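
      The gist of the change, sketched from memory of the 2.6.36-era API (the
      km_type argument and the caller below are illustrative, and cleanup is
      omitted): with __iomem on the return type, sparse accepts handing the
      mapping straight to iowrite32().

        /* Illustrative caller; pfn/prot/value come from the surrounding driver. */
        static void poke_mmio(unsigned long pfn, pgprot_t prot, u32 value)
        {
                void __iomem *vaddr = iomap_atomic_prot_pfn(pfn, KM_USER0, prot);

                iowrite32(value, vaddr);        /* clean under sparse now */
        }
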
      Signed-off-by: Francisco Jerez <currojerez@riseup.net>
      LKML-Reference: <1283633804-11749-1-git-send-email-currojerez@riseup.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  9. 03 Sep 2010: 3 commits
    • perf, x86: Try to handle unknown nmis with an enabled PMU · 4177c42a
      Robert Richter authored
      When the PMU is enabled it is valid to have unhandled nmis: two
      events could trigger 'simultaneously', raising two back-to-back
      NMIs. If the first NMI handles both, the latter will be empty
      and daze the CPU.
      
      The solution to avoid an 'unknown nmi' message in this case was
      simply to stop the nmi handler chain when the PMU is enabled by
      stating the nmi was handled. This has the drawback that a) we
      can not detect unknown nmis anymore, and b) subsequent nmi
      handlers are not called.
      
      This patch addresses this. Now, when an unknown NMI arrives, we
      check whether it could be a PMU back-to-back NMI. Otherwise we
      pass it on and let the kernel handle the unknown nmi.
      
      This is a debug log:
      
       cpu #6, nmi #32333, skip_nmi #32330, handled = 1, time = 1934364430
       cpu #6, nmi #32334, skip_nmi #32330, handled = 1, time = 1934704616
       cpu #6, nmi #32335, skip_nmi #32336, handled = 2, time = 1936032320
       cpu #6, nmi #32336, skip_nmi #32336, handled = 0, time = 1936034139
       cpu #6, nmi #32337, skip_nmi #32336, handled = 1, time = 1936120100
       cpu #6, nmi #32338, skip_nmi #32336, handled = 1, time = 1936404607
       cpu #6, nmi #32339, skip_nmi #32336, handled = 1, time = 1937983416
       cpu #6, nmi #32340, skip_nmi #32341, handled = 2, time = 1938201032
       cpu #6, nmi #32341, skip_nmi #32341, handled = 0, time = 1938202830
       cpu #6, nmi #32342, skip_nmi #32341, handled = 1, time = 1938443743
       cpu #6, nmi #32343, skip_nmi #32341, handled = 1, time = 1939956552
       cpu #6, nmi #32344, skip_nmi #32341, handled = 1, time = 1940073224
       cpu #6, nmi #32345, skip_nmi #32341, handled = 1, time = 1940485677
       cpu #6, nmi #32346, skip_nmi #32347, handled = 2, time = 1941947772
       cpu #6, nmi #32347, skip_nmi #32347, handled = 1, time = 1941949818
       cpu #6, nmi #32348, skip_nmi #32347, handled = 0, time = 1941951591
       Uhhuh. NMI received for unknown reason 00 on CPU 6.
       Do you have a strange power saving mode enabled?
       Dazed and confused, but trying to continue
      
      Deltas:
      
       nmi #32334 340186
       nmi #32335 1327704
       nmi #32336 1819      <<<< back-to-back nmi [1]
       nmi #32337 85961
       nmi #32338 284507
       nmi #32339 1578809
       nmi #32340 217616
       nmi #32341 1798      <<<< back-to-back nmi [2]
       nmi #32342 240913
       nmi #32343 1512809
       nmi #32344 116672
       nmi #32345 412453
       nmi #32346 1462095   <<<< 1st nmi (standard) handling 2 counters
       nmi #32347 2046      <<<< 2nd nmi (back-to-back) handling one counter
       nmi #32348 1773      <<<< 3rd nmi (back-to-back) handling no counter! [3]
      
      For  back-to-back nmi detection there are the following rules:
      
      The PMU nmi handler was handling more than one counter and no
      counter was handled in the subsequent nmi (see [1] and [2]
      above).
      
      There is another case if there are two subsequent back-to-back
      nmis [3]. The 2nd is detected as back-to-back because the first
      handled more than one counter. If the second handles one counter
      and the 3rd handles nothing, we drop the 3rd nmi because it
      could be a back-to-back nmi.
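
      A self-contained sketch of those rules (the names are illustrative and
      do not match the kernel's per-cpu pmu_nmi state exactly):

        static unsigned long skip_upto; /* eat empty NMIs up to this sequence number */

        static int pmu_nmi_filter(unsigned long this_nmi, int handled)
        {
                if (handled > 1) {
                        /* More than one counter handled: the next NMI may be
                         * a back-to-back NMI for counters already serviced. */
                        skip_upto = this_nmi + 1;
                        return 1;
                }

                if (handled == 1 && this_nmi == skip_upto) {
                        /* Case [3]: the suspected back-to-back NMI still handled
                         * a counter, so the one after it may be the empty one. */
                        skip_upto = this_nmi + 1;
                        return 1;
                }

                if (handled == 0 && this_nmi <= skip_upto)
                        return 1;       /* eat the suspected back-to-back NMI */

                return handled;         /* 0 -> fall through to unknown-NMI handling */
        }
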
      Signed-off-by: Robert Richter <robert.richter@amd.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      [ renamed nmi variable to pmu_nmi to avoid clash with .nmi in entry.S ]
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: peterz@infradead.org
      Cc: gorcunov@gmail.com
      Cc: fweisbec@gmail.com
      Cc: ying.huang@intel.com
      Cc: ming.m.lin@intel.com
      Cc: eranian@google.com
      LKML-Reference: <1283454469-1909-3-git-send-email-dzickus@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf, x86: Fix handle_irq return values · de725dec
      Peter Zijlstra authored
      Now that we rely on the number of handled overflows, ensure all
      handle_irq implementations actually return the right number.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: peterz@infradead.org
      Cc: robert.richter@amd.com
      Cc: gorcunov@gmail.com
      Cc: fweisbec@gmail.com
      Cc: ying.huang@intel.com
      Cc: ming.m.lin@intel.com
      Cc: eranian@google.com
      LKML-Reference: <1283454469-1909-4-git-send-email-dzickus@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf, x86: Fix accidentally ack'ing a second event on intel perf counter · 2e556b5b
      Don Zickus authored
      During testing of a patch to stop having the perf subsystem
      swallow nmis, it was uncovered that Nehalem boxes were randomly
      getting unknown nmis when using the perf tool.
      
      Moving the ack'ing of the PMI closer to when we get the status
      allows the hardware to properly re-set the PMU bit signaling
      another PMI was triggered during the processing of the first
      PMI.  This allows the new logic for dealing with the
      shortcomings of multiple PMIs to handle the extra NMI by
      'eat'ing it later.
      
      Now one can wonder why we are getting a second PMI when we
      disable all the PMUs at the beginning of the NMI handler to
      prevent such a case; that I do not know.  But I know the fix
      below helps deal with this quirk.
      
      Tested on multiple Nehalems where the problem was occurring.
      With the patch, the code now loops a second time to handle the
      second PMI (whereas before it did not).
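
      A sketch of the reordering, assuming the intel_pmu_get_status() /
      intel_pmu_ack_status() helpers; the counter-processing helper below is a
      placeholder, not the full intel_pmu_handle_irq():

        static int pmu_handle_irq_sketch(void)
        {
                int handled = 0;
                u64 status;

        again:
                status = intel_pmu_get_status();
                if (!status)
                        return handled;

                /* Ack the PMI right after reading the status, not after the
                 * events are processed: a second PMI that fires meanwhile
                 * re-sets the status bits and is seen on the next pass. */
                intel_pmu_ack_status(status);

                handled += process_overflowed_counters(status); /* placeholder */
                goto again;
        }
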
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: peterz@infradead.org
      Cc: robert.richter@amd.com
      Cc: gorcunov@gmail.com
      Cc: fweisbec@gmail.com
      Cc: ying.huang@intel.com
      Cc: ming.m.lin@intel.com
      Cc: eranian@google.com
      LKML-Reference: <1283454469-1909-2-git-send-email-dzickus@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  10. 02 Sep 2010: 1 commit
    • oprofile, x86: fix init_sysfs() function stub · 269f45c2
      Robert Richter authored
      Using the return value of init_sysfs(), introduced with commit

       10f0412f oprofile, x86: fix init_sysfs error handling

      uncovered the following build error for !CONFIG_PM:
      
       .../linux/arch/x86/oprofile/nmi_int.c: In function ‘op_nmi_init’:
       .../linux/arch/x86/oprofile/nmi_int.c:784: error: expected expression before ‘do’
       make[2]: *** [arch/x86/oprofile/nmi_int.o] Error 1
       make[1]: *** [arch/x86/oprofile] Error 2
      
      This patch fixes this.
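
      The fix takes roughly this shape for the !CONFIG_PM case: turn the old
      statement-like macro into inline function stubs so the init stub has a
      usable return value:

        /* old stub - a statement, hence "expected expression before 'do'":
         *   #define init_sysfs() do { } while (0)
         */
        static inline int  init_sysfs(void) { return 0; }
        static inline void exit_sysfs(void) { }
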
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Cc: stable@kernel.org
      Signed-off-by: Robert Richter <robert.richter@amd.com>
  11. 31 Aug 2010: 1 commit
  12. 25 Aug 2010: 2 commits
  13. 23 Aug 2010: 2 commits
  14. 22 Aug 2010: 1 commit
  15. 21 Aug 2010: 1 commit
  16. 20 Aug 2010: 3 commits
    • x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states · cd7240c0
      Suresh Siddha authored
      TSCs get reset after suspend/resume (even on CPUs with an invariant
      TSC, which runs at a constant rate across ACPI P-, C- and T-states). On
      some systems the BIOS seems to reinit the TSC to an arbitrarily large
      value (still sync'd across CPUs) during resume.
      
      This leads to a scenario where the scheduler's rq->clock
      (sched_clock_cpu()) is less than rq->age_stamp (introduced in 2.6.32).
      That results in a big value returned by scale_rt_power(), and the
      resulting big group power set by update_group_power() causes improper
      load balancing between busy and idle CPUs after suspend/resume.
      
      This resulted in multi-threaded workloads (like kernel-compilation)
      going slower after a suspend/resume cycle on Core i5 laptops.
      
      Fix this by recomputing cyc2ns_offset's during resume, so that
      sched_clock() continues from the point where it was left off during
      suspend.
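
      A sketch of the idea, assuming the existing per-cpu cyc2ns_offset used
      by sched_clock(); the function names and the saved cyc2ns_suspend value
      are illustrative:

        static unsigned long long cyc2ns_suspend;

        void save_sched_clock_state_sketch(void)
        {
                cyc2ns_suspend = sched_clock();         /* remember where we were */
        }

        void restore_sched_clock_state_sketch(void)
        {
                unsigned long long offset;
                unsigned long flags;
                int cpu;

                local_irq_save(flags);

                /* Zero the local offset so sched_clock() reports the raw
                 * post-resume value, then compute how far behind we are. */
                per_cpu(cyc2ns_offset, smp_processor_id()) = 0;
                offset = cyc2ns_suspend - sched_clock();

                for_each_possible_cpu(cpu)
                        per_cpu(cyc2ns_offset, cpu) = offset;

                local_irq_restore(flags);
        }
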
      Reported-by: Florian Pritz <flo@xssn.at>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: <stable@kernel.org> # [v2.6.32+]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1282262618.2675.24.camel@sbsiddha-MOBL3.sc.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, apic: Fix apic=debug boot crash · 05e40760
      Daniel Kiper authored
      Fix a boot crash when apic=debug is used and the APIC is
      not properly initialized.
      
      This issue appears during Xen Dom0 kernel boot but the
      fix is generic and the crash could occur on real hardware
      as well.
      Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
      Cc: xen-devel@lists.xensource.com
      Cc: konrad.wilk@oracle.com
      Cc: jeremy@goop.org
      Cc: <stable@kernel.org> # .35.x, .34.x, .33.x, .32.x
      LKML-Reference: <20100819224616.GB9967@router-fw-old.local.net-space.pl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues · d7c53c9e
      Borislav Petkov authored
      When testing cpu hotplug code on 32-bit we kept hitting the "CPU%d:
      Stuck ??" message due to multiple cores concurrently accessing the
      cpu_callin_mask, among others.
      
      These codepaths are not protected from concurrent access, simply
      because there's no sane reason to make already complex code
      unnecessarily more complex - we hit the issue only when insanely
      switching cores off- and online - so serialize hotplugging cores on
      the sysfs level and be done with it.
      
      [ v2.1: fix !HOTPLUG_CPU build ]
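
      A sketch of the serialization, with illustrative names: a single mutex
      is taken by the sysfs online/offline store path around cpu_up() and
      cpu_down(), so only one core is ever in the bringup path at a time:

        static DEFINE_MUTEX(x86_cpu_hotplug_driver_mutex);

        void cpu_hotplug_driver_lock(void)
        {
                mutex_lock(&x86_cpu_hotplug_driver_mutex);
        }

        void cpu_hotplug_driver_unlock(void)
        {
                mutex_unlock(&x86_cpu_hotplug_driver_mutex);
        }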
      
      Cc: <stable@kernel.org>
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
      LKML-Reference: <20100819181029.GC17171@aftab>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  17. 19 Aug 2010: 4 commits
    • kprobes/x86: Fix the return address of multiple kretprobes · 737480a0
      KUMANO Syuhei authored
      Fix the return address of subsequent kretprobes when multiple
      kretprobes are set on the same function.
      
      For example:
      
       # cd /sys/kernel/debug/tracing
       # echo "r:event1 sys_symlink" > kprobe_events
       # echo "r:event2 sys_symlink" >> kprobe_events
       # echo 1 > events/kprobes/enable
       # ln -s /tmp/foo /tmp/bar
      
      (without this patch)
      
       # cat trace
                    ln-897   [000] 20404.133727: event1: (kretprobe_trampoline+0x0/0x4c <- sys_symlink)
                    ln-897   [000] 20404.133747: event2: (system_call_fastpath+0x16/0x1b <- sys_symlink)
      
      (with this patch)
      
       # cat trace
                    ln-740   [000] 13799.491076: event1: (system_call_fastpath+0x16/0x1b <- sys_symlink)
                    ln-740   [000] 13799.491096: event2: (system_call_fastpath+0x16/0x1b <- sys_symlink)
      Signed-off-by: KUMANO Syuhei <kumano.prog@gmail.com>
      Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      LKML-Reference: <1281853084.3254.11.camel@camp10-laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86-32: Fix dummy trampoline-related inline stubs · 8848a910
      H. Peter Anvin authored
      Fix dummy inline stubs for trampoline-related functions when no
      trampolines exist (until we get rid of the no-trampoline case
      entirely.)
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Joerg Roedel <joerg.roedel@amd.com>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      LKML-Reference: <4C6C294D.3030404@zytor.com>
    • x86-32: Separate 1:1 pagetables from swapper_pg_dir · fd89a137
      Joerg Roedel authored
      This patch fixes machine crashes which occur when heavily exercising the
      CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by
      AMD Erratum 383 and result in a fatal machine check exception. Here's
      the scenario:
      
      1. On 32-bit, the swapper_pg_dir page table is used as the initial page
      table for booting a secondary CPU.
      
      2. To make this work, swapper_pg_dir needs a direct mapping of physical
      memory in it (the low mappings). By adding those low, large page (2M)
      mappings (PAE kernel), we create the necessary conditions for Erratum
      383 to occur.
      
      3. Other CPUs which do not participate in the off- and onlining game may
      use swapper_pg_dir while the low mappings are present (when leave_mm is
      called). For all steps below, the CPU referred to is a CPU that is using
      swapper_pg_dir, and not the CPU which is being onlined.
      
      4. The presence of the low mappings in swapper_pg_dir can result
      in TLB entries for addresses below __PAGE_OFFSET to be established
      speculatively. These TLB entries are marked global and large.
      
      5. When the CPU with such TLB entry switches to another page table, this
      TLB entry remains because it is global.
      
      6. The process then generates an access to an address covered by the
      above TLB entry but there is a permission mismatch - the TLB entry
      covers a large global page not accessible to userspace.
      
      7. Due to this permission mismatch a new 4kb, user TLB entry gets
      established. Further, Erratum 383 provides for a small window of time
      where both TLB entries are present. This results in an uncorrectable
      machine check exception signalling a TLB multimatch which panics the
      machine.
      
      There are two ways to fix this issue:
      
              1. Always do a global TLB flush when a new cr3 is loaded and the
              old page table was swapper_pg_dir. I consider this a hack that is
              hard to understand and has performance implications
      
              2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit
              does.
      
      This patch implements solution 2. It introduces a trampoline_pg_dir
      which has the same layout as swapper_pg_dir with low_mappings. This page
      table is used as the initial page table of the booting CPU. Later in the
      bringup process, it switches to swapper_pg_dir and does a global TLB
      flush. This fixes the crashes in our test cases.
      
      -v2: switch to swapper_pg_dir right after entering start_secondary() so
      that we are able to access percpu data which might not be mapped in the
      trampoline page table.
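
      The -v2 step sketched in C (load_cr3() and __flush_tlb_all() are the
      real helpers; the wrapper name is illustrative):

        static void __cpuinit leave_trampoline_pgd(void)
        {
                /* 32-bit only: switch to the kernel page table and flush,
                 * including global entries, so no low 1:1 mappings linger
                 * and Erratum 383 cannot trigger. */
                load_cr3(swapper_pg_dir);
                __flush_tlb_all();
        }
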
      Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
      LKML-Reference: <20100816123833.GB28147@aftab>
      Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, cpu: Fix regression in AMD errata checking code · 07a7795c
      Hans Rosenfeld authored
      A bug in the family-model-stepping matching code caused the presence of
      errata to go undetected when OSVW was not used. This causes hangs on
      some K8 systems because the E400 workaround is not enabled.
      Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
      LKML-Reference: <1282141190-930137-1-git-send-email-hans.rosenfeld@amd.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  18. 18 Aug 2010: 3 commits
  19. 17 Aug 2010: 2 commits