1. 20 2月, 2009 4 次提交
    • A
      x86, mce: separate correct machine check poller and fatal exception handler · b79109c3
      Andi Kleen 提交于
      Impact: cleanup, performance enhancement
      
      The machine check poller is diverging more and more from the fatal
      exception handler. Instead of adding more special cases separate the code
      paths completely. The corrected poll path is actually quite simple,
      and this doesn't result in much code duplication.
      
      This makes both handlers much easier to read and results in
      cleaner code flow.  The exception handler now only needs to care
      about uncorrected errors, which also simplifies the handling of multiple
      errors. The corrected poller also now always runs in standard interrupt
      context and does not need to do anything special to handle NMI context.
      
      Minor behaviour changes:
      - MCG status is now not cleared on polling.
      - Only the banks which had corrected errors get cleared on polling
      - The exception handler only clears banks with errors now
      
      v2: Forward port to new patch order. Add "uc" argument.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      b79109c3
    • A
      x86, mce: factor out duplicated struct mce setup into one function · b5f2fa4e
      Andi Kleen 提交于
      Impact: cleanup
      
      This merely factors out duplicated code to set up
      the initial struct mce state into a single function.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      b5f2fa4e
    • A
      x86, mce: implement dynamic machine check banks support · 0d7482e3
      Andi Kleen 提交于
      Impact: cleanup; making code future proof; memory saving on small systems
      
      This patch replaces the hardcoded max number of machine check banks with 
      dynamic allocation depending on what the CPU reports. The sysfs
      data structures and the banks array are dynamically allocated.
      
      There is still a hard bank limit (128) because the mcelog protocol uses
      banks >= 128 as pseudo banks to escape other events. But we expect
      that 128 banks is beyond any reasonable CPU for now.
      
      This supersedes an earlier patch by Venki, but it solves the problem
      more completely by making the limit fully dynamic (up to the 128
      boundary).
      
      This saves some memory on machines with less than 6 banks because
      they won't need sysdevs for unused ones and also allows to 
      use sysfs to control these banks on possible future CPUs with
      more than 6 banks.
      
      This is an updated patch addressing Venki's comments.  I also added in
      another patch from Thomas which fixed the error allocation path (that
      patch was previously separated)
      
      Cc: Venki Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      0d7482e3
    • A
      x86, mce: enable machine checks in 64-bit defconfig · e35849e9
      Andi Kleen 提交于
      Impact: Low priority fix
      
      The 32-bit defconfig already had it enabled. And it's a pretty
      fundamental feature, so better enable it on 64 bits too.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      e35849e9
  2. 18 2月, 2009 11 次提交
    • H
      x86, mce: fix a race condition in mce_read() · ef41df43
      Huang Ying 提交于
      Impact: bugfix
      
      Considering the situation as follow:
      
      before: mcelog.next == 1, mcelog.entry[0].finished = 1
      
      +--------------------------------------------------------------------------
      R                   W1                  W2                  W3
      
      read mcelog.next (1)
                          mcelog.next++ (2)
                          (working on entry 1,
                          finished == 0)
      
      mcelog.next = 0
                                              mcelog.next++ (1)
                                              (working on entry 0)
                                                                 mcelog.next++ (2)
                                                                 (working on entry 1)
                              <----------------- race ---------------->
                          (done on entry 1,
                          finished = 1)
                                                                 (done on entry 1,
                                                                 finished = 1)
      
      To fix the race condition, a cmpxchg loop is added to mce_read() to
      ensure no new MCE record can be added between mcelog.next reading and
      mcelog.next = 0.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      ef41df43
    • A
      x86, mce: disable machine checks on offlined CPUs · d6b75584
      Andi Kleen 提交于
      Impact: Lower priority bug fix
      
      Offlined CPUs could still get machine checks, but the machine check handler
      cannot handle them properly, leading to an unconditional crash. Disable
      machine checks on CPUs that are going down.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      d6b75584
    • A
      x86, mce: don't set up mce sysdev devices with mce=off · 5b4408fd
      Andi Kleen 提交于
      Impact: bug fix, in this case the resume handler shouldn't run which
      	avoids incorrectly reenabling machine checks on resume
      
      When MCEs are completely disabled on the command line don't set
      up the sysdev devices for them either.
      
      Includes a comment fix from Thomas Gleixner.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      5b4408fd
    • A
      x86, mce: switch machine check polling to per CPU timer · 52d168e2
      Andi Kleen 提交于
      Impact: Higher priority bug fix
      
      The machine check poller runs a single timer and then broadcasted an
      IPI to all CPUs to check them. This leads to unnecessary
      synchronization between CPUs. The original CPU running the timer has
      to wait potentially a long time for all other CPUs answering. This is
      also real time unfriendly and in general inefficient.
      
      This was especially a problem on systems with a lot of events where
      the poller run with a higher frequency after processing some events.
      There could be more and more CPU time wasted with this, to
      the point of significantly slowing down machines.
      
      The machine check polling is actually fully independent per CPU, so
      there's no reason to not just do this all with per CPU timers.  This
      patch implements that.
      
      Also switch the poller also to use standard timers instead of work
      queues. It was using work queues to be able to execute a user program
      on a event, but mce_notify_user() handles this case now with a
      separate callback. So instead always run the poll code in in a
      standard per CPU timer, which means that in the common case of not
      having to execute a trigger there will be less overhead.
      
      This allows to clean up the initialization significantly, because
      standard timers are already up when machine checks get init'ed.  No
      multiple initialization functions.
      
      Thanks to Thomas Gleixner for some help.
      
      Cc: thockin@google.com
      v2: Use del_timer_sync() on cpu shutdown and don't try to handle
      migrated timers.
      v3: Add WARN_ON for timer running on unexpected CPU
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      52d168e2
    • A
      x86, mce: always use separate work queue to run trigger · 9bd98405
      Andi Kleen 提交于
      Impact: Needed for bug fix in next patch
      
      This relaxes the requirement that mce_notify_user has to run in process
      context. Useful for future changes, but also leads to cleaner
      behaviour now. Now instead mce_notify_user can be called directly
      from interrupt (but not NMI) context.
      
      The work queue only uses a single global work struct, which can be done safely
      because it is always free to reuse before the trigger function is executed.
      This way no events can be lost.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      9bd98405
    • A
      x86, mce: don't disable machine checks during code patching · 123aa76e
      Andi Kleen 提交于
      Impact: low priority bug fix
      
      This removes part of a a patch I added myself some time ago. After some
      consideration the patch was a bad idea. In particular it stopped machine check
      exceptions during code patching.
      
      To quote the comment:
      
              * MCEs only happen when something got corrupted and in this
              * case we must do something about the corruption.
              * Ignoring it is worse than a unlikely patching race.
              * Also machine checks tend to be broadcast and if one CPU
              * goes into machine check the others follow quickly, so we don't
              * expect a machine check to cause undue problems during to code
              * patching.
      
      So undo the machine check related parts of
      8f4e956b NMIs are still disabled.
      
      This only removes code, the only additions are a new comment.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      123aa76e
    • A
      x86, mce: disable machine checks on suspend · 973a2dd1
      Andi Kleen 提交于
      Impact: Bug fix
      
      During suspend it is not reliable to process machine check
      exceptions, because CPUs disappear but can still get machine check
      broadcasts.  Also the system is slightly more likely to
      machine check them, but the handler is typically not a position
      to handle them in a meaningfull way.
      
      So disable them during suspend and enable them during resume.
      
      Also make sure they are always disabled on hot-unplugged CPUs.
      
      This new code assumes that suspend always hotunplugs all
      non BP CPUs.
      
      v2: Remove the WARN_ONs Thomas objected to.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      973a2dd1
    • A
      x86, mce: fix ifdef for 64bit thermal apic vector clear on shutdown · 07db1c14
      Andi Kleen 提交于
      Impact: Bugfix
      
      The ifdef for the apic clear on shutdown for the 64bit intel thermal
      vector was incorrect and never triggered. Fix that.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      07db1c14
    • A
      x86, mce: use force_sig_info to kill process in machine check · 380851bc
      Andi Kleen 提交于
      Impact: bug fix (with tolerant == 3)
      
      do_exit cannot be called directly from the exception handler because
      it can sleep and the exception handler runs on the exception stack.
      Use force_sig() instead.
      
      Based on a earlier patch by Ying Huang who debugged the problem.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      380851bc
    • A
      x86, mce: reinitialize per cpu features on resume · 6ec68bff
      Andi Kleen 提交于
      Impact: Bug fix
      
      This fixes a long standing bug in the machine check code. On resume the
      boot CPU wouldn't get its vendor specific state like thermal handling
      reinitialized. This means the boot cpu wouldn't ever get any thermal
      events reported again.
      
      Call the respective initialization functions on resume
      
      v2: Remove ancient init because they don't have a resume device anyways.
          Pointed out by Thomas Gleixner.
      v3: Now fix the Subject too to reflect v2 change
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      6ec68bff
    • P
      x86, rcu: fix strange load average and ksoftirqd behavior · bf51935f
      Paul E. McKenney 提交于
      Damien Wyart reported high ksoftirqd CPU usage (20%) on an
      otherwise idle system.
      
      The function-graph trace Damien provided:
      
      >   799.521187 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.521371 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.521555 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.521738 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.521934 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.522068 |   1)  ksoftir-2324  |               |                rcu_check_callbacks() {
      >   799.522208 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.522392 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.522575 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.522759 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.522956 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.523074 |   1)  ksoftir-2324  |               |                  rcu_check_callbacks() {
      >   799.523214 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.523397 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.523579 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.523762 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.523960 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.524079 |   1)  ksoftir-2324  |               |                  rcu_check_callbacks() {
      >   799.524220 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.524403 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.524587 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      >   799.524770 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
      > [ . . . ]
      
      Shows rcu_check_callbacks() being invoked way too often. It should be called
      once per jiffy, and here it is called no less than 22 times in about
      3.5 milliseconds, meaning one call every 160 microseconds or so.
      
      Why do we need to call rcu_pending() and rcu_check_callbacks() from the
      idle loop of 32-bit x86, especially given that no other architecture does
      this?
      
      The following patch removes the call to rcu_pending() and
      rcu_check_callbacks() from the x86 32-bit idle loop in order to
      reduce the softirq load on idle systems.
      Reported-by: NDamien Wyart <damien.wyart@free.fr>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      bf51935f
  3. 15 2月, 2009 2 次提交
    • T
      x86, vm86: fix preemption bug · be716615
      Thomas Gleixner 提交于
      Commit 3d2a71a5 ("x86, traps: converge
      do_debug handlers") changed the preemption disable logic of do_debug()
      so vm86_handle_trap() is called with preemption disabled resulting in:
      
       BUG: sleeping function called from invalid context at include/linux/kernel.h:155
       in_atomic(): 1, irqs_disabled(): 0, pid: 3005, name: dosemu.bin
       Pid: 3005, comm: dosemu.bin Tainted: G        W  2.6.29-rc1 #51
       Call Trace:
        [<c050d669>] copy_to_user+0x33/0x108
        [<c04181f4>] save_v86_state+0x65/0x149
        [<c0418531>] handle_vm86_trap+0x20/0x8f
        [<c064e345>] do_debug+0x15b/0x1a4
        [<c064df1f>] debug_stack_correct+0x27/0x2c
        [<c040365b>] sysenter_do_call+0x12/0x2f
       BUG: scheduling while atomic: dosemu.bin/3005/0x10000001
      
      Restore the original calling convention and reenable preemption before
      calling handle_vm86_trap().
      Reported-by: NMichal Suchanek <hramrach@centrum.cz>
      Cc: stable@kernel.org
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      be716615
    • C
      x86, olpc: fix model detection without OFW · e49590b6
      Chris Ball 提交于
      Impact: fix "garbled display, laptop is unusable" bug
      
      Commit e51a1ac2 ("x86, olpc: fix endian
      bug in openfirmware workaround") breaks model comparison on OLPC; the value
      0xc2 needs to be scaled up by olpc_board().
      
      The pre-patch version was wrong, but accidentally worked anyway
      (big-endian 0xc2 is big enough to satisfy all other board revisions,
      but little endian 0xc2 is not).
      Signed-off-by: NChris Ball <cjb@laptop.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: NAndres Salomon <dilinger@queued.net>
      Cc: Harvey Harrison <harvey.harrison@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e49590b6
  4. 13 2月, 2009 4 次提交
    • J
      x86, hpet: fix for LS21 + HPET = boot hang · b13e2464
      john stultz 提交于
      Between 2.6.23 and 2.6.24-rc1 a change was made that broke IBM LS21
      systems that had the HPET enabled in the BIOS, resulting in boot hangs
      for x86_64.
      
      Specifically commit b8ce3359, which
      merges the i386 and x86_64 HPET code.
      
      Prior to this commit, when we setup the HPET timers in x86_64, we did
      the following:
      
      	hpet_writel(HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL |
                          HPET_TN_32BIT, HPET_T0_CFG);
      
      However after the i386/x86_64 HPET merge, we do the following:
      
      	cfg = hpet_readl(HPET_Tn_CFG(timer));
      	cfg |= HPET_TN_ENABLE | HPET_TN_PERIODIC |
      			HPET_TN_SETVAL | HPET_TN_32BIT;
      	hpet_writel(cfg, HPET_Tn_CFG(timer));
      
      However on LS21s with HPET enabled in the BIOS, the HPET_T0_CFG register
      boots with Level triggered interrupts (HPET_TN_LEVEL) enabled. This
      causes the periodic interrupt to be not so periodic, and that results in
      the boot time hang I reported earlier in the delay calibration.
      
      My fix: Always disable HPET_TN_LEVEL when setting up periodic mode.
      Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b13e2464
    • T
      x86: CPA avoid repeated lazy mmu flush · 7ad9de6a
      Thomas Gleixner 提交于
      Impact: Flush the lazy MMU only once
      
      Pending mmu updates only need to be flushed once to bring the
      in-memory pagetable state up to date.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      7ad9de6a
    • T
      x86: warn if arch_flush_lazy_mmu_cpu is called in preemptible context · 34b0900d
      Thomas Gleixner 提交于
      Impact: Catch cases where lazy MMU state is active in a preemtible context
      
      arch_flush_lazy_mmu_cpu() has been changed to disable preemption so
      the checks in enter/leave will never trigger. Put the preemtible()
      check into arch_flush_lazy_mmu_cpu() to catch such cases.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      34b0900d
    • J
      x86/paravirt: make arch_flush_lazy_mmu/cpu disable preemption · d85cf93d
      Jeremy Fitzhardinge 提交于
      Impact: avoid access to percpu vars in preempible context
      
      They are intended to be used whenever there's the possibility
      that there's some stale state which is going to be overwritten
      with a queued update, or to force a state change when we may be
      in lazy mode.  Either way, we could end up calling it with
      preemption enabled, so wrap the functions in their own little
      preempt-disable section so they can be safely called in any
      context (though preemption should never be enabled if we're actually
      in a lazy state).
      
      (Move out of line to avoid #include dependencies.)
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      d85cf93d
  5. 12 2月, 2009 2 次提交
  6. 11 2月, 2009 2 次提交
    • M
      x86, ptrace, mm: fix double-free on race · 9f339e70
      Markus Metzger 提交于
      Ptrace_detach() races with __ptrace_unlink() if the traced task is
      reaped while detaching. This might cause a double-free of the BTS
      buffer.
      
      Change the ptrace_detach() path to only do the memory accounting in
      ptrace_bts_detach() and leave the buffer free to ptrace_bts_untrace()
      which will be called from __ptrace_unlink().
      
      The fix follows a proposal from Oleg Nesterov.
      Reported-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NMarkus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9f339e70
    • O
      ptrace, x86: fix the usage of ptrace_fork() · 06eb23b1
      Oleg Nesterov 提交于
      I noticed by pure accident we have ptrace_fork() and friends. This was
      added by "x86, bts: add fork and exit handling", commit
      bf53de90.
      
      I can't test this, ds_request_bts() returns -EOPNOTSUPP, but I strongly
      believe this needs the fix. I think something like this program
      
      	int main(void)
      	{
      		int pid = fork();
      
      		if (!pid) {
      			ptrace(PTRACE_TRACEME, 0, NULL, NULL);
      			kill(getpid(), SIGSTOP);
      			fork();
      		} else {
      			struct ptrace_bts_config bts = {
      				.flags = PTRACE_BTS_O_ALLOC,
      				.size  = 4 * 4096,
      			};
      
      			wait(NULL);
      
      			ptrace(PTRACE_SETOPTIONS, pid, NULL, PTRACE_O_TRACEFORK);
      			ptrace(PTRACE_BTS_CONFIG, pid, &bts, sizeof(bts));
      			ptrace(PTRACE_CONT, pid, NULL, NULL);
      
      			sleep(1);
      		}
      
      		return 0;
      	}
      
      should crash the kernel.
      
      If the task is traced by its natural parent ptrace_reparented() returns 0
      but we should clear ->btsxxx anyway.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NMarkus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      06eb23b1
  7. 10 2月, 2009 2 次提交
    • C
      i8327: fix outb() parameter order · b52af409
      Clemens Ladisch 提交于
      In i8237A_resume(), when resetting the DMA controller, the parameters to
      dma_outb() were mixed up.
      Signed-off-by: NClemens Ladisch <clemens@ladisch.de>
      [ cleaned up the file a tiny bit. ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b52af409
    • T
      x86: fix math_emu register frame access · d315760f
      Tejun Heo 提交于
      do_device_not_available() is the handler for #NM and it declares that
      it takes a unsigned long and calls math_emu(), which takes a long
      argument and surprisingly expects the stack frame starting at the zero
      argument would match struct math_emu_info, which isn't true regardless
      of configuration in the current code.
      
      This patch makes do_device_not_available() take struct pt_regs like
      other exception handlers and initialize struct math_emu_info with
      pointer to it and pass pointer to the math_emu_info to math_emulate()
      like normal C functions do.  This way, unless gcc makes a copy of
      struct pt_regs in do_device_not_available(), the register frame is
      correctly accessed regardless of kernel configuration or compiler
      used.
      
      This doesn't fix all math_emu problems but it at least gets it
      somewhat working.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d315760f
  8. 09 2月, 2009 5 次提交
  9. 05 2月, 2009 5 次提交
  10. 04 2月, 2009 3 次提交