1. 30 1月, 2008 4 次提交
  2. 25 1月, 2008 1 次提交
  3. 17 11月, 2007 1 次提交
    • A
      x86: fix cpu-hotplug regression · 90367556
      Andreas Herrmann 提交于
      Commit d435d862
      ("cpu hotplug: mce: fix cpu hotplug error handling")
      changed the error handling in mce_cpu_callback.
      
      In cases where not all CPUs are brought up during
      boot (e.g. using maxcpus and additional_cpus parameters)
      mce_cpu_callback now returns NOTFIY_BAD because
      for such CPUs cpu_data is not completely filled when
      the notifier is called. Thus mce_create_device fails right
      at its beginning:
      
              if (!mce_available(&cpu_data[cpu]))
                      return -EIO;
      
      As a quick fix I suggest to check boot_cpu_data for MCE.
      
      To reproduce this regression:
      
      (1) boot with maxcpus=2 addtional_cpus=2 on a 4 CPU x86-64 system
      (2) # echo 1 >/sys/devices/system/cpu/cpu2/online
        -bash: echo: write error: Invalid argument
      
      dmesg shows:
      
      _cpu_up: attempt to bring up CPU 2 failed
      Signed-off-by: NAndreas Herrmann <andreas.herrmann3@amd.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      90367556
  4. 15 11月, 2007 1 次提交
    • A
      x86: don't call mce_create_device on CPU_UP_PREPARE · bae19fe0
      Andreas Herrmann 提交于
      Fix regression introduced with d435d862
      ("cpu hotplug: mce: fix cpu hotplug error handling").
      
      A CPU which was not brought up during boot (using maxcpus and
      additional_cpus parameters) couldn't be onlined anymore.  For such a CPU it
      seemed that MCE was not supported during CPU_UP_PREPARE-time which caused
      mce_cpu_callback to return NOTIFY_BAD to notifier_call_chain.  To fix this
      we:
      
       - call mce_create_device for CPU_ONLINE event (instead of CPU_UP_PREPARE),
       - avoid mce_remove_device() for the CPU that is not correctly initialized
         by mce_create_device() failure,
       - make mce_cpu_callback always return NOTIFY_OK for CPU_ONLINE event.
         Because CPU_ONLINE callback return value is always ignored.
      
      [akinobu.mita@gmail.com: avoid mce_remove_device() for not initialized device]
      [akinobu.mita@gmail.com: make mce_cpu_callback always return NOTIFY_OK]
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: NAndreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bae19fe0
  5. 24 10月, 2007 2 次提交
  6. 20 10月, 2007 2 次提交
  7. 19 10月, 2007 1 次提交
    • A
      cpu hotplug: mce: fix cpu hotplug error handling · d435d862
      Akinobu Mita 提交于
      - Clear kobject in percpu device_mce before calling sysdev_register() with
      
        Because mce_create_device() may fail and it leaves kobject filled with
        junk. It will be the problem when mce_create_device() will be called
        next time.
      
      - Fix error handling in mce_create_device()
      
        Error handling should not do sysdev_remove_file() with not yet added
        attributes.
      
      - Don't register hotcpu notifier when mce_create_device() returns error
      
      - Do mce_create_device() in CPU_UP_PREPARE instead of CPU_ONLINE
      
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d435d862
  8. 18 10月, 2007 1 次提交
  9. 17 10月, 2007 1 次提交
  10. 11 10月, 2007 2 次提交
  11. 23 7月, 2007 1 次提交
  12. 22 7月, 2007 4 次提交
    • V
      x86: round_jiffies() for i386 and x86-64 non-critical/corrected MCE polling · 22293e58
      Venki Pallipadi 提交于
      This helps to reduce the frequency at which the CPU must be taken out of a
      lower-power state.
      Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Acked-by: NTim Hockin <thockin@hockin.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      22293e58
    • T
      x86_64: mcelog tolerant level cleanup · bd78432c
      Tim Hockin 提交于
      Background:
       The MCE handler has several paths that it can take, depending on various
       conditions of the MCE status and the value of the 'tolerant' knob.  The
       exact semantics are not well defined and the code is a bit twisty.
      
      Description:
       This patch makes the MCE handler's behavior more clear by documenting the
       behavior for various 'tolerant' levels.  It also fixes or enhances
       several small things in the handler.  Specifically:
           * If RIPV is set it is not safe to restart, so set the 'no way out'
             flag rather than the 'kill it' flag.
           * Don't panic() on correctable MCEs.
           * If the _OVER bit is set *and* the _UC bit is set (meaning possibly
             dropped uncorrected errors), set the 'no way out' flag.
           * Use EIPV for testing whether an app can be killed (SIGBUS) rather
             than RIPV.  According to docs, EIPV indicates that the error is
             related to the IP, while RIPV simply means the IP is valid to
             restart from.
           * Don't clear the MCi_STATUS registers until after the panic() path.
             This leaves the status bits set after the panic() so clever BIOSes
             can find them (and dumb BIOSes can do nothing).
      
       This patch also calls nonseekable_open() in mce_open (as suggested by akpm).
      
      Result:
       Tolerant levels behave almost identically to how they always have, but
       not it's well defined.  There's a slightly higher chance of panic()ing
       when multiple errors happen (a good thing, IMHO).  If you take an MBE and
       panic(), the error status bits are not cleared.
      
      Alternatives:
       None.
      
      Testing:
       I used software to inject correctable and uncorrectable errors.  With
       tolerant = 3, the system usually survives.  With tolerant = 2, the system
       usually panic()s (PCC) but not always.  With tolerant = 1, the system
       always panic()s.  When the system panic()s, the BIOS is able to detect
       that the cause of death was an MC4.  I was not able to reproduce the
       case of a non-PCC error in userspace, with EIPV, with (tolerant < 3).
       That will be rare at best.
      Signed-off-by: NTim Hockin <thockin@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bd78432c
    • T
      x86_64: support poll() on /dev/mcelog · e02e68d3
      Tim Hockin 提交于
      Background:
       /dev/mcelog is typically polled manually.  This is less than optimal for
       situations where accurate accounting of MCEs is important.  Calling
       poll() on /dev/mcelog does not work.
      
      Description:
       This patch adds support for poll() to /dev/mcelog.  This results in
       immediate wakeup of user apps whenever the poller finds MCEs.  Because
       the exception handler can not take any locks, it can not call the wakeup
       itself.  Instead, it uses a thread_info flag (TIF_MCE_NOTIFY) which is
       caught at the next return from interrupt or exit from idle, calling the
       mce_user_notify() routine.  This patch also disables the "fake panic"
       path of the mce_panic(), because it results in printk()s in the exception
       handler and crashy systems.
      
       This patch also does some small cleanup for essentially unused variables,
       and moves the user notification into the body of the poller, so it is
       only called once per poll, rather than once per CPU.
      
      Result:
       Applications can now poll() on /dev/mcelog.  When an error is logged
       (whether through the poller or through an exception) the applications are
       woken up promptly.  This should not affect any previous behaviors.  If no
       MCEs are being logged, there is no overhead.
      
      Alternatives:
       I considered simply supporting poll() through the poller and not using
       TIF_MCE_NOTIFY at all.  However, the time between an uncorrectable error
       happening and the user application being notified is *the*most* critical
       window for us.  Many uncorrectable errors can be logged to the network if
       given a chance.
      
       I also considered doing the MCE poll directly from the idle notifier, but
       decided that was overkill.
      
      Testing:
       I used an error-injecting DIMM to create lots of correctable DRAM errors
       and verified that my user app is woken up in sync with the polling interval.
       I also used the northbridge to inject uncorrectable ECC errors, and
       verified (printk() to the rescue) that the notify routine is called and the
       user app does wake up.  I built with PREEMPT on and off, and verified
       that my machine survives MCEs.
      
      [wli@holomorphy.com: build fix]
      Signed-off-by: NTim Hockin <thockin@google.com>
      Signed-off-by: NWilliam Irwin <bill.irwin@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e02e68d3
    • T
      x86_64: O_EXCL on /dev/mcelog · f528e7ba
      Tim Hockin 提交于
      Background:
       /dev/mcelog is a clear-on-read interface.  It is currently possible for
       multiple users to open and read() the device.  Users are protected from
       each other during any one read, but not across reads.
      
      Description:
       This patch adds support for O_EXCL to /dev/mcelog.  If a user opens the
       device with O_EXCL, no other user may open the device (EBUSY).  Likewise,
       any user that tries to open the device with O_EXCL while another user has
       the device will fail (EBUSY).
      
      Result:
       Applications can get exclusive access to /dev/mcelog.  Applications that
       do not care will be unchanged.
      
      Alternatives:
       A simpler choice would be to only allow one open() at all, regardless of
       O_EXCL.
      
      Testing:
       I wrote an application that opens /dev/mcelog with O_EXCL and observed
       that any other app that tried to open /dev/mcelog would fail until the
       exclusive app had closed the device.
      
      Caveats:
       None.
      Signed-off-by: NTim Hockin <thockin@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f528e7ba
  13. 18 7月, 2007 1 次提交
    • J
      usermodehelper: Tidy up waiting · 86313c48
      Jeremy Fitzhardinge 提交于
      Rather than using a tri-state integer for the wait flag in
      call_usermodehelper_exec, define a proper enum, and use that.  I've
      preserved the integer values so that any callers I've missed should
      still work OK.
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: David Howells <dhowells@redhat.com>
      86313c48
  14. 24 6月, 2007 1 次提交
    • J
      x86_64: fix misplaced `continue' in mce.c · 4f84e4be
      Joshua Wise 提交于
      Background:
        When a userspace application wants to know about machine check events, it
        opens /dev/mcelog and does a read(). Usually, we found that this interface
        works well, but in some cases, when the system was taking large numbers of
        machine check exceptions, the read() would hang. The system would output a
        soft-lockup warning, and the daemon reading from /dev/mcelog would suck up
        as much of a single CPU as it could spinning in system space.
      
      Description:
        This patch fixes this bug. In particular, there was a "continue" inside a
        timeout loop that presumably was intended to break out of the outer loop,
        but instead caused the inner loop to continue. This patch also makes the
        condition for the break-out a little more evident by changing a
        !time_before to a time_after_eq.
      
      Result:
        The read() no longer hangs in this test case.
      
      Testing:
        On my system, I could replicate the bug with the following command:
          # for i in `seq 15000`; do ./inject_sbe.sh; done
        where inject_sbe.sh contains commands to inject a single-bit error into the
        next memory write transaction.
      
      Patch:
        This patch is against git f1518a08.
      Signed-off-by: NJoshua Wise <jwise@google.com>
      Signed-off-by: NTim Hockin <thockin@google.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4f84e4be
  15. 10 5月, 2007 1 次提交
    • R
      Add suspend-related notifications for CPU hotplug · 8bb78442
      Rafael J. Wysocki 提交于
      Since nonboot CPUs are now disabled after tasks and devices have been
      frozen and the CPU hotplug infrastructure is used for this purpose, we need
      special CPU hotplug notifications that will help the CPU-hotplug-aware
      subsystems distinguish normal CPU hotplug events from CPU hotplug events
      related to a system-wide suspend or resume operation in progress.  This
      patch introduces such notifications and causes them to be used during
      suspend and resume transitions.  It also changes all of the
      CPU-hotplug-aware subsystems to take these notifications into consideration
      (for now they are handled in the same way as the corresponding "normal"
      ones).
      
      [oleg@tv-sign.ru: cleanups]
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8bb78442
  16. 09 5月, 2007 1 次提交
    • C
      move die notifier handling to common code · 1eeb66a1
      Christoph Hellwig 提交于
      This patch moves the die notifier handling to common code.  Previous
      various architectures had exactly the same code for it.  Note that the new
      code is compiled unconditionally, this should be understood as an appel to
      the other architecture maintainer to implement support for it aswell (aka
      sprinkling a notify_die or two in the proper place)
      
      arm had a notifiy_die that did something totally different, I renamed it to
      arm_notify_die as part of the patch and made it static to the file it's
      declared and used at.  avr32 used to pass slightly less information through
      this interface and I brought it into line with the other architectures.
      
      [akpm@linux-foundation.org: build fix]
      [akpm@linux-foundation.org: fix vmalloc_sync_all bustage]
      [bryan.wu@analog.com: fix vmalloc_sync_all in nommu]
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Cc: <linux-arch@vger.kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: NBryan Wu <bryan.wu@analog.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1eeb66a1
  17. 03 5月, 2007 1 次提交
    • T
      [PATCH] x86-64: Dynamically adjust machine check interval · 8a336b0a
      Tim Hockin 提交于
      Background:
       We've found that MCEs (specifically DRAM SBEs) tend to come in bunches,
       especially when we are trying really hard to stress the system out.  The
       current MCE poller uses a static interval which does not care whether it
       has or has not found MCEs recently.
      
      Description:
       This patch makes the MCE poller adjust the polling interval dynamically.
       If we find an MCE, poll 2x faster (down to 10 ms).  When we stop finding
       MCEs, poll 2x slower (up to check_interval seconds).  The check_interval
       tunable becomes the max polling interval.  The "Machine check events
       logged" printk() is rate limited to the check_interval, which should be
       identical behavior to the old functionality.
      
      Result:
       If you start to take a lot of correctable errors (not exceptions), you
       log them faster and more accurately (less chance of overflowing the MCA
       registers).  If you don't take a lot of errors, you will see no change.
      
      Alternatives:
       I considered simply reducing the polling interval to 10 ms immediately
       and keeping it there as long as we continue to find errors.  This felt a
       bit heavy handed, but does perform significantly better for the default
       check_interval of 5 minutes (we're using a few seconds when testing for
       DRAM errors).  I could be convinced to go with this, if anyone felt it
       was not too aggressive.
      
      Testing:
       I used an error-injecting DIMM to create lots of correctable DRAM errors
       and verified that the polling interval accelerates.  The printk() only
       happens once per check_interval seconds.
      
      Patch:
       This patch is against 2.6.21-rc7.
      Signed-Off-By: NTim Hockin <thockin@google.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      8a336b0a
  18. 13 2月, 2007 2 次提交
  19. 08 12月, 2006 1 次提交
  20. 07 12月, 2006 1 次提交
  21. 22 11月, 2006 2 次提交
    • D
      WorkStruct: Pass the work_struct pointer instead of context data · 65f27f38
      David Howells 提交于
      Pass the work_struct pointer to the work function rather than context data.
      The work function can use container_of() to work out the data.
      
      For the cases where the container of the work_struct may go away the moment the
      pending bit is cleared, it is made possible to defer the release of the
      structure by deferring the clearing of the pending bit.
      
      To make this work, an extra flag is introduced into the management side of the
      work_struct.  This governs auto-release of the structure upon execution.
      
      Ordinarily, the work queue executor would release the work_struct for further
      scheduling or deallocation by clearing the pending bit prior to jumping to the
      work function.  This means that, unless the driver makes some guarantee itself
      that the work_struct won't go away, the work function may not access anything
      else in the work_struct or its container lest they be deallocated..  This is a
      problem if the auxiliary data is taken away (as done by the last patch).
      
      However, if the pending bit is *not* cleared before jumping to the work
      function, then the work function *may* access the work_struct and its container
      with no problems.  But then the work function must itself release the
      work_struct by calling work_release().
      
      In most cases, automatic release is fine, so this is the default.  Special
      initiators exist for the non-auto-release case (ending in _NAR).
      Signed-Off-By: NDavid Howells <dhowells@redhat.com>
      65f27f38
    • D
      WorkStruct: Separate delayable and non-delayable events. · 52bad64d
      David Howells 提交于
      Separate delayable work items from non-delayable work items be splitting them
      into a separate structure (delayed_work), which incorporates a work_struct and
      the timer_list removed from work_struct.
      
      The work_struct struct is huge, and this limits it's usefulness.  On a 64-bit
      architecture it's nearly 100 bytes in size.  This reduces that by half for the
      non-delayable type of event.
      Signed-Off-By: NDavid Howells <dhowells@redhat.com>
      52bad64d
  22. 26 9月, 2006 2 次提交
    • D
      [PATCH] x86: Refactor thermal throttle processing · 15d5f839
      Dmitriy Zavin 提交于
      Refactor the event processing (syslog messaging and rate limiting)
      into separate file therm_throt.c. This allows consistent reporting
      of CPU thermal throttle events.
      
      After ACK'ing the interrupt, if the event is current, the user
      (p4.c/mce_intel.c) calls therm_throt_process to log (and rate limit)
      the event. If that function returns 1, the user has the option to log
      things further (such as to mce_log in x86_64).
      
      AK: minor cleanup
      Signed-off-by: NDmitriy Zavin <dmitriyz@google.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      15d5f839
    • A
      [PATCH] Remove safe_smp_processor_id() · 151f8cc1
      Andi Kleen 提交于
      And replace all users with ordinary smp_processor_id.  The function
      was originally added to get some basic oops information out even
      if the GS register was corrupted. However that didn't
      work for some anymore because printk is needed to print the oops
      and it uses smp_processor_id() already. Also GS register corruptions
      are not particularly common anymore.
      
      This also helps the Xen port which would otherwise need to
      do this in a special way because it can't access the local APIC.
      
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      151f8cc1
  23. 01 8月, 2006 1 次提交
  24. 28 6月, 2006 2 次提交
  25. 27 6月, 2006 1 次提交
  26. 26 4月, 2006 1 次提交
  27. 10 4月, 2006 1 次提交