Commits · f94b61c2c9fdcc90773c49df9ccf9ede3ad0d7db · openanolis / cloud-kernel

04 Jun, 2009 11 commits

x86, mce: implement panic synchronization · f94b61c2

Committed by Andi Kleen on May 27, 2009

In some circumstances multiple CPUs can enter mce_panic() in parallel.
This gives quite confused output because they will all dump the same
machine check buffer.

The other problem is that they would all panic in parallel, but not
process each other's shutdown IPIs because interrupts are disabled.

Detect this situation early on in mce_panic(). The first CPU
entering will do the panic, the others will just wait to be killed.

For paranoia reasons in case the other CPU dies during the MCE I added
a 5 second timeout. If it expires each CPU will panic on its own again.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

f94b61c2
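
The synchronization described in this commit can be sketched in user-space C as follows. This is a minimal model under stated assumptions, not the kernel code: `panic_entrants`, `claim_panic()` and `wait_timed_out()` are hypothetical names, an atomic counter stands in for whatever the patch uses to detect the first entrant, a flag stands in for "this CPU was stopped by the panicking CPU", and a spin budget stands in for the 5 second timeout.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Count of CPUs that have entered the panic path (hypothetical name). */
static atomic_int panic_entrants;

/* True only for the first caller; every later caller gets false. */
static bool claim_panic(void)
{
    return atomic_fetch_add(&panic_entrants, 1) == 0;
}

/*
 * Late entrants wait here. In a kernel the wait would end when the
 * panicking CPU stops this one; the sketch models that with a flag.
 * Returns true if the wait budget ran out, in which case the caller
 * gives up waiting and panics on its own.
 */
static bool wait_timed_out(atomic_bool *stopped, long budget)
{
    while (budget-- > 0) {
        if (atomic_load(stopped))
            return false;
    }
    return true;
}
```

A caller would run `claim_panic()` on entry, dump the buffer and panic if it returns true, and otherwise fall into `wait_timed_out()`. The fallback path matters because the winning CPU may itself die during the machine check.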

x86, mce: implement bootstrapping for machine check wakeups · ccc3c319

Committed by Andi Kleen on May 27, 2009

Machine checks support waking up the mcelog daemon quickly.

The original wake up code for this was pretty ugly, relying on
an idle notifier and a special process flag. The reason it did
it this way is that the machine check handler is not subject
to normal interrupt locking rules so it's not safe
to call wake_up().  Instead it set a process flag
and then either did the wakeup in the syscall return
or in the idle notifier.

This patch adds a new "bootstrapping" method as replacement.

The idea is that the handler checks if it's in a state where
it is unsafe to call wake_up(). If it's safe it calls it directly.
When it's not safe -- that is, it interrupted a critical
section with interrupts disabled -- it uses a new "self IPI" to trigger
an IPI to its own CPU. This can be done safely because IPI
triggers are atomic with some care. The IPI is raised
once the interrupts are reenabled and can then safely call
wake_up().

When APICs are disabled the event is just queued and will be picked up
eventually by the next polling timer. I think that's a reasonable
compromise, since it should only happen quite rarely.

Contains fixes from Ying Huang.

[ solve conflict on irqinit, make it work on 32bit (entry_arch.h) - HS ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

ccc3c319
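
The decision flow in this commit message (wake directly when safe, raise a self IPI when interrupts were disabled, queue for the polling timer when APICs are off) can be modelled roughly as below. This is a user-space sketch with hypothetical names throughout (`cpu_state`, `notify_from_handler()`, `enable_irqs()`); it simulates the pending IPI with a boolean and does not touch any real interrupt hardware.

```c
#include <stdbool.h>

/* Simulated per-CPU state; every field name here is an assumption. */
struct cpu_state {
    bool irqs_enabled;     /* were interrupts enabled where the handler hit? */
    bool apic_available;   /* can this CPU send itself an IPI? */
    bool self_ipi_pending; /* a self IPI has been raised but not delivered */
    bool event_queued;     /* left for the next polling timer */
    int wakeups;           /* how often the log reader was woken */
};

static void wake_log_reader(struct cpu_state *cpu)
{
    cpu->wakeups++;
}

/* Called from the simulated machine check handler. */
static void notify_from_handler(struct cpu_state *cpu)
{
    if (cpu->irqs_enabled)
        wake_log_reader(cpu);          /* safe: wake up directly */
    else if (cpu->apic_available)
        cpu->self_ipi_pending = true;  /* defer via self IPI */
    else
        cpu->event_queued = true;      /* polling timer picks it up */
}

/* The pending self IPI fires once interrupts are enabled again. */
static void enable_irqs(struct cpu_state *cpu)
{
    cpu->irqs_enabled = true;
    if (cpu->self_ipi_pending) {
        cpu->self_ipi_pending = false;
        wake_log_reader(cpu);
    }
}
```

The point of the model is that the wakeup is never lost and never issued from an unsafe context; only its timing changes.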

x86, mce: check early in exception handler if panic is needed · bd19a5e6

Committed by Andi Kleen on May 27, 2009

The exception handler should behave differently if the exception is
fatal versus one that can be returned from.  In the first case it should
never clear any registers because these need to be preserved
for logging after the next boot. Otherwise it should clear them
on each CPU step by step so that other CPUs sharing the same bank don't
see duplicate events. Otherwise we risk reporting events multiple
times on any CPUs which have shared machine check banks, which
is a common problem on Intel Nehalem which has both SMT (two
CPU threads sharing banks) and shared machine check banks in the uncore.

Determine early in a special pass if any event requires a panic.
This uses the mce_severity() function added earlier.

This is needed for the next patch.

Also fixes a problem together with an earlier patch
that corrected events weren't logged on a fatal MCE.

[ Impact: Feature ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

bd19a5e6

x86, mce: add table driven machine check grading · 817f32d0

Committed by Andi Kleen on May 27, 2009

The machine check grading (as in deciding what should be done for a given
register value) has to be done multiple times soon and it's also getting
more complicated.
So it makes sense to consolidate it into a single function. To get smaller
and more straightforward and possibly more extensible code I opted towards
a new table driven method. The various rules are put into a table
which is then executed by a very simple interpreter.

The grading engine is in a new file mce-severity.c. I also added a private
include file mce-internal.h, because mce.h is already a bit too cluttered.

This is dead code right now, but will be used in follow-on patches.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

817f32d0
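
A table-driven grader of the kind this commit describes might look roughly like the sketch below: each rule is a mask, an expected value and a result, and the "interpreter" is a loop that returns the first match. The rule set, the `severity` enum and the function name `grade()` are illustrative assumptions and are not taken from mce-severity.c; the three status bits follow the architectural MCi_STATUS layout (VAL, UC, PCC).

```c
#include <stddef.h>
#include <stdint.h>

enum severity { SEV_NONE, SEV_CORRECTED, SEV_UNCORRECTED, SEV_PANIC };

#define ST_VALID (1ULL << 63) /* record holds a valid error */
#define ST_UC    (1ULL << 61) /* error was not corrected */
#define ST_PCC   (1ULL << 57) /* processor context corrupt */

struct rule {
    uint64_t mask;      /* bits to look at */
    uint64_t want;      /* value those bits must have */
    enum severity sev;  /* grade if the rule matches */
    const char *msg;    /* human-readable reason */
};

/* First match wins; the last entry is a catch-all. Illustrative only. */
static const struct rule rules[] = {
    { ST_VALID,       0,              SEV_NONE,        "invalid" },
    { ST_UC | ST_PCC, ST_UC | ST_PCC, SEV_PANIC,       "context corrupt" },
    { ST_UC,          ST_UC,          SEV_UNCORRECTED, "uncorrected" },
    { 0,              0,              SEV_CORRECTED,   "corrected" },
};

/* Walk the table and return the grade of the first matching rule. */
static enum severity grade(uint64_t status, const char **msg)
{
    for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
        if ((status & rules[i].mask) == rules[i].want) {
            if (msg)
                *msg = rules[i].msg;
            return rules[i].sev;
        }
    }
    return SEV_NONE; /* unreachable while the catch-all is present */
}
```

Adding a new case then means adding a table row in the right position rather than another branch, which is the extensibility argument the message makes.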

x86, mce: remove TSC print heuristic · a0189c70

Committed by Andi Kleen on May 27, 2009

Previously mce_panic used a simple heuristic to avoid printing
old so far unreported machine check events on a mce panic. This worked
by comparing the TSC value at the start of the machine check handler
with the event time stamp and only printing newer ones.

This has a couple of issues, in particular on systems where the TSC
is not fully synchronized between CPUs it could lose events or print
old ones.

It is also problematic with full system synchronization as it is
added by the next patch.

Remove the TSC heuristic and instead replace it with a simple heuristic
to print corrected errors first and after that uncorrected errors
and finally the worst machine check as determined by the machine
check handler.

This simplifies the code because there is no need to pass the
original TSC value around.

Contains fixes from Ying Huang

[ Impact: bug fix, cleanup ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Ying Huang <ying.huang@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

a0189c70
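
The replacement ordering this commit describes (corrected errors first, then uncorrected ones, then the worst event last) can be illustrated with a small sketch. The `event` struct and `panic_print_order()` are hypothetical, and skipping the worst event in the first two passes so it appears only once is an assumption of this sketch, not something the commit message states.

```c
#include <stdbool.h>
#include <stddef.h>

struct event {
    int id;            /* stand-in for the logged record */
    bool uncorrected;  /* false means the error was corrected */
};

/*
 * Fill out[] with the events in print order and return how many were
 * written. out[] must have room for n + 1 entries, because worst may
 * point at an event that is not part of ev[].
 */
static size_t panic_print_order(const struct event *ev, size_t n,
                                const struct event *worst,
                                const struct event **out)
{
    size_t k = 0;

    for (size_t i = 0; i < n; i++)      /* pass 1: corrected errors */
        if (!ev[i].uncorrected && &ev[i] != worst)
            out[k++] = &ev[i];
    for (size_t i = 0; i < n; i++)      /* pass 2: uncorrected errors */
        if (ev[i].uncorrected && &ev[i] != worst)
            out[k++] = &ev[i];
    if (worst)                          /* last: the worst event */
        out[k++] = worst;
    return k;
}
```

No timestamp is compared anywhere, so the ordering does not depend on TSC synchronization between CPUs.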

x86, mce: log corrected errors when panicing · de8a84d8

Committed by Andi Kleen on May 27, 2009

Normally the machine check handler ignores corrected errors and leaves
them to machine_check_poll(). But when panicking, machine_check_poll()
won't run, so log all errors.

Note: this can still miss some cases until the "early no way out"
patch later is applied too.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

de8a84d8

x86, mce: extend struct mce user interface with more information. · 8ee08347

Committed by Andi Kleen on May 27, 2009

Experience has shown that struct mce, which is used to pass a machine
check to the user space daemon, currently has a few limitations.  Some
data which is useful to print at panic level is also missing.

This patch addresses most of them. The same information is also
printed out together with mce panic.

struct mce can be painlessly extended in a compatible way, the mcelog
user space code just ignores additional fields with a warning.

- It doesn't provide a wall time timestamp. There have been a few
  complaints about that. Fix that by adding a 64bit time_t

- It doesn't provide the exact CPU identification. This makes
  it awkward for mcelog to decode the event correctly, especially
  when there are variations in the supported MCE codes on different
  CPU models or when mcelog is running on a different host after a panic.
  Previously the administrator had to specify the correct CPU
  when mcelog ran on a different host, but with the more variation
  in machine checks now it's better to auto detect that.
  It's also useful for more detailed analysis of CPU events.
  Pass CPUID 1.EAX and the cpu vendor (as encoded in processor.h) instead.

- Socket ID and initial APIC ID are useful to report because they
  allow to identify the failing CPU in some (not all) cases.
  This is also especially useful for the panic situation.
  This addresses one of the complaints from Thomas Gleixner earlier.

- The MCG capabilities MSR needs to be reported for some advanced
  error processing in mcelog
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

8ee08347

x86, mce: support more than 256 CPUs in struct mce · d620c67f

Committed by Andi Kleen on May 27, 2009

The old struct mce had a limitation to 256 CPUs. But x86 Linux supports
more than that now with x2apic. Add a new field extcpu to report the
extended number.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

d620c67f

x86, mce: store record length into memory struct mce anchor · f6fb0ac0

Committed by Andi Kleen on May 27, 2009

This makes it easier for tools who want to extract the mcelog out of
crash images or memory dumps to adapt to changing struct mce size.
The length field replaces padding, so it's fully compatible.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

f6fb0ac0
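
To illustrate why a stored record length helps dump-analysis tools, here is a hedged sketch of a reader that walks records by the length found in the buffer instead of by its own compiled-in struct size. The layout (`log_header`, `known_record`) and the function `read_status()` are invented for this example and do not match the kernel's actual log anchor or struct mce.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical log anchor: the writer records its own record size. */
struct log_header {
    char signature[8];
    uint32_t count;      /* number of records that follow */
    uint32_t recordlen;  /* size of one record as written */
};

/* The leading fields this (older) reader knows how to interpret. */
struct known_record {
    uint64_t status;
    uint64_t addr;
};

/* Fetch the status of record idx; returns 0 on success, -1 on error. */
static int read_status(const unsigned char *buf, size_t buflen,
                       uint32_t idx, uint64_t *status)
{
    struct log_header h;
    struct known_record r;
    size_t off;

    if (buflen < sizeof h)
        return -1;
    memcpy(&h, buf, sizeof h);
    if (h.recordlen < sizeof r || idx >= h.count)
        return -1;
    /* Stride by the writer's record length, not by sizeof r. */
    off = sizeof h + (size_t)idx * h.recordlen;
    if (off + sizeof r > buflen)
        return -1;
    memcpy(&r, buf + off, sizeof r);
    *status = r.status;
    return 0;
}
```

A reader built this way keeps working when the writer appends fields to each record, as long as the fields it knows about stay at the front.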

x86, mce: add MCE poll count to /proc/interrupts · ca84f696

Committed by Andi Kleen on May 27, 2009

Keep a count of the machine check polls (or CMCI events) in
/proc/interrupts.

Andi needs this for debugging, but it's also useful in general
to see what's going on in the kernel.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

ca84f696

x86, mce: add machine check exception count in /proc/interrupts · 01ca79f1

Committed by Andi Kleen on May 27, 2009

Useful for debugging, but it's also good general policy
to have a counter for all special interrupts there. This makes it easier
to diagnose where a CPU is spending its time.

[ Impact: feature, debugging tool ]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

01ca79f1

02 Jun, 2009 3 commits

Merge branch 'irq/numa' into x86/mce3 · 48b1fddb

Committed by H. Peter Anvin on Jun 01, 2009

Merge reason: arch/x86/kernel/irqinit_{32,64}.c unified in irq/numa
and modified in x86/mce3; this merge resolves the conflict.

Conflicts:
	arch/x86/kernel/irqinit.c
Signed-off-by: H. Peter Anvin <hpa@zytor.com>

48b1fddb

Merge branch 'x86/cpufeature' into irq/numa · ee4c24a5

Committed by Ingo Molnar on Jun 01, 2009

Merge reason: irq/numa didn't build because this commit:

  2759c328: x86: don't call read_apic_id if !cpu_has_apic

had a dependency on x86/cpufeature changes. Pull in that
(small) branch to fix the dependency.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

ee4c24a5

Merge branch 'linus' into irq/numa · 3d58f48b

Committed by Ingo Molnar on Jun 01, 2009

Conflicts:
	arch/mips/sibyte/bcm1480/irq.c
	arch/mips/sibyte/sb1250/irq.c

Merge reason: we gathered a few conflicts plus update to latest upstream fixes.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

3d58f48b

01 Jun, 2009 10 commits

Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging · d9244b5d

Committed by Linus Torvalds on Jun 01, 2009

* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
  hwmon: Update documentation on fan_max
  hwmon: (lm78) Add missing __devexit_p()

d9244b5d

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6 · 65039a31

Committed by Linus Torvalds on Jun 01, 2009

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
  sparc64: Fix section attribute warnings.
  sparc64: Fix SET_PERSONALITY to not clip bits outside of PER_MASK.

65039a31

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · 6e429101

Committed by Linus Torvalds on Jun 01, 2009

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  3c509: Add missing EISA IDs
  MAINTAINERS: take maintainership of the cpmac Ethernet driver
  net/firmare: Ignore .cis files
  ath1e: add new device id for asus hardware
  mlx4_en: Fix a kernel panic when waking tx queue
  rtl8187: add USB ID for Linksys WUSB54GC-EU v2 USB wifi dongle
  at76c50x-usb: avoid mutex deadlock in at76_dwork_hw_scan
  mac8390: fix build with NET_POLL_CONTROLLER
  cxgb3: link fault fixes
  cxgb3: fix dma mapping regression
  netfilter: nfnetlink_log: fix wrong skbuff size	calculation
  netfilter: xt_hashlimit does a wrong SEQ_SKIP
  bfin_mac: fix build error due to net_device_ops convert
  atlx: move modinfo data from atlx.h to atl1.c
  gianfar: fix babbling rx error event bug
  cls_cgroup: read classid atomically in classifier
  netfilter: nf_ct_dccp: add missing DCCP protocol changes in event cache
  netfilter: nf_ct_tcp: fix accepting invalid RST segments

6e429101

Merge git://git.kernel.org/pub/scm/linux/kernel/git/jaswinder/headers-check-2.6 · c4e51e46

Committed by Linus Torvalds on Jun 01, 2009

* git://git.kernel.org/pub/scm/linux/kernel/git/jaswinder/headers-check-2.6:
  headers_check fix: linux/net_dropmon.h
  headers_check fix: linux/auto_fs.h

c4e51e46

hwmon: Update documentation on fan_max · d54d4624

Committed by Christian Engelmayer on Jun 01, 2009

Add fan_max description.

Add fan limit alarm 'max_alarm' to the alarm section.
Signed-off-by: Christian Engelmayer <christian.engelmayer@frequentis.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>

d54d4624

hwmon: (lm78) Add missing __devexit_p() · 39d8bbed

Committed by Mike Frysinger on Jun 01, 2009

The remove function uses __devexit, so the .remove assignment needs
__devexit_p() to fix a build error with hotplug disabled.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>

39d8bbed
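
The build error this commit fixes comes from referencing a function that is discarded when hotplug is disabled. A rough user-space model of the wrapper idea is shown below; `DEMO_HOTPLUG`, `DEMO_DEVEXIT_P`, `demo_driver` and `demo_remove` are stand-ins invented for this sketch, not the kernel's definitions, and the section discarding itself is not modelled.

```c
#include <stddef.h>

/*
 * Define DEMO_HOTPLUG to model a build with hotplug support. Without
 * it, the wrapper swaps the function pointer for NULL so that nothing
 * refers to a remove function that would have been thrown away.
 */
#ifdef DEMO_HOTPLUG
#define DEMO_DEVEXIT_P(fn) (fn)
#else
#define DEMO_DEVEXIT_P(fn) NULL
#endif

struct demo_driver {
    const char *name;
    int (*remove)(void *dev);
};

int demo_remove(void *dev);

/* Stand-in for a driver's remove routine. */
int demo_remove(void *dev)
{
    (void)dev;
    return 0;
}

static const struct demo_driver demo = {
    .name   = "demo",
    .remove = DEMO_DEVEXIT_P(demo_remove), /* wrapped, never a bare name */
};
```

Assigning the bare function name instead of the wrapped one is what leaves a dangling reference in the non-hotplug configuration.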

3c509: Add missing EISA IDs · cf9f6e21

Committed by Maciej W. Rozycki on Jun 01, 2009

Several EISA device IDs for 3c509 family network cards are missing from 
the driver, making the cards unusable in their EISA mode.  Here's a fix to 
add them based on the EISA configuration files distributed by 3Com and our 
eisa.ids database.
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

cf9f6e21

MAINTAINERS: take maintainership of the cpmac Ethernet driver · 4371ee35

Committed by Florian Fainelli on Jun 01, 2009

This patch adds me as the maintainer of the CPMAC (AR7)
Ethernet driver.
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

4371ee35

headers_check fix: linux/net_dropmon.h · d280cc98

Committed by Jaswinder Singh Rajput on Jun 01, 2009

fix the following 'make headers_check' warnings:

usr/include/linux/net_dropmon.h:7: found __[us]{8,16,32,64} type without #include <linux/types.h>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>

d280cc98

headers_check fix: linux/auto_fs.h · 52bb25a6

Committed by Jaswinder Singh Rajput on Jun 01, 2009

fix the following 'make headers_check' warnings:

usr/include/linux/auto_fs.h:17: include of <linux/types.h> is preferred over <asm/types.h>
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>

52bb25a6

31 May, 2009 2 commits

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6 · 700d4558

Committed by Linus Torvalds on May 30, 2009

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6:
  ide_pci_generic: add quirk for Netcell ATA RAID

700d4558

ide_pci_generic: add quirk for Netcell ATA RAID · c339dfdd

Committed by Bartlomiej Zolnierkiewicz on May 30, 2009

We need to explicitly mark words 85-87 as valid ones since
firmware doesn't do it.

This should fix support for LBA48 and FLUSH CACHE [EXT] command
which stopped working after we applied more strict checking of
identify words in:

	commit 942dcd85
	("ide: idedisk_supports_lba48() -> ata_id_lba48_enabled()")

and

	commit 4b58f17d
	("ide: ide_id_has_flush_cache() -> ata_id_flush_enabled()")
Reported-and-tested-by: "Trevor Hemsley" <trevor.hemsley@ntlworld.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>

c339dfdd

30 May, 2009 14 commits

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 · b4566ac5

Committed by Linus Torvalds on May 30, 2009

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
  nilfs2: fix bh leak in nilfs_cpfile_delete_checkpoints function

b4566ac5

Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 · 3b798a52

Committed by Linus Torvalds on May 30, 2009

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
  ACPI, i915: build fix (v2)
  acpi-cpufreq: fix printk typo and indentation
  ACPI processor: remove spurious newline from warning message
  drm/i915: acpi/video.c fix section mismatch warning
  ACPI: video: DMI workaround broken Acer 5315 BIOS enabling display brightness
  ACPI: video: DMI workaround broken eMachines E510 BIOS enabling display brightness
  ACPI: sanity check _PSS frequency to prevent cpufreq crash
  i7300_idle: allow testing on i5000-series hardware w/o re-compile
  PCI/ACPI: fix wrong ref count handling in acpi_pci_bind()
  cpuidle: fix AMD C1E suspend hang
  cpuidle: makes AMD C1E work in acpi_idle

3b798a52

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx · 228b60ac

Committed by Linus Torvalds on May 30, 2009

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
  fsldma: Fix compile warnings
  fsldma: fix memory leak on error path in fsl_dma_prep_memcpy()
  fsldma: snooping is not enabled for last entry in descriptor chain
  fsldma: fix infinite loop on multi-descriptor DMA chain completion
  fsldma: fix "DMA halt timeout!" errors
  fsldma: fix check on potential fdev->chan[] overflow
  fsldma: update mailling list address in MAINTAINERS

228b60ac

nilfs2: fix bh leak in nilfs_cpfile_delete_checkpoints function · 62013ab5

Committed by Ryusuke Konishi on May 30, 2009

The nilfs_cpfile_delete_checkpoints() wrongly skips brelse() for the
header block of checkpoint file in case of errors.  This fixes the
leak bug.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

62013ab5

net/firmare: Ignore .cis files · cf4ae4e3

Committed by Matt Kraai on May 29, 2009

Signed-off-by: Matt Kraai <kraai@ftbfs.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

cf4ae4e3

ath1e: add new device id for asus hardware · bdb0e010

Committed by Greg Kroah-Hartman on May 29, 2009

Gary Lin reports that a new device id needs to be added to the atl1e in
order to get some new Asus hardware to work properly.
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

bdb0e010

mlx4_en: Fix a kernel panic when waking tx queue · 465440d2

Committed by Yevgeny Petrilin on May 25, 2009

When the transmit queue gets full we enable interrupts for TX completions.
There was a race in which we handled the TX queue both from the interrupt
context and from the transmit function. Using "spin_trylock_irq()" ensures
this doesn't happen.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

465440d2

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 · e8573758

Committed by David S. Miller on May 29, 2009

e8573758

Merge branches 'bugzilla-13121+', 'bugzilla-13233', 'redhat-bugzilla-500311',... · 6afec830

Committed by Len Brown on May 29, 2009

Merge branches 'bugzilla-13121+', 'bugzilla-13233', 'redhat-bugzilla-500311', 'pci-bind-oops', 'misc-2.6.30' and 'i7300_idle' into release

6afec830

ACPI, i915: build fix (v2) · 31db5645

Committed by Len Brown on May 29, 2009

drivers/built-in.o: In function `intel_opregion_init':
(.text+0x9d540): undefined reference to `acpi_video_register'

v2: move under DRM_I915 from DRM_I915_KMS
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>

31db5645

acpi-cpufreq: fix printk typo and indentation · 61c8c67e

Committed by Joe Perches on May 26, 2009

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Len Brown <len.brown@intel.com>

61c8c67e

ACPI processor: remove spurious newline from warning message · 21671b88

Committed by Frans Pop on May 22, 2009

Commit 4973b22a ("ACPI processor: reset the throttling state once it's
invalid") introduced a new warning which prints a spurious newline.

The ACPI_WARNING macro that is used already takes care of adding a
newline, after adding ACPI_CA_VERSION to the message. Remove the newline
to avoid the message getting split into two lines.
Signed-off-by: Frans Pop <elendil@planet.nl>
Signed-off-by: Len Brown <len.brown@intel.com>

21671b88

drm/i915: acpi/video.c fix section mismatch warning · 1fc8d33a

Committed by Jaswinder Singh Rajput on May 20, 2009

Currently acpi_video_exit() is exported as well as using __exit which causes:

WARNING: drivers/acpi/video.o(__ksymtab+0x0): Section mismatch in reference from the variable __ksymtab_acpi_video_exit to the function .exit.text:acpi_video_exit()
The symbol acpi_video_exit is exported and annotated __exit
Fix this by removing the __exit annotation of acpi_video_exit or drop the export.
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Len Brown <len.brown@intel.com>

1fc8d33a

ACPI: video: DMI workaround broken Acer 5315 BIOS enabling display brightness · 93bcece2

Committed by Zhang Rui on May 19, 2009

http://bugzilla.kernel.org/show_bug.cgi?id=13121
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

93bcece2
