1. 25 Feb 2009 (5 commits)
  2. 24 Feb 2009 (1 commit)
  3. 23 Feb 2009 (3 commits)
  4. 22 Feb 2009 (2 commits)
  5. 21 Feb 2009 (4 commits)
  6. 20 Feb 2009 (14 commits)
    • x86: use the right protections for split-up pagetables · 07a66d7c
      Committed by Ingo Molnar
      Steven Rostedt found a bug where, in his modified kernel, ftrace
      was unable to modify the kernel text because the PMD itself had
      also been marked read-only in split_large_page().
      
      The fix, suggested by Linus, is to not try to 'clone' the
      reference protection of a huge-page, but to use the standard
      (and permissive) page protection bits of KERNPG_TABLE.
      
      The 'cloning' makes sense for the ptes, but it is a confused and
      incorrect concept at the page table level, because the pagetable
      entry covers a whole set of ptes and hence cannot 'clone' any single
      protection attribute; the ptes can be any mixture of protections.
      
      With the permissive KERNPG_TABLE, even if the pte protections
      get changed after this point (due to ftrace doing code-patching
      or other similar activities like kprobes), the resulting combined
      protections will still be correct and the pte's restrictive
      (or permissive) protections will control it.
      
      Also update the comment.
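
      For illustration, a minimal sketch of the fix described above
      (simplified; the exact upstream code may differ slightly):

          /* The ptes keep the huge page's original protections... */
          ref_prot = pte_pgprot(pte_clrhuge(*kpte));
          for (i = 0; i < PTRS_PER_PTE; i++)
                  set_pte(&pbase[i], pfn_pte(pfn + i, ref_prot));

          /*
           * ...but the page-table entry itself gets the standard,
           * permissive kernel protections; the ptes set above control
           * the primary protection behavior:
           */
          __set_pmd_pte(kpte, address, mk_pte(base, __pgprot(_KERNPG_TABLE)));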
      
      This bug was there for a long time but has not caused visible
      problems before as it needs a rather large read-only area to
      trigger. Steve possibly hacked his kernel with some really
      large arrays or so. Anyway, the bug is definitely worth fixing.
      
      [ Huang Ying also experienced problems in this area when writing
        the EFI code, but the real bug in split_large_page() was not
        realized back then. ]
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Reported-by: Huang Ying <ying.huang@intel.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, vmi: TSC going backwards check in vmi clocksource · 48ffc70b
      Committed by Alok N Kataria
      Impact: fix time warps under vmware
      
      Similar to the check for the TSC going backwards in the TSC
      clocksource, we also need this check for the VMI clocksource.
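
      A hedged sketch of the kind of guard involved, mirroring the TSC
      clocksource check (identifiers such as clocksource_vmi are assumed
      from context):

          static cycle_t read_real_cycles(void)
          {
                  cycle_t ret = (cycle_t)vmi_timer_ops.get_cycle_counter(VMI_CYCLES_REAL);

                  /* Never report a value behind the last cycle seen. */
                  return max(ret, clocksource_vmi.cycle_last);
          }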
      Signed-off-by: Alok N Kataria <akataria@vmware.com>
      Cc: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: stable@kernel.org
    • x86, mce: use %ll instead of %L for 64-bit numbers · f6d1826d
      Committed by H. Peter Anvin
      Impact: Cleanup
      
      The standard spelling of a printf pattern for long long is "ll", not
      "L", which is for long double.
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, mce: separate correct machine check poller and fatal exception handler · b79109c3
      Committed by Andi Kleen
      Impact: cleanup, performance enhancement
      
      The machine check poller is diverging more and more from the fatal
      exception handler. Instead of adding more special cases, separate the
      code paths completely. The corrected-error poll path is actually quite
      simple, and this doesn't result in much code duplication.
      
      This makes both handlers much easier to read and results in
      cleaner code flow.  The exception handler now only needs to care
      about uncorrected errors, which also simplifies the handling of multiple
      errors. The corrected-error poller now always runs in standard interrupt
      context and does not need to do anything special to handle NMI context.
      
      Minor behaviour changes:
      - MCG status is now not cleared on polling.
      - Only the banks which had corrected errors get cleared on polling.
      - The exception handler now only clears banks with errors.
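
      A hedged sketch of the resulting corrected-error poll loop (heavily
      simplified; identifiers approximate the mce code of this era):

          static void machine_check_poll(void)
          {
                  struct mce m;
                  int i;

                  for (i = 0; i < banks; i++) {
                          rdmsrl(MSR_IA32_MC0_STATUS + i * 4, m.status);
                          if (!(m.status & MCI_STATUS_VAL))
                                  continue;
                          /* Uncorrected errors are left to the exception handler. */
                          if (m.status & MCI_STATUS_UC)
                                  continue;
                          mce_log(&m);
                          /* Clear only the banks we actually handled. */
                          wrmsrl(MSR_IA32_MC0_STATUS + i * 4, 0);
                  }
                  /* Note: MCG status is deliberately not cleared here. */
          }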
      
      v2: Forward port to new patch order. Add "uc" argument.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, mce: factor out duplicated struct mce setup into one function · b5f2fa4e
      Committed by Andi Kleen
      Impact: cleanup
      
      This merely factors out the duplicated code that sets up the
      initial struct mce state into a single function.
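
      A plausible shape for such a helper (hedged; the exact fields set may
      differ):

          void mce_setup(struct mce *m)
          {
                  memset(m, 0, sizeof(struct mce));
                  m->cpu = smp_processor_id();
                  rdtscll(m->tsc);
          }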
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, mce: implement dynamic machine check banks support · 0d7482e3
      Committed by Andi Kleen
      Impact: cleanup; makes the code future-proof; saves memory on small systems
      
      This patch replaces the hardcoded max number of machine check banks with 
      dynamic allocation depending on what the CPU reports. The sysfs
      data structures and the banks array are dynamically allocated.
      
      There is still a hard bank limit (128) because the mcelog protocol uses
      banks >= 128 as pseudo banks to escape other events. But we expect
      that 128 banks is beyond any reasonable CPU for now.
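
      A hedged sketch of the detection step (the low 8 bits of MCG_CAP
      report the bank count; MAX_NR_BANKS and the bank array name are
      assumptions):

          u64 cap;

          rdmsrl(MSR_IA32_MCG_CAP, cap);
          banks = cap & 0xff;                /* CPU-reported bank count */
          if (banks > MAX_NR_BANKS)          /* mcelog reserves banks >= 128 */
                  banks = MAX_NR_BANKS;
          bank = kmalloc(banks * sizeof(u64), GFP_KERNEL);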
      
      This supersedes an earlier patch by Venki, but it solves the problem
      more completely by making the limit fully dynamic (up to the 128
      boundary).
      
      This saves some memory on machines with fewer than 6 banks, because
      they won't need sysdevs for unused banks, and it also makes it
      possible to use sysfs to control these banks on possible future CPUs
      with more than 6 banks.
      
      This is an updated patch addressing Venki's comments. I also folded in
      another patch from Thomas which fixed the error allocation path (that
      patch was previously separate).
      
      Cc: Venki Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, mce: enable machine checks in 64-bit defconfig · e35849e9
      Committed by Andi Kleen
      Impact: Low priority fix
      
      The 32-bit defconfig already had it enabled, and it's a pretty
      fundamental feature, so enable it on 64-bit too.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • [IA64] xen_domu build fix · ec8148de
      Committed by Tony Luck
      arch/ia64/xen/xen_pv_ops.c:156: error: xen_init_ops causes a section type conflict
      arch/ia64/xen/xen_pv_ops.c:340: error: xen_iosapic_ops causes a section type conflict
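
      For context, this error typically appears when const and non-const
      objects end up forced into the same section; an illustrative (not the
      actual) before/after:

          /* Before: a const object annotated for the writable .init.data
           * section => "causes a section type conflict" */
          static const struct pv_init_ops xen_init_ops __initdata = { /* ... */ };

          /* After: const init data belongs in __initconst instead */
          static const struct pv_init_ops xen_init_ops __initconst = { /* ... */ };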
      Signed-off-by: Tony Luck <tony.luck@intel.com>
    • [IA64] fixes configs and add default config for ia64 xen domU · 1d5b20f4
      Committed by Isaku Yamahata
      This patch fixes the xen-related Kconfigs and adds a default config
      file for ia64 xen domU.
      Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
      Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
    • [IA64] Remove redundant cpu_clear() in __cpu_disable path · c0acdea2
      Committed by Alex Chiang
      The second call to cpu_clear() is redundant, as we've already removed
      the CPU from cpu_online_map before calling migrate_platform_irqs().
      Signed-off-by: Alex Chiang <achiang@hp.com>
      Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
    • [IA64] Revert "prevent ia64 from invoking irq handlers on offline CPUs" · 66db2e63
      Committed by Alex Chiang
      This reverts commit e7b14036.
      
      Commit e7b14036 removes the targeted disabled CPU from the
      cpu_online_map after calls to migrate_platform_irqs and fixup_irqs.
      
      Paul McKenney states that the reasoning behind the patch was to
      prevent irq handlers from running on CPUs marked offline because:
      
      	RCU happily ignores CPUs that don't have their bits set in
      	cpu_online_map, so if there are RCU read-side critical sections
      	in the irq handlers being run, RCU will ignore them.  If the
      	other CPUs were running, they might sequence through the RCU
      	state machine, which could result in data structures being
      	yanked out from under those irq handlers, which in turn could
      	result in oopses or worse.
      
      Unfortunately, both ia64 functions above look at cpu_online_map to find
      a new CPU to migrate interrupts onto. This means we can potentially
      migrate an interrupt off ourself back to... ourself. Uh oh.
      
      This causes an oops when we finally try to process pending interrupts on
      the CPU we want to disable. The oops results from calling __do_IRQ with
      a NULL pt_regs:
      
      Unable to handle kernel NULL pointer dereference (address 0000000000000040)
      Call Trace:
       [<a000000100016930>] show_stack+0x50/0xa0
                                      sp=e0000009c922fa00 bsp=e0000009c92214d0
       [<a0000001000171a0>] show_regs+0x820/0x860
                                      sp=e0000009c922fbd0 bsp=e0000009c9221478
       [<a00000010003c700>] die+0x1a0/0x2e0
                                      sp=e0000009c922fbd0 bsp=e0000009c9221438
       [<a0000001006e92f0>] ia64_do_page_fault+0x950/0xa80
                                      sp=e0000009c922fbd0 bsp=e0000009c92213d8
       [<a00000010000c7a0>] ia64_native_leave_kernel+0x0/0x270
                                      sp=e0000009c922fc60 bsp=e0000009c92213d8
       [<a0000001000ecdb0>] profile_tick+0xd0/0x1c0
                                      sp=e0000009c922fe30 bsp=e0000009c9221398
       [<a00000010003bb90>] timer_interrupt+0x170/0x3e0
                                      sp=e0000009c922fe30 bsp=e0000009c9221330
       [<a00000010013a800>] handle_IRQ_event+0x80/0x120
                                      sp=e0000009c922fe30 bsp=e0000009c92212f8
       [<a00000010013aa00>] __do_IRQ+0x160/0x4a0
                                      sp=e0000009c922fe30 bsp=e0000009c9221290
       [<a000000100012290>] ia64_process_pending_intr+0x2b0/0x360
                                      sp=e0000009c922fe30 bsp=e0000009c9221208
       [<a0000001000112d0>] fixup_irqs+0xf0/0x2a0
                                      sp=e0000009c922fe30 bsp=e0000009c92211a8
       [<a00000010005bd80>] __cpu_disable+0x140/0x240
                                      sp=e0000009c922fe30 bsp=e0000009c9221168
       [<a0000001006c5870>] take_cpu_down+0x50/0xa0
                                      sp=e0000009c922fe30 bsp=e0000009c9221148
       [<a000000100122610>] stop_cpu+0xd0/0x200
                                      sp=e0000009c922fe30 bsp=e0000009c92210f0
       [<a0000001000e0440>] kthread+0xc0/0x140
                                      sp=e0000009c922fe30 bsp=e0000009c92210c8
       [<a000000100014ab0>] kernel_thread_helper+0xd0/0x100
                                      sp=e0000009c922fe30 bsp=e0000009c92210a0
       [<a00000010000a4c0>] start_kernel_thread+0x20/0x40
                                      sp=e0000009c922fe30 bsp=e0000009c92210a0
      
      I don't like this revert because it is fragile. ia64 is getting lucky
      because we seem to only ever process timer interrupts in this path, but
      if we ever race with an IPI here, we definitely use RCU and have the
      potential of hitting an oops that Paul describes above.
      
      Patching ia64's timer_interrupt() to check for NULL pt_regs is
      insufficient though, as we still hit the above oops.
      
      As a short-term solution, I do think that this revert is the right
      answer. The revert held up under repeated testing (24+ hour test runs)
      with this setup:
      
      	- 8-way rx6600
      	- randomly toggling CPU online/offline state every 2 seconds
      	- running CPU exercisers, memory hog, disk exercisers, and
      	  network stressors
      	- average system load around ~160
      
      In the long term, we really need to figure out why we set pt_regs = NULL
      in ia64_process_pending_intr(). If it turns out that it is unnecessary
      to do so, then we could safely re-introduce e7b14036 (along with some
      other logic to be smarter about migrating interrupts).
      
      One final note: x86 also removes the disabled CPU from cpu_online_map
      and then re-enables interrupts for 1ms, presumably to handle any pending
      interrupts:
      
      arch/x86/kernel/irq_32.c (and irq_64.c):
      cpu_disable_common:
      	[remove cpu from cpu_online_map]
      
      	fixup_irqs():
      		for_each_irq:
      			[break CPU affinities]
      
      		local_irq_enable();
      		mdelay(1);
      		local_irq_disable();
      
      So they are doing implicitly what ia64 is doing explicitly.
      Signed-off-by: Alex Chiang <achiang@hp.com>
      Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
    • [IA64] bte_copy of BTE_MAX_XFER trips BUG_ON. · 39d481cb
      Committed by Robin Holt
      BTE_MAX_XFER is wrong.  It is one greater than the number of cache
      lines the BTE is actually able to transfer.  If you request a transfer
      of exactly BTE_MAX_XFER size, you trip a very cryptic BUG_ON() which
      should certainly be made clearer.
      
      This patch fixes that constant and also cleans up the BUG_ON()s in
      arch/ia64/sn/kernel/bte.c to test one condition per line.
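
      A hedged sketch of the shape of the fix (constants and macros
      approximate):

          /* Express the limit as the number of cache lines the BTE can
           * actually move, not one more than that: */
          #define BTE_MAX_XFER    (BTE_LEN_MASK << L1_CACHE_SHIFT)

          /* One condition per BUG_ON, so a trip points at the culprit: */
          BUG_ON(src & L1_CACHE_MASK);
          BUG_ON(dest & L1_CACHE_MASK);
          BUG_ON(len & L1_CACHE_MASK);
          BUG_ON(len > BTE_MAX_XFER);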
      Signed-off-by: Robin Holt <holt@sgi.com>
      Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
    • [IA64] Build fix for __early_pfn_to_nid() undefined link error · 334f85b6
      Committed by Tony Luck
      ia64 only defines __early_pfn_to_nid() for SPARSEMEM && NUMA configurations,
      so the recent:
      
      	commit: f2dbcfa7
      	mm: clean up for early_pfn_to_nid()
      
      ends up with link errors for certain configurations.
      
      Fix arch/ia64/Kconfig to only define HAVE_ARCH_EARLY_PFN_TO_NID in the
      cases where we do provide this function.
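
      A hedged sketch of the Kconfig change (exact syntax may differ):

          config HAVE_ARCH_EARLY_PFN_TO_NID
                  def_bool NUMA && SPARSEMEM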
      Signed-off-by: Tony Luck <tony.luck@intel.com>
    • [ARM] 5405/1: ep93xx: remove unused gesbc9312.h header · 9dd446f6
      Committed by Hartley Sweeten
      Remove the gesbc9312.h header since it is unused.
      Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
  7. 19 Feb 2009 (7 commits)
  8. 18 Feb 2009 (4 commits)
    • x86, mce: fix a race condition in mce_read() · ef41df43
      Committed by Huang Ying
      Impact: bugfix
      
      Consider the following situation:
      
      before: mcelog.next == 1, mcelog.entry[0].finished = 1
      
      +--------------------------------------------------------------------------
      R                   W1                  W2                  W3
      
      read mcelog.next (1)
                          mcelog.next++ (2)
                          (working on entry 1,
                          finished == 0)
      
      mcelog.next = 0
                                              mcelog.next++ (1)
                                              (working on entry 0)
                                                                 mcelog.next++ (2)
                                                                 (working on entry 1)
                              <----------------- race ---------------->
                          (done on entry 1,
                          finished = 1)
                                                                 (done on entry 1,
                                                                 finished = 1)
      
      To fix the race condition, a cmpxchg loop is added to mce_read() to
      ensure that no new MCE record can be added between the read of
      mcelog.next and the reset of mcelog.next to 0.
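
      A hedged sketch of the loop (simplified from the description above):

          unsigned int prev = 0, next = mcelog.next;

          do {
                  /* ... copy entries [prev, next) out to the reader ... */
                  prev = next;
                  /* Reset the log only if no new record was appended in
                   * the meantime; otherwise loop again to pick them up. */
                  next = cmpxchg(&mcelog.next, prev, 0);
          } while (next != prev);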
      Signed-off-by: Huang Ying <ying.huang@intel.com>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: disable machine checks on offlined CPUs · d6b75584
      Committed by Andi Kleen
      Impact: Lower priority bug fix
      
      Offlined CPUs could still get machine checks, but the machine check handler
      cannot handle them properly, leading to an unconditional crash. Disable
      machine checks on CPUs that are going down.
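
      A plausible shape for the hotplug hook (hedged; the actual mechanism
      may differ):

          /* Runs on the dying CPU from the hotplug notifier path. */
          static void mce_disable_cpu(void *h)
          {
                  int i;

                  if (!mce_available(&current_cpu_data))
                          return;
                  for (i = 0; i < banks; i++)
                          wrmsrl(MSR_IA32_MC0_CTL + i * 4, 0);
          }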
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: don't set up mce sysdev devices with mce=off · 5b4408fd
      Committed by Andi Kleen
      Impact: bug fix; in this case the resume handler shouldn't run, which
      	avoids incorrectly re-enabling machine checks on resume
      
      When MCEs are completely disabled on the command line, don't set
      up the sysdev devices for them either.
      
      Includes a comment fix from Thomas Gleixner.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86, mce: switch machine check polling to per CPU timer · 52d168e2
      Committed by Andi Kleen
      Impact: Higher priority bug fix
      
      The machine check poller runs a single timer and then broadcasts an
      IPI to all CPUs to check them. This leads to unnecessary
      synchronization between CPUs. The original CPU running the timer has
      to wait potentially a long time for all other CPUs to answer. This is
      also real-time unfriendly and in general inefficient.
      
      This was especially a problem on systems with a lot of events, where
      the poller runs at a higher frequency after processing some events.
      More and more CPU time could be wasted on this, to the point of
      significantly slowing down machines.
      
      The machine check polling is actually fully independent per CPU, so
      there's no reason not to just do this all with per-CPU timers.  This
      patch implements that.
      
      Also switch the poller to use standard timers instead of work
      queues. It was using work queues to be able to execute a user program
      on an event, but mce_notify_user() now handles this case with a
      separate callback. So instead always run the poll code in a
      standard per-CPU timer, which means that in the common case of not
      having to execute a trigger there will be less overhead.
      
      This allows the initialization to be cleaned up significantly, because
      standard timers are already up when machine checks get initialized.  No
      multiple initialization functions are needed.
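
      A hedged sketch of the per-CPU timer (simplified; identifiers
      approximate, and machine_check_poll() stands in for the real poll
      routine):

          static DEFINE_PER_CPU(struct timer_list, mce_timer);

          static void mcheck_timer(unsigned long data)
          {
                  struct timer_list *t = &per_cpu(mce_timer, data);

                  WARN_ON(smp_processor_id() != data);
                  machine_check_poll();   /* poll only this CPU's banks */

                  /* Re-arm with the (possibly adapted) interval. */
                  t->expires = jiffies + check_interval * HZ;
                  add_timer(t);
          }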
      
      Thanks to Thomas Gleixner for some help.
      
      Cc: thockin@google.com
      v2: Use del_timer_sync() on cpu shutdown and don't try to handle
      migrated timers.
      v3: Add WARN_ON for timer running on unexpected CPU
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>