1. 20 February 2009 (5 commits)
    • [IA64] fixes configs and add default config for ia64 xen domU · 1d5b20f4
      Isaku Yamahata authored
      This patch fixes the Xen-related Kconfigs and adds a default config
      file for ia64 Xen domU.
      Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
      Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
    • [IA64] Remove redundant cpu_clear() in __cpu_disable path · c0acdea2
      Alex Chiang authored
      The second call to cpu_clear() is redundant, as we've already removed
      the CPU from cpu_online_map before calling migrate_platform_irqs().
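
      In sketch form (a simplified view of the __cpu_disable path, not the
      actual arch/ia64 code), the ordering that makes the second call
      pointless looks like this:

      	int __cpu_disable(void)
      	{
      		unsigned int cpu = smp_processor_id();

      		/* The CPU is taken out of cpu_online_map up front... */
      		cpu_clear(cpu, cpu_online_map);

      		/* ...so these already see it as offline, and a second
      		 * cpu_clear() after them would change nothing. */
      		migrate_platform_irqs(cpu);
      		fixup_irqs();

      		return 0;
      	}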
      Signed-off-by: Alex Chiang <achiang@hp.com>
      Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
    • [IA64] Revert "prevent ia64 from invoking irq handlers on offline CPUs" · 66db2e63
      Alex Chiang authored
      This reverts commit e7b14036.
      
      Commit e7b14036 removes the targeted disabled CPU from the
      cpu_online_map after the calls to migrate_platform_irqs() and fixup_irqs().
      
      Paul McKenney states that the reasoning behind the patch was to
      prevent irq handlers from running on CPUs marked offline because:
      
      	RCU happily ignores CPUs that don't have their bits set in
      	cpu_online_map, so if there are RCU read-side critical sections
      	in the irq handlers being run, RCU will ignore them.  If the
      	other CPUs were running, they might sequence through the RCU
      	state machine, which could result in data structures being
      	yanked out from under those irq handlers, which in turn could
      	result in oopses or worse.
      
      Unfortunately, both ia64 functions above look at cpu_online_map to find
      a new CPU to migrate interrupts onto. This means we can potentially
      migrate an interrupt off ourself back to... ourself. Uh oh.
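
      A sketch of that failure mode (pick_new_target() is a hypothetical
      stand-in for the target selection done inside migrate_platform_irqs()
      and fixup_irqs()):

      	/* With e7b14036 applied, the dying CPU has not yet been cleared
      	 * from cpu_online_map when this runs... */
      	static unsigned int pick_new_target(unsigned int dying_cpu)
      	{
      		/* ...so a scan of cpu_online_map can happily return
      		 * dying_cpu itself as the new interrupt target. */
      		return first_cpu(cpu_online_map);
      	}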
      
      This causes an oops when we finally try to process pending interrupts on
      the CPU we want to disable. The oops results from calling __do_IRQ with
      a NULL pt_regs:
      
      Unable to handle kernel NULL pointer dereference (address 0000000000000040)
      Call Trace:
       [<a000000100016930>] show_stack+0x50/0xa0
                                      sp=e0000009c922fa00 bsp=e0000009c92214d0
       [<a0000001000171a0>] show_regs+0x820/0x860
                                      sp=e0000009c922fbd0 bsp=e0000009c9221478
       [<a00000010003c700>] die+0x1a0/0x2e0
                                      sp=e0000009c922fbd0 bsp=e0000009c9221438
       [<a0000001006e92f0>] ia64_do_page_fault+0x950/0xa80
                                      sp=e0000009c922fbd0 bsp=e0000009c92213d8
       [<a00000010000c7a0>] ia64_native_leave_kernel+0x0/0x270
                                      sp=e0000009c922fc60 bsp=e0000009c92213d8
       [<a0000001000ecdb0>] profile_tick+0xd0/0x1c0
                                      sp=e0000009c922fe30 bsp=e0000009c9221398
       [<a00000010003bb90>] timer_interrupt+0x170/0x3e0
                                      sp=e0000009c922fe30 bsp=e0000009c9221330
       [<a00000010013a800>] handle_IRQ_event+0x80/0x120
                                      sp=e0000009c922fe30 bsp=e0000009c92212f8
       [<a00000010013aa00>] __do_IRQ+0x160/0x4a0
                                      sp=e0000009c922fe30 bsp=e0000009c9221290
       [<a000000100012290>] ia64_process_pending_intr+0x2b0/0x360
                                      sp=e0000009c922fe30 bsp=e0000009c9221208
       [<a0000001000112d0>] fixup_irqs+0xf0/0x2a0
                                      sp=e0000009c922fe30 bsp=e0000009c92211a8
       [<a00000010005bd80>] __cpu_disable+0x140/0x240
                                      sp=e0000009c922fe30 bsp=e0000009c9221168
       [<a0000001006c5870>] take_cpu_down+0x50/0xa0
                                      sp=e0000009c922fe30 bsp=e0000009c9221148
       [<a000000100122610>] stop_cpu+0xd0/0x200
                                      sp=e0000009c922fe30 bsp=e0000009c92210f0
       [<a0000001000e0440>] kthread+0xc0/0x140
                                      sp=e0000009c922fe30 bsp=e0000009c92210c8
       [<a000000100014ab0>] kernel_thread_helper+0xd0/0x100
                                      sp=e0000009c922fe30 bsp=e0000009c92210a0
       [<a00000010000a4c0>] start_kernel_thread+0x20/0x40
                                      sp=e0000009c922fe30 bsp=e0000009c92210a0
      
      I don't like this revert because it is fragile. ia64 is getting lucky
      because we seem to only ever process timer interrupts in this path, but
      if we ever race with an IPI here, we definitely use RCU and have the
      potential of hitting the kind of oops Paul describes above.
      
      Patching ia64's timer_interrupt() to check for NULL pt_regs is
      insufficient though, as we still hit the above oops.
      
      As a short-term solution, I do think that this revert is the right
      answer. The revert held up under repeated testing (24+ hour test runs)
      with this setup:
      
      	- 8-way rx6600
      	- randomly toggling CPU online/offline state every 2 seconds
      	- running CPU exercisers, memory hog, disk exercisers, and
      	  network stressors
      	- average system load around ~160
      
      In the long term, we really need to figure out why we set pt_regs = NULL
      in ia64_process_pending_intr(). If it turns out that it is unnecessary
      to do so, then we could safely re-introduce e7b14036 (along with some
      other logic to be smarter about migrating interrupts).
      
      One final note: x86 also removes the disabled CPU from cpu_online_map
      and then re-enables interrupts for 1ms, presumably to handle any pending
      interrupts:
      
      arch/x86/kernel/irq_32.c (and irq_64.c):
      cpu_disable_common:
      	[remove cpu from cpu_online_map]
      
      	fixup_irqs():
      		for_each_irq:
      			[break CPU affinities]
      
      		local_irq_enable();
      		mdelay(1);
      		local_irq_disable();
      
      So they are doing implicitly what ia64 is doing explicitly.
      Signed-off-by: Alex Chiang <achiang@hp.com>
      Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
    • [IA64] bte_copy of BTE_MAX_XFER trips BUG_ON. · 39d481cb
      Robin Holt authored
      BTE_MAX_XFER is wrong.  It is one greater than the number of cache
      lines the BTE is actually able to transfer.  If you request a transfer
      of exactly BTE_MAX_XFER size, you trip a very cryptic BUG_ON() which
      should certainly be made more clear.
      
      This patch fixes that constant and also cleans up the BUG_ON()s in
      arch/ia64/sn/kernel/bte.c to test one condition per line.
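
      The cleanup amounts to splitting compound assertions, along these
      lines (illustrative conditions only, not the exact checks in bte.c):

      	/* Before: one opaque assertion covering several unrelated conditions. */
      	BUG_ON((src & (L1_CACHE_BYTES - 1)) || (dest & (L1_CACHE_BYTES - 1)) ||
      	       (len & (L1_CACHE_BYTES - 1)) || (len > BTE_MAX_XFER));

      	/* After: one condition per BUG_ON(), so the failing check is obvious. */
      	BUG_ON(src & (L1_CACHE_BYTES - 1));
      	BUG_ON(dest & (L1_CACHE_BYTES - 1));
      	BUG_ON(len & (L1_CACHE_BYTES - 1));
      	BUG_ON(len > BTE_MAX_XFER);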
      Signed-off-by: Robin Holt <holt@sgi.com>
      Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
    • [IA64] Build fix for __early_pfn_to_nid() undefined link error · 334f85b6
      Tony Luck authored
      ia64 only defines __early_pfn_to_nid() for SPARSEMEM && NUMA configurations,
      so the recent:
      
      	commit: f2dbcfa7
      	mm: clean up for early_pfn_to_nid()
      
      ends up with some link problems for certain configuration files.
      
      Fix arch/ia64/Kconfig to only define HAVE_ARCH_EARLY_PFN_TO_NID in the
      cases where we do provide this function.
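
      In other words, the arch helper only exists under one configuration, so
      advertising it elsewhere breaks the link; a sketch of the constraint:

      	/* arch/ia64 only provides this in SPARSEMEM && NUMA builds, so
      	 * HAVE_ARCH_EARLY_PFN_TO_NID must be limited to the same case,
      	 * or other configs link against a function that does not exist. */
      	#if defined(CONFIG_SPARSEMEM) && defined(CONFIG_NUMA)
      	extern int __early_pfn_to_nid(unsigned long pfn);
      	#endif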
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  2. 19 February 2009 (2 commits)
    • mm: fix memmap init for handling memory hole · cc2559bc
      KAMEZAWA Hiroyuki authored
      Currently, early_pfn_in_nid(PFN, NID) may return false if the PFN falls
      in a memory hole, in which case memmap initialization is skipped. This
      broke booting on sparc.
      
      To fix this, such a PFN should still be initialized and marked as
      PG_reserved. This patch changes early_pfn_in_nid() to return true if the
      PFN is a hole.
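
      A minimal sketch of the new behaviour (the real change lives in
      mm/page_alloc.c; __early_pfn_to_nid() returning a negative value for
      holes is the assumption here):

      	static inline int early_pfn_in_nid(unsigned long pfn, int nid)
      	{
      		int found = __early_pfn_to_nid(pfn);

      		if (found < 0)		/* PFN sits in a memory hole */
      			return 1;	/* initialise it anyway; it ends up PG_reserved */
      		return found == nid;
      	}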
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reported-by: David Miller <davem@davemloft.net>
      Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: clean up for early_pfn_to_nid() · f2dbcfa7
      KAMEZAWA Hiroyuki authored
      What's happening is that the assertion in mm/page_alloc.c:move_freepages()
      is triggering:
      
      	BUG_ON(page_zone(start_page) != page_zone(end_page));
      
      Once I knew this is what was happening, I added some annotations:
      
      	if (unlikely(page_zone(start_page) != page_zone(end_page))) {
      		printk(KERN_ERR "move_freepages: Bogus zones: "
      		       "start_page[%p] end_page[%p] zone[%p]\n",
      		       start_page, end_page, zone);
      		printk(KERN_ERR "move_freepages: "
      		       "start_zone[%p] end_zone[%p]\n",
      		       page_zone(start_page), page_zone(end_page));
      		printk(KERN_ERR "move_freepages: "
      		       "start_pfn[0x%lx] end_pfn[0x%lx]\n",
      		       page_to_pfn(start_page), page_to_pfn(end_page));
      		printk(KERN_ERR "move_freepages: "
      		       "start_nid[%d] end_nid[%d]\n",
      		       page_to_nid(start_page), page_to_nid(end_page));
       ...
      
      And here's what I got:
      
      	move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00]
      	move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00]
      	move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
      	move_freepages: start_nid[1] end_nid[0]
      
      My memory layout on this box is:
      
      [    0.000000] Zone PFN ranges:
      [    0.000000]   Normal   0x00000000 -> 0x0081ff5d
      [    0.000000] Movable zone start PFN for each node
      [    0.000000] early_node_map[8] active PFN ranges
      [    0.000000]     0: 0x00000000 -> 0x00020000
      [    0.000000]     1: 0x00800000 -> 0x0081f7ff
      [    0.000000]     1: 0x0081f800 -> 0x0081fe50
      [    0.000000]     1: 0x0081fed1 -> 0x0081fed8
      [    0.000000]     1: 0x0081feda -> 0x0081fedb
      [    0.000000]     1: 0x0081fedd -> 0x0081fee5
      [    0.000000]     1: 0x0081fee7 -> 0x0081ff51
      [    0.000000]     1: 0x0081ff59 -> 0x0081ff5d
      
      So it's a block move in that 0x81f600-->0x81f7ff region which triggers
      the problem.
      
      This patch:
      
      The declaration of early_pfn_to_nid() is scattered over per-arch include
      files, and it is hard to know which declaration is actually used.
      That makes the fix for memmap init harder than it needs to be.
      
      This patch moves all declarations to include/linux/mm.h.
      
      After this,
        if !CONFIG_NODES_POPULATES_NODE_MAP && !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
           -> Use static definition in include/linux/mm.h
        else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
           -> Use generic definition in mm/page_alloc.c
        else
           -> per-arch back end function will be called.
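
      A sketch of the resulting arrangement in include/linux/mm.h (using the
      config symbols named above; simplified):

      	#if !defined(CONFIG_NODES_POPULATES_NODE_MAP) && \
      	    !defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID)
      	/* No node map and no arch hook: everything lives on node 0. */
      	static inline int early_pfn_to_nid(unsigned long pfn)
      	{
      		return 0;
      	}
      	#else
      	/* Generic version in mm/page_alloc.c; it calls the per-arch
      	 * __early_pfn_to_nid() back end when the arch provides one. */
      	extern int early_pfn_to_nid(unsigned long pfn);
      	#endif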
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reported-by: David Miller <davem@davemloft.net>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 16 February 2009 (3 commits)
  4. 15 February 2009 (11 commits)
  5. 13 February 2009 (8 commits)
    • x86, hpet: fix for LS21 + HPET = boot hang · b13e2464
      john stultz authored
      Between 2.6.23 and 2.6.24-rc1 a change was made that broke IBM LS21
      systems that had the HPET enabled in the BIOS, resulting in boot hangs
      for x86_64.
      
      The culprit is commit b8ce3359, which
      merged the i386 and x86_64 HPET code.
      
      Prior to this commit, when we setup the HPET timers in x86_64, we did
      the following:
      
      	hpet_writel(HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL |
                          HPET_TN_32BIT, HPET_T0_CFG);
      
      However after the i386/x86_64 HPET merge, we do the following:
      
      	cfg = hpet_readl(HPET_Tn_CFG(timer));
      	cfg |= HPET_TN_ENABLE | HPET_TN_PERIODIC |
      			HPET_TN_SETVAL | HPET_TN_32BIT;
      	hpet_writel(cfg, HPET_Tn_CFG(timer));
      
      However, on LS21s with the HPET enabled in the BIOS, the HPET_T0_CFG
      register comes up with level-triggered interrupts (HPET_TN_LEVEL)
      enabled. This makes the periodic interrupt anything but periodic, and
      results in the boot-time hang I reported earlier in the delay calibration.
      
      My fix: Always disable HPET_TN_LEVEL when setting up periodic mode.
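
      A sketch of the fix (the HPET_TN_LEVEL clear is the new part; the rest
      is the merged setup code quoted above):

      	cfg = hpet_readl(HPET_Tn_CFG(timer));
      	/* The BIOS may leave level-triggered mode set (as on the LS21);
      	 * clear it before programming periodic mode. */
      	cfg &= ~HPET_TN_LEVEL;
      	cfg |= HPET_TN_ENABLE | HPET_TN_PERIODIC |
      			HPET_TN_SETVAL | HPET_TN_32BIT;
      	hpet_writel(cfg, HPET_Tn_CFG(timer));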
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • powerpc/vsx: Fix VSX alignment handler for regs 32-63 · 26456dcf
      Michael Neuling authored
      Fix the VSX alignment handler for VSX registers 32-63, which are stored
      in the VMX part of the thread_struct, not the FPR part.
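
      A sketch of the resulting register lookup (field names as in the
      powerpc thread_struct; the indexing is simplified):

      	/* VSX registers 0-31 overlay the FP registers, 32-63 overlay the
      	 * VMX (Altivec) registers, so pick the right array. */
      	if (reg < 32)
      		ptr = (char *)&current->thread.fpr[reg];
      	else
      		ptr = (char *)&current->thread.vr[reg - 32];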
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      CC: stable@kernel.org (2.6.27 & .28 please)
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/ps3: Move ps3_mm_add_memory to device_initcall · 0047656e
      Geoff Levand authored
      Change the PS3 hotplug memory routine ps3_mm_add_memory() from
      a core_initcall to a device_initcall.
      
      core_initcall routines run before the powerpc topology_init()
      startup routine, which is a subsys_initcall, so ps3_mm_add_memory()
      fails when CONFIG_NUMA=y.  When ps3_mm_add_memory() fails, the system
      boots with just the 128 MiB of boot memory.
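
      The change itself is just the registration level (a sketch; the
      function body is untouched):

      	/* device_initcall runs after the subsys_initcall level, so
      	 * topology_init() has already run by the time this executes. */
      	device_initcall(ps3_mm_add_memory);	/* was: core_initcall(ps3_mm_add_memory); */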
      Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/mm: Fix numa reserve bootmem page selection · 06eccea6
      Dave Hansen authored
      Fix the powerpc NUMA reserve bootmem page selection logic.
      
      commit 8f64e1f2 (powerpc: Reserve
      in bootmem lmb reserved regions that cross NUMA nodes) changed
      the logic for how the powerpc LMB reserved regions were converted
      to bootmem reserved regions.  As the following discussion reports,
      the new logic was not correct.
      
      mark_reserved_regions_for_nid() goes through each LMB on the
      system that specifies a reserved area.  It searches for
      active regions that intersect with that LMB and are on the
      specified node.  It attempts to bootmem-reserve only the area
      where the active region and the reserved LMB intersect.  We
      can not reserve things on other nodes as they may not have
      bootmem structures allocated, yet.
      
      We base the size of the bootmem reservation on two possible
      things.  Normally, we just make the reservation start and
      stop exactly at the start and end of the LMB.
      
      However, the LMB reservations are not aware of NUMA nodes and
      on occasion a single LMB may cross into several adjacent
      active regions.  Those may even be on different NUMA nodes
      and will require separate calls to the bootmem reserve
      functions.  So, the bootmem reservation must be trimmed to
      fit inside the current active region.
      
      That's all fine and dandy, but we trim the reservation
      in a page-aligned fashion.  That's bad because we start the
      reservation at a non-page-aligned address: physbase.
      
      The reservation may only span 2 bytes, but those bytes
      may span two pfns and cause a reserve_size of 2*PAGE_SIZE.
      
      Take the case where you reserve 0x2 bytes at 0x0fff and
      where the active region ends at 0x1000.  You'll jump into
      that if() statement, but node_ar.end_pfn=0x1 and
      start_pfn=0x0.  You'll end up with a reserve_size=0x1000,
      and then call
      
        reserve_bootmem_node(node, physbase=0xfff, size=0x1000);
      
      0x1000 may not be on the same node as 0xfff.  Oops.
      
      In almost all the vm code, end_<anything> is not inclusive.
      If you have an end_pfn of 0x1234, page 0x1234 is not
      included in the range.  Using PFN_UP instead of the
      (>> PAGE_SHIFT) will make this consistent with the other VM
      code.
      
      We also need to do math for the reserved size with physbase
      instead of start_pfn.  node_ar.end_pfn << PAGE_SHIFT is
      *precisely* the end of the node.  However,
      (start_pfn << PAGE_SHIFT) is *NOT* precisely the beginning
      of the reserved area.  That is, of course, physbase.
      If we don't use physbase here, the reserve_size can be
      made too large.
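
      Putting those two points together, the trimmed reservation can be
      sketched as follows (variable names follow the surrounding
      mark_reserved_regions_for_nid() loop; this is not the exact patch):

      	/* Normally reserve the whole LMB... */
      	reserve_size = size;

      	/* ...but round the LMB end *up* to a pfn (exclusive-end convention)
      	 * and trim against the end of this node's active region. */
      	end_pfn = PFN_UP(physbase + size);
      	if (end_pfn > node_ar.end_pfn)
      		/* Measure from physbase itself, not the rounded-down start_pfn,
      		 * so a byte-granular start is not inflated by a whole page. */
      		reserve_size = (node_ar.end_pfn << PAGE_SHIFT) - physbase;

      	reserve_bootmem_node(NODE_DATA(nid), physbase, reserve_size,
      			     BOOTMEM_DEFAULT);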
      
      From: Dave Hansen <dave@linux.vnet.ibm.com>
      Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>  Tested on PS3.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/mm: Fix _PAGE_CHG_MASK to protect _PAGE_SPECIAL · fbc78b07
      Philippe Gerum authored
      Fix _PAGE_CHG_MASK so that pte_modify() does not affect the _PAGE_SPECIAL bit.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • x86: CPA avoid repeated lazy mmu flush · 7ad9de6a
      Thomas Gleixner authored
      Impact: Flush the lazy MMU only once
      
      Pending mmu updates only need to be flushed once to bring the
      in-memory pagetable state up to date.
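
      The intent, in sketch form (cpa_update_one_page() is a hypothetical
      per-page helper, not a real function):

      	/* Queue all the pagetable updates first... */
      	for (i = 0; i < numpages; i++)
      		cpa_update_one_page(cpa, i);

      	/* ...then flush lazy-MMU state once; a single flush is enough to
      	 * bring the in-memory pagetables up to date. */
      	arch_flush_lazy_mmu_mode();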
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: warn if arch_flush_lazy_mmu_cpu is called in preemptible context · 34b0900d
      Thomas Gleixner authored
      Impact: Catch cases where lazy MMU state is active in a preemptible context
      
      arch_flush_lazy_mmu_cpu() has been changed to disable preemption so
      the checks in enter/leave will never trigger. Put the preemptible()
      check into arch_flush_lazy_mmu_cpu() to catch such cases.
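
      The check can be sketched like this; it sits inside the function's own
      preempt-disabled section (added by the change in the next entry), so a
      preempt count of exactly 1 is how a preemptible caller shows up:

      	preempt_disable();

      	if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU) {
      		/* We hold the only preempt_disable() here, so a count of
      		 * exactly 1 means the caller itself was preemptible,
      		 * which is the case this warning is meant to catch. */
      		WARN_ON(preempt_count() == 1);
      		arch_leave_lazy_mmu_mode();
      		arch_enter_lazy_mmu_mode();
      	}

      	preempt_enable();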
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86/paravirt: make arch_flush_lazy_mmu/cpu disable preemption · d85cf93d
      Jeremy Fitzhardinge authored
      Impact: avoid access to percpu vars in preemptible context
      
      They are intended to be used whenever there's the possibility
      that there's some stale state which is going to be overwritten
      with a queued update, or to force a state change when we may be
      in lazy mode.  Either way, we could end up calling them with
      preemption enabled, so wrap the functions in their own little
      preempt-disable section so they can be safely called in any
      context (though preemption should never be enabled if we're actually
      in a lazy state).
      
      (Move out of line to avoid #include dependencies.)
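
      A sketch of the wrapping (simplified; leave/enter are the existing
      paravirt lazy-mode hooks):

      	void arch_flush_lazy_mmu_mode(void)
      	{
      		/* Percpu lazy-mode state must not be touched while migratable,
      		 * so give the function its own preempt-disabled section. */
      		preempt_disable();

      		if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU) {
      			arch_leave_lazy_mmu_mode();
      			arch_enter_lazy_mmu_mode();
      		}

      		preempt_enable();
      	}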
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  6. 12 February 2009 (2 commits)
  7. 11 February 2009 (7 commits)
  8. 10 February 2009 (2 commits)