1. 20 February 2009 (5 commits)
    • [IA64] fixes configs and add default config for ia64 xen domU · 1d5b20f4
      Isaku Yamahata authored
      This patch fixes the Xen-related Kconfigs and adds a default config
      file for ia64 Xen domU.
      Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
      Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
    • [IA64] Remove redundant cpu_clear() in __cpu_disable path · c0acdea2
      Alex Chiang authored
      The second call to cpu_clear() is redundant, as we've already removed
      the CPU from cpu_online_map before calling migrate_platform_irqs().
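
      In sketch form (a simplified view of the __cpu_disable path, not the
      actual arch/ia64 code), the ordering that makes the second call
      pointless looks like this:

      	int __cpu_disable(void)
      	{
      		unsigned int cpu = smp_processor_id();

      		/* The CPU is taken out of cpu_online_map up front... */
      		cpu_clear(cpu, cpu_online_map);

      		/* ...so these already see it as offline, and a second
      		 * cpu_clear() after them would change nothing. */
      		migrate_platform_irqs(cpu);
      		fixup_irqs();

      		return 0;
      	}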
      Signed-off-by: Alex Chiang <achiang@hp.com>
      Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
    • [IA64] Revert "prevent ia64 from invoking irq handlers on offline CPUs" · 66db2e63
      Alex Chiang authored
      This reverts commit e7b14036.
      
      Commit e7b14036 removes the targeted disabled CPU from the
      cpu_online_map after the calls to migrate_platform_irqs() and fixup_irqs().
      
      Paul McKenney states that the reasoning behind the patch was to
      prevent irq handlers from running on CPUs marked offline because:
      
      	RCU happily ignores CPUs that don't have their bits set in
      	cpu_online_map, so if there are RCU read-side critical sections
      	in the irq handlers being run, RCU will ignore them.  If the
      	other CPUs were running, they might sequence through the RCU
      	state machine, which could result in data structures being
      	yanked out from under those irq handlers, which in turn could
      	result in oopses or worse.
      
      Unfortunately, both ia64 functions above look at cpu_online_map to find
      a new CPU to migrate interrupts onto. This means we can potentially
      migrate an interrupt off ourself back to... ourself. Uh oh.
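
      A sketch of that failure mode (pick_new_target() is a hypothetical
      stand-in for the target selection done inside migrate_platform_irqs()
      and fixup_irqs()):

      	/* With e7b14036 applied, the dying CPU has not yet been cleared
      	 * from cpu_online_map when this runs... */
      	static unsigned int pick_new_target(unsigned int dying_cpu)
      	{
      		/* ...so a scan of cpu_online_map can happily return
      		 * dying_cpu itself as the new interrupt target. */
      		return first_cpu(cpu_online_map);
      	}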
      
      This causes an oops when we finally try to process pending interrupts on
      the CPU we want to disable. The oops results from calling __do_IRQ with
      a NULL pt_regs:
      
      Unable to handle kernel NULL pointer dereference (address 0000000000000040)
      Call Trace:
       [<a000000100016930>] show_stack+0x50/0xa0
                                      sp=e0000009c922fa00 bsp=e0000009c92214d0
       [<a0000001000171a0>] show_regs+0x820/0x860
                                      sp=e0000009c922fbd0 bsp=e0000009c9221478
       [<a00000010003c700>] die+0x1a0/0x2e0
                                      sp=e0000009c922fbd0 bsp=e0000009c9221438
       [<a0000001006e92f0>] ia64_do_page_fault+0x950/0xa80
                                      sp=e0000009c922fbd0 bsp=e0000009c92213d8
       [<a00000010000c7a0>] ia64_native_leave_kernel+0x0/0x270
                                      sp=e0000009c922fc60 bsp=e0000009c92213d8
       [<a0000001000ecdb0>] profile_tick+0xd0/0x1c0
                                      sp=e0000009c922fe30 bsp=e0000009c9221398
       [<a00000010003bb90>] timer_interrupt+0x170/0x3e0
                                      sp=e0000009c922fe30 bsp=e0000009c9221330
       [<a00000010013a800>] handle_IRQ_event+0x80/0x120
                                      sp=e0000009c922fe30 bsp=e0000009c92212f8
       [<a00000010013aa00>] __do_IRQ+0x160/0x4a0
                                      sp=e0000009c922fe30 bsp=e0000009c9221290
       [<a000000100012290>] ia64_process_pending_intr+0x2b0/0x360
                                      sp=e0000009c922fe30 bsp=e0000009c9221208
       [<a0000001000112d0>] fixup_irqs+0xf0/0x2a0
                                      sp=e0000009c922fe30 bsp=e0000009c92211a8
       [<a00000010005bd80>] __cpu_disable+0x140/0x240
                                      sp=e0000009c922fe30 bsp=e0000009c9221168
       [<a0000001006c5870>] take_cpu_down+0x50/0xa0
                                      sp=e0000009c922fe30 bsp=e0000009c9221148
       [<a000000100122610>] stop_cpu+0xd0/0x200
                                      sp=e0000009c922fe30 bsp=e0000009c92210f0
       [<a0000001000e0440>] kthread+0xc0/0x140
                                      sp=e0000009c922fe30 bsp=e0000009c92210c8
       [<a000000100014ab0>] kernel_thread_helper+0xd0/0x100
                                      sp=e0000009c922fe30 bsp=e0000009c92210a0
       [<a00000010000a4c0>] start_kernel_thread+0x20/0x40
                                      sp=e0000009c922fe30 bsp=e0000009c92210a0
      
      I don't like this revert because it is fragile. ia64 is getting lucky
      because we seem to only ever process timer interrupts in this path, but
      if we ever race with an IPI here, we definitely use RCU and have the
      potential of hitting the kind of oops Paul describes above.
      
      Patching ia64's timer_interrupt() to check for NULL pt_regs is
      insufficient though, as we still hit the above oops.
      
      As a short-term solution, I do think that this revert is the right
      answer. The revert held up under repeated testing (24+ hour test runs)
      with this setup:
      
      	- 8-way rx6600
      	- randomly toggling CPU online/offline state every 2 seconds
      	- running CPU exercisers, memory hog, disk exercisers, and
      	  network stressors
      	- average system load around ~160
      
      In the long term, we really need to figure out why we set pt_regs = NULL
      in ia64_process_pending_intr(). If it turns out that it is unnecessary
      to do so, then we could safely re-introduce e7b14036 (along with some
      other logic to be smarter about migrating interrupts).
      
      One final note: x86 also removes the disabled CPU from cpu_online_map
      and then re-enables interrupts for 1ms, presumably to handle any pending
      interrupts:
      
      arch/x86/kernel/irq_32.c (and irq_64.c):
      cpu_disable_common:
      	[remove cpu from cpu_online_map]
      
      	fixup_irqs():
      		for_each_irq:
      			[break CPU affinities]
      
      		local_irq_enable();
      		mdelay(1);
      		local_irq_disable();
      
      So they are doing implicitly what ia64 is doing explicitly.
      Signed-off-by: Alex Chiang <achiang@hp.com>
      Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
    • [IA64] bte_copy of BTE_MAX_XFER trips BUG_ON. · 39d481cb
      Robin Holt authored
      BTE_MAX_XFER is wrong.  It is one greater than the number of cache
      lines the BTE is actually able to transfer.  If you request a transfer
      of exactly BTE_MAX_XFER size, you trip a very cryptic BUG_ON() which
      should certainly be made more clear.
      
      This patch fixes that constant and also cleans up the BUG_ON()s in
      arch/ia64/sn/kernel/bte.c to test one condition per line.
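
      The cleanup amounts to splitting compound assertions, along these
      lines (illustrative conditions only, not the exact checks in bte.c):

      	/* Before: one opaque assertion covering several unrelated conditions. */
      	BUG_ON((src & (L1_CACHE_BYTES - 1)) || (dest & (L1_CACHE_BYTES - 1)) ||
      	       (len & (L1_CACHE_BYTES - 1)) || (len > BTE_MAX_XFER));

      	/* After: one condition per BUG_ON(), so the failing check is obvious. */
      	BUG_ON(src & (L1_CACHE_BYTES - 1));
      	BUG_ON(dest & (L1_CACHE_BYTES - 1));
      	BUG_ON(len & (L1_CACHE_BYTES - 1));
      	BUG_ON(len > BTE_MAX_XFER);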
      Signed-off-by: Robin Holt <holt@sgi.com>
      Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
    • [IA64] Build fix for __early_pfn_to_nid() undefined link error · 334f85b6
      Tony Luck authored
      ia64 only defines __early_pfn_to_nid() for SPARSEMEM && NUMA configurations,
      so the recent:
      
      	commit: f2dbcfa7
      	mm: clean up for early_pfn_to_nid()
      
      ends up with some link problems for certain configuration files.
      
      Fix arch/ia64/Kconfig to only define HAVE_ARCH_EARLY_PFN_TO_NID in the
      cases where we do provide this function.
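
      In other words, the arch helper only exists under one configuration, so
      advertising it elsewhere breaks the link; a sketch of the constraint:

      	/* arch/ia64 only provides this in SPARSEMEM && NUMA builds, so
      	 * HAVE_ARCH_EARLY_PFN_TO_NID must be limited to the same case,
      	 * or other configs link against a function that does not exist. */
      	#if defined(CONFIG_SPARSEMEM) && defined(CONFIG_NUMA)
      	extern int __early_pfn_to_nid(unsigned long pfn);
      	#endif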
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  2. 19 February 2009 (2 commits)
    • mm: fix memmap init for handling memory hole · cc2559bc
      KAMEZAWA Hiroyuki authored
      Currently, early_pfn_in_nid(PFN, NID) may return false if the PFN falls
      in a memory hole, in which case memmap initialization is skipped. This
      broke booting on sparc.
      
      To fix this, such a PFN should still be initialized and marked as
      PG_reserved. This patch changes early_pfn_in_nid() to return true if the
      PFN is a hole.
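
      A minimal sketch of the new behaviour (the real change lives in
      mm/page_alloc.c; __early_pfn_to_nid() returning a negative value for
      holes is the assumption here):

      	static inline int early_pfn_in_nid(unsigned long pfn, int nid)
      	{
      		int found = __early_pfn_to_nid(pfn);

      		if (found < 0)		/* PFN sits in a memory hole */
      			return 1;	/* initialise it anyway; it ends up PG_reserved */
      		return found == nid;
      	}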
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reported-by: David Miller <davem@davemloft.net>
      Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: clean up for early_pfn_to_nid() · f2dbcfa7
      KAMEZAWA Hiroyuki authored
      What's happening is that the assertion in mm/page_alloc.c:move_freepages()
      is triggering:
      
      	BUG_ON(page_zone(start_page) != page_zone(end_page));
      
      Once I knew this is what was happening, I added some annotations:
      
      	if (unlikely(page_zone(start_page) != page_zone(end_page))) {
      		printk(KERN_ERR "move_freepages: Bogus zones: "
      		       "start_page[%p] end_page[%p] zone[%p]\n",
      		       start_page, end_page, zone);
      		printk(KERN_ERR "move_freepages: "
      		       "start_zone[%p] end_zone[%p]\n",
      		       page_zone(start_page), page_zone(end_page));
      		printk(KERN_ERR "move_freepages: "
      		       "start_pfn[0x%lx] end_pfn[0x%lx]\n",
      		       page_to_pfn(start_page), page_to_pfn(end_page));
      		printk(KERN_ERR "move_freepages: "
      		       "start_nid[%d] end_nid[%d]\n",
      		       page_to_nid(start_page), page_to_nid(end_page));
       ...
      
      And here's what I got:
      
      	move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00]
      	move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00]
      	move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
      	move_freepages: start_nid[1] end_nid[0]
      
      My memory layout on this box is:
      
      [    0.000000] Zone PFN ranges:
      [    0.000000]   Normal   0x00000000 -> 0x0081ff5d
      [    0.000000] Movable zone start PFN for each node
      [    0.000000] early_node_map[8] active PFN ranges
      [    0.000000]     0: 0x00000000 -> 0x00020000
      [    0.000000]     1: 0x00800000 -> 0x0081f7ff
      [    0.000000]     1: 0x0081f800 -> 0x0081fe50
      [    0.000000]     1: 0x0081fed1 -> 0x0081fed8
      [    0.000000]     1: 0x0081feda -> 0x0081fedb
      [    0.000000]     1: 0x0081fedd -> 0x0081fee5
      [    0.000000]     1: 0x0081fee7 -> 0x0081ff51
      [    0.000000]     1: 0x0081ff59 -> 0x0081ff5d
      
      So it's a block move in that 0x81f600-->0x81f7ff region which triggers
      the problem.
      
      This patch:
      
      The declaration of early_pfn_to_nid() is scattered over per-arch include
      files, and it is hard to know which declaration is actually used.
      That makes the fix for memmap init harder than it needs to be.
      
      This patch moves all declarations to include/linux/mm.h.
      
      After this,
        if !CONFIG_NODES_POPULATES_NODE_MAP && !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
           -> Use static definition in include/linux/mm.h
        else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
           -> Use generic definition in mm/page_alloc.c
        else
           -> per-arch back end function will be called.
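
      A sketch of the resulting arrangement in include/linux/mm.h (using the
      config symbols named above; simplified):

      	#if !defined(CONFIG_NODES_POPULATES_NODE_MAP) && \
      	    !defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID)
      	/* No node map and no arch hook: everything lives on node 0. */
      	static inline int early_pfn_to_nid(unsigned long pfn)
      	{
      		return 0;
      	}
      	#else
      	/* Generic version in mm/page_alloc.c; it calls the per-arch
      	 * __early_pfn_to_nid() back end when the arch provides one. */
      	extern int early_pfn_to_nid(unsigned long pfn);
      	#endif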
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reported-by: David Miller <davem@davemloft.net>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 16 February 2009 (3 commits)
  4. 15 February 2009 (11 commits)
  5. 13 February 2009 (8 commits)
    • x86, hpet: fix for LS21 + HPET = boot hang · b13e2464
      john stultz authored
      Between 2.6.23 and 2.6.24-rc1 a change was made that broke IBM LS21
      systems that had the HPET enabled in the BIOS, resulting in boot hangs
      for x86_64.
      
      The culprit is commit b8ce3359, which
      merged the i386 and x86_64 HPET code.
      
      Prior to this commit, when we setup the HPET timers in x86_64, we did
      the following:
      
      	hpet_writel(HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL |
                          HPET_TN_32BIT, HPET_T0_CFG);
      
      However after the i386/x86_64 HPET merge, we do the following:
      
      	cfg = hpet_readl(HPET_Tn_CFG(timer));
      	cfg |= HPET_TN_ENABLE | HPET_TN_PERIODIC |
      			HPET_TN_SETVAL | HPET_TN_32BIT;
      	hpet_writel(cfg, HPET_Tn_CFG(timer));
      
      However, on LS21s with the HPET enabled in the BIOS, the HPET_T0_CFG
      register comes up with level-triggered interrupts (HPET_TN_LEVEL)
      enabled. This makes the periodic interrupt anything but periodic, and
      results in the boot-time hang I reported earlier in the delay calibration.
      
      My fix: Always disable HPET_TN_LEVEL when setting up periodic mode.
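
      A sketch of the fix (the HPET_TN_LEVEL clear is the new part; the rest
      is the merged setup code quoted above):

      	cfg = hpet_readl(HPET_Tn_CFG(timer));
      	/* The BIOS may leave level-triggered mode set (as on the LS21);
      	 * clear it before programming periodic mode. */
      	cfg &= ~HPET_TN_LEVEL;
      	cfg |= HPET_TN_ENABLE | HPET_TN_PERIODIC |
      			HPET_TN_SETVAL | HPET_TN_32BIT;
      	hpet_writel(cfg, HPET_Tn_CFG(timer));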
      Signed-off-by: John Stultz <johnstul@us.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • powerpc/vsx: Fix VSX alignment handler for regs 32-63 · 26456dcf
      Michael Neuling authored
      Fix the VSX alignment handler for VSX registers 32-63, which are stored
      in the VMX part of the thread_struct, not the FPR part.
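
      A sketch of the resulting register lookup (field names as in the
      powerpc thread_struct; the indexing is simplified):

      	/* VSX registers 0-31 overlay the FP registers, 32-63 overlay the
      	 * VMX (Altivec) registers, so pick the right array. */
      	if (reg < 32)
      		ptr = (char *)&current->thread.fpr[reg];
      	else
      		ptr = (char *)&current->thread.vr[reg - 32];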
      Signed-off-by: Michael Neuling <mikey@neuling.org>
      CC: stable@kernel.org (2.6.27 & .28 please)
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/ps3: Move ps3_mm_add_memory to device_initcall · 0047656e
      Geoff Levand authored
      Change the PS3 hotplug memory routine ps3_mm_add_memory() from
      a core_initcall to a device_initcall.
      
      core_initcall routines run before the powerpc topology_init()
      startup routine, which is a subsys_initcall, so ps3_mm_add_memory()
      fails when CONFIG_NUMA=y.  When ps3_mm_add_memory() fails, the system
      boots with just the 128 MiB of boot memory.
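
      The change itself is just the registration level (a sketch; the
      function body is untouched):

      	/* device_initcall runs after the subsys_initcall level, so
      	 * topology_init() has already run by the time this executes. */
      	device_initcall(ps3_mm_add_memory);	/* was: core_initcall(ps3_mm_add_memory); */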
      Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/mm: Fix numa reserve bootmem page selection · 06eccea6
      Dave Hansen authored
      Fix the powerpc NUMA reserve bootmem page selection logic.
      
      commit 8f64e1f2 (powerpc: Reserve
      in bootmem lmb reserved regions that cross NUMA nodes) changed
      the logic for how the powerpc LMB reserved regions were converted
      to bootmem reserved regions.  As the following discussion reports,
      the new logic was not correct.
      
      mark_reserved_regions_for_nid() goes through each LMB on the
      system that specifies a reserved area.  It searches for
      active regions that intersect with that LMB and are on the
      specified node.  It attempts to bootmem-reserve only the area
      where the active region and the reserved LMB intersect.  We
      can not reserve things on other nodes as they may not have
      bootmem structures allocated, yet.
      
      We base the size of the bootmem reservation on two possible
      things.  Normally, we just make the reservation start and
      stop exactly at the start and end of the LMB.
      
      However, the LMB reservations are not aware of NUMA nodes and
      on occasion a single LMB may cross into several adjacent
      active regions.  Those may even be on different NUMA nodes
      and will require separate calls to the bootmem reserve
      functions.  So, the bootmem reservation must be trimmed to
      fit inside the current active region.
      
      That's all fine and dandy, but we trim the reservation
      in a page-aligned fashion.  That's bad because we start the
      reservation at a non-page-aligned address: physbase.
      
      The reservation may only span 2 bytes, but those bytes
      may span two pfns and cause a reserve_size of 2*PAGE_SIZE.
      
      Take the case where you reserve 0x2 bytes at 0x0fff and
      where the active region ends at 0x1000.  You'll jump into
      that if() statement, but node_ar.end_pfn=0x1 and
      start_pfn=0x0.  You'll end up with a reserve_size=0x1000,
      and then call
      
        reserve_bootmem_node(node, physbase=0xfff, size=0x1000);
      
      0x1000 may not be on the same node as 0xfff.  Oops.
      
      In almost all the vm code, end_<anything> is not inclusive.
      If you have an end_pfn of 0x1234, page 0x1234 is not
      included in the range.  Using PFN_UP instead of the
      (>> PAGE_SHIFT) will make this consistent with the other VM
      code.
      
      We also need to do math for the reserved size with physbase
      instead of start_pfn.  node_ar.end_pfn << PAGE_SHIFT is
      *precisely* the end of the node.  However,
      (start_pfn << PAGE_SHIFT) is *NOT* precisely the beginning
      of the reserved area.  That is, of course, physbase.
      If we don't use physbase here, the reserve_size can be
      made too large.
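
      Putting those two points together, the trimmed reservation can be
      sketched as follows (variable names follow the surrounding
      mark_reserved_regions_for_nid() loop; this is not the exact patch):

      	/* Normally reserve the whole LMB... */
      	reserve_size = size;

      	/* ...but round the LMB end *up* to a pfn (exclusive-end convention)
      	 * and trim against the end of this node's active region. */
      	end_pfn = PFN_UP(physbase + size);
      	if (end_pfn > node_ar.end_pfn)
      		/* Measure from physbase itself, not the rounded-down start_pfn,
      		 * so a byte-granular start is not inflated by a whole page. */
      		reserve_size = (node_ar.end_pfn << PAGE_SHIFT) - physbase;

      	reserve_bootmem_node(NODE_DATA(nid), physbase, reserve_size,
      			     BOOTMEM_DEFAULT);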
      
      From: Dave Hansen <dave@linux.vnet.ibm.com>
      Tested-by: Geoff Levand <geoffrey.levand@am.sony.com>  Tested on PS3.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/mm: Fix _PAGE_CHG_MASK to protect _PAGE_SPECIAL · fbc78b07
      Philippe Gerum authored
      Fix _PAGE_CHG_MASK so that pte_modify() does not affect the _PAGE_SPECIAL bit.
      Signed-off-by: Philippe Gerum <rpm@xenomai.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • x86: CPA avoid repeated lazy mmu flush · 7ad9de6a
      Thomas Gleixner authored
      Impact: Flush the lazy MMU only once
      
      Pending mmu updates only need to be flushed once to bring the
      in-memory pagetable state up to date.
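
      The intent, in sketch form (cpa_update_one_page() is a hypothetical
      per-page helper, not a real function):

      	/* Queue all the pagetable updates first... */
      	for (i = 0; i < numpages; i++)
      		cpa_update_one_page(cpa, i);

      	/* ...then flush lazy-MMU state once; a single flush is enough to
      	 * bring the in-memory pagetables up to date. */
      	arch_flush_lazy_mmu_mode();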
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: warn if arch_flush_lazy_mmu_cpu is called in preemptible context · 34b0900d
      Thomas Gleixner authored
      Impact: Catch cases where lazy MMU state is active in a preemptible context
      
      arch_flush_lazy_mmu_cpu() has been changed to disable preemption so
      the checks in enter/leave will never trigger. Put the preemptible()
      check into arch_flush_lazy_mmu_cpu() to catch such cases.
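
      The check can be sketched like this; it sits inside the function's own
      preempt-disabled section (added by the change in the next entry), so a
      preempt count of exactly 1 is how a preemptible caller shows up:

      	preempt_disable();

      	if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU) {
      		/* We hold the only preempt_disable() here, so a count of
      		 * exactly 1 means the caller itself was preemptible,
      		 * which is the case this warning is meant to catch. */
      		WARN_ON(preempt_count() == 1);
      		arch_leave_lazy_mmu_mode();
      		arch_enter_lazy_mmu_mode();
      	}

      	preempt_enable();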
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86/paravirt: make arch_flush_lazy_mmu/cpu disable preemption · d85cf93d
      Jeremy Fitzhardinge authored
      Impact: avoid access to percpu vars in preemptible context
      
      They are intended to be used whenever there's the possibility
      that there's some stale state which is going to be overwritten
      with a queued update, or to force a state change when we may be
      in lazy mode.  Either way, we could end up calling them with
      preemption enabled, so wrap the functions in their own little
      preempt-disable section so they can be safely called in any
      context (though preemption should never be enabled if we're actually
      in a lazy state).
      
      (Move out of line to avoid #include dependencies.)
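
      A sketch of the wrapping (simplified; leave/enter are the existing
      paravirt lazy-mode hooks):

      	void arch_flush_lazy_mmu_mode(void)
      	{
      		/* Percpu lazy-mode state must not be touched while migratable,
      		 * so give the function its own preempt-disabled section. */
      		preempt_disable();

      		if (paravirt_get_lazy_mode() == PARAVIRT_LAZY_MMU) {
      			arch_leave_lazy_mmu_mode();
      			arch_enter_lazy_mmu_mode();
      		}

      		preempt_enable();
      	}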
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  6. 12 February 2009 (2 commits)
  7. 11 February 2009 (7 commits)
  8. 10 February 2009 (2 commits)