1. 13 3月, 2009 4 次提交
  2. 19 2月, 2009 1 次提交
    • K
      mm: clean up for early_pfn_to_nid() · f2dbcfa7
      KAMEZAWA Hiroyuki 提交于
      What's happening is that the assertion in mm/page_alloc.c:move_freepages()
      is triggering:
      
      	BUG_ON(page_zone(start_page) != page_zone(end_page));
      
      Once I knew this is what was happening, I added some annotations:
      
      	if (unlikely(page_zone(start_page) != page_zone(end_page))) {
      		printk(KERN_ERR "move_freepages: Bogus zones: "
      		       "start_page[%p] end_page[%p] zone[%p]\n",
      		       start_page, end_page, zone);
      		printk(KERN_ERR "move_freepages: "
      		       "start_zone[%p] end_zone[%p]\n",
      		       page_zone(start_page), page_zone(end_page));
      		printk(KERN_ERR "move_freepages: "
      		       "start_pfn[0x%lx] end_pfn[0x%lx]\n",
      		       page_to_pfn(start_page), page_to_pfn(end_page));
      		printk(KERN_ERR "move_freepages: "
      		       "start_nid[%d] end_nid[%d]\n",
      		       page_to_nid(start_page), page_to_nid(end_page));
       ...
      
      And here's what I got:
      
      	move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00]
      	move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00]
      	move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
      	move_freepages: start_nid[1] end_nid[0]
      
      My memory layout on this box is:
      
      [    0.000000] Zone PFN ranges:
      [    0.000000]   Normal   0x00000000 -> 0x0081ff5d
      [    0.000000] Movable zone start PFN for each node
      [    0.000000] early_node_map[8] active PFN ranges
      [    0.000000]     0: 0x00000000 -> 0x00020000
      [    0.000000]     1: 0x00800000 -> 0x0081f7ff
      [    0.000000]     1: 0x0081f800 -> 0x0081fe50
      [    0.000000]     1: 0x0081fed1 -> 0x0081fed8
      [    0.000000]     1: 0x0081feda -> 0x0081fedb
      [    0.000000]     1: 0x0081fedd -> 0x0081fee5
      [    0.000000]     1: 0x0081fee7 -> 0x0081ff51
      [    0.000000]     1: 0x0081ff59 -> 0x0081ff5d
      
      So it's a block move in that 0x81f600-->0x81f7ff region which triggers
      the problem.
      
      This patch:
      
      Declaration of early_pfn_to_nid() is scattered over per-arch include
      files, and it seems it's complicated to know when the declaration is used.
       I think it makes fix-for-memmap-init not easy.
      
      This patch moves all declaration to include/linux/mm.h
      
      After this,
        if !CONFIG_NODES_POPULATES_NODE_MAP && !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
           -> Use static definition in include/linux/mm.h
        else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
           -> Use generic definition in mm/page_alloc.c
        else
           -> per-arch back end function will be called.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Tested-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reported-by: NDavid Miller <davem@davemlloft.net>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f2dbcfa7
  3. 09 2月, 2009 1 次提交
  4. 27 1月, 2009 1 次提交
  5. 17 12月, 2008 1 次提交
  6. 26 7月, 2008 1 次提交
  7. 25 7月, 2008 1 次提交
  8. 22 7月, 2008 1 次提交
  9. 08 7月, 2008 6 次提交
    • Y
      x86: remove end_pfn in 64bit · c987d12f
      Yinghai Lu 提交于
      and use max_pfn directly.
      Signed-off-by: NYinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c987d12f
    • Y
      x86: introduce initmem_init for 64 bit · 1f75d7e3
      Yinghai Lu 提交于
      Signed-off-by: NYinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1f75d7e3
    • T
      x86: numa_64.c fix shadowed variable · 886533a3
      Thomas Gleixner 提交于
      sparse mutters:
      arch/x86/mm/numa_64.c:195:27: warning: symbol 'end_pfn' shadows an earlier one
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      886533a3
    • T
      x86: numa_64.c make local variables static · 864fc31e
      Thomas Gleixner 提交于
      plat_node_bdata, cmdline, nodemap_addr, nodemap_size are local to
      numa_64.c. Make them static
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      864fc31e
    • M
      x86: remove the static 256k node_to_cpumask_map · 9f248bde
      Mike Travis 提交于
        * Consolidate node_to_cpumask operations and remove the 256k
          byte node_to_cpumask_map.  This is done by allocating the
          node_to_cpumask_map array after the number of possible nodes
          (nr_node_ids) is known.
      
        * Debug printouts when CONFIG_DEBUG_PER_CPU_MAPS is active have
          been increased.  It now shows faults when calling node_to_cpumask()
          and node_to_cpumask_ptr().
      
      For inclusion into sched-devel/latest tree.
      
      Based on:
      	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
          +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git
      Signed-off-by: NMike Travis <travis@sgi.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      9f248bde
    • M
      x86: cleanup early per cpu variables/accesses v4 · 23ca4bba
      Mike Travis 提交于
        * Introduce a new PER_CPU macro called "EARLY_PER_CPU".  This is
          used by some per_cpu variables that are initialized and accessed
          before there are per_cpu areas allocated.
      
          ["Early" in respect to per_cpu variables is "earlier than the per_cpu
          areas have been setup".]
      
          This patchset adds these new macros:
      
      	DEFINE_EARLY_PER_CPU(_type, _name, _initvalue)
      	EXPORT_EARLY_PER_CPU_SYMBOL(_name)
      	DECLARE_EARLY_PER_CPU(_type, _name)
      
      	early_per_cpu_ptr(_name)
      	early_per_cpu_map(_name, _idx)
      	early_per_cpu(_name, _cpu)
      
          The DEFINE macro defines the per_cpu variable as well as the early
          map and pointer.  It also initializes the per_cpu variable and map
          elements to "_initvalue".  The early_* macros provide access to
          the initial map (usually setup during system init) and the early
          pointer.  This pointer is initialized to point to the early map
          but is then NULL'ed when the actual per_cpu areas are setup.  After
          that the per_cpu variable is the correct access to the variable.
      
          The early_per_cpu() macro is not very efficient but does show how to
          access the variable if you have a function that can be called both
          "early" and "late".  It tests the early ptr to be NULL, and if not
          then it's still valid.  Otherwise, the per_cpu variable is used
          instead:
      
      	#define early_per_cpu(_name, _cpu) 			\
      		(early_per_cpu_ptr(_name) ?			\
      			early_per_cpu_ptr(_name)[_cpu] :	\
      			per_cpu(_name, _cpu))
      
          A better method is to actually check the pointer manually.  In the
          case below, numa_set_node can be called both "early" and "late":
      
      	void __cpuinit numa_set_node(int cpu, int node)
      	{
      	    int *cpu_to_node_map = early_per_cpu_ptr(x86_cpu_to_node_map);
      
      	    if (cpu_to_node_map)
      		    cpu_to_node_map[cpu] = node;
      	    else
      		    per_cpu(x86_cpu_to_node_map, cpu) = node;
      	}
      
        * Add a flag "arch_provides_topology_pointers" that indicates pointers
          to topology cpumask_t maps are available.  Otherwise, use the function
          returning the cpumask_t value.  This is useful if cpumask_t set size
          is very large to avoid copying data on to/off of the stack.
      
        * The coverage of CONFIG_DEBUG_PER_CPU_MAPS has been increased while
          the non-debug case has been optimized a bit.
      
        * Remove an unreferenced compiler warning in drivers/base/topology.c
      
        * Clean up #ifdef in setup.c
      
      For inclusion into sched-devel/latest tree.
      
      Based on:
      	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
          +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git
      Signed-off-by: NMike Travis <travis@sgi.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      23ca4bba
  10. 25 5月, 2008 1 次提交
  11. 27 4月, 2008 1 次提交
    • Y
      x86_64: fix setup_node_bootmem to support big mem excluding with memmap · 1a27fc0a
      Yinghai Lu 提交于
      typical case: four sockets system, every node has 4g ram, and we are using:
      
      	memmap=10g$4g
      
      to mask out memory on node1 and node2
      
      when numa is enabled, early_node_mem is used to get node_data and node_bootmap.
      
      if it can not get memory from the same node with find_e820_area(), it will
      use alloc_bootmem to get buff from previous nodes.
      
      so check it and print out some info about it.
      
      need to move early_res_to_bootmem into every setup_node_bootmem.
      and it takes range that node has. otherwise alloc_bootmem could return addr
      that reserved early.
      
      depends on "mm: make reserve_bootmem can crossed the nodes".
      Signed-off-by: NYinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1a27fc0a
  12. 20 4月, 2008 2 次提交
  13. 17 4月, 2008 2 次提交
  14. 22 3月, 2008 1 次提交
  15. 05 3月, 2008 1 次提交
  16. 19 2月, 2008 1 次提交
    • Y
      x86: reenable support for system without on node0 · b7ad149d
      Yinghai Lu 提交于
      One system doesn't have RAM for node0 installed.
      
      SRAT: PXM 0 -> APIC 0 -> Node 0
      SRAT: PXM 0 -> APIC 1 -> Node 0
      SRAT: PXM 1 -> APIC 2 -> Node 1
      SRAT: PXM 1 -> APIC 3 -> Node 1
      SRAT: Node 1 PXM 1 0-a0000
      SRAT: Node 1 PXM 1 0-dd000000
      SRAT: Node 1 PXM 1 0-123000000
      ACPI: SLIT: nodes = 2
       10 13
       13 10
      mapped APIC to ffffffffff5fb000 (        fee00000)
      Bootmem setup node 1 0000000000000000-0000000123000000
        NODE_DATA [000000000000e000 - 0000000000014fff]
        bootmap [0000000000015000 -  00000000000395ff] pages 25
      Could not find start_pfn for node 0
      Pid: 0, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #14
      
      Call Trace:
       [<ffffffff80bab498>] free_area_init_node+0x22/0x381
       [<ffffffff8045ffc5>] generic_swap+0x0/0x17
       [<ffffffff80bab0cc>] find_zone_movable_pfns_for_nodes+0x54/0x271
       [<ffffffff80baba5f>] free_area_init_nodes+0x239/0x287
       [<ffffffff80ba6311>] paging_init+0x46/0x4c
       [<ffffffff80b9dda5>] setup_arch+0x3c3/0x44e
       [<ffffffff80b978be>] start_kernel+0x6f/0x2c7
       [<ffffffff80b971cc>] _sinittext+0x1cc/0x1d3
      
      This happens because node 0 is not online, but the node state in
      mm/page_alloc.c has node 0 set.
      
              nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
                      [N_POSSIBLE] = NODE_MASK_ALL,
                      [N_ONLINE] = { { [0] = 1UL } },
      
      So we need to clear node_online_map before initializing the memory.
      Signed-off-by: NYinghai Lu <yinghai.lu@sun.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      b7ad149d
  17. 08 2月, 2008 1 次提交
    • B
      Introduce flags for reserve_bootmem() · 72a7fe39
      Bernhard Walle 提交于
      This patchset adds a flags variable to reserve_bootmem() and uses the
      BOOTMEM_EXCLUSIVE flag in crashkernel reservation code to detect collisions
      between crashkernel area and already used memory.
      
      This patch:
      
      Change the reserve_bootmem() function to accept a new flag BOOTMEM_EXCLUSIVE.
      If that flag is set, the function returns with -EBUSY if the memory already
      has been reserved in the past.  This is to avoid conflicts.
      
      Because that code runs before SMP initialisation, there's no race condition
      inside reserve_bootmem_core().
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: fix powerpc build]
      Signed-off-by: NBernhard Walle <bwalle@suse.de>
      Cc: <linux-arch@vger.kernel.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      72a7fe39
  18. 04 2月, 2008 1 次提交
  19. 02 2月, 2008 3 次提交
    • Y
      x86: remove unneeded round_up · 9347e0b0
      Yinghai Lu 提交于
      Signed-off-by: NYinghai Lu <yinghai.lu@sun.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9347e0b0
    • Y
      x86_64: make bootmap_start page align v6 · 24a5da73
      Yinghai Lu 提交于
      boot oopses when a system has 64 or 128 GB of RAM installed:
      
      Calling initcall 0xffffffff80bc33b6: sctp_init+0x0/0x711()
      BUG: unable to handle kernel NULL pointer dereference at 000000000000005f
      IP: [<ffffffff802bfe55>] proc_register+0xe7/0x10f
      PGD 0
      Oops: 0000 [1] SMP
      CPU 0
      Modules linked in:
      Pid: 1, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #6
      RIP: 0010:[<ffffffff802bfe55>]  [<ffffffff802bfe55>] proc_register+0xe7/0x10f
      RSP: 0000:ffff810824c57e60  EFLAGS: 00010246
      RAX: 000000000000d7d7 RBX: ffff811024c5fa80 RCX: ffff810824c57e08
      RDX: 0000000000000000 RSI: 0000000000000195 RDI: ffffffff80cc2460
      RBP: ffffffffffffffff R08: 0000000000000000 R09: ffff811024c5fa80
      R10: 0000000000000000 R11: 0000000000000002 R12: ffff810824c57e6c
      R13: 0000000000000000 R14: ffff810824c57ee0 R15: 00000006abd25bee
      FS:  0000000000000000(0000) GS:ffffffff80b4d000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      CR2: 000000000000005f CR3: 0000000000201000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process swapper (pid: 1, threadinfo ffff810824c56000, task ffff812024c52000)
      Stack:  ffffffff80a57348 0000019500000000 ffff811024c5fa80 0000000000000000
       00000000ffffff97 ffffffff802bfef0 0000000000000000 ffffffffffffffff
       0000000000000000 ffffffff80bc3b4b ffff810824c57ee0 ffffffff80bc34a5
      Call Trace:
       [<ffffffff802bfef0>] ? create_proc_entry+0x73/0x8a
       [<ffffffff80bc3b4b>] ? sctp_snmp_proc_init+0x1c/0x34
       [<ffffffff80bc34a5>] ? sctp_init+0xef/0x711
       [<ffffffff80b976e3>] ? kernel_init+0x175/0x2e1
       [<ffffffff8020ccf8>] ? child_rip+0xa/0x12
       [<ffffffff80b9756e>] ? kernel_init+0x0/0x2e1
       [<ffffffff8020ccee>] ? child_rip+0x0/0x12
      
      Code: 1e 48 83 7b 38 00 75 08 48 c7 43 38 f0 e8 82 80 48 83 7b 30 00 75 08 48 c7 43 30 d0 e9 82 80 48 c7 c7 60 24 cc 80 e8 bd 5a 54 00 <48> 8b 45 60 48 89 6b 58 48 89 5d 60 48 89 43 50 fe 05 f5 25 a0
      RIP  [<ffffffff802bfe55>] proc_register+0xe7/0x10f
       RSP <ffff810824c57e60>
      CR2: 000000000000005f
      ---[ end trace 02c2d78def82877a ]---
      Kernel panic - not syncing: Attempted to kill init!
      
      it turns out some variables near end of bss are corrupted already.
      
      in System.map we have
      ffffffff80d40420 b rsi_table
      ffffffff80d40620 B krb5_seq_lock
      ffffffff80d40628 b i.20437
      ffffffff80d40630 b xprt_rdma_inline_write_padding
      ffffffff80d40638 b sunrpc_table_header
      ffffffff80d40640 b zero
      ffffffff80d40644 b min_memreg
      ffffffff80d40648 b rpcrdma_tk_lock_g
      ffffffff80d40650 B sctp_assocs_id_lock
      ffffffff80d40658 B proc_net_sctp
      ffffffff80d40660 B sctp_assocs_id
      ffffffff80d40680 B sysctl_sctp_mem
      ffffffff80d40690 B sysctl_sctp_rmem
      ffffffff80d406a0 B sysctl_sctp_wmem
      ffffffff80d406b0 b sctp_ctl_socket
      ffffffff80d406b8 b sctp_pf_inet6_specific
      ffffffff80d406c0 b sctp_pf_inet_specific
      ffffffff80d406c8 b sctp_af_v4_specific
      ffffffff80d406d0 b sctp_af_v6_specific
      ffffffff80d406d8 b sctp_rand.33270
      ffffffff80d406dc b sctp_memory_pressure
      ffffffff80d406e0 b sctp_sockets_allocated
      ffffffff80d406e4 b sctp_memory_allocated
      ffffffff80d406e8 b sctp_sysctl_header
      ffffffff80d406f0 b zero
      ffffffff80d406f4 A __bss_stop
      ffffffff80d406f4 A _end
      
      and setup_node_bootmem() will use that page 0xd40000 for bootmap
      Bootmem setup node 0 0000000000000000-0000000828000000
        NODE_DATA [000000000008a485 - 0000000000091484]
        bootmap [0000000000d406f4 -  0000000000e456f3] pages 105
      Bootmem setup node 1 0000000828000000-0000001028000000
        NODE_DATA [0000000828000000 - 0000000828006fff]
        bootmap [0000000828007000 -  0000000828106fff] pages 100
      Bootmem setup node 2 0000001028000000-0000001828000000
        NODE_DATA [0000001028000000 - 0000001028006fff]
        bootmap [0000001028007000 -  0000001028106fff] pages 100
      Bootmem setup node 3 0000001828000000-0000002028000000
        NODE_DATA [0000001828000000 - 0000001828006fff]
        bootmap [0000001828007000 -  0000001828106fff] pages 100
      
      setup_node_bootmem() makes NODE_DATA cacheline aligned,
      and bootmap is page-aligned.
      
      the patch updates find_e820_area() to make sure we can meet
      the alignment constraints.
      Signed-off-by: NYinghai Lu <yinghai.lu@sun.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      24a5da73
    • Y
      x86_64: add debug name for early_res · 25eff8d4
      Yinghai Lu 提交于
      helps debugging problems in this rather murky area of code.
      Signed-off-by: NYinghai Lu <yinghai.lu@sun.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      25eff8d4
  20. 30 1月, 2008 9 次提交