Commits · 24a5da73f49c17ca88f369b257fef620a494e79d · openeuler / raspberrypi-kernel

02 Feb, 2008 · 2 commits

x86_64: make bootmap_start page align v6 · 24a5da73

Committed by Yinghai Lu on Feb 01, 2008

boot oopses when a system has 64 or 128 GB of RAM installed:

Calling initcall 0xffffffff80bc33b6: sctp_init+0x0/0x711()
BUG: unable to handle kernel NULL pointer dereference at 000000000000005f
IP: [<ffffffff802bfe55>] proc_register+0xe7/0x10f
PGD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.24-smp-g5a514e21-dirty #6
RIP: 0010:[<ffffffff802bfe55>]  [<ffffffff802bfe55>] proc_register+0xe7/0x10f
RSP: 0000:ffff810824c57e60  EFLAGS: 00010246
RAX: 000000000000d7d7 RBX: ffff811024c5fa80 RCX: ffff810824c57e08
RDX: 0000000000000000 RSI: 0000000000000195 RDI: ffffffff80cc2460
RBP: ffffffffffffffff R08: 0000000000000000 R09: ffff811024c5fa80
R10: 0000000000000000 R11: 0000000000000002 R12: ffff810824c57e6c
R13: 0000000000000000 R14: ffff810824c57ee0 R15: 00000006abd25bee
FS:  0000000000000000(0000) GS:ffffffff80b4d000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 000000000000005f CR3: 0000000000201000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 1, threadinfo ffff810824c56000, task ffff812024c52000)
Stack:  ffffffff80a57348 0000019500000000 ffff811024c5fa80 0000000000000000
 00000000ffffff97 ffffffff802bfef0 0000000000000000 ffffffffffffffff
 0000000000000000 ffffffff80bc3b4b ffff810824c57ee0 ffffffff80bc34a5
Call Trace:
 [<ffffffff802bfef0>] ? create_proc_entry+0x73/0x8a
 [<ffffffff80bc3b4b>] ? sctp_snmp_proc_init+0x1c/0x34
 [<ffffffff80bc34a5>] ? sctp_init+0xef/0x711
 [<ffffffff80b976e3>] ? kernel_init+0x175/0x2e1
 [<ffffffff8020ccf8>] ? child_rip+0xa/0x12
 [<ffffffff80b9756e>] ? kernel_init+0x0/0x2e1
 [<ffffffff8020ccee>] ? child_rip+0x0/0x12

Code: 1e 48 83 7b 38 00 75 08 48 c7 43 38 f0 e8 82 80 48 83 7b 30 00 75 08 48 c7 43 30 d0 e9 82 80 48 c7 c7 60 24 cc 80 e8 bd 5a 54 00 <48> 8b 45 60 48 89 6b 58 48 89 5d 60 48 89 43 50 fe 05 f5 25 a0
RIP  [<ffffffff802bfe55>] proc_register+0xe7/0x10f
 RSP <ffff810824c57e60>
CR2: 000000000000005f
---[ end trace 02c2d78def82877a ]---
Kernel panic - not syncing: Attempted to kill init!

it turns out some variables near end of bss are corrupted already.

in System.map we have
ffffffff80d40420 b rsi_table
ffffffff80d40620 B krb5_seq_lock
ffffffff80d40628 b i.20437
ffffffff80d40630 b xprt_rdma_inline_write_padding
ffffffff80d40638 b sunrpc_table_header
ffffffff80d40640 b zero
ffffffff80d40644 b min_memreg
ffffffff80d40648 b rpcrdma_tk_lock_g
ffffffff80d40650 B sctp_assocs_id_lock
ffffffff80d40658 B proc_net_sctp
ffffffff80d40660 B sctp_assocs_id
ffffffff80d40680 B sysctl_sctp_mem
ffffffff80d40690 B sysctl_sctp_rmem
ffffffff80d406a0 B sysctl_sctp_wmem
ffffffff80d406b0 b sctp_ctl_socket
ffffffff80d406b8 b sctp_pf_inet6_specific
ffffffff80d406c0 b sctp_pf_inet_specific
ffffffff80d406c8 b sctp_af_v4_specific
ffffffff80d406d0 b sctp_af_v6_specific
ffffffff80d406d8 b sctp_rand.33270
ffffffff80d406dc b sctp_memory_pressure
ffffffff80d406e0 b sctp_sockets_allocated
ffffffff80d406e4 b sctp_memory_allocated
ffffffff80d406e8 b sctp_sysctl_header
ffffffff80d406f0 b zero
ffffffff80d406f4 A __bss_stop
ffffffff80d406f4 A _end

and setup_node_bootmem() will use that page 0xd40000 for bootmap
Bootmem setup node 0 0000000000000000-0000000828000000
  NODE_DATA [000000000008a485 - 0000000000091484]
  bootmap [0000000000d406f4 -  0000000000e456f3] pages 105
Bootmem setup node 1 0000000828000000-0000001028000000
  NODE_DATA [0000000828000000 - 0000000828006fff]
  bootmap [0000000828007000 -  0000000828106fff] pages 100
Bootmem setup node 2 0000001028000000-0000001828000000
  NODE_DATA [0000001028000000 - 0000001028006fff]
  bootmap [0000001028007000 -  0000001028106fff] pages 100
Bootmem setup node 3 0000001828000000-0000002028000000
  NODE_DATA [0000001828000000 - 0000001828006fff]
  bootmap [0000001828007000 -  0000001828106fff] pages 100

setup_node_bootmem() needs NODE_DATA to be cacheline aligned
and the bootmap to be page aligned.

The patch updates find_e820_area() to make sure both
alignment constraints can be met.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

24a5da73
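
The log above shows the node 0 bootmap starting at 0xd406f4, which is the physical address of __bss_stop and lies inside page 0xd40000, the page that still holds the last BSS variables. The following is a minimal sketch of the round-up that page alignment implies, not the code of the patch itself. The helper names and the 4 KiB page size are assumptions made for illustration.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL /* assumption: 4 KiB pages */

/* Round addr up to the next multiple of align (align must be a power of two). */
static uint64_t sketch_round_up(uint64_t addr, uint64_t align)
{
	return (addr + align - 1) & ~(align - 1);
}

/* First page-aligned address at or after the end of the kernel image. */
static uint64_t sketch_bootmap_start(uint64_t bss_stop_phys)
{
	return sketch_round_up(bss_stop_phys, SKETCH_PAGE_SIZE);
}
```

With the value from the log, sketch_bootmap_start(0xd406f4) yields 0xd41000, so the bootmap would no longer share a page with the sctp variables listed in System.map.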

x86_64: add debug name for early_res · 25eff8d4

Committed by Yinghai Lu on Feb 01, 2008

helps debugging problems in this rather murky area of code.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

25eff8d4

30 Jan, 2008 · 21 commits

x86: fix nodemap_size according to nodeid bits · afadcd78

Committed by Yinghai Lu on Jan 30, 2008

memnode.map is an s16 array because node ids are 16 bits wide now.

So nodemap_size needs to be increased to match that width.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

afadcd78

x86: early cpu_to_node fix in numa_64.c · 1ce35712

Committed by travis@sgi.com on Jan 30, 2008

Both of these references to cpu_to_node() can potentially occur
before the "late" cpu_to_node map is setup.  Therefore, they
should be changed to use early_cpu_to_node().
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

1ce35712

x86: change size of node ids from u8 to s16 · 43238382

Committed by travis@sgi.com on Jan 30, 2008

Change the size of node ids for X86_64 from u8 to s16 to
accommodate more than 32k nodes and allow for NUMA_NO_NODE
(-1) to be sign extended to int.

Cc: David Rientjes <rientjes@google.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

43238382
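
The sign-extension point in this message can be illustrated with a small sketch. It only demonstrates standard C integer conversion; the helper names are hypothetical and are not kernel functions.

```c
#include <stdint.h>

#define SKETCH_NUMA_NO_NODE (-1)

/* Widen a node id stored as a signed 16-bit value: the sign is preserved. */
static int sketch_widen_s16(int16_t nid)
{
	return nid;
}

/* Widen a node id stored as an unsigned 8-bit value: the value is zero extended. */
static int sketch_widen_u8(uint8_t nid)
{
	return nid;
}
```

Storing -1 in an s16 and reading it back as int gives -1 again, so a comparison against NUMA_NO_NODE keeps working. Storing -1 in a u8 and reading it back gives 255, which no longer compares equal to -1.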

x86: fix early cpu_to_node panic from nr_free_zone_pages · 625d6cff

Committed by Mike Travis on Jan 30, 2008

call early_cpu_to_node() since per_cpu(cpu_to_node_map) might not be setup
yet.

I also had to export x86_cpu_to_node_map_early_ptr because of some calls
from the network code to numa_node_id():

	net/ipv4/netfilter/arp_tables.c:
	net/ipv4/netfilter/ip_tables.c:
	net/ipv4/netfilter/ip_tables.c:
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

625d6cff

x86: clean up paging_init() · 9c5ba489

Committed by Ingo Molnar on Jan 30, 2008

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

9c5ba489

x86: only support sparsemem · fc7250ab

Committed by Yinghai Lu on Jan 30, 2008

sparsemem is the only supported memory model, so the FLAT_NODE_MEM related
code, which is only needed for !SPARSEMEM, can be removed.
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

fc7250ab

x86: add debug of invalid per_cpu map accesses · c49a4955

Committed by travis@sgi.com on Jan 30, 2008

Provide a means to trap usages of per_cpu map variables before
they are setup.  Define CONFIG_DEBUG_PER_CPU_MAPS to activate.
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

c49a4955

x86: replace hard coded reservations in 64-bit early boot code with dynamic table · 75175278

Committed by Andi Kleen on Jan 30, 2008

On x86-64 there are several memory allocations before bootmem. To avoid
them stomping on each other they used to be all hard coded in bad_area().
Replace this with an array that is filled as needed.

This cleans up the code considerably and allows its use to be expanded.

Cc: peterz@infradead.org
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

75175278

x86: fixup NR-CPUS patch for numa · 316390b0

Committed by travis@sgi.com on Jan 30, 2008

This patch removes the EXPORT_SYMBOL for:

	x86_cpu_to_node_map_init
	x86_cpu_to_node_map_early_ptr

... thus fixing the section mismatch problem.

Also, the mem -> node hash lookup is fixed.
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

316390b0

arch/x86/mm/numa_64.c: section fix · 118c8909

Committed by Andrew Morton on Jan 30, 2008

WARNING: vmlinux.o(__ksymtab+0x670): Section mismatch: reference to .init.data:x86_cpu_to_node_map_init (between '__ksymtab_x86_cpu_to_node_map_init' and '__ksymtab_node_data')

Cc: Matthew Dobson <colpatch@us.ibm.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

118c8909

x86: reduce memory and intra-node effects · 693e3c56

Committed by Mike Travis on Jan 30, 2008

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

693e3c56

x86: change NR_CPUS arrays in numa_64 · df3825c5

Committed by travis@sgi.com on Jan 30, 2008

Change the following static arrays sized by NR_CPUS to
per_cpu data variables:

	char cpu_to_node_map[NR_CPUS];
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

df3825c5

x86: change size of node ids from u8 to u16 · 3cc87e3f

Committed by travis@sgi.com on Jan 30, 2008

Change the size of node ids from 8 bits to 16 bits to
accommodate more than 256 nodes.
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

3cc87e3f

x86: change size of APICIDs from u8 to u16 · ef97001f

Committed by travis@sgi.com on Jan 30, 2008

Change the size of APICIDs from u8 to u16.  This partially
supports the new x2apic mode that will be present on future
processor chips. (Chips actually support 32-bit APICIDs, but that
change is more intrusive. Supporting 16-bit is sufficient for now).
Signed-off-by: Jack Steiner <steiner@sgi.com>

I've included just the partial change from u8 to u16 apicids.  The
remaining x2apic changes will be in a separate patch.

In addition, the fake_node_to_pxm_map[] and fake_apicid_to_node[]
tables have been moved from local data to the __initdata section
reducing stack pressure when MAX_NUMNODES and MAX_LOCAL_APIC are
increased in size.
Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

ef97001f

x86: cleanup setup_node_zones called by paging_init() · a261670a

Committed by Yinghai Lu on Jan 30, 2008

setup_node_zones() calculates some variables but only uses them when
FLAT_NODE_MEM_MAP is set.

So change the macro position to avoid the calculation.

Also make it static, and rename it to flat_setup_node_zones().
Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

a261670a

x86: clean up bitops-related warnings · 5548fecd

Committed by Jeremy Fitzhardinge on Jan 30, 2008

Add casts to appropriate places to silence spurious bitops warnings.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

5548fecd

x86: 64-bit, make sparsemem vmemmap the only memory model · b263295d

Committed by Christoph Lameter on Jan 30, 2008

Use sparsemem as the only memory model for UP, SMP and NUMA.  Measurements
indicate that DISCONTIGMEM has a higher overhead than sparsemem.  And
FLATMEM's benefits are minimal.  So I think it's best to simply standardize
on sparsemem.

Results of page allocator tests (test can be had via git from slab git
tree branch tests)

Measurements in cycle counts. 1000 allocations were performed and then the
average cycle count was calculated.

Order	FlatMem	Discontig	SparseMem
0	  639	  665		  641
1	  567	  647		  593
2	  679	  774		  692
3	  763	  967		  781
4	  961	 1501		  962
5	 1356	 2344		 1392
6	 2224	 3982		 2336
7	 4869	 7225		 5074
8	12500	14048		12732
9	27926	28223		28165
10	58578	58714		58682

(Note that FlatMem is an SMP config and the rest are NUMA configurations)

Memory use:

SMP Sparsemem
-------------

Kernel size:

   text    data     bss     dec     hex filename
3849268  397739 1264856 5511863  541ab7 vmlinux

             total       used       free     shared    buffers     cached
Mem:       8242252      41164    8201088          0        352      11512
-/+ buffers/cache:      29300    8212952
Swap:      9775512          0    9775512

SMP Flatmem
-----------

Kernel size:

   text    data     bss     dec     hex filename
3844612  397739 1264536 5506887  540747 vmlinux

So 4.5k growth in text size vs. FLATMEM.

             total       used       free     shared    buffers     cached
Mem:       8244052      40544    8203508          0        352      11484
-/+ buffers/cache:      28708    8215344

2k growth in overall memory use after boot.

NUMA discontig:

   text    data     bss     dec     hex filename
3888124  470659 1276504 5635287  55fcd7 vmlinux

             total       used       free     shared    buffers     cached
Mem:       8256256      56908    8199348          0        352      11496
-/+ buffers/cache:      45060    8211196
Swap:      9775512          0    9775512

NUMA sparse:

   text    data     bss     dec     hex filename
3896428  470659 1276824 5643911  561e87 vmlinux

8k text growth. Given that we fully inline virt_to_page and friends now
that is rather good.

             total       used       free     shared    buffers     cached
Mem:       8264720      57240    8207480          0        352      11516
-/+ buffers/cache:      45372    8219348
Swap:      9775512          0    9775512

The total available memory is increased by 8k.

This patch makes sparsemem the default and removes discontig and
flatmem support from x86.

[ akpm@linux-foundation.org: allnoconfig build fix ]
Acked-by: Andi Kleen <ak@suse.de>
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

b263295d

x86: fixup numa 64 namespace · 7462894a

Committed by Thomas Gleixner on Jan 30, 2008

Using a variable name which is the same as a macro name is not
really smart. Change the variable names and fix up all users.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

7462894a

x86: cleanup numa_64.c · e3cfe529

Committed by Thomas Gleixner on Jan 30, 2008

Clean it up before applying more patches.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

e3cfe529

x86: nuke a ton of unused exports · 3abf024d

Committed by Thomas Gleixner on Jan 30, 2008

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

3abf024d

x86: move k8 related declarations · c9ff0342

Committed by Thomas Gleixner on Jan 30, 2008

Move k8 related declarations to k8.h and fix numa_64.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

c9ff0342

20 Oct, 2007 · 1 commit

x86: convert cpu_to_apicid to be a per cpu variable · 71fff5e6

Committed by Mike Travis on Oct 19, 2007

This patch converts the x86_cpu_to_apicid array to be a per cpu
variable. This saves sizeof(apicid) * NR unused cpus.  Access is mostly
from startup and CPU HOTPLUG functions.

MP_processor_info() is one of the functions that require access to the
x86_cpu_to_apicid array before the per_cpu data area is setup.  For this
case, a pointer to the __initdata array is initialized in setup_arch()
and removed in smp_prepare_cpus() after the per_cpu data area is
initialized.

A second change is included to change the initial array value of ARCH
i386 from 0xff to BAD_APICID to be consistent with ARCH x86_64.
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

71fff5e6
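
The early-pointer arrangement described in this message can be sketched in plain userspace C. This is an illustration of the pattern only, under stated assumptions: the arrays stand in for the __initdata array and the per-CPU copies, and every sketch_ name, the CPU count, and the 0xFFFF sentinel are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 8
#define SKETCH_BAD_APICID 0xFFFFu /* assumption: sentinel for "no APIC id" */

/* Stand-in for the __initdata array filled in by the boot CPU. */
static uint16_t sketch_early_map[SKETCH_NR_CPUS];
/* Non-NULL only while the early array is still the source of truth. */
static uint16_t *sketch_early_ptr = sketch_early_map;
/* Stand-in for the per-CPU copies that exist once per_cpu data is set up. */
static uint16_t sketch_percpu_map[SKETCH_NR_CPUS];

/* Look up an APIC id, using the early array until the switch has happened. */
static uint16_t sketch_cpu_to_apicid(int cpu)
{
	if (cpu < 0 || cpu >= SKETCH_NR_CPUS)
		return SKETCH_BAD_APICID;
	if (sketch_early_ptr)
		return sketch_early_ptr[cpu];
	return sketch_percpu_map[cpu];
}

/* Copy the early values to the per-CPU area and retire the early pointer. */
static void sketch_switch_to_percpu(void)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		sketch_percpu_map[cpu] = sketch_early_map[cpu];
	sketch_early_ptr = NULL;
}
```

A caller sees the same value before and after sketch_switch_to_percpu(); only the backing storage changes, which is what allows the early array to be discarded with the rest of the init data.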

18 Oct, 2007 · 2 commits

x86: fix cpu_to_node references · 98c9e27a

Committed by Mike Travis on Oct 17, 2007

In x86_64 and i386 architectures most arrays that are sized using
NR_CPUS lay in local memory on node 0.  Not only will most (99%?) of the
systems not use all the slots in these arrays, particularly when NR_CPUS
is increased to accommodate future very high cpu count systems, but a
number of cache lines are passed unnecessarily on the system bus when
these arrays are referenced by cpus on other nodes.

Typically, the values in these arrays are referenced by the cpu
accessing its own values, though when passing IPI interrupts, the cpu
does access the data relevant to the targeted cpu/node.  Of course, if
the referencing cpu is not on node 0, then the reference will still
require cross node exchanges of cache lines.  A common use of this is
for an interrupt service routine to pass the interrupt to other cpus
local to that node.

Ideally, all the elements in these arrays should be moved to the per_cpu
data area.  In some cases (such as x86_cpu_to_apicid) the array is
referenced before the per_cpu data areas are setup.  In this case, a
static array is declared in the __initdata area and initialized by the
booting cpu (BSP).  The values are then moved to the per_cpu area after
it is initialized and the original static array is freed with the rest
of the __initdata.

This patch:

Fix four instances where cpu_to_node is referenced by array instead of
via the cpu_to_node macro.  This is preparation to moving it to the
per_cpu data area.
Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

98c9e27a

x86: 0 -> NULL, for arch/x86_64 · 83e83d54

Committed by Yoann Padioleau on Oct 17, 2007

When comparing a pointer, it's clearer to compare it to NULL than to 0.

[ tglx: arch/x86 adaptation ]
Signed-off-by: Yoann Padioleau <padator@wanadoo.fr>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: ak@suse.de
Cc: discuss@x86-64.org
Cc: akpm@linux-foundation.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

83e83d54

11 Oct, 2007 · 2 commits

x86_64: move mm · 95119fbd

Committed by Thomas Gleixner on Oct 11, 2007

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

95119fbd

x86_64: prepare shared mm/numa.c · 4d381b58

Committed by Thomas Gleixner on Oct 11, 2007

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

4d381b58

22 Jul, 2007 · 3 commits

x86_64: disable srat when numa emulation succeeds · 1c05f093

Committed by David Rientjes on Jul 21, 2007

When NUMA emulation succeeds, acpi_numa needs to be set to -1 so that
srat_disabled() will always return true.  We won't be calling
acpi_scan_nodes() or registering the true nodes we've found.

[hugh@veritas.com: Fix x86_64 CONFIG_NUMA_EMU build: acpi_numa needs CONFIG_ACPI_NUMA]
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

1c05f093

x86_64: fix e820_hole_size based on address ranges · a7e96629

Committed by David Rientjes on Jul 21, 2007

e820_hole_size() now uses the newly extracted helper function,
e820_find_active_region(), to determine the size of usable RAM in a range of
PFN's.

This was previously broken because of two reasons:

 - The start and end PFN's of each e820 entry were not properly rounded
   prior to excluding those entries in the range, and

 - Entries smaller than a page were not properly excluded from being
   accumulated.

This resulted in emulated nodes being incorrectly mapped to ranges that
were completely reserved and not candidates for being registered as
active ranges.
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a7e96629
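
The two rounding rules listed in this message can be sketched as follows. This is not the kernel's e820_find_active_region(); the function name, its signature, and the 4 KiB page size are assumptions made for illustration.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/*
 * Count the whole pages of the byte range [addr, addr + size) that fall
 * inside the PFN window [start_pfn, end_pfn).
 */
static uint64_t sketch_active_pages(uint64_t addr, uint64_t size,
				    uint64_t start_pfn, uint64_t end_pfn)
{
	uint64_t page = 1ULL << SKETCH_PAGE_SHIFT;
	uint64_t first = (addr + page - 1) >> SKETCH_PAGE_SHIFT; /* round start up */
	uint64_t last = (addr + size) >> SKETCH_PAGE_SHIFT;      /* round end down */

	if (first >= last)
		return 0; /* the entry does not cover one whole page */
	if (first < start_pfn)
		first = start_pfn;
	if (last > end_pfn)
		last = end_pfn;
	return first < last ? last - first : 0;
}
```

Under these assumptions, the hole size of a PFN window is its length minus the sum of sketch_active_pages() over all usable entries. An entry such as [0x1100, 0x1300) contributes nothing, because it contains no complete page.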

x86_64: fake pxm-to-node mapping for fake numa · 3484d798

Committed by David Rientjes on Jul 21, 2007

For NUMA emulation, our SLIT should represent the true NUMA topology of the
system but our proximity domain to node ID mapping needs to reflect the
emulated state.

When NUMA emulation has successfully setup fake nodes on the system, a new
function, acpi_fake_nodes() is called. This function determines the proximity
domain (_PXM) for each true node found on the system. It then finds which
emulated nodes have been allocated on this true node as determined by its
starting address. The node ID to PXM mapping is changed so that each fake
node ID points to the PXM of the true node that it is located on.

If the machine failed to register a SLIT, then we assume there is no special
requirement for emulated node affinity so we use the default LOCAL_DISTANCE,
which is newly exported to this code, as our measurement if the emulated nodes
appear in the same PXM. Otherwise, we use REMOTE_DISTANCE.

PXM_INVAL and NID_INVAL are also exported to the ACPI header file so that we
can compare node_to_pxm() results in generic code (in this case, the SRAT
code).

Cc: Len Brown <lenb@kernel.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3484d798

03 May, 2007 · 4 commits

[PATCH] x86-64: set node_possible_map at runtime - try 2 · e3f1caee

Committed by Suresh Siddha on May 02, 2007

Set the node_possible_map at runtime on x86_64.  On a non NUMA system,
num_possible_nodes() will now say '1'.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>

e3f1caee

[PATCH] x86-64: fixed size remaining fake nodes · 382591d5

Committed by David Rientjes on May 02, 2007

Extends the numa=fake x86_64 command-line option to split the remaining system
memory into nodes of fixed size.  Any leftover memory is allocated to a final
node unless the command-line ends with a comma.

For example:
  numa=fake=2*512,*128	gives two 512M nodes and the remaining system
			memory is split into nodes of 128M each.

This is beneficial for systems where the exact size of RAM is unknown or not
necessarily relevant, but the size of the remaining nodes to be allocated is
known based on their capacity for resource management.

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

382591d5
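
The arithmetic behind the "*128" form can be sketched as below. It is a simplification that works on whole megabytes and ignores memory holes, which the real code has to account for; the struct and function names are hypothetical.

```c
#include <stdint.h>

struct sketch_split {
	uint64_t full_nodes;  /* number of nodes of exactly node_mb each */
	uint64_t leftover_mb; /* memory that does not fill another node */
};

/* Split the remaining memory into fixed-size nodes. */
static struct sketch_split sketch_fixed_split(uint64_t remaining_mb,
					      uint64_t node_mb)
{
	struct sketch_split s = { 0, remaining_mb };

	if (node_mb == 0)
		return s; /* nothing sensible to split into */
	s.full_nodes = remaining_mb / node_mb;
	s.leftover_mb = remaining_mb % node_mb;
	return s;
}
```

For example, on a hypothetical machine with 1100M left after the two 512M nodes, the split gives eight 128M nodes and 76M of leftover memory, which per the message above goes to a final node unless the command line ends with a comma.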

[PATCH] x86-64: split remaining fake nodes equally · 14694d73

Committed by David Rientjes on May 02, 2007

Extends the numa=fake x86_64 command-line option to split the remaining
system memory into equal-sized nodes.

For example:
numa=fake=2*512,4*	gives two 512M nodes and the remaining system
			memory is split into four approximately equal
			chunks.

This is beneficial for systems where the exact size of RAM is unknown or not
necessarily relevant, but the granularity with which nodes shall be allocated
is known.

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

14694d73

[PATCH] x86-64: configurable fake numa node sizes · 8b8ca80e

Committed by David Rientjes on May 02, 2007

Extends the numa=fake x86_64 command-line option to allow for configurable
node sizes.  These nodes can be used in conjunction with cpusets for coarse
memory resource management.

The old command-line option is still supported:
  numa=fake=32	gives 32 fake NUMA nodes, ignoring the NUMA setup of the
		actual machine.

But now you may configure your system for the node sizes of your choice:
  numa=fake=2*512,1024,2*256
		gives two 512M nodes, one 1024M node, two 256M nodes, and
		the rest of system memory to a sixth node.

The existing hash function is maintained to support the various node sizes
that are possible with this implementation.

Each node of the same size receives roughly the same amount of available
pages, regardless of any reserved memory within its address range.  The total
available pages on the system is calculated and divided by the number of equal
nodes to allocate.  These nodes are then dynamically allocated and their
borders extended until such time as their number of available pages reaches
the required size.

Configurable node sizes are recommended when used in conjunction with cpusets
for memory control because it eliminates the overhead associated with scanning
the zonelists of many smaller full nodes on page_alloc().

Cc: Andi Kleen <ak@suse.de>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Paul Jackson <pj@sgi.com>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

8b8ca80e
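
To make the "COUNT*SIZE,SIZE" syntax concrete, here is a minimal userspace sketch that expands such a string into a list of node sizes. It is not the kernel's parser: the function name is hypothetical, it only handles the explicit forms shown in this message, and it deliberately rejects the later "*128" and "4*" extensions.

```c
#include <stdlib.h>

/*
 * Expand terms of the form "SIZE" or "COUNT*SIZE", separated by commas,
 * into sizes[]. Returns the number of nodes written, or -1 if the string
 * is malformed or does not fit into max entries.
 */
static int sketch_parse_fake(const char *s, unsigned long *sizes, int max)
{
	int n = 0;

	while (*s) {
		char *end;
		unsigned long count = 1;
		unsigned long val = strtoul(s, &end, 10);

		if (end == s)
			return -1; /* no digits where a number was expected */
		if (*end == '*') {
			/* the first number was a repeat count */
			count = val;
			s = end + 1;
			val = strtoul(s, &end, 10);
			if (end == s)
				return -1;
		}
		for (unsigned long i = 0; i < count; i++) {
			if (n >= max)
				return -1;
			sizes[n++] = val;
		}
		if (*end == ',')
			end++;
		else if (*end != '\0')
			return -1;
		s = end;
	}
	return n;
}
```

Parsing "2*512,1024,2*256" this way yields five sizes: 512, 512, 1024, 256, 256. The sixth node mentioned in the message is whatever system memory remains, so it does not appear in the parsed list.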

13 Feb, 2007 · 4 commits

[PATCH] x86-64: clean up sparsemem memory_present call · f0a5a58a

Committed by Bob Picco on Feb 13, 2007

Eliminate the arch specific memory_present call for x86_64 NUMA by utilizing
sparse_memory_present_with_active_regions.
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

f0a5a58a

[PATCH] x86-64: Fix fake numa for x86_64 machines with big IO hole · 53fee04f

Committed by Rohit Seth on Feb 13, 2007

This patch resolves the issue of running with numa=fake=X on the kernel
command line on x86_64 machines that have a big IO hole.  While calculating
the size of each node we now look at the total hole size in that range.

Previously there were nodes that only had IO holes in them causing kernel
boot problems.  We now use the NODE_MIN_SIZE (64MB) as the minimum size of
memory that any node must have.  We reduce the number of allocated nodes if
the number of nodes specified on kernel command line results in any node
getting memory smaller than NODE_MIN_SIZE.

This change allows the extra memory to be incremented in NODE_MIN_SIZE
granules and uniformly distributed among as many nodes (called big nodes) as
possible.

[akpm@osdl.org: build fix]
Signed-off-by: David Rientjes <reintjes@google.com>
Signed-off-by: Paul Menage <menage@google.com>
Signed-off-by: Rohit Seth <rohitseth@google.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>

53fee04f
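
The node-count reduction described in this message amounts to a clamp. The sketch below assumes that usable memory has already had the IO holes subtracted and works in whole megabytes; the function name is hypothetical and the real code operates on addresses rather than megabyte counts.

```c
#include <stdint.h>

#define SKETCH_NODE_MIN_SIZE_MB 64ULL /* from the message: NODE_MIN_SIZE is 64MB */

/* Reduce the requested number of fake nodes so each gets at least 64MB. */
static uint64_t sketch_clamp_nodes(uint64_t usable_mb, uint64_t requested)
{
	uint64_t max_nodes = usable_mb / SKETCH_NODE_MIN_SIZE_MB;

	if (max_nodes == 0)
		max_nodes = 1; /* always keep at least one node */
	return requested < max_nodes ? requested : max_nodes;
}
```

With 1024M of usable memory, a request for 32 nodes would be reduced to 16, while the same request on an 8192M machine is honoured as is.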

[PATCH] x86-64: x86_64-make-the-numa-hash-function-nodemap-allocation fix fix · 54413927

Committed by Amul Shah on Feb 13, 2007

- Removed an extraneous debug message from allocate_cachealigned_map

- Changed extract_lsb_from_nodes to return 63 for the case where there was
  only one memory node.  This prevents the creation of the dynamic hashmap.

- Changed extract_lsb_from_nodes to use only the starting memory address of
  a node.  On an ES7000, our nodes overlap the starting and ending address,
  meaning, that we see nodes like

	00000 - 10000
	10000 - 20000

  But other systems have nodes whose start and end addresses do not overlap.
   For example:

	00000 - 0FFFF
	10000 - 1FFFF

  In this case, using the ending address will result in an LSB much lower
  than what is possible.  Here the result is an LSB of 1 when in reality it
  should be 16.

Cc: Andi Kleen <ak@suse.de>
Cc: Rohit Seth <rohitseth@google.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>

54413927
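
The start-address-only rule and the single-node case can be sketched as follows. This is an illustration written from the description above, not the kernel function; the struct and function names are hypothetical.

```c
#include <stdint.h>

struct sketch_node {
	uint64_t start;
	uint64_t end;
};

/*
 * Return the position of the lowest set bit across all node start
 * addresses, or 63 when at most one usable node exists.
 */
static int sketch_extract_lsb(const struct sketch_node *nodes, int numnodes)
{
	uint64_t bits = 0;
	int used = 0;
	int shift = 0;

	for (int i = 0; i < numnodes; i++) {
		if (nodes[i].start >= nodes[i].end)
			continue; /* skip empty nodes */
		bits |= nodes[i].start;
		used++;
	}
	if (used <= 1 || bits == 0)
		return 63; /* no hash map is needed */
	while (!(bits & 1)) {
		bits >>= 1;
		shift++;
	}
	return shift;
}
```

For the two nodes 00000 - 0FFFF and 10000 - 1FFFF from the message, only the start addresses 0x00000 and 0x10000 are combined, so the result is 16. Mixing in the end addresses 0x0FFFF and 0x1FFFF would set the low bits and push the result far lower.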

[PATCH] x86-64: Allocate the NUMA hash function nodemap dynamically · 076422d2

Committed by Amul Shah on Feb 13, 2007

Remove the statically allocated memory to NUMA node hash map in favor of a
dynamically allocated memory to node hash map (it is cache aligned).

This patch has the nice side effect that it allows the hash map to grow
for systems that have large amounts of memory (256GB - 1TB) but suffer from
having a small PCI space tacked onto the boot node (which is somewhere
between 192MB and 512MB on the ES7000).
Signed-off-by: Amul Shah <amul.shah@unisys.com>
Signed-off-by: Andi Kleen <ak@suse.de>
Cc: Andi Kleen <ak@suse.de>
Cc: Rohit Seth <rohitseth@google.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>

076422d2

12 Oct, 2006 · 1 commit

[PATCH] mm: use symbolic names instead of indices for zone initialisation · 6391af17

Committed by Mel Gorman on Oct 11, 2006

Arch-independent zone-sizing is using indices instead of symbolic names to
offset within an array related to zones (max_zone_pfns). The unintended
impact is that ZONE_DMA and ZONE_NORMAL are initialised on powerpc instead
of ZONE_DMA and ZONE_HIGHMEM when CONFIG_HIGHMEM is set. As a result, the
machine fails to boot but will boot with CONFIG_HIGHMEM turned off.

The following patch properly initialises the max_zone_pfns[] array and uses
symbolic names instead of indices in each architecture using
arch-independent zone-sizing. Two users have successfully booted their
powerpcs with it (one an ibook G4). It has also been boot tested on x86,
x86_64, ppc64 and ia64. Please merge for 2.6.19-rc2.

Credit to Benjamin Herrenschmidt for identifying the bug and rolling the
first fix. Additional credit to Johannes Berg and Andreas Schwab for
reporting the problem and testing on powerpc.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

6391af17
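
The difference between indexing by number and indexing by symbolic name can be shown with a small sketch. The enum layout below is an assumption chosen for illustration (the real set of zones depends on the kernel configuration), and the function name is hypothetical.

```c
#include <string.h>

/* Assumed zone layout for a configuration with highmem enabled. */
enum sketch_zone {
	SKETCH_ZONE_DMA,
	SKETCH_ZONE_NORMAL,
	SKETCH_ZONE_HIGHMEM,
	SKETCH_MAX_NR_ZONES
};

/* Clear the array first, then fill only the zones this platform uses. */
static void sketch_init_zone_pfns(unsigned long pfns[SKETCH_MAX_NR_ZONES],
				  unsigned long dma_end,
				  unsigned long highmem_end)
{
	memset(pfns, 0, sizeof(unsigned long) * SKETCH_MAX_NR_ZONES);
	pfns[SKETCH_ZONE_DMA] = dma_end;
	pfns[SKETCH_ZONE_HIGHMEM] = highmem_end;
}
```

Writing the second value to index 1 would land in the NORMAL slot under this layout, which is the kind of mismatch the message describes. Naming the slot keeps the assignment correct if the enum gains or loses entries.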