提交 · 25f73891c3059e9ce6ff0a02670aa98baf6cbce9 · OpenHarmony / kernel_linux

23 4月, 2006 1 次提交

[PATCH] x86_64: sparsemem does not need node_mem_map · 3b5fd59f

由 Andy Whitcroft 提交于 4月 22, 2006

Seems we are trying to init the node_mem_map when we don't need to, for
example when SPARSEMEM is enabled.  This causes the error below during
compilation.  Use CONFIG_FLAT_NODE_MEM_MAP to gate allocation and init.

  arch/x86_64/mm/numa.c: In function `setup_node_zones':
  arch/x86_64/mm/numa.c:191: error: structure has no member
                                                  named `node_mem_map'
Signed-off-by: NAndy Whitcroft <apw@shadowen.org>
Acked-by: NAndi Kleen <ak@muc.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

3b5fd59f

10 4月, 2006 2 次提交

[PATCH] x86_64: Handle empty PXMs that only contain hotplug memory · a8062231

由 Andi Kleen 提交于 4月 07, 2006

The node setup code would try to allocate the node metadata in the node
itself, but that fails if there is no memory in there.

This can happen with memory hotplug when the hotplug area defines an so
far empty node.

Now use bootmem to try to allocate the mem_map in other nodes.

And if it fails don't panic, but just ignore the node.

To make this work I added a new __alloc_bootmem_nopanic function that
does what its name implies.

TBD should try to use nearby nodes here.  Currently we just use any.
It's hard to do it better because bootmem doesn't have proper fallback
lists yet.
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

a8062231

[PATCH] x86_64: Reserve SRAT hotadd memory on x86-64 · 68a3a7fe

由 Andi Kleen 提交于 4月 07, 2006

From: Keith Mannthey, Andi Kleen

Implement memory hotadd without sparsemem. The memory in the SRAT
hotadd area is just preserved instead and can be activated later.

There are a few restrictions:
- Only one continuous hotadd area allowed per node

The main problem is dealing with the many buggy SRAT tables
that are out there. The strategy here is to reject anything
suspicious.

Originally from Keith Mannthey, with several hacks and changes by AK
and also contributions from Andrew Morton

[ TBD: Problems pointed out by KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>:

 1) Goto's rebuild_zonelist patch will not work if CONFIG_MEMORY_HOTPLUG=n.

    Rebuilding zonelist is necessary when the system has just memory <
    4G at boot, and hot add memory > 4G.  because x86_64 has DMA32,
    ZONE_NORAML is not included into zonelist at boot time if system
    doesn't have memory >4G at boot.

    [AK: should just force the higher zones at boot time when SRAT tells us]

 2) zone and node's spanned_pages and present_pages are not incremented.
    They should be.

    For example, our server (ia64/Fujitsu PrimeQuest) can equip memory
    from 4G to 1T(maybe 2T in future), and SRAT will *always* say we have
    possible 1T +memory.  (Microsoft requires "write all possible memory
    in SRAT") When we reserve memmap for possible 1T memory, Linux will
    not work well in +minimum 4G configuraion ;)

    [AK: needs limiting to 5-10% of max memory]
 ]
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

68a3a7fe

28 3月, 2006 1 次提交

[PATCH] unify pfn_to_page: x86_64 pfn_to_page · dc8ecb43

由 KAMEZAWA Hiroyuki 提交于 3月 27, 2006

x86_64 can use generic funcs.
For DISCONTIGMEM, CONFIG_OUT_OF_LINE_PFN_TO_PAGE is selected.
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

dc8ecb43

26 3月, 2006 3 次提交

[PATCH] x86_64: group memnodemap and memnodeshift in a memnode structure · dcf36bfa

由 Eric Dumazet 提交于 3月 25, 2006

pfn_to_page() and others need to access both memnode_shift and the very
first bytes of memnodemap[]. If we force memnode_shift to be just before the
memnodemap array, we can reduce the memory footprint to one cache line
instead of two for most setups. This patch introduce a 'memnode' structure
where shift and map[] are carefully placed.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

dcf36bfa

[PATCH] x86_64: Try to allocate node memmap near the end of node · 267b4801

由 Andi Kleen 提交于 3月 25, 2006

This fixes problems with very large nodes (over 128GB) filling up all of
the first 4GB with their mem_map and not leaving enough space for the
swiotlb.
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

267b4801

[PATCH] x86_64: Rename struct node in x86-64 NUMA code to struct bootnode · abe059e7

由 Andi Kleen 提交于 3月 25, 2006

It conflicts with the struct node in node.h
Actually the x86-64 version was there first, but ..

Suggested by Jan Beulich

Cc: jbeulich@novell.com
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

abe059e7

16 2月, 2006 1 次提交

[PATCH] x86_64: early initialization of cpu_to_node · d1db4ec8

由 Daniel Yeisley 提交于 2月 15, 2006

The early initialization of cpu_to_node code as it is now only updates the
cpu_to_node array, and does not update cpu_pda()->nodemember.  This will
cause numa_node_id() to return 0 on systems where CPU 0 is not on Node 0.
This leads to a kernel panic in slab.c.

I've tested the patch below on a 16 processor x86_64 ES7000-600 server, and
no longer see the panic I saw with the original 2.6.16-rc3.
Signed-off-by: NDan Yeisley <dan.yeisley@unisys.com>
Acked-by: NAndi Kleen <ak@muc.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

d1db4ec8

12 1月, 2006 5 次提交

[PATCH] x86_64: Move NUMA page_to_pfn/pfn_to_page functions out of line · cf050132

由 Andi Kleen 提交于 1月 11, 2006

Saves about ~18K .text in defconfig

There would be more optimization potential, but that's for later.

Suggestion originally from Bill Irwin.
Fix from Andy Whitcroft.
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

cf050132

[PATCH] x86_64: Node local pda take 2 -- cpu_pda preparation · df79efde

由 Ravikiran G Thirumalai 提交于 1月 11, 2006

Helper patch to change cpu_pda users to use macros to access cpu_pda
instead of the cpu_pda[] array.
Signed-off-by: NRavikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: NShai Fultheim <shai@scalex86.org>
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

df79efde

[PATCH] x86_64: Early initialization of cpu_to_node · 05b3cbd8

由 Ravikiran Thirumalai 提交于 1月 11, 2006

Patch enables early intialization of cpu_to_node.
apicid_to_node is built by reading the SRAT table, from acpi_numa_init with
ACPI_NUMA and k8_scan_nodes with K8_NUMA.
x86_cpu_to_apicid is built by parsing the ACPI MADT table, from acpi_boot_init.
We combine these two tables and setup cpu_to_node.

Early intialization helps the static per_cpu_areas in getting pages from
correct node.

Change since last release:
Do not initialize early init_cpu_to_node for faking node cases.

Patch tested on TYAN dual core 4P board with K8 only, ACPI_NUMA.
Tested on EM64T NUMA. Also tested with numa=off, numa=fake, and  running
a kernel compiled with NUMA on a regular EM64 2 way SMP.
Signed-off-by: NAlok N Kataria <alokk@calsoftinc.com>
Signed-off-by: NRavikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: NShai Fultheim <shai@scalex86.org>
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

05b3cbd8

A
[PATCH] x86_64: Clean up some printks in NUMA code · 6b050f80
由 Andi Kleen 提交于 1月 11, 2006
```
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
```
6b050f80

[PATCH] x86_64: Fix up coding style in numa.c · d18ff470

由 Andi Kleen 提交于 1月 11, 2006

No functional changes
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

d18ff470

13 12月, 2005 1 次提交

[PATCH] x86_64: Bug correction in populate_memnodemap() · 8309cf66

由 Eric Dumazet 提交于 12月 12, 2005

As reported by Keith Mannthey, there are problems in populate_memnodemap()

The bug was that the compute_hash_shift() was returning 31, with incorrect
initialization of memnodemap[]

To correct the bug, we must use (1UL << shift) instead of (1 << shift) to
avoid an integer overflow, and we must check that shift < 64 to avoid an
infinite loop.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

8309cf66

15 11月, 2005 5 次提交

[PATCH] x86_64: Fix sparse mem · d3ee871e

由 Bob Picco 提交于 11月 05, 2005

Fix up booting with sparse mem enabled. Otherwise it would just
cause an early PANIC at boot.
Signed-off-by: NBob Picco <bob.picco@hp.com>
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

d3ee871e

[PATCH] x86_64: Make node boundaries consistent · ffd10a2b

由 Magnus Damm 提交于 11月 05, 2005

The current x86_64 NUMA memory code is inconsequent when it comes to node
memory ranges. The exact behaviour varies depending on which config option
that is used.

setup_node_bootmem() has start and end as arguments and these are used to
calculate the size of the node like this: (end - start). This is all fine
if end is pointing to the first non-available byte. The problem is that the
current x86_64 code sometimes treats it as the last present byte and sometimes
as the first non-available byte. The result is that some configurations might
lose a page at the end of the range.

This patch tries to fix CONFIG_ACPI_NUMA, CONFIG_K8_NUMA and CONFIG_NUMA_EMU
so they all treat the end variable as the first non-available byte. This is
the same way as the single node code.

The patch is boot tested on dual x86_64 hardware with the above configurations,
but maybe the removed code is needed as some workaround?
Signed-off-by: NMagnus Damm <magnus@valinux.co.jp>
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

ffd10a2b

[PATCH] x86_64: Optimize NUMA node hash function · 529a3404

由 Eric Dumazet 提交于 11月 05, 2005

Compute the highest possible value for memnode_shift, in order to reduce
footprint of memnodemap[] to the minimum, thus making all users
(phys_to_nid(), kfree()), more cache friendly.

Before the patch :

 Node 0 MemBase 0000000000000000 Limit 00000001ffffffff
 Node 1 MemBase 0000000200000000 Limit 00000003ffffffff
 Using 23 for the hash shift. Max adder is 3ffffffff

After the patch :

 Node 0 MemBase 0000000000000000 Limit 00000001ffffffff
 Node 1 MemBase 0000000200000000 Limit 00000003ffffffff
 Using 33 for the hash shift.

In this case, only 2 bytes of memnodemap[] are used, instead of 2048
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

529a3404

[PATCH] x86_64: Speed up numa_node_id by putting it directly into the PDA · 69d81fcd

由 Andi Kleen 提交于 11月 05, 2005

Not go from the CPU number to an mapping array.
Mode number is often used now in fast paths.

This also adds a generic numa_node_id to all the topology includes

Suggested by Eric Dumazet
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

69d81fcd

[PATCH] x86_64: Add 4GB DMA32 zone · a2f1b424

由 Andi Kleen 提交于 11月 05, 2005

Add a new 4GB GFP_DMA32 zone between the GFP_DMA and GFP_NORMAL zones.

As a bit of historical background: when the x86-64 port
was originally designed we had some discussion if we should
use a 16MB DMA zone like i386 or a 4GB DMA zone like IA64 or
both. Both was ruled out at this point because it was in early
2.4 when VM is still quite shakey and had bad troubles even
dealing with one DMA zone.  We settled on the 16MB DMA zone mainly
because we worried about older soundcards and the floppy.

But this has always caused problems since then because
device drivers had trouble getting enough DMA able memory. These days
the VM works much better and the wide use of NUMA has proven
it can deal with many zones successfully.

So this patch adds both zones.

This helps drivers who need a lot of memory below 4GB because
their hardware is not accessing more (graphic drivers - proprietary
and free ones, video frame buffer drivers, sound drivers etc.).
Previously they could only use IOMMU+16MB GFP_DMA, which
was not enough memory.

Another common problem is that hardware who has full memory
addressing for >4GB misses it for some control structures in memory
(like transmit rings or other metadata).  They tended to allocate memory
in the 16MB GFP_DMA or the IOMMU/swiotlb then using pci_alloc_consistent,
but that can tie up a lot of precious 16MB GFPDMA/IOMMU/swiotlb memory
(even on AMD systems the IOMMU tends to be quite small) especially if you have
many devices.  With the new zone pci_alloc_consistent can just put
this stuff into memory below 4GB which works better.

One argument was still if the zone should be 4GB or 2GB. The main
motivation for 2GB would be an unnamed not so unpopular hardware
raid controller (mostly found in older machines from a particular four letter
company) who has a strange 2GB restriction in firmware. But
that one works ok with swiotlb/IOMMU anyways, so it doesn't really
need GFP_DMA32. I chose 4GB to be compatible with IA64 and because
it seems to be the most common restriction.

The new zone is so far added only for x86-64.

For other architectures who don't set up this
new zone nothing changes. Architectures can set a compatibility
define in Kconfig CONFIG_DMA_IS_DMA32 that will define GFP_DMA32
as GFP_DMA. Otherwise it's a nop because on 32bit architectures
it's normally not needed because GFP_NORMAL (=0) is DMA able
enough.

One problem is still that GFP_DMA means different things on different
architectures. e.g. some drivers used to have #ifdef ia64  use GFP_DMA
(trusting it to be 4GB) #elif __x86_64__ (use other hacks like
the swiotlb because 16MB is not enough) ... . This was quite
ugly and is now obsolete.

These should be now converted to use GFP_DMA32 unconditionally. I haven't done
this yet. Or best only use pci_alloc_consistent/dma_alloc_coherent
which will use GFP_DMA32 transparently.
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

a2f1b424

01 10月, 2005 2 次提交

[PATCH] x86_64 early numa init fix · 85cc5135

由 Ravikiran G Thirumalai 提交于 9月 30, 2005

The tests Alok carried out on Petr's box confirmed that cpu_to_node[BP] is
not setup early enough by numa_init_array due to the x86_64 changes in
2.6.14-rc*, and unfortunately set wrongly by the work around code in
numa_init_array().  cpu_to_node[0] gets set with 1 early and later gets set
properly to 0 during identify_cpu() when all cpus are brought up, but
confusing the numa slab in the process.

Here is a quick fix for this.  The right fix obviously is to have
cpu_to_node[bsp] setup early for numa_init_array().  The following patch
will fix the problem now, and the code can stay on even when
cpu_to_node{BP] gets fixed early correctly.

Thanks to Petr for access to his box.

Signed off by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: NAlok N Kataria <alokk@calsoftinc.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

85cc5135

[PATCH] x86_64: fix the BP node_to_cpumask · e6a045a5

由 Ravikiran G Thirumalai 提交于 9月 30, 2005

Fix the BP node_to_cpumask.  2.6.14-rc* broke the boot cpu bit as the
cpu_to_node(0) is now not setup early enough for numa_init_array.
cpu_to_node[] is setup much later at srat_detect_node on acpi srat based
em64t machines.  This seems like a problem on amd machines too, Tested on
em64t though.  /sys/devices/system/node/node0/cpumap shows up sanely after
this patch.

Signed off by: Ravikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: NShai Fultheim <shai@scalex86.org>
Cc: Andi Kleen <ak@muc.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

e6a045a5

13 9月, 2005 2 次提交

[PATCH] x86-64: Support dualcore and 8 socket systems in k8 fallback node parsing · 3f098c26

由 Andi Kleen 提交于 9月 12, 2005

In particular on systems where the local APIC space and node space
is very different from the Linux CPU number space.

Previously the older NUMA setup code directly parsing the K8
northbridge registers had some issues on 8 socket or dual core
systems. This patch fixes them.

This is mainly done by fixing some confusion between Linux
CPU numbers and local APIC ids. We now pass the local APIC IDs
to later code, which avoids mismatches.

Also add some heuristics to detect cases where the Hypertransport
nodeids and the local APIC IDs don't match, but are shifted
by a constant offset.

This is still all quite hackish, hopefully BIOS writers fill
in correct SRATs instead.
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

3f098c26

[PATCH] x86-64: Don't assign CPU numbers in SRAT parsing · 0b07e984

由 Andi Kleen 提交于 9月 12, 2005

Do that later when the CPU boots. SRAT just stores the APIC<->Node
mapping node. This fixes problems on systems where the order
of SRAT entries does not match the MADT.
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

0b07e984

08 9月, 2005 1 次提交

[PATCH] Additions to .data.read_mostly section · 6c231b7b

由 Ravikiran G Thirumalai 提交于 9月 06, 2005

Mark variables which are usually accessed for reads with __readmostly.
Signed-off-by: NAlok N Kataria <alokk@calsoftinc.com>
Signed-off-by: NShai Fultheim <shai@scalex86.org>
Signed-off-by: NRavikiran Thirumalai <kiran@scalex86.org>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

6c231b7b

27 8月, 2005 1 次提交

[PATCH] x86_64: Tell VM about holes in nodes · 485761bd

由 Andi Kleen 提交于 8月 26, 2005

Some nodes can have large holes on x86-64.

This fixes problems with the VM allowing too many dirty pages because it
overestimates the number of available RAM in a node.  In extreme cases you
can end up with all RAM filled with dirty pages which can lead to deadlocks
and other nasty behaviour.

This patch just tells the VM about the known holes from e820.  Reserved
(like the kernel text or mem_map) is still not taken into account, but that
should be only a few percent error now.

Small detail is that the flat setup uses the NUMA free_area_init_node() now
too because it offers more flexibility.

(akpm: lotsa thanks to Martin for working this problem out)

Cc: Martin Bligh <mbligh@mbligh.org>
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

485761bd

29 7月, 2005 1 次提交

[PATCH] x86_64: Fix overflow in NUMA hash function setup · b684664f

由 Keith Mannthey 提交于 7月 28, 2005

Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

b684664f

26 6月, 2005 1 次提交

[PATCH] x86_64: Change init sections for CPU hotplug support · e6982c67

由 Ashok Raj 提交于 6月 25, 2005

This patch adds __cpuinit and __cpuinitdata sections that need to exist past
boot to support cpu hotplug.

Caveat: This is done *only* for EM64T CPU Hotplug support, on request from
Andi Kleen.  Much of the generic hotplug code in kernel, and none of the other
archs that support CPU hotplug today, i386, ia64, ppc64, s390 and parisc dont
mark sections with __cpuinit, but only mark them as __devinit, and
__devinitdata.

If someone is motivated to change generic code, we need to make sure all
existing hotplug code does not break, on other arch's that dont use __cpuinit,
and __cpudevinit.
Signed-off-by: NAshok Raj <ashok.raj@intel.com>
Acked-by: NAndi Kleen <ak@muc.de>
Acked-by: NZwane Mwaikambo <zwane@arm.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

e6982c67

24 6月, 2005 1 次提交

[PATCH] add x86-64 specific support for sparsemem · bbfceef4

由 Matt Tolentino 提交于 6月 23, 2005

This patch adds in the necessary support for sparsemem such that x86-64
kernels may use sparsemem as an alternative to discontigmem for NUMA
kernels.  Note that this does no preclude one from continuing to build NUMA
kernels using discontigmem, but merely allows the option to build NUMA
kernels with sparsemem.

Interestingly, the use of sparsemem in lieu of discontigmem in NUMA kernels
results in reduced text size for otherwise equivalent kernels as shown in
the example builds below:

   text	   data	    bss	    dec	    hex	filename
2371036	 765884	1237108	4374028	 42be0c	vmlinux.discontig
2366549	 776484	1302772	4445805	 43d66d	vmlinux.sparse
Signed-off-by: NMatt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@osdl.org>
Signed-off-by: NLinus Torvalds <torvalds@osdl.org>

bbfceef4

17 4月, 2005 1 次提交

Linux-2.6.12-rc2 · 1da177e4

由 Linus Torvalds 提交于 4月 16, 2005

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!

1da177e4

OpenHarmony / kernel_linux 上一次同步 大约 4 年

OpenHarmony / kernel_linux
上一次同步大约 4 年