- 26 3月, 2006 2 次提交
-
-
由 Andi Kleen 提交于
This fixes problems with very large nodes (over 128GB) filling up all of the first 4GB with their mem_map and not leaving enough space for the swiotlb. Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Andi Kleen 提交于
It conflicts with the struct node in node.h Actually the x86-64 version was there first, but .. Suggested by Jan Beulich Cc: jbeulich@novell.com Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 16 2月, 2006 1 次提交
-
-
由 Daniel Yeisley 提交于
The early initialization of cpu_to_node code as it is now only updates the cpu_to_node array, and does not update cpu_pda()->nodemember. This will cause numa_node_id() to return 0 on systems where CPU 0 is not on Node 0. This leads to a kernel panic in slab.c. I've tested the patch below on a 16 processor x86_64 ES7000-600 server, and no longer see the panic I saw with the original 2.6.16-rc3. Signed-off-by: NDan Yeisley <dan.yeisley@unisys.com> Acked-by: NAndi Kleen <ak@muc.de> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 12 1月, 2006 5 次提交
-
-
由 Andi Kleen 提交于
Saves about ~18K .text in defconfig There would be more optimization potential, but that's for later. Suggestion originally from Bill Irwin. Fix from Andy Whitcroft. Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Ravikiran G Thirumalai 提交于
Helper patch to change cpu_pda users to use macros to access cpu_pda instead of the cpu_pda[] array. Signed-off-by: NRavikiran Thirumalai <kiran@scalex86.org> Signed-off-by: NShai Fultheim <shai@scalex86.org> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Ravikiran Thirumalai 提交于
Patch enables early intialization of cpu_to_node. apicid_to_node is built by reading the SRAT table, from acpi_numa_init with ACPI_NUMA and k8_scan_nodes with K8_NUMA. x86_cpu_to_apicid is built by parsing the ACPI MADT table, from acpi_boot_init. We combine these two tables and setup cpu_to_node. Early intialization helps the static per_cpu_areas in getting pages from correct node. Change since last release: Do not initialize early init_cpu_to_node for faking node cases. Patch tested on TYAN dual core 4P board with K8 only, ACPI_NUMA. Tested on EM64T NUMA. Also tested with numa=off, numa=fake, and running a kernel compiled with NUMA on a regular EM64 2 way SMP. Signed-off-by: NAlok N Kataria <alokk@calsoftinc.com> Signed-off-by: NRavikiran Thirumalai <kiran@scalex86.org> Signed-off-by: NShai Fultheim <shai@scalex86.org> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Andi Kleen 提交于
Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Andi Kleen 提交于
No functional changes Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 13 12月, 2005 1 次提交
-
-
由 Eric Dumazet 提交于
As reported by Keith Mannthey, there are problems in populate_memnodemap() The bug was that the compute_hash_shift() was returning 31, with incorrect initialization of memnodemap[] To correct the bug, we must use (1UL << shift) instead of (1 << shift) to avoid an integer overflow, and we must check that shift < 64 to avoid an infinite loop. Signed-off-by: NEric Dumazet <dada1@cosmosbay.com> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 15 11月, 2005 5 次提交
-
-
由 Bob Picco 提交于
Fix up booting with sparse mem enabled. Otherwise it would just cause an early PANIC at boot. Signed-off-by: NBob Picco <bob.picco@hp.com> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Magnus Damm 提交于
The current x86_64 NUMA memory code is inconsequent when it comes to node memory ranges. The exact behaviour varies depending on which config option that is used. setup_node_bootmem() has start and end as arguments and these are used to calculate the size of the node like this: (end - start). This is all fine if end is pointing to the first non-available byte. The problem is that the current x86_64 code sometimes treats it as the last present byte and sometimes as the first non-available byte. The result is that some configurations might lose a page at the end of the range. This patch tries to fix CONFIG_ACPI_NUMA, CONFIG_K8_NUMA and CONFIG_NUMA_EMU so they all treat the end variable as the first non-available byte. This is the same way as the single node code. The patch is boot tested on dual x86_64 hardware with the above configurations, but maybe the removed code is needed as some workaround? Signed-off-by: NMagnus Damm <magnus@valinux.co.jp> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Eric Dumazet 提交于
Compute the highest possible value for memnode_shift, in order to reduce footprint of memnodemap[] to the minimum, thus making all users (phys_to_nid(), kfree()), more cache friendly. Before the patch : Node 0 MemBase 0000000000000000 Limit 00000001ffffffff Node 1 MemBase 0000000200000000 Limit 00000003ffffffff Using 23 for the hash shift. Max adder is 3ffffffff After the patch : Node 0 MemBase 0000000000000000 Limit 00000001ffffffff Node 1 MemBase 0000000200000000 Limit 00000003ffffffff Using 33 for the hash shift. In this case, only 2 bytes of memnodemap[] are used, instead of 2048 Signed-off-by: NEric Dumazet <dada1@cosmosbay.com> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Andi Kleen 提交于
Not go from the CPU number to an mapping array. Mode number is often used now in fast paths. This also adds a generic numa_node_id to all the topology includes Suggested by Eric Dumazet Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Andi Kleen 提交于
Add a new 4GB GFP_DMA32 zone between the GFP_DMA and GFP_NORMAL zones. As a bit of historical background: when the x86-64 port was originally designed we had some discussion if we should use a 16MB DMA zone like i386 or a 4GB DMA zone like IA64 or both. Both was ruled out at this point because it was in early 2.4 when VM is still quite shakey and had bad troubles even dealing with one DMA zone. We settled on the 16MB DMA zone mainly because we worried about older soundcards and the floppy. But this has always caused problems since then because device drivers had trouble getting enough DMA able memory. These days the VM works much better and the wide use of NUMA has proven it can deal with many zones successfully. So this patch adds both zones. This helps drivers who need a lot of memory below 4GB because their hardware is not accessing more (graphic drivers - proprietary and free ones, video frame buffer drivers, sound drivers etc.). Previously they could only use IOMMU+16MB GFP_DMA, which was not enough memory. Another common problem is that hardware who has full memory addressing for >4GB misses it for some control structures in memory (like transmit rings or other metadata). They tended to allocate memory in the 16MB GFP_DMA or the IOMMU/swiotlb then using pci_alloc_consistent, but that can tie up a lot of precious 16MB GFPDMA/IOMMU/swiotlb memory (even on AMD systems the IOMMU tends to be quite small) especially if you have many devices. With the new zone pci_alloc_consistent can just put this stuff into memory below 4GB which works better. One argument was still if the zone should be 4GB or 2GB. The main motivation for 2GB would be an unnamed not so unpopular hardware raid controller (mostly found in older machines from a particular four letter company) who has a strange 2GB restriction in firmware. But that one works ok with swiotlb/IOMMU anyways, so it doesn't really need GFP_DMA32. I chose 4GB to be compatible with IA64 and because it seems to be the most common restriction. The new zone is so far added only for x86-64. For other architectures who don't set up this new zone nothing changes. Architectures can set a compatibility define in Kconfig CONFIG_DMA_IS_DMA32 that will define GFP_DMA32 as GFP_DMA. Otherwise it's a nop because on 32bit architectures it's normally not needed because GFP_NORMAL (=0) is DMA able enough. One problem is still that GFP_DMA means different things on different architectures. e.g. some drivers used to have #ifdef ia64 use GFP_DMA (trusting it to be 4GB) #elif __x86_64__ (use other hacks like the swiotlb because 16MB is not enough) ... . This was quite ugly and is now obsolete. These should be now converted to use GFP_DMA32 unconditionally. I haven't done this yet. Or best only use pci_alloc_consistent/dma_alloc_coherent which will use GFP_DMA32 transparently. Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 01 10月, 2005 2 次提交
-
-
由 Ravikiran G Thirumalai 提交于
The tests Alok carried out on Petr's box confirmed that cpu_to_node[BP] is not setup early enough by numa_init_array due to the x86_64 changes in 2.6.14-rc*, and unfortunately set wrongly by the work around code in numa_init_array(). cpu_to_node[0] gets set with 1 early and later gets set properly to 0 during identify_cpu() when all cpus are brought up, but confusing the numa slab in the process. Here is a quick fix for this. The right fix obviously is to have cpu_to_node[bsp] setup early for numa_init_array(). The following patch will fix the problem now, and the code can stay on even when cpu_to_node{BP] gets fixed early correctly. Thanks to Petr for access to his box. Signed off by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: NAlok N Kataria <alokk@calsoftinc.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Ravikiran G Thirumalai 提交于
Fix the BP node_to_cpumask. 2.6.14-rc* broke the boot cpu bit as the cpu_to_node(0) is now not setup early enough for numa_init_array. cpu_to_node[] is setup much later at srat_detect_node on acpi srat based em64t machines. This seems like a problem on amd machines too, Tested on em64t though. /sys/devices/system/node/node0/cpumap shows up sanely after this patch. Signed off by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: NShai Fultheim <shai@scalex86.org> Cc: Andi Kleen <ak@muc.de> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 13 9月, 2005 2 次提交
-
-
由 Andi Kleen 提交于
In particular on systems where the local APIC space and node space is very different from the Linux CPU number space. Previously the older NUMA setup code directly parsing the K8 northbridge registers had some issues on 8 socket or dual core systems. This patch fixes them. This is mainly done by fixing some confusion between Linux CPU numbers and local APIC ids. We now pass the local APIC IDs to later code, which avoids mismatches. Also add some heuristics to detect cases where the Hypertransport nodeids and the local APIC IDs don't match, but are shifted by a constant offset. This is still all quite hackish, hopefully BIOS writers fill in correct SRATs instead. Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
由 Andi Kleen 提交于
Do that later when the CPU boots. SRAT just stores the APIC<->Node mapping node. This fixes problems on systems where the order of SRAT entries does not match the MADT. Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 08 9月, 2005 1 次提交
-
-
由 Ravikiran G Thirumalai 提交于
Mark variables which are usually accessed for reads with __readmostly. Signed-off-by: NAlok N Kataria <alokk@calsoftinc.com> Signed-off-by: NShai Fultheim <shai@scalex86.org> Signed-off-by: NRavikiran Thirumalai <kiran@scalex86.org> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 27 8月, 2005 1 次提交
-
-
由 Andi Kleen 提交于
Some nodes can have large holes on x86-64. This fixes problems with the VM allowing too many dirty pages because it overestimates the number of available RAM in a node. In extreme cases you can end up with all RAM filled with dirty pages which can lead to deadlocks and other nasty behaviour. This patch just tells the VM about the known holes from e820. Reserved (like the kernel text or mem_map) is still not taken into account, but that should be only a few percent error now. Small detail is that the flat setup uses the NUMA free_area_init_node() now too because it offers more flexibility. (akpm: lotsa thanks to Martin for working this problem out) Cc: Martin Bligh <mbligh@mbligh.org> Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 29 7月, 2005 1 次提交
-
-
由 Keith Mannthey 提交于
Signed-off-by: NAndi Kleen <ak@suse.de> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 26 6月, 2005 1 次提交
-
-
由 Ashok Raj 提交于
This patch adds __cpuinit and __cpuinitdata sections that need to exist past boot to support cpu hotplug. Caveat: This is done *only* for EM64T CPU Hotplug support, on request from Andi Kleen. Much of the generic hotplug code in kernel, and none of the other archs that support CPU hotplug today, i386, ia64, ppc64, s390 and parisc dont mark sections with __cpuinit, but only mark them as __devinit, and __devinitdata. If someone is motivated to change generic code, we need to make sure all existing hotplug code does not break, on other arch's that dont use __cpuinit, and __cpudevinit. Signed-off-by: NAshok Raj <ashok.raj@intel.com> Acked-by: NAndi Kleen <ak@muc.de> Acked-by: NZwane Mwaikambo <zwane@arm.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 24 6月, 2005 1 次提交
-
-
由 Matt Tolentino 提交于
This patch adds in the necessary support for sparsemem such that x86-64 kernels may use sparsemem as an alternative to discontigmem for NUMA kernels. Note that this does no preclude one from continuing to build NUMA kernels using discontigmem, but merely allows the option to build NUMA kernels with sparsemem. Interestingly, the use of sparsemem in lieu of discontigmem in NUMA kernels results in reduced text size for otherwise equivalent kernels as shown in the example builds below: text data bss dec hex filename 2371036 765884 1237108 4374028 42be0c vmlinux.discontig 2366549 776484 1302772 4445805 43d66d vmlinux.sparse Signed-off-by: NMatt Tolentino <matthew.e.tolentino@intel.com> Signed-off-by: NDave Hansen <haveblue@us.ibm.com> Signed-off-by: NAndrew Morton <akpm@osdl.org> Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
-
- 17 4月, 2005 1 次提交
-
-
由 Linus Torvalds 提交于
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!
-