- 13 May 2011, 1 commit
-
-
Committed by Stefano Stabellini
Introduce a new x86_init hook called pagetable_reserve that is used at the end of init_memory_mapping to reserve the range of memory addresses used for kernel pagetable pages and free the rest. On native it just calls memblock_x86_reserve_range, while on Xen it also takes care of setting the spare memory previously allocated for kernel pagetable pages from RO back to RW, so that it can be used for other purposes.

A detailed explanation of why this hook is needed follows. As a consequence of the commit:

    commit 4b239f45
    Author: Yinghai Lu <yinghai@kernel.org>
    Date:   Fri Dec 17 16:58:28 2010 -0800

        x86-64, mm: Put early page table high

at some point init_memory_mapping is going to reach the pagetable pages area and map those pages too (mapping them as normal memory that falls in the range of addresses passed to init_memory_mapping as argument). Some of those pages are already pagetable pages (they are in the range pgt_buf_start-pgt_buf_end), therefore they are going to be mapped RO and everything is fine. Some of these pages are not pagetable pages yet (they fall in the range pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end), so they are going to be mapped RW. When these pages become pagetable pages and are hooked into the pagetable, Xen will find that the guest already has a RW mapping of them somewhere and fail the operation. The reason Xen requires pagetables to be RO is that the hypervisor needs to verify that the pagetables are valid before using them. The validation operations are called "pinning" (more details in arch/x86/xen/mmu.c).

In order to fix the issue we mark all the pages in the entire range pgt_buf_start-pgt_buf_top as RO; however, when the pagetable allocation is completed, only the range pgt_buf_start-pgt_buf_end is reserved by init_memory_mapping. Hence the kernel is going to crash as soon as one of the pages in the range pgt_buf_end-pgt_buf_top is reused (because those ranges are RO).

For this reason we need a hook to reserve the kernel pagetable pages we used and free the other ones so that they can be reused for other purposes. On native it just means calling memblock_x86_reserve_range; on Xen it also means marking RW the pagetable pages that we allocated before but that haven't been used yet. Another way to fix this without using the hook would be adding an 'if (xen_pv_domain)' check to the init_memory_mapping code and calling the Xen counterpart there, but that is just nasty.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
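A minimal sketch of the hook, following the names in the description above (the exact placement of the field inside x86_init is an assumption):

    /* sketch: hook container and its native implementation */
    struct x86_init_mapping {
            void (*pagetable_reserve)(u64 start, u64 end);
    };

    void __init native_pagetable_reserve(u64 start, u64 end)
    {
            /* native case: just reserve the used range in memblock */
            memblock_x86_reserve_range(start, end, "PGTABLE");
    }

    /* at the end of init_memory_mapping(): keep only what was used */
    x86_init.mapping.pagetable_reserve(PFN_PHYS(pgt_buf_start),
                                       PFN_PHYS(pgt_buf_end));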
-
- 02 May 2011, 1 commit
-
-
Committed by Yinghai Lu
numa_cleanup_meminfo() trims each memblk between low (0) and high (max_pfn) limits and discards empty ones. However, the emptiness detection incorrectly used an equality test. If the start of a memblk is higher than max_pfn, it is empty but fails the equality test and doesn't get discarded.

The condition triggers when max_pfn is lower than the start of a NUMA node and results in memory misconfiguration - leading to WARN_ON()s and other funnies. The bug was discovered in the devel branch where 32bit too uses this code path for NUMA init. If a node is above the addressing limit, max_pfn ends up lower than the node, triggering this problem. The failure hasn't been observed on x86-64 but is still possible with broken hardware e820/NUMA info. As the fix is very low risk, it would be better to apply it even for 64bit.

Fix it by using >= instead of ==.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[ Extracted the actual fix from the original patch and rewrote patch description. ]
Signed-off-by: Tejun Heo <tj@kernel.org>
Link: http://lkml.kernel.org/r/20110501171204.GO29280@htj.dyndns.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
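A simplified sketch of the fixed check (clamping and helper as described above):

    bi->start = max(bi->start, low);    /* clamp memblk to [low, high) */
    bi->end = min(bi->end, high);

    /*
     * A memblk that starts above max_pfn ends up with start > end
     * after clamping; "==" missed that case, ">=" discards it.
     */
    if (bi->start >= bi->end)
            numa_remove_memblk_from(i--, mi);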
-
- 21 Apr 2011, 1 commit
-
-
Committed by David Rientjes
The cpu<->node mappings under CONFIG_DEBUG_PER_CPU_MAPS=y when NUMA emulation is enabled are currently broken because the code does not iterate through every emulated node and bind the cpus that have affinity to it. NUMA emulation should bind each cpu to every local node to accurately represent the true NUMA topology of the underlying machine.

debug_cpumask_set_cpu() needs to be fixed at the same time so that the debugging information it emits shows the new cpumask of the node being assigned when the cpu is being added or removed. It can now take responsibility for setting or clearing the cpu itself, removing the need for duplicate code. Also change its last parameter, "enable", to have the correct bool type since it can only be true or false.

-v2: Fix the return statements, by Kosaki Motohiro

Acked-and-Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andreas Herrmann <herrmann.der.user@googlemail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1104201918470.12634@chino.kir.corp.google.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
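A sketch of the reworked helper; the node_to_cpumask_map[] access and the message format are assumptions:

    static void debug_cpumask_set_cpu(int cpu, int node, bool enable)
    {
            struct cpumask *mask = node_to_cpumask_map[node];
            char buf[64];

            /* the helper now flips the bit itself ... */
            if (enable)
                    cpumask_set_cpu(cpu, mask);
            else
                    cpumask_clear_cpu(cpu, mask);

            /* ... and reports the node's new cpumask */
            cpulist_scnprintf(buf, sizeof(buf), mask);
            printk(KERN_DEBUG "%s cpu %d node %d: mask now %s\n",
                   enable ? "numa_add_cpu" : "numa_remove_cpu",
                   cpu, node, buf);
    }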
-
- 04 Apr 2011, 1 commit
-
-
Committed by Tejun Heo
Commit d8fc3afc (x86, NUMA: Move *_numa_init() invocations into initmem_init()) moved acpi_numa_init() call into NUMA initmem_init() but forgot to update 32bit NUMA init, breaking ACPI NUMA configuration for 32bit. acpi_numa_init() call was later moved again to srat_64.c. Match it by adding the call to get_memcfg_from_srat() in srat_32.c.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <20110404100645.GE1420@mtj.dyndns.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 24 Mar 2011, 3 commits
-
-
Committed by Stephen Wilson
Now that gate vmas are referenced with respect to a particular mm and not a particular task, it only makes sense to propagate the change to this predicate as well.

Signed-off-by: Stephen Wilson <wilsons@start.ca>
Reviewed-by: Michel Lespinasse <walken@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Stephen Wilson
Morally, the question of whether an address lies in a gate vma should be asked with respect to an mm, not a particular task. Moreover, dropping the dependency on task_struct will help make existing and future operations on mm's more flexible and convenient.

Signed-off-by: Stephen Wilson <wilsons@start.ca>
Reviewed-by: Michel Lespinasse <walken@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Stephen Wilson
Morally, the presence of a gate vma is more an attribute of a particular mm than a particular task. Moreover, dropping the dependency on task_struct will help make both existing and future operations on mm's more flexible and convenient.

Signed-off-by: Stephen Wilson <wilsons@start.ca>
Reviewed-by: Michel Lespinasse <walken@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 20 Mar 2011, 1 commit
-
-
Committed by Yinghai Lu
cleanup_highmap() is currently done in two steps: one early in head64.c, which only clears above _end, and a second in init_memory_mapping(), which tries to clean from _brk_end to _end. The second step should check whether those boundaries are PMD_SIZE aligned but currently does not. Also, init_memory_mapping() is called several times for numa or memory hotplug, so we really should not handle initial kernel mappings there.

This patch moves cleanup_highmap() down to after _brk_end is settled, so we can do everything in one step. Also we honor max_pfn_mapped in the implementation of cleanup_highmap.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
LKML-Reference: <alpine.DEB.2.00.1103171739050.3382@kaball-desktop>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-
- 18 Mar 2011, 2 commits
-
-
Committed by Shaohua Li
According to the Intel CPU manual, every time a PGD entry is changed in i386 PAE mode, we need to do a full TLB flush. The current code follows this, and there is a comment about it in the code. But the current code misses the multi-threaded case: a changed page table might be used by several CPUs, and every such CPU should flush its TLB. Usually this isn't a problem, because we prepopulate all PGD entries at process fork. But when the process does munmap and follows with a new mmap, this issue is triggered. When it happens, some CPUs keep taking page faults:

http://marc.info/?l=linux-kernel&m=129915020508238&w=2

Reported-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Tested-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Mallick Asit K <asit.k.mallick@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm <linux-mm@kvack.org>
Cc: stable <stable@kernel.org>
LKML-Reference: <1300246649.2337.95.camel@sli10-conroe>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
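A sketch of the kind of change described, using pud_populate() as the example site (the surrounding details are an assumption):

    void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd)
    {
            paravirt_alloc_pmd(mm, __pa(pmd) >> PAGE_SHIFT);
            set_pud(pudp, __pud(__pa(pmd) | _PAGE_PRESENT));

            /*
             * Before: only the local CPU flushed, and only when this mm
             * was active here; other threads kept stale PGD entries.
             *
             *      if (mm == current->active_mm)
             *              write_cr3(read_cr3());
             *
             * After: flush on every CPU that may be using this mm.
             */
            flush_tlb_mm(mm);
    }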
-
Committed by Lucas De Marchi
These typo fixes were generated by 'codespell' and then manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: trivial@kernel.org
LKML-Reference: <1300389856-1099-3-git-send-email-lucas.demarchi@profusion.mobi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
- 15 Mar 2011, 1 commit
-
-
Committed by Xiao Guangrong
native_flush_tlb_others() is called from:

    flush_tlb_current_task()
    flush_tlb_mm()
    flush_tlb_page()

All these functions disable preemption explicitly, so we can use smp_processor_id() instead of get_cpu() and put_cpu().

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Cc: Cliff Wickman <cpw@sgi.com>
LKML-Reference: <4D7EC791.4040003@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
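A sketch of the transformation (the surrounding UV branch is illustrative):

    /* before: get_cpu()/put_cpu() disable and re-enable preemption
       even though every caller already holds it disabled */
    cpu = get_cpu();
    cpumask = uv_flush_tlb_others(cpumask, mm, va, cpu);
    if (cpumask)
            flush_tlb_others_ipi(cpumask, mm, va);
    put_cpu();

    /* after: just read the (stable) processor id */
    cpu = smp_processor_id();
    cpumask = uv_flush_tlb_others(cpumask, mm, va, cpu);
    if (cpumask)
            flush_tlb_others_ipi(cpumask, mm, va);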
-
- 12 Mar 2011, 1 commit
-
-
Committed by Tejun Heo
The distance transforming in numa_emulation() used to call numa_set_distance() for all MAX_NUMNODES * MAX_NUMNODES node combinations regardless of which are enabled. As numa_set_distance() ignores all out-of-bound distance settings, this doesn't cause any problem other than looping unnecessarily many times during boot. However, as MAX_NUMNODES * MAX_NUMNODES can be pretty high, update the code such that it iterates through only the enabled combinations.

Yinghai Lu identified the issue and provided an initial patch to address it; however, that patch was incorrect in that it didn't build the emulated distance table when there was no physical distance table, and it was unnecessarily complex.

http://thread.gmane.org/gmane.linux.kernel/1107986/focus=1107988

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Yinghai Lu <yinghai@kernel.org>
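A sketch of the tighter loop (names illustrative; emu_distance() is a hypothetical stand-in for the per-pair lookup):

    /* before: i and j each ran over all MAX_NUMNODES */
    for (i = 0; i < max_emu_nid + 1; i++)
            for (j = 0; j < max_emu_nid + 1; j++)
                    numa_set_distance(i, j, emu_distance(i, j));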
-
- 10 Mar 2011, 2 commits
-
-
Committed by Andrea Arcangeli
It's forbidden to take the page_table_lock with irqs disabled: if there's contention, the IPIs (for tlb flushes) sent while the page_table_lock is held will never run, leading to a deadlock. Nobody takes the pgd_lock from irq context, so the _irqsave can be removed.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@kernel.org>
LKML-Reference: <201102162345.p1GNjMjm021738@imap1.linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
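The change itself, sketched over a pgd_list walk:

    /* before */
    unsigned long flags;
    spin_lock_irqsave(&pgd_lock, flags);
    /* ... walk pgd_list, possibly spinning while other CPUs
       need to service TLB-flush IPIs ... */
    spin_unlock_irqrestore(&pgd_lock, flags);

    /* after: pgd_lock is never taken from irq context */
    spin_lock(&pgd_lock);
    /* ... walk pgd_list ... */
    spin_unlock(&pgd_lock);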
-
Committed by Andrey Vagin
mm_fault_error() should not execute the oom-killer if the page fault occurred in kernel space, e.g. in copy_from_user()/copy_to_user(). This would happen if we find ourselves in OOM on a copy_to_user(), or a copy_from_user() which faults.

Without this patch, the kernel hangs up in copy_from_user(), because the OOM killer sends SIG_KILL to the current process, but it can't handle a signal while in a syscall; the kernel then returns to copy_from_user(), re-executes the current instruction and provokes the page fault again. With this patch the kernel returns -EFAULT from copy_from_user().

The code that checks whether the page fault occurred in kernel space has been copied from do_sigbus(). This situation is handled the same way on powerpc, xtensa, tile, ...

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@kernel.org>
LKML-Reference: <201103092322.p29NMNPH001682@imap1.linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
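A sketch of the added check in mm_fault_error()'s OOM path, mirroring the kernel-space test in do_sigbus():

    if (fault & VM_FAULT_OOM) {
            /* kernel-space fault: no OOM kill, take the exception
               fixup path so copy_from_user() returns -EFAULT */
            if (!(error_code & PF_USER)) {
                    no_context(regs, error_code, address);
                    return;
            }
            out_of_memory(regs, error_code, address);
    }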
-
- 04 Mar 2011, 5 commits
-
-
Committed by Tejun Heo
Undetermined entries in emu_nid_to_phys[] are filled with zero, assuming that physical node 0 is always online; however, this might not be true depending on hardware configuration. Find a physical node which is actually online and use it instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: David Rientjes <rientjes@google.com>
LKML-Reference: <alpine.DEB.2.00.1103020628210.31626@chino.kir.corp.google.com>
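A sketch of the fallback selection (based on the description above; details assumed):

    int dfl_phys_nid = NUMA_NO_NODE;

    /* use the first physical node an emulated node actually maps to */
    for (i = 0; i < ARRAY_SIZE(emu_nid_to_phys); i++) {
            if (emu_nid_to_phys[i] != NUMA_NO_NODE) {
                    dfl_phys_nid = emu_nid_to_phys[i];
                    break;
            }
    }

    /* fill undetermined entries with it instead of with node 0 */
    for (i = 0; i < ARRAY_SIZE(emu_nid_to_phys); i++)
            if (emu_nid_to_phys[i] == NUMA_NO_NODE)
                    emu_nid_to_phys[i] = dfl_phys_nid;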
-
Committed by Yinghai Lu
This crash happens on a system that does not have RAM on node0. When numa_emulation is compiled in, and:

1. we boot the system without numa=fake...
2. or we boot the system with numa=fake=128 to make emulation fail

we will get:

[    0.076025] ------------[ cut here ]------------
[    0.080004] kernel BUG at arch/x86/mm/numa_64.c:788!
[    0.080004] invalid opcode: 0000 [#1] SMP
[...]

We need to use early_cpu_to_node() directly, because cpu_to_apicid and apicid_to_node will return node0, which is not onlined.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: David Rientjes <rientjes@google.com>
LKML-Reference: <4D6ECF72.5010308@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
-
Committed by David Rientjes
This patch cleans initmem_init() so that it is more readable and doesn't use an unnecessary array of function pointers to convolute the flow of the code. It also makes it obvious that dummy_numa_init() will always succeed (and documents that requirement) so that the existing BUG() is never actually reached.

No functional change.

-tj: Updated comment for dummy_numa_init() slightly.

Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
-
Committed by Yinghai Lu
On a system that does not have RAM on node0, when numa_emulation is compiled in, and:

1. we boot the system without numa=fake...
2. or we boot the system with numa=fake=128 to make emulation fail

we will get:

[    0.092026] ------------[ cut here ]------------
[    0.096005] kernel BUG at arch/x86/mm/numa_emulation.c:439!
[    0.096005] invalid opcode: 0000 [#1] SMP
[    0.096005] last sysfs file:
[    0.096005] CPU 0
[    0.096005] Modules linked in:
[    0.096005]
[    0.096005] Pid: 0, comm: swapper Not tainted 2.6.38-rc6-tip-yh-03869-gcb0491d-dirty #684 Sun Microsystems Sun Fire X4240/Sun Fire X4240
[    0.096005] RIP: 0010:[<ffffffff81cdc65b>]  [<ffffffff81cdc65b>] numa_add_cpu+0x56/0xcf
[    0.096005] RSP: 0000:ffffffff82437ed8  EFLAGS: 00010246
...
[    0.096005] Call Trace:
[    0.096005]  [<ffffffff81cd7931>] identify_cpu+0x2d7/0x2df
[    0.096005]  [<ffffffff827e54fa>] identify_boot_cpu+0x10/0x30
[    0.096005]  [<ffffffff827e5704>] check_bugs+0x9/0x2d
[    0.096005]  [<ffffffff827dceda>] start_kernel+0x3d7/0x3f1
[    0.096005]  [<ffffffff827dc2cc>] x86_64_start_reservations+0x9c/0xa0
[    0.096005]  [<ffffffff827dc4ad>] x86_64_start_kernel+0x1dd/0x1e8
[    0.096005] Code: 74 06 48 8d 04 90 eb 0f 48 c7 c0 30 d9 00 00 48 03 04 d5 90 0f 60 82 8b 00 83 f8 ff 74 0d 0f a3 05 8b 7e 92 00 19 d2 85 d2 75 02 <0f> 0b 48 98 be 00 01 00 00 48 c7 c7 e0 44 60 82 44 8b 2c 85 e0
[    0.096005] RIP  [<ffffffff81cdc65b>] numa_add_cpu+0x56/0xcf
[    0.096005]  RSP <ffffffff82437ed8>
[    0.096026] ---[ end trace a7919e7f17c0a725 ]---

We need to use early_cpu_to_node() directly, because numa_cpu_node() will return node0, which is not onlined.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
-
Committed by Tejun Heo
This patch reverts NUMA affine page table allocation added by commit 1411e0ec (x86-64, numa: Put pgtable to local node memory).

The commit made an undocumented change where the kernel linear mapping strictly follows the intersection of the e820 memory map and the NUMA configuration. If the physical memory configuration has holes or NUMA nodes are not properly aligned, this leads to using an unnecessarily small mapping size, which leads to increased TLB pressure. For details,

http://thread.gmane.org/gmane.linux.kernel/1104672

Patches to fix the problem have been proposed, but the underlying code needs more cleanup and the approach itself seems a bit heavy handed, so it was decided to revert the feature for now and come back to it in the next development cycle.

http://thread.gmane.org/gmane.linux.kernel/1105959

As init_memory_mapping_high() callsites have been consolidated since the commit, reverting is done manually. Also, the RED-PEN comment in arch/x86/mm/init.c is not restored as the problem no longer exists with memblock based top-down early memory allocation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
-
- 02 Mar 2011, 2 commits
-
-
Committed by Tejun Heo
Handling of out-of-bounds distances and allocation failure can use better documentation. Add it.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
-
Committed by Yinghai Lu
NUMA distance table handling has the following problems:

* numa_reset_distance() uses numa_distance_cnt * sizeof(numa_distance[0]) as the table size when it should be using the square of numa_distance_cnt.

* The same size miscalculation exists when allocating space for phys_dist in numa_emulation().

* In numa_emulation(), phys_dist must be reserved; otherwise, the new emulated distance table may overlap it.

Fix them and, while at it, take numa_distance_cnt resetting in numa_reset_distance() out of the if block to simplify the code a bit.

David Rientjes reported incorrect handling of the distance table during emulation.

-tj: Edited out numa_alloc_distance() related changes which weren't necessary and rewrote patch description.

-v2: Ingo was unhappy with 80-column limit induced linebreaks. Let lines run over 80-column.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reported-by: David Rientjes <rientjes@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Acked-by: David Rientjes <rientjes@google.com>
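The sizing bug in one line (sketch): the table is numa_distance_cnt rows of numa_distance_cnt entries, so

    /* wrong: one row's worth of bytes */
    size = numa_distance_cnt * sizeof(numa_distance[0]);

    /* right: the whole NxN matrix */
    size = numa_distance_cnt * numa_distance_cnt * sizeof(numa_distance[0]);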
-
- 25 Feb 2011, 1 commit
-
-
Committed by David Rientjes
numa_distance should be sized like the SLIT, an NxN matrix where N is the highest node id + 1. This patch fixes the calculation to avoid overflowing the array on the subsequent iteration.

-tj: The original patch used the last index to calculate the size. Yinghai pointed out it should be incremented so it is the number of elements instead of the last index. Updated accordingly.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
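A sketch of the corrected count (the nodemask name is illustrative):

    int i, cnt = 0;

    for_each_node_mask(i, nodes_parsed)
            cnt = i;                /* highest node id seen */
    cnt++;                          /* number of rows, not the last index */

    size = cnt * cnt * sizeof(numa_distance[0]);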
-
- 24 Feb 2011, 1 commit
-
-
Committed by Yinghai Lu
e820_table_{start|end|top}, which are used to buffer page table allocation during early boot, are now derived from memblock and don't have much to do with e820. Change the names so that they reflect what they're used for.

This patch doesn't introduce any behavior change.

-v2: Ingo found that earlier patch "x86: Use early pre-allocated page table buffer top-down" caused crash on 32bit and needed to be dropped. This patch was updated to reflect the change.

-tj: Updated commit description.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
-
- 22 Feb 2011, 4 commits
-
-
Committed by Yinghai Lu
The allocation code is much bigger than the distance setting. Separate it out into numa_alloc_distance() for readability.

-v2: Let alloc_numa_distance return -ENOMEM on the failing path, requested by tj.

-tj: Description update. Minor tweaks including function name, location and return value check.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
-
Committed by Tejun Heo
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
-
Committed by Tejun Heo
Create numa_emulation.c and move all NUMA emulation code there. The definitions of struct numa_memblk and numa_meminfo are moved to numa_64.h. Also, numa_remove_memblk_from(), numa_cleanup_meminfo(), numa_reset_distance() along with numa_emulation() are made global.

- v2: Internal declarations moved to numa_internal.h as suggested by Yinghai.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
-
Committed by Tejun Heo
Update numa_emulation() such that it:

- takes @numa_meminfo and @numa_dist_cnt instead of directly referencing the global variables.

- copies the distance table by iterating each distance with node_distance() instead of memcpy'ing the distance table.

- tests emu_cmdline to determine whether emulation is requested and fills emu_nid_to_phys[] with identity mapping if emulation is not used. This allows the caller to call numa_emulation() unconditionally and makes the return value unnecessary.

- defines a dummy version if CONFIG_NUMA_EMU is disabled.

This patch doesn't introduce any behavior change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
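A sketch of the distance copy (names from the description above; loop bounds assumed):

    /* snapshot physical distances before the table is rebuilt */
    for (i = 0; i < numa_dist_cnt; i++)
            for (j = 0; j < numa_dist_cnt; j++)
                    phys_dist[i * numa_dist_cnt + j] = node_distance(i, j);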
-
- 21 Feb 2011, 1 commit
-
-
Committed by Yinghai Lu
By the time setup_node_bootmem() is called, all the memblocks are already registered. As node_data is allocated from these memblocks, calling it more than once doesn't make any difference. Drop the loop.

tj: Dropped comment referencing to the old behavior as suggested by David and rephrased the description.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
-
- 17 Feb 2011, 12 commits
-
-
Committed by Yinghai Lu
dummy_numa_init() is used only during system boot. Put it in .init like other NUMA init functions.

- tj: Description update.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
-
Committed by Yinghai Lu
Do not call __pa(numa_distance) if it was not allocated before. Calling it with an invalid address triggers VIRTUAL_BUG_ON() in __phys_addr() if CONFIG_DEBUG_VIRTUAL is enabled.

Also reported by Ingo.
http://thread.gmane.org/gmane.linux.kernel/1101306/focus=1101785

- v2: Change to check existing path as tj requested.

- tj: Description update.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ingo Molnar <mingo@elte.hu>
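A sketch of numa_reset_distance() after the fix:

    void __init numa_reset_distance(void)
    {
            size_t size = numa_distance_cnt * numa_distance_cnt *
                          sizeof(numa_distance[0]);

            /* only call __pa() if a table was actually allocated */
            if (numa_distance_cnt)
                    memblock_x86_free_range(__pa(numa_distance),
                                            __pa(numa_distance) + size);
            numa_distance_cnt = 0;  /* reset unconditionally */
            numa_distance = NULL;   /* enable table creation */
    }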
-
Committed by Tejun Heo
NUMA emulation needs to update node distance information. It did so by remapping the apicid to PXM mapping, even when amdtopology is being used. There is no reason to go through such convolution. The generic code has all the information necessary to transform the distance table to the emulated nid space.

Implement generic distance table transformation in numa_emulation() and drop the private implementations in srat_64 and amdtopology_64. This makes find_node_by_addr(), fake_physnodes() and related functions unnecessary; drop them.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
-
Committed by Tejun Heo
NUMA emulation changes node mappings and thus apicid -> node mapping needs to be updated accordingly. srat_64 and amdtopology_64 did this separately; however, all the necessary information is the mapping from emulated nodes to physical nodes, which is available in emu_nid_to_phys[].

Implement common __apicid_to_node[] transformation in numa_emulation() and drop duplicate implementations.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
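A simplified sketch of that transformation (fallback handling for unmapped physical nids is omitted here):

    /* remap each apicid's physical nid to its emulated nid */
    for (i = 0; i < ARRAY_SIZE(__apicid_to_node); i++) {
            if (__apicid_to_node[i] == NUMA_NO_NODE)
                    continue;
            for (j = 0; j < ARRAY_SIZE(emu_nid_to_phys); j++)
                    if (__apicid_to_node[i] == emu_nid_to_phys[j]) {
                            __apicid_to_node[i] = j;
                            break;
                    }
    }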
-
Committed by Tejun Heo
NUMA emulation built the physnodes[] array, which could only represent configurations from the physical meminfo, and emulated nodes using that information. There's no reason to go through this extra level of indirection.

Update the emulation functions so that they operate directly on numa_meminfo. This simplifies the code and makes the emulation layout behave better with interleaved physical nodes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
-
Committed by Tejun Heo
Both emulation layout functions - split_nodes[_size]_interleave() - didn't wrap the emulated nid while laying out the fake nodes and tried to avoid iterating over the specified number of nodes, which is fragile.

Now that the emulation code generates numa_meminfo, the node memblks don't need to be consecutive and emulated node IDs can simply wrap. This makes the code more robust and is necessary for updates to better handle the cases where the physical nodes are interleaved.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
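A sketch of the wrapping layout (variable names illustrative):

    u64 addr = start;
    int nid = 0;

    while (addr < max_addr) {
            u64 end = min(addr + size, max_addr);

            numa_add_memblk_to(nid, addr, end, ei);  /* next fake node */
            nid = (nid + 1) % nr_nodes;  /* wrap instead of stopping */
            addr = end;
    }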
-
Committed by Tejun Heo
NUMA emulation code built the nodes[] array and had its own registration path to set up the emulated nodes. Update it such that it generates emulated numa_meminfo, returns control to initmem_init() and shares the same registration path with non-emulated cases.

Because {acpi|amd}_fake_nodes() expect the nodes[] parameter, fake_physnodes() now generates nodes[] from numa_meminfo. This will go away with further updates.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
-
Committed by Tejun Heo
NUMA emulation copied the physical NUMA configuration into physnodes[] and used it to reverse-map emulated nodes to physical nodes, which is unnecessarily convoluted. Build an emu_nid_to_phys[] array to map emulated nids directly to the matching physical nids and use it in numa_add_cpu().

physnodes[] will be removed with further patches.

- v2: Build failure when CONFIG_DEBUG_PER_CPU_MAPS due to missing local variable definition fixed. Reported by Ingo.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
-
Committed by Tejun Heo
* Separate out numa_add_memblk_to() from numa_add_memblk() so that a different numa_meminfo can be used.

* Rename cmdline to emu_cmdline.

* Drop @start/last_pfn from numa_emulation() and use max_pfn directly.

This patch doesn't introduce any behavior change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
-
Committed by Tejun Heo
Node distance was determined by either direct node comparison, ACPI PXM comparison or ACPI SLIT table lookup. This patch implements generic node distance handling. NUMA init methods can call numa_set_distance() to set the distance between nodes, and the common __node_distance() implementation will report the set distance.

Due to the way NUMA emulation is implemented, the generic node distance handling is used only when emulation is not used. Later patches will update NUMA emulation to use the generic distance mechanism.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
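A sketch of the common lookup, with LOCAL_DISTANCE/REMOTE_DISTANCE as the usual SLIT-style defaults for nodes outside the table:

    int __node_distance(int from, int to)
    {
            if (from >= numa_distance_cnt || to >= numa_distance_cnt)
                    return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
            return numa_distance[from * numa_distance_cnt + to];
    }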
-
Committed by Tejun Heo
With all memory configuration information now carried in numa_meminfo, there's no need to keep mem_nodes_parsed separate. Drop it and use numa_nodes_parsed for CPU / memory-less nodes.

A new helper numa_nodemask_from_meminfo() is added to calculate the memnode mask on the fly, which is currently used to set node_possible_map. This simplifies NUMA init methods a bit and removes a source of possible inconsistencies.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
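A sketch of the helper (signature assumed from the description above):

    /* set a bit for each node that owns at least one memblk */
    static void __init numa_nodemask_from_meminfo(nodemask_t *nodemask,
                                                  const struct numa_meminfo *mi)
    {
            int i;

            for (i = 0; i < mi->nr_blks; i++)
                    node_set(mi->blk[i].nid, *nodemask);
    }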
-
Committed by Tejun Heo
It's no longer necessary to keep both cpu_nodes_parsed and mem_nodes_parsed. In preparation for merge, rename cpu_nodes_parsed to numa_nodes_parsed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
-