- 12 January 2011, 1 commit

Submitted by Benjamin Herrenschmidt
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

- 11 January 2011, 1 commit

Submitted by Jesse Larrew
The header asm/hvcall.h was previously included indirectly via smp.h. On non-SMP systems, however, these declarations are excluded and the build breaks. This is easily fixed by including asm/hvcall.h directly.

The VPHN feature is only meaningful on NUMA systems that implement the SPLPAR option, so exclude the VPHN code on systems without SPLPAR enabled. Also, expose unmap_cpu_from_node() on systems with SPLPAR enabled, even if CONFIG_HOTPLUG_CPU is disabled.

Lastly, map_cpu_to_node() is now needed by VPHN to manipulate the node masks after boot time, so remove the __cpuinit annotation to fix a section mismatch.

Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
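
A minimal sketch of the resulting header arrangement, assuming the usual VPHN entry points; the exact declarations guarded here are illustrative, not a copy of the patch:

    /* include asm/hvcall.h directly instead of relying on smp.h */
    #include <asm/hvcall.h>

    /* VPHN only makes sense on NUMA systems implementing SPLPAR */
    #if defined(CONFIG_NUMA) && defined(CONFIG_PPC_SPLPAR)
    extern int start_topology_update(void);
    extern int stop_topology_update(void);
    #else
    static inline int start_topology_update(void) { return 0; }
    static inline int stop_topology_update(void) { return 0; }
    #endif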

- 09 December 2010, 1 commit

Submitted by Jesse Larrew
Tie the polling mechanism into the ibm,suspend-me RTAS call to stop/restart polling before/after a suspend, hibernate, migrate, or checkpoint-restart operation. This ensures that the system has a chance to disable the polling if the partition is migrated to a system that does not support VPHN (and vice versa).

Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
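
A hedged sketch of the bracketing described above; the function name pseries_suspend_me() and its argument are hypothetical, only the stop/start calls around the RTAS operation are the point:

    static int pseries_suspend_me(int token)    /* hypothetical wrapper */
    {
        int rc;

        stop_topology_update();                 /* quiesce VPHN polling */
        rc = rtas_call(token, 0, 1, NULL);      /* ibm,suspend-me */
        start_topology_update();                /* restart, or stay off if
                                                   the target lacks VPHN */
        return rc;
    }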

- 30 July 2010, 1 commit

Submitted by Grant Likely
of_node_to_nid() is only relevant in a few architectures. Don't force everyone to implement it anyway.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>

- 09 July 2010, 1 commit

Submitted by Anton Blanchard
Form 1 affinity allows multiple entries in ibm,associativity-reference-points, which represent affinity domains in decreasing order of importance. The Linux concept of a node is always the first entry, but using the other values as an input to node_distance() allows the memory allocator to make better decisions about which node to try first when local memory has been exhausted.

We keep things simple and create an array indexed by NUMA node, capped at 4 entries. Each time we look up an associativity property we initialise the array, which is overkill, but since we should only hit this path during boot it didn't seem worth adding a per-node valid bit.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
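
A sketch of the distance computation this enables. The 4-entry cap and per-node array follow the description; the doubling rule per differing level is one plausible mapping from associativity levels to distances, not necessarily the patch's exact formula:

    #define MAX_DISTANCE_REF_POINTS 4

    static int distance_lookup_table[MAX_NUMNODES][MAX_DISTANCE_REF_POINTS];

    static int __node_distance(int a, int b)
    {
        int i;
        int distance = LOCAL_DISTANCE;

        for (i = 0; i < MAX_DISTANCE_REF_POINTS; i++) {
            /* nodes that match at this reference point share a domain */
            if (distance_lookup_table[a][i] == distance_lookup_table[b][i])
                break;

            /* grow the distance for each affinity level that differs */
            distance *= 2;
        }

        return distance;
    }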

- 21 May 2010, 1 commit

Submitted by Anton Blanchard
I noticed /proc/sys/vm/zone_reclaim_mode was 0 on a ppc64 NUMA box. It gets enabled via this:

    /*
     * If another node is sufficiently far away then it is better
     * to reclaim pages in a zone before going off node.
     */
    if (distance > RECLAIM_DISTANCE)
        zone_reclaim_mode = 1;

Since we use the default value of 20 for REMOTE_DISTANCE and 20 for RECLAIM_DISTANCE, it never kicks in. The local-to-remote bandwidth ratios can be quite large on System p machines, so it makes sense for us to reclaim clean pagecache locally before going off node.

The patch below sets a smaller value for RECLAIM_DISTANCE and thus enables zone reclaim.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
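
The change itself amounts to a one-line override in the powerpc topology header; a value below the default REMOTE_DISTANCE of 20 makes the zone_reclaim_mode test above fire (sketch, value assumed):

    /* arch/powerpc/include/asm/topology.h */
    #define RECLAIM_DISTANCE 10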

- 06 May 2010, 2 commits

Submitted by Anton Blanchard
Convert the NUMA code to the new cpumask API. We shift the node-to-cpumask setup code until after we complete bootmem allocation, so we can dynamically allocate the cpumasks.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Submitted by Anton Blanchard
Dynamically allocate the cpu_sibling_map and cpu_core_map cpumasks.

We don't need to set_cpu_online() the boot cpu in smp_prepare_boot_cpu(); init/main.c does it for us. We also postpone setting the boot cpu in cpu_sibling_map and cpu_core_map until the memory allocator is available (smp_prepare_cpus), similar to x86.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
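
A sketch of what the dynamic allocation can look like, assuming the usual cpumask_var_t pattern; the exact placement inside smp_prepare_cpus() is illustrative:

    static DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
    static DEFINE_PER_CPU(cpumask_var_t, cpu_core_map);

    void __init smp_prepare_cpus(unsigned int max_cpus)
    {
        unsigned int cpu;
        unsigned int boot_cpuid = smp_processor_id();

        /* the allocator is up by now, so the masks can live off-stack */
        for_each_possible_cpu(cpu) {
            zalloc_cpumask_var_node(&per_cpu(cpu_sibling_map, cpu),
                                    GFP_KERNEL, cpu_to_node(cpu));
            zalloc_cpumask_var_node(&per_cpu(cpu_core_map, cpu),
                                    GFP_KERNEL, cpu_to_node(cpu));
        }

        /* boot cpu joins its own masks here rather than in
         * smp_prepare_boot_cpu(), mirroring x86 */
        cpumask_set_cpu(boot_cpuid, per_cpu(cpu_sibling_map, boot_cpuid));
        cpumask_set_cpu(boot_cpuid, per_cpu(cpu_core_map, boot_cpuid));
    }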

- 07 April 2010, 1 commit

Submitted by Anton Blanchard
I noticed /proc/sys/vm/zone_reclaim_mode was 0 on a ppc64 NUMA box. It gets enabled via this:

    /*
     * If another node is sufficiently far away then it is better
     * to reclaim pages in a zone before going off node.
     */
    if (distance > RECLAIM_DISTANCE)
        zone_reclaim_mode = 1;

Since we use the default value of 20 for REMOTE_DISTANCE and 20 for RECLAIM_DISTANCE, it never kicks in. The local-to-remote bandwidth ratios can be quite large on System p machines, so it makes sense for us to reclaim clean pagecache locally before going off node.

The patch below sets a smaller value for RECLAIM_DISTANCE and thus enables zone reclaim.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

- 09 February 2010, 1 commit

Submitted by Anton Blanchard
Clean up SD_NODE_INIT so we can easily compare it to x86. Similar to the work in commit 47734f89 ("sched: Clean up topology.h").

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

- 15 January 2010, 1 commit

Submitted by Anton Blanchard
pcibus_to_node() can return -1 if we cannot determine which node a PCI bus is on. If passed -1, cpumask_of_node() will negatively index the lookup array and pull in random data:

    # cat /sys/devices/pci0000:00/0000:00:01.0/local_cpus
    00000000,00000003,00000000,00000000
    # cat /sys/devices/pci0000:00/0000:00:01.0/local_cpulist
    64-65

Change cpumask_of_node() to check for -1 and return cpu_all_mask in this case:

    # cat /sys/devices/pci0000:00/0000:00:01.0/local_cpus
    ffffffff,ffffffff,ffffffff,ffffffff
    # cat /sys/devices/pci0000:00/0000:00:01.0/local_cpulist
    0-127

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
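
The guard can be as small as this sketch, where node_to_cpumask_map is assumed to be the powerpc per-node lookup array:

    #define cpumask_of_node(node) ((node) == -1 ?          \
                                   cpu_all_mask :          \
                                   node_to_cpumask_map[node])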

- 24 September 2009, 3 commits

Submitted by Rusty Russell
These were replaced by topology_core_cpumask and topology_thread_cpumask.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Submitted by Rusty Russell
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Submitted by Rusty Russell
cpumask_of_pcibus() is the new version.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

- 16 September 2009, 1 commit

Submitted by Peter Zijlstra
Sysbench thinks SD_BALANCE_WAKE is too aggressive and kbuild doesn't really mind too much; SD_BALANCE_NEWIDLE picks up most of the slack.

On a dual-socket, quad-core, dual-thread Nehalem system:

    sysbench (--num_threads=16):
      SD_BALANCE_WAKE-: 13982 tx/s
      SD_BALANCE_WAKE+: 15688 tx/s

    kbuild (-j16):
      SD_BALANCE_WAKE-: 47.648295846 seconds time elapsed ( +- 0.312% )
      SD_BALANCE_WAKE+: 47.608607360 seconds time elapsed ( +- 0.026% )
      (same within noise)

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

- 15 September 2009, 3 commits

Submitted by Mike Galbraith
Make the idle balancer more aggressive, to improve an x264 encoding workload provided by Jason Garrett-Glaser:

    NEXT_BUDDY NO_LB_BIAS
    encoded 600 frames, 252.82 fps, 22096.60 kb/s
    encoded 600 frames, 250.69 fps, 22096.60 kb/s
    encoded 600 frames, 245.76 fps, 22096.60 kb/s

    NO_NEXT_BUDDY LB_BIAS
    encoded 600 frames, 344.44 fps, 22096.60 kb/s
    encoded 600 frames, 346.66 fps, 22096.60 kb/s
    encoded 600 frames, 352.59 fps, 22096.60 kb/s

    NO_NEXT_BUDDY NO_LB_BIAS
    encoded 600 frames, 425.75 fps, 22096.60 kb/s
    encoded 600 frames, 425.45 fps, 22096.60 kb/s
    encoded 600 frames, 422.49 fps, 22096.60 kb/s

Peter pointed out that this is better done via newidle_idx, not via LB_BIAS: newidle balancing should look for where there is load _now_, not where there was load 2 ticks ago. Worst-case latencies are improved as well, since no buddies means less vruntime spread (as per prior lkml discussions). This change improves kbuild-peak parallelism as well.

Reported-by: Jason Garrett-Glaser <darkshikari@gmail.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1253011667.9128.16.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Submitted by Peter Zijlstra
When merging select_task_rq_fair() and sched_balance_self() we lost the use of wake_idx; restore that and set them to 0 to make wake balancing more aggressive.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Submitted by Peter Zijlstra
The problem with wake_idle() is that it doesn't respect things like cpu_power, which means it doesn't deal well with SMT nor the recent RT interaction. To cure this, it needs to do what sched_balance_self() does, which leads to the possibility of merging select_task_rq_fair() and sched_balance_self().

Modify sched_balance_self() to:

- update_shares() when walking up the domain tree (it only called it for the top domain, but it should have done this anyway), which allows us to remove this ugly bit from try_to_wake_up().

- do wake_affine() on the smallest domain that contains both this (the waking) and the prev (the wakee) cpu for WAKE invocations.

Then use the top-down balance steps it had to replace wake_idle().

This leads to the disappearance of SD_WAKE_BALANCE and SD_WAKE_IDLE_FAR, with SD_WAKE_IDLE replaced by SD_BALANCE_WAKE. SD_WAKE_AFFINE needs SD_BALANCE_WAKE to be effective.

Touch all topology bits to replace the old with the new SD flags; platforms might need re-tuning. Enabling SD_BALANCE_WAKE conditionally on a NUMA distance seems like a good additional feature: Magny-Cours and small Nehalem systems would want this enabled, systems with slow interconnects would not.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

- 30 March 2009, 1 commit

Submitted by Rusty Russell
Everyone defines it, and only one person uses it (arch/mips/sgi-ip27/ip27-nmi.c). So just open-code it there.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: linux-mips@linux-mips.org

- 01 January 2009, 1 commit

Submitted by Rusty Russell
Impact: new API

The old topology_core_siblings() and topology_thread_siblings() return a cpumask_t; these new ones return a (const) struct cpumask *.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>

- 26 December 2008, 1 commit

Submitted by Rusty Russell
Impact: new APIs

The old node_to_cpumask/node_to_pcibus returned a cpumask_t; these return a pointer to a struct cpumask. Part of removing cpumasks from the stack. (Also replaces powerpc-internal uses of node_to_cpumask.)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>

- 26 November 2008, 1 commit

Submitted by Ingo Molnar
Mathieu Desnoyers reported this build failure on powerpc:

    kernel/sched.c: In function 'sd_init_NODE':
    kernel/sched.c:7319: error: non-static initialization of a flexible array member
    kernel/sched.c:7319: error: (near initialization for '(anonymous)')

This happens because .span changed to cpumask_var_t, hence the static CPU_MASK_NONE initializers in the SD_*_INIT templates are not type-correct anymore. Remove them, as they default to empty anyway. Also remove them from IA64, MIPS and SH.

Reported-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

- 04 August 2008, 1 commit

Submitted by Stephen Rothwell
Move include files to arch/powerpc/include/asm from include/asm-powerpc. This is the result of:

    mkdir arch/powerpc/include/asm
    git mv include/asm-powerpc/* arch/powerpc/include/asm

followed by a few documentation/comment fixups and a couple of places where <asm-powerpc/...> was being used explicitly. Of the latter, only one was outside the arch code, and it is a driver only built for powerpc.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>

- 28 July 2008, 2 commits

Submitted by Nathan Lynch
Existing Open Firmware practice is to report each processor core as a separate node in the device tree. Report the value of the "reg" OF property corresponding to a logical CPU's device node as the core_id attribute in /sys/devices/system/cpu/cpu*/topology/core_id.

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
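
A sketch of reading the "reg" property for a logical cpu, assuming the usual of_get_cpu_node()/of_get_property() helpers; the function name is illustrative:

    int cpu_to_core_id(int cpu)
    {
        struct device_node *np;
        const int *reg;
        int id = -1;

        np = of_get_cpu_node(cpu, NULL);
        if (!np)
            goto out;

        reg = of_get_property(np, "reg", NULL);
        if (!reg)
            goto out;

        id = *reg;   /* OF reports one node per core; "reg" is the core id */
    out:
        of_node_put(np);
        return id;
    }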

Submitted by Nathan Lynch
Implement the notion of "core siblings" for powerpc. This makes /sys/devices/system/cpu/cpu*/topology/core_siblings present sensible values, indicating online CPUs which share an L2 cache.

BenH: Made cpu_to_l2cache() use of_find_node_by_phandle() instead of an IBM-specific open-coded search.

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

- 20 April 2008, 1 commit

Submitted by Mike Travis
Create a simple macro to always return a pointer to the node_to_cpumask(node) value. This relies on compiler optimization to remove the extra indirection:

    #define node_to_cpumask_ptr(v, node)                    \
        cpumask_t _##v = node_to_cpumask(node), *v = &_##v

For those systems with a large cpumask size, a true pointer to the array element can be used instead:

    #define node_to_cpumask_ptr(v, node)                    \
        cpumask_t *v = &(node_to_cpumask_map[node])

A node_to_cpumask_ptr_next() macro is provided to access another node_to_cpumask value.

The other change is to always include asm-generic/topology.h, moving the ifdef CONFIG_NUMA to this same file.

Note: there are no references to either of these new macros in this patch, only the definition.

Based on 2.6.25-rc5-mm1

# alpha
Cc: Richard Henderson <rth@twiddle.net>
# fujitsu
Cc: David Howells <dhowells@redhat.com>
# ia64
Cc: Tony Luck <tony.luck@intel.com>
# powerpc
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
# sparc
Cc: David S. Miller <davem@davemloft.net>
Cc: William L. Irwin <wli@holomorphy.com>
# x86
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

- 17 October 2007, 1 commit

Submitted by Mike Travis
Convert cpu_sibling_map from a static array sized by NR_CPUS to a per_cpu variable. This saves sizeof(cpumask_t) * NR unused cpus. Access is mostly from startup and CPU HOTPLUG functions.

Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
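
The shape of the conversion, as a sketch; the accessor macro name is illustrative, not necessarily what the patch used:

    /* before: a static array, NR_CPUS entries whether the cpus exist or not
     *     cpumask_t cpu_sibling_map[NR_CPUS];
     * after: one copy per possible cpu */
    DEFINE_PER_CPU(cpumask_t, cpu_sibling_map);

    /* readers switch from cpu_sibling_map[cpu] to something like: */
    #define cpu_sibling_mask(cpu) (&per_cpu(cpu_sibling_map, cpu))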

- 05 March 2007, 1 commit

Submitted by Con Kolivas
Remove the SMT-nice feature which idles sibling cpus on SMT cpus to facilitate nice working properly where cpu power is shared. The idling of cpus in the presence of runnable tasks is considered too fragile and easy to break with outside code, and the complexity of managing this system if an architecture comes along with many logical cores sharing cpu power would be unworkable.

Remove the associated per_cpu_gain variable in sched_domains, used only by this code.

Also: with dynticks enabled, this code breaks without yet further tweaks, so dynticks brought on the rapid demise of this code. So either we tweak this code or kill it off entirely. It was Ingo's preference to kill it off. Either way this needs to happen for 2.6.21 since dynticks has gone in.

Signed-off-by: Con Kolivas <kernel@kolivas.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

- 11 December 2006, 1 commit

Submitted by Christoph Lameter
Large sched domains can be very expensive to scan. Add an option, SD_SERIALIZE, to the sched domain flags. If that flag is set then we make sure that no other such domain is being balanced.

[akpm@osdl.org: build fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Peter Williams <pwil3058@bigpond.net.au>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
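
A sketch of how the flag can be honoured in the rebalance path; trylock-style serialization is one straightforward implementation, the function name is illustrative:

    static DEFINE_SPINLOCK(balancing);

    static void rebalance_domains_sketch(struct sched_domain *sd)
    {
        for (; sd; sd = sd->parent) {
            int need_serialize = sd->flags & SD_SERIALIZE;

            if (need_serialize && !spin_trylock(&balancing))
                continue;   /* another cpu is scanning a large domain */

            /* ... run load balancing for this domain ... */

            if (need_serialize)
                spin_unlock(&balancing);
        }
    }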

- 04 December 2006, 1 commit

Submitted by Arnd Bergmann
At least the IDE driver calls pcibus_to_node(), which is not defined when CONFIG_PCI is disabled. This adds a nop function for the !PCI case.

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
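
The stub is trivial; a sketch of the !CONFIG_PCI case:

    #ifndef CONFIG_PCI
    static inline int pcibus_to_node(struct pci_bus *bus)
    {
        return -1;   /* no PCI bus, so no node affinity to report */
    }
    #endif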

- 16 November 2006, 1 commit

Submitted by Stephen Rothwell
This adds the /sys/devices/system/cpu/*/topology/thread_siblings files on powerpc. These files are already available on other architectures.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>

- 03 October 2006, 1 commit

Submitted by Siddha, Suresh B
Introduce the child field in struct sched_domain and use it in sched_balance_self(). We will also use this field in cleaning up the sched group cpu_power setup (done in a different patch).

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
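
The new field just mirrors the existing upward link (sketch):

    struct sched_domain {
        /* these links form NULL-terminated chains: */
        struct sched_domain *parent;   /* top domain has parent == NULL */
        struct sched_domain *child;    /* bottom domain has child == NULL */
        /* ... */
    };

With this in place, code like sched_balance_self() can walk downward with for (; sd; sd = sd->child) instead of re-deriving the hierarchy from the bottom each time.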

- 28 June 2006, 1 commit

Submitted by Siddha, Suresh B
The sysfs entries 'sched_mc_power_savings' and 'sched_smt_power_savings' in /sys/devices/system/cpu/ control the MC/SMT power savings policy for the scheduler. Based on the values (1 = enable, 0 = disable) of these controls, sched group cpu power will be determined for the different domains.

When the power savings policy is enabled and under light load conditions, the scheduler will minimize the physical packages/cpu cores carrying the load, and thus conserve power (with a performance impact that depends on the workload characteristics; see the OLS 2005 CMP kernel scheduler paper for more details).

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Con Kolivas <kernel@kolivas.org>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

- 15 June 2006, 1 commit

Submitted by Anton Blanchard
of_node_to_nid() returns -1 if the associativity cannot be found, which means pcibus_to_cpumask() has to be careful not to pass a negative index into node_to_cpumask(). Since pcibus_to_node() could be used a lot, and of_node_to_nid() is slow (it walks a list doing strcmps), let's also cache the node in the pci_controller struct.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
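
A sketch of the guard and the cached field; the field name follows the description above and the macro mirrors the -1 checks seen elsewhere in this file's history:

    struct pci_controller {
        /* ... existing members ... */
        int node;   /* cached of_node_to_nid() result; -1 if unknown */
    };

    #define pcibus_to_cpumask(bus) (pcibus_to_node(bus) == -1 ?        \
                                    CPU_MASK_ALL :                     \
                                    node_to_cpumask(pcibus_to_node(bus)))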

- 09 June 2006, 1 commit

Submitted by Christoph Hellwig
On 64-bit powerpc we can find out what node a PCI bus hangs off, so implement the topology.h macros that export this information. For 32-bit this seems a little more difficult, but I don't know of 32-bit powerpc NUMA machines either, so let's leave it out for now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>

- 02 May 2006, 1 commit

Submitted by Jeremy Kerr
Change of_node_to_nid() to traverse the device tree, looking for a numa id. Cell uses this to assign ids to SPUs, which are children of the CPU node. Existing users of of_node_to_nid() are altered to use of_node_to_nid_single(), which doesn't do the traversal.

Export an attach_sysdev_to_node() function, allowing system devices (eg. SPUs) to link themselves into the numa topology in sysfs.

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
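
A sketch of the traversal, assuming the non-traversing helper of_node_to_nid_single() keeps the old per-node lookup:

    int of_node_to_nid(struct device_node *device)
    {
        struct device_node *tmp;
        int nid = -1;

        of_node_get(device);
        while (device) {
            nid = of_node_to_nid_single(device);
            if (nid != -1)
                break;                      /* an ancestor has a numa id */

            tmp = device;
            device = of_get_parent(tmp);    /* e.g. SPU -> parent CPU node */
            of_node_put(tmp);
        }
        of_node_put(device);

        return nid;
    }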

- 26 April 2006, 1 commit

Submitted by David Woodhouse
Signed-off-by: David Woodhouse <dwmw2@infradead.org>

- 13 January 2006, 1 commit

Submitted by akpm@osdl.org
From: Ingo Molnar <mingo@elte.hu>

This is the latest version of the scheduler cache-hot-auto-tune patch.

The first problem was that detection time scaled with O(N^2), which is unacceptable on larger SMP and NUMA systems. To solve this:

- I've added a 'domain distance' function, which is used to cache measurement results. Each distance is only measured once. This means that e.g. on NUMA, distances of 0, 1 and 2 might be measured; on HT, distances 0 and 1; and on SMP, distance 0 is measured. The code walks the domain tree to determine the distance, so it automatically follows whatever hierarchy an architecture sets up. This cuts down on the boot time significantly and removes the O(N^2) limit. The only assumption is that migration costs can be expressed as a function of domain distance - this covers the overwhelming majority of existing systems, and is a good guess even for more asymmetric systems.

  [ People hacking systems that have asymmetries that break this assumption (e.g. different CPU speeds) should experiment a bit with the cpu_distance() function. Adding a ->migration_distance factor to the domain structure would be one possible solution - but let's first see the problem systems, if they exist at all. Let's not overdesign. ]

Another problem was that only a single cache-size was used for measuring the cost of migration, and most architectures didn't set that variable up. Furthermore, a single cache-size does not fit NUMA hierarchies with L3 caches and does not fit HT setups, where different CPUs will often have different 'effective cache sizes'. To solve this problem:

- Instead of relying on a single cache-size provided by the platform and sticking to it, the code now auto-detects the 'effective migration cost' between two measured CPUs, via iterating through a wide range of cache sizes. The code searches for the maximum migration cost, which occurs when the working set of the test workload falls just below the 'effective cache size'. I.e. real-life optimized search is done for the maximum migration cost, between two real CPUs.

This, amongst other things, has the positive effect that if e.g. two CPUs share an L2/L3 cache, a different (and accurate) migration cost will be found than between two CPUs on the same system that don't share any caches. (The reliable measurement of migration costs is tricky - see the source for details.)

Furthermore, I've added various boot-time options to override/tune migration behavior.

Firstly, there's a blanket override for autodetection:

    migration_cost=1000,2000,3000

will override the depth 0/1/2 values with 1msec/2msec/3msec values.

Secondly, there's a global factor that can be used to increase (or decrease) the autodetected values:

    migration_factor=120

will increase the autodetected values by 20%. This option is useful to tune things in a workload-dependent way - e.g. if a workload is cache-insensitive then CPU utilization can be maximized by specifying migration_factor=0.

I've tested the autodetection code quite extensively on x86, on 3 P3/Xeon/2MB, and the autodetected values look pretty good:

Dual Celeron (128K L2 cache):

    ---------------------
    migration cost matrix (max_cache_size: 131072, cpu: 467 MHz):
    ---------------------
              [00]    [01]
    [00]:      -     1.7(1)
    [01]:    1.7(1)    -
    ---------------------
    cacheflush times [2]: 0.0 (0) 1.7 (1784008)
    ---------------------

Here the slow memory subsystem dominates system performance, and even though caches are small, the migration cost is 1.7 msecs.

Dual HT P4 (512K L2 cache):

    ---------------------
    migration cost matrix (max_cache_size: 524288, cpu: 2379 MHz):
    ---------------------
              [00]    [01]    [02]    [03]
    [00]:      -     0.4(1)  0.0(0)  0.4(1)
    [01]:    0.4(1)    -     0.4(1)  0.0(0)
    [02]:    0.0(0)  0.4(1)    -     0.4(1)
    [03]:    0.4(1)  0.0(0)  0.4(1)    -
    ---------------------
    cacheflush times [2]: 0.0 (33900) 0.4 (448514)
    ---------------------

Here it can be seen that there is no migration cost between two HT siblings (CPU#0/2 and CPU#1/3 are separate physical CPUs). A fast memory system makes inter-physical-CPU migration pretty cheap: 0.4 msecs.

8-way P3/Xeon [2MB L2 cache]:

    ---------------------
    migration cost matrix (max_cache_size: 2097152, cpu: 700 MHz):
    ---------------------
               [00]     [01]     [02]     [03]     [04]     [05]     [06]     [07]
    [00]:       -     19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)
    [01]:    19.2(1)     -     19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)
    [02]:    19.2(1)  19.2(1)     -     19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)
    [03]:    19.2(1)  19.2(1)  19.2(1)     -     19.2(1)  19.2(1)  19.2(1)  19.2(1)
    [04]:    19.2(1)  19.2(1)  19.2(1)  19.2(1)     -     19.2(1)  19.2(1)  19.2(1)
    [05]:    19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)     -     19.2(1)  19.2(1)
    [06]:    19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)     -     19.2(1)
    [07]:    19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)     -
    ---------------------
    cacheflush times [2]: 0.0 (0) 19.2 (19281756)
    ---------------------

This one has huge caches and a relatively slow memory subsystem - so the migration cost is 19 msecs.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: <wilder@us.ibm.com>
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

- 09 January 2006, 2 commits

Submitted by Arnd Bergmann
include/asm-ppc/ had #ifdef __KERNEL__ in all header files that are not meant for use by user space; include/asm-powerpc does not have this yet. This patch gets us a lot closer there. There are a few cases where I was not sure, so I left them out.

I have verified that no CONFIG_* symbols are used outside of __KERNEL__ any more and that there are no obvious compile errors when including any of the headers in user space libraries.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
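
The pattern being applied is the standard guard (sketch, using topology.h as the example header):

    #ifndef _ASM_POWERPC_TOPOLOGY_H
    #define _ASM_POWERPC_TOPOLOGY_H
    #ifdef __KERNEL__

    /* declarations that use CONFIG_* symbols or kernel-only types live
     * here, invisible to user space includes */

    #endif /* __KERNEL__ */
    #endif /* _ASM_POWERPC_TOPOLOGY_H */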

Submitted by Anton Blanchard
We used to print a NUMA cpu summary at boot, before the hotplug cpu code was added. This has been useful for catching machine configuration as well as firmware bugs in the past. This patch restores that functionality. An example of the output is:

    Node 0 CPUs: 0-7
    Node 1 CPUs: 8-15

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>