- 12 January 2011, 1 commit

Submitted by Benjamin Herrenschmidt
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

- 11 January 2011, 1 commit

Submitted by Jesse Larrew
The header asm/hvcall.h was previously included indirectly via smp.h. On non-SMP systems, however, these declarations are excluded and the build breaks. This is easily fixed by including asm/hvcall.h directly.

The VPHN feature is only meaningful on NUMA systems that implement the SPLPAR option, so exclude the VPHN code on systems without SPLPAR enabled. Also, expose unmap_cpu_from_node() on systems with SPLPAR enabled, even if CONFIG_HOTPLUG_CPU is disabled.

Lastly, map_cpu_to_node() is now needed by VPHN to manipulate the node masks after boot time, so remove the __cpuinit annotation to fix a section mismatch.

Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
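
A minimal sketch of the resulting header arrangement, assuming the usual VPHN entry points; the exact declarations guarded here are illustrative, not a copy of the patch:

    /* include asm/hvcall.h directly instead of relying on smp.h */
    #include <asm/hvcall.h>

    /* VPHN only makes sense on NUMA systems implementing SPLPAR */
    #if defined(CONFIG_NUMA) && defined(CONFIG_PPC_SPLPAR)
    extern int start_topology_update(void);
    extern int stop_topology_update(void);
    #else
    static inline int start_topology_update(void) { return 0; }
    static inline int stop_topology_update(void) { return 0; }
    #endif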

- 09 December 2010, 1 commit

Submitted by Jesse Larrew
Tie the polling mechanism into the ibm,suspend-me RTAS call to stop/restart polling before/after a suspend, hibernate, migrate, or checkpoint-restart operation. This ensures that the system has a chance to disable the polling if the partition is migrated to a system that does not support VPHN (and vice versa).

Signed-off-by: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
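
A hedged sketch of the bracketing described above; the function name pseries_suspend_me() and its argument are hypothetical, only the stop/start calls around the RTAS operation are the point:

    static int pseries_suspend_me(int token)    /* hypothetical wrapper */
    {
        int rc;

        stop_topology_update();                 /* quiesce VPHN polling */
        rc = rtas_call(token, 0, 1, NULL);      /* ibm,suspend-me */
        start_topology_update();                /* restart, or stay off if
                                                   the target lacks VPHN */
        return rc;
    }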

- 30 July 2010, 1 commit

Submitted by Grant Likely
of_node_to_nid() is only relevant in a few architectures. Don't force everyone to implement it anyway.

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>

- 09 July 2010, 1 commit

Submitted by Anton Blanchard
Form 1 affinity allows multiple entries in ibm,associativity-reference-points, which represent affinity domains in decreasing order of importance. The Linux concept of a node is always the first entry, but using the other values as an input to node_distance() allows the memory allocator to make better decisions about which node to try first when local memory has been exhausted.

We keep things simple and create an array indexed by NUMA node, capped at 4 entries. Each time we look up an associativity property we initialise the array, which is overkill, but since we should only hit this path during boot it didn't seem worth adding a per-node valid bit.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
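
A sketch of the distance computation this enables. The 4-entry cap and per-node array follow the description; the doubling rule per differing level is one plausible mapping from associativity levels to distances, not necessarily the patch's exact formula:

    #define MAX_DISTANCE_REF_POINTS 4

    static int distance_lookup_table[MAX_NUMNODES][MAX_DISTANCE_REF_POINTS];

    static int __node_distance(int a, int b)
    {
        int i;
        int distance = LOCAL_DISTANCE;

        for (i = 0; i < MAX_DISTANCE_REF_POINTS; i++) {
            /* nodes that match at this reference point share a domain */
            if (distance_lookup_table[a][i] == distance_lookup_table[b][i])
                break;

            /* grow the distance for each affinity level that differs */
            distance *= 2;
        }

        return distance;
    }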

- 21 May 2010, 1 commit

Submitted by Anton Blanchard
I noticed /proc/sys/vm/zone_reclaim_mode was 0 on a ppc64 NUMA box. It gets enabled via this:

    /*
     * If another node is sufficiently far away then it is better
     * to reclaim pages in a zone before going off node.
     */
    if (distance > RECLAIM_DISTANCE)
        zone_reclaim_mode = 1;

Since we use the default value of 20 for REMOTE_DISTANCE and 20 for RECLAIM_DISTANCE, it never kicks in. The local-to-remote bandwidth ratios can be quite large on System p machines, so it makes sense for us to reclaim clean pagecache locally before going off node.

The patch below sets a smaller value for RECLAIM_DISTANCE and thus enables zone reclaim.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
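
The change itself amounts to a one-line override in the powerpc topology header; a value below the default REMOTE_DISTANCE of 20 makes the zone_reclaim_mode test above fire (sketch, value assumed):

    /* arch/powerpc/include/asm/topology.h */
    #define RECLAIM_DISTANCE 10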

- 06 May 2010, 2 commits

Submitted by Anton Blanchard
Convert the NUMA code to the new cpumask API. We shift the node-to-cpumask setup code until after we complete bootmem allocation, so we can dynamically allocate the cpumasks.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Submitted by Anton Blanchard
Dynamically allocate the cpu_sibling_map and cpu_core_map cpumasks.

We don't need to set_cpu_online() the boot cpu in smp_prepare_boot_cpu(); init/main.c does it for us. We also postpone setting the boot cpu in cpu_sibling_map and cpu_core_map until the memory allocator is available (smp_prepare_cpus), similar to x86.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
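
A sketch of what the dynamic allocation can look like, assuming the usual cpumask_var_t pattern; the exact placement inside smp_prepare_cpus() is illustrative:

    static DEFINE_PER_CPU(cpumask_var_t, cpu_sibling_map);
    static DEFINE_PER_CPU(cpumask_var_t, cpu_core_map);

    void __init smp_prepare_cpus(unsigned int max_cpus)
    {
        unsigned int cpu;
        unsigned int boot_cpuid = smp_processor_id();

        /* the allocator is up by now, so the masks can live off-stack */
        for_each_possible_cpu(cpu) {
            zalloc_cpumask_var_node(&per_cpu(cpu_sibling_map, cpu),
                                    GFP_KERNEL, cpu_to_node(cpu));
            zalloc_cpumask_var_node(&per_cpu(cpu_core_map, cpu),
                                    GFP_KERNEL, cpu_to_node(cpu));
        }

        /* boot cpu joins its own masks here rather than in
         * smp_prepare_boot_cpu(), mirroring x86 */
        cpumask_set_cpu(boot_cpuid, per_cpu(cpu_sibling_map, boot_cpuid));
        cpumask_set_cpu(boot_cpuid, per_cpu(cpu_core_map, boot_cpuid));
    }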

- 07 April 2010, 1 commit

Submitted by Anton Blanchard
I noticed /proc/sys/vm/zone_reclaim_mode was 0 on a ppc64 NUMA box. It gets enabled via this:

    /*
     * If another node is sufficiently far away then it is better
     * to reclaim pages in a zone before going off node.
     */
    if (distance > RECLAIM_DISTANCE)
        zone_reclaim_mode = 1;

Since we use the default value of 20 for REMOTE_DISTANCE and 20 for RECLAIM_DISTANCE, it never kicks in. The local-to-remote bandwidth ratios can be quite large on System p machines, so it makes sense for us to reclaim clean pagecache locally before going off node.

The patch below sets a smaller value for RECLAIM_DISTANCE and thus enables zone reclaim.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

- 09 February 2010, 1 commit

Submitted by Anton Blanchard
Clean up SD_NODE_INIT so we can easily compare it to x86. Similar to the work in commit 47734f89 ("sched: Clean up topology.h").

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

- 15 January 2010, 1 commit

Submitted by Anton Blanchard
pcibus_to_node() can return -1 if we cannot determine which node a PCI bus is on. If passed -1, cpumask_of_node() will negatively index the lookup array and pull in random data:

    # cat /sys/devices/pci0000:00/0000:00:01.0/local_cpus
    00000000,00000003,00000000,00000000
    # cat /sys/devices/pci0000:00/0000:00:01.0/local_cpulist
    64-65

Change cpumask_of_node() to check for -1 and return cpu_all_mask in this case:

    # cat /sys/devices/pci0000:00/0000:00:01.0/local_cpus
    ffffffff,ffffffff,ffffffff,ffffffff
    # cat /sys/devices/pci0000:00/0000:00:01.0/local_cpulist
    0-127

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
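
The guard can be as small as this sketch, where node_to_cpumask_map is assumed to be the powerpc per-node lookup array:

    #define cpumask_of_node(node) ((node) == -1 ?          \
                                   cpu_all_mask :          \
                                   node_to_cpumask_map[node])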

- 24 September 2009, 3 commits

Submitted by Rusty Russell
These were replaced by topology_core_cpumask and topology_thread_cpumask.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Submitted by Rusty Russell
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

Submitted by Rusty Russell
cpumask_of_pcibus() is the new version.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

- 16 September 2009, 1 commit

Submitted by Peter Zijlstra
Sysbench thinks SD_BALANCE_WAKE is too aggressive and kbuild doesn't really mind too much; SD_BALANCE_NEWIDLE picks up most of the slack.

On a dual-socket, quad-core, dual-thread Nehalem system:

    sysbench (--num_threads=16):
      SD_BALANCE_WAKE-: 13982 tx/s
      SD_BALANCE_WAKE+: 15688 tx/s

    kbuild (-j16):
      SD_BALANCE_WAKE-: 47.648295846 seconds time elapsed ( +- 0.312% )
      SD_BALANCE_WAKE+: 47.608607360 seconds time elapsed ( +- 0.026% )
      (same within noise)

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

- 15 September 2009, 3 commits

Submitted by Mike Galbraith
Make the idle balancer more aggressive, to improve an x264 encoding workload provided by Jason Garrett-Glaser:

    NEXT_BUDDY NO_LB_BIAS
    encoded 600 frames, 252.82 fps, 22096.60 kb/s
    encoded 600 frames, 250.69 fps, 22096.60 kb/s
    encoded 600 frames, 245.76 fps, 22096.60 kb/s

    NO_NEXT_BUDDY LB_BIAS
    encoded 600 frames, 344.44 fps, 22096.60 kb/s
    encoded 600 frames, 346.66 fps, 22096.60 kb/s
    encoded 600 frames, 352.59 fps, 22096.60 kb/s

    NO_NEXT_BUDDY NO_LB_BIAS
    encoded 600 frames, 425.75 fps, 22096.60 kb/s
    encoded 600 frames, 425.45 fps, 22096.60 kb/s
    encoded 600 frames, 422.49 fps, 22096.60 kb/s

Peter pointed out that this is better done via newidle_idx, not via LB_BIAS: newidle balancing should look for where there is load _now_, not where there was load 2 ticks ago. Worst-case latencies are improved as well, since no buddies means less vruntime spread (as per prior lkml discussions). This change improves kbuild-peak parallelism as well.

Reported-by: Jason Garrett-Glaser <darkshikari@gmail.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1253011667.9128.16.camel@marge.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Submitted by Peter Zijlstra
When merging select_task_rq_fair() and sched_balance_self() we lost the use of wake_idx; restore that and set them to 0 to make wake balancing more aggressive.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Submitted by Peter Zijlstra
The problem with wake_idle() is that it doesn't respect things like cpu_power, which means it doesn't deal well with SMT nor the recent RT interaction. To cure this, it needs to do what sched_balance_self() does, which leads to the possibility of merging select_task_rq_fair() and sched_balance_self().

Modify sched_balance_self() to:

- update_shares() when walking up the domain tree (it only called it for the top domain, but it should have done this anyway), which allows us to remove this ugly bit from try_to_wake_up().

- do wake_affine() on the smallest domain that contains both this (the waking) and the prev (the wakee) cpu for WAKE invocations.

Then use the top-down balance steps it had to replace wake_idle().

This leads to the disappearance of SD_WAKE_BALANCE and SD_WAKE_IDLE_FAR, with SD_WAKE_IDLE replaced by SD_BALANCE_WAKE. SD_WAKE_AFFINE needs SD_BALANCE_WAKE to be effective.

Touch all topology bits to replace the old with the new SD flags; platforms might need re-tuning. Enabling SD_BALANCE_WAKE conditionally on a NUMA distance seems like a good additional feature: Magny-Cours and small Nehalem systems would want this enabled, systems with slow interconnects would not.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

- 30 March 2009, 1 commit

Submitted by Rusty Russell
Everyone defines it, and only one person uses it (arch/mips/sgi-ip27/ip27-nmi.c). So just open-code it there.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: linux-mips@linux-mips.org

- 01 January 2009, 1 commit

Submitted by Rusty Russell
Impact: new API

The old topology_core_siblings() and topology_thread_siblings() return a cpumask_t; these new ones return a (const) struct cpumask *.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>

- 26 December 2008, 1 commit

Submitted by Rusty Russell
Impact: new APIs

The old node_to_cpumask/node_to_pcibus returned a cpumask_t; these return a pointer to a struct cpumask. Part of removing cpumasks from the stack. (Also replaces powerpc-internal uses of node_to_cpumask.)

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>

- 26 November 2008, 1 commit

Submitted by Ingo Molnar
Mathieu Desnoyers reported this build failure on powerpc:

    kernel/sched.c: In function 'sd_init_NODE':
    kernel/sched.c:7319: error: non-static initialization of a flexible array member
    kernel/sched.c:7319: error: (near initialization for '(anonymous)')

This happens because .span changed to cpumask_var_t, hence the static CPU_MASK_NONE initializers in the SD_*_INIT templates are not type-correct anymore. Remove them, as they default to empty anyway. Also remove them from IA64, MIPS and SH.

Reported-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

- 04 August 2008, 1 commit

Submitted by Stephen Rothwell
Move include files to arch/powerpc/include/asm from include/asm-powerpc. This is the result of:

    mkdir arch/powerpc/include/asm
    git mv include/asm-powerpc/* arch/powerpc/include/asm

followed by a few documentation/comment fixups and a couple of places where <asm-powerpc/...> was being used explicitly. Of the latter, only one was outside the arch code, and it is a driver only built for powerpc.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>

- 28 July 2008, 2 commits

Submitted by Nathan Lynch
Existing Open Firmware practice is to report each processor core as a separate node in the device tree. Report the value of the "reg" OF property corresponding to a logical CPU's device node as the core_id attribute in /sys/devices/system/cpu/cpu*/topology/core_id.

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
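
A sketch of reading the "reg" property for a logical cpu, assuming the usual of_get_cpu_node()/of_get_property() helpers; the function name is illustrative:

    int cpu_to_core_id(int cpu)
    {
        struct device_node *np;
        const int *reg;
        int id = -1;

        np = of_get_cpu_node(cpu, NULL);
        if (!np)
            goto out;

        reg = of_get_property(np, "reg", NULL);
        if (!reg)
            goto out;

        id = *reg;   /* OF reports one node per core; "reg" is the core id */
    out:
        of_node_put(np);
        return id;
    }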

Submitted by Nathan Lynch
Implement the notion of "core siblings" for powerpc. This makes /sys/devices/system/cpu/cpu*/topology/core_siblings present sensible values, indicating online CPUs which share an L2 cache.

BenH: Made cpu_to_l2cache() use of_find_node_by_phandle() instead of an IBM-specific open-coded search.

Signed-off-by: Nathan Lynch <ntl@pobox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

- 20 April 2008, 1 commit

Submitted by Mike Travis
Create a simple macro to always return a pointer to the node_to_cpumask(node) value. This relies on compiler optimization to remove the extra indirection:

    #define node_to_cpumask_ptr(v, node)                    \
        cpumask_t _##v = node_to_cpumask(node), *v = &_##v

For those systems with a large cpumask size, a true pointer to the array element can be used instead:

    #define node_to_cpumask_ptr(v, node)                    \
        cpumask_t *v = &(node_to_cpumask_map[node])

A node_to_cpumask_ptr_next() macro is provided to access another node_to_cpumask value.

The other change is to always include asm-generic/topology.h, moving the ifdef CONFIG_NUMA to this same file.

Note: there are no references to either of these new macros in this patch, only the definition.

Based on 2.6.25-rc5-mm1

# alpha
Cc: Richard Henderson <rth@twiddle.net>
# fujitsu
Cc: David Howells <dhowells@redhat.com>
# ia64
Cc: Tony Luck <tony.luck@intel.com>
# powerpc
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Blanchard <anton@samba.org>
# sparc
Cc: David S. Miller <davem@davemloft.net>
Cc: William L. Irwin <wli@holomorphy.com>
# x86
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

- 17 October 2007, 1 commit

Submitted by Mike Travis
Convert cpu_sibling_map from a static array sized by NR_CPUS to a per_cpu variable. This saves sizeof(cpumask_t) * NR unused cpus. Access is mostly from startup and CPU HOTPLUG functions.

Signed-off-by: Mike Travis <travis@sgi.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
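
The shape of the conversion, as a sketch; the accessor macro name is illustrative, not necessarily what the patch used:

    /* before: a static array, NR_CPUS entries whether the cpus exist or not
     *     cpumask_t cpu_sibling_map[NR_CPUS];
     * after: one copy per possible cpu */
    DEFINE_PER_CPU(cpumask_t, cpu_sibling_map);

    /* readers switch from cpu_sibling_map[cpu] to something like: */
    #define cpu_sibling_mask(cpu) (&per_cpu(cpu_sibling_map, cpu))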

- 05 March 2007, 1 commit

Submitted by Con Kolivas
Remove the SMT-nice feature which idles sibling cpus on SMT cpus to facilitate nice working properly where cpu power is shared. The idling of cpus in the presence of runnable tasks is considered too fragile and easy to break with outside code, and the complexity of managing this system if an architecture comes along with many logical cores sharing cpu power would be unworkable.

Remove the associated per_cpu_gain variable in sched_domains, used only by this code.

Also: with dynticks enabled, this code breaks without yet further tweaks, so dynticks brought on the rapid demise of this code. So either we tweak this code or kill it off entirely. It was Ingo's preference to kill it off. Either way this needs to happen for 2.6.21 since dynticks has gone in.

Signed-off-by: Con Kolivas <kernel@kolivas.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

- 11 December 2006, 1 commit

Submitted by Christoph Lameter
Large sched domains can be very expensive to scan. Add an option, SD_SERIALIZE, to the sched domain flags. If that flag is set then we make sure that no other such domain is being balanced.

[akpm@osdl.org: build fix]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Peter Williams <pwil3058@bigpond.net.au>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
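
A sketch of how the flag can be honoured in the rebalance path; trylock-style serialization is one straightforward implementation, the function name is illustrative:

    static DEFINE_SPINLOCK(balancing);

    static void rebalance_domains_sketch(struct sched_domain *sd)
    {
        for (; sd; sd = sd->parent) {
            int need_serialize = sd->flags & SD_SERIALIZE;

            if (need_serialize && !spin_trylock(&balancing))
                continue;   /* another cpu is scanning a large domain */

            /* ... run load balancing for this domain ... */

            if (need_serialize)
                spin_unlock(&balancing);
        }
    }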

- 04 December 2006, 1 commit

Submitted by Arnd Bergmann
At least the IDE driver calls pcibus_to_node(), which is not defined when CONFIG_PCI is disabled. This adds a nop function for the !PCI case.

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
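
The stub is trivial; a sketch of the !CONFIG_PCI case:

    #ifndef CONFIG_PCI
    static inline int pcibus_to_node(struct pci_bus *bus)
    {
        return -1;   /* no PCI bus, so no node affinity to report */
    }
    #endif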

- 16 November 2006, 1 commit

Submitted by Stephen Rothwell
This adds the /sys/devices/system/cpu/*/topology/thread_siblings files on powerpc. These files are already available on other architectures.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>

- 03 October 2006, 1 commit

Submitted by Siddha, Suresh B
Introduce the child field in struct sched_domain and use it in sched_balance_self(). We will also use this field in cleaning up the sched group cpu_power setup (done in a different patch).

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Paul Jackson <pj@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
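
The new field just mirrors the existing upward link (sketch):

    struct sched_domain {
        /* these links form NULL-terminated chains: */
        struct sched_domain *parent;   /* top domain has parent == NULL */
        struct sched_domain *child;    /* bottom domain has child == NULL */
        /* ... */
    };

With this in place, code like sched_balance_self() can walk downward with for (; sd; sd = sd->child) instead of re-deriving the hierarchy from the bottom each time.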

- 28 June 2006, 1 commit

Submitted by Siddha, Suresh B
The sysfs entries 'sched_mc_power_savings' and 'sched_smt_power_savings' in /sys/devices/system/cpu/ control the MC/SMT power savings policy for the scheduler. Based on the values (1 = enable, 0 = disable) of these controls, sched group cpu power will be determined for the different domains.

When the power savings policy is enabled and under light load conditions, the scheduler will minimize the physical packages/cpu cores carrying the load, and thus conserve power (with a performance impact that depends on the workload characteristics; see the OLS 2005 CMP kernel scheduler paper for more details).

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Con Kolivas <kernel@kolivas.org>
Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

- 15 June 2006, 1 commit

Submitted by Anton Blanchard
of_node_to_nid() returns -1 if the associativity cannot be found, which means pcibus_to_cpumask() has to be careful not to pass a negative index into node_to_cpumask(). Since pcibus_to_node() could be used a lot, and of_node_to_nid() is slow (it walks a list doing strcmps), let's also cache the node in the pci_controller struct.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
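
A sketch of the guard and the cached field; the field name follows the description above and the macro mirrors the -1 checks seen elsewhere in this file's history:

    struct pci_controller {
        /* ... existing members ... */
        int node;   /* cached of_node_to_nid() result; -1 if unknown */
    };

    #define pcibus_to_cpumask(bus) (pcibus_to_node(bus) == -1 ?        \
                                    CPU_MASK_ALL :                     \
                                    node_to_cpumask(pcibus_to_node(bus)))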

- 09 June 2006, 1 commit

Submitted by Christoph Hellwig
On 64-bit powerpc we can find out what node a PCI bus hangs off, so implement the topology.h macros that export this information. For 32-bit this seems a little more difficult, but I don't know of 32-bit powerpc NUMA machines either, so let's leave it out for now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>

- 02 May 2006, 1 commit

Submitted by Jeremy Kerr
Change of_node_to_nid() to traverse the device tree, looking for a numa id. Cell uses this to assign ids to SPUs, which are children of the CPU node. Existing users of of_node_to_nid() are altered to use of_node_to_nid_single(), which doesn't do the traversal.

Export an attach_sysdev_to_node() function, allowing system devices (eg. SPUs) to link themselves into the numa topology in sysfs.

Signed-off-by: Arnd Bergmann <arnd.bergmann@de.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
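
A sketch of the traversal, assuming the non-traversing helper of_node_to_nid_single() keeps the old per-node lookup:

    int of_node_to_nid(struct device_node *device)
    {
        struct device_node *tmp;
        int nid = -1;

        of_node_get(device);
        while (device) {
            nid = of_node_to_nid_single(device);
            if (nid != -1)
                break;                      /* an ancestor has a numa id */

            tmp = device;
            device = of_get_parent(tmp);    /* e.g. SPU -> parent CPU node */
            of_node_put(tmp);
        }
        of_node_put(device);

        return nid;
    }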

- 26 April 2006, 1 commit

Submitted by David Woodhouse
Signed-off-by: David Woodhouse <dwmw2@infradead.org>

- 13 January 2006, 1 commit

Submitted by akpm@osdl.org
From: Ingo Molnar <mingo@elte.hu>

This is the latest version of the scheduler cache-hot-auto-tune patch.

The first problem was that detection time scaled with O(N^2), which is unacceptable on larger SMP and NUMA systems. To solve this:

- I've added a 'domain distance' function, which is used to cache measurement results. Each distance is only measured once. This means that e.g. on NUMA, distances of 0, 1 and 2 might be measured; on HT, distances 0 and 1; and on SMP, distance 0 is measured. The code walks the domain tree to determine the distance, so it automatically follows whatever hierarchy an architecture sets up. This cuts down on the boot time significantly and removes the O(N^2) limit. The only assumption is that migration costs can be expressed as a function of domain distance - this covers the overwhelming majority of existing systems, and is a good guess even for more asymmetric systems.

  [ People hacking systems that have asymmetries that break this assumption (e.g. different CPU speeds) should experiment a bit with the cpu_distance() function. Adding a ->migration_distance factor to the domain structure would be one possible solution - but let's first see the problem systems, if they exist at all. Let's not overdesign. ]

Another problem was that only a single cache-size was used for measuring the cost of migration, and most architectures didn't set that variable up. Furthermore, a single cache-size does not fit NUMA hierarchies with L3 caches and does not fit HT setups, where different CPUs will often have different 'effective cache sizes'. To solve this problem:

- Instead of relying on a single cache-size provided by the platform and sticking to it, the code now auto-detects the 'effective migration cost' between two measured CPUs, via iterating through a wide range of cache sizes. The code searches for the maximum migration cost, which occurs when the working set of the test workload falls just below the 'effective cache size'. I.e. real-life optimized search is done for the maximum migration cost, between two real CPUs.

This, amongst other things, has the positive effect that if e.g. two CPUs share an L2/L3 cache, a different (and accurate) migration cost will be found than between two CPUs on the same system that don't share any caches. (The reliable measurement of migration costs is tricky - see the source for details.)

Furthermore, I've added various boot-time options to override/tune migration behavior.

Firstly, there's a blanket override for autodetection:

    migration_cost=1000,2000,3000

will override the depth 0/1/2 values with 1msec/2msec/3msec values.

Secondly, there's a global factor that can be used to increase (or decrease) the autodetected values:

    migration_factor=120

will increase the autodetected values by 20%. This option is useful to tune things in a workload-dependent way - e.g. if a workload is cache-insensitive then CPU utilization can be maximized by specifying migration_factor=0.

I've tested the autodetection code quite extensively on x86, on 3 P3/Xeon/2MB, and the autodetected values look pretty good:

Dual Celeron (128K L2 cache):

    ---------------------
    migration cost matrix (max_cache_size: 131072, cpu: 467 MHz):
    ---------------------
              [00]    [01]
    [00]:      -     1.7(1)
    [01]:    1.7(1)    -
    ---------------------
    cacheflush times [2]: 0.0 (0) 1.7 (1784008)
    ---------------------

Here the slow memory subsystem dominates system performance, and even though caches are small, the migration cost is 1.7 msecs.

Dual HT P4 (512K L2 cache):

    ---------------------
    migration cost matrix (max_cache_size: 524288, cpu: 2379 MHz):
    ---------------------
              [00]    [01]    [02]    [03]
    [00]:      -     0.4(1)  0.0(0)  0.4(1)
    [01]:    0.4(1)    -     0.4(1)  0.0(0)
    [02]:    0.0(0)  0.4(1)    -     0.4(1)
    [03]:    0.4(1)  0.0(0)  0.4(1)    -
    ---------------------
    cacheflush times [2]: 0.0 (33900) 0.4 (448514)
    ---------------------

Here it can be seen that there is no migration cost between two HT siblings (CPU#0/2 and CPU#1/3 are separate physical CPUs). A fast memory system makes inter-physical-CPU migration pretty cheap: 0.4 msecs.

8-way P3/Xeon [2MB L2 cache]:

    ---------------------
    migration cost matrix (max_cache_size: 2097152, cpu: 700 MHz):
    ---------------------
               [00]     [01]     [02]     [03]     [04]     [05]     [06]     [07]
    [00]:       -     19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)
    [01]:    19.2(1)     -     19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)
    [02]:    19.2(1)  19.2(1)     -     19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)
    [03]:    19.2(1)  19.2(1)  19.2(1)     -     19.2(1)  19.2(1)  19.2(1)  19.2(1)
    [04]:    19.2(1)  19.2(1)  19.2(1)  19.2(1)     -     19.2(1)  19.2(1)  19.2(1)
    [05]:    19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)     -     19.2(1)  19.2(1)
    [06]:    19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)     -     19.2(1)
    [07]:    19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)  19.2(1)     -
    ---------------------
    cacheflush times [2]: 0.0 (0) 19.2 (19281756)
    ---------------------

This one has huge caches and a relatively slow memory subsystem - so the migration cost is 19 msecs.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Cc: <wilder@us.ibm.com>
Signed-off-by: John Hawkes <hawkes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

- 09 January 2006, 2 commits

Submitted by Arnd Bergmann
include/asm-ppc/ had #ifdef __KERNEL__ in all header files that are not meant for use by user space; include/asm-powerpc does not have this yet. This patch gets us a lot closer there. There are a few cases where I was not sure, so I left them out.

I have verified that no CONFIG_* symbols are used outside of __KERNEL__ any more and that there are no obvious compile errors when including any of the headers in user space libraries.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
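
The pattern being applied is the standard guard (sketch, using topology.h as the example header):

    #ifndef _ASM_POWERPC_TOPOLOGY_H
    #define _ASM_POWERPC_TOPOLOGY_H
    #ifdef __KERNEL__

    /* declarations that use CONFIG_* symbols or kernel-only types live
     * here, invisible to user space includes */

    #endif /* __KERNEL__ */
    #endif /* _ASM_POWERPC_TOPOLOGY_H */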

Submitted by Anton Blanchard
We used to print a NUMA cpu summary at boot, before the hotplug cpu code was added. This has been useful for catching machine configuration as well as firmware bugs in the past. This patch restores that functionality. An example of the output is:

    Node 0 CPUs: 0-7
    Node 1 CPUs: 8-15

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>