Commits · 4a55bd5e97b1775913f88f11108a4f144f590e89 · openanolis / cloud-kernel

20 Apr, 2008 · 40 commits

sched: fair-group: de-couple load-balancing from the rb-trees · 4a55bd5e

Committed by Peter Zijlstra on Apr 19, 2008

De-couple load-balancing from the rb-trees, so that I can change their
organization.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: fair-group scheduling vs latency · ac884dec

Committed by Peter Zijlstra on Apr 19, 2008

Currently FAIR_GROUP sched grows the scheduler latency outside of
sysctl_sched_latency; invert this so it stays within.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: rt-group: optimize dequeue_rt_stack · 58d6c2d7

Committed by Peter Zijlstra on Apr 19, 2008

Now that the group hierarchy can have an arbitrary depth, the O(n^2) nature
of RT task dequeues will really hurt. Optimize this by providing space to
store the tree path, so we can walk it the other way.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: debug: add some debug code to handle the full hierarchy · d19ca308

Committed by Peter Zijlstra on Apr 19, 2008

Add some extra debug output so we can get a better overview of the
full hierarchy.

We print the cgroup path after each cfs_rq, so we can see what group
we're looking at.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: fair-group: SMP-nice for group scheduling · 18d95a28

Committed by Peter Zijlstra on Apr 19, 2008

Implement SMP nice support for the full group hierarchy.

On each load-balance action, compile a sched_domain wide view of the full
task_group tree. We compute the domain wide view when walking down the
hierarchy, and readjust the weights when walking back up.

After collecting and readjusting the domain wide view, we try to balance the
tasks within the task_groups. The current approach is to naively balance each
task group until we've moved the targeted amount of load.

Inspired by Srivatsa Vaddagiri's previous code and Abhishek Chandra's H-SMP
paper.

XXX: there will be some numerical issues due to the limited nature of
     SCHED_LOAD_SCALE wrt representing a task_group's influence on the
     total weight. When the tree is deep enough, or the task weight small
     enough, we'll run out of bits.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Abhishek Chandra <chandra@cs.umn.edu>
CC: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched, cpuset: customize sched domains, core · 1d3504fc

Committed by Hidetoshi Seto on Apr 15, 2008

[rebased for sched-devel/latest]

 - Add a new cpuset file, having levels:
     sched_relax_domain_level

 - Modify partition_sched_domains() and build_sched_domains()
   to take attributes parameter passed from cpuset.

 - Fill newidle_idx for node domains, which is currently unused but
   might be required if sched_relax_domain_level becomes higher.

 - We can change the default level by boot option 'relax_domain_level='.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: prepatory code movement · b758149c

Committed by Peter Zijlstra on Apr 19, 2008

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: rt: multi level group constraints · b40b2e8e

Committed by Peter Zijlstra on Apr 19, 2008

multi level rt constraints
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: task_group hierarchy · f473aa5e

Committed by Peter Zijlstra on Apr 19, 2008

Add the full parent<->child relation thing into task_groups as well.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: fix the task_group hierarchy for UID grouping · eff766a6

Committed by Peter Zijlstra on Apr 19, 2008

UID grouping doesn't actually have a task_group representing the root of
the task_group tree. Add one.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: allow the group scheduler to have multiple levels · ec7dc8ac

Committed by Dhaval Giani on Apr 19, 2008

This patch makes the group scheduler multi hierarchy aware.

[a.p.zijlstra@chello.nl: rt-parts and assorted fixes]
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: mix tasks and groups · 354d60c2

Committed by Dhaval Giani on Apr 19, 2008

This patch allows tasks and groups to exist in the same cfs_rq. With this
change the CFS group scheduling follows a 1/(M+N) model from a 1/(1+N)
fairness model where M tasks and N groups exist at the cfs_rq level.

[a.p.zijlstra@chello.nl: rt bits and assorted fixes]
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
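
The change in fairness model described in this commit can be illustrated with a little arithmetic. The sketch below assumes all entities have equal weight, and it reads the "1" in 1/(1+N) as the M tasks being counted together as a single unit, which is an interpretation rather than something the message spells out. The function names are hypothetical and do not come from the kernel.

```c
/*
 * Illustrative only: per-entity share on one cfs_rq, assuming equal weights.
 * Callers must pass at least one entity, otherwise the division is by zero.
 */

/* New model: M tasks and N groups are peers, so each gets 1/(M+N). */
static double demo_share_mixed(unsigned int m_tasks, unsigned int n_groups)
{
    return 1.0 / (double)(m_tasks + n_groups);
}

/* Earlier model as read here: the tasks count as one unit next to N groups. */
static double demo_share_group_only(unsigned int n_groups)
{
    return 1.0 / (double)(1 + n_groups);
}
```

With 3 tasks and 1 group, each entity gets a quarter of the CPU under the mixed model, while under the earlier model the group would get half.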

sched: fix checks · ea736ed5

Committed by Ingo Molnar on Mar 25, 2008

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: old sleeper bonus · 112f53f5

Committed by Peter Zijlstra on Mar 19, 2008

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: add new set_cpus_allowed_ptr function · cd8ba7cd

Committed by Mike Travis on Mar 26, 2008

Add a new function that accepts a pointer to the "newly allowed cpus"
cpumask argument.

int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)

The current set_cpus_allowed() function is modified to use the above
but this does not result in an ABI change.  And with some compiler
optimization help, it may not introduce any additional overhead.

Additionally, to enforce the read only nature of the new_mask arg, the
"const" property is migrated to sub-functions called by set_cpus_allowed.
This silences compiler warnings.
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
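
A minimal user-space sketch of the pattern this commit describes: the by-value entry point is kept, but it simply forwards to a pointer-taking variant, so callers holding a large mask can avoid copying it onto the stack. All `demo_` names, the mask size, and the empty-mask check are assumptions made for illustration; this is not the kernel implementation.

```c
#include <stddef.h>

#define DEMO_NR_CPUS 4096 /* hypothetical; large masks make by-value passing costly */
#define DEMO_BITS_PER_LONG (8 * sizeof(unsigned long))
#define DEMO_MASK_LONGS ((DEMO_NR_CPUS + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)

typedef struct { unsigned long bits[DEMO_MASK_LONGS]; } demo_cpumask_t;

struct demo_task { demo_cpumask_t cpus_allowed; };

/* Pointer variant: only an address crosses the call boundary. */
static int demo_set_cpus_allowed_ptr(struct demo_task *p,
                                     const demo_cpumask_t *new_mask)
{
    size_t i;
    int any = 0;

    for (i = 0; i < DEMO_MASK_LONGS; i++)
        any |= (new_mask->bits[i] != 0);
    if (!any)
        return -1; /* stand-in for real validation: reject an empty mask */

    p->cpus_allowed = *new_mask;
    return 0;
}

/* By-value variant kept for existing callers; it forwards to the pointer one. */
static int demo_set_cpus_allowed(struct demo_task *p, demo_cpumask_t new_mask)
{
    return demo_set_cpus_allowed_ptr(p, &new_mask);
}
```

Because the `const` qualifier sits on the pointer parameter, any helper the function calls must also accept a `const` mask, which matches the message's note about migrating `const` into sub-functions.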

init: move setup of nr_cpu_ids to as early as possible · e0982e90

Committed by Mike Travis on Mar 26, 2008

Move the setting of nr_cpu_ids from sched_init() to start_kernel()
so that it's available as early as possible.

Note that an arch has the option of setting it even earlier if need be,
but it should not result in a different value than the setup_nr_cpu_ids()
function.
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: remove another cpumask_t variable from stack · 4bdbaad3

Committed by Mike Travis on Apr 15, 2008

    * Remove another cpumask_t variable from stack that was missed in the
      last kernel_sched_c updates.
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

cpumask: use new cpus_scnprintf function · 39106dcf

Committed by Mike Travis on Apr 08, 2008

  * Cleaned up references to cpumask_scnprintf() and added new
    cpulist_scnprintf() interfaces where appropriate.

  * Fix some small bugs (or code efficiency improvements) for various uses
    of cpumask_scnprintf.

  * Clean up some checkpatch errors.
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

cpumask: reduce stack usage in SD_x_INIT initializers · 7c16ec58

Committed by Mike Travis on Apr 04, 2008

  * Remove empty cpumask_t (and all non-zero/non-null) variables
    in SD_*_INIT macros.  Use memset(0) to clear.  Also, don't
    inline the initializer functions to save on stack space in
    build_sched_domains().

  * Merge change to include/linux/topology.h that uses the new
    node_to_cpumask_ptr function in the nr_cpus_node macro into
    this patch.

Depends on:
	[mm-patch]: asm-generic-add-node_to_cpumask_ptr-macro.patch
	[sched-devel]: sched: add new set_cpus_allowed_ptr function

Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

nodemask: use new node_to_cpumask_ptr function · c5f59f08

Committed by Mike Travis on Apr 04, 2008

  * Use new node_to_cpumask_ptr.  This creates a pointer to the
    cpumask for a given node.  This definition is in mm patch:

	asm-generic-add-node_to_cpumask_ptr-macro.patch

  * Use new set_cpus_allowed_ptr function.

Depends on:
	[mm-patch]: asm-generic-add-node_to_cpumask_ptr-macro.patch
	[sched-devel]: sched: add new set_cpus_allowed_ptr function
	[x86/latest]: x86: add cpus_scnprintf function

Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Greg Banks <gnb@melbourne.sgi.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

generic: reduce stack pressure in sched_affinity · b53e921b

Committed by Mike Travis on Apr 04, 2008

  * Modify sched_affinity functions to pass cpumask_t variables by reference
    instead of by value.

  * Use new set_cpus_allowed_ptr function.

Depends on:
	[sched-devel]: sched: add new set_cpus_allowed_ptr function

Cc: Paul Jackson <pj@sgi.com>
Cc: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

cpuset: modify cpuset_set_cpus_allowed to use cpumask pointer · f9a86fcb

Committed by Mike Travis on Apr 04, 2008

  * Modify cpuset_cpus_allowed to return the currently allowed cpuset
    via a pointer argument instead of as the function return value.

  * Use new set_cpus_allowed_ptr function.

  * Cleanup CPU_MASK_ALL and NODE_MASK_ALL uses.

Depends on:
	[sched-devel]: sched: add new set_cpus_allowed_ptr function
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

generic: use new set_cpus_allowed_ptr function · f70316da

Committed by Mike Travis on Apr 04, 2008

  * Use new set_cpus_allowed_ptr() function added by previous patch,
    which passes the "newly allowed cpus" cpumask_t arg by pointer
    instead of by value:

    -int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask)
    +int set_cpus_allowed_ptr(struct task_struct *p, const cpumask_t *new_mask)

  * Modify CPU_MASK_ALL

Depends on:
	[sched-devel]: sched: add new set_cpus_allowed_ptr function
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: remove fixed NR_CPUS sized arrays in kernel_sched_c · 434d53b0

Committed by Mike Travis on Apr 04, 2008

 * Change fixed size arrays to per_cpu variables or dynamically allocated
   arrays in sched_init() and sched_init_smp().

     (1) static struct sched_entity *init_sched_entity_p[NR_CPUS];
     (1) static struct cfs_rq *init_cfs_rq_p[NR_CPUS];
     (1) static struct sched_rt_entity *init_sched_rt_entity_p[NR_CPUS];
     (1) static struct rt_rq *init_rt_rq_p[NR_CPUS];
	 static struct sched_group **sched_group_nodes_bycpu[NR_CPUS];

     (1) - these arrays are allocated via alloc_bootmem_low()

 * Change sched_domain_debug_one() to use cpulist_scnprintf instead of
   cpumask_scnprintf.  This reduces the output buffer required and improves
   readability when large NR_CPU count machines arrive.

 * In sched_create_group() we allocate new arrays based on nr_cpu_ids.
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

cpumask: Cleanup more uses of CPU_MASK and NODE_MASK · d366f8cb

Committed by Mike Travis on Apr 04, 2008

 *  Replace usages of CPU_MASK_NONE, CPU_MASK_ALL, NODE_MASK_NONE,
    NODE_MASK_ALL to reduce stack requirements for large NR_CPUS
    and MAXNODES counts.

 *  In some cases, the cpumask variable was initialized but then overwritten
    with another value.  This is the case for changes like this:

    -       cpumask_t oldmask = CPU_MASK_ALL;
    +       cpumask_t oldmask;
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: fix cpus_allowed settings · 9f0e738f

Committed by Gregory Haskins on Feb 12, 2008

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: allow cpuacct stats to be reset · 0297b803

Committed by Dhaval Giani on Feb 29, 2008

Currently the schedstats implementation does not allow the statistics
to be reset. This patch aims to allow that.

  echo 0 > cpuacct.usage

resets the usage. Any other value is not allowed and returns -EINVAL.
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
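
The reset semantics described above (only `0` is accepted, anything else yields `-EINVAL`) could be sketched as a write handler like the one below. The function name and the flat per-CPU array are assumptions for illustration and do not mirror the actual cpuacct code.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical write handler for a usage counter file.
 * Writing 0 clears every per-CPU counter; any other value is rejected
 * and leaves the counters untouched.
 */
static int demo_cpuusage_write(unsigned long long *percpu_usage,
                               size_t ncpus, unsigned long long val)
{
    size_t i;

    if (val != 0)
        return -EINVAL;

    for (i = 0; i < ncpus; i++)
        percpu_usage[i] = 0;
    return 0;
}
```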

sched: cleanup cpuacct variable names · 32cd756a

Committed by Dhaval Giani on Feb 29, 2008

Change the variable names to the common convention for the cpuacct
subsystem.
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

tasklets: execute tasklets in the same order they were queued · 48f20a9a

Committed by Olof Johansson on Mar 04, 2008

I noticed this when looking at an openswan issue.  Openswan (ab?)uses the
tasklet API to defer processing of packets in some situations, with one
packet per tasklet_action().  I started noticing sequences of
backwards-ordered sequence numbers coming over the wire, since new tasklets
are always queued at the head of the list but processed sequentially.

Convert it to instead append new entries to the tail of the list.  As an
extra bonus, the splicing code in takeover_tasklets() no longer has to
iterate over the list.
Signed-off-by: Olof Johansson <olof@lixom.net>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
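
The tail-append idea can be sketched with a singly linked list that tracks a pointer to its last `next` field. Appending then preserves queueing order, and splicing one list onto another needs no traversal. The `demo_` types and helpers below are hypothetical and only illustrate the technique; they are not the kernel's tasklet code.

```c
#include <stddef.h>

struct demo_tasklet {
    struct demo_tasklet *next;
    int id;
};

struct demo_tasklet_head {
    struct demo_tasklet *head;
    struct demo_tasklet **tail; /* address of the last 'next' pointer */
};

static void demo_init(struct demo_tasklet_head *h)
{
    h->head = NULL;
    h->tail = &h->head;
}

/* Append at the tail so entries run in the order they were queued. */
static void demo_schedule(struct demo_tasklet_head *h, struct demo_tasklet *t)
{
    t->next = NULL;
    *h->tail = t;
    h->tail = &t->next;
}

/* Splice src onto the end of dst in O(1); src is left empty. */
static void demo_splice(struct demo_tasklet_head *dst,
                        struct demo_tasklet_head *src)
{
    if (src->head == NULL)
        return;
    *dst->tail = src->head;
    dst->tail = src->tail;
    demo_init(src);
}
```

With head insertion, by contrast, a list processed front to back would run the newest entry first, which is the reordering the message reports.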

sched: rt-group: smp balancing · ac086bc2

Committed by Peter Zijlstra on Apr 19, 2008

Currently the rt group scheduling does a per cpu runtime limit, however
the rt load balancer makes no guarantees about an equal spread of real-
time tasks, just that at any one time, the highest priority tasks run.

Solve this by making the runtime limit a global property by borrowing
excessive runtime from the other cpus once the local limit runs out.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
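
One way to picture "borrowing" runtime is the sketch below: when a CPU exhausts its local budget, it takes a fraction of each other CPU's unused budget, never exceeding the period. The struct, the function name, and the choice to take `spare / ncpus` from each donor are assumptions made for illustration; the commit message does not specify the exact policy.

```c
#include <stddef.h>

struct demo_rt_rq {
    long long rt_runtime; /* budget available in the current period */
    long long rt_time;    /* budget already consumed */
};

/*
 * Move spare runtime from other CPUs to rqs[self].
 * The total runtime across all CPUs is preserved, which is what makes
 * the limit behave like a global property.
 */
static void demo_borrow_runtime(struct demo_rt_rq *rqs, size_t ncpus,
                                size_t self, long long rt_period)
{
    struct demo_rt_rq *local = &rqs[self];
    size_t i;

    for (i = 0; i < ncpus; i++) {
        struct demo_rt_rq *iter = &rqs[i];
        long long diff;

        if (i == self)
            continue;

        diff = iter->rt_runtime - iter->rt_time; /* donor's unused budget */
        if (diff <= 0)
            continue;

        diff /= (long long)ncpus; /* take only a share of the spare */
        if (local->rt_runtime + diff > rt_period)
            diff = rt_period - local->rt_runtime; /* never exceed the period */

        iter->rt_runtime -= diff;
        local->rt_runtime += diff;
        if (local->rt_runtime == rt_period)
            break;
    }
}
```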

sched: rt-group: synchonised bandwidth period · d0b27fa7

Committed by Peter Zijlstra on Apr 19, 2008

Various SMP balancing algorithms require that the bandwidth period
run in sync.

Possible improvements are moving the rt_bandwidth thing into root_domain
and keeping a span per rt_bandwidth which marks throttled cpus.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: fix regression with sched yield · 79b3feff

Committed by Peter Zijlstra on Feb 18, 2008

Balbir Singh reported:

> 1:mon> t
> [c0000000e7677da0] c000000000067de0 .sys_sched_yield+0x6c/0xbc
> [c0000000e7677e30] c000000000008748 syscall_exit+0x0/0x40
> --- Exception: c01 (System Call) at 00000400001d09e4
> SP (4000664cb10) is in userspace
> 1:mon> r
> cpu 0x1: Vector: 300 (Data Access) at [c0000000e7677aa0]
>     pc: c000000000068e50: .yield_task_fair+0x94/0xc4
>     lr: c000000000067de0: .sys_sched_yield+0x6c/0xbc

the check that should have avoided that is:

        /*
         * Are we the only task in the tree?
         */
        if (unlikely(rq->load.weight == curr->se.load.weight))
                return;

But I guess that overlooks rt tasks, they also increase the load.
So I guess something like this ought to fix it..
Signed-off-by: Ingo Molnar <mingo@elte.hu>
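
The problem with the weight comparison can be reproduced in a small user-space model. The sketch assumes that the runqueue's total load weight includes RT tasks while a separate counter tracks only fair-class tasks; the struct, field names, and the counting alternative are illustrative assumptions, since the message only says "something like this" without showing the fix.

```c
struct demo_rq {
    unsigned long load_weight;   /* summed weight of all runnable tasks, RT included */
    unsigned int cfs_nr_running; /* number of fair-class tasks only */
};

/* Check quoted in the message: compares total weight with the current task's. */
static int demo_alone_by_weight(const struct demo_rq *rq,
                                unsigned long curr_weight)
{
    return rq->load_weight == curr_weight;
}

/* Alternative that is not affected by RT load: count fair-class tasks. */
static int demo_alone_by_count(const struct demo_rq *rq)
{
    return rq->cfs_nr_running == 1;
}
```

With one fair task of weight 1024 and one RT task contributing an extra 2048, the weight check reports that the fair task is not alone in the tree, while the count check still recognizes it as the only fair task.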

latencytop: optimize LT_BACKTRACEDEPTH loops a bit · 19fb518c

Committed by Dmitry Adamushko on Feb 17, 2008

There is no need to loop any longer when 'same == 0'.
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: remove sysctl_sched_batch_wakeup_granularity · 50df5d6a

Committed by Ingo Molnar on Mar 14, 2008

it's unused.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: reenable sync wakeups · 02e2b83b

Committed by Ingo Molnar on Mar 19, 2008

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: cache hot buddy · d25ce4cd

Committed by Ingo Molnar on Mar 17, 2008

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: feat affine wakeups · 1fc8afa4

Committed by Ingo Molnar on Mar 19, 2008

Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: introduce SCHED_FEAT_SYNC_WAKEUPS, turn it off · b85d0667

Committed by Ingo Molnar on Mar 16, 2008

turn off sync wakeups by default. They are not needed anymore - the
buddy logic should be smart enough to keep the system from
overscheduling.
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: fix wakeup granularity for buddies · 0bbd3336

Committed by Peter Zijlstra on Apr 19, 2008

The wakeup buddy logic didn't use the same wakeup granularity logic as the
wakeup preemption did. This might cause the ->next buddy to be selected past
the point where we would have preempted had the task been a single running
instance.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: fix rq->clock overflows detection with CONFIG_NO_HZ · 15934a37

Committed by Guillaume Chazarain on Apr 19, 2008

When using CONFIG_NO_HZ, rq->tick_timestamp is not updated every TICK_NSEC.
We check that the number of skipped ticks matches the clock jump seen in
__update_rq_clock().
Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

openanolis / cloud-kernel · synced successfully about 1 year ago