Commits · cb5ef42a03a13f95a9ea94e6cda4f7a47497871f · openeuler / raspberrypi-kernel

27 Jun, 2008 23 commits

sched: optimize effective_load() · cb5ef42a

Authored by Peter Zijlstra on Jun 27, 2008

s_i = S * rw_i / \Sum_j rw_j

 -> \Sum_j rw_j = S * rw_i / s_i

 -> s'_i = S * (rw_i + w) / (\Sum_j rw_j + w)

delta s = s' - s = S * (rw + w) / ((S * rw / s) + w) - s
        = s * (S * (rw + w) / (S * rw + s * w) - 1)

 a = S*(rw+w), b = S*rw + s*w

delta s = s * (a-b) / b

IOW, trade one divide for two multiplies

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
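
The final form of the derivation above can be sketched in plain C. This is an illustrative userspace model, not the kernel's effective_load(); the function name `share_delta`, the 64-bit signed types, and the handling of a zero denominator are assumptions made for the example.

```c
#include <stdint.h>

/*
 * Illustrative model only (hypothetical helper, not kernel code).
 *
 *   S  - total group weight (shares)
 *   s  - current share of this CPU, assumed non-zero
 *   rw - current runqueue weight of this CPU
 *   w  - weight being added (negative to remove weight)
 *
 * Returns delta s = s * (a - b) / b with
 *   a = S * (rw + w) and b = S * rw + s * w.
 */
static int64_t share_delta(int64_t S, int64_t s, int64_t rw, int64_t w)
{
	int64_t a = S * (rw + w);
	int64_t b = S * rw + s * w;

	if (b == 0)
		return 0;	/* assumption: report "no change" for a degenerate denominator */

	return s * (a - b) / b;
}
```

With S = 1024, s = 512, rw = 1024 and w = 1024 (a second CPU of equal weight holds the other half of the shares), the integer result is 170, which matches 1024 * 2048 / 3072 - 512 after truncation.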

sched: remove prio preference from balance decisions · 051c6764

Authored by Peter Zijlstra on Jun 27, 2008

Priority loses much of its meaning in a hierarchical context. So don't
use it in balance decisions.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: fix task_h_load() · 4be9daaa

Authored by Peter Zijlstra on Jun 27, 2008

Currently task_h_load() computes the load of a task and uses that to either
subtract it from the total, or add to it.

However, removing or adding a task need not have any effect on the total load
at all. Imagine adding a task to a group that is local to one cpu - in that
case the total load of that cpu is unaffected.

So properly compute addition/removal:

 s_i = S * rw_i / \Sum_j rw_j
 s'_i = S * (rw_i + wl) / (\Sum_j rw_j + wg)

then s'_i - s_i gives the change in load.

Where s_i is the shares for cpu i, S the group weight, rw_i the runqueue weight
for that cpu, wl the weight we add (subtract) and wg the weight contribution to
the runqueue.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
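
A small model of the two formulas in this message shows why adding a task need not change a CPU's load. It is a sketch under assumptions: `share_of` and `load_change` are hypothetical names, the arithmetic uses truncating integer division, and an empty group is treated as having a share of 0.

```c
#include <stdint.h>

/* s_i = S * rw_i / \Sum_j rw_j (hypothetical helper; an empty group yields 0) */
static int64_t share_of(int64_t S, int64_t rw_i, int64_t rw_sum)
{
	return rw_sum ? S * rw_i / rw_sum : 0;
}

/*
 * s'_i - s_i, where wl is the weight added to (or removed from) cpu i and
 * wg is the matching change in the group's total runqueue weight.
 */
static int64_t load_change(int64_t S, int64_t rw_i, int64_t rw_sum,
			   int64_t wl, int64_t wg)
{
	int64_t before = share_of(S, rw_i, rw_sum);
	int64_t after = share_of(S, rw_i + wl, rw_sum + wg);

	return after - before;
}
```

When the group is local to one cpu (rw_i equals the sum), the cpu already holds the full share S, so load_change() returns 0 regardless of the weight added. When the group is spread over two equally loaded cpus, the same addition shifts part of the share toward cpu i.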

sched: fix load scaling in group balancing · 42a3ac7d

Authored by Peter Zijlstra on Jun 27, 2008

Doing the load balance will change cfs_rq->load.weight (that's the whole point),
but since that's part of the scale factor, we'll scale back by a different
amount.

Weight getting smaller would result in an inflated moved_load, which causes
balancing to stop too soon.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
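
The effect described here can be reproduced with a toy model. The helpers below are hypothetical (they are not the kernel's load_balance_fair() code), and the `+ 1` in each divisor is an assumption made only to avoid dividing by zero. The point is that the weight used to scale back should be the value sampled before tasks were moved.

```c
#include <stdint.h>

/* Scale an amount of hierarchical load down to group-local weight. */
static uint64_t to_group_local(uint64_t h_amount, uint64_t weight,
			       uint64_t h_load)
{
	return h_amount * weight / (h_load + 1);
}

/* Scale an amount of group-local weight back up to hierarchical load. */
static uint64_t to_hierarchical(uint64_t local_amount, uint64_t weight,
				uint64_t h_load)
{
	return local_amount * h_load / (weight + 1);
}
```

With h_load = 512, a group weight of 4096 before balancing and 1024 units of local weight moved, scaling back with the snapshot weight (4096) gives 127, while scaling back with the post-move weight (3072) gives 170. The second figure overstates how much load was moved.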

sched: hierarchical load vs find_busiest_group · 408ed066

Authored by Peter Zijlstra on Jun 27, 2008

find_busiest_group() has some assumptions about task weight being in the
NICE_0_LOAD range. Hierarchical task groups break this assumption - fix this
by replacing it with the average task weight, which adapts to the situation.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: hierarchical load vs affine wakeups · bb3469ac

Authored by Peter Zijlstra on Jun 27, 2008

With hierarchical grouping we can't just compare task weight to rq weight - we
need to scale the weight appropriately.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: persistent average load per task · a8a51d5e

Authored by Peter Zijlstra on Jun 27, 2008

Remove the fall-back to SCHED_LOAD_SCALE by remembering the previous value of
cpu_avg_load_per_task() - this is useful because of the hierarchical group
model in which task weight can be much smaller.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
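
The idea of remembering the previous value can be sketched as follows. This is a userspace model, not the kernel function; `struct rq_model` and its field names are assumptions chosen for the example.

```c
/* Hypothetical stand-in for the parts of a runqueue this example needs. */
struct rq_model {
	unsigned long load_weight;		/* total weight queued */
	unsigned long nr_running;		/* number of runnable tasks */
	unsigned long avg_load_per_task;	/* last value computed */
};

/*
 * Recompute the average only while tasks are queued. On an empty runqueue
 * the last computed value is returned instead of a fixed fall-back constant.
 */
static unsigned long avg_load_per_task(struct rq_model *rq)
{
	if (rq->nr_running)
		rq->avg_load_per_task = rq->load_weight / rq->nr_running;

	return rq->avg_load_per_task;
}
```

A runqueue that held three tasks with a total weight of 300 keeps reporting 100 after it drains, rather than jumping to a constant that may be far larger than typical task weights in a group hierarchy.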

sched: fix sched_balance_self() smp group balancing · 039a1c41

Authored by Peter Zijlstra on Jun 27, 2008

Finding the least idle cpu is more accurate when done with updated shares.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: fix newidle smp group balancing · 3e5459b4

Authored by Peter Zijlstra on Jun 27, 2008

Re-compute the shares on newidle - so we can make a decision based on
recent data.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: simplify the group load balancer · c8cba857

Authored by Peter Zijlstra on Jun 27, 2008

While thinking about the previous patch - I realized that using per domain
aggregate load values in load_balance_fair() is wrong. We should use the
load value for that CPU.

By not needing per domain hierarchical load values we don't need to store
per domain aggregate shares, which greatly simplifies all the math.

It basically falls apart into two separate computations:
 - per domain update of the shares
 - per CPU update of the hierarchical load

Also get rid of the move_group_shares() stuff - just re-compute the shares
again after a successful load balance.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: no need to aggregate task_weight · a25b5aca

Authored by Peter Zijlstra on Jun 27, 2008

We only need to know the task_weight of the busiest rq - nothing to do
if there are no tasks there.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: dont micro manage share losses · d3f40dba

Authored by Peter Zijlstra on Jun 27, 2008

We used to try and contain the loss of 'shares' by playing arithmetic
games. Replace that by noticing that at the top sched_domain we'll
always have the full weight in shares to distribute.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: kill task_group balancing · 53fecd8a

Authored by Srivatsa Vaddagiri on Jun 27, 2008

The idea was to balance groups until we've reached the global goal, however
Vatsa rightly pointed out that we might never reach that goal this way -
hence take out this logic.

[ the initial rationale for this 'feature' was to promote max concurrency
  within a group - it does not however affect fairness ]

Reported-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: update aggregate when holding the RQs · 4d8d595d

Authored by Peter Zijlstra on Jun 27, 2008

It was observed that in __update_group_shares_cpu()

  rq_weight > aggregate()->rq_weight

This is caused by forks/wakeups in between the initial aggregate pass and
locking of the RQs for load balance. To avoid this situation partially re-do
the aggregation once we have the RQs locked (which prevents new tasks from
appearing).

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: fix sched_domain aggregation · b6a86c74

Authored by Peter Zijlstra on Jun 27, 2008

Keeping the aggregate on the first cpu of the sched domain has two problems:
 - it could collide between different sched domains on different cpus
 - it could slow things down because of the remote accesses

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: add full schedstats to /proc/sched_debug · 32df2ee8

Authored by Peter Zijlstra on Jun 27, 2008

Show all the schedstats in /proc/sched_debug as well.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: fix wakeup granularity and buddy granularity · 103638d9

Authored by Peter Zijlstra on Jun 27, 2008

Uncouple buddy selection from wakeup granularity.

The initial idea was that buddies could run ahead as far as a normal task
can - do this by measuring a pair 'slice' just as we do for a normal task.

This means we can drop the wakeup_granularity back to 5ms.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: sched_clock_cpu() based cpu_clock() · 76a2a6ee

Authored by Peter Zijlstra on Jun 27, 2008

With sched_clock_cpu() being reasonably in sync between cpus (max 1 jiffy
difference) use this to provide cpu_clock().

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: revert revert of: fair-group: SMP-nice for group scheduling · c09595f6

Authored by Peter Zijlstra on Jun 27, 2008

Try again..

Initial commit: 18d95a28
Revert: 6363ca57

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: fix calc_delta_asym, · ced8aa16

Authored by Peter Zijlstra on Jun 27, 2008

Ok, so why are we in this mess? It was:

  1/w

but now we mixed that rw in the mix like:

 rw/w

rw being \Sum w suggests: fiddling w, we should also fiddle rw, humm?

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: fix calc_delta_asym() · c9c294a6

Authored by Peter Zijlstra on Jun 27, 2008

calc_delta_asym() is supposed to do the same as calc_delta_fair() except
linearly shrink the result for negative nice processes - this causes them
to have a smaller preemption threshold so that they are more easily preempted.

The problem is that for task groups se->load.weight is the per cpu share of
the actual task group weight; take that into account.

Also provide a debug switch to disable the asymmetry (which I still don't
like - but it does greatly benefit some workloads)

This would explain the interactivity issues reported against group scheduling.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: revert the revert of: weight calculations · a7be37ac

Authored by Peter Zijlstra on Jun 27, 2008

Try again..

Initial commit: 8f1bc385
Revert: f9305d4a

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

sched: clean up some unused variables · bf647b62

Authored by Peter Zijlstra on Jun 27, 2008

In file included from /mnt/build/linux-2.6/kernel/sched.c:1496:
/mnt/build/linux-2.6/kernel/sched_rt.c: In function '__enable_runtime':
/mnt/build/linux-2.6/kernel/sched_rt.c:339: warning: unused variable 'rd'
/mnt/build/linux-2.6/kernel/sched_rt.c: In function 'requeue_rt_entity':
/mnt/build/linux-2.6/kernel/sched_rt.c:692: warning: unused variable 'queue'

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

25 Jun, 2008 17 commits

Merge branch 'linus' into sched/devel · f57aec5a

Authored by Ingo Molnar on Jun 25, 2008

Conflicts:

	kernel/sched_rt.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>

Linux 2.6.26-rc8 · 543cf4cb

Authored by Linus Torvalds on Jun 24, 2008

Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6 · bd8c540f

Authored by Linus Torvalds on Jun 24, 2008

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] Eliminate NULL test after alloc_bootmem in iosapic_alloc_rte()
  [IA64] Handle count==0 in sn2_ptc_proc_write()
  [IA64] Fix boot failure on ia64/sn2

Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes · 035cfc61

Authored by Linus Torvalds on Jun 24, 2008

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-fixes:
  [GFS2] fix gfs2 block allocation (cleaned up)
  [GFS2] BUG: unable to handle kernel paging request at ffff81002690e000

Merge branch 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm · 919c0d14

Authored by Linus Torvalds on Jun 24, 2008

* 'kvm-updates-2.6.26' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm:
  KVM: Remove now unused structs from kvm_para.h
  x86: KVM guest: Use the paravirt clocksource structs and functions
  KVM: Make kvm host use the paravirt clocksource structs
  x86: Make xen use the paravirt clocksource structs and functions
  x86: Add structs and functions for paravirt clocksource
  KVM: VMX: Fix host msr corruption with preemption enabled
  KVM: ioapic: fix lost interrupt when changing a device's irq
  KVM: MMU: Fix oops on guest userspace access to guest pagetable
  KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend)
  KVM: MMU: Fix rmap_write_protect() hugepage iteration bug
  KVM: close timer injection race window in __vcpu_run
  KVM: Fix race between timer migration and vcpu migration


Merge git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog · de08341a

Authored by Linus Torvalds on Jun 24, 2008

* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
  Revert "[WATCHDOG] hpwdt: Add CFLAGS to get driver working"

Merge branch 'x86-fixes-for-linus' of... · 9bf8a943

Authored by Linus Torvalds on Jun 24, 2008

Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  xen: remove support for non-PAE 32-bit

Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb · 3b968b7c

Authored by Linus Torvalds on Jun 24, 2008

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
  kgdb: sparse fix
  kgdb: documentation update - remove kgdboe

enable bus mastering on i915 at resume time · ea7b44c8

Authored by Jie Luo on Jun 24, 2008

On 9xx chips, bus mastering needs to be enabled at resume time for much of the
chip to function. With this patch, vblank interrupts will work as expected
on resume, along with other chip functions. Fixes kernel bugzilla #10844.

Signed-off-by: Jie Luo <clotho67@gmail.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

KVM: Remove now unused structs from kvm_para.h · 6b1ed908

Authored by Gerd Hoffmann on Jun 03, 2008

The kvm_* structs are obsoleted by the pvclock_* ones.
Now all users have been switched over and the old structs
can be dropped.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

x86: KVM guest: Use the paravirt clocksource structs and functions · f6e16d5a

Authored by Gerd Hoffmann on Jun 03, 2008

This patch updates the kvm guest code to use the pvclock structs
and functions, thereby making it compatible with Xen.

The patch also fixes an initialization bug: on SMP systems the
per-cpu area has two different locations, one early at boot and one
after CPU bringup.  kvmclock must take that into account when
registering the physical address with the host.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

KVM: Make kvm host use the paravirt clocksource structs · 50d0a0f9

Authored by Gerd Hoffmann on Jun 03, 2008

This patch updates the kvm host code to use the pvclock structs.
It also makes the paravirt clock compatible with Xen.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

x86: Make xen use the paravirt clocksource structs and functions · 1c7b67f7

Authored by Gerd Hoffmann on Jun 03, 2008

This patch updates the xen guest to use the pvclock structs
and helper functions.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

x86: Add structs and functions for paravirt clocksource · 7af192c9

Authored by Gerd Hoffmann on Jun 03, 2008

This patch adds structs for the paravirt clocksource ABI
used by both xen and kvm (pvclock-abi.h).

It also adds some helper functions to read system time and
wall clock time from a paravirtual clocksource (pvclock.[ch]).
They are based on the xen code.  They are enabled using
CONFIG_PARAVIRT_CLOCK.

Subsequent patches of this series will put the code into use.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>

[GFS2] fix gfs2 block allocation (cleaned up) · 5af4e7a0

Authored by Benjamin Marzinski on Jun 24, 2008

This patch fixes bz 450641.

This patch changes the computation for zero_metapath_length(), which it
renames to metapath_branch_start(). When you are extending the metadata
tree, the indirect blocks that point to the new data block must diverge
from the existing tree either at the inode or at the first indirect
block. They can diverge at the first indirect block because the inode
has room for 483 pointers while the indirect blocks have room for 509
pointers, so when the tree is grown, there is some free space in the
first indirect block. What metapath_branch_start() now computes is the
height where the first indirect block for the new data block is located.
It can either be 1 (if the indirect block diverges from the inode) or 2
(if it diverges from the first indirect block).

Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>

[IA64] Eliminate NULL test after alloc_bootmem in iosapic_alloc_rte() · e2569b7e

Authored by Julia Lawall on Jun 24, 2008

As noted by Akinobu Mita, alloc_bootmem and related functions never return
NULL and always return a zeroed region of memory.  Thus a NULL test or
memset after calls to these functions is unnecessary.

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Tony Luck <tony.luck@intel.com>

[IA64] Handle count==0 in sn2_ptc_proc_write() · 8097110d

Authored by Cliff Wickman on Jun 24, 2008

The fix applied in e0c6d97c
"security hole in sn2_ptc_proc_write" didn't take into account
the case where count==0 (which results in a buffer underrun
when adding the trailing '\0').  Thanks to Andi Kleen for
pointing this out.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
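
The class of bug described here can be illustrated in userspace C. This is a sketch, not the sn2 driver code: `copy_option`, `OPT_BUF_LEN` and the -1 error return are hypothetical, and memcpy() stands in for whatever copy the real handler performs.

```c
#include <stddef.h>
#include <string.h>

#define OPT_BUF_LEN 64	/* assumed buffer size for the example */

/*
 * Copy count bytes into dst (which must hold OPT_BUF_LEN bytes) and replace
 * the last copied byte with a terminating '\0'.
 * Returns 0 on success, -1 if count is out of range.
 */
static int copy_option(char *dst, const char *src, size_t count)
{
	/* Without the count == 0 check, dst[count - 1] would write before dst. */
	if (count == 0 || count > OPT_BUF_LEN)
		return -1;

	memcpy(dst, src, count);
	dst[count - 1] = '\0';

	return 0;
}
```

Checking only the upper bound is not enough: an empty write passes that check and then underruns the buffer when the terminator is stored.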