1. 12 Jun 2009 (1 commit)
2. 19 May 2009 (1 commit)
3. 15 May 2009 (2 commits)
• sched, timers: cleanup avenrun users · 2d02494f
Committed by Thomas Gleixner
avenrun is a rough estimate, so we don't have to worry about the
consistency of the three avenrun values. Remove the xtime_lock
dependency and provide a function to scale the values. Clean up
the users.
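
For context, avenrun values are fixed-point; a minimal userspace-style sketch of scaling them for display (FSHIFT/FIXED_1 as in the kernel headers of that era; the sample values are made up):

 #include <stdio.h>

 /* avenrun values carry 11 fractional bits (FSHIFT), as in the kernel */
 #define FSHIFT   11
 #define FIXED_1  (1 << FSHIFT)
 #define LOAD_INT(x)  ((unsigned long)(x) >> FSHIFT)
 #define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1 - 1)) * 100)

 int main(void)
 {
         unsigned long avnrun[3] = { 860, 2048, 4096 }; /* made-up samples */

         /* prints "0.41 1.00 2.00" (the fraction truncates) */
         printf("%lu.%02lu %lu.%02lu %lu.%02lu\n",
                LOAD_INT(avnrun[0]), LOAD_FRAC(avnrun[0]),
                LOAD_INT(avnrun[1]), LOAD_FRAC(avnrun[1]),
                LOAD_INT(avnrun[2]), LOAD_FRAC(avnrun[2]));
         return 0;
 }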
      
      [ Impact: cleanup ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
• sched, timers: move calc_load() to scheduler · dce48a84
Committed by Thomas Gleixner
Dimitri Sivanich noticed that xtime_lock is held write-locked across
calc_load(), which iterates over all online CPUs. That can cause long
latencies for xtime_lock readers on large SMP systems.

The load average calculation is a rough estimate anyway, so there is
no real need to protect the readers against the update. It's not a
problem when the avenrun array is updated while a reader copies the
values.

Instead of iterating over all online CPUs, let the scheduler_tick code
update the number of active tasks shortly before the avenrun update
happens. The avenrun update itself is handled by the CPU which calls
do_timer().
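
For context, the avenrun update itself is a fixed-point exponential decay; the core helper looks roughly like this after the change:

 /*
  * a1 = a0 * e + a * (1 - e)
  * where 'exp' is a fixed-point decay factor (EXP_1, EXP_5, EXP_15)
  * and 'active' is the sampled runnable-task count, pre-scaled by FIXED_1
  */
 static unsigned long
 calc_load(unsigned long load, unsigned long exp, unsigned long active)
 {
         load *= exp;
         load += active * (FIXED_1 - exp);
         return load >> FSHIFT;
 }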
      
      [ Impact: reduce xtime_lock write locked section ]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
4. 07 May 2009 (1 commit)
• sched: emit thread info flags with stack trace · aa47b7e0
Committed by David Rientjes
When a thread is oom killed and fails to exit, it's helpful to know which
threads have access to memory reserves if the machine livelocks. Such
threads are identified by testing the TIF_MEMDIE thread info flag, which
should be displayed alongside stack traces to pick out tasks that have
access to such reserves but are still stuck allocating pages, for
instance.
      
      It would probably be helpful in other cases as well, so all thread info
      flags are emitted when showing a task.
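
Roughly, the patch makes sched_show_task() append the raw thread_info flag word to the existing per-task line (the exact format may differ from the final code):

 /* the trailing 0x%08lx is the new thread_info flags field; a task
  * with TIF_MEMDIE set here has access to memory reserves */
 printk(KERN_CONT "%5lu %5d %6d 0x%08lx\n", free,
        task_pid_nr(p), task_pid_nr(p->real_parent),
        (unsigned long)task_thread_info(p)->flags);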
      
      ( v2: fix warning reported by Stephen Rothwell )
      
      [ Impact: extend debug printout info ]
Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      LKML-Reference: <alpine.DEB.2.00.0905040136390.15831@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
5. 06 May 2009 (2 commits)
6. 29 Apr 2009 (2 commits)
• sched: account system time properly · f5f293a4
Committed by Eric Dumazet
Andrew Gallatin reported that IRQ and SOFTIRQ times were
sometimes not reported correctly on recent kernels, and even
bisected the problem to commit 457533a7
([PATCH] fix scaled & unscaled cputime accounting) as the first
bad commit.

Further analysis showed that commit
79741dd3 ([PATCH] idle cputime
accounting) was the real cause of the problem.

account_process_tick() was not taking into account the timer IRQ
interrupting the idle task while it was servicing a hard or soft irq.

On a mostly idle cpu, irqs were thus not accounted, and top or
mpstat could tell the user/admin that the cpu was 100% idle,
0.00% irq, 0.00% softirq, while it was not.
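
The fix, roughly as it landed in kernel/sched.c: a tick that interrupts the idle task while irq_count() shows hard/soft irq nesting is charged as system time instead of idle time:

 void account_process_tick(struct task_struct *p, int user_tick)
 {
         cputime_t one_jiffy = jiffies_to_cputime(1);
         cputime_t one_jiffy_scaled = cputime_to_scaled(one_jiffy);
         struct rq *rq = this_rq();

         if (user_tick)
                 account_user_time(p, one_jiffy, one_jiffy_scaled);
         else if ((p != rq->idle) || (irq_count() != HARDIRQ_OFFSET))
                 /* the idle task servicing an irq counts as system time */
                 account_system_time(p, HARDIRQ_OFFSET, one_jiffy,
                                     one_jiffy_scaled);
         else
                 account_idle_time(one_jiffy);
 }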
      
      [ Impact: fix occasionally incorrect CPU statistics in top/mpstat ]
Reported-by: Andrew Gallatin <gallatin@myri.com>
Re-reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: rick.jones2@hp.com
      Cc: brice@myri.com
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      LKML-Reference: <49F84BC1.7080602@cosmosbay.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
• sched: Document memory barriers implied by sleep/wake-up primitives · 50fa610a
Committed by David Howells
      Add a section to the memory barriers document to note the implied
      memory barriers of sleep primitives (set_current_state() and wrappers)
      and wake-up primitives (wake_up() and co.).
      
      Also extend the in-code comments on the wake_up() functions to note
      these implied barriers.
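
The canonical pairing the new document section describes, as a minimal sketch (event_pending and wq are illustrative names):

 /* sleeper: set_current_state() implies a full memory barrier after
  * setting the task state, so the condition test below cannot be
  * reordered before it */
 for (;;) {
         set_current_state(TASK_INTERRUPTIBLE);
         if (event_pending)      /* illustrative condition flag */
                 break;
         schedule();
 }
 __set_current_state(TASK_RUNNING);

 /* waker: store the condition before waking; wake_up() only implies
  * a barrier if it actually wakes somebody up */
 event_pending = 1;
 wake_up(&wq);                   /* illustrative waitqueue */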
      
      [ Impact: add documentation ]
Signed-off-by: David Howells <dhowells@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <20090428140138.1192.94723.stgit@warthog.procyon.org.uk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
7. 21 Apr 2009 (2 commits)
8. 17 Apr 2009 (1 commit)
9. 15 Apr 2009 (3 commits)
• sched: use group_first_cpu() instead of cpumask_first(sched_group_cpus()) · 13318a71
Committed by Miao Xie
      Impact: cleanup
      
      This patch changes cpumask_first(sched_group_cpus()) to group_first_cpu()
      for maintainability.
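
The helper being switched to is a thin wrapper; roughly:

 /* return the first cpu in the cpumask of a sched_group */
 static inline unsigned int group_first_cpu(struct sched_group *group)
 {
         return cpumask_first(sched_group_cpus(group));
 }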
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
• tracing/events: move trace point headers into include/trace/events · ad8d75ff
Committed by Steven Rostedt
      Impact: clean up
      
Create a subdirectory in include/trace called 'events' to keep the
trace point headers in their own separate directory. Only headers that
declare trace points should live in this directory.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Zhao Lei <zhaolei@cn.fujitsu.com>
      Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
• tracing: create automated trace defines · a8d154b0
Committed by Steven Rostedt
This patch lowers the number of places a developer must modify to add
new tracepoints. The current method of adding a new tracepoint
to an existing system is to write the trace point macro in the
trace header with one of the macros TRACE_EVENT, TRACE_FORMAT or
DECLARE_TRACE, then add an identically named item to the C file
with the macro DEFINE_TRACE(name), and then add the trace point
call itself.

This change cuts out the need to add the DEFINE_TRACE(name).
Every file that uses the tracepoint must still include the trace/<type>.h
file, but the one C file must also add a define before including
that file.
      
       #define CREATE_TRACE_POINTS
       #include <trace/mytrace.h>
      
      This will cause the trace/mytrace.h file to also produce the C code
      necessary to implement the trace point.
      
      Note, if more than one trace/<type>.h is used to create the C code
      it is best to list them all together.
      
       #define CREATE_TRACE_POINTS
       #include <trace/foo.h>
       #include <trace/bar.h>
       #include <trace/fido.h>
      
Thanks to Mathieu Desnoyers and Christoph Hellwig for coming up with
the cleaner solution of placing the define above the includes, instead
of my first design of having the C code include a "special" header.

This patch converts sched, irq, lockdep and skb to use this new
method.
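
For reference, a trace header written for this scheme looks roughly like the following (the mytrace system, event name and field are hypothetical):

 #undef TRACE_SYSTEM
 #define TRACE_SYSTEM mytrace

 #if !defined(_TRACE_MYTRACE_H) || defined(TRACE_HEADER_MULTI_READ)
 #define _TRACE_MYTRACE_H

 #include <linux/tracepoint.h>

 TRACE_EVENT(mytrace_event,

         TP_PROTO(int value),

         TP_ARGS(value),

         TP_STRUCT__entry(
                 __field(int, value)
         ),

         TP_fast_assign(
                 __entry->value = value;
         ),

         TP_printk("value=%d", __entry->value)
 );

 #endif /* _TRACE_MYTRACE_H */

 /* expands to the tracepoint definitions when CREATE_TRACE_POINTS is set */
 #include <trace/define_trace.h>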
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Zhao Lei <zhaolei@cn.fujitsu.com>
      Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
10. 14 Apr 2009 (4 commits)
• wait: don't use __wake_up_common() · 78ddb08f
Committed by Johannes Weiner
      '777c6c5f wait: prevent exclusive waiter starvation' made
      __wake_up_common() global to be used from abort_exclusive_wait().
      
      It was needed to do a wake-up with the waitqueue lock held while
      passing down a key to the wake-up function.
      
      Since '4ede816a epoll keyed wakeups: add __wake_up_locked_key() and
      __wake_up_sync_key()' there is an appropriate wrapper for this case:
      __wake_up_locked_key().
      
      Use it here and make __wake_up_common() private to the scheduler
      again.
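
The wrapper in question is thin; roughly, from kernel/sched.c:

 /* wake up threads on a waitqueue whose lock the caller already holds */
 void __wake_up_locked_key(wait_queue_head_t *q, unsigned int mode, void *key)
 {
         __wake_up_common(q, mode, 1, 0, key);
 }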
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1239720785-19661-1-git-send-email-hannes@cmpxchg.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
• sched: Nominate a power-efficient ilb in select_nohz_balancer() · e790fb0b
Committed by Gautham R Shenoy
The CPU that first goes idle becomes the idle-load-balancer and remains
in that role until either it picks up a task or all the CPUs of the
system go idle.

Optimize this further by allowing it to relinquish its post
once all its siblings in the power-aware sched_domain go idle, thereby
allowing the whole package/core to go idle. While relinquishing the post,
nominate another idle-load balancer from a semi-idle core/package.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20090414045535.7645.31641.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
• sched: Nominate idle load balancer from a semi-idle package. · f711f609
Committed by Gautham R Shenoy
Currently the nomination of the idle-load balancer is done by choosing
the first idle cpu in the nohz.cpu_mask. This may not be power-efficient,
since such an idle cpu could come from a completely idle core/package,
thereby preventing the whole core/package from being in a low-power state.

For example, consider a quad-core dual-package system. The cpu numbering
need not be sequential and can be something like [0, 2, 4, 6] and
[1, 3, 5, 7]. With sched_mc/smt_power_savings and the power-aware IRQ
balance, we try to keep as few packages/cores active as possible. But the
current idle load balancer logic goes against this by choosing the
first_cpu in the nohz.cpu_mask without taking the system topology into
consideration.

Improve the algorithm to nominate the idle load balancer from a semi-idle
core/package, thereby increasing the probability of the cores/packages
staying in deeper sleep states for longer durations.

The algorithm is activated only when sched_mc/smt_power_savings != 0.
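
Roughly, the nomination logic from the patch (is_semi_idle_group() checks that a group has both idle and non-idle CPUs):

 static int find_new_ilb(int cpu)
 {
         struct sched_domain *sd;
         struct sched_group *ilb_group;

         /* pick from semi-idle packages only when power-aware
          * load balancing is enabled */
         if (!(sched_smt_power_savings || sched_mc_power_savings))
                 goto out_done;

         for_each_flag_domain(cpu, sd, SD_POWERSAVINGS_BALANCE) {
                 ilb_group = sd->groups;
                 do {
                         if (is_semi_idle_group(ilb_group))
                                 return cpumask_first(nohz.ilb_grp_nohz_mask);
                         ilb_group = ilb_group->next;
                 } while (ilb_group != sd->groups);
         }

 out_done:
         /* fall back to the old behaviour: the first idle cpu */
         return cpumask_first(nohz.cpu_mask);
 }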
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20090414045530.7645.12175.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
• tracing, sched: mark get_parent_ip() notrace · 132380a0
Committed by Lai Jiangshan
      Impact: remove overly redundant tracing entries
      
When the tracer is "function" or "function_graph", far too many
"get_parent_ip" entries are recorded in the ring buffer.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <srostedt@redhat.com>
      LKML-Reference: <49D458B1.5000703@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
11. 07 Apr 2009 (1 commit)
12. 01 Apr 2009 (5 commits)
• epoll keyed wakeups: add __wake_up_locked_key() and __wake_up_sync_key() · 4ede816a
Committed by Davide Libenzi
      This patchset introduces wakeup hints for some of the most popular (from
      epoll POV) devices, so that epoll code can avoid spurious wakeups on its
      waiters.
      
The problem with epoll is that the callback-based wakeups do not, at
the moment, carry any information about the events the wakeup is related
to. So the only choice epoll has (not being able to call f_op->poll()
from inside the callback) is to add the file* to a ready-list and resolve
the real events later on, at epoll_wait() (or its own f_op->poll()) time.
This can cause spurious wakeups, since the wake_up() itself might be for
an event the caller is not interested in.

The rate of these spurious wakeups can be pretty high when many
network sockets are being monitored.

By allowing devices to report the events the wakeups refer to (at least
the two major classes - POLLIN/POLLOUT), we are able to spare useless
wakeups by proper handling inside epoll's poll callback.

Epoll will in any case have to call f_op->poll() on the file* later on,
since the change needed to have the full event set sent via wakeup is
too invasive for the way our f_op->poll() system works (the full event
set is calculated inside the poll function; there are too many of them
to even start thinking about such a change, and poll/select would need
changing too).

Epoll is changed so that both devices which send event hints and those
that don't are handled correctly. The former gain some efficiency,
though.

As a general rule, devices should pass an event mask, using the
key-aware wakeup macros, when waking up their poll wait queues; a sketch
follows below. I tested it (together with the epoll poll fix patch
Andrew has in -mm), and wakeups for the supported devices are correctly
filtered.
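
What that looks like on the device side, as a sketch (dev and its read_wait queue are hypothetical; the key-aware *_poll() wakeup macros come from this patch series):

 /* data arrived: wake poll waiters, hinting this is a read-side event */
 wake_up_interruptible_poll(&dev->read_wait, POLLIN | POLLRDNORM);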
      
      Test program available here:
      
      http://www.xmailserver.org/epoll_test.c
      
      This patch:
      
Nothing revolutionary here. Just using the available "key" that our
wakeup core already supports. The __wake_up_locked_key() was a
no-brainer, since both __wake_up_locked() and __wake_up_locked_key() are
thin wrappers around __wake_up_common().

The __wake_up_sync() function had a body, so the choice was between
borrowing the body for __wake_up_sync_key() and calling it from
__wake_up_sync(), or making it inline and calling it from both. I chose
the former since on most archs it all resolves to "mov $0, REG; jmp ADDR".
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Miller <davem@davemloft.net>
      Cc: William Lee Irwin III <wli@movementarian.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
• sched: Print sched_group::__cpu_power in sched_domain_debug · 46e0bb9c
Committed by Gautham R Shenoy
Impact: extend debug info in /proc/sched_debug
      
      If the user changes the value of the sched_mc/smt_power_savings sysfs
      tunable, it'll trigger a rebuilding of the whole sched_domain tree,
      with the SD_POWERSAVINGS_BALANCE flag set at certain levels.
      
      As a result, there would be a change in the __cpu_power of sched_groups
      in the sched_domain hierarchy.
      
      Print the __cpu_power values for each sched_group in sched_domain_debug
      to help verify this change and correlate it with the change in the
      load-balancing behavior.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20090330045520.2869.24777.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
• cpuacct: add per-cgroup utime/stime statistics · ef12fefa
Committed by Bharata B Rao
      Add per-cgroup cpuacct controller statistics like the system and user
      time consumed by the group of tasks.
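
The counter indices the patch introduces, roughly (the values are exported in clock_t units through a cpuacct.stat cgroup file):

 /* accounting statistics tracked per cpuacct cgroup */
 enum cpuacct_stat_index {
         CPUACCT_STAT_USER,      /* ... user mode */
         CPUACCT_STAT_SYSTEM,    /* ... kernel mode */

         CPUACCT_STAT_NSTATS,
 };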
      
      Changelog:
      
      v7
- Changed the name of the statistic from utime to user and from stime to
  system so that in the future we can easily add other statistics like
  irq, softirq, steal times etc.
      
      v6
      - Fixed a bug in the error path of cpuacct_create() (pointed by Li Zefan).
      
      v5
      - In cpuacct_stats_show(), use cputime64_to_clock_t() since we are
        operating on a 64bit variable here.
      
      v4
      - Remove comments in cpuacct_update_stats() which explained why rcu_read_lock()
        was needed (as per Peter Zijlstra's review comments).
      - Don't say that percpu_counter_read() is broken in Documentation/cpuacct.txt
        as per KAMEZAWA Hiroyuki's review comments.
      
      v3
      - Fix a small race in the cpuacct hierarchy walk.
      
      v2
      - stime and utime now exported in clock_t units instead of msecs.
      - Addressed the code review comments from Balbir and Li Zefan.
      - Moved to -tip tree.
      
      v1
      - Moved the stime/utime accounting to cpuacct controller.
      
      Earlier versions
- http://lkml.org/lkml/2009/2/25/129
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Balaji Rao <balajirrao@gmail.com>
      Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      LKML-Reference: <20090331043222.GA4093@in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
• posixtimers, sched: Fix posix clock monotonicity · c5f8d995
Committed by Hidetoshi Seto
Impact: regression fix (clock_gettime() jumping backwards)

This patch re-introduces a couple of functions, task_sched_runtime
and thread_group_sched_runtime, which were removed at the
time of 2.6.28-rc1.

These functions protect the sampling of the thread/process clock with
the rq lock. Holding the rq lock is required so that rq->clock is not
updated during the sampling.

Otherwise, clock_gettime() may return
 ((accounted runtime before update) + (delta after update)),
which is less than what it should be.
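
A sketch of the re-introduced helper, close to the patch (do_task_delta_exec() returns the not-yet-accounted runtime of a currently running task):

 /* return accumulated runtime, sampled consistently under rq->lock */
 unsigned long long task_sched_runtime(struct task_struct *p)
 {
         unsigned long flags;
         struct rq *rq;
         u64 ns;

         rq = task_rq_lock(p, &flags);   /* also pins rq->clock */
         ns = p->se.sum_exec_runtime + do_task_delta_exec(p, rq);
         task_rq_unlock(rq, &flags);

         return ns;
 }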
      
      v2 -> v3:
      	- Rename static helper function __task_delta_exec()
      	  to do_task_delta_exec() since -tip tree already has
      	  a __task_delta_exec() of different version.
      
      v1 -> v2:
      	- Revises comments of function and patch description.
      	- Add note about accuracy of thread group's runtime.
Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stable@kernel.org	[2.6.28.x][2.6.29.x]
      LKML-Reference: <49D1CC93.4080401@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
• cpuacct: make cpuacct hierarchy walk in cpuacct_charge() safe when rcupreempt is used -v2 · a18b83b7
Committed by Bharata B Rao
      Impact: fix cgroups race under rcu-preempt
      
cpuacct_charge() obtains the task's ca and walks the hierarchy upwards.
This can race with the task's movement between cgroups. The race can
cause an access to a freed ca pointer in cpuacct_charge(), or an access
to an invalid cgroups pointer of the task. This does not happen with
classic or tree RCU, since cpuacct_charge() is called with preemption
disabled; with preemptible RCU, however, the race is seen. Thanks to
Li Zefan for explaining this.
      
      Fix this race by explicitly protecting ca and the hierarchy walk with
      rcu_read_lock().
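
The fixed charge path, roughly (task_ca() maps a task to its cpuacct group):

 /* charge 'cputime' to the task's cpuacct group and all its ancestors */
 static void cpuacct_charge(struct task_struct *tsk, u64 cputime)
 {
         struct cpuacct *ca;
         int cpu = task_cpu(tsk);

         rcu_read_lock();        /* keeps ca and its ancestors valid
                                  * even under preemptible RCU */
         for (ca = task_ca(tsk); ca; ca = ca->parent) {
                 u64 *cpuusage = percpu_ptr(ca->cpuusage, cpu);
                 *cpuusage += cputime;
         }
         rcu_read_unlock();
 }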
      
      Changes for v2:
      
- Update patch description (as per Li Zefan's review comments).
      
       - Remove comments in cpuacct_charge() which explained why rcu_read_lock()
         was needed (as per Peter Zijlstra's review comments).
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
13. 31 Mar 2009 (1 commit)
• hrtimer: fix rq->lock inversion (again) · 7f1e2ca9
Committed by Peter Zijlstra
It appears I inadvertently introduced rq->lock recursion in the
hrtimer_start() path when I delegated running already expired
timers to softirq context.

This patch fixes it by introducing a __hrtimer_start_range_ns()
method that will not use raise_softirq_irqoff() but
__raise_softirq_irqoff(), which avoids the wakeup.

It then also changes schedule() to check for pending softirqs and
do the wakeup then; I'm not quite sure I like this last bit, nor
am I convinced it's really needed.
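
The variant being switched to only marks the softirq pending and never wakes ksoftirqd, so it is safe under rq->lock; roughly, as in include/linux/interrupt.h of that time:

 /* just set the pending bit; unlike raise_softirq_irqoff() this never
  * wakes ksoftirqd, so it cannot recurse on rq->lock */
 #define __raise_softirq_irqoff(nr) \
         do { or_softirq_pending(1UL << (nr)); } while (0)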
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org
      LKML-Reference: <20090313112301.096138802@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
14. 30 Mar 2009 (3 commits)
15. 29 Mar 2009 (1 commit)
16. 25 Mar 2009 (10 commits)
• sched: Add comments to find_busiest_group() function · b7bb4c9b
Committed by Gautham R Shenoy
      Impact: cleanup
      
      Add /** style comments around find_busiest_group(). Also add a few
      explanatory comments.
      
      This concludes the find_busiest_group() cleanup. The function is
      now down to 72 lines from the original 313 lines.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
      LKML-Reference: <20090325091427.13992.18933.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
• sched: Refactor the power savings balance code · c071df18
Committed by Gautham R Shenoy
      Impact: cleanup
      
Create separate helper functions to initialize the
power-savings-balance related variables, to update them, and
to check whether there is scope for performing power-savings balance.

Add no-op inline functions for the !(CONFIG_SCHED_MC || CONFIG_SCHED_SMT)
case, as sketched below.

This eliminates all of the #ifdef jungle in find_busiest_group() and the
other helper functions.
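
The usual no-op-stub idiom this refers to, sketched with one of the patch's helpers (body elided):

 #if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
 static inline void init_sd_power_savings_stats(struct sched_domain *sd,
                 struct sd_lb_stats *sds, enum cpu_idle_type idle)
 {
         /* set up the power-savings-balance bookkeeping ... */
 }
 #else
 /* no-op stub: compiles away when power-aware balancing is disabled */
 static inline void init_sd_power_savings_stats(struct sched_domain *sd,
                 struct sd_lb_stats *sds, enum cpu_idle_type idle)
 {
 }
 #endif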
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
      LKML-Reference: <20090325091422.13992.73616.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
• sched: Optimize the !power_savings_balance during fbg() · a021dc03
Committed by Gautham R Shenoy
      Impact: cleanup, micro-optimization
      
We don't need to perform power_savings balance if either the
cpu is NOT_IDLE or if the sched_domain doesn't have the
SD_POWERSAVINGS_BALANCE flag set.

Currently, we check these conditions multiple times, even though
the variables involved don't change over the scope of
find_busiest_group().

Check once, and store the value in the already existing
"power_savings_balance" variable.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
      LKML-Reference: <20090325091417.13992.2657.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
• sched: Create a helper function to calculate imbalance · dbc523a3
Committed by Gautham R Shenoy
Move all of the imbalance calculation out of find_busiest_group()
into this helper function.

With this change, the structure of find_busiest_group() is as
follows (see the sketch after this list):

- update_sched_domain_statistics.

- check if an imbalance exists.

- update the imbalance and return the busiest group.
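
A rough skeleton of the resulting function (helper names follow this patch series; details elided):

 static struct sched_group *
 find_busiest_group(struct sched_domain *sd, int this_cpu,
                    unsigned long *imbalance, enum cpu_idle_type idle,
                    int *sd_idle, const struct cpumask *cpus, int *balance)
 {
         struct sd_lb_stats sds;

         memset(&sds, 0, sizeof(sds));

         /* 1. gather sched_domain and per-group statistics */
         update_sd_lb_stats(sd, this_cpu, idle, sd_idle, cpus,
                            balance, &sds);

         /* 2. bail out if no group is imbalanced enough */
         if (!sds.busiest || sds.this_load >= sds.max_load)
                 goto out_balanced;

         /* 3. compute how much load to move, return the busiest group */
         calculate_imbalance(&sds, this_cpu, imbalance);
         return sds.busiest;

 out_balanced:
         *imbalance = 0;
         return NULL;
 }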
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
      LKML-Reference: <20090325091411.13992.43293.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
• sched: Create helper to calculate small_imbalance in fbg() · 2e6f44ae
Committed by Gautham R Shenoy
      Impact: cleanup
      
We have two places in find_busiest_group() where we need to calculate
the minor imbalance before returning the busiest group. Encapsulate
this functionality in a separate helper function.
      
      Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      LKML-Reference: <20090325091406.13992.54316.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
• sched: Create a helper function to calculate sched_domain stats for fbg() · 37abe198
Committed by Gautham R Shenoy
      Impact: cleanup
      
      Create a helper function named update_sd_lb_stats() to update the
      various sched_domain related statistics in find_busiest_group().
      
With this, all of the statistics computation is moved out of
find_busiest_group().
      
      Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      LKML-Reference: <20090325091401.13992.88737.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
• sched: Define structure to store the sched_domain statistics for fbg() · 222d656d
Committed by Gautham R Shenoy
      Impact: cleanup
      
      Currently we use a lot of local variables in find_busiest_group()
      to capture the various statistics related to the sched_domain.
      Group them together into a single data structure.
      
      This will help us to offload the job of updating the sched_domain
      statistics to a helper function.
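
A sketch of the structure, with fields abridged from the patch:

 /* sched_domain-wide statistics gathered while finding the busiest group */
 struct sd_lb_stats {
         struct sched_group *busiest;    /* busiest group in this sd */
         struct sched_group *this;       /* group containing this_cpu */
         unsigned long total_load;       /* total load of all groups in sd */
         unsigned long total_pwr;        /* total power of all groups in sd */
         unsigned long avg_load;         /* average load across the sd */

         /* statistics of this group */
         unsigned long this_load;
         unsigned long this_load_per_task;
         unsigned long this_nr_running;

         /* statistics of the busiest group */
         unsigned long max_load;
         unsigned long busiest_load_per_task;
         unsigned long busiest_nr_running;
 };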
      
      Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      LKML-Reference: <20090325091356.13992.25970.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
• sched: Create a helper function to calculate sched_group stats for fbg() · 1f8c553d
Committed by Gautham R Shenoy
      Impact: cleanup
      
      Create a helper function named update_sg_lb_stats() which
      can be invoked to calculate the individual group's statistics
      in find_busiest_group().
      
This reduces the length of find_busiest_group() considerably.
      
      Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      LKML-Reference: <20090325091351.13992.43461.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
• sched: Define structure to store the sched_group statistics for fbg() · 381be78f
Committed by Gautham R Shenoy
      Impact: cleanup
      
      Currently a whole bunch of variables are used to store the
      various statistics pertaining to the groups we iterate over
      in find_busiest_group().
      
      Group them together in a single data structure and add
      appropriate comments.
      
      This will be useful later on when we create helper functions
      to calculate the sched_group statistics.
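
A sketch of the per-group structure, with fields as in the patch:

 /* statistics of a sched_group, gathered during load balancing */
 struct sg_lb_stats {
         unsigned long avg_load;          /* load per weighted cpu */
         unsigned long group_load;        /* total load of the group */
         unsigned long sum_nr_running;    /* nr of tasks in the group */
         unsigned long sum_weighted_load; /* weighted load of the tasks */
         unsigned long group_capacity;    /* nr of tasks it can handle */
         int group_imb;                   /* group has an imbalance? */
 };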
      
      Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      LKML-Reference: <20090325091345.13992.20099.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
• sched: Fix indentations in find_busiest_group() using gotos · 6dfdb062
Committed by Gautham R Shenoy
      Impact: cleanup
      
Some indentation in find_busiest_group() can be reduced by using
early exits with the help of gotos. This improves readability in
a couple of cases.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Balbir Singh" <balbir@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
      Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
      LKML-Reference: <20090325091340.13992.45062.stgit@sofia.in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>