提交 · 833bb3046b6cb320e775ea2160ddca87d53260d5 · openeuler / raspberrypi-kernel

01 4月, 2009 4 次提交

epoll keyed wakeups: add __wake_up_locked_key() and __wake_up_sync_key() · 4ede816a

由 Davide Libenzi 提交于 3月 31, 2009

This patchset introduces wakeup hints for some of the most popular (from
epoll POV) devices, so that epoll code can avoid spurious wakeups on its
waiters.

The problem with epoll is that the callback-based wakeups do not, ATM,
carry any information about the events the wakeup is related to.  So the
only choice epoll has (not being able to call f_op->poll() from inside the
callback), is to add the file* to a ready-list and resolve the real events
later on, at epoll_wait() (or its own f_op->poll()) time.  This can cause
spurious wakeups, since the wake_up() itself might be for an event the
caller is not interested into.

The rate of these spurious wakeup can be pretty high in case of many
network sockets being monitored.

By allowing devices to report the events the wakeups refer to (at least
the two major classes - POLLIN/POLLOUT), we are able to spare useless
wakeups by proper handling inside the epoll's poll callback.

Epoll will have in any case to call f_op->poll() on the file* later on,
since the change to be done in order to have the full event set sent via
wakeup, is too invasive for the way our f_op->poll() system works (the
full event set is calculated inside the poll function - there are too many
of them to even start thinking the change - also poll/select would need
change too).

Epoll is changed in a way that both devices which send event hints, and
the ones that don't, are correctly handled.  The former will gain some
efficiency though.

As a general rule for devices, would be to add an event mask by using
key-aware wakeup macros, when making up poll wait queues.  I tested it
(together with the epoll's poll fix patch Andrew has in -mm) and wakeups
for the supported devices are correctly filtered.

Test program available here:

http://www.xmailserver.org/epoll_test.c

This patch:

Nothing revolutionary here.  Just using the available "key" that our
wakeup core already support.  The __wake_up_locked_key() was no brainer,
since both __wake_up_locked() and __wake_up_locked_key() are thin wrappers
around __wake_up_common().

The __wake_up_sync() function had a body, so the choice was between
borrowing the body for __wake_up_sync_key() and calling it from
__wake_up_sync(), or make an inline and calling it from both.  I chose the
former since in most archs it all resolves to "mov $0, REG; jmp ADDR".
Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: William Lee Irwin III <wli@movementarian.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4ede816a

pm: rework includes, remove arch ifdefs · a8af7898

由 Magnus Damm 提交于 3月 31, 2009

Make the following header file changes:

 - remove arch ifdefs and asm/suspend.h from linux/suspend.h
 - add asm/suspend.h to disk.c (for arch_prepare_suspend())
 - add linux/io.h to swsusp.c (for ioremap())
 - x86 32/64 bit compile fixes
Signed-off-by: NMagnus Damm <damm@igel.co.jp>
Cc: Paul Mundt <lethal@linux-sh.org>
Acked-by: N"Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a8af7898

mm: fix proc_dointvec_userhz_jiffies "breakage" · 704503d8

由 Alexey Dobriyan 提交于 3月 31, 2009

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=9838

On i386, HZ=1000, jiffies_to_clock_t() converts time in a somewhat strange
way from the user's point of view:

	# echo 500 >/proc/sys/vm/dirty_writeback_centisecs
	# cat /proc/sys/vm/dirty_writeback_centisecs
	499

So, we have 5000 jiffies converted to only 499 clock ticks and reported
back.

TICK_NSEC = 999848
ACTHZ = 256039

Keeping in-kernel variable in units passed from userspace will fix issue
of course, but this probably won't be right for every sysctl.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

704503d8

mm: introduce for_each_populated_zone() macro · ee99c71c

由 KOSAKI Motohiro 提交于 3月 31, 2009

Impact: cleanup

In almost cases, for_each_zone() is used with populated_zone().  It's
because almost function doesn't need memoryless node information.
Therefore, for_each_populated_zone() can help to make code simplify.

This patch has no functional change.

[akpm@linux-foundation.org: small cleanup]
Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: NJohannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ee99c71c

31 3月, 2009 6 次提交

lockdep: fix deadlock in lockdep_trace_alloc · 2f850181

由 Peter Zijlstra 提交于 3月 20, 2009

Heiko reported that we grab the graph lock with irqs enabled.

Fix this by providng the same wrapper as all other lockdep entry
functions have.
Reported-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Nick Piggin <npiggin@suse.de>
LKML-Reference: <1237544000.24626.52.camel@twins>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

2f850181

kexec: Change kexec jump code ordering · 749b0afc

由 Rafael J. Wysocki 提交于 3月 16, 2009

Change the ordering of the kexec jump code so that the nonboot CPUs
are disabled after calling device drivers' "late suspend" methods.

This change reflects the recent modifications of the power management
code that is also used by kexec jump.
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
Acked-by: NIngo Molnar <mingo@elte.hu>

749b0afc

PM: Change hibernation code ordering · 4aecd671

由 Rafael J. Wysocki 提交于 3月 16, 2009

Change the ordering of the hibernation core code so that the platform
"prepare" callbacks are executed and the nonboot CPUs are disabled
after calling device drivers' "late suspend" methods.

This change (along with the previous analogous change of the suspend
core code) will allow us to rework the PCI PM core so that the power
state of devices is changed in the "late" phase of suspend (and
analogously in the "early" phase of resume), which in turn will allow
us to avoid the race condition where a device using shared interrupts
is put into a low power state with interrupts enabled and then an
interrupt (for another device) comes in and confuses its driver.
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
Acked-by: NIngo Molnar <mingo@elte.hu>

4aecd671

PM: Change suspend code ordering · 900af0d9

由 Rafael J. Wysocki 提交于 3月 16, 2009

Change the ordering of the suspend core code so that the platform
"prepare" callback is executed and the nonboot CPUs are disabled
after calling device drivers' "late suspend" methods.

This change will allow us to rework the PCI PM core so that the power
state of devices is changed in the "late" phase of suspend (and
analogously in the "early" phase of resume), which in turn will allow
us to avoid the race condition where a device using shared interrupts
is put into a low power state with interrupts enabled and then an
interrupt (for another device) comes in and confuses its driver.
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
Acked-by: NIngo Molnar <mingo@elte.hu>

900af0d9

PM: Rework handling of interrupts during suspend-resume · 2ed8d2b3

由 Rafael J. Wysocki 提交于 3月 16, 2009

Use the functions introduced in by the previous patch,
suspend_device_irqs(), resume_device_irqs() and check_wakeup_irqs(),
to rework the handling of interrupts during suspend (hibernation) and
resume.  Namely, interrupts will only be disabled on the CPU right
before suspending sysdevs, while device drivers will be prevented
from receiving interrupts, with the help of the new helper function,
before their "late" suspend callbacks run (and analogously during
resume).

In addition, since the device interrups are now disabled before the
CPU has turned all interrupts off and the CPU will ACK the interrupts
setting the IRQ_PENDING bit for them, check in sysdev_suspend() if
any wake-up interrupts are pending and abort suspend if that's the
case.
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
Acked-by: NIngo Molnar <mingo@elte.hu>

2ed8d2b3

PM: Introduce functions for suspending and resuming device interrupts · 0a0c5168

由 Rafael J. Wysocki 提交于 3月 16, 2009

Introduce helper functions allowing us to prevent device drivers from
getting any interrupts (without disabling interrupts on the CPU)
during suspend (or hibernation) and to make them start to receive
interrupts again during the subsequent resume. These functions make it
possible to keep timer interrupts enabled while the "late" suspend and
"early" resume callbacks provided by device drivers are being
executed. In turn, this allows device drivers' "late" suspend and
"early" resume callbacks to sleep, execute ACPI callbacks etc.

The functions introduced here will be used to rework the handling of
interrupts during suspend (hibernation) and resume. Namely,
interrupts will only be disabled on the CPU right before suspending
sysdevs, while device drivers will be prevented from receiving
interrupts, with the help of the new helper function, before their
"late" suspend callbacks run (and analogously during resume).
Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
Acked-by: NIngo Molnar <mingo@elte.hu>

0a0c5168

30 3月, 2009 7 次提交

Revert "cpuacct: reduce one NULL check in fast-path" · 5a3c8fe7

由 Ingo Molnar 提交于 3月 30, 2009

This reverts commit 7a46c594.

This was applied to the x86 tree mistakenly, it belongs into the
scheduler tree.
Signed-off-by: NIngo Molnar <mingo@elte.hu>

5a3c8fe7

cpumask: remove cpumask_t from core · 612a726f

由 Rusty Russell 提交于 3月 30, 2009

Impact: cleanup

struct cpumask is nicer, and we use it to make where we've made code
safe for CONFIG_CPUMASK_OFFSTACK=y.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

612a726f

cpumask: convert rcutorture.c · 73d0a4b1

由 Rusty Russell 提交于 3月 30, 2009

We're getting rid of cpumasks on the stack.

Simply change tmp_mask to a global, and allocate it in
rcu_torture_init().
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Acked-by: N"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Josh Triplett <josh@freedesktop.org>

73d0a4b1

cpumask: use new cpumask_ functions in core code. · aa85ea5b

由 Rusty Russell 提交于 3月 30, 2009

Impact: cleanup

Time to clean up remaining laggards using the old cpu_ functions.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Trond.Myklebust@netapp.com

aa85ea5b

cpumask: use mm_cpumask() wrapper: kernel/fork.c · 94894244

由 Rusty Russell 提交于 3月 30, 2009

Impact: futureproof

Makes code futureproof against the impending change to mm->cpu_vm_mask.

It's also a chance to use the new cpumask_ ops which take a pointer.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

94894244

cpumask: use set_cpu_active in init/main.c · 2b17fa50

由 Rusty Russell 提交于 3月 30, 2009

cpu_active_map is deprecated in favor of cpu_active_mask, which is
const for safety: we use accessors now (set_cpu_active) is we really
want to make a change.
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

2b17fa50

cpumask: remove dangerous CPU_MASK_ALL_PTR, &CPU_MASK_ALL · 1a2142af

由 Rusty Russell 提交于 3月 30, 2009

Impact: cleanup

(Thanks to Al Viro for reminding me of this, via Ingo)

CPU_MASK_ALL is the (deprecated) "all bits set" cpumask, defined as so:

	#define CPU_MASK_ALL (cpumask_t) { { ... } }

Taking the address of such a temporary is questionable at best,
unfortunately 321a8e9d (cpumask: add CPU_MASK_ALL_PTR macro) added
CPU_MASK_ALL_PTR:

	#define CPU_MASK_ALL_PTR (&CPU_MASK_ALL)

Which formalizes this practice.  One day gcc could bite us over this
usage (though we seem to have gotten away with it so far).

So replace everywhere which used &CPU_MASK_ALL or CPU_MASK_ALL_PTR
with the modern "cpu_all_mask" (a real const struct cpumask *).
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Acked-by: NIngo Molnar <mingo@elte.hu>
Reported-by: NAl Viro <viro@zeniv.linux.org.uk>
Cc: Mike Travis <travis@sgi.com>

1a2142af

29 3月, 2009 2 次提交

sched: fix errors in struct & function comments · d5ac537e

由 Randy Dunlap 提交于 3月 28, 2009

Fix kernel-doc errors in sched.c:  the structs don't have
kernel-doc notation and the short function description needs to
be one line only.

  Error(kernel/sched.c:3197): cannot understand prototype: 'struct sd_lb_stats '
  Error(kernel/sched.c:3228): cannot understand prototype: 'struct sg_lb_stats '
  Error(kernel/sched.c:3375): duplicate section name 'Description'
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d5ac537e

async: remove the temporary (2.6.29) "async is off by default" code · 97107943

由 Arjan van de Ven 提交于 3月 15, 2009

Now that everyone has been able to test the async code (and it's being used
in the Moblin betas by default), we can enable it by default.
The various fixes needed have gone into 2.6.29 already.

[With an important bugfix from Stefan Richter]
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>

97107943

28 3月, 2009 2 次提交

vfs: simple_set_mnt() should return void · a3ec947c

由 Sukadev Bhattiprolu 提交于 3月 04, 2009

simple_set_mnt() is defined as returning 'int' but always returns 0.
Callers assume simple_set_mnt() never fails and don't properly cleanup if
it were to _ever_ fail.  For instance, get_sb_single() and get_sb_nodev()
should:

        up_write(sb->s_unmount);
        deactivate_super(sb);

if simple_set_mnt() fails.

Since simple_set_mnt() never fails, would be cleaner if it did not
return anything.

[akpm@linux-foundation.org: fix build]
Signed-off-by: NSukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: NSerge Hallyn <serue@us.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

a3ec947c

A
constify dentry_operations: rest · 3ba13d17
由 Al Viro 提交于 2月 20, 2009
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
3ba13d17

25 3月, 2009 13 次提交

sched: Add comments to find_busiest_group() function · b7bb4c9b

由 Gautham R Shenoy 提交于 3月 25, 2009

Impact: cleanup

Add /** style comments around find_busiest_group(). Also add a few
explanatory comments.

This concludes the find_busiest_group() cleanup. The function is
now down to 72 lines from the original 313 lines.
Signed-off-by: NGautham R Shenoy <ego@in.ibm.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
LKML-Reference: <20090325091427.13992.18933.stgit@sofia.in.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

b7bb4c9b

sched: Refactor the power savings balance code · c071df18

由 Gautham R Shenoy 提交于 3月 25, 2009

Impact: cleanup

Create seperate helper functions to initialize the
power-savings-balance related variables, to update them and
to check if we have a scope for performing power-savings balance.

Add no-op inline functions for the !(CONFIG_SCHED_MC || CONFIG_SCHED_SMT)
case.

This will eliminate all the #ifdef jungle in find_busiest_group() and the
other helper functions.
Signed-off-by: NGautham R Shenoy <ego@in.ibm.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
LKML-Reference: <20090325091422.13992.73616.stgit@sofia.in.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

c071df18

sched: Optimize the !power_savings_balance during fbg() · a021dc03

由 Gautham R Shenoy 提交于 3月 25, 2009

Impact: cleanup, micro-optimization

We don't need to perform power_savings balance if either the
cpu is NOT_IDLE or if the sched_domain doesn't contain the
SD_POWERSAVINGS_BALANCE flag set.

Currently, we check for these conditions multiple number of
times, even though these variables don't change over the scope
of find_busiest_group().

Check once, and store the value in the already exiting
"power_savings_balance" variable.
Signed-off-by: NGautham R Shenoy <ego@in.ibm.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
LKML-Reference: <20090325091417.13992.2657.stgit@sofia.in.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

a021dc03

sched: Create a helper function to calculate imbalance · dbc523a3

由 Gautham R Shenoy 提交于 3月 25, 2009

Move all the imbalance calculation out of find_busiest_group()
through this helper function.

With this change, the structure of find_busiest_group() will be
as follows:

- update_sched_domain_statistics.

- check if imbalance exits.

- update imbalance and return busiest.
Signed-off-by: NGautham R Shenoy <ego@in.ibm.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
LKML-Reference: <20090325091411.13992.43293.stgit@sofia.in.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

dbc523a3

sched: Create helper to calculate small_imbalance in fbg() · 2e6f44ae

由 Gautham R Shenoy 提交于 3月 25, 2009

Impact: cleanup

We have two places in find_busiest_group() where we need to calculate
the minor imbalance before returning the busiest group. Encapsulate
this functionality into a seperate helper function.

Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: NGautham R Shenoy <ego@in.ibm.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
LKML-Reference: <20090325091406.13992.54316.stgit@sofia.in.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

2e6f44ae

sched: Create a helper function to calculate sched_domain stats for fbg() · 37abe198

由 Gautham R Shenoy 提交于 3月 25, 2009

Impact: cleanup

Create a helper function named update_sd_lb_stats() to update the
various sched_domain related statistics in find_busiest_group().

With this we would have moved all the statistics computation out of
find_busiest_group().

Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: NGautham R Shenoy <ego@in.ibm.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
LKML-Reference: <20090325091401.13992.88737.stgit@sofia.in.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

37abe198

sched: Define structure to store the sched_domain statistics for fbg() · 222d656d

由 Gautham R Shenoy 提交于 3月 25, 2009

Impact: cleanup

Currently we use a lot of local variables in find_busiest_group()
to capture the various statistics related to the sched_domain.
Group them together into a single data structure.

This will help us to offload the job of updating the sched_domain
statistics to a helper function.

Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: NGautham R Shenoy <ego@in.ibm.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
LKML-Reference: <20090325091356.13992.25970.stgit@sofia.in.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

222d656d

sched: Create a helper function to calculate sched_group stats for fbg() · 1f8c553d

由 Gautham R Shenoy 提交于 3月 25, 2009

Impact: cleanup

Create a helper function named update_sg_lb_stats() which
can be invoked to calculate the individual group's statistics
in find_busiest_group().

This reduces the lenght of find_busiest_group() considerably.

Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: NGautham R Shenoy <ego@in.ibm.com>
Aked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
LKML-Reference: <20090325091351.13992.43461.stgit@sofia.in.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

1f8c553d

sched: Define structure to store the sched_group statistics for fbg() · 381be78f

由 Gautham R Shenoy 提交于 3月 25, 2009

Impact: cleanup

Currently a whole bunch of variables are used to store the
various statistics pertaining to the groups we iterate over
in find_busiest_group().

Group them together in a single data structure and add
appropriate comments.

This will be useful later on when we create helper functions
to calculate the sched_group statistics.

Credit: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: NGautham R Shenoy <ego@in.ibm.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
LKML-Reference: <20090325091345.13992.20099.stgit@sofia.in.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

381be78f

sched: Fix indentations in find_busiest_group() using gotos · 6dfdb062

由 Gautham R Shenoy 提交于 3月 25, 2009

Impact: cleanup

Some indentations in find_busiest_group() can minimized by using
early exits with the help of gotos. This improves readability in
a couple of cases.
Signed-off-by: NGautham R Shenoy <ego@in.ibm.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
LKML-Reference: <20090325091340.13992.45062.stgit@sofia.in.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

6dfdb062

sched: Simple helper functions for find_busiest_group() · 67bb6c03

由 Gautham R Shenoy 提交于 3月 25, 2009

Impact: cleanup

Currently the load idx calculation code is in find_busiest_group().
Move that to a static inline helper function.

Similary, to find the first cpu of a sched_group we use
cpumask_first(sched_group_cpus(group))

Use a helper to that. It improves readability in some cases.
Signed-off-by: NGautham R Shenoy <ego@in.ibm.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "Balbir Singh" <balbir@in.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: "Dhaval Giani" <dhaval@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.vnet.ibm.com>
Cc: "Vaidyanathan Srinivasan" <svaidy@linux.vnet.ibm.com>
LKML-Reference: <20090325091335.13992.55424.stgit@sofia.in.ibm.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

67bb6c03

dynamic debug: combine dprintk and dynamic printk · e9d376f0

由 Jason Baron 提交于 2月 05, 2009

This patch combines Greg Bank's dprintk() work with the existing dynamic
printk patchset, we are now calling it 'dynamic debug'.

The new feature of this patchset is a richer /debugfs control file interface,
(an example output from my system is at the bottom), which allows fined grained
control over the the debug output. The output can be controlled by function,
file, module, format string, and line number.

for example, enabled all debug messages in module 'nf_conntrack':

echo -n 'module nf_conntrack +p' > /mnt/debugfs/dynamic_debug/control

to disable them:

echo -n 'module nf_conntrack -p' > /mnt/debugfs/dynamic_debug/control

A further explanation can be found in the documentation patch.
Signed-off-by: NGreg Banks <gnb@sgi.com>
Signed-off-by: NJason Baron <jbaron@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>

e9d376f0

sched: remove unused fields from struct rq · 67aa0f76

由 Luis Henriques 提交于 3月 24, 2009

Impact: cleanup, new schedstat ABI

Since they are used on in statistics and are always set to zero, the
following fields from struct rq have been removed: yld_exp_empty,
yld_act_empty and yld_both_empty.

Both Sched Debug and SCHEDSTAT_VERSION versions has also been
incremented since ABIs have been changed.

The schedtop tool has been updated to properly handle new version of
schedstat:

   http://rt.wiki.kernel.org/index.php/Schedtop_utilitySigned-off-by: NLuis Henriques <henrix@sapo.pt>
Acked-by: NGregory Haskins <ghaskins@novell.com>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090324221002.GA10061@hades.domain.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

67aa0f76

24 3月, 2009 2 次提交

posix timers: fix RLIMIT_CPU && fork() · 37bebc70

由 Oleg Nesterov 提交于 3月 23, 2009

See http://bugzilla.kernel.org/show_bug.cgi?id=12911

copy_signal() copies signal->rlim, but RLIMIT_CPU is "lost". Because
posix_cpu_timers_init_group() sets cputime_expires.prof_exp = 0 and thus
fastpath_timer_check() returns false unless we have other cpu timers.

This is the minimal fix for 2.6.29 (tested) and 2.6.28. The patch is not
optimal, we need further cleanups here. With this patch update_rlimit_cpu()
is not really needed, but I don't think it should be removed.

The proper fix (I think) is:

	- set_process_cpu_timer() should just start the cputimer->running
	  logic (it does), no need to change cputime_expires.xxx_exp

	- posix_cpu_timers_init_group() should set ->running when needed

	- fastpath_timer_check() can check ->running instead of
	  task_cputime_zero(signal->cputime_expires)
Reported-by: NPeter Lojkin <ia6432@inbox.ru>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: <stable@kernel.org> [for 2.6.29.x]
LKML-Reference: <20090323193411.GA17514@redhat.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

37bebc70

fix ptrace slowness · 53da1d94

由 Miklos Szeredi 提交于 3月 23, 2009

This patch fixes bug #12208:

  Bug-Entry       : http://bugzilla.kernel.org/show_bug.cgi?id=12208
  Subject         : uml is very slow on 2.6.28 host

This turned out to be not a scheduler regression, but an already
existing problem in ptrace being triggered by subtle scheduler
changes.

The problem is this:

 - task A is ptracing task B
 - task B stops on a trace event
 - task A is woken up and preempts task B
 - task A calls ptrace on task B, which does ptrace_check_attach()
 - this calls wait_task_inactive(), which sees that task B is still on the runq
 - task A goes to sleep for a jiffy
 - ...

Since UML does lots of the above sequences, those jiffies quickly add
up to make it slow as hell.

This patch solves this by not rescheduling in read_unlock() after
ptrace_stop() has woken up the tracer.

Thanks to Oleg Nesterov and Ingo Molnar for the feedback.
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
CC: stable@kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

53da1d94

18 3月, 2009 2 次提交

sched: jiffies not printed per CPU · af66df5e

由 Luis Henriques 提交于 3月 18, 2009

The jiffies value was being printed for each CPU, which does not seem to make
sense.  Moved jiffies to system section.
Signed-off-by: NLuis Henriques <henrix@sapo.pt>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
LKML-Reference: <20090318000425.GA2228@hades.domain.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

af66df5e

module: fix refptr allocation and release order · 6e2b7574

由 Masami Hiramatsu 提交于 3月 16, 2009

Impact: fix ref-after-free crash on failed module load

Fix refptr bug: Change refptr allocation and release order not to access a module
data structure pointed by 'mod' after freeing mod->module_core.
This bug will cause kernel panic(e.g. failed to find undefined symbols).

This bug was reported on systemtap bugzilla.
http://sources.redhat.com/bugzilla/show_bug.cgi?id=9927Signed-off-by: NMasami Hiramatsu <mhiramat@redhat.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>

6e2b7574

17 3月, 2009 2 次提交

sched: small optimisation of can_migrate_task() · 708dc512

由 Luis Henriques 提交于 3月 16, 2009

There were 3 invocations of task_hot() in can_migrate_task().

Replace these 3 invocations by only one invocation, cached in
a local variable.
Signed-off-by: NLuis Henriques <henrix@sapo.pt>
LKML-Reference: <20090316195902.GA6197@hades.domain.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

708dc512

sched: fix typos in documentation · 80dd99b3

由 Luis Henriques 提交于 3月 16, 2009

Fixed typos in function documentation.
Signed-off-by: NLuis Henriques <henrix@sapo.pt>
LKML-Reference: <20090316195809.GA6073@hades.domain.com>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

80dd99b3