1. 23 Feb 2011 (3 commits)
    • sched: Fix the group_imb logic · 866ab43e
      Peter Zijlstra committed
      On a 2*6*2 machine something like:
      
       taskset -c 3-11 bash -c 'for ((i=0;i<9;i++)) do while :; do :; done & done'
      
      _should_ result in 9 busy CPUs, each running 1 task.
      
      However it didn't quite work reliably, most of the time one cpu of the
      second socket (6-11) would be idle and one cpu of the first socket
      (0-5) would have two tasks on it.
      
      The group_imb logic is supposed to deal with this and detect when a
      particular group is imbalanced (like in our case, 0-2 are idle but 3-5
      will have 4 tasks on it).
      
      The detection phase needed a tweak: it was too weak, requiring a
      difference of more than two average task weights between the idle and
      busy cpus in the group, which doesn't trigger for our test case. Relax
      it to trigger on a difference of one or more average task weights
      between cpus.
      
      Once the detection phase worked, it was then defeated by the f_b_g()
      tests trying to avoid ping-pongs. In particular, this_load >= max_load
      triggered because the pulling cpu (the (first) idle cpu on the
      second socket, say 6) would find this_load to be 5 and max_load to be
      4 (there'd be 5 tasks running on our socket and only 4 on the other
      socket).
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Nikhil Rao <ncrao@google.com>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
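
      A hedged sketch of the relaxed detection described above, not the
      literal patch; the variable names approximate the load-balancer
      statistics code of that era (update_sg_lb_stats()):

        /* Flag the group as imbalanced once the load spread between its
         * busiest and idlest cpu reaches one average task weight, provided
         * the busiest cpu actually runs more than one task. */
        if ((max_cpu_load - min_cpu_load) >= avg_load_per_task &&
            max_nr_running > 1)
                sgs->group_imb = 1;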
    • sched: Clean up some f_b_g() comments · cc57aa8f
      Peter Zijlstra committed
      The existing comment tends to grow stale (as it already has); split it
      up and place the pieces near the actual tests.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Nikhil Rao <ncrao@google.com>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • sched: Clean up remnants of sd_idle · c186fafe
      Peter Zijlstra committed
      With the wholesale removal of the sd_idle SMT logic we can clean up
      some more.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Nikhil Rao <ncrao@google.com>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 17 Feb 2011 (3 commits)
  3. 16 Feb 2011 (1 commit)
  4. 14 Feb 2011 (1 commit)
  5. 12 Feb 2011 (2 commits)
    • timer debug: Hide kernel addresses via %pK in /proc/timer_list · f5903085
      Kees Cook committed
      In the continuing effort to avoid kernel addresses leaking to
      unprivileged users, this patch switches to %pK for
      /proc/timer_list reporting.
      Signed-off-by: Kees Cook <kees.cook@canonical.com>
      Cc: John Stultz <johnstul@us.ibm.com>
      Cc: Dan Rosenberg <drosenberg@vsecurity.com>
      Cc: Eugene Teo <eugeneteo@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <20110212032125.GA23571@outflux.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
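
      A minimal sketch of the %p to %pK switch; /proc/timer_list actually
      uses its own SEQ_printf() helper, so the plain seq_printf() below is
      illustrative only:

        /* %pK honours the kptr_restrict sysctl: privileged readers see the
         * real pointer, unprivileged ones see a zeroed value. */
        seq_printf(m, "  .base:       %pK\n", base);   /* was: %p */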
    • ptrace: use safer wake up on ptrace_detach() · 01e05e9a
      Tejun Heo committed
      The wake_up_process() call in ptrace_detach() is spurious and not
      interlocked with the tracee state.  IOW, the tracee could be running or
      sleeping in any place in the kernel by the time wake_up_process() is
      called.  This can lead to the tracee waking up unexpectedly which can be
      dangerous.
      
      The wake_up is spurious and should be removed but for now reduce its
      toxicity by only waking up if the tracee is in TRACED or STOPPED state.
      
      This bug can possibly be used as an attack vector.  I don't think it
      will take too much effort to come up with an attack which triggers oops
      somewhere.  Most sleeps are wrapped in condition test loops and should
      be safe but we have quite a number of places where sleep and wakeup
      conditions are expected to be interlocked.  Although the window of
      opportunity is tiny, ptrace can be used by non-privileged users and with
      some loading the window can definitely be extended and exploited.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Roland McGrath <roland@redhat.com>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
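
      A hedged sketch of the safer wakeup, assuming it replaces the
      unconditional wake_up_process(child) in ptrace_detach():

        /* Only kick the tracee if it is actually stopped or traced; a
         * tracee sleeping elsewhere in the kernel is left alone. */
        wake_up_state(child, TASK_TRACED | TASK_STOPPED);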
  6. 11 Feb 2011 (2 commits)
  7. 10 Feb 2011 (1 commit)
  8. 08 Feb 2011 (3 commits)
  9. 04 Feb 2011 (1 commit)
  10. 03 Feb 2011 (10 commits)
    • tracing: Replace syscall_meta_data struct array with pointer array · 3d56e331
      Steven Rostedt committed
      Currently the syscall_meta structures for the syscall tracepoints are
      placed in the __syscall_metadata section, and at link time, the linker
      makes one large array of all these syscall metadata structures. On boot
      up, this array is read (much like the initcall sections) and the syscall
      data is processed.
      
      The problem is that there is no guarantee that gcc will place complex
      structures nicely together in an array format. Two structures in the
      same file may be placed awkwardly, because gcc has no clue that they
      are supposed to be in an array.
      
      A hack was previously used to force the alignment to 4, to pack the
      structures together. But this caused alignment issues with other
      architectures (sparc).
      
      Instead of packing the structures into an array, the structures' addresses
      are now put into the __syscall_metadata section. As pointers always have
      natural alignment, gcc should always pack them tightly together
      (otherwise initcall, extable, etc would also fail).
      
      By having the pointers to the structures in the section, we can still
      iterate the trace_events without causing unnecessary alignment problems
      with other architectures, or depending on the current behaviour of
      gcc that will likely change in the future just to tick us kernel developers
      off a little more.
      
      The __syscall_metadata section is also moved into the .init.data section
      as it is now only needed at boot up.
      Suggested-by: David Miller <davem@davemloft.net>
      Acked-by: David S. Miller <davem@davemloft.net>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
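
      A hedged sketch of the pointer-array idea; the macro name and exact
      attribute spelling below are illustrative, not the kernel's:

        /* The metadata struct itself lives in ordinary data; only its
         * address goes into the __syscall_metadata section, so the section
         * becomes a tightly packed array of naturally aligned pointers. */
        #define SYSCALL_METADATA_PTR(name)                                 \
                static struct syscall_metadata __syscall_meta_##name;      \
                static struct syscall_metadata *__p_syscall_meta_##name    \
                        __attribute__((section("__syscall_metadata"))) =   \
                        &__syscall_meta_##name;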
    • tracepoints: Fix section alignment using pointer array · 65498646
      Mathieu Desnoyers committed
      Make the tracepoints robust against compiler changes by not relying on
      compiler-specific behavior with respect to structure alignment.
      Implement an approach proposed by David Miller:
      use an array of const pointers to refer to the individual structures, and export
      this pointer array through the linker script rather than the structures per se.
      It will consume 32 extra bytes per tracepoint (24 for structure padding and 8
      for the pointers), but the result is less likely to break due to compiler changes.
      
      History:
      
      commit 7e066fb8 ("tracepoints: add DECLARE_TRACE() and DEFINE_TRACE()")
      added the aligned(32) type and variable attribute to the tracepoint structures
      to deal with gcc happily aligning statically defined structures on 32-byte
      multiples.
      
      One attempt was to use an 8-byte alignment for tracepoint structures by applying
      both the variable and type attribute to tracepoint structures definitions and
      declarations. It worked fine with gcc 4.5.1, but broke with gcc 4.4.4 and 4.4.5.
      
      The reason is that the "aligned" attribute only specifies the _minimum_
      alignment for a structure, leaving both the compiler and the linker free
      to align on larger multiples. Because tracepoint.c expects the structures
      to be placed as an array within each section, up-alignment causes
      NULL-pointer exceptions due to the extra unexpected padding.
      
      (this patch applies on top of -tip)
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      LKML-Reference: <20110126222622.GA10794@Krystal>
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      CC: Ingo Molnar <mingo@elte.hu>
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: Andrew Morton <akpm@linux-foundation.org>
      CC: Peter Zijlstra <peterz@infradead.org>
      CC: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
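
      A hedged sketch of how boot code walks such a section of pointers; the
      section symbols and the handler name are placeholders:

        extern struct tracepoint * const __start___tracepoints_ptrs[];
        extern struct tracepoint * const __stop___tracepoints_ptrs[];

        static void walk_tracepoints(void)
        {
                struct tracepoint * const *p;

                /* Each entry is just a pointer, so iteration never depends
                 * on how the compiler padded the structures themselves. */
                for (p = __start___tracepoints_ptrs;
                     p < __stop___tracepoints_ptrs; p++)
                        handle_tracepoint(*p);   /* placeholder handler */
        }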
    • sched: Add yield_to(task, preempt) functionality · d95f4122
      Mike Galbraith committed
      Currently only implemented for fair class tasks.
      
      Add a yield_to_task() method to the fair scheduling class, allowing the
      caller of yield_to() to accelerate another thread in its thread group or
      task group.
      
      Implemented via a scheduler hint, using cfs_rq->next to encourage the
      target to be selected.  We can rely on pick_next_entity to keep things
      fair, so no one can accelerate a thread that has already used its fair
      share of CPU time.
      
      This also means callers should only call yield_to when they really
      mean it.  Calling it too often can result in the scheduler just
      ignoring the hint.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20110201095051.4ddb7738@annuminas.surriel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
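
      A hedged usage sketch; the caller and target names are hypothetical,
      and the bool return (whether the yield took effect) is assumed from the
      description above:

        /* E.g. a virtualization host boosting the vcpu thread that holds a
         * lock instead of spinning on it.  'false' = hint only, do not force
         * preemption of the current task. */
        static void boost_lock_holder(struct task_struct *vcpu_task)
        {
                if (!yield_to(vcpu_task, false))
                        cpu_relax();    /* target not eligible right now */
        }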
    • sched: Use a buddy to implement yield_task_fair() · ac53db59
      Rik van Riel committed
      Use the buddy mechanism to implement yield_task_fair.  This
      allows us to skip onto the next highest priority se at every
      level in the CFS tree, unless doing so would introduce gross
      unfairness in CPU time distribution.
      
      We order the buddy selection in pick_next_entity to check
      yield first, then last, then next.  We need next to be able
      to override yield, because it is possible for the "next" and
      "yield" task to be different processen in the same sub-tree
      of the CFS tree.  When they are, we need to go into that
      sub-tree regardless of the "yield" hint, and pick the correct
      entity once we get to the right level.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20110201095103.3a79e92a@annuminas.surriel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
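
      A hedged sketch of the selection order only; fair_enough() is a
      placeholder for the real eligibility test (wakeup_preempt_entity) and
      the yield/skip handling is simplified:

        static struct sched_entity *pick_buddy(struct cfs_rq *cfs_rq,
                                               struct sched_entity *leftmost)
        {
                struct sched_entity *se = leftmost;

                /* 1) yield: if leftmost asked to be skipped, consider the
                 *    second-leftmost entity instead (elided here). */
                /* 2) last: prefer giving the CPU back to a preempted task. */
                if (cfs_rq->last && fair_enough(cfs_rq->last, leftmost))
                        se = cfs_rq->last;
                /* 3) next: checked last, so it can override the yield hint. */
                if (cfs_rq->next && fair_enough(cfs_rq->next, leftmost))
                        se = cfs_rq->next;

                return se;
        }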
    • sched: Limit the scope of clear_buddies · 2c13c919
      Rik van Riel committed
      The clear_buddies function does not seem to play well with the concept
      of hierarchical runqueues.  In the following tree, task groups are
      represented by 'G', tasks by 'T', next by 'n' and last by 'l'.
      
           (nl)
          /    \
         G(nl)  G
         / \     \
       T(l) T(n)  T
      
      This situation can arise when a task is woken up T(n), and the previously
      running task T(l) is marked last.
      
      When clear_buddies is called from either T(l) or T(n), the next and last
      buddies of the group G(nl) will be cleared.  This is not the desired
      result, since we would like to be able to find the other type of buddy
      in many cases.
      
      This is especially a worry when implementing yield_task_fair through the
      buddy system.
      
      The fix is simple: only clear the buddy type that the task itself
      is indicated to be.  As an added bonus, we stop walking up the tree
      when the buddy has already been cleared or pointed elsewhere.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20110201094837.6b0962a9@annuminas.surriel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
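
      A hedged sketch of the narrowed clearing, shown for the 'next' buddy;
      the 'last' case is symmetric:

        static void __clear_buddies_next(struct sched_entity *se)
        {
                for_each_sched_entity(se) {
                        struct cfs_rq *cfs_rq = cfs_rq_of(se);

                        /* Stop as soon as a level no longer points at us, so
                         * buddies set for other sub-trees stay intact. */
                        if (cfs_rq->next != se)
                                break;
                        cfs_rq->next = NULL;
                }
        }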
    • sched: Check the right ->nr_running in yield_task_fair() · 725e7580
      Rik van Riel committed
      With CONFIG_FAIR_GROUP_SCHED, each task_group has its own cfs_rq.
      Yielding to a task from another cfs_rq may be worthwhile, since
      a process calling yield typically cannot use the CPU right now.
      
      Therefore, we want to check the per-cpu nr_running, not the
      cgroup local one.
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20110201094715.798c4f86@annuminas.surriel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
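
      A hedged sketch of the check; the surrounding yield logic is elided:

        static void yield_task_fair(struct rq *rq)
        {
                /* With FAIR_GROUP_SCHED the group-local cfs_rq->nr_running
                 * can be 1 even though other tasks are runnable on this cpu,
                 * so test the per-cpu count instead of the cgroup-local one. */
                if (unlikely(rq->nr_running == 1))
                        return;

                /* ... yield via the buddy mechanism ... */
        }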
    • sched: Fix update_curr_rt() · 06c3bc65
      Peter Zijlstra committed
      cpu_stopper_thread()
        migration_cpu_stop()
          __migrate_task()
            deactivate_task()
              dequeue_task()
                dequeue_task_rq()
                  update_curr_rt()
      
      Will call update_curr_rt() on rq->curr, which at that time is
      rq->stop. The problem is that rq->stop.prio matches an RT prio, so the
      code falsely assumes it's an rt_sched_class task.
      Reported-Debugged-Tested-Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Cc: stable@kernel.org # .37
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
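
      A hedged sketch of the fix, assuming an early bail-out at the top of
      update_curr_rt():

        static void update_curr_rt(struct rq *rq)
        {
                struct task_struct *curr = rq->curr;

                /* rq->stop has an RT prio but is not an rt_sched_class task;
                 * check the class, not the prio, to avoid the false positive. */
                if (curr->sched_class != &rt_sched_class)
                        return;

                /* ... update RT runtime accounting for a genuine RT task ... */
        }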
    • perf: Fix reading in perf_event_read() · 542e72fc
      Peter Zijlstra committed
      It is quite possible for the event to have been disabled between
      perf_event_read() sending the IPI and the CPU servicing the IPI and
      calling __perf_event_read(), hence revalidate the state.
      Reported-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
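
      A hedged sketch of the revalidation on the IPI-servicing CPU; the real
      function also updates context timestamps, which is elided here:

        static void __perf_event_read(void *info)
        {
                struct perf_event *event = info;

                /* The event may have been disabled between perf_event_read()
                 * sending the IPI and us running; re-check before touching
                 * the PMU. */
                if (event->state != PERF_EVENT_STATE_ACTIVE)
                        return;

                event->pmu->read(event);
        }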
    • tracing: Replace trace_event struct array with pointer array · e4a9ea5e
      Steven Rostedt committed
      Currently the trace_event structures are placed in the _ftrace_events
      section, and at link time, the linker makes one large array of all
      the trace_event structures. On boot up, this array is read (much like
      the initcall sections) and the events are processed.
      
      The problem is that there is no guarantee that gcc will place complex
      structures nicely together in an array format. Two structures in the
      same file may be placed awkwardly, because gcc has no clue that they
      are supposed to be in an array.
      
      A hack was previously used to force the alignment to 4, to pack the
      structures together. But this caused alignment issues with other
      architectures (sparc).
      
      Instead of packing the structures into an array, the structures' addresses
      are now put into the _ftrace_events section. As pointers always have
      natural alignment, gcc should always pack them tightly together
      (otherwise initcall, extable, etc would also fail).
      
      By having the pointers to the structures in the section, we can still
      iterate the trace_events without causing unnecessary alignment problems
      with other architectures, or depending on the current behaviour of
      gcc that will likely change in the future just to tick us kernel developers
      off a little more.
      
      The _ftrace_events section is also moved into the .init.data section
      as it is now only needed at boot up.
      Suggested-by: David Miller <davem@davemloft.net>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • genirq: Prevent irq storm on migration · f1a06390
      Thomas Gleixner committed
      move_native_irq() masks and unmasks the interrupt line
      unconditionally, but the interrupt line might be masked due to a
      threaded oneshot handler in progress. Unmasking the line in that case
      can lead to interrupt storms. Observed on PREEMPT_RT.
      
      Originally-from: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@kernel.org
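
      A hedged sketch of the idea; the mask/unmask helpers below are
      placeholders for the chip callbacks of that release, not the exact API:

        /* Remember whether the line was already masked (e.g. by a threaded
         * ONESHOT handler) and only mask/unmask it ourselves when it was
         * not, so we never unmask a line that someone else still expects to
         * stay masked. */
        bool masked = desc->status & IRQ_MASKED;

        if (!masked)
                mask_irq_line(desc);            /* placeholder for chip mask */
        move_masked_irq(irq);
        if (!masked)
                unmask_irq_line(desc);          /* placeholder for chip unmask */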
  11. 31 Jan 2011 (4 commits)
  12. 28 Jan 2011 (1 commit)
    • perf: Fix alloc_callchain_buffers() · 88d4f0db
      Eric Dumazet committed
      Commit 927c7a9e ("perf: Fix race in callchains") introduced
      a mismatch in the sizing of struct callchain_cpus_entries.
      
      nr_cpu_ids must be used instead of num_possible_cpus(), or we
      might get out-of-bounds memory accesses on some machines.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Stephane Eranian <eranian@google.com>
      CC: stable@kernel.org
      LKML-Reference: <1295980851.3588.351.camel@edumazet-laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
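
      A hedged sketch of the sizing fix; cpu ids need not be contiguous, so a
      per-cpu pointer array must span nr_cpu_ids slots:

        struct callchain_cpus_entries *entries;
        size_t size;

        /* was: sized with num_possible_cpus(), which undercounts when cpu
         * ids are sparse and lets entries->cpu_entries[cpu] index past the
         * end of the allocation. */
        size = offsetof(struct callchain_cpus_entries, cpu_entries[nr_cpu_ids]);
        entries = kzalloc(size, GFP_KERNEL);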
  13. 27 Jan 2011 (1 commit)
  14. 26 Jan 2011 (7 commits)