提交 · 24e55910e4801d772f95becde20b526b8b10388d · OpenHarmony / kernel_linux

24 2月, 2013 3 次提交

cputime: Use local_clock() for full dynticks cputime accounting · 7f6575f1

由 Frederic Weisbecker 提交于 2月 23, 2013

Running the full dynticks cputime accounting with preemptible
kernel debugging trigger the following warning:

	[    4.488303] BUG: using smp_processor_id() in preemptible [00000000] code: init/1
	[    4.490971] caller is native_sched_clock+0x22/0x80
	[    4.493663] Pid: 1, comm: init Not tainted 3.8.0+ #13
	[    4.496376] Call Trace:
	[    4.498996]  [<ffffffff813410eb>] debug_smp_processor_id+0xdb/0xf0
	[    4.501716]  [<ffffffff8101e642>] native_sched_clock+0x22/0x80
	[    4.504434]  [<ffffffff8101db99>] sched_clock+0x9/0x10
	[    4.507185]  [<ffffffff81096ccd>] fetch_task_cputime+0xad/0x120
	[    4.509916]  [<ffffffff81096dd5>] task_cputime+0x35/0x60
	[    4.512622]  [<ffffffff810f146e>] acct_update_integrals+0x1e/0x40
	[    4.515372]  [<ffffffff8117d2cf>] do_execve_common+0x4ff/0x5c0
	[    4.518117]  [<ffffffff8117cf14>] ? do_execve_common+0x144/0x5c0
	[    4.520844]  [<ffffffff81867a10>] ? rest_init+0x160/0x160
	[    4.523554]  [<ffffffff8117d457>] do_execve+0x37/0x40
	[    4.526276]  [<ffffffff810021a3>] run_init_process+0x23/0x30
	[    4.528953]  [<ffffffff81867aac>] kernel_init+0x9c/0xf0
	[    4.531608]  [<ffffffff8188356c>] ret_from_fork+0x7c/0xb0

We use sched_clock() to perform and fixup the cputime
accounting. However we are calling it with preemption enabled
from the read side, which trigger the bug above.

To fix this up, use local_clock() instead. It takes care of
preemption and also provide a more reliable clock source. This
is welcome for this kind of statistic that is widely relied on
in userspace.
Reported-by: NThomas Gleixner <tglx@linutronix.de>
Reported-by: NIngo Molnar <mingo@kernel.org>
Suggested-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zhong <zhong@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Kevin Hilman <khilman@linaro.org>
Link: http://lkml.kernel.org/r/1361636925-22288-3-git-send-email-fweisbec@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

7f6575f1

page-writeback.c: subtract min_free_kbytes from dirtyable memory · 75f7ad8e

由 Paul Szabo 提交于 2月 22, 2013

When calculating amount of dirtyable memory, min_free_kbytes should be
subtracted because it is not intended for dirty pages.

Addresses http://bugs.debian.org/695182

[akpm@linux-foundation.org: fix up min_free_kbytes extern declarations]
[akpm@linux-foundation.org: fix min() warning]
Signed-off-by: NPaul Szabo <psz@maths.usyd.edu.au>
Acked-by: NRik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

75f7ad8e

sched: do not use cpu_to_node() to find an offlined cpu's node. · aa00d89c

由 Tang Chen 提交于 2月 22, 2013

If a cpu is offline, its nid will be set to -1, and cpu_to_node(cpu)
will return -1.  As a result, cpumask_of_node(nid) will return NULL.  In
this case, find_next_bit() in for_each_cpu will get a NULL pointer and
cause panic.

Here is a call trace:
  Call Trace:
   <IRQ>
    select_fallback_rq+0x71/0x190
    try_to_wake_up+0x2cb/0x2f0
    wake_up_process+0x15/0x20
    hrtimer_wakeup+0x22/0x30
    __run_hrtimer+0x83/0x320
    hrtimer_interrupt+0x106/0x280
    smp_apic_timer_interrupt+0x69/0x99
    apic_timer_interrupt+0x6f/0x80

There is a hrtimer process sleeping, whose cpu has already been
offlined.  When it is waken up, it tries to find another cpu to run, and
get a -1 nid.  As a result, cpumask_of_node(-1) returns NULL, and causes
ernel panic.

This patch fixes this problem by judging if the nid is -1.  If nid is
not -1, a cpu on the same node will be picked.  Else, a online cpu on
another node will be picked.
Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

aa00d89c

22 2月, 2013 9 次提交

sched: Fix /proc/sched_debug failure on very very large systems · bbbfeac9

由 Nathan Zimmer 提交于 2月 21, 2013

On systems with 4096 cores attemping to read /proc/sched_debug
fails because we are trying to push all the data into a single
kmalloc buffer.

The issue is on these very large machines all the data will not
fit in 4mb.

A better solution is to not us the single_open mechanism but to
provide our own seq_operations and treat each cpu as an
individual record.

The output should be identical to the previous version.
Reported-by: NDave Jones <davej@redhat.com>
Signed-off-by: NNathan Zimmer <nzimmer@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>)
[ Whitespace fixlet]
[ Fix spello in comment]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

bbbfeac9

sched: Fix /proc/sched_stat failure on very very large systems · cb152ff2

由 Nathan Zimmer 提交于 2月 21, 2013

On systems with 4096 cores doing a cat /proc/sched_stat fails,
because we are trying to push all the data into a single kmalloc
buffer.

The issue is on these very large machines all the data will not
fit in 4mb.

A better solution is to not use the single_open() mechanism but
to provide our own seq_operations.

The output should be identical to previous version and thus not
need the version number.
Reported-by: NDave Jones <davej@redhat.com>
Signed-off-by: NNathan Zimmer <nzimmer@sgi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
[ Fix memleak]
[ Fix spello in comment]
[ Fix warnings]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

cb152ff2

kernel/nsproxy.c: remove duplicate task_cred_xxx for user_ns · d7d48f62

由 Yuanhan Liu 提交于 2月 21, 2013

We can use user_ns, which is also assigned from task_cred_xxx(tsk,
user_ns), at the beginning of copy_namespaces().
Signed-off-by: NYuanhan Liu <yuanhan.liu@linux.intel.com>
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d7d48f62

sys_prctl(): coding-style cleanup · f3cbd435

由 Andrew Morton 提交于 2月 21, 2013

Remove a tabstop from the switch statement, in the usual fashion.  A few
instances of weirdwrapping were removed as a result.

Cc: Chen Gang <gang.chen@asianux.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f3cbd435

sys_prctl(): arg2 is unsigned long which is never < 0 · 7fe5e042

由 Chen Gang 提交于 2月 21, 2013

arg2 will never < 0, for its type is 'unsigned long'

Also, use the provided macros.
Signed-off-by: NChen Gang <gang.chen@asianux.com>
Reported-by: NCyrill Gorcunov <gorcunov@openvz.org>
Acked-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7fe5e042

smp: make smp_call_function_many() use logic similar to smp_call_function_single() · 9a46ad6d

由 Shaohua Li 提交于 2月 21, 2013

I'm testing swapout workload in a two-socket Xeon machine.  The workload
has 10 threads, each thread sequentially accesses separate memory
region.  TLB flush overhead is very big in the workload.  For each page,
page reclaim need move it from active lru list and then unmap it.  Both
need a TLB flush.  And this is a multthread workload, TLB flush happens
in 10 CPUs.  In X86, TLB flush uses generic smp_call)function.  So this
workload stress smp_call_function_many heavily.

Without patch, perf shows:
+  24.49%  [k] generic_smp_call_function_interrupt
-  21.72%  [k] _raw_spin_lock
   - _raw_spin_lock
      + 79.80% __page_check_address
      + 6.42% generic_smp_call_function_interrupt
      + 3.31% get_swap_page
      + 2.37% free_pcppages_bulk
      + 1.75% handle_pte_fault
      + 1.54% put_super
      + 1.41% grab_super_passive
      + 1.36% __swap_duplicate
      + 0.68% blk_flush_plug_list
      + 0.62% swap_info_get
+   6.55%  [k] flush_tlb_func
+   6.46%  [k] smp_call_function_many
+   5.09%  [k] call_function_interrupt
+   4.75%  [k] default_send_IPI_mask_sequence_phys
+   2.18%  [k] find_next_bit

swapout throughput is around 1300M/s.

With the patch, perf shows:
-  27.23%  [k] _raw_spin_lock
   - _raw_spin_lock
      + 80.53% __page_check_address
      + 8.39% generic_smp_call_function_single_interrupt
      + 2.44% get_swap_page
      + 1.76% free_pcppages_bulk
      + 1.40% handle_pte_fault
      + 1.15% __swap_duplicate
      + 1.05% put_super
      + 0.98% grab_super_passive
      + 0.86% blk_flush_plug_list
      + 0.57% swap_info_get
+   8.25%  [k] default_send_IPI_mask_sequence_phys
+   7.55%  [k] call_function_interrupt
+   7.47%  [k] smp_call_function_many
+   7.25%  [k] flush_tlb_func
+   3.81%  [k] _raw_spin_lock_irqsave
+   3.78%  [k] generic_smp_call_function_single_interrupt

swapout throughput is around 1400M/s.  So there is around a 7%
improvement, and total cpu utilization doesn't change.

Without the patch, cfd_data is shared by all CPUs.
generic_smp_call_function_interrupt does read/write cfd_data several times
which will create a lot of cache ping-pong.  With the patch, the data
becomes per-cpu.  The ping-pong is avoided.  And from the perf data, this
doesn't make call_single_queue lock contend.

Next step is to remove generic_smp_call_function_interrupt() from arch
code.
Signed-off-by: NShaohua Li <shli@fusionio.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9a46ad6d

time: don't inline EXPORT_SYMBOL functions · af3b5628

由 Greg Kroah-Hartman 提交于 2月 21, 2013

How is the compiler even handling exported functions that are marked
inline? Anyway, these shouldn't be inline because of that, so remove
that marking.

Based on a larger patch by Mark Charlebois to get LLVM to build the
kernel.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Mark Charlebois <mcharleb@qualcomm.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: hank <pyu@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

af3b5628

compat: return -EFAULT on error in waitid() · bffea77c

由 Dan Carpenter 提交于 2月 21, 2013

The copy_to_user() call returns the number of bytes remaining but we
want to return -EFAULT on error.

Fixes "x32: fix waitid()"
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bffea77c

posix-timer: Don't call idr_find() with out-of-range ID · e182bb38

由 Tejun Heo 提交于 2月 20, 2013

When idr_find() was fed a negative ID, it used to look up the ID
ignoring the sign bit before recent ("idr: remove MAX_IDR_MASK and
move left MAX_IDR_* into idr.c") patch. Now a negative ID triggers
a WARN_ON_ONCE().

__lock_timer() feeds timer_id from userland directly to idr_find()
without sanitizing it which can trigger the above malfunctions.  Add a
range check on @timer_id before invoking idr_find() in __lock_timer().

While timer_t is defined as int by all archs at the moment, Andrew
worries that it may be defined as a larger type later on.  Make the
test cover larger integers too so that it at least is guaranteed to
not return the wrong timer.

Note that WARN_ON_ONCE() in idr_find() on id < 0 is transitional
precaution while moving away from ignoring MSB.  Once it's gone we can
remove the guard as long as timer_t isn't larger than int.

Signed-off-by: Tejun Heo <tj@kernel.org>nnn
Reported-by: NSasha Levin <sasha.levin@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20130220232412.GL3570@htj.dyndns.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

e182bb38

20 2月, 2013 2 次提交

sched/core: Remove the obsolete and unused nr_uninterruptible() function · 1c3e8264

由 Sha Zhengju 提交于 2月 20, 2013

Signed-off-by: NSha Zhengju <handai.szj@taobao.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1361351678-8065-1-git-send-email-handai.szj@taobao.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

1c3e8264

workqueue: un-GPL function delayed_work_timer_fn() · 1438ade5

由 Konstantin Khlebnikov 提交于 1月 24, 2013

commit d8e794df ("workqueue: set
delayed_work->timer function on initialization") exports function
delayed_work_timer_fn() only for GPL modules. This makes delayed-works
unusable for non-GPL modules, because initialization macro now requires
GPL symbol. For example schedule_delayed_work() available for non-GPL.
Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org # 3.7

1438ade5

19 2月, 2013 12 次提交

futex: Revert "futex: Mark get_robust_list as deprecated" · fe2b05f7

由 Thomas Gleixner 提交于 2月 18, 2013

This reverts commit ec0c4274.

get_robust_list() is in use and a removal would break existing user
space. With the permission checks in place it's not longer a security
hole. Remove the deprecation warnings.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: akpm@linux-foundation.org
Cc: paul.gortmaker@windriver.com
Cc: davej@redhat.com
Cc: keescook@chromium.org
Cc: stable@vger.kernel.org
Cc: ebiederm@xmission.com

fe2b05f7

ntp: Make ntp_lock raw · a6c0c943

由 Thomas Gleixner 提交于 4月 10, 2012

seconds_overflow() is called from hard interrupt context even on
Preempt-RT. This requires the lock to be a raw_spinlock.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

a6c0c943

lockdep: Print more info when MAX_LOCK_DEPTH is exceeded · c0540606

由 Ben Greear 提交于 2月 06, 2013

This helps debug cases where a lock is acquired over and
over without being released.
Suggested-by: NSteven Rostedt <rostedt@goodmis.org>
Signed-off-by: NBen Greear <greearb@candelatech.com>
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/1360176979-4421-1-git-send-email-greearb@candelatech.com
[ Changed the printout ordering. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

c0540606

watchdog: Use local_clock for get_timestamp() · c06b4f19

由 Namhyung Kim 提交于 12月 27, 2012

The get_timestamp() function is always called with current cpu,
thus using local_clock() would be more appropriate and it makes
the code shorter and cleaner IMHO.
Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
Acked-by: NDon Zickus <dzickus@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1356576585-28782-1-git-send-email-namhyung@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

c06b4f19

lockdep: Rename print_unlock_inbalance_bug() to print_unlock_imbalance_bug() · f86f7554

由 Srivatsa S. Bhat 提交于 1月 08, 2013

Fix the typo in the function name (s/inbalance/imbalance)
Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: rostedt@goodmis.org
Cc: peterz@infradead.org
Link: http://lkml.kernel.org/r/20130108130547.32733.79507.stgit@srivatsabhat.in.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

f86f7554

cputime: Remove irqsave from seqlock readers · cdc4e86b

由 Thomas Gleixner 提交于 2月 15, 2013

The reader side code has no requirement to disable interrupts while
sampling data. The sequence counter is enough to ensure consistency.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

cdc4e86b

ftrace: Call ftrace cleanup module notifier after all other notifiers · 8c189ea6

由 Steven Rostedt (Red Hat) 提交于 2月 13, 2013

Commit: c1bf08ac "ftrace: Be first to run code modification on modules"

changed ftrace module notifier's priority to INT_MAX in order to
process the ftrace nops before anything else could touch them
(namely kprobes). This was the correct thing to do.

Unfortunately, the ftrace module notifier also contains the ftrace
clean up code. As opposed to the set up code, this code should be
run *after* all the module notifiers have run in case a module is doing
correct clean-up and unregisters its ftrace hooks. Basically, ftrace
needs to do clean up on module removal, as it needs to know about code
being removed so that it doesn't try to modify that code. But after it
removes the module from its records, if a ftrace user tries to remove
a probe, that removal will fail due as the record of that code segment
no longer exists.

Nothing really bad happens if the probe removal is called after ftrace
did the clean up, but the ftrace removal function will return an error.
Correct code (such as kprobes) will produce a WARN_ON() if it fails
to remove the probe. As people get annoyed by frivolous warnings, it's
best to do the ftrace clean up after everything else.

By splitting the ftrace_module_notifier into two notifiers, one that
does the module load setup that is run at high priority, and the other
that is called for module clean up that is run at low priority, the
problem is solved.

Cc: stable@vger.kernel.org
Reported-by: NFrank Ch. Eigler <fche@redhat.com>
Acked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

8c189ea6

genirq: Export enable/disable_percpu_irq() · 36a5df85

由 Chris Metcalf 提交于 2月 01, 2013

These functions are used by the tilegx onchip network driver, and it's
useful to be able to load that driver as a module.
Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>
Link: http://lkml.kernel.org/r/201302012043.r11KhNZF024371@farm-0021.internal.tilera.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

36a5df85

cgroup: fail if monitored file and event_control are in different cgroup · f169007b

由 Li Zefan 提交于 2月 18, 2013

If we pass fd of memory.usage_in_bytes of cgroup A to cgroup.event_control
of cgroup B, then we won't get memory usage notification from A but B!

What's worse, if A and B are in different mount hierarchy, we'll end up
accessing NULL pointer!

Disallow this kind of invalid usage.
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Acked-by: NKirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: NTejun Heo <tj@kernel.org>

f169007b

cgroup: fix cgroup_rmdir() vs close(eventfd) race · 810cbee4

由 Li Zefan 提交于 2月 18, 2013

commit 205a872b ("cgroup: fix lockdep
warning for event_control") solved a deadlock by introducing a new
bug.

Move cgrp->event_list to a temporary list doesn't mean you can traverse
this list locklessly, because at the same time cgroup_event_wake() can
be called and remove the event from the list. The result of this race
is disastrous.

We adopt the way how kvm irqfd code implements race-free event removal,
which is now described in the comments in cgroup_event_wake().

v3:
- call eventfd_signal() no matter it's eventfd close or cgroup removal
that removes the cgroup event.
Acked-by: NKirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

810cbee4

cpuset: fix cpuset_print_task_mems_allowed() vs rename() race · 63f43f55

由 Li Zefan 提交于 1月 25, 2013

rename() will change dentry->d_name. The result of this race can
be worse than seeing partially rewritten name, but we might access
a stale pointer because rename() will re-allocate memory to hold
a longer name.

It's safe in the protection of dentry->d_lock.

v2: check NULL dentry before acquiring dentry lock.
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org

63f43f55

cgroup: fix exit() vs rmdir() race · 71b5707e

由 Li Zefan 提交于 1月 24, 2013

In cgroup_exit() put_css_set_taskexit() is called without any lock,
which might lead to accessing a freed cgroup:

thread1                           thread2
---------------------------------------------
exit()
  cgroup_exit()
    put_css_set_taskexit()
      atomic_dec(cgrp->count);
                                   rmdir();
      /* not safe !! */
      check_for_release(cgrp);

rcu_read_lock() can be used to make sure the cgroup is alive.
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org

71b5707e

15 2月, 2013 2 次提交

posix-cpu-timers: Fix nanosleep task_struct leak · e6c42c29

由 Stanislaw Gruszka 提交于 2月 15, 2013

The trinity fuzzer triggered a task_struct reference leak via
clock_nanosleep with CPU_TIMERs. do_cpu_nanosleep() calls
posic_cpu_timer_create(), but misses a corresponding
posix_cpu_timer_del() which leads to the task_struct reference leak.
Reported-and-tested-by: NTommi Rantala <tt.rantala@gmail.com>
Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20130215100810.GF4392@redhat.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

e6c42c29

perf/hwbp: Fix cleanup in case of kzalloc failure · 02e176af

由 Daniel Baluta 提交于 2月 06, 2013

Obviously this is a typo and could result in memory leaks if kzalloc
fails on a given cpu.
Signed-off-by: NDaniel Baluta <dbaluta@ixiacom.com>
Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1360186160-7566-1-git-send-email-dbaluta@ixiacom.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

02e176af

14 2月, 2013 8 次提交

stop_machine: Use smpboot threads · 14e568e7

由 Thomas Gleixner 提交于 1月 31, 2013

Use the smpboot thread infrastructure. Mark the stopper thread
selfparking and park it after it has finished the take_cpu_down()
work.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Arjan van de Veen <arjan@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Richard Weinberger <rw@linutronix.de>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130131120741.686315164@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

14e568e7

stop_machine: Store task reference in a separate per cpu variable · 860a0ffa

由 Thomas Gleixner 提交于 1月 31, 2013

To allow the stopper thread being managed by the smpboot thread
infrastructure separate out the task storage from the stopper data
structure.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Arjan van de Veen <arjan@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Richard Weinberger <rw@linutronix.de>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130131120741.626690384@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

860a0ffa

smpboot: Allow selfparking per cpu threads · 7d7e499f

由 Thomas Gleixner 提交于 1月 31, 2013

The stop machine threads are still killed when a cpu goes offline. The
reason is that the thread is used to bring the cpu down, so it can't
be parked along with the other per cpu threads.

Allow a per cpu thread to be excluded from automatic parking, so it
can park itself once it's done

Add a create callback function as well.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Cc: Arjan van de Veen <arjan@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Richard Weinberger <rw@linutronix.de>
Cc: Magnus Damm <magnus.damm@gmail.com>
Link: http://lkml.kernel.org/r/20130131120741.553993267@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

7d7e499f

burying unused conditionals · d64008a8

由 Al Viro 提交于 11月 25, 2012

__ARCH_WANT_SYS_RT_SIGACTION,
__ARCH_WANT_SYS_RT_SIGSUSPEND,
__ARCH_WANT_COMPAT_SYS_RT_SIGSUSPEND,
__ARCH_WANT_COMPAT_SYS_SCHED_RR_GET_INTERVAL - not used anymore
CONFIG_GENERIC_{SIGALTSTACK,COMPAT_RT_SIG{ACTION,QUEUEINFO,PENDING,PROCMASK}} -
can be assumed always set.

d64008a8

A
make do_sigaltstack() static · e9b04b5b
由 Al Viro 提交于 11月 20, 2012
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
e9b04b5b

workqueue: rename cpu_workqueue to pool_workqueue · 112202d9

由 Tejun Heo 提交于 2月 13, 2013

workqueue has moved away from global_cwqs to worker_pools and with the
scheduled custom worker pools, wforkqueues will be associated with
pools which don't have anything to do with CPUs.  The workqueue code
went through significant amount of changes recently and mass renaming
isn't likely to hurt much additionally.  Let's replace 'cpu' with
'pool' so that it reflects the current design.

* s/struct cpu_workqueue_struct/struct pool_workqueue/
* s/cpu_wq/pool_wq/
* s/cwq/pwq/

This patch is purely cosmetic.
Signed-off-by: NTejun Heo <tj@kernel.org>

112202d9

workqueue: reimplement is_chained_work() using current_wq_worker() · 8d03ecfe

由 Tejun Heo 提交于 2月 13, 2013

is_chained_work() was added before current_wq_worker() and implemented
its own ham-fisted way of finding out whether %current is a workqueue
worker - it iterates through all possible workers.

Drop the custom implementation and reimplement using
current_wq_worker().
Signed-off-by: NTejun Heo <tj@kernel.org>

8d03ecfe

workqueue: fix is_chained_work() regression · 1dd63814

由 Tejun Heo 提交于 2月 13, 2013

c9e7cf27 ("workqueue: move busy_hash from global_cwq to
worker_pool") incorrectly converted is_chained_work() to use
get_gcwq() inside for_each_gcwq_cpu() while removing get_gcwq().

As cwq might not exist for all possible workqueue CPUs, @cwq can be
NULL and the following cwq deferences can lead to oops.

Fix it by using for_each_cwq_cpu() instead, which is the better one to
use anyway as we only need to check pools that the wq is associated
with.
Signed-off-by: NTejun Heo <tj@kernel.org>

1dd63814

13 2月, 2013 3 次提交

tracing/syscalls: Allow archs to ignore tracing compat syscalls · f431b634

由 Steven Rostedt 提交于 2月 12, 2013

The tracing of ia32 compat system calls has been a bit of a pain as they
use different system call numbers than the 64bit equivalents.

I wrote a simple 'lls' program that lists files. I compiled it as a i686
ELF binary and ran it under a x86_64 box. This is the result:

echo 0 > /debug/tracing/tracing_on
echo 1 > /debug/tracing/events/syscalls/enable
echo 1 > /debug/tracing/tracing_on ; ./lls ; echo 0 > /debug/tracing/tracing_on

grep lls /debug/tracing/trace

[.. skipping calls before TS_COMPAT is set ...]

lls-1127 [005] d... 936.409188: sys_recvfrom(fd: 0, ubuf: 4d560fc4, size: 0, flags: 8048034, addr: 8, addr_len: f7700420)
lls-1127 [005] d... 936.409190: sys_recvfrom -> 0x8a77000
lls-1127 [005] d... 936.409211: sys_lgetxattr(pathname: 0, name: 1000, value: 3, size: 22)
lls-1127 [005] d... 936.409215: sys_lgetxattr -> 0xf76ff000
lls-1127 [005] d... 936.409223: sys_dup2(oldfd: 4d55ae9b, newfd: 4)
lls-1127 [005] d... 936.409228: sys_dup2 -> 0xfffffffffffffffe
lls-1127 [005] d... 936.409236: sys_newfstat(fd: 4d55b085, statbuf: 80000)
lls-1127 [005] d... 936.409242: sys_newfstat -> 0x3
lls-1127 [005] d... 936.409243: sys_removexattr(pathname: 3, name: ffcd0060)
lls-1127 [005] d... 936.409244: sys_removexattr -> 0x0
lls-1127 [005] d... 936.409245: sys_lgetxattr(pathname: 0, name: 19614, value: 1, size: 2)
lls-1127 [005] d... 936.409248: sys_lgetxattr -> 0xf76e5000
lls-1127 [005] d... 936.409248: sys_newlstat(filename: 3, statbuf: 19614)
lls-1127 [005] d... 936.409249: sys_newlstat -> 0x0
lls-1127 [005] d... 936.409262: sys_newfstat(fd: f76fb588, statbuf: 80000)
lls-1127 [005] d... 936.409279: sys_newfstat -> 0x3
lls-1127 [005] d... 936.409279: sys_close(fd: 3)
lls-1127 [005] d... 936.421550: sys_close -> 0x200
lls-1127 [005] d... 936.421558: sys_removexattr(pathname: 3, name: ffcd00d0)
lls-1127 [005] d... 936.421560: sys_removexattr -> 0x0
lls-1127 [005] d... 936.421569: sys_lgetxattr(pathname: 4d564000, name: 1b1abc, value: 5, size: 802)
lls-1127 [005] d... 936.421574: sys_lgetxattr -> 0x4d564000
lls-1127 [005] d... 936.421575: sys_capget(header: 4d70f000, dataptr: 1000)
lls-1127 [005] d... 936.421580: sys_capget -> 0x0
lls-1127 [005] d... 936.421580: sys_lgetxattr(pathname: 4d710000, name: 3000, value: 3, size: 812)
lls-1127 [005] d... 936.421589: sys_lgetxattr -> 0x4d710000
lls-1127 [005] d... 936.426130: sys_lgetxattr(pathname: 4d713000, name: 2abc, value: 3, size: 32)
lls-1127 [005] d... 936.426141: sys_lgetxattr -> 0x4d713000
lls-1127 [005] d... 936.426145: sys_newlstat(filename: 3, statbuf: f76ff3f0)
lls-1127 [005] d... 936.426146: sys_newlstat -> 0x0
lls-1127 [005] d... 936.431748: sys_lgetxattr(pathname: 0, name: 1000, value: 3, size: 22)

Obviously I'm not calling newfstat with a fd of 4d55b085. The calls are
obviously incorrect, and confusing.

Other efforts have been made to fix this:

https://lkml.org/lkml/2012/3/26/367

But the real solution is to rewrite the syscall internals and come up
with a fixed solution. One that doesn't require all the kluge that the
current solution has.

Thus for now, instead of outputting incorrect data, simply ignore them.
With this patch the changes now have:

#> grep lls /debug/tracing/trace
#>

Compat system calls simply are not traced. If users need compat
syscalls, then they should just use the raw syscall tracepoints.

For an architecture to make their compat syscalls ignored, it must
define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS (done in asm/ftrace.h) and also
define an arch_trace_is_compat_syscall() function that will return true
if the current task should ignore tracing the syscall.

I want to stress that this change does not affect actual syscalls in any
way, shape or form. It is only used within the tracing system and
doesn't interfere with the syscall logic at all. The changes are
consolidated nicely into trace_syscalls.c and asm/ftrace.h.

I had to make one small modification to asm/thread_info.h and that was
to remove the include of asm/ftrace.h. As asm/ftrace.h required the
current_thread_info() it was causing include hell. That include was
added back in 2008 when the function graph tracer was added:

commit caf4b323 "tracing, x86: add low level support for ftrace return tracing"

It does not need to be included there.

Link: http://lkml.kernel.org/r/1360703939.21867.99.camel@gandalf.local.homeAcked-by: NH. Peter Anvin <hpa@zytor.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

f431b634

kernel/pid.c: reenable interrupts when alloc_pid() fails because init has exited · 6e666884

由 Eric W. Biederman 提交于 2月 12, 2013

We're forgetting to reenable local interrupts on an error path.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Reported-by: NJosh Boyer <jwboyer@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6e666884

clockevents: Fix generic broadcast for FEAT_C3STOP · 5d1d9a29

由 Mark Rutland 提交于 2月 08, 2013

Commit 12ad1000: "clockevents: Add generic timer broadcast function"
made tick_device_uses_broadcast set up the generic broadcast function
for dummy devices (where !tick_device_is_functional(dev)), but neglected
to set up the broadcast function for devices that stop in low power
states (with the CLOCK_EVT_FEAT_C3STOP flag).

When these devices enter low power states they will not have the generic
broadcast function assigned, and will bring down the system when an
attempt is made to broadcast to them.

This patch ensures that the broadcast function is also assigned for
devices which require broadcast in low power states.
Reported-by: NStephen Warren <swarren@nvidia.com>
Signed-off-by: NMark Rutland <mark.rutland@arm.com>
Tested-by: NStephen Warren <swarren@nvidia.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: nico@linaro.org
Cc: Marc.Zyngier@arm.com
Cc: Will.Deacon@arm.com
Cc: santosh.shilimkar@ti.com
Cc: john.stultz@linaro.org
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>

5d1d9a29

10 2月, 2013 1 次提交

suspend: enable freeze timeout configuration through sys · 957d1282

由 Li Fei 提交于 2月 01, 2013

At present, the value of timeout for freezing is 20s, which is
meaningless in case that one thread is frozen with mutex locked
and another thread is trying to lock the mutex, as this time of
freezing will fail unavoidably.
And if there is no new wakeup event registered, the system will
waste at most 20s for such meaningless trying of freezing.

With this patch, the value of timeout can be configured to smaller
value, so such meaningless trying of freezing will be aborted in
earlier time, and later freezing can be also triggered in earlier
time. And more power will be saved.
In normal case on mobile phone, it costs real little time to freeze
processes. On some platform, it only costs about 20ms to freeze
user space processes and 10ms to freeze kernel freezable threads.
Signed-off-by: NLiu Chuansheng <chuansheng.liu@intel.com>
Signed-off-by: NLi Fei <fei.li@intel.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

957d1282

OpenHarmony / kernel_linux 上一次同步 4 年多

OpenHarmony / kernel_linux
上一次同步 4 年多