提交 · 6e4664525b1db28f8c4e1130957f70a94c19213e · openanolis / cloud-kernel

29 8月, 2013 3 次提交

cgroup: fix rmdir EBUSY regression in 3.11 · bb78a92f

由 Hugh Dickins 提交于 8月 28, 2013

On 3.11-rc we are seeing cgroup directories left behind when they should
have been removed.  Here's a trivial reproducer:

cd /sys/fs/cgroup/memory
mkdir parent parent/child; rmdir parent/child parent
rmdir: failed to remove `parent': Device or resource busy

It's because cgroup_destroy_locked() (step 1 of destruction) leaves
cgroup on parent's children list, letting cgroup_offline_fn() (step 2 of
destruction) remove it; but step 2 is run by work queue, which may not
yet have removed the children when parent destruction checks the list.

Fix that by checking through a non-empty list of children: if every one
of them has already been marked CGRP_DEAD, then it's safe to proceed:
those children are invisible to userspace, and should not obstruct rmdir.

(I didn't see any reason to keep the cgrp->children checks under the
unrelated css_set_lock, so moved them out.)

tj: Flattened nested ifs a bit and updated comment so that it's
    correct on both for-3.11-fixes and for-3.12.
Signed-off-by: NHugh Dickins <hughd@google.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

bb78a92f

workqueue: cond_resched() after processing each work item · b22ce278

由 Tejun Heo 提交于 8月 28, 2013

If !PREEMPT, a kworker running work items back to back can hog CPU.
This becomes dangerous when a self-requeueing work item which is
waiting for something to happen races against stop_machine.  Such
self-requeueing work item would requeue itself indefinitely hogging
the kworker and CPU it's running on while stop_machine would wait for
that CPU to enter stop_machine while preventing anything else from
happening on all other CPUs.  The two would deadlock.

Jamie Liu reports that this deadlock scenario exists around
scsi_requeue_run_queue() and libata port multiplier support, where one
port may exclude command processing from other ports.  With the right
timing, scsi_requeue_run_queue() can end up requeueing itself trying
to execute an IO which is asked to be retried while another device has
an exclusive access, which in turn can't make forward progress due to
stop_machine.

Fix it by invoking cond_resched() after executing each work item.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reported-by: NJamie Liu <jamieliu@google.com>
References: http://thread.gmane.org/gmane.linux.kernel/1552567
Cc: stable@vger.kernel.org
--
 kernel/workqueue.c |    9 +++++++++
 1 file changed, 9 insertions(+)

b22ce278

timer_list: correct the iterator for timer_list · 84a78a65

由 Nathan Zimmer 提交于 8月 28, 2013

Correct an issue with /proc/timer_list reported by Holger.

When reading from the proc file with a sufficiently small buffer, 2k so
not really that small, there was one could get hung trying to read the
file a chunk at a time.

The timer_list_start function failed to account for the possibility that
the offset was adjusted outside the timer_list_next.
Signed-off-by: NNathan Zimmer <nzimmer@sgi.com>
Reported-by: NHolger Hans Peter Freyther <holger@freyther.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Berke Durak <berke.durak@xiphos.com>
Cc: Jeff Layton <jlayton@redhat.com>
Tested-by: NAl Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org> # 3.10.x
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

84a78a65

28 8月, 2013 1 次提交

Rename nsproxy.pid_ns to nsproxy.pid_ns_for_children · c2b1df2e

由 Andy Lutomirski 提交于 8月 22, 2013

nsproxy.pid_ns is *not* the task's pid namespace.  The name should clarify
that.

This makes it more obvious that setns on a pid namespace is weird --
it won't change the pid namespace shown in procfs.
Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
Reviewed-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c2b1df2e

21 8月, 2013 1 次提交

cpuset: fix a regression in validating config change · 1c09b195

由 Li Zefan 提交于 8月 21, 2013

It's not allowed to clear masks of a cpuset if there're tasks in it,
but it's broken:

  # mkdir /cgroup/sub
  # echo 0 > /cgroup/sub/cpuset.cpus
  # echo 0 > /cgroup/sub/cpuset.mems
  # echo $$ > /cgroup/sub/tasks
  # echo > /cgroup/sub/cpuset.cpus
  (should fail)

This bug was introduced by commit 88fa523b
("cpuset: allow to move tasks to empty cpusets").

tj: Dropped temp bool variables and nestes the conditionals directly.
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

1c09b195

20 8月, 2013 1 次提交

kernel: fix new kernel-doc warning in wait.c · 2203547f

由 Randy Dunlap 提交于 8月 18, 2013

Fix new kernel-doc warnings in kernel/wait.c:

Warning(kernel/wait.c:374): No description found for parameter 'p'
Warning(kernel/wait.c:374): Excess function parameter 'word' description in 'wake_up_atomic_t'
Warning(kernel/wait.c:374): Excess function parameter 'bit' description in 'wake_up_atomic_t'
Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2203547f

14 8月, 2013 2 次提交

microblaze: fix clone syscall · dfa9771a

由 Michal Simek 提交于 8月 13, 2013

Fix inadvertent breakage in the clone syscall ABI for Microblaze that
was introduced in commit f3268edb ("microblaze: switch to generic
fork/vfork/clone").

The Microblaze syscall ABI for clone takes the parent tid address in the
4th argument; the third argument slot is used for the stack size.  The
incorrectly-used CLONE_BACKWARDS type assigned parent tid to the 3rd
slot.

This commit restores the original ABI so that existing userspace libc
code will work correctly.

All kernel versions from v3.8-rc1 were affected.
Signed-off-by: NMichal Simek <michal.simek@xilinx.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

dfa9771a

PM / QoS: Fix workqueue deadlock when using pm_qos_update_request_timeout() · 40fea92f

由 Stephen Boyd 提交于 8月 13, 2013

pm_qos_update_request_timeout() updates a qos and then schedules
a delayed work item to bring the qos back down to the default
after the timeout. When the work item runs, pm_qos_work_fn() will
call pm_qos_update_request() and deadlock because it tries to
cancel itself via cancel_delayed_work_sync(). Future callers of
that qos will also hang waiting to cancel the work that is
canceling itself. Let's extract the little bit of code that does
the real work of pm_qos_update_request() and call it from the
work function so that we don't deadlock.

Before ed1ac6e9 (PM: don't use [delayed_]work_pending()) this didn't
happen because the work function wouldn't try to cancel itself.
Signed-off-by: NStephen Boyd <sboyd@codeaurora.org>
Reviewed-by: NTejun Heo <tj@kernel.org>
Cc: 3.9+ <stable@vger.kernel.org> # 3.9+
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

40fea92f

13 8月, 2013 2 次提交

sched: fix the theoretical signal_wake_up() vs schedule() race · e0acd0a6

由 Oleg Nesterov 提交于 8月 12, 2013

This is only theoretical, but after try_to_wake_up(p) was changed
to check p->state under p->pi_lock the code like

	__set_current_state(TASK_INTERRUPTIBLE);
	schedule();

can miss a signal. This is the special case of wait-for-condition,
it relies on try_to_wake_up/schedule interaction and thus it does
not need mb() between __set_current_state() and if(signal_pending).

However, this __set_current_state() can move into the critical
section protected by rq->lock, now that try_to_wake_up() takes
another lock we need to ensure that it can't be reordered with
"if (signal_pending(current))" check inside that section.

The patch is actually one-liner, it simply adds smp_wmb() before
spin_lock_irq(rq->lock). This is what try_to_wake_up() already
does by the same reason.

We turn this wmb() into the new helper, smp_mb__before_spinlock(),
for better documentation and to allow the architectures to change
the default implementation.

While at it, kill smp_mb__after_lock(), it has no callers.

Perhaps we can also add smp_mb__before/after_spinunlock() for
prepare_to_wait().
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e0acd0a6

cpuset: fix the return value of cpuset_write_u64() · a903f086

由 Li Zefan 提交于 8月 13, 2013

Writing to this file always returns -ENODEV:

  # echo 1 > cpuset.memory_pressure_enabled
  -bash: echo: write error: No such device
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Cc: <stable@vger.kernel.org> # 3.9+
Signed-off-by: NTejun Heo <tj@kernel.org>

a903f086

09 8月, 2013 1 次提交

userns: limit the maximum depth of user_namespace->parent chain · 8742f229

由 Oleg Nesterov 提交于 8月 08, 2013

Ensure that user_namespace->parent chain can't grow too much.
Currently we use the hardroded 32 as limit.
Reported-by: NAndy Lutomirski <luto@amacapital.net>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8742f229

07 8月, 2013 3 次提交

printk: Fix return of braille_register_console() · 2cfe6c4a

由 Steven Rostedt 提交于 8月 05, 2013

Some of my configs I test with have CONFIG_A11Y_BRAILLE_CONSOLE set.
When I started testing against v3.11-rc4 my console went bonkers.  Using
ktest to bisect the issue, it came down to:

commit bbeddf52 "printk: move braille console support into separate
braille.[ch] files"

Looking into the patch I found the problem.  It's with the return of
braille_register_console().  As anything other than NULL is considered a
failure.

But for those of us that have CONFIG_A11Y_BRAILLE_CONSOLE set but do not
define a "brl" or "brl=" on the command line, we still may want a
console that those with sight can still use.

Return NULL (success) if "brl" or "brl=" is not on the console line.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
Acked-by: NJoe Perches <joe@perches.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2cfe6c4a

Revert "ptrace: PTRACE_DETACH should do flush_ptrace_hw_breakpoint(child)" · 35114fcb

由 Oleg Nesterov 提交于 8月 06, 2013

This reverts commit fab840fc.

This commit even has the test-case to prove that the tracee
can be killed by SIGTRAP if the debugger does not remove the
breakpoints before PTRACE_DETACH.

However, this is exactly what wineserver deliberately does,
set_thread_context() calls PTRACE_ATTACH + PTRACE_DETACH just
for PTRACE_POKEUSER(DR*) in between.

So we should revert this fix and document that PTRACE_DETACH
should keep the breakpoints.
Reported-by: NFelipe Contreras <felipe.contreras@gmail.com>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

35114fcb

userns: unshare_userns(&cred) should not populate cred on failure · 6160968c

由 Oleg Nesterov 提交于 8月 06, 2013

unshare_userns(new_cred) does *new_cred = prepare_creds() before
create_user_ns() which can fail. However, the caller expects that
it doesn't need to take care of new_cred if unshare_userns() fails.

We could change the single caller, sys_unshare(), but I think it
would be more clean to avoid the side effects on failure, so with
this patch unshare_userns() does put_cred() itself and initializes
*new_cred only if create_user_ns() succeeeds.

Cc: stable@vger.kernel.org
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Reviewed-by: NAndy Lutomirski <luto@amacapital.net>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6160968c

03 8月, 2013 3 次提交

tracing: Fix reset of time stamps during trace_clock changes · 9457158b

由 Alexander Z Lam 提交于 8月 02, 2013

Fixed two issues with changing the timestamp clock with trace_clock:

 - The global buffer was reset on instance clock changes. Change this to pass
   the correct per-instance buffer
 - ftrace_now() is used to set buf->time_start in tracing_reset_online_cpus().
   This was incorrect because ftrace_now() used the global buffer's clock to
   return the current time. Change this to use buffer_ftrace_now() which
   returns the current time for the correct per-instance buffer.

Also removed tracing_reset_current() because it is not used anywhere

Link: http://lkml.kernel.org/r/1375493777-17261-2-git-send-email-azl@google.com

Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Cc: David Sharp <dhsharp@google.com>
Cc: Alexander Z Lam <lambchop468@gmail.com>
Cc: stable@vger.kernel.org # 3.10
Signed-off-by: NAlexander Z Lam <azl@google.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

9457158b

tracing: Make TRACE_ITER_STOP_ON_FREE stop the correct buffer · 711e1243

由 Alexander Z Lam 提交于 8月 02, 2013

Releasing the free_buffer file in an instance causes the global buffer
to be stopped when TRACE_ITER_STOP_ON_FREE is enabled. Operate on the
correct buffer.

Link: http://lkml.kernel.org/r/1375493777-17261-1-git-send-email-azl@google.com

Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Cc: David Sharp <dhsharp@google.com>
Cc: Alexander Z Lam <lambchop468@gmail.com>
Cc: stable@vger.kernel.org # 3.10
Signed-off-by: NAlexander Z Lam <azl@google.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

711e1243

tracing: Fix fields of struct trace_iterator that are zeroed by mistake · ed5467da

由 Andrew Vagin 提交于 8月 02, 2013

tracing_read_pipe zeros all fields bellow "seq". The declaration contains
a comment about that, but it doesn't help.

The first field is "snapshot", it's true when current open file is
snapshot. Looks obvious, that it should not be zeroed.

The second field is "started". It was converted from cpumask_t to
cpumask_var_t (v2.6.28-4983-g4462344e), in other words it was
converted from cpumask to pointer on cpumask.

Currently the reference on "started" memory is lost after the first read
from tracing_read_pipe and a proper object will never be freed.

The "started" is never dereferenced for trace_pipe, because trace_pipe
can't have the TRACE_FILE_ANNOTATE options.

Link: http://lkml.kernel.org/r/1375463803-3085183-1-git-send-email-avagin@openvz.org

Cc: stable@vger.kernel.org # 2.6.30
Signed-off-by: NAndrew Vagin <avagin@openvz.org>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

ed5467da

02 8月, 2013 1 次提交

tracing/uprobes: Fail to unregister if probe event files are in use · c6c2401d

由 Steven Rostedt (Red Hat) 提交于 7月 03, 2013

Uprobes suffer the same problem that kprobes have. There's a race between
writing to the "enable" file and removing the probe. The probe checks for
it being in use and if it is not, goes about deleting the probe and the
event that represents it. But the problem with that is, after it checks
if it is in use it can be enabled, and the deletion of the event (access
to the probe) will fail, as it is in use. But the uprobe will still be
deleted. This is a problem as the event can reference the uprobe that
was deleted.

The fix is to remove the event first, and check to make sure the event
removal succeeds. Then it is safe to remove the probe.

When the event exists, either ftrace or perf can enable the probe and
prevent the event from being removed.

Link: http://lkml.kernel.org/r/20130704034038.991525256@goodmis.orgAcked-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

c6c2401d

01 8月, 2013 10 次提交

workqueue: copy workqueue_attrs with all fields · 2865a8fb

由 Shaohua Li 提交于 8月 01, 2013

 $echo '0' > /sys/bus/workqueue/devices/xxx/numa
 $cat /sys/bus/workqueue/devices/xxx/numa

I got 1. It should be 0, the reason is copy_workqueue_attrs() called
in apply_workqueue_attrs() doesn't copy no_numa field.

Fix it by making copy_workqueue_attrs() copy ->no_numa too.  This
would also make get_unbound_pool() set a pool's ->no_numa attribute
according to the workqueue attributes used when the pool was created.
While harmelss, as ->no_numa isn't a pool attribute, this is a bit
confusing.  Clear it explicitly.

tj: Updated description and comments a bit.
Signed-off-by: NShaohua Li <shli@fusionio.com>
Signed-off-by: NTejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org

2865a8fb

tracing/kprobes: Fail to unregister if probe event files are in use · 40c32592

由 Steven Rostedt (Red Hat) 提交于 7月 03, 2013

When a probe is being removed, it cleans up the event files that correspond
to the probe. But there is a race between writing to one of these files
and deleting the probe. This is especially true for the "enable" file.

	CPU 0				CPU 1
	-----				-----

				  fd = open("enable",O_WRONLY);

  probes_open()
  release_all_trace_probes()
  unregister_trace_probe()
  if (trace_probe_is_enabled(tp))
	return -EBUSY

				   write(fd, "1", 1)
				   __ftrace_set_clr_event()
				   call->class->reg()
				    (kprobe_register)
				     enable_trace_probe(tp)

  __unregister_trace_probe(tp);
  list_del(&tp->list)
  unregister_probe_event(tp) <-- fails!
  free_trace_probe(tp)

				   write(fd, "0", 1)
				   __ftrace_set_clr_event()
				   call->class->unreg
				    (kprobe_register)
				    disable_trace_probe(tp) <-- BOOM!

A test program was written that used two threads to simulate the
above scenario adding a nanosleep() interval to change the timings
and after several thousand runs, it was able to trigger this bug
and crash:

BUG: unable to handle kernel paging request at 00000005000000f9
IP: [<ffffffff810dee70>] probes_open+0x3b/0xa7
PGD 7808a067 PUD 0
Oops: 0000 [#1] PREEMPT SMP
Dumping ftrace buffer:
---------------------------------
Modules linked in: ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6
CPU: 1 PID: 2070 Comm: test-kprobe-rem Not tainted 3.11.0-rc3-test+ #47
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
task: ffff880077756440 ti: ffff880076e52000 task.ti: ffff880076e52000
RIP: 0010:[<ffffffff810dee70>]  [<ffffffff810dee70>] probes_open+0x3b/0xa7
RSP: 0018:ffff880076e53c38  EFLAGS: 00010203
RAX: 0000000500000001 RBX: ffff88007844f440 RCX: 0000000000000003
RDX: 0000000000000003 RSI: 0000000000000003 RDI: ffff880076e52000
RBP: ffff880076e53c58 R08: ffff880076e53bd8 R09: 0000000000000000
R10: ffff880077756440 R11: 0000000000000006 R12: ffffffff810dee35
R13: ffff880079250418 R14: 0000000000000000 R15: ffff88007844f450
FS:  00007f87a276f700(0000) GS:ffff88007d480000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000005000000f9 CR3: 0000000077262000 CR4: 00000000000007e0
Stack:
 ffff880076e53c58 ffffffff81219ea0 ffff88007844f440 ffffffff810dee35
 ffff880076e53ca8 ffffffff81130f78 ffff8800772986c0 ffff8800796f93a0
 ffffffff81d1b5d8 ffff880076e53e04 0000000000000000 ffff88007844f440
Call Trace:
 [<ffffffff81219ea0>] ? security_file_open+0x2c/0x30
 [<ffffffff810dee35>] ? unregister_trace_probe+0x4b/0x4b
 [<ffffffff81130f78>] do_dentry_open+0x162/0x226
 [<ffffffff81131186>] finish_open+0x46/0x54
 [<ffffffff8113f30b>] do_last+0x7f6/0x996
 [<ffffffff8113cc6f>] ? inode_permission+0x42/0x44
 [<ffffffff8113f6dd>] path_openat+0x232/0x496
 [<ffffffff8113fc30>] do_filp_open+0x3a/0x8a
 [<ffffffff8114ab32>] ? __alloc_fd+0x168/0x17a
 [<ffffffff81131f4e>] do_sys_open+0x70/0x102
 [<ffffffff8108f06e>] ? trace_hardirqs_on_caller+0x160/0x197
 [<ffffffff81131ffe>] SyS_open+0x1e/0x20
 [<ffffffff81522742>] system_call_fastpath+0x16/0x1b
Code: e5 41 54 53 48 89 f3 48 83 ec 10 48 23 56 78 48 39 c2 75 6c 31 f6 48 c7
RIP  [<ffffffff810dee70>] probes_open+0x3b/0xa7
 RSP <ffff880076e53c38>
CR2: 00000005000000f9
---[ end trace 35f17d68fc569897 ]---

The unregister_trace_probe() must be done first, and if it fails it must
fail the removal of the kprobe.

Several changes have already been made by Oleg Nesterov and Masami Hiramatsu
to allow moving the unregister_probe_event() before the removal of
the probe and exit the function if it fails. This prevents the tp
structure from being used after it is freed.

Link: http://lkml.kernel.org/r/20130704034038.819592356@goodmis.orgAcked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

40c32592

printk: rename struct log to struct printk_log · 62e32ac3

由 Joe Perches 提交于 7月 31, 2013

Rename the struct to enable moving portions of
printk.c to separate files.

The rename changes output of /proc/vmcoreinfo.
Signed-off-by: NJoe Perches <joe@perches.com>
Cc: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Ming Lei <ming.lei@canonical.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

62e32ac3

printk: use pointer for console_cmdline indexing · 23475408

由 Joe Perches 提交于 7月 31, 2013

Make the code a bit more compact by always using a pointer for the active
console_cmdline.

Move overly indented code to correct indent level.
Signed-off-by: NJoe Perches <joe@perches.com>
Cc: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Ming Lei <ming.lei@canonical.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

23475408

printk: move braille console support into separate braille.[ch] files · bbeddf52

由 Joe Perches 提交于 7月 31, 2013

Create files with prototypes and static inlines for braille support.  Make
braille_console functions return 1 on success.

Corrected CONFIG_A11Y_BRAILLE_CONSOLE=n _braille_console_setup
return value to NULL.
Signed-off-by: NJoe Perches <joe@perches.com>
Reviewed-by: NSamuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Ming Lei <ming.lei@canonical.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bbeddf52

printk: add console_cmdline.h · d197c43d

由 Joe Perches 提交于 7月 31, 2013

Add an include file for the console_cmdline struct so that the braille
console driver can be separated.
Signed-off-by: NJoe Perches <joe@perches.com>
Cc: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Ming Lei <ming.lei@canonical.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d197c43d

printk: move to separate directory for easier modification · b9ee979e

由 Joe Perches 提交于 7月 31, 2013

Make it easier to break up printk into bite-sized chunks.

Remove printk path/filename from comment.
Signed-off-by: NJoe Perches <joe@perches.com>
Cc: Samuel Thibault <samuel.thibault@ens-lyon.org>
Cc: Ming Lei <ming.lei@canonical.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b9ee979e

mm: sched: numa: fix NUMA balancing when !SCHED_DEBUG · 10e84b97

由 Dave Kleikamp 提交于 7月 31, 2013

Commit 3105b86a ("mm: sched: numa: Control enabling and disabling of
NUMA balancing if !SCHED_DEBUG") defined numabalancing_enabled to
control the enabling and disabling of automatic NUMA balancing, but it
is never used.

I believe the intention was to use this in place of sched_feat_numa(NUMA).

Currently, if SCHED_DEBUG is not defined, sched_feat_numa(NUMA) will
never be changed from the initial "false".
Signed-off-by: NDave Kleikamp <dave.kleikamp@oracle.com>
Acked-by: NMel Gorman <mgorman@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

10e84b97

tracing: Add comment to describe special break case in probe_remove_event_call() · 2ba64035

由 Steven Rostedt (Red Hat) 提交于 7月 31, 2013

The "break" used in the do_for_each_event_file() is used as an optimization
as the loop is really a double loop. The loop searches all event files
for each trace_array. There's only one matching event file per trace_array
and after we find the event file for the trace_array, the break is used
to jump to the next trace_array and start the search there.

As this is not a standard way of using "break" in C code, it requires
a comment right before the break to let people know what is going on.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

2ba64035

tracing: trace_remove_event_call() should fail if call/file is in use · 2816c551

由 Oleg Nesterov 提交于 7月 29, 2013

Change trace_remove_event_call(call) to return the error if this
call is active. This is what the callers assume but can't verify
outside of the tracing locks. Both trace_kprobe.c/trace_uprobe.c
need the additional changes, unregister_trace_probe() should abort
if trace_remove_event_call() fails.

The caller is going to free this call/file so we must ensure that
nobody can use them after trace_remove_event_call() succeeds.
debugfs should be fine after the previous changes and event_remove()
does TRACE_REG_UNREGISTER, but still there are 2 reasons why we need
the additional checks:

- There could be a perf_event(s) attached to this tp_event, so the
  patch checks ->perf_refcount.

- TRACE_REG_UNREGISTER can be suppressed by FTRACE_EVENT_FL_SOFT_MODE,
  so we simply check FTRACE_EVENT_FL_ENABLED protected by event_mutex.

Link: http://lkml.kernel.org/r/20130729175033.GB26284@redhat.comReviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

2816c551

31 7月, 2013 4 次提交

cgroup: fix a leak when percpu_ref_init() fails · da0a12ca

由 Li Zefan 提交于 7月 31, 2013

ss->css_free() is not called when perfcpu_ref_init() fails.
Signed-off-by: NLi Zefan <lizefan@huawei.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

da0a12ca

ftrace: Check module functions being traced on reload · 8c4f3c3f

由 Steven Rostedt (Red Hat) 提交于 7月 30, 2013

There's been a nasty bug that would show up and not give much info.
The bug displayed the following warning:

 WARNING: at kernel/trace/ftrace.c:1529 __ftrace_hash_rec_update+0x1e3/0x230()
 Pid: 20903, comm: bash Tainted: G           O 3.6.11+ #38405.trunk
 Call Trace:
  [<ffffffff8103e5ff>] warn_slowpath_common+0x7f/0xc0
  [<ffffffff8103e65a>] warn_slowpath_null+0x1a/0x20
  [<ffffffff810c2ee3>] __ftrace_hash_rec_update+0x1e3/0x230
  [<ffffffff810c4f28>] ftrace_hash_move+0x28/0x1d0
  [<ffffffff811401cc>] ? kfree+0x2c/0x110
  [<ffffffff810c68ee>] ftrace_regex_release+0x8e/0x150
  [<ffffffff81149f1e>] __fput+0xae/0x220
  [<ffffffff8114a09e>] ____fput+0xe/0x10
  [<ffffffff8105fa22>] task_work_run+0x72/0x90
  [<ffffffff810028ec>] do_notify_resume+0x6c/0xc0
  [<ffffffff8126596e>] ? trace_hardirqs_on_thunk+0x3a/0x3c
  [<ffffffff815c0f88>] int_signal+0x12/0x17
 ---[ end trace 793179526ee09b2c ]---

It was finally narrowed down to unloading a module that was being traced.

It was actually more than that. When functions are being traced, there's
a table of all functions that have a ref count of the number of active
tracers attached to that function. When a function trace callback is
registered to a function, the function's record ref count is incremented.
When it is unregistered, the function's record ref count is decremented.
If an inconsistency is detected (ref count goes below zero) the above
warning is shown and the function tracing is permanently disabled until
reboot.

The ftrace callback ops holds a hash of functions that it filters on
(and/or filters off). If the hash is empty, the default means to filter
all functions (for the filter_hash) or to disable no functions (for the
notrace_hash).

When a module is unloaded, it frees the function records that represent
the module functions. These records exist on their own pages, that is
function records for one module will not exist on the same page as
function records for other modules or even the core kernel.

Now when a module unloads, the records that represents its functions are
freed. When the module is loaded again, the records are recreated with
a default ref count of zero (unless there's a callback that traces all
functions, then they will also be traced, and the ref count will be
incremented).

The problem is that if an ftrace callback hash includes functions of the
module being unloaded, those hash entries will not be removed. If the
module is reloaded in the same location, the hash entries still point
to the functions of the module but the module's ref counts do not reflect
that.

With the help of Steve and Joern, we found a reproducer:

 Using uinput module and uinput_release function.

 cd /sys/kernel/debug/tracing
 modprobe uinput
 echo uinput_release > set_ftrace_filter
 echo function > current_tracer
 rmmod uinput
 modprobe uinput
 # check /proc/modules to see if loaded in same addr, otherwise try again
 echo nop > current_tracer

 [BOOM]

The above loads the uinput module, which creates a table of functions that
can be traced within the module.

We add uinput_release to the filter_hash to trace just that function.

Enable function tracincg, which increments the ref count of the record
associated to uinput_release.

Remove uinput, which frees the records including the one that represents
uinput_release.

Load the uinput module again (and make sure it's at the same address).
This recreates the function records all with a ref count of zero,
including uinput_release.

Disable function tracing, which will decrement the ref count for uinput_release
which is now zero because of the module removal and reload, and we have
a mismatch (below zero ref count).

The solution is to check all currently tracing ftrace callbacks to see if any
are tracing any of the module's functions when a module is loaded (it already does
that with callbacks that trace all functions). If a callback happens to have
a module function being traced, it increments that records ref count and starts
tracing that function.

There may be a strange side effect with this, where tracing module functions
on unload and then reloading a new module may have that new module's functions
being traced. This may be something that confuses the user, but it's not
a big deal. Another approach is to disable all callback hashes on module unload,
but this leaves some ftrace callbacks that may not be registered, but can
still have hashes tracing the module's function where ftrace doesn't know about
it. That situation can cause the same bug. This solution solves that case too.
Another benefit of this solution, is it is possible to trace a module's
function on unload and load.

Link: http://lkml.kernel.org/r/20130705142629.GA325@redhat.comReported-by: NJörn Engel <joern@logfs.org>
Reported-by: NDave Jones <davej@redhat.com>
Reported-by: NSteve Hodgson <steve@purestorage.com>
Tested-by: NSteve Hodgson <steve@purestorage.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

8c4f3c3f

mutex: Fix w/w mutex deadlock injection · 85f48961

由 Maarten Lankhorst 提交于 7月 30, 2013

The check needs to be for > 1, because ctx->acquired is already incremented.
This will prevent ww_mutex_lock_slow from returning -EDEADLK and not locking
the mutex. It caused a lot of false gpu lockups on radeon with
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y because a function that shouldn't be able
to return -EDEADLK did.
Signed-off-by: NMaarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/51F775B5.201@canonical.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

85f48961

sched: Ensure update_cfs_shares() is called for parents of continuously-running tasks · bf0bd948

由 Peter Zijlstra 提交于 7月 26, 2013

We typically update a task_group's shares within the dequeue/enqueue
path.  However, continuously running tasks sharing a CPU are not
subject to these updates as they are only put/picked.  Unfortunately,
when we reverted f269ae04 (in 17bc14b7), we lost the augmenting
periodic update that was supposed to account for this; resulting in a
potential loss of fairness.

To fix this, re-introduce the explicit update in
update_cfs_rq_blocked_load() [called via entity_tick()].
Reported-by: NMax Hailperin <max@gustavus.edu>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Reviewed-by: NPaul Turner <pjt@google.com>
Link: http://lkml.kernel.org/n/tip-9545m3apw5d93ubyrotrj31y@git.kernel.org
Cc: <stable@kernel.org>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

bf0bd948

30 7月, 2013 8 次提交

freezer: set PF_SUSPEND_TASK flag on tasks that call freeze_processes · 2b44c4db

由 Colin Cross 提交于 7月 24, 2013

Calling freeze_processes sets a global flag that will cause any
process that calls try_to_freeze to enter the refrigerator.  It
skips sending a signal to the current task, but if the current
task ever hits try_to_freeze, all threads will be frozen and the
system will deadlock.

Set a new flag, PF_SUSPEND_TASK, on the task that calls
freeze_processes.  The flag notifies the freezer that the thread
is involved in suspend and should not be frozen.  Also add a
WARN_ON in thaw_processes if the caller does not have the
PF_SUSPEND_TASK flag set to catch if a different task calls
thaw_processes than the one that called freeze_processes, leaving
a task with PF_SUSPEND_TASK permanently set on it.

Threads that spawn off a task with PF_SUSPEND_TASK set (which
swsusp does) will also have PF_SUSPEND_TASK set, preventing them
from freezing while they are helping with suspend, but they need
to be dead by the time suspend is triggered, otherwise they may
run when userspace is expected to be frozen.  Add a WARN_ON in
thaw_processes if more than one thread has the PF_SUSPEND_TASK
flag set.
Reported-and-tested-by: NMichael Leun <lkml20130126@newton.leun.net>
Signed-off-by: NColin Cross <ccross@android.com>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

2b44c4db

ftrace: Consolidate some duplicate code for updating ftrace ops · 1c80c432

由 Steven Rostedt (Red Hat) 提交于 7月 25, 2013

When ftrace ops modifies the functions that it will trace, the update
to the function mcount callers may need to be modified. Consolidate
the two places that do the checks to see if an update is required
with a wrapper function for those checks.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

1c80c432

tracing: Change remove_event_file_dir() to clear "d_subdirs"->i_private · bf682c31

由 Oleg Nesterov 提交于 7月 28, 2013

Change remove_event_file_dir() to clear ->i_private for every
file we are going to remove.

We need to check file->dir != NULL because event_create_dir()
can fail. debugfs_remove_recursive(NULL) is fine but the patch
moves it under the same check anyway for readability.

spin_lock(d_lock) and "d_inode != NULL" check are not needed
afaics, but I do not understand this code enough.

tracing_open_generic_file() and tracing_release_generic_file()
can go away, ftrace_enable_fops and ftrace_event_filter_fops()
use tracing_open_generic() but only to check tracing_disabled.

This fixes all races with event_remove() or instance_delete().
f_op->read/write/whatever can never use the freed file/call,
all event/* files were changed to check and use ->i_private
under event_mutex.

Note: this doesn't not fix other problems, event_remove() can
destroy the active ftrace_event_call, we need more changes but
those changes are completely orthogonal.

Link: http://lkml.kernel.org/r/20130728183527.GB16723@redhat.comReviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

bf682c31

tracing: Introduce remove_event_file_dir() · f6a84bdc

由 Oleg Nesterov 提交于 7月 26, 2013

Preparation for the next patch. Extract the common code from
remove_event_from_tracers() and __trace_remove_event_dirs()
into the new helper, remove_event_file_dir().

The patch looks more complicated than it actually is, it also
moves remove_subsystem() up to avoid the forward declaration.

Link: http://lkml.kernel.org/r/20130726172547.GA3629@redhat.comReviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

f6a84bdc

tracing: Change f_start() to take event_mutex and verify i_private != NULL · c5a44a12

由 Oleg Nesterov 提交于 7月 26, 2013

trace_format_open() and trace_format_seq_ops are racy, nothing
protects ftrace_event_call from trace_remove_event_call().

Change f_start() to take event_mutex and verify i_private != NULL,
change f_stop() to drop this lock.

This fixes nothing, but now we can change debugfs_remove("format")
callers to nullify ->i_private and fix the the problem.

Note: the usage of event_mutex is sub-optimal but simple, we can
change this later.

Link: http://lkml.kernel.org/r/20130726172543.GA3622@redhat.comReviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

c5a44a12

tracing: Change event_filter_read/write to verify i_private != NULL · e2912b09

由 Oleg Nesterov 提交于 7月 26, 2013

event_filter_read/write() are racy, ftrace_event_call can be already
freed by trace_remove_event_call() callers.

1. Shift mutex_lock(event_mutex) from print/apply_event_filter to
   the callers.

2. Change the callers, event_filter_read() and event_filter_write()
   to read i_private under this mutex and abort if it is NULL.

This fixes nothing, but now we can change debugfs_remove("filter")
callers to nullify ->i_private and fix the the problem.

Link: http://lkml.kernel.org/r/20130726172540.GA3619@redhat.comReviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

e2912b09

tracing: Change event_enable/disable_read() to verify i_private != NULL · bc6f6b08

由 Oleg Nesterov 提交于 7月 26, 2013

tracing_open_generic_file() is racy, ftrace_event_file can be
already freed by rmdir or trace_remove_event_call().

Change event_enable_read() and event_disable_read() to read and
verify "file = i_private" under event_mutex.

This fixes nothing, but now we can change debugfs_remove("enable")
callers to nullify ->i_private and fix the the problem.

Link: http://lkml.kernel.org/r/20130726172536.GA3612@redhat.comReviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

bc6f6b08

tracing: Turn event/id->i_private into call->event.type · 1a11126b

由 Oleg Nesterov 提交于 7月 26, 2013

event_id_read() is racy, ftrace_event_call can be already freed
by trace_remove_event_call() callers.

Change event_create_dir() to pass "data = call->event.type", this
is all event_id_read() needs. ftrace_event_id_fops no longer needs
tracing_open_generic().

We add the new helper, event_file_data(), to read ->i_private, it
will have more users.

Note: currently ACCESS_ONCE() and "id != 0" check are not needed,
but we are going to change event_remove/rmdir to clear ->i_private.

Link: http://lkml.kernel.org/r/20130726172532.GA3605@redhat.comReviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

1a11126b

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功