1. 16 4月, 2014 1 次提交
  2. 15 4月, 2014 1 次提交
    • M
      user namespace: fix incorrect memory barriers · e79323bd
      Mikulas Patocka 提交于
      smp_read_barrier_depends() can be used if there is data dependency between
      the readers - i.e. if the read operation after the barrier uses address
      that was obtained from the read operation before the barrier.
      
      In this file, there is only control dependency, no data dependecy, so the
      use of smp_read_barrier_depends() is incorrect. The code could fail in the
      following way:
      * the cpu predicts that idx < entries is true and starts executing the
        body of the for loop
      * the cpu fetches map->extent[0].first and map->extent[0].count
      * the cpu fetches map->nr_extents
      * the cpu verifies that idx < extents is true, so it commits the
        instructions in the body of the for loop
      
      The problem is that in this scenario, the cpu read map->extent[0].first
      and map->nr_extents in the wrong order. We need a full read memory barrier
      to prevent it.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e79323bd
  3. 13 4月, 2014 1 次提交
  4. 12 4月, 2014 1 次提交
  5. 11 4月, 2014 1 次提交
  6. 10 4月, 2014 1 次提交
  7. 09 4月, 2014 4 次提交
    • L
      futex: avoid race between requeue and wake · 69cd9eba
      Linus Torvalds 提交于
      Jan Stancek reported:
       "pthread_cond_broadcast/4-1.c testcase from openposix testsuite (LTP)
        occasionally fails, because some threads fail to wake up.
      
        Testcase creates 5 threads, which are all waiting on same condition.
        Main thread then calls pthread_cond_broadcast() without holding mutex,
        which calls:
      
            futex(uaddr1, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, uaddr2, ..)
      
        This immediately wakes up single thread A, which unlocks mutex and
        tries to wake up another thread:
      
            futex(uaddr2, FUTEX_WAKE_PRIVATE, 1)
      
        If thread A manages to call futex_wake() before any waiters are
        requeued for uaddr2, no other thread is woken up"
      
      The ordering constraints for the hash bucket waiter counting are that
      the waiter counts have to be incremented _before_ getting the spinlock
      (because the spinlock acts as part of the memory barrier), but the
      "requeue" operation didn't honor those rules, and nobody had even
      thought about that case.
      
      This fairly simple patch just increments the waiter count for the target
      hash bucket (hb2) when requeing a futex before taking the locks.  It
      then decrements them again after releasing the lock - the code that
      actually moves the futex(es) between hash buckets will do the additional
      required waiter count housekeeping.
      Reported-and-tested-by: NJan Stancek <jstancek@redhat.com>
      Acked-by: NDavidlohr Bueso <davidlohr@hp.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org # 3.14
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      69cd9eba
    • M
      tracepoint: Fix sparse warnings in tracepoint.c · b725dfea
      Mathieu Desnoyers 提交于
      Fix the following sparse warnings:
      
        CHECK   kernel/tracepoint.c
      kernel/tracepoint.c:184:18: warning: incorrect type in assignment (different address spaces)
      kernel/tracepoint.c:184:18:    expected struct tracepoint_func *tp_funcs
      kernel/tracepoint.c:184:18:    got struct tracepoint_func [noderef] <asn:4>*funcs
      kernel/tracepoint.c:216:18: warning: incorrect type in assignment (different address spaces)
      kernel/tracepoint.c:216:18:    expected struct tracepoint_func *tp_funcs
      kernel/tracepoint.c:216:18:    got struct tracepoint_func [noderef] <asn:4>*funcs
      kernel/tracepoint.c:392:24: error: return expression in void function
        CC      kernel/tracepoint.o
      kernel/tracepoint.c: In function tracepoint_module_going:
      kernel/tracepoint.c:491:6: warning: symbol 'syscall_regfunc' was not declared. Should it be static?
      kernel/tracepoint.c:508:6: warning: symbol 'syscall_unregfunc' was not declared. Should it be static?
      
      Link: http://lkml.kernel.org/r/1397049883-28692-1-git-send-email-mathieu.desnoyers@efficios.comSigned-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      b725dfea
    • S
      tracepoint: Simplify tracepoint module search · eb7d035c
      Steven Rostedt (Red Hat) 提交于
      Instead of copying the num_tracepoints and tracepoints_ptrs from
      the module structure to the tp_mod structure, which only uses it to
      find the module associated to tracepoints of modules that are coming
      and going, simply copy the pointer to the module struct to the tracepoint
      tp_module structure.
      
      Also removed un-needed brackets around an if statement.
      
      Link: http://lkml.kernel.org/r/20140408201705.4dad2c4a@gandalf.local.homeAcked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      eb7d035c
    • M
      tracepoint: Use struct pointer instead of name hash for reg/unreg tracepoints · de7b2973
      Mathieu Desnoyers 提交于
      Register/unregister tracepoint probes with struct tracepoint pointer
      rather than tracepoint name.
      
      This change, which vastly simplifies tracepoint.c, has been proposed by
      Steven Rostedt. It also removes 8.8kB (mostly of text) to the vmlinux
      size.
      
      From this point on, the tracers need to pass a struct tracepoint pointer
      to probe register/unregister. A probe can now only be connected to a
      tracepoint that exists. Moreover, tracers are responsible for
      unregistering the probe before the module containing its associated
      tracepoint is unloaded.
      
         text    data     bss     dec     hex filename
      10443444        4282528 10391552        25117524        17f4354 vmlinux.orig
      10434930        4282848 10391552        25109330        17f2352 vmlinux
      
      Link: http://lkml.kernel.org/r/1396992381-23785-2-git-send-email-mathieu.desnoyers@efficios.com
      
      CC: Ingo Molnar <mingo@kernel.org>
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      CC: Andrew Morton <akpm@linux-foundation.org>
      CC: Frank Ch. Eigler <fche@redhat.com>
      CC: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      [ SDR - fixed return val in void func in tracepoint_module_going() ]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      de7b2973
  8. 08 4月, 2014 21 次提交
    • J
      lglock: map to spinlock when !CONFIG_SMP · 64b47e8f
      Josh Triplett 提交于
      When the system has only one CPU, lglock is effectively a spinlock; map
      it directly to spinlock to eliminate the indirection and duplicate code.
      
      In addition to removing overhead, this drops 1.6k of code with a
      defconfig modified to have !CONFIG_SMP, and 1.1k with a minimal config.
      Signed-off-by: NJosh Triplett <josh@joshtriplett.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      64b47e8f
    • C
      modules: use raw_cpu_write for initialization of per cpu refcount. · 08f141d3
      Christoph Lameter 提交于
      The initialization of a structure is not subject to synchronization.
      The use of __this_cpu would trigger a false positive with the additional
      preemption checks for __this_cpu ops.
      
      So simply disable the check through the use of raw_cpu ops.
      
      Trace:
      
        __this_cpu_write operation in preemptible [00000000] code: modprobe/286
        caller is __this_cpu_preempt_check+0x38/0x60
        CPU: 3 PID: 286 Comm: modprobe Tainted: GF            3.12.0-rc4+ #187
        Call Trace:
          dump_stack+0x4e/0x82
          check_preemption_disabled+0xec/0x110
          __this_cpu_preempt_check+0x38/0x60
          load_module+0xcfd/0x2650
          SyS_init_module+0xa6/0xd0
          tracesys+0xe1/0xe6
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Acked-by: NRusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      08f141d3
    • G
      kernel: use macros from compiler.h instead of __attribute__((...)) · 52f5684c
      Gideon Israel Dsouza 提交于
      To increase compiler portability there is <linux/compiler.h> which
      provides convenience macros for various gcc constructs.  Eg: __weak for
      __attribute__((weak)).  I've replaced all instances of gcc attributes
      with the right macro in the kernel subsystem.
      Signed-off-by: NGideon Israel Dsouza <gidisrael@gmail.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      52f5684c
    • F
      kernel/panic.c: display reason at end + pr_emerg · d7c0847f
      Fabian Frederick 提交于
      Currently, booting without initrd specified on 80x25 screen gives a call
      trace followed by atkbd : Spurious ACK.  Original message ("VFS: Unable
      to mount root fs") is not available.  Of course this could happen in
      other situations...
      
      This patch displays panic reason after call trace which could help lot
      of people even if it's not the very last line on screen.
      
      Also, convert all panic.c printk(KERN_EMERG to pr_emerg(
      
      [akpm@linux-foundation.org: missed a couple of pr_ conversions]
      Signed-off-by: NFabian Frederick <fabf@skynet.be>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d7c0847f
    • L
      hung_task: check the value of "sysctl_hung_task_timeout_sec" · 80df2847
      Liu Hua 提交于
      As sysctl_hung_task_timeout_sec is unsigned long, when this value is
      larger then LONG_MAX/HZ, the function schedule_timeout_interruptible in
      watchdog will return immediately without sleep and with print :
      
        schedule_timeout: wrong timeout value ffffffffffffff83
      
      and then the funtion watchdog will call schedule_timeout_interruptible
      again and again.  The screen will be filled with
      
      	"schedule_timeout: wrong timeout value ffffffffffffff83"
      
      This patch does some check and correction in sysctl, to let the function
      schedule_timeout_interruptible allways get the valid parameter.
      Signed-off-by: NLiu Hua <sdu.liu@huawei.com>
      Tested-by: NSatoru Takeuchi <satoru.takeuchi@gmail.com>
      Cc: <stable@vger.kernel.org>	[3.4+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      80df2847
    • O
      wait: WSTOPPED|WCONTINUED doesn't work if a zombie leader is traced by another process · 7c733eb3
      Oleg Nesterov 提交于
      Even if the main thread is dead the process still can stop/continue.
      However, if the leader is ptraced wait_consider_task(ptrace => false)
      always skips wait_task_stopped/wait_task_continued, so WSTOPPED or
      WCONTINUED can never work for the natural parent in this case.
      
      Move the "A zombie ptracee is only visible to its ptracer" check into the
      "if (!delay_group_leader(p))" block.  ->notask_error is cleared by the
      "fall through" code below.
      
      This depends on the previous change, wait_task_stopped/continued must be
      avoided if !delay_group_leader() and the tracer is ->real_parent.
      Otherwise WSTOPPED|WEXITED could wrongly report "stopped" when the child
      is already dead (single-threaded or not).  If it is traced by another task
      then the "stopped" state is fine until the debugger detaches and reveals a
      zombie state.
      
      Stupid test-case:
      
      	void *tfunc(void *arg)
      	{
      		sleep(1);	// wait for zombie leader
      		raise(SIGSTOP);
      		exit(0x13);
      		return NULL;
      	}
      
      	int run_child(void)
      	{
      		pthread_t thread;
      
      		if (!fork()) {
      			int tracee = getppid();
      
      			assert(ptrace(PTRACE_ATTACH, tracee, 0,0) == 0);
      			do
      				ptrace(PTRACE_CONT, tracee, 0,0);
      			while (wait(NULL) > 0);
      
      			return 0;
      		}
      
      		sleep(1);	// wait for PTRACE_ATTACH
      		assert(pthread_create(&thread, NULL, tfunc, NULL) == 0);
      		pthread_exit(NULL);
      	}
      
      	int main(void)
      	{
      		int child, stat;
      
      		child = fork();
      		if (!child)
      			return run_child();
      
      		assert(child == waitpid(-1, &stat, WSTOPPED));
      		assert(stat == 0x137f);
      
      		kill(child, SIGCONT);
      
      		assert(child == waitpid(-1, &stat, WCONTINUED));
      		assert(stat == 0xffff);
      
      		assert(child == waitpid(-1, &stat, 0));
      		assert(stat == 0x1300);
      
      		return 0;
      	}
      
      Without this patch it hangs in waitpid(WSTOPPED), wait_task_stopped() is
      never called.
      
      Note: this doesn't fix all problems with a zombie delay_group_leader(),
      WCONTINUED | WEXITED check is not exactly right.  debugger can't assume it
      will be notified if another thread reaps the whole thread group.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Lennart Poettering <lpoetter@redhat.com>
      Cc: Michal Schmidt <mschmidt@redhat.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7c733eb3
    • O
      wait: WSTOPPED|WCONTINUED hangs if a zombie child is traced by real_parent · 377d75da
      Oleg Nesterov 提交于
      "A zombie is only visible to its ptracer" logic in wait_consider_task()
      is very wrong. Trivial test-case:
      
      	#include <unistd.h>
      	#include <sys/ptrace.h>
      	#include <sys/wait.h>
      	#include <assert.h>
      
      	int main(void)
      	{
      		int child = fork();
      
      		if (!child) {
      			assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
      			return 0x23;
      		}
      
      		assert(waitid(P_ALL, child, NULL, WEXITED | WNOWAIT) == 0);
      		assert(waitid(P_ALL, 0, NULL, WSTOPPED) == -1);
      		return 0;
      	}
      
      it hangs in waitpid(WSTOPPED) despite the fact it has a single zombie
      child.  This is because wait_consider_task(ptrace => 0) sees p->ptrace and
      cleares ->notask_error assuming that the debugger should detach and notify
      us.
      
      Change wait_consider_task(ptrace => 0) to pretend that ptrace == T if the
      child is traced by us.  This really simplifies the logic and allows us to
      do more fixes, see the next changes.  This also hides the unwanted group
      stop state automatically, we can remove another ptrace_reparented() check.
      
      Unfortunately, this adds the following behavioural changes:
      
      	1. Before this patch wait(WEXITED | __WNOTHREAD) does not reap
      	   a natural child if it is traced by the caller's sub-thread.
      
      	   Hopefully nobody will ever notice this change, and I think
      	   that nobody should rely on this behaviour anyway.
      
      	2. SIGNAL_STOP_CONTINUED is no longer hidden from debugger if
      	   it is real parent.
      
      	   While this change comes as a side effect, I think it is good
      	   by itself. The group continued state can not be consumed by
      	   another process in this case, it doesn't depend on ptrace,
      	   it doesn't make sense to hide it from real parent.
      
      	   Perhaps we should add the thread_group_leader() check before
      	   wait_task_continued()? May be, but this shouldn't depend on
      	   ptrace_reparented().
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Lennart Poettering <lpoetter@redhat.com>
      Cc: Michal Schmidt <mschmidt@redhat.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      377d75da
    • O
      wait: completely ignore the EXIT_DEAD tasks · b3ab0316
      Oleg Nesterov 提交于
      Now that EXIT_DEAD is the terminal state it doesn't make sense to call
      eligible_child() or security_task_wait() if the task is really dead.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Tested-by: NMichal Schmidt <mschmidt@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Lennart Poettering <lpoetter@redhat.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b3ab0316
    • O
      wait: use EXIT_TRACE only if thread_group_leader(zombie) · b4360690
      Oleg Nesterov 提交于
      wait_task_zombie() always uses EXIT_TRACE/ptrace_unlink() if
      ptrace_reparented().  This is suboptimal and a bit confusing: we do not
      need do_notify_parent(p) if !thread_group_leader(p) and in this case we
      also do not need ptrace_unlink(), we can rely on ptrace_release_task().
      
      Change wait_task_zombie() to check thread_group_leader() along with
      ptrace_reparented() and simplify the final p->exit_state transition.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Tested-by: NMichal Schmidt <mschmidt@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Lennart Poettering <lpoetter@redhat.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b4360690
    • O
      wait: introduce EXIT_TRACE to avoid the racy EXIT_DEAD->EXIT_ZOMBIE transition · abd50b39
      Oleg Nesterov 提交于
      wait_task_zombie() first does EXIT_ZOMBIE->EXIT_DEAD transition and
      drops tasklist_lock.  If this task is not the natural child and it is
      traced, we change its state back to EXIT_ZOMBIE for ->real_parent.
      
      The last transition is racy, this is even documented in 50b8d257
      "ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE
      race".  wait_consider_task() tries to detect this transition and clear
      ->notask_error but we can't rely on ptrace_reparented(), debugger can
      exit and do ptrace_unlink() before its sub-thread sets EXIT_ZOMBIE.
      
      And there is another problem which were missed before: this transition
      can also race with reparent_leader() which doesn't reset >exit_signal if
      EXIT_DEAD, assuming that this task must be reaped by someone else.  So
      the tracee can be re-parented with ->exit_signal != SIGCHLD, and if
      /sbin/init doesn't use __WALL it becomes unreapable.  This was fixed by
      the previous commit, but it was the temporary hack.
      
      1. Add the new exit_state, EXIT_TRACE. It means that the task is the
         traced zombie, debugger is going to detach and notify its natural
         parent.
      
         This new state is actually EXIT_ZOMBIE | EXIT_DEAD. This way we
         can avoid the changes in proc/kgdb code, get_task_state() still
         reports "X (dead)" in this case.
      
         Note: with or without this change userspace can see Z -> X -> Z
         transition. Not really bad, but probably makes sense to fix.
      
      2. Change wait_task_zombie() to use EXIT_TRACE instead of EXIT_DEAD
         if we need to notify the ->real_parent.
      
      3. Revert the previous hack in reparent_leader(), now that EXIT_DEAD
         is always the final state we can safely ignore such a task.
      
      4. Change wait_consider_task() to check EXIT_TRACE separately and kill
         the racy and no longer needed ptrace_reparented() case.
      
         If ptrace == T an EXIT_TRACE thread should be simply ignored, the
         owner of this state is going to ptrace_unlink() this task. We can
         pretend that it was already removed from ->ptraced list.
      
         Otherwise we should skip this thread too but clear ->notask_error,
         we must be the natural parent and debugger is going to untrace and
         notify us. IOW, this doesn't differ from "EXIT_ZOMBIE && p->ptrace"
         even if the task was already untraced.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Reported-by: NJan Kratochvil <jan.kratochvil@redhat.com>
      Reported-by: NMichal Schmidt <mschmidt@redhat.com>
      Tested-by: NMichal Schmidt <mschmidt@redhat.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Lennart Poettering <lpoetter@redhat.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      abd50b39
    • O
      wait: fix reparent_leader() vs EXIT_DEAD->EXIT_ZOMBIE race · dfccbb5e
      Oleg Nesterov 提交于
      wait_task_zombie() first does EXIT_ZOMBIE->EXIT_DEAD transition and
      drops tasklist_lock.  If this task is not the natural child and it is
      traced, we change its state back to EXIT_ZOMBIE for ->real_parent.
      
      The last transition is racy, this is even documented in 50b8d257
      "ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE
      race".  wait_consider_task() tries to detect this transition and clear
      ->notask_error but we can't rely on ptrace_reparented(), debugger can
      exit and do ptrace_unlink() before its sub-thread sets EXIT_ZOMBIE.
      
      And there is another problem which were missed before: this transition
      can also race with reparent_leader() which doesn't reset >exit_signal if
      EXIT_DEAD, assuming that this task must be reaped by someone else.  So
      the tracee can be re-parented with ->exit_signal != SIGCHLD, and if
      /sbin/init doesn't use __WALL it becomes unreapable.
      
      Change reparent_leader() to update ->exit_signal even if EXIT_DEAD.
      Note: this is the simple temporary hack for -stable, it doesn't try to
      solve all problems, it will be reverted by the next changes.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Reported-by: NJan Kratochvil <jan.kratochvil@redhat.com>
      Reported-by: NMichal Schmidt <mschmidt@redhat.com>
      Tested-by: NMichal Schmidt <mschmidt@redhat.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Lennart Poettering <lpoetter@redhat.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dfccbb5e
    • G
      kernel/exit.c: call proc_exit_connector() after exit_state is set · ef982393
      Guillaume Morin 提交于
      The process events connector delivers a notification when a process
      exits.  This is really convenient for a process that spawns and wants to
      monitor its children through an epoll-able() interface.
      
      Unfortunately, there is a small window between when the event is
      delivered and the child become wait()-able.
      
      This is creates a race if the parent wants to make sure that it knows
      about the exit, e.g
      
      pid_t pid = fork();
      if (pid > 0) {
      	register_interest_for_pid(pid);
      	if (waitpid(pid, NULL, WNOHANG) > 0)
      	{
      	  /* We might have raced with exit() */
      	}
      	return;
      }
      
      /* Child */
      execve(...)
      
      register_interest_for_pid() would be telling the the connector socket
      reader to pay attention to events related to pid.
      
      Though this is not a bug, I think it would make the connector a bit more
      usable if this race was closed by simply moving the call to
      proc_exit_connector() from just before exit_notify() to right after.
      
      Oleg said:
      
      : Even with this patch the code above is still "racy" if the child is
      : multi-threaded.  Plus it should obviously filter-out subthreads.  And
      : afaics there is no way to make it reliable, even if you change the code
      : above so that waitpid() is called only after the last thread exits WNOHANG
      : still can fail.
      Signed-off-by: NGuillaume Morin <guillaume@morinfr.org>
      Cc: Matt Helsley <matt.helsley@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ef982393
    • O
      exit: move check_stack_usage() to the end of do_exit() · 4bcb8232
      Oleg Nesterov 提交于
      It is not clear why check_stack_usage() is called so early and thus it
      never checks the stack usage in, say, exit_notify() or
      flush_ptrace_hw_breakpoint() or other functions which are only called by
      do_exit().
      
      Move the callsite down to the last preempt_disable/schedule.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4bcb8232
    • O
      exit: call disassociate_ctty() before exit_task_namespaces() · c39df5fa
      Oleg Nesterov 提交于
      Commit 8aac6270 ("move exit_task_namespaces() outside of
      exit_notify()") breaks pppd and the exiting service crashes the kernel:
      
          BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
          IP: ppp_register_channel+0x13/0x20 [ppp_generic]
          Call Trace:
            ppp_asynctty_open+0x12b/0x170 [ppp_async]
            tty_ldisc_open.isra.2+0x27/0x60
            tty_ldisc_hangup+0x1e3/0x220
            __tty_hangup+0x2c4/0x440
            disassociate_ctty+0x61/0x270
            do_exit+0x7f2/0xa50
      
      ppp_register_channel() needs ->net_ns and current->nsproxy == NULL.
      
      Move disassociate_ctty() before exit_task_namespaces(), it doesn't make
      sense to delay it after perf_event_exit_task() or cgroup_exit().
      
      This also allows to use task_work_add() inside the (nontrivial) code
      paths in disassociate_ctty().
      
      Investigated by Peter Hurley.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Reported-by: NSree Harsha Totakura <sreeharsha@totakura.in>
      Cc: Peter Hurley <peter@hurleysoftware.com>
      Cc: Sree Harsha Totakura <sreeharsha@totakura.in>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Andrey Vagin <avagin@openvz.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>	[v3.10+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c39df5fa
    • D
      res_counter: remove interface for locked charging and uncharging · 539a13b4
      David Rientjes 提交于
      The res_counter_{charge,uncharge}_locked() variants are not used in the
      kernel outside of the resource counter code itself, so remove the
      interface.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Tim Hockin <thockin@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      539a13b4
    • D
      mm, mempolicy: remove per-process flag · f0432d15
      David Rientjes 提交于
      PF_MEMPOLICY is an unnecessary optimization for CONFIG_SLAB users.
      There's no significant performance degradation to checking
      current->mempolicy rather than current->flags & PF_MEMPOLICY in the
      allocation path, especially since this is considered unlikely().
      
      Running TCP_RR with netperf-2.4.5 through localhost on 16 cpu machine with
      64GB of memory and without a mempolicy:
      
      	threads		before		after
      	16		1249409		1244487
      	32		1281786		1246783
      	48		1239175		1239138
      	64		1244642		1241841
      	80		1244346		1248918
      	96		1266436		1254316
      	112		1307398		1312135
      	128		1327607		1326502
      
      Per-process flags are a scarce resource so we should free them up whenever
      possible and make them available.  We'll be using it shortly for memcg oom
      reserves.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Tim Hockin <thockin@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f0432d15
    • D
      fork: collapse copy_flags into copy_process · 514ddb44
      David Rientjes 提交于
      copy_flags() does not use the clone_flags formal and can be collapsed
      into copy_process() for cleaner code.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Tim Hockin <thockin@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      514ddb44
    • D
      mm: per-thread vma caching · 615d6e87
      Davidlohr Bueso 提交于
      This patch is a continuation of efforts trying to optimize find_vma(),
      avoiding potentially expensive rbtree walks to locate a vma upon faults.
      The original approach (https://lkml.org/lkml/2013/11/1/410), where the
      largest vma was also cached, ended up being too specific and random,
      thus further comparison with other approaches were needed.  There are
      two things to consider when dealing with this, the cache hit rate and
      the latency of find_vma().  Improving the hit-rate does not necessarily
      translate in finding the vma any faster, as the overhead of any fancy
      caching schemes can be too high to consider.
      
      We currently cache the last used vma for the whole address space, which
      provides a nice optimization, reducing the total cycles in find_vma() by
      up to 250%, for workloads with good locality.  On the other hand, this
      simple scheme is pretty much useless for workloads with poor locality.
      Analyzing ebizzy runs shows that, no matter how many threads are
      running, the mmap_cache hit rate is less than 2%, and in many situations
      below 1%.
      
      The proposed approach is to replace this scheme with a small per-thread
      cache, maximizing hit rates at a very low maintenance cost.
      Invalidations are performed by simply bumping up a 32-bit sequence
      number.  The only expensive operation is in the rare case of a seq
      number overflow, where all caches that share the same address space are
      flushed.  Upon a miss, the proposed replacement policy is based on the
      page number that contains the virtual address in question.  Concretely,
      the following results are seen on an 80 core, 8 socket x86-64 box:
      
      1) System bootup: Most programs are single threaded, so the per-thread
         scheme does improve ~50% hit rate by just adding a few more slots to
         the cache.
      
      +----------------+----------+------------------+
      | caching scheme | hit-rate | cycles (billion) |
      +----------------+----------+------------------+
      | baseline       | 50.61%   | 19.90            |
      | patched        | 73.45%   | 13.58            |
      +----------------+----------+------------------+
      
      2) Kernel build: This one is already pretty good with the current
         approach as we're dealing with good locality.
      
      +----------------+----------+------------------+
      | caching scheme | hit-rate | cycles (billion) |
      +----------------+----------+------------------+
      | baseline       | 75.28%   | 11.03            |
      | patched        | 88.09%   | 9.31             |
      +----------------+----------+------------------+
      
      3) Oracle 11g Data Mining (4k pages): Similar to the kernel build workload.
      
      +----------------+----------+------------------+
      | caching scheme | hit-rate | cycles (billion) |
      +----------------+----------+------------------+
      | baseline       | 70.66%   | 17.14            |
      | patched        | 91.15%   | 12.57            |
      +----------------+----------+------------------+
      
      4) Ebizzy: There's a fair amount of variation from run to run, but this
         approach always shows nearly perfect hit rates, while baseline is just
         about non-existent.  The amounts of cycles can fluctuate between
         anywhere from ~60 to ~116 for the baseline scheme, but this approach
         reduces it considerably.  For instance, with 80 threads:
      
      +----------------+----------+------------------+
      | caching scheme | hit-rate | cycles (billion) |
      +----------------+----------+------------------+
      | baseline       | 1.06%    | 91.54            |
      | patched        | 99.97%   | 14.18            |
      +----------------+----------+------------------+
      
      [akpm@linux-foundation.org: fix nommu build, per Davidlohr]
      [akpm@linux-foundation.org: document vmacache_valid() logic]
      [akpm@linux-foundation.org: attempt to untangle header files]
      [akpm@linux-foundation.org: add vmacache_find() BUG_ON]
      [hughd@google.com: add vmacache_valid_mm() (from Oleg)]
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: adjust and enhance comments]
      Signed-off-by: NDavidlohr Bueso <davidlohr@hp.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: NMichel Lespinasse <walken@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Tested-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      615d6e87
    • A
      mm, thp: add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE · a0715cc2
      Alex Thorlton 提交于
      Add VM_INIT_DEF_MASK, to allow us to set the default flags for VMs.  It
      also adds a prctl control which allows us to set the THP disable bit in
      mm->def_flags so that VMs will pick up the setting as they are created.
      Signed-off-by: NAlex Thorlton <athorlton@sgi.com>
      Suggested-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a0715cc2
    • T
      cgroup: newly created dirs and files should be owned by the creator · 49957f8e
      Tejun Heo 提交于
      While converting cgroup to kernfs, 2bd59d48 ("cgroup: convert to
      kernfs") accidentally dropped the logic which makes newly created
      cgroup dirs and files owned by the current uid / gid.  This broke
      cases where cgroup subtree management is delegated to !root as the sub
      manager wouldn't be able to create more than single level of hierarchy
      or put tasks into child cgroups it created.
      
      Among other things, this breaks user session management in systemd and
      one of the symptoms was 90s hang during shutdown.  User session
      systemd running as the user creates a sub-service to initiate shutdown
      and tries to put kill(1) into it but fails because cgroup.procs is
      owned by root.  This leads to 90s hang during shutdown.
      
      Implement cgroup_kn_set_ugid() which sets a kn's uid and gid to those
      of the caller and use it from file and dir creation paths.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      49957f8e
    • A
      sched: remove sleep_on() and friends · b8780c36
      Arnd Bergmann 提交于
      This is the final piece in the puzzle, as all patches to remove the
      last users of \(interruptible_\|\)sleep_on\(_timeout\|\) have made it
      into the 3.15 merge window. The work was long overdue, and this
      interface in particular should not have survived the BKL removal
      that was done a couple of years ago.
      
      Citing Jon Corbet from http://lwn.net/2001/0201/kernel.php3":
      
       "[...] it was suggested that the janitors look for and fix all code
        that calls sleep_on() [...] since (1) almost all such code is
        incorrect, and (2) Linus has agreed that those functions should
        be removed in the 2.5 development series".
      
      We haven't quite made it for 2.5, but maybe we can merge this for 3.15.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b8780c36
  9. 04 4月, 2014 9 次提交
    • L
      cgroup: fix top cgroup refcnt leak · c6b3d5bc
      Li Zefan 提交于
      As mount() and kill_sb() is not a one-to-one match, If we mount the same
      cgroupfs in serveral mount points, and then umount all of them, kill_sb()
      will be called only once.
      
      Try:
              # mount -t cgroup -o cpuacct xxx /cgroup
              # mount -t cgroup -o cpuacct xxx /cgroup2
              # cat /proc/cgroups | grep cpuacct
              cpuacct 2       1       1
              # umount /cgroup
              # umount /cgroup2
              # cat /proc/cgroups | grep cpuacct
              cpuacct 2       1       1
      
      You'll see cgroupfs will never be freed.
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      c6b3d5bc
    • J
      printk: fix one circular lockdep warning about console_lock · 72581487
      Jane Li 提交于
      Fix a warning about possible circular locking dependency.
      
      If do in following sequence:
      
          enter suspend ->  resume ->  plug-out CPUx (echo 0 > cpux/online)
      
      lockdep will show warning as following:
      
        ======================================================
        [ INFO: possible circular locking dependency detected ]
        3.10.0 #2 Tainted: G           O
        -------------------------------------------------------
        sh/1271 is trying to acquire lock:
        (console_lock){+.+.+.}, at: console_cpu_notify+0x20/0x2c
        but task is already holding lock:
        (cpu_hotplug.lock){+.+.+.}, at: cpu_hotplug_begin+0x2c/0x58
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
        -> #2 (cpu_hotplug.lock){+.+.+.}:
          lock_acquire+0x98/0x12c
          mutex_lock_nested+0x50/0x3d8
          cpu_hotplug_begin+0x2c/0x58
          _cpu_up+0x24/0x154
          cpu_up+0x64/0x84
          smp_init+0x9c/0xd4
          kernel_init_freeable+0x78/0x1c8
          kernel_init+0x8/0xe4
          ret_from_fork+0x14/0x2c
      
        -> #1 (cpu_add_remove_lock){+.+.+.}:
          lock_acquire+0x98/0x12c
          mutex_lock_nested+0x50/0x3d8
          disable_nonboot_cpus+0x8/0xe8
          suspend_devices_and_enter+0x214/0x448
          pm_suspend+0x1e4/0x284
          try_to_suspend+0xa4/0xbc
          process_one_work+0x1c4/0x4fc
          worker_thread+0x138/0x37c
          kthread+0xa4/0xb0
          ret_from_fork+0x14/0x2c
      
        -> #0 (console_lock){+.+.+.}:
          __lock_acquire+0x1b38/0x1b80
          lock_acquire+0x98/0x12c
          console_lock+0x54/0x68
          console_cpu_notify+0x20/0x2c
          notifier_call_chain+0x44/0x84
          __cpu_notify+0x2c/0x48
          cpu_notify_nofail+0x8/0x14
          _cpu_down+0xf4/0x258
          cpu_down+0x24/0x40
          store_online+0x30/0x74
          dev_attr_store+0x18/0x24
          sysfs_write_file+0x16c/0x19c
          vfs_write+0xb4/0x190
          SyS_write+0x3c/0x70
          ret_fast_syscall+0x0/0x48
      
        Chain exists of:
           console_lock --> cpu_add_remove_lock --> cpu_hotplug.lock
      
        Possible unsafe locking scenario:
               CPU0                    CPU1
               ----                    ----
        lock(cpu_hotplug.lock);
                                       lock(cpu_add_remove_lock);
                                       lock(cpu_hotplug.lock);
        lock(console_lock);
          *** DEADLOCK ***
      
      There are three locks involved in two sequence:
      a) pm suspend:
      	console_lock (@suspend_console())
      	cpu_add_remove_lock (@disable_nonboot_cpus())
      	cpu_hotplug.lock (@_cpu_down())
      b) Plug-out CPUx:
      	cpu_add_remove_lock (@(cpu_down())
      	cpu_hotplug.lock (@_cpu_down())
      	console_lock (@console_cpu_notify()) => Lockdeps prints warning log.
      
      There should be not real deadlock, as flag of console_suspended can
      protect this.
      
      Although console_suspend() releases console_sem, it doesn't tell lockdep
      about it.  That results in the lockdep warning about circular locking
      when doing the following: enter suspend -> resume -> plug-out CPUx (echo
      0 > cpux/online)
      
      Fix the problem by telling lockdep we actually released the semaphore in
      console_suspend() and acquired it again in console_resume().
      Signed-off-by: NJane Li <jiel@marvell.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      72581487
    • P
      printk: do not compute the size of the message twice · fce6e033
      Petr Mladek 提交于
      This is just a tiny optimization.  It removes duplicate computation of
      the message size.
      Signed-off-by: NPetr Mladek <pmladek@suse.cz>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Kay Sievers <kay@vrfy.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fce6e033
    • P
      printk: use also the last bytes in the ring buffer · 39b25109
      Petr Mladek 提交于
      It seems that we have newer used the last byte in the ring buffer.  In
      fact, we have newer used the last 4 bytes because of padding.
      
      First problem is in the check for free space.  The exact number of free
      bytes is enough to store the length of data.
      
      Second problem is in the check where the ring buffer is rotated.  The
      left side counts the first unused index.  It is unused, so it might be
      the same as the size of the buffer.
      
      Note that the first problem has to be fixed together with the second
      one.  Otherwise, the buffer is rotated even when there is enough space
      on the end of the buffer.  Then the beginning of the buffer is rewritten
      and valid entries get corrupted.
      Signed-off-by: NPetr Mladek <pmladek@suse.cz>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Kay Sievers <kay@vrfy.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      39b25109
    • P
      printk: add comment about tricky check for text buffer size · e8c42d36
      Petr Mladek 提交于
      There is no check for potential "text_len" overflow.  It is not needed
      because only valid level is detected.  It took me some time to
      understand why.  It would deserve a comment ;-)
      Signed-off-by: NPetr Mladek <pmladek@suse.cz>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Kay Sievers <kay@vrfy.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e8c42d36
    • P
      printk: remove obsolete check for log level "c" · c64730b2
      Petr Mladek 提交于
      The kernel log level "c" was removed in commit 61e99ab8 ("printk:
      remove the now unnecessary "C" annotation for KERN_CONT").  It is no
      longer detected in printk_get_level().  Hence we do not need to check it
      in vprintk_emit.
      Signed-off-by: NPetr Mladek <pmladek@suse.cz>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Kay Sievers <kay@vrfy.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c64730b2
    • D
      kernel/resource.c: make reallocate_resource() static · 28ab49ff
      Daeseok Youn 提交于
      sparse says:
      
      kernel/resource.c:518:5: warning:
       symbol 'reallocate_resource' was not declared. Should it be static?
      Signed-off-by: NDaeseok Youn <daeseok.youn@gmail.com>
      Reviewed-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      28ab49ff
    • P
      kernel: audit/fix non-modular users of module_init in core code · c96d6660
      Paul Gortmaker 提交于
      Code that is obj-y (always built-in) or dependent on a bool Kconfig
      (built-in or absent) can never be modular.  So using module_init as an
      alias for __initcall can be somewhat misleading.
      
      Fix these up now, so that we can relocate module_init from init.h into
      module.h in the future.  If we don't do this, we'd have to add module.h
      to obviously non-modular code, and that would be a worse thing.
      
      The audit targets the following module_init users for change:
       kernel/user.c                  obj-y
       kernel/kexec.c                 bool KEXEC (one instance per arch)
       kernel/profile.c               bool PROFILING
       kernel/hung_task.c             bool DETECT_HUNG_TASK
       kernel/sched/stats.c           bool SCHEDSTATS
       kernel/user_namespace.c        bool USER_NS
      
      Note that direct use of __initcall is discouraged, vs.  one of the
      priority categorized subgroups.  As __initcall gets mapped onto
      device_initcall, our use of subsys_initcall (which makes sense for these
      files) will thus change this registration from level 6-device to level
      4-subsys (i.e.  slightly earlier).  However no observable impact of that
      difference has been observed during testing.
      
      Also, two instances of missing ";" at EOL are fixed in kexec.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c96d6660
    • J
      fs, kernel: permit disabling the uselib syscall · 69369a70
      Josh Triplett 提交于
      uselib hasn't been used since libc5; glibc does not use it.  Support
      turning it off.
      
      When disabled, also omit the load_elf_library implementation from
      binfmt_elf.c, which only uselib invokes.
      
      bloat-o-meter:
      add/remove: 0/4 grow/shrink: 0/1 up/down: 0/-785 (-785)
      function                                     old     new   delta
      padzero                                       39      36      -3
      uselib_flags                                  20       -     -20
      sys_uselib                                   168       -    -168
      SyS_uselib                                   168       -    -168
      load_elf_library                             426       -    -426
      
      The new CONFIG_USELIB defaults to `y'.
      Signed-off-by: NJosh Triplett <josh@joshtriplett.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      69369a70