1. 22 May, 2014 6 commits
  2. 07 May, 2014 7 commits
  3. 24 Apr, 2014 1 commit
  4. 19 Apr, 2014 1 commit
  5. 17 Apr, 2014 4 commits
  6. 16 Apr, 2014 3 commits
  7. 15 Apr, 2014 2 commits
    • user namespace: fix incorrect memory barriers · e79323bd
      Committed by Mikulas Patocka
      smp_read_barrier_depends() can be used if there is data dependency between
      the readers - i.e. if the read operation after the barrier uses address
      that was obtained from the read operation before the barrier.
      
      In this file, there is only a control dependency, no data dependency, so
      the use of smp_read_barrier_depends() is incorrect. The code could fail
      in the following way:
      * the cpu predicts that idx < extents is true and starts executing the
        body of the for loop
      * the cpu fetches map->extent[0].first and map->extent[0].count
      * the cpu fetches map->nr_extents
      * the cpu verifies that idx < extents is true, so it commits the
        instructions in the body of the for loop
      
      The problem is that in this scenario, the cpu read map->extent[0].first
      and map->nr_extents in the wrong order. We need a full read memory barrier
      to prevent it.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
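The ordering problem in the user namespace commit above can be sketched in userspace with C11 atomics. This is only an analogy, not the kernel code: the struct layout, `MAX_EXTENTS`, and the function names are assumptions made for illustration, and the acquire load stands in for the read barrier the commit asks for. The writer is assumed to be serialized by a lock that is not shown.

```c
#include <stdatomic.h>
#include <stdint.h>

#define MAX_EXTENTS 5   /* hypothetical bound for this sketch */

struct extent {
    uint32_t first;        /* first id of the range being mapped */
    uint32_t lower_first;  /* first id of the range it maps to */
    uint32_t count;        /* number of ids in the range */
};

struct id_map {
    atomic_uint nr_extents;              /* number of published extents */
    struct extent extent[MAX_EXTENTS];
};

/* Writer (assumed serialized elsewhere): fill in the extent first, then
 * publish it with a release store to nr_extents. */
static int map_add_extent(struct id_map *map, struct extent e)
{
    unsigned int n = atomic_load_explicit(&map->nr_extents,
                                          memory_order_relaxed);
    if (n >= MAX_EXTENTS)
        return -1;
    map->extent[n] = e;
    atomic_store_explicit(&map->nr_extents, n + 1, memory_order_release);
    return 0;
}

/* Reader: the acquire load keeps the extent reads below from being
 * satisfied before nr_extents is read. A control dependency on
 * "idx < extents" alone would not order those loads. */
static uint32_t map_id_down(struct id_map *map, uint32_t id)
{
    unsigned int extents = atomic_load_explicit(&map->nr_extents,
                                                memory_order_acquire);
    for (unsigned int idx = 0; idx < extents; idx++) {
        uint32_t first = map->extent[idx].first;
        uint32_t last = first + map->extent[idx].count - 1;

        if (id >= first && id <= last)
            return (id - first) + map->extent[idx].lower_first;
    }
    return (uint32_t)-1;   /* no mapping found */
}
```

With one extent `{first = 1000, lower_first = 0, count = 10}` published, `map_id_down(&map, 1005)` yields 5, and an id outside the range yields `(uint32_t)-1`.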
    • seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF · 2eac7648
      Committed by Daniel Borkmann
      Linus reports that on 32-bit x86, Chromium triggers the following seccomp
      audit log messages:
      
        audit: type=1326 audit(1397359304.356:28108): auid=500 uid=500
      gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023
      pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0
      syscall=172 compat=0 ip=0xb2dd9852 code=0x30000
      
        audit: type=1326 audit(1397359304.356:28109): auid=500 uid=500
      gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023
      pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0 syscall=5
      compat=0 ip=0xb2dd9852 code=0x50000
      
      These audit messages are being triggered via audit_seccomp() through
      __secure_computing() in seccomp mode (BPF) filter with seccomp return
      codes 0x30000 (== SECCOMP_RET_TRAP) and 0x50000 (== SECCOMP_RET_ERRNO)
      during filter runtime. Moreover, Linus reports that x86_64 Chromium
      seems fine.
      
      The underlying issue that explains this is that the implementation of
      populate_seccomp_data() is wrong. Our seccomp data structure sd that
      is being shared with user ABI is:
      
        struct seccomp_data {
          int nr;
          __u32 arch;
          __u64 instruction_pointer;
          __u64 args[6];
        };
      
      Therefore, a simple cast to 'unsigned long *' for storing the value of
      the syscall argument via syscall_get_arguments() is wrong: on 32-bit
      x86 (or any other 32-bit arch), it stores a0-a5 at the wrong offsets in
      the args[] member, and thus i) could leak stack memory to user space and
      ii) tampers with the logic of seccomp BPF programs that read out and
      check syscall arguments:
      
        syscall_get_arguments(task, regs, 0, 1, (unsigned long *) &sd->args[0]);
      
      Tested on 32-bit x86 with Google Chrome, unfortunately only via a remote
      test machine through slow ssh X forwarding, but it fixes the issue on
      my side. So fix it up by storing the args in type-correct variables; gcc
      is clever and optimizes the copy away in other cases, e.g. x86_64.
      
      Fixes: bd4cf0ed ("net: filter: rework/optimize internal BPF interpreter's instruction set")
      Reported-and-bisected-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
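The effect of the cast in the seccomp commit above can be reproduced on any host by modelling a 32-bit `unsigned long` with `uint32_t`. This is a sketch under stated assumptions: `seccomp_data_sketch`, `ulong32`, and both `populate_*` helpers are illustrative names, not kernel code, and `memcpy` is used to model the narrow store without breaking aliasing rules.

```c
#include <stdint.h>
#include <string.h>

/* Same layout as the struct quoted above, spelled with <stdint.h> types. */
struct seccomp_data_sketch {
    int nr;
    uint32_t arch;
    uint64_t instruction_pointer;
    uint64_t args[6];
};

/* Stand-in for "unsigned long" on a 32-bit architecture. */
typedef uint32_t ulong32;

/* Buggy pattern: each argument is stored through a 32-bit pointer aimed at
 * a 64-bit slot, so only 4 of the slot's 8 bytes are overwritten and the
 * rest keeps whatever was there before. */
static void populate_buggy(struct seccomp_data_sketch *sd,
                           const ulong32 regs[6])
{
    for (int i = 0; i < 6; i++)
        memcpy(&sd->args[i], &regs[i], sizeof(ulong32));
}

/* Fixed pattern: read into a correctly typed variable, then assign, which
 * zero-extends the value into the full 64-bit slot. */
static void populate_fixed(struct seccomp_data_sketch *sd,
                           const ulong32 regs[6])
{
    for (int i = 0; i < 6; i++) {
        ulong32 arg = regs[i];
        sd->args[i] = arg;
    }
}
```

If the struct is first filled with a poison byte to stand in for stale stack contents, the buggy variant leaves poison in half of every `args[i]`, while the fixed variant yields exactly the register values.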
  8. 13 Apr, 2014 1 commit
  9. 12 Apr, 2014 1 commit
  10. 11 Apr, 2014 3 commits
  11. 10 Apr, 2014 1 commit
  12. 09 Apr, 2014 4 commits
    • futex: avoid race between requeue and wake · 69cd9eba
      Committed by Linus Torvalds
      Jan Stancek reported:
       "pthread_cond_broadcast/4-1.c testcase from openposix testsuite (LTP)
        occasionally fails, because some threads fail to wake up.
      
        Testcase creates 5 threads, which are all waiting on same condition.
        Main thread then calls pthread_cond_broadcast() without holding mutex,
        which calls:
      
            futex(uaddr1, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, uaddr2, ..)
      
        This immediately wakes up single thread A, which unlocks mutex and
        tries to wake up another thread:
      
            futex(uaddr2, FUTEX_WAKE_PRIVATE, 1)
      
        If thread A manages to call futex_wake() before any waiters are
        requeued for uaddr2, no other thread is woken up"
      
      The ordering constraints for the hash bucket waiter counting are that
      the waiter counts have to be incremented _before_ getting the spinlock
      (because the spinlock acts as part of the memory barrier), but the
      "requeue" operation didn't honor those rules, and nobody had even
      thought about that case.
      
      This fairly simple patch just increments the waiter count for the target
      hash bucket (hb2) when requeueing a futex before taking the locks.  It
      then decrements it again after releasing the lock - the code that
      actually moves the futex(es) between hash buckets will do the additional
      required waiter count housekeeping.
      Reported-and-tested-by: Jan Stancek <jstancek@redhat.com>
      Acked-by: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org # 3.14
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tracepoint: Fix sparse warnings in tracepoint.c · b725dfea
      Committed by Mathieu Desnoyers
      Fix the following sparse warnings:
      
        CHECK   kernel/tracepoint.c
      kernel/tracepoint.c:184:18: warning: incorrect type in assignment (different address spaces)
      kernel/tracepoint.c:184:18:    expected struct tracepoint_func *tp_funcs
      kernel/tracepoint.c:184:18:    got struct tracepoint_func [noderef] <asn:4>*funcs
      kernel/tracepoint.c:216:18: warning: incorrect type in assignment (different address spaces)
      kernel/tracepoint.c:216:18:    expected struct tracepoint_func *tp_funcs
      kernel/tracepoint.c:216:18:    got struct tracepoint_func [noderef] <asn:4>*funcs
      kernel/tracepoint.c:392:24: error: return expression in void function
        CC      kernel/tracepoint.o
      kernel/tracepoint.c: In function tracepoint_module_going:
      kernel/tracepoint.c:491:6: warning: symbol 'syscall_regfunc' was not declared. Should it be static?
      kernel/tracepoint.c:508:6: warning: symbol 'syscall_unregfunc' was not declared. Should it be static?
      
      Link: http://lkml.kernel.org/r/1397049883-28692-1-git-send-email-mathieu.desnoyers@efficios.com
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracepoint: Simplify tracepoint module search · eb7d035c
      Committed by Steven Rostedt (Red Hat)
      Instead of copying num_tracepoints and tracepoints_ptrs from the
      module structure to the tp_mod structure, which only uses them to
      find the module associated with tracepoints of modules that are coming
      and going, simply copy the pointer to the module struct to the tracepoint
      tp_module structure.

      Also removed unneeded brackets around an if statement.
      
      Link: http://lkml.kernel.org/r/20140408201705.4dad2c4a@gandalf.local.home
      Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracepoint: Use struct pointer instead of name hash for reg/unreg tracepoints · de7b2973
      Committed by Mathieu Desnoyers
      Register/unregister tracepoint probes with struct tracepoint pointer
      rather than tracepoint name.
      
      This change, which vastly simplifies tracepoint.c, has been proposed by
      Steven Rostedt. It also removes 8.8kB (mostly text) from the vmlinux
      size.
      
      From this point on, the tracers need to pass a struct tracepoint pointer
      to probe register/unregister. A probe can now only be connected to a
      tracepoint that exists. Moreover, tracers are responsible for
      unregistering the probe before the module containing its associated
      tracepoint is unloaded.
      
          text     data      bss      dec     hex filename
      10443444  4282528 10391552 25117524 17f4354 vmlinux.orig
      10434930  4282848 10391552 25109330 17f2352 vmlinux
      
      Link: http://lkml.kernel.org/r/1396992381-23785-2-git-send-email-mathieu.desnoyers@efficios.com
      
      CC: Ingo Molnar <mingo@kernel.org>
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      CC: Andrew Morton <akpm@linux-foundation.org>
      CC: Frank Ch. Eigler <fche@redhat.com>
      CC: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      [ SDR - fixed return val in void func in tracepoint_module_going() ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
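The idea of registering probes against a struct pointer rather than a name can be sketched as follows. Everything here is hypothetical (`tracepoint_sketch`, `probe_register`, `probe_unregister`, `tracepoint_fire`, the fixed-size probe array) and is far simpler than the kernel's tracepoint code. It only shows why no name lookup is needed and why a probe can only attach to a tracepoint that exists: the caller must already hold a pointer to it.

```c
#include <stddef.h>

#define MAX_PROBES 4   /* hypothetical limit for this sketch */

typedef void (*probe_fn)(void *data);

struct tracepoint_sketch {
    const char *name;              /* kept for display only, never looked up */
    probe_fn probes[MAX_PROBES];
    size_t nr_probes;
};

/* Attach a probe directly to the tracepoint the caller points at. */
static int probe_register(struct tracepoint_sketch *tp, probe_fn fn)
{
    if (!tp || !fn || tp->nr_probes == MAX_PROBES)
        return -1;
    for (size_t i = 0; i < tp->nr_probes; i++)
        if (tp->probes[i] == fn)
            return -1;             /* already registered */
    tp->probes[tp->nr_probes++] = fn;
    return 0;
}

/* Detach a probe; the last entry is moved into the freed slot. */
static int probe_unregister(struct tracepoint_sketch *tp, probe_fn fn)
{
    if (!tp)
        return -1;
    for (size_t i = 0; i < tp->nr_probes; i++) {
        if (tp->probes[i] == fn) {
            tp->probes[i] = tp->probes[--tp->nr_probes];
            return 0;
        }
    }
    return -1;                     /* probe was not registered */
}

/* Call every registered probe. */
static void tracepoint_fire(struct tracepoint_sketch *tp, void *data)
{
    for (size_t i = 0; i < tp->nr_probes; i++)
        tp->probes[i](data);
}

/* Example probe: counts how often it was called. */
static void count_probe(void *data)
{
    ++*(int *)data;
}
```

A tracer in this model registers `count_probe` once, sees it invoked on each `tracepoint_fire`, and must unregister it before the tracepoint's storage goes away, mirroring the responsibility the commit places on tracers.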
  13. 08 Apr, 2014 6 commits
    • lglock: map to spinlock when !CONFIG_SMP · 64b47e8f
      Committed by Josh Triplett
      When the system has only one CPU, lglock is effectively a spinlock; map
      it directly to spinlock to eliminate the indirection and duplicate code.
      
      In addition to removing overhead, this drops 1.6k of code with a
      defconfig modified to have !CONFIG_SMP, and 1.1k with a minimal config.
      Signed-off-by: Josh Triplett <josh@joshtriplett.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: David Howells <dhowells@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • modules: use raw_cpu_write for initialization of per cpu refcount. · 08f141d3
      Committed by Christoph Lameter
      The initialization of a structure is not subject to synchronization.
      The use of __this_cpu would trigger a false positive with the additional
      preemption checks for __this_cpu ops.
      
      So simply disable the check through the use of raw_cpu ops.
      
      Trace:
      
        __this_cpu_write operation in preemptible [00000000] code: modprobe/286
        caller is __this_cpu_preempt_check+0x38/0x60
        CPU: 3 PID: 286 Comm: modprobe Tainted: GF            3.12.0-rc4+ #187
        Call Trace:
          dump_stack+0x4e/0x82
          check_preemption_disabled+0xec/0x110
          __this_cpu_preempt_check+0x38/0x60
          load_module+0xcfd/0x2650
          SyS_init_module+0xa6/0xd0
          tracesys+0xe1/0xe6
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel: use macros from compiler.h instead of __attribute__((...)) · 52f5684c
      Committed by Gideon Israel Dsouza
      To increase compiler portability there is <linux/compiler.h>, which
      provides convenience macros for various gcc constructs, e.g. __weak for
      __attribute__((weak)).  I've replaced all instances of gcc attributes
      with the right macro in the kernel subsystem.
      Signed-off-by: Gideon Israel Dsouza <gidisrael@gmail.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
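The wrapper-macro pattern described in the commit above looks roughly like this outside the kernel. The `SKETCH_*` names are made up for this example so they cannot clash with the kernel's real spellings such as `__weak`; `arch_hook` and `format_msg` are likewise hypothetical. On compilers without GCC-style attributes the macros expand to nothing, which is the portability benefit the commit refers to.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical wrapper names; the kernel's own macros live in
 * <linux/compiler.h> and are not redefined here. */
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_WEAK          __attribute__((weak))
#define SKETCH_PRINTF(a, b)  __attribute__((format(printf, a, b)))
#else
#define SKETCH_WEAK
#define SKETCH_PRINTF(a, b)
#endif

/* Weak default: another object file may supply a strong arch_hook(). */
SKETCH_WEAK int arch_hook(void)
{
    return 0;
}

/* The format attribute lets the compiler check fmt against the varargs. */
SKETCH_PRINTF(3, 4)
static int format_msg(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);
    return n;
}
```

Call sites read the same whichever compiler is in use; only the macro definitions in one header need to change.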
    • kernel/panic.c: display reason at end + pr_emerg · d7c0847f
      Committed by Fabian Frederick
      Currently, booting without an initrd specified on an 80x25 screen gives
      a call trace followed by atkbd : Spurious ACK.  The original message
      ("VFS: Unable to mount root fs") is no longer visible.  Of course this
      could happen in other situations...

      This patch displays the panic reason after the call trace, which could
      help a lot of people even if it's not the very last line on screen.
      
      Also, convert all panic.c printk(KERN_EMERG to pr_emerg(
      
      [akpm@linux-foundation.org: missed a couple of pr_ conversions]
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hung_task: check the value of "sysctl_hung_task_timeout_sec" · 80df2847
      Committed by Liu Hua
      As sysctl_hung_task_timeout_sec is unsigned long, when this value is
      larger than LONG_MAX/HZ, the function schedule_timeout_interruptible in
      watchdog will return immediately without sleeping and print:

        schedule_timeout: wrong timeout value ffffffffffffff83

      and then the function watchdog will call schedule_timeout_interruptible
      again and again.  The screen will be filled with

      	"schedule_timeout: wrong timeout value ffffffffffffff83"

      This patch adds a check and correction in sysctl, so that the function
      schedule_timeout_interruptible always gets a valid parameter.
      Signed-off-by: Liu Hua <sdu.liu@huawei.com>
      Tested-by: Satoru Takeuchi <satoru.takeuchi@gmail.com>
      Cc: <stable@vger.kernel.org>	[3.4+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
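The overflow behind the hung_task commit above can be shown with plain integer arithmetic. This is a sketch, not the kernel's sysctl handler: `SKETCH_HZ` is an assumed tick rate, the function names are hypothetical, and rejecting an out-of-range value is just one way to express the check (the commit text does not spell out the exact policy).

```c
#include <limits.h>

#define SKETCH_HZ 250UL   /* assumed tick rate; the kernel's HZ is configurable */

/* What the watchdog effectively computes: seconds * HZ, handed on as a
 * signed long. For large inputs the unsigned product wraps, and the
 * conversion to long yields a negative value on common compilers. */
static long timeout_jiffies_unchecked(unsigned long secs)
{
    return (long)(secs * SKETCH_HZ);
}

/* Largest number of seconds whose jiffies value still fits in a long. */
static unsigned long timeout_secs_max(void)
{
    return (unsigned long)LONG_MAX / SKETCH_HZ;
}

/* Range check at the configuration boundary: stores the value and returns
 * 0 if it is acceptable, otherwise returns -1 and leaves *out untouched. */
static int set_timeout_secs(unsigned long secs, unsigned long *out)
{
    if (secs > timeout_secs_max())
        return -1;
    *out = secs;
    return 0;
}
```

With the check in place, the value that later reaches the sleep call is always at most `LONG_MAX / SKETCH_HZ` seconds, so the multiplication can no longer produce a negative timeout.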
    • wait: WSTOPPED|WCONTINUED doesn't work if a zombie leader is traced by another process · 7c733eb3
      Committed by Oleg Nesterov
      Even if the main thread is dead, the process can still stop/continue.
      However, if the leader is ptraced, wait_consider_task(ptrace => false)
      always skips wait_task_stopped/wait_task_continued, so WSTOPPED or
      WCONTINUED can never work for the natural parent in this case.
      
      Move the "A zombie ptracee is only visible to its ptracer" check into the
      "if (!delay_group_leader(p))" block.  ->notask_error is cleared by the
      "fall through" code below.
      
      This depends on the previous change: wait_task_stopped/continued must be
      avoided if !delay_group_leader() and the tracer is ->real_parent.
      Otherwise WSTOPPED|WEXITED could wrongly report "stopped" when the child
      is already dead (single-threaded or not).  If it is traced by another task
      then the "stopped" state is fine until the debugger detaches and reveals a
      zombie state.
      
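      Stupid test-case:

      	#include <assert.h>
      	#include <pthread.h>
      	#include <signal.h>
      	#include <stdlib.h>
      	#include <sys/ptrace.h>
      	#include <sys/wait.h>
      	#include <unistd.h>

      	void *tfunc(void *arg)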
      	{
      		sleep(1);	// wait for zombie leader
      		raise(SIGSTOP);
      		exit(0x13);
      		return NULL;
      	}
      
      	int run_child(void)
      	{
      		pthread_t thread;
      
      		if (!fork()) {
      			int tracee = getppid();
      
      			assert(ptrace(PTRACE_ATTACH, tracee, 0,0) == 0);
      			do
      				ptrace(PTRACE_CONT, tracee, 0,0);
      			while (wait(NULL) > 0);
      
      			return 0;
      		}
      
      		sleep(1);	// wait for PTRACE_ATTACH
      		assert(pthread_create(&thread, NULL, tfunc, NULL) == 0);
      		pthread_exit(NULL);
      	}
      
      	int main(void)
      	{
      		int child, stat;
      
      		child = fork();
      		if (!child)
      			return run_child();
      
      		assert(child == waitpid(-1, &stat, WSTOPPED));
      		assert(stat == 0x137f);
      
      		kill(child, SIGCONT);
      
      		assert(child == waitpid(-1, &stat, WCONTINUED));
      		assert(stat == 0xffff);
      
      		assert(child == waitpid(-1, &stat, 0));
      		assert(stat == 0x1300);
      
      		return 0;
      	}
      
      Without this patch it hangs in waitpid(WSTOPPED) because
      wait_task_stopped() is never called.

      Note: this doesn't fix all problems with a zombie delay_group_leader();
      the WCONTINUED | WEXITED check is not exactly right.  A debugger can't
      assume it will be notified if another thread reaps the whole thread group.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Lennart Poettering <lpoetter@redhat.com>
      Cc: Michal Schmidt <mschmidt@redhat.com>
      Cc: Roland McGrath <roland@hack.frob.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
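The three status values asserted in the test case above (0x137f, 0xffff, 0x1300) can be read with a small decoder. This sketch hard-codes the status encoding used by Linux so that the bit layout is visible; the enum, struct, and function names are hypothetical, and portable programs should use the `WIFSTOPPED`, `WIFCONTINUED`, `WIFEXITED` family from `<sys/wait.h>` instead.

```c
enum wait_kind { WK_EXITED, WK_SIGNALED, WK_STOPPED, WK_CONTINUED };

struct wait_info {
    enum wait_kind kind;
    int value;   /* exit code, or the signal that stopped/killed the child */
};

/* Decode a wait status using the Linux encoding (an assumption of this
 * sketch, not something POSIX guarantees). */
static struct wait_info decode_wait_status(int status)
{
    struct wait_info info;

    if (status == 0xffff) {                 /* resumed by SIGCONT */
        info.kind = WK_CONTINUED;
        info.value = 0;
    } else if ((status & 0xff) == 0x7f) {   /* stopped: signal in bits 8-15 */
        info.kind = WK_STOPPED;
        info.value = (status >> 8) & 0xff;
    } else if ((status & 0x7f) == 0) {      /* exited: code in bits 8-15 */
        info.kind = WK_EXITED;
        info.value = (status >> 8) & 0xff;
    } else {                                /* killed: signal in bits 0-6 */
        info.kind = WK_SIGNALED;
        info.value = status & 0x7f;
    }
    return info;
}
```

Read this way, 0x137f is a stop with signal number 0x13, 0xffff is a continue notification, and 0x1300 is a normal exit with code 0x13, matching the `raise(SIGSTOP)`, `kill(child, SIGCONT)`, and `exit(0x13)` steps in the test.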