1. 20 8月, 2010 8 次提交
  2. 18 8月, 2010 3 次提交
    • N
      fs: fs_struct rwlock to spinlock · 2a4419b5
      Nick Piggin 提交于
      fs: fs_struct rwlock to spinlock
      
      struct fs_struct.lock is an rwlock with the read-side used to protect root and
      pwd members while taking references to them. Taking a reference to a path
      typically requires just 2 atomic ops, so the critical section is very small.
      Parallel read-side operations would have cacheline contention on the lock, the
      dentry, and the vfsmount cachelines, so the rwlock is unlikely to ever give a
      real parallelism increase.
      
      Replace it with a spinlock to avoid one or two atomic operations in typical
      path lookup fastpath.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      2a4419b5
    • D
      Fix unprotected access to task credentials in waitid() · f362b732
      Daniel J Blueman 提交于
      Using a program like the following:
      
      	#include <stdlib.h>
      	#include <unistd.h>
      	#include <sys/types.h>
      	#include <sys/wait.h>
      
      	int main() {
      		id_t id;
      		siginfo_t infop;
      		pid_t res;
      
      		id = fork();
      		if (id == 0) { sleep(1); exit(0); }
      		kill(id, SIGSTOP);
      		alarm(1);
      		waitid(P_PID, id, &infop, WCONTINUED);
      		return 0;
      	}
      
      to call waitid() on a stopped process results in access to the child task's
      credentials without the RCU read lock being held - which may be replaced in the
      meantime - eliciting the following warning:
      
      	===================================================
      	[ INFO: suspicious rcu_dereference_check() usage. ]
      	---------------------------------------------------
      	kernel/exit.c:1460 invoked rcu_dereference_check() without protection!
      
      	other info that might help us debug this:
      
      	rcu_scheduler_active = 1, debug_locks = 1
      	2 locks held by waitid02/22252:
      	 #0:  (tasklist_lock){.?.?..}, at: [<ffffffff81061ce5>] do_wait+0xc5/0x310
      	 #1:  (&(&sighand->siglock)->rlock){-.-...}, at: [<ffffffff810611da>]
      	wait_consider_task+0x19a/0xbe0
      
      	stack backtrace:
      	Pid: 22252, comm: waitid02 Not tainted 2.6.35-323cd+ #3
      	Call Trace:
      	 [<ffffffff81095da4>] lockdep_rcu_dereference+0xa4/0xc0
      	 [<ffffffff81061b31>] wait_consider_task+0xaf1/0xbe0
      	 [<ffffffff81061d15>] do_wait+0xf5/0x310
      	 [<ffffffff810620b6>] sys_waitid+0x86/0x1f0
      	 [<ffffffff8105fce0>] ? child_wait_callback+0x0/0x70
      	 [<ffffffff81003282>] system_call_fastpath+0x16/0x1b
      
      This is fixed by holding the RCU read lock in wait_task_continued() to ensure
      that the task's current credentials aren't destroyed between us reading the
      cred pointer and us reading the UID from those credentials.
      
      Furthermore, protect wait_task_stopped() in the same way.
      
      We don't need to keep holding the RCU read lock once we've read the UID from
      the credentials as holding the RCU read lock doesn't stop the target task from
      changing its creds under us - so the credentials may be outdated immediately
      after we've read the pointer, lock or no lock.
      Signed-off-by: NDaniel J Blueman <daniel.blueman@gmail.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f362b732
    • D
      Make do_execve() take a const filename pointer · d7627467
      David Howells 提交于
      Make do_execve() take a const filename pointer so that kernel_execve() compiles
      correctly on ARM:
      
      arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type
      
      This also requires the argv and envp arguments to be consted twice, once for
      the pointer array and once for the strings the array points to.  This is
      because do_execve() passes a pointer to the filename (now const) to
      copy_strings_kernel().  A simpler alternative would be to cast the filename
      pointer in do_execve() when it's passed to copy_strings_kernel().
      
      do_execve() may not change any of the strings it is passed as part of the argv
      or envp lists as they are some of them in .rodata, so marking these strings as
      const should be fine.
      
      Further kernel_execve() and sys_execve() need to be changed to match.
      
      This has been test built on x86_64, frv, arm and mips.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-by: NRalf Baechle <ralf@linux-mips.org>
      Acked-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d7627467
  3. 17 8月, 2010 1 次提交
    • J
      kdb: fix compile error without CONFIG_KALLSYMS · b590cddf
      Jason Wessel 提交于
      If CONFIG_KGDB_KDB is set and CONFIG_KALLSYMS is not set the kernel
      will fail to build with the error:
      
      kernel/built-in.o: In function `kallsyms_symbol_next':
      kernel/debug/kdb/kdb_support.c:237: undefined reference to `kdb_walk_kallsyms'
      kernel/built-in.o: In function `kallsyms_symbol_complete':
      kernel/debug/kdb/kdb_support.c:193: undefined reference to `kdb_walk_kallsyms'
      
      The kdb_walk_kallsyms needs a #ifdef proper header to match the C
      implementation.  This patch also fixes the compiler warnings in
      kdb_support.c when compiling without CONFIG_KALLSYMS set.  The
      compiler warnings are a result of the kallsyms_lookup() macro not
      initializing the two of the pass by reference variables.
      Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
      Reported-by: NMichal Simek <monstr@monstr.eu>
      b590cddf
  4. 14 8月, 2010 2 次提交
    • M
      tracing: Sanitize value returned from write(trace_marker, "...", len) · 1aa54bca
      Marcin Slusarz 提交于
      When userspace code writes non-new-line-terminated string to trace_marker
      file, write handler appends new-line and returns number of bytes written
      to trace buffer, so
      write(fd, "abc", 3) will return 4
      
      That's unexpected and unfortunately it confuses glibc's fprintf function.
      
      Example:
      int main() {
        fprintf(stderr, "abc");
        return 0;
      }
      
      $ gcc test.c -o test
      $ echo mmiotrace > /sys/kernel/debug/tracing/current_tracer
      $ ./test 2>/sys/kernel/debug/tracing/trace_marker
      
      results in infinite loop:
      write(fd, "abc", 3) = 4
      write(fd, "", 1) = 0
      write(fd, "", 1) = 0
      write(fd, "", 1) = 0
      write(fd, "", 1) = 0
      write(fd, "", 1) = 0
      write(fd, "", 1) = 0
      write(fd, "", 1) = 0
      (...)
      
      ...and kernel trace buffer full of empty markers.
      
      Fix it by sanitizing write return value.
      Signed-off-by: NMarcin Slusarz <marcin.slusarz@gmail.com>
      LKML-Reference: <20100727231801.GB2826@joi.lan>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      1aa54bca
    • J
      time: Workaround gcc loop optimization that causes 64bit div errors · c7dcf87a
      John Stultz 提交于
      Early 4.3 versions of gcc apparently aggressively optimize the raw
      time accumulation loop, replacing it with a divide.
      
      On 32bit systems, this causes the following link errors:
      	undefined reference to `__umoddi3'
      	undefined reference to `__udivdi3'
      
      The gcc issue has been fixed in 4.4 and greater.
      
      This patch replaces the accumulation loop with a do_div, as suggested
      by Linus.
      Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
      CC: Jason Wessel <jason.wessel@windriver.com>
      CC: Larry Finger <Larry.Finger@lwfinger.net>
      CC: Ingo Molnar <mingo@elte.hu>
      CC: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c7dcf87a
  5. 13 8月, 2010 4 次提交
    • L
      Revert "fsnotify: store struct file not struct path" · 2069601b
      Linus Torvalds 提交于
      This reverts commit 3bcf3860 (and the
      accompanying commit c1e5c954 "vfs/fsnotify: fsnotify_close can delay
      the final work in fput" that was a horribly ugly hack to make it work at
      all).
      
      The 'struct file' approach not only causes that disgusting hack, it
      somehow breaks pulseaudio, probably due to some other subtlety with
      f_count handling.
      
      Fix up various conflicts due to later fsnotify work.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2069601b
    • S
      tracing/events: Convert format output to seq_file · 2a37a3df
      Steven Rostedt 提交于
      Two new events were added that broke the current format output.
      
      Both from the SCSI system: scsi_dispatch_cmd_done and scsi_dispatch_cmd_timeout
      
      The reason is that their print_fmt exceeded a page size. Since the output
      of the format used simple_read_from_buffer and trace_seq, it was limited
      to a page size in output.
      
      This patch converts the printing of the format of an event into seq_file,
      which allows greater than a page size to be shown.
      
      I diffed all event formats comparing the output with and without this
      patch. All matched except for the above two, which showed just:
      
        FORMAT TOO BIG
      
      without this patch, but now properly displays the output with this patch.
      
      v2: Remove updating *pos in seq start function.
         [ Thanks to Li Zefan for pointing that out ]
      Reviewed-by: NLi Zefan <lizf@cn.fujitsu.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Kei Tokunaga <tokunaga.keiich@jp.fujitsu.com>
      Cc: James Bottomley <James.Bottomley@suse.de>
      Cc: Tomohiro Kusumi <kusumi.tomohiro@jp.fujitsu.com>
      Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      2a37a3df
    • J
      timekeeping: Fix overflow in rawtime tv_nsec on 32 bit archs · deda2e81
      Jason Wessel 提交于
      The tv_nsec is a long and when added to the shifted interval it can wrap
      and become negative which later causes looping problems in the
      getrawmonotonic().  The edge case occurs when the system has slept for
      a short period of time of ~2 seconds.
      
      A trace printk of the values in this patch illustrate the problem:
      
      ftrace time stamp: log
      43.716079: logarithmic_accumulation: raw: 3d0913 tv_nsec d687faa
      43.718513: logarithmic_accumulation: raw: 3d0913 tv_nsec da588bd
      43.722161: logarithmic_accumulation: raw: 3d0913 tv_nsec de291d0
      46.349925: logarithmic_accumulation: raw: 7a122600 tv_nsec e1f9ae3b
      46.349930: logarithmic_accumulation: raw: 1e848980 tv_nsec 8831c0e3
      
      The kernel starts looping at 46.349925 in the getrawmonotonic() due to
      the negative value from adding the raw value to tv_nsec.
      
      A simple solution is to accumulate into a u64, and then normalize it
      to a timespec_t.
      Signed-off-by: NJason Wessel <jason.wessel@windriver.com>
       [ Reworked variable names and simplified some of the code. - John ]
      Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      deda2e81
    • D
      Add a dummy printk function for the maintenance of unused printks · 12fdff3f
      David Howells 提交于
      Add a dummy printk function for the maintenance of unused printks through gcc
      format checking, and also so that side-effect checking is maintained too.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      12fdff3f
  6. 12 8月, 2010 2 次提交
  7. 11 8月, 2010 18 次提交
  8. 10 8月, 2010 2 次提交