1. 13 Jan, 2012 1 commit
  2. 12 Jan, 2012 1 commit
  3. 11 Jan, 2012 5 commits
    • user namespace: make signal.c respect user namespaces · 6b550f94
      Serge E. Hallyn committed
      ipc/mqueue.c: for __SI_MESQ, convert the uid being sent to recipient's
      user namespace. (new, thanks Oleg)
      
      __send_signal: convert current's uid to the recipient's user namespace
      for any siginfo which is not SI_FROMKERNEL (patch from Oleg, thanks
      again :)
      
      do_notify_parent and do_notify_parent_cldstop: map task's uid to parent's
      user namespace
      
      ptrace_signal maps parent's uid into current's user namespace before
      including in signal to current.  IIUC Oleg has argued that this shouldn't
      matter as the debugger will play with it, but it seems like not converting
      the value currently being set is misleading.
      
      Changelog:
      Sep 20: Inspired by Oleg's suggestion, define map_cred_ns() helper to
              simplify callers and help make clear what we are translating
              (which uid into which namespace).  Passing the target task would
              make callers even easier to read, but we pass in user_ns because
              current_user_ns() != task_cred_xxx(current, user_ns).
      Sep 20: As recommended by Oleg, also put task_pid_vnr() under rcu_read_lock
              in ptrace_signal().
      Sep 23: In send_signal(), detect when (user) signal is coming from an
              ancestor or unrelated user namespace.  Pass that on to __send_signal,
              which sets si_uid to 0 or overflowuid if needed.
      Oct 12: Base on Oleg's fixup_uid() patch.  On top of that, handle all
              SI_FROMKERNEL cases at callers, because we can't assume sender is
              current in those cases.
      Nov 10: (mhelsley) rename fixup_uid to more meaningful userns_fixup_signal_uid
      Nov 10: (akpm) make the !CONFIG_USER_NS case clearer
      Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      From: Serge Hallyn <serge.hallyn@canonical.com>
      Subject: __send_signal: pass q->info, not info, to userns_fixup_signal_uid (v2)
      
      Eric Biederman pointed out that passing info is a bug and could lead to a
      NULL pointer deref to boot.
      
      A collection of signal, securebits, filecaps, cap_bounds, and a few other
      ltp tests passed with this kernel.
      
      Changelog:
          Nov 18: previous patch missed a leading '&'
      Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      From: Dan Carpenter <dan.carpenter@oracle.com>
      Subject: ipc/mqueue: lock() => unlock() typo
      
      There was a double lock typo introduced in b085f4bd6b21 "user namespace:
      make signal.c respect user namespaces"
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Serge Hallyn <serge@hallyn.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • workqueue: make alloc_workqueue() take printf fmt and args for name · b196be89
      Tejun Heo committed
      alloc_workqueue() currently expects the passed in @name pointer to remain
      accessible.  This is inconvenient and a bit silly given that the whole wq
      is being dynamically allocated.  This patch updates alloc_workqueue() and
      friends to take printf format string instead of opaque string and matching
      varargs at the end.  The name is allocated together with the wq and
      formatted.
      
      alloc_ordered_workqueue() is converted to a macro to unify varargs
      handling with alloc_workqueue(), and, while at it, add comment to
      alloc_workqueue().
      
      None of the current in-kernel users pass in string with '%' as constant
      name and this change shouldn't cause any problem.
      
      [akpm@linux-foundation.org: use __printf]
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Suggested-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
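
The idea of formatting the name into the same allocation as the object can be sketched in userspace C. This is only an illustration under stated assumptions: `struct wq_sketch`, `wq_sketch_alloc()` and `wq_sketch_selftest()` are hypothetical names, and `malloc()` stands in for the kernel allocator. It does not reproduce the kernel implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the workqueue object: the name lives inline,
 * in the same allocation as the rest of the structure. */
struct wq_sketch {
	unsigned int flags;
	char name[];		/* formatted name, NUL-terminated */
};

/* Format the name from fmt and the varargs, allocating it with the object. */
struct wq_sketch *wq_sketch_alloc(unsigned int flags, const char *fmt, ...)
{
	va_list args, args_copy;
	struct wq_sketch *wq;
	int len;

	va_start(args, fmt);
	va_copy(args_copy, args);
	/* First pass: measure the formatted length without writing anything. */
	len = vsnprintf(NULL, 0, fmt, args_copy);
	va_end(args_copy);
	if (len < 0) {
		va_end(args);
		return NULL;
	}

	wq = malloc(sizeof(*wq) + (size_t)len + 1);
	if (wq) {
		wq->flags = flags;
		/* Second pass: format into the tail of the allocation. */
		vsnprintf(wq->name, (size_t)len + 1, fmt, args);
	}
	va_end(args);
	return wq;
}

/* Usage example: the caller's buffers need not outlive the call. */
void wq_sketch_selftest(void)
{
	struct wq_sketch *wq = wq_sketch_alloc(0, "events/%d", 3);

	assert(wq != NULL);
	assert(strcmp(wq->name, "events/3") == 0);
	free(wq);
}
```

Because the name is copied at allocation time, a caller can build it from a temporary buffer. A constant name containing '%' would now be interpreted as a format, which is the case the commit message says no in-kernel user hits.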
    • signal: add block_sigmask() for adding sigmask to current->blocked · 5e6292c0
      Matt Fleming committed
      Abstract the code sequence for adding a signal handler's sa_mask to
      current->blocked because the sequence is identical for all architectures.
      Furthermore, in the past some architectures actually got this code wrong,
      so introduce a wrapper that all architectures can use.
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tracepoint: add tracepoints for debugging oom_score_adj · 43d2b113
      KAMEZAWA Hiroyuki committed
      oom_score_adj is used for guarding processes from the OOM killer.  One
      problem is that it is inherited at fork().  When a daemon sets
      oom_score_adj and creates children, it is hard to know where the value
      was set.
      
      This patch adds three tracepoints useful for debugging:
        - creating new task
        - renaming a task (exec)
        - set oom_score_adj
      
      To debug, users need to enable some tracepoints.  Filtering may be
      useful, for example:
      
      # EVENT=/sys/kernel/debug/tracing/events/task/
      # echo "oom_score_adj != 0" > $EVENT/task_newtask/filter
      # echo "oom_score_adj != 0" > $EVENT/task_rename/filter
      # echo 1 > $EVENT/enable
      # EVENT=/sys/kernel/debug/tracing/events/oom/
      # echo 1 > $EVENT/enable
      
      The output will look like this:
      # grep oom /sys/kernel/debug/tracing/trace
      bash-7699  [007] d..3  5140.744510: oom_score_adj_update: pid=7699 comm=bash oom_score_adj=-1000
      bash-7699  [007] ...1  5151.818022: task_newtask: pid=7729 comm=bash clone_flags=1200011 oom_score_adj=-1000
      ls-7729  [003] ...2  5151.818504: task_rename: pid=7729 oldcomm=bash newcomm=ls oom_score_adj=-1000
      bash-7699  [002] ...1  5175.701468: task_newtask: pid=7730 comm=bash clone_flags=1200011 oom_score_adj=-1000
      grep-7730  [007] ...2  5175.701993: task_rename: pid=7730 oldcomm=bash newcomm=grep oom_score_adj=-1000
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • PM/Hibernate: do not count debug pages as savable · c6968e73
      Stanislaw Gruszka committed
      When debugging with CONFIG_DEBUG_PAGEALLOC and debug_guardpage_minorder >
      0, we have a lot of free pages that are not marked as such.  The snapshot
      code counts them as savable, which causes hibernate memory preallocation
      to fail.
      
      It is pretty hard to make hibernate allocation succeed with
      debug_guardpage_minorder=1.  This change at least makes it possible when
      the system has a relatively large amount of RAM.
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 10 Jan, 2012 1 commit
  5. 09 Jan, 2012 1 commit
  6. 07 Jan, 2012 2 commits
  7. 06 Jan, 2012 1 commit
    • cgroup: fix to allow mounting a hierarchy by name · 0d19ea86
      Li Zefan committed
      If we mount a hierarchy with a specified name, the name is unique,
      and we can use it to mount the hierarchy without specifying its
      set of subsystem names. This feature is documented in
      Documentation/cgroups/cgroups.txt section 2.3.
      
      Here's an example:
      
      	# mount -t cgroup -o cpuset,name=myhier xxx /cgroup1
      	# mount -t cgroup -o name=myhier xxx /cgroup2
      
      But it was broken by commit 32a8cf23
      (cgroup: make the mount options parsing more accurate)
      
      This fixes the regression.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
  8. 05 Jan, 2012 3 commits
    • PM / Hibernate: Implement compat_ioctl for /dev/snapshot · c336078b
      Ben Hutchings committed
      This allows uswsusp built for i386 to run on an x86_64 kernel (tested
      with Debian package version 1.0+20110509-2).
      
      References: http://bugs.debian.org/502816
      Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
    • ptrace: ensure JOBCTL_STOP_SIGMASK is not zero after detach · 8a88951b
      Oleg Nesterov committed
      This is the temporary simple fix for 3.2, we need more changes in this
      area.
      
      1. do_signal_stop() assumes that the running untraced thread in the
         stopped thread group is not possible. This was our goal but it is
         not yet achieved: a stopped-but-resumed tracee can clone the running
         thread which can initiate another group-stop.
      
         Remove WARN_ON_ONCE(!current->ptrace).
      
      2. A new thread always starts with ->jobctl = 0. If it is auto-attached
         and this group is stopped, __ptrace_unlink() sets JOBCTL_STOP_PENDING
         but JOBCTL_STOP_SIGMASK part is zero, this triggers WARN_ON(!signr)
         in do_jobctl_trap() if another debugger attaches.
      
         Change __ptrace_unlink() to set the artificial SIGSTOP for report.
      
         Alternatively we could change ptrace_init_task() to copy signr from
         current, but this means we can copy it for no reason and hide the
         possible similar problems.
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: <stable@kernel.org>		[3.1]
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
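
The second point can be illustrated with a small bit-manipulation sketch. The constant values and the helper name `jobctl_set_stop_pending()` are assumptions made for the illustration (the values are believed to mirror the kernel's jobctl layout but are not taken from the commit message), so treat this as a model of the idea rather than the actual `__ptrace_unlink()` change.

```c
#include <signal.h>

/* Illustrative values, assumed to mirror the kernel's jobctl layout. */
#define JOBCTL_STOP_SIGMASK	0xffffUL	/* signr of the last group stop */
#define JOBCTL_STOP_PENDING	(1UL << 17)	/* task should stop for group stop */

/* Hypothetical helper: mark a just-detached task as having a pending stop. */
unsigned long jobctl_set_stop_pending(unsigned long jobctl)
{
	jobctl |= JOBCTL_STOP_PENDING;
	/*
	 * A freshly created thread starts with jobctl == 0, so it has no
	 * recorded stop signal.  Fill in an artificial SIGSTOP so that a
	 * later reader never sees a zero signr.
	 */
	if (!(jobctl & JOBCTL_STOP_SIGMASK))
		jobctl |= SIGSTOP;
	return jobctl;
}
```

A task that already carries a stop signal keeps it. Only the empty case is patched up, which matches the stated intent of not hiding similar problems elsewhere.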
    • ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE race · 50b8d257
      Oleg Nesterov committed
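      Test-case:
      
      	#include <assert.h>
      	#include <stdio.h>
      	#include <unistd.h>
      	#include <sys/ptrace.h>
      	#include <sys/wait.h>
      
      	int main(void)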
      	{
      		int pid, status;
      
      		pid = fork();
      		if (!pid) {
      			for (;;) {
      				if (!fork())
      					return 0;
      				if (waitpid(-1, &status, 0) < 0) {
      					printf("ERR!! wait: %m\n");
      					return 0;
      				}
      			}
      		}
      
      		assert(ptrace(PTRACE_ATTACH, pid, 0,0) == 0);
      		assert(waitpid(-1, NULL, 0) == pid);
      
      		assert(ptrace(PTRACE_SETOPTIONS, pid, 0,
      					PTRACE_O_TRACEFORK) == 0);
      
      		do {
      			ptrace(PTRACE_CONT, pid, 0, 0);
      			pid = waitpid(-1, NULL, 0);
      		} while (pid > 0);
      
      		return 1;
      	}
      
      It fails because ->real_parent sees its child in EXIT_DEAD state
      while the tracer is going to change the state back to EXIT_ZOMBIE
      in wait_task_zombie().
      
      The offending commit is 823b018e which moved the EXIT_DEAD check,
      but in fact we should not blame it. The original code was not
      correct as well because it didn't take ptrace_reparented() into
      account and because we can't really trust ->ptrace.
      
      This patch adds the additional check to close this particular
      race but it doesn't solve the whole problem. We simply can't
      rely on ->ptrace in this case, it can be cleared if the tracer
      is multithreaded by the exiting ->parent.
      
      I think we should kill EXIT_DEAD altogether, we should always
      remove the soon-to-be-reaped child from ->children or at least
      we should never do the DEAD->ZOMBIE transition. But this is too
      complex for 3.2.
      Reported-and-tested-by: Denys Vlasenko <vda.linux@googlemail.com>
      Tested-by: Lukasz Michalik <lmi@ift.uni.wroc.pl>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: <stable@kernel.org>		[3.0+]
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 04 Jan, 2012 11 commits
  10. 02 Jan, 2012 1 commit
  11. 01 Jan, 2012 1 commit
    • futex: Fix uninterruptible loop due to gate_area · e6780f72
      Hugh Dickins committed
      It was found (by Sasha) that if you use a futex located in the gate
      area we get stuck in an uninterruptible infinite loop, much like the
      ZERO_PAGE issue.
      
      While looking at this problem, PeterZ realized you'll get into similar
      trouble when hitting any install_special_pages() mapping.  And are there
      still drivers setting up their own special mmaps without page->mapping,
      and without special VM or pte flags to make get_user_pages fail?
      
      In most cases, if page->mapping is NULL, we do not need to retry at all:
      Linus points out that even /proc/sys/vm/drop_caches poses no problem,
      because it ends up using remove_mapping(), which takes care not to
      interfere when the page reference count is raised.
      
      But there is still one case which does need a retry: if memory pressure
      called shmem_writepage in between get_user_pages_fast dropping page
      table lock and our acquiring page lock, then the page gets switched from
      filecache to swapcache (and ->mapping set to NULL) whatever the refcount.
      Fault it back in to get the page->mapping needed for key->shared.inode.
      Reported-by: Sasha Levin <levinsasha928@gmail.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
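
The retry decision described above can be modelled as a tiny pure function. Everything in this sketch is hypothetical: `struct page_sketch`, `enum key_action` and `futex_page_check_sketch()` are invented for illustration, and returning `-EFAULT` in the no-retry case is an assumption, since the commit message only says that no retry is needed.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the page state the check inspects. */
struct page_sketch {
	const void *mapping;	/* NULL when the page has no mapping */
	bool in_swapcache;	/* true if shmem moved the page to swapcache */
};

enum key_action { KEY_OK, KEY_RETRY };

/*
 * Returns 0 and sets *action when a key can be built now or after a
 * retry; returns -EFAULT (assumed error code) when retrying cannot help.
 */
int futex_page_check_sketch(const struct page_sketch *page,
			    enum key_action *action)
{
	if (page->mapping != NULL) {
		*action = KEY_OK;	/* mapping present, key can be built */
		return 0;
	}
	if (page->in_swapcache) {
		*action = KEY_RETRY;	/* fault it back in to regain ->mapping */
		return 0;
	}
	/* Gate area or other special mapping: looping would never end. */
	return -EFAULT;
}
```

Distinguishing the swapcache case from every other NULL mapping is what turns the former unconditional retry into a bounded one.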
  12. 31 Dec, 2011 1 commit
  13. 28 Dec, 2011 4 commits
  14. 27 Dec, 2011 1 commit
  15. 24 Dec, 2011 2 commits
  16. 23 Dec, 2011 1 commit
  17. 22 Dec, 2011 3 commits