1. 24 Jan, 2014 (4 commits)
    • proc: fix ->f_pos overflows in first_tid() · 9f6e963f
      Oleg Nesterov committed
      1. proc_task_readdir()->first_tid() path truncates f_pos to int; this
         is wrong even on 64bit.
      
         We could check that f_pos < PID_MAX or even INT_MAX in
         proc_task_readdir(), but this patch simply checks for the potential
         overflow in first_tid(); this check is a no-op on 64bit.  We do not
         care if it was negative and the new unsigned value is huge; all we
         need to ensure is that we never wrongly return !NULL.
      
      2. Remove the 2nd "nr != 0" check before get_nr_threads():
         nr_threads == 0 is not distinguishable from !pid_task() above.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Sameer Nanda <snanda@chromium.org>
      Cc: Sergey Dyasly <dserrg@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
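      The truncation problem and the round-trip check can be sketched in user-space C. This is an illustration under stated assumptions, not the kernel code: old_truncated_nr() and fpos_fits_nr() are hypothetical helpers, int64_t stands in for loff_t, and the int truncation relies on the modular conversion that GCC and Clang perform.

```c
#include <stdbool.h>
#include <stdint.h>

/* Roughly what the old path did: squeeze the 64-bit position into an int.
 * With GCC/Clang's modular conversion, 2^32 + 3 silently becomes 3. */
static int old_truncated_nr(int64_t f_pos)
{
    return (int)f_pos;
}

/* The idea of the fix (hypothetical helper): convert to unsigned long and
 * reject the value if it did not survive the conversion. With a 64-bit
 * unsigned long the comparison always succeeds, so it is a no-op; with a
 * 32-bit one, positions that do not fit (including negative ones) are
 * refused. On 64-bit a negative f_pos becomes a huge nr, which is harmless
 * as long as the caller then finds fewer than nr threads and returns NULL. */
static bool fpos_fits_nr(int64_t f_pos, unsigned long *nr)
{
    *nr = (unsigned long)f_pos;
    return (uint64_t)*nr == (uint64_t)f_pos;
}
```

      On a 64-bit build, fpos_fits_nr() accepts 2^32 + 3 unchanged, whereas old_truncated_nr() maps it to 3 and would have pointed at the wrong thread.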
    • proc: don't (ab)use ->group_leader in proc_task_readdir() paths · d855a4b7
      Oleg Nesterov committed
      proc_task_readdir() does not really need "leader", first_tid() has to
      revalidate it anyway.  Just pass proc_pid(inode) to first_tid() instead;
      it can do pid_task(PIDTYPE_PID) itself and read ->group_leader only if
      necessary.
      
      The patch also extracts the "inode is dead" code from
      pid_delete_dentry(dentry) into the new trivial helper,
      proc_inode_is_dead(inode), proc_task_readdir() uses it to return -ENOENT
      if this dir was removed.
      
      This is a bit racy, but the race is very unlikely, and getdents() after
      opendir() can see the empty "." + ".." dir only once.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Sameer Nanda <snanda@chromium.org>
      Cc: Sergey Dyasly <dserrg@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
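      From user space, the path touched here is simply a directory listing of /proc/<pid>/task. A minimal sketch using the standard opendir()/readdir() API to list the calling process's own threads; the entries it sees are produced in the kernel by proc_task_readdir(). count_own_threads() is a hypothetical helper, and the sketch assumes /proc is mounted.

```c
#include <dirent.h>
#include <string.h>

/* Hypothetical helper: count the entries of /proc/self/task, i.e. the
 * threads of the calling process, skipping "." and "..".
 * Returns -1 if the directory cannot be opened. */
static int count_own_threads(void)
{
    DIR *dir = opendir("/proc/self/task");
    if (dir == NULL)
        return -1;

    int count = 0;
    const struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        count++;   /* every other entry is named after a thread ID */
    }
    closedir(dir);
    return count;
}
```

      In a single-threaded program the count is 1; each additional thread adds one numeric TID entry.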
    • proc: change first_tid() to use while_each_thread() rather than next_thread() · c986c14a
      Oleg Nesterov committed
      Rewrite the main loop to use while_each_thread() instead of
      next_thread().  We are going to fix or replace while_each_thread(), and
      next_thread() should be avoided whenever possible.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Sameer Nanda <snanda@chromium.org>
      Cc: Sergey Dyasly <dserrg@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
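      The loop shape that while_each_thread() imposes can be illustrated with a small user-space model. Everything in it is hypothetical: struct fake_task, the two macros and nth_thread() only mimic the kernel's circular thread list and its "t = g; do { ... } while_each_thread(g, t);" idiom, and nth_thread() only approximates the "skip nr threads" job that first_tid() performs.

```c
#include <stddef.h>

/* Hypothetical stand-in for a thread: the group is a circular list that
 * starts at the leader and wraps back to it. */
struct fake_task {
    int tid;
    struct fake_task *next;
};

/* Modelled on the kernel's idiom:
 *     t = g; do { ... } while_each_thread(g, t);
 * The body runs once for every thread of the group, leader first. */
#define next_fake_thread(t)           ((t)->next)
#define while_each_fake_thread(g, t)  while (((t) = next_fake_thread(t)) != (g))

/* Return the nr-th thread (0 = leader), or NULL if the group has fewer
 * than nr + 1 threads. */
static struct fake_task *nth_thread(struct fake_task *leader, unsigned long nr)
{
    struct fake_task *pos = leader;

    do {
        if (nr-- == 0)
            return pos;
    } while_each_fake_thread(leader, pos);

    return NULL;   /* walked all the way round without reaching nr */
}
```

      Because the walk stops as soon as it returns to the leader, an out-of-range nr (including a huge one) yields NULL instead of wrapping around.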
    • proc: fix the potential use-after-free in first_tid() · 940fe479
      Oleg Nesterov committed
      proc_task_readdir() verifies that the result of get_proc_task() is
      pid_alive() and thus its ->group_leader is fine too.  However, this is not
      necessarily true after rcu_read_unlock(); we need to recheck it
      after first_tid() does rcu_read_lock().  Otherwise
      leader->thread_group.next (used by next_thread()) can be invalid if the
      rcu grace period expires in between.
      
      The race is subtle and unlikely, but still it is possible afaics.  To
      simplify, let's ignore the "likely" case when tid != 0; f_version can be
      cleared by proc_task_operations->llseek().
      
      Suppose we have a main thread M and its subthread T.  Suppose that f_pos
      == 3, iow first_tid() should return T.  Now suppose that the following
      happens between rcu_read_unlock() and rcu_read_lock():
      
      	1. T execs and becomes the new leader. This removes M from
      	   ->thread_group but next_thread(M) is still T.
      
      	2. T creates another thread X which does exec as well, T
      	   goes away.
      
      	3. X creates another subthread, this increments nr_threads.
      
      	4. first_tid() does next_thread(M) and returns the already
      	   dead T.
      
      Note also that we need steps 2. and 3. only because of the get_nr_threads()
      check, and this check was supposed to be an optimization only.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Sameer Nanda <snanda@chromium.org>
      Cc: Sergey Dyasly <dserrg@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 06 Nov, 2013 (1 commit)
  3. 29 Jun, 2013 (5 commits)
  4. 28 May, 2013 (1 commit)
  5. 02 May, 2013 (2 commits)
  6. 01 May, 2013 (1 commit)
  7. 18 Apr, 2013 (2 commits)
    • posix-timers: Show sigevent info in proc file · 57b8015e
      Pavel Emelyanov committed
      The previous patch added a proc file that lists the posix timers created
      by a task.  Expand the information provided in this file by adding info
      about the notification method with which each timer was created: after
      the "ID:" line there now follow
      
      1. "signal:" line, that shows signal number and sigval bits;
      2. "notify:" line, that shows the timer notification method.
      
      Thus a timer entry would look like this:
      
      ID: 123
      signal: 14/0000000000b005d0
      notify: signal/pid.732
      
      This information is enough to understand how timer_create() was called
      for each particular timer.
      Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Matthew Helsley <matt.helsley@gmail.com>
      Link: http://lkml.kernel.org/r/513DA024.80404@parallels.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
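      A sketch of how an external tool might parse one entry in the format shown above, using sscanf(). The struct and function names are hypothetical, the parser only accepts the "pid." form that appears in the example, and it assumes the three lines have already been split out of the file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one parsed entry (only the fields shown in
 * the example above). */
struct timer_entry {
    int id;
    int signo;                    /* number before the '/' */
    unsigned long long sigval;    /* hex bits after the '/' */
    char notify[16];              /* notification method, e.g. "signal" */
    int notify_pid;               /* number after "pid." */
};

/* Parse the "ID:", "signal:" and "notify:" lines of one entry.
 * Returns 0 on success, -1 if any line does not match the format. */
static int parse_timer_entry(const char *id_line, const char *signal_line,
                             const char *notify_line, struct timer_entry *e)
{
    memset(e, 0, sizeof *e);
    if (sscanf(id_line, "ID: %d", &e->id) != 1)
        return -1;
    if (sscanf(signal_line, "signal: %d/%llx", &e->signo, &e->sigval) != 2)
        return -1;
    if (sscanf(notify_line, "notify: %15[^/]/pid.%d",
               e->notify, &e->notify_pid) != 2)
        return -1;
    return 0;
}
```

      For the sample entry this yields id 123, signal 14, sigval 0xb005d0, method "signal" and pid 732.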
    • posix-timers: Introduce /proc/PID/timers file · 48f6a7a5
      Pavel Emelyanov committed
      Currently the kernel doesn't provide any API for getting info about what
      posix timers are configured by processes. It's implied that a process
      which configured some timers knows what it did. However, for external
      tools it's impossible to get this information. In particular, having
      this info is critical for the checkpoint-restore project.
      
      Introduce a per-pid proc file with information about posix
      timers. Since these timers are shared between threads, this file is
      present at the tgid level only; there is no such file in the tid subdirs.
      
      The file format is expected to be "/proc/<pid>/smaps"-like,
      i.e. each timer will occupy several lines to allow for future
      extension.
      
      Each new timer entry starts with the
      
      ID: <number>
      
      line which is added by this patch.
      Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Matthew Helsley <matt.helsley@gmail.com>
      Link: http://lkml.kernel.org/r/513DA00D.6070009@parallels.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
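      Because every entry begins with an "ID:" line, a reader can delimit entries without knowing how many lines each one has, which is what the smaps-like layout buys for future extension. A minimal sketch with a hypothetical helper that counts entries in an in-memory dump of the file:

```c
#include <string.h>

/* Hypothetical helper: count entries in a /proc/<pid>/timers-style dump.
 * Each entry starts with a line beginning "ID:"; the lines that follow it
 * belong to the same entry, however many there are. */
static int count_timer_entries(const char *text)
{
    int count = 0;
    const char *line = text;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, "ID:", 3) == 0)
            count++;
        const char *nl = strchr(line, '\n');
        line = (nl != NULL) ? nl + 1 : NULL;   /* advance to the next line */
    }
    return count;
}
```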
  8. 10 Apr, 2013 (1 commit)
  9. 28 Feb, 2013 (1 commit)
  10. 26 Feb, 2013 (2 commits)
  11. 23 Feb, 2013 (1 commit)
  12. 21 Dec, 2012 (1 commit)
  13. 12 Dec, 2012 (1 commit)
  14. 11 Dec, 2012 (1 commit)
  15. 27 Nov, 2012 (1 commit)
  16. 19 Nov, 2012 (3 commits)
    • pidns: Make the pidns proc mount/umount logic obvious. · 0a01f2cc
      Eric W. Biederman committed
      Track the number of pids in the proc hash table.  When the number of
      pids goes to 0 schedule work to unmount the kernel mount of proc.
      
      Move the mount of proc into alloc_pid when we allocate the pid for
      init.
      
      Remove the surprising calls to pid_ns_release_proc in fork and
      proc_flush_task.  Those code paths really shouldn't know about proc
      namespace implementation details and people have demonstrated several
      times that finding and understanding those code paths is difficult and
      non-obvious.
      
      Because of the call path, detach_pid() is always called with the
      rtnl_lock held, so free_pid() is not allowed to sleep and the work of
      unmounting proc is moved to a work queue.  This has the side benefit
      of not blocking the entire world waiting for the unnecessary
      rcu_barrier in deactivate_locked_super.
      
      In the process of making the code clear and obvious this fixes a bug
      reported by Gao feng <gaofeng@cn.fujitsu.com> where we would leak a
      mount of proc during clone(CLONE_NEWPID|CLONE_NEWNET) if copy_pid_ns
      succeeded and copy_net_ns failed.
      Acked-by: "Serge E. Hallyn" <serge@hallyn.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
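      The counting and deferred-unmount scheme described above can be modelled in a few lines. This is a hypothetical user-space model, not the kernel's data structures: struct fake_pid_ns, the three fake_* functions, and the boolean that stands in for schedule_work() are invented for illustration. What it shows is the split between the non-sleeping path, which only queues the unmount, and a later context that performs it.

```c
#include <stdbool.h>

/* Hypothetical model of a pid namespace's proc bookkeeping. */
struct fake_pid_ns {
    int nr_hashed;        /* pids currently in the hash table */
    bool proc_mounted;    /* kernel-internal proc mount exists */
    bool unmount_queued;  /* stands in for work handed to schedule_work() */
};

/* Allocating the first pid (init's) also mounts proc. */
static void fake_alloc_pid(struct fake_pid_ns *ns)
{
    if (ns->nr_hashed++ == 0)
        ns->proc_mounted = true;
}

/* Runs where sleeping is not allowed: only queue the unmount. */
static void fake_free_pid(struct fake_pid_ns *ns)
{
    if (--ns->nr_hashed == 0)
        ns->unmount_queued = true;
}

/* Runs later in a context that may sleep, like a work item. */
static void fake_run_work(struct fake_pid_ns *ns)
{
    if (ns->unmount_queued) {
        ns->unmount_queued = false;
        ns->proc_mounted = false;
    }
}
```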
    • procfs: Don't cache a pid in the root inode. · ae06c7c8
      Eric W. Biederman committed
      Now that we have s_fs_info pointing to our pid namespace
      the original reason for the proc root inode having a struct
      pid is gone.
      
      Caching a pid in the root inode has led to some complicated
      code.  Now that we don't need the struct pid, just remove it.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • procfs: Use the proc generic infrastructure for proc/self. · e656d8a6
      Eric W. Biederman committed
      I had visions at one point of splitting proc into two filesystems.  If
      that had happened, proc/self being the part of proc that actually deals
      with pids would have been a nice cleanup.  As it is, proc/self requires
      a lot of unnecessary infrastructure for a single file.
      
      The only user visible change is that a mounted /proc for a pid namespace
      that is dead now shows a broken proc symlink, instead of being completely
      invisible.  I don't think anyone will notice or care.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
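      From user space, /proc/self is just a symlink whose target is the reader's pid, so resolving it is a quick way to see what this commit reimplements. A minimal sketch with readlink(2); proc_self_pid() is a hypothetical helper, and it assumes /proc is mounted for the caller's own pid namespace (otherwise the number need not match getpid()).

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: resolve the /proc/self symlink and return the pid
 * it names, or -1 if it cannot be read or is not a plain number. */
static long proc_self_pid(void)
{
    char buf[32];
    ssize_t n = readlink("/proc/self", buf, sizeof buf - 1);
    if (n <= 0)
        return -1;
    buf[n] = '\0';              /* readlink() does not NUL-terminate */

    char *end;
    long pid = strtol(buf, &end, 10);
    return (*end == '\0') ? pid : -1;
}
```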
  17. 17 Nov, 2012 (1 commit)
  18. 30 Oct, 2012 (1 commit)
    • sched/autogroup: Fix crash on reboot when autogroup is disabled · 5258f386
      Mike Galbraith committed
      Due to these two commits:
      
        8323f26c sched: Fix race in task_group()
        800d4d30 sched, autogroup: Stop going ahead if autogroup is disabled
      
      ... autogroup scheduling's dynamic knobs are wrecked.
      
      With both patches applied, all you have to do to crash a box is
      disable autogroup during boot up, then reboot.. boom, NULL pointer
      dereference due to 800d4d30 not allowing autogroup to move things,
      and 8323f26c making that the only way to switch runqueues.
      
      Remove most of the (dysfunctional) knobs and turn the remaining
      sched_autogroup_enabled knob readonly.
      
      If the user fiddles with cgroups hereafter, once tasks
      are moved, autogroup won't mess with them again unless
      they call setsid().
      
      No knobs, no glitz, nada, just a cute little thing folks can
      turn on if they don't want to muck about with cgroups and/or
      systemd.
      Signed-off-by: Mike Galbraith <efault@gmx.de>
      Cc: Xiaotian Feng <xtfeng@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Xiaotian Feng <dannyfeng@tencent.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: <stable@vger.kernel.org> # v3.6
      Link: http://lkml.kernel.org/r/1351451963.4999.8.camel@maggy.simpson.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  19. 13 Oct, 2012 (1 commit)
  20. 09 Oct, 2012 (1 commit)
  21. 27 Sep, 2012 (3 commits)
  22. 18 Sep, 2012 (2 commits)
    • userns: Add kprojid_t and associated infrastructure in projid.h · f76d207a
      Eric W. Biederman committed
      Implement kprojid_t, a cousin of kuid_t and kgid_t.
      
      The per user namespace mapping of project id values can be set with
      /proc/<pid>/projid_map.
      
      A full complement of helpers is provided: make_kprojid, from_kprojid,
      from_kprojid_munged, kprojid_has_mapping, projid_valid, projid_eq,
      projid_lt.
      
      Project identifiers are part of the generic disk quota interface,
      although it appears only xfs implements project identifiers currently.
      
      The xfs code allows anyone who has permission to set the project
      identifier on a file to use any project identifier so when
      setting up the user namespace project identifier mappings I do
      not require a capability.
      
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jan Kara <jack@suse.cz>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
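      The projid_map file uses the same three-column layout as uid_map and gid_map (documented in user_namespaces(7)): first id inside the namespace, first id outside it, and the length of the range. A hypothetical sketch of how one such map translates ids in both directions; map_id_down() corresponds to the job make_kprojid() has to do and map_id_up() to from_kprojid(). The struct and function names are illustrative only, and the sketch assumes the "outside" ids are already kernel-wide ids (as for a namespace whose parent is the initial one).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one line of an id map such as projid_map:
 * "<first id in the namespace> <first id outside> <count>". */
struct id_extent {
    uint32_t first;        /* first id as seen inside the namespace */
    uint32_t lower_first;  /* corresponding kernel-wide id */
    uint32_t count;        /* length of the range */
};

#define INVALID_ID ((uint32_t)-1)

/* Namespace id -> kernel id (the direction make_kprojid() needs). */
static uint32_t map_id_down(const struct id_extent *map, size_t n, uint32_t id)
{
    for (size_t i = 0; i < n; i++) {
        /* id - first < count avoids overflowing first + count */
        if (id >= map[i].first && id - map[i].first < map[i].count)
            return map[i].lower_first + (id - map[i].first);
    }
    return INVALID_ID;
}

/* Kernel id -> namespace id (the direction from_kprojid() needs). */
static uint32_t map_id_up(const struct id_extent *map, size_t n, uint32_t id)
{
    for (size_t i = 0; i < n; i++) {
        if (id >= map[i].lower_first && id - map[i].lower_first < map[i].count)
            return map[i].first + (id - map[i].lower_first);
    }
    return INVALID_ID;
}
```

      For example, the line "0 100000 1000" makes project id 5 inside the namespace correspond to 100005 outside, while 1000 stays unmapped.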
    • userns: Convert the audit loginuid to be a kuid · e1760bd5
      Eric W. Biederman committed
      Always store audit loginuids in type kuid_t.
      
      Print loginuids by converting them into uids in the appropriate user
      namespace, and then printing the resulting uid.
      
      Modify audit_get_loginuid to return a kuid_t.
      
      Modify audit_set_loginuid to take a kuid_t.
      
      Modify /proc/<pid>/loginuid on read to convert the loginuid into the
      user namespace of the opener of the file.
      
      Modify /proc/<pid>/loginuid on write to convert the loginuid
      from the user namespace of the opener of the file.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
  23. 31 Jul, 2012 (2 commits)
    • proc: do not allow negative offsets on /proc/<pid>/environ · bc452b4b
      Djalal Harouni committed
      __mem_open(), which is called by the ->open() handlers of both
      /proc/<pid>/environ and /proc/<pid>/mem, allows the use of negative
      offsets.  /proc/<pid>/mem needs negative offsets, but
      /proc/<pid>/environ does not.
      
      Clean this up by moving the 'force FMODE_UNSIGNED_OFFSET flag' to
      mem_open(), so that negative offsets are allowed only on /proc/<pid>/mem.
      Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Brad Spengler <spender@grsecurity.net>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
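      The user-visible effect can be checked with lseek(2): on a kernel that includes this change, seeking /proc/<pid>/environ to a negative offset should fail with EINVAL. A minimal sketch, assuming /proc is mounted; environ_rejects_negative_seek() is a hypothetical helper. lseek() is used rather than pread() because, on current kernels, pread() refuses negative offsets for every file regardless of this flag.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open /proc/self/environ and try to seek to offset
 * -1. Returns 1 if the kernel refuses with EINVAL, 0 if the seek is
 * accepted (or fails differently), and -1 if the file cannot be opened. */
static int environ_rejects_negative_seek(void)
{
    int fd = open("/proc/self/environ", O_RDONLY);
    if (fd < 0)
        return -1;

    errno = 0;
    off_t r = lseek(fd, (off_t)-1, SEEK_SET);
    int saved_errno = errno;
    close(fd);

    return (r == (off_t)-1 && saved_errno == EINVAL) ? 1 : 0;
}
```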
    • proc: environ_read() make sure offset points to environment address range · e8905ec2
      Djalal Harouni committed
      Currently the following offset and environment address range check in
      environ_read() of /proc/<pid>/environ is buggy:
      
        int this_len = mm->env_end - (mm->env_start + src);
        if (this_len <= 0)
          break;
      
      Large or negative offsets on /proc/<pid>/environ converted to 'unsigned
      long' may pass this check since '(mm->env_start + src)' can overflow and
      'this_len' will be positive.
      
      This can make /proc/<pid>/environ act like /proc/<pid>/mem, since
      (mm->env_start + src) will point to, and read from, another VMA.
      
      There are two fixes here plus some code cleanup:
      
      1) Fix the overflow by checking if the offset that was converted to
         unsigned long will always point to the [mm->env_start, mm->env_end]
         address range.
      
      2) Remove the truncation that was applied to the result of the check:
         storing the result in 'int this_len' will alter its value, and we
         cannot depend on it.
      
      For kernels that have commit b409e578 ("proc: clean up
      /proc/<pid>/environ handling") which adds the appropriate ptrace check and
      saves the 'mm' at ->open() time, this is not a security issue.
      
      This patch is taken from the grsecurity patch since it was just made
      available.
      Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Brad Spengler <spender@grsecurity.net>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
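      The arithmetic behind the bug is easy to reproduce in user space. A hypothetical sketch that lifts the quoted check into a function and places it next to an overflow-free comparison in the spirit of fix 1); the function names are invented, and env_start/env_end are plain numbers rather than a real mm.

```c
#include <stdbool.h>

/* The buggy check quoted above: env_start + src is computed in
 * unsigned long and can wrap around before the subtraction. */
static bool buggy_offset_ok(unsigned long env_start, unsigned long env_end,
                            unsigned long src)
{
    int this_len = env_end - (env_start + src);
    return this_len > 0;
}

/* Overflow-free variant: compare the offset with the size of the
 * [env_start, env_end) range instead of forming env_start + src first. */
static bool fixed_offset_ok(unsigned long env_start, unsigned long env_end,
                            unsigned long src)
{
    return src < env_end - env_start;
}
```

      With env_start = 0x1000, env_end = 0x1100 and a "negative" offset src = (unsigned long)-0x800, env_start + src wraps around to 0x800, so the buggy check sees a positive length of 0x900, while the fixed check rejects the offset.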
  24. 14 Jul, 2012 (1 commit)