1. 27 9月, 2012 1 次提交
    • C
      procfs: Move /proc/pid/fd[info] handling code to fd.[ch] · faf60af1
      Cyrill Gorcunov 提交于
      This patch prepares the ground for further extension of
      /proc/pid/fd[info] handling code by moving fdinfo handling
      code into fs/proc/fd.c.
      
      I think such move makes both fs/proc/base.c and fs/proc/fd.c
      easier to read.
      Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Acked-by: NPavel Emelyanov <xemul@parallels.com>
      CC: Al Viro <viro@ZenIV.linux.org.uk>
      CC: Alexey Dobriyan <adobriyan@gmail.com>
      CC: Andrew Morton <akpm@linux-foundation.org>
      CC: James Bottomley <jbottomley@parallels.com>
      CC: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      CC: Alexey Dobriyan <adobriyan@gmail.com>
      CC: Matthew Helsley <matt.helsley@gmail.com>
      CC: "J. Bruce Fields" <bfields@fieldses.org>
      CC: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      faf60af1
  2. 18 9月, 2012 1 次提交
    • F
      fs/proc: fix potential unregister_sysctl_table hang · 6bf61045
      Francesco Ruggeri 提交于
      The unregister_sysctl_table() function hangs if all references to its
      ctl_table_header structure are not dropped.
      
      This can happen sometimes because of a leak in proc_sys_lookup():
      proc_sys_lookup() gets a reference to the table via lookup_entry(), but
      it does not release it when a subsequent call to sysctl_follow_link()
      fails.
      
      This patch fixes this leak by making sure the reference is always
      dropped on return.
      
      See also commit 076c3eed ("sysctl: Rewrite proc_sys_lookup
      introducing find_entry and lookup_entry") which reorganized this code in
      3.4.
      
      Tested in Linux 3.4.4.
      Signed-off-by: NFrancesco Ruggeri <fruggeri@aristanetworks.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6bf61045
  3. 31 7月, 2012 2 次提交
    • D
      proc: do not allow negative offsets on /proc/<pid>/environ · bc452b4b
      Djalal Harouni 提交于
      __mem_open() which is called by both /proc/<pid>/environ and
      /proc/<pid>/mem ->open() handlers will allow the use of negative offsets.
      /proc/<pid>/mem has negative offsets but not /proc/<pid>/environ.
      
      Clean this by moving the 'force FMODE_UNSIGNED_OFFSET flag' to mem_open()
      to allow negative offsets only on /proc/<pid>/mem.
      Signed-off-by: NDjalal Harouni <tixxdz@opendz.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Brad Spengler <spender@grsecurity.net>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bc452b4b
    • D
      proc: environ_read() make sure offset points to environment address range · e8905ec2
      Djalal Harouni 提交于
      Currently the following offset and environment address range check in
      environ_read() of /proc/<pid>/environ is buggy:
      
        int this_len = mm->env_end - (mm->env_start + src);
        if (this_len <= 0)
          break;
      
      Large or negative offsets on /proc/<pid>/environ converted to 'unsigned
      long' may pass this check since '(mm->env_start + src)' can overflow and
      'this_len' will be positive.
      
      This can turn /proc/<pid>/environ to act like /proc/<pid>/mem since
      (mm->env_start + src) will point and read from another VMA.
      
      There are two fixes here plus some code cleaning:
      
      1) Fix the overflow by checking if the offset that was converted to
         unsigned long will always point to the [mm->env_start, mm->env_end]
         address range.
      
      2) Remove the truncation that was made to the result of the check,
         storing the result in 'int this_len' will alter its value and we can
         not depend on it.
      
      For kernels that have commit b409e578 ("proc: clean up
      /proc/<pid>/environ handling") which adds the appropriate ptrace check and
      saves the 'mm' at ->open() time, this is not a security issue.
      
      This patch is taken from the grsecurity patch since it was just made
      available.
      Signed-off-by: NDjalal Harouni <tixxdz@opendz.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Brad Spengler <spender@grsecurity.net>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e8905ec2
  4. 14 7月, 2012 5 次提交
  5. 11 7月, 2012 1 次提交
  6. 05 6月, 2012 1 次提交
    • L
      vfs: Fix /proc/<tid>/fdinfo/<fd> file handling · 0640113b
      Linus Torvalds 提交于
      Cyrill Gorcunov reports that I broke the fdinfo files with commit
      30a08bf2 ("proc: move fd symlink i_mode calculations into
      tid_fd_revalidate()"), and he's quite right.
      
      The tid_fd_revalidate() function is not just used for the <tid>/fd
      symlinks, it's also used for the <tid>/fdinfo/<fd> files, and the
      permission model for those are different.
      
      So do the dynamic symlink permission handling just for symlinks, making
      the fdinfo files once more appear as the proper regular files they are.
      
      Of course, Al Viro argued (probably correctly) that we shouldn't do the
      symlink permission games at all, and make the symlinks always just be
      the normal 'lrwxrwxrwx'.  That would have avoided this issue too, but
      since somebody noticed that the permissions had changed (which was the
      reason for that original commit 30a08bf2 in the first place), people
      do apparently use this feature.
      
      [ Basically, you can use the symlink permission data as a cheap "fdinfo"
        replacement, since you see whether the file is open for reading and/or
        writing by just looking at st_mode of the symlink.  So the feature
        does make sense, even if the pain it has caused means we probably
        shouldn't have done it to begin with. ]
      Reported-and-tested-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0640113b
  7. 01 6月, 2012 11 次提交
  8. 30 5月, 2012 2 次提交
  9. 19 5月, 2012 1 次提交
    • L
      proc: move fd symlink i_mode calculations into tid_fd_revalidate() · 30a08bf2
      Linus Torvalds 提交于
      Instead of doing the i_mode calculations at proc_fd_instantiate() time,
      move them into tid_fd_revalidate(), which is where the other inode state
      (notably uid/gid information) is updated too.
      
      Otherwise we'll end up with stale i_mode information if an fd is re-used
      while the dentry still hangs around.  Not that anything really *cares*
      (symlink permissions don't really matter), but Tetsuo Handa noticed that
      the owner read/write bits don't always match the state of the
      readability of the file descriptor, and we _used_ to get this right a
      long time ago in a galaxy far, far away.
      
      Besides, aside from fixing an ugly detail (that has apparently been this
      way since commit 61a28784: "proc: Remove the hard coded inode
      numbers" in 2006), this removes more lines of code than it adds.  And it
      just makes sense to update i_mode in the same place we update i_uid/gid.
      
      Al Viro correctly points out that we could just do the inode fill in the
      inode iops ->getattr() function instead.  However, that does require
      somewhat slightly more invasive changes, and adds yet *another* lookup
      of the file descriptor.  We need to do the revalidate() for other
      reasons anyway, and have the file descriptor handy, so we might as well
      fill in the information at this point.
      Reported-by: NTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Acked-by: NEric Biederman <ebiederm@xmission.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      30a08bf2
  10. 18 5月, 2012 1 次提交
    • C
      fs, proc: fix ABBA deadlock in case of execution attempt of map_files/ entries · eb94cd96
      Cyrill Gorcunov 提交于
      map_files/ entries are never supposed to be executed, still curious
      minds might try to run them, which leads to the following deadlock
      
        ======================================================
        [ INFO: possible circular locking dependency detected ]
        3.4.0-rc4-24406-g841e6a6 #121 Not tainted
        -------------------------------------------------------
        bash/1556 is trying to acquire lock:
         (&sb->s_type->i_mutex_key#8){+.+.+.}, at: do_lookup+0x267/0x2b1
      
        but task is already holding lock:
         (&sig->cred_guard_mutex){+.+.+.}, at: prepare_bprm_creds+0x2d/0x69
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #1 (&sig->cred_guard_mutex){+.+.+.}:
               validate_chain+0x444/0x4f4
               __lock_acquire+0x387/0x3f8
               lock_acquire+0x12b/0x158
               __mutex_lock_common+0x56/0x3a9
               mutex_lock_killable_nested+0x40/0x45
               lock_trace+0x24/0x59
               proc_map_files_lookup+0x5a/0x165
               __lookup_hash+0x52/0x73
               do_lookup+0x276/0x2b1
               walk_component+0x3d/0x114
               do_last+0xfc/0x540
               path_openat+0xd3/0x306
               do_filp_open+0x3d/0x89
               do_sys_open+0x74/0x106
               sys_open+0x21/0x23
               tracesys+0xdd/0xe2
      
        -> #0 (&sb->s_type->i_mutex_key#8){+.+.+.}:
               check_prev_add+0x6a/0x1ef
               validate_chain+0x444/0x4f4
               __lock_acquire+0x387/0x3f8
               lock_acquire+0x12b/0x158
               __mutex_lock_common+0x56/0x3a9
               mutex_lock_nested+0x40/0x45
               do_lookup+0x267/0x2b1
               walk_component+0x3d/0x114
               link_path_walk+0x1f9/0x48f
               path_openat+0xb6/0x306
               do_filp_open+0x3d/0x89
               open_exec+0x25/0xa0
               do_execve_common+0xea/0x2f9
               do_execve+0x43/0x45
               sys_execve+0x43/0x5a
               stub_execve+0x6c/0xc0
      
      This is because prepare_bprm_creds grabs task->signal->cred_guard_mutex
      and when do_lookup happens we try to grab task->signal->cred_guard_mutex
      again in lock_trace.
      
      Fix it using plain ptrace_may_access() helper in proc_map_files_lookup()
      and in proc_map_files_readdir() instead of lock_trace(), the caller must
      be CAP_SYS_ADMIN granted anyway.
      Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Reported-by: NSasha Levin <levinsasha928@gmail.com>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Vasiliy Kulikov <segoon@openwall.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eb94cd96
  11. 16 5月, 2012 2 次提交
  12. 11 5月, 2012 1 次提交
  13. 06 5月, 2012 1 次提交
  14. 03 5月, 2012 1 次提交
  15. 26 4月, 2012 2 次提交
    • E
      userns: Rework the user_namespace adding uid/gid mapping support · 22d917d8
      Eric W. Biederman 提交于
      - Convert the old uid mapping functions into compatibility wrappers
      - Add a uid/gid mapping layer from user space uid and gids to kernel
        internal uids and gids that is extent based for simplicty and speed.
        * Working with number space after mapping uids/gids into their kernel
          internal version adds only mapping complexity over what we have today,
          leaving the kernel code easy to understand and test.
      - Add proc files /proc/self/uid_map /proc/self/gid_map
        These files display the mapping and allow a mapping to be added
        if a mapping does not exist.
      - Allow entering the user namespace without a uid or gid mapping.
        Since we are starting with an existing user our uids and gids
        still have global mappings so are still valid and useful they just don't
        have local mappings.  The requirement for things to work are global uid
        and gid so it is odd but perfectly fine not to have a local uid
        and gid mapping.
        Not requiring global uid and gid mappings greatly simplifies
        the logic of setting up the uid and gid mappings by allowing
        the mappings to be set after the namespace is created which makes the
        slight weirdness worth it.
      - Make the mappings in the initial user namespace to the global
        uid/gid space explicit.  Today it is an identity mapping
        but in the future we may want to twist this for debugging, similar
        to what we do with jiffies.
      - Document the memory ordering requirements of setting the uid and
        gid mappings.  We only allow the mappings to be set once
        and there are no pointers involved so the requirments are
        trivial but a little atypical.
      
      Performance:
      
      In this scheme for the permission checks the performance is expected to
      stay the same as the actuall machine instructions should remain the same.
      
      The worst case I could think of is ls -l on a large directory where
      all of the stat results need to be translated with from kuids and
      kgids to uids and gids.  So I benchmarked that case on my laptop
      with a dual core hyperthread Intel i5-2520M cpu with 3M of cpu cache.
      
      My benchmark consisted of going to single user mode where nothing else
      was running. On an ext4 filesystem opening 1,000,000 files and looping
      through all of the files 1000 times and calling fstat on the
      individuals files.  This was to ensure I was benchmarking stat times
      where the inodes were in the kernels cache, but the inode values were
      not in the processors cache.  My results:
      
      v3.4-rc1:         ~= 156ns (unmodified v3.4-rc1 with user namespace support disabled)
      v3.4-rc1-userns-: ~= 155ns (v3.4-rc1 with my user namespace patches and user namespace support disabled)
      v3.4-rc1-userns+: ~= 164ns (v3.4-rc1 with my user namespace patches and user namespace support enabled)
      
      All of the configurations ran in roughly 120ns when I performed tests
      that ran in the cpu cache.
      
      So in summary the performance impact is:
      1ns improvement in the worst case with user namespace support compiled out.
      8ns aka 5% slowdown in the worst case with user namespace support compiled in.
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      22d917d8
    • W
      revert "proc: clear_refs: do not clear reserved pages" · 63f61a6f
      Will Deacon 提交于
      Revert commit 85e72aa5 ("proc: clear_refs: do not clear reserved
      pages"), which was a quick fix suitable for -stable until ARM had been
      moved over to the gate_vma mechanism:
      
      https://lkml.org/lkml/2012/1/14/55
      
      With commit f9d4861f ("ARM: 7294/1: vectors: use gate_vma for vectors user
      mapping"), ARM does now use the gate_vma, so the PageReserved check can be
      removed from the proc code.
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Cc: Nicolas Pitre <nico@linaro.org>
      Acked-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      63f61a6f
  16. 06 4月, 2012 1 次提交
  17. 30 3月, 2012 2 次提交
  18. 29 3月, 2012 4 次提交