1. 02 Mar, 2017 1 commit
  2. 25 Feb, 2017 1 commit
  3. 25 Dec, 2016 1 commit
  4. 13 Dec, 2016 1 commit
  5. 24 Jun, 2016 2 commits
  6. 11 Jun, 2016 1 commit
    • proc: prevent stacking filesystems on top · e54ad7f1
      Authored by Jann Horn
      This prevents stacking filesystems (ecryptfs and overlayfs) from using
      procfs as lower filesystem.  There is too much magic going on inside
      procfs, and there is no good reason to stack stuff on top of procfs.
      
      (For example, procfs does access checks in VFS open handlers, and
      ecryptfs by design calls open handlers from a kernel thread that doesn't
      drop privileges or so.)
      Signed-off-by: Jann Horn <jannh@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 03 May, 2016 1 commit
  8. 10 Jul, 2015 1 commit
    • vfs: Commit to never having exectuables on proc and sysfs. · 90f8572b
      Authored by Eric W. Biederman
      Today proc and sysfs do not contain any executable files.  Several
      applications today mount proc or sysfs without noexec and nosuid and
      then depend on there being no executable files on proc or sysfs.
      Having any executable files show on proc or sysfs would cause
      a user space visible regression, and most likely security problems.
      
      Therefore commit to never allowing executables on proc and sysfs by
      adding a new flag to mark them as filesystems without executables and
      enforce that flag.
      
      Test the flag where MNT_NOEXEC is tested today, so that the only user
      visible effect will be that executables will be treated as if the
      execute bit is cleared.
      
      The filesystems proc and sysfs do not currently incorporate any
      executable files so this does not result in any user visible effects.
      
      This makes it unnecessary to vet changes to proc and sysfs tightly for
      adding executable files or changes to chattr that would modify
      existing files, as no matter what the individual files say they will
      not be treated as executable files by the vfs.
      
      Not having to vet changes too closely is important as without this we
      are only one proc_create call (or another goof up in the
      implementation of notify_change) from having problematic executables
      on proc.  Those mistakes are all too easy to make and would create
      a situation where there are security issues or the assumptions of
      some program having to be broken (and cause userspace regressions).
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
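The rule in the commit message above (test a per-filesystem flag wherever MNT_NOEXEC is tested, so that files behave as if their execute bit were cleared) can be modelled in a few lines. This is only an illustrative sketch in Python, not kernel code: `FileSystem`, `Mount`, `path_noexec` and `may_execute` are names assumed here for the example.

```python
from dataclasses import dataclass


@dataclass
class FileSystem:
    name: str
    # Hypothetical stand-in for the new "filesystem never has executables" flag.
    no_exec: bool = False


@dataclass
class Mount:
    fs: FileSystem
    # Models the existing MNT_NOEXEC mount flag.
    mnt_noexec: bool = False


def path_noexec(mount: Mount) -> bool:
    """True if execution must be refused for anything on this mount."""
    return mount.mnt_noexec or mount.fs.no_exec


def may_execute(mount: Mount, mode: int) -> bool:
    """A file may run only if an execute bit is set and the mount allows exec."""
    has_exec_bit = bool(mode & 0o111)
    return has_exec_bit and not path_noexec(mount)
```

In this model a file with mode 0o755 on a filesystem flagged `no_exec` is refused exactly as if it sat on a mount with `mnt_noexec` set, whatever its mode bits say.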
  9. 01 Jul, 2015 1 commit
  10. 14 May, 2015 1 commit
    • mnt: Refactor the logic for mounting sysfs and proc in a user namespace · 1b852bce
      Authored by Eric W. Biederman
      Fresh mounts of proc and sysfs are a very special case that works very
      much like a bind mount.  Unfortunately the current structure can not
      preserve the MNT_LOCK... mount flags.  Therefore refactor the logic
      into a form that can be modified to preserve those lock bits.
      
      Add a new filesystem flag FS_USERNS_VISIBLE that requires some mount
      of the filesystem be fully visible in the current mount namespace,
      before the filesystem may be mounted.
      
      Move the logic for calling fs_fully_visible from proc and sysfs into
      fs/namespace.c where it has greater access to mount namespace state.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  11. 16 Apr, 2015 1 commit
  12. 11 Dec, 2014 1 commit
    • fs/proc: use a rb tree for the directory entries · 710585d4
      Authored by Nicolas Dichtel
      When a lot of netdevices are created, one of the bottlenecks is the
      creation of proc entries.  This series aims to accelerate this part.
      
      The current implementation for the directories in /proc uses a singly
      linked list.  This is slow when handling directories with large numbers of
      entries (e.g. netdevice-related entries when lots of tunnels are opened).
      
      This patch replaces this linked list with a red-black tree.
      
      Here are some numbers:
      
      dummy30000.batch contains 30 000 times 'link add type dummy'.
      
      Before the patch:
        $ time ip -b dummy30000.batch
        real    2m31.950s
        user    0m0.440s
        sys     2m21.440s
        $ time rmmod dummy
        real    1m35.764s
        user    0m0.000s
        sys     1m24.088s
      
      After the patch:
        $ time ip -b dummy30000.batch
        real    2m0.874s
        user    0m0.448s
        sys     1m49.720s
        $ time rmmod dummy
        real    1m13.988s
        user    0m0.000s
        sys     1m1.008s
      
      The idea of improving this part was suggested by Thierry Herbelot.
      
      [akpm@linux-foundation.org: initialise proc_root.subdir at compile time]
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Cc: Thierry Herbelot <thierry.herbelot@6wind.com>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
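To put the before/after timings quoted in the commit message above into relative terms, the short calculation below converts the `real` values to seconds and computes the reduction. The helper names `to_seconds` and `reduction_percent` are made up for this example; the input strings are copied from the message.

```python
def to_seconds(stamp: str) -> float:
    """Convert a `time`-style value such as '2m31.950s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def reduction_percent(before: str, after: str) -> float:
    """Relative reduction in elapsed time, in percent."""
    b, a = to_seconds(before), to_seconds(after)
    return (b - a) / b * 100


# "real" times for creating and removing 30 000 dummy devices.
creation = reduction_percent("2m31.950s", "2m0.874s")
removal = reduction_percent("1m35.764s", "1m13.988s")

print(f"ip -b dummy30000.batch: {creation:.1f}% less wall-clock time")
print(f"rmmod dummy:            {removal:.1f}% less wall-clock time")
```

By this measure the red-black tree cuts wall-clock time by roughly a fifth for creation and a little more than that for removal in the quoted benchmark.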
  13. 11 Aug, 2014 1 commit
    • Revert "proc: Point /proc/{mounts,net} at /proc/thread-self/{mounts,net}... · 155134fe
      Authored by Linus Torvalds
      Revert "proc: Point /proc/{mounts,net} at /proc/thread-self/{mounts,net} instead of /proc/self/{mounts,net}"
      
      This reverts commits 344470ca and e8132440.
      
      It turns out that the exact path in the symlink matters, if for somewhat
      unfortunate reasons: some apparmor configurations don't allow dhclient
      access to the per-thread /proc files.  As reported by Jörg Otte:
      
        audit: type=1400 audit(1407684227.003:28): apparmor="DENIED"
          operation="open" profile="/sbin/dhclient"
          name="/proc/1540/task/1540/net/dev" pid=1540 comm="dhclient"
          requested_mask="r" denied_mask="r" fsuid=0 ouid=0
      
      so we had better revert this for now.  We might be able to work around
      this in practice by only using the per-thread symlinks if the thread
      isn't the thread group leader, and if the namespaces differ between
      threads (which basically never happens).
      
      We'll see. In the meantime, the revert was made to be intentionally easy.
      Reported-by: Jörg Otte <jrg.otte@gmail.com>
      Acked-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 09 Aug, 2014 1 commit
  15. 05 Aug, 2014 2 commits
  16. 13 Mar, 2014 1 commit
    • fs: push sync_filesystem() down to the file system's remount_fs() · 02b9984d
      Authored by Theodore Ts'o
      Previously, the no-op "mount -o remount /dev/xxx" operation when the
      file system is already mounted read-write causes an implied,
      unconditional syncfs().  This seems pretty stupid, and it's certainly
      not documented or guaranteed to do this, nor is it particularly useful,
      except in the case where the file system was mounted rw and is getting
      remounted read-only.
      
      However, it's possible that there might be some file systems that are
      actually depending on this behavior.  In most file systems, it's
      probably fine to only call sync_filesystem() when transitioning from
      read-write to read-only, and there are some file systems where this is
      not needed at all (for example, for a pseudo-filesystem or something
      like romfs).
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Artem Bityutskiy <dedekind1@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Evgeniy Dushistov <dushistov@mail.ru>
      Cc: Jan Kara <jack@suse.cz>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Anders Larsen <al@alarsen.net>
      Cc: Phillip Lougher <phillip@squashfs.org.uk>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Cc: Petr Vandrovec <petr@vandrovec.name>
      Cc: xfs@oss.sgi.com
      Cc: linux-btrfs@vger.kernel.org
      Cc: linux-cifs@vger.kernel.org
      Cc: samba-technical@lists.samba.org
      Cc: codalist@coda.cs.cmu.edu
      Cc: linux-ext4@vger.kernel.org
      Cc: linux-f2fs-devel@lists.sourceforge.net
      Cc: fuse-devel@lists.sourceforge.net
      Cc: cluster-devel@redhat.com
      Cc: linux-mtd@lists.infradead.org
      Cc: jfs-discussion@lists.sourceforge.net
      Cc: linux-nfs@vger.kernel.org
      Cc: linux-nilfs@vger.kernel.org
      Cc: linux-ntfs-dev@lists.sourceforge.net
      Cc: ocfs2-devel@oss.oracle.com
      Cc: reiserfs-devel@vger.kernel.org
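The policy the message above suggests for most file systems (sync only on a read-write to read-only transition) fits in one predicate. This is a hedged sketch: `should_sync_on_remount` is a name invented here, and a real file system's remount handler may legitimately choose differently, for example never syncing at all for a pseudo-filesystem.

```python
def should_sync_on_remount(was_read_only: bool, becomes_read_only: bool) -> bool:
    """Sync only when a writable file system is being remounted read-only."""
    return (not was_read_only) and becomes_read_only


# Print the full truth table for the four possible remount transitions.
for was_ro in (False, True):
    for now_ro in (False, True):
        old = "ro" if was_ro else "rw"
        new = "ro" if now_ro else "rw"
        print(f"{old} -> {new}: sync={should_sync_on_remount(was_ro, now_ro)}")
```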
  17. 12 Mar, 2014 1 commit
    • of: remove /proc/device-tree · 8357041a
      Authored by Grant Likely
      The same data is now available in sysfs, so we can remove the code
      that exports it in /proc and replace it with a symlink to the sysfs
      version.
      
      Tested on versatile qemu model and mpc5200 eval board. More testing
      would be appreciated.
      
      v5: Fixed up conflicts with mainline changes
      Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Cc: Pantelis Antoniou <panto@antoniou-consulting.com>
  18. 27 Aug, 2013 2 commits
    • userns: Better restrictions on when proc and sysfs can be mounted · e51db735
      Authored by Eric W. Biederman
      Rely on the fact that another flavor of the filesystem is already
      mounted and do not rely on state in the user namespace.
      
      Verify that the mounted filesystem is not covered in any significant
      way.  I would love to verify that the previously mounted filesystem
      has no mounts on top but there are at least the directories
      /proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
      for other filesystems to mount on top of.
      
      Refactor the test into a function named fs_fully_visible and call that
      function from the mount routines of proc and sysfs.  This makes this
      test local to the filesystems involved and the results current of when
      the mounts take place, removing a weird threading of the user
      namespace, the mount namespace and the filesystems themselves.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • proc: Restrict mounting the proc filesystem · aee1c13d
      Authored by Eric W. Biederman
      Don't allow mounting the proc filesystem unless the caller has
      CAP_SYS_ADMIN rights over the pid namespace.  The principle here is if
      you create or have capabilities over it you can mount it, otherwise
      you get to live with what other people have mounted.
      
      Andy pointed out that this is needed to prevent users in a user
      namespace from remounting proc and specifying different hidepid and gid
      options on already existing proc mounts.
      
      Cc: stable@vger.kernel.org
      Reported-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  19. 20 Aug, 2013 1 commit
    • proc: return on proc_readdir error · 94fc5d9d
      Authored by Richard Genoud
      Commit f0c3b509 ("[readdir] convert procfs") introduced a bug on the
      listing of the proc file-system.  The return value of proc_readdir()
      isn't tested anymore in the proc_root_readdir function.
      
      This leads to an "interesting" behaviour when we are using the getdents()
      system call with a buffer too small: instead of failing, it returns the
      first entries of /proc (enough to fill the given buffer), plus the PID
      directories.
      
      This is not triggered on glibc (as getdents is called with a 32KB
      buffer), but on uclibc, the buffer size is only 1KB, thus some proc
      entries are missing.
      
      See https://lkml.org/lkml/2013/8/12/288 for more background.
      Signed-off-by: Richard Genoud <richard.genoud@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
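The failure mode described in the commit message above can be reproduced with a toy model. Everything below is an assumption made for illustration (the entry names, the `capacity` measured in entries rather than bytes, and the cursor value chosen for `FIRST_PROCESS_ENTRY`); it is not the kernel implementation. The `check_error` switch selects between ignoring and honouring the "buffer full" result of the static-entry pass.

```python
STATIC_ENTRIES = ["cpuinfo", "meminfo", "mounts", "uptime", "version"]
PID_ENTRIES = ["1", "42"]
FIRST_PROCESS_ENTRY = 256  # cursor value where PID entries begin in this model


def proc_readdir(pos, buf, capacity):
    """Emit static entries starting at `pos`; ok is False if the buffer filled first."""
    while pos < len(STATIC_ENTRIES):
        if len(buf) >= capacity:
            return pos, False
        buf.append(STATIC_ENTRIES[pos])
        pos += 1
    return pos, True


def proc_pid_readdir(pos, buf, capacity):
    """Emit PID entries while there is room; return the new cursor."""
    idx = pos - FIRST_PROCESS_ENTRY
    while idx < len(PID_ENTRIES) and len(buf) < capacity:
        buf.append(PID_ENTRIES[idx])
        idx += 1
    return FIRST_PROCESS_ENTRY + idx


def root_readdir(pos, capacity, check_error):
    """One getdents-like call: returns (entries, new cursor)."""
    buf = []
    if pos < FIRST_PROCESS_ENTRY:
        pos, ok = proc_readdir(pos, buf, capacity)
        if check_error and not ok:
            return buf, pos        # fixed: resume the static entries next call
        pos = FIRST_PROCESS_ENTRY  # buggy path jumps here even when not ok
    pos = proc_pid_readdir(pos, buf, capacity)
    return buf, pos


def list_all(capacity, check_error):
    """Call root_readdir repeatedly until it returns nothing, like a libc would."""
    pos, seen = 0, []
    while True:
        entries, pos = root_readdir(pos, capacity, check_error)
        if not entries:
            return seen
        seen.extend(entries)
```

With a small buffer, `list_all(2, check_error=False)` loses the static entries that did not fit in the first call, while `list_all(2, check_error=True)` returns the complete listing. With a large buffer both variants agree, which matches the observation that the 32KB glibc buffer hid the problem.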
  20. 29 Jun, 2013 1 commit
  21. 10 Apr, 2013 1 commit
  22. 27 Mar, 2013 1 commit
    • userns: Restrict when proc and sysfs can be mounted · 87a8ebd6
      Authored by Eric W. Biederman
      Only allow unprivileged mounts of proc and sysfs if they are already
      mounted when the user namespace is created.
      
      proc and sysfs are interesting because they have content that is
      per namespace, and so fresh mounts are needed when new namespaces
      are created while at the same time proc and sysfs have content that
      is shared between every instance.
      
      Respect the policy of who may see the shared content of proc and sysfs
      by only allowing new mounts if there was an existing mount at the time
      the user namespace was created.
      
      In practice there are only two interesting cases: proc and sysfs are
      mounted at their usual places, proc and sysfs are not mounted at all
      (some form of mount namespace jail).
      
      Cc: stable@vger.kernel.org
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  23. 20 Nov, 2012 1 commit
  24. 19 Nov, 2012 4 commits
    • pidns: Make the pidns proc mount/umount logic obvious. · 0a01f2cc
      Authored by Eric W. Biederman
      Track the number of pids in the proc hash table.  When the number of
      pids goes to 0 schedule work to unmount the kernel mount of proc.
      
      Move the mount of proc into alloc_pid when we allocate the pid for
      init.
      
      Remove the surprising calls of pid_ns_release_proc in fork and
      proc_flush_task.  Those code paths really shouldn't know about proc
      namespace implementation details and people have demonstrated several
      times that finding and understanding those code paths is difficult and
      non-obvious.
      
      Because of the call path, detach_pid is always called with the
      rtnl_lock held, so free_pid is not allowed to sleep and the work of
      unmounting proc is moved to a work queue.  This has the side benefit
      of not blocking the entire world waiting for the unnecessary
      rcu_barrier in deactivate_locked_super.
      
      In the process of making the code clear and obvious this fixes a bug
      reported by Gao feng <gaofeng@cn.fujitsu.com> where we would leak a
      mount of proc during clone(CLONE_NEWPID|CLONE_NEWNET) if copy_pid_ns
      succeeded and copy_net_ns failed.
      Acked-by: "Serge E. Hallyn" <serge@hallyn.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
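The bookkeeping described in the commit message above (count the pids in the hash table, mount proc when init's pid is allocated, and defer the unmount to a work queue once the count reaches zero) can be sketched as follows. The class and its attribute names are hypothetical, and a `deque` stands in for the kernel work queue.

```python
from collections import deque


class PidNamespace:
    """Toy model of the counting scheme; not kernel code."""

    def __init__(self):
        self.nr_hashed = 0        # pids currently in the hash table
        self.proc_mounted = False
        self.workqueue = deque()  # stand-in for a kernel work queue

    def alloc_pid(self, is_init=False):
        if is_init:
            # proc is mounted at the point where init's pid is allocated.
            self.proc_mounted = True
        self.nr_hashed += 1

    def free_pid(self):
        # This path must not sleep, so the unmount is only scheduled here.
        self.nr_hashed -= 1
        if self.nr_hashed == 0:
            self.workqueue.append(self._unmount_proc)

    def _unmount_proc(self):
        self.proc_mounted = False

    def run_pending_work(self):
        """Run deferred work later, from a context that is allowed to sleep."""
        while self.workqueue:
            self.workqueue.popleft()()
```

The point of the split is visible in the model: `free_pid` returns immediately after queueing, and proc stays mounted until `run_pending_work` executes.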
    • pidns: Use task_active_pid_ns where appropriate · 17cf22c3
      Authored by Eric W. Biederman
      The expressions tsk->nsproxy->pid_ns and task_active_pid_ns
      aka ns_of_pid(task_pid(tsk)) should have the same number of
      cache line misses with the practical difference that
      ns_of_pid(task_pid(tsk)) is released later in a process's life.
      
      Furthermore by using task_active_pid_ns it becomes trivial
      to write an unshare implementation for the pid namespace.
      
      So I have used task_active_pid_ns everywhere I can.
      
      In fork since the pid has not yet been attached to the
      process I use ns_of_pid, to achieve the same effect.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • procfs: Don't cache a pid in the root inode. · ae06c7c8
      Authored by Eric W. Biederman
      Now that we have s_fs_info pointing to our pid namespace
      the original reason for the proc root inode having a struct
      pid is gone.
      
      Caching a pid in the root inode has led to some complicated
      code.  Now that we don't need the struct pid, just remove it.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    • procfs: Use the proc generic infrastructure for proc/self. · e656d8a6
      Authored by Eric W. Biederman
      I had visions at one point of splitting proc into two filesystems.  If
      that had happened proc/self being the part of proc that actually deals
      with pids would have been a nice cleanup.  As it is proc/self requires
      a lot of unnecessary infrastructure for a single file.
      
      The only user visible change is that a mounted /proc for a pid namespace
      that is dead now shows a broken proc symlink, instead of being completely
      invisible.  I don't think anyone will notice or care.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
  25. 06 Oct, 2012 1 commit
  26. 14 Jul, 2012 2 commits
  27. 16 May, 2012 1 commit
  28. 06 Apr, 2012 1 commit
  29. 11 Jan, 2012 2 commits
    • procfs: add hidepid= and gid= mount options · 0499680a
      Authored by Vasiliy Kulikov
      Add support for mount options to restrict access to /proc/PID/
      directories.  The default backward-compatible "relaxed" behaviour is left
      untouched.
      
      The first mount option is called "hidepid" and its value defines how much
      info about processes we want to be available for non-owners:
      
      hidepid=0 (default) means the old behavior - anybody may read all
      world-readable /proc/PID/* files.
      
      hidepid=1 means users may not access any /proc/<pid>/ directories, but
      their own.  Sensitive files like cmdline, sched*, status are now protected
      against other users.  As permission checking is done in proc_pid_permission()
      and files' permissions are left untouched, programs expecting specific
      files' modes are not confused.
      
      hidepid=2 means hidepid=1 plus all /proc/PID/ will be invisible to other
      users.  It doesn't mean that it hides whether a process exists (it can be
      learned by other means, e.g.  by kill -0 $PID), but it hides process' euid
      and egid.  It complicates intruder's task of gathering info about running
      processes, whether some daemon runs with elevated privileges, whether
      another user runs some sensitive program, whether other users run any
      program at all, etc.
      
      gid=XXX defines a group that will be able to gather all processes' info
      (as in hidepid=0 mode).  This group should be used instead of putting
      nonroot user in sudoers file or something.  However, untrusted users (like
      daemons, etc.) which are not supposed to monitor the tasks in the whole
      system should not be added to the group.
      
      hidepid=1 or higher is designed to restrict access to procfs files, which
      might reveal some sensitive private information like precise keystrokes
      timings:
      
      http://www.openwall.com/lists/oss-security/2011/11/05/3
      
      hidepid=1/2 doesn't break monitoring userspace tools.  ps, top, pgrep, and
      conky gracefully handle EPERM/ENOENT and behave as if the current user is
      the only user running processes.  pstree shows the process subtree which
      contains "pstree" process.
      
      Note: the patch doesn't deal with setuid/setgid issues of keeping
      preopened descriptors of procfs files (like
      https://lkml.org/lkml/2011/2/7/368).  We rely on the assumption that
      leaked information like the scheduling counters of setuid apps doesn't
      threaten anybody's privacy - only the user who started the setuid
      program may read the counters.
      Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Theodore Tso <tytso@MIT.EDU>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: James Morris <jmorris@namei.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
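The three hidepid levels and the gid= exemption described in the commit message above can be summarised as a small decision function. This is a simplified model written for illustration only: `proc_pid_access` is an invented name, and "readable" here means only that no extra hidepid restriction applies (ordinary file permissions still govern individual files).

```python
def proc_pid_access(hidepid: int, is_owner: bool, in_gid_group: bool = False):
    """Return (visible, readable) for one /proc/PID/ directory as seen by one user.

    visible:  the directory shows up when listing /proc
    readable: the user may access files inside the directory
    """
    if hidepid not in (0, 1, 2):
        raise ValueError("hidepid must be 0, 1 or 2")
    # Owners, members of the gid= group, and everyone under hidepid=0
    # get the traditional behaviour.
    if is_owner or in_gid_group or hidepid == 0:
        return True, True
    if hidepid == 1:
        return True, False   # directory is listed, contents are off limits
    return False, False      # hidepid=2: directory is hidden as well
```

For example, `proc_pid_access(2, is_owner=False)` yields `(False, False)`, while the same call with `in_gid_group=True` yields `(True, True)`.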
    • procfs: parse mount options · 97412950
      Authored by Vasiliy Kulikov
      Add support for procfs mount options.  Actual mount options are coming in
      the next patches.
      Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Theodore Tso <tytso@MIT.EDU>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: James Morris <jmorris@namei.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
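As a rough idea of what parsing such an option string involves, here is a minimal sketch that handles the hidepid= and gid= options named in the neighbouring commit. The function name, the defaults and the error handling are assumptions for this example and do not mirror the kernel's parser.

```python
def parse_proc_options(options: str) -> dict:
    """Parse a comma-separated option string such as 'hidepid=2,gid=1000'."""
    parsed = {"hidepid": 0, "gid": None}  # assumed defaults for this sketch
    for item in filter(None, options.split(",")):
        key, sep, value = item.partition("=")
        if key not in parsed or not sep or not value.isdigit():
            raise ValueError(f"unrecognised or malformed option: {item!r}")
        parsed[key] = int(value)
    if parsed["hidepid"] not in (0, 1, 2):
        raise ValueError("hidepid must be 0, 1 or 2")
    return parsed
```

An empty string leaves the defaults in place, and `parse_proc_options("hidepid=2,gid=1000")` returns both values as integers.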
  30. 09 Dec, 2011 1 commit
  31. 28 Jul, 2011 1 commit
    • proc: make struct proc_dir_entry::name a terminal array rather than a pointer · 09570f91
      Authored by David Howells
      Since __proc_create() appends the name it is given to the end of the PDE
      structure that it allocates, there isn't a need to store a name pointer.
      Instead we can just replace the name pointer with a terminal char array of
      _unspecified_ length.  The compiler will simply append the string to statically
      defined variables of PDE type overlapping any hole at the end of the structure
      and, unlike specifying an explicitly _zero_ length array, won't give a warning
      if you try to statically initialise it with a string of more than zero length.
      
      Also, whilst we're at it:
      
       (1) Move namelen to end just prior to name and reduce it to a single byte
           (name shouldn't be longer than NAME_MAX).
      
       (2) Move pde_unload_lock two places further on so that if it's four bytes in
           size on a 64-bit machine, it won't cause an unused hole in the PDE struct.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
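Point (2) of the commit message above is about padding holes caused by field order. The effect can be observed from Python with `ctypes`, which lays out structures using the platform's C alignment rules. The two structures below are hypothetical and much smaller than the real PDE struct; the field names are invented, and only the ordering idea is taken from the message.

```python
import ctypes


class Before(ctypes.Structure):
    # Hypothetical order: a 4-byte field sits between two pointers.
    _fields_ = [
        ("parent", ctypes.c_void_p),
        ("lock", ctypes.c_uint32),
        ("data", ctypes.c_void_p),
        ("namelen", ctypes.c_uint8),
    ]


class After(ctypes.Structure):
    # Same fields, with the small ones grouped after the pointers.
    _fields_ = [
        ("parent", ctypes.c_void_p),
        ("data", ctypes.c_void_p),
        ("lock", ctypes.c_uint32),
        ("namelen", ctypes.c_uint8),
    ]


print("before:", ctypes.sizeof(Before), "bytes")
print("after: ", ctypes.sizeof(After), "bytes")
```

On a typical 64-bit build the first layout needs padding after the 4-byte field so that the following pointer is aligned, and the reordered layout avoids that hole. On 32-bit platforms the two sizes may be identical.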
  32. 13 Jun, 2011 1 commit