1. 20 11月, 2012 3 次提交
    • E
      proc: Usable inode numbers for the namespace file descriptors. · 98f842e6
      Eric W. Biederman 提交于
      Assign a unique proc inode to each namespace, and use that
      inode number to ensure we only allocate at most one proc
      inode for every namespace in proc.
      
      A single proc inode per namespace allows userspace to test
      to see if two processes are in the same namespace.
      
      This has been a long requested feature and only blocked because
      a naive implementation would put the id in a global space and
      would ultimately require having a namespace for the names of
      namespaces, making migration and certain virtualization tricks
      impossible.
      
      We still don't have per superblock inode numbers for proc, which
      appears necessary for application unaware checkpoint/restart and
      migrations (if the application is using namespace file descriptors)
      but that is now allowd by the design if it becomes important.
      
      I have preallocated the ipc and uts initial proc inode numbers so
      their structures can be statically initialized.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      98f842e6
    • E
      proc: Generalize proc inode allocation · 33d6dce6
      Eric W. Biederman 提交于
      Generalize the proc inode allocation so that it can be
      used without having to having to create a proc_dir_entry.
      
      This will allow namespace file descriptors to remain light
      weight entitities but still have the same inode number
      when the backing namespace is the same.
      Acked-by: NSerge E. Hallyn <serge.hallyn@ubuntu.com>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      33d6dce6
    • E
      userns: Implent proc namespace operations · cde1975b
      Eric W. Biederman 提交于
      This allows entering a user namespace, and the ability
      to store a reference to a user namespace with a bind
      mount.
      
      Addition of missing userns_ns_put in userns_install
      from Gao feng <gaofeng@cn.fujitsu.com>
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      cde1975b
  2. 19 11月, 2012 2 次提交
    • E
      vfs: Add setns support for the mount namespace · 8823c079
      Eric W. Biederman 提交于
      setns support for the mount namespace is a little tricky as an
      arbitrary decision must be made about what to set fs->root and
      fs->pwd to, as there is no expectation of a relationship between
      the two mount namespaces.  Therefore I arbitrarily find the root
      mount point, and follow every mount on top of it to find the top
      of the mount stack.  Then I set fs->root and fs->pwd to that
      location.  The topmost root of the mount stack seems like a
      reasonable place to be.
      
      Bind mount support for the mount namespace inodes has the
      possibility of creating circular dependencies between mount
      namespaces.  Circular dependencies can result in loops that
      prevent mount namespaces from every being freed.  I avoid
      creating those circular dependencies by adding a sequence number
      to the mount namespace and require all bind mounts be of a
      younger mount namespace into an older mount namespace.
      
      Add a helper function proc_ns_inode so it is possible to
      detect when we are attempting to bind mound a namespace inode.
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      8823c079
    • E
      pidns: Add setns support · 57e8391d
      Eric W. Biederman 提交于
      - Pid namespaces are designed to be inescapable so verify that the
        passed in pid namespace is a child of the currently active
        pid namespace or the currently active pid namespace itself.
      
        Allowing the currently active pid namespace is important so
        the effects of an earlier setns can be cancelled.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      57e8391d
  3. 16 5月, 2012 1 次提交
  4. 11 1月, 2012 1 次提交
  5. 04 1月, 2012 1 次提交
  6. 28 7月, 2011 1 次提交
    • D
      proc: make struct proc_dir_entry::name a terminal array rather than a pointer · 09570f91
      David Howells 提交于
      Since __proc_create() appends the name it is given to the end of the PDE
      structure that it allocates, there isn't a need to store a name pointer.
      Instead we can just replace the name pointer with a terminal char array of
      _unspecified_ length.  The compiler will simply append the string to statically
      defined variables of PDE type overlapping any hole at the end of the structure
      and, unlike specifying an explicitly _zero_ length array, won't give a warning
      if you try to statically initialise it with a string of more than zero length.
      
      Also, whilst we're at it:
      
       (1) Move namelen to end just prior to name and reduce it to a single byte
           (name shouldn't be longer than NAME_MAX).
      
       (2) Move pde_unload_lock two places further on so that if it's four bytes in
           size on a 64-bit machine, it won't cause an unused hole in the PDE struct.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09570f91
  7. 27 7月, 2011 1 次提交
  8. 27 5月, 2011 1 次提交
    • J
      mm: extract exe_file handling from procfs · 38646013
      Jiri Slaby 提交于
      Setup and cleanup of mm_struct->exe_file is currently done in fs/proc/.
      This was because exe_file was needed only for /proc/<pid>/exe.  Since we
      will need the exe_file functionality also for core dumps (so core name can
      contain full binary path), built this functionality always into the
      kernel.
      
      To achieve that move that out of proc FS to the kernel/ where in fact it
      should belong.  By doing that we can make dup_mm_exe_file static.  Also we
      can drop linux/proc_fs.h inclusion in fs/exec.c and kernel/fork.c.
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      38646013
  9. 25 5月, 2011 1 次提交
  10. 18 5月, 2011 1 次提交
  11. 11 5月, 2011 4 次提交
  12. 24 3月, 2011 1 次提交
  13. 23 9月, 2009 3 次提交
    • K
      kcore: register vmemmap range · 26562c59
      KAMEZAWA Hiroyuki 提交于
      Benjamin Herrenschmidt <benh@kernel.crashing.org> pointed out that vmemmap
      range is not included in KCORE_RAM, KCORE_VMALLOC ....
      
      This adds KCORE_VMEMMAP if SPARSEMEM_VMEMMAP is used.  By this, vmemmap
      can be readable via /proc/kcore
      
      Because it's not vmalloc area, vread/vwrite cannot be used.  But the range
      is static against the memory layout, this patch handles vmemmap area by
      the same scheme with physical memory.
      
      This patch assumes SPARSEMEM_VMEMMAP range is not in VMALLOC range.  It's
      correct now.
      
      [akpm@linux-foundation.org: fix typo]
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Jiri Slaby <jirislaby@gmail.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: WANG Cong <xiyou.wangcong@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      26562c59
    • K
      kcore: add kclist types · c30bb2a2
      KAMEZAWA Hiroyuki 提交于
      Presently, kclist_add() only eats start address and size as its arguments.
      Considering to make kclist dynamically reconfigulable, it's necessary to
      know which kclists are for System RAM and which are not.
      
      This patch add kclist types as
        KCORE_RAM
        KCORE_VMALLOC
        KCORE_TEXT
        KCORE_OTHER
      
      This "type" is used in a patch following this for detecting KCORE_RAM.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: WANG Cong <xiyou.wangcong@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c30bb2a2
    • K
      kcore: use usual list for kclist · 2ef43ec7
      KAMEZAWA Hiroyuki 提交于
      This patchset is for /proc/kcore.  With this,
      
       - many per-arch hooks are removed.
      
       - /proc/kcore will know really valid physical memory area.
      
       - /proc/kcore will be aware of memory hotplug.
      
       - /proc/kcore will be architecture independent i.e.
         if an arch supports CONFIG_MMU, it can use /proc/kcore.
         (if the arch uses usual memory layout.)
      
      This patch:
      
      /proc/kcore uses its own list handling codes. It's better to use
      generic list codes.
      
      No changes in logic. just clean up.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: WANG Cong <xiyou.wangcong@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2ef43ec7
  14. 12 6月, 2009 1 次提交
  15. 31 3月, 2009 1 次提交
    • A
      proc 2/2: remove struct proc_dir_entry::owner · 99b76233
      Alexey Dobriyan 提交于
      Setting ->owner as done currently (pde->owner = THIS_MODULE) is racy
      as correctly noted at bug #12454. Someone can lookup entry with NULL
      ->owner, thus not pinning enything, and release it later resulting
      in module refcount underflow.
      
      We can keep ->owner and supply it at registration time like ->proc_fops
      and ->data.
      
      But this leaves ->owner as easy-manipulative field (just one C assignment)
      and somebody will forget to unpin previous/pin current module when
      switching ->owner. ->proc_fops is declared as "const" which should give
      some thoughts.
      
      ->read_proc/->write_proc were just fixed to not require ->owner for
      protection.
      
      rmmod'ed directories will be empty and return "." and ".." -- no harm.
      And directories with tricky enough readdir and lookup shouldn't be modular.
      We definitely don't want such modular code.
      
      Removing ->owner will also make PDE smaller.
      
      So, let's nuke it.
      
      Kudos to Jeff Layton for reminding about this, let's say, oversight.
      
      http://bugzilla.kernel.org/show_bug.cgi?id=12454Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      99b76233
  16. 23 10月, 2008 2 次提交
  17. 07 10月, 2008 1 次提交
  18. 27 7月, 2008 1 次提交
    • A
      [PATCH] sanitize proc_sysctl · 9043476f
      Al Viro 提交于
      * keep references to ctl_table_head and ctl_table in /proc/sys inodes
      * grab the former during operations, use the latter for access to
        entry if that succeeds
      * have ->d_compare() check if table should be seen for one who does lookup;
        that allows us to avoid flipping inodes - if we have the same name resolve
        to different things, we'll just keep several dentries and ->d_compare()
        will reject the wrong ones.
      * have ->lookup() and ->readdir() scan the table of our inode first, then
        walk all ctl_table_header and scan ->attached_by for those that are
        attached to our directory.
      * implement ->getattr().
      * get rid of insane amounts of tree-walking
      * get rid of the need to know dentry in ->permission() and of the contortions
        induced by that.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      9043476f
  19. 26 7月, 2008 2 次提交
  20. 23 7月, 2008 1 次提交
  21. 13 6月, 2008 1 次提交
  22. 29 4月, 2008 7 次提交
    • D
      proc: introduce proc_create_data to setup de->data · 59b74351
      Denis V. Lunev 提交于
      This set of patches fixes an proc ->open'less usage due to ->proc_fops flip in
      the most part of the kernel code.  The original OOPS is described in the
      commit 2d3a4e36:
      
          Typical PDE creation code looks like:
      
          	pde = create_proc_entry("foo", 0, NULL);
          	if (pde)
          		pde->proc_fops = &foo_proc_fops;
      
          Notice that PDE is first created, only then ->proc_fops is set up to
          final value. This is a problem because right after creation
          a) PDE is fully visible in /proc , and
          b) ->proc_fops are proc_file_operations which do not have ->open callback. So, it's
             possible to ->read without ->open (see one class of oopses below).
      
          The fix is new API called proc_create() which makes sure ->proc_fops are
          set up before gluing PDE to main tree. Typical new code looks like:
      
          	pde = proc_create("foo", 0, NULL, &foo_proc_fops);
          	if (!pde)
          		return -ENOMEM;
      
          Fix most networking users for a start.
      
          In the long run, create_proc_entry() for regular files will go.
      
      In addition to this, proc_create_data is introduced to fix reading from
      proc without PDE->data. The race is basically the same as above.
      
      create_proc_entries is replaced in the entire kernel code as new method
      is also simply better.
      
      This patch:
      
      The problem is the same as for de->proc_fops.  Right now PDE becomes visible
      without data set.  So, the entry could be looked up without data.  This, in
      most cases, will simply OOPS.
      
      proc_create_data call is created to address this issue.  proc_create now
      becomes a wrapper around it.
      Signed-off-by: NDenis V. Lunev <den@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Acked-by: NDavid Howells <dhowells@redhat.com>
      Cc: Dmitry Torokhov <dtor@mail.ru>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Jaroslav Kysela <perex@suse.cz>
      Cc: Jeff Garzik <jgarzik@pobox.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Karsten Keil <kkeil@suse.de>
      Cc: Kyle McMartin <kyle@parisc-linux.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Nadia Derbey <Nadia.Derbey@bull.net>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Osterlund <petero2@telia.com>
      Cc: Pierre Peiffer <peifferp@gmail.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      59b74351
    • A
      proc: remove ->get_info infrastructure · 8731f14d
      Alexey Dobriyan 提交于
      Now that last dozen or so users of ->get_info were removed, ditch it too.
      Everyone sane shouldd have switched to seq_file interface long ago.
      
      P.S.: Co-existing 3 interfaces (->get_info/->read_proc/->proc_fops) for proc
            is long-standing crap, BTW, thus
            a) put ->read_proc/->write_proc/read_proc_entry() users on death row,
            b) new such users should be rejected,
            c) everyone is encouraged to convert his favourite ->read_proc user or
               I'll do it, lazy bastards.
      Signed-off-by: NAlexey Dobriyan <adobriyan@sw.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8731f14d
    • A
      proc: remove proc_root from drivers · c74c120a
      Alexey Dobriyan 提交于
      Remove proc_root export.  Creation and removal works well if parent PDE is
      supplied as NULL -- it worked always that way.
      
      So, one useless export removed and consistency added, some drivers created
      PDEs with &proc_root as parent but removed them as NULL and so on.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c74c120a
    • A
      proc: remove proc_root_driver · 928b4d8c
      Alexey Dobriyan 提交于
      Use creation by full path: "driver/foo".
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      928b4d8c
    • A
      proc: remove proc_root_fs · 36a5aeb8
      Alexey Dobriyan 提交于
      Use creation by full path instead: "fs/foo".
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      36a5aeb8
    • A
      proc: remove proc_bus · 9c37066d
      Alexey Dobriyan 提交于
      Remove proc_bus export and variable itself. Using pathnames works fine
      and is slightly more understandable and greppable.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9c37066d
    • M
      procfs task exe symlink · 925d1c40
      Matt Helsley 提交于
      The kernel implements readlink of /proc/pid/exe by getting the file from
      the first executable VMA.  Then the path to the file is reconstructed and
      reported as the result.
      
      Because of the VMA walk the code is slightly different on nommu systems.
      This patch avoids separate /proc/pid/exe code on nommu systems.  Instead of
      walking the VMAs to find the first executable file-backed VMA we store a
      reference to the exec'd file in the mm_struct.
      
      That reference would prevent the filesystem holding the executable file
      from being unmounted even after unmapping the VMAs.  So we track the number
      of VM_EXECUTABLE VMAs and drop the new reference when the last one is
      unmapped.  This avoids pinning the mounted filesystem.
      
      [akpm@linux-foundation.org: improve comments]
      [yamamoto@valinux.co.jp: fix dup_mmap]
      Signed-off-by: NMatt Helsley <matthltc@us.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: David Howells <dhowells@redhat.com>
      Cc:"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NYAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      925d1c40
  23. 08 3月, 2008 1 次提交
    • P
      [NET]: Make /proc/net a symlink on /proc/self/net (v3) · e9720acd
      Pavel Emelyanov 提交于
      Current /proc/net is done with so called "shadows", but current
      implementation is broken and has little chances to get fixed.
      
      The problem is that dentries subtree of /proc/net directory has
      fancy revalidation rules to make processes living in different
      net namespaces see different entries in /proc/net subtree, but
      currently, tasks see in the /proc/net subdir the contents of any
      other namespace, depending on who opened the file first.
      
      The proposed fix is to turn /proc/net into a symlink, which points
      to /proc/self/net, which in turn shows what previously was in
      /proc/net - the network-related info, from the net namespace the
      appropriate task lives in.
      
      # ls -l /proc/net
      lrwxrwxrwx  1 root root 8 Mar  5 15:17 /proc/net -> self/net
      
      In other words - this behaves like /proc/mounts, but unlike
      "mounts", "net" is not a file, but a directory.
      
      Changes from v2:
      * Fixed discrepancy of /proc/net nlink count and selinux labeling
        screwup pointed out by Stephen.
      
        To get the correct nlink count the ->getattr callback for /proc/net
        is overridden to read one from the net->proc_net entry.
      
        To make selinux still work the net->proc_net entry is initialized
        properly, i.e. with the "net" name and the proc_net parent.
      
      Selinux fixes are
      Acked-by: NStephen Smalley <sds@tycho.nsa.gov>
      
      Changes from v1:
      * Fixed a task_struct leak in get_proc_task_net, pointed out by Paul.
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e9720acd
  24. 15 2月, 2008 1 次提交