1. 28 7月, 2008 1 次提交
    • A
      task IO accounting: improve code readability · 5995477a
      Andrea Righi 提交于
      Put all i/o statistics in struct proc_io_accounting and use inline functions to
      initialize and increment statistics, removing a lot of single variable
      assignments.
      
      This also reduces the kernel size as following (with CONFIG_TASK_XACCT=y and
      CONFIG_TASK_IO_ACCOUNTING=y).
      
          text    data     bss     dec     hex filename
         11651       0       0   11651    2d83 kernel/exit.o.before
         11619       0       0   11619    2d63 kernel/exit.o.after
         10886     132     136   11154    2b92 kernel/fork.o.before
         10758     132     136   11026    2b12 kernel/fork.o.after
      
       3082029  807968 4818600 8708597  84e1f5 vmlinux.o.before
       3081869  807968 4818600 8708437  84e155 vmlinux.o.after
      Signed-off-by: NAndrea Righi <righi.andrea@gmail.com>
      Acked-by: NOleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5995477a
  2. 27 7月, 2008 4 次提交
  3. 26 7月, 2008 1 次提交
  4. 14 7月, 2008 1 次提交
    • S
      Security: split proc ptrace checking into read vs. attach · 006ebb40
      Stephen Smalley 提交于
      Enable security modules to distinguish reading of process state via
      proc from full ptrace access by renaming ptrace_may_attach to
      ptrace_may_access and adding a mode argument indicating whether only
      read access or full attach access is requested.  This allows security
      modules to permit access to reading process state without granting
      full ptrace access.  The base DAC/capability checking remains unchanged.
      
      Read access to /proc/pid/mem continues to apply a full ptrace attach
      check since check_mem_permission() already requires the current task
      to already be ptracing the target.  The other ptrace checks within
      proc for elements like environ, maps, and fds are changed to pass the
      read mode instead of attach.
      
      In the SELinux case, we model such reading of process state as a
      reading of a proc file labeled with the target process' label.  This
      enables SELinux policy to permit such reading of process state without
      permitting control or manipulation of the target process, as there are
      a number of cases where programs probe for such information via proc
      but do not need to be able to control the target (e.g. procps,
      lsof, PolicyKit, ConsoleKit).  At present we have to choose between
      allowing full ptrace in policy (more permissive than required/desired)
      or breaking functionality (or in some cases just silencing the denials
      via dontaudit rules but this can hide genuine attacks).
      
      This version of the patch incorporates comments from Casey Schaufler
      (change/replace existing ptrace_may_attach interface, pass access
      mode), and Chris Wright (provide greater consistency in the checking).
      
      Note that like their predecessors __ptrace_may_attach and
      ptrace_may_attach, the __ptrace_may_access and ptrace_may_access
      interfaces use different return value conventions from each other (0
      or -errno vs. 1 or 0).  I retained this difference to avoid any
      changes to the caller logic but made the difference clearer by
      changing the latter interface to return a bool rather than an int and
      by adding a comment about it to ptrace.h for any future callers.
      Signed-off-by: NStephen Smalley <sds@tycho.nsa.gov>
      Acked-by: NChris Wright <chrisw@sous-sol.org>
      Signed-off-by: NJames Morris <jmorris@namei.org>
      006ebb40
  5. 07 6月, 2008 1 次提交
  6. 17 5月, 2008 1 次提交
  7. 02 5月, 2008 1 次提交
  8. 29 4月, 2008 2 次提交
    • R
      procfs: mem permission cleanup · 638fa202
      Roland McGrath 提交于
      This cleans up the permission checks done for /proc/PID/mem i/o calls.  It
      puts all the logic in a new function, check_mem_permission().
      
      The old code repeated the (!MAY_PTRACE(task) || !ptrace_may_attach(task))
      magical expression multiple times.  The new function does all that work in one
      place, with clear comments.
      
      The old code called security_ptrace() twice on successful checks, once in
      MAY_PTRACE() and once in __ptrace_may_attach().  Now it's only called once,
      and only if all other checks have succeeded.
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      638fa202
    • M
      procfs task exe symlink · 925d1c40
      Matt Helsley 提交于
      The kernel implements readlink of /proc/pid/exe by getting the file from
      the first executable VMA.  Then the path to the file is reconstructed and
      reported as the result.
      
      Because of the VMA walk the code is slightly different on nommu systems.
      This patch avoids separate /proc/pid/exe code on nommu systems.  Instead of
      walking the VMAs to find the first executable file-backed VMA we store a
      reference to the exec'd file in the mm_struct.
      
      That reference would prevent the filesystem holding the executable file
      from being unmounted even after unmapping the VMAs.  So we track the number
      of VM_EXECUTABLE VMAs and drop the new reference when the last one is
      unmapped.  This avoids pinning the mounted filesystem.
      
      [akpm@linux-foundation.org: improve comments]
      [yamamoto@valinux.co.jp: fix dup_mmap]
      Signed-off-by: NMatt Helsley <matthltc@us.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: David Howells <dhowells@redhat.com>
      Cc:"Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NYAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      925d1c40
  9. 23 4月, 2008 3 次提交
    • R
      [patch 6/7] vfs: mountinfo: add /proc/<pid>/mountinfo · 2d4d4864
      Ram Pai 提交于
      [mszeredi@suse.cz] rewrite and split big patch into managable chunks
      
      /proc/mounts in its current form lacks important information:
      
       - propagation state
       - root of mount for bind mounts
       - the st_dev value used within the filesystem
       - identifier for each mount and it's parent
      
      It also suffers from the following problems:
      
       - not easily extendable
       - ambiguity of mountpoints within a chrooted environment
       - doesn't distinguish between filesystem dependent and independent options
       - doesn't distinguish between per mount and per super block options
      
      This patch introduces /proc/<pid>/mountinfo which attempts to address
      all these deficiencies.
      
      Code shared between /proc/<pid>/mounts and /proc/<pid>/mountinfo is
      extracted into separate functions.
      
      Thanks to Al Viro for the help in getting the design right.
      Signed-off-by: NRam Pai <linuxram@us.ibm.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      2d4d4864
    • M
      [patch 5/7] vfs: mountinfo: allow using process root · a1a2c409
      Miklos Szeredi 提交于
      Allow /proc/<pid>/mountinfo to use the root of <pid> to calculate
      mountpoints.
      
       - move definition of 'struct proc_mounts' to <linux/mnt_namespace.h>
       - add the process's namespace and root to this structure
       - pass a pointer to 'struct proc_mounts' into seq_operations
      
      In addition the following cleanups are made:
      
       - use a common open function for /proc/<pid>/{mounts,mountstat}
       - surround namespace.c part of these proc files with #ifdef CONFIG_PROC_FS
       - make the seq_operations structures const
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      a1a2c409
    • A
      [PATCH] proc_readfd_common() race fix · 9b4f526c
      Al Viro 提交于
      Since we drop the rcu_read_lock inside the loop, we can't assume
      that files->fdt will remain unchanged (and not freed) between
      iterations.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      9b4f526c
  10. 21 3月, 2008 1 次提交
    • A
      [NET]: Fix permissions of /proc/net · 4f42c288
      Andre Noll 提交于
      commit e9720acd ([NET]: Make /proc/net a symlink on /proc/self/net (v3))
      broke ganglia and probably other applications that read /proc/net/dev.
      
      This is due to the change of permissions of /proc/net that was
      introduced in that commit.
      
      Before: dr-xr-xr-x 5 root root 0 Mar 19 11:30 /proc/net
      After: dr-xr--r-- 5 root root 0 Mar 19 11:29 /proc/self/net
      
      This patch restores the permissions to the old value which makes
      ganglia happy again.
      
      Pavel Emelyanov says:
      
      	This also broke the postfix, as it was reported in bug #10286
      	and described in detail by Benjamin.
      Signed-off-by: NAndre Noll <maan@systemlinux.org>
      Acked-by: NPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f42c288
  11. 18 3月, 2008 1 次提交
  12. 12 3月, 2008 1 次提交
  13. 08 3月, 2008 1 次提交
    • P
      [NET]: Make /proc/net a symlink on /proc/self/net (v3) · e9720acd
      Pavel Emelyanov 提交于
      Current /proc/net is done with so called "shadows", but current
      implementation is broken and has little chances to get fixed.
      
      The problem is that dentries subtree of /proc/net directory has
      fancy revalidation rules to make processes living in different
      net namespaces see different entries in /proc/net subtree, but
      currently, tasks see in the /proc/net subdir the contents of any
      other namespace, depending on who opened the file first.
      
      The proposed fix is to turn /proc/net into a symlink, which points
      to /proc/self/net, which in turn shows what previously was in
      /proc/net - the network-related info, from the net namespace the
      appropriate task lives in.
      
      # ls -l /proc/net
      lrwxrwxrwx  1 root root 8 Mar  5 15:17 /proc/net -> self/net
      
      In other words - this behaves like /proc/mounts, but unlike
      "mounts", "net" is not a file, but a directory.
      
      Changes from v2:
      * Fixed discrepancy of /proc/net nlink count and selinux labeling
        screwup pointed out by Stephen.
      
        To get the correct nlink count the ->getattr callback for /proc/net
        is overridden to read one from the net->proc_net entry.
      
        To make selinux still work the net->proc_net entry is initialized
        properly, i.e. with the "net" name and the proc_net parent.
      
      Selinux fixes are
      Acked-by: NStephen Smalley <sds@tycho.nsa.gov>
      
      Changes from v1:
      * Fixed a task_struct leak in get_proc_task_net, pointed out by Paul.
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e9720acd
  14. 25 2月, 2008 3 次提交
  15. 24 2月, 2008 1 次提交
  16. 15 2月, 2008 5 次提交
  17. 09 2月, 2008 8 次提交
  18. 06 2月, 2008 4 次提交
    • A
      Fix /proc dcache deadlock in do_exit · 7766755a
      Andrea Arcangeli 提交于
      This patch fixes a sles9 system hang in start_this_handle from a customer
      with some heavy workload where all tasks are waiting on kjournald to commit
      the transaction, but kjournald waits on t_updates to go down to zero (it
      never does).
      
      This was reported as a lowmem shortage deadlock but when checking the debug
      data I noticed the VM wasn't under pressure at all (well it was really
      under vm pressure, because lots of tasks hanged in the VM prune_dcache
      methods trying to flush dirty inodes, but no task was hanging in GFP_NOFS
      mode, the holder of the journal handle should have if this was a vm issue
      in the first place).
      
      No task was apparently holding the leftover handle in the committing
      transaction, so I deduced t_updates was stuck to 1 because a journal_stop
      was never run by some path (this turned out to be correct).  With a debug
      patch adding proper reverse links and stack trace logging in ext3 deployed
      in production, I found journal_stop is never run because
      mark_inode_dirty_sync is called inside release_task called by do_exit.
      (that was quite fun because I would have never thought about this
      subtleness, I thought a regular path in ext3 had a bug and it forgot to
      call journal_stop)
      
      do_exit->release_task->mark_inode_dirty_sync->schedule() (will never
      come back to run journal_stop)
      
      The reason is that shrink_dcache_parent is racy by design (feature not
      a bug) and it can do blocking I/O in some case, but the point is that
      calling shrink_dcache_parent at the last stage of do_exit isn't safe
      for self-reaping tasks.
      
      I guess the memory pressure of the unbalanced highmem system allowed
      to trigger this more easily.
      
      Now mainline doesn't have this line in iput (like sles9 has):
      
          	     if (inode->i_state & I_DIRTY_DELAYED)
      	     			mark_inode_dirty_sync(inode);
      
      so it will probably not crash with ext3, but for example ext2 implements an
      I/O-blocking ext2_put_inode that will lead to similar screwups with
      ext2_free_blocks never coming back and it's definitely wrong to call
      blocking-IO paths inside do_exit.  So this should fix a subtle bug in
      mainline too (not verified in practice though).  The equivalent fix for
      ext3 is also not verified yet to fix the problem in sles9 but I don't have
      doubt it will (it usually takes days to crash, so it'll take weeks to be
      sure).
      
      An alternate fix would be to offload that work to a kernel thread, but I
      don't think a reschedule for this is worth it, the vm should be able to
      collect those entries for the synchronous release_task.
      Signed-off-by: NAndrea Arcangeli <andrea@suse.de>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7766755a
    • M
      maps4: make page monitoring /proc file optional · 1e883281
      Matt Mackall 提交于
      Make /proc/ page monitoring configurable
      
      This puts the following files under an embedded config option:
      
      /proc/pid/clear_refs
      /proc/pid/smaps
      /proc/pid/pagemap
      /proc/kpagecount
      /proc/kpageflags
      
      [akpm@linux-foundation.org: Kconfig fix]
      Signed-off-by: NMatt Mackall <mpm@selenic.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1e883281
    • M
      maps4: add /proc/pid/pagemap interface · 85863e47
      Matt Mackall 提交于
      This interface provides a mapping for each page in an address space to its
      physical page frame number, allowing precise determination of what pages are
      mapped and what pages are shared between processes.
      
      New in this version:
      
      - headers gone again (as recommended by Dave Hansen and Alan Cox)
      - 64-bit entries (as per discussion with Andi Kleen)
      - swap pte information exported (from Dave Hansen)
      - page walker callback for holes (from Dave Hansen)
      - direct put_user I/O (as suggested by Rusty Russell)
      
      This patch folds in cleanups and swap PTE support from Dave Hansen
      <haveblue@us.ibm.com>.
      Signed-off-by: NMatt Mackall <mpm@selenic.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      85863e47
    • M
      maps4: move clear_refs code to task_mmu.c · f248dcb3
      Matt Mackall 提交于
      This puts all the clear_refs code where it belongs and probably lets things
      compile on MMU-less systems as well.
      Signed-off-by: NMatt Mackall <mpm@selenic.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f248dcb3