1. 26 Nov 2006, 1 commit
    • [PATCH] mounstats NULL pointer dereference · 701e054e
      Vasily Tarasov committed
      The OpenVZ developer team encountered the following problem in the 2.6.19-rc6
      kernel. After a few seconds of running the script
      
      while [[ 1 ]]
      do
      	find  /proc -name mountstats | xargs cat
      done
      
      this Oops appears:
      
      BUG: unable to handle kernel NULL pointer dereference at virtual address 00000010
       printing eip:
      c01a6b70
      *pde = 00000000
      Oops: 0000 [#1]
      SMP
      Modules linked in: xt_length ipt_ttl xt_tcpmss ipt_TCPMSS iptable_mangle
      iptable_filter xt_multiport xt_limit ipt_tos ipt_REJECT ip_tables x_tables
      parport_pc lp parport sunrpc af_packet thermal processor fan button battery
      asus_acpi ac ohci_hcd ehci_hcd usbcore i2c_nforce2 i2c_core tg3 floppy
      pata_amd ide_cd cdrom sata_nv libata
      CPU:    1
      EIP:    0060:[<c01a6b70>]    Not tainted VLI
      EFLAGS: 00010246   (2.6.19-rc6 #2)
      EIP is at mountstats_open+0x70/0xf0
      eax: 00000000   ebx: e6247030   ecx: e62470f8   edx: 00000000
      esi: 00000000   edi: c01a6b00   ebp: c33b83c0   esp: f4105eb4
      ds: 007b   es: 007b   ss: 0068
      Process cat (pid: 6044, ti=f4105000 task=f4104a70 task.ti=f4105000)
      Stack: c33b83c0 c04ee940 f46a4a80 c33b83c0 e4df31b4 c01a6b00 f4105000 c0169231
             e4df31b4 c33b83c0 c33b83c0 f4105f20 00000003 f4105000 c0169445 f2503cf0
             f7f8c4c0 00008000 c33b83c0 00000000 00008000 c0169350 f4105f20 00008000
      Call Trace:
       [<c01a6b00>] mountstats_open+0x0/0xf0
       [<c0169231>] __dentry_open+0x181/0x250
       [<c0169445>] nameidata_to_filp+0x35/0x50
       [<c0169350>] do_filp_open+0x50/0x60
       [<c01873d6>] seq_read+0xc6/0x300
       [<c0169511>] get_unused_fd+0x31/0xc0
       [<c01696d3>] do_sys_open+0x63/0x110
       [<c01697a7>] sys_open+0x27/0x30
       [<c01030bd>] sysenter_past_esp+0x56/0x79
       =======================
      Code: 45 74 8b 54 24 20 89 44 24 08 8b 42 f0 31 d2 e8 47 cb f8 ff 85 c0 89 c3 74 51 8d 80 a0 04 00 00 e8 46 06 2c 00 8b 83 48 04 00 00 <8b> 78 10 85 ff 74 03 f0 ff 07 b0 01 86 83 a0 04 00 00 f0 ff 4b
      EIP: [<c01a6b70>] mountstats_open+0x70/0xf0 SS:ESP 0068:f4105eb4
      
      The problem is that task->nsproxy can be NULL for part of the time during
      task exit. This patch fixes the BUG.
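The fix amounts to checking task->nsproxy for NULL before dereferencing it, under the task lock. Below is a minimal userspace model of that pattern, not the actual patch. `struct task_model`, `struct nsproxy_model`, `struct namespace_model` and `get_task_namespace()` are hypothetical names. A pthread mutex stands in for task_lock(), and returning -EINVAL when the namespace is gone is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's definitions. */
struct namespace_model { int refcount; };
struct nsproxy_model { struct namespace_model *ns; };
struct task_model {
    pthread_mutex_t lock;            /* plays the role of task_lock() */
    struct nsproxy_model *nsproxy;   /* set to NULL while the task exits */
};

/* Take a reference on the task's mount namespace, or fail cleanly if the
 * task is exiting and has already dropped its nsproxy. */
static int get_task_namespace(struct task_model *task,
                              struct namespace_model **out)
{
    struct namespace_model *ns = NULL;

    assert(task != NULL && out != NULL);
    pthread_mutex_lock(&task->lock);
    if (task->nsproxy)               /* the check the crashing code lacked */
        ns = task->nsproxy->ns;
    if (ns)
        ns->refcount++;              /* pin it while still holding the lock */
    pthread_mutex_unlock(&task->lock);

    if (!ns)
        return -EINVAL;              /* assumed error code */
    *out = ns;
    return 0;
}
```

The reference is taken before the lock is dropped, so the caller can keep using the namespace after the task finishes exiting.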
      Signed-off-by: Vasily Tarasov <vtaras@openvz.org>
      Cc: Herbert Poetzl <herbert@13thfloor.at>
      Cc: "Serge E. Hallyn" <serue@us.ibm.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 21 Oct 2006, 1 commit
  3. 17 Oct 2006, 1 commit
  4. 02 Oct 2006, 14 commits
  5. 30 Sep 2006, 1 commit
  6. 16 Jul 2006, 1 commit
  7. 15 Jul 2006, 2 commits
  8. 01 Jul 2006, 1 commit
  9. 27 Jun 2006, 18 commits
    • [PATCH] SELinux: Add sockcreate node to procattr API · 42c3e03e
      Eric Paris committed
      Below is a patch to add a new /proc/self/attr/sockcreate node. A process may
      write a context into this interface, and all sockets it subsequently creates
      will be labeled with that context.  This is the same idea as the fscreate
      interface, where a process can specify the label of a file about to be
      created.  One envisioned user of this is xinetd, which will be able to label
      sockets for the actual services more precisely.  At this time all sockets
      take the label of the creating process, so all xinetd sockets would just be
      labeled the same.
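Here is a minimal sketch of how a process might use the new node: write a context string to the procattr file, then create sockets as usual. `set_procattr()` is a hypothetical helper, not a kernel or libselinux function. It takes the path as a parameter so it can be exercised on an ordinary file.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write an SELinux context to a procattr file such as
 * /proc/self/attr/sockcreate. Returns 0 on success or -errno. */
static int set_procattr(const char *path, const char *context)
{
    assert(path != NULL && context != NULL);

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    size_t len = strlen(context);
    ssize_t n = write(fd, context, len);
    /* Capture errno before close() can overwrite it. */
    int err = n < 0 ? -errno : ((size_t)n == len ? 0 : -EIO);
    close(fd);
    return err;
}
```

On an SELinux system, a daemon such as xinetd could call `set_procattr("/proc/self/attr/sockcreate", ctx)` before `socket(AF_INET, SOCK_STREAM, 0)`. The context string used in the test is a made-up example. libselinux exposes the same file through `setsockcreatecon()`.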
      
      I tested this by creating a TCP sender and listener.  The sender was able to
      write to this new proc file and then create sockets with the specified label.
      I could confirm that the new label was used because the AVC denial messages
      emitted by the kernel included the new security permission setsockcreate,
      and all the socket denials were for the new label rather than the label of
      the running process.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      Signed-off-by: James Morris <jmorris@namei.org>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cleanup next_tid() · c1df7fb8
      Oleg Nesterov committed
      Try to make next_tid() a bit more readable and delete the unnecessary
      "pid_alive(pos)" check.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
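The shape of next_tid() after this cleanup can be sketched with a userspace model. A thread group is modeled as a circular list that next_thread() walks. The iterator returns the following thread, or NULL once the walk wraps back to the group leader. `struct thread_model` and `next_tid_model()` are hypothetical. The kernel version also handles RCU, reference counts and a liveness check on the starting task, all omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a thread group as a circular list. */
struct thread_model {
    int tid;
    struct thread_model *next;     /* circular: last->next == leader */
    struct thread_model *leader;   /* the group leader */
};

/* Return the thread after start, or NULL once the walk wraps back to the
 * group leader, i.e. every thread has been visited. */
static struct thread_model *next_tid_model(struct thread_model *start)
{
    assert(start != NULL);
    struct thread_model *pos = start->next;
    return pos == start->leader ? NULL : pos;
}
```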
    • [PATCH] simplify/fix first_tid() · a872ff0c
      Oleg Nesterov committed
      first_tid:
      
      	/* If nr exceeds the number of threads there is nothing todo */
      	if (nr) {
      		if (nr >= get_nr_threads(leader))
      			goto done;
      	}
      
      This is not reliable: sub-threads can exit after this check, so the
      'for' loop below can overlap (wrap around the thread list) and
      proc_task_readdir() can return dirents that were already filldir'ed.
      
      	for (; pos && pid_alive(pos); pos = next_thread(pos)) {
      		if (--nr > 0)
      			continue;
      
      Off-by-one error: this returns 'leader' when nr == 1.
      
      This patch tries to fix these problems and simplify the code.
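Both problems can be addressed the same way: treat the thread count only as an early-exit hint, walk exactly nr steps from the leader, and stop if the walk wraps back to the leader. The sketch below models that idea in userspace. `struct thread_model` and `first_tid_model()` are hypothetical. The kernel function also first tries a lookup by the last returned tid and runs under RCU, both omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a thread group as a circular list. */
struct thread_model {
    int tid;
    struct thread_model *next;   /* circular: last->next == leader */
};

/* Return the thread at position nr (0 = leader), or NULL if the group has
 * fewer than nr + 1 threads. */
static struct thread_model *first_tid_model(struct thread_model *leader,
                                            int nr, int nr_threads)
{
    assert(leader != NULL && nr >= 0);

    /* Cheap early exit; only a hint, since threads may exit right after. */
    if (nr > 0 && nr >= nr_threads)
        return NULL;

    /* Walk exactly nr threads forward from the leader: nr == 0 is the
     * leader itself and nr == 1 is the next thread, so no off-by-one. */
    struct thread_model *pos = leader;
    for (; nr > 0; --nr) {
        pos = pos->next;
        if (pos == leader)       /* wrapped around: group shrank meanwhile */
            return NULL;
    }
    return pos;
}
```

The wrap check is what makes a stale `nr_threads` harmless. A count that is too high ends the walk at the leader instead of emitting entries twice.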
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: Remove tasklist_lock from proc_task_readdir. · cc288738
      Eric W. Biederman committed
      This is just like my previous removal of tasklist_lock from first_tgid and
      next_tgid.  It simply had to wait until it was RCU-safe to walk the thread
      list.

      This should be the last instance of the tasklist_lock in proc, so user
      processes should no longer be able to influence tasklist_lock hold times.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: Cleanup proc_fd_access_allowed · df26c40e
      Eric W. Biederman committed
      In the process of getting proc_fd_access_allowed to work, it has developed a
      few warts, in particular the special case that always allows introspection
      and the special case that allows inspection of kernel threads.
      
      The special case for introspection is needed for /proc/self/mem.
      
      The special case for kernel threads really should be overridable
      by security modules.
      
      So consolidate these checks into ptrace.c:may_attach().
      
      The check to always allow introspection is trivial.
      
      The check to allow access to kernel threads and zombies is a little
      trickier.  mem_read and mem_write already verify that an mm exists, so the
      check isn't needed twice.  Only proc_fd_access_allowed doesn't want a check
      that task->mm exists, because that check prevents all access to kernel
      threads.  So just move the task->mm check into ptrace_attach, where it is
      needed for practical reasons.
      
      I did a quick audit and none of the security modules in the kernel seem to
      care if they are passed a task without an mm into security_ptrace.  So the
      above move should be safe and it allows security modules to come up with
      more restrictive policy.
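The consolidation can be pictured with a small userspace model. One shared check allows introspection unconditionally, otherwise requires a matching uid or CAP_SYS_PTRACE, and then defers to a security-module hook. Only ptrace attach adds the "must have an mm" test. All names here are hypothetical, and the uid comparison is a large simplification. The real may_attach() and ptrace_attach() perform more checks than this, for example ptrace refusing to attach to the caller's own thread group.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct task_model {
    unsigned uid;
    bool has_mm;            /* false for kernel threads and zombies */
};

struct caller_model {
    const struct task_model *self;
    unsigned uid;
    bool cap_sys_ptrace;
    int (*security_ptrace)(const struct task_model *target); /* LSM hook, may be NULL */
};

/* Consolidated check shared by ptrace and /proc. */
static int may_attach_model(const struct caller_model *cur,
                            const struct task_model *task)
{
    assert(cur != NULL && task != NULL);
    if (task == cur->self)
        return 0;                               /* introspection always allowed */
    if (cur->uid != task->uid && !cur->cap_sys_ptrace)
        return -EPERM;
    /* No task->mm test here, so a security module also sees kernel threads. */
    return cur->security_ptrace ? cur->security_ptrace(task) : 0;
}

/* ptrace itself still needs an mm: otherwise there is nothing to single-step. */
static int ptrace_attach_model(const struct caller_model *cur,
                               const struct task_model *task)
{
    if (!task->has_mm)
        return -EPERM;
    return may_attach_model(cur, task);
}

/* /proc simply asks the shared question. */
static bool proc_fd_access_allowed_model(const struct caller_model *cur,
                                         const struct task_model *task)
{
    return may_attach_model(cur, task) == 0;
}
```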
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: James Morris <jmorris@namei.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: Use sane permission checks on the /proc/<pid>/fd/ symlinks · 778c1144
      Eric W. Biederman committed
      Since 2.2 we have been doing a chroot check to decide whether it is
      appropriate to read or follow one of these magic symlinks.  The chroot
      check was asking a question about the visibility of files to the calling
      process, but it was actually checking the destination process and not the
      files themselves.  That test was clearly bogus.
      
      In my first pass I simply fixed the test to check the visibility of the
      files themselves.  That naive approach to fixing the permissions was too
      strict and resulted in cases where a task could not even see all of its
      own file descriptors.
      
      What has disturbed me about relaxing this check is that file descriptors
      are per-process private things, and they are occasionally used as user-space
      capability tokens.  Looking a little further into the symlink path on /proc,
      I did find userid checks and a capability check (CAP_DAC_OVERRIDE), so
      there was some permission checking there.
      
      But I was still concerned about privacy.  Besides /proc there is only one
      other way to find out this kind of information, and that is ptrace.  ptrace
      has been around for a long time and it has a well established security
      model.
      
      So after thinking about it I finally realized that the permission checks
      that make sense are the permission checks applied to ptrace_attach.  The
      checks are simple and per-process, and they won't cause nasty surprises for
      people coming from less capable unices.
      
      Unfortunately there is one case that the current ptrace_attach test does
      not cover: zombies and kernel threads.  Single-stepping those kinds of
      processes is impossible, yet being able to see which file descriptors are
      open on these tasks is important to lsof, fuser and friends.  So for these
      special processes I made the rule that you can't find out unless you have
      CAP_SYS_PTRACE.
      
      These proc permission checks should now conform to the principle of least
      surprise, and they take much less code to implement :)
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: optimize proc_check_dentry_visible · 5b0c1dd3
      Eric W. Biederman committed
      The code doesn't need to sleep when making this check, so I can just do the
      comparison and not worry about the reference counts.
      
      TODO: While looking at this I realized that my original cleanup did not push
      the permission check far enough down into the stack.  The call of
      proc_check_dentry_visible needs to move out of the generic proc
      readlink/follow link code and into the individual get_link instances.
      Otherwise the shared resources checks are not quite correct (shared
      files_struct does not require a shared fs_struct), and there are races with
      unshare.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: Use struct pid not struct task_ref · 13b41b09
      Eric W. Biederman committed
      Incrementally update my proc-dont-lock-task_structs-indefinitely patches so
      that they work with struct pid instead of struct task_ref.
      
      Mostly this is a straight 1-1 substitution.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: don't lock task_structs indefinitely · 99f89551
      Eric W. Biederman committed
      Every inode in /proc holds a reference to a struct task_struct.  If a
      directory or file is opened and remains open after the task exits, this
      pinning continues.  With 8K stacks on a 32-bit machine the amount pinned per
      file descriptor is about 10K.
      
      Normally I would figure a reasonable per-user process limit is about 100
      processes.  With 80 processes, each with 1000 file descriptors, I can trigger
      the OOM killer on a 32-bit kernel, because I have pinned about 800MB of
      useless data.
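The 800MB figure follows from the numbers above, taking roughly 10K pinned per open descriptor:

```latex
80\ \text{processes} \times 1000\ \tfrac{\text{fds}}{\text{process}} \times 10\ \tfrac{\text{KB}}{\text{fd}} = 800{,}000\ \text{KB} \approx 800\ \text{MB}
```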
      
      This patch replaces the struct task_struct pointer with a pointer to a
      struct task_ref, which holds a struct task_struct pointer, so the pinning of
      dead tasks does not happen.

      The code now has to contend with the fact that the task may exit at any
      time, which is a little, but not much, more complicated.
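The indirection works as follows. Open /proc files pin only a small handle. The exit path clears the handle's task pointer so the large task can be freed, and every user must then cope with a NULL task. Below is a single-threaded userspace model of that idea. `struct task_ref_model` and its helper functions are hypothetical, and the kernel's locking is omitted. The follow-up commit 13b41b09 above replaced struct task_ref with struct pid, which plays the same role.

```c
#include <assert.h>
#include <stdlib.h>

struct task_model { int pid; };   /* the large object: stack, mm, ... */

struct task_ref_model {
    int refcount;                 /* one for the task, one per /proc inode */
    struct task_model *task;      /* NULL once the task has exited */
};

static struct task_ref_model *task_ref_new(struct task_model *task)
{
    struct task_ref_model *ref = malloc(sizeof(*ref));
    if (ref) {
        ref->refcount = 1;        /* owned by the live task */
        ref->task = task;
    }
    return ref;
}

static struct task_ref_model *task_ref_get(struct task_ref_model *ref)
{
    ref->refcount++;
    return ref;
}

static void task_ref_put(struct task_ref_model *ref)
{
    assert(ref->refcount > 0);
    if (--ref->refcount == 0)
        free(ref);
}

/* Exit path: detach the task so it can be freed, then drop its reference.
 * Open /proc files keep only the small handle alive. */
static void task_ref_exit(struct task_ref_model *ref)
{
    ref->task = NULL;
    task_ref_put(ref);
}

/* Every /proc operation must now cope with the task being gone. */
static struct task_model *task_ref_task(const struct task_ref_model *ref)
{
    return ref->task;             /* NULL: the caller reports an error */
}
```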
      
      With this change it takes about 1000 processes each opening up 1000 file
      descriptors before I can trigger the OOM killer.  Much better.
      
      [mlp@google.com: task_mmu small fixes]
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Albert Cahalan <acahalan@gmail.com>
      Signed-off-by: Prasanna Meda <mlp@google.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: make PROC_NUMBUF the buffer size for holding integers as strings · 8578cea7
      Eric W. Biederman committed
      Currently, at several different places in /proc, we define buffers to hold a
      process id or a file descriptor.  In most of them we use either a hard-coded
      number or a different define.  Modify them all to use PROC_NUMBUF, so the
      code has a chance of being maintained.
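A single named size makes every "integer as a directory name" buffer consistent. The sketch below shows the idea. The value 13 is an assumption chosen to fit any 32-bit int plus the terminating NUL, not necessarily the kernel's definition. `proc_format_num()` is a hypothetical helper.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Assumed value: "-2147483648" is 11 characters; with the NUL that is 12. */
#define PROC_NUMBUF 13

_Static_assert(INT_MIN >= -2147483647 - 1, "PROC_NUMBUF is sized for a 32-bit int");

/* Format a pid or fd as a decimal name. Returns its length, or -1 if it
 * would not fit (snprintf reports the length it wanted). */
static int proc_format_num(char buf[PROC_NUMBUF], int value)
{
    int len = snprintf(buf, PROC_NUMBUF, "%d", value);
    assert(len >= 0);
    return len < PROC_NUMBUF ? len : -1;
}
```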
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] simply fix first_tgid · 9cc8cbc7
      Eric W. Biederman committed
      Like the bug Oleg spotted in first_tid, there was also a small off-by-one
      error in first_tgid when a seek was done on the /proc directory.  This
      fixes that and changes the code structure to make it a little more obvious
      what is going on.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: Remove tasklist_lock from proc_pid_lookup() and proc_task_lookup() · de758734
      Eric W. Biederman committed
      Since we no longer need the tasklist_lock for get_task_struct, the lookup
      methods no longer need the tasklist_lock either.

      This just depends on my previous patch that makes get_task_struct()
      RCU-safe.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: Remove tasklist_lock from proc_pid_readdir · 454cc105
      Eric W. Biederman committed
      We don't need the tasklist_lock to safely iterate through processes
      anymore.

      This depends on my previous two task patches that make get_task_struct
      RCU-safe and make next_task() RCU-safe.  I haven't converted
      first_tid/next_tid yet only because next_thread is missing an
      rcu_dereference.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: refactor reading directories of tasks · 0bc58a91
      Eric W. Biederman committed
      There are a couple of problems this patch addresses.
      - /proc/<tgid>/task currently does not work correctly if you stop reading
        in the middle of a directory.
      
      - /proc/ currently requires a full pass through the task list with
        the tasklist lock held, to determine there are no more processes to read.
      
      - The hand-rolled integer-to-string conversion does not properly handle
        running out of buffer space.
      
      - We seem to be batching reading of pids from the tasklist without reason,
        and complicating the logic of the code.
      
      This patch addresses that by changing how tasks are processed.  A
      first_<task_type> function is built that handles restarts, and a
      next_<task_type> function is built that just advances to the next task.
      
      When first_<task_type> detects a restart, it usually uses find_task_by_pid.
      If that doesn't work because there has been a seek on the directory, or we
      have already given a complete directory listing, it first checks the number
      of tasks of that type, and only if we are under that count does it walk
      through all of the tasks to find the one we are interested in.
      
      The code that fills in the directory is simpler because there is only a single
      for loop.
      
      The hand-rolled integer-to-string conversion is replaced by snprintf, which
      should handle the out-of-buffer case correctly.
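The first/next structure can be modeled in userspace. A restartable first function prefers the pid hint and falls back to the position only if it is below the current count. A trivial next function advances. A single loop fills names with snprintf and records where to resume when the caller's buffer is full. All names here are hypothetical: an array stands in for the task list, a linear scan stands in for find_task_by_pid, and pids are assumed positive so that 0 can mean "no hint".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAMEBUF 16  /* hypothetical; room for any decimal int */

struct task_list { const int *pids; int count; };   /* stands in for the task list */
struct dir_cursor { long pos; int next_pid; };       /* like f_pos plus a pid hint */

/* Restart point: prefer the pid we stopped at; if it has exited, fall back
 * to the numeric position, but only when it is below the current count. */
static int first_task(const struct task_list *tl, int pid_hint, long nr)
{
    for (int i = 0; i < tl->count; i++)      /* the kernel uses a pid lookup here */
        if (tl->pids[i] == pid_hint)
            return i;
    return nr < tl->count ? (int)nr : -1;
}

static int next_task(const struct task_list *tl, int i)
{
    return i + 1 < tl->count ? i + 1 : -1;
}

/* Fill at most max names; a later call resumes where this one stopped. */
static int readdir_tasks(const struct task_list *tl, struct dir_cursor *c,
                         char names[][NAMEBUF], int max)
{
    int n = 0;
    assert(max >= 0);
    for (int i = first_task(tl, c->next_pid, c->pos); i >= 0; i = next_task(tl, i)) {
        if (n == max) {                    /* caller's buffer is full */
            c->pos = i;                    /* restart here next time, */
            c->next_pid = tl->pids[i];     /* preferably by pid */
            return n;
        }
        char buf[NAMEBUF];
        int len = snprintf(buf, sizeof(buf), "%d", tl->pids[i]);
        assert(len > 0 && len < NAMEBUF);  /* snprintf never overruns buf */
        memcpy(names[n++], buf, (size_t)len + 1);  /* filldir stand-in */
    }
    c->pos = tl->count;                    /* complete listing given */
    c->next_pid = 0;
    return n;
}
```

Reading five pids two at a time returns 2, 2, 1 and then 0 entries, with each call resuming by pid rather than by rescanning from the start.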
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: Close the race of a process dying durning lookup · cd6a3ce9
      Eric W. Biederman committed
      proc_lookup and task exit are not synchronized, although some of the
      previous code may have suggested that.  Every time before we reuse a dentry,
      namei.c calls d_op->d_revalidate, which prevents us from reusing a stale
      dcache entry.  Unfortunately it does not prevent us from returning a stale
      dcache entry.  This race has been explicitly plugged in proc_pid_lookup, but
      nothing confines it to just that proc lookup function.

      So to prevent the race I call revalidate explicitly in all of the proc
      lookup functions after I call d_add, and report an error if the revalidate
      does not succeed.
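The pattern is: instantiate, add to the cache, and then revalidate before returning, so that a task which died during the lookup produces an error instead of a stale entry. Below is a single-threaded userspace model of that ordering. The structs and functions are hypothetical. Choosing -ENOENT and unhashing the entry on failure are assumptions about what "report an error" looks like.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct task_model { bool alive; };
struct dentry_model { struct task_model *task; bool hashed; };

/* Stand-in for ->d_revalidate(): is the dentry still backed by a live task? */
static bool revalidate_model(const struct dentry_model *d)
{
    return d->task != NULL && d->task->alive;
}

/* Instantiate, add to the cache, then revalidate before returning. */
static int lookup_model(struct dentry_model *d, struct task_model *task)
{
    assert(d != NULL);
    d->task = task;
    d->hashed = true;                 /* d_add(): now visible in the cache */
    if (!revalidate_model(d)) {
        d->hashed = false;            /* drop the stale entry */
        return -ENOENT;               /* assumed error code */
    }
    return 0;
}
```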
      
      Years ago Al Viro did something similar but those changes got lost in the
      churn.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: Rewrite the proc dentry flush on exit optimization · 48e6484d
      Eric W. Biederman committed
      To keep the dcache from filling up with dead /proc entries, we flush them on
      process exit.  However, over the years that code has gotten hairy, with a
      dentry_pointer and a lock in task_struct, and it was misdocumented as a
      correctness feature.

      I have rewritten this code to look and see whether we have a corresponding
      entry in the dcache and, if so, flush it on process exit.  This removes the
      extra fields in the task_struct and allows me to trivially handle the case
      of a /proc/<tgid>/task/<pid> entry as well as the current /proc/<pid> entries.
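The rewrite stores nothing in the task. On exit, it builds the names the task could have in /proc, looks them up, and flushes whatever is cached. Below is a userspace sketch of that look-up-then-flush idea. A tiny array of flat path strings stands in for the dcache tree, and all names are hypothetical. Flushing the per-thread entry only for non-leaders is an assumption based on the description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS 8
#define NAME_MAX_MODEL 32

/* Hypothetical stand-in for the dcache: a tiny set of cached /proc paths. */
struct proc_cache {
    char names[CACHE_SLOTS][NAME_MAX_MODEL];
    bool used[CACHE_SLOTS];
};

static bool cache_flush(struct proc_cache *c, const char *name)
{
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (c->used[i] && strcmp(c->names[i], name) == 0) {
            c->used[i] = false;
            return true;
        }
    return false;
}

/* On exit, flush /proc/<pid> and, for a non-leader thread, also
 * /proc/<tgid>/task/<pid>, but only if they are cached. Returns the
 * number of entries flushed. */
static int flush_task_model(struct proc_cache *c, int pid, int tgid)
{
    char name[NAME_MAX_MODEL];
    int flushed = 0;

    snprintf(name, sizeof(name), "%d", pid);
    flushed += cache_flush(c, name);
    if (pid != tgid) {
        snprintf(name, sizeof(name), "%d/task/%d", tgid, pid);
        flushed += cache_flush(c, name);
    }
    return flushed;
}
```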
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: Move proc_maps_operations into task_mmu.c · 662795de
      Eric W. Biederman committed
      All of the functions for proc_maps_operations are already defined in
      task_mmu.c so move the operations structure to keep the functionality
      together.
      
      Since task_nommu.c implements a dummy version of /proc/<pid>/maps give it a
      simplified version of proc_maps_operations that it can modify to best suit its
      needs.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] proc: Fix the link count for /proc/<pid>/task · 6e66b52b
      Eric W. Biederman committed
      Use getattr to get an accurate link count when needed.  This is cheaper and
      more accurate than trying to derive it by walking the thread list of a
      process.

      This is especially true because the count is now computed when stat needs
      it, instead of at readdir time.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>