1. 09 2月, 2008 4 次提交
  2. 11 12月, 2007 1 次提交
    • E
      proc: remove/Fix proc generic d_revalidate · 3790ee4b
      Eric W. Biederman 提交于
      Ultimately to implement /proc perfectly we need an implementation of
      d_revalidate because files and directories can be removed behind the back
      of the VFS, and d_revalidate is the only way we can let the VFS know that
      this has happened.
      
      Unfortunately the linux VFS can not cope with anything in the path to a
      mount point going away.  So a proper d_revalidate method that calls d_drop
      also needs to call have_submounts which is moderately expensive, so you
      really don't want a d_revalidate method that unconditionally calls it, but
      instead only calls it when the backing object has really gone away.
      
      proc generic entries only disappear on module_unload (when not counting the
      fledgling network namespace) so it is quite rare that we actually encounter
      that case and has not actually caused us real world trouble yet.
      
      So until we get a proper test for keeping dentries in the dcache fix the
      current d_revalidate method by completely removing it.  This returns us to
      the current status quo.
      
      So with CONFIG_NETNS=n things should look as they have always looked.
      
      For CONFIG_NETNS=y things work most of the time but there are a few rare
      corner cases that don't behave properly.  As the network namespace is
      barely present in 2.6.24 this should not be a problem.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "Denis V. Lunev" <den@openvz.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3790ee4b
  3. 06 12月, 2007 1 次提交
    • A
      proc: fix proc_dir_entry refcounting · 5a622f2d
      Alexey Dobriyan 提交于
      Creating PDEs with refcount 0 and "deleted" flag has problems (see below).
      Switch to usual scheme:
      * PDE is created with refcount 1
      * every de_get does +1
      * every de_put() and remove_proc_entry() do -1
      * once refcount reaches 0, PDE is freed.
      
      This elegantly fixes at least two following races (both observed) without
      introducing new locks, without abusing old locks, without spreading
      lock_kernel():
      
      1) PDE leak
      
      remove_proc_entry			de_put
      -----------------			------
      			[refcnt = 1]
      if (atomic_read(&de->count) == 0)
      					if (atomic_dec_and_test(&de->count))
      						if (de->deleted)
      							/* also not taken! */
      							free_proc_entry(de);
      else
      	de->deleted = 1;
      		[refcount=0, deleted=1]
      
      2) use after free
      
      remove_proc_entry			de_put
      -----------------			------
      			[refcnt = 1]
      
      					if (atomic_dec_and_test(&de->count))
      if (atomic_read(&de->count) == 0)
      	free_proc_entry(de);
      						/* boom! */
      						if (de->deleted)
      							free_proc_entry(de);
      
      BUG: unable to handle kernel paging request at virtual address 6b6b6b6b
      printing eip: c10acdda *pdpt = 00000000338f8001 *pde = 0000000000000000
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in: af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom
      Pid: 23161, comm: cat Not tainted (2.6.24-rc2-8c086340 #4)
      EIP: 0060:[<c10acdda>] EFLAGS: 00210097 CPU: 1
      EIP is at strnlen+0x6/0x18
      EAX: 6b6b6b6b EBX: 6b6b6b6b ECX: 6b6b6b6b EDX: fffffffe
      ESI: c128fa3b EDI: f380bf34 EBP: ffffffff ESP: f380be44
       DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      Process cat (pid: 23161, ti=f380b000 task=f38f2570 task.ti=f380b000)
      Stack: c10ac4f0 00000278 c12ce000 f43cd2a8 00000163 00000000 7da86067 00000400
             c128fa20 00896b18 f38325a8 c128fe20 ffffffff 00000000 c11f291e 00000400
             f75be300 c128fa20 f769c9a0 c10ac779 f380bf34 f7bfee70 c1018e6b f380bf34
      Call Trace:
       [<c10ac4f0>] vsnprintf+0x2ad/0x49b
       [<c10ac779>] vscnprintf+0x14/0x1f
       [<c1018e6b>] vprintk+0xc5/0x2f9
       [<c10379f1>] handle_fasteoi_irq+0x0/0xab
       [<c1004f44>] do_IRQ+0x9f/0xb7
       [<c117db3b>] preempt_schedule_irq+0x3f/0x5b
       [<c100264e>] need_resched+0x1f/0x21
       [<c10190ba>] printk+0x1b/0x1f
       [<c107c8ad>] de_put+0x3d/0x50
       [<c107c8f8>] proc_delete_inode+0x38/0x41
       [<c107c8c0>] proc_delete_inode+0x0/0x41
       [<c1066298>] generic_delete_inode+0x5e/0xc6
       [<c1065aa9>] iput+0x60/0x62
       [<c1063c8e>] d_kill+0x2d/0x46
       [<c1063fa9>] dput+0xdc/0xe4
       [<c10571a1>] __fput+0xb0/0xcd
       [<c1054e49>] filp_close+0x48/0x4f
       [<c1055ee9>] sys_close+0x67/0xa5
       [<c10026b6>] sysenter_past_esp+0x5f/0x85
      =======================
      Code: c9 74 0c f2 ae 74 05 bf 01 00 00 00 4f 89 fa 5f 89 d0 c3 85 c9 57 89 c7 89 d0 74 05 f2 ae 75 01 4f 89 f8 5f c3 89 c1 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 c3 90 90 90 57 83 c9
      EIP: [<c10acdda>] strnlen+0x6/0x18 SS:ESP 0068:f380be44
      
      Also, remove broken usage of ->deleted from reiserfs: if sget() succeeds,
      module is already pinned and remove_proc_entry() can't happen => nobody
      can mark PDE deleted.
      
      Dummy proc root in netns code is not marked with refcount 1. AFAICS, we
      never get it, it's just for proper /proc/net removal. I double checked
      CLONE_NETNS continues to work.
      
      Patch survives many hours of modprobe/rmmod/cat loops without new bugs
      which can be attributed to refcounting.
      Signed-off-by: NAlexey Dobriyan <adobriyan@sw.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5a622f2d
  4. 01 12月, 2007 1 次提交
    • E
      [NETNS]: Fix /proc/net breakage · 2b1e300a
      Eric W. Biederman 提交于
      Well I clearly goofed when I added the initial network namespace support
      for /proc/net.  Currently things work but there are odd details visible to
      user space, even when we have a single network namespace.
      
      Since we do not cache proc_dir_entry dentries at the moment we can just
      modify ->lookup to return a different directory inode depending on the
      network namespace of the process looking at /proc/net, replacing the
      current technique of using a magic and fragile follow_link method.
      
      To accomplish that this patch:
      - introduces a shadow_proc method to allow different dentries to
        be returned from proc_lookup.
      - Removes the old /proc/net follow_link magic
      - Fixes a weakness in our not caching of proc generic dentries.
      
      As shadow_proc uses a task struct to decided which dentry to return we can
      go back later and fix the proc generic caching without modifying any code
      that uses the shadow_proc method.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      2b1e300a
  5. 30 11月, 2007 1 次提交
    • A
      proc: fix NULL ->i_fop oops · c2319540
      Alexey Dobriyan 提交于
      proc_kill_inodes() can clear ->i_fop in the middle of vfs_readdir resulting in
      NULL dereference during "file->f_op->readdir(file, buf, filler)".
      
      The solution is to remove proc_kill_inodes() completely:
      
      a) we don't have tricky modules implementing their tricky readdir hooks which
         could keeping this revoke from hell.
      
      b) In a situation when module is gone but PDE still alive, standard
         readdir will return only "." and "..", because pde->next was cleared by
         remove_proc_entry().
      
      c) the race proc_kill_inode() destined to prevent is not completely
         fixed, just race window made smaller, because vfs_readdir() is run
         without sb_lock held and without file_list_lock held.  Effectively,
         ->i_fop is cleared at random moment, which can't fix properly anything.
      
      BUG: unable to handle kernel NULL pointer dereference at virtual address 00000018
      printing eip: c1061205 *pdpt = 0000000005b22001 *pde = 0000000000000000
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw sr_mod k8temp cdrom hwmon amd_rng
      Pid: 2033, comm: find Not tainted (2.6.24-rc1-b1d08ac0 #2)
      EIP: 0060:[<c1061205>] EFLAGS: 00010246 CPU: 0
      EIP is at vfs_readdir+0x47/0x74
      EAX: c6b6a780 EBX: 00000000 ECX: c1061040 EDX: c5decf94
      ESI: c6b6a780 EDI: fffffffe EBP: c9797c54 ESP: c5decf78
       DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      Process find (pid: 2033, ti=c5dec000 task=c64bba90 task.ti=c5dec000)
      Stack: c5decf94 c1061040 fffffff7 0805ffbc 00000000 c6b6a780 c1061295 0805ffbc
             00000000 00000400 00000000 00000004 0805ffbc 4588eff4 c5dec000 c10026ba
             00000004 0805ffbc 00000400 0805ffbc 4588eff4 bfdc6c70 000000dc 0000007b
      Call Trace:
       [<c1061040>] filldir64+0x0/0xc5
       [<c1061295>] sys_getdents64+0x63/0xa5
       [<c10026ba>] sysenter_past_esp+0x5f/0x85
       =======================
      Code: 49 83 78 18 00 74 43 8d 6b 74 bf fe ff ff ff 89 e8 e8 b8 c0 12 00 f6 83 2c 01 00 00 10 75 22 8b 5e 10 8b 4c 24 04 89 f0 8b 14 24 <ff> 53 18 f6 46 1a 04 89 c7 75 0b 8b 56 0c 8b 46 08 e8 c8 66 00
      EIP: [<c1061205>] vfs_readdir+0x47/0x74 SS:ESP 0068:c5decf78
      
      hch: "Nice, getting rid of this is a very good step formwards.
            Unfortunately we have another copy of this junk in
            security/selinux/selinuxfs.c:sel_remove_entries() which would need the
            same treatment."
      Signed-off-by: NAlexey Dobriyan <adobriyan@sw.ru>
      Acked-by: NChristoph Hellwig <hch@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: James Morris <jmorris@namei.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c2319540
  6. 15 11月, 2007 1 次提交
  7. 17 10月, 2007 1 次提交
    • M
      Group short-lived and reclaimable kernel allocations · e12ba74d
      Mel Gorman 提交于
      This patch marks a number of allocations that are either short-lived such as
      network buffers or are reclaimable such as inode allocations.  When something
      like updatedb is called, long-lived and unmovable kernel allocations tend to
      be spread throughout the address space which increases fragmentation.
      
      This patch groups these allocations together as much as possible by adding a
      new MIGRATE_TYPE.  The MIGRATE_RECLAIMABLE type is for allocations that can be
      reclaimed on demand, but not moved.  i.e.  they can be migrated by deleting
      them and re-reading the information from elsewhere.
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e12ba74d
  8. 17 7月, 2007 2 次提交
    • C
      procfs directory entry cleanup · 99fc06df
      Changli Gao 提交于
      Function proc_register() will assign proc_dir_operations and
      proc_dir_inode_operations to ent's members proc_fops and proc_iops
      correctly if ent is a directory. So the early assignment isn't
      necessary.
      
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      99fc06df
    • A
      Fix rmmod/read/write races in /proc entries · 786d7e16
      Alexey Dobriyan 提交于
      Fix following races:
      ===========================================
      1. Write via ->write_proc sleeps in copy_from_user(). Module disappears
         meanwhile. Or, more generically, system call done on /proc file, method
         supplied by module is called, module dissapeares meanwhile.
      
         pde = create_proc_entry()
         if (!pde)
      	return -ENOMEM;
         pde->write_proc = ...
      				open
      				write
      				copy_from_user
         pde = create_proc_entry();
         if (!pde) {
      	remove_proc_entry();
      	return -ENOMEM;
      	/* module unloaded */
         }
      				*boom*
      ==========================================
      2. bogo-revoke aka proc_kill_inodes()
      
        remove_proc_entry		vfs_read
        proc_kill_inodes		[check ->f_op validness]
      				[check ->f_op->read validness]
      				[verify_area, security permissions checks]
      	->f_op = NULL;
      				if (file->f_op->read)
      					/* ->f_op dereference, boom */
      
      NOTE, NOTE, NOTE: file_operations are proxied for regular files only. Let's
      see how this scheme behaves, then extend if needed for directories.
      Directories creators in /proc only set ->owner for them, so proxying for
      directories may be unneeded.
      
      NOTE, NOTE, NOTE: methods being proxied are ->llseek, ->read, ->write,
      ->poll, ->unlocked_ioctl, ->ioctl, ->compat_ioctl, ->open, ->release.
      If your in-tree module uses something else, yell on me. Full audit pending.
      
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: NAlexey Dobriyan <adobriyan@sw.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      786d7e16
  9. 09 5月, 2007 2 次提交
  10. 15 2月, 2007 1 次提交
    • E
      [PATCH] sysctl: reimplement the sysctl proc support · 77b14db5
      Eric W. Biederman 提交于
      With this change the sysctl inodes can be cached and nothing needs to be done
      when removing a sysctl table.
      
      For a cost of 2K code we will save about 4K of static tables (when we remove
      de from ctl_table) and 70K in proc_dir_entries that we will not allocate, or
      about half that on a 32bit arch.
      
      The speed feels about the same, even though we can now cache the sysctl
      dentries :(
      
      We get the core advantage that we don't need to have a 1 to 1 mapping between
      ctl table entries and proc files.  Making it possible to have /proc/sys vary
      depending on the namespace you are in.  The currently merged namespaces don't
      have an issue here but the network namespace under /proc/sys/net needs to have
      different directories depending on which network adapters are visible.  By
      simply being a cache different directories being visible depending on who you
      are is trivial to implement.
      
      [akpm@osdl.org: fix uninitialised var]
      [akpm@osdl.org: fix ARM build]
      [bunk@stusta.de: make things static]
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      77b14db5
  11. 13 2月, 2007 2 次提交
  12. 09 12月, 2006 1 次提交
  13. 29 3月, 2006 1 次提交
  14. 27 3月, 2006 1 次提交
    • S
      [PATCH] protect remove_proc_entry · 64a07bd8
      Steven Rostedt 提交于
      It has been discovered that the remove_proc_entry has a race in the removing
      of entries in the proc file system that are siblings.  There's no protection
      around the traversing and removing of elements that belong in the same
      subdirectory.
      
      This subdirectory list is protected in other areas by the BKL.  So the BKL was
      at first used to protect this area too, but unfortunately, remove_proc_entry
      may be called with spinlocks held.  The BKL may schedule, so this was not a
      solution.
      
      The final solution was to add a new global spin lock to protect this list,
      called proc_subdir_lock.  This lock now protects the list in
      remove_proc_entry, and I also went around looking for other areas that this
      list is modified and added this protection there too.  Care must be taken
      since these locations call several functions that may also schedule.
      
      Since I don't see any location that these functions that modify the
      subdirectory list are called by interrupts, the irqsave/restore versions of
      the spin lock was _not_ used.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      64a07bd8
  15. 09 1月, 2006 1 次提交
  16. 31 12月, 2005 1 次提交
    • L
      Insanity avoidance in /proc · 8b90db0d
      Linus Torvalds 提交于
      The old /proc interfaces were never updated to use loff_t, and are just
      generally broken.  Now, we should be using the seq_file interface for
      all of the proc files, but converting the legacy functions is more work
      than most people care for and has little upside..
      
      But at least we can make the non-LFS rules explicit, rather than just
      insanely wrapping the offset or something.
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8b90db0d
  17. 31 10月, 2005 1 次提交
  18. 08 9月, 2005 1 次提交
  19. 20 8月, 2005 1 次提交
  20. 17 4月, 2005 1 次提交
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds 提交于
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4