1. 05 Jan, 2009 1 commit
    • proc: stop using BKL · b4df2b92
      Alexey Dobriyan committed
      There are four BKL users in proc: de_put(), proc_lookup_de(),
      proc_readdir_de(), proc_root_readdir().
      
      1) de_put()
      -----------
      de_put() is a classic atomic_dec_and_test() refcount wrapper -- no BKL
      needed. BKL doesn't matter to a possible refcount leak either.
      
      2) proc_lookup_de()
      -------------------
      Walking the PDE list is protected by proc_subdir_lock, proc_get_inode() is
      potentially blocking, and all callers of proc_lookup_de() eventually come
      from ->lookup hooks, which are protected by the directory's ->i_mutex -- BKL
      doesn't protect anything.
      
      3) proc_readdir_de()
      --------------------
      "." and ".." part doesn't need BKL, walking PDE list is under
      proc_subdir_lock, calling filldir callback is potentially blocking
      because it writes to luserspace. All proc_readdir_de() callers
      eventually come from ->readdir hook which is under directory's
      ->i_mutex -- BKL doesn't protect anything.
      
      4) proc_root_readdir()
      ----------------------
      proc_root_readdir() is a ->readdir hook, see (3).
      
      Since readdir hooks don't use BKL anymore, switch to
      generic_file_llseek, since it also takes the directory's i_mutex.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
  2. 23 Oct, 2008 1 commit
  3. 14 Sep, 2008 1 commit
  4. 25 Aug, 2008 1 commit
  5. 01 Aug, 2008 2 commits
  6. 27 Jul, 2008 1 commit
  7. 26 Jul, 2008 1 commit
  8. 02 May, 2008 1 commit
  9. 29 Apr, 2008 6 commits
    • proc: introduce proc_create_data to setup de->data · 59b74351
      Denis V. Lunev committed
      This set of patches fixes a proc ->open'less usage due to the ->proc_fops flip in
      most of the kernel code.  The original oops is described in
      commit 2d3a4e36:
      
          Typical PDE creation code looks like:
      
          	pde = create_proc_entry("foo", 0, NULL);
          	if (pde)
          		pde->proc_fops = &foo_proc_fops;
      
          Notice that PDE is first created, only then ->proc_fops is set up to
          final value. This is a problem because right after creation
          a) PDE is fully visible in /proc , and
          b) ->proc_fops are proc_file_operations which do not have ->open callback. So, it's
             possible to ->read without ->open (see one class of oopses below).
      
          The fix is new API called proc_create() which makes sure ->proc_fops are
          set up before gluing PDE to main tree. Typical new code looks like:
      
          	pde = proc_create("foo", 0, NULL, &foo_proc_fops);
          	if (!pde)
          		return -ENOMEM;
      
          Fix most networking users for a start.
      
          In the long run, create_proc_entry() for regular files will go.
      
      In addition to this, proc_create_data is introduced to fix reading from
      proc without PDE->data. The race is basically the same as above.
      
      create_proc_entry() calls are replaced throughout the kernel code, as the new
      method is also simply better.
      
      This patch:
      
      The problem is the same as for de->proc_fops.  Right now PDE becomes visible
      without data set.  So, the entry could be looked up without data.  This, in
      most cases, will simply OOPS.
      
      proc_create_data call is created to address this issue.  proc_create now
      becomes a wrapper around it.
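The wrapper relationship described above can be pictured with a small userspace model. This is only a sketch under stated assumptions: `model_create_data()`, `model_create()`, `struct model_entry` and `struct model_fops` are hypothetical stand-ins rather than the kernel's definitions, and the single-threaded list ignores the locking real code needs. What it illustrates is the ordering: every field is filled in before the entry is linked into the visible list, so a reader can never observe an entry without its operations or data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct file_operations. */
struct model_fops {
    int id;
};

/* Hypothetical stand-in for struct proc_dir_entry. */
struct model_entry {
    const char *name;
    const struct model_fops *fops;
    void *data;
    struct model_entry *next;
};

/* Head of the "visible" list; anything linked here can be looked up. */
static struct model_entry *visible_head;

/* Fill in fops and data first, and link the entry into the list last. */
static struct model_entry *model_create_data(const char *name,
                                             const struct model_fops *fops,
                                             void *data)
{
    struct model_entry *e;

    assert(name != NULL);
    e = calloc(1, sizeof(*e));
    if (!e)
        return NULL;

    e->name = name;
    e->fops = fops;
    e->data = data;

    /* Publication step: only now does the entry become reachable. */
    e->next = visible_head;
    visible_head = e;
    return e;
}

/* The data-less variant is just a wrapper passing NULL data. */
static struct model_entry *model_create(const char *name,
                                        const struct model_fops *fops)
{
    return model_create_data(name, fops, NULL);
}
```

Keeping one function that does the publication and making the simpler call a thin wrapper means there is a single place where the "initialize, then link" ordering has to be right.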
      Signed-off-by: Denis V. Lunev <den@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Acked-by: David Howells <dhowells@redhat.com>
      Cc: Dmitry Torokhov <dtor@mail.ru>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: Jaroslav Kysela <perex@suse.cz>
      Cc: Jeff Garzik <jgarzik@pobox.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Karsten Keil <kkeil@suse.de>
      Cc: Kyle McMartin <kyle@parisc-linux.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Nadia Derbey <Nadia.Derbey@bull.net>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Osterlund <petero2@telia.com>
      Cc: Pierre Peiffer <peifferp@gmail.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: remove ->get_info infrastructure · 8731f14d
      Alexey Dobriyan committed
      Now that the last dozen or so users of ->get_info were removed, ditch it too.
      Everyone sane should have switched to the seq_file interface long ago.
      
      P.S.: Co-existing 3 interfaces (->get_info/->read_proc/->proc_fops) for proc
            is long-standing crap, BTW, thus
            a) put ->read_proc/->write_proc/read_proc_entry() users on death row,
            b) new such users should be rejected,
            c) everyone is encouraged to convert his favourite ->read_proc user or
               I'll do it, lazy bastards.
      Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: drop several "PDE valid/invalid" checks · 5e971dce
      Alexey Dobriyan committed
      proc-misc code is noticeably full of "if (de)" checks when PDE passed is
      always valid.  Remove them.
      
      Addition of such check in proc_lookup_de() is for failed lookup case.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: less special case in xlate code · 7cee4e00
      Alexey Dobriyan committed
      If a valid "parent" is passed to proc_create/remove_proc_entry(), then the name of
      the PDE should consist of only one path component, otherwise creation or
      removal will fail.  However, if NULL is passed as parent then create/remove
      accept a full path as an argument.  This is an arbitrary restriction -- all
      infrastructure is in place.
      
      So, patch allows the following to succeed:
      
      	create_proc_entry("foo/bar", 0, pde_baz);
      	remove_proc_entry("baz/foo/bar", &proc_root);
      
      It also makes the following behave identically:
      
      	create_proc_entry("foo/bar", 0, NULL);
      	create_proc_entry("foo/bar", 0, &proc_root);
      
      Discrepancy noticed by Den Lunev (IIRC).
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: simplify locking in remove_proc_entry() · f649d6d3
      Alexey Dobriyan committed
      proc_subdir_lock protects only modifying and walking through PDE lists, so
      after we've found PDE to remove and actually removed it from lists, there is
      no need to hold proc_subdir_lock for the rest of operation.
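The locking idea in this message can be sketched in userspace C. This is an illustration under stated assumptions, not the kernel code: a pthread mutex stands in for the proc_subdir_lock spinlock, and `struct node`, `node_add()` and `node_remove()` are hypothetical names. The lock covers only the list walk and the unlink; the rest of the teardown runs after the lock is dropped, because no other list walker can reach the entry any more.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a directory entry on a sibling list. */
struct node {
    const char *name;
    struct node *next;
};

/* Stand-in for proc_subdir_lock: protects only the list itself. */
static pthread_mutex_t subdir_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node *subdir;

/* Allocate outside the lock, link at the head under the lock. */
static int node_add(const char *name)
{
    struct node *n = malloc(sizeof(*n));

    if (!n)
        return -1;
    n->name = name;

    pthread_mutex_lock(&subdir_lock);
    n->next = subdir;
    subdir = n;
    pthread_mutex_unlock(&subdir_lock);
    return 0;
}

/* Find and unlink under the lock; finish the teardown without it. */
static int node_remove(const char *name)
{
    struct node **p, *victim = NULL;

    assert(name != NULL);

    pthread_mutex_lock(&subdir_lock);
    for (p = &subdir; *p; p = &(*p)->next) {
        if (strcmp((*p)->name, name) == 0) {
            victim = *p;
            *p = victim->next;   /* unlinked: list walkers can't see it now */
            break;
        }
    }
    pthread_mutex_unlock(&subdir_lock);

    if (!victim)
        return -1;

    /* Remaining work needs no list lock. */
    victim->next = NULL;
    free(victim);
    return 0;
}
```

A shorter critical section matters more for a spinlock than for the mutex used here, since nothing done while holding a spinlock may sleep.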
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: print more information when removing non-empty directories · e93b4ea2
      Alexey Dobriyan committed
      This usually saves one recompile to insert similar printk like below. :)
      
      Sample nastygram:
      
      remove_proc_entry: removing non-empty directory '/proc/foo', leaking at least 'bar'
      ------------[ cut here ]------------
      WARNING: at fs/proc/generic.c:776 remove_proc_entry+0x18a/0x200()
      Modules linked in: foo(-) container fan battery dock sbs ac sbshc backlight ipv6 loop af_packet amd_rng sr_mod i2c_amd8111 i2c_amd756 cdrom i2c_core button thermal processor
      Pid: 3034, comm: rmmod Tainted: G   M     2.6.25-rc1 #5
      
      Call Trace:
       [<ffffffff80231974>] warn_on_slowpath+0x64/0x90
       [<ffffffff80232a6e>] printk+0x4e/0x60
       [<ffffffff802d6c8a>] remove_proc_entry+0x18a/0x200
       [<ffffffff8045cd88>] mutex_lock_nested+0x1c8/0x2d0
       [<ffffffff8025f0f0>] __try_stop_module+0x0/0x40
       [<ffffffff8025effd>] sys_delete_module+0x14d/0x200
       [<ffffffff8045df3d>] lockdep_sys_exit_thunk+0x35/0x67
       [<ffffffff8031c307>] __up_read+0x27/0xa0
       [<ffffffff8045decc>] trace_hardirqs_on_thunk+0x35/0x3a
       [<ffffffff8020b6ab>] system_call_after_swapgs+0x7b/0x80
      
      ---[ end trace 10ef850597e89c54 ]---
      Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 08 Mar, 2008 1 commit
    • [NET]: Make /proc/net a symlink on /proc/self/net (v3) · e9720acd
      Pavel Emelyanov committed
      Current /proc/net is done with so-called "shadows", but the current
      implementation is broken and has little chance of getting fixed.
      
      The problem is that dentries subtree of /proc/net directory has
      fancy revalidation rules to make processes living in different
      net namespaces see different entries in /proc/net subtree, but
      currently, tasks see in the /proc/net subdir the contents of any
      other namespace, depending on who opened the file first.
      
      The proposed fix is to turn /proc/net into a symlink, which points
      to /proc/self/net, which in turn shows what previously was in
      /proc/net - the network-related info, from the net namespace the
      appropriate task lives in.
      
      # ls -l /proc/net
      lrwxrwxrwx  1 root root 8 Mar  5 15:17 /proc/net -> self/net
      
      In other words - this behaves like /proc/mounts, but unlike
      "mounts", "net" is not a file, but a directory.
      
      Changes from v2:
      * Fixed discrepancy of /proc/net nlink count and selinux labeling
        screwup pointed out by Stephen.
      
        To get the correct nlink count the ->getattr callback for /proc/net
        is overridden to read one from the net->proc_net entry.
      
        To make selinux still work the net->proc_net entry is initialized
        properly, i.e. with the "net" name and the proc_net parent.
      
      Selinux fixes are
      Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
      
      Changes from v1:
      * Fixed a task_struct leak in get_proc_task_net, pointed out by Paul.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 09 Feb, 2008 5 commits
    • proc: fix ->open'less usage due to ->proc_fops flip · 2d3a4e36
      Alexey Dobriyan committed
      Typical PDE creation code looks like:
      
      	pde = create_proc_entry("foo", 0, NULL);
      	if (pde)
      		pde->proc_fops = &foo_proc_fops;
      
      Notice that PDE is first created, only then ->proc_fops is set up to
      final value. This is a problem because right after creation
      a) PDE is fully visible in /proc , and
      b) ->proc_fops are proc_file_operations which do not have ->open callback. So, it's
         possible to ->read without ->open (see one class of oopses below).
      
      The fix is new API called proc_create() which makes sure ->proc_fops are
      set up before gluing PDE to main tree. Typical new code looks like:
      
      	pde = proc_create("foo", 0, NULL, &foo_proc_fops);
      	if (!pde)
      		return -ENOMEM;
      
      Fix most networking users for a start.
      
      In the long run, create_proc_entry() for regular files will go.
      
      BUG: unable to handle kernel NULL pointer dereference at virtual address 00000024
      printing eip: c1188c1b *pdpt = 000000002929e001 *pde = 0000000000000000
      Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      last sysfs file: /sys/block/sda/sda1/dev
      Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom
      
      Pid: 24679, comm: cat Not tainted (2.6.24-rc3-mm1 #2)
      EIP: 0060:[<c1188c1b>] EFLAGS: 00210002 CPU: 0
      EIP is at mutex_lock_nested+0x75/0x25d
      EAX: 000006fe EBX: fffffffb ECX: 00001000 EDX: e9340570
      ESI: 00000020 EDI: 00200246 EBP: e9340570 ESP: e8ea1ef8
       DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      Process cat (pid: 24679, ti=E8EA1000 task=E9340570 task.ti=E8EA1000)
      Stack: 00000000 c106f7ce e8ee05b4 00000000 00000001 458003d0 f6fb6f20 fffffffb
             00000000 c106f7aa 00001000 c106f7ce 08ae9000 f6db53f0 00000020 00200246
             00000000 00000002 00000000 00200246 00200246 e8ee05a0 fffffffb e8ee0550
      Call Trace:
       [<c106f7ce>] seq_read+0x24/0x28a
       [<c106f7aa>] seq_read+0x0/0x28a
       [<c106f7ce>] seq_read+0x24/0x28a
       [<c106f7aa>] seq_read+0x0/0x28a
       [<c10818b8>] proc_reg_read+0x60/0x73
       [<c1081858>] proc_reg_read+0x0/0x73
       [<c105a34f>] vfs_read+0x6c/0x8b
       [<c105a6f3>] sys_read+0x3c/0x63
       [<c10025f2>] sysenter_past_esp+0x5f/0xa5
       [<c10697a7>] destroy_inode+0x24/0x33
       =======================
      INFO: lockdep is turned off.
      Code: 75 21 68 e1 1a 19 c1 68 87 00 00 00 68 b8 e8 1f c1 68 25 73 1f c1 e8 84 06 e9 ff e8 52 b8 e7 ff 83 c4 10 9c 5f fa e8 28 89 ea ff <f0> fe 4e 04 79 0a f3 90 80 7e 04 00 7e f8 eb f0 39 76 34 74 33
      EIP: [<c1188c1b>] mutex_lock_nested+0x75/0x25d SS:ESP 0068:e8ea1ef8
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: detect duplicate names on registration · 94413d88
      Zhang Rui committed
      Print a warning if PDE is registered with a name which already exists in
      target directory.
      
      Bug report and a simple fix can be found here:
      http://bugzilla.kernel.org/show_bug.cgi?id=8798
      
      [\n fixlet and no undescriptive variable usage --adobriyan]
      [akpm@linux-foundation.org: make printk comprehensible]
      Signed-off-by: Zhang Rui <rui.zhang@intel.com>
      Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: remove useless check on symlink removal · fd2cbe48
      Alexey Dobriyan committed
      proc symlinks always have valid ->data containing the destination of the symlink.  No
      need to check it on removal -- proc_symlink() has already done it.
      Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: simplify function prototypes · 76df0c25
      Alexey Dobriyan committed
      Move code around so as to reduce the number of forward-declarations.
      Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • proc: less LOCK operations during lookup · 4237e0d3
      Alexey Dobriyan committed
      Pseudo-code for lookup effectively is:
      
      	LOCK kernel
      	LOCK proc_subdir_lock
      		find PDE
      		UNLOCK proc_subdir_lock
      
      		get inode
      
      		LOCK proc_subdir_lock
      		goto unlock
      	UNLOCK proc_subdir_lock
      	UNLOCK kernel
      
      We can get rid of LOCK/UNLOCK pair after getting inode simply by jumping
      to unlock_kernel() directly.
      Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 11 Dec, 2007 1 commit
    • proc: remove/Fix proc generic d_revalidate · 3790ee4b
      Eric W. Biederman committed
      Ultimately to implement /proc perfectly we need an implementation of
      d_revalidate because files and directories can be removed behind the back
      of the VFS, and d_revalidate is the only way we can let the VFS know that
      this has happened.
      
      Unfortunately the linux VFS can not cope with anything in the path to a
      mount point going away.  So a proper d_revalidate method that calls d_drop
      also needs to call have_submounts which is moderately expensive, so you
      really don't want a d_revalidate method that unconditionally calls it, but
      instead only calls it when the backing object has really gone away.
      
      proc generic entries only disappear on module_unload (when not counting the
      fledgling network namespace) so it is quite rare that we actually encounter
      that case and has not actually caused us real world trouble yet.
      
      So until we get a proper test for keeping dentries in the dcache fix the
      current d_revalidate method by completely removing it.  This returns us to
      the current status quo.
      
      So with CONFIG_NETNS=n things should look as they have always looked.
      
      For CONFIG_NETNS=y things work most of the time but there are a few rare
      corner cases that don't behave properly.  As the network namespace is
      barely present in 2.6.24 this should not be a problem.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "Denis V. Lunev" <den@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 06 Dec, 2007 1 commit
    • proc: fix proc_dir_entry refcounting · 5a622f2d
      Alexey Dobriyan committed
      Creating PDEs with refcount 0 and "deleted" flag has problems (see below).
      Switch to usual scheme:
      * PDE is created with refcount 1
      * every de_get does +1
      * every de_put() and remove_proc_entry() do -1
      * once refcount reaches 0, PDE is freed.
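The scheme listed above can be sketched with C11 atomics in userspace. This is a minimal model under stated assumptions: `struct entry`, `entry_create()`, `entry_get()`, `entry_put()` and `entry_remove()` are hypothetical names standing in for the PDE and its helpers, and the `freed_flag` pointer exists only so the behaviour can be observed. The key property is that whoever drops the count from 1 to 0 frees the object, with no separate "deleted" flag to race against.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct proc_dir_entry. */
struct entry {
    atomic_int count;    /* reference count */
    bool *freed_flag;    /* observation hook: set when the entry is freed */
};

/* Created with refcount 1: the reference held by the directory tree. */
static struct entry *entry_create(bool *freed_flag)
{
    struct entry *e = malloc(sizeof(*e));

    if (!e)
        return NULL;
    atomic_init(&e->count, 1);
    e->freed_flag = freed_flag;
    return e;
}

/* Every get does +1; the caller must already hold a valid reference. */
static void entry_get(struct entry *e)
{
    assert(atomic_load(&e->count) > 0);
    atomic_fetch_add(&e->count, 1);
}

/* Every put does -1; the caller that drops the last reference frees. */
static void entry_put(struct entry *e)
{
    /* atomic_fetch_sub() returns the value held before the decrement. */
    if (atomic_fetch_sub(&e->count, 1) == 1) {
        if (e->freed_flag)
            *e->freed_flag = true;
        free(e);
    }
}

/* Removal from the tree simply drops the tree's own reference. */
static void entry_remove(struct entry *e)
{
    entry_put(e);
}
```

Because the decrement and the zero test are one atomic operation, exactly one caller sees the transition to zero, whichever order removal and the final put arrive in.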
      
      This elegantly fixes at least the two following races (both observed) without
      introducing new locks, without abusing old locks, without spreading
      lock_kernel():
      
      1) PDE leak
      
      remove_proc_entry			de_put
      -----------------			------
      			[refcnt = 1]
      if (atomic_read(&de->count) == 0)
      					if (atomic_dec_and_test(&de->count))
      						if (de->deleted)
      							/* also not taken! */
      							free_proc_entry(de);
      else
      	de->deleted = 1;
      		[refcount=0, deleted=1]
      
      2) use after free
      
      remove_proc_entry			de_put
      -----------------			------
      			[refcnt = 1]
      
      					if (atomic_dec_and_test(&de->count))
      if (atomic_read(&de->count) == 0)
      	free_proc_entry(de);
      						/* boom! */
      						if (de->deleted)
      							free_proc_entry(de);
      
      BUG: unable to handle kernel paging request at virtual address 6b6b6b6b
      printing eip: c10acdda *pdpt = 00000000338f8001 *pde = 0000000000000000
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in: af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom
      Pid: 23161, comm: cat Not tainted (2.6.24-rc2-8c086340 #4)
      EIP: 0060:[<c10acdda>] EFLAGS: 00210097 CPU: 1
      EIP is at strnlen+0x6/0x18
      EAX: 6b6b6b6b EBX: 6b6b6b6b ECX: 6b6b6b6b EDX: fffffffe
      ESI: c128fa3b EDI: f380bf34 EBP: ffffffff ESP: f380be44
       DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      Process cat (pid: 23161, ti=f380b000 task=f38f2570 task.ti=f380b000)
      Stack: c10ac4f0 00000278 c12ce000 f43cd2a8 00000163 00000000 7da86067 00000400
             c128fa20 00896b18 f38325a8 c128fe20 ffffffff 00000000 c11f291e 00000400
             f75be300 c128fa20 f769c9a0 c10ac779 f380bf34 f7bfee70 c1018e6b f380bf34
      Call Trace:
       [<c10ac4f0>] vsnprintf+0x2ad/0x49b
       [<c10ac779>] vscnprintf+0x14/0x1f
       [<c1018e6b>] vprintk+0xc5/0x2f9
       [<c10379f1>] handle_fasteoi_irq+0x0/0xab
       [<c1004f44>] do_IRQ+0x9f/0xb7
       [<c117db3b>] preempt_schedule_irq+0x3f/0x5b
       [<c100264e>] need_resched+0x1f/0x21
       [<c10190ba>] printk+0x1b/0x1f
       [<c107c8ad>] de_put+0x3d/0x50
       [<c107c8f8>] proc_delete_inode+0x38/0x41
       [<c107c8c0>] proc_delete_inode+0x0/0x41
       [<c1066298>] generic_delete_inode+0x5e/0xc6
       [<c1065aa9>] iput+0x60/0x62
       [<c1063c8e>] d_kill+0x2d/0x46
       [<c1063fa9>] dput+0xdc/0xe4
       [<c10571a1>] __fput+0xb0/0xcd
       [<c1054e49>] filp_close+0x48/0x4f
       [<c1055ee9>] sys_close+0x67/0xa5
       [<c10026b6>] sysenter_past_esp+0x5f/0x85
      =======================
      Code: c9 74 0c f2 ae 74 05 bf 01 00 00 00 4f 89 fa 5f 89 d0 c3 85 c9 57 89 c7 89 d0 74 05 f2 ae 75 01 4f 89 f8 5f c3 89 c1 89 c8 eb 06 <80> 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 c3 90 90 90 57 83 c9
      EIP: [<c10acdda>] strnlen+0x6/0x18 SS:ESP 0068:f380be44
      
      Also, remove broken usage of ->deleted from reiserfs: if sget() succeeds,
      module is already pinned and remove_proc_entry() can't happen => nobody
      can mark PDE deleted.
      
      Dummy proc root in netns code is not marked with refcount 1. AFAICS, we
      never get it, it's just for proper /proc/net removal. I double checked
      CLONE_NETNS continues to work.
      
      Patch survives many hours of modprobe/rmmod/cat loops without new bugs
      which can be attributed to refcounting.
      Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 01 Dec, 2007 1 commit
    • [NETNS]: Fix /proc/net breakage · 2b1e300a
      Eric W. Biederman committed
      Well I clearly goofed when I added the initial network namespace support
      for /proc/net.  Currently things work but there are odd details visible to
      user space, even when we have a single network namespace.
      
      Since we do not cache proc_dir_entry dentries at the moment we can just
      modify ->lookup to return a different directory inode depending on the
      network namespace of the process looking at /proc/net, replacing the
      current technique of using a magic and fragile follow_link method.
      
      To accomplish that this patch:
      - introduces a shadow_proc method to allow different dentries to
        be returned from proc_lookup.
      - Removes the old /proc/net follow_link magic
      - Fixes a weakness in our not caching of proc generic dentries.
      
      As shadow_proc uses a task struct to decide which dentry to return, we can
      go back later and fix the proc generic caching without modifying any code
      that uses the shadow_proc method.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  15. 30 Nov, 2007 1 commit
    • proc: fix NULL ->i_fop oops · c2319540
      Alexey Dobriyan committed
      proc_kill_inodes() can clear ->i_fop in the middle of vfs_readdir resulting in
      NULL dereference during "file->f_op->readdir(file, buf, filler)".
      
      The solution is to remove proc_kill_inodes() completely:
      
      a) we don't have tricky modules implementing their tricky readdir hooks which
         could justify keeping this revoke from hell.
      
      b) In a situation when module is gone but PDE still alive, standard
         readdir will return only "." and "..", because pde->next was cleared by
         remove_proc_entry().
      
      c) the race proc_kill_inodes() was meant to prevent is not completely
         fixed, the race window is just made smaller, because vfs_readdir() is run
         without sb_lock held and without file_list_lock held.  Effectively,
         ->i_fop is cleared at a random moment, which can't properly fix anything.
      
      BUG: unable to handle kernel NULL pointer dereference at virtual address 00000018
      printing eip: c1061205 *pdpt = 0000000005b22001 *pde = 0000000000000000
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw sr_mod k8temp cdrom hwmon amd_rng
      Pid: 2033, comm: find Not tainted (2.6.24-rc1-b1d08ac0 #2)
      EIP: 0060:[<c1061205>] EFLAGS: 00010246 CPU: 0
      EIP is at vfs_readdir+0x47/0x74
      EAX: c6b6a780 EBX: 00000000 ECX: c1061040 EDX: c5decf94
      ESI: c6b6a780 EDI: fffffffe EBP: c9797c54 ESP: c5decf78
       DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      Process find (pid: 2033, ti=c5dec000 task=c64bba90 task.ti=c5dec000)
      Stack: c5decf94 c1061040 fffffff7 0805ffbc 00000000 c6b6a780 c1061295 0805ffbc
             00000000 00000400 00000000 00000004 0805ffbc 4588eff4 c5dec000 c10026ba
             00000004 0805ffbc 00000400 0805ffbc 4588eff4 bfdc6c70 000000dc 0000007b
      Call Trace:
       [<c1061040>] filldir64+0x0/0xc5
       [<c1061295>] sys_getdents64+0x63/0xa5
       [<c10026ba>] sysenter_past_esp+0x5f/0x85
       =======================
      Code: 49 83 78 18 00 74 43 8d 6b 74 bf fe ff ff ff 89 e8 e8 b8 c0 12 00 f6 83 2c 01 00 00 10 75 22 8b 5e 10 8b 4c 24 04 89 f0 8b 14 24 <ff> 53 18 f6 46 1a 04 89 c7 75 0b 8b 56 0c 8b 46 08 e8 c8 66 00
      EIP: [<c1061205>] vfs_readdir+0x47/0x74 SS:ESP 0068:c5decf78
      
      hch: "Nice, getting rid of this is a very good step forwards.
            Unfortunately we have another copy of this junk in
            security/selinux/selinuxfs.c:sel_remove_entries() which would need the
            same treatment."
      Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
      Acked-by: Christoph Hellwig <hch@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: James Morris <jmorris@namei.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 15 Nov, 2007 1 commit
  17. 17 Oct, 2007 1 commit
    • Group short-lived and reclaimable kernel allocations · e12ba74d
      Mel Gorman committed
      This patch marks a number of allocations that are either short-lived such as
      network buffers or are reclaimable such as inode allocations.  When something
      like updatedb is called, long-lived and unmovable kernel allocations tend to
      be spread throughout the address space which increases fragmentation.
      
      This patch groups these allocations together as much as possible by adding a
      new MIGRATE_TYPE.  The MIGRATE_RECLAIMABLE type is for allocations that can be
      reclaimed on demand, but not moved.  i.e.  they can be migrated by deleting
      them and re-reading the information from elsewhere.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  18. 17 Jul, 2007 2 commits
    • procfs directory entry cleanup · 99fc06df
      Changli Gao committed
      Function proc_register() will assign proc_dir_operations and
      proc_dir_inode_operations to ent's members proc_fops and proc_iops
      correctly if ent is a directory. So the early assignment isn't
      necessary.
      
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Fix rmmod/read/write races in /proc entries · 786d7e16
      Alexey Dobriyan committed
      Fix following races:
      ===========================================
      1. Write via ->write_proc sleeps in copy_from_user(). Module disappears
         meanwhile. Or, more generically, a system call is done on a /proc file, a method
         supplied by the module is called, and the module disappears meanwhile.
      
         pde = create_proc_entry()
         if (!pde)
      	return -ENOMEM;
         pde->write_proc = ...
      				open
      				write
      				copy_from_user
         pde = create_proc_entry();
         if (!pde) {
      	remove_proc_entry();
      	return -ENOMEM;
      	/* module unloaded */
         }
      				*boom*
      ==========================================
      2. bogo-revoke aka proc_kill_inodes()
      
        remove_proc_entry		vfs_read
        proc_kill_inodes		[check ->f_op validness]
      				[check ->f_op->read validness]
      				[verify_area, security permissions checks]
      	->f_op = NULL;
      				if (file->f_op->read)
      					/* ->f_op dereference, boom */
      
      NOTE, NOTE, NOTE: file_operations are proxied for regular files only. Let's
      see how this scheme behaves, then extend if needed for directories.
      Directory creators in /proc only set ->owner for them, so proxying for
      directories may be unneeded.
      
      NOTE, NOTE, NOTE: methods being proxied are ->llseek, ->read, ->write,
      ->poll, ->unlocked_ioctl, ->ioctl, ->compat_ioctl, ->open, ->release.
      If your in-tree module uses something else, yell at me. Full audit pending.
      
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  19. 09 May, 2007 2 commits
  20. 15 Feb, 2007 1 commit
    • [PATCH] sysctl: reimplement the sysctl proc support · 77b14db5
      Eric W. Biederman committed
      With this change the sysctl inodes can be cached and nothing needs to be done
      when removing a sysctl table.
      
      For a cost of 2K code we will save about 4K of static tables (when we remove
      de from ctl_table) and 70K in proc_dir_entries that we will not allocate, or
      about half that on a 32bit arch.
      
      The speed feels about the same, even though we can now cache the sysctl
      dentries :(
      
      We get the core advantage that we don't need to have a 1 to 1 mapping between
      ctl table entries and proc files.  Making it possible to have /proc/sys vary
      depending on the namespace you are in.  The currently merged namespaces don't
      have an issue here but the network namespace under /proc/sys/net needs to have
      different directories depending on which network adapters are visible.  By
      simply being a cache different directories being visible depending on who you
      are is trivial to implement.
      
      [akpm@osdl.org: fix uninitialised var]
      [akpm@osdl.org: fix ARM build]
      [bunk@stusta.de: make things static]
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  21. 13 Feb, 2007 2 commits
  22. 09 Dec, 2006 1 commit
  23. 29 Mar, 2006 1 commit
  24. 27 Mar, 2006 1 commit
    • [PATCH] protect remove_proc_entry · 64a07bd8
      Steven Rostedt committed
      It has been discovered that the remove_proc_entry has a race in the removing
      of entries in the proc file system that are siblings.  There's no protection
      around the traversing and removing of elements that belong in the same
      subdirectory.
      
      This subdirectory list is protected in other areas by the BKL.  So the BKL was
      at first used to protect this area too, but unfortunately, remove_proc_entry
      may be called with spinlocks held.  The BKL may schedule, so this was not a
      solution.
      
      The final solution was to add a new global spin lock to protect this list,
      called proc_subdir_lock.  This lock now protects the list in
      remove_proc_entry, and I also went around looking for other areas that this
      list is modified and added this protection there too.  Care must be taken
      since these locations call several functions that may also schedule.
      
      Since I don't see any location where the functions that modify the
      subdirectory list are called from interrupts, the irqsave/restore versions of
      the spin lock were _not_ used.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  25. 09 Jan, 2006 1 commit
  26. 31 Dec, 2005 1 commit
    • Insanity avoidance in /proc · 8b90db0d
      Linus Torvalds committed
      The old /proc interfaces were never updated to use loff_t, and are just
      generally broken.  Now, we should be using the seq_file interface for
      all of the proc files, but converting the legacy functions is more work
      than most people care for and has little upside.
      
      But at least we can make the non-LFS rules explicit, rather than just
      insanely wrapping the offset or something.
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  27. 31 Oct, 2005 1 commit