1. 05 8月, 2014 1 次提交
  2. 30 7月, 2014 1 次提交
    • E
      namespaces: Use task_lock and not rcu to protect nsproxy · 728dba3a
      Eric W. Biederman 提交于
      The synchronous syncrhonize_rcu in switch_task_namespaces makes setns
      a sufficiently expensive system call that people have complained.
      
      Upon inspect nsproxy no longer needs rcu protection for remote reads.
      remote reads are rare.  So optimize for same process reads and write
      by switching using rask_lock instead.
      
      This yields a simpler to understand lock, and a faster setns system call.
      
      In particular this fixes a performance regression observed
      by Rafael David Tinoco <rafael.tinoco@canonical.com>.
      
      This is effectively a revert of Pavel Emelyanov's commit
      cf7b708c Make access to task's nsproxy lighter
      from 2007.  The race this originialy fixed no longer exists as
      do_notify_parent uses task_active_pid_ns(parent) instead of
      parent->nsproxy.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      728dba3a
  3. 29 6月, 2013 1 次提交
  4. 02 5月, 2013 1 次提交
  5. 23 2月, 2013 1 次提交
  6. 19 2月, 2013 2 次提交
  7. 14 7月, 2012 1 次提交
    • A
      stop passing nameidata to ->lookup() · 00cd8dd3
      Al Viro 提交于
      Just the flags; only NFS cares even about that, but there are
      legitimate uses for such argument.  And getting rid of that
      completely would require splitting ->lookup() into a couple
      of methods (at least), so let's leave that alone for now...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      00cd8dd3
  8. 04 1月, 2012 1 次提交
  9. 28 7月, 2011 1 次提交
    • D
      proc: make struct proc_dir_entry::name a terminal array rather than a pointer · 09570f91
      David Howells 提交于
      Since __proc_create() appends the name it is given to the end of the PDE
      structure that it allocates, there isn't a need to store a name pointer.
      Instead we can just replace the name pointer with a terminal char array of
      _unspecified_ length.  The compiler will simply append the string to statically
      defined variables of PDE type overlapping any hole at the end of the structure
      and, unlike specifying an explicitly _zero_ length array, won't give a warning
      if you try to statically initialise it with a string of more than zero length.
      
      Also, whilst we're at it:
      
       (1) Move namelen to end just prior to name and reduce it to a single byte
           (name shouldn't be longer than NAME_MAX).
      
       (2) Move pde_unload_lock two places further on so that if it's four bytes in
           size on a 64-bit machine, it won't cause an unused hole in the PDE struct.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09570f91
  10. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  11. 05 1月, 2009 1 次提交
    • A
      proc: stop using BKL · b4df2b92
      Alexey Dobriyan 提交于
      There are four BKL users in proc: de_put(), proc_lookup_de(),
      proc_readdir_de(), proc_root_readdir(),
      
      1) de_put()
      -----------
      de_put() is classic atomic_dec_and_test() refcount wrapper -- no BKL
      needed. BKL doesn't matter to possible refcount leak as well.
      
      2) proc_lookup_de()
      -------------------
      Walking PDE list is protected by proc_subdir_lock(), proc_get_inode() is
      potentially blocking, all callers of proc_lookup_de() eventually end up
      from ->lookup hooks which is protected by directory's ->i_mutex -- BKL
      doesn't protect anything.
      
      3) proc_readdir_de()
      --------------------
      "." and ".." part doesn't need BKL, walking PDE list is under
      proc_subdir_lock, calling filldir callback is potentially blocking
      because it writes to luserspace. All proc_readdir_de() callers
      eventually come from ->readdir hook which is under directory's
      ->i_mutex -- BKL doesn't protect anything.
      
      4) proc_root_readdir_de()
      -------------------------
      proc_root_readdir_de is ->readdir hook, see (3).
      
      Since readdir hooks doesn't use BKL anymore, switch to
      generic_file_llseek, since it also takes directory's i_mutex.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      b4df2b92
  12. 23 7月, 2008 1 次提交
  13. 18 7月, 2008 2 次提交
  14. 02 5月, 2008 1 次提交
  15. 02 4月, 2008 1 次提交
  16. 28 3月, 2008 1 次提交
  17. 26 3月, 2008 1 次提交
  18. 08 3月, 2008 1 次提交
    • P
      [NET]: Make /proc/net a symlink on /proc/self/net (v3) · e9720acd
      Pavel Emelyanov 提交于
      Current /proc/net is done with so called "shadows", but current
      implementation is broken and has little chances to get fixed.
      
      The problem is that dentries subtree of /proc/net directory has
      fancy revalidation rules to make processes living in different
      net namespaces see different entries in /proc/net subtree, but
      currently, tasks see in the /proc/net subdir the contents of any
      other namespace, depending on who opened the file first.
      
      The proposed fix is to turn /proc/net into a symlink, which points
      to /proc/self/net, which in turn shows what previously was in
      /proc/net - the network-related info, from the net namespace the
      appropriate task lives in.
      
      # ls -l /proc/net
      lrwxrwxrwx  1 root root 8 Mar  5 15:17 /proc/net -> self/net
      
      In other words - this behaves like /proc/mounts, but unlike
      "mounts", "net" is not a file, but a directory.
      
      Changes from v2:
      * Fixed discrepancy of /proc/net nlink count and selinux labeling
        screwup pointed out by Stephen.
      
        To get the correct nlink count the ->getattr callback for /proc/net
        is overridden to read one from the net->proc_net entry.
      
        To make selinux still work the net->proc_net entry is initialized
        properly, i.e. with the "net" name and the proc_net parent.
      
      Selinux fixes are
      Acked-by: NStephen Smalley <sds@tycho.nsa.gov>
      
      Changes from v1:
      * Fixed a task_struct leak in get_proc_task_net, pointed out by Paul.
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e9720acd
  19. 09 2月, 2008 1 次提交
    • A
      proc: fix ->open'less usage due to ->proc_fops flip · 2d3a4e36
      Alexey Dobriyan 提交于
      Typical PDE creation code looks like:
      
      	pde = create_proc_entry("foo", 0, NULL);
      	if (pde)
      		pde->proc_fops = &foo_proc_fops;
      
      Notice that PDE is first created, only then ->proc_fops is set up to
      final value. This is a problem because right after creation
      a) PDE is fully visible in /proc , and
      b) ->proc_fops are proc_file_operations which do not have ->open callback. So, it's
         possible to ->read without ->open (see one class of oopses below).
      
      The fix is new API called proc_create() which makes sure ->proc_fops are
      set up before gluing PDE to main tree. Typical new code looks like:
      
      	pde = proc_create("foo", 0, NULL, &foo_proc_fops);
      	if (!pde)
      		return -ENOMEM;
      
      Fix most networking users for a start.
      
      In the long run, create_proc_entry() for regular files will go.
      
      BUG: unable to handle kernel NULL pointer dereference at virtual address 00000024
      printing eip: c1188c1b *pdpt = 000000002929e001 *pde = 0000000000000000
      Oops: 0002 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      last sysfs file: /sys/block/sda/sda1/dev
      Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom
      
      Pid: 24679, comm: cat Not tainted (2.6.24-rc3-mm1 #2)
      EIP: 0060:[<c1188c1b>] EFLAGS: 00210002 CPU: 0
      EIP is at mutex_lock_nested+0x75/0x25d
      EAX: 000006fe EBX: fffffffb ECX: 00001000 EDX: e9340570
      ESI: 00000020 EDI: 00200246 EBP: e9340570 ESP: e8ea1ef8
       DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      Process cat (pid: 24679, ti=E8EA1000 task=E9340570 task.ti=E8EA1000)
      Stack: 00000000 c106f7ce e8ee05b4 00000000 00000001 458003d0 f6fb6f20 fffffffb
             00000000 c106f7aa 00001000 c106f7ce 08ae9000 f6db53f0 00000020 00200246
             00000000 00000002 00000000 00200246 00200246 e8ee05a0 fffffffb e8ee0550
      Call Trace:
       [<c106f7ce>] seq_read+0x24/0x28a
       [<c106f7aa>] seq_read+0x0/0x28a
       [<c106f7ce>] seq_read+0x24/0x28a
       [<c106f7aa>] seq_read+0x0/0x28a
       [<c10818b8>] proc_reg_read+0x60/0x73
       [<c1081858>] proc_reg_read+0x0/0x73
       [<c105a34f>] vfs_read+0x6c/0x8b
       [<c105a6f3>] sys_read+0x3c/0x63
       [<c10025f2>] sysenter_past_esp+0x5f/0xa5
       [<c10697a7>] destroy_inode+0x24/0x33
       =======================
      INFO: lockdep is turned off.
      Code: 75 21 68 e1 1a 19 c1 68 87 00 00 00 68 b8 e8 1f c1 68 25 73 1f c1 e8 84 06 e9 ff e8 52 b8 e7 ff 83 c4 10 9c 5f fa e8 28 89 ea ff <f0> fe 4e 04 79 0a f3 90 80 7e 04 00 7e f8 eb f0 39 76 34 74 33
      EIP: [<c1188c1b>] mutex_lock_nested+0x75/0x25d SS:ESP 0068:e8ea1ef8
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NAlexey Dobriyan <adobriyan@sw.ru>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2d3a4e36
  20. 29 1月, 2008 2 次提交
  21. 01 12月, 2007 1 次提交
    • E
      [NETNS]: Fix /proc/net breakage · 2b1e300a
      Eric W. Biederman 提交于
      Well I clearly goofed when I added the initial network namespace support
      for /proc/net.  Currently things work but there are odd details visible to
      user space, even when we have a single network namespace.
      
      Since we do not cache proc_dir_entry dentries at the moment we can just
      modify ->lookup to return a different directory inode depending on the
      network namespace of the process looking at /proc/net, replacing the
      current technique of using a magic and fragile follow_link method.
      
      To accomplish that this patch:
      - introduces a shadow_proc method to allow different dentries to
        be returned from proc_lookup.
      - Removes the old /proc/net follow_link magic
      - Fixes a weakness in our not caching of proc generic dentries.
      
      As shadow_proc uses a task struct to decided which dentry to return we can
      go back later and fix the proc generic caching without modifying any code
      that uses the shadow_proc method.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      2b1e300a
  22. 13 11月, 2007 1 次提交
  23. 07 11月, 2007 1 次提交
  24. 27 10月, 2007 1 次提交
    • E
      [NET]: Marking struct pernet_operations __net_initdata was inappropriate · 2b008b0a
      Eric W. Biederman 提交于
      It is not safe to to place struct pernet_operations in a special section.
      We need struct pernet_operations to last until we call unregister_pernet_subsys.
      Which doesn't happen until module unload.
      
      So marking struct pernet_operations is a disaster for modules in two ways.
      - We discard it before we call the exit method it points to.
      - Because I keep struct pernet_operations on a linked list discarding
        it for compiled in code removes elements in the middle of a linked
        list and does horrible things for linked insert.
      
      So this looks safe assuming __exit_refok is not discarded
      for modules.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2b008b0a
  25. 26 10月, 2007 1 次提交
  26. 11 10月, 2007 4 次提交
    • P
      [NETNS]: Move some code into __init section when CONFIG_NET_NS=n · 4665079c
      Pavel Emelyanov 提交于
      With the net namespaces many code leaved the __init section,
      thus making the kernel occupy more memory than it did before.
      Since we have a config option that prohibits the namespace
      creation, the functions that initialize/finalize some netns
      stuff are simply not needed and can be freed after the boot.
      
      Currently, this is almost not noticeable, since few calls
      are no longer in __init, but when the namespaces will be
      merged it will be possible to free more code. I propose to
      use the __net_init, __net_exit and __net_initdata "attributes"
      for functions/variables that are not used if the CONFIG_NET_NS
      is not set to save more space in memory.
      
      The exiting functions cannot just reside in the __exit section,
      as noticed by David, since the init section will have
      references on it and the compilation will fail due to modpost
      checks. These references can exist, since the init namespace
      never dies and the exit callbacks are never called. So I
      introduce the __exit_refok attribute just like it is already
      done with the __init_refok.
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4665079c
    • E
      [NET]: Fix race when opening a proc file while a network namespace is exiting. · 077130c0
      Eric W. Biederman 提交于
      The problem:  proc_net files remember which network namespace the are
      against but do not remember hold a reference count (as that would pin
      the network namespace).   So we currently have a small window where
      the reference count on a network namespace may be incremented when opening
      a /proc file when it has already gone to zero.
      
      To fix this introduce maybe_get_net and get_proc_net.
      
      maybe_get_net increments the network namespace reference count only if it is
      greater then zero, ensuring we don't increment a reference count after it
      has gone to zero.
      
      get_proc_net handles all of the magic to go from a proc inode to the network
      namespace instance and call maybe_get_net on it.
      
      PROC_NET the old accessor is removed so that we don't get confused and use
      the wrong helper function.
      
      Then I fix up the callers to use get_proc_net and handle the case case
      where get_proc_net returns NULL.  In that case I return -ENXIO because
      effectively the network namespace has already gone away so the files
      we are trying to access don't exist anymore.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Acked-by: NPaul E. McKenney <paulmck@us.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      077130c0
    • D
      [NETNS]: Fix export symbols. · 36ac3135
      Daniel Lezcano 提交于
      Add the appropriate EXPORT_SYMBOLS for proc_net_create,
      proc_net_fops_create and proc_net_remove to fix errors when
      compiling allmodconfig
      Signed-off-by: NMark Nelson <markn@au1.ibm.com>
      Acked-by: NBenjamin Thery <benjamin.thery@bull.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      36ac3135
    • D
      [NET]: Fix missed addition of fs/proc/proc_net.c · 3c12afe7
      David S. Miller 提交于
      My bad.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3c12afe7