1. 02 12月, 2009 2 次提交
    • E
      net: Automatically allocate per namespace data. · f875bae0
      Eric W. Biederman 提交于
      To get the full benefit of batched network namespace cleanup netowrk
      device deletion needs to be performed by the generic code.  When
      using register_pernet_gen_device and freeing the data in exit_net
      it is impossible to delay allocation until after exit_net has called
      as the device uninit methods are no longer safe.
      
      To correct this, and to simplify working with per network namespace data
      I have moved allocation and deletion of per network namespace data into
      the network namespace core.  The core now frees the data only after
      all of the network namespace exit routines have run.
      
      Now it is only required to set the new fields .id and .size
      in the pernet_operations structure if you want network namespace
      data to be managed for you automatically.
      
      This makes the current register_pernet_gen_device and
      register_pernet_gen_subsys routines unnecessary.  For the moment
      I have left them as compatibility wrappers in net_namespace.h
      They will be removed once all of the users have been updated.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f875bae0
    • E
      net: Batch network namespace destruction. · 2b035b39
      Eric W. Biederman 提交于
      It is fairly common to kill several network namespaces at once.  Either
      because they are nested one inside the other or because they are cooperating
      in multiple machine networking experiments.  As the network stack control logic
      does not parallelize easily batch up multiple network namespaces existing
      together.
      
      To get the full benefit of batching the virtual network devices to be
      removed must be all removed in one batch.  For that purpose I have added
      a loop after the last network device operations have run that batches
      up all remaining network devices and deletes them.
      
      An extra benefit is that the reorganization slightly shrinks the size
      of the per network namespace data structures replaceing a work_struct
      with a list_head.
      
      In a trivial test with 4K namespaces this change reduced the cost of
      a destroying 4K namespaces from 7+ minutes (at 12% cpu) to 44 seconds
      (at 60% cpu).  The bulk of that 44s was spent in inet_twsk_purge.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2b035b39
  2. 24 10月, 2009 1 次提交
  3. 08 10月, 2009 1 次提交
    • J
      wext: refactor · 3d23e349
      Johannes Berg 提交于
      Refactor wext to
       * split out iwpriv handling
       * split out iwspy handling
       * split out procfs support
       * allow cfg80211 to have wireless extensions compat code
         w/o CONFIG_WIRELESS_EXT
      
      After this, drivers need to
       - select WIRELESS_EXT	- for wext support
       - select WEXT_PRIV	- for iwpriv support
       - select WEXT_SPY	- for iwspy support
      
      except cfg80211 -- which gets new hooks in wext-core.c
      and can then get wext handlers without CONFIG_WIRELESS_EXT.
      
      Wireless extensions procfs support is auto-selected
      based on PROC_FS and anything that requires the wext core
      (i.e. WIRELESS_EXT or CFG80211_WEXT).
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      3d23e349
  4. 20 7月, 2009 1 次提交
  5. 15 7月, 2009 1 次提交
  6. 13 7月, 2009 3 次提交
    • J
      net: move and export get_net_ns_by_pid · 30ffee84
      Johannes Berg 提交于
      The function get_net_ns_by_pid(), to get a network
      namespace from a pid_t, will be required in cfg80211
      as well. Therefore, let's move it to net_namespace.c
      and export it. We can't make it a static inline in
      the !NETNS case because it needs to verify that the
      given pid even exists (and return -ESRCH).
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      30ffee84
    • J
      genetlink: make netns aware · 134e6375
      Johannes Berg 提交于
      This makes generic netlink network namespace aware. No
      generic netlink families except for the controller family
      are made namespace aware, they need to be checked one by
      one and then set the family->netnsok member to true.
      
      A new function genlmsg_multicast_netns() is introduced to
      allow sending a multicast message in a given namespace,
      for example when it applies to an object that lives in
      that namespace, a new function genlmsg_multicast_allns()
      to send a message to all network namespaces (for objects
      that do not have an associated netns).
      
      The function genlmsg_multicast() is changed to multicast
      the message in just init_net, which is currently correct
      for all generic netlink families since they only work in
      init_net right now. Some will later want to work in all
      net namespaces because they do not care about the netns
      at all -- those will have to be converted to use one of
      the new functions genlmsg_multicast_allns() or
      genlmsg_multicast_netns() whenever they are made netns
      aware in some way.
      
      After this patch families can easily decide whether or
      not they should be available in all net namespaces. Many
      genl families us it for objects not related to networking
      and should therefore be available in all namespaces, but
      that will have to be done on a per family basis.
      
      Note that this doesn't touch on the checkpoint/restart
      problem where network namespaces could be used, genl
      families and multicast groups are numbered globally and
      I see no easy way of changing that, especially since it
      must be possible to multicast to all network namespaces
      for those families that do not care about netns.
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      134e6375
    • J
      net: make namespace iteration possible under RCU · 11a28d37
      Johannes Berg 提交于
      All we need to take care of is using proper RCU list
      add/del primitives and inserting a synchronize_rcu()
      at one place to make sure the exit notifiers are run
      after everybody has stopped iterating the list.
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      11a28d37
  7. 03 3月, 2009 1 次提交
    • E
      netns: Remove net_alive · 17edde52
      Eric W. Biederman 提交于
      It turns out that net_alive is unnecessary, and the original problem
      that led to it being added was simply that the icmp code thought
      it was a network device and wound up being unable to handle packets
      while there were still packets in the network namespace.
      
      Now that icmp and tcp have been fixed to properly register themselves
      this problem is no longer present and we have a stronger guarantee
      that packets will not arrive in a network namespace then that provided
      by net_alive in netif_receive_skb.  So remove net_alive allowing
      packet reception run a little faster.
      
      Additionally document the strong reason why network namespace cleanup
      is safe so that if something happens again someone else will have
      a chance of figuring it out.
      Signed-off-by: NEric W. Biederman <ebiederm@aristanetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      17edde52
  8. 23 2月, 2009 1 次提交
    • E
      netns: Remove net_alive · ce16c533
      Eric W. Biederman 提交于
      It turns out that net_alive is unnecessary, and the original problem
      that led to it being added was simply that the icmp code thought
      it was a network device and wound up being unable to handle packets
      while there were still packets in the network namespace.
      
      Now that icmp and tcp have been fixed to properly register themselves
      this problem is no longer present and we have a stronger guarantee
      that packets will not arrive in a network namespace then that provided
      by net_alive in netif_receive_skb.  So remove net_alive allowing
      packet reception run a little faster.
      
      Additionally document the strong reason why network namespace cleanup
      is safe so that if something happens again someone else will have
      a chance of figuring it out.
      Signed-off-by: NEric W. Biederman <ebiederm@aristanetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ce16c533
  9. 26 11月, 2008 1 次提交
  10. 12 11月, 2008 1 次提交
  11. 31 10月, 2008 1 次提交
  12. 08 10月, 2008 1 次提交
  13. 27 7月, 2008 1 次提交
    • A
      [PATCH] beginning of sysctl cleanup - ctl_table_set · 73455092
      Al Viro 提交于
      New object: set of sysctls [currently - root and per-net-ns].
      Contains: pointer to parent set, list of tables and "should I see this set?"
      method (->is_seen(set)).
      Current lists of tables are subsumed by that; net-ns contains such a beast.
      ->lookup() for ctl_table_root returns pointer to ctl_table_set instead of
      that to ->list of that ctl_table_set.
      
      [folded compile fixes by rdd for configs without sysctl]
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      73455092
  14. 18 7月, 2008 1 次提交
  15. 21 6月, 2008 1 次提交
    • E
      netns: Don't receive new packets in a dead network namespace. · b9f75f45
      Eric W. Biederman 提交于
      Alexey Dobriyan <adobriyan@gmail.com> writes:
      > Subject: ICMP sockets destruction vs ICMP packets oops
      
      > After icmp_sk_exit() nuked ICMP sockets, we get an interrupt.
      > icmp_reply() wants ICMP socket.
      >
      > Steps to reproduce:
      >
      > 	launch shell in new netns
      > 	move real NIC to netns
      > 	setup routing
      > 	ping -i 0
      > 	exit from shell
      >
      > BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      > IP: [<ffffffff803fce17>] icmp_sk+0x17/0x30
      > PGD 17f3cd067 PUD 17f3ce067 PMD 0 
      > Oops: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC
      > CPU 0 
      > Modules linked in: usblp usbcore
      > Pid: 0, comm: swapper Not tainted 2.6.26-rc6-netns-ct #4
      > RIP: 0010:[<ffffffff803fce17>]  [<ffffffff803fce17>] icmp_sk+0x17/0x30
      > RSP: 0018:ffffffff8057fc30  EFLAGS: 00010286
      > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff81017c7db900
      > RDX: 0000000000000034 RSI: ffff81017c7db900 RDI: ffff81017dc41800
      > RBP: ffffffff8057fc40 R08: 0000000000000001 R09: 000000000000a815
      > R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff8057fd28
      > R13: ffffffff8057fd00 R14: ffff81017c7db938 R15: ffff81017dc41800
      > FS:  0000000000000000(0000) GS:ffffffff80525000(0000) knlGS:0000000000000000
      > CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      > CR2: 0000000000000000 CR3: 000000017fcda000 CR4: 00000000000006e0
      > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      > Process swapper (pid: 0, threadinfo ffffffff8053a000, task ffffffff804fa4a0)
      > Stack:  0000000000000000 ffff81017c7db900 ffffffff8057fcf0 ffffffff803fcfe4
      >  ffffffff804faa38 0000000000000246 0000000000005a40 0000000000000246
      >  000000000001ffff ffff81017dd68dc0 0000000000005a40 0000000055342436
      > Call Trace:
      >  <IRQ>  [<ffffffff803fcfe4>] icmp_reply+0x44/0x1e0
      >  [<ffffffff803d3a0a>] ? ip_route_input+0x23a/0x1360
      >  [<ffffffff803fd645>] icmp_echo+0x65/0x70
      >  [<ffffffff803fd300>] icmp_rcv+0x180/0x1b0
      >  [<ffffffff803d6d84>] ip_local_deliver+0xf4/0x1f0
      >  [<ffffffff803d71bb>] ip_rcv+0x33b/0x650
      >  [<ffffffff803bb16a>] netif_receive_skb+0x27a/0x340
      >  [<ffffffff803be57d>] process_backlog+0x9d/0x100
      >  [<ffffffff803bdd4d>] net_rx_action+0x18d/0x250
      >  [<ffffffff80237be5>] __do_softirq+0x75/0x100
      >  [<ffffffff8020c97c>] call_softirq+0x1c/0x30
      >  [<ffffffff8020f085>] do_softirq+0x65/0xa0
      >  [<ffffffff80237af7>] irq_exit+0x97/0xa0
      >  [<ffffffff8020f198>] do_IRQ+0xa8/0x130
      >  [<ffffffff80212ee0>] ? mwait_idle+0x0/0x60
      >  [<ffffffff8020bc46>] ret_from_intr+0x0/0xf
      >  <EOI>  [<ffffffff80212f2c>] ? mwait_idle+0x4c/0x60
      >  [<ffffffff80212f23>] ? mwait_idle+0x43/0x60
      >  [<ffffffff8020a217>] ? cpu_idle+0x57/0xa0
      >  [<ffffffff8040f380>] ? rest_init+0x70/0x80
      > Code: 10 5b 41 5c 41 5d 41 5e c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53
      > 48 83 ec 08 48 8b 9f 78 01 00 00 e8 2b c7 f1 ff 89 c0 <48> 8b 04 c3 48 83 c4 08
      > 5b c9 c3 66 66 66 66 66 2e 0f 1f 84 00
      > RIP  [<ffffffff803fce17>] icmp_sk+0x17/0x30
      >  RSP <ffffffff8057fc30>
      > CR2: 0000000000000000
      > ---[ end trace ea161157b76b33e8 ]---
      > Kernel panic - not syncing: Aiee, killing interrupt handler!
      
      Receiving packets while we are cleaning up a network namespace is a
      racy proposition. It is possible when the packet arrives that we have
      removed some but not all of the state we need to fully process it.  We
      have the choice of either playing wack-a-mole with the cleanup routines
      or simply dropping packets when we don't have a network namespace to
      handle them.
      
      Since the check looks inexpensive in netif_receive_skb let's just
      drop the incoming packets.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b9f75f45
  16. 20 5月, 2008 1 次提交
  17. 16 4月, 2008 1 次提交
  18. 15 4月, 2008 2 次提交
    • P
      [NETNS]: The generic per-net pointers. · dec827d1
      Pavel Emelyanov 提交于
      Add the elastic array of void * pointer to the struct net.
      The access rules are simple:
      
       1. register the ops with register_pernet_gen_device to get
          the id of your private pointer
       2. call net_assign_generic() to put the private data on the
          struct net (most preferably this should be done in the
          ->init callback of the ops registered)
       3. do not store any private reference on the net_generic array;
       4. do not change this pointer while the net is alive;
       5. use the net_generic() to get the pointer.
      
      When adding a new pointer, I copy the old array, replace it
      with a new one and schedule the old for kfree after an RCU
      grace period.
      
      Since the net_generic explores the net->gen array inside rcu
      read section and once set the net->gen->ptr[x] pointer never 
      changes, this grants us a safe access to generic pointers.
      
      Quoting Paul: "... RCU is protecting -only- the net_generic 
      structure that net_generic() is traversing, and the [pointer]
      returned by net_generic() is protected by a reference counter 
      in the upper-level struct net."
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dec827d1
    • P
      [NETNS]: The net-subsys IDs generator. · c93cf61f
      Pavel Emelyanov 提交于
      To make some per-net generic pointers, we need some way to address
      them, i.e. - IDs. This is simple IDA-based IDs generator for pernet
      subsystems.
      
      Addressing questions about potential checkpoint/restart problems: 
      these IDs are "lite-offsets" within the net structure and are by no 
      means supposed to be exported to the userspace.
      
      Since it will be used in the nearest future by devices only (tun,
      vlan, tunnels, bridge, etc), I make it resemble the functionality
      of register_pernet_device().
      
      The new ids is stored in the *id pointer _before_ calling the init
      callback to make this id available in this callback.
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c93cf61f
  19. 14 4月, 2008 1 次提交
  20. 04 4月, 2008 1 次提交
  21. 02 4月, 2008 2 次提交
  22. 01 4月, 2008 1 次提交
  23. 26 3月, 2008 1 次提交
  24. 08 3月, 2008 1 次提交
    • P
      [NET]: Make /proc/net a symlink on /proc/self/net (v3) · e9720acd
      Pavel Emelyanov 提交于
      Current /proc/net is done with so called "shadows", but current
      implementation is broken and has little chances to get fixed.
      
      The problem is that dentries subtree of /proc/net directory has
      fancy revalidation rules to make processes living in different
      net namespaces see different entries in /proc/net subtree, but
      currently, tasks see in the /proc/net subdir the contents of any
      other namespace, depending on who opened the file first.
      
      The proposed fix is to turn /proc/net into a symlink, which points
      to /proc/self/net, which in turn shows what previously was in
      /proc/net - the network-related info, from the net namespace the
      appropriate task lives in.
      
      # ls -l /proc/net
      lrwxrwxrwx  1 root root 8 Mar  5 15:17 /proc/net -> self/net
      
      In other words - this behaves like /proc/mounts, but unlike
      "mounts", "net" is not a file, but a directory.
      
      Changes from v2:
      * Fixed discrepancy of /proc/net nlink count and selinux labeling
        screwup pointed out by Stephen.
      
        To get the correct nlink count the ->getattr callback for /proc/net
        is overridden to read one from the net->proc_net entry.
      
        To make selinux still work the net->proc_net entry is initialized
        properly, i.e. with the "net" name and the proc_net parent.
      
      Selinux fixes are
      Acked-by: NStephen Smalley <sds@tycho.nsa.gov>
      
      Changes from v1:
      * Fixed a task_struct leak in get_proc_task_net, pointed out by Paul.
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e9720acd
  25. 01 2月, 2008 1 次提交
  26. 29 1月, 2008 10 次提交