1. 21 6月, 2008 1 次提交
    • E
      netns: Don't receive new packets in a dead network namespace. · b9f75f45
      Eric W. Biederman 提交于
      Alexey Dobriyan <adobriyan@gmail.com> writes:
      > Subject: ICMP sockets destruction vs ICMP packets oops
      
      > After icmp_sk_exit() nuked ICMP sockets, we get an interrupt.
      > icmp_reply() wants ICMP socket.
      >
      > Steps to reproduce:
      >
      > 	launch shell in new netns
      > 	move real NIC to netns
      > 	setup routing
      > 	ping -i 0
      > 	exit from shell
      >
      > BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
      > IP: [<ffffffff803fce17>] icmp_sk+0x17/0x30
      > PGD 17f3cd067 PUD 17f3ce067 PMD 0 
      > Oops: 0000 [1] PREEMPT SMP DEBUG_PAGEALLOC
      > CPU 0 
      > Modules linked in: usblp usbcore
      > Pid: 0, comm: swapper Not tainted 2.6.26-rc6-netns-ct #4
      > RIP: 0010:[<ffffffff803fce17>]  [<ffffffff803fce17>] icmp_sk+0x17/0x30
      > RSP: 0018:ffffffff8057fc30  EFLAGS: 00010286
      > RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff81017c7db900
      > RDX: 0000000000000034 RSI: ffff81017c7db900 RDI: ffff81017dc41800
      > RBP: ffffffff8057fc40 R08: 0000000000000001 R09: 000000000000a815
      > R10: 0000000000000000 R11: 0000000000000001 R12: ffffffff8057fd28
      > R13: ffffffff8057fd00 R14: ffff81017c7db938 R15: ffff81017dc41800
      > FS:  0000000000000000(0000) GS:ffffffff80525000(0000) knlGS:0000000000000000
      > CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      > CR2: 0000000000000000 CR3: 000000017fcda000 CR4: 00000000000006e0
      > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      > Process swapper (pid: 0, threadinfo ffffffff8053a000, task ffffffff804fa4a0)
      > Stack:  0000000000000000 ffff81017c7db900 ffffffff8057fcf0 ffffffff803fcfe4
      >  ffffffff804faa38 0000000000000246 0000000000005a40 0000000000000246
      >  000000000001ffff ffff81017dd68dc0 0000000000005a40 0000000055342436
      > Call Trace:
      >  <IRQ>  [<ffffffff803fcfe4>] icmp_reply+0x44/0x1e0
      >  [<ffffffff803d3a0a>] ? ip_route_input+0x23a/0x1360
      >  [<ffffffff803fd645>] icmp_echo+0x65/0x70
      >  [<ffffffff803fd300>] icmp_rcv+0x180/0x1b0
      >  [<ffffffff803d6d84>] ip_local_deliver+0xf4/0x1f0
      >  [<ffffffff803d71bb>] ip_rcv+0x33b/0x650
      >  [<ffffffff803bb16a>] netif_receive_skb+0x27a/0x340
      >  [<ffffffff803be57d>] process_backlog+0x9d/0x100
      >  [<ffffffff803bdd4d>] net_rx_action+0x18d/0x250
      >  [<ffffffff80237be5>] __do_softirq+0x75/0x100
      >  [<ffffffff8020c97c>] call_softirq+0x1c/0x30
      >  [<ffffffff8020f085>] do_softirq+0x65/0xa0
      >  [<ffffffff80237af7>] irq_exit+0x97/0xa0
      >  [<ffffffff8020f198>] do_IRQ+0xa8/0x130
      >  [<ffffffff80212ee0>] ? mwait_idle+0x0/0x60
      >  [<ffffffff8020bc46>] ret_from_intr+0x0/0xf
      >  <EOI>  [<ffffffff80212f2c>] ? mwait_idle+0x4c/0x60
      >  [<ffffffff80212f23>] ? mwait_idle+0x43/0x60
      >  [<ffffffff8020a217>] ? cpu_idle+0x57/0xa0
      >  [<ffffffff8040f380>] ? rest_init+0x70/0x80
      > Code: 10 5b 41 5c 41 5d 41 5e c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 53
      > 48 83 ec 08 48 8b 9f 78 01 00 00 e8 2b c7 f1 ff 89 c0 <48> 8b 04 c3 48 83 c4 08
      > 5b c9 c3 66 66 66 66 66 2e 0f 1f 84 00
      > RIP  [<ffffffff803fce17>] icmp_sk+0x17/0x30
      >  RSP <ffffffff8057fc30>
      > CR2: 0000000000000000
      > ---[ end trace ea161157b76b33e8 ]---
      > Kernel panic - not syncing: Aiee, killing interrupt handler!
      
      Receiving packets while we are cleaning up a network namespace is a
      racy proposition. It is possible when the packet arrives that we have
      removed some but not all of the state we need to fully process it.  We
      have the choice of either playing wack-a-mole with the cleanup routines
      or simply dropping packets when we don't have a network namespace to
      handle them.
      
      Since the check looks inexpensive in netif_receive_skb let's just
      drop the incoming packets.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b9f75f45
  2. 16 4月, 2008 1 次提交
  3. 15 4月, 2008 2 次提交
    • P
      [NETNS]: The generic per-net pointers. · dec827d1
      Pavel Emelyanov 提交于
      Add the elastic array of void * pointer to the struct net.
      The access rules are simple:
      
       1. register the ops with register_pernet_gen_device to get
          the id of your private pointer
       2. call net_assign_generic() to put the private data on the
          struct net (most preferably this should be done in the
          ->init callback of the ops registered)
       3. do not store any private reference on the net_generic array;
       4. do not change this pointer while the net is alive;
       5. use the net_generic() to get the pointer.
      
      When adding a new pointer, I copy the old array, replace it
      with a new one and schedule the old for kfree after an RCU
      grace period.
      
      Since the net_generic explores the net->gen array inside rcu
      read section and once set the net->gen->ptr[x] pointer never 
      changes, this grants us a safe access to generic pointers.
      
      Quoting Paul: "... RCU is protecting -only- the net_generic 
      structure that net_generic() is traversing, and the [pointer]
      returned by net_generic() is protected by a reference counter 
      in the upper-level struct net."
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dec827d1
    • P
      [NETNS]: The net-subsys IDs generator. · c93cf61f
      Pavel Emelyanov 提交于
      To make some per-net generic pointers, we need some way to address
      them, i.e. - IDs. This is simple IDA-based IDs generator for pernet
      subsystems.
      
      Addressing questions about potential checkpoint/restart problems: 
      these IDs are "lite-offsets" within the net structure and are by no 
      means supposed to be exported to the userspace.
      
      Since it will be used in the nearest future by devices only (tun,
      vlan, tunnels, bridge, etc), I make it resemble the functionality
      of register_pernet_device().
      
      The new ids is stored in the *id pointer _before_ calling the init
      callback to make this id available in this callback.
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c93cf61f
  4. 03 2月, 2008 1 次提交
  5. 29 1月, 2008 1 次提交
  6. 23 1月, 2008 1 次提交
  7. 13 11月, 2007 1 次提交
  8. 07 11月, 2007 1 次提交
    • J
      [NETNS]: Fix compiler error in net_namespace.c · 45a19b0a
      Johann Felix Soden 提交于
      Because net_free is called by copy_net_ns before its declaration, the
      compiler gives an error. This patch puts net_free before copy_net_ns
      to fix this.
      
      The compiler error:
      net/core/net_namespace.c: In function 'copy_net_ns':
      net/core/net_namespace.c:97: error: implicit declaration of function 'net_free'
      net/core/net_namespace.c: At top level:
      net/core/net_namespace.c:104: warning: conflicting types for 'net_free'
      net/core/net_namespace.c:104: error: static declaration of 'net_free' follows non-static declaration
      net/core/net_namespace.c:97: error: previous implicit declaration of 'net_free' was here
      
      The error was introduced by the '[NET]: Hide the dead code in the
      net_namespace.c' patch (6a1a3b9f).
      Signed-off-by: NJohann Felix Soden <johfel@users.sourceforge.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      45a19b0a
  9. 01 11月, 2007 4 次提交
  10. 31 10月, 2007 1 次提交
    • D
      [NETNS]: fix net released by rcu callback · 310928d9
      Daniel Lezcano 提交于
      When a network namespace reference is held by a network subsystem,
      and when this reference is decremented in a rcu update callback, we
      must ensure that there is no more outstanding rcu update before
      trying to free the network namespace.
      
      In the normal case, the rcu_barrier is called when the network namespace
      is exiting in the cleanup_net function.
      
      But when a network namespace creation fails, and the subsystems are
      undone (like the cleanup), the rcu_barrier is missing.
      
      This patch adds the missing rcu_barrier.
      Signed-off-by: NDaniel Lezcano <dlezcano@fr.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      310928d9
  11. 11 10月, 2007 6 次提交
    • P
      [NETNS]: Don't memset() netns to zero manually · 32f0c4cb
      Pavel Emelyanov 提交于
      The newly created net namespace is set to 0 with memset()
      in setup_net(). The setup_net() is also called for the
      init_net_ns(), which is zeroed naturally as a global var.
      
      So remove this memset and allocate new nets with the
      kmem_cache_zalloc().
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      32f0c4cb
    • E
      [NETNS]: Simplify the network namespace list locking rules. · f4618d39
      Eric W. Biederman 提交于
      Denis V. Lunev <den@sw.ru> noticed that the locking rules
      for the network namespace list are over complicated and broken.
      
      In particular the current register_netdev_notifier currently
      does not take any lock making the for_each_net iteration racy
      with network namespace creation and destruction. Oops.
      
      The fact that we need to use for_each_net in rtnl_unlock() when
      the rtnetlink support becomes per network namespace makes designing
      the proper locking tricky.  In addition we need to be able to call
      rtnl_lock() and rtnl_unlock() when we have the net_mutex held.
      
      After thinking about it and looking at the alternatives carefully
      it looks like the simplest and most maintainable solution is
      to remove net_list_mutex altogether, and to use the rtnl_mutex instead.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f4618d39
    • E
      [NET]: Add network namespace clone & unshare support. · 9dd776b6
      Eric W. Biederman 提交于
      This patch allows you to create a new network namespace
      using sys_clone, or sys_unshare.
      
      As the network namespace is still experimental and under development
      clone and unshare support is only made available when CONFIG_NET_NS is
      selected at compile time.
      
      As this patch introduces network namespace support into code paths
      that exist when the CONFIG_NET is not selected there are a few
      additions made to net_namespace.h to allow a few more functions
      to be used when the networking stack is not compiled in.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9dd776b6
    • P
      [NETNS]: Cleanup list walking in setup_net and cleanup_net · 768f3591
      Pavel Emelyanov 提交于
      I proposed introducing a list_for_each_entry_continue_reverse macro
      to be used in setup_net() when unrolling the failed ->init callback.
      
      Here is the macro and some more cleanup in the setup_net() itself
      to remove one variable from the stack :) The same thing is for the
      cleanup_net() - the existing list_for_each_entry_reverse() is used.
      
      Minor, but the code looks nicer.
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      768f3591
    • D
      [NET]: #if 0 out net_alloc() for now. · 678aa8e4
      David S. Miller 提交于
      We will undo this once it is actually used.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      678aa8e4
    • E
      [NET]: Basic network namespace infrastructure. · 5f256bec
      Eric W. Biederman 提交于
      This is the basic infrastructure needed to support network
      namespaces.  This infrastructure is:
      - Registration functions to support initializing per network
        namespace data when a network namespaces is created or destroyed.
      
      - struct net.  The network namespace data structure.
        This structure will grow as variables are made per network
        namespace but this is the minimal starting point.
      
      - Functions to grab a reference to the network namespace.
        I provide both get/put functions that keep a network namespace
        from being freed.  And hold/release functions serve as weak references
        and will warn if their count is not zero when the data structure
        is freed.  Useful for dealing with more complicated data structures
        like the ipv4 route cache.
      
      - A list of all of the network namespaces so we can iterate over them.
      
      - A slab for the network namespace data structure allowing leaks
        to be spotted.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5f256bec