1. 04 12月, 2009 8 次提交
  2. 03 12月, 2009 1 次提交
  3. 02 12月, 2009 4 次提交
    • E
      net: Simplfy default_device_exit and improve batching. · e008b5fc
      Eric W. Biederman 提交于
      - Defer dellink to net_cleanup() allowing for batching.
      - Fix comment.
      - Use for_each_netdev_safe again as dev_change_net_namespace touches
        at most one network device (unlike veth dellink).
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e008b5fc
    • E
      net: Automatically allocate per namespace data. · f875bae0
      Eric W. Biederman 提交于
      To get the full benefit of batched network namespace cleanup netowrk
      device deletion needs to be performed by the generic code.  When
      using register_pernet_gen_device and freeing the data in exit_net
      it is impossible to delay allocation until after exit_net has called
      as the device uninit methods are no longer safe.
      
      To correct this, and to simplify working with per network namespace data
      I have moved allocation and deletion of per network namespace data into
      the network namespace core.  The core now frees the data only after
      all of the network namespace exit routines have run.
      
      Now it is only required to set the new fields .id and .size
      in the pernet_operations structure if you want network namespace
      data to be managed for you automatically.
      
      This makes the current register_pernet_gen_device and
      register_pernet_gen_subsys routines unnecessary.  For the moment
      I have left them as compatibility wrappers in net_namespace.h
      They will be removed once all of the users have been updated.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f875bae0
    • E
      net: Batch network namespace destruction. · 2b035b39
      Eric W. Biederman 提交于
      It is fairly common to kill several network namespaces at once.  Either
      because they are nested one inside the other or because they are cooperating
      in multiple machine networking experiments.  As the network stack control logic
      does not parallelize easily batch up multiple network namespaces existing
      together.
      
      To get the full benefit of batching the virtual network devices to be
      removed must be all removed in one batch.  For that purpose I have added
      a loop after the last network device operations have run that batches
      up all remaining network devices and deletes them.
      
      An extra benefit is that the reorganization slightly shrinks the size
      of the per network namespace data structures replaceing a work_struct
      with a list_head.
      
      In a trivial test with 4K namespaces this change reduced the cost of
      a destroying 4K namespaces from 7+ minutes (at 12% cpu) to 44 seconds
      (at 60% cpu).  The bulk of that 44s was spent in inet_twsk_purge.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2b035b39
    • E
      net: NETDEV_UNREGISTER_PERNET -> NETDEV_UNREGISTER_BATCH · a5ee1551
      Eric W. Biederman 提交于
      The motivation for an additional notifier in batched netdevice
      notification (rt_do_flush) only needs to be called once per batch not
      once per namespace.
      
      For further batching improvements I need a guarantee that the
      netdevices are unregistered in order allowing me to unregister an all
      of the network devices in a network namespace at the same time with
      the guarantee that the loopback device is really and truly
      unregistered last.
      
      Additionally it appears that we moved the route cache flush after
      the final synchronize_net, which seems wrong and there was no
      explanation.  So I have restored the original location of the final
      synchronize_net.
      
      Cc: Octavian Purdila <opurdila@ixiacom.com>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a5ee1551
  4. 30 11月, 2009 1 次提交
  5. 29 11月, 2009 1 次提交
    • E
      pktgen: NUMA aware · 3291b9db
      Eric Dumazet 提交于
      pktgen threads are bound to given CPU, we can allocate memory for
      these threads in a NUMA aware way.
      
      After a pktgen session on two threads, we can check flows memory was
      allocated on right node, instead of a not related one.
      
      # grep pktgen_thread_write /proc/vmallocinfo
      0xffffc90007204000-0xffffc90007385000 1576960 pktgen_thread_write+0x3a4/0x6b0 [pktgen] pages=384 vmalloc N0=384
      0xffffc90007386000-0xffffc90007507000 1576960 pktgen_thread_write+0x3a4/0x6b0 [pktgen] pages=384 vmalloc N1=384
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3291b9db
  6. 27 11月, 2009 1 次提交
  7. 26 11月, 2009 1 次提交
  8. 25 11月, 2009 1 次提交
  9. 24 11月, 2009 1 次提交
    • E
      pktgen: Fix device name compares · 593f63b0
      Eric Dumazet 提交于
      Commit e6fce5b9 (pktgen: multiqueue etc.) tried to relax
      the pktgen restriction of one device per kernel thread, adding a '@'
      tag to device names.
      
      Problem is we dont perform check on full pktgen device name.
      This allows adding many time same 'device' to pktgen thread
      
       pgset "add_device eth0@0"
      
      one session later :
      
       pgset "add_device eth0@0"
      
      (This doesnt find previous device)
      
      This consumes ~1.5 MBytes of vmalloc memory per round and also triggers
      this warning :
      
      [  673.186380] proc_dir_entry 'pktgen/eth0@0' already registered
      [  673.186383] Modules linked in: pktgen ixgbe ehci_hcd psmouse mdio mousedev evdev [last unloaded: pktgen]
      [  673.186406] Pid: 6219, comm: bash Tainted: G        W  2.6.32-rc7-03302-g41cec6f1-dirty #16
      [  673.186410] Call Trace:
      [  673.186417]  [<ffffffff8104a29b>] warn_slowpath_common+0x7b/0xc0
      [  673.186422]  [<ffffffff8104a341>] warn_slowpath_fmt+0x41/0x50
      [  673.186426]  [<ffffffff8114e789>] proc_register+0x109/0x210
      [  673.186433]  [<ffffffff8100bf2e>] ? apic_timer_interrupt+0xe/0x20
      [  673.186438]  [<ffffffff8114e905>] proc_create_data+0x75/0xd0
      [  673.186444]  [<ffffffffa006ad38>] pktgen_thread_write+0x568/0x640 [pktgen]
      [  673.186449]  [<ffffffffa006a7d0>] ? pktgen_thread_write+0x0/0x640 [pktgen]
      [  673.186453]  [<ffffffff81149144>] proc_reg_write+0x84/0xc0
      [  673.186458]  [<ffffffff810f5a58>] vfs_write+0xb8/0x180
      [  673.186463]  [<ffffffff810f5c11>] sys_write+0x51/0x90
      [  673.186468]  [<ffffffff8100b51b>] system_call_fastpath+0x16/0x1b
      [  673.186470] ---[ end trace ccbb991b0a8d994d ]---
      
      Solution to this problem is to use a odevname field (includes @ tag and suffix),
      instead of using netdevice name.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NRobert Olsson <robert.olsson@its.uu.se>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      593f63b0
  10. 23 11月, 2009 1 次提交
  11. 21 11月, 2009 1 次提交
  12. 18 11月, 2009 4 次提交
    • O
      d9031024
    • E
      linkwatch: linkwatch_forget_dev() to speedup device dismantle · e014debe
      Eric Dumazet 提交于
      Herbert Xu a écrit :
      > On Tue, Nov 17, 2009 at 04:26:04AM -0800, David Miller wrote:
      >> Really, the link watch stuff is just due for a redesign.  I don't
      >> think a simple hack is going to cut it this time, sorry Eric :-)
      >
      > I have no objections against any redesigns, but since the only
      > caller of linkwatch_forget_dev runs in process context with the
      > RTNL, it could also legally emit those events.
      
      Thanks guys, here an updated version then, before linkwatch surgery ?
      
      In this version, I force the event to be sent synchronously.
      
      [PATCH net-next-2.6] linkwatch: linkwatch_forget_dev() to speedup device dismantle
      
      time ip link del eth3.103 ; time ip link del eth3.104 ; time ip link del eth3.105
      
      real	0m0.266s
      user	0m0.000s
      sys	0m0.001s
      
      real	0m0.770s
      user	0m0.000s
      sys	0m0.000s
      
      real	0m1.022s
      user	0m0.000s
      sys	0m0.000s
      
      One problem of current schem in vlan dismantle phase is the
      holding of device done by following chain :
      
      vlan_dev_stop() ->
      	netif_carrier_off(dev) ->
      		linkwatch_fire_event(dev) ->
      			dev_hold() ...
      
      And __linkwatch_run_queue() runs up to one second later...
      
      A generic fix to this problem is to add a linkwatch_forget_dev() method
      to unlink the device from the list of watched devices.
      
      dev->link_watch_next becomes dev->link_watch_list (and use a bit more memory),
      to be able to unlink device in O(1).
      
      After patch :
      time ip link del eth3.103 ; time ip link del eth3.104 ; time ip link del eth3.105
      
      real    0m0.024s
      user    0m0.000s
      sys     0m0.000s
      
      real    0m0.032s
      user    0m0.000s
      sys     0m0.001s
      
      real    0m0.033s
      user    0m0.000s
      sys     0m0.000s
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e014debe
    • O
      net: introduce NETDEV_UNREGISTER_PERNET · 395264d5
      Octavian Purdila 提交于
      This new event is called once for each unique net namespace in batched
      unregister operations (with the argument set to a random device from
      that namespace) and once per device in non-batched unregister
      operations.
      
      It allows us to factorize some device unregister work such as clearing the
      routing cache.
      Signed-off-by: NOctavian Purdila <opurdila@ixiacom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      395264d5
    • E
      net: add dev_txq_stats_fold() helper · d83345ad
      Eric Dumazet 提交于
      Some drivers ndo_get_stats() method need to perform txqueue stats folding.
      
      Move folding from dev_get_stats() to a new dev_txq_stats_fold() function
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d83345ad
  13. 17 11月, 2009 1 次提交
  14. 16 11月, 2009 3 次提交
  15. 14 11月, 2009 1 次提交
    • P
      net: allow to propagate errors through ->ndo_hard_start_xmit() · 572a9d7b
      Patrick McHardy 提交于
      Currently the ->ndo_hard_start_xmit() callbacks are only permitted to return
      one of the NETDEV_TX codes. This prevents any kind of error propagation for
      virtual devices, like queue congestion of the underlying device in case of
      layered devices, or unreachability in case of tunnels.
      
      This patches changes the NET_XMIT codes to avoid clashes with the NETDEV_TX
      codes and changes the two callers of dev_hard_start_xmit() to expect either
      errno codes, NET_XMIT codes or NETDEV_TX codes as return value.
      
      In case of qdisc_restart(), all non NETDEV_TX codes are mapped to NETDEV_TX_OK
      since no error propagation is possible when using qdiscs. In case of
      dev_queue_xmit(), the error is propagated upwards.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      572a9d7b
  16. 12 11月, 2009 3 次提交
    • E
      sysctl net: Remove unused binary sysctl code · f8572d8f
      Eric W. Biederman 提交于
      Now that sys_sysctl is a compatiblity wrapper around /proc/sys
      all sysctl strategy routines, and all ctl_name and strategy
      entries in the sysctl tables are unused, and can be
      revmoed.
      
      In addition neigh_sysctl_register has been modified to no longer
      take a strategy argument and it's callers have been modified not
      to pass one.
      
      Cc: "David Miller" <davem@davemloft.net>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: netdev@vger.kernel.org
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      f8572d8f
    • S
      netdev: fold name hash properly (v3) · 08e9897d
      stephen hemminger 提交于
      The full_name_hash function does not produce well distributed values in
      the lower bits, so most code uses hash_32() to fold it.  This is really
      a bug introduced when name hashing was added, back in 2.5 when I added
      name hashing.
      
      hash_32 is all that is needed since full_name_hash returns unsigned int
      which is only 32 bits on 64 bit platforms.
      
      Also, there is no point in using hash_32 on ifindex, because the is naturally
      sequential and usually well distributed.
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      08e9897d
    • A
      skbuff: Do not allow skb recycling with disabled IRQs · e84af6dd
      Anton Vorontsov 提交于
      NAPI drivers try to recycle SKBs in their polling routine, but we
      generally don't know the context in which the polling will be called,
      and the skb recycling itself may require IRQs to be enabled.
      
      This patch adds irqs_disabled() test to the skb_recycle_check()
      routine, so that we'll not let the drivers hit the skb recycling
      path with IRQs disabled.
      
      As a side effect, this patch actually disables skb recycling for some
      [broken] drivers. E.g. gianfar driver grabs an irqsave spinlock during
      TX ring processing, and then tries to recycle an skb, and that caused
      the following badness:
      
      nf_conntrack version 0.5.0 (1008 buckets, 4032 max)
      ------------[ cut here ]------------
      Badness at kernel/softirq.c:143
      NIP: c003e3c4 LR: c423a528 CTR: c003e344
      ...
      NIP [c003e3c4] local_bh_enable+0x80/0xc4
      LR [c423a528] destroy_conntrack+0xd4/0x13c [nf_conntrack]
      Call Trace:
      [c15d1b60] [c003e32c] local_bh_disable+0x1c/0x34 (unreliable)
      [c15d1b70] [c423a528] destroy_conntrack+0xd4/0x13c [nf_conntrack]
      [c15d1b80] [c02c6370] nf_conntrack_destroy+0x3c/0x70
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e84af6dd
  17. 08 11月, 2009 1 次提交
    • E
      net: Support specifying the network namespace upon device creation. · 81adee47
      Eric W. Biederman 提交于
      There is no good reason to not support userspace specifying the
      network namespace during device creation, and it makes it easier
      to create a network device and pass it to a child network namespace
      with a well known name.
      
      We have to be careful to ensure that the target network namespace
      for the new device exists through the life of the call.  To keep
      that logic clear I have factored out the network namespace grabbing
      logic into rtnl_link_get_net.
      
      In addtion we need to continue to pass the source network namespace
      to the rtnl_link_ops.newlink method so that we can find the base
      device source network namespace.
      Signed-off-by: NEric W. Biederman <ebiederm@aristanetworks.com>
      Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
      81adee47
  18. 07 11月, 2009 2 次提交
  19. 06 11月, 2009 3 次提交
  20. 04 11月, 2009 1 次提交