1. 20 8月, 2009 1 次提交
  2. 07 8月, 2009 1 次提交
    • K
      net: Avoid enqueuing skb for default qdiscs · bbd8a0d3
      Krishna Kumar 提交于
      dev_queue_xmit enqueue's a skb and calls qdisc_run which
      dequeue's the skb and xmits it. In most cases, the skb that
      is enqueue'd is the same one that is dequeue'd (unless the
      queue gets stopped or multiple cpu's write to the same queue
      and ends in a race with qdisc_run). For default qdiscs, we
      can remove the redundant enqueue/dequeue and simply xmit the
      skb since the default qdisc is work-conserving.
      
      The patch uses a new flag - TCQ_F_CAN_BYPASS to identify the
      default fast queue. The controversial part of the patch is
      incrementing qlen when a skb is requeued - this is to avoid
      checks like the second line below:
      
      +  } else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) &&
      >>         !q->gso_skb &&
      +          !test_and_set_bit(__QDISC_STATE_RUNNING, &q->state)) {
      
      Results of a 2 hour testing for multiple netperf sessions (1,
      2, 4, 8, 12 sessions on a 4 cpu system-X). The BW numbers are
      aggregate Mb/s across iterations tested with this version on
      System-X boxes with Chelsio 10gbps cards:
      
      ----------------------------------
      Size |  ORG BW          NEW BW   |
      ----------------------------------
      128K |  156964          159381   |
      256K |  158650          162042   |
      ----------------------------------
      
      Changes from ver1:
      
      1. Move sch_direct_xmit declaration from sch_generic.h to
         pkt_sched.h
      2. Update qdisc basic statistics for direct xmit path.
      3. Set qlen to zero in qdisc_reset.
      4. Changed some function names to more meaningful ones.
      Signed-off-by: NKrishna Kumar <krkumar2@in.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bbd8a0d3
  3. 06 8月, 2009 1 次提交
  4. 05 8月, 2009 1 次提交
    • I
      net: Fix spinlock use in alloc_netdev_mq() · 0bf52b98
      Ingo Molnar 提交于
      -tip testing found this lockdep warning:
      
      [    2.272010] calling  net_dev_init+0x0/0x164 @ 1
      [    2.276033] device class 'net': registering
      [    2.280191] INFO: trying to register non-static key.
      [    2.284005] the code is fine but needs lockdep annotation.
      [    2.284005] turning off the locking correctness validator.
      [    2.284005] Pid: 1, comm: swapper Not tainted 2.6.31-rc5-tip #1145
      [    2.284005] Call Trace:
      [    2.284005]  [<7958eb4e>] ? printk+0xf/0x11
      [    2.284005]  [<7904f83c>] __lock_acquire+0x11b/0x622
      [    2.284005]  [<7908c9b7>] ? alloc_debug_processing+0xf9/0x144
      [    2.284005]  [<7904e2be>] ? mark_held_locks+0x3a/0x52
      [    2.284005]  [<7908dbc4>] ? kmem_cache_alloc+0xa8/0x13f
      [    2.284005]  [<7904e475>] ? trace_hardirqs_on_caller+0xa2/0xc3
      [    2.284005]  [<7904fdf6>] lock_acquire+0xb3/0xd0
      [    2.284005]  [<79489678>] ? alloc_netdev_mq+0xf5/0x1ad
      [    2.284005]  [<79591514>] _spin_lock_bh+0x2d/0x5d
      [    2.284005]  [<79489678>] ? alloc_netdev_mq+0xf5/0x1ad
      [    2.284005]  [<79489678>] alloc_netdev_mq+0xf5/0x1ad
      [    2.284005]  [<793a38f2>] ? loopback_setup+0x0/0x74
      [    2.284005]  [<798eecd0>] loopback_net_init+0x20/0x5d
      [    2.284005]  [<79483efb>] register_pernet_device+0x23/0x4b
      [    2.284005]  [<798f5c9f>] net_dev_init+0x115/0x164
      [    2.284005]  [<7900104f>] do_one_initcall+0x4a/0x11a
      [    2.284005]  [<798f5b8a>] ? net_dev_init+0x0/0x164
      [    2.284005]  [<79066f6d>] ? register_irq_proc+0x8c/0xa8
      [    2.284005]  [<798cc29a>] do_basic_setup+0x42/0x52
      [    2.284005]  [<798cc30a>] kernel_init+0x60/0xa1
      [    2.284005]  [<798cc2aa>] ? kernel_init+0x0/0xa1
      [    2.284005]  [<79003e03>] kernel_thread_helper+0x7/0x10
      [    2.284078] device: 'lo': device_add
      [    2.288248] initcall net_dev_init+0x0/0x164 returned 0 after 11718 usecs
      [    2.292010] calling  neigh_init+0x0/0x66 @ 1
      [    2.296010] initcall neigh_init+0x0/0x66 returned 0 after 0 usecs
      
      it's using an zero-initialized spinlock. This is a side-effect of:
      
              dev_unicast_init(dev);
      
      in alloc_netdev_mq() making use of dev->addr_list_lock.
      
      The device has just been allocated freshly, it's not accessible
      anywhere yet so no locking is needed at all - in fact it's wrong
      to lock it here (the lock isnt initialized yet).
      
      This bug was introduced via:
      
      | commit a6ac65db
      | Date:   Thu Jul 30 01:06:12 2009 +0000
      |
      |     net: restore the original spinlock to protect unicast list
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NJiri Pirko <jpirko@redhat.com>
      Tested-by: NMark Brown <broonie@opensource.wolfsonmicro.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0bf52b98
  5. 03 8月, 2009 1 次提交
  6. 28 7月, 2009 1 次提交
    • J
      cfg80211: make aware of net namespaces · 463d0183
      Johannes Berg 提交于
      In order to make cfg80211/nl80211 aware of network namespaces,
      we have to do the following things:
      
       * del_virtual_intf method takes an interface index rather
         than a netdev pointer - simply change this
      
       * nl80211 uses init_net a lot, it changes to use the sender's
         network namespace
      
       * scan requests use the interface index, hold a netdev pointer
         and reference instead
      
       * we want a wiphy and its associated virtual interfaces to be
         in one netns together, so
          - we need to be able to change ns for a given interface, so
            export dev_change_net_namespace()
          - for each virtual interface set the NETIF_F_NETNS_LOCAL
            flag, and clear that flag only when the wiphy changes ns,
            to disallow breaking this invariant
      
       * when a network namespace goes away, we need to reparent the
         wiphy to init_net
      
       * cfg80211 users that support creating virtual interfaces must
         create them in the wiphy's namespace, currently this affects
         only mac80211
      
      The end result is that you can now switch an entire wiphy into
      a different network namespace with the new command
      	iw phy#<idx> set netns <pid>
      and all virtual interfaces will follow (or the operation fails).
      Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
      463d0183
  7. 25 7月, 2009 1 次提交
  8. 06 7月, 2009 1 次提交
  9. 27 6月, 2009 1 次提交
  10. 24 6月, 2009 1 次提交
    • H
      net: Move rx skb_orphan call to where needed · d55d87fd
      Herbert Xu 提交于
      In order to get the tun driver to account packets, we need to be
      able to receive packets with destructors set.  To be on the safe
      side, I added an skb_orphan call for all protocols by default since
      some of them (IP in particular) cannot handle receiving packets
      destructors properly.
      
      Now it seems that at least one protocol (CAN) expects to be able
      to pass skb->sk through the rx path without getting clobbered.
      
      So this patch attempts to fix this properly by moving the skb_orphan
      call to where it's actually needed.  In particular, I've added it
      to skb_set_owner_[rw] which is what most users of skb->destructor
      call.
      
      This is actually an improvement for tun too since it means that
      we only give back the amount charged to the socket when the skb
      is passed to another socket that will also be charged accordingly.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Tested-by: NOliver Hartkopp <olver@hartkopp.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d55d87fd
  11. 18 6月, 2009 1 次提交
    • J
      net: group address list and its count · 31278e71
      Jiri Pirko 提交于
      This patch is inspired by patch recently posted by Johannes Berg. Basically what
      my patch does is to group list and a count of addresses into newly introduced
      structure netdev_hw_addr_list. This brings us two benefits:
      1) struct net_device becames a bit nicer.
      2) in the future there will be a possibility to operate with lists independently
         on netdevices (with exporting right functions).
      I wanted to introduce this patch before I'll post a multicast lists conversion.
      Signed-off-by: NJiri Pirko <jpirko@redhat.com>
      
       drivers/net/bnx2.c              |    4 +-
       drivers/net/e1000/e1000_main.c  |    4 +-
       drivers/net/ixgbe/ixgbe_main.c  |    6 +-
       drivers/net/mv643xx_eth.c       |    2 +-
       drivers/net/niu.c               |    4 +-
       drivers/net/virtio_net.c        |   10 ++--
       drivers/s390/net/qeth_l2_main.c |    2 +-
       include/linux/netdevice.h       |   17 +++--
       net/core/dev.c                  |  130 ++++++++++++++++++--------------------
       9 files changed, 89 insertions(+), 90 deletions(-)
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      31278e71
  12. 12 6月, 2009 2 次提交
  13. 09 6月, 2009 3 次提交
  14. 04 6月, 2009 1 次提交
  15. 03 6月, 2009 1 次提交
  16. 30 5月, 2009 1 次提交
    • J
      net: convert unicast addr list · ccffad25
      Jiri Pirko 提交于
      This patch converts unicast address list to standard list_head using
      previously introduced struct netdev_hw_addr. It also relaxes the
      locking. Original spinlock (still used for multicast addresses) is not
      needed and is no longer used for a protection of this list. All
      reading and writing takes place under rtnl (with no changes).
      
      I also removed a possibility to specify the length of the address
      while adding or deleting unicast address. It's always dev->addr_len.
      
      The convertion touched especially e1000 and ixgbe codes when the
      change is not so trivial.
      Signed-off-by: NJiri Pirko <jpirko@redhat.com>
      
       drivers/net/bnx2.c               |   13 +--
       drivers/net/e1000/e1000_main.c   |   24 +++--
       drivers/net/ixgbe/ixgbe_common.c |   14 ++--
       drivers/net/ixgbe/ixgbe_common.h |    4 +-
       drivers/net/ixgbe/ixgbe_main.c   |    6 +-
       drivers/net/ixgbe/ixgbe_type.h   |    4 +-
       drivers/net/macvlan.c            |   11 +-
       drivers/net/mv643xx_eth.c        |   11 +-
       drivers/net/niu.c                |    7 +-
       drivers/net/virtio_net.c         |    7 +-
       drivers/s390/net/qeth_l2_main.c  |    6 +-
       drivers/scsi/fcoe/fcoe.c         |   16 ++--
       include/linux/netdevice.h        |   18 ++--
       net/8021q/vlan.c                 |    4 +-
       net/8021q/vlan_dev.c             |   10 +-
       net/core/dev.c                   |  195 +++++++++++++++++++++++++++-----------
       net/dsa/slave.c                  |   10 +-
       net/packet/af_packet.c           |    4 +-
       18 files changed, 227 insertions(+), 137 deletions(-)
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ccffad25
  17. 28 5月, 2009 1 次提交
  18. 27 5月, 2009 5 次提交
  19. 26 5月, 2009 1 次提交
    • E
      net: txq_trans_update() helper · 08baf561
      Eric Dumazet 提交于
      We would like to get rid of netdev->trans_start = jiffies; that about all net
      drivers have to use in their start_xmit() function, and use txq->trans_start
      instead.
      
      This can be done generically in core network, as suggested by David.
      
      Some devices, (particularly loopback) dont need trans_start update, because
      they dont have transmit watchdog. We could add a new device flag, or rely
      on fact that txq->tran_start can be updated is txq->xmit_lock_owner is
      different than -1. Use a helper function to hide our choice.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      08baf561
  20. 25 5月, 2009 1 次提交
  21. 22 5月, 2009 1 次提交
    • N
      dropmon: add ability to detect when hardware dropsrxpackets · 4ea7e386
      Neil Horman 提交于
      Patch to add the ability to detect drops in hardware interfaces via dropwatch.
      Adds a tracepoint to net_rx_action to signal everytime a napi instance is
      polled.  The dropmon code then periodically checks to see if the rx_frames
      counter has changed, and if so, adds a drop notification to the netlink
      protocol, using the reserved all-0's vector to indicate the drop location was in
      hardware, rather than somewhere in the code.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      
       include/linux/net_dropmon.h |    8 ++
       include/trace/napi.h        |   11 +++
       net/core/dev.c              |    5 +
       net/core/drop_monitor.c     |  124 ++++++++++++++++++++++++++++++++++++++++++--
       net/core/net-traces.c       |    4 +
       net/core/netpoll.c          |    2
       6 files changed, 149 insertions(+), 5 deletions(-)
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4ea7e386
  22. 19 5月, 2009 2 次提交
    • E
      net: release dst entry in dev_hard_start_xmit() · 93f154b5
      Eric Dumazet 提交于
      One point of contention in high network loads is the dst_release() performed
      when a transmited skb is freed. This is because NIC tx completion calls
      dev_kree_skb() long after original call to dev_queue_xmit(skb).
      
      CPU cache is cold and the atomic op in dst_release() stalls. On SMP, this is
      quite visible if one CPU is 100% handling softirqs for a network device,
      since dst_clone() is done by other cpus, involving cache line ping pongs.
      
      It seems right place to release dst is in dev_hard_start_xmit(), for most
      devices but ones that are virtual, and some exceptions.
      
      David Miller suggested to define a new device flag, set in alloc_netdev_mq()
      (so that most devices set it at init time), and carefuly unset in devices
      which dont want a NULL skb->dst in their ndo_start_xmit().
      
      List of devices that must clear this flag is :
      
      - loopback device, because it calls netif_rx() and quoting Patrick :
          "ip_route_input() doesn't accept loopback addresses, so loopback packets
           already need to have a dst_entry attached."
      - appletalk/ipddp.c : needs skb->dst in its xmit function
      
      - And all devices that call again dev_queue_xmit() from their xmit function
      (as some classifiers need skb->dst) : bonding, vlan, macvlan, eql, ifb, hdlc_fr
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      93f154b5
    • E
      net: add tx_packets/tx_bytes/tx_dropped counters in struct netdev_queue · 7004bf25
      Eric Dumazet 提交于
      offsetof(struct net_device, features)=0x44
      offsetof(struct net_device, stats.tx_packets)=0x54
      offsetof(struct net_device, stats.tx_bytes)=0x5c
      offsetof(struct net_device, stats.tx_dropped)=0x6c
      
      Network drivers that touch dev->stats.tx_packets/stats.tx_bytes in their
      tx path can slow down SMP operations, since they dirty a cache line
      that should stay shared (dev->features is needed in rx and tx paths)
      
      We could move away stats field in net_device but it wont help that much.
      (Two cache lines dirtied in tx path, we can do one only)
      
      Better solution is to add tx_packets/tx_bytes/tx_dropped in struct
      netdev_queue because this structure is already touched in tx path and
      counters updates will then be free (no increase in size)
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7004bf25
  23. 10 5月, 2009 1 次提交
  24. 06 5月, 2009 1 次提交
    • J
      net: introduce a list of device addresses dev_addr_list (v6) · f001fde5
      Jiri Pirko 提交于
      v5 -> v6 (current):
      -removed so far unused static functions
      -corrected dev_addr_del_multiple to call del instead of add
      
      v4 -> v5:
      -added device address type (suggested by davem)
      -removed refcounting (better to have simplier code then safe potentially few
       bytes)
      
      v3 -> v4:
      -changed kzalloc to kmalloc in __hw_addr_add_ii()
      -ASSERT_RTNL() avoided in dev_addr_flush() and dev_addr_init()
      
      v2 -> v3:
      -removed unnecessary rcu read locking
      -moved dev_addr_flush() calling to ensure no null dereference of dev_addr
      
      v1 -> v2:
      -added forgotten ASSERT_RTNL to dev_addr_init and dev_addr_flush
      -removed unnecessary rcu_read locking in dev_addr_init
      -use compare_ether_addr_64bits instead of compare_ether_addr
      -use L1_CACHE_BYTES as size for allocating struct netdev_hw_addr
      -use call_rcu instead of rcu_synchronize
      -moved is_etherdev_addr into __KERNEL__ ifdef
      
      This patch introduces a new list in struct net_device and brings a set of
      functions to handle the work with device address list. The list is a replacement
      for the original dev_addr field and because in some situations there is need to
      carry several device addresses with the net device. To be backward compatible,
      dev_addr is made to point to the first member of the list so original drivers
      sees no difference.
      Signed-off-by: NJiri Pirko <jpirko@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f001fde5
  25. 04 5月, 2009 1 次提交
  26. 02 5月, 2009 1 次提交
  27. 27 4月, 2009 1 次提交
  28. 20 4月, 2009 3 次提交
  29. 16 4月, 2009 1 次提交
  30. 15 4月, 2009 1 次提交