  1. 27 May 2009, 1 commit
  2. 26 May 2009, 1 commit
    • net: txq_trans_update() helper · 08baf561
      Committed by Eric Dumazet
      We would like to get rid of the netdev->trans_start = jiffies; assignment
      that almost all net drivers have to make in their start_xmit() function,
      and use txq->trans_start instead.
      
      This can be done generically in the core network code, as suggested by David.
      
      Some devices (particularly loopback) don't need a trans_start update, because
      they don't have a transmit watchdog. We could add a new device flag, or rely
      on the fact that txq->trans_start can be updated if txq->xmit_lock_owner is
      different from -1. Use a helper function to hide our choice.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
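A minimal userspace sketch of the helper described above, assuming a simplified queue struct: trans_start is refreshed only when xmit_lock_owner is not -1, so lockless devices such as loopback skip the store. The struct, the model_jiffies counter and model_txq_trans_update() are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's jiffies tick counter. */
static unsigned long model_jiffies;

/* Simplified model of a TX queue: only the fields this helper needs. */
struct model_txq {
    int xmit_lock_owner;       /* CPU holding the xmit lock, or -1 if lockless */
    unsigned long trans_start; /* last time a packet was handed to the device */
};

/*
 * Record the transmit time only for queues whose xmit lock is in use.
 * Lockless devices (e.g. loopback) have no TX watchdog, keep
 * xmit_lock_owner at -1, and therefore never dirty trans_start.
 */
static inline void model_txq_trans_update(struct model_txq *txq)
{
    if (txq->xmit_lock_owner != -1)
        txq->trans_start = model_jiffies;
}
```

The check costs one comparison on the hot path, which is the trade-off the commit prefers over adding a dedicated device flag.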
  3. 25 May 2009, 1 commit
  4. 22 May 2009, 1 commit
    • dropmon: add ability to detect when hardware drops rx packets · 4ea7e386
      Committed by Neil Horman
      Patch to add the ability to detect drops in hardware interfaces via dropwatch.
      Adds a tracepoint to net_rx_action to signal every time a napi instance is
      polled.  The dropmon code then periodically checks to see if the rx_frames
      counter has changed, and if so, adds a drop notification to the netlink
      protocol, using the reserved all-0's vector to indicate the drop location was in
      hardware, rather than somewhere in the code.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      
       include/linux/net_dropmon.h |    8 ++
       include/trace/napi.h        |   11 +++
       net/core/dev.c              |    5 +
       net/core/drop_monitor.c     |  124 ++++++++++++++++++++++++++++++++++++++++++--
       net/core/net-traces.c       |    4 +
       net/core/netpoll.c          |    2
       6 files changed, 149 insertions(+), 5 deletions(-)
      Signed-off-by: David S. Miller <davem@davemloft.net>
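The periodic hardware-drop check can be sketched as follows, taking the commit message literally: a per-NAPI snapshot of the rx_frames counter is compared on each check, and any change yields a notification whose location is NULL (the reserved all-zero value meaning "in hardware"). All model_* names are hypothetical; the real code also handles the tracepoint, netlink messaging and locking, which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-NAPI-instance state kept by the drop monitor. */
struct model_napi_watch {
    unsigned long rx_frames;      /* counter sampled from the device */
    unsigned long last_rx_frames; /* value seen at the previous check */
};

/* Hypothetical drop record: a NULL location means "dropped in hardware". */
struct model_drop_note {
    const void *location;
    unsigned long count;
};

/*
 * Periodic check: if the counter moved since the last check, fill in a
 * notification with a NULL location and remember the new value.
 * Returns true when a notification was produced.
 */
static bool model_check_hw_drops(struct model_napi_watch *w,
                                 struct model_drop_note *note)
{
    if (w->rx_frames == w->last_rx_frames)
        return false;
    note->location = NULL;
    note->count = w->rx_frames - w->last_rx_frames;
    w->last_rx_frames = w->rx_frames;
    return true;
}
```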
  5. 19 May 2009, 2 commits
    • net: release dst entry in dev_hard_start_xmit() · 93f154b5
      Committed by Eric Dumazet
      One point of contention in high network loads is the dst_release() performed
      when a transmitted skb is freed. This is because NIC tx completion calls
      dev_kfree_skb() long after the original call to dev_queue_xmit(skb).
      
      CPU cache is cold and the atomic op in dst_release() stalls. On SMP, this is
      quite visible if one CPU is 100% handling softirqs for a network device,
      since dst_clone() is done by other cpus, involving cache line ping pongs.
      
      The right place to release dst seems to be dev_hard_start_xmit(), for most
      devices except virtual ones and a few other exceptions.
      
      David Miller suggested defining a new device flag, set in alloc_netdev_mq()
      (so that most devices set it at init time), and carefully cleared in devices
      that don't want a NULL skb->dst in their ndo_start_xmit().
      
      Devices that must clear this flag:
      
      - loopback device, because it calls netif_rx() and, quoting Patrick:
          "ip_route_input() doesn't accept loopback addresses, so loopback packets
           already need to have a dst_entry attached."
      - appletalk/ipddp.c: needs skb->dst in its xmit function
      - all devices that call dev_queue_xmit() again from their xmit function
        (as some classifiers need skb->dst): bonding, vlan, macvlan, eql, ifb, hdlc_fr
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
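A userspace sketch of the opt-out flag scheme, assuming a single hypothetical bit MODEL_XMIT_DST_RELEASE in a per-device flags word: it is set for every device at allocation time, cleared by devices that need skb->dst in their xmit routine, and consulted on the transmit path. The plain refcount decrement stands in for the kernel's atomic dst_release(); none of these names are the kernel's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit; the name is illustrative only. */
#define MODEL_XMIT_DST_RELEASE 0x1u

struct model_dst { int refcnt; };
struct model_skb { struct model_dst *dst; };
struct model_dev { unsigned int priv_flags; };

/* Stand-in for dst_release(): the kernel uses an atomic decrement here. */
static void model_dst_release(struct model_dst *dst)
{
    if (dst)
        dst->refcnt--;
}

/* Every device gets the flag at allocation time, as in alloc_netdev_mq(). */
static void model_alloc_dev(struct model_dev *dev)
{
    dev->priv_flags = MODEL_XMIT_DST_RELEASE;
}

/*
 * Transmit path: drop the route reference while it is still cache-hot,
 * unless the device opted out because its xmit routine needs skb->dst
 * (loopback, ipddp, stacked devices such as bonding or vlan).
 */
static void model_hard_start_xmit(struct model_dev *dev, struct model_skb *skb)
{
    if (dev->priv_flags & MODEL_XMIT_DST_RELEASE) {
        model_dst_release(skb->dst);
        skb->dst = NULL;
    }
    /* ... the driver's start_xmit would be called here ... */
}
```

Defaulting the flag to "on" means ordinary NICs get the benefit without any driver change, and only the listed exceptions need an explicit clear.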
    • net: add tx_packets/tx_bytes/tx_dropped counters in struct netdev_queue · 7004bf25
      Committed by Eric Dumazet
      offsetof(struct net_device, features)=0x44
      offsetof(struct net_device, stats.tx_packets)=0x54
      offsetof(struct net_device, stats.tx_bytes)=0x5c
      offsetof(struct net_device, stats.tx_dropped)=0x6c
      
      Network drivers that touch dev->stats.tx_packets/stats.tx_bytes in their
      tx path can slow down SMP operations, since they dirty a cache line
      that should stay shared (dev->features is needed in rx and tx paths)
      
      We could move the stats field elsewhere in net_device, but it won't help that
      much (two cache lines dirtied in the tx path, where one would do).
      
      A better solution is to add tx_packets/tx_bytes/tx_dropped to struct
      netdev_queue, because this structure is already touched in the tx path, so
      the counter updates will then be free (no increase in size).
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
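One way to picture the change, as a hedged sketch: the driver's tx path bumps counters that live in its (already hot) queue structure, and the rare stats reader folds them into device totals. The struct and function names are hypothetical, and the folding helper is an assumption about how totals could be produced, not a description of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-queue counters, living next to other hot queue state. */
struct model_txq_stats {
    unsigned long tx_packets;
    unsigned long tx_bytes;
    unsigned long tx_dropped;
};

/* Driver TX path: update only the queue it is already touching. */
static void model_txq_account(struct model_txq_stats *q, unsigned int len)
{
    q->tx_packets++;
    q->tx_bytes += len;
}

/* Slow path (e.g. when stats are read): sum every queue into one total. */
static struct model_txq_stats model_sum_queues(const struct model_txq_stats *qs,
                                               size_t n)
{
    struct model_txq_stats total = { 0, 0, 0 };

    for (size_t i = 0; i < n; i++) {
        total.tx_packets += qs[i].tx_packets;
        total.tx_bytes += qs[i].tx_bytes;
        total.tx_dropped += qs[i].tx_dropped;
    }
    return total;
}
```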
  6. 10 May 2009, 1 commit
  7. 06 May 2009, 1 commit
    • net: introduce a list of device addresses dev_addr_list (v6) · f001fde5
      Committed by Jiri Pirko
      v5 -> v6 (current):
      -removed so far unused static functions
      -corrected dev_addr_del_multiple to call del instead of add
      
      v4 -> v5:
      -added device address type (suggested by davem)
      -removed refcounting (better to have simpler code than save potentially a few
       bytes)
      
      v3 -> v4:
      -changed kzalloc to kmalloc in __hw_addr_add_ii()
      -ASSERT_RTNL() avoided in dev_addr_flush() and dev_addr_init()
      
      v2 -> v3:
      -removed unnecessary rcu read locking
      -moved dev_addr_flush() calling to ensure no null dereference of dev_addr
      
      v1 -> v2:
      -added forgotten ASSERT_RTNL to dev_addr_init and dev_addr_flush
      -removed unnecessary rcu_read locking in dev_addr_init
      -use compare_ether_addr_64bits instead of compare_ether_addr
      -use L1_CACHE_BYTES as size for allocating struct netdev_hw_addr
      -use call_rcu instead of rcu_synchronize
      -moved is_etherdev_addr into __KERNEL__ ifdef
      
      This patch introduces a new list in struct net_device and brings a set of
      functions to handle the device address list. The list is a replacement for
      the original dev_addr field, because in some situations the net device needs
      to carry several device addresses. To be backward compatible, dev_addr is
      made to point to the first member of the list, so original drivers see no
      difference.
      Signed-off-by: Jiri Pirko <jpirko@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
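A simplified sketch of the backward-compatibility trick, assuming a singly linked list and no RTNL or RCU protection: every add or delete re-points the legacy dev_addr at the first list entry, so code that only reads dev_addr keeps working. The model_* types and helpers are hypothetical; the actual patch uses list_head, address types and call_rcu() as listed in the changelog.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_ADDR_LEN 6

/* Hypothetical list node holding one hardware address. */
struct model_hw_addr {
    struct model_hw_addr *next;
    unsigned char addr[MODEL_ADDR_LEN];
};

struct model_netdev {
    struct model_hw_addr *dev_addrs; /* all device addresses */
    unsigned char *dev_addr;         /* legacy view: the first entry */
};

/* Keep the legacy dev_addr pointer aimed at the head of the list. */
static void model_sync_dev_addr(struct model_netdev *dev)
{
    dev->dev_addr = dev->dev_addrs ? dev->dev_addrs->addr : NULL;
}

/* Append an address; returns 0 on success, -1 on allocation failure. */
static int model_dev_addr_add(struct model_netdev *dev, const unsigned char *addr)
{
    struct model_hw_addr *ha = malloc(sizeof(*ha));
    struct model_hw_addr **pp = &dev->dev_addrs;

    if (!ha)
        return -1;
    memcpy(ha->addr, addr, MODEL_ADDR_LEN);
    ha->next = NULL;
    while (*pp)
        pp = &(*pp)->next;
    *pp = ha;
    model_sync_dev_addr(dev);
    return 0;
}

/* Remove the first matching address; returns 0 if found, -1 otherwise. */
static int model_dev_addr_del(struct model_netdev *dev, const unsigned char *addr)
{
    for (struct model_hw_addr **pp = &dev->dev_addrs; *pp; pp = &(*pp)->next) {
        if (memcmp((*pp)->addr, addr, MODEL_ADDR_LEN) == 0) {
            struct model_hw_addr *victim = *pp;

            *pp = victim->next;
            free(victim);
            model_sync_dev_addr(dev);
            return 0;
        }
    }
    return -1;
}
```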
  8. 04 May 2009, 1 commit
  9. 02 May 2009, 1 commit
  10. 27 April 2009, 1 commit
  11. 20 April 2009, 3 commits
  12. 16 April 2009, 1 commit
  13. 15 April 2009, 1 commit
  14. 11 April 2009, 1 commit
  15. 02 April 2009, 1 commit
  16. 27 March 2009, 1 commit
    • GRO: Disable GRO on legacy netif_rx path · 8f1ead2d
      Committed by Herbert Xu
      When I fixed the GRO crash in the legacy receive path I used
      napi_complete to replace __napi_complete.  Unfortunately they're
      not the same when NETPOLL is enabled, which may result in us
      not calling __napi_complete at all.
      
      What's more, we really do need to keep the __napi_complete call
      within the IRQ-off section since in theory an IRQ can occur in
      between and fill up the backlog to the maximum, causing us to
      lock up.
      
      Since we can't seem to find a fix that works properly right now,
      this patch reverts all the GRO support from the netif_rx path.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 22 March 2009, 2 commits
  18. 19 March 2009, 1 commit
  19. 18 March 2009, 1 commit
  20. 17 March 2009, 1 commit
    • GRO: Move netpoll checks to correct location · d1c76af9
      Committed by Herbert Xu
      As my netpoll fix for net doesn't really work for net-next, we
      need this update to move the checks into the right place.  As it
      stands we may pass freed skbs to netpoll_receive_skb.
      
      This patch also introduces a netpoll_rx_on function to avoid GRO
      completely if we're invoked through netpoll.  This might seem
      paranoid but as netpoll may have an external receive hook it's
      better to be safe than sorry.  I don't think we need this for
      2.6.29 though since there's nothing immediately broken by it.
      
      This patch also moves the GRO_* return values to netdevice.h since
      VLAN needs them too (I tried to avoid this originally but alas
      this seems to be the easiest way out).  This fixes a bug in VLAN
      where it continued to use the old return value 2 instead of the
      correct GRO_DROP.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 14 March 2009, 1 commit
  22. 05 March 2009, 2 commits
  23. 03 March 2009, 1 commit
    • netns: Remove net_alive · 17edde52
      Committed by Eric W. Biederman
      It turns out that net_alive is unnecessary, and the original problem
      that led to it being added was simply that the icmp code thought
      it was a network device and wound up being unable to handle packets
      while there were still packets in the network namespace.
      
      Now that icmp and tcp have been fixed to properly register themselves
      this problem is no longer present and we have a stronger guarantee
      that packets will not arrive in a network namespace than that provided
      by net_alive in netif_receive_skb.  So remove net_alive, allowing
      packet reception to run a little faster.
      
      Additionally document the strong reason why network namespace cleanup
      is safe so that if something happens again someone else will have
      a chance of figuring it out.
      Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 01 March 2009, 1 commit
    • netpoll: Add drop checks to all entry points · 4ead4431
      Committed by Herbert Xu
      The netpoll entry checks are required to ensure that we don't
      receive normal packets when invoked via netpoll.  Unfortunately
      it only ever worked for the netif_receive_skb/netif_rx entry
      points.  The VLAN (and subsequently GRO) entry point didn't
      have the check and therefore can trigger all sorts of weird
      problems.
      
      This patch adds the netpoll check to all entry points.
      
      I'm still uneasy with receiving at all under netpoll (which
      apparently is only used by the out-of-tree kdump code).  The
      reason is it is perfectly legal to receive all data including
      headers into highmem if netpoll is off, but if you try to do
      that with netpoll on and someone gets a printk in an IRQ handler
      you're going to get a nice BUG_ON.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
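The idea of putting the same check in front of every entry point can be sketched like this, assuming a single global flag that is true while the stack is driven via netpoll. The function names echo the entry points mentioned above but are hypothetical userspace stand-ins; the real netpoll check is more involved than a single flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical return codes for this sketch. */
enum { MODEL_RX_SUCCESS = 0, MODEL_RX_DROP = 1 };

/* Hypothetical flag: true while the stack is being driven via netpoll. */
static bool model_in_netpoll;
/* Number of packets that reached the normal receive path. */
static int model_delivered;

/* The single shared check that every entry point runs first. */
static bool model_netpoll_should_drop(void)
{
    return model_in_netpoll;
}

static int model_deliver(void)
{
    model_delivered++;
    return MODEL_RX_SUCCESS;
}

static int model_netif_rx(void)
{
    if (model_netpoll_should_drop())
        return MODEL_RX_DROP;
    return model_deliver();
}

static int model_netif_receive_skb(void)
{
    if (model_netpoll_should_drop())
        return MODEL_RX_DROP;
    return model_deliver();
}

/* The VLAN/GRO path used to lack the check; here it has it as well. */
static int model_vlan_gro_receive(void)
{
    if (model_netpoll_should_drop())
        return MODEL_RX_DROP;
    return model_deliver();
}
```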
  25. 23 February 2009, 1 commit
    • netns: Remove net_alive · ce16c533
      Committed by Eric W. Biederman
      It turns out that net_alive is unnecessary, and the original problem
      that led to it being added was simply that the icmp code thought
      it was a network device and wound up being unable to handle packets
      while there were still packets in the network namespace.
      
      Now that icmp and tcp have been fixed to properly register themselves
      this problem is no longer present and we have a stronger guarantee
      that packets will not arrive in a network namespace than that provided
      by net_alive in netif_receive_skb.  So remove net_alive, allowing
      packet reception to run a little faster.
      
      Additionally document the strong reason why network namespace cleanup
      is safe so that if something happens again someone else will have
      a chance of figuring it out.
      Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  26. 21 February 2009, 1 commit
  27. 19 February 2009, 1 commit
  28. 16 February 2009, 2 commits
    • d24fff22
    • net: infrastructure for hardware time stamping · ac45f602
      Committed by Patrick Ohly
      The additional per-packet information (16 bytes for time stamps, 1
      byte for flags) is stored for all packets in the skb_shared_info
      struct. This implementation detail is hidden from users of that
      information via skb_* accessor functions. A separate struct (time stamps)
      and union (flags) are used for the additional information so that it can
      be stored/copied easily outside of skb_shared_info.
      
      Compared to previous implementations (reusing the tstamp field
      depending on the context, optional additional structures) this
      is the simplest solution. It does not extend sk_buff itself.
      
      TX time stamping is implemented in software if the device driver
      doesn't support hardware time stamping.
      
      The new semantic for hardware/software time stamping around
      ndo_start_xmit() is based on two assumptions about existing
      network device drivers which don't support hardware time
      stamping and know nothing about it:
       - they leave the new skb_shared_tx unmodified
       - they keep the connection to the originating socket in skb->sk
         alive, i.e., don't call skb_orphan()
      
      Given that skb_shared_tx is new, the first assumption is safe.
      The second is only true for some drivers. As a result, software
      TX time stamping currently works with the bnx2 driver, but not
      with the unmodified igb driver (the two drivers this patch series
      was tested with).
      Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
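The ndo_start_xmit() contract described above can be sketched as a decision taken after the driver returns, assuming an illustrative bit layout for the 1-byte flags field: a software stamp is produced only if one was requested, the driver did not mark a hardware stamp as in progress, and the skb is still attached to its socket (no skb_orphan()). The MODEL_TX_* bits and model_need_sw_tstamp() are hypothetical and do not reproduce the layout of skb_shared_tx.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit layout for the 1-byte per-packet TX flags. */
#define MODEL_TX_HW_REQ      (1u << 0) /* hardware TX stamp requested */
#define MODEL_TX_SW_REQ      (1u << 1) /* software TX stamp requested */
#define MODEL_TX_IN_PROGRESS (1u << 2) /* driver will deliver a HW stamp */

/*
 * Called after the driver's start_xmit returned: should the core take a
 * software TX time stamp for this packet?
 */
static bool model_need_sw_tstamp(uint8_t tx_flags, bool sk_attached)
{
    /* A hardware-capable driver marked the packet, so it owns stamping. */
    if (tx_flags & MODEL_TX_IN_PROGRESS)
        return false;
    /* The stamp is reported via skb->sk, so an orphaned skb cannot get one. */
    if (!sk_attached)
        return false;
    return (tx_flags & MODEL_TX_SW_REQ) != 0;
}
```

The sk_attached parameter captures why the commit reports the fallback working with bnx2 but not with the unmodified igb driver.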
  29. 09 February 2009, 2 commits
  30. 07 February 2009, 1 commit
  31. 06 February 2009, 1 commit
    • gro: Fix frag_list merging on imprecisely split packets · 56035022
      Committed by Herbert Xu
      The previous fix ad0f9904 (gro:
      Fix handling of imprecisely split packets) only fixed the case
      of frags merging; frag_list merging in the same circumstances
      was still broken.
      
      In particular, the packet headers end up in the data stream.
      
      This patch fixes this plus another issue where an imprecisely
      split packet header may be read incorrectly (this is mostly
      harmless since it'll simply cause the packet to not match and
      be rejected for GRO).
      
      Thanks to Emil Tantilov and Jeff Kirsher for helping to track
      this down.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  32. 05 February 2009, 1 commit
    • net: Partially allow skb destructors to be used on receive path · 9a279bcb
      Committed by Herbert Xu
      As it currently stands, skb destructors are forbidden on the
      receive path because the protocol end-points will overwrite
      any existing destructor with their own.
      
      This is the reason why we have to call skb_orphan in the loopback
      driver before we reinject the packet back into the stack, thus
      creating a period during which loopback traffic isn't charged
      to any socket.
      
      With virtualisation, we have a similar problem in that traffic
      is reinjected into the stack without being associated with any
      socket entity, thus providing no natural congestion push-back
      for those poor folks still stuck with UDP.
      
      Now had we been consistent in telling them that UDP simply has
      no congestion feedback, I could just fob them off.  Unfortunately,
      we appear to have gone to some lengths to cater for this on
      the standard UDP path, with skb/socket accounting, and that has
      created a very unhealthy dependency.
      
      Alas habits are difficult to break out of, so we may just have
      to allow skb destructors on the receive path.
      
      It turns out that making skb destructors usable on the receive path
      isn't as easy as it seems.  For instance, simply adding skb_orphan
      to skb_set_owner_r isn't enough.  This is because we assume all
      over the IP stack that skb->sk is an IP socket if present.
      
      The new transparent proxy code goes one step further and assumes
      that skb->sk is the receiving socket if present.
      
      Now all of this can be dealt with by adding simple checks such
      as only treating skb->sk as an IP socket if skb->sk->sk_family
      matches.  However, it turns out that for bridging at least we
      don't need to do all of this work.
      
      This is of interest because most virtualisation setups use bridging
      so we don't actually go through the IP stack on the host (with
      the exception of our old nemesis the bridge netfilter, but that's
      easily taken care of).
      
      So this patch simply adds skb_orphan to the point just before we
      enter the IP stack, but after we've gone through the bridge on the
      receive path.  It also adds an skb_orphan to the one place in
      netfilter that touches skb->sk/skb->destructor, that is, tproxy.
      
      One word of caution: because of the internal code structure, anyone
      wishing to deploy this must use skb_set_owner_w as opposed to
      skb_set_owner_r, since many functions that create a new skb from
      an existing one will invoke skb_set_owner_w on the new skb.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
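A sketch of the orphaning step, assuming simplified skb and socket structs: skb_set_owner_w-style ownership charges the buffer to the sending socket and installs a destructor, and orphaning runs that destructor once and clears both fields before the packet enters the IP stack. The model_* names are hypothetical, although model_skb_orphan() follows the same run-destructor-then-clear pattern as the kernel's skb_orphan().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical socket: only the write-memory charge matters here. */
struct model_sock {
    long wmem_alloc;
};

/* Hypothetical skb: owning socket, destructor and accounted size. */
struct model_skb {
    struct model_sock *sk;
    void (*destructor)(struct model_skb *skb);
    unsigned int truesize;
};

/* Destructor: return the buffer's charge to the owning socket. */
static void model_sock_wfree(struct model_skb *skb)
{
    skb->sk->wmem_alloc -= (long)skb->truesize;
}

/* Charge the skb to a sending socket, in the style of skb_set_owner_w(). */
static void model_set_owner_w(struct model_skb *skb, struct model_sock *sk)
{
    skb->sk = sk;
    skb->destructor = model_sock_wfree;
    sk->wmem_alloc += (long)skb->truesize;
}

/* Run the destructor once, then detach the skb from any socket. */
static void model_skb_orphan(struct model_skb *skb)
{
    if (skb->destructor)
        skb->destructor(skb);
    skb->destructor = NULL;
    skb->sk = NULL;
}

/* Receive path, after the bridge and just before the IP stack. */
static void model_ip_stack_entry(struct model_skb *skb)
{
    model_skb_orphan(skb);
    /* From here on, skb->sk is free for IP (or tproxy) to claim. */
}
```

Because orphaning clears the destructor, calling it a second time is harmless, which is what lets the patch add it at the IP entry point without tracking whether the bridge path already did so.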
  33. 01 February 2009, 1 commit
    • gro: Fix handling of imprecisely split packets · ad0f9904
      Committed by Herbert Xu
      The commit 89a1b249edcf9be884e71f92df84d48355c576aa (gro: Avoid
      copying headers of unmerged packets) only worked for packets
      which are either completely linear, completely non-linear, or
      packets which exactly split at the boundary between headers and
      payload.
      
      Anything else would cause bits in the header to go missing if
      the packet is held by GRO.
      
      This may have broken drivers such as ixgbe.
      
      This patch fixes the places that assumed or only worked with
      the above cases.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
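The general lesson of this fix is not to assume that the headers sit entirely in the linear area. A hedged sketch of a copy helper that works wherever the linear/fragment boundary falls, in the spirit of the kernel's skb_copy_bits(), but with a hypothetical two-part packet struct limited to a single fragment:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical packet: a linear head plus one paged fragment. */
struct model_pkt {
    const unsigned char *linear;
    size_t linear_len;
    const unsigned char *frag;
    size_t frag_len;
};

/*
 * Copy len bytes starting at off into out, regardless of where the
 * linear/fragment split falls. Returns 0 on success, -1 if out of range.
 */
static int model_pkt_copy(const struct model_pkt *p, size_t off,
                          void *out, size_t len)
{
    unsigned char *dst = out;
    size_t total = p->linear_len + p->frag_len;

    if (off > total || len > total - off)
        return -1;

    /* Part of the range that lies in the linear area, if any. */
    if (off < p->linear_len) {
        size_t n = p->linear_len - off;

        if (n > len)
            n = len;
        memcpy(dst, p->linear + off, n);
        dst += n;
        off += n;
        len -= n;
    }

    /* Remainder comes from the fragment; off >= linear_len here. */
    if (len)
        memcpy(dst, p->frag + (off - p->linear_len), len);
    return 0;
}
```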