1. 27 Aug, 2010 1 commit
    • gro: __napi_gro_receive() optimizations · 40d0802b
      Eric Dumazet committed
      compare_ether_header() can have a special implementation on 64 bit
      arches if CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is defined.
      
      __napi_gro_receive() and vlan_gro_common() can avoid a conditional
      branch to perform device match.
      
      On x86_64, __napi_gro_receive() now has 38 instructions instead of 53.
      
      As gcc-4.4.3 still chooses not to inline it, add the inline keyword to
      this performance-critical function.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
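The two ideas in this commit message (a 64-bit header compare and a branch-free device match) can be sketched in user-space C. The names `ether_header_diff` and `same_flow` are illustrative only and are not taken from the kernel; `memcpy` stands in for the unaligned 64-bit loads that would be done directly where unaligned access is efficient. This is a minimal sketch under those assumptions, not the patch itself.

```c
#include <stdint.h>
#include <string.h>

/*
 * Compare two 14-byte Ethernet headers with two overlapping 64-bit
 * loads (bytes 0-7 and bytes 6-13). Returns 0 when the headers are
 * equal and a non-zero value otherwise.
 */
static inline uint64_t ether_header_diff(const void *a, const void *b)
{
	const unsigned char *pa = a, *pb = b;
	uint64_t a0, b0, a1, b1;

	memcpy(&a0, pa, 8);
	memcpy(&b0, pb, 8);
	memcpy(&a1, pa + 6, 8);
	memcpy(&b1, pb + 6, 8);

	return (a0 ^ b0) | (a1 ^ b1);
}

/*
 * Fold the device match into the same value instead of testing it
 * with a separate conditional branch: OR both differences together
 * and test once.
 */
static inline int same_flow(uintptr_t dev_a, uintptr_t dev_b,
			    const void *hdr_a, const void *hdr_b)
{
	uint64_t diff = (uint64_t)(dev_a ^ dev_b);

	diff |= ether_header_diff(hdr_a, hdr_b);
	return diff == 0;
}
```

The overlapping second load is what lets 14 bytes be covered by two 8-byte reads without touching memory past the header.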
  2. 23 Aug, 2010 2 commits
  3. 22 Aug, 2010 1 commit
  4. 20 Aug, 2010 3 commits
  5. 19 Aug, 2010 1 commit
  6. 18 Aug, 2010 1 commit
  7. 17 Aug, 2010 1 commit
    • core: Factor out flow calculation from get_rps_cpu · bfb564e7
      Krishna Kumar committed
      Factor out flow calculation code from get_rps_cpu, since other
      functions can use the same code.
      
      Revisions:
      
      v2 (Ben): Separate flow calculation out and use in select queue.
      v3 (Arnd): Don't re-implement MIN.
      v4 (Changli): skb->data points to ethernet header in macvtap, and
      	make a fast path. Tested macvtap with this patch.
      v5 (Changli):
      	- Cache skb->rxhash in skb_get_rxhash
      	- macvtap may not have pow(2) queues, so change code for
      	  queue selection.
          (Arnd):
      	- Use first available queue if all fails.
      Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
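The v5 notes mention two things that are easy to get wrong: the queue count may not be a power of two, and there should be a fallback to the first available queue. A hypothetical sketch of such a selection follows; `pick_queue` and the `slots` array are invented for illustration and do not reproduce the macvtap code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Choose a queue slot from a flow hash when the number of slots is
 * not guaranteed to be a power of two, so "hash & (n - 1)" cannot be
 * used. A slot value of 0 means "no queue attached here".
 * Returns the chosen index, or -1 if no slot is in use.
 */
static int pick_queue(const int *slots, size_t nslots, uint32_t hash)
{
	size_t i;

	if (nslots == 0)
		return -1;

	/* Fast path: a non-zero hash maps directly onto a slot. */
	if (hash != 0 && slots[hash % nslots])
		return (int)(hash % nslots);

	/* Fallback: use the first available queue. */
	for (i = 0; i < nslots; i++)
		if (slots[i])
			return (int)i;

	return -1;
}
```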
  8. 08 Aug, 2010 1 commit
  9. 06 Aug, 2010 1 commit
  10. 03 Aug, 2010 2 commits
  11. 01 Aug, 2010 1 commit
  12. 26 Jul, 2010 1 commit
  13. 20 Jul, 2010 1 commit
  14. 19 Jul, 2010 1 commit
    • net: support time stamping in phy devices. · c1f19b51
      Richard Cochran committed
      This patch adds a new networking option to allow hardware time stamps
      from PHY devices. When enabled, likely candidates among incoming and
      outgoing network packets are offered to the PHY driver for possible
      time stamping. When accepted by the PHY driver, incoming packets are
      deferred for later delivery by the driver.
      
      The patch also adds phylib driver methods for the SIOCSHWTSTAMP ioctl
      and callbacks for transmit and receive time stamping. Drivers may
      optionally implement these functions.
      Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 15 Jul, 2010 2 commits
    • net: fix problem in reading sock TX queue · b0f77d0e
      Tom Herbert committed
      Fix problem in reading the tx_queue recorded in a socket.  In
      dev_pick_tx, the TX queue is read by doing a check with
      sk_tx_queue_recorded on the socket, followed by a sk_tx_queue_get.
      The problem is that there is no mutual exclusion across these
      calls in the socket, so it is possible that the queue in the
      sock is invalidated after sk_tx_queue_recorded is called.
      sk_tx_queue_get then returns -1, which sets 65535 in queue_index,
      and thus dev_pick_tx returns 65536, which is a bogus queue and
      can cause a crash in dev_queue_xmit.
      
      We fix this by only calling sk_tx_queue_get which does the proper
      checks.  The interface is that sk_tx_queue_get returns the TX queue
      if the sock argument is non-NULL and TX queue is recorded, else it
      returns -1.  sk_tx_queue_recorded is no longer used so it can be
      completely removed.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
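The fix described above replaces a check-then-read pair with a single read whose return value carries both answers. A minimal user-space sketch of that interface, assuming a hypothetical `struct fake_sock` and helper names that are not the kernel's:

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the socket; -1 means "no queue recorded". */
struct fake_sock {
	atomic_int tx_queue;
};

/*
 * One read does both jobs: it reports whether a queue is recorded
 * and, if so, which one. A null socket is treated the same as
 * "nothing recorded".
 */
static int fake_tx_queue_get(struct fake_sock *sk)
{
	return sk ? atomic_load(&sk->tx_queue) : -1;
}

/* Use the recorded queue if there is one, else the caller's fallback. */
static unsigned int pick_tx(struct fake_sock *sk, unsigned int fallback)
{
	int q = fake_tx_queue_get(sk);

	if (q < 0)
		return fallback;
	return (unsigned int)q;
}
```

Because the value is read once and tested afterwards, a concurrent invalidation can no longer slip in between "is it recorded?" and "what is it?".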
    • net: skb_tx_hash() fix relative to skb_orphan_try() · 87fd308c
      Eric Dumazet committed
      commit fc6055a5 (net: Introduce skb_orphan_try()) added early
      orphaning of skbs.
      
      This unfortunately added a performance regression in skb_tx_hash() in
      case of stacked devices (bonding, vlans, ...)
      
      Since skb->sk is now NULL, we cannot access sk->sk_hash anymore to
      spread tx packets to multiple NIC queues on multiqueue devices.
      
      skb_tx_hash() in this case only uses skb->protocol, same value for all
      flows.
      
      skb_orphan_try() can copy sk->sk_hash into skb->rxhash and skb_tx_hash()
      can use this saved sk_hash value to compute its internal hash value.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
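Why a protocol-only hash hurts can be seen in a small sketch. The helper names below are hypothetical, and any additional hash mixing a real implementation would apply is left out; the point is only that a constant input always lands on the same queue, while a per-socket value spreads flows.

```c
#include <stdint.h>

/*
 * Pick the hash input: prefer a per-flow value saved from the socket,
 * fall back to the protocol number, which is identical for all flows
 * of that protocol.
 */
static uint32_t tx_hash_input(uint32_t saved_sk_hash, uint16_t protocol)
{
	return saved_sk_hash ? saved_sk_hash : protocol;
}

/* Scale a 32-bit hash into [0, nqueues) without a division. */
static uint16_t hash_to_queue(uint32_t hash, uint16_t nqueues)
{
	return (uint16_t)(((uint64_t)hash * nqueues) >> 32);
}
```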
  16. 10 Jul, 2010 2 commits
    • net: Document that dev_get_stats() returns the given pointer · d7753516
      Ben Hutchings committed
      Document that dev_get_stats() returns the same stats pointer it was
      given.  Remove const qualification from the returned pointer since the
      caller may do what it likes with that structure.
      Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Get rid of rtnl_link_stats64 / net_device_stats union · 3cfde79c
      Ben Hutchings committed
      In commit be1f3c2c "net: Enable 64-bit
      net device statistics on 32-bit architectures" I redefined struct
      net_device_stats so that it could be used in a union with struct
      rtnl_link_stats64, avoiding the need for explicit copying or
      conversion between the two.  However, this is unsafe because there is
      no locking required and no lock consistently held around calls to
      dev_get_stats() and use of the statistics structure it returns.
      
      In commit 28172739 "net: fix 64 bit
      counters on 32 bit arches" Eric Dumazet dealt with that problem by
      requiring callers of dev_get_stats() to provide storage for the
      result.  This means that the net_device::stats64 field and the padding
      in struct net_device_stats are now redundant, so remove them.
      
      Update the comment on net_device_ops::ndo_get_stats64 to reflect its
      new usage.
      
      Change dev_txq_stats_fold() to use struct rtnl_link_stats64, since
      that is what all its callers are really using and it is no longer
      going to be compatible with struct net_device_stats.
      
      Eric Dumazet suggested the separate function for the structure
      conversion.
      Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
      Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 08 Jul, 2010 1 commit
    • net: fix 64 bit counters on 32 bit arches · 28172739
      Eric Dumazet committed
      There is a small possibility that a reader gets incorrect values on 32
      bit arches. SNMP applications could catch incorrect counters when a
      32bit high part is changed by another stats consumer/provider.
      
      One way to solve this is to add a rtnl_link_stats64 param to all
      ndo_get_stats64() methods, and also add such a parameter to
      dev_get_stats().
      
      The rule is that dev->stats64 must not be used as temporary storage
      for 64-bit stats; a caller-provided area (usually on the stack) is
      used instead.
      
      Old drivers (only providing get_stats() method) need no changes.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
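The caller-provided-storage rule can be illustrated with a short sketch. The structure and function names here are hypothetical stand-ins, not the kernel definitions; the shape to notice is that the function writes only into the buffer it was handed and returns that same pointer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-bit snapshot filled in for the caller. */
struct fake_stats64 {
	uint64_t rx_packets;
	uint64_t tx_packets;
};

/* Hypothetical device with native-width counters. */
struct fake_device {
	unsigned long rx_packets;
	unsigned long tx_packets;
};

/*
 * Convert the device counters into the caller's storage and return
 * that same pointer. No shared per-device 64-bit area is written, so
 * two readers never see each other's half-updated values.
 */
static struct fake_stats64 *fake_get_stats(const struct fake_device *dev,
					   struct fake_stats64 *storage)
{
	memset(storage, 0, sizeof(*storage));
	storage->rx_packets = dev->rx_packets;
	storage->tx_packets = dev->tx_packets;
	return storage;
}
```

A caller would typically declare the `struct fake_stats64` on its own stack and pass its address in.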
  18. 05 Jul, 2010 1 commit
  19. 03 Jul, 2010 1 commit
    • net: decreasing real_num_tx_queues needs to flush qdisc · f0796d5c
      John Fastabend committed
      Reducing real_num_tx_queues needs to flush the qdisc, otherwise
      skbs with queue_mappings greater than real_num_tx_queues can
      be sent to the underlying driver.
      
      The flow for this is:
      
      dev_queue_xmit()
      	dev_pick_tx()
      		skb_tx_hash()  => hash using real_num_tx_queues
      		skb_set_queue_mapping()
      	...
      	qdisc_enqueue_root() => enqueue skb on txq from hash
      ...
      dev->real_num_tx_queues -= n
      ...
      sch_direct_xmit()
      	dev_hard_start_xmit()
      		ndo_start_xmit(skb,dev) => skb queue set with old hash
      
      skbs are enqueued on the qdisc with skb->queue_mapping set so that
      0 < queue_mapping < real_num_tx_queues.  When the driver
      decreases real_num_tx_queues, skbs may be dequeued from the
      qdisc with a queue_mapping greater than real_num_tx_queues.
      
      This fixes a case in ixgbe where this was occurring with DCB
      and FCoE. Because the driver is using queue_mapping to map
      skbs to tx descriptor rings we can potentially map skbs to
      rings that no longer exist.
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Tested-by: Ross Brattain <ross.b.brattain@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
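The hazard in the flow above is that a mapping computed against the old queue count outlives the count. A toy model, with invented names (`fake_txdev`, `count_stale`, `set_real_num_tx_queues`) and a fixed-size array standing in for the qdisc, shows both the stale state and a flush-on-shrink remedy:

```c
#include <stddef.h>

#define FAKE_MAX_PENDING 8

struct fake_txdev {
	unsigned int real_num_tx_queues;
	unsigned int pending[FAKE_MAX_PENDING]; /* queue_mapping per queued packet */
	size_t npending;
};

/* Count queued packets whose mapping no longer names a live ring. */
static size_t count_stale(const struct fake_txdev *dev)
{
	size_t i, stale = 0;

	for (i = 0; i < dev->npending; i++)
		if (dev->pending[i] >= dev->real_num_tx_queues)
			stale++;
	return stale;
}

/*
 * Change the queue count. When the count shrinks, drop everything
 * already queued so no packet carrying an old mapping reaches the
 * driver.
 */
static void set_real_num_tx_queues(struct fake_txdev *dev, unsigned int n)
{
	if (n < dev->real_num_tx_queues)
		dev->npending = 0;	/* flush */
	dev->real_num_tx_queues = n;
}
```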
  20. 01 Jul, 2010 1 commit
  21. 24 Jun, 2010 1 commit
  22. 16 Jun, 2010 2 commits
  23. 13 Jun, 2010 1 commit
    • net: Enable 64-bit net device statistics on 32-bit architectures · be1f3c2c
      Ben Hutchings committed
      Use struct rtnl_link_stats64 as the statistics structure.
      
      On 32-bit architectures, insert 32 bits of padding after/before each
      field of struct net_device_stats to make its layout compatible with
      struct rtnl_link_stats64.  Add an anonymous union in net_device; move
      stats into the union and add struct rtnl_link_stats64 stats64.
      
      Add net_device_ops::ndo_get_stats64, implementations of which will
      return a pointer to struct rtnl_link_stats64.  Drivers that implement
      this operation must not update the structure asynchronously.
      
      Change dev_get_stats() to call ndo_get_stats64 if available, and to
      return a pointer to struct rtnl_link_stats64.  Change callers of
      dev_get_stats() accordingly.
      Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 11 Jun, 2010 1 commit
    • net: deliver skbs on inactive slaves to exact matches · 597a264b
      John Fastabend committed
      Currently, the accelerated receive path for VLANs will
      drop packets if the real device is an inactive slave and
      the packet is not one of the special pkts tested for in
      skb_bond_should_drop().  This behavior is different from
      the non-accelerated path and for pkts over a bonded vlan.
      
      For example,
      
      vlanx -> bond0 -> ethx
      
      will be dropped in the vlan path and not delivered to any
      packet handlers at all.  However,
      
      bond0 -> vlanx -> ethx
      
      and
      
      bond0 -> ethx
      
      will be delivered to handlers that match the exact dev,
      because the VLAN path checks the real_dev which is not a
      slave and netif_recv_skb() doesn't drop frames but only
      delivers them to exact matches.
      
      This patch adds a sk_buff flag which is used for tagging
      skbs that would previously have been dropped and allows the
      skb to continue to skb_netif_recv().  Here we add
      logic to check for the deliver_no_wcard flag and, if it
      is set, only deliver to handlers that match exactly.  This
      makes both paths above consistent and gives pkt handlers
      a way to identify skbs that come from inactive slaves.
      Without this patch, in some configurations skbs will be
      delivered to handlers with exact matches and in others
      be dropped outright in the vlan path.
      
      I have tested the following 4 configurations in failover modes
      and load balancing modes.
      
      # bond0 -> ethx
      
      # vlanx -> bond0 -> ethx
      
      # bond0 -> vlanx -> ethx
      
      # bond0 -> ethx
                  |
        vlanx -> --
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
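The exact-match rule tied to the deliver_no_wcard flag can be sketched as a single predicate. The types and the function name `should_deliver` are hypothetical, and the real receive path considers more device pointers than this; the sketch only shows how a per-packet flag suppresses wildcard handlers while still allowing exact ones.

```c
#include <stdbool.h>
#include <stddef.h>

struct fake_dev {
	int ifindex;
};

struct fake_handler {
	const struct fake_dev *dev;	/* NULL means "any device" (wildcard) */
};

/*
 * Decide whether a handler sees a packet received on rx_dev.
 * When deliver_no_wcard is set, wildcard handlers are skipped and
 * only handlers bound to exactly this device match.
 */
static bool should_deliver(const struct fake_handler *h,
			   const struct fake_dev *rx_dev,
			   bool deliver_no_wcard)
{
	if (h->dev == NULL)
		return !deliver_no_wcard;
	return h->dev == rx_dev;
}
```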
  25. 10 Jun, 2010 1 commit
  26. 08 Jun, 2010 1 commit
    • anycast: Some RCU conversions · bb69ae04
      Eric Dumazet committed
      - dev_get_by_flags() changed to dev_get_by_flags_rcu()
      
      - ipv6_sock_ac_join() doesn't touch dev & idev refcounts
      - ipv6_sock_ac_drop() doesn't touch dev & idev refcounts
      - ipv6_sock_ac_close() doesn't touch dev & idev refcounts
      - ipv6_dev_ac_dec() doesn't touch idev refcount
      - ipv6_chk_acast_addr() doesn't touch idev refcount
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  27. 07 Jun, 2010 1 commit
    • net: Remove unnecessary net action assertion · 271c1dfa
      jamal committed
      The extra assertion to allow packet munging only when there are
      no other ptypes listening which may have worked around an old bug
      is unnecessary. It is sufficient to check if the skb is cloned before
      trampling on it. Thanks to Herbert Xu for being persistent and patient
      in getting this across.
      [Note that cloning checks and assertions are the general rule used
      by tc actions (documentation/networking/tc-actions-env-rules.txt)].
      Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  28. 05 Jun, 2010 1 commit
  29. 02 Jun, 2010 4 commits
    • net: replace hooks in __netif_receive_skb V5 · ab95bfe0
      Jiri Pirko committed
      This patch removes two receive frame hooks (for bridge and for
      macvlan) from __netif_receive_skb and replaces them with a single
      hook for both. It only supports one hook per device because it makes no
      sense to do bridging and macvlan on the same device.
      
      A network driver (of a virtual netdev like macvlan or bridge) can then
      register an rx_handler for the needed net device.
      Signed-off-by: Jiri Pirko <jpirko@redhat.com>
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
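The "one hook per device" rule can be modelled in a few lines. Everything below (`fake_netdev`, `fake_rx_handler_register`, the handler signature, the -EBUSY return) is an illustrative assumption rather than the interface added by the patch; it shows a single handler slot that refuses a second registration.

```c
#include <errno.h>
#include <stddef.h>

typedef int (*fake_rx_handler_t)(void *frame);

struct fake_netdev {
	fake_rx_handler_t rx_handler;	/* NULL when no hook is registered */
};

/* Register the device's single receive hook; refuse if one is present. */
static int fake_rx_handler_register(struct fake_netdev *dev,
				    fake_rx_handler_t fn)
{
	if (dev->rx_handler)
		return -EBUSY;
	dev->rx_handler = fn;
	return 0;
}

static void fake_rx_handler_unregister(struct fake_netdev *dev)
{
	dev->rx_handler = NULL;
}

/* Receive path: hand the frame to the hook if there is one. */
static int fake_receive(struct fake_netdev *dev, void *frame)
{
	if (dev->rx_handler)
		return dev->rx_handler(frame);
	return 0;	/* not consumed, continue normal processing */
}

/* Example hook that consumes every frame. */
static int consume_all(void *frame)
{
	(void)frame;
	return 1;
}
```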
    • net: add additional lock to qdisc to increase throughput · 79640a4c
      Eric Dumazet committed
      When many cpus compete for sending frames on a given qdisc, the qdisc
      spinlock suffers from very high contention.
      
      The cpu owning __QDISC_STATE_RUNNING bit has same priority to acquire
      the lock, and cannot dequeue packets fast enough, since it must wait for
      this lock for each dequeued packet.
      
      One solution to this problem is to force all cpus spinning on a second
      lock before trying to get the main lock, when/if they see
      __QDISC_STATE_RUNNING already set.
      
      The owning cpu then competes with at most one other cpu for the main
      lock, allowing for a higher dequeueing rate.
      
      Based on a previous patch from Alexander Duyck. I added the heuristic to
      avoid the atomic in fast path, and put the new lock far away from the
      cache line used by the dequeue worker. Also try to release the busylock
      as late as possible.
      
      Tests with following script gave a boost from ~50.000 pps to ~600.000
      pps on a dual quad core machine (E5450 @3.00GHz), tg3 driver.
      (A single netperf flow can reach ~800.000 pps on this platform)
      
      for j in `seq 0 3`; do
        for i in `seq 0 7`; do
          netperf -H 192.168.0.1 -t UDP_STREAM -l 60 -N -T $i -- -m 6 &
        done
      done
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
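The two-lock idea can be sketched with C11 atomics. This is a simplified model with made-up names (`fake_qdisc`, `fake_enqueue`) and plain spinning; it omits the dequeue side and the late-release detail mentioned in the message. It shows only the heuristic: senders that observe the running flag queue up on a second lock first, so at most one of them contends for the main lock at a time.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct fake_qdisc {
	atomic_flag busylock;	/* taken only by senders that saw "running" */
	atomic_flag root_lock;	/* main queue lock */
	atomic_bool running;	/* stands in for the RUNNING state bit */
	unsigned int qlen;
};

static void fake_qdisc_init(struct fake_qdisc *q)
{
	atomic_flag_clear(&q->busylock);
	atomic_flag_clear(&q->root_lock);
	atomic_init(&q->running, false);
	q->qlen = 0;
}

static void spin_acquire(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin */
}

static void spin_release(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

static void fake_enqueue(struct fake_qdisc *q)
{
	/* Cheap read first, so the uncontended path takes no extra lock. */
	bool contended = atomic_load(&q->running);

	if (contended)
		spin_acquire(&q->busylock);

	spin_acquire(&q->root_lock);
	q->qlen++;
	spin_release(&q->root_lock);

	if (contended)
		spin_release(&q->busylock);
}
```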
    • net: fix conflict between null_or_orig and null_or_bond · 2df4a0fa
      John Fastabend committed
      If a skb is received on an inactive bond that does not meet
      the special cases checked for by skb_bond_should_drop it should
      only be delivered to exact matches as the comment in
      netif_receive_skb() says.
      
      However because null_or_bond could also be null this is not
      always true.  This patch renames null_or_bond to orig_or_bond
      and initializes it to orig_dev.  This keeps the intent of
      null_or_bond to pass frames received on VLAN interfaces stacked
      on bonding interfaces without invalidating the statement for
      null_or_orig.
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Define accessors to manipulate QDISC_STATE_RUNNING · bc135b23
      Eric Dumazet committed
      Define three helpers to manipulate the QDISC_STATE_RUNNING flag, which a
      second patch will move to another location.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  30. 31 May, 2010 1 commit