1. 06 Oct, 2014 1 commit
  2. 04 Oct, 2014 1 commit
    • qdisc: validate skb without holding lock · 55a93b3e
      Eric Dumazet committed
      Validation of an skb can be pretty expensive:
      
      GSO segmentation and/or checksum computations.
      
      We can do this without holding qdisc lock, so that other cpus
      can queue additional packets.
      
      The trick is that requeued packets were already validated, so we carry
      a boolean so that sch_direct_xmit() can either validate a fresh skb list
      or directly use an old one.
      
      Tested on a 40Gb NIC (8 TX queues) with 200 concurrent flows, on a
      48-thread host.
      
      Turning TSO on or off had no effect on throughput; it only cost a few
      more CPU cycles. Lock contention on the qdisc lock disappeared.
      
      Same if disabling TX checksum offload.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
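The locking pattern this commit describes can be sketched in user-space C. This is a hedged illustration only, not the kernel code: `struct pkt`, `struct txq`, `expensive_validate()`, `dequeue_for_xmit()` and `requeue()` are hypothetical names, and a plain boolean stands in for the qdisc lock so the sketch can assert where validation happens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb. */
struct pkt {
    struct pkt *next;
    bool validated;      /* set once the expensive checks have run */
    int validate_calls;  /* instrumentation for this sketch only */
};

/* Hypothetical stand-in for a qdisc. */
struct txq {
    bool locked;           /* models the qdisc lock (assumption: no real threads) */
    struct pkt *fresh;     /* packets that were never validated */
    struct pkt *requeued;  /* packets handed back after a failed transmit */
};

static void txq_lock(struct txq *q)   { assert(!q->locked); q->locked = true; }
static void txq_unlock(struct txq *q) { assert(q->locked);  q->locked = false; }

/* Stand-in for GSO segmentation and checksum computation. */
static void expensive_validate(const struct txq *q, struct pkt *p)
{
    assert(!q->locked);  /* the point of the change: never run under the lock */
    p->validate_calls++;
    p->validated = true;
}

/* Pop one packet under the lock, then validate it after unlocking.
 * A boolean records whether the packet came from the requeue list,
 * in which case it was validated on an earlier attempt. */
static struct pkt *dequeue_for_xmit(struct txq *q)
{
    struct pkt *p;
    bool validate = true;

    txq_lock(q);
    if (q->requeued) {
        p = q->requeued;
        q->requeued = p->next;
        validate = false;
    } else {
        p = q->fresh;
        if (p)
            q->fresh = p->next;
    }
    txq_unlock(q);

    if (!p)
        return NULL;
    p->next = NULL;
    if (validate)
        expensive_validate(q, p);
    return p;
}

/* Hand a packet back after a failed transmit; it keeps its validated state. */
static void requeue(struct txq *q, struct pkt *p)
{
    txq_lock(q);
    p->next = q->requeued;
    q->requeued = p;
    txq_unlock(q);
}
```

In this model a packet that is dequeued, requeued, and dequeued again is validated exactly once, and the lock is only held for the list manipulation.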
  3. 27 Sep, 2014 1 commit
  4. 26 Sep, 2014 1 commit
  5. 23 Sep, 2014 1 commit
  6. 16 Sep, 2014 1 commit
  7. 14 Sep, 2014 3 commits
  8. 04 Sep, 2014 1 commit
    • qdisc: validate frames going through the direct_xmit path · 1f59533f
      Jesper Dangaard Brouer committed
      In commit 50cbe9ab ("net: Validate xmit SKBs right when we
      pull them out of the qdisc") the validation code was moved out of
      dev_hard_start_xmit and into dequeue_skb.
      
      However, this overlooked the fact that we do not always enqueue
      the skb onto a qdisc. The first situation is when the qdisc has the
      TCQ_F_CAN_BYPASS flag and is empty. The second situation is when
      there is no qdisc on the device, which is a common case for
      software devices.
      
      Originally spotted, with an initial patch, by Alexander Duyck.
      Alex was seeing issues trying to connect to a
      vhost_net interface after commit 50cbe9ab was applied.
      
      Added a call to validate_xmit_skb() in __dev_xmit_skb(), in the
      code path for qdiscs with TCQ_F_CAN_BYPASS flag, and in
      __dev_queue_xmit() when no qdisc.
      
      Also handle the error situation where dev_hard_start_xmit() could
      return an skb list: when the code does not return via
      dev_xmit_complete(rc) and falls through to kfree_skb(), it should
      call kfree_skb_list() instead.
      
      Fixes:  50cbe9ab ("net: Validate xmit SKBs right when we pull them out of the qdisc")
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
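The three transmit paths named in this commit can be modelled roughly as follows. This is a simplified sketch under stated assumptions: `struct dev_model`, `struct frame`, `validate_frame()`, `hard_xmit()` and `queue_xmit()` are hypothetical names, not the kernel functions, and the queued path leaves validation to dequeue time as the earlier commit intended.

```c
#include <assert.h>
#include <stdbool.h>

enum xmit_path { PATH_NO_QDISC, PATH_BYPASS, PATH_QUEUED };

/* Hypothetical stand-in for an skb. */
struct frame {
    bool validated;
};

/* Hypothetical stand-in for a device and its optional qdisc. */
struct dev_model {
    bool has_qdisc;     /* false models a software device without a qdisc */
    bool can_bypass;    /* models the TCQ_F_CAN_BYPASS flag */
    unsigned int qlen;  /* frames currently queued */
    unsigned int sent;  /* frames handed to the driver */
};

static void validate_frame(struct frame *f)
{
    f->validated = true;
}

/* The driver hand-off must only ever see validated frames. */
static void hard_xmit(struct dev_model *d, const struct frame *f)
{
    assert(f->validated);
    d->sent++;
}

static enum xmit_path queue_xmit(struct dev_model *d, struct frame *f)
{
    if (!d->has_qdisc) {
        /* No qdisc: nothing will dequeue this frame, so validate here. */
        validate_frame(f);
        hard_xmit(d, f);
        return PATH_NO_QDISC;
    }
    if (d->can_bypass && d->qlen == 0) {
        /* Empty bypass-capable qdisc: the frame skips the queue,
         * so it also skips dequeue-time validation. Validate here. */
        validate_frame(f);
        hard_xmit(d, f);
        return PATH_BYPASS;
    }
    /* Normal case: validation happens later, when the frame is dequeued. */
    d->qlen++;
    return PATH_QUEUED;
}
```

Removing either `validate_frame()` call makes the assertion in `hard_xmit()` fire, which mirrors the class of bug the commit fixes.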
  9. 02 Sep, 2014 10 commits
  10. 30 Aug, 2014 1 commit
    • net: Allow GRO to use and set levels of checksum unnecessary · 662880f4
      Tom Herbert committed
      Allow the GRO path to "consume" checksums provided in CHECKSUM_UNNECESSARY
      and to report new checksums verified for use in fallback to the normal
      path.
      
      Change GRO checksum path to track csum_level using a csum_cnt field
      in NAPI_GRO_CB. On GRO initialization, if ip_summed is
      CHECKSUM_UNNECESSARY set NAPI_GRO_CB(skb)->csum_cnt to
      skb->csum_level + 1. For each checksum verified, decrement
      NAPI_GRO_CB(skb)->csum_cnt while it is greater than zero. If a checksum
      is verified and NAPI_GRO_CB(skb)->csum_cnt == 0, we have verified a
      deeper checksum than originally indicated in the skb, so increment
      csum_level (or initialize to CHECKSUM_UNNECESSARY if ip_summed is
      CHECKSUM_NONE or CHECKSUM_COMPLETE).
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
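The counting rule in this commit message can be written out as a small model. It is a sketch that follows the wording above, not the kernel implementation: `struct pkt_model`, `gro_init()` and `gro_account_checksum()` are hypothetical names, the enum values are illustrative, and the sketch omits any upper bound on `csum_level`.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the ip_summed states named in the commit. */
enum csum_state { CSUM_NONE, CSUM_UNNECESSARY, CSUM_COMPLETE };

struct pkt_model {
    enum csum_state ip_summed;
    unsigned int csum_level;  /* device vouched for csum_level + 1 checksums */
    unsigned int csum_cnt;    /* GRO-private: vouched checksums not yet consumed */
};

/* On GRO initialization, load the number of checksums the device
 * already reported as good. */
static void gro_init(struct pkt_model *p)
{
    assert(p != NULL);
    p->csum_cnt = (p->ip_summed == CSUM_UNNECESSARY) ? p->csum_level + 1 : 0;
}

/* Called once per checksum that GRO reaches while walking the headers. */
static void gro_account_checksum(struct pkt_model *p)
{
    assert(p != NULL);
    if (p->csum_cnt > 0) {
        /* Consume one checksum the device already vouched for. */
        p->csum_cnt--;
        return;
    }
    /* Nothing left to consume: GRO verified a deeper checksum itself,
     * so record it for the fallback to the normal receive path. */
    if (p->ip_summed == CSUM_UNNECESSARY) {
        p->csum_level++;
    } else {
        p->ip_summed = CSUM_UNNECESSARY;
        p->csum_level = 0;
    }
}
```

For a packet that arrives with `csum_level == 1`, the first two calls only consume the counter; a third call raises `csum_level` to 2.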
  11. 26 Aug, 2014 2 commits
  12. 25 Aug, 2014 2 commits
  13. 24 Aug, 2014 1 commit
  14. 12 Aug, 2014 1 commit
    • net: Always untag vlan-tagged traffic on input. · 0d5501c1
      Vlad Yasevich committed
      Currently the functionality to untag traffic on input resides
      as part of the vlan module and is built only when VLAN support
      is enabled in the kernel.  When VLAN is disabled, the function
      vlan_untag() turns into a stub and doesn't really untag the
      packets.  This seems to create an interesting interaction
      between VMs supporting checksum offloading and some network drivers.
      
      There are some drivers that do not allow the user to change
      tx-vlan-offload feature of the driver.  These drivers also seem
      to assume that any VLAN-tagged traffic they transmit will
      have the vlan information in the vlan_tci and not in the vlan
      header already in the skb.  When transmitting skbs that already
      have tagged data with partial checksum set, the checksum doesn't
      appear to be updated correctly by the card thus resulting in a
      failure to establish TCP connections.
      
      The following is a packet trace taken on the receiver where the
      sender is a VM with a VLAN configured.  The host the VM is running on
      does not have VLAN support and the outgoing interface on the
      host is tg3:
      10:12:43.503055 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
      (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27243,
      offset 0, flags [DF], proto TCP (6), length 60)
          10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
      -> 0x48d9), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
      4294837885 ecr 0,nop,wscale 7], length 0
      10:12:44.505556 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
      (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27244,
      offset 0, flags [DF], proto TCP (6), length 60)
          10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
      -> 0x44ee), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
      4294838888 ecr 0,nop,wscale 7], length 0
      
      This connection finally times out.
      
      I only have access to the TG3 hardware in this configuration and thus
      have only tested this with the TG3 driver.  There are a lot of other
      drivers that do not permit user changes to vlan acceleration features,
      and I don't know if they all suffer from a similar issue.
      
      The patch attempts to fix this another way.  It moves the vlan header
      stripping code out of the vlan module and always builds it into the
      kernel network core.  This way, even if vlan is not supported on
      a virtualization host, the virtual machines running on top of such a
      host will still work with VLANs enabled.
      
      CC: Patrick McHardy <kaber@trash.net>
      CC: Nithin Nayak Sujir <nsujir@broadcom.com>
      CC: Michael Chan <mchan@broadcom.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Vladislav Yasevich <vyasevic@redhat.com>
      Acked-by: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: David S. Miller <davem@davemloft.net>
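What "untagging" means for a frame like the ones in the trace can be shown on a flat byte buffer. This is a minimal sketch, assuming a single 802.1Q tag (TPID 0x8100) directly after the two MAC addresses; `untag_frame()` and `vlan_id()` are hypothetical helpers, and the kernel works on skb headers rather than a flat buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAC_ADDRS_LEN 12      /* destination + source MAC */
#define VLAN_TAG_LEN  4       /* TPID (2 bytes) + TCI (2 bytes) */
#define TPID_8021Q    0x8100

/* Strip one 802.1Q tag in place.
 * On a tagged frame: stores the TCI in *tci, sets *tagged to 1 and
 * returns the new, shorter length.
 * On anything else: leaves the frame alone, sets *tagged to 0 and
 * returns len unchanged. */
static size_t untag_frame(uint8_t *frame, size_t len, uint16_t *tci, int *tagged)
{
    assert(frame != NULL && tci != NULL && tagged != NULL);
    *tagged = 0;

    /* Need both MACs, the tag, and the encapsulated ethertype. */
    if (len < MAC_ADDRS_LEN + VLAN_TAG_LEN + 2)
        return len;

    uint16_t tpid = (uint16_t)((frame[12] << 8) | frame[13]);
    if (tpid != TPID_8021Q)
        return len;

    *tci = (uint16_t)((frame[14] << 8) | frame[15]);

    /* Close the 4-byte gap so the inner ethertype follows the MACs. */
    memmove(frame + MAC_ADDRS_LEN,
            frame + MAC_ADDRS_LEN + VLAN_TAG_LEN,
            len - MAC_ADDRS_LEN - VLAN_TAG_LEN);
    *tagged = 1;
    return len - VLAN_TAG_LEN;
}

/* The VLAN ID is the low 12 bits of the TCI. */
static unsigned int vlan_id(uint16_t tci)
{
    return tci & 0x0FFFu;
}
```

After the call, the tag information lives only in the out-of-band `tci` value, which is the representation the commit says some drivers assume on transmit.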
  15. 06 Aug, 2014 1 commit
  16. 31 Jul, 2014 1 commit
  17. 21 Jul, 2014 2 commits
    • net: print a notification on device rename · 6fe82a39
      Veaceslav Falico committed
      Currently it's done silently (from the kernel part), and thus it might be
      hard to track the renames from logs.
      
      Add a simple netdev_info() to notify the rename, but only in case the
      previous name was valid.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Vlad Yasevich <vyasevic@redhat.com>
      CC: stephen hemminger <stephen@networkplumber.org>
      CC: Jerry Chu <hkchu@google.com>
      CC: Ben Hutchings <bhutchings@solarflare.com>
      CC: David Laight <David.Laight@ACULAB.COM>
      Signed-off-by: Veaceslav Falico <vfalico@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: print net_device reg_state in netdev_* unless it's registered · ccc7f496
      Veaceslav Falico committed
      This way we'll always know in what status the device is, unless it's
      running normally (i.e. NETDEV_REGISTERED).
      
      Also, emit a warning once in case of a bad reg_state.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Jason Baron <jbaron@akamai.com>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Vlad Yasevich <vyasevic@redhat.com>
      CC: stephen hemminger <stephen@networkplumber.org>
      CC: Jerry Chu <hkchu@google.com>
      CC: Ben Hutchings <bhutchings@solarflare.com>
      CC: Joe Perches <joe@perches.com>
      Signed-off-by: Veaceslav Falico <vfalico@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
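The idea of appending the registration state to log lines, except in the normal registered case, could look roughly like this. The enum, the suffix strings, and the helpers `reg_state_suffix()` and `format_log()` are all assumptions made for illustration; they are not taken from the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical registration states for this sketch. */
enum reg_state_model {
    REG_UNINITIALIZED,
    REG_REGISTERED,
    REG_UNREGISTERING,
    REG_UNREGISTERED
};

static int bad_state_warned;  /* ensures the warning is emitted only once */

/* Map a state to a log suffix; the normal state yields an empty suffix. */
static const char *reg_state_suffix(int state)
{
    switch (state) {
    case REG_UNINITIALIZED: return " (uninitialized)";
    case REG_REGISTERED:    return "";
    case REG_UNREGISTERING: return " (unregistering)";
    case REG_UNREGISTERED:  return " (unregistered)";
    }
    /* Unknown value: warn once, then keep logging with a marker. */
    if (!bad_state_warned) {
        bad_state_warned = 1;
        fprintf(stderr, "unknown reg_state %d\n", state);
    }
    return " (unknown)";
}

/* Build "<name><suffix>: <msg>"; returns what snprintf returns. */
static int format_log(char *buf, size_t len, const char *name,
                      int state, const char *msg)
{
    assert(buf != NULL && name != NULL && msg != NULL);
    return snprintf(buf, len, "%s%s: %s", name, reg_state_suffix(state), msg);
}
```

With these assumed strings, a registered device logs as `eth0: link up`, while one being torn down logs as `eth0 (unregistering): link up`.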
  18. 17 Jul, 2014 2 commits
  19. 16 Jul, 2014 2 commits
  20. 08 Jul, 2014 2 commits
    • net: Fix NETDEV_CHANGE notifier usage causing spurious arp flush · 54951194
      Loic Prylli committed
      A bug was introduced in the NETDEV_CHANGE notifier sequence, causing the
      arp table to be sometimes spuriously cleared (including manual arp
      entries marked permanent) upon network link carrier changes.
      
      The changed argument for the notifier was applied only to a single
      caller of NETDEV_CHANGE, missing, among others, netdev_state_change().
      So upon net_carrier events induced by the network, which
      trigger a call to netdev_state_change(), arp_netdev_event() would
      decide whether or not to clear the arp cache based on random/junk stack
      values (a kind of read buffer overflow).
      
      Fixes: be9efd36 ("net: pass changed flags along with NETDEV_CHANGE event")
      Fixes: 6c8b4e3f ("arp: flush arp cache on IFF_NOARP change")
      Signed-off-by: Loic Prylli <loicp@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
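The shape of this fix, where every sender of the event passes a fully initialized info structure so the handler never reads junk, can be sketched as below. All names here (`struct change_info`, `arp_like_handler()`, `notify_state_change()`, `notify_flags_change()`, `MODEL_FLAG_NOARP`) are hypothetical and only model the idea.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_FLAG_NOARP 0x80u  /* illustrative flag bit for this sketch */

/* What the handler expects to receive with every change event. */
struct change_info {
    const char *dev_name;
    unsigned int flags_changed;  /* which flag bits differ from before */
};

static unsigned int arp_flushes;  /* counts cache flushes in this model */

/* Handler: flush only when the NOARP bit actually changed. */
static void arp_like_handler(const void *ptr)
{
    const struct change_info *info = ptr;

    assert(info != NULL);
    if (info->flags_changed & MODEL_FLAG_NOARP)
        arp_flushes++;
}

/* Carrier/state change: no flag changed, and the sender says so
 * explicitly instead of handing over a smaller, unrelated object. */
static void notify_state_change(const char *name)
{
    struct change_info info = { .dev_name = name, .flags_changed = 0 };

    arp_like_handler(&info);
}

/* Flag change: report exactly the bits that differ. */
static void notify_flags_change(const char *name,
                                unsigned int old_flags,
                                unsigned int new_flags)
{
    struct change_info info = {
        .dev_name = name,
        .flags_changed = old_flags ^ new_flags,
    };

    arp_like_handler(&info);
}
```

Because both senders build the same structure, a pure carrier event can never be mistaken for a NOARP change.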
    • net: Performance fix for process_backlog · 11ef7a89
      Tom Herbert committed
      In process_backlog the input_pkt_queue is only checked once for new
      packets and quota is artificially reduced to reflect precisely the
      number of packets on the input_pkt_queue so that the loop exits
      appropriately.
      
      This patch changes the behavior to be more straightforward and
      less convoluted. Packets are processed until either the quota
      is met or there are no more packets to process.
      
      This patch seems to provide a small but noticeable performance
      improvement. The improvement is a result of staying in the
      process_backlog loop longer, which can reduce the number of IPIs.
      
      Performance data using super_netperf TCP_RR with 200 flows:
      
      Before fix:
      
      88.06% CPU utilization
      125/190/309 90/95/99% latencies
      1.46808e+06 tps
      1145382 intrs./sec.
      
      With fix:
      
      87.73% CPU utilization
      122/183/296 90/95/99% latencies
      1.4921e+06 tps
      1021674.30 intrs./sec.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
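The loop structure described here, "process until the quota is met or nothing is left", can be modelled with two packet counters. This is a sketch under simplifying assumptions: `struct backlog_model` and `process_backlog_model()` are hypothetical, queues are reduced to counts, and the point where the kernel would take a lock is only marked by a comment.

```c
#include <assert.h>

/* Hypothetical model: queues are represented by packet counts only. */
struct backlog_model {
    unsigned int input_pkts;    /* shared queue, filled by other CPUs */
    unsigned int process_pkts;  /* private queue, drained without locking */
};

/* Returns the number of packets processed in this poll. */
static unsigned int process_backlog_model(struct backlog_model *b,
                                          unsigned int quota)
{
    unsigned int work = 0;

    assert(quota > 0);
    for (;;) {
        /* Drain the private queue until it is empty or the quota is met. */
        while (b->process_pkts > 0) {
            b->process_pkts--;
            if (++work >= quota)
                return work;
        }

        /* The real code would take the queue lock around this check. */
        if (b->input_pkts == 0)
            break;  /* nothing left anywhere: polling is complete */

        /* Move everything that arrived meanwhile to the private queue
         * and go around again instead of exiting early. */
        b->process_pkts += b->input_pkts;
        b->input_pkts = 0;
    }
    return work;
}
```

With 3 private and 5 shared packets and a quota of 64, one call processes all 8. With 100 shared packets and the same quota, it stops at 64 and leaves 36 on the private queue for the next poll.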
  21. 02 Jul, 2014 2 commits
  22. 18 Jun, 2014 1 commit