1. 14 9月, 2014 1 次提交
  2. 04 9月, 2014 1 次提交
    • J
      qdisc: validate frames going through the direct_xmit path · 1f59533f
      Jesper Dangaard Brouer 提交于
      In commit 50cbe9ab ("net: Validate xmit SKBs right when we
      pull them out of the qdisc") the validation code was moved out of
      dev_hard_start_xmit and into dequeue_skb.
      
      However this overlooked the fact that we do not always enqueue
      the skb onto a qdisc. First situation is if qdisc have flag
      TCQ_F_CAN_BYPASS and qdisc is empty.  Second situation is if
      there is no qdisc on the device, which is a common case for
      software devices.
      
      Originally spotted and inital patch by Alexander Duyck.
      As a result Alex was seeing issues trying to connect to a
      vhost_net interface after commit 50cbe9ab was applied.
      
      Added a call to validate_xmit_skb() in __dev_xmit_skb(), in the
      code path for qdiscs with TCQ_F_CAN_BYPASS flag, and in
      __dev_queue_xmit() when no qdisc.
      
      Also handle the error situation where dev_hard_start_xmit() could
      return a skb list, and does not return dev_xmit_complete(rc) and
      falls through to the kfree_skb(), in that situation it should
      call kfree_skb_list().
      
      Fixes:  50cbe9ab ("net: Validate xmit SKBs right when we pull them out of the qdisc")
      Signed-off-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1f59533f
  3. 02 9月, 2014 10 次提交
  4. 30 8月, 2014 1 次提交
    • T
      net: Allow GRO to use and set levels of checksum unnecessary · 662880f4
      Tom Herbert 提交于
      Allow GRO path to "consume" checksums provided in CHECKSUM_UNNECESSARY
      and to report new checksums verfied for use in fallback to normal
      path.
      
      Change GRO checksum path to track csum_level using a csum_cnt field
      in NAPI_GRO_CB. On GRO initialization, if ip_summed is
      CHECKSUM_UNNECESSARY set NAPI_GRO_CB(skb)->csum_cnt to
      skb->csum_level + 1. For each checksum verified, decrement
      NAPI_GRO_CB(skb)->csum_cnt while its greater than zero. If a checksum
      is verfied and NAPI_GRO_CB(skb)->csum_cnt == 0, we have verified a
      deeper checksum than originally indicated in skbuf so increment
      csum_level (or initialize to CHECKSUM_UNNECESSARY if ip_summed is
      CHECKSUM_NONE or CHECKSUM_COMPLETE).
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      662880f4
  5. 26 8月, 2014 2 次提交
  6. 25 8月, 2014 2 次提交
  7. 24 8月, 2014 1 次提交
  8. 12 8月, 2014 1 次提交
    • V
      net: Always untag vlan-tagged traffic on input. · 0d5501c1
      Vlad Yasevich 提交于
      Currently the functionality to untag traffic on input resides
      as part of the vlan module and is build only when VLAN support
      is enabled in the kernel.  When VLAN is disabled, the function
      vlan_untag() turns into a stub and doesn't really untag the
      packets.  This seems to create an interesting interaction
      between VMs supporting checksum offloading and some network drivers.
      
      There are some drivers that do not allow the user to change
      tx-vlan-offload feature of the driver.  These drivers also seem
      to assume that any VLAN-tagged traffic they transmit will
      have the vlan information in the vlan_tci and not in the vlan
      header already in the skb.  When transmitting skbs that already
      have tagged data with partial checksum set, the checksum doesn't
      appear to be updated correctly by the card thus resulting in a
      failure to establish TCP connections.
      
      The following is a packet trace taken on the receiver where a
      sender is a VM with a VLAN configued.  The host VM is running on
      doest not have VLAN support and the outging interface on the
      host is tg3:
      10:12:43.503055 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
      (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27243,
      offset 0, flags [DF], proto TCP (6), length 60)
          10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
      -> 0x48d9), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
      4294837885 ecr 0,nop,wscale 7], length 0
      10:12:44.505556 52:54:00:ae:42:3f > 28:d2:44:7d:c2:de, ethertype 802.1Q
      (0x8100), length 78: vlan 100, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 27244,
      offset 0, flags [DF], proto TCP (6), length 60)
          10.0.100.1.58545 > 10.0.100.10.ircu-2: Flags [S], cksum 0xdc39 (incorrect
      -> 0x44ee), seq 1069378582, win 29200, options [mss 1460,sackOK,TS val
      4294838888 ecr 0,nop,wscale 7], length 0
      
      This connection finally times out.
      
      I've only access to the TG3 hardware in this configuration thus have
      only tested this with TG3 driver.  There are a lot of other drivers
      that do not permit user changes to vlan acceleration features, and
      I don't know if they all suffere from a similar issue.
      
      The patch attempt to fix this another way.  It moves the vlan header
      stipping code out of the vlan module and always builds it into the
      kernel network core.  This way, even if vlan is not supported on
      a virtualizatoin host, the virtual machines running on top of such
      host will still work with VLANs enabled.
      
      CC: Patrick McHardy <kaber@trash.net>
      CC: Nithin Nayak Sujir <nsujir@broadcom.com>
      CC: Michael Chan <mchan@broadcom.com>
      CC: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: NVladislav Yasevich <vyasevic@redhat.com>
      Acked-by: NJiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0d5501c1
  9. 06 8月, 2014 1 次提交
  10. 31 7月, 2014 1 次提交
  11. 21 7月, 2014 2 次提交
    • V
      net: print a notification on device rename · 6fe82a39
      Veaceslav Falico 提交于
      Currently it's done silently (from the kernel part), and thus it might be
      hard to track the renames from logs.
      
      Add a simple netdev_info() to notify the rename, but only in case the
      previous name was valid.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Vlad Yasevich <vyasevic@redhat.com>
      CC: stephen hemminger <stephen@networkplumber.org>
      CC: Jerry Chu <hkchu@google.com>
      CC: Ben Hutchings <bhutchings@solarflare.com>
      CC: David Laight <David.Laight@ACULAB.COM>
      Signed-off-by: NVeaceslav Falico <vfalico@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6fe82a39
    • V
      net: print net_device reg_state in netdev_* unless it's registered · ccc7f496
      Veaceslav Falico 提交于
      This way we'll always know in what status the device is, unless it's
      running normally (i.e. NETDEV_REGISTERED).
      
      Also, emit a warning once in case of a bad reg_state.
      
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Jason Baron <jbaron@akamai.com>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Vlad Yasevich <vyasevic@redhat.com>
      CC: stephen hemminger <stephen@networkplumber.org>
      CC: Jerry Chu <hkchu@google.com>
      CC: Ben Hutchings <bhutchings@solarflare.com>
      CC: Joe Perches <joe@perches.com>
      Signed-off-by: NVeaceslav Falico <vfalico@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ccc7f496
  12. 17 7月, 2014 2 次提交
  13. 16 7月, 2014 2 次提交
  14. 08 7月, 2014 2 次提交
    • L
      net: Fix NETDEV_CHANGE notifier usage causing spurious arp flush · 54951194
      Loic Prylli 提交于
      A bug was introduced in NETDEV_CHANGE notifier sequence causing the
      arp table to be sometimes spuriously cleared (including manual arp
      entries marked permanent), upon network link carrier changes.
      
      The changed argument for the notifier was applied only to a single
      caller of NETDEV_CHANGE, missing among others netdev_state_change().
      So upon net_carrier events induced by the network, which are
      triggering a call to netdev_state_change(), arp_netdev_event() would
      decide whether to clear or not arp cache based on random/junk stack
      values (a kind of read buffer overflow).
      
      Fixes: be9efd36 ("net: pass changed flags along with NETDEV_CHANGE event")
      Fixes: 6c8b4e3f ("arp: flush arp cache on IFF_NOARP change")
      Signed-off-by: NLoic Prylli <loicp@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      54951194
    • T
      net: Performance fix for process_backlog · 11ef7a89
      Tom Herbert 提交于
      In process_backlog the input_pkt_queue is only checked once for new
      packets and quota is artificially reduced to reflect precisely the
      number of packets on the input_pkt_queue so that the loop exits
      appropriately.
      
      This patches changes the behavior to be more straightforward and
      less convoluted. Packets are processed until either the quota
      is met or there are no more packets to process.
      
      This patch seems to provide a small, but noticeable performance
      improvement. The performance improvement is a result of staying
      in the process_backlog loop longer which can reduce number of IPI's.
      
      Performance data using super_netperf TCP_RR with 200 flows:
      
      Before fix:
      
      88.06% CPU utilization
      125/190/309 90/95/99% latencies
      1.46808e+06 tps
      1145382 intrs.sec.
      
      With fix:
      
      87.73% CPU utilization
      122/183/296 90/95/99% latencies
      1.4921e+06 tps
      1021674.30 intrs./sec.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      11ef7a89
  15. 02 7月, 2014 2 次提交
  16. 18 6月, 2014 1 次提交
  17. 09 6月, 2014 1 次提交
  18. 06 6月, 2014 1 次提交
    • S
      MPLS: Use mpls_features to activate software MPLS GSO segmentation · 3b392ddb
      Simon Horman 提交于
      If an MPLS packet requires segmentation then use mpls_features
      to determine if the software implementation should be used.
      
      As no driver advertises MPLS GSO segmentation this will always be
      the case.
      
      I had not noticed that this was necessary before as software MPLS GSO
      segmentation was already being used in my test environment. I believe that
      the reason for that is the skbs in question always had fragments and the
      driver I used does not advertise NETIF_F_FRAGLIST (which seems to be the
      case for most drivers). Thus software segmentation was activated by
      skb_gso_ok().
      
      This introduces the overhead of an extra call to skb_network_protocol()
      in the case where where CONFIG_NET_MPLS_GSO is set and
      skb->ip_summed == CHECKSUM_NONE.
      
      Thanks to Jesse Gross for prompting me to investigate this.
      Signed-off-by: NSimon Horman <horms@verge.net.au>
      Acked-by: NYAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3b392ddb
  19. 05 6月, 2014 1 次提交
  20. 04 6月, 2014 1 次提交
  21. 02 6月, 2014 1 次提交
    • N
      net: fix wrong mac_len calculation for vlans · 4b9b1cdf
      Nikolay Aleksandrov 提交于
      After 1e785f48 ("net: Start with correct mac_len in
      skb_network_protocol") skb->mac_len is used as a start of the
      calculation in skb_network_protocol() but that is not always correct. If
      skb->protocol == 8021Q/AD, usually the vlan header is already inserted
      in the skb (i.e. vlan reorder hdr == 0). Usually when the packet enters
      dev_hard_xmit it has mac_len == 0 so we take 2 bytes from the
      destination mac address (skb->data + VLAN_HLEN) as a type in
      skb_network_protocol() and return vlan_depth == 4. In the case where TSO is
      off, then the mac_len is set but it's == 18 (ETH_HLEN + VLAN_HLEN), so
      skb_network_protocol() returns a type from inside the packet and
      offset == 22. Also make vlan_depth unsigned as suggested before.
      As suggested by Eric Dumazet, move the while() loop in the if() so we
      can avoid additional testing in fast path.
      
      Here are few netperf tests + debug printk's to illustrate:
      cat netperf.tso-on.reorder-on.bugged
      - Vlan -> device (reorder on, default, this case is okay)
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
      192.168.3.1 () port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    7111.54
      [   81.605435] skb->len 65226 skb->gso_size 1448 skb->proto 0x800
      skb->mac_len 0 vlan_depth 0 type 0x800
      
      - Vlan -> device (reorder off, bad)
      cat netperf.tso-on.reorder-off.bugged
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
      192.168.3.1 () port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00     241.35
      [  204.578332] skb->len 1518 skb->gso_size 0 skb->proto 0x8100
      skb->mac_len 0 vlan_depth 4 type 0x5301
      0x5301 are the last two bytes of the destination mac.
      
      And if we stop TSO, we may get even the following:
      [   83.343156] skb->len 2966 skb->gso_size 1448 skb->proto 0x8100
      skb->mac_len 18 vlan_depth 22 type 0xb84
      Because mac_len already accounts for VLAN_HLEN.
      
      After the fix:
      cat netperf.tso-on.reorder-off.fixed
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
      192.168.3.1 () port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.01    5001.46
      [   81.888489] skb->len 65230 skb->gso_size 1448 skb->proto 0x8100
      skb->mac_len 0 vlan_depth 18 type 0x800
      
      CC: Vlad Yasevich <vyasevic@redhat.com>
      CC: Eric Dumazet <eric.dumazet@gmail.com>
      CC: Daniel Borkman <dborkman@redhat.com>
      CC: David S. Miller <davem@davemloft.net>
      
      Fixes:1e785f48 ("net: Start with correct mac_len in
      skb_network_protocol")
      Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4b9b1cdf
  22. 17 5月, 2014 3 次提交
    • V
      bonding: Fix stacked device detection in arp monitoring · 44a40855
      Vlad Yasevich 提交于
      Prior to commit fbd929f2
      	bonding: support QinQ for bond arp interval
      
      the arp monitoring code allowed for proper detection of devices
      stacked on top of vlans.  Since the above commit, the
      code can still detect a device stacked on top of single
      vlan, but not a device stacked on top of Q-in-Q configuration.
      The search will only set the inner vlan tag if the route
      device is the vlan device.  However, this is not always the
      case, as it is possible to extend the stacked configuration.
      
      With this patch it is possible to provision devices on
      top Q-in-Q vlan configuration that should be used as
      a source of ARP monitoring information.
      
      For example:
      ip link add link bond0 vlan10 type vlan proto 802.1q id 10
      ip link add link vlan10 vlan100 type vlan proto 802.1q id 100
      ip link add link vlan100 type macvlan
      
      Note:  This patch limites the number of stacked VLANs to 2,
      just like before.  The original, however had another issue
      in that if we had more then 2 levels of VLANs, we would end
      up generating incorrectly tagged traffic.  This is no longer
      possible.
      
      Fixes: fbd929f2 (bonding: support QinQ for bond arp interval)
      CC: Jay Vosburgh <j.vosburgh@gmail.com>
      CC: Veaceslav Falico <vfalico@redhat.com>
      CC: Andy Gospodarek <andy@greyhouse.net>
      CC: Ding Tianhong <dingtianhong@huawei.com>
      CC: Patric McHardy <kaber@trash.net>
      Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      44a40855
    • V
      vlan: Fix lockdep warning with stacked vlan devices. · d38569ab
      Vlad Yasevich 提交于
      This reverts commit dc8eaaa0.
      	vlan: Fix lockdep warning when vlan dev handle notification
      
      Instead we use the new new API to find the lock subclass of
      our vlan device.  This way we can support configurations where
      vlans are interspersed with other devices:
        bond -> vlan -> macvlan -> vlan
      Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d38569ab
    • V
      net: Find the nesting level of a given device by type. · 4085ebe8
      Vlad Yasevich 提交于
      Multiple devices in the kernel can be stacked/nested and they
      need to know their nesting level for the purposes of lockdep.
      This patch provides a generic function that determines a nesting
      level of a particular device by its type (ex: vlan, macvlan, etc).
      We only care about nesting of the same type of devices.
      
      For example:
        eth0 <- vlan0.10 <- macvlan0 <- vlan1.20
      
      The nesting level of vlan1.20 would be 1, since there is another vlan
      in the stack under it.
      Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4085ebe8