1. 23 Sep, 2014 23 commits
  2. 20 Sep, 2014 17 commits
    • udp_tunnel: Only build ip6_udp_tunnel.c when IPV6 is selected · 6d967f87
      Committed by Andy Zhou
      Functions supplied in ip6_udp_tunnel.c are only needed when IPV6 is
      selected. When IPV6 is not selected, those functions are stubbed out
      in udp_tunnel.h.
      
      ==================================================================
       net/ipv6/ip6_udp_tunnel.c:15:5: error: redefinition of 'udp_sock_create6'
           int udp_sock_create6(struct net *net, struct udp_port_cfg *cfg,
       In file included from net/ipv6/ip6_udp_tunnel.c:9:0:
            include/net/udp_tunnel.h:36:19: note: previous definition of 'udp_sock_create6' was here
             static inline int udp_sock_create6(struct net *net, struct udp_port_cfg *cfg,
      ==================================================================
      
      Fixes: fd384412 udp_tunnel: Seperate ipv6 functions into its own file
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Andy Zhou <azhou@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
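The redefinition error above is what the common header-stub pattern produces when the real source file is still compiled in a configuration that already provides an inline stub. A minimal userspace sketch of that pattern follows. The macro `DEMO_HAVE_IPV6` and the functions `demo_sock_create6()` and `demo_open_tunnel()` are hypothetical stand-ins, and the stub's return value is an assumption made for illustration, not the kernel's actual behaviour.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical switch standing in for the kernel configuration option. */
#ifndef DEMO_HAVE_IPV6
#define DEMO_HAVE_IPV6 0
#endif

#if DEMO_HAVE_IPV6
/* The real implementation lives in a separate .c file that should be
 * built only when the feature is selected. */
int demo_sock_create6(int port);
#else
/* Header stub used when the feature is off. If the .c file were still
 * compiled in this configuration, its definition would collide with
 * this one, giving a "redefinition" error like the one in the log. */
static inline int demo_sock_create6(int port)
{
	(void)port;
	return -EAFNOSUPPORT; /* assumed return value for this sketch */
}
#endif

/* Caller code is identical in both configurations. */
static int demo_open_tunnel(int port)
{
	assert(port > 0 && port < 65536);
	return demo_sock_create6(port);
}
```

With the stub in place, callers need no conditional compilation of their own; only the build rules decide whether the real file is compiled.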
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next · 6c62f606
      Committed by David S. Miller
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2014-09-18
      
      This series contains updates to ixgbe and ixgbevf.
      
      Ethan Zhao cleans up ixgbe and ixgbevf by removing bd_number from the
      adapter struct because it is no longer useful.
      
      Mark fixes ixgbe where if a hardware transmit timestamp is requested,
      an uninitialized workqueue entry may be scheduled.  Added a check for
      a PTP clock to avoid that.
      
      Jacob provides a number of cleanups for ixgbe.  Since we may call
      ixgbe_acquire_msix_vectors() prior to registering our netdevice, we
      should not use the netdevice specific printk and use e_dev_warn()
      instead.  Similar to how ixgbevf handles acquiring MSI-X vectors, we
      can return an error code instead of relying on the flag being set.
      This makes it more clear that we have failed to set up MSI-X mode and
      will make it easier to consolidate MSI-X related code into a single
      function.  In the case of disabling DCB, it is not an error since we
      still can function, we just have to let the user know.  So use
      e_dev_warn() instead of e_err().  Added warnings for other features
      that are disabled when we are without MSI-X support.  Cleanup flags
      that are no longer used or needed.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'mlx4-next' · 58310b3f
      Committed by David S. Miller
      Or Gerlitz says:
      
      ====================
      mlx4: CQE/EQE stride support
      
      This series from Ido Shamay is intended for archs having a
      cache line larger than 64 bytes.
      
      Since our CQE/EQEs are generally 64B in those systems, HW will write
      twice to the same cache line consecutively, causing pipe locks due to
      the hazard prevention mechanism. For elements in a cyclic buffer, writes
      are consecutive, so entries smaller than a cache line should be
      avoided, especially if they are written at a high rate.
      
      Reduce consecutive writes to same cache line in CQs/EQs, by allowing the
      driver to increase the distance between entries so that each will reside
      in a different cache line.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_en: Add mlx4_en_get_cqe helper · b1b6b4da
      Committed by Ido Shamay
      This function derives the base address of the CQE from the CQE size,
      and calculates the real CQE context segment in it from the factor
      (this is like before). Before this change the code used the factor to
      calculate the base address of the CQE as well.
      
      The factor indicates in which segment of the cqe stride the cqe information
      is located. For 32-byte strides, the segment is 0, and for 64 byte strides,
      the segment is 1 (bytes 32..63). Using the factor was ok as long as we had
      only 32 and 64 byte strides. However, with larger strides, the factor is zero,
      and so cannot be used to calculate the base of the CQE.
      
      The helper uses the same method of CQE buffer pulling used by all other
      components that read the CQE buffer (mlx4_ib driver and libmlx4).
      Signed-off-by: Ido Shamay <idos@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
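The address arithmetic described in this commit can be sketched in userspace C. All names below (`demo_cqe_base`, `demo_get_cqe`, `demo_old_offset`) are hypothetical, and the factor-only formula is an assumption about what such a computation could look like, not a quote from the driver. The sketch shows why a factor-based index works for 32B and 64B strides but not for larger ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CQE_SEGMENT 32u /* size of one context segment, in bytes */

/* Base address of entry `idx`, derived from the CQE size alone. */
static inline uint8_t *demo_cqe_base(uint8_t *buf, size_t idx, size_t cqe_size)
{
	return buf + idx * cqe_size;
}

/* Context segment: the base plus `factor` 32-byte segments. */
static inline uint8_t *demo_get_cqe(uint8_t *buf, size_t idx,
				    size_t cqe_size, unsigned factor)
{
	/* The selected segment must fit inside one entry. */
	assert(cqe_size >= DEMO_CQE_SEGMENT * (factor + 1));
	return demo_cqe_base(buf, idx, cqe_size) + factor * DEMO_CQE_SEGMENT;
}

/* Assumed factor-only computation: correct for 32B (factor 0) and
 * 64B (factor 1) strides, but it cannot express a 128B or 256B stride
 * because the factor is 0 there as well. */
static inline size_t demo_old_offset(size_t idx, unsigned factor)
{
	return ((idx << factor) + factor) * DEMO_CQE_SEGMENT;
}
```

For entry 3 with a 128B stride, the size-based helper yields byte offset 384, while the factor-only formula with factor 0 yields 96, which is the offset of a 32B-stride entry.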
    • net/mlx4_core: Cache line EQE size support · 43c816c6
      Committed by Ido Shamay
      Enable the mlx4 interrupt handler to work with the EQE stride feature.
      The feature may be enabled when the cache line is bigger than 64B.
      The EQE size will then be the cache line size, and the context
      segment resides in the [0-31] offset.
      Signed-off-by: Ido Shamay <idos@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_core: Enable CQE/EQE stride support · 77507aa2
      Committed by Ido Shamay
      This feature is intended for archs having a cache line larger than 64B.
      
      Since our CQE/EQEs are generally 64B in those systems, HW will write
      twice to the same cache line consecutively, causing pipe locks due to
      the hazard prevention mechanism. For elements in a cyclic buffer, writes
      are consecutive, so entries smaller than a cache line should be
      avoided, especially if they are written at a high rate.
      
      Reduce consecutive writes to same cache line in CQs/EQs, by allowing the
      driver to increase the distance between entries so that each will reside
      in a different cache line. Until the introduction of this feature, there
      were two types of CQE/EQE:
      
      1. 32B stride and context in the [0-31] segment
      2. 64B stride and context in the [32-63] segment
      
      This feature introduces two additional types:
      
      3. 128B stride and context in the [0-31] segment (128B cache line)
      4. 256B stride and context in the [0-31] segment (256B cache line)
      
      Modify the mlx4_core driver to query the device for the CQE/EQE cache
      line stride capability and to enable that capability when the host
      cache line size is larger than 64 bytes (supported cache lines are
      128B and 256B).
      
      The mlx4 IB driver and libmlx4 need not be aware of this change. The PF
      context behaviour is changed to require this change in VF drivers
      running on such archs.
      Signed-off-by: Ido Shamay <idos@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
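The four entry types listed in this commit can be summarised as a small selection function. This is only a sketch: `struct demo_entry_layout`, `demo_pick_layout()` and its parameters are hypothetical, and the way the two legacy types are chosen (a plain flag) is an assumption, since the message does not describe that decision.

```c
#include <assert.h>
#include <stddef.h>

struct demo_entry_layout {
	size_t stride;     /* distance between consecutive entries, bytes */
	size_t ctx_offset; /* where the 32-byte context starts in an entry */
};

/* Map a host cache line size to one of the four layouts.
 * `hw_supports_stride` models the queried device capability;
 * `legacy_64b` picks between the two pre-existing types. */
static struct demo_entry_layout demo_pick_layout(size_t cache_line,
						 int hw_supports_stride,
						 int legacy_64b)
{
	struct demo_entry_layout l;

	assert(cache_line > 0);
	if (hw_supports_stride && (cache_line == 128 || cache_line == 256)) {
		/* Types 3 and 4: one entry per cache line, context first. */
		l.stride = cache_line;
		l.ctx_offset = 0;
	} else if (legacy_64b) {
		/* Type 2: 64B stride, context in bytes [32-63]. */
		l.stride = 64;
		l.ctx_offset = 32;
	} else {
		/* Type 1: 32B stride, context in bytes [0-31]. */
		l.stride = 32;
		l.ctx_offset = 0;
	}
	return l;
}
```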
    • net: fix sparse warnings in SNMP_UPD_PO_STATS(_BH) · 54003f11
      Committed by Sabrina Dubroca
      ptr used to be a non __percpu pointer (result of a this_cpu_ptr
      assignment, 7d720c3e ("percpu: add __percpu sparse annotations to
      net")). Since d25398df ("net: avoid reloads in SNMP_UPD_PO_STATS"),
      that's no longer the case, SNMP_UPD_PO_STATS uses this_cpu_add and ptr
      is now __percpu.
      
      Silence sparse warnings by preserving the original type and
      annotation, and remove the out-of-date comment.
      
      warning: incorrect type in initializer (different address spaces)
         expected unsigned long long *ptr
         got unsigned long long [noderef] <asn:3>*<noident>
      warning: incorrect type in initializer (different address spaces)
         expected void const [noderef] <asn:3>*__vpp_verify
         got unsigned long long *<noident>
      warning: incorrect type in initializer (different address spaces)
         expected void const [noderef] <asn:3>*__vpp_verify
         got unsigned long long *<noident>
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'fou-next' · fb5690d2
      Committed by David S. Miller
      Tom Herbert says:
      
      ====================
      net: foo-over-udp (fou)
      
      This patch series implements foo-over-udp. The idea is that we can
      encapsulate different IP protocols in UDP packets. The rationale for
      this is that networking devices such as NICs and switches are usually
      implemented with UDP (and TCP) specific mechanisms for processing. For
      instance, many switches and routers will implement a 5-tuple hash
      for UDP packets to perform Equal Cost Multipath Routing (ECMP) or
      RSS (on NICs). Many NICs also only provide rudimentary checksum
      offload (basic TCP and UDP packets). With foo-over-udp we may be
      able to leverage these NICs to offload checksums of tunneled packets
      (using checksum unnecessary conversion and eventually remote checksum
      offload).
      
      An example encapsulation of IPIP over FOU is diagrammed below. As
      illustrated, the packet overhead for FOU is the 8 byte UDP header.
      
      +------------------+
      |    IPv4 hdr      |
      +------------------+
      |     UDP hdr      |
      +------------------+
      |    IPv4 hdr      |
      +------------------+
      |     TCP hdr      |
      +------------------+
      |   TCP payload    |
      +------------------+
      
      Conceptually, FOU should be able to encapsulate any IP protocol.
      The FOU header (UDP hdr.) is essentially an inserted header between the
      IP header and transport, so in the case of TCP or UDP encapsulation
      the pseudo header would be based on the outer IP header and its length
      field must not include the UDP header.
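To make the 8-byte overhead concrete, the following userspace sketch prepends a UDP header to an already built inner packet. The function names `demo_put_be16()` and `demo_fou_encap()` are hypothetical, the inner packet is treated as opaque bytes, and the checksum field is simply left at zero (which in UDP over IPv4 means "no checksum"). It is not the kernel's transmit path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_UDP_HLEN 8u /* source port, dest port, length, checksum */

/* Store a 16-bit value in network (big-endian) byte order. */
static void demo_put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/* Write a UDP header followed by `inner` into `out`.
 * Returns the encapsulated length, or 0 if it does not fit. */
static size_t demo_fou_encap(uint8_t *out, size_t out_cap,
			     const uint8_t *inner, size_t inner_len,
			     uint16_t sport, uint16_t dport)
{
	size_t total = inner_len + DEMO_UDP_HLEN;

	assert(out != NULL && inner != NULL);
	if (total > out_cap || total > 0xffff)
		return 0;

	demo_put_be16(out + 0, sport);
	demo_put_be16(out + 2, dport);
	demo_put_be16(out + 4, (uint16_t)total); /* UDP length: header + payload */
	demo_put_be16(out + 6, 0);               /* checksum left unset */
	memcpy(out + DEMO_UDP_HLEN, inner, inner_len);
	return total;
}
```

The outer IPv4 header would then be placed in front of this buffer by the tunnel code, giving the layering shown in the diagram.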
      
      * Receive
      
      In this patch set the RX path for FOU is implemented in a new fou
      module. To enable FOU for a particular protocol, a UDP-FOU socket is
      opened to the port to receive FOU packets. The socket is mapped to the
      IP protocol for the packets. The XFRM mechanism (udp_encap_rcv) is
      used to receive encapsulated packets for the port. Upon reception,
      the UDP header is removed and the packet is reinjected into the stack
      for the corresponding protocol associated with the socket (return
      -protocol from the udp_encap_rcv function).
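The "return -protocol" convention can be illustrated with a small userspace model. The types and functions here (`struct demo_fou_port`, `demo_fou_encap_rcv()`, `demo_udp_deliver()`) are hypothetical; the sketch only models the return-value contract of an encapsulation receive handler, where a negative value asks the caller to resubmit the packet as IP protocol `-ret` and a positive value hands it back to normal UDP processing.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_IPPROTO_IPIP 4
#define DEMO_IPPROTO_UDP  17
#define DEMO_IPPROTO_GRE  47

/* One configured FOU receive port and the protocol mapped to it. */
struct demo_fou_port {
	unsigned short port;
	unsigned char protocol;
};

/* Encapsulation receive handler:
 *   > 0  not for us, continue with normal UDP delivery
 *   < 0  resubmit the packet as IP protocol -ret */
static int demo_fou_encap_rcv(const struct demo_fou_port *fou,
			      unsigned short dport)
{
	assert(fou != NULL);
	if (dport != fou->port)
		return 1;
	return -(int)fou->protocol;
}

/* Caller side: decide which protocol handler sees the packet next. */
static int demo_udp_deliver(const struct demo_fou_port *fou,
			    unsigned short dport)
{
	int ret = demo_fou_encap_rcv(fou, dport);

	if (ret < 0)
		return -ret;         /* reinject as the mapped protocol */
	return DEMO_IPPROTO_UDP;     /* stays an ordinary UDP packet */
}
```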
      
      GRO is provided with the appropriate fou_gro_receive and
      fou_gro_complete. These routines need to know the encapsulation
      protocol so we save that in udp_offloads structure with the port
      and pass it in the napi_gro_cb structure.
      
      * TX
      
      This patch series implements FOU transmit encapsulation for IPIP, GRE, and
      SIT. This is done by some common infrastructure in ip_tunnel, including
      ip_tunnel_encap to perform FOU encapsulation and common configuration
      to enable FOU on IP tunnels. FOU is configured on existing tunnels and
      does not create any new interfaces. The transmit and receive paths are
      independent, so use of FOU may be asymmetric between tunnel endpoints.
      
      * Configuration
      
      The fou module uses netlink to configure FOU receive ports. The ip
      command can be augmented with a fou subcommand to support this, e.g. to
      configure FOU for IPIP on port 5555:
      
        ip fou add port 5555 ipproto 4
      
      GRE, IPIP, and SIT have been modified with netlink commands to
      configure use of FOU on transmit. The "ip link" command will be
      augmented with an encap subcommand (for supporting various forms of
      secondary encapsulation). For instance, to configure an ipip tunnel
      with FOU on port 5555:
      
        ip link add name tun1 type ipip \
          remote 192.168.1.1 local 192.168.1.2 ttl 225 \
          encap fou encap-sport auto encap-dport 5555
      
      * Notes
        - This patch set does not implement GSO for FOU. The UDP encapsulation
          code assumes TEB, so that will need to be reimplemented.
        - When a packet is received through FOU, the UDP header is not
          actually removed from the skbuf; pointers to the transport header
          and the length in the IP header are updated (like in ESP/UDP RX). A
          side effect is that the IP header will now appear to have an
          incorrect checksum to an external observer (e.g. tcpdump); it will
          be off by sizeof UDP header. If necessary we could adjust the
          checksum to compensate.
        - Performance results are below. My expectation is that FOU should
          entail little overhead (clearly there is some work to do :-) ).
          Optimizing UDP socket lookup for encapsulation ports should help
          significantly.
        - I really don't expect/want devices to have special support for any
          of this. Generic checksum offload mechanisms (NETIF_HW_CSUM
          and use of CHECKSUM_COMPLETE) should be sufficient. RSS and flow
          steering is provided by commonly implemented UDP hashing. GRO/GSO
          seem fairly comparable with LRO/TSO already.
      
      * Performance
      
      Ran netperf TCP_RR and TCP_STREAM tests across various configurations.
      This was performed on bnx2x and I disabled TSO/GSO on sender to get
      fair comparison for FOU versus non-FOU. CPU utilization is reported
      for receive in TCP_STREAM.
      
        GRE
          IPv4, FOU, UDP checksum enabled
            TCP_STREAM
              24.85% CPU utilization
              9310.6 Mbps
            TCP_RR
              94.2% CPU utilization
              155/249/460 90/95/99% latencies
              1.17018e+06 tps
          IPv4, FOU, UDP checksum disabled
            TCP_STREAM
              31.04% CPU utilization
              9302.22 Mbps
            TCP_RR
              94.13% CPU utilization
              154/239/419 90/95/99% latencies
              1.17555e+06 tps
          IPv4, no FOU
            TCP_STREAM
              23.13% CPU utilization
              9354.58 Mbps
            TCP_RR
              90.24% CPU utilization
              156/228/360 90/95/99% latencies
              1.18169e+06 tps
      
        IPIP
          FOU, UDP checksum enabled
            TCP_STREAM
              24.13% CPU utilization
              9328 Mbps
            TCP_RR
              94.23% CPU utilization
              149/237/429 90/95/99% latencies
              1.19553e+06 tps
          FOU, UDP checksum disabled
            TCP_STREAM
              29.13% CPU utilization
              9370.25 Mbps
            TCP_RR
              94.13% CPU utilization
              149/232/398 90/95/99% latencies
              1.19225e+06 tps
          No FOU
            TCP_STREAM
              10.43% CPU utilization
              5302.03 Mbps
            TCP_RR
              51.53% CPU utilization
              215/324/475 90/95/99% latencies
              864998 tps
      
        SIT
          FOU, UDP checksum enabled
            TCP_STREAM
              30.38% CPU utilization
              9176.76 Mbps
            TCP_RR
              96.9% CPU utilization
              170/281/581 90/95/99% latencies
              1.03372e+06 tps
          FOU, UDP checksum disabled
            TCP_STREAM
              39.6% CPU utilization
              9176.57 Mbps
            TCP_RR
              97.14% CPU utilization
              167/272/548 90/95/99% latencies
              1.03203e+06 tps
          No FOU
            TCP_STREAM
              11.2% CPU utilization
              4636.05 Mbps
            TCP_RR
              59.51% CPU utilization
              232/346/489 90/95/99% latencies
              813199 tps
      
      v2:
        - Removed encap IP tunnel ioctls, configuration is done by netlink
          only.
        - Don't export fou_create and fou_destroy, they are currently
          intended to be called within fou module only.
        - Filled in tunnel netlink structures and functions for new values.
      
      v3:
        - Fixed change logs for some of the patches.
        - Remove inline from fou_gro_receive and fou_gro_complete, let
          compiler decide on these.
      
      v4:
        - Don't need to cast void in fou_from_sock
        - Removed incorrect htons for port in fou_destroy
        - Some minor cleanup for readability
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • gre: Setup and TX path for gre/UDP foo-over-udp encapsulation · 4565e991
      Committed by Tom Herbert
      Added netlink attrs to configure FOU encapsulation for GRE, netlink
      handling of these flags, and properly adjust MTU for encapsulation.
      ip_tunnel_encap is called from ip_tunnel_xmit to actually perform FOU
      encapsulation.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipip: Setup and TX path for ipip/UDP foo-over-udp encapsulation · 473ab820
      Committed by Tom Herbert
      Add netlink handling for IP tunnel encapsulation parameters and
      adjustment of MTU for encapsulation.  ip_tunnel_encap is called
      from ip_tunnel_xmit to actually perform FOU encapsulation.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sit: Setup and TX path for sit/UDP foo-over-udp encapsulation · 14909664
      Committed by Tom Herbert
      Added netlink handling of IP tunnel encapsulation parameters and proper
      adjustment of MTU for encapsulation. Added ip_tunnel_encap call to
      ipip6_tunnel_xmit to actually perform FOU encapsulation.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Changes to ip_tunnel to support foo-over-udp encapsulation · 56328486
      Committed by Tom Herbert
      This patch changes IP tunnel to support (secondary) encapsulation,
      Foo-over-UDP. Changes include:
      
      1) Adding tun_hlen as the tunnel header length, encap_hlen as the
         encapsulation header length, and hlen becomes the grand total
         of these.
      2) Added common netlink define to support FOU encapsulation.
      3) Routines to perform FOU encapsulation.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
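The header-length bookkeeping from item 1 of this commit can be sketched as follows. The field names `tun_hlen`, `encap_hlen` and `hlen` come from the message; `struct demo_tunnel`, the two helper functions and the MTU formula (link MTU minus outer IPv4 header minus `hlen`) are assumptions made for illustration.

```c
#include <assert.h>

#define DEMO_UDP_HLEN  8  /* FOU adds one UDP header */
#define DEMO_IPV4_HLEN 20 /* outer IPv4 header without options */

struct demo_tunnel {
	int tun_hlen;   /* tunnel header length (e.g. 4 for basic GRE) */
	int encap_hlen; /* secondary encapsulation header length */
	int hlen;       /* grand total of the two */
};

/* Recompute the lengths when encapsulation is configured. */
static void demo_tunnel_set_encap(struct demo_tunnel *t, int tun_hlen,
				  int use_fou)
{
	assert(t != 0 && tun_hlen >= 0);
	t->tun_hlen = tun_hlen;
	t->encap_hlen = use_fou ? DEMO_UDP_HLEN : 0;
	t->hlen = t->tun_hlen + t->encap_hlen;
}

/* Payload MTU left for the inner packet (assumed formula). */
static int demo_tunnel_mtu(const struct demo_tunnel *t, int link_mtu)
{
	return link_mtu - DEMO_IPV4_HLEN - t->hlen;
}
```

For a GRE tunnel with a 4-byte header on a 1500-byte link, enabling FOU in this model lowers the inner MTU from 1476 to 1468.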
    • fou: Add GRO support · afe93325
      Committed by Tom Herbert
      Implement fou_gro_receive and fou_gro_complete, and populate these
      in the corresponding udp_offloads for the socket. Added ipproto to
      udp_offloads and pass this from UDP to the fou GRO routine in proto
      field of napi_gro_cb structure.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • fou: Support for foo-over-udp RX path · 23461551
      Committed by Tom Herbert
      This patch provides a receive path for foo-over-udp. This allows
      direct encapsulation of IP protocols over UDP. The bound destination
      port is used to map to an IP protocol, and the XFRM framework
      (udp_encap_rcv) is used to receive encapsulated packets. Upon
      reception, the encapsulation header is logically removed (pointer
      to transport header is advanced) and the packet is reinjected into
      the receive path with the IP protocol indicated by the mapping.
      
      Netlink is used to configure FOU ports. The configuration information
      includes the port number to bind to and the IP protocol corresponding
      to that port.
      
      This should support GRE/UDP
      (http://tools.ietf.org/html/draft-yong-tsvwg-gre-in-udp-encap-02),
      as well as the other IP tunneling protocols (IPIP, SIT).
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Export inet_offloads and inet6_offloads · ce3e0286
      Committed by Tom Herbert
      Want to be able to use these in foo-over-udp offloads, etc.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: cls_u32: rcu can not be last node · 4e2840ee
      Committed by John Fastabend
      tc_u32_sel 'sel' in tc_u_knode expects to be the last element in the
      structure and pads the structure with tc_u32_key fields for each key.
      
       kzalloc(sizeof(*n) + s->nkeys*sizeof(struct tc_u32_key), GFP_KERNEL)
      
      CC: Eric Dumazet <edumazet@google.com>
      Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
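The constraint in this commit is the usual one for a structure that ends in a variable-length tail: the allocation is sized as the fixed part plus `nkeys` trailing keys, so nothing may be placed after the tail. A userspace sketch using a C99 flexible array member is shown below. `struct demo_key`, `struct demo_knode` and the helpers are hypothetical and flatten the kernel's nested layout into one structure for portability.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_key {
	unsigned int val;
	unsigned int mask;
};

struct demo_knode {
	int handle;
	/* Any new field (for example an RCU head) must go above here. */
	unsigned char nkeys;
	struct demo_key keys[]; /* variable-length tail, must stay last */
};

/* Allocate the fixed part plus room for `nkeys` trailing keys,
 * mirroring the sizeof(*n) + nkeys * sizeof(key) sizing above. */
static struct demo_knode *demo_knode_alloc(unsigned char nkeys)
{
	struct demo_knode *n;

	n = calloc(1, sizeof(*n) + nkeys * sizeof(struct demo_key));
	if (n)
		n->nkeys = nkeys;
	return n;
}

/* Bounds-checked write into the trailing key array. */
static void demo_knode_set_key(struct demo_knode *n, unsigned int i,
			       unsigned int val, unsigned int mask)
{
	assert(n != NULL && i < n->nkeys);
	n->keys[i].val = val;
	n->keys[i].mask = mask;
}
```

If a field were added after `keys`, writes to the keys would overwrite it, which is why the member with the trailing keys has to remain the final element.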
    • net: sched: use __skb_queue_head_init() where applicable · ab34f648
      Committed by Eric Dumazet
      pfifo_fast and htb use skb lists, without needing their spinlocks.
      (They instead use the standard qdisc lock)
      
      We can use __skb_queue_head_init() instead of skb_queue_head_init()
      to be consistent.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>