1. 22 7月, 2015 2 次提交
    • T
      vxlan: Factor out device configuration · 0dfbdf41
      Thomas Graf 提交于
      This factors out the device configuration out of the RTNL newlink
      API which allows for in-kernel creation of VXLAN net_devices.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0dfbdf41
    • T
      vxlan: Flow based tunneling · ee122c79
      Thomas Graf 提交于
      Allows putting a VXLAN device into a new flow-based mode in which
      skbs with a ip_tunnel_info dst metadata attached will be encapsulated
      according to the instructions stored in there with the VXLAN device
      defaults taken into consideration.
      
      Similar on the receive side, if the VXLAN_F_COLLECT_METADATA flag is
      set, the packet processing will populate a ip_tunnel_info struct for
      each packet received and attach it to the skb using the new metadata
      dst.  The metadata structure will contain the outer header and tunnel
      header fields which have been stripped off. Layers further up in the
      stack such as routing, tc or netfitler can later match on these fields
      and perform forwarding. It is the responsibility of upper layers to
      ensure that the flag is set if the metadata is needed. The flag limits
      the additional cost of metadata collecting based on demand.
      
      This prepares the VXLAN device to be steered by the routing and other
      subsystems which allows to support encapsulation for a large number
      of tunnel endpoints and tunnel ids through a single net_device which
      improves the scalability.
      
      It also allows for OVS to leverage this mode which in turn allows for
      the removal of the OVS specific VXLAN code.
      
      Because the skb is currently scrubed in vxlan_rcv(), the attachment of
      the new dst metadata is postponed until after scrubing which requires
      the temporary addition of a new member to vxlan_metadata. This member
      is removed again in a later commit after the indirect VXLAN receive API
      has been removed.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee122c79
  2. 08 4月, 2015 1 次提交
  3. 14 3月, 2015 1 次提交
    • A
      vxlan: fix wrong usage of VXLAN_VID_MASK · 40fb70f3
      Alexey Kodanev 提交于
      commit dfd8645e wrongly assumes that VXLAN_VDI_MASK includes
      eight lower order reserved bits of VNI field that are using for remote
      checksum offload.
      
      Right now, when VNI number greater then 0xffff, vxlan_udp_encap_recv()
      will always return with 'bad_flag' error, reducing the usable vni range
      from 0..16777215 to 0..65535. Also, it doesn't really check whether RCO
      bits processed or not.
      
      Fix it by adding new VNI mask which has all 32 bits of VNI field:
      24 bits for id and 8 bits for other usage.
      Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      40fb70f3
  4. 12 3月, 2015 1 次提交
  5. 12 2月, 2015 1 次提交
  6. 25 1月, 2015 1 次提交
  7. 15 1月, 2015 3 次提交
    • T
      vxlan: Only bind to sockets with compatible flags enabled · ac5132d1
      Thomas Graf 提交于
      A VXLAN net_device looking for an appropriate socket may only consider
      a socket which has a matching set of flags/extensions enabled. If
      incompatible flags are enabled, return a conflict to have the caller
      create a distinct socket with distinct port.
      
      The OVS VXLAN port is kept unaware of extensions at this point.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ac5132d1
    • T
      vxlan: Group Policy extension · 3511494c
      Thomas Graf 提交于
      Implements supports for the Group Policy VXLAN extension [0] to provide
      a lightweight and simple security label mechanism across network peers
      based on VXLAN. The security context and associated metadata is mapped
      to/from skb->mark. This allows further mapping to a SELinux context
      using SECMARK, to implement ACLs directly with nftables, iptables, OVS,
      tc, etc.
      
      The group membership is defined by the lower 16 bits of skb->mark, the
      upper 16 bits are used for flags.
      
      SELinux allows to manage label to secure local resources. However,
      distributed applications require ACLs to implemented across hosts. This
      is typically achieved by matching on L2-L4 fields to identify the
      original sending host and process on the receiver. On top of that,
      netlabel and specifically CIPSO [1] allow to map security contexts to
      universal labels.  However, netlabel and CIPSO are relatively complex.
      This patch provides a lightweight alternative for overlay network
      environments with a trusted underlay. No additional control protocol
      is required.
      
                 Host 1:                       Host 2:
      
            Group A        Group B        Group B     Group A
            +-----+   +-------------+    +-------+   +-----+
            | lxc |   | SELinux CTX |    | httpd |   | VM  |
            +--+--+   +--+----------+    +---+---+   +--+--+
      	  \---+---/                     \----+---/
      	      |                              |
      	  +---+---+                      +---+---+
      	  | vxlan |                      | vxlan |
      	  +---+---+                      +---+---+
      	      +------------------------------+
      
      Backwards compatibility:
      A VXLAN-GBP socket can receive standard VXLAN frames and will assign
      the default group 0x0000 to such frames. A Linux VXLAN socket will
      drop VXLAN-GBP  frames. The extension is therefore disabled by default
      and needs to be specifically enabled:
      
         ip link add [...] type vxlan [...] gbp
      
      In a mixed environment with VXLAN and VXLAN-GBP sockets, the GBP socket
      must run on a separate port number.
      
      Examples:
       iptables:
        host1# iptables -I OUTPUT -m owner --uid-owner 101 -j MARK --set-mark 0x200
        host2# iptables -I INPUT -m mark --mark 0x200 -j DROP
      
       OVS:
        # ovs-ofctl add-flow br0 'in_port=1,actions=load:0x200->NXM_NX_TUN_GBP_ID[],NORMAL'
        # ovs-ofctl add-flow br0 'in_port=2,tun_gbp_id=0x200,actions=drop'
      
      [0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
      [1] http://lwn.net/Articles/204905/Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3511494c
    • T
      vxlan: Remote checksum offload · dfd8645e
      Tom Herbert 提交于
      Add support for remote checksum offload in VXLAN. This uses a
      reserved bit to indicate that RCO is being done, and uses the low order
      reserved eight bits of the VNI to hold the start and offset values in a
      compressed manner.
      
      Start is encoded in the low order seven bits of VNI. This is start >> 1
      so that the checksum start offset is 0-254 using even values only.
      Checksum offset (transport checksum field) is indicated in the high
      order bit in the low order byte of the VNI. If the bit is set, the
      checksum field is for UDP (so offset = start + 6), else checksum
      field is for TCP (so offset = start + 16). Only TCP and UDP are
      supported in this implementation.
      
      Remote checksum offload for VXLAN is described in:
      
      https://tools.ietf.org/html/draft-herbert-vxlan-rco-00
      
      Tested by running 200 TCP_STREAM connections with VXLAN (over IPv4).
      
      With UDP checksums and Remote Checksum Offload
        IPv4
            Client
              11.84% CPU utilization
            Server
              12.96% CPU utilization
            9197 Mbps
        IPv6
            Client
              12.46% CPU utilization
            Server
              14.48% CPU utilization
            8963 Mbps
      
      With UDP checksums, no remote checksum offload
        IPv4
            Client
              15.67% CPU utilization
            Server
              14.83% CPU utilization
            9094 Mbps
        IPv6
            Client
              16.21% CPU utilization
            Server
              14.32% CPU utilization
            9058 Mbps
      
      No UDP checksums
        IPv4
            Client
              15.03% CPU utilization
            Server
              23.09% CPU utilization
            9089 Mbps
        IPv6
            Client
              16.18% CPU utilization
            Server
              26.57% CPU utilization
             8954 Mbps
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dfd8645e
  8. 13 1月, 2015 1 次提交
    • T
      vxlan: Improve support for header flags · 3bf39475
      Tom Herbert 提交于
      This patch cleans up the header flags of VXLAN in anticipation of
      defining some new ones:
      
      - Move header related definitions from vxlan.c to vxlan.h
      - Change VXLAN_FLAGS to be VXLAN_HF_VNI (only currently defined flag)
      - Move check for unknown flags to after we find vxlan_sock, this
        assumes that some flags may be processed based on tunnel
        configuration
      - Add a comment about why the stack treating unknown set flags as an
        error instead of ignoring them
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3bf39475
  9. 27 12月, 2014 1 次提交
    • J
      net: Generalize ndo_gso_check to ndo_features_check · 5f35227e
      Jesse Gross 提交于
      GSO isn't the only offload feature with restrictions that
      potentially can't be expressed with the current features mechanism.
      Checksum is another although it's a general issue that could in
      theory apply to anything. Even if it may be possible to
      implement these restrictions in other ways, it can result in
      duplicate code or inefficient per-packet behavior.
      
      This generalizes ndo_gso_check so that drivers can remove any
      features that don't make sense for a given packet, similar to
      netif_skb_features(). It also converts existing driver
      restrictions to the new format, completing the work that was
      done to support tunnel protocols since the issues apply to
      checksums as well.
      
      By actually removing features from the set that are used to do
      offloading, it solves another problem with the existing
      interface. In these cases, GSO would run with the original set
      of features and not do anything because it appears that
      segmentation is not required.
      
      CC: Tom Herbert <therbert@google.com>
      CC: Joe Stringer <joestringer@nicira.com>
      CC: Eric Dumazet <edumazet@google.com>
      CC: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      Acked-by: NTom Herbert <therbert@google.com>
      Fixes: 04ffcb25 ("net: Add ndo_gso_check")
      Tested-by: NHayes Wang <hayeswang@realtek.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5f35227e
  10. 19 11月, 2014 1 次提交
  11. 15 11月, 2014 1 次提交
  12. 08 7月, 2014 1 次提交
  13. 05 6月, 2014 1 次提交
  14. 25 4月, 2014 1 次提交
    • N
      vxlan: add x-netns support · f01ec1c0
      Nicolas Dichtel 提交于
      This patch allows to switch the netns when packet is encapsulated or
      decapsulated.
      The vxlan socket is openned into the i/o netns, ie into the netns where
      encapsulated packets are received. The socket lookup is done into this netns to
      find the corresponding vxlan tunnel. After decapsulation, the packet is
      injecting into the corresponding interface which may stand to another netns.
      
      When one of the two netns is removed, the tunnel is destroyed.
      
      Configuration example:
      ip netns add netns1
      ip netns exec netns1 ip link set lo up
      ip link add vxlan10 type vxlan id 10 group 239.0.0.10 dev eth0 dstport 0
      ip link set vxlan10 netns netns1
      ip netns exec netns1 ip addr add 192.168.0.249/24 broadcast 192.168.0.255 dev vxlan10
      ip netns exec netns1 ip link set vxlan10 up
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f01ec1c0
  15. 22 1月, 2014 1 次提交
    • O
      net: Add GRO support for vxlan traffic · dc01e7d3
      Or Gerlitz 提交于
      Add GRO handlers for vxlann, by using the UDP GRO infrastructure.
      
      For single TCP session that goes through vxlan tunneling I got nice
      improvement from 6.8Gbs to 11.5Gbs
      
      --> UDP/VXLAN GRO disabled
      $ netperf  -H 192.168.52.147 -c -C
      
      $ netperf -t TCP_STREAM -H 192.168.52.147 -c -C
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.52.147 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
       87380  65536  65536    10.00      6799.75   12.54    24.79    0.604   1.195
      
      --> UDP/VXLAN GRO enabled
      
      $ netperf -t TCP_STREAM -H 192.168.52.147 -c -C
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.52.147 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
       87380  65536  65536    10.00      11562.72   24.90    20.34    0.706   0.577
      Signed-off-by: NShlomo Pongratz <shlomop@mellanox.com>
      Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dc01e7d3
  16. 29 10月, 2013 1 次提交
  17. 06 9月, 2013 1 次提交
    • J
      vxlan: Notify drivers for listening UDP port changes · 53cf5275
      Joseph Gasparakis 提交于
      This patch adds two more ndo ops: ndo_add_rx_vxlan_port() and
      ndo_del_rx_vxlan_port().
      
      Drivers can get notifications through the above functions about changes
      of the UDP listening port of VXLAN. Also, when physical ports come up,
      now they can call vxlan_get_rx_port() in order to obtain the port number(s)
      of the existing VXLAN interface in case they already up before them.
      
      This information about the listening UDP port would be used for VXLAN
      related offloads.
      
      A big thank you to John Fastabend (john.r.fastabend@intel.com) for his
      input and his suggestions on this patch set.
      
      CC: John Fastabend <john.r.fastabend@intel.com>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: NJoseph Gasparakis <joseph.gasparakis@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      53cf5275
  18. 04 9月, 2013 1 次提交
  19. 01 9月, 2013 1 次提交
  20. 20 8月, 2013 2 次提交