1. 22 7月, 2015 1 次提交
    • T
      vxlan: Flow based tunneling · ee122c79
      Thomas Graf 提交于
      Allows putting a VXLAN device into a new flow-based mode in which
      skbs with a ip_tunnel_info dst metadata attached will be encapsulated
      according to the instructions stored in there with the VXLAN device
      defaults taken into consideration.
      
      Similar on the receive side, if the VXLAN_F_COLLECT_METADATA flag is
      set, the packet processing will populate a ip_tunnel_info struct for
      each packet received and attach it to the skb using the new metadata
      dst.  The metadata structure will contain the outer header and tunnel
      header fields which have been stripped off. Layers further up in the
      stack such as routing, tc or netfitler can later match on these fields
      and perform forwarding. It is the responsibility of upper layers to
      ensure that the flag is set if the metadata is needed. The flag limits
      the additional cost of metadata collecting based on demand.
      
      This prepares the VXLAN device to be steered by the routing and other
      subsystems which allows to support encapsulation for a large number
      of tunnel endpoints and tunnel ids through a single net_device which
      improves the scalability.
      
      It also allows for OVS to leverage this mode which in turn allows for
      the removal of the OVS specific VXLAN code.
      
      Because the skb is currently scrubed in vxlan_rcv(), the attachment of
      the new dst metadata is postponed until after scrubing which requires
      the temporary addition of a new member to vxlan_metadata. This member
      is removed again in a later commit after the indirect VXLAN receive API
      has been removed.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee122c79
  2. 21 7月, 2015 1 次提交
  3. 28 5月, 2015 1 次提交
  4. 19 5月, 2015 1 次提交
  5. 10 5月, 2015 1 次提交
  6. 06 5月, 2015 1 次提交
  7. 23 4月, 2015 1 次提交
  8. 10 4月, 2015 1 次提交
  9. 09 4月, 2015 2 次提交
    • W
      vxlan: do not exit on error in vxlan_stop() · f13b1689
      WANG Cong 提交于
      We need to clean up vxlan despite vxlan_igmp_leave() fails.
      
      This fixes the following kernel warning:
      
       WARNING: CPU: 0 PID: 6 at lib/debugobjects.c:263 debug_print_object+0x7c/0x8d()
       ODEBUG: free active (active state 0) object type: timer_list hint: vxlan_cleanup+0x0/0xd0
       CPU: 0 PID: 6 Comm: kworker/u8:0 Not tainted 4.0.0-rc7+ #953
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
       Workqueue: netns cleanup_net
        0000000000000009 ffff88011955f948 ffffffff81a25f5a 00000000253f253e
        ffff88011955f998 ffff88011955f988 ffffffff8107608e 0000000000000000
        ffffffff814deba2 ffff8800d4e94000 ffffffff82254c30 ffffffff81fbe455
       Call Trace:
        [<ffffffff81a25f5a>] dump_stack+0x4c/0x65
        [<ffffffff8107608e>] warn_slowpath_common+0x9c/0xb6
        [<ffffffff814deba2>] ? debug_print_object+0x7c/0x8d
        [<ffffffff81076116>] warn_slowpath_fmt+0x46/0x48
        [<ffffffff814deba2>] debug_print_object+0x7c/0x8d
        [<ffffffff81666bf1>] ? vxlan_fdb_destroy+0x5b/0x5b
        [<ffffffff814dee02>] __debug_check_no_obj_freed+0xc3/0x15f
        [<ffffffff814df728>] debug_check_no_obj_freed+0x12/0x16
        [<ffffffff8117ae4e>] slab_free_hook+0x64/0x6c
        [<ffffffff8114deaa>] ? kvfree+0x31/0x33
        [<ffffffff8117dc66>] kfree+0x101/0x1ac
        [<ffffffff8114deaa>] kvfree+0x31/0x33
        [<ffffffff817d4137>] netdev_freemem+0x18/0x1a
        [<ffffffff817e8b52>] netdev_release+0x2e/0x32
        [<ffffffff815b4163>] device_release+0x5a/0x92
        [<ffffffff814bd4dd>] kobject_cleanup+0x49/0x5e
        [<ffffffff814bd3ff>] kobject_put+0x45/0x49
        [<ffffffff817d3fc1>] netdev_run_todo+0x26f/0x283
        [<ffffffff817d4873>] ? rollback_registered_many+0x20f/0x23b
        [<ffffffff817e0c80>] rtnl_unlock+0xe/0x10
        [<ffffffff817d4af0>] default_device_exit_batch+0x12a/0x139
        [<ffffffff810aadfa>] ? wait_woken+0x8f/0x8f
        [<ffffffff817c8e14>] ops_exit_list+0x2b/0x57
        [<ffffffff817c9b21>] cleanup_net+0x154/0x1e7
        [<ffffffff8108b05d>] process_one_work+0x255/0x4ad
        [<ffffffff8108af69>] ? process_one_work+0x161/0x4ad
        [<ffffffff8108b4b1>] worker_thread+0x1cd/0x2ab
        [<ffffffff8108b2e4>] ? process_scheduled_works+0x2f/0x2f
        [<ffffffff81090686>] kthread+0xd4/0xdc
        [<ffffffff8109eca3>] ? local_clock+0x19/0x22
        [<ffffffff810905b2>] ? __kthread_parkme+0x83/0x83
        [<ffffffff81a31c48>] ret_from_fork+0x58/0x90
        [<ffffffff810905b2>] ? __kthread_parkme+0x83/0x83
      
      For the long-term, we should handle NETDEV_{UP,DOWN} event
      from the lower device of a tunnel device.
      
      Fixes: 56ef9c90 ("vxlan: Move socket initialization to within rtnl scope")
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f13b1689
    • W
      vxlan: fix a shadow local variable · 2790460e
      WANG Cong 提交于
      Commit 79b16aad
      ("udp_tunnel: Pass UDP socket down through udp_tunnel{, 6}_xmit_skb()")
      introduce 'sk' but we already have one inner 'sk'.
      
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2790460e
  10. 08 4月, 2015 1 次提交
  11. 02 4月, 2015 1 次提交
  12. 01 4月, 2015 3 次提交
  13. 24 3月, 2015 1 次提交
  14. 21 3月, 2015 1 次提交
  15. 19 3月, 2015 2 次提交
  16. 14 3月, 2015 1 次提交
    • A
      vxlan: fix wrong usage of VXLAN_VID_MASK · 40fb70f3
      Alexey Kodanev 提交于
      commit dfd8645e wrongly assumes that VXLAN_VDI_MASK includes
      eight lower order reserved bits of VNI field that are using for remote
      checksum offload.
      
      Right now, when VNI number greater then 0xffff, vxlan_udp_encap_recv()
      will always return with 'bad_flag' error, reducing the usable vni range
      from 0..16777215 to 0..65535. Also, it doesn't really check whether RCO
      bits processed or not.
      
      Fix it by adding new VNI mask which has all 32 bits of VNI field:
      24 bits for id and 8 bits for other usage.
      Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      40fb70f3
  17. 13 3月, 2015 1 次提交
  18. 12 2月, 2015 3 次提交
    • T
      vxlan: Use checksum partial with remote checksum offload · 0ace2ca8
      Tom Herbert 提交于
      Change remote checksum handling to set checksum partial as default
      behavior. Added an iflink parameter to configure not using
      checksum partial (calling csum_partial to update checksum).
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0ace2ca8
    • T
      net: Infrastructure for CHECKSUM_PARTIAL with remote checsum offload · 15e2396d
      Tom Herbert 提交于
      This patch adds infrastructure so that remote checksum offload can
      set CHECKSUM_PARTIAL instead of calling csum_partial and writing
      the modfied checksum field.
      
      Add skb_remcsum_adjust_partial function to set an skb for using
      CHECKSUM_PARTIAL with remote checksum offload.  Changed
      skb_remcsum_process and skb_gro_remcsum_process to take a boolean
      argument to indicate if checksum partial can be set or the
      checksum needs to be modified using the normal algorithm.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      15e2396d
    • T
      net: Fix remcsum in GRO path to not change packet · 26c4f7da
      Tom Herbert 提交于
      Remote checksum offload processing is currently the same for both
      the GRO and non-GRO path. When the remote checksum offload option
      is encountered, the checksum field referred to is modified in
      the packet. So in the GRO case, the packet is modified in the
      GRO path and then the operation is skipped when the packet goes
      through the normal path based on skb->remcsum_offload. There is
      a problem in that the packet may be modified in the GRO path, but
      then forwarded off host still containing the remote checksum option.
      A remote host will again perform RCO but now the checksum verification
      will fail since GRO RCO already modified the checksum.
      
      To fix this, we ensure that GRO restores a packet to it's original
      state before returning. In this model, when GRO processes a remote
      checksum option it still changes the checksum per the algorithm
      but on return from lower layer processing the checksum is restored
      to its original value.
      
      In this patch we add define gro_remcsum structure which is passed
      to skb_gro_remcsum_process to save offset and delta for the checksum
      being changed. After lower layer processing, skb_gro_remcsum_cleanup
      is called to restore the checksum before returning from GRO.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      26c4f7da
  19. 09 2月, 2015 1 次提交
  20. 05 2月, 2015 2 次提交
  21. 30 1月, 2015 1 次提交
  22. 28 1月, 2015 1 次提交
  23. 25 1月, 2015 2 次提交
    • T
      vxlan: Eliminate dependency on UDP socket in transmit path · af33c1ad
      Tom Herbert 提交于
      In the vxlan transmit path there is no need to reference the socket
      for a tunnel which is needed for the receive side. We do, however,
      need the vxlan_dev flags. This patch eliminate references
      to the socket in the transmit path, and changes VXLAN_F_UNSHAREABLE
      to be VXLAN_F_RCV_FLAGS. This mask is used to store the flags
      applicable to receive (GBP, CSUM6_RX, and REMCSUM_RX) in the
      vxlan_sock flags.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      af33c1ad
    • T
      udp: Do not require sock in udp_tunnel_xmit_skb · d998f8ef
      Tom Herbert 提交于
      The UDP tunnel transmit functions udp_tunnel_xmit_skb and
      udp_tunnel6_xmit_skb include a socket argument. The socket being
      passed to the functions (from VXLAN) is a UDP created for receive
      side. The only thing that the socket is used for in the transmit
      functions is to get the setting for checksum (enabled or zero).
      This patch removes the argument and and adds a nocheck argument
      for checksum setting. This eliminates the unnecessary dependency
      on a UDP socket for UDP tunnel transmit.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d998f8ef
  24. 24 1月, 2015 1 次提交
  25. 20 1月, 2015 1 次提交
  26. 18 1月, 2015 1 次提交
    • J
      netlink: make nlmsg_end() and genlmsg_end() void · 053c095a
      Johannes Berg 提交于
      Contrary to common expectations for an "int" return, these functions
      return only a positive value -- if used correctly they cannot even
      return 0 because the message header will necessarily be in the skb.
      
      This makes the very common pattern of
      
        if (genlmsg_end(...) < 0) { ... }
      
      be a whole bunch of dead code. Many places also simply do
      
        return nlmsg_end(...);
      
      and the caller is expected to deal with it.
      
      This also commonly (at least for me) causes errors, because it is very
      common to write
      
        if (my_function(...))
          /* error condition */
      
      and if my_function() does "return nlmsg_end()" this is of course wrong.
      
      Additionally, there's not a single place in the kernel that actually
      needs the message length returned, and if anyone needs it later then
      it'll be very easy to just use skb->len there.
      
      Remove this, and make the functions void. This removes a bunch of dead
      code as described above. The patch adds lines because I did
      
      -	return nlmsg_end(...);
      +	nlmsg_end(...);
      +	return 0;
      
      I could have preserved all the function's return values by returning
      skb->len, but instead I've audited all the places calling the affected
      functions and found that none cared. A few places actually compared
      the return value with <= 0 in dump functionality, but that could just
      be changed to < 0 with no change in behaviour, so I opted for the more
      efficient version.
      
      One instance of the error I've made numerous times now is also present
      in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
      check for <0 or <=0 and thus broke out of the loop every single time.
      I've preserved this since it will (I think) have caused the messages to
      userspace to be formatted differently with just a single message for
      every SKB returned to userspace. It's possible that this isn't needed
      for the tools that actually use this, but I don't even know what they
      are so couldn't test that changing this behaviour would be acceptable.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      053c095a
  27. 15 1月, 2015 4 次提交
    • T
      vxlan: Only bind to sockets with compatible flags enabled · ac5132d1
      Thomas Graf 提交于
      A VXLAN net_device looking for an appropriate socket may only consider
      a socket which has a matching set of flags/extensions enabled. If
      incompatible flags are enabled, return a conflict to have the caller
      create a distinct socket with distinct port.
      
      The OVS VXLAN port is kept unaware of extensions at this point.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ac5132d1
    • T
      vxlan: Group Policy extension · 3511494c
      Thomas Graf 提交于
      Implements supports for the Group Policy VXLAN extension [0] to provide
      a lightweight and simple security label mechanism across network peers
      based on VXLAN. The security context and associated metadata is mapped
      to/from skb->mark. This allows further mapping to a SELinux context
      using SECMARK, to implement ACLs directly with nftables, iptables, OVS,
      tc, etc.
      
      The group membership is defined by the lower 16 bits of skb->mark, the
      upper 16 bits are used for flags.
      
      SELinux allows to manage label to secure local resources. However,
      distributed applications require ACLs to implemented across hosts. This
      is typically achieved by matching on L2-L4 fields to identify the
      original sending host and process on the receiver. On top of that,
      netlabel and specifically CIPSO [1] allow to map security contexts to
      universal labels.  However, netlabel and CIPSO are relatively complex.
      This patch provides a lightweight alternative for overlay network
      environments with a trusted underlay. No additional control protocol
      is required.
      
                 Host 1:                       Host 2:
      
            Group A        Group B        Group B     Group A
            +-----+   +-------------+    +-------+   +-----+
            | lxc |   | SELinux CTX |    | httpd |   | VM  |
            +--+--+   +--+----------+    +---+---+   +--+--+
      	  \---+---/                     \----+---/
      	      |                              |
      	  +---+---+                      +---+---+
      	  | vxlan |                      | vxlan |
      	  +---+---+                      +---+---+
      	      +------------------------------+
      
      Backwards compatibility:
      A VXLAN-GBP socket can receive standard VXLAN frames and will assign
      the default group 0x0000 to such frames. A Linux VXLAN socket will
      drop VXLAN-GBP  frames. The extension is therefore disabled by default
      and needs to be specifically enabled:
      
         ip link add [...] type vxlan [...] gbp
      
      In a mixed environment with VXLAN and VXLAN-GBP sockets, the GBP socket
      must run on a separate port number.
      
      Examples:
       iptables:
        host1# iptables -I OUTPUT -m owner --uid-owner 101 -j MARK --set-mark 0x200
        host2# iptables -I INPUT -m mark --mark 0x200 -j DROP
      
       OVS:
        # ovs-ofctl add-flow br0 'in_port=1,actions=load:0x200->NXM_NX_TUN_GBP_ID[],NORMAL'
        # ovs-ofctl add-flow br0 'in_port=2,tun_gbp_id=0x200,actions=drop'
      
      [0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
      [1] http://lwn.net/Articles/204905/Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3511494c
    • T
      vxlan: Remote checksum offload · dfd8645e
      Tom Herbert 提交于
      Add support for remote checksum offload in VXLAN. This uses a
      reserved bit to indicate that RCO is being done, and uses the low order
      reserved eight bits of the VNI to hold the start and offset values in a
      compressed manner.
      
      Start is encoded in the low order seven bits of VNI. This is start >> 1
      so that the checksum start offset is 0-254 using even values only.
      Checksum offset (transport checksum field) is indicated in the high
      order bit in the low order byte of the VNI. If the bit is set, the
      checksum field is for UDP (so offset = start + 6), else checksum
      field is for TCP (so offset = start + 16). Only TCP and UDP are
      supported in this implementation.
      
      Remote checksum offload for VXLAN is described in:
      
      https://tools.ietf.org/html/draft-herbert-vxlan-rco-00
      
      Tested by running 200 TCP_STREAM connections with VXLAN (over IPv4).
      
      With UDP checksums and Remote Checksum Offload
        IPv4
            Client
              11.84% CPU utilization
            Server
              12.96% CPU utilization
            9197 Mbps
        IPv6
            Client
              12.46% CPU utilization
            Server
              14.48% CPU utilization
            8963 Mbps
      
      With UDP checksums, no remote checksum offload
        IPv4
            Client
              15.67% CPU utilization
            Server
              14.83% CPU utilization
            9094 Mbps
        IPv6
            Client
              16.21% CPU utilization
            Server
              14.32% CPU utilization
            9058 Mbps
      
      No UDP checksums
        IPv4
            Client
              15.03% CPU utilization
            Server
              23.09% CPU utilization
            9089 Mbps
        IPv6
            Client
              16.18% CPU utilization
            Server
              26.57% CPU utilization
             8954 Mbps
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dfd8645e
    • T
      udp: pass udp_offload struct to UDP gro callbacks · a2b12f3c
      Tom Herbert 提交于
      This patch introduces udp_offload_callbacks which has the same
      GRO functions (but not a GSO function) as offload_callbacks,
      except there is an argument to a udp_offload struct passed to
      gro_receive and gro_complete functions. This additional argument
      can be used to retrieve the per port structure of the encapsulation
      for use in gro processing (mostly by doing container_of on the
      structure).
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a2b12f3c
  28. 14 1月, 2015 1 次提交
  29. 13 1月, 2015 1 次提交
    • T
      vxlan: Improve support for header flags · 3bf39475
      Tom Herbert 提交于
      This patch cleans up the header flags of VXLAN in anticipation of
      defining some new ones:
      
      - Move header related definitions from vxlan.c to vxlan.h
      - Change VXLAN_FLAGS to be VXLAN_HF_VNI (only currently defined flag)
      - Move check for unknown flags to after we find vxlan_sock, this
        assumes that some flags may be processed based on tunnel
        configuration
      - Add a comment about why the stack treating unknown set flags as an
        error instead of ignoring them
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3bf39475