1. 02 8月, 2017 1 次提交
    • K
      vxlan: fix remcsum when GRO on and CHECKSUM_PARTIAL boundary is outer UDP · be73b304
      K. Den 提交于
      In the case that GRO is turned on and the original received packet is
      CHECKSUM_PARTIAL, if the outer UDP header is exactly at the last
      csum-unnecessary point, which for instance could occur if the packet
      comes from another Linux guest on the same Linux host, we have to do
      either remcsum_adjust or set up CHECKSUM_PARTIAL again with its
      csum_start properly reset considering RCO.
      
      However, since b7fe10e5("gro: Fix remcsum offload to deal with frags
      in GRO") that barrier in such case could be skipped if GRO turned on,
      hence we pass over it and the inner L4 validation mistakenly reckons
      it as a bad csum.
      
      This patch makes remcsum_offload being reset at the same time of GRO
      remcsum cleanup, so as to make it work in such case as before.
      
      Fixes: b7fe10e5 ("gro: Fix remcsum offload to deal with frags in GRO")
      Signed-off-by: NKoichiro Den <den@klaipeden.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      be73b304
  2. 05 7月, 2017 1 次提交
  3. 03 7月, 2017 2 次提交
  4. 28 6月, 2017 1 次提交
  5. 27 6月, 2017 3 次提交
  6. 21 6月, 2017 6 次提交
  7. 16 6月, 2017 2 次提交
    • J
      networking: make skb_push & __skb_push return void pointers · d58ff351
      Johannes Berg 提交于
      It seems like a historic accident that these return unsigned char *,
      and in many places that means casts are required, more often than not.
      
      Make these functions return void * and remove all the casts across
      the tree, adding a (u8 *) cast only where the unsigned char pointer
      was used directly, all done with the following spatch:
      
          @@
          expression SKB, LEN;
          typedef u8;
          identifier fn = { skb_push, __skb_push, skb_push_rcsum };
          @@
          - *(fn(SKB, LEN))
          + *(u8 *)fn(SKB, LEN)
      
          @@
          expression E, SKB, LEN;
          identifier fn = { skb_push, __skb_push, skb_push_rcsum };
          type T;
          @@
          - E = ((T *)(fn(SKB, LEN)))
          + E = fn(SKB, LEN)
      
          @@
          expression SKB, LEN;
          identifier fn = { skb_push, __skb_push, skb_push_rcsum };
          @@
          - fn(SKB, LEN)[0]
          + *(u8 *)fn(SKB, LEN)
      
      Note that the last part there converts from push(...)[0] to the
      more idiomatic *(u8 *)push(...).
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d58ff351
    • J
      networking: convert many more places to skb_put_zero() · b080db58
      Johannes Berg 提交于
      There were many places that my previous spatch didn't find,
      as pointed out by yuan linyu in various patches.
      
      The following spatch found many more and also removes the
      now unnecessary casts:
      
          @@
          identifier p, p2;
          expression len;
          expression skb;
          type t, t2;
          @@
          (
          -p = skb_put(skb, len);
          +p = skb_put_zero(skb, len);
          |
          -p = (t)skb_put(skb, len);
          +p = skb_put_zero(skb, len);
          )
          ... when != p
          (
          p2 = (t2)p;
          -memset(p2, 0, len);
          |
          -memset(p, 0, len);
          )
      
          @@
          type t, t2;
          identifier p, p2;
          expression skb;
          @@
          t *p;
          ...
          (
          -p = skb_put(skb, sizeof(t));
          +p = skb_put_zero(skb, sizeof(t));
          |
          -p = (t *)skb_put(skb, sizeof(t));
          +p = skb_put_zero(skb, sizeof(t));
          )
          ... when != p
          (
          p2 = (t2)p;
          -memset(p2, 0, sizeof(*p));
          |
          -memset(p, 0, sizeof(*p));
          )
      
          @@
          expression skb, len;
          @@
          -memset(skb_put(skb, len), 0, len);
          +skb_put_zero(skb, len);
      
      Apply it to the tree (with one manual fixup to keep the
      comment in vxlan.c, which spatch removed.)
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b080db58
  8. 12 6月, 2017 1 次提交
  9. 08 6月, 2017 2 次提交
    • D
      net: Fix inconsistent teardown and release of private netdev state. · cf124db5
      David S. Miller 提交于
      Network devices can allocate reasources and private memory using
      netdev_ops->ndo_init().  However, the release of these resources
      can occur in one of two different places.
      
      Either netdev_ops->ndo_uninit() or netdev->destructor().
      
      The decision of which operation frees the resources depends upon
      whether it is necessary for all netdev refs to be released before it
      is safe to perform the freeing.
      
      netdev_ops->ndo_uninit() presumably can occur right after the
      NETDEV_UNREGISTER notifier completes and the unicast and multicast
      address lists are flushed.
      
      netdev->destructor(), on the other hand, does not run until the
      netdev references all go away.
      
      Further complicating the situation is that netdev->destructor()
      almost universally does also a free_netdev().
      
      This creates a problem for the logic in register_netdevice().
      Because all callers of register_netdevice() manage the freeing
      of the netdev, and invoke free_netdev(dev) if register_netdevice()
      fails.
      
      If netdev_ops->ndo_init() succeeds, but something else fails inside
      of register_netdevice(), it does call ndo_ops->ndo_uninit().  But
      it is not able to invoke netdev->destructor().
      
      This is because netdev->destructor() will do a free_netdev() and
      then the caller of register_netdevice() will do the same.
      
      However, this means that the resources that would normally be released
      by netdev->destructor() will not be.
      
      Over the years drivers have added local hacks to deal with this, by
      invoking their destructor parts by hand when register_netdevice()
      fails.
      
      Many drivers do not try to deal with this, and instead we have leaks.
      
      Let's close this hole by formalizing the distinction between what
      private things need to be freed up by netdev->destructor() and whether
      the driver needs unregister_netdevice() to perform the free_netdev().
      
      netdev->priv_destructor() performs all actions to free up the private
      resources that used to be freed by netdev->destructor(), except for
      free_netdev().
      
      netdev->needs_free_netdev is a boolean that indicates whether
      free_netdev() should be done at the end of unregister_netdevice().
      
      Now, register_netdevice() can sanely release all resources after
      ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
      and netdev->priv_destructor().
      
      And at the end of unregister_netdevice(), we invoke
      netdev->priv_destructor() and optionally call free_netdev().
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf124db5
    • M
      vxlan: use a more suitable function when assigning NULL · 57d88182
      Mark Bloch 提交于
      When stopping the vxlan interface we detach it from the socket.
      Use RCU_INIT_POINTER() and not rcu_assign_pointer() to do so.
      Suggested-by: NStephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: NMark Bloch <markb@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      57d88182
  10. 03 6月, 2017 1 次提交
  11. 02 6月, 2017 1 次提交
  12. 01 5月, 2017 2 次提交
    • J
      vxlan: do not output confusing error message · baf4d786
      Jiri Benc 提交于
      The message "Cannot bind port X, err=Y" creates only confusion. In metadata
      based mode, failure of IPv6 socket creation is okay if IPv6 is disabled and
      no error message should be printed. But when IPv6 tunnel was requested, such
      failure is fatal. The vxlan_socket_create does not know when the error is
      harmless and when it's not.
      
      Instead of passing such information down to vxlan_socket_create, remove the
      message completely. It's not useful. We propagate the error code up to the
      user space and the port number comes from the user space. There's nothing in
      the message that the process creating vxlan interface does not know.
      Signed-off-by: NJiri Benc <jbenc@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      baf4d786
    • J
      vxlan: correctly handle ipv6.disable module parameter · d074bf96
      Jiri Benc 提交于
      When IPv6 is compiled but disabled at runtime, __vxlan_sock_add returns
      -EAFNOSUPPORT. For metadata based tunnels, this causes failure of the whole
      operation of bringing up the tunnel.
      
      Ignore failure of IPv6 socket creation for metadata based tunnels caused by
      IPv6 not being available.
      
      Fixes: b1be00a6 ("vxlan: support both IPv4 and IPv6 sockets in a single vxlan device")
      Signed-off-by: NJiri Benc <jbenc@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d074bf96
  13. 04 4月, 2017 1 次提交
    • V
      vxlan: fix ND proxy when skb doesn't have transport header offset · f1fb08f6
      Vincent Bernat 提交于
      When an incoming frame is tagged or when GRO is disabled, the skb
      handled to vxlan_xmit() doesn't contain a valid transport header
      offset. This makes ND proxying fail.
      
      We combine two changes: replace use of skb_transport_offset() and ensure
      the necessary amount of skb is linear just before using it:
      
       - In vxlan_xmit(), when determining if we have an ICMPv6 neighbor
         discovery packet, just check if it is an ICMPv6 packet and rely on
         neigh_reduce() to do more checks if this is the case. The use of
         pskb_may_pull() is replaced by skb_header_pointer() for just the IPv6
         header.
      
       - In neigh_reduce(), add pskb_may_pull() for IPv6 header and neighbor
         discovery message since this was removed from vxlan_xmit(). Replace
         skb_transport_header() with ipv6_hdr() + 1.
      
       - In vxlan_na_create(), replace first skb_transport_offset() with
         ipv6_hdr() + 1 and second with skb_network_offset() + sizeof(struct
         ipv6hdr). Additionally, ensure we pskb_may_pull() the whole skb as we
         need it to iterate over the options.
      Signed-off-by: NVincent Bernat <vincent@bernat.im>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f1fb08f6
  14. 02 4月, 2017 1 次提交
  15. 29 3月, 2017 1 次提交
  16. 14 3月, 2017 1 次提交
  17. 13 3月, 2017 1 次提交
  18. 02 3月, 2017 1 次提交
  19. 25 2月, 2017 2 次提交
  20. 22 2月, 2017 2 次提交
  21. 18 2月, 2017 1 次提交
    • P
      vxlan: fix oops in dev_fill_metadata_dst · 22f0708a
      Paolo Abeni 提交于
      Since the commit 0c1d70af ("net: use dst_cache for vxlan device")
      vxlan_fill_metadata_dst() calls vxlan_get_route() passing a NULL
      dst_cache pointer, so the latter should explicitly check for
      valid dst_cache ptr. Unfortunately the commit d71785ff ("net: add
      dst_cache to ovs vxlan lwtunnel") removed said check.
      
      As a result is possible to trigger a null pointer access calling
      vxlan_fill_metadata_dst(), e.g. with:
      
      ovs-vsctl add-br ovs-br0
      ovs-vsctl add-port ovs-br0 vxlan0 -- set interface vxlan0 \
      	type=vxlan options:remote_ip=192.168.1.1 \
      	options:key=1234 options:dst_port=4789 ofport_request=10
      ip address add dev ovs-br0 172.16.1.2/24
      ovs-vsctl set Bridge ovs-br0 ipfix=@i -- --id=@i create IPFIX \
      	targets=\"172.16.1.1:1234\" sampling=1
      iperf -c 172.16.1.1 -u -l 1000 -b 10M -t 1 -p 1234
      
      This commit addresses the issue passing to vxlan_get_route() the
      dst_cache already available into the lwt info processed by
      vxlan_fill_metadata_dst().
      
      Fixes: d71785ff ("net: add dst_cache to ovs vxlan lwtunnel")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NJiri Benc <jbenc@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      22f0708a
  22. 12 2月, 2017 1 次提交
  23. 04 2月, 2017 1 次提交
    • R
      vxlan: support fdb and learning in COLLECT_METADATA mode · 3ad7a4b1
      Roopa Prabhu 提交于
      Vxlan COLLECT_METADATA mode today solves the per-vni netdev
      scalability problem in l3 networks. It expects all forwarding
      information to be present in dst_metadata. This patch series
      enhances collect metadata mode to include the case where only
      vni is present in dst_metadata, and the vxlan driver can then use
      the rest of the forwarding information datbase to make forwarding
      decisions. There is no change to default COLLECT_METADATA
      behaviour. These changes only apply to COLLECT_METADATA when
      used with the bridging use-case with a special dst_metadata
      tunnel info flag (eg: where vxlan device is part of a bridge).
      For all this to work, the vxlan driver will need to now support a
      single fdb table hashed by mac + vni. This series essentially makes
      this happen.
      
      use-case and workflow:
      vxlan collect metadata device participates in bridging vlan
      to vn-segments. Bridge driver above the vxlan device,
      sends the vni corresponding to the vlan in the dst_metadata.
      vxlan driver will lookup forwarding database with (mac + vni)
      for the required remote destination information to forward the
      packet.
      
      Changes introduced by this patch:
          - allow learning and forwarding database state in vxlan netdev in
            COLLECT_METADATA mode. Current behaviour is not changed
            by default. tunnel info flag IP_TUNNEL_INFO_BRIDGE is used
            to support the new bridge friendly mode.
          - A single fdb table hashed by (mac, vni) to allow fdb entries with
            multiple vnis in the same fdb table
          - rx path already has the vni
          - tx path expects a vni in the packet with dst_metadata
          - prior to this series, fdb remote_dsts carried remote vni and
            the vxlan device carrying the fdb table represented the
            source vni. With the vxlan device now representing multiple vnis,
            this patch adds a src vni attribute to the fdb entry. The remote
            vni already uses NDA_VNI attribute. This patch introduces
            NDA_SRC_VNI netlink attribute to represent the src vni in a multi
            vni fdb table.
      
      iproute2 example (patched and pruned iproute2 output to just show
      relevant fdb entries):
      example shows same host mac learnt on two vni's.
      
      before (netdev per vni):
      $bridge fdb show | grep "00:02:00:00:00:03"
      00:02:00:00:00:03 dev vxlan1001 dst 12.0.0.8 self
      00:02:00:00:00:03 dev vxlan1000 dst 12.0.0.8 self
      
      after this patch with collect metadata in bridged mode (single netdev):
      $bridge fdb show | grep "00:02:00:00:00:03"
      00:02:00:00:00:03 dev vxlan0 src_vni 1001 dst 12.0.0.8 self
      00:02:00:00:00:03 dev vxlan0 src_vni 1000 dst 12.0.0.8 self
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3ad7a4b1
  24. 25 1月, 2017 2 次提交
  25. 21 1月, 2017 1 次提交
  26. 18 1月, 2017 1 次提交