1. 01 May, 2017 2 commits
    • vxlan: do not output confusing error message · baf4d786
      Jiri Benc committed
      The message "Cannot bind port X, err=Y" creates only confusion. In metadata
      based mode, failure of IPv6 socket creation is okay if IPv6 is disabled and
      no error message should be printed. But when IPv6 tunnel was requested, such
      failure is fatal. The vxlan_socket_create does not know when the error is
      harmless and when it's not.
      
      Instead of passing such information down to vxlan_socket_create, remove the
      message completely. It's not useful. We propagate the error code up to the
      user space and the port number comes from the user space. There's nothing in
      the message that the process creating vxlan interface does not know.
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: correctly handle ipv6.disable module parameter · d074bf96
      Jiri Benc committed
      When IPv6 is compiled but disabled at runtime, __vxlan_sock_add returns
      -EAFNOSUPPORT. For metadata based tunnels, this causes failure of the whole
      operation of bringing up the tunnel.
      
      Ignore failure of IPv6 socket creation for metadata based tunnels caused by
      IPv6 not being available.
      
      Fixes: b1be00a6 ("vxlan: support both IPv4 and IPv6 sockets in a single vxlan device")
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
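The error-handling rule described in d074bf96 can be sketched in user-space C. This is only an illustration under stated assumptions, not the driver code: `sock_add_v4`, `sock_add_v6`, and `tunnel_open` are hypothetical stand-ins, and only the `-EAFNOSUPPORT` return value and the metadata-mode distinction come from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for per-family socket creation.
 * Each returns 0 on success or a negative errno. */
static int sock_add_v4(void)
{
	return 0;
}

static int sock_add_v6(bool ipv6_enabled)
{
	/* Models IPv6 being compiled in but disabled at runtime. */
	return ipv6_enabled ? 0 : -EAFNOSUPPORT;
}

/* metadata:  metadata based tunnel, which tries both address families.
 * want_ipv6: non-metadata tunnel explicitly configured for IPv6. */
static int tunnel_open(bool metadata, bool want_ipv6, bool ipv6_enabled)
{
	bool need_v6 = metadata || want_ipv6;
	bool need_v4 = metadata || !want_ipv6;
	int ret = 0;

	if (need_v6) {
		ret = sock_add_v6(ipv6_enabled);
		assert(ret <= 0); /* helpers follow the 0-or-negative-errno convention */
		/* A missing IPv6 stack is tolerated only in metadata mode. */
		if (ret == -EAFNOSUPPORT && metadata)
			ret = 0;
		if (ret < 0)
			return ret;
	}
	if (need_v4)
		ret = sock_add_v4();
	return ret;
}
```

With IPv6 disabled, `tunnel_open(true, false, false)` succeeds in this sketch, while an explicit IPv6 tunnel, `tunnel_open(false, true, false)`, still fails with `-EAFNOSUPPORT`.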
  2. 04 Apr, 2017 1 commit
    • vxlan: fix ND proxy when skb doesn't have transport header offset · f1fb08f6
      Vincent Bernat committed
      When an incoming frame is tagged or when GRO is disabled, the skb
      handed to vxlan_xmit() doesn't contain a valid transport header
      offset. This makes ND proxying fail.
      
      We combine two changes: replace use of skb_transport_offset() and ensure
      the necessary amount of skb is linear just before using it:
      
       - In vxlan_xmit(), when determining if we have an ICMPv6 neighbor
         discovery packet, just check if it is an ICMPv6 packet and rely on
         neigh_reduce() to do more checks if this is the case. The use of
         pskb_may_pull() is replaced by skb_header_pointer() for just the IPv6
         header.
      
       - In neigh_reduce(), add pskb_may_pull() for IPv6 header and neighbor
         discovery message since this was removed from vxlan_xmit(). Replace
         skb_transport_header() with ipv6_hdr() + 1.
      
       - In vxlan_na_create(), replace first skb_transport_offset() with
         ipv6_hdr() + 1 and second with skb_network_offset() + sizeof(struct
         ipv6hdr). Additionally, ensure we pskb_may_pull() the whole skb as we
         need it to iterate over the options.
      Signed-off-by: Vincent Bernat <vincent@bernat.im>
      Signed-off-by: David S. Miller <davem@davemloft.net>
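The difference between pulling data into the linear area and merely peeking at a header can be sketched in user-space C. The following is a rough model of the skb_header_pointer() idea mentioned in f1fb08f6, not kernel code: `struct pkt_sketch`, `header_pointer`, and `is_icmpv6` are hypothetical names, and the packet is simplified to one linear area plus one fragment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IPV6_HDR_LEN     40 /* fixed IPv6 header size */
#define IPV6_NEXTHDR_OFF 6  /* offset of the Next Header field */
#define NEXTHDR_ICMPV6   58 /* ICMPv6 protocol number */

/* Simplified packet: a linear area followed by a single fragment. */
struct pkt_sketch {
	const uint8_t *linear;
	size_t linear_len;
	const uint8_t *frag;
	size_t frag_len;
};

/* Return a pointer to len bytes at offset off. If the range is fully
 * linear, point into the packet; otherwise copy into scratch. The packet
 * itself is never modified. Returns NULL if the packet is too short. */
static const uint8_t *header_pointer(const struct pkt_sketch *p, size_t off,
				     size_t len, uint8_t *scratch)
{
	size_t total, i;

	assert(p && scratch);
	total = p->linear_len + p->frag_len;
	if (off > total || len > total - off)
		return NULL;
	if (off + len <= p->linear_len)
		return p->linear + off;
	for (i = 0; i < len; i++) {
		size_t pos = off + i;

		scratch[i] = pos < p->linear_len ? p->linear[pos]
						 : p->frag[pos - p->linear_len];
	}
	return scratch;
}

/* Cheap first check: is this an IPv6 packet carrying ICMPv6?
 * Deeper validation is left to a later stage, as in the commit. */
static bool is_icmpv6(const struct pkt_sketch *p, size_t net_off)
{
	uint8_t scratch[IPV6_HDR_LEN];
	const uint8_t *hdr = header_pointer(p, net_off, IPV6_HDR_LEN, scratch);

	return hdr && hdr[IPV6_NEXTHDR_OFF] == NEXTHDR_ICMPV6;
}
```

The check depends only on the network header offset, which is why it keeps working when no transport header offset has been set.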
  3. 02 Apr, 2017 1 commit
  4. 29 Mar, 2017 1 commit
  5. 14 Mar, 2017 1 commit
  6. 13 Mar, 2017 1 commit
  7. 02 Mar, 2017 1 commit
  8. 25 Feb, 2017 2 commits
  9. 22 Feb, 2017 2 commits
  10. 18 Feb, 2017 1 commit
    • vxlan: fix oops in dev_fill_metadata_dst · 22f0708a
      Paolo Abeni committed
      Since the commit 0c1d70af ("net: use dst_cache for vxlan device")
      vxlan_fill_metadata_dst() calls vxlan_get_route() passing a NULL
      dst_cache pointer, so the latter should explicitly check for
      valid dst_cache ptr. Unfortunately the commit d71785ff ("net: add
      dst_cache to ovs vxlan lwtunnel") removed said check.
      
      As a result, it is possible to trigger a null pointer access by calling
      vxlan_fill_metadata_dst(), e.g. with:
      
      ovs-vsctl add-br ovs-br0
      ovs-vsctl add-port ovs-br0 vxlan0 -- set interface vxlan0 \
      	type=vxlan options:remote_ip=192.168.1.1 \
      	options:key=1234 options:dst_port=4789 ofport_request=10
      ip address add dev ovs-br0 172.16.1.2/24
      ovs-vsctl set Bridge ovs-br0 ipfix=@i -- --id=@i create IPFIX \
      	targets=\"172.16.1.1:1234\" sampling=1
      iperf -c 172.16.1.1 -u -l 1000 -b 10M -t 1 -p 1234
      
      This commit addresses the issue by passing to vxlan_get_route() the
      dst_cache already available in the lwt info processed by
      vxlan_fill_metadata_dst().
      
      Fixes: d71785ff ("net: add dst_cache to ovs vxlan lwtunnel")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
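The failure mode in 22f0708a, a lookup helper that dereferences a cache pointer the caller may pass as NULL, can be sketched in user-space C. All names here (`struct route_cache`, `get_route`, `slow_lookup`) are hypothetical; the sketch shows the general pattern of an optional cache, not the vxlan_get_route() code.

```c
#include <assert.h>
#include <stddef.h>

struct route {
	int id;
};

/* Optional per-tunnel cache holding the last resolved route. */
struct route_cache {
	struct route *cached;
};

static struct route slow_path_route = { .id = 1 };

/* Stand-in for a full routing table lookup. */
static struct route *slow_lookup(void)
{
	return &slow_path_route;
}

/* cache may be NULL; slow_lookups counts how often the slow path ran. */
static struct route *get_route(struct route_cache *cache, int *slow_lookups)
{
	struct route *rt;

	assert(slow_lookups != NULL);
	/* Guard every cache access: a NULL cache only disables caching. */
	if (cache && cache->cached)
		return cache->cached;
	rt = slow_lookup();
	(*slow_lookups)++;
	if (cache)
		cache->cached = rt;
	return rt;
}
```

A caller that already owns a cache, as the lwt info does in the commit, avoids both the crash and the repeated slow lookups by passing that cache instead of NULL.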
  11. 12 Feb, 2017 1 commit
  12. 04 Feb, 2017 1 commit
    • vxlan: support fdb and learning in COLLECT_METADATA mode · 3ad7a4b1
      Roopa Prabhu committed
      Vxlan COLLECT_METADATA mode today solves the per-vni netdev
      scalability problem in l3 networks. It expects all forwarding
      information to be present in dst_metadata. This patch series
      enhances collect metadata mode to include the case where only
      vni is present in dst_metadata, and the vxlan driver can then use
      the rest of the forwarding information database to make forwarding
      decisions. There is no change to default COLLECT_METADATA
      behaviour. These changes only apply to COLLECT_METADATA when
      used with the bridging use-case with a special dst_metadata
      tunnel info flag (eg: where vxlan device is part of a bridge).
      For all this to work, the vxlan driver will need to now support a
      single fdb table hashed by mac + vni. This series essentially makes
      this happen.
      
      use-case and workflow:
      vxlan collect metadata device participates in bridging vlan
      to vn-segments. Bridge driver above the vxlan device,
      sends the vni corresponding to the vlan in the dst_metadata.
      vxlan driver will lookup forwarding database with (mac + vni)
      for the required remote destination information to forward the
      packet.
      
      Changes introduced by this patch:
          - allow learning and forwarding database state in vxlan netdev in
            COLLECT_METADATA mode. Current behaviour is not changed
            by default. tunnel info flag IP_TUNNEL_INFO_BRIDGE is used
            to support the new bridge friendly mode.
          - A single fdb table hashed by (mac, vni) to allow fdb entries with
            multiple vnis in the same fdb table
          - rx path already has the vni
          - tx path expects a vni in the packet with dst_metadata
          - prior to this series, fdb remote_dsts carried remote vni and
            the vxlan device carrying the fdb table represented the
            source vni. With the vxlan device now representing multiple vnis,
            this patch adds a src vni attribute to the fdb entry. The remote
            vni already uses NDA_VNI attribute. This patch introduces
            NDA_SRC_VNI netlink attribute to represent the src vni in a multi
            vni fdb table.
      
      iproute2 example (patched and pruned iproute2 output to just show
      relevant fdb entries):
      example shows same host mac learnt on two vni's.
      
      before (netdev per vni):
      $bridge fdb show | grep "00:02:00:00:00:03"
      00:02:00:00:00:03 dev vxlan1001 dst 12.0.0.8 self
      00:02:00:00:00:03 dev vxlan1000 dst 12.0.0.8 self
      
      after this patch with collect metadata in bridged mode (single netdev):
      $bridge fdb show | grep "00:02:00:00:00:03"
      00:02:00:00:00:03 dev vxlan0 src_vni 1001 dst 12.0.0.8 self
      00:02:00:00:00:03 dev vxlan0 src_vni 1000 dst 12.0.0.8 self
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
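The idea of a single fdb table keyed by (mac, vni) from 3ad7a4b1 can be sketched as a small user-space hash table. Everything below is an assumption made for illustration: the struct layout, the fixed-size pool, the FNV-1a style hash, and the names `fdb_find` and `fdb_add` are not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FDB_BUCKETS 64
#define FDB_MAX     32

struct fdb_entry {
	uint8_t mac[6];
	uint32_t src_vni;  /* part of the key, next to the MAC */
	uint32_t remote;   /* opaque remote destination id */
	struct fdb_entry *next;
};

struct fdb_table {
	struct fdb_entry *bucket[FDB_BUCKETS];
	struct fdb_entry pool[FDB_MAX];
	size_t used;
};

/* Hash the MAC and the VNI together so that one MAC can appear
 * once per VNI in the same table. */
static unsigned int fdb_hash(const uint8_t mac[6], uint32_t vni)
{
	uint32_t h = 2166136261u;
	int i;

	for (i = 0; i < 6; i++)
		h = (h ^ mac[i]) * 16777619u;
	for (i = 0; i < 4; i++)
		h = (h ^ ((vni >> (8 * i)) & 0xffu)) * 16777619u;
	return h % FDB_BUCKETS;
}

static struct fdb_entry *fdb_find(struct fdb_table *t, const uint8_t mac[6],
				  uint32_t vni)
{
	struct fdb_entry *e;

	assert(t && mac);
	for (e = t->bucket[fdb_hash(mac, vni)]; e; e = e->next)
		if (e->src_vni == vni && memcmp(e->mac, mac, 6) == 0)
			return e;
	return NULL;
}

/* Insert or update; returns 0 on success, -1 when the pool is full. */
static int fdb_add(struct fdb_table *t, const uint8_t mac[6], uint32_t vni,
		   uint32_t remote)
{
	struct fdb_entry *e = fdb_find(t, mac, vni);
	unsigned int b;

	if (e) {
		e->remote = remote;
		return 0;
	}
	if (t->used == FDB_MAX)
		return -1;
	e = &t->pool[t->used++];
	memcpy(e->mac, mac, 6);
	e->src_vni = vni;
	e->remote = remote;
	b = fdb_hash(mac, vni);
	e->next = t->bucket[b];
	t->bucket[b] = e;
	return 0;
}
```

Adding 00:02:00:00:00:03 under VNI 1000 and again under VNI 1001 yields two distinct entries, which mirrors the two `src_vni` lines in the "after" output above.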
  13. 25 Jan, 2017 2 commits
  14. 21 Jan, 2017 1 commit
  15. 18 Jan, 2017 1 commit
  16. 12 Jan, 2017 1 commit
  17. 01 Dec, 2016 1 commit
  18. 18 Nov, 2016 1 commit
    • netns: make struct pernet_operations::id unsigned int · c7d03a00
      Alexey Dobriyan committed
      Make struct pernet_operations::id unsigned.
      
      There are 2 reasons to do so:
      
      1)
      This field is really an index into a zero-based array and
      thus is an unsigned entity. Using a negative value is an out-of-bounds
      access by definition.
      
      2)
      On x86_64 unsigned 32-bit data which are mixed with pointers
      via array indexing or offsets added or subtracted to pointers
      are preferred to signed 32-bit data.
      
      "int" being used as an array index needs to be sign-extended
      to 64-bit before being used.
      
      	void f(long *p, int i)
      	{
      		g(p[i]);
      	}
      
        roughly translates to
      
      	movsx	rsi, esi
      	mov	rdi, [rsi+...]
      	call 	g
      
      MOVSX is a 3-byte instruction which isn't necessary if the variable is
      unsigned, because x86_64 zero-extends by default.
      
      Now, there is net_generic() function which, you guessed it right, uses
      "int" as an array index:
      
      	static inline void *net_generic(const struct net *net, int id)
      	{
      		...
      		ptr = ng->ptr[id - 1];
      		...
      	}
      
      And this function is used a lot, so those sign extensions add up.
      
      Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
      messing with code generation):
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      
      Unfortunately some functions actually grow bigger.
      This is a seemingly random artefact of code generation with register
      allocator being used differently. gcc decides that some variable
      needs to live in new r8+ registers and every access now requires REX
      prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
      used which is longer than [r8]
      
      However, overall balance is in negative direction:
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      	function                                     old     new   delta
      	nfsd4_lock                                  3886    3959     +73
      	tipc_link_build_proto_msg                   1096    1140     +44
      	mac80211_hwsim_new_radio                    2776    2808     +32
      	tipc_mon_rcv                                1032    1058     +26
      	svcauth_gss_legacy_init                     1413    1429     +16
      	tipc_bcbase_select_primary                   379     392     +13
      	nfsd4_exchange_id                           1247    1260     +13
      	nfsd4_setclientid_confirm                    782     793     +11
      		...
      	put_client_renew_locked                      494     480     -14
      	ip_set_sockfn_get                            730     716     -14
      	geneve_sock_add                              829     813     -16
      	nfsd4_sequence_done                          721     703     -18
      	nlmclnt_lookup_host                          708     686     -22
      	nfsd4_lockt                                 1085    1063     -22
      	nfs_get_client                              1077    1050     -27
      	tcf_bpf_init                                1106    1076     -30
      	nfsd4_encode_fattr                          5997    5930     -67
      	Total: Before=154856051, After=154854321, chg -0.00%
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  19. 16 Nov, 2016 7 commits
  20. 10 Nov, 2016 1 commit
  21. 30 Oct, 2016 1 commit
  22. 21 Oct, 2016 2 commits
    • net: use core MTU range checking in core net infra · 91572088
      Jarod Wilson committed
      geneve:
      - Merge __geneve_change_mtu back into geneve_change_mtu, set max_mtu
      - This one isn't quite as straight-forward as others, could use some
        closer inspection and testing
      
      macvlan:
      - set min/max_mtu
      
      tun:
      - set min/max_mtu, remove tun_net_change_mtu
      
      vxlan:
      - Merge __vxlan_change_mtu back into vxlan_change_mtu
      - Set max_mtu to IP_MAX_MTU and retain dynamic MTU range checks in
        change_mtu function
      - This one is also not as straight-forward and could use closer inspection
        and testing from vxlan folks
      
      bridge:
      - set max_mtu of IP_MAX_MTU and retain dynamic MTU range checks in
        change_mtu function
      
      openvswitch:
      - set min/max_mtu, remove internal_dev_change_mtu
      - note: max_mtu wasn't checked previously, it's been set to 65535, which
        is the largest possible size supported
      
      sch_teql:
      - set min/max_mtu (note: max_mtu previously unchecked, used max of 65535)
      
      macsec:
      - min_mtu = 0, max_mtu = 65535
      
      macvlan:
      - min_mtu = 0, max_mtu = 65535
      
      ntb_netdev:
      - min_mtu = 0, max_mtu = 65535
      
      veth:
      - min_mtu = 68, max_mtu = 65535
      
      8021q:
      - min_mtu = 0, max_mtu = 65535
      
      CC: netdev@vger.kernel.org
      CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
      CC: Tom Herbert <tom@herbertland.com>
      CC: Daniel Borkmann <daniel@iogearbox.net>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Paolo Abeni <pabeni@redhat.com>
      CC: Jiri Benc <jbenc@redhat.com>
      CC: WANG Cong <xiyou.wangcong@gmail.com>
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      CC: Pravin B Shelar <pshelar@ovn.org>
      CC: Sabrina Dubroca <sd@queasysnail.net>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      CC: Pravin Shelar <pshelar@nicira.com>
      CC: Maxim Krasnyansky <maxk@qti.qualcomm.com>
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
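The central range check that the per-driver min_mtu/max_mtu values in 91572088 feed into can be sketched in user-space C. This is a simplified model under stated assumptions: `struct netdev_sketch` and `set_mtu` are hypothetical, treating `max_mtu == 0` as "no upper bound" is an assumption of the sketch, and the veth limits (68 and 65535) are the ones listed in the commit message.

```c
#include <assert.h>
#include <errno.h>

/* Minimal device model holding only the MTU-related fields. */
struct netdev_sketch {
	unsigned int mtu;
	unsigned int min_mtu;
	unsigned int max_mtu; /* assumption: 0 means no upper bound */
};

/* Validate new_mtu against the device limits before applying it.
 * Returns 0 on success or -EINVAL when the value is out of range. */
static int set_mtu(struct netdev_sketch *dev, int new_mtu)
{
	assert(dev != 0);
	if (new_mtu < 0 || (unsigned int)new_mtu < dev->min_mtu)
		return -EINVAL;
	if (dev->max_mtu > 0 && (unsigned int)new_mtu > dev->max_mtu)
		return -EINVAL;
	dev->mtu = (unsigned int)new_mtu;
	return 0;
}
```

With the check in one place, a driver such as tun or openvswitch only has to declare its limits and can drop its own change_mtu handler; drivers with dynamic limits, like vxlan and bridge in this commit, keep an additional check of their own.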
    • net: add recursion limit to GRO · fcd91dd4
      Sabrina Dubroca committed
      Currently, GRO can do unlimited recursion through the gro_receive
      handlers.  This was fixed for tunneling protocols by limiting tunnel GRO
      to one level with encap_mark, but both VLAN and TEB still have this
      problem.  Thus, the kernel is vulnerable to a stack overflow, if we
      receive a packet composed entirely of VLAN headers.
      
      This patch adds a recursion counter to the GRO layer to prevent stack
      overflow.  When a gro_receive function hits the recursion limit, GRO is
      aborted for this skb and it is processed normally.  This recursion
      counter is put in the GRO CB, but could be turned into a percpu counter
      if we run out of space in the CB.
      
      Thanks to Vladimír Beneš <vbenes@redhat.com> for the initial bug report.
      
      Fixes: CVE-2016-7039
      Fixes: 9b174d88 ("net: Add Transparent Ethernet Bridging GRO support.")
      Fixes: 66e5133f ("vlan: Add GRO support for non hardware accelerated vlan")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: Jiri Benc <jbenc@redhat.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
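The recursion counter described in fcd91dd4 can be sketched with a toy parser for stacked VLAN tags. This is a user-space model, not the GRO code: the function names, the buffer layout, and the limit of 15 are assumptions chosen for the illustration, and the depth is passed as an argument rather than stored in a control block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RECURSION_LIMIT 15     /* assumed value for this sketch */
#define ETHERTYPE_VLAN  0x8100

enum { GRO_OK = 0, GRO_ABORT = 1 };

static int vlan_receive(const uint8_t *p, size_t len, unsigned int depth);

/* p points at a 16-bit big-endian EtherType. Recurse into VLAN tags,
 * but give up once the nesting depth reaches the limit. */
static int dispatch(const uint8_t *p, size_t len, unsigned int depth)
{
	uint16_t proto;

	assert(p != NULL);
	if (len < 2)
		return GRO_ABORT;
	proto = (uint16_t)((p[0] << 8) | p[1]);
	if (proto != ETHERTYPE_VLAN)
		return GRO_OK; /* innermost protocol reached */
	if (depth >= RECURSION_LIMIT)
		return GRO_ABORT; /* too deep: fall back to normal processing */
	return vlan_receive(p + 2, len - 2, depth + 1);
}

/* Skip the 2-byte tag control field, then hand off to the next protocol. */
static int vlan_receive(const uint8_t *p, size_t len, unsigned int depth)
{
	if (len < 2)
		return GRO_ABORT;
	return dispatch(p + 2, len - 2, depth);
}

/* Test helper: write `tags` VLAN tags followed by an IPv4 EtherType.
 * buf must hold 4 * tags + 2 bytes; returns the number of bytes written. */
static size_t fill_vlan_stack(uint8_t *buf, unsigned int tags)
{
	unsigned int i;

	for (i = 0; i < tags; i++) {
		buf[4 * i] = 0x81;
		buf[4 * i + 1] = 0x00;
		buf[4 * i + 2] = 0x00;
		buf[4 * i + 3] = 0x01;
	}
	buf[4 * tags] = 0x08;
	buf[4 * tags + 1] = 0x00;
	return 4 * (size_t)tags + 2;
}
```

Without the depth check, a packet made only of VLAN headers would drive the mutual recursion as deep as the packet is long, which is the stack overflow the commit guards against.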
  23. 11 Sep, 2016 1 commit
  24. 07 Sep, 2016 1 commit
  25. 05 Sep, 2016 3 commits
    • vxlan: fix duplicated and wrong error messages · 3555621d
      Jiri Benc committed
      vxlan_dev_configure outputs error messages before returning, so there is
      no need to print the same messages again in vxlan_newlink. Also,
      vxlan_dev_configure may return a particular error code for a different
      reason than vxlan_newlink thinks.
      
      Move the remaining error messages into vxlan_dev_configure and let
      vxlan_newlink just pass on the error code.
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vxlan: reject multicast destination without an interface · 9b4cdd51
      Jiri Benc committed
      Currently, kernel accepts configurations such as:
      
        ip l a type vxlan dstport 4789 id 1 group 239.192.0.1
        ip l a type vxlan dstport 4789 id 1 group ff0e::110
      
      However, neither of those really works. In the IPv4 case, the interface
      cannot be brought up ("RTNETLINK answers: No such device"). This is because
      multicast join will be rejected without the interface being specified.
      
      In the IPv6 case, multicast will be joined on the first interface found. This
      is not what the user wants as it depends on random factors (order of
      interfaces).
      
      Note that it's possible to add a local address but it doesn't solve
      anything. For IPv4, it's not considered in the multicast join (thus the same
      error as above is returned on ifup). This could be added but it wouldn't
      help for IPv6 anyway. For IPv6, we do need the interface.
      
      Just reject a configuration that sets multicast address and does not provide
      an interface. Nobody can depend on the previous behavior as it never worked.
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
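The validation rule from 9b4cdd51, refusing a multicast destination when no interface is given, can be sketched in user-space C with the standard socket headers. `addr_is_multicast` and `validate_remote` are hypothetical helpers, and returning `-EINVAL` is an assumption of the sketch; the commit message does not name the error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Parse a textual address and report whether it is a multicast address. */
static bool addr_is_multicast(int family, const char *text)
{
	assert(text != NULL);
	if (family == AF_INET) {
		struct in_addr a;

		return inet_pton(AF_INET, text, &a) == 1 &&
		       IN_MULTICAST(ntohl(a.s_addr));
	}
	if (family == AF_INET6) {
		struct in6_addr a6;

		return inet_pton(AF_INET6, text, &a6) == 1 &&
		       IN6_IS_ADDR_MULTICAST(&a6);
	}
	return false;
}

/* ifindex 0 means "no interface specified".
 * Returns 0 if the configuration is acceptable, -EINVAL otherwise. */
static int validate_remote(int family, const char *remote, unsigned int ifindex)
{
	if (addr_is_multicast(family, remote) && ifindex == 0)
		return -EINVAL;
	return 0;
}
```

Both example groups from the commit message, 239.192.0.1 and ff0e::110, are rejected by this check when `ifindex` is 0 and accepted once an interface index is supplied.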
    • vxlan: call peernet2id() in fdb notification · 38f507f1
      WANG Cong committed
      netns id should be already allocated each time we change
      netns, that is, in dev_change_net_namespace() (more precisely
      in rtnl_fill_ifinfo()). It is safe to just call peernet2id() here.
      
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  26. 02 Sep, 2016 1 commit
    • rtnetlink: fdb dump: optimize by saving last interface markers · d297653d
      Roopa Prabhu committed
      fdb dumps spanning multiple skb's currently restart from the first
      interface again for every skb. This results in unnecessary
      iterations on the already visited interfaces and their fdb
      entries. In large scale setups, we have seen this to slow
      down fdb dumps considerably. On a system with 30k macs we
      see fdb dumps spanning across more than 300 skbs.
      
      To fix the problem, this patch replaces the existing single fdb
      marker with three markers: netdev hash entries, netdevs and fdb
      index to continue where we left off instead of restarting from the
      first netdev. This is consistent with link dumps.
      
      In the process of fixing the performance issue, this patch also
      re-implements fix done by
      commit 472681d5 ("net: ndo_fdb_dump should report -EMSGSIZE to rtnl_fdb_dump")
      (with an internal fix from Wilson Kok) in the following ways:
      - change ndo_fdb_dump handlers to return error code instead
      of the last fdb index
      - use cb->args strictly for dump frag markers and not error codes.
      This is consistent with other dump functions.
      
      Below results were taken on a system with 1000 netdevs
      and 35085 fdb entries:
      before patch:
      $time bridge fdb show | wc -l
      15065
      
      real    1m11.791s
      user    0m0.070s
      sys 1m8.395s
      
      (existing code does not return all macs)
      
      after patch:
      $time bridge fdb show | wc -l
      35085
      
      real    0m2.017s
      user    0m0.113s
      sys 0m1.942s
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
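The resume-marker idea from d297653d can be sketched as a chunked dump over several devices. This user-space model is simplified on purpose: it keeps two markers (device index and fdb index) where the commit describes three, and `struct dev_sketch`, `struct dump_cursor`, and `fdb_dump_chunk` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* A device with a flat array of fdb entries, modelled as ints. */
struct dev_sketch {
	const int *fdb;
	size_t n_fdb;
};

/* Markers saved between chunks, playing the role of the dump callback state. */
struct dump_cursor {
	size_t dev_idx;
	size_t fdb_idx;
};

/* Write up to cap entries into out, starting where the cursor left off.
 * Returns the number of entries written; 0 means the dump is complete. */
static size_t fdb_dump_chunk(const struct dev_sketch *devs, size_t n_devs,
			     struct dump_cursor *cur, int *out, size_t cap)
{
	size_t n = 0;

	assert(devs && cur && out);
	while (cur->dev_idx < n_devs) {
		const struct dev_sketch *d = &devs[cur->dev_idx];

		while (cur->fdb_idx < d->n_fdb) {
			if (n == cap)
				return n; /* chunk full: cursor marks the resume point */
			out[n++] = d->fdb[cur->fdb_idx++];
		}
		/* Device finished: advance to the next one and reset the index. */
		cur->dev_idx++;
		cur->fdb_idx = 0;
	}
	return n;
}
```

Each call does work proportional to the entries it emits plus the devices it passes, instead of rescanning every device already visited, which is the cost the commit's timing numbers reflect.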
  27. 27 Aug, 2016 1 commit