1. 11 Aug, 2020 2 commits
  2. 09 Aug, 2020 5 commits
  3. 08 Aug, 2020 2 commits
  4. 07 Aug, 2020 1 commit
    • bpf: Change uapi for bpf iterator map elements · 5e7b3020
      Authored by Yonghong Song
      Commit a5cbe05a ("bpf: Implement bpf iterator for
      map elements") added bpf iterator support for
      map elements. The map element bpf iterator requires
      info to identify a particular map. In the above
      commit, the attr->link_create.target_fd is used
      to carry map_fd and an enum bpf_iter_link_info
      is added to uapi to specify the target_fd actually
      representing a map_fd:
      	enum bpf_iter_link_info {
      		BPF_ITER_LINK_UNSPEC = 0,
      		BPF_ITER_LINK_MAP_FD = 1,

      		MAX_BPF_ITER_LINK_INFO,
      	};
      
      This is an extensible approach as we can grow
      enumerator for pid, cgroup_id, etc. and we can
      unionize target_fd for pid, cgroup_id, etc.
      But in the future, there are chances that
      more complex customization may happen, e.g.,
      for tasks, it could be filtered based on
      both cgroup_id and user_id.
      
      This patch changed the uapi to have fields
      	__aligned_u64	iter_info;
      	__u32		iter_info_len;
      for additional iter_info for link_create.
      The iter_info is defined as
      	union bpf_iter_link_info {
      		struct {
      			__u32   map_fd;
      		} map;
      	};
      
      So future extension for additional customization
      will be easier. The bpf_iter_link_info will be
      passed to the target callback for validation, and the
      generic bpf_iter framework does not need to deal with
      it any more.
      
      Note that map_fd = 0 will be considered invalid
      and -EBADF will be returned to user space.
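
A minimal user-space sketch of the scheme described above, under stated assumptions: `union iter_link_info` is a local mirror of the union quoted in this message, and `load_iter_info()` and `check_map_iter_info()` are hypothetical helper names, not kernel functions. It only illustrates the split between a generic layer that copies a length-tagged blob and a target that validates it, including the rule that map_fd = 0 yields -EBADF.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the union quoted in the commit message (assumption). */
union iter_link_info {
	struct {
		uint32_t map_fd;
	} map;
};

/*
 * Hypothetical generic step: copy at most sizeof(*dst) bytes of the
 * caller's blob and zero-fill the rest, without interpreting it.
 * A pointer without a length (or the reverse) is rejected.
 */
static int load_iter_info(union iter_link_info *dst, const void *src,
			  uint32_t len)
{
	size_t n = len < sizeof(*dst) ? len : sizeof(*dst);

	assert(dst != NULL);
	if ((src == NULL) != (len == 0))
		return -EINVAL;

	memset(dst, 0, sizeof(*dst));
	if (src != NULL)
		memcpy(dst, src, n);
	return 0;
}

/*
 * Hypothetical target-side check for the map element iterator:
 * map_fd == 0 is treated as "no map given" and reported as -EBADF.
 */
static int check_map_iter_info(const union iter_link_info *info)
{
	assert(info != NULL);
	if (info->map.map_fd == 0)
		return -EBADF;
	return 0;
}
```

Because the generic step never looks inside the union, a new target can add its own member and its own check without touching the common code.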
      
      Fixes: a5cbe05a ("bpf: Implement bpf iterator for map elements")
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200805055056.1457463-1-yhs@fb.com
  5. 06 Aug, 2020 5 commits
    • ip_tunnel_core: Fix build for archs without _HAVE_ARCH_IPV6_CSUM · 8ed54f16
      Authored by Stefano Brivio
      On architectures defining _HAVE_ARCH_IPV6_CSUM, we get
      csum_ipv6_magic() defined by means of arch checksum.h headers. On
      other architectures, we actually need to include net/ip6_checksum.h
      to be able to use it.
      
      Without this include, building with defconfig breaks at least for
      s390.

      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Fixes: 4cb47a86 ("tunnels: PMTU discovery support for directly bridged IP packets")
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: be careful on subflow creation · adf73410
      Authored by Paolo Abeni
      Nicolas reported the following oops:
      
      [ 1521.392541] BUG: kernel NULL pointer dereference, address: 00000000000000c0
      [ 1521.394189] #PF: supervisor read access in kernel mode
      [ 1521.395376] #PF: error_code(0x0000) - not-present page
      [ 1521.396607] PGD 0 P4D 0
      [ 1521.397156] Oops: 0000 [#1] SMP PTI
      [ 1521.398020] CPU: 0 PID: 22986 Comm: kworker/0:2 Not tainted 5.8.0-rc4+ #109
      [ 1521.399618] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [ 1521.401728] Workqueue: events mptcp_worker
      [ 1521.402651] RIP: 0010:mptcp_subflow_create_socket+0xf1/0x1c0
      [ 1521.403954] Code: 24 08 89 44 24 04 48 8b 7a 18 e8 2a 48 d4 ff 8b 44 24 04 85 c0 75 7a 48 8b 8b 78 02 00 00 48 8b 54 24 08 48 8d bb 80 00 00 00 <48> 8b 89 c0 00 00 00 48 89 8a c0 00 00 00 48 8b 8b 78 02 00 00 8b
      [ 1521.408201] RSP: 0000:ffffabc4002d3c60 EFLAGS: 00010246
      [ 1521.409433] RAX: 0000000000000000 RBX: ffffa0b9ad8c9a00 RCX: 0000000000000000
      [ 1521.411096] RDX: ffffa0b9ae78a300 RSI: 00000000fffffe01 RDI: ffffa0b9ad8c9a80
      [ 1521.412734] RBP: ffffa0b9adff2e80 R08: ffffa0b9af02d640 R09: ffffa0b9ad923a00
      [ 1521.414333] R10: ffffabc4007139f8 R11: fefefefefefefeff R12: ffffabc4002d3cb0
      [ 1521.415918] R13: ffffa0b9ad91fa58 R14: ffffa0b9ad8c9f9c R15: 0000000000000000
      [ 1521.417592] FS:  0000000000000000(0000) GS:ffffa0b9af000000(0000) knlGS:0000000000000000
      [ 1521.419490] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1521.420839] CR2: 00000000000000c0 CR3: 000000002951e006 CR4: 0000000000160ef0
      [ 1521.422511] Call Trace:
      [ 1521.423103]  __mptcp_subflow_connect+0x94/0x1f0
      [ 1521.425376]  mptcp_pm_create_subflow_or_signal_addr+0x200/0x2a0
      [ 1521.426736]  mptcp_worker+0x31b/0x390
      [ 1521.431324]  process_one_work+0x1fc/0x3f0
      [ 1521.432268]  worker_thread+0x2d/0x3b0
      [ 1521.434197]  kthread+0x117/0x130
      [ 1521.435783]  ret_from_fork+0x22/0x30
      
      on some unconventional configuration.
      
      The MPTCP protocol is trying to create a subflow for an
      unaccepted server socket. That is allowed by the RFC, even
      if subflow creation will likely fail.
      Unaccepted sockets still have a NULL sk_socket field;
      avoid the issue by failing earlier.
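
The "fail earlier" idea can be sketched outside the kernel as follows. This is only an illustration: `struct mock_socket`, `struct mock_sock` and `create_subflow()` are hypothetical stand-ins for the kernel types and for mptcp_subflow_create_socket(), and the -EINVAL return value is an assumption, since the message above does not name the error code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's struct socket and struct sock. */
struct mock_socket {
	int file_owner;
};

struct mock_sock {
	struct mock_socket *sk_socket;	/* NULL until the socket is accepted */
};

/*
 * Copy per-socket info from the parent to a new subflow.
 * The NULL check comes before any dereference of sk_socket, so an
 * unaccepted parent produces an error instead of an oops.
 */
static int create_subflow(const struct mock_sock *sk, int *owner_out)
{
	assert(sk != NULL && owner_out != NULL);

	if (sk->sk_socket == NULL)
		return -EINVAL;	/* assumed error code */

	*owner_out = sk->sk_socket->file_owner;
	return 0;
}
```

With an unaccepted parent, `create_subflow()` returns before touching `sk_socket`, and `*owner_out` is left unchanged.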

      Reported-and-tested-by: Nicolas Rybowski <nicolas.rybowski@tessares.net>
      Fixes: 7d14b0d2 ("mptcp: set correct vfs info for subflows")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: set ub->ifindex for local ipv6 address · 5a6f6f57
      Authored by Xin Long
      Without ub->ifindex set for an ipv6 address in tipc_udp_enable(),
      ipv6_sock_mc_join() may make the wrong dev join the multicast
      address in enable_mcast(). As a result, tipc links would
      never be created.
      
      So fix it by getting the right netdev and setting ub->ifindex,
      as it does for ipv4 address.

      Reported-by: Shuang Li <shuali@redhat.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: add ipv6_dev_find() · 81f6cb31
      Authored by Xin Long
      Add an ip_dev_find()-like function for ipv6, used to find
      the dev by saddr.

      It will be used by the TIPC protocol, so export it as well.

      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: openvswitch: silence suspicious RCU usage warning · 5845589e
      Authored by Tonghao Zhang
      ovs_flow_tbl_destroy is always called from an RCU callback
      or an error path, so there is no need to check whether
      rcu_read_lock or lockdep_ovsl_is_held was held.

      ovs_dp_cmd_fill_info is always called with ovs_mutex held,
      so use rcu_dereference_ovsl instead of rcu_dereference
      in ovs_flow_tbl_masks_cache_size.
      
      Fixes: 9bf24f59 ("net: openvswitch: make masks cache size configurable")
      Cc: Eelco Chaudron <echaudro@redhat.com>
      Reported-by: syzbot+c0eb9e7cdde04e4eb4be@syzkaller.appspotmail.com
      Reported-by: syzbot+f612c02823acb02ff9bc@syzkaller.appspotmail.com
      Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 05 Aug, 2020 2 commits
    • tunnels: PMTU discovery support for directly bridged IP packets · 4cb47a86
      Authored by Stefano Brivio
      It's currently possible to bridge Ethernet tunnels carrying IP
      packets directly to external interfaces without assigning them
      addresses and routes on the bridged network itself: this is the case
      for UDP tunnels bridged with a standard bridge or by Open vSwitch.
      
      PMTU discovery is currently broken with those configurations, because
      the encapsulation effectively decreases the MTU of the link, and
      while we are able to account for this using PMTU discovery on the
      lower layer, we don't have a way to relay ICMP or ICMPv6 messages
      needed by the sender, because we don't have valid routes to it.
      
      On the other hand, as a tunnel endpoint, we can't fragment packets
      as a general approach: this is for instance clearly forbidden for
      VXLAN by RFC 7348, section 4.3:
      
         VTEPs MUST NOT fragment VXLAN packets.  Intermediate routers may
         fragment encapsulated VXLAN packets due to the larger frame size.
         The destination VTEP MAY silently discard such VXLAN fragments.
      
      The same paragraph recommends that the MTU over the physical network
      accommodates encapsulations, but this isn't a practical option for
      complex topologies, especially for typical Open vSwitch use cases.
      
      Further, it states that:
      
         Other techniques like Path MTU discovery (see [RFC1191] and
         [RFC1981]) MAY be used to address this requirement as well.
      
      Now, PMTU discovery already works for routed interfaces: we get
      route exceptions created by the encapsulation device as they receive
      ICMP Fragmentation Needed and ICMPv6 Packet Too Big messages, and
      we already rebuild those messages with the appropriate MTU and route
      them back to the sender.
      
      Add the missing bits for bridged cases:
      
      - checks in skb_tunnel_check_pmtu() to understand if it's appropriate
        to trigger a reply according to RFC 1122 section 3.2.2 for ICMP and
        RFC 4443 section 2.4 for ICMPv6. This function is already called by
        UDP tunnels
      
      - a new function generating those ICMP or ICMPv6 replies. We can't
        reuse icmp_send() and icmp6_send() as we don't see the sender as a
        valid destination. This doesn't need to be generic, as we don't
        cover any other type of ICMP errors given that we only provide an
        encapsulation function to the sender
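
The kind of eligibility check the first item refers to can be sketched for IPv4 as below, following the conditions listed in RFC 1122 section 3.2.2. This is not the code in skb_tunnel_check_pmtu(): `struct pkt_info` and all helper names are hypothetical, addresses are plain host-order integers, and subnet-directed broadcasts are not detected because that would need the interface netmask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of the offending IPv4 packet (host byte order). */
struct pkt_info {
	uint32_t saddr;
	uint32_t daddr;
	uint16_t frag_offset;	/* in 8-byte units, 0 for the first fragment */
	bool is_icmp_error;	/* the packet is itself an ICMP error message */
	bool l2_broadcast;	/* it arrived as a link-layer broadcast */
};

static bool is_multicast(uint32_t a) { return (a >> 28) == 0xE; } /* 224.0.0.0/4 */
static bool is_class_e(uint32_t a) { return (a >> 28) == 0xF; }   /* 240.0.0.0/4 */
static bool is_loopback(uint32_t a) { return (a >> 24) == 127; }  /* 127.0.0.0/8 */

/* Return true if replying with an ICMP error is permitted. */
static bool may_send_icmp_error(const struct pkt_info *p)
{
	assert(p != NULL);

	if (p->is_icmp_error)		/* never answer an error with an error */
		return false;
	if (p->l2_broadcast)
		return false;
	if (is_multicast(p->daddr) || p->daddr == 0xFFFFFFFFu)
		return false;
	if (p->frag_offset != 0)	/* non-initial fragment */
		return false;
	/* The source must identify a single host. */
	if (p->saddr == 0 || is_loopback(p->saddr) ||
	    is_multicast(p->saddr) || is_class_e(p->saddr))
		return false;

	return true;
}
```

The limited broadcast address 255.255.255.255 falls inside 240.0.0.0/4, so `is_class_e()` already covers it on the source side.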
      
      While at it, make the MTU check in skb_tunnel_check_pmtu() accurate:
      we might receive GSO buffers here, and the passed headroom already
      includes the inner MAC length, so we don't have to account for it
      a second time (that would imply three MAC headers on the wire, but
      there are just two).
      
      This issue became visible while bridging IPv6 packets with 4500 bytes
      of payload over GENEVE using IPv4 with a PMTU of 4000. Given the 50
      bytes of encapsulation headroom, we would advertise MTU as 3950, and
      we would reject fragmented IPv6 datagrams of 3958 bytes size on the
      wire. We're exclusively dealing with network MTU here, though, so we
      could get Ethernet frames up to 3964 octets in that case.
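
The figures in the paragraph above can be reproduced with a short calculation. The sketch below is a worked example only, with hypothetical function names. It assumes the 50 bytes of headroom break down as IPv4 (20) + UDP (8) + GENEVE base header without options (8) + inner Ethernet header (14), and that a non-final IPv6 fragment carries a multiple of 8 bytes of data after its 40-byte header and 8-byte fragment header; the message itself states only the totals.

```c
#include <assert.h>
#include <stdint.h>

/* Standard header sizes in bytes. */
enum {
	ETH_HDR = 14,
	IPV4_HDR = 20,
	UDP_HDR = 8,
	GENEVE_HDR = 8,		/* base header, no options (assumption) */
	IPV6_HDR = 40,
	IPV6_FRAG_HDR = 8,
};

/* Headroom of GENEVE over IPv4, inner MAC header included. */
static uint32_t geneve4_headroom(void)
{
	return IPV4_HDR + UDP_HDR + GENEVE_HDR + ETH_HDR;
}

/* Network-layer MTU to advertise to the sender. */
static uint32_t inner_mtu(uint32_t pmtu, uint32_t headroom)
{
	assert(pmtu > headroom);
	return pmtu - headroom;
}

/* Largest inner Ethernet frame that still fits that MTU. */
static uint32_t max_inner_frame(uint32_t mtu)
{
	return mtu + ETH_HDR;
}

/* On-wire size of the largest non-final IPv6 fragment for that MTU. */
static uint32_t ipv6_fragment_frame(uint32_t mtu)
{
	uint32_t data;

	assert(mtu > IPV6_HDR + IPV6_FRAG_HDR);
	data = (mtu - IPV6_HDR - IPV6_FRAG_HDR) & ~7u;	/* round down to 8 */
	return ETH_HDR + IPV6_HDR + IPV6_FRAG_HDR + data;
}
```

With a PMTU of 4000 this gives an advertised MTU of 3950, fragments of 3958 bytes on the wire and a frame limit of 3964 octets, so those fragments are within the limit and should not be rejected.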
      
      v2:
      - moved skb_tunnel_check_pmtu() to ip_tunnel_core.c (David Ahern)
      - split IPv4/IPv6 functions (David Ahern)

      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: route: Ignore output interface in FIB lookup for PMTU route · df23bb18
      Authored by Stefano Brivio
      Currently, processes sending traffic to a local bridge with an
      encapsulation device as a port don't get ICMP errors if they exceed
      the PMTU of the encapsulated link.
      
      David Ahern suggested this as a hack, but it actually looks like
      the correct solution: when we update the PMTU for a given destination
      by means of updating or creating a route exception, the encapsulation
      might trigger this because of PMTU discovery happening either on the
      encapsulation device itself, or its lower layer. This happens on
      bridged encapsulations only.
      
      The output interface shouldn't matter, because we already have a
      valid destination. Drop the output interface restriction from the
      associated route lookup.
      
      For UDP tunnels, we will now have a route exception created for the
      encapsulation itself, with a MTU value reflecting its headroom, which
      allows a bridge forwarding IP packets originated locally to deliver
      errors back to the sending socket.
      
      The behaviour is now consistent with IPv6 and verified with selftests
      pmtu_ipv{4,6}_br_{geneve,vxlan}{4,6}_exception introduced later in
      this series.
      
      v2:
      - reset output interface only for bridge ports (David Ahern)
      - add and use netif_is_any_bridge_port() helper (David Ahern)

      Suggested-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 04 Aug, 2020 23 commits