  1. 29 May 2020, 5 commits
  2. 28 May 2020, 3 commits
  3. 27 May 2020, 1 commit
  4. 26 May 2020, 2 commits
    • tcp: allow traceroute -Mtcp for unpriv users · 45af29ca
      Committed by Eric Dumazet
      Unprivileged users can use traceroute over plain UDP sockets, but not over TCP ones.
      
      $ traceroute -Mtcp 8.8.8.8
      You do not have enough privileges to use this traceroute method.
      
      $ traceroute -n -Mudp 8.8.8.8
      traceroute to 8.8.8.8 (8.8.8.8), 30 hops max, 60 byte packets
       1  192.168.86.1  3.631 ms  3.512 ms  3.405 ms
       2  10.1.10.1  4.183 ms  4.125 ms  4.072 ms
       3  96.120.88.125  20.621 ms  19.462 ms  20.553 ms
       4  96.110.177.65  24.271 ms  25.351 ms  25.250 ms
       5  69.139.199.197  44.492 ms  43.075 ms  44.346 ms
       6  68.86.143.93  27.969 ms  25.184 ms  25.092 ms
       7  96.112.146.18  25.323 ms 96.112.146.22  25.583 ms 96.112.146.26  24.502 ms
       8  72.14.239.204  24.405 ms 74.125.37.224  16.326 ms  17.194 ms
       9  209.85.251.9  18.154 ms 209.85.247.55  14.449 ms 209.85.251.9  26.296 ms^C
      
      We can easily support traceroute over TCP by queueing an error message
      into the socket error queue.
      
      Note that applications need to set the IP_RECVERR/IPV6_RECVERR option to
      enable this feature, and that the error message is only queued
      while the socket is in the SYN_SENT state.
      
      socket(AF_INET6, SOCK_STREAM, IPPROTO_IP) = 3
      setsockopt(3, SOL_IPV6, IPV6_RECVERR, [1], 4) = 0
      setsockopt(3, SOL_SOCKET, SO_TIMESTAMP_OLD, [1], 4) = 0
      setsockopt(3, SOL_IPV6, IPV6_UNICAST_HOPS, [5], 4) = 0
      connect(3, {sa_family=AF_INET6, sin6_port=htons(8787), sin6_flowinfo=htonl(0),
              inet_pton(AF_INET6, "2002:a05:6608:297::", &sin6_addr), sin6_scope_id=0}, 28) = -1 EHOSTUNREACH (No route to host)
      recvmsg(3, {msg_name={sa_family=AF_INET6, sin6_port=htons(8787), sin6_flowinfo=htonl(0),
              inet_pton(AF_INET6, "2002:a05:6608:297::", &sin6_addr), sin6_scope_id=0},
              msg_namelen=1024->28, msg_iov=[{iov_base="`\r\337\320\0004\6\1&\7\370\260\200\231\16\27\0\0\0\0\0\0\0\0 \2\n\5f\10\2\227"..., iov_len=1024}],
              msg_iovlen=1, msg_control=[{cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=SO_TIMESTAMP_OLD, cmsg_data={tv_sec=1590340680, tv_usec=272424}},
                                         {cmsg_len=60, cmsg_level=SOL_IPV6, cmsg_type=IPV6_RECVERR}],
              msg_controllen=96, msg_flags=MSG_ERRQUEUE}, MSG_ERRQUEUE) = 144
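A minimal Python sketch of the userspace side of the trace above, for IPv4 only. The numeric fallbacks for `IP_RECVERR` and `MSG_ERRQUEUE`, the `probe_hop` helper and its parameters are assumptions made for illustration; the layout decoded by `parse_sock_extended_err` follows `struct sock_extended_err` as documented in ip(7). This is not the traceroute implementation.

```python
import errno
import socket
import struct

# Linux constants; the numeric fallbacks are assumptions for Python builds
# that do not export these names.
IP_RECVERR = getattr(socket, "IP_RECVERR", 11)
MSG_ERRQUEUE = getattr(socket, "MSG_ERRQUEUE", 0x2000)

# struct sock_extended_err: u32 ee_errno, u8 ee_origin, u8 ee_type,
# u8 ee_code, u8 ee_pad, u32 ee_info, u32 ee_data (16 bytes, native order).
_EE = struct.Struct("=IBBBBII")


def parse_sock_extended_err(cmsg_data):
    """Decode the fixed header of an IP_RECVERR control message."""
    ee_errno, origin, icmp_type, icmp_code, _pad, info, data = _EE.unpack_from(cmsg_data)
    return {"errno": ee_errno, "origin": origin, "type": icmp_type,
            "code": icmp_code, "info": info, "data": data}


def probe_hop(dest, port, ttl, timeout=2.0):
    """Hypothetical helper: send one TCP SYN with a limited TTL.

    Returns the decoded error if one was queued, otherwise None.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.IPPROTO_IP, IP_RECVERR, 1)
        s.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
        s.settimeout(timeout)
        try:
            s.connect((dest, port))
            return None  # handshake completed, no error to read
        except OSError:
            pass  # expected when an intermediate hop drops the SYN
        try:
            _, ancdata, _, _ = s.recvmsg(1024, 1024, MSG_ERRQUEUE)
        except OSError:
            return None  # nothing was queued for this probe
        for level, ctype, cdata in ancdata:
            if level == socket.IPPROTO_IP and ctype == IP_RECVERR:
                return parse_sock_extended_err(cdata)
        return None
```

Calling `probe_hop` with increasing `ttl` values would give one decoded error per hop on a kernel that carries this patch; on older kernels the error queue of a TCP socket simply stays empty.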
      
      Suggested-by: Maciej Żenczykowski <maze@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Maciej Żenczykowski <maze@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: potential underflow in compat_ip_setsockopt() · 6a1015b0
      Committed by Dan Carpenter
      The value of "n" is capped at 0x1ffffff but it is not checked for negative
      values.  I don't think this causes a problem but I'm not certain and
      it's harmless to prevent it.
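The concern can be illustrated with a small Python sketch. It is not the kernel code: `as_s32`, `upper_bound_only` and `both_bounds` are hypothetical names, and the only value taken from the commit message is the 0x1ffffff cap.

```python
import struct

MAX_N = 0x1FFFFFF  # upper cap mentioned in the commit message


def as_s32(value):
    """Reinterpret the low 32 bits of value as a signed C int."""
    return struct.unpack("=i", struct.pack("=I", value & 0xFFFFFFFF))[0]


def upper_bound_only(n):
    """Check that only caps the value from above."""
    return n <= MAX_N


def both_bounds(n):
    """Check that also rejects negative values."""
    return 0 <= n <= MAX_N


n = as_s32(0xFFFFFFF0)  # a large unsigned value read back as a signed int
```

With only the upper bound in place, the negative `n` passes validation; adding the lower bound rejects it.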
      
      Fixes: 2e041728 ("ipv4: do compat setsockopt for MCAST_MSFILTER directly")
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 23 May 2020, 2 commits
  6. 22 May 2020, 2 commits
    • net: don't return invalid table id error when we fall back to PF_UNSPEC · 41b4bd98
      Committed by Sabrina Dubroca
      In case we can't find a ->dumpit callback for the requested
      (family,type) pair, we fall back to (PF_UNSPEC,type). In effect, we're
      in the same situation as if userspace had requested a PF_UNSPEC
      dump. For RTM_GETROUTE, that handler is rtnl_dump_all, which calls all
      the registered RTM_GETROUTE handlers.
      
      The requested table id may or may not exist for all of those
      families. commit ae677bbb ("net: Don't return invalid table id
      error when dumping all families") fixed the problem when userspace
      explicitly requests a PF_UNSPEC dump, but missed the fallback case.
      
      For example, when we pass ipv6.disable=1 to a kernel with
      CONFIG_IP_MROUTE=y and CONFIG_IP_MROUTE_MULTIPLE_TABLES=y,
      the (PF_INET6, RTM_GETROUTE) handler isn't registered, so we end up in
      rtnl_dump_all, and listing IPv6 routes will unexpectedly print:
      
        # ip -6 r
        Error: ipv4: MR table does not exist.
        Dump terminated
      
      commit ae677bbb introduced the dump_all_families variable, which
      gets set when userspace requests a PF_UNSPEC dump. However, we can't
      simply set the family to PF_UNSPEC in rtnetlink_rcv_msg in the
      fallback case to get dump_all_families == true, because some message
      types (for example RTM_GETRULE and RTM_GETNEIGH) only register the
      PF_UNSPEC handler and use the family to filter in the kernel what is
      dumped to userspace. We would then export more entries, that userspace
      would have to filter. iproute does that, but other programs may not.
      
      Instead, this patch removes dump_all_families and updates the
      RTM_GETROUTE handlers to check if the family that is being dumped is
      their own. When it's not, which covers both the intentional PF_UNSPEC
      dumps (as dump_all_families did) and the fallback case, ignore the
      missing table id error.
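The approach described above can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions, not the rtnetlink code: the handler registry is a plain dict, and `make_route_dumper`, `dump_routes` and `TableMissing` are hypothetical names.

```python
PF_INET, PF_INET6 = 2, 10


class TableMissing(Exception):
    """Stands in for the 'table does not exist' netlink error."""


def make_route_dumper(own_family, tables):
    """Build a RTM_GETROUTE-style handler for one address family."""
    def dump(requested_family, table_id):
        if table_id not in tables:
            if requested_family != own_family:
                # Not a dump aimed at this family (PF_UNSPEC dump or the
                # fallback case): a missing table is not an error.
                return []
            raise TableMissing(table_id)
        return list(tables[table_id])
    return dump


def dump_routes(handlers, family, table_id):
    """Dispatch a dump; fall back to calling every handler if none matches."""
    handler = handlers.get(family)
    if handler is not None:
        return handler(family, table_id)
    routes = []
    for fallback in handlers.values():
        routes.extend(fallback(family, table_id))
    return routes
```

In this model a `PF_INET6` request with no IPv6 handler registered returns an empty list instead of surfacing the IPv4 handler's missing-table error, while a direct `PF_INET` request for a missing table still fails.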
      
      Fixes: cb167893 ("net: Plumb support for filtering ipv4 and ipv6 multicast route dumps")
      Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipip: fix wrong address family in init error path · 57ebc8f0
      Committed by Vadim Fedorenko
      In case of error with MPLS support the code is misusing AF_INET
      instead of AF_MPLS.
      
      Fixes: 1b69e7e6 ("ipip: support MPLS over IPv4")
      Signed-off-by: Vadim Fedorenko <vfedorenko@novek.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 21 May 2020, 9 commits
    • net: nlmsg_cancel() if put fails for nhmsg · d69100b8
      Committed by Stephen Worley
      Fixes data remnant seen when we fail to reserve space for a
      nexthop group during a larger dump.
      
      If we fail the reservation, we goto nla_put_failure and
      cancel the message.
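The cancel-on-failure pattern can be sketched with a toy buffer in Python. `DumpBuffer` and `fill_entry` are hypothetical stand-ins for the skb and the nexthop fill routine, not kernel APIs.

```python
class DumpBuffer:
    """Fixed-capacity buffer standing in for the netlink dump skb."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = bytearray()

    def begin(self):
        """Remember where the current message starts."""
        return len(self.data)

    def put(self, payload):
        """Append payload if it fits; report failure otherwise."""
        if len(self.data) + len(payload) > self.capacity:
            return False
        self.data += payload
        return True

    def cancel(self, mark):
        """Drop everything written since mark."""
        del self.data[mark:]


def fill_entry(buf, header, group):
    """Write one entry; on any failed put, remove the partial entry."""
    mark = buf.begin()
    if not buf.put(header) or not buf.put(group):
        buf.cancel(mark)
        return False
    return True
```

Without the `cancel` call, the header of the entry that did not fit would stay in the buffer and be sent to userspace as a remnant.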
      
      Reproduce with the following iproute2 commands:
      =====================
      ip link add dummy1 type dummy
      ip link add dummy2 type dummy
      ip link add dummy3 type dummy
      ip link add dummy4 type dummy
      ip link add dummy5 type dummy
      ip link add dummy6 type dummy
      ip link add dummy7 type dummy
      ip link add dummy8 type dummy
      ip link add dummy9 type dummy
      ip link add dummy10 type dummy
      ip link add dummy11 type dummy
      ip link add dummy12 type dummy
      ip link add dummy13 type dummy
      ip link add dummy14 type dummy
      ip link add dummy15 type dummy
      ip link add dummy16 type dummy
      ip link add dummy17 type dummy
      ip link add dummy18 type dummy
      ip link add dummy19 type dummy
      ip link add dummy20 type dummy
      ip link add dummy21 type dummy
      ip link add dummy22 type dummy
      ip link add dummy23 type dummy
      ip link add dummy24 type dummy
      ip link add dummy25 type dummy
      ip link add dummy26 type dummy
      ip link add dummy27 type dummy
      ip link add dummy28 type dummy
      ip link add dummy29 type dummy
      ip link add dummy30 type dummy
      ip link add dummy31 type dummy
      ip link add dummy32 type dummy
      
      ip link set dummy1 up
      ip link set dummy2 up
      ip link set dummy3 up
      ip link set dummy4 up
      ip link set dummy5 up
      ip link set dummy6 up
      ip link set dummy7 up
      ip link set dummy8 up
      ip link set dummy9 up
      ip link set dummy10 up
      ip link set dummy11 up
      ip link set dummy12 up
      ip link set dummy13 up
      ip link set dummy14 up
      ip link set dummy15 up
      ip link set dummy16 up
      ip link set dummy17 up
      ip link set dummy18 up
      ip link set dummy19 up
      ip link set dummy20 up
      ip link set dummy21 up
      ip link set dummy22 up
      ip link set dummy23 up
      ip link set dummy24 up
      ip link set dummy25 up
      ip link set dummy26 up
      ip link set dummy27 up
      ip link set dummy28 up
      ip link set dummy29 up
      ip link set dummy30 up
      ip link set dummy31 up
      ip link set dummy32 up
      
      ip link set dummy33 up
      ip link set dummy34 up
      
      ip link set vrf-red up
      ip link set vrf-blue up
      
      ip link set dummyVRFred up
      ip link set dummyVRFblue up
      
      ip ro add 1.1.1.1/32 dev dummy1
      ip ro add 1.1.1.2/32 dev dummy2
      ip ro add 1.1.1.3/32 dev dummy3
      ip ro add 1.1.1.4/32 dev dummy4
      ip ro add 1.1.1.5/32 dev dummy5
      ip ro add 1.1.1.6/32 dev dummy6
      ip ro add 1.1.1.7/32 dev dummy7
      ip ro add 1.1.1.8/32 dev dummy8
      ip ro add 1.1.1.9/32 dev dummy9
      ip ro add 1.1.1.10/32 dev dummy10
      ip ro add 1.1.1.11/32 dev dummy11
      ip ro add 1.1.1.12/32 dev dummy12
      ip ro add 1.1.1.13/32 dev dummy13
      ip ro add 1.1.1.14/32 dev dummy14
      ip ro add 1.1.1.15/32 dev dummy15
      ip ro add 1.1.1.16/32 dev dummy16
      ip ro add 1.1.1.17/32 dev dummy17
      ip ro add 1.1.1.18/32 dev dummy18
      ip ro add 1.1.1.19/32 dev dummy19
      ip ro add 1.1.1.20/32 dev dummy20
      ip ro add 1.1.1.21/32 dev dummy21
      ip ro add 1.1.1.22/32 dev dummy22
      ip ro add 1.1.1.23/32 dev dummy23
      ip ro add 1.1.1.24/32 dev dummy24
      ip ro add 1.1.1.25/32 dev dummy25
      ip ro add 1.1.1.26/32 dev dummy26
      ip ro add 1.1.1.27/32 dev dummy27
      ip ro add 1.1.1.28/32 dev dummy28
      ip ro add 1.1.1.29/32 dev dummy29
      ip ro add 1.1.1.30/32 dev dummy30
      ip ro add 1.1.1.31/32 dev dummy31
      ip ro add 1.1.1.32/32 dev dummy32
      
      ip next add id 1 via 1.1.1.1 dev dummy1
      ip next add id 2 via 1.1.1.2 dev dummy2
      ip next add id 3 via 1.1.1.3 dev dummy3
      ip next add id 4 via 1.1.1.4 dev dummy4
      ip next add id 5 via 1.1.1.5 dev dummy5
      ip next add id 6 via 1.1.1.6 dev dummy6
      ip next add id 7 via 1.1.1.7 dev dummy7
      ip next add id 8 via 1.1.1.8 dev dummy8
      ip next add id 9 via 1.1.1.9 dev dummy9
      ip next add id 10 via 1.1.1.10 dev dummy10
      ip next add id 11 via 1.1.1.11 dev dummy11
      ip next add id 12 via 1.1.1.12 dev dummy12
      ip next add id 13 via 1.1.1.13 dev dummy13
      ip next add id 14 via 1.1.1.14 dev dummy14
      ip next add id 15 via 1.1.1.15 dev dummy15
      ip next add id 16 via 1.1.1.16 dev dummy16
      ip next add id 17 via 1.1.1.17 dev dummy17
      ip next add id 18 via 1.1.1.18 dev dummy18
      ip next add id 19 via 1.1.1.19 dev dummy19
      ip next add id 20 via 1.1.1.20 dev dummy20
      ip next add id 21 via 1.1.1.21 dev dummy21
      ip next add id 22 via 1.1.1.22 dev dummy22
      ip next add id 23 via 1.1.1.23 dev dummy23
      ip next add id 24 via 1.1.1.24 dev dummy24
      ip next add id 25 via 1.1.1.25 dev dummy25
      ip next add id 26 via 1.1.1.26 dev dummy26
      ip next add id 27 via 1.1.1.27 dev dummy27
      ip next add id 28 via 1.1.1.28 dev dummy28
      ip next add id 29 via 1.1.1.29 dev dummy29
      ip next add id 30 via 1.1.1.30 dev dummy30
      ip next add id 31 via 1.1.1.31 dev dummy31
      ip next add id 32 via 1.1.1.32 dev dummy32
      
      i=100
      
      while [ $i -le 200 ]
      do
      	ip next add id $i group 1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19
      	echo $i
      	((i++))
      done
      
      ip next add id 999 group 1/2/3/4/5/6
      
      ip next ls
      
      ========================
      
      Fixes: ab84be7e ("net: Initial nexthop code")
      Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • handle the group_source_req options directly · b212c322
      Committed by Al Viro
      Native ->setsockopt() handling of these options (MCAST_..._SOURCE_GROUP
      and MCAST_{,UN}BLOCK_SOURCE) consists of copyin + call of a helper that
      does the actual work.  The only change needed for ->compat_setsockopt()
      is a slightly different copyin - the helpers can be reused as-is.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • 2bbf8c1e
    • ipv[46]: do compat setsockopt for MCAST_{JOIN,LEAVE}_GROUP directly · 2f984f11
      Committed by Al Viro
      direct parallel to the way these two are handled in the native
      ->setsockopt() instances - the helpers that do the real work
      are already separated and can be reused as-is in this case.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • ipv4: do compat setsockopt for MCAST_MSFILTER directly · 2e041728
      Committed by Al Viro
      Parallel to what the native setsockopt() does, except that unlike
      the native setsockopt() we do not use memdup_user() - we want
      the sockaddr_storage fields properly aligned, so we allocate
      4 bytes more and copy compat_group_filter at the offset 4,
      which yields the proper alignments.
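The offset arithmetic behind the 4-byte shift can be checked with a short Python sketch. The field offsets are an assumption about the packed 32-bit layout (a 4-byte interface index, a 128-byte sockaddr_storage, two 4-byte counters, then the source list), and the 8-byte alignment is assumed for a 64-bit kernel; `placed` and `storage_aligned` are hypothetical helpers.

```python
NATIVE_ALIGN = 8  # assumed alignment of sockaddr_storage on a 64-bit kernel

# Assumed field offsets inside the packed compat structure.
COMPAT_OFFSETS = {
    "gf_interface": 0,
    "gf_group": 4,
    "gf_fmode": 132,
    "gf_numsrc": 136,
    "gf_slist": 140,
}


def placed(offsets, shift):
    """Offsets of each field when the structure is copied at buffer+shift."""
    return {name: off + shift for name, off in offsets.items()}


def storage_aligned(offsets):
    """True if both sockaddr_storage members sit on an aligned address."""
    return all(offsets[name] % NATIVE_ALIGN == 0
               for name in ("gf_group", "gf_slist"))
```

Copied at offset 0 the two sockaddr_storage members land on 4 and 140, which are misaligned; copied at offset 4 they land on 8 and 144, which are both multiples of 8.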
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • get rid of compat_mc_getsockopt() · 0dfe6581
      Committed by Al Viro
      now we can do MCAST_MSFILTER in compat ->getsockopt() without
      playing silly buggers with copying things back and forth.
      We can form a native struct group_filter (sans the variable-length
      tail) on stack, pass that + pointer to the tail of original request
      to the helper doing the bulk of the work, then do the rest of
      copyout - same as the native getsockopt() does.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • ip*_mc_gsfget(): lift copyout of struct group_filter into callers · 931ca7ab
      Committed by Al Viro
      pass the userland pointer to the array in its tail, so that part
      gets copied out by our functions; copyout of everything else is
      done in the callers.  Rationale: reuse for compat; the array
      is the same in native and compat, the layout of parts before it
      is different for compat.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • compat_ip{,v6}_setsockopt(): enumerate MCAST_... options explicitly · e9c375fb
      Committed by Al Viro
      We want to check if optname is among the MCAST_... ones; do that as
      an explicit switch.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  8. 20 May 2020, 6 commits
    • impr: use ->ndo_tunnel_ctl in ipmr_new_tunnel · c7e36705
      Committed by Christoph Hellwig
      Use the new ->ndo_tunnel_ctl instead of overriding the address limit
      and using ->ndo_do_ioctl just to do a pointless user copy.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: add a new ndo_tunnel_ioctl method · 607259a6
      Committed by Christoph Hellwig
      This method is used to properly allow kernel callers of the IPv4 route
      management ioctls.  The existing ip_tunnel_ioctl helper is renamed to
      ip_tunnel_ctl to better reflect that it doesn't directly implement ioctls
      touching user memory, and is used for the guts of ndo_tunnel_ctl
      implementations. A new ip_tunnel_ioctl helper is added that can be wired
      up directly to the ndo_do_ioctl method and takes care of the copy to and
      from userspace.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: consolidate the VIFF_TUNNEL handling in ipmr_new_tunnel · c1fd1182
      Committed by Christoph Hellwig
      Also move the dev_set_allmulti call and the error handling into the
      ioctl helper.  This allows reusing already looked up tunnel_dev pointer
      and the set up argument structure for the deletion in the error handler.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: streamline ipmr_new_tunnel · c384b8a7
      Committed by Christoph Hellwig
      Reduce a few levels of indentation to simplify the function.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: inet_csk: Fix so_reuseport bind-address cache in tb->fast* · 88d7fcfa
      Committed by Martin KaFai Lau
      The commit 637bc8bb ("inet: reset tb->fastreuseport when adding a reuseport sk")
      added a bind-address cache in tb->fast*.  The tb->fast* caches the address
      of a sk which has successfully been bound with SO_REUSEPORT on.  The idea
      is to avoid the expensive conflict search in inet_csk_bind_conflict().
      
      There is an issue with wildcard matching where sk_reuseport_match() should
      have returned false but it is currently returning true.  It ends up
      hiding the bind conflict.  For example,
      
      bind("[::1]:443"); /* without SO_REUSEPORT. Succeed. */
      bind("[::2]:443"); /* with    SO_REUSEPORT. Succeed. */
      bind("[::]:443");  /* with    SO_REUSEPORT. Still Succeed where it shouldn't */
      
      The last bind("[::]:443") with SO_REUSEPORT on should have failed because
      it should have a conflict with the very first bind("[::1]:443") which
      has SO_REUSEPORT off.  However, the address "[::2]" is cached in
      tb->fast* in the second bind. In the last bind, the sk_reuseport_match()
      returns true because the binding sk's wildcard addr "[::]" matches with
      the "[::2]" cached in tb->fast*.
      
      The correct bind conflict is reported by removing the second
      bind such that tb->fast* cache is not involved and forces the
      bind("[::]:443") to go through the inet_csk_bind_conflict():
      
      bind("[::1]:443"); /* without SO_REUSEPORT. Succeed. */
      bind("[::]:443");  /* with    SO_REUSEPORT. -EADDRINUSE */
      
      The expected behavior for sk_reuseport_match() is, it should only allow
      the "cached" tb->fast* address to be used as a wildcard match but not
      the address of the binding sk.  To do that, the current
      "bool match_wildcard" arg is split into
      "bool match_sk1_wildcard" and "bool match_sk2_wildcard".
      
      This change only affects the sk_reuseport_match() which is only
      used by inet_csk (e.g. TCP).
      The other use cases are calling inet_rcv_saddr_equal() and
      this patch makes it pass the same "match_wildcard" arg twice to
      the "ipv[46]_rcv_saddr_equal(..., match_wildcard, match_wildcard)".
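The effect of splitting the wildcard flag can be shown with a simplified Python model. It ignores the IPv4/IPv6 interaction and the ipv6only flag handled by the real helpers; `rcv_saddr_equal`, `reuseport_match_old` and `reuseport_match_new` are hypothetical names, with the cached tb->fast* address playing the role of sk1 and the binding socket the role of sk2.

```python
import ipaddress


def is_wildcard(addr):
    """True for the unspecified address ('::' or '0.0.0.0')."""
    return ipaddress.ip_address(addr).is_unspecified


def rcv_saddr_equal(addr1, addr2, match_sk1_wildcard, match_sk2_wildcard):
    """Addresses match if equal, or if an allowed side is a wildcard."""
    if ipaddress.ip_address(addr1) == ipaddress.ip_address(addr2):
        return True
    return ((match_sk1_wildcard and is_wildcard(addr1)) or
            (match_sk2_wildcard and is_wildcard(addr2)))


def reuseport_match_old(cached, binding):
    """Single flag: a wildcard on either side counts as a match."""
    return rcv_saddr_equal(cached, binding, True, True)


def reuseport_match_new(cached, binding):
    """Split flags: only the cached address may act as a wildcard."""
    return rcv_saddr_equal(cached, binding, True, False)
```

With "[::2]" cached and "[::]" being bound, the old behaviour reports a match and skips the conflict search, while the split flags report no match, so the full conflict check runs.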
      
      Cc: Josef Bacik <jbacik@fb.com>
      Fixes: 637bc8bb ("inet: reset tb->fastreuseport when adding a reuseport sk")
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: Add get{peer, sock}name attach types for sock_addr · 1b66d253
      Committed by Daniel Borkmann
      As stated in 983695fa ("bpf: fix unconnected udp hooks"), the objective
      for the existing cgroup connect/sendmsg/recvmsg/bind BPF hooks is to be
      transparent to applications. In Cilium we make use of these hooks [0] in
      order to enable E-W load balancing for existing Kubernetes service types
      for all Cilium managed nodes in the cluster. Those backends can be local
      or remote. The main advantage of this approach is that it operates as close
      as possible to the socket, and therefore avoids packet-based NAT, given that
      in the connect/sendmsg/recvmsg hooks we only need to xlate sock addresses.
      
      This also allows exposing NodePort services on loopback addresses in the
      host namespace, for example. As another advantage, this also efficiently
      blocks bind requests for applications in the host namespace for exposed
      ports. However, one missing item is that we also need to perform reverse
      xlation for inet{,6}_getname() hooks such that we can return the service
      IP/port tuple back to the application instead of the remote peer address.
      
      The vast majority of applications do not care about getpeername(), but
      on a few occasions we've seen breakage when validating the peer's address
      since it unexpectedly returns the backend tuple instead of the service one.
      Therefore, this trivial patch allows customising the returned address and
      adds a getpeername() as well as a getsockname() BPF cgroup hook for both
      IPv4 and IPv6 in order to address this situation.
      
      Simple example:
      
        # ./cilium/cilium service list
        ID   Frontend     Service Type   Backend
        1    1.2.3.4:80   ClusterIP      1 => 10.0.0.10:80
      
      Before; curl's verbose output example, no getpeername() reverse xlation:
      
        # curl --verbose 1.2.3.4
        * Rebuilt URL to: 1.2.3.4/
        *   Trying 1.2.3.4...
        * TCP_NODELAY set
        * Connected to 1.2.3.4 (10.0.0.10) port 80 (#0)
        > GET / HTTP/1.1
        > Host: 1.2.3.4
        > User-Agent: curl/7.58.0
        > Accept: */*
        [...]
      
      After; with getpeername() reverse xlation:
      
        # curl --verbose 1.2.3.4
        * Rebuilt URL to: 1.2.3.4/
        *   Trying 1.2.3.4...
        * TCP_NODELAY set
        * Connected to 1.2.3.4 (1.2.3.4) port 80 (#0)
        > GET / HTTP/1.1
        > Host: 1.2.3.4
        > User-Agent: curl/7.58.0
        > Accept: */*
        [...]
      
      Originally, I had both under a BPF_CGROUP_INET{4,6}_GETNAME type and exposed
      peer to the context similar as in inet{,6}_getname() fashion, but API-wise
      this is suboptimal as it always enforces programs having to test for ctx->peer
      which can easily be missed, hence BPF_CGROUP_INET{4,6}_GET{PEER,SOCK}NAME split.
      Similarly, the checked return code is on tnum_range(1, 1), but if a use case
      comes up in future, it can easily be changed to return an error code instead.
      Helper and ctx member access is the same as with connect/sendmsg/etc hooks.
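The forward and reverse translation idea can be modelled in plain Python. This is a conceptual sketch, not BPF and not Cilium's data structures: `ServiceTable`, its per-socket `reverse` map and the socket ids are all hypothetical.

```python
class ServiceTable:
    """Toy model of socket-level service translation."""

    def __init__(self, services):
        self.forward = dict(services)  # service tuple -> backend tuple
        self.reverse = {}              # socket id -> original service tuple

    def connect(self, sock_id, dst):
        """connect() hook: rewrite a service address to its backend."""
        backend = self.forward.get(dst)
        if backend is None:
            return dst
        self.reverse[sock_id] = dst
        return backend

    def getpeername(self, sock_id, kernel_peer):
        """getpeername() hook: report the service tuple if one was used."""
        return self.reverse.get(sock_id, kernel_peer)
```

A socket that connected to 1.2.3.4:80 is really talking to 10.0.0.10:80, but with the reverse lookup its getpeername() result shows 1.2.3.4:80 again, matching the "After" output above. Sockets that never went through the translation keep the peer address reported by the kernel.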
      
        [0] https://github.com/cilium/cilium/blob/master/bpf/bpf_sock.c
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Andrey Ignatov <rdna@fb.com>
      Link: https://lore.kernel.org/bpf/61a479d759b2482ae3efb45546490bacd796a220.1589841594.git.daniel@iogearbox.net
  9. 19 May 2020, 1 commit
  10. 18 May 2020, 2 commits
  11. 17 May 2020, 1 commit
  12. 16 May 2020, 2 commits
  13. 15 May 2020, 3 commits
    • ipmr: Add lockdep expression to ipmr_for_each_table macro · 7013908c
      Committed by Amol Grover
      During the initialization process, ipmr_new_table() is called
      to create new tables which in turn calls ipmr_get_table() which
      traverses net->ipv4.mr_tables without holding the writer lock.
      However, this is safe as no tables exist at this time.
      Hence add a suitable lockdep expression to silence the following
      false-positive warning:
      
      =============================
      WARNING: suspicious RCU usage
      5.7.0-rc3-next-20200428-syzkaller #0 Not tainted
      -----------------------------
      net/ipv4/ipmr.c:136 RCU-list traversed in non-reader section!!
      
      ipmr_get_table+0x130/0x160 net/ipv4/ipmr.c:136
      ipmr_new_table net/ipv4/ipmr.c:403 [inline]
      ipmr_rules_init net/ipv4/ipmr.c:248 [inline]
      ipmr_net_init+0x133/0x430 net/ipv4/ipmr.c:3089
      
      Fixes: f0ad0860 ("ipv4: ipmr: support multiple tables")
      Reported-by: syzbot+1519f497f2f9f08183c6@syzkaller.appspotmail.com
      Suggested-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Amol Grover <frextrite@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipmr: Fix RCU list debugging warning · a14fbcd4
      Committed by Amol Grover
      ipmr_for_each_table() macro uses list_for_each_entry_rcu()
      for traversing outside of an RCU read side critical section
      but under the protection of rtnl_mutex. Hence, add the
      corresponding lockdep expression to silence the following
      false-positive warning at boot:
      
      [    4.319347] =============================
      [    4.319349] WARNING: suspicious RCU usage
      [    4.319351] 5.5.4-stable #17 Tainted: G            E
      [    4.319352] -----------------------------
      [    4.319354] net/ipv4/ipmr.c:1757 RCU-list traversed in non-reader section!!
      
      Fixes: f0ad0860 ("ipv4: ipmr: support multiple tables")
      Signed-off-by: Amol Grover <frextrite@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: fix error recovery in tcp_zerocopy_receive() · e776af60
      Committed by Eric Dumazet
      If the user provides a wrong virtual address in a TCP_ZEROCOPY_RECEIVE
      operation, we want to return an -EINVAL error.
      
      But depending on the content of zc->recv_skip_hint, we might return
      an -EIO error if the socket has SOCK_DONE set.
      
      Make sure to return -EINVAL in this case.
      
      BUG: KMSAN: uninit-value in tcp_zerocopy_receive net/ipv4/tcp.c:1833 [inline]
      BUG: KMSAN: uninit-value in do_tcp_getsockopt+0x4494/0x6320 net/ipv4/tcp.c:3685
      CPU: 1 PID: 625 Comm: syz-executor.0 Not tainted 5.7.0-rc4-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x220 lib/dump_stack.c:118
       kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:121
       __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215
       tcp_zerocopy_receive net/ipv4/tcp.c:1833 [inline]
       do_tcp_getsockopt+0x4494/0x6320 net/ipv4/tcp.c:3685
       tcp_getsockopt+0xf8/0x1f0 net/ipv4/tcp.c:3728
       sock_common_getsockopt+0x13f/0x180 net/core/sock.c:3131
       __sys_getsockopt+0x533/0x7b0 net/socket.c:2177
       __do_sys_getsockopt net/socket.c:2192 [inline]
       __se_sys_getsockopt+0xe1/0x100 net/socket.c:2189
       __x64_sys_getsockopt+0x62/0x80 net/socket.c:2189
       do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:297
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      RIP: 0033:0x45c829
      Code: 0d b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 db b6 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f1deeb72c78 EFLAGS: 00000246 ORIG_RAX: 0000000000000037
      RAX: ffffffffffffffda RBX: 00000000004e01e0 RCX: 000000000045c829
      RDX: 0000000000000023 RSI: 0000000000000006 RDI: 0000000000000009
      RBP: 000000000078bf00 R08: 0000000020000200 R09: 0000000000000000
      R10: 00000000200001c0 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00000000000001d8 R14: 00000000004d3038 R15: 00007f1deeb736d4
      
      Local variable ----zc@do_tcp_getsockopt created at:
       do_tcp_getsockopt+0x1a74/0x6320 net/ipv4/tcp.c:3670
       do_tcp_getsockopt+0x1a74/0x6320 net/ipv4/tcp.c:3670
      
      Fixes: 05255b82 ("tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 13 May 2020, 1 commit
    • netlabel: cope with NULL catmap · eead1c2e
      Committed by Paolo Abeni
      The cipso and calipso code can set the MLS_CAT attribute on
      successful parsing, even if the corresponding catmap has
      not been allocated, as per current configuration and external
      input.
      
      Later, selinux code tries to access the catmap if the MLS_CAT flag
      is present via netlbl_catmap_getlong(). That may cause a NULL pointer
      dereference while processing incoming network traffic.
      
      Address the issue by setting the MLS_CAT flag only if the catmap is
      really allocated. Additionally let netlbl_catmap_getlong() cope
      with NULL catmap.
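Both halves of the fix can be sketched in Python. This is a loose model rather than the netlabel API: the `MLS_CAT` bit value, `parse_label` and the simplified `catmap_getlong` signature are assumptions made for illustration.

```python
MLS_CAT = 0x1  # hypothetical flag bit for "category map present"


def parse_label(categories):
    """Build a security attribute; flag the catmap only if it exists."""
    attr = {"flags": 0, "catmap": None}
    if categories:
        attr["catmap"] = set(categories)
        attr["flags"] |= MLS_CAT
    return attr


def catmap_getlong(catmap, offset):
    """Return 64 category bits starting at offset; tolerate a missing map."""
    if catmap is None:
        return 0
    return sum(1 << (bit - offset)
               for bit in catmap
               if offset <= bit < offset + 64)
```

Setting the flag only alongside an allocated map keeps the two consistent, and the `None` check makes the reader safe even if a caller gets that wrong.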
      Reported-by: Matthew Sheets <matthew.sheets@gd-ms.com>
      Fixes: 4b8feff2 ("netlabel: fix the horribly broken catmap functions")
      Fixes: ceba1832 ("calipso: Set the calipso socket label to match the secattr.")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>