1. 22 Oct 2017 (1 commit)
    • soreuseport: fix initialization race · 1b5f962e
      Authored by Craig Gallek
      Syzkaller stumbled upon a way to trigger
      WARNING: CPU: 1 PID: 13881 at net/core/sock_reuseport.c:41
      reuseport_alloc+0x306/0x3b0 net/core/sock_reuseport.c:39
      
      There are two initialization paths for the sock_reuseport structure in a
      socket: through the udp/tcp bind paths of SO_REUSEPORT sockets, or through
      SO_ATTACH_REUSEPORT_[CE]BPF before bind.  The existing implementation
      assumed that the socket lock protected both of these paths, when it actually
      only protects the SO_ATTACH_REUSEPORT path.  Syzkaller triggered this
      double allocation by running these paths concurrently.
      
      This patch moves the check for double allocation into the reuseport_alloc
      function, which is protected by a global spin lock (a minimal user-space
      sketch of this pattern follows the entry).
      
      Fixes: e32ea7e7 ("soreuseport: fast reuseport UDP socket selection")
      Fixes: c125e80b ("soreuseport: fast reuseport TCP socket selection")
      Signed-off-by: Craig Gallek <kraig@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1b5f962e
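      A minimal user-space sketch of the pattern this fix describes: the
      double-allocation check sits inside the allocation function itself,
      under one global lock, so the two racing initialization paths (bind()
      and SO_ATTACH_REUSEPORT_[CE]BPF) cannot both allocate. The names below
      only echo the commit text; this is an illustrative analogue, not the
      kernel implementation.

      #include <pthread.h>
      #include <stdio.h>
      #include <stdlib.h>

      struct sock_reuseport { int max_socks; };
      struct sock { struct sock_reuseport *reuseport_cb; };

      /* Single global lock, standing in for the global spin lock above. */
      static pthread_mutex_t reuseport_lock = PTHREAD_MUTEX_INITIALIZER;

      static int reuseport_alloc(struct sock *sk)
      {
              int err = 0;

              pthread_mutex_lock(&reuseport_lock);
              /* The double-allocation check lives here, under the lock,
               * so it covers every initialization path. */
              if (!sk->reuseport_cb) {
                      sk->reuseport_cb = calloc(1, sizeof(*sk->reuseport_cb));
                      if (!sk->reuseport_cb)
                              err = -1;
              }
              pthread_mutex_unlock(&reuseport_lock);
              return err;
      }

      /* Stand-in for either init path: bind() or SO_ATTACH_REUSEPORT_[CE]BPF. */
      static void *init_path(void *arg)
      {
              reuseport_alloc(arg);
              return NULL;
      }

      int main(void)
      {
              struct sock sk = { 0 };
              pthread_t a, b;

              pthread_create(&a, NULL, init_path, &sk);
              pthread_create(&b, NULL, init_path, &sk);
              pthread_join(a, NULL);
              pthread_join(b, NULL);
              printf("reuseport_cb = %p (allocated exactly once)\n",
                     (void *)sk.reuseport_cb);
              return 0;
      }

      Before the fix, the equivalent check lived outside reuseport_alloc(),
      where only the SO_ATTACH_REUSEPORT path was serialized by the socket
      lock.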
  2. 21 Oct 2017 (1 commit)
  3. 10 Oct 2017 (1 commit)
    • udp: fix bcast packet reception · 996b44fc
      Authored by Paolo Abeni
      The commit bc044e8d ("udp: perform source validation for
      mcast early demux") does not take into account that broadcast packets
      land in the same code path and need different checks for the
      source address - notably, a zero source address is valid for bcast
      and invalid for mcast.

      As a result, the 2nd and later broadcast packets with a 0 source address
      landing on the same socket are dropped. This breaks DHCP servers.
      
      Since we don't have stringent performance requirements for ingress
      broadcast traffic, fix it by disabling UDP early demux for such traffic
      (a minimal sketch of the resulting policy follows the entry).
      Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Fixes: bc044e8d ("udp: perform source validation for mcast early demux")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      996b44fc
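      A minimal sketch (plain user-space C, not kernel code) of the policy
      the fix leaves in place: broadcast destinations skip the early-demux
      fast path entirely, while multicast destinations keep a source check
      under which a zero source address is invalid.

      #include <arpa/inet.h>
      #include <netinet/in.h>
      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>

      /* Return true if early demux may be attempted for this packet.
       * Addresses are in network byte order. */
      static bool may_early_demux(uint32_t saddr, uint32_t daddr)
      {
              if (daddr == htonl(INADDR_BROADCAST))
                      return false;   /* the fix: no early demux for broadcast */

              if (IN_MULTICAST(ntohl(daddr)) && saddr == htonl(INADDR_ANY))
                      return false;   /* zero source is invalid for multicast */

              return true;
      }

      int main(void)
      {
              /* A DHCP DISCOVER (0.0.0.0 -> 255.255.255.255) must not be
               * dropped by a multicast-style source check. */
              printf("bcast from 0.0.0.0: early demux = %d\n",
                     may_early_demux(htonl(INADDR_ANY), htonl(INADDR_BROADCAST)));
              printf("mcast from 0.0.0.0: early demux = %d\n",
                     may_early_demux(htonl(INADDR_ANY), inet_addr("224.0.0.1")));
              return 0;
      }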
  4. 01 Oct 2017 (2 commits)
    • udp: perform source validation for mcast early demux · bc044e8d
      Authored by Paolo Abeni
      The UDP early demux can leverage the rx dst cache even for
      multicast unconnected sockets.

      In such a scenario the ipv4 source address is validated only on
      the first packet in the given flow. After that, when we fetch
      the dst entry from the socket rx cache, we stop enforcing
      the rp_filter and we even start accepting any kind of martian
      addresses.
      
      Disabling the dst cache for unconnected multicast sockets would
      cause a large performance regression, nearly halving the max
      ingress throughput.

      Instead we factor out a route helper to completely validate an
      skb source address for multicast packets, and we call it from
      the UDP early demux for mcast packets landing on unconnected
      sockets, after successfully fetching the related cached dst entry
      (a schematic sketch follows the entry).
      
      This still gives a measurable, but limited performance
      regression:
      
      		rp_filter = 0		rp_filter = 1
      edmux disabled:	1182 Kpps		1127 Kpps
      edmux before:	2238 Kpps		2238 Kpps
      edmux after:	2037 Kpps		2019 Kpps
      
      The above figures are on top of current net tree.
      Applying the net-next commit 6e617de8 ("net: avoid a full
      fib lookup when rp_filter is disabled.") the delta with
      rp_filter == 0 will decrease even more.
      
      Fixes: 421b3885 ("udp: ipv4: Add udp early demux")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bc044e8d
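      A schematic sketch of the flow this commit describes (user-space
      types, with ip_mc_validate_source() used here only as a stand-in for
      the factored-out route helper): the cached rx dst lets the early demux
      skip the full route lookup, but the source address of every multicast
      packet is still validated.

      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>

      struct dst_entry { bool valid; };   /* stand-in for the cached rx dst */

      /* Stand-in for the factored-out helper: rp_filter / martian checks for
       * a multicast packet's source address (host byte order, toy policy). */
      static bool ip_mc_validate_source(uint32_t saddr)
      {
              return saddr != 0 && (saddr >> 24) != 127;
      }

      /* Early demux for an unconnected multicast socket. */
      static bool udp_mcast_early_demux(const struct dst_entry *cached,
                                        uint32_t saddr)
      {
              if (!cached->valid)
                      return false;   /* no cached dst: take the slow path */
              /* The fix: even with a cached dst, validate the source per packet. */
              return ip_mc_validate_source(saddr);
      }

      int main(void)
      {
              struct dst_entry dst = { .valid = true };

              printf("10.0.0.1 accepted:  %d\n",
                     udp_mcast_early_demux(&dst, 0x0a000001));
              printf("127.0.0.1 accepted: %d\n",
                     udp_mcast_early_demux(&dst, 0x7f000001));
              return 0;
      }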
    • IPv4: early demux can return an error code · 7487449c
      Authored by Paolo Abeni
      Currently no error is emitted, but this infrastructure will be
      used by the next patch to allow source address validation
      for mcast sockets.
      Since early demux can do a route lookup, and an ipv4 route
      lookup can return an error code, this is consistent with the
      current ipv4 route infrastructure (a sketch of the signature
      change follows the entry).
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7487449c
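      A small sketch of the signature change this commit enables (user-space
      C, hypothetical stand-ins for the kernel hooks): the early-demux
      handler returns an int instead of void, so a failed validation can
      propagate an error code that makes the caller drop the packet.

      #include <stdio.h>

      struct sk_buff { int len; };

      /* Before: typedef void (*early_demux_t)(struct sk_buff *skb);
       * After:  the handler can report a verdict. */
      typedef int (*early_demux_t)(struct sk_buff *skb);

      static int udp_early_demux(struct sk_buff *skb)
      {
              /* e.g. -EINVAL from a failed source-address validation */
              return skb->len ? 0 : -22;
      }

      static void rcv_finish(struct sk_buff *skb, early_demux_t handler)
      {
              int err = handler(skb);

              if (err) {
                      printf("early demux failed (%d), dropping packet\n", err);
                      return;
              }
              printf("packet accepted\n");
      }

      int main(void)
      {
              struct sk_buff good = { .len = 100 }, bad = { .len = 0 };

              rcv_finish(&good, udp_early_demux);
              rcv_finish(&bad, udp_early_demux);
              return 0;
      }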
  5. 08 Sep 2017 (1 commit)
  6. 02 Sep 2017 (1 commit)
  7. 26 Aug 2017 (1 commit)
  8. 23 Aug 2017 (1 commit)
  9. 19 Aug 2017 (1 commit)
    • datagram: When peeking datagrams with offset < 0 don't skip empty skbs · a0917e0b
      Authored by Matthew Dawson
      Due to commit e6afc8ac ("udp: remove
      headers from UDP packets before queueing"), when udp packets are being
      peeked the requested extra offset is always 0 as there is no need to skip
      the udp header.  However, when the offset is 0 and the next skb is
      of length 0, it is only returned once.  The behaviour can be seen with
      the following python script:
      
      from socket import *;
      f=socket(AF_INET6, SOCK_DGRAM | SOCK_NONBLOCK, 0);
      g=socket(AF_INET6, SOCK_DGRAM | SOCK_NONBLOCK, 0);
      f.bind(('::', 0));
      addr=('::1', f.getsockname()[1]);
      g.sendto(b'', addr)
      g.sendto(b'b', addr)
      print(f.recvfrom(10, MSG_PEEK));
      print(f.recvfrom(10, MSG_PEEK));
      
      Where the expected output should be the empty string twice.
      
      Instead, make sk_peek_offset return negative values, and pass those values
      to __skb_try_recv_datagram/__skb_try_recv_from_queue.  If the offset passed
      to __skb_try_recv_from_queue is negative, the checked skb is never skipped.
      __skb_try_recv_from_queue will then ensure the offset is reset back to 0
      if a peek is requested without an offset, unless no packets are found
      (a standalone sketch of this skip logic follows the entry).
      
      Also simplify the if condition in __skb_try_recv_from_queue.  If _off is
      greater than 0, and off is greater than or equal to skb->len, then
      (_off || skb->len) must always be true, assuming skb->len >= 0 is always
      true.
      
      Also remove a redundant check around a call to sk_peek_offset in af_unix.c,
      as it double-checked whether MSG_PEEK was set in the flags.
      
      V2:
       - Moved the negative fixup into __skb_try_recv_from_queue, and removed
      the now-redundant checks
       - Fixed peeking in udp{,v6}_recvmsg to report the right value when the
      offset is 0
      
      V3:
       - Marked new branch in __skb_try_recv_from_queue as unlikely.
      Signed-off-by: Matthew Dawson <matthew@mjdsystems.ca>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a0917e0b
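      A standalone sketch of the skip rule described above (user-space C,
      heavily simplified: the queue is just an array of skb lengths): a
      negative peek offset means "no offset requested", so the head skb,
      even a zero-length one, is never skipped; a non-negative offset still
      walks past datagrams it fully covers.  The real
      __skb_try_recv_from_queue() also handles already-peeked empty skbs,
      which is omitted here.

      #include <stdio.h>

      static int queue[] = { 0, 1 };   /* the repro: empty datagram, then "b" */
      static const int qlen = 2;

      /* Return the index of the skb to peek.  A negative *off means no peek
       * offset was requested: never skip, and reset the offset to 0. */
      static int try_recv_from_queue(int *off)
      {
              int _off = *off;

              for (int i = 0; i < qlen; i++) {
                      /* Skip only non-empty skbs that a non-negative offset
                       * fully covers. */
                      if (_off >= 0 && queue[i] > 0 && _off >= queue[i]) {
                              _off -= queue[i];
                              continue;
                      }
                      *off = _off < 0 ? 0 : _off;
                      return i;
              }
              return -1;               /* nothing found, offset untouched */
      }

      int main(void)
      {
              int off = -1;            /* MSG_PEEK without an offset */

              printf("first peek  -> skb %d\n", try_recv_from_queue(&off));
              off = -1;
              printf("second peek -> skb %d\n", try_recv_from_queue(&off));
              return 0;
      }

      Both peeks return the zero-length datagram at the head of the queue,
      matching the expected output of the python repro above.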
  10. 11 Aug 2017 (1 commit)
  11. 10 Aug 2017 (1 commit)
  12. 08 Aug 2017 (3 commits)
  13. 07 Aug 2017 (1 commit)
  14. 30 Jul 2017 (1 commit)
    • udp6: fix socket leak on early demux · c9f2c1ae
      Authored by Paolo Abeni
      When an early demuxed packet reaches __udp6_lib_lookup_skb(), the
      sk reference is retrieved and used, but the relevant reference
      count is leaked and the socket destructor is never called.
      Beyond leaking the sk memory, if there are pending UDP packets
      in the receive queue, even the related accounted memory is leaked.
      
      In the long run, this will cause persistent forward allocation errors
      and no UDP skbs (both ipv4 and ipv6) will be able to reach
      user-space.

      Fix this by explicitly accessing the early demux reference before
      the lookup, and properly decreasing the socket reference count
      after usage (a user-space refcounting analogue follows the entry).
      
      Also drop the skb_steal_sock() in __udp6_lib_lookup_skb(), and
      the now-obsolete comment about the "socket cache".

      The newly added code is derived from the current ipv4 code for the
      analogous path.
      
      v1 -> v2:
        fixed the __udp6_lib_rcv() return code for resubmission,
        as suggested by Eric
      Reported-by: Sam Edwards <CFSworks@gmail.com>
      Reported-by: Marc Haber <mh+netdev@zugschlus.de>
      Fixes: 5425077d ("net: ipv6: Add early demux handler for UDP unicast")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c9f2c1ae
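      A user-space refcounting analogue of the fix described above: the
      early-demux socket reference is taken explicitly before use and
      released afterwards, instead of being used without a matching put.
      The sock_hold()/sock_put() names mirror the kernel helpers, but this
      is only an illustration.

      #include <stdatomic.h>
      #include <stdio.h>

      struct sock { atomic_int refcnt; };

      static void sock_hold(struct sock *sk)
      {
              atomic_fetch_add(&sk->refcnt, 1);
      }

      static void sock_put(struct sock *sk)
      {
              if (atomic_fetch_sub(&sk->refcnt, 1) == 1)
                      printf("last reference dropped: destructor runs\n");
      }

      /* Receive path using an early-demuxed socket: hold the reference for
       * the duration of the delivery, then balance it.  The leak fixed by
       * this commit was the missing put on this path. */
      static void udp6_deliver(struct sock *early_demux_sk)
      {
              sock_hold(early_demux_sk);   /* take the reference before use */
              /* ... deliver the packet ... */
              sock_put(early_demux_sk);    /* release it after usage */
      }

      int main(void)
      {
              struct sock sk = { .refcnt = 1 };   /* owner's reference */

              udp6_deliver(&sk);
              sock_put(&sk);                      /* owner closes the socket */
              return 0;
      }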
  15. 27 Jul 2017 (1 commit)
  16. 26 Jul 2017 (1 commit)
    • udp: preserve head state for IP_CMSG_PASSSEC · dce4551c
      Authored by Paolo Abeni
      Paul Moore reported a SELinux/IP_PASSSEC regression
      caused by a missing skb->sp at recvmsg() time. We need to
      preserve the skb head state to process the IP_CMSG_PASSSEC
      cmsg.

      With this commit we avoid releasing the skb head state in the
      BH even if a secpath is attached to the current skb, and store
      the skb status (with/without head states) in the scratch area,
      so that we can access it at skb deallocation time without
      incurring cache-miss penalties (a sketch of this bookkeeping
      follows the entry).

      This also avoids misusing the skb CB for ipv6 packets,
      as introduced by the commit 0ddf3fb2 ("udp: preserve
      skb->dst if required for IP options processing").

      Also clean up the scratch area helpers a bit, to reduce
      the code differences between the 32-bit and 64-bit builds.
      Reported-by: Paul Moore <paul@paul-moore.com>
      Fixes: 0a463c78 ("udp: avoid a cache miss on dequeue")
      Fixes: 0ddf3fb2 ("udp: preserve skb->dst if required for IP options processing")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Tested-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      dce4551c
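      A compact sketch of the bookkeeping described above (hypothetical
      layout, not the kernel's): one bit in the per-skb scratch area records
      whether the head state was preserved, so the deallocation path can
      decide what to release without probing the cold head-state fields.

      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>

      #define SCRATCH_HAS_HEAD_STATE  (1u << 0)   /* e.g. a secpath is attached */

      struct skb { uint32_t scratch; };

      /* Enqueue time (BH): decide once whether the head state must be kept
       * (IP_CMSG_PASSSEC needs the secpath) and record it in the scratch. */
      static void udp_set_scratch(struct skb *skb, bool keep_head_state)
      {
              skb->scratch = keep_head_state ? SCRATCH_HAS_HEAD_STATE : 0;
      }

      /* Deallocation time: read the flag from the hot scratch word instead
       * of touching the head-state cacheline itself. */
      static void udp_skb_free(struct skb *skb)
      {
              if (skb->scratch & SCRATCH_HAS_HEAD_STATE)
                      printf("releasing preserved head state (secpath etc.)\n");
              else
                      printf("head state was already released at enqueue\n");
      }

      int main(void)
      {
              struct skb with_sec = { 0 }, plain = { 0 };

              udp_set_scratch(&with_sec, true);
              udp_set_scratch(&plain, false);
              udp_skb_free(&with_sec);
              udp_skb_free(&plain);
              return 0;
      }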
  17. 19 Jul 2017 (1 commit)
  18. 01 Jul 2017 (1 commit)
  19. 28 Jun 2017 (1 commit)
  20. 23 Jun 2017 (1 commit)
    • udp: fix poll() · 9bd780f5
      Authored by Paolo Abeni
      Michael reported a UDP breakage caused by the commit b65ac446
      ("udp: try to avoid 2 cache miss on dequeue").
      The function __first_packet_length() can update the checksum bits
      of the pending skb, making the scratch area out-of-sync, and
      can set skb->csum if the skb was previously in need of checksum
      validation.

      On a later recvmsg() for such an skb, checksum validation will be
      invoked again - due to the wrong udp_skb_csum_unnecessary()
      value - and will fail, causing the valid skb to be dropped.

      This change addresses the issue by refreshing the scratch area in
      __first_packet_length() after the possible checksum update
      (a simplified sketch follows the entry).
      
      Fixes: b65ac446 ("udp: try to avoid 2 cache miss on dequeue")
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9bd780f5
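      A tiny sketch of the failure mode and fix (simplified, user-space): a
      cached copy of the checksum state goes stale once the underlying field
      is updated in the poll() path, so the cached copy is refreshed right
      after the update.

      #include <stdbool.h>
      #include <stdio.h>

      struct skb {
              bool csum_unnecessary;   /* real per-skb state */
              bool scratch_csum_ok;    /* copy cached at enqueue for dequeue */
      };

      static void refresh_scratch(struct skb *skb)
      {
              skb->scratch_csum_ok = skb->csum_unnecessary;
      }

      /* poll() path: may validate the checksum of the pending skb. */
      static void first_packet_length(struct skb *skb)
      {
              skb->csum_unnecessary = true;   /* checksum verified here */
              refresh_scratch(skb);           /* the fix: keep the copy in sync */
      }

      int main(void)
      {
              struct skb skb = { false, false };

              first_packet_length(&skb);
              /* recvmsg() trusts the cached copy; without the refresh it
               * would wrongly re-run (and fail) checksum validation. */
              printf("recvmsg sees csum ok: %d\n", skb.scratch_csum_ok);
              return 0;
      }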
  21. 21 Jun 2017 (1 commit)
  22. 18 Jun 2017 (1 commit)
  23. 12 Jun 2017 (2 commits)
    • udp: try to avoid 2 cache miss on dequeue · b65ac446
      Authored by Paolo Abeni
      When udp_recvmsg() is executed, on x86_64 and other archs most skb
      fields are on cold cachelines.
      If the skb is linear and the kernel doesn't need to compute the udp
      csum, only a handful of skb fields are required by udp_recvmsg().
      Since we already use skb->dev_scratch to cache hot data, and
      there are 32 unused bits on 64-bit archs, use that field to cache
      as much data as we can, and try to prefetch on dequeue the relevant
      fields that are left out (a bit-packing sketch follows the entry).

      This can save up to 2 cache misses per packet.
      
      v1 -> v2:
        - changed udp_dev_scratch field types to the u{32,16} variants,
          replaced the bitfield with a bool
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b65ac446
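      A bit-packing sketch of the idea above (hypothetical field layout):
      the handful of skb fields udp_recvmsg() needs are copied into a small
      hot scratch structure at enqueue time, so dequeue does not have to
      touch the cold cachelines at all.

      #include <stdbool.h>
      #include <stdint.h>
      #include <stdio.h>

      /* Hypothetical 64-bit scratch layout: u32 + u16 + two bools,
       * echoing the v2 note about u{32,16} fields and a bool. */
      struct udp_dev_scratch {
              uint32_t truesize;         /* for memory accounting on dequeue */
              uint16_t len;              /* payload length for userspace */
              bool     is_linear;
              bool     csum_unnecessary;
      };

      struct sk_buff {
              uint32_t truesize;         /* cold fields, other cachelines */
              uint16_t len;
              struct udp_dev_scratch scratch;   /* hot copy, filled at enqueue */
      };

      static void udp_set_dev_scratch(struct sk_buff *skb, bool linear, bool csum_ok)
      {
              skb->scratch.truesize = skb->truesize;
              skb->scratch.len = skb->len;
              skb->scratch.is_linear = linear;
              skb->scratch.csum_unnecessary = csum_ok;
      }

      int main(void)
      {
              struct sk_buff skb = { .truesize = 768, .len = 512 };

              udp_set_dev_scratch(&skb, true, true);
              /* the dequeue path reads only the scratch copy */
              printf("len=%u truesize=%u linear=%d csum_ok=%d\n",
                     (unsigned)skb.scratch.len, (unsigned)skb.scratch.truesize,
                     skb.scratch.is_linear, skb.scratch.csum_unnecessary);
              return 0;
      }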
    • udp: avoid a cache miss on dequeue · 0a463c78
      Authored by Paolo Abeni
      Since UDP no longer uses sk->destructor, we can completely clear
      the skb head state before enqueuing. Amend and use
      skb_release_head_state() for that.

      All head states share a single cacheline, which is not
      normally used/accessed on dequeue. We can avoid accessing
      that cacheline entirely by implementing, and using in the UDP code,
      a specialized skb free helper which ignores the skb head state
      (a schematic sketch follows the entry).

      This saves a cacheline miss at skb deallocation time.
      
      v1 -> v2:
        replaced secpath_reset() with skb_release_head_state()
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0a463c78
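      A schematic sketch of the split described above (user-space stand-ins
      for skb_release_head_state() and the specialized free helper): the
      head state is released once at enqueue, and the UDP dequeue path then
      frees the skb without ever reading those cold fields.

      #include <stdio.h>
      #include <stdlib.h>

      struct skb {
              void (*destructor)(struct skb *);   /* part of the head state */
              void *data;
      };

      /* Enqueue time: drop every piece of head state up front
       * (destructor here; dst, secpath and conntrack would go here too). */
      static void skb_release_head_state(struct skb *skb)
      {
              if (skb->destructor)
                      skb->destructor(skb);
              skb->destructor = NULL;
      }

      /* Dequeue time: head state is known to be gone, so this helper frees
       * the skb without touching that cacheline. */
      static void skb_consume_udp(struct skb *skb)
      {
              free(skb->data);
              free(skb);
      }

      int main(void)
      {
              struct skb *skb = calloc(1, sizeof(*skb));

              skb->data = malloc(512);
              skb_release_head_state(skb);   /* at UDP enqueue */
              skb_consume_udp(skb);          /* at recvmsg() dequeue */
              return 0;
      }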
  24. 18 May 2017 (3 commits)
  25. 17 May 2017 (2 commits)
  26. 08 Feb 2017 (2 commits)
  27. 31 Jan 2017 (1 commit)
    • net: Avoid receiving packets with an l3mdev on unbound UDP sockets · 63a6fff3
      Authored by Robert Shearman
      Packets arriving in a VRF are currently delivered to UDP sockets that
      aren't bound to any interface. TCP defaults to not delivering packets
      arriving in a VRF to unbound sockets. IP route lookup and socket
      transmit both assume that unbound means using the default table, and
      UDP applications that haven't been changed to be aware of VRFs may not
      function correctly in this case, since they may not be able to handle
      overlapping IP address ranges, or to send packets back to the
      original sender if required.

      So add a sysctl, udp_l3mdev_accept, to control this behaviour, with it
      being analogous to the existing tcp_l3mdev_accept, namely allowing a
      process to have a VRF-global listen socket. Have this default to off,
      as this is the behaviour that users will expect, given that there is
      no explicit mechanism to set an unmodified VRF-unaware application into
      a default VRF.
      Signed-off-by: Robert Shearman <rshearma@brocade.com>
      Acked-by: David Ahern <dsa@cumulusnetworks.com>
      Tested-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      63a6fff3
  28. 19 Jan 2017 (1 commit)
  29. 07 Jan 2017 (1 commit)
    • udp: inuse checks can quit early for reuseport · df560056
      Authored by Eric Garver
      The UDP lib inuse checks walk the entire hash bucket to check whether the
      portaddr is in use. In the reuseport case we can stop searching as soon as
      we find a matching reuseport group (a sketch of the early-exit walk
      follows the entry).

      On a 16-core VM, a test program that spawns 16 threads, each binding
      1024 sockets (one per 10ms), takes 1m45s. With this change it takes 11s.

      Also add a cond_resched() when the port is not specified.
      Signed-off-by: Eric Garver <e@erig.me>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      df560056
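      A sketch of the early-exit walk described above (simplified data
      structures, uid/address matching omitted): while scanning the hash
      bucket for the candidate port, the search stops at the first socket
      that already belongs to a joinable reuseport group instead of visiting
      every socket in the bucket.

      #include <stdbool.h>
      #include <stddef.h>
      #include <stdio.h>

      struct sock {
              int  port;
              bool reuseport;        /* SO_REUSEPORT set, group joined */
              struct sock *next;     /* next socket in the hash bucket */
      };

      /* Return true if 'port' is taken in a way that blocks this bind. */
      static bool port_in_use(const struct sock *bucket, int port,
                              bool want_reuseport)
      {
              for (const struct sock *sk = bucket; sk; sk = sk->next) {
                      if (sk->port != port)
                              continue;
                      if (want_reuseport && sk->reuseport)
                              return false;   /* matching group: quit early */
                      return true;            /* conflicting owner */
              }
              return false;
      }

      int main(void)
      {
              struct sock b = { .port = 5353, .reuseport = true, .next = NULL };
              struct sock a = { .port = 5353, .reuseport = true, .next = &b };

              printf("bind with SO_REUSEPORT ok:    %d\n", !port_in_use(&a, 5353, true));
              printf("bind without SO_REUSEPORT ok: %d\n", !port_in_use(&a, 5353, false));
              return 0;
      }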
  30. 25 Dec 2016 (1 commit)
  31. 10 Dec 2016 (2 commits)