1. 06 6月, 2019 1 次提交
  2. 20 5月, 2019 1 次提交
  3. 09 4月, 2019 1 次提交
  4. 23 2月, 2019 1 次提交
  5. 18 1月, 2019 1 次提交
  6. 17 1月, 2019 1 次提交
  7. 15 12月, 2018 1 次提交
    • P
      net: udp: prefer listeners bound to an address · 4cdeeee9
      Peter Oskolkov 提交于
      A relatively common use case is to have several IPs configured
      on a host, and have different listeners for each of them. We would
      like to add a "catch all" listener on addr_any, to match incoming
      connections not served by any of the listeners bound to a specific
      address.
      
      However, port-only lookups can match addr_any sockets when sockets
      listening on specific addresses are present if so_reuseport flag
      is set. This patch eliminates lookups into port-only hashtable,
      as lookups by (addr,port) tuple are easily available.
      
      In addition, compute_score() is tweaked to _not_ match
      addr_any sockets to specific addresses, as hash collisions
      could result in the unwanted behavior described above.
      
      Tested: the patch compiles; full test in the last patch in this
      patchset. Existing reuseport_* selftests also pass.
      Suggested-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NPeter Oskolkov <posk@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4cdeeee9
  8. 17 11月, 2018 1 次提交
  9. 09 11月, 2018 3 次提交
    • S
      udp: Support for error handlers of tunnels with arbitrary destination port · e7cc0824
      Stefano Brivio 提交于
      ICMP error handling is currently not possible for UDP tunnels not
      employing a receiving socket with local destination port matching the
      remote one, because we have no way to look them up.
      
      Add an err_handler tunnel encapsulation operation that can be exported by
      tunnels in order to pass the error to the protocol implementing the
      encapsulation. We can't easily use a lookup function as we did for VXLAN
      and GENEVE, as protocol error handlers, which would be in turn called by
      implementations of this new operation, handle the errors themselves,
      together with the tunnel lookup.
      
      Without a socket, we can't be sure which encapsulation error handler is
      the appropriate one: encapsulation handlers (the ones for FoU and GUE
      introduced in the next patch, e.g.) will need to check the new error codes
      returned by protocol handlers to figure out if errors match the given
      encapsulation, and, in turn, report this error back, so that we can try
      all of them in __udp{4,6}_lib_err_encap_no_sk() until we have a match.
      
      v2:
      - Name all arguments in err_handler prototypes (David Miller)
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e7cc0824
    • S
      net: Convert protocol error handlers from void to int · 32bbd879
      Stefano Brivio 提交于
      We'll need this to handle ICMP errors for tunnels without a sending socket
      (i.e. FoU and GUE). There, we might have to look up different types of IP
      tunnels, registered as network protocols, before we get a match, so we
      want this for the error handlers of IPPROTO_IPIP and IPPROTO_IPV6 in both
      inet_protos and inet6_protos. These error codes will be used in the next
      patch.
      
      For consistency, return sensible error codes in protocol error handlers
      whenever handlers can't handle errors because, even if valid, they don't
      match a protocol or any of its states.
      
      This has no effect on existing error handling paths.
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      32bbd879
    • S
      udp: Handle ICMP errors for tunnels with same destination port on both endpoints · a36e185e
      Stefano Brivio 提交于
      For both IPv4 and IPv6, if we can't match errors to a socket, try
      tunnels before ignoring them. Look up a socket with the original source
      and destination ports as found in the UDP packet inside the ICMP payload,
      this will work for tunnels that force the same destination port for both
      endpoints, i.e. VXLAN and GENEVE.
      
      Actually, lwtunnels could break this assumption if they are configured by
      an external control plane to have different destination ports on the
      endpoints: in this case, we won't be able to trace ICMP messages back to
      them.
      
      For IPv6 redirect messages, call ip6_redirect() directly with the output
      interface argument set to the interface we received the packet from (as
      it's the very interface we should build the exception on), otherwise the
      new nexthop will be rejected. There's no such need for IPv4.
      
      Tunnels can now export an encap_err_lookup() operation that indicates a
      match. Pass the packet to the lookup function, and if the tunnel driver
      reports a matching association, continue with regular ICMP error handling.
      
      v2:
      - Added newline between network and transport header sets in
        __udp{4,6}_lib_err_encap() (David Miller)
      - Removed redundant skb_reset_network_header(skb); in
        __udp4_lib_err_encap()
      - Removed redundant reassignment of iph in __udp4_lib_err_encap()
        (Sabrina Dubroca)
      - Edited comment to __udp{4,6}_lib_err_encap() to reflect the fact this
        won't work with lwtunnels configured to use asymmetric ports. By the way,
        it's VXLAN, not VxLAN (Jiri Benc)
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a36e185e
  10. 08 11月, 2018 5 次提交
    • P
      udp: cope with UDP GRO packet misdirection · cf329aa4
      Paolo Abeni 提交于
      In some scenarios, the GRO engine can assemble an UDP GRO packet
      that ultimately lands on a non GRO-enabled socket.
      This patch tries to address the issue explicitly checking for the UDP
      socket features before enqueuing the packet, and eventually segmenting
      the unexpected GRO packet, as needed.
      
      We must also cope with re-insertion requests: after segmentation the
      UDP code calls the helper introduced by the previous patches, as needed.
      
      Segmentation is performed by a common helper, which takes care of
      updating socket and protocol stats is case of failure.
      
      rfc v3 -> v1
       - fix compile issues with rxrpc
       - when gso_segment returns NULL, treat is as an error
       - added 'ipv4' argument to udp_rcv_segment()
      
      rfc v2 -> rfc v3
       - moved udp_rcv_segment() into net/udp.h, account errors to socket
         and ns, always return NULL or segs list
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf329aa4
    • P
      udp: add support for UDP_GRO cmsg · bcd1665e
      Paolo Abeni 提交于
      When UDP GRO is enabled, the UDP_GRO cmsg will carry the ingress
      datagram size. User-space can use such info to compute the original
      packets layout.
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bcd1665e
    • P
      udp: implement GRO for plain UDP sockets. · e20cf8d3
      Paolo Abeni 提交于
      This is the RX counterpart of commit bec1f6f6 ("udp: generate gso
      with UDP_SEGMENT"). When UDP_GRO is enabled, such socket is also
      eligible for GRO in the rx path: UDP segments directed to such socket
      are assembled into a larger GSO_UDP_L4 packet.
      
      The core UDP GRO support is enabled with setsockopt(UDP_GRO).
      
      Initial benchmark numbers:
      
      Before:
      udp rx:   1079 MB/s   769065 calls/s
      
      After:
      udp rx:   1466 MB/s    24877 calls/s
      
      This change introduces a side effect in respect to UDP tunnels:
      after a UDP tunnel creation, now the kernel performs a lookup per ingress
      UDP packet, while before such lookup happened only if the ingress packet
      carried a valid internal header csum.
      
      rfc v2 -> rfc v3:
       - fixed typos in macro name and comments
       - really enforce UDP_GRO_CNT_MAX, instead of UDP_GRO_CNT_MAX + 1
       - acquire socket lock in UDP_GRO setsockopt
      
      rfc v1 -> rfc v2:
       - use a new option to enable UDP GRO
       - use static keys to protect the UDP GRO socket lookup
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e20cf8d3
    • P
      udp: implement complete book-keeping for encap_needed · 60fb9567
      Paolo Abeni 提交于
      The *encap_needed static keys are enabled by UDP tunnels
      and several UDP encapsulations type, but they are never
      turned off. This can cause unneeded overall performance
      degradation for systems where such features are used
      transiently.
      
      This patch introduces complete book-keeping for such keys,
      decreasing the usage at socket destruction time, if needed,
      and avoiding that the same socket could increase the key
      usage multiple times.
      
      rfc v3 -> v1:
       - add socket lock around udp_tunnel_encap_enable()
      
      rfc v2 -> rfc v3:
       - use udp_tunnel_encap_enable() in setsockopt()
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      60fb9567
    • M
      net: ensure unbound datagram socket to be chosen when not in a VRF · 6da5b0f0
      Mike Manning 提交于
      Ensure an unbound datagram skt is chosen when not in a VRF. The check
      for a device match in compute_score() for UDP must be performed when
      there is no device match. For this, a failure is returned when there is
      no device match. This ensures that bound sockets are never selected,
      even if there is no unbound socket.
      
      Allow IPv6 packets to be sent over a datagram skt bound to a VRF. These
      packets are currently blocked, as flowi6_oif was set to that of the
      master vrf device, and the ipi6_ifindex is that of the slave device.
      Allow these packets to be sent by checking the device with ipi6_ifindex
      has the same L3 scope as that of the bound device of the skt, which is
      the master vrf device. Note that this check always succeeds if the skt
      is unbound.
      
      Even though the right datagram skt is now selected by compute_score(),
      a different skt is being returned that is bound to the wrong vrf. The
      difference between these and stream sockets is the handling of the skt
      option for SO_REUSEPORT. While the handling when adding a skt for reuse
      correctly checks that the bound device of the skt is a match, the skts
      in the hashslot are already incorrect. So for the same hash, a skt for
      the wrong vrf may be selected for the required port. The root cause is
      that the skt is immediately placed into a slot when it is created,
      but when the skt is then bound using SO_BINDTODEVICE, it remains in the
      same slot. The solution is to move the skt to the correct slot by
      forcing a rehash.
      Signed-off-by: NMike Manning <mmanning@vyatta.att-mail.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Tested-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6da5b0f0
  11. 31 10月, 2018 1 次提交
  12. 27 10月, 2018 1 次提交
  13. 25 10月, 2018 1 次提交
    • S
      net: udp: fix handling of CHECKSUM_COMPLETE packets · db4f1be3
      Sean Tranchetti 提交于
      Current handling of CHECKSUM_COMPLETE packets by the UDP stack is
      incorrect for any packet that has an incorrect checksum value.
      
      udp4/6_csum_init() will both make a call to
      __skb_checksum_validate_complete() to initialize/validate the csum
      field when receiving a CHECKSUM_COMPLETE packet. When this packet
      fails validation, skb->csum will be overwritten with the pseudoheader
      checksum so the packet can be fully validated by software, but the
      skb->ip_summed value will be left as CHECKSUM_COMPLETE so that way
      the stack can later warn the user about their hardware spewing bad
      checksums. Unfortunately, leaving the SKB in this state can cause
      problems later on in the checksum calculation.
      
      Since the the packet is still marked as CHECKSUM_COMPLETE,
      udp_csum_pull_header() will SUBTRACT the checksum of the UDP header
      from skb->csum instead of adding it, leaving us with a garbage value
      in that field. Once we try to copy the packet to userspace in the
      udp4/6_recvmsg(), we'll make a call to skb_copy_and_csum_datagram_msg()
      to checksum the packet data and add it in the garbage skb->csum value
      to perform our final validation check.
      
      Since the value we're validating is not the proper checksum, it's possible
      that the folded value could come out to 0, causing us not to drop the
      packet. Instead, we believe that the packet was checksummed incorrectly
      by hardware since skb->ip_summed is still CHECKSUM_COMPLETE, and we attempt
      to warn the user with netdev_rx_csum_fault(skb->dev);
      
      Unfortunately, since this is the UDP path, skb->dev has been overwritten
      by skb->dev_scratch and is no longer a valid pointer, so we end up
      reading invalid memory.
      
      This patch addresses this problem in two ways:
      	1) Do not use the dev pointer when calling netdev_rx_csum_fault()
      	   from skb_copy_and_csum_datagram_msg(). Since this gets called
      	   from the UDP path where skb->dev has been overwritten, we have
      	   no way of knowing if the pointer is still valid. Also for the
      	   sake of consistency with the other uses of
      	   netdev_rx_csum_fault(), don't attempt to call it if the
      	   packet was checksummed by software.
      
      	2) Add better CHECKSUM_COMPLETE handling to udp4/6_csum_init().
      	   If we receive a packet that's CHECKSUM_COMPLETE that fails
      	   verification (i.e. skb->csum_valid == 0), check who performed
      	   the calculation. It's possible that the checksum was done in
      	   software by the network stack earlier (such as Netfilter's
      	   CONNTRACK module), and if that says the checksum is bad,
      	   we can drop the packet immediately instead of waiting until
      	   we try and copy it to userspace. Otherwise, we need to
      	   mark the SKB as CHECKSUM_NONE, since the skb->csum field
      	   no longer contains the full packet checksum after the
      	   call to __skb_checksum_validate_complete().
      
      Fixes: e6afc8ac ("udp: remove headers from UDP packets before queueing")
      Fixes: c84d9490 ("udp: copy skb->truesize in the first cache line")
      Cc: Sam Kumar <samanthakumar@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: NSean Tranchetti <stranche@codeaurora.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      db4f1be3
  14. 08 10月, 2018 1 次提交
    • J
      udp: Unbreak modules that rely on external __skb_recv_udp() availability · 7e823644
      Jiri Kosina 提交于
      Commit 2276f58a ("udp: use a separate rx queue for packet reception")
      turned static inline __skb_recv_udp() from being a trivial helper around
      __skb_recv_datagram() into a UDP specific implementaion, making it
      EXPORT_SYMBOL_GPL() at the same time.
      
      There are external modules that got broken by __skb_recv_udp() not being
      visible to them. Let's unbreak them by making __skb_recv_udp EXPORT_SYMBOL().
      
      Rationale (one of those) why this is actually "technically correct" thing
      to do: __skb_recv_udp() used to be an inline wrapper around
      __skb_recv_datagram(), which itself (still, and correctly so, I believe)
      is EXPORT_SYMBOL().
      
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Fixes: 2276f58a ("udp: use a separate rx queue for packet reception")
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7e823644
  15. 06 10月, 2018 1 次提交
  16. 03 10月, 2018 1 次提交
  17. 17 9月, 2018 1 次提交
  18. 11 8月, 2018 2 次提交
    • M
      bpf: Enable BPF_PROG_TYPE_SK_REUSEPORT bpf prog in reuseport selection · 8217ca65
      Martin KaFai Lau 提交于
      This patch allows a BPF_PROG_TYPE_SK_REUSEPORT bpf prog to select a
      SO_REUSEPORT sk from a BPF_MAP_TYPE_REUSEPORT_ARRAY introduced in
      the earlier patch.  "bpf_run_sk_reuseport()" will return -ECONNREFUSED
      when the BPF_PROG_TYPE_SK_REUSEPORT prog returns SK_DROP.
      The callers, in inet[6]_hashtable.c and ipv[46]/udp.c, are modified to
      handle this case and return NULL immediately instead of continuing the
      sk search from its hashtable.
      
      It re-uses the existing SO_ATTACH_REUSEPORT_EBPF setsockopt to attach
      BPF_PROG_TYPE_SK_REUSEPORT.  The "sk_reuseport_attach_bpf()" will check
      if the attaching bpf prog is in the new SK_REUSEPORT or the existing
      SOCKET_FILTER type and then check different things accordingly.
      
      One level of "__reuseport_attach_prog()" call is removed.  The
      "sk_unhashed() && ..." and "sk->sk_reuseport_cb" tests are pushed
      back to "reuseport_attach_prog()" in sock_reuseport.c.  sock_reuseport.c
      seems to have more knowledge on those test requirements than filter.c.
      In "reuseport_attach_prog()", after new_prog is attached to reuse->prog,
      the old_prog (if any) is also directly freed instead of returning the
      old_prog to the caller and asking the caller to free.
      
      The sysctl_optmem_max check is moved back to the
      "sk_reuseport_attach_filter()" and "sk_reuseport_attach_bpf()".
      As of other bpf prog types, the new BPF_PROG_TYPE_SK_REUSEPORT is only
      bounded by the usual "bpf_prog_charge_memlock()" during load time
      instead of bounded by both bpf_prog_charge_memlock and sysctl_optmem_max.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      8217ca65
    • M
      bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT · 2dbb9b9e
      Martin KaFai Lau 提交于
      This patch adds a BPF_PROG_TYPE_SK_REUSEPORT which can select
      a SO_REUSEPORT sk from a BPF_MAP_TYPE_REUSEPORT_ARRAY.  Like other
      non SK_FILTER/CGROUP_SKB program, it requires CAP_SYS_ADMIN.
      
      BPF_PROG_TYPE_SK_REUSEPORT introduces "struct sk_reuseport_kern"
      to store the bpf context instead of using the skb->cb[48].
      
      At the SO_REUSEPORT sk lookup time, it is in the middle of transiting
      from a lower layer (ipv4/ipv6) to a upper layer (udp/tcp).  At this
      point,  it is not always clear where the bpf context can be appended
      in the skb->cb[48] to avoid saving-and-restoring cb[].  Even putting
      aside the difference between ipv4-vs-ipv6 and udp-vs-tcp.  It is not
      clear if the lower layer is only ipv4 and ipv6 in the future and
      will it not touch the cb[] again before transiting to the upper
      layer.
      
      For example, in udp_gro_receive(), it uses the 48 byte NAPI_GRO_CB
      instead of IP[6]CB and it may still modify the cb[] after calling
      the udp[46]_lib_lookup_skb().  Because of the above reason, if
      sk->cb is used for the bpf ctx, saving-and-restoring is needed
      and likely the whole 48 bytes cb[] has to be saved and restored.
      
      Instead of saving, setting and restoring the cb[], this patch opts
      to create a new "struct sk_reuseport_kern" and setting the needed
      values in there.
      
      The new BPF_PROG_TYPE_SK_REUSEPORT and "struct sk_reuseport_(kern|md)"
      will serve all ipv4/ipv6 + udp/tcp combinations.  There is no protocol
      specific usage at this point and it is also inline with the current
      sock_reuseport.c implementation (i.e. no protocol specific requirement).
      
      In "struct sk_reuseport_md", this patch exposes data/data_end/len
      with semantic similar to other existing usages.  Together
      with "bpf_skb_load_bytes()" and "bpf_skb_load_bytes_relative()",
      the bpf prog can peek anywhere in the skb.  The "bind_inany" tells
      the bpf prog that the reuseport group is bind-ed to a local
      INANY address which cannot be learned from skb.
      
      The new "bind_inany" is added to "struct sock_reuseport" which will be
      used when running the new "BPF_PROG_TYPE_SK_REUSEPORT" bpf prog in order
      to avoid repeating the "bind INANY" test on
      "sk_v6_rcv_saddr/sk->sk_rcv_saddr" every time a bpf prog is run.  It can
      only be properly initialized when a "sk->sk_reuseport" enabled sk is
      adding to a hashtable (i.e. during "reuseport_alloc()" and
      "reuseport_add_sock()").
      
      The new "sk_select_reuseport()" is the main helper that the
      bpf prog will use to select a SO_REUSEPORT sk.  It is the only function
      that can use the new BPF_MAP_TYPE_REUSEPORT_ARRAY.  As mentioned in
      the earlier patch, the validity of a selected sk is checked in
      run time in "sk_select_reuseport()".  Doing the check in
      verification time is difficult and inflexible (consider the map-in-map
      use case).  The runtime check is to compare the selected sk's reuseport_id
      with the reuseport_id that we want.  This helper will return -EXXX if the
      selected sk cannot serve the incoming request (e.g. reuseport_id
      not match).  The bpf prog can decide if it wants to do SK_DROP as its
      discretion.
      
      When the bpf prog returns SK_PASS, the kernel will check if a
      valid sk has been selected (i.e. "reuse_kern->selected_sk != NULL").
      If it does , it will use the selected sk.  If not, the kernel
      will select one from "reuse->socks[]" (as before this patch).
      
      The SK_DROP and SK_PASS handling logic will be in the next patch.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      2dbb9b9e
  19. 07 7月, 2018 2 次提交
    • W
      ip: remove tx_flags from ipcm_cookie and use same logic for v4 and v6 · 678ca42d
      Willem de Bruijn 提交于
      skb_shinfo(skb)->tx_flags is derived from sk->sk_tsflags, possibly
      after modification by __sock_cmsg_send, by calling sock_tx_timestamp.
      
      The IPv4 and IPv6 paths do this conversion differently. In IPv4, the
      individual protocols that support tx timestamps call this function
      and store the result in ipc.tx_flags. In IPv6, sock_tx_timestamp is
      called in __ip6_append_data.
      
      There is no need to store both tx_flags and ts_flags in the cookie
      as one is derived from the other. Convert when setting up the cork
      and remove the redundant field. This is similar to IPv6, only have
      the conversion happen only once per datagram, in ip(6)_setup_cork.
      
      Also change __ip6_append_data to match __ip_append_data. Only update
      tskey if timestamping is enabled with OPT_ID. The SOCK_.. test is
      redundant: only valid protocols can have non-zero cork->tx_flags.
      
      After this change the IPv4 and IPv6 logic is the same.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      678ca42d
    • W
      ipv4: ipcm_cookie initializers · 35178206
      Willem de Bruijn 提交于
      Initialize the cookie in one location to reduce code duplication and
      avoid bugs from inconsistent initialization, such as that fixed in
      commit 9887cba1 ("ip: limit use of gso_size to udp").
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      35178206
  20. 04 7月, 2018 1 次提交
  21. 29 6月, 2018 1 次提交
    • L
      Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43
      Linus Torvalds 提交于
      The poll() changes were not well thought out, and completely
      unexplained.  They also caused a huge performance regression, because
      "->poll()" was no longer a trivial file operation that just called down
      to the underlying file operations, but instead did at least two indirect
      calls.
      
      Indirect calls are sadly slow now with the Spectre mitigation, but the
      performance problem could at least be largely mitigated by changing the
      "->get_poll_head()" operation to just have a per-file-descriptor pointer
      to the poll head instead.  That gets rid of one of the new indirections.
      
      But that doesn't fix the new complexity that is completely unwarranted
      for the regular case.  The (undocumented) reason for the poll() changes
      was some alleged AIO poll race fixing, but we don't make the common case
      slower and more complex for some uncommon special case, so this all
      really needs way more explanations and most likely a fundamental
      redesign.
      
      [ This revert is a revert of about 30 different commits, not reverted
        individually because that would just be unnecessarily messy  - Linus ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a11e1d43
  22. 09 6月, 2018 1 次提交
    • P
      udp: fix rx queue len reported by diag and proc interface · 6c206b20
      Paolo Abeni 提交于
      After commit 6b229cf7 ("udp: add batching to udp_rmem_release()")
      the sk_rmem_alloc field does not measure exactly anymore the
      receive queue length, because we batch the rmem release. The issue
      is really apparent only after commit 0d4a6608 ("udp: do rmem bulk
      free even if the rx sk queue is empty"): the user space can easily
      check for an empty socket with not-0 queue length reported by the 'ss'
      tool or the procfs interface.
      
      We need to use a custom UDP helper to report the correct queue length,
      taking into account the forward allocation deficit.
      
      Reported-by: trevor.francis@46labs.com
      Fixes: 6b229cf7 ("UDP: add batching to udp_rmem_release()")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6c206b20
  23. 05 6月, 2018 1 次提交
  24. 28 5月, 2018 1 次提交
    • A
      bpf: Hooks for sys_sendmsg · 1cedee13
      Andrey Ignatov 提交于
      In addition to already existing BPF hooks for sys_bind and sys_connect,
      the patch provides new hooks for sys_sendmsg.
      
      It leverages existing BPF program type `BPF_PROG_TYPE_CGROUP_SOCK_ADDR`
      that provides access to socket itlself (properties like family, type,
      protocol) and user-passed `struct sockaddr *` so that BPF program can
      override destination IP and port for system calls such as sendto(2) or
      sendmsg(2) and/or assign source IP to the socket.
      
      The hooks are implemented as two new attach types:
      `BPF_CGROUP_UDP4_SENDMSG` and `BPF_CGROUP_UDP6_SENDMSG` for UDPv4 and
      UDPv6 correspondingly.
      
      UDPv4 and UDPv6 separate attach types for same reason as sys_bind and
      sys_connect hooks, i.e. to prevent reading from / writing to e.g.
      user_ip6 fields when user passes sockaddr_in since it'd be out-of-bound.
      
      The difference with already existing hooks is sys_sendmsg are
      implemented only for unconnected UDP.
      
      For TCP it doesn't make sense to change user-provided `struct sockaddr *`
      at sendto(2)/sendmsg(2) time since socket either was already connected
      and has source/destination set or wasn't connected and call to
      sendto(2)/sendmsg(2) would lead to ENOTCONN anyway.
      
      Connected UDP is already handled by sys_connect hooks that can override
      source/destination at connect time and use fast-path later, i.e. these
      hooks don't affect UDP fast-path.
      
      Rewriting source IP is implemented differently than that in sys_connect
      hooks. When sys_sendmsg is used with unconnected UDP it doesn't work to
      just bind socket to desired local IP address since source IP can be set
      on per-packet basis by using ancillary data (cmsg(3)). So no matter if
      socket is bound or not, source IP has to be rewritten on every call to
      sys_sendmsg.
      
      To do so two new fields are added to UAPI `struct bpf_sock_addr`;
      * `msg_src_ip4` to set source IPv4 for UDPv4;
      * `msg_src_ip6` to set source IPv6 for UDPv6.
      Signed-off-by: NAndrey Ignatov <rdna@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      1cedee13
  25. 26 5月, 2018 1 次提交
  26. 24 5月, 2018 1 次提交
  27. 16 5月, 2018 2 次提交
  28. 12 5月, 2018 1 次提交
  29. 11 5月, 2018 2 次提交
  30. 09 5月, 2018 1 次提交