1. 26 1月, 2017 7 次提交
    • W
      net/tcp-fastopen: make connect()'s return case more consistent with non-TFO · 3979ad7e
      Willy Tarreau 提交于
      Without TFO, any subsequent connect() call after a successful one returns
      -1 EISCONN. The last API update ensured that __inet_stream_connect() can
      return -1 EINPROGRESS in response to sendmsg() when TFO is in use to
      indicate that the connection is now in progress. Unfortunately since this
      function is used both for connect() and sendmsg(), it has the undesired
      side effect of making connect() now return -1 EINPROGRESS as well after
      a successful call, while at the same time poll() returns POLLOUT. This
      can confuse some applications which happen to call connect() and to
      check for -1 EISCONN to ensure the connection is usable, and for which
      EINPROGRESS indicates a need to poll, causing a loop.
      
      This problem was encountered in haproxy where a call to connect() is
      precisely used in certain cases to confirm a connection's readiness.
      While arguably haproxy's behaviour should be improved here, it seems
      important to aim at a more robust behaviour when the goal of the new
      API is to make it easier to implement TFO in existing applications.
      
      This patch simply ensures that we preserve the same semantics as in
      the non-TFO case on the connect() syscall when using TFO, while still
      returning -1 EINPROGRESS on sendmsg(). For this we simply tell
      __inet_stream_connect() whether we're doing a regular connect() or in
      fact connecting for a sendmsg() call.
      
      Cc: Wei Wang <weiwan@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: NWilly Tarreau <w@1wt.eu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3979ad7e
    • W
      net/tcp-fastopen: Add new API support · 19f6d3f3
      Wei Wang 提交于
      This patch adds a new socket option, TCP_FASTOPEN_CONNECT, as an
      alternative way to perform Fast Open on the active side (client). Prior
      to this patch, a client needs to replace the connect() call with
      sendto(MSG_FASTOPEN). This can be cumbersome for applications who want
      to use Fast Open: these socket operations are often done in lower layer
      libraries used by many other applications. Changing these libraries
      and/or the socket call sequences are not trivial. A more convenient
      approach is to perform Fast Open by simply enabling a socket option when
      the socket is created w/o changing other socket calls sequence:
        s = socket()
          create a new socket
        setsockopt(s, IPPROTO_TCP, TCP_FASTOPEN_CONNECT …);
          newly introduced sockopt
          If set, new functionality described below will be used.
          Return ENOTSUPP if TFO is not supported or not enabled in the
          kernel.
      
        connect()
          With cookie present, return 0 immediately.
          With no cookie, initiate 3WHS with TFO cookie-request option and
          return -1 with errno = EINPROGRESS.
      
        write()/sendmsg()
          With cookie present, send out SYN with data and return the number of
          bytes buffered.
          With no cookie, and 3WHS not yet completed, return -1 with errno =
          EINPROGRESS.
          No MSG_FASTOPEN flag is needed.
      
        read()
          Return -1 with errno = EWOULDBLOCK/EAGAIN if connect() is called but
          write() is not called yet.
          Return -1 with errno = EWOULDBLOCK/EAGAIN if connection is
          established but no msg is received yet.
          Return number of bytes read if socket is established and there is
          msg received.
      
      The new API simplifies life for applications that always perform a write()
      immediately after a successful connect(). Such applications can now take
      advantage of Fast Open by merely making one new setsockopt() call at the time
      of creating the socket. Nothing else about the application's socket call
      sequence needs to change.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      19f6d3f3
    • W
      net: Remove __sk_dst_reset() in tcp_v6_connect() · 25776aa9
      Wei Wang 提交于
      Remove __sk_dst_reset() in the failure handling because __sk_dst_reset()
      will eventually get called when sk is released. No need to handle it in
      the protocol specific connect call.
      This is also to make the code path consistent with ipv4.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      25776aa9
    • W
      net/tcp-fastopen: refactor cookie check logic · 065263f4
      Wei Wang 提交于
      Refactor the cookie check logic in tcp_send_syn_data() into a function.
      This function will be called else where in later changes.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      065263f4
    • E
      tcp: reduce skb overhead in selected places · 60b1af33
      Eric Dumazet 提交于
      tcp_add_backlog() can use skb_condense() helper to get better
      gains and less SKB_TRUESIZE() magic. This only happens when socket
      backlog has to be used.
      
      Some attacks involve specially crafted out of order tiny TCP packets,
      clogging the ofo queue of (many) sockets.
      Then later, expensive collapse happens, trying to copy all these skbs
      into single ones.
      This unfortunately does not work if each skb has no neighbor in TCP
      sequence order.
      
      By using skb_condense() if the skb could not be coalesced to a prior
      one, we defeat these kind of threats, potentially saving 4K per skb
      (or more, since this is one page fragment).
      
      A typical NAPI driver allocates gro packets with GRO_MAX_HEAD bytes
      in skb->head, meaning the copy done by skb_condense() is limited to
      about 200 bytes.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      60b1af33
    • D
      tipc: uninitialized return code in tipc_setsockopt() · a08ef476
      Dan Carpenter 提交于
      We shuffled some code around and added some new case statements here and
      now "res" isn't initialized on all paths.
      
      Fixes: 01fd12bb ("tipc: make replicast a user selectable option")
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a08ef476
    • J
      net sched actions: Add support for user cookies · 1045ba77
      Jamal Hadi Salim 提交于
      Introduce optional 128-bit action cookie.
      Like all other cookie schemes in the networking world (eg in protocols
      like http or existing kernel fib protocol field, etc) the idea is to save
      user state that when retrieved serves as a correlator. The kernel
      _should not_ intepret it.  The user can store whatever they wish in the
      128 bits.
      
      Sample exercise(showing variable length use of cookie)
      
      .. create an accept action with cookie a1b2c3d4
      sudo $TC actions add action ok index 1 cookie a1b2c3d4
      
      .. dump all gact actions..
      sudo $TC -s actions ls action gact
      
          action order 0: gact action pass
           random type none pass val 0
           index 1 ref 1 bind 0 installed 5 sec used 5 sec
          Action statistics:
          Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
          backlog 0b 0p requeues 0
          cookie a1b2c3d4
      
      .. bind the accept action to a filter..
      sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
      u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1
      
      ... send some traffic..
      $ ping 127.0.0.1 -c 3
      PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
      64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
      64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
      64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1045ba77
  2. 25 1月, 2017 10 次提交
  3. 23 1月, 2017 2 次提交
  4. 21 1月, 2017 11 次提交
  5. 20 1月, 2017 3 次提交
    • D
      net: ipv6: Keep nexthop of multipath route on admin down · a1a22c12
      David Ahern 提交于
      IPv6 deletes route entries associated with multipath routes on an
      admin down where IPv4 does not. For example:
          $ ip ro ls vrf red
          unreachable default metric 8192
          1.1.1.0/24 metric 64
                  nexthop via 10.100.1.254  dev eth1 weight 1
                  nexthop via 10.100.2.254  dev eth2 weight 1
          10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.4
          10.100.2.0/24 dev eth2 proto kernel scope link src 10.100.2.4
      
          $ ip -6 ro ls vrf red
          2001:db8:1::/120 dev eth1 proto kernel metric 256  pref medium
          2001:db8:2:: dev red proto none metric 0  pref medium
          2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
          2001:db8:11::/120 via 2001:db8:1::16 dev eth1 metric 1024  pref medium
          2001:db8:11::/120 via 2001:db8:2::17 dev eth2 metric 1024  pref medium
          ...
      
      Set link down:
          $ ip li set eth1 down
      
      IPv4 retains the multihop route but flags eth1 route as dead:
      
          $ ip ro ls vrf red
          unreachable default metric 8192
          1.1.1.0/24
                  nexthop via 10.100.1.16  dev eth1 weight 1 dead linkdown
                  nexthop via 10.100.2.16  dev eth2 weight 1
          10.100.2.0/24 dev eth2 proto kernel scope link src 10.100.2.4
      
      and IPv6 deletes the route as part of flushing all routes for the device:
      
          $ ip -6 ro ls vrf red
          2001:db8:2:: dev red proto none metric 0  pref medium
          2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
          2001:db8:11::/120 via 2001:db8:2::17 dev eth2 metric 1024  pref medium
          ...
      
      Worse, on admin up of the device the multipath route has to be deleted
      to get this leg of the route re-added.
      
      This patch keeps routes that are part of a multipath route if
      ignore_routes_with_linkdown is set with the dead and linkdown flags
      enabling consistency between IPv4 and IPv6:
      
          $ ip -6 ro ls vrf red
          2001:db8:2:: dev red proto none metric 0  pref medium
          2001:db8:2::/120 dev eth2 proto kernel metric 256  pref medium
          2001:db8:11::/120 via 2001:db8:1::16 dev eth1 metric 1024 dead linkdown  pref medium
          2001:db8:11::/120 via 2001:db8:2::17 dev eth2 metric 1024  pref medium
          ...
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a1a22c12
    • T
      net: caif: Remove unused stats member from struct chnl_net · 54c30f60
      Tobias Klauser 提交于
      The stats member of struct chnl_net is used nowhere in the code, so it
      might as well be removed.
      Signed-off-by: NTobias Klauser <tklauser@distanz.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      54c30f60
    • A
      net/sched: cls_flower: reduce fl_change stack size · 39b7b6a6
      Arnd Bergmann 提交于
      The new ARP support has pushed the stack size over the edge on ARM,
      as there are two large objects on the stack in this function (mask
      and tb) and both have now grown a bit more:
      
      net/sched/cls_flower.c: In function 'fl_change':
      net/sched/cls_flower.c:928:1: error: the frame size of 1072 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
      
      We can solve this by dynamically allocating one or both of them.
      I first tried to do it just for the mask, but that only saved
      152 bytes on ARM, while this version just does it for the 'tb'
      array, bringing the stack size back down to 664 bytes.
      
      Fixes: 99d31326 ("net/sched: cls_flower: Support matching on ARP")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      39b7b6a6
  6. 19 1月, 2017 7 次提交