1. 06 Apr, 2016: 28 commits
  2. 05 Apr, 2016: 12 commits
    • Merge branch 'tcp-udp-misc' · 15f41e2b
      Committed by David S. Miller
      Eric Dumazet says:
      
      ====================
      net: various udp/tcp changes
      
      First round of patches for linux-4.7
      
      Add a generic facility for sockets to be freed after an RCU grace
      period, if they need to.
      
      Then the UDP stack is changed to no longer use SLAB_DESTROY_BY_RCU,
      in order to speed up RX processing for traffic encapsulated in UDP.
      This gives a 17% speedup for normal UDP reception under stress conditions.
      
      Then TCP listeners are changed to use SOCK_RCU_FREE as well,
      to avoid touching sk_refcnt in the synflood case:
      I got up to a 30% performance increase for a single listener.
      
      Then three patches add SK_MEMINFO_DROPS to sock_diag
      and add per-socket RX drop accounting to TCP.
      
      The last patch adds rate limiting on ACKs sent on behalf of SYN_RECV
      sockets, to better resist SYNFLOOD attacks targeting one or a few flows.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      15f41e2b
    • tcp: rate limit ACK sent by SYN_RECV request sockets · 4ce7e93c
      Committed by Eric Dumazet
      Attackers like to use SYNFLOOD targeting one 5-tuple, as they
      hit a single RX queue (and cpu) on the victim.
      
      If they use random sequence numbers in their SYN, we detect
      they do not match the expected window and send back an ACK.
      
      This patch adds a rate limitation, so that the effect of such
      attacks is limited to ingress only.
      
      We roughly double our ability to absorb such attacks.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Maciej Żenczykowski <maze@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4ce7e93c
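      A minimal sketch of the idea, in the spirit of tcp_check_req(); it
      assumes the tcp_oow_rate_limited() helper already used for challenge
      ACKs, and a last_oow_ack_time timestamp kept in the request socket
      (field name per this series):

          /* Out of window: ACK and drop, but only if the rate limiter
           * agrees; a SYNFLOOD with random sequence numbers would
           * otherwise turn every incoming SYN into an egress ACK.
           */
          if (!(flg & TCP_FLAG_RST) &&
              !tcp_oow_rate_limited(sock_net(sk), skb,
                                    LINUX_MIB_TCPACKSKIPPEDSYNRECV,
                                    &tcp_rsk(req)->last_oow_ack_time))
                  req->rsk_ops->send_ack(sk, skb, req);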
    • ipv4: tcp: set SOCK_USE_WRITE_QUEUE for ip_send_unicast_reply() · a9d6532b
      Committed by Eric Dumazet
      TCP uses per-cpu 'sockets' to send some packets:
      - RST packets (tcp_v4_send_reset())
      - ACK packets for SYN_RECV and TIMEWAIT sockets
      
      By setting the SOCK_USE_WRITE_QUEUE flag, we tell sock_wfree()
      not to call sk_write_space(), since these internal sockets
      do not care.
      
      This gives a small performance improvement, merely by allowing
      the cpu to properly predict the sock_wfree() conditional branch,
      and by avoiding one atomic operation.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a9d6532b
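      For context, a paraphrased sketch of the sock_wfree() branch that the
      flag short-circuits (based on net/core/sock.c of that era; the patch
      itself amounts to calling sock_set_flag(sk, SOCK_USE_WRITE_QUEUE) on
      each per-cpu control socket):

          void sock_wfree(struct sk_buff *skb)
          {
                  struct sock *sk = skb->sk;
                  unsigned int len = skb->truesize;

                  if (!sock_flag(sk, SOCK_USE_WRITE_QUEUE)) {
                          /* Keep one unit of sk_wmem_alloc until after
                           * the sk_write_space() callback has run.
                           */
                          atomic_sub(len - 1, &sk->sk_wmem_alloc);
                          sk->sk_write_space(sk);
                          len = 1;
                  }
                  /* Internal per-cpu sockets set the flag, so they skip
                   * the callback and the extra atomic above entirely.
                   */
                  if (atomic_sub_and_test(len, &sk->sk_wmem_alloc))
                          __sk_free(sk);
          }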
    • tcp: increment sk_drops for listeners · 9caad864
      Committed by Eric Dumazet
      Goal: packets dropped by a listener are accounted for.
      
      This adds tcp_listendrop() helper, and clears sk_drops in sk_clone_lock()
      so that children do not inherit their parent drop count.
      
      Note that we no longer increment the LINUX_MIB_LISTENDROPS counter
      when sending a SYNCOOKIE, since the SYN packet generated a SYNACK;
      we already have the separate LINUX_MIB_SYNCOOKIESSENT counter.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9caad864
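      The helper is small; a sketch matching the description above (modulo
      the exact SNMP macro spelling of that kernel version):

          /* Account one drop at the listener: bump the per-socket
           * counter (now visible via ss/sock_diag) and the global
           * LINUX_MIB_LISTENDROPS SNMP counter.
           */
          static inline void tcp_listendrop(const struct sock *sk)
          {
                  atomic_inc(&((struct sock *)sk)->sk_drops);
                  NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
          }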
    • tcp: increment sk_drops for dropped rx packets · 532182cd
      Committed by Eric Dumazet
      Now that ss can report sk_drops, we can instruct TCP to increment
      this per-socket counter when it drops an incoming frame, to refine
      monitoring and debugging.
      
      The following patch takes care of listener drops.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      532182cd
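      Since a dropped skb can be a GRO aggregate, the accounting bumps the
      counter by the number of segments rather than by one; a sketch of the
      helper this change introduces:

          static inline void sk_drops_add(struct sock *sk,
                                          const struct sk_buff *skb)
          {
                  /* A GRO packet may carry many segments; count them
                   * all, never fewer than one.
                   */
                  int segs = max_t(u16, 1, skb_shinfo(skb)->gso_segs);

                  atomic_add(segs, &sk->sk_drops);
          }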
    • sock_diag: add SK_MEMINFO_DROPS · 15239302
      Committed by Eric Dumazet
      Reporting sk_drops to user space was previously available only for UDP
      sockets, via the /proc interface.
      
      Add this to sock_diag, so that we can have the same information
      available to ss users, and we'll be able to add sk_drops
      indications for TCP sockets as well.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      15239302
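      On the sock_diag side this is one more slot in the meminfo array; a
      sketch of the fill, following the existing sock_diag_put_meminfo()
      pattern:

          u32 mem[SK_MEMINFO_VARS];

          /* ... existing slots: RMEM_ALLOC, WMEM_ALLOC, backlog, ... */
          mem[SK_MEMINFO_BACKLOG] = sk->sk_backlog.len;
          /* new slot: per-socket drop count, same source as /proc */
          mem[SK_MEMINFO_DROPS] = atomic_read(&sk->sk_drops);

          return nla_put(skb, attrtype, sizeof(mem), &mem);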
    • tcp/dccp: do not touch listener sk_refcnt under synflood · 3b24d854
      Committed by Eric Dumazet
      When a SYNFLOOD targets a non-SO_REUSEPORT listener, multiple
      cpus contend on sk->sk_refcnt and sk->sk_wmem_alloc changes.
      
      By letting listeners use the SOCK_RCU_FREE infrastructure,
      we can relax TCP_LISTEN lookup rules and avoid touching sk_refcnt.
      
      Note that we still use SLAB_DESTROY_BY_RCU rules for other sockets;
      only listeners are impacted by this change.
      
      Peak performance under SYNFLOOD is increased by ~33%:

      on my test machine, I could process 3.2 Mpps instead of 2.4 Mpps.

      The most cycle-consuming functions are now skb_set_owner_w() and
      sock_wfree(), contending on sk->sk_wmem_alloc when cooking SYNACKs
      and freeing them.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3b24d854
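      With listeners on SOCK_RCU_FREE, the listener lookup can walk the hash
      chain under rcu_read_lock() and hand back the best-scoring socket
      without touching sk_refcnt; a simplified sketch of the
      __inet_lookup_listener() loop after this change (SO_REUSEPORT
      selection elided):

          struct sock *sk, *result = NULL;
          int score, hiscore = 0;

          /* caller holds rcu_read_lock() */
          sk_for_each_rcu(sk, &ilb->head) {
                  score = compute_score(sk, net, hnum, daddr, dif);
                  if (score > hiscore) {
                          result = sk;
                          hiscore = score;
                  }
          }
          /* No atomic_inc on result->sk_refcnt: freeing now waits
           * for an RCU grace period, so 'result' stays valid for
           * the remainder of this RCU section.
           */
          return result;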
    • inet: reqsk_alloc() needs to take care of dead listeners · 3a5d1c0e
      Committed by Eric Dumazet
      We'll soon no longer take a refcount on listeners, so reqsk_alloc()
      cannot assume a listener's refcount is not zero. We need to use
      atomic_inc_not_zero().
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3a5d1c0e
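      Roughly the shape of the guard inside reqsk_alloc(): the request
      socket only keeps a listener pointer if it can still obtain a nonzero
      reference.

          if (attach_listener &&
              unlikely(!atomic_inc_not_zero(&sk_listener->sk_refcnt)))
                  return NULL;    /* listener is already dying */

          req = kmem_cache_alloc(ops->slab, GFP_ATOMIC);
          if (!req) {
                  if (attach_listener)
                          sock_put(sk_listener);  /* undo the reference */
                  return NULL;
          }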
    • tcp/dccp: use rcu locking in inet_diag_find_one_icsk() · 2d331915
      Committed by Eric Dumazet
      RX packet processing holds rcu_read_lock(), so we can remove the
      pairs of rcu_read_lock()/rcu_read_unlock() in the lookup functions
      if inet_diag also holds the RCU read lock before calling them.
      
      This is needed anyway as __inet_lookup_listener() and
      inet6_lookup_listener() will soon no longer increment
      refcount on the found listener.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2d331915
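      Schematically, the calling convention this establishes in
      inet_diag_find_one_icsk() (arguments elided; a sketch, not the
      verbatim patch): the caller now provides the RCU read-side section
      that the lookup helpers used to open themselves.

          rcu_read_lock();
          if (req->sdiag_family == AF_INET)
                  sk = inet_lookup(net, hashinfo, /* ... 4-tuple, dif ... */);
          else
                  sk = inet6_lookup(net, hashinfo, /* ... 4-tuple, dif ... */);
          rcu_read_unlock();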
    • tcp/dccp: remove BH disable/enable in lookup · ee3cf32a
      Committed by Eric Dumazet
      Since Linux 2.6.29, lookups only use RCU locking.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ee3cf32a
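      Schematically, the change drops a now-redundant pair around the
      RCU-based lookup in wrappers such as inet_lookup() (arguments
      elided; a sketch under that assumption):

          /* before: a leftover from the pre-RCU, rwlock-based hash */
          local_bh_disable();
          sk = __inet_lookup(net, hashinfo, /* ... */);
          local_bh_enable();
          return sk;

          /* after: callers already run inside an RCU read-side
           * section (the RX path, or inet_diag as of the previous
           * patch), so the BH dance buys nothing
           */
          return __inet_lookup(net, hashinfo, /* ... */);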
    • udp: no longer use SLAB_DESTROY_BY_RCU · ca065d0c
      Committed by Eric Dumazet
      Tom Herbert would like to avoid touching the UDP socket refcnt for
      encapsulated traffic. For this to happen, we need to use normal RCU
      rules, with a grace period before freeing a socket. UDP sockets are not
      short-lived in the high-usage case, so the added cost of call_rcu()
      should not be a concern.
      
      This actually removes a lot of complexity in UDP stack.
      
      Multicast receives no longer need to hold a bucket spinlock.
      
      Note that ip early demux still needs to take a reference on the socket.
      
      The same remark applies to functions used by the xt_socket and
      xt_TPROXY netfilter modules, but this might be changed later.
      
      Performance for a single UDP socket receiving flood traffic from
      many RX queues/cpus:

      a simple udp_rx tool using a plain recvfrom() loop goes from
      374 kpps to 438 kpps, a 17% increase in peak rate.
      
      v2: addressed Willem de Bruijn's feedback on multicast handling
       - keep the early demux break in __udp4_lib_demux_lookup()
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <tom@herbertland.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Tested-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ca065d0c
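      After this change the UDP lookup follows the same pattern as the TCP
      listener lookup: walk the RCU hlist, remember the best score, and take
      no reference; a simplified sketch of the udp4_lib_lookup2() loop
      (SO_REUSEPORT selection elided):

          /* caller holds rcu_read_lock() */
          result = NULL;
          badness = 0;
          udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
                  score = compute_score2(sk, net, saddr, sport,
                                         daddr, hnum, dif);
                  if (score > badness) {
                          result = sk;
                          badness = score;
                  }
          }
          return result;  /* valid until rcu_read_unlock() */

      Early demux is the exception noted above: the skb outlives the RCU
      section, so it must still pin the socket with atomic_inc_not_zero()
      before stashing it in skb->sk.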
    • net: add SOCK_RCU_FREE socket flag · a4298e45
      Committed by Eric Dumazet
      We want a generic way to insert an RCU grace period before socket
      freeing, for cases where SLAB_DESTROY_BY_RCU is adding too much
      overhead.
      
      SLAB_DESTROY_BY_RCU's strict rules force us to take a reference
      on the socket's sk_refcnt, and this is a performance problem for UDP
      encapsulation and for TCP synflood behavior, as many CPUs might
      attempt atomic operations on a shared sk_refcnt.
      
      UDP sockets and TCP listeners can set SOCK_RCU_FREE so that their
      lookup can use traditional RCU rules, without refcount changes.
      They must set the flag only once hashed and visible to other cpus.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <tom@herbertland.com>
      Tested-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a4298e45
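      The mechanism is a single branch at destruction time; a sketch
      faithful to the description above:

          static void __sk_destruct(struct rcu_head *head)
          {
                  struct sock *sk = container_of(head, struct sock, sk_rcu);

                  /* ... former sk_destruct() body: release filters,
                   * drop dst entries, free the socket memory ...
                   */
          }

          void sk_destruct(struct sock *sk)
          {
                  if (sock_flag(sk, SOCK_RCU_FREE))
                          /* wait a grace period before freeing */
                          call_rcu(&sk->sk_rcu, __sk_destruct);
                  else
                          /* old behavior: free immediately */
                          __sk_destruct(&sk->sk_rcu);
          }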