1. 22 10月, 2015 2 次提交
    • E
      ipv6: gro: support sit protocol · feec0cb3
      Eric Dumazet 提交于
      Tom Herbert added SIT support to GRO with commit
      19424e05 ("sit: Add gro callbacks to sit_offload"),
      later reverted by Herbert Xu.
      
      The problem came because Tom patch was building GRO
      packets without proper meta data : If packets were locally
      delivered, we would not care.
      
      But if packets needed to be forwarded, GSO engine was not
      able to segment individual segments.
      
      With the following patch, we correctly set skb->encapsulation
      and inner network header. We also update gso_type.
      
      Tested:
      
      Server :
      netserver
      modprobe dummy
      ifconfig dummy0 8.0.0.1 netmask 255.255.255.0 up
      arp -s 8.0.0.100 4e:32:51:04:47:e5
      iptables -I INPUT -s 10.246.7.151 -j TEE --gateway 8.0.0.100
      ifconfig sixtofour0
      sixtofour0 Link encap:IPv6-in-IPv4
                inet6 addr: 2002:af6:798::1/128 Scope:Global
                inet6 addr: 2002:af6:798::/128 Scope:Global
                UP RUNNING NOARP  MTU:1480  Metric:1
                RX packets:411169 errors:0 dropped:0 overruns:0 frame:0
                TX packets:409414 errors:0 dropped:0 overruns:0 carrier:0
                collisions:0 txqueuelen:0
                RX bytes:20319631739 (20.3 GB)  TX bytes:29529556 (29.5 MB)
      
      Client :
      netperf -H 2002:af6:798::1 -l 1000 &
      
      Checked on server traffic copied on dummy0 and verify segments were
      properly rebuilt, with proper IP headers, TCP checksums...
      
      tcpdump on eth0 shows proper GRO aggregation takes place.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      feec0cb3
    • A
      netlink: Rightsize IFLA_AF_SPEC size calculation · b1974ed0
      Arad, Ronen 提交于
      if_nlmsg_size() overestimates the minimum allocation size of netlink
      dump request (when called from rtnl_calcit()) or the size of the
      message (when called from rtnl_getlink()). This is because
      ext_filter_mask is not supported by rtnl_link_get_af_size() and
      rtnl_link_get_size().
      
      The over-estimation is significant when at least one netdev has many
      VLANs configured (8 bytes for each configured VLAN).
      
      This patch-set "rightsizes" the protocol specific attribute size
      calculation by propagating ext_filter_mask to rtnl_link_get_af_size()
      and adding this a argument to get_link_af_size op in rtnl_af_ops.
      
      Bridge module already used filtering aware sizing for notifications.
      br_get_link_af_size_filtered() is consistent with the modified
      get_link_af_size op so it replaces br_get_link_af_size() in br_af_ops.
      br_get_link_af_size() becomes unused and thus removed.
      Signed-off-by: NRonen Arad <ronen.arad@intel.com>
      Acked-by: NSridhar Samudrala <sridhar.samudrala@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b1974ed0
  2. 19 10月, 2015 1 次提交
  3. 17 10月, 2015 1 次提交
  4. 16 10月, 2015 3 次提交
  5. 14 10月, 2015 3 次提交
  6. 13 10月, 2015 12 次提交
  7. 11 10月, 2015 2 次提交
  8. 08 10月, 2015 7 次提交
  9. 07 10月, 2015 2 次提交
  10. 05 10月, 2015 2 次提交
    • A
      ipv6: use ktime_t for internal timestamps · 3dd7669f
      Arnd Bergmann 提交于
      The ipv6 mip6 implementation is one of only a few users of the
      skb_get_timestamp() function in the kernel, which is both unsafe
      on 32-bit architectures because of the 2038 overflow, and slightly
      less efficient than the skb_get_ktime() based approach.
      
      This converts the function call and the mip6_report_rate_limiter
      structure that stores the time stamp, eliminating all uses of
      timeval in the ipv6 code.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: James Morris <jmorris@namei.org>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: Patrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3dd7669f
    • E
      tcp: avoid two atomic ops for syncookies · a1a5344d
      Eric Dumazet 提交于
      inet_reqsk_alloc() is used to allocate a temporary request
      in order to generate a SYNACK with a cookie. Then later,
      syncookie validation also uses a temporary request.
      
      These paths already took a reference on listener refcount,
      we can avoid a couple of atomic operations.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a1a5344d
  11. 03 10月, 2015 5 次提交
    • E
      tcp: do not lock listener to process SYN packets · e994b2f0
      Eric Dumazet 提交于
      Everything should now be ready to finally allow SYN
      packets processing without holding listener lock.
      
      Tested:
      
      3.5 Mpps SYNFLOOD. Plenty of cpu cycles available.
      
      Next bottleneck is the refcount taken on listener,
      that could be avoided if we remove SLAB_DESTROY_BY_RCU
      strict semantic for listeners, and use regular RCU.
      
          13.18%  [kernel]  [k] __inet_lookup_listener
           9.61%  [kernel]  [k] tcp_conn_request
           8.16%  [kernel]  [k] sha_transform
           5.30%  [kernel]  [k] inet_reqsk_alloc
           4.22%  [kernel]  [k] sock_put
           3.74%  [kernel]  [k] tcp_make_synack
           2.88%  [kernel]  [k] ipt_do_table
           2.56%  [kernel]  [k] memcpy_erms
           2.53%  [kernel]  [k] sock_wfree
           2.40%  [kernel]  [k] tcp_v4_rcv
           2.08%  [kernel]  [k] fib_table_lookup
           1.84%  [kernel]  [k] tcp_openreq_init_rwin
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e994b2f0
    • E
      tcp: attach SYNACK messages to request sockets instead of listener · ca6fb065
      Eric Dumazet 提交于
      If a listen backlog is very big (to avoid syncookies), then
      the listener sk->sk_wmem_alloc is the main source of false
      sharing, as we need to touch it twice per SYNACK re-transmit
      and TX completion.
      
      (One SYN packet takes listener lock once, but up to 6 SYNACK
      are generated)
      
      By attaching the skb to the request socket, we remove this
      source of contention.
      
      Tested:
      
       listen(fd, 10485760); // single listener (no SO_REUSEPORT)
       16 RX/TX queue NIC
       Sustain a SYNFLOOD attack of ~320,000 SYN per second,
       Sending ~1,400,000 SYNACK per second.
       Perf profiles now show listener spinlock being next bottleneck.
      
          20.29%  [kernel]  [k] queued_spin_lock_slowpath
          10.06%  [kernel]  [k] __inet_lookup_established
           5.12%  [kernel]  [k] reqsk_timer_handler
           3.22%  [kernel]  [k] get_next_timer_interrupt
           3.00%  [kernel]  [k] tcp_make_synack
           2.77%  [kernel]  [k] ipt_do_table
           2.70%  [kernel]  [k] run_timer_softirq
           2.50%  [kernel]  [k] ip_finish_output
           2.04%  [kernel]  [k] cascade
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ca6fb065
    • E
      tcp/dccp: install syn_recv requests into ehash table · 079096f1
      Eric Dumazet 提交于
      In this patch, we insert request sockets into TCP/DCCP
      regular ehash table (where ESTABLISHED and TIMEWAIT sockets
      are) instead of using the per listener hash table.
      
      ACK packets find SYN_RECV pseudo sockets without having
      to find and lock the listener.
      
      In nominal conditions, this halves pressure on listener lock.
      
      Note that this will allow for SO_REUSEPORT refinements,
      so that we can select a listener using cpu/numa affinities instead
      of the prior 'consistent hash', since only SYN packets will
      apply this selection logic.
      
      We will shrink listen_sock in the following patch to ease
      code review.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Ying Cai <ycai@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      079096f1
    • E
      tcp/dccp: remove inet_csk_reqsk_queue_added() timeout argument · 2feda341
      Eric Dumazet 提交于
      This is no longer used.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2feda341
    • E
      tcp: get_openreq[46]() changes · aa3a0c8c
      Eric Dumazet 提交于
      When request sockets are no longer in a per listener hash table
      but on regular TCP ehash, we need to access listener uid
      through req->rsk_listener
      
      get_openreq6() also gets a const for its request socket argument.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aa3a0c8c