1. 05 1月, 2014 1 次提交
  2. 04 1月, 2014 3 次提交
  3. 02 1月, 2014 4 次提交
  4. 31 12月, 2013 1 次提交
    • E
      netfilter: nft_reject: support for IPv6 and TCP reset · bee11dc7
      Eric Leblond 提交于
      This patch moves nft_reject_ipv4 to nft_reject and adds support
      for IPv6 protocol. This patch uses functions included in nf_reject.h
      to implement reject by TCP reset.
      
      The code has to be build as a module if NF_TABLES_IPV6 is also a
      module to avoid compilation error due to usage of IPv6 functions.
      This has been done in Kconfig by using the construct:
      
       depends on NF_TABLES_IPV6 || !NF_TABLES_IPV6
      
      This seems a bit weird in terms of syntax but works perfectly.
      Signed-off-by: NEric Leblond <eric@regit.org>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      bee11dc7
  5. 30 12月, 2013 3 次提交
  6. 29 12月, 2013 4 次提交
  7. 27 12月, 2013 7 次提交
  8. 23 12月, 2013 1 次提交
  9. 21 12月, 2013 1 次提交
    • E
      tcp: autocork should not hold first packet in write queue · a181ceb5
      Eric Dumazet 提交于
      Willem noticed a TCP_RR regression caused by TCP autocorking
      on a Mellanox test bed. MLX4_EN_TX_COAL_TIME is 16 us, which can be
      right above RTT between hosts.
      
      We can receive a ACK for a packet still in NIC TX ring buffer or in a
      softnet completion queue.
      
      Fix this by always pushing the skb if it is at the head of write queue.
      
      Also, as TX completion is lockless, it's safer to perform sk_wmem_alloc
      test after setting TSQ_THROTTLED.
      
      erd:~# MIB="MIN_LATENCY,MEAN_LATENCY,MAX_LATENCY,P99_LATENCY,STDDEV_LATENCY"
      erd:~#  ./netperf -H remote -t TCP_RR -- -o $MIB | tail -n 1
      (repeat 3 times)
      
      Before patch :
      
      18,1049.87,41004,39631,6295.47
      17,239.52,40804,48,2912.79
      18,348.40,40877,54,3573.39
      
      After patch :
      
      18,22.84,4606,38,16.39
      17,21.56,2871,36,13.51
      17,22.46,2705,37,11.83
      Reported-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Fixes: f54b3111 ("tcp: auto corking")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a181ceb5
  10. 19 12月, 2013 2 次提交
  11. 18 12月, 2013 3 次提交
    • T
      net: Add utility functions to clear rxhash · 7539fadc
      Tom Herbert 提交于
      In several places 'skb->rxhash = 0' is being done to clear the
      rxhash value in an skb.  This does not clear l4_rxhash which could
      still be set so that the rxhash wouldn't be recalculated on subsequent
      call to skb_get_rxhash.  This patch adds an explict function to clear
      all the rxhash related information in the skb properly.
      
      skb_clear_hash_if_not_l4 clears the rxhash only if it is not marked as
      l4_rxhash.
      
      Fixed up places where 'skb->rxhash = 0' was being called.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7539fadc
    • E
      tcp: refine TSO splits · d4589926
      Eric Dumazet 提交于
      While investigating performance problems on small RPC workloads,
      I noticed linux TCP stack was always splitting the last TSO skb
      into two parts (skbs). One being a multiple of MSS, and a small one
      with the Push flag. This split is done even if TCP_NODELAY is set,
      or if no small packet is in flight.
      
      Example with request/response of 4K/4K
      
      IP A > B: . ack 68432 win 2783 <nop,nop,timestamp 6524593 6525001>
      IP A > B: . 65537:68433(2896) ack 69632 win 2783 <nop,nop,timestamp 6524593 6525001>
      IP A > B: P 68433:69633(1200) ack 69632 win 2783 <nop,nop,timestamp 6524593 6525001>
      IP B > A: . ack 68433 win 2768 <nop,nop,timestamp 6525001 6524593>
      IP B > A: . 69632:72528(2896) ack 69633 win 2768 <nop,nop,timestamp 6525001 6524593>
      IP B > A: P 72528:73728(1200) ack 69633 win 2768 <nop,nop,timestamp 6525001 6524593>
      IP A > B: . ack 72528 win 2783 <nop,nop,timestamp 6524593 6525001>
      IP A > B: . 69633:72529(2896) ack 73728 win 2783 <nop,nop,timestamp 6524593 6525001>
      IP A > B: P 72529:73729(1200) ack 73728 win 2783 <nop,nop,timestamp 6524593 6525001>
      
      We can avoid this split by including the Nagle tests at the right place.
      
      Note : If some NIC had trouble sending TSO packets with a partial
      last segment, we would have hit the problem in GRO/forwarding workload already.
      
      tcp_minshall_update() is moved to tcp_output.c and is updated as we might
      feed a TSO packet with a partial last segment.
      
      This patch tremendously improves performance, as the traffic now looks
      like :
      
      IP A > B: . ack 98304 win 2783 <nop,nop,timestamp 6834277 6834685>
      IP A > B: P 94209:98305(4096) ack 98304 win 2783 <nop,nop,timestamp 6834277 6834685>
      IP B > A: . ack 98305 win 2768 <nop,nop,timestamp 6834686 6834277>
      IP B > A: P 98304:102400(4096) ack 98305 win 2768 <nop,nop,timestamp 6834686 6834277>
      IP A > B: . ack 102400 win 2783 <nop,nop,timestamp 6834279 6834686>
      IP A > B: P 98305:102401(4096) ack 102400 win 2783 <nop,nop,timestamp 6834279 6834686>
      IP B > A: . ack 102401 win 2768 <nop,nop,timestamp 6834687 6834279>
      IP B > A: P 102400:106496(4096) ack 102401 win 2768 <nop,nop,timestamp 6834687 6834279>
      IP A > B: . ack 106496 win 2783 <nop,nop,timestamp 6834280 6834687>
      IP A > B: P 102401:106497(4096) ack 106496 win 2783 <nop,nop,timestamp 6834280 6834687>
      IP B > A: . ack 106497 win 2768 <nop,nop,timestamp 6834688 6834280>
      IP B > A: P 106496:110592(4096) ack 106497 win 2768 <nop,nop,timestamp 6834688 6834280>
      
      Before :
      
      lpq83:~# nstat >/dev/null;perf stat ./super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K
      280774
      
       Performance counter stats for './super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K':
      
           205719.049006 task-clock                #    9.278 CPUs utilized
               8,449,968 context-switches          #    0.041 M/sec
               1,935,997 CPU-migrations            #    0.009 M/sec
                 160,541 page-faults               #    0.780 K/sec
         548,478,722,290 cycles                    #    2.666 GHz                     [83.20%]
         455,240,670,857 stalled-cycles-frontend   #   83.00% frontend cycles idle    [83.48%]
         272,881,454,275 stalled-cycles-backend    #   49.75% backend  cycles idle    [66.73%]
         166,091,460,030 instructions              #    0.30  insns per cycle
                                                   #    2.74  stalled cycles per insn [83.39%]
          29,150,229,399 branches                  #  141.699 M/sec                   [83.30%]
           1,943,814,026 branch-misses             #    6.67% of all branches         [83.32%]
      
            22.173517844 seconds time elapsed
      
      lpq83:~# nstat | egrep "IpOutRequests|IpExtOutOctets"
      IpOutRequests                   16851063           0.0
      IpExtOutOctets                  23878580777        0.0
      
      After patch :
      
      lpq83:~# nstat >/dev/null;perf stat ./super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K
      280877
      
       Performance counter stats for './super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K':
      
           107496.071918 task-clock                #    4.847 CPUs utilized
               5,635,458 context-switches          #    0.052 M/sec
               1,374,707 CPU-migrations            #    0.013 M/sec
                 160,920 page-faults               #    0.001 M/sec
         281,500,010,924 cycles                    #    2.619 GHz                     [83.28%]
         228,865,069,307 stalled-cycles-frontend   #   81.30% frontend cycles idle    [83.38%]
         142,462,742,658 stalled-cycles-backend    #   50.61% backend  cycles idle    [66.81%]
          95,227,712,566 instructions              #    0.34  insns per cycle
                                                   #    2.40  stalled cycles per insn [83.43%]
          16,209,868,171 branches                  #  150.795 M/sec                   [83.20%]
             874,252,952 branch-misses             #    5.39% of all branches         [83.37%]
      
            22.175821286 seconds time elapsed
      
      lpq83:~# nstat | egrep "IpOutRequests|IpExtOutOctets"
      IpOutRequests                   11239428           0.0
      IpExtOutOctets                  23595191035        0.0
      
      Indeed, the occupancy of tx skbs (IpExtOutOctets/IpOutRequests) is higher :
      2099 instead of 1417, thus helping GRO to be more efficient when using FQ packet
      scheduler.
      
      Many thanks to Neal for review and ideas.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Nandita Dukkipati <nanditad@google.com>
      Cc: Van Jacobson <vanj@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Tested-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d4589926
    • E
      udp: ipv4: do not use sk_dst_lock from softirq context · e47eb5df
      Eric Dumazet 提交于
      Using sk_dst_lock from softirq context is not supported right now.
      
      Instead of adding BH protection everywhere,
      udp_sk_rx_dst_set() can instead use xchg(), as suggested
      by David.
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Fixes: 97502231 ("udp: ipv4: must add synchronization in udp_sk_rx_dst_set()")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e47eb5df
  12. 14 12月, 2013 1 次提交
  13. 13 12月, 2013 1 次提交
    • J
      net-gro: Prepare GRO stack for the upcoming tunneling support · 299603e8
      Jerry Chu 提交于
      This patch modifies the GRO stack to avoid the use of "network_header"
      and associated macros like ip_hdr() and ipv6_hdr() in order to allow
      an arbitary number of IP hdrs (v4 or v6) to be used in the
      encapsulation chain. This lays the foundation for various IP
      tunneling support (IP-in-IP, GRE, VXLAN, SIT,...) to be added later.
      
      With this patch, the GRO stack traversing now is mostly based on
      skb_gro_offset rather than special hdr offsets saved in skb (e.g.,
      skb->network_header). As a result all but the top layer (i.e., the
      the transport layer) must have hdrs of the same length in order for
      a pkt to be considered for aggregation. Therefore when adding a new
      encap layer (e.g., for tunneling), one must check and skip flows
      (e.g., by setting NAPI_GRO_CB(p)->same_flow to 0) that have a
      different hdr length.
      
      Note that unlike the network header, the transport header can and
      will continue to be set by the GRO code since there will be at
      most one "transport layer" in the encap chain.
      Signed-off-by: NH.K. Jerry Chu <hkchu@google.com>
      Suggested-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      299603e8
  14. 12 12月, 2013 4 次提交
  15. 11 12月, 2013 4 次提交
    • P
      netfilter: SYNPROXY target: restrict to INPUT/FORWARD · f01b3926
      Patrick McHardy 提交于
      Fix a crash in synproxy_send_tcp() when using the SYNPROXY target in the
      PREROUTING chain caused by missing routing information.
      Reported-by: NNicki P. <xastx@gmx.de>
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      f01b3926
    • E
      udp: ipv4: fix an use after free in __udp4_lib_rcv() · 8afdd99a
      Eric Dumazet 提交于
      Dave Jones reported a use after free in UDP stack :
      
      [ 5059.434216] =========================
      [ 5059.434314] [ BUG: held lock freed! ]
      [ 5059.434420] 3.13.0-rc3+ #9 Not tainted
      [ 5059.434520] -------------------------
      [ 5059.434620] named/863 is freeing memory ffff88005e960000-ffff88005e96061f, with a lock still held there!
      [ 5059.434815]  (slock-AF_INET){+.-...}, at: [<ffffffff8149bd21>] udp_queue_rcv_skb+0xd1/0x4b0
      [ 5059.435012] 3 locks held by named/863:
      [ 5059.435086]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff8143054d>] __netif_receive_skb_core+0x11d/0x940
      [ 5059.435295]  #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff81467a5e>] ip_local_deliver_finish+0x3e/0x410
      [ 5059.435500]  #2:  (slock-AF_INET){+.-...}, at: [<ffffffff8149bd21>] udp_queue_rcv_skb+0xd1/0x4b0
      [ 5059.435734]
      stack backtrace:
      [ 5059.435858] CPU: 0 PID: 863 Comm: named Not tainted 3.13.0-rc3+ #9 [loadavg: 0.21 0.06 0.06 1/115 1365]
      [ 5059.436052] Hardware name:                  /D510MO, BIOS MOPNV10J.86A.0175.2010.0308.0620 03/08/2010
      [ 5059.436223]  0000000000000002 ffff88007e203ad8 ffffffff8153a372 ffff8800677130e0
      [ 5059.436390]  ffff88007e203b10 ffffffff8108cafa ffff88005e960000 ffff88007b00cfc0
      [ 5059.436554]  ffffea00017a5800 ffffffff8141c490 0000000000000246 ffff88007e203b48
      [ 5059.436718] Call Trace:
      [ 5059.436769]  <IRQ>  [<ffffffff8153a372>] dump_stack+0x4d/0x66
      [ 5059.436904]  [<ffffffff8108cafa>] debug_check_no_locks_freed+0x15a/0x160
      [ 5059.437037]  [<ffffffff8141c490>] ? __sk_free+0x110/0x230
      [ 5059.437147]  [<ffffffff8112da2a>] kmem_cache_free+0x6a/0x150
      [ 5059.437260]  [<ffffffff8141c490>] __sk_free+0x110/0x230
      [ 5059.437364]  [<ffffffff8141c5c9>] sk_free+0x19/0x20
      [ 5059.437463]  [<ffffffff8141cb25>] sock_edemux+0x25/0x40
      [ 5059.437567]  [<ffffffff8141c181>] sock_queue_rcv_skb+0x81/0x280
      [ 5059.437685]  [<ffffffff8149bd21>] ? udp_queue_rcv_skb+0xd1/0x4b0
      [ 5059.437805]  [<ffffffff81499c82>] __udp_queue_rcv_skb+0x42/0x240
      [ 5059.437925]  [<ffffffff81541d25>] ? _raw_spin_lock+0x65/0x70
      [ 5059.438038]  [<ffffffff8149bebb>] udp_queue_rcv_skb+0x26b/0x4b0
      [ 5059.438155]  [<ffffffff8149c712>] __udp4_lib_rcv+0x152/0xb00
      [ 5059.438269]  [<ffffffff8149d7f5>] udp_rcv+0x15/0x20
      [ 5059.438367]  [<ffffffff81467b2f>] ip_local_deliver_finish+0x10f/0x410
      [ 5059.438492]  [<ffffffff81467a5e>] ? ip_local_deliver_finish+0x3e/0x410
      [ 5059.438621]  [<ffffffff81468653>] ip_local_deliver+0x43/0x80
      [ 5059.438733]  [<ffffffff81467f70>] ip_rcv_finish+0x140/0x5a0
      [ 5059.438843]  [<ffffffff81468926>] ip_rcv+0x296/0x3f0
      [ 5059.438945]  [<ffffffff81430b72>] __netif_receive_skb_core+0x742/0x940
      [ 5059.439074]  [<ffffffff8143054d>] ? __netif_receive_skb_core+0x11d/0x940
      [ 5059.442231]  [<ffffffff8108c81d>] ? trace_hardirqs_on+0xd/0x10
      [ 5059.442231]  [<ffffffff81430d83>] __netif_receive_skb+0x13/0x60
      [ 5059.442231]  [<ffffffff81431c1e>] netif_receive_skb+0x1e/0x1f0
      [ 5059.442231]  [<ffffffff814334e0>] napi_gro_receive+0x70/0xa0
      [ 5059.442231]  [<ffffffffa01de426>] rtl8169_poll+0x166/0x700 [r8169]
      [ 5059.442231]  [<ffffffff81432bc9>] net_rx_action+0x129/0x1e0
      [ 5059.442231]  [<ffffffff810478cd>] __do_softirq+0xed/0x240
      [ 5059.442231]  [<ffffffff81047e25>] irq_exit+0x125/0x140
      [ 5059.442231]  [<ffffffff81004241>] do_IRQ+0x51/0xc0
      [ 5059.442231]  [<ffffffff81542bef>] common_interrupt+0x6f/0x6f
      
      We need to keep a reference on the socket, by using skb_steal_sock()
      at the right place.
      
      Note that another patch is needed to fix a race in
      udp_sk_rx_dst_set(), as we hold no lock protecting the dst.
      
      Fixes: 421b3885 ("udp: ipv4: Add udp early demux")
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Shawn Bohrer <sbohrer@rgmadvisors.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8afdd99a
    • S
      net: more spelling fixes · 8e3bff96
      stephen hemminger 提交于
      Various spelling fixes in networking stack
      Signed-off-by: NStephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8e3bff96
    • J
      ipv4: add support for IFA_FLAGS nl attribute · ad6c8135
      Jiri Pirko 提交于
      Signed-off-by: NJiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ad6c8135