1. 13 9月, 2014 1 次提交
    • S
      udp: Fix inverted NAPI_GRO_CB(skb)->flush test · 2d8f7e2c
      Scott Wood 提交于
      Commit 2abb7cdc ("udp: Add support for doing checksum unnecessary
      conversion") caused napi_gro_cb structs with the "flush" field zero to
      take the "udp_gro_receive" path rather than the "set flush to 1" path
      that they would previously take.  As a result I saw booting from an NFS
      root hang shortly after starting userspace, with "server not
      responding" messages.
      
      This change to the handling of "flush == 0" packets appears to be
      incidental to the goal of adding new code in the case where
      skb_gro_checksum_validate_zero_check() returns zero.  Based on that and
      the fact that it breaks things, I'm assuming that it is unintentional.
      
      Fixes: 2abb7cdc ("udp: Add support for doing checksum unnecessary conversion")
      Cc: Tom Herbert <therbert@google.com>
      Signed-off-by: NScott Wood <scottwood@freescale.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2d8f7e2c
  2. 10 9月, 2014 5 次提交
  3. 09 9月, 2014 4 次提交
  4. 07 9月, 2014 1 次提交
  5. 06 9月, 2014 5 次提交
  6. 05 9月, 2014 1 次提交
  7. 02 9月, 2014 6 次提交
  8. 30 8月, 2014 2 次提交
    • T
      net: Allow GRO to use and set levels of checksum unnecessary · 662880f4
      Tom Herbert 提交于
      Allow GRO path to "consume" checksums provided in CHECKSUM_UNNECESSARY
      and to report new checksums verfied for use in fallback to normal
      path.
      
      Change GRO checksum path to track csum_level using a csum_cnt field
      in NAPI_GRO_CB. On GRO initialization, if ip_summed is
      CHECKSUM_UNNECESSARY set NAPI_GRO_CB(skb)->csum_cnt to
      skb->csum_level + 1. For each checksum verified, decrement
      NAPI_GRO_CB(skb)->csum_cnt while its greater than zero. If a checksum
      is verfied and NAPI_GRO_CB(skb)->csum_cnt == 0, we have verified a
      deeper checksum than originally indicated in skbuf so increment
      csum_level (or initialize to CHECKSUM_UNNECESSARY if ip_summed is
      CHECKSUM_NONE or CHECKSUM_COMPLETE).
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      662880f4
    • T
      net: Clarification of CHECKSUM_UNNECESSARY · 77cffe23
      Tom Herbert 提交于
      This patch:
       - Clarifies the specific requirements of devices returning
         CHECKSUM_UNNECESSARY (comments in skbuff.h).
       - Adds csum_level field to skbuff. This is used to express how
         many checksums are covered by CHECKSUM_UNNECESSARY (stores n - 1).
         This replaces the overloading of skb->encapsulation, that field is
         is now only used to indicate inner headers are valid.
       - Change __skb_checksum_validate_needed to "consume" each checksum
         as indicated by csum_level as layers of the the packet are parsed.
       - Remove skb_pop_rcv_encapsulation, no longer needed in the new
         csum_level model.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      77cffe23
  9. 28 8月, 2014 1 次提交
  10. 25 8月, 2014 4 次提交
  11. 24 8月, 2014 1 次提交
  12. 23 8月, 2014 4 次提交
    • Y
      tcp: improve undo on timeout · 989e04c5
      Yuchung Cheng 提交于
      Upon timeout, undo (via both timestamps/Eifel and DSACKs) was
      disabled if any retransmits were still in flight.  The concern was
      perhaps that spurious retransmission sent in a previous recovery
      episode may trigger DSACKs to falsely undo the current recovery.
      
      However, this inadvertently misses undo opportunities (using either
      TCP timestamps or DSACKs) when timeout occurs during a loss episode,
      i.e.  recurring timeouts or timeout during fast recovery. In these
      cases some retransmissions will be in flight but we should allow
      undo. Furthermore, we should only reset undo_marker and undo_retrans
      upon timeout if we are starting a new recovery episode. Finally,
      when we do reset our undo state, we now do so in a manner similar
      to tcp_enter_recovery(), so that we require a DSACK for each of
      the outstsanding retransmissions. This will achieve the original
      goal by requiring that we receive the same number of DSACKs as
      retransmissions.
      
      This patch increases the undo events by 50% on Google servers.
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      989e04c5
    • H
      ipconfig: Use time_before · c72c95a0
      Himangi Saraogi 提交于
      The functions time_before, time_before_eq, time_after, and time_after_eq
      are more robust for comparing jiffies against other values.
      
      A simplified version of the Coccinelle semantic patch making this change
      is as follows:
      
      @change@
      expression E1,E2;
      @@
      - jiffies - E1 < E2
      + time_before(jiffies, E1+E2)
      Signed-off-by: NHimangi Saraogi <himangi774@gmail.com>
      Acked-by: NJulia Lawall <julia.lawall@lip6.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c72c95a0
    • A
      net/ipv4/igmp.c: Replace rcu_dereference() with rcu_access_pointer() · e6b68883
      Andreea-Cristina Bernat 提交于
      The "rcu_dereference()" call is used directly in a condition.
      Since its return value is never dereferenced it is recommended to use
      "rcu_access_pointer()" instead of "rcu_dereference()".
      Therefore, this patch makes the replacement.
      
      The following Coccinelle semantic patch was used:
      @@
      @@
      
      (
       if(
       (<+...
      - rcu_dereference
      + rcu_access_pointer
        (...)
        ...+>)) {...}
      |
       while(
       (<+...
      - rcu_dereference
      + rcu_access_pointer
        (...)
        ...+>)) {...}
      )
      Signed-off-by: NAndreea-Cristina Bernat <bernat.ada@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e6b68883
    • S
      ipv4: Restore accept_local behaviour in fib_validate_source() · 1dced6a8
      Sébastien Barré 提交于
      Commit 7a9bc9b8 ("ipv4: Elide fib_validate_source() completely when possible.")
      introduced a short-circuit to avoid calling fib_validate_source when not
      needed. That change took rp_filter into account, but not accept_local.
      This resulted in a change of behaviour: with rp_filter and accept_local
      off, incoming packets with a local address in the source field should be
      dropped.
      
      Here is how to reproduce the change pre/post 7a9bc9b8 commit:
      -configure the same IPv4 address on hosts A and B.
      -try to send an ARP request from B to A.
      -The ARP request will be dropped before that commit, but accepted and answered
      after that commit.
      
      This adds a check for ACCEPT_LOCAL, to maintain full
      fib validation in case it is 0. We also leave __fib_validate_source() earlier
      when possible, based on the same check as fib_validate_source(), once the
      accept_local stuff is verified.
      
      Cc: Gregory Detal <gregory.detal@uclouvain.be>
      Cc: Christoph Paasch <christoph.paasch@uclouvain.be>
      Cc: Hannes Frederic Sowa <hannes@redhat.com>
      Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: NSébastien Barré <sebastien.barre@uclouvain.be>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1dced6a8
  13. 19 8月, 2014 1 次提交
    • P
      netfilter: move NAT Kconfig switches out of the iptables scope · 8993cf8e
      Pablo Neira Ayuso 提交于
      Currently, the NAT configs depend on iptables and ip6tables. However,
      users should be capable of enabling NAT for nft without having to
      switch on iptables.
      
      Fix this by adding new specific IP_NF_NAT and IP6_NF_NAT config
      switches for iptables and ip6tables NAT support. I have also moved
      the original NF_NAT_IPV4 and NF_NAT_IPV6 configs out of the scope
      of iptables to make them independent of it.
      
      This patch also adds NETFILTER_XT_NAT which selects the xt_nat
      combo that provides snat/dnat for iptables. We cannot use NF_NAT
      anymore since nf_tables can select this.
      Reported-by: NMatteo Croce <technoboy85@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      8993cf8e
  14. 15 8月, 2014 4 次提交
    • N
      tcp: fix ssthresh and undo for consecutive short FRTO episodes · 0c9ab092
      Neal Cardwell 提交于
      Fix TCP FRTO logic so that it always notices when snd_una advances,
      indicating that any RTO after that point will be a new and distinct
      loss episode.
      
      Previously there was a very specific sequence that could cause FRTO to
      fail to notice a new loss episode had started:
      
      (1) RTO timer fires, enter FRTO and retransmit packet 1 in write queue
      (2) receiver ACKs packet 1
      (3) FRTO sends 2 more packets
      (4) RTO timer fires again (should start a new loss episode)
      
      The problem was in step (3) above, where tcp_process_loss() returned
      early (in the spot marked "Step 2.b"), so that it never got to the
      logic to clear icsk_retransmits. Thus icsk_retransmits stayed
      non-zero. Thus in step (4) tcp_enter_loss() would see the non-zero
      icsk_retransmits, decide that this RTO is not a new episode, and
      decide not to cut ssthresh and remember the current cwnd and ssthresh
      for undo.
      
      There were two main consequences to the bug that we have
      observed. First, ssthresh was not decreased in step (4). Second, when
      there was a series of such FRTO (1-4) sequences that happened to be
      followed by an FRTO undo, we would restore the cwnd and ssthresh from
      before the entire series started (instead of the cwnd and ssthresh
      from before the most recent RTO). This could result in cwnd and
      ssthresh being restored to values much bigger than the proper values.
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Fixes: e33099f9 ("tcp: implement RFC5682 F-RTO")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c9ab092
    • H
      tcp: don't allow syn packets without timestamps to pass tcp_tw_recycle logic · a26552af
      Hannes Frederic Sowa 提交于
      tcp_tw_recycle heavily relies on tcp timestamps to build a per-host
      ordering of incoming connections and teardowns without the need to
      hold state on a specific quadruple for TCP_TIMEWAIT_LEN, but only for
      the last measured RTO. To do so, we keep the last seen timestamp in a
      per-host indexed data structure and verify if the incoming timestamp
      in a connection request is strictly greater than the saved one during
      last connection teardown. Thus we can verify later on that no old data
      packets will be accepted by the new connection.
      
      During moving a socket to time-wait state we already verify if timestamps
      where seen on a connection. Only if that was the case we let the
      time-wait socket expire after the RTO, otherwise normal TCP_TIMEWAIT_LEN
      will be used. But we don't verify this on incoming SYN packets. If a
      connection teardown was less than TCP_PAWS_MSL seconds in the past we
      cannot guarantee to not accept data packets from an old connection if
      no timestamps are present. We should drop this SYN packet. This patch
      closes this loophole.
      
      Please note, this patch does not make tcp_tw_recycle in any way more
      usable but only adds another safety check:
      Sporadic drops of SYN packets because of reordering in the network or
      in the socket backlog queues can happen. Users behing NAT trying to
      connect to a tcp_tw_recycle enabled server can get caught in blackholes
      and their connection requests may regullary get dropped because hosts
      behind an address translator don't have synchronized tcp timestamp clocks.
      tcp_tw_recycle cannot work if peers don't have tcp timestamps enabled.
      
      In general, use of tcp_tw_recycle is disadvised.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a26552af
    • N
      tcp: fix tcp_release_cb() to dispatch via address family for mtu_reduced() · 4fab9071
      Neal Cardwell 提交于
      Make sure we use the correct address-family-specific function for
      handling MTU reductions from within tcp_release_cb().
      
      Previously AF_INET6 sockets were incorrectly always using the IPv6
      code path when sometimes they were handling IPv4 traffic and thus had
      an IPv4 dst.
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Diagnosed-by: NWillem de Bruijn <willemb@google.com>
      Fixes: 563d34d0 ("tcp: dont drop MTU reduction indications")
      Reviewed-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4fab9071
    • A
      tcp: don't use timestamp from repaired skb-s to calculate RTT (v2) · 9d186cac
      Andrey Vagin 提交于
      We don't know right timestamp for repaired skb-s. Wrong RTT estimations
      isn't good, because some congestion modules heavily depends on it.
      
      This patch adds the TCPCB_REPAIRED flag, which is included in
      TCPCB_RETRANS.
      
      Thanks to Eric for the advice how to fix this issue.
      
      This patch fixes the warning:
      [  879.562947] WARNING: CPU: 0 PID: 2825 at net/ipv4/tcp_input.c:3078 tcp_ack+0x11f5/0x1380()
      [  879.567253] CPU: 0 PID: 2825 Comm: socket-tcpbuf-l Not tainted 3.16.0-next-20140811 #1
      [  879.567829] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [  879.568177]  0000000000000000 00000000c532680c ffff880039643d00 ffffffff817aa2d2
      [  879.568776]  0000000000000000 ffff880039643d38 ffffffff8109afbd ffff880039d6ba80
      [  879.569386]  ffff88003a449800 000000002983d6bd 0000000000000000 000000002983d6bc
      [  879.569982] Call Trace:
      [  879.570264]  [<ffffffff817aa2d2>] dump_stack+0x4d/0x66
      [  879.570599]  [<ffffffff8109afbd>] warn_slowpath_common+0x7d/0xa0
      [  879.570935]  [<ffffffff8109b0ea>] warn_slowpath_null+0x1a/0x20
      [  879.571292]  [<ffffffff816d0a05>] tcp_ack+0x11f5/0x1380
      [  879.571614]  [<ffffffff816d10bd>] tcp_rcv_established+0x1ed/0x710
      [  879.571958]  [<ffffffff816dc9da>] tcp_v4_do_rcv+0x10a/0x370
      [  879.572315]  [<ffffffff81657459>] release_sock+0x89/0x1d0
      [  879.572642]  [<ffffffff816c81a0>] do_tcp_setsockopt.isra.36+0x120/0x860
      [  879.573000]  [<ffffffff8110a52e>] ? rcu_read_lock_held+0x6e/0x80
      [  879.573352]  [<ffffffff816c8912>] tcp_setsockopt+0x32/0x40
      [  879.573678]  [<ffffffff81654ac4>] sock_common_setsockopt+0x14/0x20
      [  879.574031]  [<ffffffff816537b0>] SyS_setsockopt+0x80/0xf0
      [  879.574393]  [<ffffffff817b40a9>] system_call_fastpath+0x16/0x1b
      [  879.574730] ---[ end trace a17cbc38eb8c5c00 ]---
      
      v2: moving setting of skb->when for repaired skb-s in tcp_write_xmit,
          where it's set for other skb-s.
      
      Fixes: 431a9124 ("tcp: timestamp SYN+DATA messages")
      Fixes: 740b0f18 ("tcp: switch rtt estimations to usec resolution")
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NAndrey Vagin <avagin@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9d186cac