1. 12 3月, 2012 1 次提交
    • J
      net: Convert printks to pr_<level> · 058bd4d2
      Joe Perches 提交于
      Use a more current kernel messaging style.
      
      Convert a printk block to print_hex_dump.
      Coalesce formats, align arguments.
      Use %s, __func__ instead of embedding function names.
      
      Some messages that were prefixed with <foo>_close are
      now prefixed with <foo>_fini.  Some ah4 and esp messages
      are now not prefixed with "ip ".
      
      The intent of this patch is to later add something like
        #define pr_fmt(fmt) "IPv4: " fmt.
      to standardize the output messages.
      
      Text size is trivially reduced. (x86-32 allyesconfig)
      
      $ size net/ipv4/built-in.o*
         text	   data	    bss	    dec	    hex	filename
       887888	  31558	 249696	1169142	 11d6f6	net/ipv4/built-in.o.new
       887934	  31558	 249800	1169292	 11d78c	net/ipv4/built-in.o.old
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      058bd4d2
  2. 10 3月, 2012 2 次提交
  3. 08 3月, 2012 7 次提交
  4. 07 3月, 2012 1 次提交
    • N
      tcp: fix tcp_shift_skb_data() to not shift SACKed data below snd_una · 4648dc97
      Neal Cardwell 提交于
      This commit fixes tcp_shift_skb_data() so that it does not shift
      SACKed data below snd_una.
      
      This fixes an issue whose symptoms exactly match reports showing
      tp->sacked_out going negative since 3.3.0-rc4 (see "WARNING: at
      net/ipv4/tcp_input.c:3418" thread on netdev).
      
      Since 2008 (832d11c5)
      tcp_shift_skb_data() had been shifting SACKed ranges that were below
      snd_una. It checked that the *end* of the skb it was about to shift
      from was above snd_una, but did not check that the end of the actual
      shifted range was above snd_una; this commit adds that check.
      
      Shifting SACKed ranges below snd_una is problematic because for such
      ranges tcp_sacktag_one() short-circuits: it does not declare anything
      as SACKed and does not increase sacked_out.
      
      Before the fixes in commits cc9a672e
      and daef52ba, shifting SACKed ranges
      below snd_una happened to work because tcp_shifted_skb() was always
      (incorrectly) passing in to tcp_sacktag_one() an skb whose end_seq
      tcp_shift_skb_data() had already guaranteed was beyond snd_una. Hence
      tcp_sacktag_one() never short-circuited and always increased
      tp->sacked_out in this case.
      
      After those two fixes, my testing has verified that shifting SACKed
      ranges below snd_una could cause tp->sacked_out to go negative with
      the following sequence of events:
      
      (1) tcp_shift_skb_data() sees an skb whose end_seq is beyond snd_una,
          then shifts a prefix of that skb that is below snd_una
      
      (2) tcp_shifted_skb() increments the packet count of the
          already-SACKed prev sk_buff
      
      (3) tcp_sacktag_one() sees the end of the new SACKed range is below
          snd_una, so it short-circuits and doesn't increase tp->sacked_out
      
      (5) tcp_clean_rtx_queue() sees the SACKed skb has been ACKed,
          decrements tp->sacked_out by this "inflated" pcount that was
          missing a matching increase in tp->sacked_out, and hence
          tp->sacked_out underflows to a u32 like 0xFFFFFFFF, which casted
          to s32 is negative.
      
      (6) this leads to the warnings seen in the recent "WARNING: at
          net/ipv4/tcp_input.c:3418" thread on the netdev list; e.g.:
          tcp_input.c:3418  WARN_ON((int)tp->sacked_out < 0);
      
      More generally, I think this bug can be tickled in some cases where
      two or more ACKs from the receiver are lost and then a DSACK arrives
      that is immediately above an existing SACKed skb in the write queue.
      
      This fix changes tcp_shift_skb_data() to abort this sequence at step
      (1) in the scenario above by noticing that the bytes are below snd_una
      and not shifting them.
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4648dc97
  5. 04 3月, 2012 1 次提交
    • N
      tcp: don't fragment SACKed skbs in tcp_mark_head_lost() · c0638c24
      Neal Cardwell 提交于
      In tcp_mark_head_lost() we should not attempt to fragment a SACKed skb
      to mark the first portion as lost. This is for two primary reasons:
      
      (1) tcp_shifted_skb() coalesces adjacent regions of SACKed skbs. When
      doing this, it preserves the sum of their packet counts in order to
      reflect the real-world dynamics on the wire. But given that skbs can
      have remainders that do not align to MSS boundaries, this packet count
      preservation means that for SACKed skbs there is not necessarily a
      direct linear relationship between tcp_skb_pcount(skb) and
      skb->len. Thus tcp_mark_head_lost()'s previous attempts to fragment
      off and mark as lost a prefix of length (packets - oldcnt)*mss from
      SACKed skbs were leading to occasional failures of the WARN_ON(len >
      skb->len) in tcp_fragment() (which used to be a BUG_ON(); see the
      recent "crash in tcp_fragment" thread on netdev).
      
      (2) there is no real point in fragmenting off part of a SACKed skb and
      calling tcp_skb_mark_lost() on it, since tcp_skb_mark_lost() is a NOP
      for SACKed skbs.
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Acked-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NNandita Dukkipati <nanditad@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c0638c24
  6. 29 2月, 2012 1 次提交
    • N
      tcp: fix false reordering signal in tcp_shifted_skb · 4c90d3b3
      Neal Cardwell 提交于
      When tcp_shifted_skb() shifts bytes from the skb that is currently
      pointed to by 'highest_sack' then the increment of
      TCP_SKB_CB(skb)->seq implicitly advances tcp_highest_sack_seq(). This
      implicit advancement, combined with the recent fix to pass the correct
      SACKed range into tcp_sacktag_one(), caused tcp_sacktag_one() to think
      that the newly SACKed range was before the tcp_highest_sack_seq(),
      leading to a call to tcp_update_reordering() with a degree of
      reordering matching the size of the newly SACKed range (typically just
      1 packet, which is a NOP, but potentially larger).
      
      This commit fixes this by simply calling tcp_sacktag_one() before the
      TCP_SKB_CB(skb)->seq advancement that can advance our notion of the
      highest SACKed sequence.
      
      Correspondingly, we can simplify the code a little now that
      tcp_shifted_skb() should update the lost_cnt_hint in all cases where
      skb == tp->lost_skb_hint.
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4c90d3b3
  7. 27 2月, 2012 1 次提交
  8. 25 2月, 2012 1 次提交
  9. 24 2月, 2012 1 次提交
  10. 22 2月, 2012 2 次提交
  11. 16 2月, 2012 2 次提交
  12. 15 2月, 2012 1 次提交
  13. 14 2月, 2012 1 次提交
  14. 13 2月, 2012 3 次提交
  15. 11 2月, 2012 2 次提交
  16. 09 2月, 2012 1 次提交
    • E
      ipv4: Implement IP_UNICAST_IF socket option. · 76e21053
      Erich E. Hoover 提交于
      The IP_UNICAST_IF feature is needed by the Wine project.  This patch
      implements the feature by setting the outgoing interface in a similar
      fashion to that of IP_MULTICAST_IF.  A separate option is needed to
      handle this feature since the existing options do not provide all of
      the characteristics required by IP_UNICAST_IF, a summary is provided
      below.
      
      SO_BINDTODEVICE:
      * SO_BINDTODEVICE requires administrative privileges, IP_UNICAST_IF
      does not.  From reading some old mailing list articles my
      understanding is that SO_BINDTODEVICE requires administrative
      privileges because it can override the administrator's routing
      settings.
      * The SO_BINDTODEVICE option restricts both outbound and inbound
      traffic, IP_UNICAST_IF only impacts outbound traffic.
      
      IP_PKTINFO:
      * Since IP_PKTINFO and IP_UNICAST_IF are independent options,
      implementing IP_UNICAST_IF with IP_PKTINFO will likely break some
      applications.
      * Implementing IP_UNICAST_IF on top of IP_PKTINFO significantly
      complicates the Wine codebase and reduces the socket performance
      (doing this requires a lot of extra communication between the
      "server" and "user" layers).
      
      bind():
      * bind() does not work on broadcast packets, IP_UNICAST_IF is
      specifically intended to work with broadcast packets.
      * Like SO_BINDTODEVICE, bind() restricts both outbound and inbound
      traffic.
      Signed-off-by: NErich E. Hoover <ehoover@mines.edu>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      76e21053
  17. 08 2月, 2012 1 次提交
  18. 05 2月, 2012 1 次提交
  19. 03 2月, 2012 1 次提交
  20. 02 2月, 2012 2 次提交
    • A
      net: Disambiguate kernel message · efcdbf24
      Arun Sharma 提交于
      Some of our machines were reporting:
      
      TCP: too many of orphaned sockets
      
      even when the number of orphaned sockets was well below the
      limit.
      
      We print a different message depending on whether we're out
      of TCP memory or there are too many orphaned sockets.
      
      Also move the check out of line and cleanup the messages
      that were printed.
      Signed-off-by: NArun Sharma <asharma@fb.com>
      Suggested-by: NMohan Srinivasan <mohan@fb.com>
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: David Miller <davem@davemloft.net>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      efcdbf24
    • S
      tcp: md5: RST: getting md5 key from listener · 658ddaaf
      Shawn Lu 提交于
      TCP RST mechanism is broken in TCP md5(RFC2385). When
      connection is gone, md5 key is lost, sending RST
      without md5 hash is deem to ignored by peer. This can
      be a problem since RST help protocal like bgp to fast
      recove from peer crash.
      
      In most case, users of tcp md5, such as bgp and ldp,
      have listener on both sides to accept connection from peer.
      md5 keys for peers are saved in listening socket.
      
      There are two cases in finding md5 key when connection is
      lost:
      1.Passive receive RST: The message is send to well known port,
      tcp will associate it with listner. md5 key is gotten from
      listener.
      
      2.Active receive RST (no sock): The message is send to ative
      side, there is no socket associated with the message. In this
      case, finding listener from source port, then find md5 key from
      listener.
      
      we are not loosing sercuriy here:
      packet is checked with md5 hash. No RST is generated
      if md5 hash doesn't match or no md5 key can be found.
      Signed-off-by: NShawn Lu <shawn.lu@ericsson.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      658ddaaf
  21. 01 2月, 2012 4 次提交
  22. 31 1月, 2012 2 次提交
    • N
      tcp: fix tcp_trim_head() to adjust segment count with skb MSS · 5b35e1e6
      Neal Cardwell 提交于
      This commit fixes tcp_trim_head() to recalculate the number of
      segments in the skb with the skb's existing MSS, so trimming the head
      causes the skb segment count to be monotonically non-increasing - it
      should stay the same or go down, but not increase.
      
      Previously tcp_trim_head() used the current MSS of the connection. But
      if there was a decrease in MSS between original transmission and ACK
      (e.g. due to PMTUD), this could cause tcp_trim_head() to
      counter-intuitively increase the segment count when trimming bytes off
      the head of an skb. This violated assumptions in tcp_tso_acked() that
      tcp_trim_head() only decreases the packet count, so that packets_acked
      in tcp_tso_acked() could underflow, leading tcp_clean_rtx_queue() to
      pass u32 pkts_acked values as large as 0xffffffff to
      ca_ops->pkts_acked().
      
      As an aside, if tcp_trim_head() had really wanted the skb to reflect
      the current MSS, it should have called tcp_set_skb_tso_segs()
      unconditionally, since a decrease in MSS would mean that a
      single-packet skb should now be sliced into multiple segments.
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Acked-by: NNandita Dukkipati <nanditad@google.com>
      Acked-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5b35e1e6
    • G
      net/tcp: Fix tcp memory limits initialization when !CONFIG_SYSCTL · 4acb4190
      Glauber Costa 提交于
      sysctl_tcp_mem() initialization was moved to sysctl_tcp_ipv4.c
      in commit 3dc43e3e, since it
      became a per-ns value.
      
      That code, however, will never run when CONFIG_SYSCTL is
      disabled, leading to bogus values on those fields - causing hung
      TCP sockets.
      
      This patch fixes it by keeping an initialization code in
      tcp_init(). It will be overwritten by the first net namespace
      init if CONFIG_SYSCTL is compiled in, and do the right thing if
      it is compiled out.
      
      It is also named properly as tcp_init_mem(), to properly signal
      its non-sysctl side effect on TCP limits.
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NGlauber Costa <glommer@parallels.com>
      Cc: David S. Miller <davem@davemloft.net>
      Link: http://lkml.kernel.org/r/4F22D05A.8030604@parallels.com
      [ renamed the function, tidied up the changelog a bit ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4acb4190
  23. 28 1月, 2012 1 次提交