1. 17 3月, 2012 1 次提交
    • N
      arp: allow arp processing to honor per interface arp_accept sysctl · 124d37e9
      Neil Horman 提交于
      I found recently that the arp_process function which handles all of our received
      arp frames, is using IPV4_DEVCONF_ALL macro to check the state of the arp_process
      flag.  This seems wrong, as it implies that either none or all of the network
      interfaces accept gratuitous arps.  This patch corrects that, allowing
      per-interface arp_accept configuration to deviate from the all setting.  Note
      this also brings us into line with the way the arp_filter setting is handled
      during arp_process execution.
      
      Tested this myself on my home network, and confirmed it works as expected.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      CC: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      124d37e9
  2. 13 3月, 2012 1 次提交
  3. 12 3月, 2012 2 次提交
    • J
      net: Convert printks to pr_<level> · 058bd4d2
      Joe Perches 提交于
      Use a more current kernel messaging style.
      
      Convert a printk block to print_hex_dump.
      Coalesce formats, align arguments.
      Use %s, __func__ instead of embedding function names.
      
      Some messages that were prefixed with <foo>_close are
      now prefixed with <foo>_fini.  Some ah4 and esp messages
      are now not prefixed with "ip ".
      
      The intent of this patch is to later add something like
        #define pr_fmt(fmt) "IPv4: " fmt.
      to standardize the output messages.
      
      Text size is trivially reduced. (x86-32 allyesconfig)
      
      $ size net/ipv4/built-in.o*
         text	   data	    bss	    dec	    hex	filename
       887888	  31558	 249696	1169142	 11d6f6	net/ipv4/built-in.o.new
       887934	  31558	 249800	1169292	 11d78c	net/ipv4/built-in.o.old
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      058bd4d2
    • E
      tcp: fix syncookie regression · dfd25fff
      Eric Dumazet 提交于
      commit ea4fc0d6 (ipv4: Don't use rt->rt_{src,dst} in ip_queue_xmit())
      added a serious regression on synflood handling.
      
      Simon Kirby discovered a successful connection was delayed by 20 seconds
      before being responsive.
      
      In my tests, I discovered that xmit frames were lost, and needed ~4
      retransmits and a socket dst rebuild before being really sent.
      
      In case of syncookie initiated connection, we use a different path to
      initialize the socket dst, and inet->cork.fl.u.ip4 is left cleared.
      
      As ip_queue_xmit() now depends on inet flow being setup, fix this by
      copying the temp flowi4 we use in cookie_v4_check().
      Reported-by: NSimon Kirby <sim@netnation.com>
      Bisected-by: NSimon Kirby <sim@netnation.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Tested-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dfd25fff
  4. 10 3月, 2012 2 次提交
  5. 08 3月, 2012 7 次提交
  6. 07 3月, 2012 1 次提交
    • N
      tcp: fix tcp_shift_skb_data() to not shift SACKed data below snd_una · 4648dc97
      Neal Cardwell 提交于
      This commit fixes tcp_shift_skb_data() so that it does not shift
      SACKed data below snd_una.
      
      This fixes an issue whose symptoms exactly match reports showing
      tp->sacked_out going negative since 3.3.0-rc4 (see "WARNING: at
      net/ipv4/tcp_input.c:3418" thread on netdev).
      
      Since 2008 (832d11c5)
      tcp_shift_skb_data() had been shifting SACKed ranges that were below
      snd_una. It checked that the *end* of the skb it was about to shift
      from was above snd_una, but did not check that the end of the actual
      shifted range was above snd_una; this commit adds that check.
      
      Shifting SACKed ranges below snd_una is problematic because for such
      ranges tcp_sacktag_one() short-circuits: it does not declare anything
      as SACKed and does not increase sacked_out.
      
      Before the fixes in commits cc9a672e
      and daef52ba, shifting SACKed ranges
      below snd_una happened to work because tcp_shifted_skb() was always
      (incorrectly) passing in to tcp_sacktag_one() an skb whose end_seq
      tcp_shift_skb_data() had already guaranteed was beyond snd_una. Hence
      tcp_sacktag_one() never short-circuited and always increased
      tp->sacked_out in this case.
      
      After those two fixes, my testing has verified that shifting SACKed
      ranges below snd_una could cause tp->sacked_out to go negative with
      the following sequence of events:
      
      (1) tcp_shift_skb_data() sees an skb whose end_seq is beyond snd_una,
          then shifts a prefix of that skb that is below snd_una
      
      (2) tcp_shifted_skb() increments the packet count of the
          already-SACKed prev sk_buff
      
      (3) tcp_sacktag_one() sees the end of the new SACKed range is below
          snd_una, so it short-circuits and doesn't increase tp->sacked_out
      
      (5) tcp_clean_rtx_queue() sees the SACKed skb has been ACKed,
          decrements tp->sacked_out by this "inflated" pcount that was
          missing a matching increase in tp->sacked_out, and hence
          tp->sacked_out underflows to a u32 like 0xFFFFFFFF, which casted
          to s32 is negative.
      
      (6) this leads to the warnings seen in the recent "WARNING: at
          net/ipv4/tcp_input.c:3418" thread on the netdev list; e.g.:
          tcp_input.c:3418  WARN_ON((int)tp->sacked_out < 0);
      
      More generally, I think this bug can be tickled in some cases where
      two or more ACKs from the receiver are lost and then a DSACK arrives
      that is immediately above an existing SACKed skb in the write queue.
      
      This fix changes tcp_shift_skb_data() to abort this sequence at step
      (1) in the scenario above by noticing that the bytes are below snd_una
      and not shifting them.
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4648dc97
  7. 04 3月, 2012 1 次提交
    • N
      tcp: don't fragment SACKed skbs in tcp_mark_head_lost() · c0638c24
      Neal Cardwell 提交于
      In tcp_mark_head_lost() we should not attempt to fragment a SACKed skb
      to mark the first portion as lost. This is for two primary reasons:
      
      (1) tcp_shifted_skb() coalesces adjacent regions of SACKed skbs. When
      doing this, it preserves the sum of their packet counts in order to
      reflect the real-world dynamics on the wire. But given that skbs can
      have remainders that do not align to MSS boundaries, this packet count
      preservation means that for SACKed skbs there is not necessarily a
      direct linear relationship between tcp_skb_pcount(skb) and
      skb->len. Thus tcp_mark_head_lost()'s previous attempts to fragment
      off and mark as lost a prefix of length (packets - oldcnt)*mss from
      SACKed skbs were leading to occasional failures of the WARN_ON(len >
      skb->len) in tcp_fragment() (which used to be a BUG_ON(); see the
      recent "crash in tcp_fragment" thread on netdev).
      
      (2) there is no real point in fragmenting off part of a SACKed skb and
      calling tcp_skb_mark_lost() on it, since tcp_skb_mark_lost() is a NOP
      for SACKed skbs.
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Acked-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Acked-by: NYuchung Cheng <ycheng@google.com>
      Acked-by: NNandita Dukkipati <nanditad@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c0638c24
  8. 29 2月, 2012 1 次提交
    • N
      tcp: fix false reordering signal in tcp_shifted_skb · 4c90d3b3
      Neal Cardwell 提交于
      When tcp_shifted_skb() shifts bytes from the skb that is currently
      pointed to by 'highest_sack' then the increment of
      TCP_SKB_CB(skb)->seq implicitly advances tcp_highest_sack_seq(). This
      implicit advancement, combined with the recent fix to pass the correct
      SACKed range into tcp_sacktag_one(), caused tcp_sacktag_one() to think
      that the newly SACKed range was before the tcp_highest_sack_seq(),
      leading to a call to tcp_update_reordering() with a degree of
      reordering matching the size of the newly SACKed range (typically just
      1 packet, which is a NOP, but potentially larger).
      
      This commit fixes this by simply calling tcp_sacktag_one() before the
      TCP_SKB_CB(skb)->seq advancement that can advance our notion of the
      highest SACKed sequence.
      
      Correspondingly, we can simplify the code a little now that
      tcp_shifted_skb() should update the lost_cnt_hint in all cases where
      skb == tp->lost_skb_hint.
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4c90d3b3
  9. 27 2月, 2012 1 次提交
  10. 25 2月, 2012 1 次提交
  11. 24 2月, 2012 1 次提交
  12. 22 2月, 2012 2 次提交
  13. 16 2月, 2012 2 次提交
  14. 15 2月, 2012 1 次提交
  15. 14 2月, 2012 1 次提交
  16. 13 2月, 2012 3 次提交
  17. 11 2月, 2012 2 次提交
  18. 09 2月, 2012 1 次提交
    • E
      ipv4: Implement IP_UNICAST_IF socket option. · 76e21053
      Erich E. Hoover 提交于
      The IP_UNICAST_IF feature is needed by the Wine project.  This patch
      implements the feature by setting the outgoing interface in a similar
      fashion to that of IP_MULTICAST_IF.  A separate option is needed to
      handle this feature since the existing options do not provide all of
      the characteristics required by IP_UNICAST_IF, a summary is provided
      below.
      
      SO_BINDTODEVICE:
      * SO_BINDTODEVICE requires administrative privileges, IP_UNICAST_IF
      does not.  From reading some old mailing list articles my
      understanding is that SO_BINDTODEVICE requires administrative
      privileges because it can override the administrator's routing
      settings.
      * The SO_BINDTODEVICE option restricts both outbound and inbound
      traffic, IP_UNICAST_IF only impacts outbound traffic.
      
      IP_PKTINFO:
      * Since IP_PKTINFO and IP_UNICAST_IF are independent options,
      implementing IP_UNICAST_IF with IP_PKTINFO will likely break some
      applications.
      * Implementing IP_UNICAST_IF on top of IP_PKTINFO significantly
      complicates the Wine codebase and reduces the socket performance
      (doing this requires a lot of extra communication between the
      "server" and "user" layers).
      
      bind():
      * bind() does not work on broadcast packets, IP_UNICAST_IF is
      specifically intended to work with broadcast packets.
      * Like SO_BINDTODEVICE, bind() restricts both outbound and inbound
      traffic.
      Signed-off-by: NErich E. Hoover <ehoover@mines.edu>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      76e21053
  19. 08 2月, 2012 1 次提交
  20. 05 2月, 2012 1 次提交
  21. 03 2月, 2012 1 次提交
  22. 02 2月, 2012 2 次提交
    • A
      net: Disambiguate kernel message · efcdbf24
      Arun Sharma 提交于
      Some of our machines were reporting:
      
      TCP: too many of orphaned sockets
      
      even when the number of orphaned sockets was well below the
      limit.
      
      We print a different message depending on whether we're out
      of TCP memory or there are too many orphaned sockets.
      
      Also move the check out of line and cleanup the messages
      that were printed.
      Signed-off-by: NArun Sharma <asharma@fb.com>
      Suggested-by: NMohan Srinivasan <mohan@fb.com>
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: David Miller <davem@davemloft.net>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Joe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      efcdbf24
    • S
      tcp: md5: RST: getting md5 key from listener · 658ddaaf
      Shawn Lu 提交于
      TCP RST mechanism is broken in TCP md5(RFC2385). When
      connection is gone, md5 key is lost, sending RST
      without md5 hash is deem to ignored by peer. This can
      be a problem since RST help protocal like bgp to fast
      recove from peer crash.
      
      In most case, users of tcp md5, such as bgp and ldp,
      have listener on both sides to accept connection from peer.
      md5 keys for peers are saved in listening socket.
      
      There are two cases in finding md5 key when connection is
      lost:
      1.Passive receive RST: The message is send to well known port,
      tcp will associate it with listner. md5 key is gotten from
      listener.
      
      2.Active receive RST (no sock): The message is send to ative
      side, there is no socket associated with the message. In this
      case, finding listener from source port, then find md5 key from
      listener.
      
      we are not loosing sercuriy here:
      packet is checked with md5 hash. No RST is generated
      if md5 hash doesn't match or no md5 key can be found.
      Signed-off-by: NShawn Lu <shawn.lu@ericsson.com>
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      658ddaaf
  23. 01 2月, 2012 4 次提交