1. 08 Apr 2008, 2 commits
    • [TCP]: Fix NewReno's fast rexmit/recovery problems with GSOed skb · c137f3dd
      Ilpo Järvinen committed
      Fixes a long-standing bug that cripples NewReno recovery.
      With GSO, the whole head skb was marked as LOST, which violates
      the NewReno procedure: NewReno wants to mark only one packet.
      This ended up breaking our TCP code by causing a counter overflow,
      because our code is built on the assumption of a valid NewReno
      procedure. It manifested as a WARN_ON for the overflow triggering
      in a number of places.
      
      It seems a relatively safe alternative to simply do nothing if
      tcp_fragment fails due to OOM, because another duplicate ACK is
      likely to be received soon and the fragmentation will be retried.
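      A minimal C sketch of the marking rule described above, under
      heavily simplified assumptions: seg_buf, split_first_segment() and
      newreno_mark_head_lost() are hypothetical stand-ins, not the
      kernel's sk_buff, tcp_fragment() or tcp_mark_head_lost(). It shows
      NewReno marking exactly one segment at the head, splitting a
      multi-segment (GSO) buffer first, and doing nothing if the split
      fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a queued GSO skb: one buffer that
 * carries `pcount` MSS-sized segments. Not the kernel's sk_buff. */
struct seg_buf {
    int pcount;            /* MSS-sized segments in this buffer */
    bool lost;             /* "marked LOST" flag */
    struct seg_buf *next;  /* next buffer in the write queue */
};

/* Hypothetical fragmenter: split the first segment off `b` so that it
 * can be marked on its own. Returns false to model an OOM failure. */
static bool split_first_segment(struct seg_buf *b, bool simulate_oom)
{
    if (simulate_oom)
        return false;
    struct seg_buf *rest = malloc(sizeof(*rest));
    if (rest == NULL)
        return false;
    rest->pcount = b->pcount - 1;
    rest->lost = false;
    rest->next = b->next;
    b->pcount = 1;
    b->next = rest;
    return true;
}

/* NewReno rule: mark at most ONE segment at the head as lost.
 * Returns the number of segments newly marked (0 or 1). */
static int newreno_mark_head_lost(struct seg_buf *head, bool simulate_oom)
{
    if (head == NULL || head->lost)
        return 0;
    /* Marking a multi-segment buffer whole would mark pcount segments;
     * split it first. On failure do nothing: the next dupack retries. */
    if (head->pcount > 1 && !split_first_segment(head, simulate_oom))
        return 0;
    head->lost = true;
    return 1;
}
```

      On a simulated OOM the head stays untouched; on success only the
      first of the four segments is marked and the other three remain in
      a separate, unmarked buffer.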
      
      Special thanks go to Soeren Sonnenburg <kernel@nn7.de>, who was
      lucky enough to reproduce this so that the overflow warning was
      hit. That is not as easy as it seems, even though this bug happens
      quite often, because a fairly large amount of outstanding data is
      needed for the mismarkings to lead to an overflow.
      
      Because it is very late in the 2.6.25-rc cycle (if this even makes
      it in time), I did not want to touch anything with SACK enabled
      here. Fragmenting might be useful there as well, but it is more a
      policy decision than a mandatory fix. Thus there is no need to
      rush, and we can postpone considering tcp_fragment with SACK until
      2.6.26.
      
      In 2.6.24 and earlier, this very same bug existed, but the effect
      was slightly different because of small changes in the if
      conditions within the patch's context. With those conditions,
      nothing got the lost marker, and thus no retransmissions happened.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [TCP]: Restore 2.6.24 mark_head_lost behavior for newreno/fack · 1b69d745
      Ilpo Järvinen committed
      The fast retransmission can be forced locally to the rfc3517
      branch in tcp_update_scoreboard instead of making such fragile
      constructs deeper in tcp_mark_head_lost.
      
      This is necessary for the next patch, which must not have
      loopholes in the cnt > packets check. As one can notice,
      readability improved somewhat as well because of this :-).
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 04 Apr 2008, 1 commit
    • [ICMP]: Ensure that ICMP relookup maintains status quo · af268182
      Herbert Xu committed
      The ICMP relookup path is only meant to modify behaviour when
      appropriate IPsec policies are in place and marked as requiring
      relookups.  It is certainly not meant to modify behaviour when
      IPsec policies don't exist at all.
      
      However, due to an oversight in the error paths, existing behaviour
      may in fact change should one of the relookup steps fail.
      
      This patch corrects this by redirecting all errors on relookup
      failures to the previous code path.  That is, if the initial
      xfrm_lookup let the packet pass, we will stand by that decision
      should the relookup fail due to an error.
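      The fallback rule can be sketched as a small C decision function.
      The verdict and relookup_result enums and icmp_route_decision() are
      hypothetical and only model the policy described here; they are not
      the xfrm_lookup() API.

```c
#include <assert.h>

/* Hypothetical verdicts; not kernel identifiers. */
enum verdict { PASS, DROP };

/* Hypothetical outcome of the relookup step. */
enum relookup_result { RELOOKUP_OK_PASS, RELOOKUP_OK_DROP, RELOOKUP_ERROR };

/* `initial` is what the first lookup decided; `relookup_required` models
 * "an IPsec policy is in place and marked as requiring a relookup". */
static enum verdict icmp_route_decision(enum verdict initial,
                                        int relookup_required,
                                        enum relookup_result relookup)
{
    if (!relookup_required)
        return initial;          /* no policy: behaviour is unchanged */
    if (relookup == RELOOKUP_OK_PASS)
        return PASS;
    if (relookup == RELOOKUP_OK_DROP)
        return DROP;
    return initial;              /* error: stand by the first decision */
}
```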
      
      This should be safe from a security point of view because compliant
      systems must install a default deny policy, so the packet wouldn't
      have passed in that case.
      
      Many thanks to Julian Anastasov for pointing out this error.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 01 Apr 2008, 1 commit
  4. 29 Mar 2008, 2 commits
  5. 28 Mar 2008, 2 commits
  6. 27 Mar 2008, 1 commit
    • [IPSEC]: Fix BEET output · 732c8bd5
      Herbert Xu committed
      The IPv6 BEET output function incorrectly includes the inner
      header in the payload to be protected.  This causes a crash, as
      the packet doesn't actually have that many bytes for a second
      header.
      
      The IPv4 BEET output on the other hand is broken when it comes
      to handling an inner IPv6 header since it always assumes an
      inner IPv4 header.
      
      This patch fixes both by making sure that neither BEET output
      function touches the inner header at all.  All access is now
      done through the protocol-independent cb structure.  Two new
      attributes are added to make this work: the IP header length
      and the IPv4 option length.  They are filled in by the inner
      mode's output function.
      
      Thanks to Joakim Koskela for finding this problem.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 26 Mar 2008, 1 commit
  8. 25 Mar 2008, 1 commit
  9. 23 Mar 2008, 2 commits
    • [IPV4] fib_trie: fix warning from rcu_assign_pointer · 6440cc9e
      Stephen Hemminger committed
      This gets rid of a warning caused by the test in rcu_assign_pointer.
      I tried to fix rcu_assign_pointer, but that devolved into a long set
      of discussions about doing it right that came to no real solution.
      Since the test in rcu_assign_pointer for constant NULL would never
      succeed in fib_trie, just open-code it instead.
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [TCP]: Let skbs grow over a page on fast peers · 69d15067
      Herbert Xu committed
      While testing the virtio-net driver on KVM with TSO I noticed
      that TSO performance with a 1500 MTU is significantly worse
      compared to the performance of non-TSO with a 16436 MTU.  The
      packet dump shows that most of the packets sent are smaller
      than a page.
      
      Looking at the code, this is actually quite obvious: it always
      stops extending the packet if it is the first packet yet to be
      sent and it is larger than the MSS.  Since each extension is
      bound by the page size, this means that (given a 1500 MTU) we are
      very unlikely to construct packets larger than a page, provided
      that the receiver and the path are fast enough that packets can
      always be sent immediately.
      
      The fix is also quite obvious.  The push calls inside the loop
      are just an optimisation so that we don't end up doing all the
      sending at the end of the loop.  Therefore there is no specific
      reason why they have to happen at MSS boundaries.  For TSO, the
      most natural extension of this optimisation is to do the pushing
      once the skb exceeds the TSO size goal.
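      A rough C simulation of the two push policies, assuming a
      4096-byte page, a fast peer (every push sends immediately) and an
      illustrative size goal of 65160 bytes; count_packets() and both
      numbers are hypothetical and do not reproduce tcp_sendmsg().

```c
#include <assert.h>
#include <stddef.h>

#define SIM_PAGE_SIZE 4096u  /* assumed page size */

/* Split `total` bytes into packets. Each loop iteration appends at most
 * one page to the current packet; once the packet grows past
 * `push_threshold` it is pushed and (fast peer) leaves immediately.
 * Returns the number of packets produced. */
static size_t count_packets(size_t total, size_t push_threshold)
{
    size_t packets = 0, cur = 0;

    while (total > 0) {
        size_t chunk = total < SIM_PAGE_SIZE ? total : SIM_PAGE_SIZE;
        cur += chunk;
        total -= chunk;
        if (cur > push_threshold) {  /* push inside the loop */
            packets++;
            cur = 0;
        }
    }
    if (cur > 0)
        packets++;                   /* final push after the loop */
    return packets;
}
```

      With 64 KiB of data, pushing once the packet exceeds the MSS
      (1448) yields 16 page-sized packets, while pushing once it exceeds
      the size goal yields a single 64 KiB packet.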
      
      This is what the patch does and testing with KVM shows that the
      TSO performance with a 1500 MTU easily surpasses that of a 16436
      MTU and indeed the packet sizes sent are generally larger than
      16436.
      
      I don't see any obvious downsides for slower peers or connections,
      but it would be prudent to test this extensively to ensure that
      those cases don't regress.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 22 Mar 2008, 1 commit
  11. 21 Mar 2008, 2 commits
    • [TCP]: Fix shrinking windows with window scaling · 607bfbf2
      Patrick McHardy committed
      When selecting a new window, tcp_select_window() tries not to shrink
      the offered window by using the maximum of the remaining offered window
      size and the newly calculated window size. The newly calculated window
      size is always a multiple of the window scaling factor, the remaining
      window size however might not be since it depends on rcv_wup/rcv_nxt.
      This means we're effectively shrinking the window when scaling it down.
      
      The dump below shows the problem (scaling factor 2^7):
      
      - Window size of 557 (71296) is advertised, up to 3111907257:
      
      IP 172.2.2.3.33000 > 172.2.2.2.33000: . ack 3111835961 win 557 <...>
      
      - New window size of 514 (65792) is advertised, up to 3111907217, 40 bytes
        below the last end:
      
      IP 172.2.2.3.33000 > 172.2.2.2.33000: . 3113575668:3113577116(1448) ack 3111841425 win 514 <...>
      
      The number 40 results from downscaling the remaining window:
      
      3111907257 - 3111841425 = 65832
      65832 / 2^7 = 514
      65832 % 2^7 = 40
      
      If the sender uses up the entire window before it is shrunk, this can have
      chaotic effects on the connection. When sending ACKs, tcp_acceptable_seq()
      will notice that the window has been shrunk since tcp_wnd_end() is before
      tp->snd_nxt, which makes it choose tcp_wnd_end() as sequence number.
      This, however, fails the receiver's checks in tcp_sequence(), since
      it is before its tp->rcv_wup, making it respond with a dupack.
      
      If both sides are in this condition, this leads to a constant flood of
      ACKs until the connection times out.
      
      Make sure the window is never shrunk by aligning the remaining window to
      the window scaling factor.
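      The arithmetic from the dump and the alignment fix can be checked
      with a small C sketch. align_up() performs the same power-of-two
      rounding as the kernel's ALIGN() macro; select_window() is a
      hypothetical helper, not the real tcp_select_window(), and assumes
      the remaining window is passed in directly.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of a, where a is a power of two. */
static uint32_t align_up(uint32_t x, uint32_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Value that ends up in the 16-bit window field of the header. */
static uint16_t advertised_field(uint32_t window, unsigned wscale)
{
    return (uint16_t)(window >> wscale);
}

/* Hypothetical: never offer less than the remaining window, and round
 * the remaining window UP so downscaling cannot shave bytes off it. */
static uint32_t select_window(uint32_t remaining, uint32_t new_win,
                              unsigned wscale)
{
    if (new_win < remaining)
        new_win = align_up(remaining, 1u << wscale);
    return new_win;
}
```

      With the numbers from the dump, the unaligned remaining window of
      65832 bytes is advertised as 514 (65792 bytes, 40 short), while
      rounding it up to a multiple of 2^7 gives 65920 bytes, advertised
      as 515, so the right edge of the window does not move backwards.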
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NETFILTER]: ipt_recent: sanity check hit count · d0ebf133
      Daniel Hokka Zakrisson committed
      If a rule using ipt_recent is created with a hit count greater than
      ip_pkt_list_tot, the rule will never match as it cannot keep track
      of enough timestamps. This patch makes ipt_recent refuse to create such
      rules.
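      Why a hit count above the list size can never match can be
      sketched in C. recent_entry, recent_add(), recent_match() and
      recent_check_rule() are hypothetical simplifications, not the
      ipt_recent source; PKT_LIST_TOT mirrors ip_pkt_list_tot's default
      value of 20.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define PKT_LIST_TOT 20  /* mirrors ip_pkt_list_tot's default */

/* Hypothetical per-source entry: a ring of the last timestamps. */
struct recent_entry {
    unsigned long stamps[PKT_LIST_TOT];
    unsigned int nstamps;  /* valid entries, never above PKT_LIST_TOT */
    unsigned int index;    /* next slot to overwrite */
};

static void recent_add(struct recent_entry *e, unsigned long now)
{
    e->stamps[e->index] = now;
    e->index = (e->index + 1) % PKT_LIST_TOT;
    if (e->nstamps < PKT_LIST_TOT)
        e->nstamps++;
}

/* Hits are counted over the stored stamps only, so they can never
 * exceed PKT_LIST_TOT, however many packets were actually seen. */
static bool recent_match(const struct recent_entry *e, unsigned long now,
                         unsigned long seconds, unsigned int hit_count)
{
    unsigned int hits = 0;
    for (unsigned int i = 0; i < e->nstamps; i++)
        if (now - e->stamps[i] <= seconds)
            hits++;
    return hits >= hit_count;
}

/* Rule validation in the spirit of the patch: refuse unreachable
 * hit counts up front. Returns 0 or -EINVAL. */
static int recent_check_rule(unsigned int hit_count)
{
    return hit_count > PKT_LIST_TOT ? -EINVAL : 0;
}
```

      After 100 packets within the time window, a hit count of 20
      matches while 21 never can, which is why such a rule is rejected.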
      
      With ip_pkt_list_tot's default value of 20, the following can be used
      to reproduce the problem.
      
      nc -u -l 0.0.0.0 1234 &
      for i in `seq 1 100`; do echo $i | nc -w 1 -u 127.0.0.1 1234; done
      
      This limits it to 20 packets:
      iptables -A OUTPUT -p udp --dport 1234 -m recent --set --name test \
               --rsource
      iptables -A OUTPUT -p udp --dport 1234 -m recent --update --seconds \
               60 --hitcount 20 --name test --rsource -j DROP
      
      While this is unlimited:
      iptables -A OUTPUT -p udp --dport 1234 -m recent --set --name test \
               --rsource
      iptables -A OUTPUT -p udp --dport 1234 -m recent --update --seconds \
               60 --hitcount 21 --name test --rsource -j DROP
      
      With the patch the second rule-set will throw an EINVAL.
      Reported-by: Sean Kennedy <skennedy@vcn.com>
      Signed-off-by: Daniel Hokka Zakrisson <daniel@hozac.com>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 18 Mar 2008, 2 commits
  13. 12 Mar 2008, 1 commit
  14. 05 Mar 2008, 2 commits
    • [IPCONFIG]: The kernel gets no IP from some DHCP servers · dea75bdf
      Stephen Hemminger committed
      From: Stephen Hemminger <shemminger@linux-foundation.org>
      
      Based upon a patch by Marcel Wappler:
       
         This patch fixes a DHCP issue in the kernel: some DHCP servers
         (e.g.  in the Linksys WRT54Gv5) are very strict about the contents
         of the DHCPDISCOVER packet they receive from clients.

         Table 5 in RFC 2131 (page 36) states that the fields 'ciaddr' and
         'siaddr' MUST be set to '0'.  These DHCP servers ignore the Linux
         kernel's DHCP discovery packets with these two fields set to
         '255.255.255.255' (in contrast to popular DHCP clients, such as
         'dhclient' or 'udhcpc').  This leads to a system that does not boot.
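      The address-field rule from Table 5 can be sketched in C. struct
      dhcp_addrs is a hypothetical subset of the BOOTP/DHCP header, not
      the kernel's ipconfig code; only the four address fields of a
      DHCPDISCOVER are shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical subset of the BOOTP/DHCP header (RFC 2131); a real
 * packet also carries op, htype, xid, chaddr, options, ... */
struct dhcp_addrs {
    uint32_t ciaddr;  /* client IP address, network byte order */
    uint32_t yiaddr;  /* "your" IP address, filled in by the server */
    uint32_t siaddr;  /* next server IP address */
    uint32_t giaddr;  /* relay agent IP address */
};

/* DHCPDISCOVER from a client: every address field is 0.0.0.0,
 * never 255.255.255.255. */
static void fill_discover_addrs(struct dhcp_addrs *h)
{
    h->ciaddr = htonl(INADDR_ANY);
    h->yiaddr = htonl(INADDR_ANY);
    h->siaddr = htonl(INADDR_ANY);
    h->giaddr = htonl(INADDR_ANY);  /* no relay agent involved */
}
```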
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [ESP]: Add select on AUTHENC · ed58dd41
      Herbert Xu committed
      Now that ESP uses the AEAD interface even for algorithms which are
      not combined mode, we need to select CONFIG_CRYPTO_AUTHENC, as
      otherwise only combined mode algorithms will work.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 04 Mar 2008, 1 commit
  16. 29 Feb 2008, 3 commits
  17. 27 Feb 2008, 2 commits
    • [INET]: Don't create tunnels with '%' in name. · b37d428b
      Pavel Emelyanov committed
      Four tunnel drivers (ip_gre, ipip, ip6_tunnel and sit) can receive a
      pre-defined name for a device from userspace.  Since these drivers
      call register_netdevice() (with rtnl_lock held), which does _not_
      generate the device's name, this name may contain a '%' character.
      
      I am not sure how bad it is to have a device with a '%' in its name,
      but all the other places either use register_netdev(), which calls
      dev_alloc_name(), or explicitly call dev_alloc_name() before
      registering, i.e. they do not allow such names.
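      A userspace C sketch of the name-template expansion that
      dev_alloc_name() performs, so that a literal '%' never survives
      into a registered name. expand_name() and name_taken() are
      hypothetical; the rule that only a single "%d" is accepted is an
      assumption modelled on the kernel's check, and the limit of 100
      candidates is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define NAME_MAX_LEN 16  /* same size as the kernel's IFNAMSIZ */

static bool name_taken(const char *name, const char *const *existing,
                       size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(name, existing[i]) == 0)
            return true;
    return false;
}

/* Plain names are copied as-is; a name containing '%' is treated as a
 * "prefix%dsuffix" template and the lowest free index is chosen.
 * Returns 0 on success, -1 for a bad template or no free slot. */
static int expand_name(const char *name, const char *const *existing,
                       size_t n, char out[NAME_MAX_LEN])
{
    const char *p = strchr(name, '%');

    if (p == NULL) {
        snprintf(out, NAME_MAX_LEN, "%s", name);
        return 0;
    }
    if (p[1] != 'd' || strchr(p + 2, '%') != NULL)
        return -1;  /* only one "%d" is accepted */
    for (int i = 0; i < 100; i++) {
        snprintf(out, NAME_MAX_LEN, "%.*s%d%s",
                 (int)(p - name), name, i, p + 2);
        if (!name_taken(out, existing, n))
            return 0;
    }
    return -1;
}
```

      For example, with "gre0" and "gre1" already present, the template
      "gre%d" becomes "gre2", a plain "mytun" is kept unchanged, and
      "bad%s" is refused.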
      
      This should have gone in before commit 34cc7b, but I forgot to number
      the patches and this one got lost, sorry.
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [IPV4]: Reset scope when changing address · 148f9729
      Bjorn Mork committed
      This bug bit at least one user, who had to resort to rebooting
      the system after an "ifconfig eth0 127.0.0.1" typo.

      Deleting the address and adding a new one is a less intrusive
      workaround, but I still believe this is a bug that should be fixed,
      one way or another.
      
      Another possibility would be to remove the scope mangling based on
      address.  This will always be incomplete (is 127/8 the only address
      space with host scope requirements?).
      
      We set the scope to RT_SCOPE_HOST if an IPv4 interface is configured
      with a loopback address (127/8).  The scope is never reset, and will
      remain set to RT_SCOPE_HOST after changing the address. This patch
      resets the scope if the address is changed again, to restore normal
      functionality.
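      The reset can be sketched in C: the scope is recomputed from the
      new address on every change instead of only ever being raised to
      host scope. ifaddr_sketch and set_address() are hypothetical;
      SCOPE_HOST and SCOPE_UNIVERSE reuse the values of RT_SCOPE_HOST
      (254) and RT_SCOPE_UNIVERSE (0).

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Same values as RT_SCOPE_UNIVERSE / RT_SCOPE_HOST. */
enum addr_scope { SCOPE_UNIVERSE = 0, SCOPE_HOST = 254 };

/* Hypothetical, minimal interface-address record. */
struct ifaddr_sketch {
    uint32_t local;         /* network byte order */
    enum addr_scope scope;
};

static int is_loopback_v4(uint32_t addr_be)
{
    return (ntohl(addr_be) & 0xff000000u) == 0x7f000000u;  /* 127/8 */
}

/* Changing the address always recomputes the scope, so leaving 127/8
 * drops host scope again. */
static void set_address(struct ifaddr_sketch *ifa, uint32_t new_addr_be)
{
    ifa->local = new_addr_be;
    ifa->scope = is_loopback_v4(new_addr_be) ? SCOPE_HOST : SCOPE_UNIVERSE;
}
```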
      Signed-off-by: Bjorn Mork <bjorn@mork.no>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 24 Feb 2008, 1 commit
  19. 20 Feb 2008, 3 commits
  20. 18 Feb 2008, 3 commits
  21. 14 Feb 2008, 2 commits
  22. 13 Feb 2008, 4 commits