1. 30 May 2009: 1 commit
    • tcp: fix loop in ofo handling code and reduce its complexity · 2df9001e
      Ilpo Järvinen committed
      Somewhat luckily, I was looking into these parts with a very fine
      comb because I've made somewhat similar changes in the same
      area (the conflicts that arose weren't that lucky, though). The loop
      was very much overengineered recently in commit 91521944
      (tcp: Use SKB queue and list helpers instead of doing it
      by-hand), while it basically just wants to know if there are
      skbs after 'skb'.
      
      It also got broken because skb1 = skb->next was improperly translated
      into skb1 = skb1->next (though abstracted). Note that 'skb1' points
      to the sk_buff preceding skb, or is NULL if skb is at the head.
      Two things went wrong:
      - We'll kfree 'skb' on the first iteration instead of the
        skbuff following 'skb' (recovering from that would require SACK
        reneging, I think).
      - The list-head case where 'skb1' is NULL is checked too early,
        so the loop won't execute, whereas it previously did.
      
      In conclusion, mostly revert the recent changes, which makes the
      cset look very messy, but use the proper accessors in the
      previous-like version.
      
      The effective changes against the original can be viewed with:
        git-diff 91521944^ \
      		net/ipv4/tcp_input.c | sed -n -e '57,70 p'
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
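The following is a minimal C sketch of what the loop is meant to do once a segment 'skb' has been queued. It looks only at the segments after it and frees those it covers entirely. It assumes a hypothetical singly linked `struct seg` with `seq`/`end_seq` fields and hypothetical helpers `seq_before()`/`seq_after()`, modelled on TCP's wraparound-safe sequence comparisons. It is not the kernel's sk_buff queue code, and it omits D-SACK bookkeeping.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical simplified out-of-order segment (not the kernel's sk_buff). */
struct seg {
	uint32_t seq;      /* first sequence number covered */
	uint32_t end_seq;  /* one past the last sequence number covered */
	struct seg *next;
};

/* Wraparound-safe sequence comparisons in the style of TCP's before()/after(). */
static int seq_before(uint32_t a, uint32_t b) { return (int32_t)(a - b) < 0; }
static int seq_after(uint32_t a, uint32_t b)  { return seq_before(b, a); }

/*
 * After 'skb' has been inserted, drop the segments that FOLLOW it and are
 * entirely covered by it. Only skb->next is ever unlinked and freed, never
 * 'skb' itself. Returns the number of segments freed.
 */
static int purge_covered_successors(struct seg *skb)
{
	int freed = 0;
	struct seg *next;

	while ((next = skb->next) != NULL && seq_after(skb->end_seq, next->seq)) {
		if (seq_before(skb->end_seq, next->end_seq))
			break;               /* partial overlap: keep the successor */
		skb->next = next->next;  /* unlink the successor... */
		free(next);              /* ...and free it, not 'skb' */
		freed++;
	}
	return freed;
}
```

Take a queued segment covering 100-200, followed by segments 120-150, 150-200 and 190-250. The first two successors are freed, and the partially overlapping 190-250 is kept.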
  2. 29 May 2009: 3 commits
  3. 27 May 2009: 6 commits
  4. 22 May 2009: 1 commit
  5. 21 May 2009: 3 commits
    • net: Remove unused parameter from fill method in fib_rules_ops. · 04af8cf6
      Rami Rosen committed
      The netlink message header (struct nlmsghdr) is an unused parameter in
      the fill method of the fib_rules_ops struct. This patch removes the
      parameter from this method and fixes the places where the method is
      called.
      
      (include/net/fib_rules.h)
      Signed-off-by: Rami Rosen <ramirose@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: fix rtable leak in net/ipv4/route.c · 1ddbcb00
      Eric Dumazet committed
      Alexander V. Lukyanov found a regression in 2.6.29 and made a complete
      analysis, found in http://bugzilla.kernel.org/show_bug.cgi?id=13339
      Quoted here because it's a perfect one:
      
      begin_of_quotation
       2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the
       patch has at least one critical flaw, and another problem.
      
       rt_intern_hash calculates the rthi pointer, which is later used for new entry
       insertion. The same loop calculates the cand pointer, which is used to clean the
       list. If the pointers are the same, an rtable leak occurs, because cand is first
       removed and then the new entry is appended to it.
      
       This leak leads to unregister_netdevice problem (usage count > 0).
      
       Another problem of the patch is that it tries to insert the entries in a certain
       order, to facilitate counting of entries that are distinct in all but QoS parameters.
       Unfortunately, referencing an existing rtable entry moves it to the beginning of
       the list, to speed up further lookups, so the carefully built order is destroyed.
      
       For the first problem, the simplest patch is to set rthi=0 when rthi==cand, but
       that will also destroy the ordering.
      end_of_quotation
      
      Problematic commit is 1080d709
      (net: implement emergency route cache rebulds when gc_elasticity is exceeded)
      
      Trying to keep dst_entries ordered is too complex and breaks the fact that
      order should depend on the frequency of use for garbage collection.
      
      A possible fix is to make rt_intern_hash() simpler, and only make
      rt_check_expire() a little bit smarter, so that it can cope with an arbitrary
      entry order. The added loop runs on cache-hot data while the CPU
      is prefetching the next object, so it should be unnoticeable.
      Reported-and-analyzed-by: Alexander V. Lukyanov <lav@yar.ru>
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
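The pointer aliasing described in the quotation can be reproduced with a toy hash chain. This C sketch uses a hypothetical `struct entry` and helper names of its own; `rthi` and `cand` only mirror the roles named above.

- `insert_buggy()` evicts `cand` and then links the new entry after `rthi`. The new entry becomes unreachable when both pointers name the same node.
- `insert_at_head()` illustrates the simpler head-insertion idea in the spirit of the fix. It is not the actual kernel patch.

```c
#include <stddef.h>

/* Hypothetical hash-chain entry (not the kernel's struct rtable). */
struct entry {
	int id;
	struct entry *next;
};

/* Unlink 'victim' from the chain rooted at *head (victim must be on it). */
static void chain_remove(struct entry **head, struct entry *victim)
{
	struct entry **pp = head;

	while (*pp != victim)
		pp = &(*pp)->next;
	*pp = victim->next;
}

/* Buggy ordering: evict 'cand', then append 'n' after 'rthi'. If
 * rthi == cand, 'n' is linked behind a node that is no longer on the
 * chain, so 'n' is unreachable from the head: it leaks. */
static void insert_buggy(struct entry **head, struct entry *rthi,
			 struct entry *cand, struct entry *n)
{
	chain_remove(head, cand);
	n->next = rthi->next;
	rthi->next = n;
}

/* Simpler scheme: always insert at the chain head, so evicting an
 * entry can never invalidate the insertion point. */
static void insert_at_head(struct entry **head, struct entry *cand,
			   struct entry *n)
{
	if (cand)
		chain_remove(head, cand);
	n->next = *head;
	*head = n;
}

/* Return 1 if 'n' is reachable from the chain head, else 0. */
static int chain_contains(const struct entry *head, const struct entry *n)
{
	for (; head != NULL; head = head->next)
		if (head == n)
			return 1;
	return 0;
}
```

Head insertion gives up any ordering within the chain. That is acceptable here because, as the commit notes, expiry code can cope with an arbitrary order.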
    • net: fix length computation in rt_check_expire() · cf8da764
      Eric Dumazet committed
      rt_check_expire() computes the average and standard deviation of chain lengths,
      but does not correctly reset length to 0 at the beginning of each chain.
      This probably causes overflows of sum2 (and sum) on loaded machines instead
      of meaningful results.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
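A minimal sketch of the statistic follows. It assumes a hypothetical `struct node` chain type and computes in doubles for readability; kernel code would use integer or fixed-point arithmetic instead. The essential point is that `length` restarts at 0 for each chain before counting. The function returns the mean and the variance; the standard deviation is the square root of the variance.

```c
#include <stddef.h>

/* Hypothetical chain node; only the links matter for counting. */
struct node {
	struct node *next;
};

struct chain_stats {
	double avg;  /* mean chain length */
	double var;  /* variance of chain length (stddev = its square root) */
};

static struct chain_stats chain_length_stats(struct node *const *chains,
					     size_t nchains)
{
	unsigned long sum = 0, sum2 = 0;
	struct chain_stats st = { 0.0, 0.0 };

	if (nchains == 0)
		return st;
	for (size_t i = 0; i < nchains; i++) {
		unsigned long length = 0;  /* the fix: restart at 0 for every chain */

		for (const struct node *n = chains[i]; n != NULL; n = n->next)
			length++;
		sum += length;
		sum2 += length * length;
	}
	st.avg = (double)sum / (double)nchains;
	st.var = (double)sum2 / (double)nchains - st.avg * st.avg;
	return st;
}
```

Take two chains of length 3 and 1, in that order. This gives a mean of 2 and a variance of 1. Without the per-chain reset, the second chain would be counted as length 4, and the error keeps compounding across a large table.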
  6. 20 May 2009: 1 commit
    • ipv4: teach ipconfig about the MTU option in DHCP · 9643f455
      Chris Friesen committed
      The DHCP spec allows the server to specify the MTU.  This can be useful
      for netbooting with UDP-based NFS-root on a network using jumbo frames.
      This patch allows the kernel IP autoconfiguration to handle this option
      correctly.
      
      It would be possible to use initramfs and add a script to set the MTU,
      but that seems like a complicated solution if no initramfs is otherwise
      necessary, and would bloat the kernel image more than this code would.
      
      This patch was originally submitted to LKML in 2003 by Hans-Peter Jansen.
      Signed-off-by: Chris Friesen <cfriesen@nortel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
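For reference, DHCP carries the MTU in the Interface MTU option. Per RFC 2132, this is option code 26, a 2-byte value with a minimum of 68. The C sketch below scans a DHCP options area (the bytes after the magic cookie) for that option. The function and macro names are hypothetical, and this is not the ipconfig code itself.

```c
#include <stddef.h>
#include <stdint.h>

#define DHCP_OPT_PAD            0
#define DHCP_OPT_INTERFACE_MTU 26   /* RFC 2132, section 5.1 */
#define DHCP_OPT_END          255

/* Return the Interface MTU carried in a DHCP options area, or 0 if the
 * option is absent, malformed, or below the RFC 2132 minimum of 68. */
static unsigned dhcp_find_mtu(const uint8_t *opt, size_t len)
{
	size_t i = 0;

	while (i < len) {
		uint8_t code = opt[i++];

		if (code == DHCP_OPT_PAD)
			continue;                 /* pad has no length byte */
		if (code == DHCP_OPT_END || i >= len)
			break;
		uint8_t olen = opt[i++];
		if (olen > len - i)
			break;                    /* truncated option */
		if (code == DHCP_OPT_INTERFACE_MTU && olen == 2) {
			unsigned mtu = ((unsigned)opt[i] << 8) | opt[i + 1];
			return mtu >= 68 ? mtu : 0;
		}
		i += olen;                        /* skip any other option */
	}
	return 0;
}
```

A jumbo-frame server would send the bytes `26, 2, 0x23, 0x28`, which decode to an MTU of 9000.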
  7. 19 May 2009: 5 commits
  8. 18 May 2009: 2 commits
  9. 09 May 2009: 1 commit
  10. 07 May 2009: 1 commit
  11. 06 May 2009: 1 commit
  12. 05 May 2009: 2 commits
    • tcp: Fix tcp_prequeue() to get correct rto_min value · 0c266898
      Satoru SATOH committed
      tcp_prequeue() uses the constant value TCP_RTO_MIN even though the
      actual value may have been tuned. The following patch fixes this and makes
      tcp_prequeue() use the actual value returned by tcp_rto_min().
      Signed-off-by: Satoru SATOH <satoru.satoh@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
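The difference can be sketched as follows. This sketch assumes that the prequeue path arms a delayed-ACK timer at three quarters of rto_min; that detail comes from the tcp_prequeue() of that era, not from the text above. `struct route_sketch` and all helper names are hypothetical, and times are in milliseconds rather than jiffies.

```c
#include <stddef.h>

/* Hypothetical default minimum RTO in milliseconds (HZ/5 in the kernel). */
#define TCP_RTO_MIN_MS 200u

/* Hypothetical per-route setting standing in for a tuned rto_min. */
struct route_sketch {
	int rto_min_locked;   /* nonzero if an administrator set rto_min */
	unsigned rto_min_ms;
};

/* Analogue of tcp_rto_min(): prefer the tuned per-route value. */
static unsigned rto_min_ms(const struct route_sketch *rt)
{
	if (rt != NULL && rt->rto_min_locked)
		return rt->rto_min_ms;
	return TCP_RTO_MIN_MS;
}

/* Before the fix: the tuned value is ignored. */
static unsigned prequeue_ack_timeout_old(const struct route_sketch *rt)
{
	(void)rt;
	return (3 * TCP_RTO_MIN_MS) / 4;
}

/* After the fix: the tuned value is honoured. */
static unsigned prequeue_ack_timeout_new(const struct route_sketch *rt)
{
	return (3 * rto_min_ms(rt)) / 4;
}
```

Consider a route tuned to an rto_min of 40 ms. The old path still waits 150 ms, while the fixed path waits 30 ms.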
    • tcp: extend ECN sysctl to allow server-side only ECN · 255cac91
      Ilpo Järvinen committed
      This should be very safe compared with enabling ECN fully, so I see
      no reason why it shouldn't be done right away. As ECN can only
      be negotiated if the SYN-sending party also supports it,
      somebody in the loop probably knows what he/she is doing. If
      the SYN does not ask for ECN, the server-side SYN-ACK is identical
      to what it is without ECN. Thus it's quite safe.
      
      The chosen value is safe with respect to existing configs that
      currently set either 0 or 1 manually, but it silently upgrades
      those who have not explicitly turned ECN off.
      
      Whether to simply enable both sides comes up from time to time, but
      even if that isn't done now, we can at least make the servers
      aware of ECN already. As some problems are known to occur
      when ECN is enabled, it's currently questionable whether there's
      any real gain from enabling clients, as servers mostly won't
      support it anyway (so we'd hit just the negative sides). Once
      the servers are enabled and that is deployed, enabling the client
      end really has some potential gain too.
      Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
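The resulting policy for the net.ipv4.tcp_ecn sysctl can be summarized in a small C sketch. The sysctl takes three values:

- 0: off.
- 1: request ECN on outgoing connections and accept it on incoming ones.
- 2: accept ECN only when the incoming SYN asks for it.

Value 2 is the one this change chooses for users who have not set the sysctl explicitly. The enum and function names are hypothetical; only the numeric values correspond to the real setting.

```c
#include <stdbool.h>

/* Values of net.ipv4.tcp_ecn; the identifiers are hypothetical. */
enum tcp_ecn_mode {
	TCP_ECN_OFF = 0,          /* never use ECN */
	TCP_ECN_ON = 1,           /* request ECN on outgoing SYNs, accept on incoming */
	TCP_ECN_SERVER_ONLY = 2,  /* accept ECN only if the peer's SYN asks for it */
};

/* Should an outgoing (client) SYN request ECN? */
static bool ecn_request_on_connect(enum tcp_ecn_mode mode)
{
	return mode == TCP_ECN_ON;
}

/* Should the SYN-ACK agree to ECN? Never unless the SYN asked for it,
 * which is why server-side-only mode leaves non-ECN handshakes unchanged. */
static bool ecn_accept_on_synack(enum tcp_ecn_mode mode, bool syn_requested_ecn)
{
	return syn_requested_ecn && mode != TCP_ECN_OFF;
}
```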
  13. 29 Apr 2009: 1 commit
  14. 28 Apr 2009: 1 commit
  15. 27 Apr 2009: 3 commits
  16. 20 Apr 2009: 2 commits
  17. 17 Apr 2009: 3 commits
    • [PATCH] net: remove superfluous call to synchronize_net() · 573636cb
      Eric Dumazet committed
      The inet_register_protosw() function is responsible for adding a new
      inet protocol to a global table (inetsw[]) that is accessed under RCU rules.
      
      As soon as the pointer is stored, other CPUs might see
      this new protocol in inetsw[], so we have to make sure the new protocol
      is ready for use. All pending memory updates should thus be committed
      to memory before the pointer is set.
      This is correctly done using rcu_assign_pointer().
      
      synchronize_net() is typically used at unregister time, after
      unsetting the pointer, to make sure no other cpu is still using
      the object we want to dismantle. Using it at register time
      is only adding an artificial delay that could hide a real bug,
      and this bug could pop up if/when synchronize_rcu() can proceed
      faster than it does now.
      
      This saves about 13 ms of boot time on an 8-CPU machine with HZ=1000 ;)
      (4 calls to inet_register_protosw(), about 3200 us per call)
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
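The publish/retire pattern described above can be sketched with C11 atomics. The sketch assumes a hypothetical `proto_table[]` standing in for inetsw[].

- On registration, a release store plays the role of rcu_assign_pointer(). Readers then see either NULL or a fully initialised object, so no grace-period wait is needed.
- On removal, the wait (synchronize_net()/synchronize_rcu() in the kernel) belongs after unpublishing and before freeing.

This is a single-threaded illustration of the ordering only, not an RCU implementation.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical protocol descriptor and slot table standing in for inetsw[]. */
struct proto_sketch {
	int protocol;
	int ready;
};

#define NPROTO_SLOTS 8
static _Atomic(struct proto_sketch *) proto_table[NPROTO_SLOTS];

/* Register: fully initialise the object, then publish it with a release
 * store so every earlier write is visible to readers that see the pointer.
 * No grace-period wait is needed here. */
static void proto_register(int slot, struct proto_sketch *p, int protocol)
{
	p->protocol = protocol;
	p->ready = 1;
	atomic_store_explicit(&proto_table[slot], p, memory_order_release);
}

/* Reader side: the acquire load pairs with the release store above (the
 * kernel's rcu_dereference() relies on weaker dependency ordering). */
static struct proto_sketch *proto_lookup(int slot)
{
	return atomic_load_explicit(&proto_table[slot], memory_order_acquire);
}

/* Unregister: unpublish first. A real RCU user must then wait for a grace
 * period before freeing the returned object; this sketch only returns it. */
static struct proto_sketch *proto_unregister(int slot)
{
	struct proto_sketch *old = atomic_exchange_explicit(&proto_table[slot], NULL,
							    memory_order_acq_rel);
	/* synchronize_rcu() would go here, before the caller frees 'old'. */
	return old;
}
```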
    • gro: Fix use after free in tcp_gro_receive · a0a69a01
      Herbert Xu committed
      After calling skb_gro_receive(), skb->len can no longer be relied
      on, since if the skb was merged using frags, its pages will
      have been removed and its length reduced.
      
      This caused tcp_gro_receive to prematurely end merging which
      resulted in suboptimal performance with ixgbe.
      
      The fix is to store skb->len on the stack.
      Reported-by: Mark Wagner <mwagner@redhat.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
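The bug pattern and its fix can be reduced to a few lines of C. In this sketch, `struct buf_sketch` and `merge_into()` are hypothetical stand-ins for an skb and skb_gro_receive(). The flush rule used here, which stops merging when the segment looks shorter than the MSS, is a simplifying assumption and not the exact tcp_gro_receive() logic.

```c
/* Hypothetical buffer; its 'len' may be reduced by the merge helper. */
struct buf_sketch {
	unsigned len;
};

/* Stand-in for skb_gro_receive(): when merging via frags it takes the
 * pages away from 'skb', leaving skb->len reduced (here, to zero). */
static void merge_into(struct buf_sketch *held, struct buf_sketch *skb)
{
	held->len += skb->len;
	skb->len = 0;
}

/* Buggy pattern: reads skb->len after the merge, so a full-sized segment
 * looks short and merging ends prematurely. Returns 1 to flush. */
static int should_flush_buggy(struct buf_sketch *held, struct buf_sketch *skb,
			      unsigned mss)
{
	merge_into(held, skb);
	return skb->len < mss;
}

/* Fixed pattern: snapshot the length on the stack before merging. */
static int should_flush_fixed(struct buf_sketch *held, struct buf_sketch *skb,
			      unsigned mss)
{
	unsigned len = skb->len;

	merge_into(held, skb);
	return len < mss;
}
```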
    • netfilter: nf_nat: add support for persistent mappings · 98d500d6
      Patrick McHardy committed
      The removal of the SAME target accidentally removed one feature that is
      not yet available from the normal NAT targets: multi-range mappings
      that use the same mapping for every connection from a single
      client. The current behaviour is to choose the address from the range
      based on source and destination IP, which breaks when communicating
      with sites that have multiple addresses and require all connections to
      originate from the same IP address.
      
      Introduce an IP_NAT_RANGE_PERSISTENT option that controls whether the
      destination address is taken into account when selecting addresses.
      
      http://bugzilla.kernel.org/show_bug.cgi?id=12954
      Signed-off-by: Patrick McHardy <kaber@trash.net>
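Address selection with and without the persistent option might look like the following C sketch. `NAT_FLAG_PERSISTENT` is a hypothetical stand-in for IP_NAT_RANGE_PERSISTENT. `mix32()` is an illustrative integer mixer, not the hash nf_nat actually uses. The point is only that the destination address is left out of the hash when the flag is set, so a given client always maps to the same address in the range.

```c
#include <stdint.h>

/* Hypothetical stand-in for IP_NAT_RANGE_PERSISTENT. */
#define NAT_FLAG_PERSISTENT 0x1u

/* Illustrative 32-bit integer mixer (not the kernel's hash). */
static uint32_t mix32(uint32_t x)
{
	x ^= x >> 16;
	x *= 0x7feb352dU;
	x ^= x >> 15;
	x *= 0x846ca68bU;
	x ^= x >> 16;
	return x;
}

/* Pick a source address in [min_ip, max_ip] (host byte order, min <= max).
 * Without the flag, the destination is hashed in, so one client may get
 * different addresses for different sites; with it, only the source counts. */
static uint32_t nat_pick_addr(uint32_t src, uint32_t dst,
			      uint32_t min_ip, uint32_t max_ip, unsigned flags)
{
	uint32_t span = max_ip - min_ip + 1;  /* 0 means the whole 2^32 space */
	uint32_t d = (flags & NAT_FLAG_PERSISTENT) ? 0 : dst;
	uint32_t h = mix32(mix32(src) ^ d);

	return span ? min_ip + h % span : h;
}
```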
  18. 14 Apr 2009: 1 commit
  19. 11 Apr 2009: 1 commit
    • ipv6: Fix NULL pointer dereference with time-wait sockets · 499923c7
      Vlad Yasevich committed
      Commit b2f5e7cd
      (ipv6: Fix conflict resolutions during ipv6 binding)
      introduced a regression where time-wait sockets were
      not treated correctly.  This resulted in the following:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000062
      IP: [<ffffffff805d7d61>] ipv4_rcv_saddr_equal+0x61/0x70
      ...
      Call Trace:
      [<ffffffffa033847b>] ipv6_rcv_saddr_equal+0x1bb/0x250 [ipv6]
      [<ffffffffa03505a8>] inet6_csk_bind_conflict+0x88/0xd0 [ipv6]
      [<ffffffff805bb18e>] inet_csk_get_port+0x1ee/0x400
      [<ffffffffa0319b7f>] inet6_bind+0x1cf/0x3a0 [ipv6]
      [<ffffffff8056d17c>] ? sockfd_lookup_light+0x3c/0xd0
      [<ffffffff8056ed49>] sys_bind+0x89/0x100
      [<ffffffff80613ea2>] ? trace_hardirqs_on_thunk+0x3a/0x3c
      [<ffffffff8020bf9b>] system_call_fastpath+0x16/0x1b
      Tested-by: Brian Haley <brian.haley@hp.com>
      Tested-by: Ed Tomlinson <edt@aei.ca>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 03 Apr 2009: 1 commit