1. 03 Sep, 2009 2 commits
    • tcp: replace hard coded GFP_KERNEL with sk_allocation · aa133076
      Wu Fengguang committed
      This fixed a lockdep warning which appeared when doing stress
      memory tests over NFS:
      
      	inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
      
      	page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock
      
      	mount_root => nfs_root_data => tcp_close => lock sk_lock =>
      			tcp_send_fin => alloc_skb_fclone => page reclaim
      
      David raised a concern that if the allocation fails in tcp_send_fin(), and it's
      GFP_ATOMIC, we are going to yield() (which sleeps) and loop endlessly waiting
      for the allocation to succeed.
      
      But the fact is, the original GFP_KERNEL also sleeps. GFP_ATOMIC+yield() looks
      weird, but it is no worse than the implicit sleep inside GFP_KERNEL. Both could
      loop endlessly under memory pressure.
      
      CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      CC: David S. Miller <davem@davemloft.net>
      CC: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip: Report qdisc packet drops · 6ce9e7b5
      Eric Dumazet committed
      Christoph Lameter pointed out that packet drops at the qdisc level were not
      accounted in SNMP counters. Only if the application sets IP_RECVERR are
      drops reported to the user (-ENOBUFS errors) and SNMP counters updated.
      
      IP_RECVERR is used to enable extended reliable error message passing,
      but these are not needed to update system wide SNMP stats.
      
      This patch changes things a bit to allow SNMP counters to be updated,
      regardless of IP_RECVERR being set or not on the socket.
      
      Example after a UDP tx flood:
      # netstat -s
      ...
      IP:
          1487048 outgoing packets dropped
      ...
      Udp:
      ...
          SndbufErrors: 1487048
      
      
      send() syscalls do, however, still return an OK status, so as not to
      break applications.
      
      Note: the send() manual page explicitly says of the -ENOBUFS error:
      
       "The output queue for a network interface was full.
        This generally indicates that the interface has stopped sending,
        but may be caused by transient congestion.
        (Normally, this does not occur in Linux. Packets are just silently
        dropped when a device queue overflows.) "
      
      This is not true for IP_RECVERR-enabled sockets: a send() syscall
      that hits a qdisc drop returns an ENOBUFS error.
      
      Many thanks to Christoph, David, and last but not least, Alexey!
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 02 Sep, 2009 7 commits
  3. 01 Sep, 2009 1 commit
  4. 29 Aug, 2009 2 commits
    • ipv6: Update Neighbor Cache when IPv6 RA is received on a router · 31ce8c71
      David Ward committed
      When processing a received IPv6 Router Advertisement, the kernel
      creates or updates an IPv6 Neighbor Cache entry for the sender --
      but presently this does not occur if IPv6 forwarding is enabled
      (net.ipv6.conf.*.forwarding = 1), or if IPv6 Router Advertisements
      are not accepted (net.ipv6.conf.*.accept_ra = 0), because in these
      cases processing of the Router Advertisement has already halted.
      
      This patch allows the Neighbor Cache to be updated in these cases,
      while still avoiding any modification to routes or link parameters.
      
      This continues to satisfy RFC 4861, since any entry created in the
      Neighbor Cache as the result of a received Router Advertisement is
      still placed in the STALE state.
      Signed-off-by: David Ward <david.ward@ll.mit.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sit: allow ip fragmentation when using nopmtudisc to fix package loss · 8945a808
      Sascha Hlusiak committed
      If tunnel parameters have frag_off set to IP_DF, pmtudisc on the ipv4 link
      will be performed by deriving the mtu from the ipv4 link and setting the
      DF flag of the encapsulating IPv4 header. If fragmentation is needed on the
      way, the IPv4 pmtu gets adjusted, the ipv6 packet will eventually be resent
      using the new and lower mtu, and everyone is happy.
      
      If the frag_off parameter is unset, the mtu for the tunnel will be derived
      from the tunnel device or the ipv6 pmtu, which might be higher than the ipv4
      pmtu. In that case we must allow the fragmentation of the IPv4 packet because
      the IPv6 mtu wouldn't 'learn' from the adjusted IPv4 pmtu, resulting in
      frequent icmp_frag_needed and packet loss on the IPv6 layer.
      
      This patch allows fragmentation when the tunnel was created with the
      nopmtudisc parameter, as in ipip/gre tunnels.
      Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 24 Aug, 2009 1 commit
    • ipv6: Fix commit 63d9950b (ipv6: Make... · ca6982b8
      Bruno Prémont committed
      ipv6: Fix commit 63d9950b (ipv6: Make v4-mapped bindings consistent with IPv4)
      
      Commit 63d9950b
        (ipv6: Make v4-mapped bindings consistent with IPv4)
      changes behavior of inet6_bind() for v4-mapped addresses so it should
      behave the same way as inet_bind().
      
      During this change, the setting of err to -EADDRNOTAVAIL got lost:
      
      af_inet.c:469 inet_bind()
      	err = -EADDRNOTAVAIL;
      	if (!sysctl_ip_nonlocal_bind &&
      	    !(inet->freebind || inet->transparent) &&
      	    addr->sin_addr.s_addr != htonl(INADDR_ANY) &&
      	    chk_addr_ret != RTN_LOCAL &&
      	    chk_addr_ret != RTN_MULTICAST &&
      	    chk_addr_ret != RTN_BROADCAST)
      		goto out;
      
      
      af_inet6.c:463 inet6_bind()
      	if (addr_type == IPV6_ADDR_MAPPED) {
      		int chk_addr_ret;
      
      		/* Binding to v4-mapped address on a v6-only socket
      		 * makes no sense
      		 */
      		if (np->ipv6only) {
      			err = -EINVAL;
      			goto out;
      		}

      		/* Reproduce AF_INET checks to make the bindings consistent */
      		v4addr = addr->sin6_addr.s6_addr32[3];
      		chk_addr_ret = inet_addr_type(net, v4addr);
      		if (!sysctl_ip_nonlocal_bind &&
      		    !(inet->freebind || inet->transparent) &&
      		    v4addr != htonl(INADDR_ANY) &&
      		    chk_addr_ret != RTN_LOCAL &&
      		    chk_addr_ret != RTN_MULTICAST &&
      		    chk_addr_ret != RTN_BROADCAST)
      			goto out;
      	} else {
      
      
      Signed-off-by: Bruno Prémont <bonbons@linux-vserver.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
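The two excerpts in ca6982b8 differ in one line: inet_bind() assigns err = -EADDRNOTAVAIL before the check, while the inet6_bind() branch jumps to out without doing so. The following is a minimal user-space sketch of that pattern, not the kernel code. The names check_v4mapped_bind, struct bind_opts and enum addr_kind are hypothetical stand-ins for the kernel's sysctl, socket flags and inet_addr_type() result.

```c
#include <errno.h>

/* Hypothetical stand-in for the RTN_* result of inet_addr_type(). */
enum addr_kind { ADDR_LOCAL, ADDR_MULTICAST, ADDR_BROADCAST, ADDR_OTHER };

/* Hypothetical stand-in for sysctl_ip_nonlocal_bind and the socket flags. */
struct bind_opts {
	int nonlocal_bind;
	int freebind;
	int transparent;
};

/*
 * Returns 0 if the bind may proceed, or -EADDRNOTAVAIL if the address is
 * not local and nothing permits a non-local bind.
 */
int check_v4mapped_bind(const struct bind_opts *o, int is_any,
			enum addr_kind kind)
{
	int err;

	/* The assignment must come before the check, otherwise the
	 * "goto out" path returns whatever err held earlier. */
	err = -EADDRNOTAVAIL;
	if (!o->nonlocal_bind &&
	    !(o->freebind || o->transparent) &&
	    !is_any &&
	    kind != ADDR_LOCAL &&
	    kind != ADDR_MULTICAST &&
	    kind != ADDR_BROADCAST)
		goto out;

	err = 0;
out:
	return err;
}
```

With the assignment in place, a bind to a non-local address on a socket without freebind or transparent set reports -EADDRNOTAVAIL, which is the behavior the commit message attributes to inet_bind().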
  6. 14 Aug, 2009 3 commits
  7. 06 Aug, 2009 1 commit
  8. 05 Aug, 2009 1 commit
  9. 03 Aug, 2009 1 commit
  10. 31 Jul, 2009 1 commit
    • xfrm: select sane defaults for xfrm[4|6] gc_thresh · a33bc5c1
      Neil Horman committed
      Choose saner defaults for xfrm[4|6] gc_thresh values on init
      
      Currently, the xfrm[4|6] code has hard-coded initial gc_thresh values
      (set to 1024).  Given that the ipv4 and ipv6 routing caches are sized
      dynamically at boot time, the static selections can be nonsensical.
      This patch dynamically selects an appropriate gc threshold based on
      the corresponding main routing table size, using the assumption that
      we should in the worst case be able to handle as many connections as
      the routing table can.
      
      For ipv4, the maximum route cache size is 16 * the number of hash
      buckets in the route cache.  Given that xfrm4 starts garbage
      collection at the gc_thresh and prevents new allocations at 2 *
      gc_thresh, we set gc_thresh to half the maximum route cache size.
      
      For ipv6, it's a bit trickier.  There is no maximum route cache size,
      but the ipv6 dst_ops gc_thresh is statically set to 1024.  It seems
      sane to select a similar gc_thresh for the xfrm6 code that is half
      the number of hash buckets in the v6 route cache times 16 (like the v4
      code does).
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
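The arithmetic described in a33bc5c1 can be sketched as follows. This is an illustration of the sizing rule stated in the commit message, not the kernel code: the function names are hypothetical, and the exact boundary at which allocation is refused (here, strictly more than 2 * gc_thresh entries) is an assumption.

```c
/* 16 route cache entries per hash bucket, per the commit message. */
#define RT_ENTRIES_PER_BUCKET 16u

/* Maximum route cache size for a given number of hash buckets. */
unsigned int rt_max_size(unsigned int hash_buckets)
{
	return hash_buckets * RT_ENTRIES_PER_BUCKET;
}

/*
 * gc_thresh is half the maximum route cache size, so that the hard limit
 * of 2 * gc_thresh coincides with the route cache maximum.
 */
unsigned int xfrm_gc_thresh(unsigned int hash_buckets)
{
	return rt_max_size(hash_buckets) / 2;
}

/* Returns 1 while new entries may still be allocated (assumed boundary). */
int xfrm_may_alloc(unsigned int entries, unsigned int gc_thresh)
{
	return entries <= 2 * gc_thresh;
}
```

For a cache with 1024 hash buckets this gives a maximum of 16384 entries and a gc_thresh of 8192, compared with the previous fixed value of 1024.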
  11. 28 Jul, 2009 1 commit
    • xfrm: export xfrm garbage collector thresholds via sysctl · a44a4a00
      Neil Horman committed
      Export garbage collector thresholds for xfrm[4|6]_dst_ops
      
      I recently had a problem reported to me in which a system with a high
      volume of ipsec connections eventually began reporting ENOBUFS for new
      connections.
      
      It seemed that after about 2000 connections we started being unable to
      create more.  A quick look revealed that the xfrm code used a dst_ops
      structure that limited the gc_thresh value to 1024, and always
      dropped route cache entries after 2x the gc_thresh.
      
      It seems the most direct solution is to export the gc_thresh values in
      the xfrm[4|6] dst_ops as sysctls, like the main routing table does, so
      that higher volumes of connections can be supported.  This patch has
      been tested and allows the reporter to increase their ipsec connection
      volume successfully.
      Reported-by: Joe Nall <joe@nall.com>
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      
      ipv4/xfrm4_policy.c |   18 ++++++++++++++++++
      ipv6/xfrm6_policy.c |   18 ++++++++++++++++++
      2 files changed, 36 insertions(+)
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 22 Jul, 2009 1 commit
  13. 20 Jul, 2009 2 commits
  14. 13 Jul, 2009 4 commits
  15. 12 Jul, 2009 2 commits
  16. 07 Jul, 2009 1 commit
  17. 06 Jul, 2009 1 commit
  18. 04 Jul, 2009 2 commits
    • IPv6: preferred lifetime of address not getting updated · a1ed0526
      Brian Haley committed
      There's a bug in addrconf_prefix_rcv() where it won't update the
      preferred lifetime of an IPv6 address if the current valid lifetime
      of the address is less than 2 hours (the minimum value in the RA).
      
      For example, if I send a router advertisement with a prefix that
      has valid lifetime = preferred lifetime = 2 hours we'll build
      this address:
      
      3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
          inet6 2001:1890:1109:a20:217:8ff:fe7d:4718/64 scope global dynamic
             valid_lft 7175sec preferred_lft 7175sec
      
      If I then send the same prefix with valid lifetime = preferred
      lifetime = 0 it will be ignored since the minimum valid lifetime
      is 2 hours:
      
      3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
          inet6 2001:1890:1109:a20:217:8ff:fe7d:4718/64 scope global dynamic
             valid_lft 7161sec preferred_lft 7161sec
      
      But according to RFC 4862 we should always reset the preferred lifetime
      even if the valid lifetime is invalid, which would cause the address
      to immediately get deprecated.  So with this patch we'd see this:
      
      5: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
          inet6 2001:1890:1109:a20:21f:29ff:fe5a:ef04/64 scope global deprecated dynamic
             valid_lft 7163sec preferred_lft 0sec
      
      The comment winds up being 5x the size of the code to fix the problem.
      
      Update the preferred lifetime of IPv6 addresses derived from a prefix
      info option in a router advertisement even if the valid lifetime in
      the option is invalid, as specified in RFC 4862 Section 5.5.3e.  Fixes
      an issue where an address will not immediately become deprecated.
      Reported by Jens Rosenboom.
      Signed-off-by: Brian Haley <brian.haley@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xfrm6: fix the proto and ports decode of sctp protocol · 59cae009
      Wei Yongjun committed
      The SCTP pushed the skb above the sctp chunk header, so the
      check of pskb_may_pull(skb, nh + offset + 1 - skb->data) in
      _decode_session6() will never return 0 and the ports decode
      of sctp will always fail. (nh + offset + 1 - skb->data < 0)
      Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
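The lifetime rule that commit a1ed0526 refers to (RFC 4862 Section 5.5.3e) can be sketched in user space as below. This is a reading of the RFC under simplifying assumptions, not the addrconf_prefix_rcv() code: struct addr_lifetimes and apply_prefix_lifetimes are hypothetical names, all lifetimes are in seconds, and the caller is assumed to have already rejected options whose preferred lifetime exceeds the valid lifetime.

```c
#include <stdint.h>

#define MIN_VALID_LIFETIME (2u * 60u * 60u) /* 2 hours, in seconds */

/* Hypothetical holder for the remaining lifetimes of one address. */
struct addr_lifetimes {
	uint32_t valid_lft;
	uint32_t preferred_lft;
};

/*
 * Apply the lifetimes advertised in a Prefix Information option.
 * The valid lifetime is only changed when the advertised value is
 * acceptable; the preferred lifetime is always taken from the option.
 */
void apply_prefix_lifetimes(struct addr_lifetimes *a,
			    uint32_t adv_valid, uint32_t adv_pref)
{
	if (adv_valid > MIN_VALID_LIFETIME || adv_valid > a->valid_lft) {
		/* Advertised value is long enough or extends the lifetime. */
		a->valid_lft = adv_valid;
	} else if (a->valid_lft > MIN_VALID_LIFETIME) {
		/* Shortening is clamped to the two-hour minimum. */
		a->valid_lft = MIN_VALID_LIFETIME;
	}
	/* Otherwise the remaining valid lifetime is left untouched. */

	/* Updated even when the advertised valid lifetime was ignored. */
	a->preferred_lft = adv_pref;
}
```

Starting from valid_lft = preferred_lft = 7175 and applying an advertisement with both lifetimes set to 0 leaves valid_lft at 7175 and sets preferred_lft to 0, which corresponds to the deprecated address shown in the commit message.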
  19. 27 Jun, 2009 2 commits
  20. 26 Jun, 2009 1 commit
  21. 23 Jun, 2009 1 commit
  22. 18 Jun, 2009 1 commit
  23. 14 Jun, 2009 1 commit
    • PIM-SM: namespace changes · 403dbb97
      Tom Goff committed
      IPv4:
        - make PIM register vifs netns local
        - set the netns when a PIM register vif is created
        - make PIM available in all network namespaces (if CONFIG_IP_PIMSM_V2)
          by adding the protocol handler when multicast routing is initialized
      
      IPv6:
        - make PIM register vifs netns local
        - make PIM available in all network namespaces (if CONFIG_IPV6_PIMSM_V2)
          by adding the protocol handler when multicast routing is initialized
      Signed-off-by: Tom Goff <thomas.goff@boeing.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>