1. 29 Aug, 2009 2 commits
    • ipv6: Update Neighbor Cache when IPv6 RA is received on a router · 31ce8c71
      Committed by David Ward
      When processing a received IPv6 Router Advertisement, the kernel
      creates or updates an IPv6 Neighbor Cache entry for the sender --
      but presently this does not occur if IPv6 forwarding is enabled
      (net.ipv6.conf.*.forwarding = 1), or if IPv6 Router Advertisements
      are not accepted (net.ipv6.conf.*.accept_ra = 0), because in these
      cases processing of the Router Advertisement has already halted.
      
      This patch allows the Neighbor Cache to be updated in these cases,
      while still avoiding any modification to routes or link parameters.
      
      This continues to satisfy RFC 4861, since any entry created in the
      Neighbor Cache as the result of a received Router Advertisement is
      still placed in the STALE state.
      Signed-off-by: David Ward <david.ward@ll.mit.edu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sit: allow ip fragmentation when using nopmtudisc to fix package loss · 8945a808
      Committed by Sascha Hlusiak
      If the tunnel parameters have frag_off set to IP_DF, pmtudisc on the IPv4 link
      is performed by deriving the MTU from the IPv4 link and setting the
      DF flag of the encapsulating IPv4 header. If fragmentation is needed on the
      way, the IPv4 PMTU gets adjusted, the IPv6 packet is eventually resent
      using the new, lower MTU, and everyone is happy.
      
      If the frag_off parameter is unset, the mtu for the tunnel will be derived
      from the tunnel device or the ipv6 pmtu, which might be higher than the ipv4
      pmtu. In that case we must allow the fragmentation of the IPv4 packet because
      the IPv6 mtu wouldn't 'learn' from the adjusted IPv4 pmtu, resulting in
      frequent icmp_frag_needed and packet loss on the IPv6 layer.
      
      This patch allows fragmentation when the tunnel was created with the
      nopmtudisc parameter, as in ipip/gre tunnels.
      Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 14 Aug, 2009 3 commits
  3. 06 Aug, 2009 1 commit
  4. 05 Aug, 2009 1 commit
  5. 03 Aug, 2009 1 commit
  6. 31 Jul, 2009 1 commit
    • xfrm: select sane defaults for xfrm[4|6] gc_thresh · a33bc5c1
      Committed by Neil Horman
      Choose saner defaults for xfrm[4|6] gc_thresh values on init
      
      Currently, the xfrm[4|6] code has hard-coded initial gc_thresh values
      (set to 1024).  Given that the ipv4 and ipv6 routing caches are sized
      dynamically at boot time, the static selections can be nonsensical.
      This patch dynamically selects an appropriate gc threshold based on
      the corresponding main routing table size, using the assumption that
      we should in the worst case be able to handle as many connections as
      the routing table can.
      
      For ipv4, the maximum route cache size is 16 * the number of hash
      buckets in the route cache.  Given that xfrm4 starts garbage
      collection at the gc_thresh and prevents new allocations at 2 *
      gc_thresh, we set gc_thresh to half the maximum route cache size.
      
      For ipv6, it's a bit trickier.  There is no maximum route cache size,
      but the ipv6 dst_ops gc_thresh is statically set to 1024.  It seems
      sane to select a similar gc_thresh for the xfrm6 code that is half
      the number of hash buckets in the v6 route cache times 16 (like the v4
      code does).
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
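The sizing rule described in this commit message can be sketched in a few lines of user-space C. This is only an illustration of the arithmetic as the message states it, not the kernel code: the function names and the `RT_ENTRIES_PER_BUCKET` constant are hypothetical, and the bucket counts are plain parameters rather than values read from the routing caches.

```c
/* Factor quoted in the commit message: the maximum IPv4 route cache
 * size is 16 * the number of hash buckets. Hypothetical constant name. */
#define RT_ENTRIES_PER_BUCKET 16u

/* IPv4: xfrm4 starts garbage collection at gc_thresh and refuses new
 * allocations at 2 * gc_thresh, so gc_thresh is set to half of the
 * maximum route cache size. */
static unsigned int xfrm4_gc_thresh_for(unsigned int rt_hash_buckets)
{
    unsigned int rt_max_size = RT_ENTRIES_PER_BUCKET * rt_hash_buckets;

    return rt_max_size / 2;
}

/* IPv6: there is no maximum route cache size, so the message applies
 * the same formula to the number of v6 route cache hash buckets. */
static unsigned int xfrm6_gc_thresh_for(unsigned int rt6_hash_buckets)
{
    return (RT_ENTRIES_PER_BUCKET * rt6_hash_buckets) / 2;
}
```

Under these assumptions, a route cache with 1024 hash buckets would yield a threshold of 8192, so new xfrm allocations would only be refused at 16384 entries.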
  7. 28 Jul, 2009 1 commit
    • xfrm: export xfrm garbage collector thresholds via sysctl · a44a4a00
      Committed by Neil Horman
      Export garbage collector thresholds for xfrm[4|6]_dst_ops
      
      Had a problem reported to me recently in which a high volume of ipsec
      connections on a system began reporting ENOBUFS for new connections
      eventually.
      
      It seemed that after about 2000 connections we started being unable to
      create more.  A quick look revealed that the xfrm code used a dst_ops
      structure that limited the gc_thresh value to 1024, and always
      dropped route cache entries after 2x the gc_thresh.
      
      It seems the most direct solution is to export the gc_thresh values in
      the xfrm[4|6] dst_ops as sysctls, like the main routing table does, so
      that higher volumes of connections can be supported.  This patch has
      been tested and allows the reporter to increase their ipsec connection
      volume successfully.
      Reported-by: Joe Nall <joe@nall.com>
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      
      ipv4/xfrm4_policy.c |   18 ++++++++++++++++++
      ipv6/xfrm6_policy.c |   18 ++++++++++++++++++
      2 files changed, 36 insertions(+)
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 22 Jul, 2009 1 commit
  9. 20 Jul, 2009 2 commits
  10. 13 Jul, 2009 4 commits
  11. 12 Jul, 2009 2 commits
  12. 07 Jul, 2009 1 commit
  13. 06 Jul, 2009 1 commit
  14. 04 Jul, 2009 2 commits
    • IPv6: preferred lifetime of address not getting updated · a1ed0526
      Committed by Brian Haley
      There's a bug in addrconf_prefix_rcv() where it won't update the
      preferred lifetime of an IPv6 address if the current valid lifetime
      of the address is less than 2 hours (the minimum value in the RA).
      
      For example, if I send a router advertisement with a prefix that
      has valid lifetime = preferred lifetime = 2 hours we'll build
      this address:
      
      3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
          inet6 2001:1890:1109:a20:217:8ff:fe7d:4718/64 scope global dynamic
             valid_lft 7175sec preferred_lft 7175sec
      
      If I then send the same prefix with valid lifetime = preferred
      lifetime = 0 it will be ignored since the minimum valid lifetime
      is 2 hours:
      
      3: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
          inet6 2001:1890:1109:a20:217:8ff:fe7d:4718/64 scope global dynamic
             valid_lft 7161sec preferred_lft 7161sec
      
      But according to RFC 4862 we should always reset the preferred lifetime
      even if the valid lifetime is invalid, which would cause the address
      to immediately get deprecated.  So with this patch we'd see this:
      
      5: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
          inet6 2001:1890:1109:a20:21f:29ff:fe5a:ef04/64 scope global deprecated dynamic
             valid_lft 7163sec preferred_lft 0sec
      
      The comment winds up being 5x the size of the code to fix the problem.
      
      Update the preferred lifetime of IPv6 addresses derived from a prefix
      info option in a router advertisement even if the valid lifetime in
      the option is invalid, as specified in RFC 4862 Section 5.5.3e.  Fixes
      an issue where an address will not immediately become deprecated.
      Reported by Jens Rosenboom.
      Signed-off-by: Brian Haley <brian.haley@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
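The lifetime rule this commit message refers to (RFC 4862 Section 5.5.3e) can be sketched as follows. This is a simplified user-space illustration for an unauthenticated router advertisement, not the code in addrconf_prefix_rcv(): the `addr_lifetimes` struct and the `apply_prefix_lifetimes()` function are hypothetical names invented for the example.

```c
#include <stdint.h>

#define MIN_VALID_LIFETIME (2u * 60u * 60u) /* two hours, in seconds */

/* Hypothetical, simplified address record (not struct inet6_ifaddr). */
struct addr_lifetimes {
    uint32_t valid_lft;     /* remaining valid lifetime, seconds */
    uint32_t preferred_lft; /* remaining preferred lifetime, seconds */
};

/* Apply the lifetimes advertised in a prefix information option. */
static void apply_prefix_lifetimes(struct addr_lifetimes *a,
                                   uint32_t adv_valid,
                                   uint32_t adv_preferred)
{
    uint32_t remaining = a->valid_lft;

    if (adv_valid > MIN_VALID_LIFETIME || adv_valid > remaining) {
        /* Advertised valid lifetime is acceptable: take it as is. */
        a->valid_lft = adv_valid;
    } else if (remaining <= MIN_VALID_LIFETIME) {
        /* Less than two hours left: ignore the advertised valid lifetime. */
    } else {
        /* Otherwise clamp the valid lifetime to two hours. */
        a->valid_lft = MIN_VALID_LIFETIME;
    }

    /* The point of the fix: the preferred lifetime is updated in every
     * case, even when the advertised valid lifetime was ignored. */
    a->preferred_lft = adv_preferred;
}
```

With the example from the message (7175 seconds remaining, then an advertisement carrying valid lifetime = preferred lifetime = 0), the valid lifetime stays untouched while the preferred lifetime drops to 0, which is what marks the address as deprecated.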
    • xfrm6: fix the proto and ports decode of sctp protocol · 59cae009
      Committed by Wei Yongjun
      The SCTP pushed the skb above the sctp chunk header, so the
      check of pskb_may_pull(skb, nh + offset + 1 - skb->data) in
      _decode_session6() will never return 0 and the ports decode
      of sctp will always fail. (nh + offset + 1 - skb->data < 0)
      Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 27 Jun, 2009 2 commits
  16. 26 Jun, 2009 1 commit
  17. 23 Jun, 2009 1 commit
  18. 18 Jun, 2009 1 commit
  19. 14 Jun, 2009 1 commit
    • PIM-SM: namespace changes · 403dbb97
      Committed by Tom Goff
      IPv4:
        - make PIM register vifs netns local
        - set the netns when a PIM register vif is created
        - make PIM available in all network namespaces (if CONFIG_IP_PIMSM_V2)
          by adding the protocol handler when multicast routing is initialized
      
      IPv6:
        - make PIM register vifs netns local
        - make PIM available in all network namespaces (if CONFIG_IPV6_PIMSM_V2)
          by adding the protocol handler when multicast routing is initialized
      Signed-off-by: Tom Goff <thomas.goff@boeing.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 13 Jun, 2009 2 commits
  21. 11 Jun, 2009 1 commit
    • net: No more expensive sock_hold()/sock_put() on each tx · 2b85a34e
      Committed by Eric Dumazet
      One of the problems with sock memory accounting is that it uses
      a pair of sock_hold()/sock_put() for each transmitted packet.
      
      This slows down bidirectional flows because the receive path
      also needs to take a refcount on socket and might use a different
      cpu than transmit path or transmit completion path. So these
      two atomic operations also trigger cache line bounces.
      
      We can see this in tx or tx/rx workloads (media gateways for example),
      where sock_wfree() can be in top five functions in profiles.
      
      We use this sock_hold()/sock_put() so that sock freeing
      is delayed until all tx packets are completed.
      
      As we also update sk_wmem_alloc, we could offset sk_wmem_alloc
      by one unit at init time, until sk_free() is called.
      Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc)
      to decrement the initial offset and atomically check if any packets
      are in flight.
      
      skb_set_owner_w() doesn't call sock_hold() anymore.
      
      sock_wfree() doesn't call sock_put() anymore, but checks if sk_wmem_alloc
      reached 0 to perform the final freeing.
      
      Drawback is that a skb->truesize error could lead to unfreeable sockets, or
      even worse, prematurely calling __sk_free() on a live socket.
      
      Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s
      on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt
      contention point. 5 % speedup on a UDP transmit workload (depends
      on number of flows), lowering TX completion cpu usage.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
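The offset-by-one accounting scheme described in this commit message can be modelled in user space with C11 atomics. The sketch below is an assumption-laden illustration, not the kernel implementation: `struct mock_sock` and every `mock_*` function are hypothetical stand-ins for struct sock, skb_set_owner_w(), sock_wfree(), sk_free() and __sk_free().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for struct sock. */
struct mock_sock {
    atomic_int wmem_alloc; /* bytes in flight, plus 1 until mock_sk_free() */
    bool freed;            /* set once the final free has run */
};

/* The counter starts at 1: the initial offset keeps the socket alive. */
static void mock_sk_init(struct mock_sock *sk)
{
    atomic_init(&sk->wmem_alloc, 1);
    sk->freed = false;
}

/* Stand-in for the final freeing step. */
static void mock_final_free(struct mock_sock *sk)
{
    sk->freed = true;
}

/* Transmit: charge the packet to the socket, no refcount taken. */
static void mock_set_owner_w(struct mock_sock *sk, int truesize)
{
    atomic_fetch_add(&sk->wmem_alloc, truesize);
}

/* Transmit completion: uncharge, and free if the counter reached 0. */
static void mock_wfree(struct mock_sock *sk, int truesize)
{
    if (atomic_fetch_sub(&sk->wmem_alloc, truesize) == truesize)
        mock_final_free(sk);
}

/* Owner drops the socket: remove the initial offset; free only if no
 * packets are still in flight. */
static void mock_sk_free(struct mock_sock *sk)
{
    if (atomic_fetch_sub(&sk->wmem_alloc, 1) == 1)
        mock_final_free(sk);
}
```

Whichever of mock_sk_free() and the last mock_wfree() brings the counter to zero performs the final free. The model also shows the drawback the message mentions: if the size passed on completion differs from the size charged, the counter either never reaches zero or reaches it too early.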
  22. 09 Jun, 2009 2 commits
  23. 08 Jun, 2009 1 commit
    • netfilter: nf_ct_icmp: keep the ICMP ct entries longer · f87fb666
      Committed by Jan Kasprzak
      Current conntrack code kills the ICMP conntrack entry as soon as
      the first reply is received. This is incorrect, as we then see only
      the first ICMP echo reply out of several possible duplicates as
      ESTABLISHED, while the rest will be INVALID. Also this unnecessarily
      increases the conntrackd traffic on H-A firewalls.
      
      Make all the ICMP conntrack entries (including the replied ones)
      last for the default of nf_conntrack_icmp{,v6}_timeout seconds.
      Signed-off-by: Jan "Yenya" Kasprzak <kas@fi.muni.cz>
      Signed-off-by: Patrick McHardy <kaber@trash.net>
  24. 04 Jun, 2009 1 commit
  25. 03 Jun, 2009 2 commits
    • net: skb->dst accessors · adf30907
      Committed by Eric Dumazet
      Define three accessors to get/set dst attached to a skb
      
      struct dst_entry *skb_dst(const struct sk_buff *skb)
      
      void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst)
      
      void skb_dst_drop(struct sk_buff *skb)
      This one should replace occurrences of :
      dst_release(skb->dst)
      skb->dst = NULL;
      
      Delete skb->dst field
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
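The shape of the three accessors listed in this commit message can be shown with a small user-space mock. Everything below is hypothetical scaffolding (`mock_dst`, `mock_skb` and the `mock_*` functions); the real accessors operate on struct sk_buff and struct dst_entry, and the real dst_release() does more than decrement a counter.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for dst_entry and sk_buff. */
struct mock_dst { int refcnt; };
struct mock_skb { struct mock_dst *dst_private; };

/* Stand-in for dst_release(): drop one reference, tolerate NULL. */
static void mock_dst_release(struct mock_dst *dst)
{
    if (dst)
        dst->refcnt--;
}

/* Getter: callers no longer touch the field directly. */
static struct mock_dst *mock_skb_dst(const struct mock_skb *skb)
{
    return skb->dst_private;
}

/* Setter. */
static void mock_skb_dst_set(struct mock_skb *skb, struct mock_dst *dst)
{
    skb->dst_private = dst;
}

/* Replaces the open-coded pair "release the dst, then set it to NULL". */
static void mock_skb_dst_drop(struct mock_skb *skb)
{
    mock_dst_release(skb->dst_private);
    skb->dst_private = NULL;
}
```

Routing every access through such helpers is what makes it possible to delete or change the underlying field later without touching each caller.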
    • netfilter: conntrack: simplify event caching system · 17e6e4ea
      Committed by Pablo Neira Ayuso
      This patch simplifies the conntrack event caching system by removing
      several events:
      
       * IPCT_[*]_VOLATILE, IPCT_HELPINFO and IPCT_NATINFO have been deleted
         since they have no clients.
       * IPCT_COUNTER_FILLING, which is a leftover of the 32-bit counter
         days.
       * IPCT_REFRESH which is not of any use since we always include the
         timeout in the messages.
      
      After this patch, the existing events are:
      
       * IPCT_NEW, IPCT_RELATED and IPCT_DESTROY, which are used to identify
         addition and deletion of entries.
       * IPCT_STATUS, which notes that the status bits have changed,
         e.g. IPS_SEEN_REPLY and IPS_ASSURED.
       * IPCT_PROTOINFO, which reports that internal protocol information has
         changed, e.g. the TCP, DCCP and SCTP protocol state.
       * IPCT_HELPER, which reports that a helper has been assigned to or
         unassigned from this entry.
       * IPCT_MARK and IPCT_SECMARK, which report that the mark has changed; this
         covers the case when a mark is set to zero.
       * IPCT_NATSEQADJ, which reports that there are updates in the NAT sequence
         adjustment.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  26. 02 Jun, 2009 1 commit
  27. 01 Jun, 2009 1 commit