1. 10 Nov 2005, 3 commits
    • [NETFILTER]: Add nf_conntrack subsystem. · 9fb9cbb1
      Yasuyuki Kozakai authored
      The existing connection tracking subsystem in netfilter can only
      handle ipv4.  There were basically two choices for adding
      connection tracking support for ipv6.  We could either duplicate all
      of the ipv4 connection tracking code into an ipv6 counterpart, or (the
      choice taken by these patches) we could design a generic layer that
      could handle both ipv4 and ipv6, thus requiring only one sub-protocol
      (TCP, UDP, etc.) connection tracking helper module to be written.
      
      In fact nf_conntrack is capable of working with any layer 3
      protocol.
      
      The existing ipv4 specific conntrack code could also not deal
      with the peculiarities of doing connection tracking on ipv6,
      which is also cured here.  For example, these issues include:
      
      1) ICMPv6 handling; ICMPv6 is used for neighbour discovery in
         ipv6, so some of its messages should not participate in
         connection tracking since they are effectively like ARP
         messages
      
      2) fragmentation must be handled differently in ipv6, because
         the simplistic "defrag, connection track and NAT, refrag"
         (which the existing ipv4 connection tracking does) approach simply
         isn't feasible in ipv6
      
      3) ipv6 extension header parsing must occur at the correct spots
         before and after connection tracking decisions, and there were
         no provisions for this in the existing connection tracking
         design
      
      4) ipv6 has no need for stateful NAT
      
      The ipv4 specific conntrack layer is kept around until all of
      the ipv4 specific conntrack helpers are ported over to nf_conntrack
      and it is feature complete.  Once that occurs, the old conntrack
      stuff will get placed into the feature-removal-schedule and we will
      fully kill it off 6 months later.
      Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
      Signed-off-by: Harald Welte <laforge@netfilter.org>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
    • [IPV6]: ip6ip6_lock is not unlocked in error path. · 9f0ede52
      Ken-ichirou MATSUZAWA authored
      From: Ken-ichirou MATSUZAWA <chamas@h4.dion.ne.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [IPV6]: Fix fallout from CONFIG_IPV6_PRIVACY · 44fd0261
      Peter Chubb authored
      Trying to build today's 2.6.14+git snapshot gives undefined references
      to use_tempaddr
      
      Looks like an ifdef got left out.
      Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 09 Nov 2005, 4 commits
  3. 06 Nov 2005, 1 commit
  4. 03 Nov 2005, 2 commits
  5. 01 Nov 2005, 3 commits
  6. 30 Oct 2005, 1 commit
  7. 29 Oct 2005, 2 commits
    • [MCAST] IPv6: Fix algorithm to compute Querier's Query Interval · f12baeab
      Yan Zheng authored
      5.1.3.  Maximum Response Code
      
         The Maximum Response Code field specifies the maximum time allowed
         before sending a responding Report.  The actual time allowed, called
         the Maximum Response Delay, is represented in units of milliseconds,
         and is derived from the Maximum Response Code as follows:
      
         If Maximum Response Code < 32768,
            Maximum Response Delay = Maximum Response Code
      
         If Maximum Response Code >=32768, Maximum Response Code represents a
         floating-point value as follows:
      
             0 1 2 3 4 5 6 7 8 9 A B C D E F
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |1| exp |          mant         |
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      
         Maximum Response Delay = (mant | 0x1000) << (exp+3)
      
      
      5.1.9.  QQIC (Querier's Query Interval Code)
      
         The Querier's Query Interval Code field specifies the [Query
         Interval] used by the Querier.  The actual interval, called the
         Querier's Query Interval (QQI), is represented in units of seconds,
         and is derived from the Querier's Query Interval Code as follows:
      
         If QQIC < 128, QQI = QQIC
      
         If QQIC >= 128, QQIC represents a floating-point value as follows:
      
             0 1 2 3 4 5 6 7
            +-+-+-+-+-+-+-+-+
            |1| exp | mant  |
            +-+-+-+-+-+-+-+-+
      
         QQI = (mant | 0x10) << (exp + 3)
      
                                                      -- rfc3810
      
      #define MLDV2_QQIC(value) MLDV2_EXP(0x80, 4, 3, value)
      #define MLDV2_MRC(value) MLDV2_EXP(0x8000, 12, 3, value)
      
      The above macros are defined in mcast.c, but 1 << 4 == 0x10 and 1 << 12 == 0x1000,
      so the result computed by the original macros is larger than it should be.
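      
      As a small, hedged illustration of the RFC 3810 decoding rules quoted above,
      the following user-space sketch decodes QQIC and the Maximum Response Code;
      the helper names qqic_to_qqi() and mrc_to_delay() are illustrative only and
      are not the kernel macros themselves:
      
      #include <stdio.h>
      
      /* QQIC is 8 bits: a literal value below 128, otherwise |1|exp(3)|mant(4)|. */
      static unsigned int qqic_to_qqi(unsigned char qqic)
      {
              if (qqic < 128)
                      return qqic;
              return ((qqic & 0x0f) | 0x10) << (((qqic >> 4) & 0x07) + 3);
      }
      
      /* Maximum Response Code is 16 bits: literal below 32768,
       * otherwise |1|exp(3)|mant(12)|. */
      static unsigned int mrc_to_delay(unsigned short mrc)
      {
              if (mrc < 32768)
                      return mrc;
              return ((mrc & 0x0fff) | 0x1000) << (((mrc >> 12) & 0x07) + 3);
      }
      
      int main(void)
      {
              printf("QQIC 0x8a -> QQI %u seconds\n", qqic_to_qqi(0x8a));
              printf("MRC 0x8abc -> delay %u milliseconds\n", mrc_to_delay(0x8abc));
              return 0;
      }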
      Signed-off-by: Yan Zheng <yanzheng@21cn.com>
      Acked-by: David L Stevens <dlstevens@us.ibm.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
    • [IPv4/IPv6]: UFO Scatter-gather approach · e89e9cf5
      Ananda Raju authored
      Attached is kernel patch for UDP Fragmentation Offload (UFO) feature.
      
      1. This patch incorporates the review comments by Jeff Garzik.
      2. Renamed USO to UFO (UDP Fragmentation Offload)
      3. UDP sendfile support with UFO
      
      This patch uses the scatter-gather feature of skb to generate large UDP
      datagrams. Below is a "how-to" on the changes required in a network device
      driver to use the UFO interface.
      
      UDP Fragmentation Offload (UFO) Interface:
      -------------------------------------------
      UFO is a feature wherein the Linux kernel network stack will offload the
      IP fragmentation of large UDP datagrams to hardware. This reduces the
      overhead of the stack in fragmenting large UDP datagrams into
      MTU-sized packets.
      
      1) Drivers indicate their capability of UFO using
      dev->features |= NETIF_F_UFO | NETIF_F_HW_CSUM | NETIF_F_SG
      
      NETIF_F_HW_CSUM is required for UFO over ipv6.
      
      2) A UFO packet will be submitted for transmission using the driver xmit routine.
      A UFO packet will have a non-zero value for
      
      "skb_shinfo(skb)->ufo_size"
      
      skb_shinfo(skb)->ufo_size will indicate the length of data part in each IP
      fragment going out of the adapter after IP fragmentation by hardware.
      
      skb->data will contain the MAC/IP/UDP headers and skb_shinfo(skb)->frags[]
      contains the data payload. skb->ip_summed will be set to CHECKSUM_HW,
      indicating that hardware has to do the checksum calculation. Hardware should
      compute the UDP checksum of the complete datagram and also the IP header
      checksum of each fragmented IP packet.
      
      For IPv6, UFO provides the fragment identification ID in
      skb_shinfo(skb)->ip6_frag_id. The adapter should use this ID for generating
      IPv6 fragments; a driver-side sketch follows below.
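      
      A rough, hedged sketch of how a driver might use the interface described
      above; my_probe_setup()/my_start_xmit() and the register-programming
      comments are hypothetical and not taken from any real driver:
      
      #include <linux/netdevice.h>
      #include <linux/skbuff.h>
      
      static int my_probe_setup(struct net_device *dev)
      {
              /* 1) Advertise UFO; scatter-gather and hardware checksumming are
               *    needed as well, NETIF_F_HW_CSUM in particular for UFO over ipv6. */
              dev->features |= NETIF_F_UFO | NETIF_F_HW_CSUM | NETIF_F_SG;
              return 0;
      }
      
      static int my_start_xmit(struct sk_buff *skb, struct net_device *dev)
      {
              if (skb_shinfo(skb)->ufo_size) {
                      /* 2) UFO packet: headers are in skb->data, payload is in
                       *    skb_shinfo(skb)->frags[], and ufo_size is the data length
                       *    of each IP fragment the hardware should emit. */
                      unsigned int frag_len = skb_shinfo(skb)->ufo_size;
                      __u32 frag_id = skb_shinfo(skb)->ip6_frag_id; /* IPv6 only */
      
                      /* skb->ip_summed == CHECKSUM_HW: tell the adapter to compute
                       * the UDP checksum over the whole datagram and an IP header
                       * checksum for every fragment it generates, using frag_len
                       * and (for IPv6) frag_id. */
                      (void)frag_len;
                      (void)frag_id;
              }
              /* ... map buffers and post descriptors to the hardware ... */
              return 0;
      }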
      Signed-off-by: Ananda Raju <ananda.raju@neterion.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (forwarded)
      Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
  8. 26 Oct 2005, 2 commits
  9. 16 Oct 2005, 1 commit
  10. 14 Oct 2005, 1 commit
  11. 11 Oct 2005, 2 commits
    • [IPSEC] Fix block size/MTU bugs in ESP · d4875b04
      Herbert Xu authored
      This patch fixes the following bugs in ESP:
      
      * Fix transport mode MTU overestimate.  This means that the inner MTU
        is smaller than it needs to be.  Worse yet, given an input MTU which
        is a multiple of 4, it will always produce an estimate which is not
        a multiple of 4.
      
        For example, given a standard ESP/3DES/MD5 transform and an MTU of
        1500, the resulting MTU for transport mode is 1462 when it should
        be 1464.
      
        The reason is that IP header lengths are always a multiple
        of 4 for IPv4 and 8 for IPv6.
      
      * Ensure that the block size is at least 4.  This is required by RFC2406
        and corresponds to what the esp_output function does.  At the moment
        this only affects crypto_null as its block size is 1.
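      
      As a hedged, user-space illustration of the block-size rounding just
      described (the lengths are made up and this is not the actual
      esp_output/MTU code):
      
      #include <stdio.h>
      
      #define ALIGN(x, a)     (((x) + (a) - 1) & ~((a) - 1))
      
      int main(void)
      {
              unsigned int blksize = 1;                  /* e.g. crypto_null */
              unsigned int esp_blk = ALIGN(blksize, 4);  /* never pad to less than 4 */
              unsigned int clen    = 1446 + 2;           /* payload + pad-len/next-header */
      
              printf("block size used for padding: %u\n", esp_blk);
              printf("padded ESP payload length:   %u\n", ALIGN(clen, esp_blk));
              return 0;
      }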
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [IPSEC]: Use ALIGN macro in ESP · a02a6422
      Herbert Xu authored
      This patch uses the macro ALIGN in all the applicable spots for ESP.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 06 Oct 2005, 2 commits
  13. 05 Oct 2005, 1 commit
  14. 04 Oct 2005, 5 commits
    • [IPV4]: Replace __in_dev_get with __in_dev_get_rcu/rtnl · e5ed6399
      Herbert Xu authored
      The following patch renames __in_dev_get() to __in_dev_get_rtnl() and
      introduces __in_dev_get_rcu() to cover the second case.
      
      1) RCU with refcnt should use in_dev_get().
      2) RCU without refcnt should use __in_dev_get_rcu().
      3) All others must hold RTNL and use __in_dev_get_rtnl().
      
      There is one exception in net/ipv4/route.c which is in fact a pre-existing
      race condition.  I've marked it as such so that we remember to fix it.
      
      This patch is based on suggestions and prior work by Suzanne Wood and
      Paul McKenney.
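      
      A hedged usage sketch of the three rules above; show_ipv4_config() is an
      illustrative function and not part of the patch, while __in_dev_get_rcu(),
      __in_dev_get_rtnl(), in_dev_get() and in_dev_put() are the accessors the
      patch describes:
      
      #include <linux/kernel.h>
      #include <linux/inetdevice.h>
      #include <linux/rtnetlink.h>
      #include <linux/rcupdate.h>
      
      static void show_ipv4_config(struct net_device *dev)
      {
              struct in_device *in_dev;
      
              /* 2) RCU read-side section, no refcount taken. */
              rcu_read_lock();
              in_dev = __in_dev_get_rcu(dev);
              if (in_dev && in_dev->ifa_list)
                      printk(KERN_DEBUG "%s: %u.%u.%u.%u\n", dev->name,
                             NIPQUAD(in_dev->ifa_list->ifa_local));
              rcu_read_unlock();
      
              /* 3) Configuration paths that already hold the RTNL. */
              ASSERT_RTNL();
              in_dev = __in_dev_get_rtnl(dev);
              /* ... safe to walk or modify the address list here ... */
      
              /* 1) Long-lived reference: take a refcount and drop it when done. */
              in_dev = in_dev_get(dev);
              if (in_dev)
                      in_dev_put(in_dev);
      }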
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [IPV6]: Fix leak added by udp connect dst caching fix. · a5e7c210
      David S. Miller authored
      Based upon a patch from Mitsuru KANDA <mk@linux-ipv6.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [IPV6]: Fix ipv6 fragment ID selection at slow path · f36d6ab1
      Yan Zheng authored
      Signed-off-by: Yan Zheng <yanzheng@21cn.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [INET]: speedup inet (tcp/dccp) lookups · 81c3d547
      Eric Dumazet authored
      Arnaldo and I agreed it could be applied now, because I have other
      pending patches depending on this one (Thank you Arnaldo)
      
      (The other important patch moves skc_refcnt into a separate cache line,
      so that SMP/NUMA performance doesn't suffer from cache line ping-pongs)
      
      1) First some performance data :
      --------------------------------
      
      tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established()
      
      The most time critical code is :
      
      sk_for_each(sk, node, &head->chain) {
           if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif))
               goto hit; /* You sunk my battleship! */
      }
      
      The sk_for_each() does use prefetch() hints, but only the beginning of
      "struct sock" is prefetched.
      
      As INET_MATCH's first comparison uses inet_sk(__sk)->daddr, which is far
      away from the beginning of "struct sock", it has to bring a cold cache
      line into the CPU cache.  Each iteration has to use at least 2 cache
      lines.
      
      This can be problematic if some chains are very long.
      
      2) The goal
      -----------
      
      The idea I had is to change things so that INET_MATCH() may return
      FALSE in 99% of cases using only the data already in the CPU cache,
      touching one cache line per iteration.
      
      3) Description of the patch
      ---------------------------
      
      Adds a new 'unsigned int skc_hash' field in 'struct sock_common',
      filling a 32-bit hole on 64-bit platforms.
      
      struct sock_common {
      	unsigned short		skc_family;
      	volatile unsigned char	skc_state;
      	unsigned char		skc_reuse;
      	int			skc_bound_dev_if;
      	struct hlist_node	skc_node;
      	struct hlist_node	skc_bind_node;
      	atomic_t		skc_refcnt;
      +	unsigned int		skc_hash;
      	struct proto		*skc_prot;
      };
      
      Store the full hash in this 32-bit field, not masked by (ehash_size -
      1).  Using this full hash as the first comparison done in INET_MATCH
      permits us to immediately skip the element without touching a second
      cache line in case of a miss.
      
      Remove the sk_hashent/tw_hashent fields, since skc_hash (aliased to
      sk_hash and tw_hash) already contains the slot number if we mask with
      (ehash_size - 1).
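      
      To make that concrete, here is a tiny user-space sketch (illustrative
      values only, not kernel code) of keeping the full hash and recovering the
      chain slot by masking:
      
      #include <stdio.h>
      
      int main(void)
      {
              unsigned int ehash_size = 1 << 16;     /* assumed power-of-two table size */
              unsigned int skc_hash   = 0xdeadbeef;  /* full 32-bit hash kept in the socket */
              unsigned int slot       = skc_hash & (ehash_size - 1);
              unsigned int pkt_hash   = 0xdeadbef0;  /* hash computed for the incoming packet */
      
              /* Fast path: if the full hashes differ we skip the socket without
               * ever touching the cache line holding its addresses and ports. */
              if (pkt_hash != skc_hash)
                      printf("slot %u: miss, skipped on the hash alone\n", slot);
              return 0;
      }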
      
      File include/net/inet_hashtables.h
      
      64-bit platforms:
      #define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
           (((__sk)->sk_hash == (__hash))                         &&  \
           ((*((__u64 *)&(inet_sk(__sk)->daddr))) == (__cookie))  &&  \
           ((*((__u32 *)&(inet_sk(__sk)->dport))) == (__ports))   &&  \
           (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))
      
      32-bit platforms:
      #define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
           (((__sk)->sk_hash == (__hash))                 &&  \
           (inet_sk(__sk)->daddr          == (__saddr))   &&  \
           (inet_sk(__sk)->rcv_saddr      == (__daddr))   &&  \
           (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))
      
      
      - Adds a prefetch(head->chain.first) in
      __inet_lookup_established()/__tcp_v4_check_established() and
      __inet6_lookup_established()/__tcp_v6_check_established() and
      __dccp_v4_check_established() to bring the first element of the list
      into cache before the {read|write}_lock(&head->lock).
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: Fix packet timestamping. · 325ed823
      Herbert Xu authored
      I've found the problem in general.  It affects any 64-bit
      architecture.  The problem occurs when you change the system time.
      
      Suppose that when you boot your system clock is forward by a day.
      This gets recorded down in skb_tv_base.  You then wind the clock back
      by a day.  From that point onwards the offset will be negative which
      essentially overflows the 32-bit variables they're stored in.
      
      In fact, why don't we just store the real time stamp in those 32-bit
      variables? After all, we're not going to overflow for quite a while
      yet.
      
      When we do overflow, we'll need a better solution of course.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 27 Sep 2005, 1 commit
  16. 25 Sep 2005, 1 commit
  17. 20 Sep 2005, 2 commits
  18. 18 Sep 2005, 1 commit
  19. 15 Sep 2005, 2 commits
  20. 10 Sep 2005, 3 commits