1. 10 10月, 2013 1 次提交
    • E
      inet: includes a sock_common in request_sock · 634fb979
      Eric Dumazet 提交于
      TCP listener refactoring, part 5 :
      
      We want to be able to insert request sockets (SYN_RECV) into main
      ehash table instead of the per listener hash table to allow RCU
      lookups and remove listener lock contention.
      
      This patch includes the needed struct sock_common in front
      of struct request_sock
      
      This means there is no more inet6_request_sock IPv6 specific
      structure.
      
      Following inet_request_sock fields were renamed as they became
      macros to reference fields from struct sock_common.
      Prefix ir_ was chosen to avoid name collisions.
      
      loc_port   -> ir_loc_port
      loc_addr   -> ir_loc_addr
      rmt_addr   -> ir_rmt_addr
      rmt_port   -> ir_rmt_port
      iif        -> ir_iif
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      634fb979
  2. 09 10月, 2013 1 次提交
    • E
      ipv6: make lookups simpler and faster · efe4208f
      Eric Dumazet 提交于
      TCP listener refactoring, part 4 :
      
      To speed up inet lookups, we moved IPv4 addresses from inet to struct
      sock_common
      
      Now is time to do the same for IPv6, because it permits us to have fast
      lookups for all kind of sockets, including upcoming SYN_RECV.
      
      Getting IPv6 addresses in TCP lookups currently requires two extra cache
      lines, plus a dereference (and memory stall).
      
      inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6
      
      This patch is way bigger than its IPv4 counter part, because for IPv4,
      we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6,
      it's not doable easily.
      
      inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
      inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr
      
      And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr
      at the same offset.
      
      We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic
      macro.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      efe4208f
  3. 04 10月, 2013 1 次提交
    • E
      inet: consolidate INET_TW_MATCH · 50805466
      Eric Dumazet 提交于
      TCP listener refactoring, part 2 :
      
      We can use a generic lookup, sockets being in whatever state, if
      we are sure all relevant fields are at the same place in all socket
      types (ESTABLISH, TIME_WAIT, SYN_RECV)
      
      This patch removes these macros :
      
       inet_addrpair, inet_addrpair, tw_addrpair, tw_portpair
      
      And adds :
      
       sk_portpair, sk_addrpair, sk_daddr, sk_rcv_saddr
      
      Then, INET_TW_MATCH() is really the same than INET_MATCH()
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      50805466
  4. 30 8月, 2013 1 次提交
  5. 20 8月, 2013 1 次提交
  6. 14 8月, 2013 1 次提交
    • H
      ipv6: make unsolicited report intervals configurable for mld · fc4eba58
      Hannes Frederic Sowa 提交于
      Commit cab70040 ("net: igmp:
      Reduce Unsolicited report interval to 1s when using IGMPv3") and
      2690048c ("net: igmp: Allow user-space
      configuration of igmp unsolicited report interval") by William Manley made
      igmp unsolicited report intervals configurable per interface and corrected
      the interval of unsolicited igmpv3 report messages resendings to 1s.
      
      Same needs to be done for IPv6:
      
      MLDv1 (RFC2710 7.10.): 10 seconds
      MLDv2 (RFC3810 9.11.): 1 second
      
      Both intervals are configurable via new procfs knobs
      mldv1_unsolicited_report_interval and mldv2_unsolicited_report_interval.
      
      (also added .force_mld_version to ipv6_devconf_dflt to bring structs in
      line without semantic changes)
      
      v2:
      a) Joined documentation update for IPv4 and IPv6 MLD/IGMP
         unsolicited_report_interval procfs knobs.
      b) incorporate stylistic feedback from William Manley
      
      v3:
      a) add new DEVCONF_* values to the end of the enum (thanks to David
         Miller)
      
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: William Manley <william.manley@youview.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fc4eba58
  7. 31 1月, 2013 1 次提交
  8. 14 1月, 2013 2 次提交
  9. 09 12月, 2012 1 次提交
  10. 01 12月, 2012 1 次提交
    • E
      net: move inet_dport/inet_num in sock_common · ce43b03e
      Eric Dumazet 提交于
      commit 68835aba (net: optimize INET input path further)
      moved some fields used for tcp/udp sockets lookup in the first cache
      line of struct sock_common.
      
      This patch moves inet_dport/inet_num as well, filling a 32bit hole
      on 64 bit arches and reducing number of cache line misses in lookups.
      
      Also change INET_MATCH()/INET_TW_MATCH() to perform the ports match
      before addresses match, as this check is more discriminant.
      
      Remove the hash check from MATCH() macros because we dont need to
      re validate the hash value after taking a refcount on socket, and
      use likely/unlikely compiler hints, as the sk_hash/hash check
      makes the following conditional tests 100% predicted by cpu.
      
      Introduce skc_addrpair/skc_portpair pair values to better
      document the alignment requirements of the port/addr pairs
      used in the various MATCH() macros, and remove some casts.
      
      The namespace check can also be done at last.
      
      This slightly improves TCP/UDP lookup times.
      
      IP/TCP early demux needs inet->rx_dst_ifindex and
      TCP needs inet->min_ttl, lets group them together in same cache line.
      
      With help from Ben Hutchings & Joe Perches.
      
      Idea of this patch came after Ling Ma proposal to move skc_hash
      to the beginning of struct sock_common, and should allow him
      to submit a final version of his patch. My tests show an improvement
      doing so.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Ling Ma <ling.ma.program@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ce43b03e
  11. 14 11月, 2012 1 次提交
  12. 13 10月, 2012 1 次提交
  13. 30 8月, 2012 1 次提交
    • P
      netfilter: nf_conntrack_ipv6: improve fragmentation handling · 4cdd3408
      Patrick McHardy 提交于
      The IPv6 conntrack fragmentation currently has a couple of shortcomings.
      Fragmentes are collected in PREROUTING/OUTPUT, are defragmented, the
      defragmented packet is then passed to conntrack, the resulting conntrack
      information is attached to each original fragment and the fragments then
      continue their way through the stack.
      
      Helper invocation occurs in the POSTROUTING hook, at which point only
      the original fragments are available. The result of this is that
      fragmented packets are never passed to helpers.
      
      This patch improves the situation in the following way:
      
      - If a reassembled packet belongs to a connection that has a helper
        assigned, the reassembled packet is passed through the stack instead
        of the original fragments.
      
      - During defragmentation, the largest received fragment size is stored.
        On output, the packet is refragmented if required. If the largest
        received fragment size exceeds the outgoing MTU, a "packet too big"
        message is generated, thus behaving as if the original fragments
        were passed through the stack from an outside point of view.
      
      - The ipv6_helper() hook function can't receive fragments anymore for
        connections using a helper, so it is switched to use ipv6_skip_exthdr()
        instead of the netfilter specific nf_ct_ipv6_skip_exthdr() and the
        reassembled packets are passed to connection tracking helpers.
      
      The result of this is that we can properly track fragmented packets, but
      still generate ICMPv6 Packet too big messages if we would have before.
      
      This patch is also required as a precondition for IPv6 NAT, where NAT
      helpers might enlarge packets up to a point that they require
      fragmentation. In that case we can't generate Packet too big messages
      since the proper MTU can't be calculated in all cases (f.i. when
      changing textual representation of a variable amount of addresses),
      so the packet is transparently fragmented iff the original packet or
      fragments would have fit the outgoing MTU.
      
      IPVS parts by Jesper Dangaard Brouer <brouer@redhat.com>.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      4cdd3408
  14. 07 8月, 2012 1 次提交
  15. 18 7月, 2012 1 次提交
  16. 11 7月, 2012 1 次提交
  17. 13 2月, 2012 2 次提交
  18. 09 2月, 2012 1 次提交
  19. 12 12月, 2011 1 次提交
  20. 25 11月, 2010 1 次提交
  21. 21 10月, 2010 1 次提交
  22. 23 8月, 2010 1 次提交
  23. 20 7月, 2010 1 次提交
    • D
      ipv6: Make IP6CB(skb)->nhoff 16-bit. · e7c38157
      David S. Miller 提交于
      Even with jumbograms I cannot see any way in which we would need
      to records a larger than 65535 valued next-header offset.
      
      The maximum extension header length is (256 << 3) == 2048.
      There are only a handful of extension headers specified which
      we'd even accept (say 5 or 6), therefore the largest next-header
      offset we'd ever have to contend with is something less than
      say 16k.
      
      Therefore make it a u16 instead of a u32.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e7c38157
  24. 03 6月, 2010 1 次提交
  25. 11 5月, 2010 1 次提交
    • P
      ipv6: ip6mr: support multiple tables · d1db275d
      Patrick McHardy 提交于
      This patch adds support for multiple independant multicast routing instances,
      named "tables".
      
      Userspace multicast routing daemons can bind to a specific table instance by
      issuing a setsockopt call using a new option MRT6_TABLE. The table number is
      stored in the raw socket data and affects all following ip6mr setsockopt(),
      getsockopt() and ioctl() calls. By default, a single table (RT6_TABLE_DFLT)
      is created with a default routing rule pointing to it. Newly created pim6reg
      devices have the table number appended ("pim6regX"), with the exception of
      devices created in the default table, which are named just "pim6reg" for
      compatibility reasons.
      
      Packets are directed to a specific table instance using routing rules,
      similar to how regular routing rules work. Currently iif, oif and mark
      are supported as keys, source and destination addresses could be supported
      additionally.
      
      Example usage:
      
      - bind pimd/xorp/... to a specific table:
      
      uint32_t table = 123;
      setsockopt(fd, SOL_IPV6, MRT6_TABLE, &table, sizeof(table));
      
      - create routing rules directing packets to the new table:
      
      # ip -6 mrule add iif eth0 lookup 123
      # ip -6 mrule add oif eth0 lookup 123
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      d1db275d
  26. 24 4月, 2010 2 次提交
  27. 23 4月, 2010 1 次提交
    • S
      IPv6: Generic TTL Security Mechanism (final version) · e802af9c
      Stephen Hemminger 提交于
      This patch adds IPv6 support for RFC5082 Generalized TTL Security Mechanism.  
      
      Not to users of mapped address; the IPV6 and IPV4 socket options are seperate.
      The server does have to deal with both IPv4 and IPv6 socket options
      and the client has to handle the different for each family.
      
      On client:
      	int ttl = 255;
      	getaddrinfo(argv[1], argv[2], &hint, &result);
      
      	for (rp = result; rp != NULL; rp = rp->ai_next) {
      		s = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
      		if (s < 0) continue;
      
      		if (rp->ai_family == AF_INET) {
      			setsockopt(s, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl));
      		} else if (rp->ai_family == AF_INET6) {
      			setsockopt(s, IPPROTO_IPV6,  IPV6_UNICAST_HOPS, 
      					&ttl, sizeof(ttl)))
      		}
      			
      		if (connect(s, rp->ai_addr, rp->ai_addrlen) == 0) {
      		   ...
      
      On server:
      	int minttl = 255 - maxhops;
         
      	getaddrinfo(NULL, port, &hints, &result);
      	for (rp = result; rp != NULL; rp = rp->ai_next) {
      		s = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol);
      		if (s < 0) continue;
      
      		if (rp->ai_family == AF_INET6)
      			setsockopt(s, IPPROTO_IPV6,  IPV6_MINHOPCOUNT,
      					&minttl, sizeof(minttl));
      		setsockopt(s, IPPROTO_IP, IP_MINTTL, &minttl, sizeof(minttl));
      			
      		if (bind(s, rp->ai_addr, rp->ai_addrlen) == 0)
      			break
      ...
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e802af9c
  28. 13 4月, 2010 1 次提交
  29. 19 10月, 2009 1 次提交
    • E
      inet: rename some inet_sock fields · c720c7e8
      Eric Dumazet 提交于
      In order to have better cache layouts of struct sock (separate zones
      for rx/tx paths), we need this preliminary patch.
      
      Goal is to transfert fields used at lookup time in the first
      read-mostly cache line (inside struct sock_common) and move sk_refcnt
      to a separate cache line (only written by rx path)
      
      This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
      sport and id fields. This allows a future patch to define these
      fields as macros, like sk_refcnt, without name clashes.
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c720c7e8
  30. 09 10月, 2009 1 次提交
    • J
      ipv6: Fix the size overflow of addrconf_sysctl array · e0e6f55d
      Jin Dongming 提交于
      (This patch fixes bug of commit f7734fdf
       title "make TLLAO option for NA packets configurable")
      
      When the IPV6 conf is used, the function sysctl_set_parent is called and the
      array addrconf_sysctl is used as a parameter of the function.
      
      The above patch added new conf "force_tllao" into the array addrconf_sysctl,
      but the size of the array was not modified, the static allocated size is
      DEVCONF_MAX + 1 but the real size is DEVCONF_MAX + 2, so the problem is
      that the function sysctl_set_parent accessed wrong address.
      
      I got the following information.
      Call Trace:
          [<ffffffff8106085d>] sysctl_set_parent+0x29/0x3e
          [<ffffffff8106085d>] sysctl_set_parent+0x29/0x3e
          [<ffffffff8106085d>] sysctl_set_parent+0x29/0x3e
          [<ffffffff8106085d>] sysctl_set_parent+0x29/0x3e
          [<ffffffff8106085d>] sysctl_set_parent+0x29/0x3e
          [<ffffffff810622d5>] __register_sysctl_paths+0xde/0x272
          [<ffffffff8110892d>] ? __kmalloc_track_caller+0x16e/0x180
          [<ffffffffa00cfac3>] ? __addrconf_sysctl_register+0xc5/0x144 [ipv6]
          [<ffffffff8141f2c9>] register_net_sysctl_table+0x48/0x4b
          [<ffffffffa00cfaf5>] __addrconf_sysctl_register+0xf7/0x144 [ipv6]
          [<ffffffffa00cfc16>] addrconf_init_net+0xd4/0x104 [ipv6]
          [<ffffffff8139195f>] setup_net+0x35/0x82
          [<ffffffff81391f6c>] copy_net_ns+0x76/0xe0
          [<ffffffff8107ad60>] create_new_namespaces+0xf0/0x16e
          [<ffffffff8107afee>] copy_namespaces+0x65/0x9f
          [<ffffffff81056dff>] copy_process+0xb2c/0x12c3
          [<ffffffff810576e1>] do_fork+0x14b/0x2d2
          [<ffffffff8107ac4e>] ? up_read+0xe/0x10
          [<ffffffff81438e73>] ? do_page_fault+0x27a/0x2aa
          [<ffffffff8101044b>] sys_clone+0x28/0x2a
          [<ffffffff81011fb3>] stub_clone+0x13/0x20
          [<ffffffff81011c72>] ? system_call_fastpath+0x16/0x1b
      
      And the information of IPV6 in .config is as following.
      IPV6 in .config:
          CONFIG_IPV6=m
          CONFIG_IPV6_PRIVACY=y
          CONFIG_IPV6_ROUTER_PREF=y
          CONFIG_IPV6_ROUTE_INFO=y
          CONFIG_IPV6_OPTIMISTIC_DAD=y
          CONFIG_IPV6_MIP6=m
          CONFIG_IPV6_SIT=m
          # CONFIG_IPV6_SIT_6RD is not set
          CONFIG_IPV6_NDISC_NODETYPE=y
          CONFIG_IPV6_TUNNEL=m
          CONFIG_IPV6_MULTIPLE_TABLES=y
          CONFIG_IPV6_SUBTREES=y
          CONFIG_IPV6_MROUTE=y
          CONFIG_IPV6_PIMSM_V2=y
          # CONFIG_IP_VS_IPV6 is not set
          CONFIG_NF_CONNTRACK_IPV6=m
          CONFIG_IP6_NF_MATCH_IPV6HEADER=m
      
      I confirmed this patch fixes this problem.
      Signed-off-by: NJin Dongming <jin.dongming@np.css.fujitsu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e0e6f55d
  31. 07 10月, 2009 1 次提交
    • O
      make TLLAO option for NA packets configurable · f7734fdf
      Octavian Purdila 提交于
      On Friday 02 October 2009 20:53:51 you wrote:
      
      > This is good although I would have shortened the name.
      
      Ah, I knew I forgot something :) Here is v4.
      
      tavi
      
      >From 24d96d825b9fa832b22878cc6c990d5711968734 Mon Sep 17 00:00:00 2001
      From: Octavian Purdila <opurdila@ixiacom.com>
      Date: Fri, 2 Oct 2009 00:51:15 +0300
      Subject: [PATCH] ipv6: new sysctl for sending TLLAO with unicast NAs
      
      Neighbor advertisements responding to unicast neighbor solicitations
      did not include the target link-layer address option. This patch adds
      a new sysctl option (disabled by default) which controls whether this
      option should be sent even with unicast NAs.
      
      The need for this arose because certain routers expect the TLLAO in
      some situations even as a response to unicast NS packets.
      
      Moreover, RFC 2461 recommends sending this to avoid a race condition
      (section 4.4, Target link-layer address)
      Signed-off-by: NCosmin Ratiu <cratiu@ixiacom.com>
      Signed-off-by: NOctavian Purdila <opurdila@ixiacom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f7734fdf
  32. 01 6月, 2009 1 次提交
  33. 31 1月, 2009 1 次提交
  34. 16 12月, 2008 1 次提交
    • Y
      ipv6: Add IPV6_PKTINFO sticky option support to setsockopt() · b24a2516
      Yang Hongyang 提交于
      There are three reasons for me to add this support:
      1.When no interface is specified in an IPV6_PKTINFO ancillary data
        item, the interface specified in an IPV6_PKTINFO sticky optionis 
        is used.
      
      RFC3542:
      6.7.  Summary of Outgoing Interface Selection
      
         This document and [RFC-3493] specify various methods that affect the
         selection of the packet's outgoing interface.  This subsection
         summarizes the ordering among those in order to ensure deterministic
         behavior.
      
         For a given outgoing packet on a given socket, the outgoing interface
         is determined in the following order:
      
         1. if an interface is specified in an IPV6_PKTINFO ancillary data
            item, the interface is used.
      
         2. otherwise, if an interface is specified in an IPV6_PKTINFO sticky
            option, the interface is used.
      
      2.When no IPV6_PKTINFO ancillary data is received,getsockopt() should 
        return the sticky option value which set with setsockopt().
      
      RFC 3542:
         Issuing getsockopt() for the above options will return the sticky
         option value i.e., the value set with setsockopt().  If no sticky
         option value has been set getsockopt() will return the following
         values:
      
      3.Make the setsockopt implementation POSIX compliant.
      Signed-off-by: NYang Hongyang <yanghy@cn.fujitsu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b24a2516
  35. 22 7月, 2008 1 次提交
  36. 03 7月, 2008 2 次提交