1. 26 2月, 2016 1 次提交
    • D
      net: ipv6: Make address flushing on ifdown optional · f1705ec1
      David Ahern 提交于
      Currently, all ipv6 addresses are flushed when the interface is configured
      down, including global, static addresses:
      
          $ ip -6 addr show dev eth1
          3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
              inet6 2100:1::2/120 scope global
                 valid_lft forever preferred_lft forever
              inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
                 valid_lft forever preferred_lft forever
          $ ip link set dev eth1 down
          $ ip -6 addr show dev eth1
          << nothing; all addresses have been flushed>>
      
      Add a new sysctl to make this behavior optional. The new setting defaults to
      flush all addresses to maintain backwards compatibility. When the set global
      addresses with no expire times are not flushed on an admin down. The sysctl
      is per-interface or system-wide for all interfaces
      
          $ sysctl -w net.ipv6.conf.eth1.keep_addr_on_down=1
      or
          $ sysctl -w net.ipv6.conf.all.keep_addr_on_down=1
      
      Will keep addresses on eth1 on an admin down.
      
          $ ip -6 addr show dev eth1
          3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
              inet6 2100:1::2/120 scope global
                 valid_lft forever preferred_lft forever
              inet6 fe80::e0:f9ff:fe79:34bd/64 scope link
                 valid_lft forever preferred_lft forever
          $ ip link set dev eth1 down
          $ ip -6 addr show dev eth1
          3: eth1: <BROADCAST,MULTICAST> mtu 1500 state DOWN qlen 1000
              inet6 2100:1::2/120 scope global tentative
                 valid_lft forever preferred_lft forever
              inet6 fe80::e0:f9ff:fe79:34bd/64 scope link tentative
                 valid_lft forever preferred_lft forever
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f1705ec1
  2. 11 2月, 2016 2 次提交
  3. 03 12月, 2015 1 次提交
  4. 05 10月, 2015 1 次提交
  5. 14 8月, 2015 1 次提交
    • A
      net: ipv6 sysctl option to ignore routes when nexthop link is down · 35103d11
      Andy Gospodarek 提交于
      Like the ipv4 patch with a similar title, this adds a sysctl to allow
      the user to change routing behavior based on whether or not the
      interface associated with the nexthop was an up or down link.  The
      default setting preserves the current behavior, but anyone that enables
      it will notice that nexthops on down interfaces will no longer be
      selected:
      
      net.ipv6.conf.all.ignore_routes_with_linkdown = 0
      net.ipv6.conf.default.ignore_routes_with_linkdown = 0
      net.ipv6.conf.lo.ignore_routes_with_linkdown = 0
      ...
      
      When the above sysctls are set, not only will link status be reported to
      userspace, but an indication that a nexthop is dead and will not be used
      is also reported.
      
      1000::/8 via 7000::2 dev p7p1  metric 1024 dead linkdown  pref medium
      1000::/8 via 8000::2 dev p8p1  metric 1024  pref medium
      7000::/8 dev p7p1  proto kernel  metric 256 dead linkdown  pref medium
      8000::/8 dev p8p1  proto kernel  metric 256  pref medium
      9000::/8 via 8000::2 dev p8p1  metric 2048  pref medium
      9000::/8 via 7000::2 dev p7p1  metric 1024 dead linkdown  pref medium
      fe80::/64 dev p7p1  proto kernel  metric 256 dead linkdown  pref medium
      fe80::/64 dev p8p1  proto kernel  metric 256  pref medium
      
      This also adds devconf support and notification when sysctl values
      change.
      
      v2: drop use of rt6i_nhflags since it is not needed right now
      Signed-off-by: NAndy Gospodarek <gospo@cumulusnetworks.com>
      Signed-off-by: NDinesh Dutt <ddutt@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      35103d11
  6. 31 7月, 2015 1 次提交
    • H
      net/ipv6: add sysctl option accept_ra_min_hop_limit · 8013d1d7
      Hangbin Liu 提交于
      Commit 6fd99094 ("ipv6: Don't reduce hop limit for an interface")
      disabled accept hop limit from RA if it is smaller than the current hop
      limit for security stuff. But this behavior kind of break the RFC definition.
      
      RFC 4861, 6.3.4.  Processing Received Router Advertisements
         A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
         and Retrans Timer) may contain a value denoting that it is
         unspecified.  In such cases, the parameter should be ignored and the
         host should continue using whatever value it is already using.
      
         If the received Cur Hop Limit value is non-zero, the host SHOULD set
         its CurHopLimit variable to the received value.
      
      So add sysctl option accept_ra_min_hop_limit to let user choose the minimum
      hop limit value they can accept from RA. And set default to 1 to meet RFC
      standards.
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Acked-by: NYOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8013d1d7
  7. 23 7月, 2015 1 次提交
  8. 10 7月, 2015 1 次提交
  9. 24 3月, 2015 1 次提交
  10. 03 2月, 2015 1 次提交
  11. 26 1月, 2015 1 次提交
  12. 06 11月, 2014 1 次提交
  13. 30 10月, 2014 1 次提交
    • E
      net: ipv6: Add a sysctl to make optimistic addresses useful candidates · 7fd2561e
      Erik Kline 提交于
      Add a sysctl that causes an interface's optimistic addresses
      to be considered equivalent to other non-deprecated addresses
      for source address selection purposes.  Preferred addresses
      will still take precedence over optimistic addresses, subject
      to other ranking in the source address selection algorithm.
      
      This is useful where different interfaces are connected to
      different networks from different ISPs (e.g., a cell network
      and a home wifi network).
      
      The current behaviour complies with RFC 3484/6724, and it
      makes sense if the host has only one interface, or has
      multiple interfaces on the same network (same or cooperating
      administrative domain(s), but not in the multiple distinct
      networks case.
      
      For example, if a mobile device has an IPv6 address on an LTE
      network and then connects to IPv6-enabled wifi, while the wifi
      IPv6 address is undergoing DAD, IPv6 connections will try use
      the wifi default route with the LTE IPv6 address, and will get
      stuck until they time out.
      
      Also, because optimistic nodes can receive frames, issue
      an RTM_NEWADDR as soon as DAD starts (with the IFA_F_OPTIMSTIC
      flag appropriately set).  A second RTM_NEWADDR is sent if DAD
      completes (the address flags have changed), otherwise an
      RTM_DELADDR is sent.
      
      Also: add an entry in ip-sysctl.txt for optimistic_dad.
      Signed-off-by: NErik Kline <ek@google.com>
      Acked-by: NLorenzo Colitti <lorenzo@google.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7fd2561e
  14. 08 7月, 2014 1 次提交
    • T
      ipv6: Implement automatic flow label generation on transmit · cb1ce2ef
      Tom Herbert 提交于
      Automatically generate flow labels for IPv6 packets on transmit.
      The flow label is computed based on skb_get_hash. The flow label will
      only automatically be set when it is zero otherwise (i.e. flow label
      manager hasn't set one). This supports the transmit side functionality
      of RFC 6438.
      
      Added an IPv6 sysctl auto_flowlabels to enable/disable this behavior
      system wide, and added IPV6_AUTOFLOWLABEL socket option to enable this
      functionality per socket.
      
      By default, auto flowlabels are disabled to avoid possible conflicts
      with flow label manager, however if this feature proves useful we
      may want to enable it by default.
      
      It should also be noted that FreeBSD has already implemented automatic
      flow labels (including the sysctl and socket option). In FreeBSD,
      automatic flow labels default to enabled.
      
      Performance impact:
      
      Running super_netperf with 200 flows for TCP_RR and UDP_RR for
      IPv6. Note that in UDP case, __skb_get_hash will be called for
      every packet with explains slight regression. In the TCP case
      the hash is saved in the socket so there is no regression.
      
      Automatic flow labels disabled:
      
        TCP_RR:
          86.53% CPU utilization
          127/195/322 90/95/99% latencies
          1.40498e+06 tps
      
        UDP_RR:
          90.70% CPU utilization
          118/168/243 90/95/99% latencies
          1.50309e+06 tps
      
      Automatic flow labels enabled:
      
        TCP_RR:
          85.90% CPU utilization
          128/199/337 90/95/99% latencies
          1.40051e+06
      
        UDP_RR
          92.61% CPU utilization
          115/164/236 90/95/99% latencies
          1.4687e+06
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb1ce2ef
  15. 02 7月, 2014 2 次提交
    • E
      inet: move ipv6only in sock_common · 9fe516ba
      Eric Dumazet 提交于
      When an UDP application switches from AF_INET to AF_INET6 sockets, we
      have a small performance degradation for IPv4 communications because of
      extra cache line misses to access ipv6only information.
      
      This can also be noticed for TCP listeners, as ipv6_only_sock() is also
      used from __inet_lookup_listener()->compute_score()
      
      This is magnified when SO_REUSEPORT is used.
      
      Move ipv6only into struct sock_common so that it is available at
      no extra cost in lookups.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9fe516ba
    • B
      ipv6: Allow accepting RA from local IP addresses. · d9333196
      Ben Greear 提交于
      This can be used in virtual networking applications, and
      may have other uses as well.  The option is disabled by
      default.
      
      A specific use case is setting up virtual routers, bridges, and
      hosts on a single OS without the use of network namespaces or
      virtual machines.  With proper use of ip rules, routing tables,
      veth interface pairs and/or other virtual interfaces,
      and applications that can bind to interfaces and/or IP addresses,
      it is possibly to create one or more virtual routers with multiple
      hosts attached.  The host interfaces can act as IPv6 systems,
      with radvd running on the ports in the virtual routers.  With the
      option provided in this patch enabled, those hosts can now properly
      obtain IPv6 addresses from the radvd.
      Signed-off-by: NBen Greear <greearb@candelatech.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d9333196
  16. 28 6月, 2014 1 次提交
  17. 20 1月, 2014 2 次提交
  18. 19 12月, 2013 1 次提交
  19. 10 12月, 2013 2 次提交
  20. 06 12月, 2013 1 次提交
  21. 29 10月, 2013 1 次提交
  22. 10 10月, 2013 1 次提交
    • E
      inet: includes a sock_common in request_sock · 634fb979
      Eric Dumazet 提交于
      TCP listener refactoring, part 5 :
      
      We want to be able to insert request sockets (SYN_RECV) into main
      ehash table instead of the per listener hash table to allow RCU
      lookups and remove listener lock contention.
      
      This patch includes the needed struct sock_common in front
      of struct request_sock
      
      This means there is no more inet6_request_sock IPv6 specific
      structure.
      
      Following inet_request_sock fields were renamed as they became
      macros to reference fields from struct sock_common.
      Prefix ir_ was chosen to avoid name collisions.
      
      loc_port   -> ir_loc_port
      loc_addr   -> ir_loc_addr
      rmt_addr   -> ir_rmt_addr
      rmt_port   -> ir_rmt_port
      iif        -> ir_iif
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      634fb979
  23. 09 10月, 2013 1 次提交
    • E
      ipv6: make lookups simpler and faster · efe4208f
      Eric Dumazet 提交于
      TCP listener refactoring, part 4 :
      
      To speed up inet lookups, we moved IPv4 addresses from inet to struct
      sock_common
      
      Now is time to do the same for IPv6, because it permits us to have fast
      lookups for all kind of sockets, including upcoming SYN_RECV.
      
      Getting IPv6 addresses in TCP lookups currently requires two extra cache
      lines, plus a dereference (and memory stall).
      
      inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6
      
      This patch is way bigger than its IPv4 counter part, because for IPv4,
      we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6,
      it's not doable easily.
      
      inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
      inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr
      
      And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr
      at the same offset.
      
      We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic
      macro.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      efe4208f
  24. 04 10月, 2013 1 次提交
    • E
      inet: consolidate INET_TW_MATCH · 50805466
      Eric Dumazet 提交于
      TCP listener refactoring, part 2 :
      
      We can use a generic lookup, sockets being in whatever state, if
      we are sure all relevant fields are at the same place in all socket
      types (ESTABLISH, TIME_WAIT, SYN_RECV)
      
      This patch removes these macros :
      
       inet_addrpair, inet_addrpair, tw_addrpair, tw_portpair
      
      And adds :
      
       sk_portpair, sk_addrpair, sk_daddr, sk_rcv_saddr
      
      Then, INET_TW_MATCH() is really the same than INET_MATCH()
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      50805466
  25. 30 8月, 2013 1 次提交
  26. 20 8月, 2013 1 次提交
  27. 14 8月, 2013 1 次提交
    • H
      ipv6: make unsolicited report intervals configurable for mld · fc4eba58
      Hannes Frederic Sowa 提交于
      Commit cab70040 ("net: igmp:
      Reduce Unsolicited report interval to 1s when using IGMPv3") and
      2690048c ("net: igmp: Allow user-space
      configuration of igmp unsolicited report interval") by William Manley made
      igmp unsolicited report intervals configurable per interface and corrected
      the interval of unsolicited igmpv3 report messages resendings to 1s.
      
      Same needs to be done for IPv6:
      
      MLDv1 (RFC2710 7.10.): 10 seconds
      MLDv2 (RFC3810 9.11.): 1 second
      
      Both intervals are configurable via new procfs knobs
      mldv1_unsolicited_report_interval and mldv2_unsolicited_report_interval.
      
      (also added .force_mld_version to ipv6_devconf_dflt to bring structs in
      line without semantic changes)
      
      v2:
      a) Joined documentation update for IPv4 and IPv6 MLD/IGMP
         unsolicited_report_interval procfs knobs.
      b) incorporate stylistic feedback from William Manley
      
      v3:
      a) add new DEVCONF_* values to the end of the enum (thanks to David
         Miller)
      
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: William Manley <william.manley@youview.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fc4eba58
  28. 31 1月, 2013 1 次提交
  29. 14 1月, 2013 2 次提交
  30. 09 12月, 2012 1 次提交
  31. 01 12月, 2012 1 次提交
    • E
      net: move inet_dport/inet_num in sock_common · ce43b03e
      Eric Dumazet 提交于
      commit 68835aba (net: optimize INET input path further)
      moved some fields used for tcp/udp sockets lookup in the first cache
      line of struct sock_common.
      
      This patch moves inet_dport/inet_num as well, filling a 32bit hole
      on 64 bit arches and reducing number of cache line misses in lookups.
      
      Also change INET_MATCH()/INET_TW_MATCH() to perform the ports match
      before addresses match, as this check is more discriminant.
      
      Remove the hash check from MATCH() macros because we dont need to
      re validate the hash value after taking a refcount on socket, and
      use likely/unlikely compiler hints, as the sk_hash/hash check
      makes the following conditional tests 100% predicted by cpu.
      
      Introduce skc_addrpair/skc_portpair pair values to better
      document the alignment requirements of the port/addr pairs
      used in the various MATCH() macros, and remove some casts.
      
      The namespace check can also be done at last.
      
      This slightly improves TCP/UDP lookup times.
      
      IP/TCP early demux needs inet->rx_dst_ifindex and
      TCP needs inet->min_ttl, lets group them together in same cache line.
      
      With help from Ben Hutchings & Joe Perches.
      
      Idea of this patch came after Ling Ma proposal to move skc_hash
      to the beginning of struct sock_common, and should allow him
      to submit a final version of his patch. My tests show an improvement
      doing so.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Ling Ma <ling.ma.program@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ce43b03e
  32. 14 11月, 2012 1 次提交
  33. 13 10月, 2012 1 次提交
  34. 30 8月, 2012 1 次提交
    • P
      netfilter: nf_conntrack_ipv6: improve fragmentation handling · 4cdd3408
      Patrick McHardy 提交于
      The IPv6 conntrack fragmentation currently has a couple of shortcomings.
      Fragmentes are collected in PREROUTING/OUTPUT, are defragmented, the
      defragmented packet is then passed to conntrack, the resulting conntrack
      information is attached to each original fragment and the fragments then
      continue their way through the stack.
      
      Helper invocation occurs in the POSTROUTING hook, at which point only
      the original fragments are available. The result of this is that
      fragmented packets are never passed to helpers.
      
      This patch improves the situation in the following way:
      
      - If a reassembled packet belongs to a connection that has a helper
        assigned, the reassembled packet is passed through the stack instead
        of the original fragments.
      
      - During defragmentation, the largest received fragment size is stored.
        On output, the packet is refragmented if required. If the largest
        received fragment size exceeds the outgoing MTU, a "packet too big"
        message is generated, thus behaving as if the original fragments
        were passed through the stack from an outside point of view.
      
      - The ipv6_helper() hook function can't receive fragments anymore for
        connections using a helper, so it is switched to use ipv6_skip_exthdr()
        instead of the netfilter specific nf_ct_ipv6_skip_exthdr() and the
        reassembled packets are passed to connection tracking helpers.
      
      The result of this is that we can properly track fragmented packets, but
      still generate ICMPv6 Packet too big messages if we would have before.
      
      This patch is also required as a precondition for IPv6 NAT, where NAT
      helpers might enlarge packets up to a point that they require
      fragmentation. In that case we can't generate Packet too big messages
      since the proper MTU can't be calculated in all cases (f.i. when
      changing textual representation of a variable amount of addresses),
      so the packet is transparently fragmented iff the original packet or
      fragments would have fit the outgoing MTU.
      
      IPVS parts by Jesper Dangaard Brouer <brouer@redhat.com>.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      4cdd3408
  35. 07 8月, 2012 1 次提交