1. 01 Aug, 2015 1 commit
    • ipv6: change ipv6_stub_impl.ipv6_dst_lookup to take net argument · 343d60aa
      Committed by Roopa Prabhu
      This patch adds net argument to ipv6_stub_impl.ipv6_dst_lookup
      for use cases where sk is not available (like mpls).
      sk appears to be needed to get the namespace 'net' and is optional
      otherwise. This patch series changes ipv6_stub_impl.ipv6_dst_lookup
      to take net argument. sk remains optional.
      
      All callers of ipv6_stub_impl.ipv6_dst_lookup have been modified
      to pass net. I have modified them to use the 'net' already available
      in the scope of the call. I can change them to sock_net(sk) to avoid
      any unintended change in behaviour if the sock namespace is different,
      but from code inspection they do not seem to differ.
      Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 31 Jul, 2015 1 commit
    • net/ipv6: add sysctl option accept_ra_min_hop_limit · 8013d1d7
      Committed by Hangbin Liu
      Commit 6fd99094 ("ipv6: Don't reduce hop limit for an interface")
      disabled accepting a hop limit from an RA if it is smaller than the
      current hop limit, for security reasons. But this behavior breaks the
      RFC definition.
      
      RFC 4861, 6.3.4.  Processing Received Router Advertisements
         A Router Advertisement field (e.g., Cur Hop Limit, Reachable Time,
         and Retrans Timer) may contain a value denoting that it is
         unspecified.  In such cases, the parameter should be ignored and the
         host should continue using whatever value it is already using.
      
         If the received Cur Hop Limit value is non-zero, the host SHOULD set
         its CurHopLimit variable to the received value.
      
      So add the sysctl option accept_ra_min_hop_limit to let users choose the
      minimum hop limit value they accept from an RA, and set the default to 1
      to meet the RFC.
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Acked-by: YOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
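
The acceptance rule described in the accept_ra_min_hop_limit commit can be sketched as below. This is a simplified Python model, not the kernel code: the function name `accept_ra_hop_limit` and its parameters are hypothetical. It assumes that a received Cur Hop Limit of 0 means "unspecified" (as in the quoted RFC text) and that an advertised value is applied only when it is at least the configured minimum.

```python
def accept_ra_hop_limit(current: int, received: int, min_hop_limit: int = 1) -> int:
    """Return the hop limit a host would use after processing an RA.

    Hypothetical model of the behaviour described in the commit message.
    """
    if received == 0:
        # 0 means "unspecified": keep using the current value (RFC 4861, 6.3.4).
        return current
    if received < min_hop_limit:
        # Below the administrator-chosen floor: ignore the advertised value.
        return current
    return received
```

With the default minimum of 1, every non-zero advertised value is accepted, which matches the RFC wording. Raising the minimum, for example to 64, makes the host ignore smaller advertised values.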
  3. 30 Jul, 2015 1 commit
  4. 27 Jul, 2015 6 commits
  5. 23 Jul, 2015 1 commit
  6. 22 Jul, 2015 2 commits
  7. 21 Jul, 2015 2 commits
  8. 16 Jul, 2015 3 commits
  9. 14 Jul, 2015 1 commit
    • net: Build IPv6 into kernel by default · de551f2e
      Committed by Tom Herbert
      This patch makes the default to build IPv6 into the kernel. IPv6
      now has significant traction and any remaining vestiges of IPv6
      not being provided parity with IPv4 should be swept away. IPv6 is now
      core to the Internet and kernel.
      
      Points on IPv6 adoption:
      
      - Per Google statistics, IPv6 usage has reached 7% on the Internet
        and continues to exhibit an exponential growth rate
        https://www.google.com/intl/en/ipv6/statistics.html
      - Just a few days ago ARIN officially depleted its IPv4 pool
      - IPv6 only data centers are being successfully built
        (e.g. at Facebook)
      
      This patch changes the IPv6 Kconfig for IPV6. Default for CONFIG_IPV6
      is set to "y" and the text has been updated to reflect the maturity of
      IPv6.
      
      Impact:
      
      Under some circumstances building modules into the kernel might have a
      performance advantage. In my testing, I did notice a very slight
      improvement.
      
      This will obviously increase the size of the kernel image. In my
      configuration I see:
      
      IPv6 as module:
      
         text    data     bss      dec    hex filename
      9703666 1899288  933888 12536842 bf4c0a vmlinux
      
      IPv6 built into kernel:
      
         text    data     bss      dec    hex filename
      9436490 1879600  913408 12229498 ba9b7a vmlinux
      
      Which increases text size by ~270K (2.8% increase in size for me). If
      image size is an issue, presumably for a device which does not do IP
      networking (IMO we should be discouraging IPv4-only devices), IPV6 can
      be disabled or still built as a module.
      Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: Tom Herbert <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
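
The "~270K (2.8%)" figure in the IPv6 Kconfig commit can be re-derived from the two `size` outputs it quotes. The snippet below is only a quick arithmetic check using the text sizes as they appear in the message. Note that, as quoted, the larger text figure appears under the "IPv6 as module" heading, so the calculation confirms the size of the gap but not which label belongs to which row.

```python
# Text sizes in bytes, copied from the two `size` outputs in the commit message.
text_larger = 9_703_666
text_smaller = 9_436_490

delta = text_larger - text_smaller        # difference in text size
percent = 100 * delta / text_smaller      # relative to the smaller image

print(f"delta = {delta} bytes (~{delta / 1000:.0f}K), {percent:.1f}%")
```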
  10. 11 Jul, 2015 1 commit
  11. 10 Jul, 2015 4 commits
  12. 04 Jul, 2015 2 commits
  13. 24 Jun, 2015 1 commit
  14. 19 Jun, 2015 1 commit
    • netfilter: don't pull include/linux/netfilter.h from netns headers · a263653e
      Committed by Pablo Neira Ayuso
      This pulls the full hook netfilter definitions from all those that include
      net_namespace.h.
      
      Instead let's just include the bare minimum required in the new
      linux/netfilter_defs.h file, and use it from the netfilter netns header files.
      
      I also needed to include in.h and in6.h from linux/netfilter.h otherwise we hit
      this compilation error:
      
      In file included from include/linux/netfilter_defs.h:4:0,
                       from include/net/netns/netfilter.h:4,
                       from include/net/net_namespace.h:22,
                       from include/linux/netdevice.h:43,
                       from net/netfilter/nfnetlink_queue_core.c:23:
      include/uapi/linux/netfilter.h:76:17: error: field ‘in’ has incomplete type
       struct in_addr in;
      
      Also explicitly include linux/netfilter.h in several spots.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
  15. 16 Jun, 2015 1 commit
  16. 15 Jun, 2015 1 commit
  17. 13 Jun, 2015 1 commit
  18. 12 Jun, 2015 4 commits
    • netfilter: xtables: avoid percpu ruleset duplication · 482cfc31
      Committed by Florian Westphal
      We store the rule blob per (possible) cpu.  Unfortunately this means we can
      waste a lot of memory on big smp machines. The ipt_entry structure
      ('rule head') is 112 bytes, so e.g. with maxcpu=64 one single rule eats
      close to 8k RAM.
      
      Since the previous patch made counters percpu, it appears there is nothing
      left in the rule blob that needs to be percpu.
      
      On my test system (144 possible cpus, 400k dummy rules) this
      change saves close to 9 Gigabyte of RAM.
      Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
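
The memory cost of per-CPU rule duplication described in the xtables commit can be estimated with a rough calculation. The helper names `per_rule_bytes` and `saved_bytes` are hypothetical, and the sketch counts only the 112-byte rule head quoted in the message. Real rules also carry match and target data, so this is a lower bound rather than a reproduction of the reported figure.

```python
RULE_HEAD_BYTES = 112  # size of the ipt_entry rule head, per the commit message


def per_rule_bytes(possible_cpus: int, rule_bytes: int = RULE_HEAD_BYTES) -> int:
    """Memory used by one rule when the blob is copied once per possible CPU."""
    return possible_cpus * rule_bytes


def saved_bytes(possible_cpus: int, rules: int, rule_bytes: int = RULE_HEAD_BYTES) -> int:
    """Memory saved by keeping a single shared copy instead of one per CPU."""
    return (possible_cpus - 1) * rules * rule_bytes
```

For 64 possible CPUs, `per_rule_bytes(64)` gives 7168 bytes for one rule head. For the test system in the message, `saved_bytes(144, 400_000)` gives about 6.4 GB from the rule heads alone.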
    • netfilter: xtables: use percpu rule counters · 71ae0dff
      Committed by Florian Westphal
      The binary arp/ip/ip6tables ruleset is stored per cpu.
      
      The only reason left as to why we need percpu duplication are the rule
      counters embedded into ipt_entry et al -- since each cpu has its own copy
      of the rules, all counters can be lockless.
      
      The downside is that the more cpus are supported, the more memory is
      required.  Rules are not just duplicated per online cpu but for each
      possible cpu, i.e. if maxcpu is 144, then each rule is duplicated 144
      times, not just for the e.g. 64 cores present.
      
      To save some memory and also improve utilization of shared caches it
      would be preferable to only store the rule blob once.
      
      So we first need to separate counters and the rule blob.
      
      Instead of using entry->counters, allocate this percpu and store the
      percpu address in entry->counters.pcnt on CONFIG_SMP.
      
      This change makes no sense as-is; it is merely an intermediate step to
      remove the percpu duplication of the rule set in a followup patch.
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Reported-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
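
The idea of separating counters from the rule blob, as the percpu counters commit describes, can be illustrated with a toy model. This is not the kernel data structure: the `Rule` class and its methods are hypothetical, and only the attribute name `pcnt` echoes `entry->counters.pcnt` from the message. One rule object is shared, and only its packet/byte counters exist once per possible CPU.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """One shared rule; only its counters are kept per CPU (toy model)."""
    name: str
    possible_cpus: int
    # One [packets, bytes] pair per possible CPU, so updates need no lock.
    pcnt: list = field(init=False)

    def __post_init__(self):
        self.pcnt = [[0, 0] for _ in range(self.possible_cpus)]

    def hit(self, cpu: int, nbytes: int) -> None:
        # Each CPU touches only its own slot.
        self.pcnt[cpu][0] += 1
        self.pcnt[cpu][1] += nbytes

    def totals(self) -> tuple:
        # Readers sum over all CPUs to get the user-visible counters.
        return (sum(c[0] for c in self.pcnt), sum(c[1] for c in self.pcnt))
```

In this model the per-CPU cost is two integers per rule instead of a full copy of the rule, which is the saving the follow-up patch relies on.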
    • netfilter: bridge: forward IPv6 fragmented packets · efb6de9b
      Committed by Bernhard Thaler
      IPv6 fragmented packets are not forwarded on an ethernet bridge
      with netfilter ip6_tables loaded. Steps to reproduce:
      
      1) create a simple bridge like this
      
              modprobe br_netfilter
              brctl addbr br0
              brctl addif br0 eth0
              brctl addif br0 eth2
              ifconfig eth0 up
              ifconfig eth2 up
              ifconfig br0 up
      
      2) place a host with an IPv6 address on each side of the bridge
      
              set IPv6 address on host A:
              ip -6 addr add fd01:2345:6789:1::1/64 dev eth0
      
              set IPv6 address on host B:
              ip -6 addr add fd01:2345:6789:1::2/64 dev eth0
      
      3) run a simple ping command on host A with packets > MTU
      
              ping6 -s 4000 fd01:2345:6789:1::2
      
      4) wait some time and run e.g. "ip6tables -t nat -nvL" on the bridge
      
      IPv6 fragmented packets traverse the bridge cleanly until somebody runs
      "ip6tables -t nat -nvL". As soon as it is run (and netfilter modules are
      loaded) IPv6 fragmented packets do not traverse the bridge any more (you
      see no more responses in ping's output).
      
      After applying this patch IPv6 fragmented packets traverse the bridge
      cleanly in the above scenario.
      Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
      [pablo@netfilter.org: small changes to br_nf_dev_queue_xmit]
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: bridge: detect NAT66 correctly and change MAC address · 72b31f72
      Committed by Bernhard Thaler
      IPv4 iptables allows REDIRECT/DNAT/SNAT of any traffic over a bridge.
      
      e.g. REDIRECT
      $ sysctl -w net.bridge.bridge-nf-call-iptables=1
      $ iptables -t nat -A PREROUTING -p tcp -m tcp --dport 8080 \
        -j REDIRECT --to-ports 81
      
      This does not work with ip6tables on a bridge in a NAT66 scenario
      because the REDIRECT/DNAT/SNAT is not correctly detected.
      
      The bridge pre-routing (finish) netfilter hook has to check for a possible
      redirect and then fix the destination MAC address. This allows the
      ip6tables rules to be used for local REDIRECT/DNAT/SNAT, similar to the
      IPv4 iptables version.
      
      e.g. REDIRECT
      $ sysctl -w net.bridge.bridge-nf-call-ip6tables=1
      $ ip6tables -t nat -A PREROUTING -p tcp -m tcp --dport 8080 \
        -j REDIRECT --to-ports 81
      
      This patch makes it possible to use IPv6 NAT66 on a bridge. It was tested
      on a bridge with two interfaces using SNAT/DNAT NAT66 rules.
      Reported-by: Artie Hamilton <artiemhamilton@yahoo.com>
      Signed-off-by: Sven Eckelmann <sven@open-mesh.com>
      [bernhard.thaler@wvnet.at: rebased, add indirect call to ip6_route_input()]
      [bernhard.thaler@wvnet.at: rebased, split into separate patches]
      Signed-off-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  19. 11 Jun, 2015 1 commit
  20. 09 Jun, 2015 2 commits
    • ipv6: Fix protocol resubmission · 0243508e
      Committed by Josh Hunt
      UDP encapsulation is broken on IPv6. This is because the logic to resubmit
      the nexthdr is inverted, checking for a ret value > 0 instead of < 0. Also,
      the resubmit label is in the wrong position since we already get the
      nexthdr value when performing decapsulation. In addition the skb pull is no
      longer necessary either.
      
      This changes the return value check to look for < 0, using it for the
      nexthdr on the next iteration, and moves the resubmit label to the proper
      location.
      
      With these changes the v6 code now matches what we do in the v4 ip input
      code wrt resubmitting when decapsulating.
      Signed-off-by: Josh Hunt <johunt@akamai.com>
      Acked-by: "Tom Herbert" <tom@herbertland.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
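
The resubmission convention that the protocol resubmission fix restores can be modelled in a few lines. This is a toy sketch, not the kernel's input path: the `deliver` function and the handler table are hypothetical. It assumes, as the message describes, that a handler signals "resubmit" with a negative return value whose negation is the next header to dispatch.

```python
def deliver(first_proto, handlers, payload):
    """Dispatch payload to protocol handlers, resubmitting on negative returns.

    Toy model: a handler returns 0 when it consumed the packet, or -N to ask
    for the packet to be resubmitted to protocol N (e.g. after decapsulation).
    Returns the list of protocol numbers visited, in order.
    """
    proto = first_proto
    path = []
    while True:
        path.append(proto)
        ret = handlers[proto](payload)
        if ret < 0:
            proto = -ret  # the negated return value is the next header
            continue
        return path
```

For example, with a UDP handler (protocol 17) that decapsulates and returns -47, and a protocol 47 handler that returns 0, `deliver(17, handlers, b"")` visits 17 and then 47. Checking for `ret > 0` instead, as the message says the old code did, would never take the resubmit branch for such a handler.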
    • ipv6: fix possible use after free of dev stats · 27e41fcf
      Committed by Robert Shearman
      The memory pointed to by idev->stats.icmpv6msgdev,
      idev->stats.icmpv6dev and idev->stats.ipv6 can each be used in an RCU
      read context without taking a reference on idev. For example, through
      IP6_*_STATS_* calls in ip6_rcv. These memory blocks are freed without
      waiting for an RCU grace period to elapse. This could lead to the
      memory being written to after it has been freed.
      
      Fix this by using call_rcu to free the memory used for stats, as well
      as idev after an RCU grace period has elapsed.
      Signed-off-by: Robert Shearman <rshearma@brocade.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 08 Jun, 2015 1 commit
  22. 07 Jun, 2015 2 commits
    • tcp: remove redundant checks II · 98da81a4
      Committed by Eric Dumazet
      For the same reasons as in commit 12e25e10 ("tcp: remove redundant
      checks"), we can remove redundant checks done for timewait sockets.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: add IP_BIND_ADDRESS_NO_PORT to overcome bind(0) limitations · 90c337da
      Committed by Eric Dumazet
      When an application needs to force a source IP on an active TCP socket,
      it has to use bind(IP, port=x).
      
      As most applications do not want to deal with already used ports, x is
      often set to 0, meaning the kernel is in charge of finding an available
      port.
      But the kernel does not yet know whether this socket is going to be a
      listener or be connected.
      It has very limited choices (no full knowledge of the final 4-tuple for a
      connect()).
      
      With limited ephemeral port range (about 32K ports), it is very easy to
      fill the space.
      
      This patch adds a new SOL_IP socket option, asking kernel to ignore
      the 0 port provided by application in bind(IP, port=0) and only
      remember the given IP address.
      
      The port will be automatically chosen at connect() time, in a way
      that allows sharing a source port as long as the 4-tuples are unique.
      
      This new feature is available for both IPv4 and IPv6 (Thanks Neal)
      
      Tested:
      
      Wrote a test program and checked its behavior on IPv4 and IPv6.
      
      strace(1) shows sequences of bind(IP=127.0.0.2, port=0) followed by
      connect().
      Also, getsockname() shows that the port is still 0 right after bind()
      but properly allocated after connect().
      
      socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 5
      setsockopt(5, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
      bind(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, 16) = 0
      getsockname(5, {sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
      connect(5, {sa_family=AF_INET, sin_port=htons(53174), sin_addr=inet_addr("127.0.0.3")}, 16) = 0
      getsockname(5, {sa_family=AF_INET, sin_port=htons(38050), sin_addr=inet_addr("127.0.0.2")}, [16]) = 0
      
      IPv6 test :
      
      socket(PF_INET6, SOCK_STREAM, IPPROTO_IP) = 7
      setsockopt(7, SOL_IP, IP_BIND_ADDRESS_NO_PORT, [1], 4) = 0
      bind(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
      getsockname(7, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
      connect(7, {sa_family=AF_INET6, sin6_port=htons(57300), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
      getsockname(7, {sa_family=AF_INET6, sin6_port=htons(60964), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
      
      I was able to bind()/connect() a million concurrent IPv4 sockets,
      instead of ~32000 before patch.
      
      lpaa23:~# ulimit -n 1000010
      lpaa23:~# ./bind --connect --num-flows=1000000 &
      1000000 sockets
      
      lpaa23:~# grep TCP /proc/net/sockstat
      TCP: inuse 2000063 orphan 0 tw 47 alloc 2000157 mem 66
      
      Check that a given source port is indeed used by many different
      connections :
      
      lpaa23:~# ss -t src :40000 | head -10
      State      Recv-Q Send-Q   Local Address:Port          Peer Address:Port
      ESTAB      0      0           127.0.0.2:40000         127.0.202.33:44983
      ESTAB      0      0           127.0.0.2:40000         127.2.27.240:44983
      ESTAB      0      0           127.0.0.2:40000           127.2.98.5:44983
      ESTAB      0      0           127.0.0.2:40000        127.0.124.196:44983
      ESTAB      0      0           127.0.0.2:40000         127.2.139.38:44983
      ESTAB      0      0           127.0.0.2:40000          127.1.59.80:44983
      ESTAB      0      0           127.0.0.2:40000          127.3.6.228:44983
      ESTAB      0      0           127.0.0.2:40000          127.0.38.53:44983
      ESTAB      0      0           127.0.0.2:40000         127.1.197.10:44983
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
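
The bind()/connect() sequence shown in the strace output of the IP_BIND_ADDRESS_NO_PORT commit can be reproduced from user space roughly as follows. This is a minimal sketch assuming a Linux kernel that includes this patch. The helper name `connect_from` is hypothetical, and the numeric fallback 24 is the option's value in Linux's `<linux/in.h>`, used in case the Python `socket` module does not export the constant.

```python
import socket

# Value of IP_BIND_ADDRESS_NO_PORT in Linux's <linux/in.h>; used as a fallback
# when the socket module does not provide the constant itself.
IP_BIND_ADDRESS_NO_PORT = getattr(socket, "IP_BIND_ADDRESS_NO_PORT", 24)


def connect_from(src_ip: str, dst: tuple) -> tuple:
    """Bind to src_ip without reserving a port, then connect to dst.

    Returns (port_after_bind, port_after_connect).
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.setsockopt(socket.IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT, 1)
        s.bind((src_ip, 0))                  # only the address is remembered
        port_after_bind = s.getsockname()[1]
        s.connect(dst)                       # the source port is chosen here
        port_after_connect = s.getsockname()[1]
        return port_after_bind, port_after_connect
    finally:
        s.close()
```

As in the strace output, the first returned port should be 0 and the second a real ephemeral port. Because the port is picked at connect() time, the kernel knows the full 4-tuple and can share one source port across connections to different peers.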