1. 18 10月, 2017 1 次提交
  2. 07 10月, 2017 1 次提交
  3. 06 10月, 2017 1 次提交
  4. 30 8月, 2017 1 次提交
    • D
      ipv6: Use rt6i_idev index for echo replies to a local address · 1b70d792
      David Ahern 提交于
      Tariq repored local pings to linklocal address is failing:
      $ ifconfig ens8
      ens8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
              inet 11.141.16.6  netmask 255.255.0.0  broadcast 11.141.255.255
              inet6 fe80::7efe:90ff:fecb:7502  prefixlen 64  scopeid 0x20<link>
              ether 7c:fe:90:cb:75:02  txqueuelen 1000  (Ethernet)
              RX packets 12  bytes 1164 (1.1 KiB)
              RX errors 0  dropped 0  overruns 0  frame 0
              TX packets 30  bytes 2484 (2.4 KiB)
              TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
      
      $  /bin/ping6 -c 3 fe80::7efe:90ff:fecb:7502%ens8
      PING fe80::7efe:90ff:fecb:7502%ens8(fe80::7efe:90ff:fecb:7502) 56 data bytes
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1b70d792
  5. 25 8月, 2017 1 次提交
    • J
      ipv6: Compute multipath hash for ICMP errors from offending packet · 23aebdac
      Jakub Sitnicki 提交于
      When forwarding or sending out an ICMPv6 error, look at the embedded
      packet that triggered the error and compute a flow hash over its
      headers.
      
      This let's us route the ICMP error together with the flow it belongs to
      when multipath (ECMP) routing is in use, which in turn makes Path MTU
      Discovery work in ECMP load-balanced or anycast setups (RFC 7690).
      
      Granted, end-hosts behind the ECMP router (aka servers) need to reflect
      the IPv6 Flow Label for PMTUD to work.
      
      The code is organized to be in parallel with ipv4 stack:
      
        ip_multipath_l3_keys -> ip6_multipath_l3_keys
        fib_multipath_hash   -> rt6_multipath_hash
      Signed-off-by: NJakub Sitnicki <jkbs@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      23aebdac
  6. 22 8月, 2017 1 次提交
    • D
      net: ipv6: put host and anycast routes on device with address · 4832c30d
      David Ahern 提交于
      One nagging difference between ipv4 and ipv6 is host routes for ipv6
      addresses are installed using the loopback device or VRF / L3 Master
      device. e.g.,
      
          2001:db8:1::/120 dev veth0 proto kernel metric 256 pref medium
          local 2001:db8:1::1 dev lo table local proto kernel metric 0 pref medium
      
      Using the loopback device is convenient -- necessary for local tx, but
      has some nasty side effects, most notably setting the 'lo' device down
      causes all host routes for all local IPv6 address to be removed from the
      FIB and completely breaks IPv6 networking across all interfaces.
      
      This patch puts FIB entries for IPv6 routes against the device. This
      simplifies the routes in the FIB, for example by making dst->dev and
      rt6i_idev->dev the same (a future patch can look at removing the device
      reference taken for rt6i_idev for FIB entries).
      
      When copies are made on FIB lookups, the cloned route has dst->dev
      set to loopback (or the L3 master device). This is needed for the
      local Tx of packets to local addresses.
      
      With fib entries allocated against the real network device, the addrconf
      code that reinserts host routes on admin up of 'lo' is no longer needed.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4832c30d
  7. 15 6月, 2017 1 次提交
    • J
      net: don't global ICMP rate limit packets originating from loopback · 849a44de
      Jesper Dangaard Brouer 提交于
      Florian Weimer seems to have a glibc test-case which requires that
      loopback interfaces does not get ICMP ratelimited.  This was broken by
      commit c0303efe ("net: reduce cycles spend on ICMP replies that
      gets rate limited").
      
      An ICMP response will usually be routed back-out the same incoming
      interface.  Thus, take advantage of this and skip global ICMP
      ratelimit when the incoming device is loopback.  In the unlikely event
      that the outgoing it not loopback, due to strange routing policy
      rules, ICMP rate limiting still works via peer ratelimiting via
      icmpv4_xrlim_allow().  Thus, we should still comply with RFC1812
      (section 4.3.2.8 "Rate Limiting").
      
      This seems to fix the reproducer given by Florian.  While still
      avoiding to perform expensive and unneeded outgoing route lookup for
      rate limited packets (in the non-loopback case).
      
      Fixes: c0303efe ("net: reduce cycles spend on ICMP replies that gets rate limited")
      Reported-by: NFlorian Weimer <fweimer@redhat.com>
      Reported-by: N"H.J. Lu" <hjl.tools@gmail.com>
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      849a44de
  8. 10 1月, 2017 2 次提交
  9. 25 12月, 2016 1 次提交
  10. 29 11月, 2016 1 次提交
    • D
      net: handle no dst on skb in icmp6_send · 79dc7e3f
      David Ahern 提交于
      Andrey reported the following while fuzzing the kernel with syzkaller:
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 0 PID: 3859 Comm: a.out Not tainted 4.9.0-rc6+ #429
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      task: ffff8800666d4200 task.stack: ffff880067348000
      RIP: 0010:[<ffffffff833617ec>]  [<ffffffff833617ec>]
      icmp6_send+0x5fc/0x1e30 net/ipv6/icmp.c:451
      RSP: 0018:ffff88006734f2c0  EFLAGS: 00010206
      RAX: ffff8800666d4200 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000018
      RBP: ffff88006734f630 R08: ffff880064138418 R09: 0000000000000003
      R10: dffffc0000000000 R11: 0000000000000005 R12: 0000000000000000
      R13: ffffffff84e7e200 R14: ffff880064138484 R15: ffff8800641383c0
      FS:  00007fb3887a07c0(0000) GS:ffff88006cc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000000 CR3: 000000006b040000 CR4: 00000000000006f0
      Stack:
       ffff8800666d4200 ffff8800666d49f8 ffff8800666d4200 ffffffff84c02460
       ffff8800666d4a1a 1ffff1000ccdaa2f ffff88006734f498 0000000000000046
       ffff88006734f440 ffffffff832f4269 ffff880064ba7456 0000000000000000
      Call Trace:
       [<ffffffff83364ddc>] icmpv6_param_prob+0x2c/0x40 net/ipv6/icmp.c:557
       [<     inline     >] ip6_tlvopt_unknown net/ipv6/exthdrs.c:88
       [<ffffffff83394405>] ip6_parse_tlv+0x555/0x670 net/ipv6/exthdrs.c:157
       [<ffffffff8339a759>] ipv6_parse_hopopts+0x199/0x460 net/ipv6/exthdrs.c:663
       [<ffffffff832ee773>] ipv6_rcv+0xfa3/0x1dc0 net/ipv6/ip6_input.c:191
       ...
      
      icmp6_send / icmpv6_send is invoked for both rx and tx paths. In both
      cases the dst->dev should be preferred for determining the L3 domain
      if the dst has been set on the skb. Fallback to the skb->dev if it has
      not. This covers the case reported here where icmp6_send is invoked on
      Rx before the route lookup.
      
      Fixes: 5d41ce29 ("net: icmp6_send should use dst dev to determine L3 domain")
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      79dc7e3f
  11. 08 11月, 2016 1 次提交
  12. 05 11月, 2016 1 次提交
    • L
      net: inet: Support UID-based routing in IP protocols. · e2d118a1
      Lorenzo Colitti 提交于
      - Use the UID in routing lookups made by protocol connect() and
        sendmsg() functions.
      - Make sure that routing lookups triggered by incoming packets
        (e.g., Path MTU discovery) take the UID of the socket into
        account.
      - For packets not associated with a userspace socket, (e.g., ping
        replies) use UID 0 inside the user namespace corresponding to
        the network namespace the socket belongs to. This allows
        all namespaces to apply routing and iptables rules to
        kernel-originated traffic in that namespaces by matching UID 0.
        This is better than using the UID of the kernel socket that is
        sending the traffic, because the UID of kernel sockets created
        at namespace creation time (e.g., the per-processor ICMP and
        TCP sockets) is the UID of the user that created the socket,
        which might not be mapped in the namespace.
      
      Tested: compiles allnoconfig, allyesconfig, allmodconfig
      Tested: https://android-review.googlesource.com/253302Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e2d118a1
  13. 19 6月, 2016 4 次提交
  14. 16 6月, 2016 1 次提交
    • D
      net: vrf: Handle ipv6 multicast and link-local addresses · 9ff74384
      David Ahern 提交于
      IPv6 multicast and link-local addresses require special handling by the
      VRF driver:
      1. Rather than using the VRF device index and full FIB lookups,
         packets to/from these addresses should use direct FIB lookups based on
         the VRF device table.
      
      2. fail sends/receives on a VRF device to/from a multicast address
         (e.g, make ping6 ff02::1%<vrf> fail)
      
      3. move the setting of the flow oif to the first dst lookup and revert
         the change in icmpv6_echo_reply made in ca254490 ("net: Add VRF
         support to IPv6 stack"). Linklocal/mcast addresses require use of the
         skb->dev.
      
      With this change connections into and out of a VRF enslaved device work
      for multicast and link-local addresses work (icmp, tcp, and udp)
      e.g.,
      
      1. packets into VM with VRF config:
          ping6 -c3 fe80::e0:f9ff:fe1c:b974%br1
          ping6 -c3 ff02::1%br1
      
          ssh -6 fe80::e0:f9ff:fe1c:b974%br1
      
      2. packets going out a VRF enslaved device:
          ping6 -c3 fe80::18f8:83ff:fe4b:7a2e%eth1
          ping6 -c3 ff02::1%eth1
          ssh -6 root@fe80::18f8:83ff:fe4b:7a2e%eth1
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9ff74384
  15. 15 6月, 2016 1 次提交
  16. 12 6月, 2016 1 次提交
  17. 07 5月, 2016 1 次提交
  18. 04 5月, 2016 1 次提交
    • W
      ipv6: add new struct ipcm6_cookie · 26879da5
      Wei Wang 提交于
      In the sendmsg function of UDP, raw, ICMP and l2tp sockets, we use local
      variables like hlimits, tclass, opt and dontfrag and pass them to corresponding
      functions like ip6_make_skb, ip6_append_data and xxx_push_pending_frames.
      This is not a good practice and makes it hard to add new parameters.
      This fix introduces a new struct ipcm6_cookie similar to ipcm_cookie in
      ipv4 and include the above mentioned variables. And we only pass the
      pointer to this structure to corresponding functions. This makes it easier
      to add new parameters in the future and makes the function cleaner.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      26879da5
  19. 28 4月, 2016 2 次提交
  20. 05 4月, 2016 1 次提交
    • S
      sock: enable timestamping using control messages · c14ac945
      Soheil Hassas Yeganeh 提交于
      Currently, SOL_TIMESTAMPING can only be enabled using setsockopt.
      This is very costly when users want to sample writes to gather
      tx timestamps.
      
      Add support for enabling SO_TIMESTAMPING via control messages by
      using tsflags added in `struct sockcm_cookie` (added in the previous
      patches in this series) to set the tx_flags of the last skb created in
      a sendmsg. With this patch, the timestamp recording bits in tx_flags
      of the skbuff is overridden if SO_TIMESTAMPING is passed in a cmsg.
      
      Please note that this is only effective for overriding the recording
      timestamps flags. Users should enable timestamp reporting (e.g.,
      SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_OPT_ID) using
      socket options and then should ask for SOF_TIMESTAMPING_TX_*
      using control messages per sendmsg to sample timestamps for each
      write.
      Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c14ac945
  21. 04 12月, 2015 1 次提交
    • E
      ipv6: kill sk_dst_lock · 6bd4f355
      Eric Dumazet 提交于
      While testing the np->opt RCU conversion, I found that UDP/IPv6 was
      using a mixture of xchg() and sk_dst_lock to protect concurrent changes
      to sk->sk_dst_cache, leading to possible corruptions and crashes.
      
      ip6_sk_dst_lookup_flow() uses sk_dst_check() anyway, so the simplest
      way to fix the mess is to remove sk_dst_lock completely, as we did for
      IPv4.
      
      __ip6_dst_store() and ip6_dst_store() share same implementation.
      
      sk_setup_caps() being called with socket lock being held or not,
      we have to use sk_dst_set() instead of __sk_dst_set()
      
      Note that I had to move the "np->dst_cookie = rt6_get_cookie(rt);"
      in ip6_dst_store() before the sk_setup_caps(sk, dst) call.
      
      This is because ip6_dst_store() can be called from process context,
      without any lock held.
      
      As soon as the dst is installed in sk->sk_dst_cache, dst can be freed
      from another cpu doing a concurrent ip6_dst_store()
      
      Doing the dst dereference before doing the install is needed to make
      sure no use after free would trigger.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6bd4f355
  22. 27 10月, 2015 1 次提交
  23. 13 10月, 2015 1 次提交
  24. 01 8月, 2015 1 次提交
    • R
      ipv6: change ipv6_stub_impl.ipv6_dst_lookup to take net argument · 343d60aa
      Roopa Prabhu 提交于
      This patch adds net argument to ipv6_stub_impl.ipv6_dst_lookup
      for use cases where sk is not available (like mpls).
      sk appears to be needed to get the namespace 'net' and is optional
      otherwise. This patch series changes ipv6_stub_impl.ipv6_dst_lookup
      to take net argument. sk remains optional.
      
      All callers of ipv6_stub_impl.ipv6_dst_lookup have been modified
      to pass net. I have modified them to use already available
      'net' in the scope of the call. I can change them to
      sock_net(sk) to avoid any unintended change in behaviour if sock
      namespace is different. They dont seem to be from code inspection.
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      343d60aa
  25. 26 5月, 2015 2 次提交
  26. 01 4月, 2015 1 次提交
  27. 15 1月, 2015 1 次提交
  28. 24 11月, 2014 1 次提交
  29. 19 11月, 2014 1 次提交
  30. 12 11月, 2014 1 次提交
    • J
      net: Convert LIMIT_NETDEBUG to net_dbg_ratelimited · ba7a46f1
      Joe Perches 提交于
      Use the more common dynamic_debug capable net_dbg_ratelimited
      and remove the LIMIT_NETDEBUG macro.
      
      All messages are still ratelimited.
      
      Some KERN_<LEVEL> uses are changed to KERN_DEBUG.
      
      This may have some negative impact on messages that were
      emitted at KERN_INFO that are not not enabled at all unless
      DEBUG is defined or dynamic_debug is enabled.  Even so,
      these messages are now _not_ emitted by default.
      
      This also eliminates the use of the net_msg_warn sysctl
      "/proc/sys/net/core/warnings".  For backward compatibility,
      the sysctl is not removed, but it has no function.  The extern
      declaration of net_msg_warn is removed from sock.h and made
      static in net/core/sysctl_net_core.c
      
      Miscellanea:
      
      o Update the sysctl documentation
      o Remove the embedded uses of pr_fmt
      o Coalesce format fragments
      o Realign arguments
      Signed-off-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ba7a46f1
  31. 24 10月, 2014 1 次提交
  32. 08 10月, 2014 1 次提交
  33. 24 9月, 2014 1 次提交
    • E
      icmp: add a global rate limitation · 4cdf507d
      Eric Dumazet 提交于
      Current ICMP rate limiting uses inetpeer cache, which is an RBL tree
      protected by a lock, meaning that hosts can be stuck hard if all cpus
      want to check ICMP limits.
      
      When say a DNS or NTP server process is restarted, inetpeer tree grows
      quick and machine comes to its knees.
      
      iptables can not help because the bottleneck happens before ICMP
      messages are even cooked and sent.
      
      This patch adds a new global limitation, using a token bucket filter,
      controlled by two new sysctl :
      
      icmp_msgs_per_sec - INTEGER
          Limit maximal number of ICMP packets sent per second from this host.
          Only messages whose type matches icmp_ratemask are
          controlled by this limit.
          Default: 1000
      
      icmp_msgs_burst - INTEGER
          icmp_msgs_per_sec controls number of ICMP packets sent per second,
          while icmp_msgs_burst controls the burst size of these packets.
          Default: 50
      
      Note that if we really want to send millions of ICMP messages per
      second, we might extend idea and infra added in commit 04ca6973
      ("ip: make IP identifiers less predictable") :
      add a token bucket in the ip_idents hash and no longer rely on inetpeer.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4cdf507d
  34. 25 8月, 2014 1 次提交
    • I
      ipv6: White-space cleansing : Line Layouts · 67ba4152
      Ian Morris 提交于
      This patch makes no changes to the logic of the code but simply addresses
      coding style issues as detected by checkpatch.
      
      Both objdump and diff -w show no differences.
      
      A number of items are addressed in this patch:
      * Multiple spaces converted to tabs
      * Spaces before tabs removed.
      * Spaces in pointer typing cleansed (char *)foo etc.
      * Remove space after sizeof
      * Ensure spacing around comparators such as if statements.
      Signed-off-by: NIan Morris <ipm@chirality.org.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      67ba4152