1. 17 8月, 2017 1 次提交
  2. 16 8月, 2017 4 次提交
    • F
      e3a22b7f
    • F
      ipv6: route: make rtm_getroute not assume rtnl is locked · 121622db
      Florian Westphal 提交于
      __dev_get_by_index assumes RTNL is held, use _rcu version instead.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      121622db
    • E
      ipv6: fix NULL dereference in ip6_route_dev_notify() · 12d94a80
      Eric Dumazet 提交于
      Based on a syzkaller report [1], I found that a per cpu allocation
      failure in snmp6_alloc_dev() would then lead to NULL dereference in
      ip6_route_dev_notify().
      
      It seems this is a very old bug, thus no Fixes tag in this submission.
      
      Let's add in6_dev_put_clear() helper, as we will probably use
      it elsewhere (once available/present in net-next)
      
      [1]
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 1 PID: 17294 Comm: syz-executor6 Not tainted 4.13.0-rc2+ #10
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff88019f456680 task.stack: ffff8801c6e58000
      RIP: 0010:__read_once_size include/linux/compiler.h:250 [inline]
      RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline]
      RIP: 0010:refcount_sub_and_test+0x7d/0x1b0 lib/refcount.c:178
      RSP: 0018:ffff8801c6e5f1b0 EFLAGS: 00010202
      RAX: 0000000000000037 RBX: dffffc0000000000 RCX: ffffc90005d25000
      RDX: ffff8801c6e5f218 RSI: ffffffff82342bbf RDI: 0000000000000001
      RBP: ffff8801c6e5f240 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff10038dcbe37
      R13: 0000000000000006 R14: 0000000000000001 R15: 00000000000001b8
      FS:  00007f21e0429700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001ddbc22000 CR3: 00000001d632b000 CR4: 00000000001426e0
      DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
      Call Trace:
       refcount_dec_and_test+0x1a/0x20 lib/refcount.c:211
       in6_dev_put include/net/addrconf.h:335 [inline]
       ip6_route_dev_notify+0x1c9/0x4a0 net/ipv6/route.c:3732
       notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1678
       call_netdevice_notifiers net/core/dev.c:1694 [inline]
       rollback_registered_many+0x91c/0xe80 net/core/dev.c:7107
       rollback_registered+0x1be/0x3c0 net/core/dev.c:7149
       register_netdevice+0xbcd/0xee0 net/core/dev.c:7587
       register_netdev+0x1a/0x30 net/core/dev.c:7669
       loopback_net_init+0x76/0x160 drivers/net/loopback.c:214
       ops_init+0x10a/0x570 net/core/net_namespace.c:118
       setup_net+0x313/0x710 net/core/net_namespace.c:294
       copy_net_ns+0x27c/0x580 net/core/net_namespace.c:418
       create_new_namespaces+0x425/0x880 kernel/nsproxy.c:107
       unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:206
       SYSC_unshare kernel/fork.c:2347 [inline]
       SyS_unshare+0x653/0xfa0 kernel/fork.c:2297
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x4512c9
      RSP: 002b:00007f21e0428c08 EFLAGS: 00000216 ORIG_RAX: 0000000000000110
      RAX: ffffffffffffffda RBX: 0000000000718150 RCX: 00000000004512c9
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000062020200
      RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004b973d
      R13: 00000000ffffffff R14: 000000002001d000 R15: 00000000000002dd
      Code: 50 2b 34 82 c7 00 f1 f1 f1 f1 c7 40 04 04 f2 f2 f2 c7 40 08 f3 f3
      f3 f3 e8 a1 43 39 ff 4c 89 f8 48 8b 95 70 ff ff ff 48 c1 e8 03 <0f> b6
      0c 18 4c 89 f8 83 e0 07 83 c0 03 38 c8 7c 08 84 c9 0f 85
      RIP: __read_once_size include/linux/compiler.h:250 [inline] RSP:
      ffff8801c6e5f1b0
      RIP: atomic_read arch/x86/include/asm/atomic.h:26 [inline] RSP:
      ffff8801c6e5f1b0
      RIP: refcount_sub_and_test+0x7d/0x1b0 lib/refcount.c:178 RSP:
      ffff8801c6e5f1b0
      ---[ end trace e441d046c6410d31 ]---
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      12d94a80
    • I
      ipv6: fib: Provide offload indication using nexthop flags · fe400799
      Ido Schimmel 提交于
      IPv6 routes currently lack nexthop flags as in IPv4. This has several
      implications.
      
      In the forwarding path, it requires us to check the carrier state of the
      nexthop device and potentially ignore a linkdown route, instead of
      checking for RTNH_F_LINKDOWN.
      
      It also requires capable drivers to use the user facing IPv6-specific
      route flags to provide offload indication, instead of using the nexthop
      flags as in IPv4.
      
      Add nexthop flags to IPv6 routes in the 40 bytes hole and use it to
      provide offload indication instead of the RTF_OFFLOAD flag, which is
      removed while it's still not part of any official kernel release.
      
      In the near future we would like to use the field for the
      RTNH_F_{LINKDOWN,DEAD} flags, but this change is more involved and might
      not be ready in time for the current cycle.
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fe400799
  3. 15 8月, 2017 2 次提交
  4. 11 8月, 2017 1 次提交
  5. 10 8月, 2017 2 次提交
  6. 09 8月, 2017 2 次提交
    • V
      net: ipv6: avoid overhead when no custom FIB rules are installed · feca7d8c
      Vincent Bernat 提交于
      If the user hasn't installed any custom rules, don't go through the
      whole FIB rules layer. This is pretty similar to f4530fa5 (ipv4:
      Avoid overhead when no custom FIB rules are installed).
      
      Using a micro-benchmark module [1], timing ip6_route_output() with
      get_cycles(), with 40,000 routes in the main routing table, before this
      patch:
      
          min=606 max=12911 count=627 average=1959 95th=4903 90th=3747 50th=1602 mad=821
          table=254 avgdepth=21.8 maxdepth=39
          value │                         ┊                            count
            600 │▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒                                         199
            880 │▒▒▒░░░░░░░░░░░░░░░░                                      43
           1160 │▒▒▒░░░░░░░░░░░░░░░░░░░░                                  48
           1440 │▒▒▒░░░░░░░░░░░░░░░░░░░░░░░                               43
           1720 │▒▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░                          59
           2000 │▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                      50
           2280 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                    26
           2560 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                  31
           2840 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░               28
           3120 │▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░              17
           3400 │▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░             17
           3680 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░             8
           3960 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░           11
           4240 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░            6
           4520 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░           6
           4800 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░           9
      
      After:
      
          min=544 max=11687 count=627 average=1776 95th=4546 90th=3585 50th=1227 mad=565
          table=254 avgdepth=21.8 maxdepth=39
          value │                         ┊                            count
            540 │▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒                                        201
            800 │▒▒▒▒▒░░░░░░░░░░░░░░░░                                    63
           1060 │▒▒▒▒▒░░░░░░░░░░░░░░░░░░░░░                               68
           1320 │▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░                            39
           1580 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                         32
           1840 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                       32
           2100 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                    34
           2360 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                 33
           2620 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░               26
           2880 │▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░              22
           3140 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░              9
           3400 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░             8
           3660 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░             9
           3920 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░            8
           4180 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░           8
           4440 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░           8
      
      At the frequency of the host during the bench (~ 3.7 GHz), this is
      about a 100 ns difference on the median value.
      
      A next step would be to collapse local and main tables, as in
      0ddcf43d (ipv4: FIB Local/MAIN table collapse).
      
      [1]: https://github.com/vincentbernat/network-lab/blob/master/lab-routes-ipv6/kbench_mod.cSigned-off-by: NVincent Bernat <vincent@bernat.im>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      feca7d8c
    • W
      net: avoid skb_warn_bad_offload false positives on UFO · 8d63bee6
      Willem de Bruijn 提交于
      skb_warn_bad_offload triggers a warning when an skb enters the GSO
      stack at __skb_gso_segment that does not have CHECKSUM_PARTIAL
      checksum offload set.
      
      Commit b2504a5d ("net: reduce skb_warn_bad_offload() noise")
      observed that SKB_GSO_DODGY producers can trigger the check and
      that passing those packets through the GSO handlers will fix it
      up. But, the software UFO handler will set ip_summed to
      CHECKSUM_NONE.
      
      When __skb_gso_segment is called from the receive path, this
      triggers the warning again.
      
      Make UFO set CHECKSUM_UNNECESSARY instead of CHECKSUM_NONE. On
      Tx these two are equivalent. On Rx, this better matches the
      skb state (checksum computed), as CHECKSUM_NONE here means no
      checksum computed.
      
      See also this thread for context:
      http://patchwork.ozlabs.org/patch/799015/
      
      Fixes: b2504a5d ("net: reduce skb_warn_bad_offload() noise")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8d63bee6
  7. 08 8月, 2017 8 次提交
  8. 04 8月, 2017 12 次提交
  9. 02 8月, 2017 1 次提交
  10. 01 8月, 2017 3 次提交
    • P
      udp6: fix jumbogram reception · cb891fa6
      Paolo Abeni 提交于
      Since commit 67a51780 ("ipv6: udp: leverage scratch area
      helpers") udp6_recvmsg() read the skb len from the scratch area,
      to avoid a cache miss.
      But the UDP6 rx path support RFC 2675 UDPv6 jumbograms, and their
      length exceeds the 16 bits available in the scratch area. As a side
      effect the length returned by recvmsg() is:
      <ingress datagram len> % (1<<16)
      
      This commit addresses the issue allocating one more bit in the
      IP6CB flags field and setting it for incoming jumbograms.
      Such field is still in the first cacheline, so at recvmsg()
      time we can check it and fallback to access skb->len if
      required, without a measurable overhead.
      
      Fixes: 67a51780 ("ipv6: udp: leverage scratch area helpers")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb891fa6
    • J
      ipv6: Avoid going through ->sk_net to access the netns · 1f139ed9
      Jakub Sitnicki 提交于
      There is no need to go through sk->sk_net to access the net namespace
      and its sysctl variables because we allocate the sock and initialize
      sk_net just a few lines earlier in the same routine.
      Signed-off-by: NJakub Sitnicki <jkbs@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1f139ed9
    • F
      tcp: remove prequeue support · e7942d06
      Florian Westphal 提交于
      prequeue is a tcp receive optimization that moves part of rx processing
      from bh to process context.
      
      This only works if the socket being processed belongs to a process that
      is blocked in recv on that socket.
      
      In practice, this doesn't happen anymore that often because nowadays
      servers tend to use an event driven (epoll) model.
      
      Even normal client applications (web browsers) commonly use many tcp
      connections in parallel.
      
      This has measureable impact only in netperf (which uses plain recv and
      thus allows prequeue use) from host to locally running vm (~4%), however,
      there were no changes when using netperf between two physical hosts with
      ixgbe interfaces.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e7942d06
  11. 30 7月, 2017 1 次提交
    • P
      udp6: fix socket leak on early demux · c9f2c1ae
      Paolo Abeni 提交于
      When an early demuxed packet reaches __udp6_lib_lookup_skb(), the
      sk reference is retrieved and used, but the relevant reference
      count is leaked and the socket destructor is never called.
      Beyond leaking the sk memory, if there are pending UDP packets
      in the receive queue, even the related accounted memory is leaked.
      
      In the long run, this will cause persistent forward allocation errors
      and no UDP skbs (both ipv4 and ipv6) will be able to reach the
      user-space.
      
      Fix this by explicitly accessing the early demux reference before
      the lookup, and properly decreasing the socket reference count
      after usage.
      
      Also drop the skb_steal_sock() in __udp6_lib_lookup_skb(), and
      the now obsoleted comment about "socket cache".
      
      The newly added code is derived from the current ipv4 code for the
      similar path.
      
      v1 -> v2:
        fixed the __udp6_lib_rcv() return code for resubmission,
        as suggested by Eric
      Reported-by: NSam Edwards <CFSworks@gmail.com>
      Reported-by: NMarc Haber <mh+netdev@zugschlus.de>
      Fixes: 5425077d ("net: ipv6: Add early demux handler for UDP unicast")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c9f2c1ae
  12. 29 7月, 2017 1 次提交
  13. 26 7月, 2017 1 次提交
    • S
      ipv6: Don't increase IPSTATS_MIB_FRAGFAILS twice in ip6_fragment() · afce615a
      Stefano Brivio 提交于
      RFC 2465 defines ipv6IfStatsOutFragFails as:
      
      	"The number of IPv6 datagrams that have been discarded
      	 because they needed to be fragmented at this output
      	 interface but could not be."
      
      The existing implementation, instead, would increase the counter
      twice in case we fail to allocate room for single fragments:
      once for the fragment, once for the datagram.
      
      This didn't look intentional though. In one of the two affected
      affected failure paths, the double increase was simply a result
      of a new 'goto fail' statement, introduced to avoid a skb leak.
      The other path appears to be affected since at least 2.6.12-rc2.
      Reported-by: NSabrina Dubroca <sdubroca@redhat.com>
      Fixes: 1d325d21 ("ipv6: ip6_fragment: fix headroom tests and skb leak")
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      afce615a
  14. 25 7月, 2017 1 次提交