1. 24 8月, 2017 1 次提交
    • M
      tcp: Extend SOF_TIMESTAMPING_RX_SOFTWARE to TCP recvmsg · 98aaa913
      Mike Maloney 提交于
      When SOF_TIMESTAMPING_RX_SOFTWARE is enabled for tcp sockets, return the
      timestamp corresponding to the highest sequence number data returned.
      
      Previously the skb->tstamp is overwritten when a TCP packet is placed
      in the out of order queue.  While the packet is in the ooo queue, save the
      timestamp in the TCB_SKB_CB.  This space is shared with the gso_*
      options which are only used on the tx path, and a previously unused 4
      byte hole.
      
      When skbs are coalesced either in the sk_receive_queue or the
      out_of_order_queue always choose the timestamp of the appended skb to
      maintain the invariant of returning the timestamp of the last byte in
      the recvmsg buffer.
      Signed-off-by: NMike Maloney <maloney@google.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      98aaa913
  2. 22 8月, 2017 1 次提交
    • D
      net: ipv6: put host and anycast routes on device with address · 4832c30d
      David Ahern 提交于
      One nagging difference between ipv4 and ipv6 is host routes for ipv6
      addresses are installed using the loopback device or VRF / L3 Master
      device. e.g.,
      
          2001:db8:1::/120 dev veth0 proto kernel metric 256 pref medium
          local 2001:db8:1::1 dev lo table local proto kernel metric 0 pref medium
      
      Using the loopback device is convenient -- necessary for local tx, but
      has some nasty side effects, most notably setting the 'lo' device down
      causes all host routes for all local IPv6 address to be removed from the
      FIB and completely breaks IPv6 networking across all interfaces.
      
      This patch puts FIB entries for IPv6 routes against the device. This
      simplifies the routes in the FIB, for example by making dst->dev and
      rt6i_idev->dev the same (a future patch can look at removing the device
      reference taken for rt6i_idev for FIB entries).
      
      When copies are made on FIB lookups, the cloned route has dst->dev
      set to loopback (or the L3 master device). This is needed for the
      local Tx of packets to local addresses.
      
      With fib entries allocated against the real network device, the addrconf
      code that reinserts host routes on admin up of 'lo' is no longer needed.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4832c30d
  3. 21 8月, 2017 1 次提交
    • W
      ipv6: repair fib6 tree in failure case · 348a4002
      Wei Wang 提交于
      In fib6_add(), it is possible that fib6_add_1() picks an intermediate
      node and sets the node's fn->leaf to NULL in order to add this new
      route. However, if fib6_add_rt2node() fails to add the new
      route for some reason, fn->leaf will be left as NULL and could
      potentially cause crash when fn->leaf is accessed in fib6_locate().
      This patch makes sure fib6_repair_tree() is called to properly repair
      fn->leaf in the above failure case.
      
      Here is the syzkaller reported general protection fault in fib6_locate:
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 0 PID: 40937 Comm: syz-executor3 Not tainted
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff8801d7d64100 ti: ffff8801d01a0000 task.ti: ffff8801d01a0000
      RIP: 0010:[<ffffffff82a3e0e1>]  [<ffffffff82a3e0e1>] __ipv6_prefix_equal64_half include/net/ipv6.h:475 [inline]
      RIP: 0010:[<ffffffff82a3e0e1>]  [<ffffffff82a3e0e1>] ipv6_prefix_equal include/net/ipv6.h:492 [inline]
      RIP: 0010:[<ffffffff82a3e0e1>]  [<ffffffff82a3e0e1>] fib6_locate_1 net/ipv6/ip6_fib.c:1210 [inline]
      RIP: 0010:[<ffffffff82a3e0e1>]  [<ffffffff82a3e0e1>] fib6_locate+0x281/0x3c0 net/ipv6/ip6_fib.c:1233
      RSP: 0018:ffff8801d01a36a8  EFLAGS: 00010202
      RAX: 0000000000000020 RBX: ffff8801bc790e00 RCX: ffffc90002983000
      RDX: 0000000000001219 RSI: ffff8801d01a37a0 RDI: 0000000000000100
      RBP: ffff8801d01a36f0 R08: 00000000000000ff R09: 0000000000000000
      R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001
      R13: dffffc0000000000 R14: ffff8801d01a37a0 R15: 0000000000000000
      FS:  00007f6afd68c700(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000004c6340 CR3: 00000000ba41f000 CR4: 00000000001426f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Stack:
       ffff8801d01a37a8 ffff8801d01a3780 ffffed003a0346f5 0000000c82a23ea0
       ffff8800b7bd7700 ffff8801d01a3780 ffff8800b6a1c940 ffffffff82a23ea0
       ffff8801d01a3920 ffff8801d01a3748 ffffffff82a223d6 ffff8801d7d64988
      Call Trace:
       [<ffffffff82a223d6>] ip6_route_del+0x106/0x570 net/ipv6/route.c:2109
       [<ffffffff82a23f9d>] inet6_rtm_delroute+0xfd/0x100 net/ipv6/route.c:3075
       [<ffffffff82621359>] rtnetlink_rcv_msg+0x549/0x7a0 net/core/rtnetlink.c:3450
       [<ffffffff8274c1d1>] netlink_rcv_skb+0x141/0x370 net/netlink/af_netlink.c:2281
       [<ffffffff82613ddf>] rtnetlink_rcv+0x2f/0x40 net/core/rtnetlink.c:3456
       [<ffffffff8274ad38>] netlink_unicast_kernel net/netlink/af_netlink.c:1206 [inline]
       [<ffffffff8274ad38>] netlink_unicast+0x518/0x750 net/netlink/af_netlink.c:1232
       [<ffffffff8274b83e>] netlink_sendmsg+0x8ce/0xc30 net/netlink/af_netlink.c:1778
       [<ffffffff82564aff>] sock_sendmsg_nosec net/socket.c:609 [inline]
       [<ffffffff82564aff>] sock_sendmsg+0xcf/0x110 net/socket.c:619
       [<ffffffff82564d62>] sock_write_iter+0x222/0x3a0 net/socket.c:834
       [<ffffffff8178523d>] new_sync_write+0x1dd/0x2b0 fs/read_write.c:478
       [<ffffffff817853f4>] __vfs_write+0xe4/0x110 fs/read_write.c:491
       [<ffffffff81786c38>] vfs_write+0x178/0x4b0 fs/read_write.c:538
       [<ffffffff817892a9>] SYSC_write fs/read_write.c:585 [inline]
       [<ffffffff817892a9>] SyS_write+0xd9/0x1b0 fs/read_write.c:577
       [<ffffffff82c71e32>] entry_SYSCALL_64_fastpath+0x12/0x17
      
      Note: there is no "Fixes" tag as this seems to be a bug introduced
      very early.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      348a4002
  4. 19 8月, 2017 3 次提交
    • W
      ipv6: reset fn->rr_ptr when replacing route · 383143f3
      Wei Wang 提交于
      syzcaller reported the following use-after-free issue in rt6_select():
      BUG: KASAN: use-after-free in rt6_select net/ipv6/route.c:755 [inline] at addr ffff8800bc6994e8
      BUG: KASAN: use-after-free in ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084 at addr ffff8800bc6994e8
      Read of size 4 by task syz-executor1/439628
      CPU: 0 PID: 439628 Comm: syz-executor1 Not tainted 4.3.5+ #8
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
       0000000000000000 ffff88018fe435b0 ffffffff81ca384d ffff8801d3588c00
       ffff8800bc699380 ffff8800bc699500 dffffc0000000000 ffff8801d40a47c0
       ffff88018fe435d8 ffffffff81735751 ffff88018fe43660 ffff8800bc699380
      Call Trace:
       [<ffffffff81ca384d>] __dump_stack lib/dump_stack.c:15 [inline]
       [<ffffffff81ca384d>] dump_stack+0xc1/0x124 lib/dump_stack.c:51
      sctp: [Deprecated]: syz-executor0 (pid 439615) Use of struct sctp_assoc_value in delayed_ack socket option.
      Use struct sctp_sack_info instead
       [<ffffffff81735751>] kasan_object_err+0x21/0x70 mm/kasan/report.c:158
       [<ffffffff817359c4>] print_address_description mm/kasan/report.c:196 [inline]
       [<ffffffff817359c4>] kasan_report_error+0x1b4/0x4a0 mm/kasan/report.c:285
       [<ffffffff81735d93>] kasan_report mm/kasan/report.c:305 [inline]
       [<ffffffff81735d93>] __asan_report_load4_noabort+0x43/0x50 mm/kasan/report.c:325
       [<ffffffff82a28e39>] rt6_select net/ipv6/route.c:755 [inline]
       [<ffffffff82a28e39>] ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084
       [<ffffffff82a28fb1>] ip6_pol_route_output+0x81/0xb0 net/ipv6/route.c:1203
       [<ffffffff82ab0a50>] fib6_rule_action+0x1f0/0x680 net/ipv6/fib6_rules.c:95
       [<ffffffff8265cbb6>] fib_rules_lookup+0x2a6/0x7a0 net/core/fib_rules.c:223
       [<ffffffff82ab1430>] fib6_rule_lookup+0xd0/0x250 net/ipv6/fib6_rules.c:41
       [<ffffffff82a22006>] ip6_route_output+0x1d6/0x2c0 net/ipv6/route.c:1224
       [<ffffffff829e83d2>] ip6_dst_lookup_tail+0x4d2/0x890 net/ipv6/ip6_output.c:943
       [<ffffffff829e889a>] ip6_dst_lookup_flow+0x9a/0x250 net/ipv6/ip6_output.c:1079
       [<ffffffff82a9f7d8>] ip6_datagram_dst_update+0x538/0xd40 net/ipv6/datagram.c:91
       [<ffffffff82aa0978>] __ip6_datagram_connect net/ipv6/datagram.c:251 [inline]
       [<ffffffff82aa0978>] ip6_datagram_connect+0x518/0xe50 net/ipv6/datagram.c:272
       [<ffffffff82aa1313>] ip6_datagram_connect_v6_only+0x63/0x90 net/ipv6/datagram.c:284
       [<ffffffff8292f790>] inet_dgram_connect+0x170/0x1f0 net/ipv4/af_inet.c:564
       [<ffffffff82565547>] SYSC_connect+0x1a7/0x2f0 net/socket.c:1582
       [<ffffffff8256a649>] SyS_connect+0x29/0x30 net/socket.c:1563
       [<ffffffff82c72032>] entry_SYSCALL_64_fastpath+0x12/0x17
      Object at ffff8800bc699380, in cache ip6_dst_cache size: 384
      
      The root cause of it is that in fib6_add_rt2node(), when it replaces an
      existing route with the new one, it does not update fn->rr_ptr.
      This commit resets fn->rr_ptr to NULL when it points to a route which is
      replaced in fib6_add_rt2node().
      
      Fixes: 27596472 ("ipv6: fix ECMP route replacement")
      Signed-off-by: NWei Wang <weiwan@google.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      383143f3
    • M
      datagram: When peeking datagrams with offset < 0 don't skip empty skbs · a0917e0b
      Matthew Dawson 提交于
      Due to commit e6afc8ac ("udp: remove
      headers from UDP packets before queueing"), when udp packets are being
      peeked the requested extra offset is always 0 as there is no need to skip
      the udp header.  However, when the offset is 0 and the next skb is
      of length 0, it is only returned once.  The behaviour can be seen with
      the following python script:
      
      from socket import *;
      f=socket(AF_INET6, SOCK_DGRAM | SOCK_NONBLOCK, 0);
      g=socket(AF_INET6, SOCK_DGRAM | SOCK_NONBLOCK, 0);
      f.bind(('::', 0));
      addr=('::1', f.getsockname()[1]);
      g.sendto(b'', addr)
      g.sendto(b'b', addr)
      print(f.recvfrom(10, MSG_PEEK));
      print(f.recvfrom(10, MSG_PEEK));
      
      Where the expected output should be the empty string twice.
      
      Instead, make sk_peek_offset return negative values, and pass those values
      to __skb_try_recv_datagram/__skb_try_recv_from_queue.  If the passed offset
      to __skb_try_recv_from_queue is negative, the checked skb is never skipped.
      __skb_try_recv_from_queue will then ensure the offset is reset back to 0
      if a peek is requested without an offset, unless no packets are found.
      
      Also simplify the if condition in __skb_try_recv_from_queue.  If _off is
      greater then 0, and off is greater then or equal to skb->len, then
      (_off || skb->len) must always be true assuming skb->len >= 0 is always
      true.
      
      Also remove a redundant check around a call to sk_peek_offset in af_unix.c,
      as it double checked if MSG_PEEK was set in the flags.
      
      V2:
       - Moved the negative fixup into __skb_try_recv_from_queue, and remove now
      redundant checks
       - Fix peeking in udp{,v6}_recvmsg to report the right value when the
      offset is 0
      
      V3:
       - Marked new branch in __skb_try_recv_from_queue as unlikely.
      Signed-off-by: NMatthew Dawson <matthew@mjdsystems.ca>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a0917e0b
    • A
      ipv6: fix false-postive maybe-uninitialized warning · 401481e0
      Arnd Bergmann 提交于
      Adding a lock around one of the assignments prevents gcc from
      tracking the state of the local 'fibmatch' variable, so it can no
      longer prove that 'dst' is always initialized, leading to a bogus
      warning:
      
      net/ipv6/route.c: In function 'inet6_rtm_getroute':
      net/ipv6/route.c:3659:2: error: 'dst' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      
      This moves the other assignment into the same lock to shut up the
      warning.
      
      Fixes: 121622db ("ipv6: route: make rtm_getroute not assume rtnl is locked")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      401481e0
  5. 17 8月, 2017 1 次提交
  6. 16 8月, 2017 4 次提交
    • F
      e3a22b7f
    • F
      ipv6: route: make rtm_getroute not assume rtnl is locked · 121622db
      Florian Westphal 提交于
      __dev_get_by_index assumes RTNL is held, use _rcu version instead.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      121622db
    • E
      ipv6: fix NULL dereference in ip6_route_dev_notify() · 12d94a80
      Eric Dumazet 提交于
      Based on a syzkaller report [1], I found that a per cpu allocation
      failure in snmp6_alloc_dev() would then lead to NULL dereference in
      ip6_route_dev_notify().
      
      It seems this is a very old bug, thus no Fixes tag in this submission.
      
      Let's add in6_dev_put_clear() helper, as we will probably use
      it elsewhere (once available/present in net-next)
      
      [1]
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 1 PID: 17294 Comm: syz-executor6 Not tainted 4.13.0-rc2+ #10
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff88019f456680 task.stack: ffff8801c6e58000
      RIP: 0010:__read_once_size include/linux/compiler.h:250 [inline]
      RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline]
      RIP: 0010:refcount_sub_and_test+0x7d/0x1b0 lib/refcount.c:178
      RSP: 0018:ffff8801c6e5f1b0 EFLAGS: 00010202
      RAX: 0000000000000037 RBX: dffffc0000000000 RCX: ffffc90005d25000
      RDX: ffff8801c6e5f218 RSI: ffffffff82342bbf RDI: 0000000000000001
      RBP: ffff8801c6e5f240 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 1ffff10038dcbe37
      R13: 0000000000000006 R14: 0000000000000001 R15: 00000000000001b8
      FS:  00007f21e0429700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001ddbc22000 CR3: 00000001d632b000 CR4: 00000000001426e0
      DR0: 0000000020000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
      Call Trace:
       refcount_dec_and_test+0x1a/0x20 lib/refcount.c:211
       in6_dev_put include/net/addrconf.h:335 [inline]
       ip6_route_dev_notify+0x1c9/0x4a0 net/ipv6/route.c:3732
       notifier_call_chain+0x136/0x2c0 kernel/notifier.c:93
       __raw_notifier_call_chain kernel/notifier.c:394 [inline]
       raw_notifier_call_chain+0x2d/0x40 kernel/notifier.c:401
       call_netdevice_notifiers_info+0x51/0x90 net/core/dev.c:1678
       call_netdevice_notifiers net/core/dev.c:1694 [inline]
       rollback_registered_many+0x91c/0xe80 net/core/dev.c:7107
       rollback_registered+0x1be/0x3c0 net/core/dev.c:7149
       register_netdevice+0xbcd/0xee0 net/core/dev.c:7587
       register_netdev+0x1a/0x30 net/core/dev.c:7669
       loopback_net_init+0x76/0x160 drivers/net/loopback.c:214
       ops_init+0x10a/0x570 net/core/net_namespace.c:118
       setup_net+0x313/0x710 net/core/net_namespace.c:294
       copy_net_ns+0x27c/0x580 net/core/net_namespace.c:418
       create_new_namespaces+0x425/0x880 kernel/nsproxy.c:107
       unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:206
       SYSC_unshare kernel/fork.c:2347 [inline]
       SyS_unshare+0x653/0xfa0 kernel/fork.c:2297
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x4512c9
      RSP: 002b:00007f21e0428c08 EFLAGS: 00000216 ORIG_RAX: 0000000000000110
      RAX: ffffffffffffffda RBX: 0000000000718150 RCX: 00000000004512c9
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000062020200
      RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004b973d
      R13: 00000000ffffffff R14: 000000002001d000 R15: 00000000000002dd
      Code: 50 2b 34 82 c7 00 f1 f1 f1 f1 c7 40 04 04 f2 f2 f2 c7 40 08 f3 f3
      f3 f3 e8 a1 43 39 ff 4c 89 f8 48 8b 95 70 ff ff ff 48 c1 e8 03 <0f> b6
      0c 18 4c 89 f8 83 e0 07 83 c0 03 38 c8 7c 08 84 c9 0f 85
      RIP: __read_once_size include/linux/compiler.h:250 [inline] RSP:
      ffff8801c6e5f1b0
      RIP: atomic_read arch/x86/include/asm/atomic.h:26 [inline] RSP:
      ffff8801c6e5f1b0
      RIP: refcount_sub_and_test+0x7d/0x1b0 lib/refcount.c:178 RSP:
      ffff8801c6e5f1b0
      ---[ end trace e441d046c6410d31 ]---
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      12d94a80
    • I
      ipv6: fib: Provide offload indication using nexthop flags · fe400799
      Ido Schimmel 提交于
      IPv6 routes currently lack nexthop flags as in IPv4. This has several
      implications.
      
      In the forwarding path, it requires us to check the carrier state of the
      nexthop device and potentially ignore a linkdown route, instead of
      checking for RTNH_F_LINKDOWN.
      
      It also requires capable drivers to use the user facing IPv6-specific
      route flags to provide offload indication, instead of using the nexthop
      flags as in IPv4.
      
      Add nexthop flags to IPv6 routes in the 40 bytes hole and use it to
      provide offload indication instead of the RTF_OFFLOAD flag, which is
      removed while it's still not part of any official kernel release.
      
      In the near future we would like to use the field for the
      RTNH_F_{LINKDOWN,DEAD} flags, but this change is more involved and might
      not be ready in time for the current cycle.
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fe400799
  7. 15 8月, 2017 2 次提交
  8. 11 8月, 2017 2 次提交
    • L
      net: xfrm: support setting an output mark. · 077fbac4
      Lorenzo Colitti 提交于
      On systems that use mark-based routing it may be necessary for
      routing lookups to use marks in order for packets to be routed
      correctly. An example of such a system is Android, which uses
      socket marks to route packets via different networks.
      
      Currently, routing lookups in tunnel mode always use a mark of
      zero, making routing incorrect on such systems.
      
      This patch adds a new output_mark element to the xfrm state and
      a corresponding XFRMA_OUTPUT_MARK netlink attribute. The output
      mark differs from the existing xfrm mark in two ways:
      
      1. The xfrm mark is used to match xfrm policies and states, while
         the xfrm output mark is used to set the mark (and influence
         the routing) of the packets emitted by those states.
      2. The existing mark is constrained to be a subset of the bits of
         the originating socket or transformed packet, but the output
         mark is arbitrary and depends only on the state.
      
      The use of a separate mark provides additional flexibility. For
      example:
      
      - A packet subject to two transforms (e.g., transport mode inside
        tunnel mode) can have two different output marks applied to it,
        one for the transport mode SA and one for the tunnel mode SA.
      - On a system where socket marks determine routing, the packets
        emitted by an IPsec tunnel can be routed based on a mark that
        is determined by the tunnel, not by the marks of the
        unencrypted packets.
      - Support for setting the output marks can be introduced without
        breaking any existing setups that employ both mark-based
        routing and xfrm tunnel mode. Simply changing the code to use
        the xfrm mark for routing output packets could xfrm mark could
        change behaviour in a way that breaks these setups.
      
      If the output mark is unspecified or set to zero, the mark is not
      set or changed.
      
      Tested: make allyesconfig; make -j64
      Tested: https://android-review.googlesource.com/452776Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      077fbac4
    • W
      udp: consistently apply ufo or fragmentation · 85f1bd9a
      Willem de Bruijn 提交于
      When iteratively building a UDP datagram with MSG_MORE and that
      datagram exceeds MTU, consistently choose UFO or fragmentation.
      
      Once skb_is_gso, always apply ufo. Conversely, once a datagram is
      split across multiple skbs, do not consider ufo.
      
      Sendpage already maintains the first invariant, only add the second.
      IPv6 does not have a sendpage implementation to modify.
      
      A gso skb must have a partial checksum, do not follow sk_no_check_tx
      in udp_send_skb.
      
      Found by syzkaller.
      
      Fixes: e89e9cf5 ("[IPv4/IPv6]: UFO Scatter-gather approach")
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      85f1bd9a
  9. 10 8月, 2017 2 次提交
  10. 09 8月, 2017 2 次提交
    • V
      net: ipv6: avoid overhead when no custom FIB rules are installed · feca7d8c
      Vincent Bernat 提交于
      If the user hasn't installed any custom rules, don't go through the
      whole FIB rules layer. This is pretty similar to f4530fa5 (ipv4:
      Avoid overhead when no custom FIB rules are installed).
      
      Using a micro-benchmark module [1], timing ip6_route_output() with
      get_cycles(), with 40,000 routes in the main routing table, before this
      patch:
      
          min=606 max=12911 count=627 average=1959 95th=4903 90th=3747 50th=1602 mad=821
          table=254 avgdepth=21.8 maxdepth=39
          value │                         ┊                            count
            600 │▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒                                         199
            880 │▒▒▒░░░░░░░░░░░░░░░░                                      43
           1160 │▒▒▒░░░░░░░░░░░░░░░░░░░░                                  48
           1440 │▒▒▒░░░░░░░░░░░░░░░░░░░░░░░                               43
           1720 │▒▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░                          59
           2000 │▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                      50
           2280 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                    26
           2560 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                  31
           2840 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░               28
           3120 │▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░              17
           3400 │▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░             17
           3680 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░             8
           3960 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░           11
           4240 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░            6
           4520 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░           6
           4800 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░           9
      
      After:
      
          min=544 max=11687 count=627 average=1776 95th=4546 90th=3585 50th=1227 mad=565
          table=254 avgdepth=21.8 maxdepth=39
          value │                         ┊                            count
            540 │▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒                                        201
            800 │▒▒▒▒▒░░░░░░░░░░░░░░░░                                    63
           1060 │▒▒▒▒▒░░░░░░░░░░░░░░░░░░░░░                               68
           1320 │▒▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░                            39
           1580 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                         32
           1840 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                       32
           2100 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                    34
           2360 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░                 33
           2620 │▒▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░               26
           2880 │▒░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░              22
           3140 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░              9
           3400 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░             8
           3660 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░             9
           3920 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░            8
           4180 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░           8
           4440 │░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░           8
      
      At the frequency of the host during the bench (~ 3.7 GHz), this is
      about a 100 ns difference on the median value.
      
      A next step would be to collapse local and main tables, as in
      0ddcf43d (ipv4: FIB Local/MAIN table collapse).
      
      [1]: https://github.com/vincentbernat/network-lab/blob/master/lab-routes-ipv6/kbench_mod.cSigned-off-by: NVincent Bernat <vincent@bernat.im>
      Reviewed-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      feca7d8c
    • W
      net: avoid skb_warn_bad_offload false positives on UFO · 8d63bee6
      Willem de Bruijn 提交于
      skb_warn_bad_offload triggers a warning when an skb enters the GSO
      stack at __skb_gso_segment that does not have CHECKSUM_PARTIAL
      checksum offload set.
      
      Commit b2504a5d ("net: reduce skb_warn_bad_offload() noise")
      observed that SKB_GSO_DODGY producers can trigger the check and
      that passing those packets through the GSO handlers will fix it
      up. But, the software UFO handler will set ip_summed to
      CHECKSUM_NONE.
      
      When __skb_gso_segment is called from the receive path, this
      triggers the warning again.
      
      Make UFO set CHECKSUM_UNNECESSARY instead of CHECKSUM_NONE. On
      Tx these two are equivalent. On Rx, this better matches the
      skb state (checksum computed), as CHECKSUM_NONE here means no
      checksum computed.
      
      See also this thread for context:
      http://patchwork.ozlabs.org/patch/799015/
      
      Fixes: b2504a5d ("net: reduce skb_warn_bad_offload() noise")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8d63bee6
  11. 08 8月, 2017 8 次提交
  12. 04 8月, 2017 12 次提交
  13. 02 8月, 2017 1 次提交