1. 20 11月, 2020 1 次提交
  2. 17 11月, 2020 1 次提交
    • G
      ipv6/netfilter: Discard first fragment not including all headers · 9d9e937b
      Georg Kohmann 提交于
      Packets are processed even though the first fragment don't include all
      headers through the upper layer header. This breaks TAHI IPv6 Core
      Conformance Test v6LC.1.3.6.
      
      Referring to RFC8200 SECTION 4.5: "If the first fragment does not include
      all headers through an Upper-Layer header, then that fragment should be
      discarded and an ICMP Parameter Problem, Code 3, message should be sent to
      the source of the fragment, with the Pointer field set to zero."
      
      The fragment needs to be validated the same way it is done in
      commit 2efdaaaf ("IPv6: reply ICMP error if the first fragment don't
      include all headers") for ipv6. Wrap the validation into a common function,
      ipv6_frag_thdr_truncated() to check for truncation in the upper layer
      header. This validation does not fullfill all aspects of RFC 8200,
      section 4.5, but is at the moment sufficient to pass mentioned TAHI test.
      
      In netfilter, utilize the fragment offset returned by find_prev_fhdr() to
      let ipv6_frag_thdr_truncated() start it's traverse from the fragment
      header.
      
      Return 0 to drop the fragment in the netfilter. This is the same behaviour
      as used on other protocol errors in this function, e.g. when
      nf_ct_frag6_queue() returns -EPROTO. The Fragment will later be picked up
      by ipv6_frag_rcv() in reassembly.c. ipv6_frag_rcv() will then send an
      appropriate ICMP Parameter Problem message back to the source.
      
      References commit 2efdaaaf ("IPv6: reply ICMP error if the first
      fragment don't include all headers")
      Signed-off-by: NGeorg Kohmann <geokohma@cisco.com>
      Acked-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Link: https://lore.kernel.org/r/20201111115025.28879-1-geokohma@cisco.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      9d9e937b
  3. 25 7月, 2020 2 次提交
  4. 20 7月, 2020 1 次提交
  5. 29 5月, 2020 4 次提交
  6. 21 5月, 2020 2 次提交
  7. 19 5月, 2020 1 次提交
  8. 25 4月, 2020 1 次提交
  9. 25 2月, 2020 1 次提交
  10. 25 1月, 2020 1 次提交
    • F
      mptcp: do not inherit inet proto ops · e42f1ac6
      Florian Westphal 提交于
      We need to initialise the struct ourselves, else we expose tcp-specific
      callbacks such as tcp_splice_read which will then trigger splat because
      the socket is an mptcp one:
      
      BUG: KASAN: slab-out-of-bounds in tcp_mstamp_refresh+0x80/0xa0 net/ipv4/tcp_output.c:57
      Write of size 8 at addr ffff888116aa21d0 by task syz-executor.0/5478
      
      CPU: 1 PID: 5478 Comm: syz-executor.0 Not tainted 5.5.0-rc6 #3
      Call Trace:
       tcp_mstamp_refresh+0x80/0xa0 net/ipv4/tcp_output.c:57
       tcp_rcv_space_adjust+0x72/0x7f0 net/ipv4/tcp_input.c:612
       tcp_read_sock+0x622/0x990 net/ipv4/tcp.c:1674
       tcp_splice_read+0x20b/0xb40 net/ipv4/tcp.c:791
       do_splice+0x1259/0x1560 fs/splice.c:1205
      
      To prevent build error with ipv6, add the recv/sendmsg function
      declaration to ipv6.h.  The functions are already accessible "thanks"
      to retpoline related work, but they are currently only made visible
      by socket.c specific INDIRECT_CALLABLE macros.
      Reported-by: NChristoph Paasch <cpaasch@apple.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e42f1ac6
  11. 05 12月, 2019 1 次提交
  12. 02 10月, 2019 1 次提交
  13. 27 9月, 2019 1 次提交
  14. 09 7月, 2019 1 次提交
    • W
      ipv6: elide flowlabel check if no exclusive leases exist · 59c820b2
      Willem de Bruijn 提交于
      Processes can request ipv6 flowlabels with cmsg IPV6_FLOWINFO.
      If not set, by default an autogenerated flowlabel is selected.
      
      Explicit flowlabels require a control operation per label plus a
      datapath check on every connection (every datagram if unconnected).
      This is particularly expensive on unconnected sockets multiplexing
      many flows, such as QUIC.
      
      In the common case, where no lease is exclusive, the check can be
      safely elided, as both lease request and check trivially succeed.
      Indeed, autoflowlabel does the same even with exclusive leases.
      
      Elide the check if no process has requested an exclusive lease.
      
      fl6_sock_lookup previously returns either a reference to a lease or
      NULL to denote failure. Modify to return a real error and update
      all callers. On return NULL, they can use the label and will elide
      the atomic_dec in fl6_sock_release.
      
      This is an optimization. Robust applications still have to revert to
      requesting leases if the fast path fails due to an exclusive lease.
      
      Changes RFC->v1:
        - use static_key_false_deferred to rate limit jump label operations
          - call static_key_deferred_flush to stop timers on exit
        - move decrement out of RCU context
        - defer optimization also if opt data is associated with a lease
        - updated all fp6_sock_lookup callers, not just udp
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      59c820b2
  15. 02 7月, 2019 1 次提交
  16. 04 6月, 2019 1 次提交
    • E
      net: fix use-after-free in kfree_skb_list · b7034146
      Eric Dumazet 提交于
      syzbot reported nasty use-after-free [1]
      
      Lets remove frag_list field from structs ip_fraglist_iter
      and ip6_fraglist_iter. This seens not needed anyway.
      
      [1] :
      BUG: KASAN: use-after-free in kfree_skb_list+0x5d/0x60 net/core/skbuff.c:706
      Read of size 8 at addr ffff888085a3cbc0 by task syz-executor303/8947
      
      CPU: 0 PID: 8947 Comm: syz-executor303 Not tainted 5.2.0-rc2+ #12
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
       __kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
       kasan_report+0x12/0x20 mm/kasan/common.c:614
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/generic_report.c:132
       kfree_skb_list+0x5d/0x60 net/core/skbuff.c:706
       ip6_fragment+0x1ef4/0x2680 net/ipv6/ip6_output.c:882
       __ip6_finish_output+0x577/0xaa0 net/ipv6/ip6_output.c:144
       ip6_finish_output+0x38/0x1f0 net/ipv6/ip6_output.c:156
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0x235/0x7f0 net/ipv6/ip6_output.c:179
       dst_output include/net/dst.h:433 [inline]
       ip6_local_out+0xbb/0x1b0 net/ipv6/output_core.c:179
       ip6_send_skb+0xbb/0x350 net/ipv6/ip6_output.c:1796
       ip6_push_pending_frames+0xc8/0xf0 net/ipv6/ip6_output.c:1816
       rawv6_push_pending_frames net/ipv6/raw.c:617 [inline]
       rawv6_sendmsg+0x2993/0x35e0 net/ipv6/raw.c:947
       inet_sendmsg+0x141/0x5d0 net/ipv4/af_inet.c:802
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xd7/0x130 net/socket.c:671
       ___sys_sendmsg+0x803/0x920 net/socket.c:2292
       __sys_sendmsg+0x105/0x1d0 net/socket.c:2330
       __do_sys_sendmsg net/socket.c:2339 [inline]
       __se_sys_sendmsg net/socket.c:2337 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2337
       do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x44add9
      Code: e8 7c e6 ff ff 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 1b 05 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f826f33bce8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00000000006e7a18 RCX: 000000000044add9
      RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000005
      RBP: 00000000006e7a10 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006e7a1c
      R13: 00007ffcec4f7ebf R14: 00007f826f33c9c0 R15: 20c49ba5e353f7cf
      
      Allocated by task 8947:
       save_stack+0x23/0x90 mm/kasan/common.c:71
       set_track mm/kasan/common.c:79 [inline]
       __kasan_kmalloc mm/kasan/common.c:489 [inline]
       __kasan_kmalloc.constprop.0+0xcf/0xe0 mm/kasan/common.c:462
       kasan_slab_alloc+0xf/0x20 mm/kasan/common.c:497
       slab_post_alloc_hook mm/slab.h:437 [inline]
       slab_alloc_node mm/slab.c:3269 [inline]
       kmem_cache_alloc_node+0x131/0x710 mm/slab.c:3579
       __alloc_skb+0xd5/0x5e0 net/core/skbuff.c:199
       alloc_skb include/linux/skbuff.h:1058 [inline]
       __ip6_append_data.isra.0+0x2a24/0x3640 net/ipv6/ip6_output.c:1519
       ip6_append_data+0x1e5/0x320 net/ipv6/ip6_output.c:1688
       rawv6_sendmsg+0x1467/0x35e0 net/ipv6/raw.c:940
       inet_sendmsg+0x141/0x5d0 net/ipv4/af_inet.c:802
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xd7/0x130 net/socket.c:671
       ___sys_sendmsg+0x803/0x920 net/socket.c:2292
       __sys_sendmsg+0x105/0x1d0 net/socket.c:2330
       __do_sys_sendmsg net/socket.c:2339 [inline]
       __se_sys_sendmsg net/socket.c:2337 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2337
       do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 8947:
       save_stack+0x23/0x90 mm/kasan/common.c:71
       set_track mm/kasan/common.c:79 [inline]
       __kasan_slab_free+0x102/0x150 mm/kasan/common.c:451
       kasan_slab_free+0xe/0x10 mm/kasan/common.c:459
       __cache_free mm/slab.c:3432 [inline]
       kmem_cache_free+0x86/0x260 mm/slab.c:3698
       kfree_skbmem net/core/skbuff.c:625 [inline]
       kfree_skbmem+0xc5/0x150 net/core/skbuff.c:619
       __kfree_skb net/core/skbuff.c:682 [inline]
       kfree_skb net/core/skbuff.c:699 [inline]
       kfree_skb+0xf0/0x390 net/core/skbuff.c:693
       kfree_skb_list+0x44/0x60 net/core/skbuff.c:708
       __dev_xmit_skb net/core/dev.c:3551 [inline]
       __dev_queue_xmit+0x3034/0x36b0 net/core/dev.c:3850
       dev_queue_xmit+0x18/0x20 net/core/dev.c:3914
       neigh_direct_output+0x16/0x20 net/core/neighbour.c:1532
       neigh_output include/net/neighbour.h:511 [inline]
       ip6_finish_output2+0x1034/0x2550 net/ipv6/ip6_output.c:120
       ip6_fragment+0x1ebb/0x2680 net/ipv6/ip6_output.c:863
       __ip6_finish_output+0x577/0xaa0 net/ipv6/ip6_output.c:144
       ip6_finish_output+0x38/0x1f0 net/ipv6/ip6_output.c:156
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0x235/0x7f0 net/ipv6/ip6_output.c:179
       dst_output include/net/dst.h:433 [inline]
       ip6_local_out+0xbb/0x1b0 net/ipv6/output_core.c:179
       ip6_send_skb+0xbb/0x350 net/ipv6/ip6_output.c:1796
       ip6_push_pending_frames+0xc8/0xf0 net/ipv6/ip6_output.c:1816
       rawv6_push_pending_frames net/ipv6/raw.c:617 [inline]
       rawv6_sendmsg+0x2993/0x35e0 net/ipv6/raw.c:947
       inet_sendmsg+0x141/0x5d0 net/ipv4/af_inet.c:802
       sock_sendmsg_nosec net/socket.c:652 [inline]
       sock_sendmsg+0xd7/0x130 net/socket.c:671
       ___sys_sendmsg+0x803/0x920 net/socket.c:2292
       __sys_sendmsg+0x105/0x1d0 net/socket.c:2330
       __do_sys_sendmsg net/socket.c:2339 [inline]
       __se_sys_sendmsg net/socket.c:2337 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2337
       do_syscall_64+0xfd/0x680 arch/x86/entry/common.c:301
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff888085a3cbc0
       which belongs to the cache skbuff_head_cache of size 224
      The buggy address is located 0 bytes inside of
       224-byte region [ffff888085a3cbc0, ffff888085a3cca0)
      The buggy address belongs to the page:
      page:ffffea0002168f00 refcount:1 mapcount:0 mapping:ffff88821b6f63c0 index:0x0
      flags: 0x1fffc0000000200(slab)
      raw: 01fffc0000000200 ffffea00027bbf88 ffffea0002105b88 ffff88821b6f63c0
      raw: 0000000000000000 ffff888085a3c080 000000010000000c 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff888085a3ca80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff888085a3cb00: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
      >ffff888085a3cb80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
                                                 ^
       ffff888085a3cc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff888085a3cc80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
      
      Fixes: 0feca619 ("net: ipv6: add skbuff fraglist splitter")
      Fixes: c8b17be0 ("net: ipv4: add skbuff fraglist splitter")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b7034146
  17. 31 5月, 2019 3 次提交
    • P
      net: ipv6: split skbuff into fragments transformer · 8a6a1f17
      Pablo Neira Ayuso 提交于
      This patch exposes a new API to refragment a skbuff. This allows you to
      split either a linear skbuff or to force the refragmentation of an
      existing fraglist using a different mtu. The API consists of:
      
      * ip6_frag_init(), that initializes the internal state of the transformer.
      * ip6_frag_next(), that allows you to fetch the next fragment. This function
        internally allocates the skbuff that represents the fragment, it pushes
        the IPv6 header, and it also copies the payload for each fragment.
      
      The ip6_frag_state object stores the internal state of the splitter.
      
      This code has been extracted from ip6_fragment(). Symbols are also
      exported to allow to reuse this iterator from the bridge codepath to
      build its own refragmentation routine by reusing the existing codebase.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8a6a1f17
    • P
      net: ipv6: add skbuff fraglist splitter · 0feca619
      Pablo Neira Ayuso 提交于
      This patch adds the skbuff fraglist split iterator. This API provides an
      iterator to transform the fraglist into single skbuff objects, it
      consists of:
      
      * ip6_fraglist_init(), that initializes the internal state of the
        fraglist iterator.
      * ip6_fraglist_prepare(), that restores the IPv6 header on the fragment.
      * ip6_fraglist_next(), that retrieves the fragment from the fraglist and
        updates the internal state of the iterator to point to the next
        fragment in the fraglist.
      
      The ip6_fraglist_iter object stores the internal state of the iterator.
      
      This code has been extracted from ip6_fragment(). Symbols are also
      exported to allow to reuse this iterator from the bridge codepath to
      build its own refragmentation routine by reusing the existing codebase.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0feca619
    • T
      treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152 · 2874c5fd
      Thomas Gleixner 提交于
      Based on 1 normalized pattern(s):
      
        this program is free software you can redistribute it and or modify
        it under the terms of the gnu general public license as published by
        the free software foundation either version 2 of the license or at
        your option any later version
      
      extracted by the scancode license scanner the SPDX license identifier
      
        GPL-2.0-or-later
      
      has been chosen to replace the boilerplate/reference in 3029 file(s).
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NAllison Randal <allison@lohutok.net>
      Cc: linux-spdx@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2874c5fd
  18. 08 11月, 2018 1 次提交
  19. 11 10月, 2018 1 次提交
  20. 02 8月, 2018 2 次提交
  21. 19 7月, 2018 1 次提交
  22. 18 7月, 2018 1 次提交
    • F
      ipv6: remove dependency of nf_defrag_ipv6 on ipv6 module · 70b095c8
      Florian Westphal 提交于
      IPV6=m
      DEFRAG_IPV6=m
      CONNTRACK=y yields:
      
      net/netfilter/nf_conntrack_proto.o: In function `nf_ct_netns_do_get':
      net/netfilter/nf_conntrack_proto.c:802: undefined reference to `nf_defrag_ipv6_enable'
      net/netfilter/nf_conntrack_proto.o:(.rodata+0x640): undefined reference to `nf_conntrack_l4proto_icmpv6'
      
      Setting DEFRAG_IPV6=y causes undefined references to ip6_rhash_params
      ip6_frag_init and ip6_expire_frag_queue so it would be needed to force
      IPV6=y too.
      
      This patch gets rid of the 'followup linker error' by removing
      the dependency of ipv6.ko symbols from netfilter ipv6 defrag.
      
      Shared code is placed into a header, then used from both.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      70b095c8
  23. 17 7月, 2018 1 次提交
    • H
      ipv6/mcast: init as INCLUDE when join SSM INCLUDE group · c7ea20c9
      Hangbin Liu 提交于
      This an IPv6 version patch of "ipv4/igmp: init group mode as INCLUDE when
      join source group". From RFC3810, part 6.1:
      
         If no per-interface state existed for that
         multicast address before the change (i.e., the change consisted of
         creating a new per-interface record), or if no state exists after the
         change (i.e., the change consisted of deleting a per-interface
         record), then the "non-existent" state is considered to have an
         INCLUDE filter mode and an empty source list.
      
      Which means a new multicast group should start with state IN(). Currently,
      for MLDv2 SSM JOIN_SOURCE_GROUP mode, we first call ipv6_sock_mc_join(),
      then ip6_mc_source(), which will trigger a TO_IN() message instead of
      ALLOW().
      
      The issue was exposed by commit a052517a ("net/multicast: should not
      send source list records when have filter mode change"). Before this change,
      we sent both ALLOW(A) and TO_IN(A). Now, we only send TO_IN(A).
      
      Fix it by adding a new parameter to init group mode. Also add some wrapper
      functions to avoid changing too much code.
      
      v1 -> v2:
      In the first version I only cleared the group change record. But this is not
      enough. Because when a new group join, it will init as EXCLUDE and trigger
      a filter mode change in ip/ip6_mc_add_src(), which will clear all source
      addresses sf_crcount. This will prevent early joined address sending state
      change records if multi source addressed joined at the same time.
      
      In v2 patch, I fixed it by directly initializing the mode to INCLUDE for SSM
      JOIN_SOURCE_GROUP. I also split the original patch into two separated patches
      for IPv4 and IPv6.
      
      There is also a difference between v4 and v6 version. For IPv6, when the
      interface goes down and up, we will send correct state change record with
      unspecified IPv6 address (::) with function ipv6_mc_up(). But after DAD is
      completed, we resend the change record TO_IN() in mld_send_initial_cr().
      Fix it by sending ALLOW() for INCLUDE mode in mld_send_initial_cr().
      
      Fixes: a052517a ("net/multicast: should not send source list records when have filter mode change")
      Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7ea20c9
  24. 07 7月, 2018 2 次提交
  25. 06 7月, 2018 1 次提交
  26. 05 7月, 2018 1 次提交
    • P
      ipv6: make ipv6_renew_options() interrupt/kernel safe · a9ba23d4
      Paul Moore 提交于
      At present the ipv6_renew_options_kern() function ends up calling into
      access_ok() which is problematic if done from inside an interrupt as
      access_ok() calls WARN_ON_IN_IRQ() on some (all?) architectures
      (x86-64 is affected).  Example warning/backtrace is shown below:
      
       WARNING: CPU: 1 PID: 3144 at lib/usercopy.c:11 _copy_from_user+0x85/0x90
       ...
       Call Trace:
        <IRQ>
        ipv6_renew_option+0xb2/0xf0
        ipv6_renew_options+0x26a/0x340
        ipv6_renew_options_kern+0x2c/0x40
        calipso_req_setattr+0x72/0xe0
        netlbl_req_setattr+0x126/0x1b0
        selinux_netlbl_inet_conn_request+0x80/0x100
        selinux_inet_conn_request+0x6d/0xb0
        security_inet_conn_request+0x32/0x50
        tcp_conn_request+0x35f/0xe00
        ? __lock_acquire+0x250/0x16c0
        ? selinux_socket_sock_rcv_skb+0x1ae/0x210
        ? tcp_rcv_state_process+0x289/0x106b
        tcp_rcv_state_process+0x289/0x106b
        ? tcp_v6_do_rcv+0x1a7/0x3c0
        tcp_v6_do_rcv+0x1a7/0x3c0
        tcp_v6_rcv+0xc82/0xcf0
        ip6_input_finish+0x10d/0x690
        ip6_input+0x45/0x1e0
        ? ip6_rcv_finish+0x1d0/0x1d0
        ipv6_rcv+0x32b/0x880
        ? ip6_make_skb+0x1e0/0x1e0
        __netif_receive_skb_core+0x6f2/0xdf0
        ? process_backlog+0x85/0x250
        ? process_backlog+0x85/0x250
        ? process_backlog+0xec/0x250
        process_backlog+0xec/0x250
        net_rx_action+0x153/0x480
        __do_softirq+0xd9/0x4f7
        do_softirq_own_stack+0x2a/0x40
        </IRQ>
        ...
      
      While not present in the backtrace, ipv6_renew_option() ends up calling
      access_ok() via the following chain:
      
        access_ok()
        _copy_from_user()
        copy_from_user()
        ipv6_renew_option()
      
      The fix presented in this patch is to perform the userspace copy
      earlier in the call chain such that it is only called when the option
      data is actually coming from userspace; that place is
      do_ipv6_setsockopt().  Not only does this solve the problem seen in
      the backtrace above, it also allows us to simplify the code quite a
      bit by removing ipv6_renew_options_kern() completely.  We also take
      this opportunity to cleanup ipv6_renew_options()/ipv6_renew_option()
      a small amount as well.
      
      This patch is heavily based on a rough patch by Al Viro.  I've taken
      his original patch, converted a kmemdup() call in do_ipv6_setsockopt()
      to a memdup_user() call, made better use of the e_inval jump target in
      the same function, and cleaned up the use ipv6_renew_option() by
      ipv6_renew_options().
      
      CC: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NPaul Moore <paul@paul-moore.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a9ba23d4
  27. 05 6月, 2018 3 次提交
  28. 27 4月, 2018 2 次提交
    • W
      udp: generate gso with UDP_SEGMENT · bec1f6f6
      Willem de Bruijn 提交于
      Support generic segmentation offload for udp datagrams. Callers can
      concatenate and send at once the payload of multiple datagrams with
      the same destination.
      
      To set segment size, the caller sets socket option UDP_SEGMENT to the
      length of each discrete payload. This value must be smaller than or
      equal to the relevant MTU.
      
      A follow-up patch adds cmsg UDP_SEGMENT to specify segment size on a
      per send call basis.
      
      Total byte length may then exceed MTU. If not an exact multiple of
      segment size, the last segment will be shorter.
      
      The implementation adds a gso_size field to the udp socket, ip(v6)
      cmsg cookie and inet_cork structure to be able to set the value at
      setsockopt or cmsg time and to work with both lockless and corked
      paths.
      
      Initial benchmark numbers show UDP GSO about as expensive as TCP GSO.
      
          tcp tso
           3197 MB/s 54232 msg/s 54232 calls/s
               6,457,754,262      cycles
      
          tcp gso
           1765 MB/s 29939 msg/s 29939 calls/s
              11,203,021,806      cycles
      
          tcp without tso/gso *
            739 MB/s 12548 msg/s 12548 calls/s
              11,205,483,630      cycles
      
          udp
            876 MB/s 14873 msg/s 624666 calls/s
              11,205,777,429      cycles
      
          udp gso
           2139 MB/s 36282 msg/s 36282 calls/s
              11,204,374,561      cycles
      
         [*] after reverting commit 0a6b2a1d
             ("tcp: switch to GSO being always on")
      
      Measured total system cycles ('-a') for one core while pinning both
      the network receive path and benchmark process to that core:
      
        perf stat -a -C 12 -e cycles \
          ./udpgso_bench_tx -C 12 -4 -D "$DST" -l 4
      
      Note the reduction in calls/s with GSO. Bytes per syscall drops
      increases from 1470 to 61818.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bec1f6f6
    • W
      udp: expose inet cork to udp · 1cd7884d
      Willem de Bruijn 提交于
      UDP segmentation offload needs access to inet_cork in the udp layer.
      Pass the struct to ip(6)_make_skb instead of allocating it on the
      stack in that function itself.
      
      This patch is a noop otherwise.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1cd7884d