1. 05 12月, 2018 1 次提交
  2. 04 12月, 2018 2 次提交
    • W
      udp: elide zerocopy operation in hot path · 52900d22
      Willem de Bruijn 提交于
      With MSG_ZEROCOPY, each skb holds a reference to a struct ubuf_info.
      Release of its last reference triggers a completion notification.
      
      The TCP stack in tcp_sendmsg_locked holds an extra ref independent of
      the skbs, because it can build, send and free skbs within its loop,
      possibly reaching refcount zero and freeing the ubuf_info too soon.
      
      The UDP stack currently also takes this extra ref, but does not need
      it as all skbs are sent after return from __ip(6)_append_data.
      
      Avoid the extra refcount_inc and refcount_dec_and_test, and generally
      the sock_zerocopy_put in the common path, by passing the initial
      reference to the first skb.
      
      This approach is taken instead of initializing the refcount to 0, as
      that would generate error "refcount_t: increment on 0" on the
      next skb_zcopy_set.
      
      Changes
        v3 -> v4
          - Move skb_zcopy_set below the only kfree_skb that might cause
            a premature uarg destroy before skb_zerocopy_put_abort
            - Move the entire skb_shinfo assignment block, to keep that
              cacheline access in one place
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52900d22
    • W
      udp: msg_zerocopy · b5947e5d
      Willem de Bruijn 提交于
      Extend zerocopy to udp sockets. Allow setting sockopt SO_ZEROCOPY and
      interpret flag MSG_ZEROCOPY.
      
      This patch was previously part of the zerocopy RFC patchsets. Zerocopy
      is not effective at small MTU. With segmentation offload building
      larger datagrams, the benefit of page flipping outweights the cost of
      generating a completion notification.
      
      tools/testing/selftests/net/msg_zerocopy.sh after applying follow-on
      test patch and making skb_orphan_frags_rx same as skb_orphan_frags:
      
          ipv4 udp -t 1
          tx=191312 (11938 MB) txc=0 zc=n
          rx=191312 (11938 MB)
          ipv4 udp -z -t 1
          tx=304507 (19002 MB) txc=304507 zc=y
          rx=304507 (19002 MB)
          ok
          ipv6 udp -t 1
          tx=174485 (10888 MB) txc=0 zc=n
          rx=174485 (10888 MB)
          ipv6 udp -z -t 1
          tx=294801 (18396 MB) txc=294801 zc=y
          rx=294801 (18396 MB)
          ok
      
      Changes
        v1 -> v2
          - Fixup reverse christmas tree violation
        v2 -> v3
          - Split refcount avoidance optimization into separate patch
            - Fix refcount leak on error in fragmented case
              (thanks to Paolo Abeni for pointing this one out!)
            - Fix refcount inc on zero
            - Test sock_flag SOCK_ZEROCOPY directly in __ip_append_data.
              This is needed since commit 5cf4a853 ("tcp: really ignore
      	MSG_ZEROCOPY if no SO_ZEROCOPY") did the same for tcp.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b5947e5d
  3. 25 11月, 2018 1 次提交
    • W
      net: always initialize pagedlen · aba36930
      Willem de Bruijn 提交于
      In ip packet generation, pagedlen is initialized for each skb at the
      start of the loop in __ip(6)_append_data, before label alloc_new_skb.
      
      Depending on compiler options, code can be generated that jumps to
      this label, triggering use of an an uninitialized variable.
      
      In practice, at -O2, the generated code moves the initialization below
      the label. But the code should not rely on that for correctness.
      
      Fixes: 15e36f5b ("udp: paged allocation with gso")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aba36930
  4. 17 9月, 2018 1 次提交
  5. 11 9月, 2018 1 次提交
  6. 24 7月, 2018 1 次提交
    • P
      ip: hash fragments consistently · 3dd1c9a1
      Paolo Abeni 提交于
      The skb hash for locally generated ip[v6] fragments belonging
      to the same datagram can vary in several circumstances:
      * for connected UDP[v6] sockets, the first fragment get its hash
        via set_owner_w()/skb_set_hash_from_sk()
      * for unconnected IPv6 UDPv6 sockets, the first fragment can get
        its hash via ip6_make_flowlabel()/skb_get_hash_flowi6(), if
        auto_flowlabel is enabled
      
      For the following frags the hash is usually computed via
      skb_get_hash().
      The above can cause OoO for unconnected IPv6 UDPv6 socket: in that
      scenario the egress tx queue can be selected on a per packet basis
      via the skb hash.
      It may also fool flow-oriented schedulers to place fragments belonging
      to the same datagram in different flows.
      
      Fix the issue by copying the skb hash from the head frag into
      the others at fragmentation time.
      
      Before this commit:
      perf probe -a "dev_queue_xmit skb skb->hash skb->l4_hash:b1@0/8 skb->sw_hash:b1@1/8"
      netperf -H $IPV4 -t UDP_STREAM -l 5 -- -m 2000 -n &
      perf record -e probe:dev_queue_xmit -e probe:skb_set_owner_w -a sleep 0.1
      perf script
      probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=3713014309 l4_hash=1 sw_hash=0
      probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=0 l4_hash=0 sw_hash=0
      
      After this commit:
      probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=2171763177 l4_hash=1 sw_hash=0
      probe:dev_queue_xmit: (ffffffff8c6b1b20) hash=2171763177 l4_hash=1 sw_hash=0
      
      Fixes: b73c3d0e ("net: Save TX flow hash in sock and set in skbuf on xmit")
      Fixes: 67800f9b ("ipv6: Call skb_get_hash_flowi6 to get skb->hash in ip6_make_flowlabel")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3dd1c9a1
  7. 07 7月, 2018 3 次提交
  8. 04 7月, 2018 1 次提交
  9. 20 6月, 2018 1 次提交
  10. 04 6月, 2018 1 次提交
  11. 18 5月, 2018 1 次提交
  12. 27 4月, 2018 3 次提交
    • W
      udp: paged allocation with gso · 15e36f5b
      Willem de Bruijn 提交于
      When sending large datagrams that are later segmented, store data in
      page frags to avoid copying from linear in skb_segment.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      15e36f5b
    • W
      udp: generate gso with UDP_SEGMENT · bec1f6f6
      Willem de Bruijn 提交于
      Support generic segmentation offload for udp datagrams. Callers can
      concatenate and send at once the payload of multiple datagrams with
      the same destination.
      
      To set segment size, the caller sets socket option UDP_SEGMENT to the
      length of each discrete payload. This value must be smaller than or
      equal to the relevant MTU.
      
      A follow-up patch adds cmsg UDP_SEGMENT to specify segment size on a
      per send call basis.
      
      Total byte length may then exceed MTU. If not an exact multiple of
      segment size, the last segment will be shorter.
      
      The implementation adds a gso_size field to the udp socket, ip(v6)
      cmsg cookie and inet_cork structure to be able to set the value at
      setsockopt or cmsg time and to work with both lockless and corked
      paths.
      
      Initial benchmark numbers show UDP GSO about as expensive as TCP GSO.
      
          tcp tso
           3197 MB/s 54232 msg/s 54232 calls/s
               6,457,754,262      cycles
      
          tcp gso
           1765 MB/s 29939 msg/s 29939 calls/s
              11,203,021,806      cycles
      
          tcp without tso/gso *
            739 MB/s 12548 msg/s 12548 calls/s
              11,205,483,630      cycles
      
          udp
            876 MB/s 14873 msg/s 624666 calls/s
              11,205,777,429      cycles
      
          udp gso
           2139 MB/s 36282 msg/s 36282 calls/s
              11,204,374,561      cycles
      
         [*] after reverting commit 0a6b2a1d
             ("tcp: switch to GSO being always on")
      
      Measured total system cycles ('-a') for one core while pinning both
      the network receive path and benchmark process to that core:
      
        perf stat -a -C 12 -e cycles \
          ./udpgso_bench_tx -C 12 -4 -D "$DST" -l 4
      
      Note the reduction in calls/s with GSO. Bytes per syscall drops
      increases from 1470 to 61818.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bec1f6f6
    • W
      udp: expose inet cork to udp · 1cd7884d
      Willem de Bruijn 提交于
      UDP segmentation offload needs access to inet_cork in the udp layer.
      Pass the struct to ip(6)_make_skb instead of allocating it on the
      stack in that function itself.
      
      This patch is a noop otherwise.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1cd7884d
  13. 22 4月, 2018 2 次提交
  14. 18 4月, 2018 2 次提交
  15. 06 4月, 2018 1 次提交
    • J
      net/ipv6: Increment OUTxxx counters after netfilter hook · 71a1c915
      Jeff Barnhill 提交于
      At the end of ip6_forward(), IPSTATS_MIB_OUTFORWDATAGRAMS and
      IPSTATS_MIB_OUTOCTETS are incremented immediately before the NF_HOOK call
      for NFPROTO_IPV6 / NF_INET_FORWARD.  As a result, these counters get
      incremented regardless of whether or not the netfilter hook allows the
      packet to continue being processed.  This change increments the counters
      in ip6_forward_finish() so that it will not happen if the netfilter hook
      chooses to terminate the packet, which is similar to how IPv4 works.
      Signed-off-by: NJeff Barnhill <0xeffeff@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      71a1c915
  16. 04 4月, 2018 2 次提交
  17. 02 4月, 2018 1 次提交
  18. 26 3月, 2018 1 次提交
    • P
      ipv6: the entire IPv6 header chain must fit the first fragment · 10b8a3de
      Paolo Abeni 提交于
      While building ipv6 datagram we currently allow arbitrary large
      extheaders, even beyond pmtu size. The syzbot has found a way
      to exploit the above to trigger the following splat:
      
      kernel BUG at ./include/linux/skbuff.h:2073!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
          (ftrace buffer empty)
      Modules linked in:
      CPU: 1 PID: 4230 Comm: syzkaller672661 Not tainted 4.16.0-rc2+ #326
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:__skb_pull include/linux/skbuff.h:2073 [inline]
      RIP: 0010:__ip6_make_skb+0x1ac8/0x2190 net/ipv6/ip6_output.c:1636
      RSP: 0018:ffff8801bc18f0f0 EFLAGS: 00010293
      RAX: ffff8801b17400c0 RBX: 0000000000000738 RCX: ffffffff84f01828
      RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8801b415ac18
      RBP: ffff8801bc18f360 R08: ffff8801b4576844 R09: 0000000000000000
      R10: ffff8801bc18f380 R11: ffffed00367aee4e R12: 00000000000000d6
      R13: ffff8801b415a740 R14: dffffc0000000000 R15: ffff8801b45767c0
      FS:  0000000001535880(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000002000b000 CR3: 00000001b4123001 CR4: 00000000001606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        ip6_finish_skb include/net/ipv6.h:969 [inline]
        udp_v6_push_pending_frames+0x269/0x3b0 net/ipv6/udp.c:1073
        udpv6_sendmsg+0x2a96/0x3400 net/ipv6/udp.c:1343
        inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:764
        sock_sendmsg_nosec net/socket.c:630 [inline]
        sock_sendmsg+0xca/0x110 net/socket.c:640
        ___sys_sendmsg+0x320/0x8b0 net/socket.c:2046
        __sys_sendmmsg+0x1ee/0x620 net/socket.c:2136
        SYSC_sendmmsg net/socket.c:2167 [inline]
        SyS_sendmmsg+0x35/0x60 net/socket.c:2162
        do_syscall_64+0x280/0x940 arch/x86/entry/common.c:287
        entry_SYSCALL_64_after_hwframe+0x42/0xb7
      RIP: 0033:0x4404c9
      RSP: 002b:00007ffdce35f948 EFLAGS: 00000217 ORIG_RAX: 0000000000000133
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 00000000004404c9
      RDX: 0000000000000003 RSI: 0000000020001f00 RDI: 0000000000000003
      RBP: 00000000006cb018 R08: 00000000004002c8 R09: 00000000004002c8
      R10: 0000000020000080 R11: 0000000000000217 R12: 0000000000401df0
      R13: 0000000000401e80 R14: 0000000000000000 R15: 0000000000000000
      Code: ff e8 1d 5e b9 fc e9 15 e9 ff ff e8 13 5e b9 fc e9 44 e8 ff ff e8 29
      5e b9 fc e9 c0 e6 ff ff e8 3f f3 80 fc 0f 0b e8 38 f3 80 fc <0f> 0b 49 8d
      87 80 00 00 00 4d 8d 87 84 00 00 00 48 89 85 20 fe
      RIP: __skb_pull include/linux/skbuff.h:2073 [inline] RSP: ffff8801bc18f0f0
      RIP: __ip6_make_skb+0x1ac8/0x2190 net/ipv6/ip6_output.c:1636 RSP:
      ffff8801bc18f0f0
      
      As stated by RFC 7112 section 5:
      
         When a host fragments an IPv6 datagram, it MUST include the entire
         IPv6 Header Chain in the First Fragment.
      
      So this patch addresses the issue dropping datagrams with excessive
      extheader length. It also updates the error path to report to the
      calling socket nonnegative pmtu values.
      
      The issue apparently predates git history.
      
      v1 -> v2: cleanup error path, as per Eric's suggestion
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: syzbot+91e6f9932ff122fa4410@syzkaller.appspotmail.com
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      10b8a3de
  19. 05 3月, 2018 1 次提交
  20. 02 3月, 2018 1 次提交
  21. 24 1月, 2018 1 次提交
  22. 16 1月, 2018 2 次提交
    • E
      ipv6: ip6_make_skb() needs to clear cork.base.dst · 95ef498d
      Eric Dumazet 提交于
      In my last patch, I missed fact that cork.base.dst was not initialized
      in ip6_make_skb() :
      
      If ip6_setup_cork() returns an error, we might attempt a dst_release()
      on some random pointer.
      
      Fixes: 862c03ee ("ipv6: fix possible mem leaks in ipv6_make_skb()")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      95ef498d
    • M
      ipv6: fix udpv6 sendmsg crash caused by too small MTU · 749439bf
      Mike Maloney 提交于
      The logic in __ip6_append_data() assumes that the MTU is at least large
      enough for the headers.  A device's MTU may be adjusted after being
      added while sendmsg() is processing data, resulting in
      __ip6_append_data() seeing any MTU.  For an mtu smaller than the size of
      the fragmentation header, the math results in a negative 'maxfraglen',
      which causes problems when refragmenting any previous skb in the
      skb_write_queue, leaving it possibly malformed.
      
      Instead sendmsg returns EINVAL when the mtu is calculated to be less
      than IPV6_MIN_MTU.
      
      Found by syzkaller:
      kernel BUG at ./include/linux/skbuff.h:2064!
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 1 PID: 14216 Comm: syz-executor5 Not tainted 4.13.0-rc4+ #2
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff8801d0b68580 task.stack: ffff8801ac6b8000
      RIP: 0010:__skb_pull include/linux/skbuff.h:2064 [inline]
      RIP: 0010:__ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617
      RSP: 0018:ffff8801ac6bf570 EFLAGS: 00010216
      RAX: 0000000000010000 RBX: 0000000000000028 RCX: ffffc90003cce000
      RDX: 00000000000001b8 RSI: ffffffff839df06f RDI: ffff8801d9478ca0
      RBP: ffff8801ac6bf780 R08: ffff8801cc3f1dbc R09: 0000000000000000
      R10: ffff8801ac6bf7a0 R11: 43cb4b7b1948a9e7 R12: ffff8801cc3f1dc8
      R13: ffff8801cc3f1d40 R14: 0000000000001036 R15: dffffc0000000000
      FS:  00007f43d740c700(0000) GS:ffff8801dc100000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f7834984000 CR3: 00000001d79b9000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       ip6_finish_skb include/net/ipv6.h:911 [inline]
       udp_v6_push_pending_frames+0x255/0x390 net/ipv6/udp.c:1093
       udpv6_sendmsg+0x280d/0x31a0 net/ipv6/udp.c:1363
       inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       SYSC_sendto+0x352/0x5a0 net/socket.c:1750
       SyS_sendto+0x40/0x50 net/socket.c:1718
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x4512e9
      RSP: 002b:00007f43d740bc08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00000000007180a8 RCX: 00000000004512e9
      RDX: 000000000000002e RSI: 0000000020d08000 RDI: 0000000000000005
      RBP: 0000000000000086 R08: 00000000209c1000 R09: 000000000000001c
      R10: 0000000000040800 R11: 0000000000000216 R12: 00000000004b9c69
      R13: 00000000ffffffff R14: 0000000000000005 R15: 00000000202c2000
      Code: 9e 01 fe e9 c5 e8 ff ff e8 7f 9e 01 fe e9 4a ea ff ff 48 89 f7 e8 52 9e 01 fe e9 aa eb ff ff e8 a8 b6 cf fd 0f 0b e8 a1 b6 cf fd <0f> 0b 49 8d 45 78 4d 8d 45 7c 48 89 85 78 fe ff ff 49 8d 85 ba
      RIP: __skb_pull include/linux/skbuff.h:2064 [inline] RSP: ffff8801ac6bf570
      RIP: __ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617 RSP: ffff8801ac6bf570
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NMike Maloney <maloney@google.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      749439bf
  23. 11 1月, 2018 1 次提交
  24. 09 1月, 2018 1 次提交
  25. 27 12月, 2017 1 次提交
  26. 22 12月, 2017 1 次提交
    • S
      net: reevalulate autoflowlabel setting after sysctl setting · 513674b5
      Shaohua Li 提交于
      sysctl.ip6.auto_flowlabels is default 1. In our hosts, we set it to 2.
      If sockopt doesn't set autoflowlabel, outcome packets from the hosts are
      supposed to not include flowlabel. This is true for normal packet, but
      not for reset packet.
      
      The reason is ipv6_pinfo.autoflowlabel is set in sock creation. Later if
      we change sysctl.ip6.auto_flowlabels, the ipv6_pinfo.autoflowlabel isn't
      changed, so the sock will keep the old behavior in terms of auto
      flowlabel. Reset packet is suffering from this problem, because reset
      packet is sent from a special control socket, which is created at boot
      time. Since sysctl.ipv6.auto_flowlabels is 1 by default, the control
      socket will always have its ipv6_pinfo.autoflowlabel set, even after
      user set sysctl.ipv6.auto_flowlabels to 1, so reset packset will always
      have flowlabel. Normal sock created before sysctl setting suffers from
      the same issue. We can't even turn off autoflowlabel unless we kill all
      socks in the hosts.
      
      To fix this, if IPV6_AUTOFLOWLABEL sockopt is used, we use the
      autoflowlabel setting from user, otherwise we always call
      ip6_default_np_autolabel() which has the new settings of sysctl.
      
      Note, this changes behavior a little bit. Before commit 42240901
      (ipv6: Implement different admin modes for automatic flow labels), the
      autoflowlabel behavior of a sock isn't sticky, eg, if sysctl changes,
      existing connection will change autoflowlabel behavior. After that
      commit, autoflowlabel behavior is sticky in the whole life of the sock.
      With this patch, the behavior isn't sticky again.
      
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Tom Herbert <tom@quantonium.net>
      Signed-off-by: NShaohua Li <shli@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      513674b5
  27. 30 11月, 2017 1 次提交
    • D
      xfrm: Move dst->path into struct xfrm_dst · 0f6c480f
      David Miller 提交于
      The first member of an IPSEC route bundle chain sets it's dst->path to
      the underlying ipv4/ipv6 route that carries the bundle.
      
      Stated another way, if one were to follow the xfrm_dst->child chain of
      the bundle, the final non-NULL pointer would be the path and point to
      either an ipv4 or an ipv6 route.
      
      This is largely used to make sure that PMTU events propagate down to
      the correct ipv4 or ipv6 route.
      
      When we don't have the top of an IPSEC bundle 'dst->path == dst'.
      
      Move it down into xfrm_dst and key off of dst->xfrm.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      0f6c480f
  28. 22 10月, 2017 1 次提交
    • E
      ipv6: flowlabel: do not leave opt->tot_len with garbage · 864e2a1f
      Eric Dumazet 提交于
      When syzkaller team brought us a C repro for the crash [1] that
      had been reported many times in the past, I finally could find
      the root cause.
      
      If FlowLabel info is merged by fl6_merge_options(), we leave
      part of the opt_space storage provided by udp/raw/l2tp with random value
      in opt_space.tot_len, unless a control message was provided at sendmsg()
      time.
      
      Then ip6_setup_cork() would use this random value to perform a kzalloc()
      call. Undefined behavior and crashes.
      
      Fix is to properly set tot_len in fl6_merge_options()
      
      At the same time, we can also avoid consuming memory and cpu cycles
      to clear it, if every option is copied via a kmemdup(). This is the
      change in ip6_setup_cork().
      
      [1]
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 6613 Comm: syz-executor0 Not tainted 4.14.0-rc4+ #127
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff8801cb64a100 task.stack: ffff8801cc350000
      RIP: 0010:ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168
      RSP: 0018:ffff8801cc357550 EFLAGS: 00010203
      RAX: dffffc0000000000 RBX: ffff8801cc357748 RCX: 0000000000000010
      RDX: 0000000000000002 RSI: ffffffff842bd1d9 RDI: 0000000000000014
      RBP: ffff8801cc357620 R08: ffff8801cb17f380 R09: ffff8801cc357b10
      R10: ffff8801cb64a100 R11: 0000000000000000 R12: ffff8801cc357ab0
      R13: ffff8801cc357b10 R14: 0000000000000000 R15: ffff8801c3bbf0c0
      FS:  00007f9c5c459700(0000) GS:ffff8801db200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020324000 CR3: 00000001d1cf2000 CR4: 00000000001406f0
      DR0: 0000000020001010 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
      Call Trace:
       ip6_make_skb+0x282/0x530 net/ipv6/ip6_output.c:1729
       udpv6_sendmsg+0x2769/0x3380 net/ipv6/udp.c:1340
       inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
       sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       SYSC_sendto+0x358/0x5a0 net/socket.c:1750
       SyS_sendto+0x40/0x50 net/socket.c:1718
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x4520a9
      RSP: 002b:00007f9c5c458c08 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 00000000004520a9
      RDX: 0000000000000001 RSI: 0000000020fd1000 RDI: 0000000000000016
      RBP: 0000000000000086 R08: 0000000020e0afe4 R09: 000000000000001c
      R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004bb1ee
      R13: 00000000ffffffff R14: 0000000000000016 R15: 0000000000000029
      Code: e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 ea 0f 00 00 48 8d 79 04 48 b8 00 00 00 00 00 fc ff df 45 8b 74 24 04 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85
      RIP: ip6_setup_cork+0x274/0x15c0 net/ipv6/ip6_output.c:1168 RSP: ffff8801cc357550
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      864e2a1f
  29. 11 8月, 2017 1 次提交
  30. 26 7月, 2017 1 次提交
    • S
      ipv6: Don't increase IPSTATS_MIB_FRAGFAILS twice in ip6_fragment() · afce615a
      Stefano Brivio 提交于
      RFC 2465 defines ipv6IfStatsOutFragFails as:
      
      	"The number of IPv6 datagrams that have been discarded
      	 because they needed to be fragmented at this output
      	 interface but could not be."
      
      The existing implementation, instead, would increase the counter
      twice in case we fail to allocate room for single fragments:
      once for the fragment, once for the datagram.
      
      This didn't look intentional though. In one of the two affected
      affected failure paths, the double increase was simply a result
      of a new 'goto fail' statement, introduced to avoid a skb leak.
      The other path appears to be affected since at least 2.6.12-rc2.
      Reported-by: NSabrina Dubroca <sdubroca@redhat.com>
      Fixes: 1d325d21 ("ipv6: ip6_fragment: fix headroom tests and skb leak")
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      afce615a
  31. 18 7月, 2017 1 次提交