1. 16 11月, 2018 1 次提交
  2. 09 11月, 2018 2 次提交
  3. 03 11月, 2018 1 次提交
  4. 20 10月, 2018 1 次提交
    • D
      net: fix pskb_trim_rcsum_slow() with odd trim offset · d55bef50
      Dimitris Michailidis 提交于
      We've been getting checksum errors involving small UDP packets, usually
      59B packets with 1 extra non-zero padding byte. netdev_rx_csum_fault()
      has been complaining that HW is providing bad checksums. Turns out the
      problem is in pskb_trim_rcsum_slow(), introduced in commit 88078d98
      ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends").
      
      The source of the problem is that when the bytes we are trimming start
      at an odd address, as in the case of the 1 padding byte above,
      skb_checksum() returns a byte-swapped value. We cannot just combine this
      with skb->csum using csum_sub(). We need to use csum_block_sub() here
      that takes into account the parity of the start address and handles the
      swapping.
      
      Matches existing code in __skb_postpull_rcsum() and esp_remove_trailer().
      
      Fixes: 88078d98 ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
      Signed-off-by: NDimitris Michailidis <dmichail@google.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d55bef50
  5. 11 10月, 2018 1 次提交
    • E
      net: make skb_partial_csum_set() more robust against overflows · 52b5d6f5
      Eric Dumazet 提交于
      syzbot managed to crash in skb_checksum_help() [1] :
      
              BUG_ON(offset + sizeof(__sum16) > skb_headlen(skb));
      
      Root cause is the following check in skb_partial_csum_set()
      
      	if (unlikely(start > skb_headlen(skb)) ||
      	    unlikely((int)start + off > skb_headlen(skb) - 2))
      		return false;
      
      If skb_headlen(skb) is 1, then (skb_headlen(skb) - 2) becomes 0xffffffff
      and the check fails to detect that ((int)start + off) is off the limit,
      since the compare is unsigned.
      
      When we fix that, then the first condition (start > skb_headlen(skb))
      becomes obsolete.
      
      Then we should also check that (skb_headroom(skb) + start) wont
      overflow 16bit field.
      
      [1]
      kernel BUG at net/core/dev.c:2880!
      invalid opcode: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 7330 Comm: syz-executor4 Not tainted 4.19.0-rc6+ #253
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:skb_checksum_help+0x9e3/0xbb0 net/core/dev.c:2880
      Code: 85 00 ff ff ff 48 c1 e8 03 42 80 3c 28 00 0f 84 09 fb ff ff 48 8b bd 00 ff ff ff e8 97 a8 b9 fb e9 f8 fa ff ff e8 2d 09 76 fb <0f> 0b 48 8b bd 28 ff ff ff e8 1f a8 b9 fb e9 b1 f6 ff ff 48 89 cf
      RSP: 0018:ffff8801d83a6f60 EFLAGS: 00010293
      RAX: ffff8801b9834380 RBX: ffff8801b9f8d8c0 RCX: ffffffff8608c6d7
      RDX: 0000000000000000 RSI: ffffffff8608cc63 RDI: 0000000000000006
      RBP: ffff8801d83a7068 R08: ffff8801b9834380 R09: 0000000000000000
      R10: ffff8801d83a76d8 R11: 0000000000000000 R12: 0000000000000001
      R13: 0000000000010001 R14: 000000000000ffff R15: 00000000000000a8
      FS:  00007f1a66db5700(0000) GS:ffff8801daf00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f7d77f091b0 CR3: 00000001ba252000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       skb_csum_hwoffload_help+0x8f/0xe0 net/core/dev.c:3269
       validate_xmit_skb+0xa2a/0xf30 net/core/dev.c:3312
       __dev_queue_xmit+0xc2f/0x3950 net/core/dev.c:3797
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3838
       packet_snd net/packet/af_packet.c:2928 [inline]
       packet_sendmsg+0x422d/0x64c0 net/packet/af_packet.c:2953
      
      Fixes: 5ff8dda3 ("net: Ensure partial checksum offset is inside the skb head")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52b5d6f5
  6. 03 10月, 2018 1 次提交
  7. 08 9月, 2018 1 次提交
  8. 10 8月, 2018 1 次提交
  9. 06 8月, 2018 1 次提交
  10. 03 8月, 2018 1 次提交
  11. 22 7月, 2018 1 次提交
    • E
      net: skb_segment() should not return NULL · ff907a11
      Eric Dumazet 提交于
      syzbot caught a NULL deref [1], caused by skb_segment()
      
      skb_segment() has many "goto err;" that assume the @err variable
      contains -ENOMEM.
      
      A successful call to __skb_linearize() should not clear @err,
      otherwise a subsequent memory allocation error could return NULL.
      
      While we are at it, we might use -EINVAL instead of -ENOMEM when
      MAX_SKB_FRAGS limit is reached.
      
      [1]
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      CPU: 0 PID: 13285 Comm: syz-executor3 Not tainted 4.18.0-rc4+ #146
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:tcp_gso_segment+0x3dc/0x1780 net/ipv4/tcp_offload.c:106
      Code: f0 ff ff 0f 87 1c fd ff ff e8 00 88 0b fb 48 8b 75 d0 48 b9 00 00 00 00 00 fc ff df 48 8d be 90 00 00 00 48 89 f8 48 c1 e8 03 <0f> b6 14 08 48 8d 86 94 00 00 00 48 89 c6 83 e0 07 48 c1 ee 03 0f
      RSP: 0018:ffff88019b7fd060 EFLAGS: 00010206
      RAX: 0000000000000012 RBX: 0000000000000020 RCX: dffffc0000000000
      RDX: 0000000000040000 RSI: 0000000000000000 RDI: 0000000000000090
      RBP: ffff88019b7fd0f0 R08: ffff88019510e0c0 R09: ffffed003b5c46d6
      R10: ffffed003b5c46d6 R11: ffff8801dae236b3 R12: 0000000000000001
      R13: ffff8801d6c581f4 R14: 0000000000000000 R15: ffff8801d6c58128
      FS:  00007fcae64d6700(0000) GS:ffff8801dae00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000004e8664 CR3: 00000001b669b000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       tcp4_gso_segment+0x1c3/0x440 net/ipv4/tcp_offload.c:54
       inet_gso_segment+0x64e/0x12d0 net/ipv4/af_inet.c:1342
       inet_gso_segment+0x64e/0x12d0 net/ipv4/af_inet.c:1342
       skb_mac_gso_segment+0x3b5/0x740 net/core/dev.c:2792
       __skb_gso_segment+0x3c3/0x880 net/core/dev.c:2865
       skb_gso_segment include/linux/netdevice.h:4099 [inline]
       validate_xmit_skb+0x640/0xf30 net/core/dev.c:3104
       __dev_queue_xmit+0xc14/0x3910 net/core/dev.c:3561
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3602
       neigh_hh_output include/net/neighbour.h:473 [inline]
       neigh_output include/net/neighbour.h:481 [inline]
       ip_finish_output2+0x1063/0x1860 net/ipv4/ip_output.c:229
       ip_finish_output+0x841/0xfa0 net/ipv4/ip_output.c:317
       NF_HOOK_COND include/linux/netfilter.h:276 [inline]
       ip_output+0x223/0x880 net/ipv4/ip_output.c:405
       dst_output include/net/dst.h:444 [inline]
       ip_local_out+0xc5/0x1b0 net/ipv4/ip_output.c:124
       iptunnel_xmit+0x567/0x850 net/ipv4/ip_tunnel_core.c:91
       ip_tunnel_xmit+0x1598/0x3af1 net/ipv4/ip_tunnel.c:778
       ipip_tunnel_xmit+0x264/0x2c0 net/ipv4/ipip.c:308
       __netdev_start_xmit include/linux/netdevice.h:4148 [inline]
       netdev_start_xmit include/linux/netdevice.h:4157 [inline]
       xmit_one net/core/dev.c:3034 [inline]
       dev_hard_start_xmit+0x26c/0xc30 net/core/dev.c:3050
       __dev_queue_xmit+0x29ef/0x3910 net/core/dev.c:3569
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3602
       neigh_direct_output+0x15/0x20 net/core/neighbour.c:1403
       neigh_output include/net/neighbour.h:483 [inline]
       ip_finish_output2+0xa67/0x1860 net/ipv4/ip_output.c:229
       ip_finish_output+0x841/0xfa0 net/ipv4/ip_output.c:317
       NF_HOOK_COND include/linux/netfilter.h:276 [inline]
       ip_output+0x223/0x880 net/ipv4/ip_output.c:405
       dst_output include/net/dst.h:444 [inline]
       ip_local_out+0xc5/0x1b0 net/ipv4/ip_output.c:124
       ip_queue_xmit+0x9df/0x1f80 net/ipv4/ip_output.c:504
       tcp_transmit_skb+0x1bf9/0x3f10 net/ipv4/tcp_output.c:1168
       tcp_write_xmit+0x1641/0x5c20 net/ipv4/tcp_output.c:2363
       __tcp_push_pending_frames+0xb2/0x290 net/ipv4/tcp_output.c:2536
       tcp_push+0x638/0x8c0 net/ipv4/tcp.c:735
       tcp_sendmsg_locked+0x2ec5/0x3f00 net/ipv4/tcp.c:1410
       tcp_sendmsg+0x2f/0x50 net/ipv4/tcp.c:1447
       inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
       sock_sendmsg_nosec net/socket.c:641 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:651
       __sys_sendto+0x3d7/0x670 net/socket.c:1797
       __do_sys_sendto net/socket.c:1809 [inline]
       __se_sys_sendto net/socket.c:1805 [inline]
       __x64_sys_sendto+0xe1/0x1a0 net/socket.c:1805
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x455ab9
      Code: 1d ba fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 eb b9 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fcae64d5c68 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 00007fcae64d66d4 RCX: 0000000000455ab9
      RDX: 0000000000000001 RSI: 0000000020000200 RDI: 0000000000000013
      RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000014
      R13: 00000000004c1145 R14: 00000000004d1818 R15: 0000000000000006
      Modules linked in:
      Dumping ftrace buffer:
         (ftrace buffer empty)
      
      Fixes: ddff00d4 ("net: Move skb_has_shared_frag check out of GRE code and into segmentation")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Acked-by: NAlexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ff907a11
  12. 19 7月, 2018 1 次提交
    • S
      net: Move skb decrypted field, avoid explicity copy · a48d189e
      Stefano Brivio 提交于
      Commit 784abe24 ("net: Add decrypted field to skb")
      introduced a 'decrypted' field that is explicitly copied on skb
      copy and clone.
      
      Move it between headers_start[0] and headers_end[0], so that we
      don't need to copy it explicitly as it's copied by the memcpy()
      in __copy_skb_header().
      
      While at it, drop the assignment in __skb_clone(), it was
      already redundant.
      
      This doesn't change the size of sk_buff or cacheline boundaries.
      
      The 15-bits hole before tc_index becomes a 14-bits hole, and
      will be again a 15-bits hole when this change is merged with
      commit 8b700862 ("net: Don't copy pfmemalloc flag in
      __copy_skb_header()").
      
      v2: as reported by kbuild test robot (oops, I forgot to build
          with CONFIG_TLS_DEVICE it seems), we can't use
          CHECK_SKB_FIELD() on a bit-field member. Just drop the
          check for the moment being, perhaps we could think of some
          magic to also check bit-field members one day.
      
      Fixes: 784abe24 ("net: Add decrypted field to skb")
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a48d189e
  13. 16 7月, 2018 1 次提交
  14. 14 7月, 2018 1 次提交
  15. 13 7月, 2018 1 次提交
    • S
      net: Don't copy pfmemalloc flag in __copy_skb_header() · 8b700862
      Stefano Brivio 提交于
      The pfmemalloc flag indicates that the skb was allocated from
      the PFMEMALLOC reserves, and the flag is currently copied on skb
      copy and clone.
      
      However, an skb copied from an skb flagged with pfmemalloc
      wasn't necessarily allocated from PFMEMALLOC reserves, and on
      the other hand an skb allocated that way might be copied from an
      skb that wasn't.
      
      So we should not copy the flag on skb copy, and rather decide
      whether to allow an skb to be associated with sockets unrelated
      to page reclaim depending only on how it was allocated.
      
      Move the pfmemalloc flag before headers_start[0] using an
      existing 1-bit hole, so that __copy_skb_header() doesn't copy
      it.
      
      When cloning, we'll now take care of this flag explicitly,
      contravening to the warning comment of __skb_clone().
      
      While at it, restore the newline usage introduced by commit
      b1937227 ("net: reorganize sk_buff for faster
      __copy_skb_header()") to visually separate bytes used in
      bitfields after headers_start[0], that was gone after commit
      a9e419dc ("netfilter: merge ctinfo into nfct pointer storage
      area"), and describe the pfmemalloc flag in the kernel-doc
      structure comment.
      
      This doesn't change the size of sk_buff or cacheline boundaries,
      but consolidates the 15 bits hole before tc_index into a 2 bytes
      hole before csum, that could now be filled more easily.
      Reported-by: NPatrick Talbert <ptalbert@redhat.com>
      Fixes: c93bdd0e ("netvm: allow skb allocation to use PFMEMALLOC reserves")
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8b700862
  16. 04 7月, 2018 1 次提交
  17. 30 6月, 2018 1 次提交
    • M
      net: cleanup gfp mask in alloc_skb_with_frags · d14b56f5
      Michal Hocko 提交于
      alloc_skb_with_frags uses __GFP_NORETRY for non-sleeping allocations
      which is just a noop and a little bit confusing.
      
      __GFP_NORETRY was added by ed98df33 ("net: use __GFP_NORETRY for
      high order allocations") to prevent from the OOM killer. Yet this was
      not enough because fb05e7a8 ("net: don't wait for order-3 page
      allocation") didn't want an excessive reclaim for non-costly orders
      so it made it completely NOWAIT while it preserved __GFP_NORETRY in
      place which is now redundant.
      
      Drop the pointless __GFP_NORETRY because this function is used as
      copy&paste source for other places.
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d14b56f5
  18. 28 6月, 2018 1 次提交
    • F
      skbuff: preserve sock reference when scrubbing the skb. · 9c4c3252
      Flavio Leitner 提交于
      The sock reference is lost when scrubbing the packet and that breaks
      TSQ (TCP Small Queues) and XPS (Transmit Packet Steering) causing
      performance impacts of about 50% in a single TCP stream when crossing
      network namespaces.
      
      XPS breaks because the queue mapping stored in the socket is not
      available, so another random queue might be selected when the stack
      needs to transmit something like a TCP ACK, or TCP Retransmissions.
      That causes packet re-ordering and/or performance issues.
      
      TSQ breaks because it orphans the packet while it is still in the
      host, so packets are queued contributing to the buffer bloat problem.
      
      Preserving the sock reference fixes both issues. The socket is
      orphaned anyways in the receiving path before any relevant action
      and on TX side the netfilter checks if the reference is local before
      use it.
      Signed-off-by: NFlavio Leitner <fbl@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9c4c3252
  19. 26 6月, 2018 1 次提交
    • D
      net: Convert GRO SKB handling to list_head. · d4546c25
      David Miller 提交于
      Manage pending per-NAPI GRO packets via list_head.
      
      Return an SKB pointer from the GRO receive handlers.  When GRO receive
      handlers return non-NULL, it means that this SKB needs to be completed
      at this time and removed from the NAPI queue.
      
      Several operations are greatly simplified by this transformation,
      especially timing out the oldest SKB in the list when gro_count
      exceeds MAX_GRO_SKBS, and napi_gro_flush() which walks the queue
      in reverse order.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d4546c25
  20. 01 5月, 2018 1 次提交
  21. 27 4月, 2018 1 次提交
    • W
      udp: add udp gso · ee80d1eb
      Willem de Bruijn 提交于
      Implement generic segmentation offload support for udp datagrams. A
      follow-up patch adds support to the protocol stack to generate such
      packets.
      
      UDP GSO is not UFO. UFO fragments a single large datagram. GSO splits
      a large payload into a number of discrete UDP datagrams.
      
      The implementation adds a GSO type SKB_UDP_GSO_L4 to differentiate it
      from UFO (SKB_UDP_GSO).
      
      IPPROTO_UDPLITE is excluded, as that protocol has no gso handler
      registered.
      
      [ Export __udp_gso_segment for ipv6. -DaveM ]
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee80d1eb
  22. 20 4月, 2018 1 次提交
    • E
      net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends · 88078d98
      Eric Dumazet 提交于
      After working on IP defragmentation lately, I found that some large
      packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
      zero paddings on the last (small) fragment.
      
      While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
      to CHECKSUM_NONE, forcing a full csum validation, even if all prior
      fragments had CHECKSUM_COMPLETE set.
      
      We can instead compute the checksum of the part we are trimming,
      usually smaller than the part we keep.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      88078d98
  23. 08 4月, 2018 1 次提交
  24. 31 3月, 2018 1 次提交
    • T
      net: Fix untag for vlan packets without ethernet header · ae474573
      Toshiaki Makita 提交于
      In some situation vlan packets do not have ethernet headers. One example
      is packets from tun devices. Users can specify vlan protocol in tun_pi
      field instead of IP protocol, and skb_vlan_untag() attempts to untag such
      packets.
      
      skb_vlan_untag() (more precisely, skb_reorder_vlan_header() called by it)
      however did not expect packets without ethernet headers, so in such a case
      size argument for memmove() underflowed and triggered crash.
      
      ====
      BUG: unable to handle kernel paging request at ffff8801cccb8000
      IP: __memmove+0x24/0x1a0 arch/x86/lib/memmove_64.S:43
      PGD 9cee067 P4D 9cee067 PUD 1d9401063 PMD 1cccb7063 PTE 2810100028101
      Oops: 000b [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 1 PID: 17663 Comm: syz-executor2 Not tainted 4.16.0-rc7+ #368
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__memmove+0x24/0x1a0 arch/x86/lib/memmove_64.S:43
      RSP: 0018:ffff8801cc046e28 EFLAGS: 00010287
      RAX: ffff8801ccc244c4 RBX: fffffffffffffffe RCX: fffffffffff6c4c2
      RDX: fffffffffffffffe RSI: ffff8801cccb7ffc RDI: ffff8801cccb8000
      RBP: ffff8801cc046e48 R08: ffff8801ccc244be R09: ffffed0039984899
      R10: 0000000000000001 R11: ffffed0039984898 R12: ffff8801ccc244c4
      R13: ffff8801ccc244c0 R14: ffff8801d96b7c06 R15: ffff8801d96b7b40
      FS:  00007febd562d700(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffff8801cccb8000 CR3: 00000001ccb2f006 CR4: 00000000001606e0
      DR0: 0000000020000000 DR1: 0000000020000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600
      Call Trace:
       memmove include/linux/string.h:360 [inline]
       skb_reorder_vlan_header net/core/skbuff.c:5031 [inline]
       skb_vlan_untag+0x470/0xc40 net/core/skbuff.c:5061
       __netif_receive_skb_core+0x119c/0x3460 net/core/dev.c:4460
       __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4627
       netif_receive_skb_internal+0x10b/0x670 net/core/dev.c:4701
       netif_receive_skb+0xae/0x390 net/core/dev.c:4725
       tun_rx_batched.isra.50+0x5ee/0x870 drivers/net/tun.c:1555
       tun_get_user+0x299e/0x3c20 drivers/net/tun.c:1962
       tun_chr_write_iter+0xb9/0x160 drivers/net/tun.c:1990
       call_write_iter include/linux/fs.h:1782 [inline]
       new_sync_write fs/read_write.c:469 [inline]
       __vfs_write+0x684/0x970 fs/read_write.c:482
       vfs_write+0x189/0x510 fs/read_write.c:544
       SYSC_write fs/read_write.c:589 [inline]
       SyS_write+0xef/0x220 fs/read_write.c:581
       do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x42/0xb7
      RIP: 0033:0x454879
      RSP: 002b:00007febd562cc68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00007febd562d6d4 RCX: 0000000000454879
      RDX: 0000000000000157 RSI: 0000000020000180 RDI: 0000000000000014
      RBP: 000000000072bea0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00000000000006b0 R14: 00000000006fc120 R15: 0000000000000000
      Code: 90 90 90 90 90 90 90 48 89 f8 48 83 fa 20 0f 82 03 01 00 00 48 39 fe 7d 0f 49 89 f0 49 01 d0 49 39 f8 0f 8f 9f 00 00 00 48 89 d1 <f3> a4 c3 48 81 fa a8 02 00 00 72 05 40 38 fe 74 3b 48 83 ea 20
      RIP: __memmove+0x24/0x1a0 arch/x86/lib/memmove_64.S:43 RSP: ffff8801cc046e28
      CR2: ffff8801cccb8000
      ====
      
      We don't need to copy headers for packets which do not have preceding
      headers of vlan headers, so skip memmove() in that case.
      
      Fixes: 4bbb3e0e ("net: Fix vlan untag for bridge and vlan_dev with reorder_hdr off")
      Reported-by: NEric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae474573
  25. 26 3月, 2018 1 次提交
    • Y
      net: permit skb_segment on head_frag frag_list skb · 13acc94e
      Yonghong Song 提交于
      One of our in-house projects, bpf-based NAT, hits a kernel BUG_ON at
      function skb_segment(), line 3667. The bpf program attaches to
      clsact ingress, calls bpf_skb_change_proto to change protocol
      from ipv4 to ipv6 or from ipv6 to ipv4, and then calls bpf_redirect
      to send the changed packet out.
      
      3472 struct sk_buff *skb_segment(struct sk_buff *head_skb,
      3473                             netdev_features_t features)
      3474 {
      3475         struct sk_buff *segs = NULL;
      3476         struct sk_buff *tail = NULL;
      ...
      3665                 while (pos < offset + len) {
      3666                         if (i >= nfrags) {
      3667                                 BUG_ON(skb_headlen(list_skb));
      3668
      3669                                 i = 0;
      3670                                 nfrags = skb_shinfo(list_skb)->nr_frags;
      3671                                 frag = skb_shinfo(list_skb)->frags;
      3672                                 frag_skb = list_skb;
      ...
      
      call stack:
      ...
       #1 [ffff883ffef03558] __crash_kexec at ffffffff8110c525
       #2 [ffff883ffef03620] crash_kexec at ffffffff8110d5cc
       #3 [ffff883ffef03640] oops_end at ffffffff8101d7e7
       #4 [ffff883ffef03668] die at ffffffff8101deb2
       #5 [ffff883ffef03698] do_trap at ffffffff8101a700
       #6 [ffff883ffef036e8] do_error_trap at ffffffff8101abfe
       #7 [ffff883ffef037a0] do_invalid_op at ffffffff8101acd0
       #8 [ffff883ffef037b0] invalid_op at ffffffff81a00bab
          [exception RIP: skb_segment+3044]
          RIP: ffffffff817e4dd4  RSP: ffff883ffef03860  RFLAGS: 00010216
          RAX: 0000000000002bf6  RBX: ffff883feb7aaa00  RCX: 0000000000000011
          RDX: ffff883fb87910c0  RSI: 0000000000000011  RDI: ffff883feb7ab500
          RBP: ffff883ffef03928   R8: 0000000000002ce2   R9: 00000000000027da
          R10: 000001ea00000000  R11: 0000000000002d82  R12: ffff883f90a1ee80
          R13: ffff883fb8791120  R14: ffff883feb7abc00  R15: 0000000000002ce2
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       #9 [ffff883ffef03930] tcp_gso_segment at ffffffff818713e7
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      13acc94e
  26. 17 3月, 2018 1 次提交
  27. 16 3月, 2018 1 次提交
    • T
      net: Fix vlan untag for bridge and vlan_dev with reorder_hdr off · 4bbb3e0e
      Toshiaki Makita 提交于
      When we have a bridge with vlan_filtering on and a vlan device on top of
      it, packets would be corrupted in skb_vlan_untag() called from
      br_dev_xmit().
      
      The problem sits in skb_reorder_vlan_header() used in skb_vlan_untag(),
      which makes use of skb->mac_len. In this function mac_len is meant for
      handling rx path with vlan devices with reorder_header disabled, but in
      tx path mac_len is typically 0 and cannot be used, which is the problem
      in this case.
      
      The current code even does not properly handle rx path (skb_vlan_untag()
      called from __netif_receive_skb_core()) with reorder_header off actually.
      
      In rx path single tag case, it works as follows:
      
      - Before skb_reorder_vlan_header()
      
       mac_header                                data
         v                                        v
         +-------------------+-------------+------+----
         |        ETH        |    VLAN     | ETH  |
         |       ADDRS       | TPID | TCI  | TYPE |
         +-------------------+-------------+------+----
         <-------- mac_len --------->
                             <------------->
                              to be removed
      
      - After skb_reorder_vlan_header()
      
                  mac_header                     data
                       v                          v
                       +-------------------+------+----
                       |        ETH        | ETH  |
                       |       ADDRS       | TYPE |
                       +-------------------+------+----
                       <-------- mac_len --------->
      
      This is ok, but in rx double tag case, it corrupts packets:
      
      - Before skb_reorder_vlan_header()
      
       mac_header                                              data
         v                                                      v
         +-------------------+-------------+-------------+------+----
         |        ETH        |    VLAN     |    VLAN     | ETH  |
         |       ADDRS       | TPID | TCI  | TPID | TCI  | TYPE |
         +-------------------+-------------+-------------+------+----
         <--------------- mac_len ---------------->
                                           <------------->
                                          should be removed
                             <--------------------------->
                               actually will be removed
      
      - After skb_reorder_vlan_header()
      
                  mac_header                                   data
                       v                                        v
                                     +-------------------+------+----
                                     |        ETH        | ETH  |
                                     |       ADDRS       | TYPE |
                                     +-------------------+------+----
                       <--------------- mac_len ---------------->
      
      So, two of vlan tags are both removed while only inner one should be
      removed and mac_header (and mac_len) is broken.
      
      skb_vlan_untag() is meant for removing the vlan header at (skb->data - 2),
      so use skb->data and skb->mac_header to calculate the right offset.
      Reported-by: NBrandon Carpenter <brandon.carpenter@cypherpath.com>
      Fixes: a6e18ff1 ("vlan: Fix untag operations of stacked vlans with REORDER_HEADER off")
      Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4bbb3e0e
  28. 10 3月, 2018 1 次提交
  29. 05 3月, 2018 2 次提交
  30. 27 2月, 2018 1 次提交
  31. 17 2月, 2018 1 次提交
  32. 09 2月, 2018 1 次提交
    • K
      net: Whitelist the skbuff_head_cache "cb" field · 79a8a642
      Kees Cook 提交于
      Most callers of put_cmsg() use a "sizeof(foo)" for the length argument.
      Within put_cmsg(), a copy_to_user() call is made with a dynamic size, as a
      result of the cmsg header calculations. This means that hardened usercopy
      will examine the copy, even though it was technically a fixed size and
      should be implicitly whitelisted. All the put_cmsg() calls being built
      from values in skbuff_head_cache are coming out of the protocol-defined
      "cb" field, so whitelist this field entirely instead of creating per-use
      bounce buffers, for which there are concerns about performance.
      
      Original report was:
      
      Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLAB object 'skbuff_head_cache' (offset 64, size 16)!
      WARNING: CPU: 0 PID: 3663 at mm/usercopy.c:81 usercopy_warn+0xdb/0x100 mm/usercopy.c:76
      ...
       __check_heap_object+0x89/0xc0 mm/slab.c:4426
       check_heap_object mm/usercopy.c:236 [inline]
       __check_object_size+0x272/0x530 mm/usercopy.c:259
       check_object_size include/linux/thread_info.h:112 [inline]
       check_copy_size include/linux/thread_info.h:143 [inline]
       copy_to_user include/linux/uaccess.h:154 [inline]
       put_cmsg+0x233/0x3f0 net/core/scm.c:242
       sock_recv_errqueue+0x200/0x3e0 net/core/sock.c:2913
       packet_recvmsg+0xb2e/0x17a0 net/packet/af_packet.c:3296
       sock_recvmsg_nosec net/socket.c:803 [inline]
       sock_recvmsg+0xc9/0x110 net/socket.c:810
       ___sys_recvmsg+0x2a4/0x640 net/socket.c:2179
       __sys_recvmmsg+0x2a9/0xaf0 net/socket.c:2287
       SYSC_recvmmsg net/socket.c:2368 [inline]
       SyS_recvmmsg+0xc4/0x160 net/socket.c:2352
       entry_SYSCALL_64_fastpath+0x29/0xa0
      
      Reported-by: syzbot+e2d6cfb305e9f3911dea@syzkaller.appspotmail.com
      Fixes: 6d07d1cd ("usercopy: Restrict non-usercopy caches to size 0")
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      79a8a642
  33. 01 2月, 2018 1 次提交
  34. 29 12月, 2017 1 次提交
    • W
      skbuff: in skb_copy_ubufs unclone before releasing zerocopy · f72c4ac6
      Willem de Bruijn 提交于
      skb_copy_ubufs must unclone before it is safe to modify its
      skb_shared_info with skb_zcopy_clear.
      
      Commit b90ddd56 ("skbuff: skb_copy_ubufs must release uarg even
      without user frags") ensures that all skbs release their zerocopy
      state, even those without frags.
      
      But I forgot an edge case where such an skb arrives that is cloned.
      
      The stack does not build such packets. Vhost/tun skbs have their
      frags orphaned before cloning. TCP skbs only attach zerocopy state
      when a frag is added.
      
      But if TCP packets can be trimmed or linearized, this might occur.
      Tracing the code I found no instance so far (e.g., skb_linearize
      ends up calling skb_zcopy_clear if !skb->data_len).
      
      Still, it is non-obvious that no path exists. And it is fragile to
      rely on this.
      
      Fixes: b90ddd56 ("skbuff: skb_copy_ubufs must release uarg even without user frags")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f72c4ac6
  35. 28 12月, 2017 1 次提交
    • W
      skbuff: in skb_segment, call zerocopy functions once per nskb · bf5c25d6
      Willem de Bruijn 提交于
      This is a net-next follow-up to commit 268b7906 ("skbuff: orphan
      frags before zerocopy clone"), which fixed a bug in net, but added a
      call to skb_zerocopy_clone at each frag to do so.
      
      When segmenting skbs with user frags, either the user frags must be
      replaced with private copies and uarg released, or the uarg must have
      its refcount increased for each new skb.
      
      skb_orphan_frags does the first, except for cases that can handle
      reference counting. skb_zerocopy_clone then does the second.
      
      Call these once per nskb, instead of once per frag.
      
      That is, in the common case. With a frag list, also refresh when the
      origin skb (frag_skb) changes.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bf5c25d6
  36. 22 12月, 2017 2 次提交
  37. 16 12月, 2017 1 次提交