1. 10 10月, 2020 1 次提交
  2. 22 9月, 2020 5 次提交
    • O
      net/packet: fix overflow in tpacket_rcv · e962d472
      Or Cohen 提交于
      mainline inclusion
      from mainline-v5.9-rc4
      commit acf69c94
      category: bugfix
      bugzilla: NA
      CVE: CVE-2020-14386
      
      --------------------------------
      
      Using tp_reserve to calculate netoff can overflow as
      tp_reserve is unsigned int and netoff is unsigned short.
      
      This may lead to macoff receving a smaller value then
      sizeof(struct virtio_net_hdr), and if po->has_vnet_hdr
      is set, an out-of-bounds write will occur when
      calling virtio_net_hdr_from_skb.
      
      The bug is fixed by converting netoff to unsigned int
      and checking if it exceeds USHRT_MAX.
      
      This addresses CVE-2020-14386
      
      Fixes: 8913336a ("packet: add PACKET_RESERVE sockopt")
      Signed-off-by: NOr Cohen <orcohen@paloaltonetworks.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      Reviewed-by: NYue Haibing <yuehaibing@huawei.com>
      Reviewed-by: NJason Yan <yanaijie@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      e962d472
    • E
      net/packet: make tp_drops atomic · 30493af2
      Eric Dumazet 提交于
      mainline inclusion
      from mainline-v5.3-rc1
      commit 8e8e2951
      category: bugfix
      bugzilla: NA
      CVE: CVE-2020-14386
      
      Prepare for fixing CVE-2020-14386.
      --------------------------------
      
      Under DDOS, we want to be able to increment tp_drops without
      touching the spinlock. This will help readers to drain
      the receive queue slightly faster :/
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      Reviewed-by: NYue Haibing <yuehaibing@huawei.com>
      Reviewed-by: NJason Yan <yanaijie@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      30493af2
    • W
      net/packet: tpacket_rcv: avoid a producer race condition · be6d55c8
      Willem de Bruijn 提交于
      stable inclusion
      from linux-4.19.114
      commit 6fb0e4385928900ccb8697748555b3f54bba5193
      
      --------------------------------
      
      [ Upstream commit 61fad681 ]
      
      PACKET_RX_RING can cause multiple writers to access the same slot if a
      fast writer wraps the ring while a slow writer is still copying. This
      is particularly likely with few, large, slots (e.g., GSO packets).
      
      Synchronize kernel thread ownership of rx ring slots with a bitmap.
      
      Writers acquire a slot race-free by testing tp_status TP_STATUS_KERNEL
      while holding the sk receive queue lock. They release this lock before
      copying and set tp_status to TP_STATUS_USER to release to userspace
      when done. During copying, another writer may take the lock, also see
      TP_STATUS_KERNEL, and start writing to the same slot.
      
      Introduce a new rx_owner_map bitmap with a bit per slot. To acquire a
      slot, test and set with the lock held. To release race-free, update
      tp_status and owner bit as a transaction, so take the lock again.
      
      This is the one of a variety of discussed options (see Link below):
      
      * instead of a shadow ring, embed the data in the slot itself, such as
      in tp_padding. But any test for this field may match a value left by
      userspace, causing deadlock.
      
      * avoid the lock on release. This leaves a small race if releasing the
      shadow slot before setting TP_STATUS_USER. The below reproducer showed
      that this race is not academic. If releasing the slot after tp_status,
      the race is more subtle. See the first link for details.
      
      * add a new tp_status TP_KERNEL_OWNED to avoid the transactional store
      of two fields. But, legacy applications may interpret all non-zero
      tp_status as owned by the user. As libpcap does. So this is possible
      only opt-in by newer processes. It can be added as an optional mode.
      
      * embed the struct at the tail of pg_vec to avoid extra allocation.
      The implementation proved no less complex than a separate field.
      
      The additional locking cost on release adds contention, no different
      than scaling on multicore or multiqueue h/w. In practice, below
      reproducer nor small packet tcpdump showed a noticeable change in
      perf report in cycles spent in spinlock. Where contention is
      problematic, packet sockets support mitigation through PACKET_FANOUT.
      And we can consider adding opt-in state TP_KERNEL_OWNED.
      
      Easy to reproduce by running multiple netperf or similar TCP_STREAM
      flows concurrently with `tcpdump -B 129 -n greater 60000`.
      
      Based on an earlier patchset by Jon Rosen. See links below.
      
      I believe this issue goes back to the introduction of tpacket_rcv,
      which predates git history.
      
      Link: https://www.mail-archive.com/netdev@vger.kernel.org/msg237222.htmlSuggested-by: NJon Rosen <jrosen@cisco.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NJon Rosen <jrosen@cisco.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: NLi Aichun <liaichun@huawei.com>
      Reviewed-by: Nguodeqing <geffrey.guo@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      be6d55c8
    • W
      net/packet: tpacket_rcv: do not increment ring index on drop · d3fe424a
      Willem de Bruijn 提交于
      stable inclusion
      from linux-4.19.111
      commit 64fabf9bcadfb93fb21fb023a72a2ad8b7a40047
      
      --------------------------------
      
      [ Upstream commit 46e4c421 ]
      
      In one error case, tpacket_rcv drops packets after incrementing the
      ring producer index.
      
      If this happens, it does not update tp_status to TP_STATUS_USER and
      thus the reader is stalled for an iteration of the ring, causing out
      of order arrival.
      
      The only such error path is when virtio_net_hdr_from_skb fails due
      to encountering an unknown GSO type.
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: NLi Aichun <liaichun@huawei.com>
      Reviewed-by: Nguodeqing <geffrey.guo@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      d3fe424a
    • E
      packet: fix data-race in fanout_flow_is_huge() · 94ff8a53
      Eric Dumazet 提交于
      stable inclusion
      from linux-4.19.99
      commit ab6c0f501d2cecf8c74cf8293fca9c2b780dd55b
      
      --------------------------------
      
      [ Upstream commit b756ad92 ]
      
      KCSAN reported the following data-race [1]
      
      Adding a couple of READ_ONCE()/WRITE_ONCE() should silence it.
      
      Since the report hinted about multiple cpus using the history
      concurrently, I added a test avoiding writing on it if the
      victim slot already contains the desired value.
      
      [1]
      
      BUG: KCSAN: data-race in fanout_demux_rollover / fanout_demux_rollover
      
      read to 0xffff8880b01786cc of 4 bytes by task 18921 on cpu 1:
       fanout_flow_is_huge net/packet/af_packet.c:1303 [inline]
       fanout_demux_rollover+0x33e/0x3f0 net/packet/af_packet.c:1353
       packet_rcv_fanout+0x34e/0x490 net/packet/af_packet.c:1453
       deliver_skb net/core/dev.c:1888 [inline]
       dev_queue_xmit_nit+0x15b/0x540 net/core/dev.c:1958
       xmit_one net/core/dev.c:3195 [inline]
       dev_hard_start_xmit+0x3f5/0x430 net/core/dev.c:3215
       __dev_queue_xmit+0x14ab/0x1b40 net/core/dev.c:3792
       dev_queue_xmit+0x21/0x30 net/core/dev.c:3825
       neigh_direct_output+0x1f/0x30 net/core/neighbour.c:1530
       neigh_output include/net/neighbour.h:511 [inline]
       ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116
       __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
       __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
       ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
       dst_output include/net/dst.h:436 [inline]
       ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179
       ip6_send_skb+0x53/0x110 net/ipv6/ip6_output.c:1795
       udp_v6_send_skb.isra.0+0x3ec/0xa70 net/ipv6/udp.c:1173
       udpv6_sendmsg+0x1906/0x1c20 net/ipv6/udp.c:1471
       inet6_sendmsg+0x6d/0x90 net/ipv6/af_inet6.c:576
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg+0x9f/0xc0 net/socket.c:657
       ___sys_sendmsg+0x2b7/0x5d0 net/socket.c:2311
       __sys_sendmmsg+0x123/0x350 net/socket.c:2413
       __do_sys_sendmmsg net/socket.c:2442 [inline]
       __se_sys_sendmmsg net/socket.c:2439 [inline]
       __x64_sys_sendmmsg+0x64/0x80 net/socket.c:2439
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      write to 0xffff8880b01786cc of 4 bytes by task 18922 on cpu 0:
       fanout_flow_is_huge net/packet/af_packet.c:1306 [inline]
       fanout_demux_rollover+0x3a4/0x3f0 net/packet/af_packet.c:1353
       packet_rcv_fanout+0x34e/0x490 net/packet/af_packet.c:1453
       deliver_skb net/core/dev.c:1888 [inline]
       dev_queue_xmit_nit+0x15b/0x540 net/core/dev.c:1958
       xmit_one net/core/dev.c:3195 [inline]
       dev_hard_start_xmit+0x3f5/0x430 net/core/dev.c:3215
       __dev_queue_xmit+0x14ab/0x1b40 net/core/dev.c:3792
       dev_queue_xmit+0x21/0x30 net/core/dev.c:3825
       neigh_direct_output+0x1f/0x30 net/core/neighbour.c:1530
       neigh_output include/net/neighbour.h:511 [inline]
       ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116
       __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
       __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
       ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
       dst_output include/net/dst.h:436 [inline]
       ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179
       ip6_send_skb+0x53/0x110 net/ipv6/ip6_output.c:1795
       udp_v6_send_skb.isra.0+0x3ec/0xa70 net/ipv6/udp.c:1173
       udpv6_sendmsg+0x1906/0x1c20 net/ipv6/udp.c:1471
       inet6_sendmsg+0x6d/0x90 net/ipv6/af_inet6.c:576
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg+0x9f/0xc0 net/socket.c:657
       ___sys_sendmsg+0x2b7/0x5d0 net/socket.c:2311
       __sys_sendmmsg+0x123/0x350 net/socket.c:2413
       __do_sys_sendmmsg net/socket.c:2442 [inline]
       __se_sys_sendmmsg net/socket.c:2439 [inline]
       __x64_sys_sendmmsg+0x64/0x80 net/socket.c:2439
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 18922 Comm: syz-executor.3 Not tainted 5.4.0-rc6+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 3b3a5b0a ("packet: rollover huge flows before small flows")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: NLi Aichun <liaichun@huawei.com>
      Reviewed-by: Nguodeqing <geffrey.guo@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      94ff8a53
  3. 27 12月, 2019 16 次提交
    • M
      af_packet: set defaule value for tmo · e35230dd
      Mao Wenan 提交于
      mainline inclusion
      from mainline-v5.5
      commit b43d1f9f7067c6759b1051e8ecb84e82cef569fe
      category: bugfix
      bugzilla: NA
      CVE: NA
      
      -------------------------------------------------
      
      There is softlockup when using TPACKET_V3:
      ...
      NMI watchdog: BUG: soft lockup - CPU#2 stuck for 60010ms!
      (__irq_svc) from [<c0558a0c>] (_raw_spin_unlock_irqrestore+0x44/0x54)
      (_raw_spin_unlock_irqrestore) from [<c027b7e8>] (mod_timer+0x210/0x25c)
      (mod_timer) from [<c0549c30>]
      (prb_retire_rx_blk_timer_expired+0x68/0x11c)
      (prb_retire_rx_blk_timer_expired) from [<c027a7ac>]
      (call_timer_fn+0x90/0x17c)
      (call_timer_fn) from [<c027ab6c>] (run_timer_softirq+0x2d4/0x2fc)
      (run_timer_softirq) from [<c021eaf4>] (__do_softirq+0x218/0x318)
      (__do_softirq) from [<c021eea0>] (irq_exit+0x88/0xac)
      (irq_exit) from [<c0240130>] (msa_irq_exit+0x11c/0x1d4)
      (msa_irq_exit) from [<c0209cf0>] (handle_IPI+0x650/0x7f4)
      (handle_IPI) from [<c02015bc>] (gic_handle_irq+0x108/0x118)
      (gic_handle_irq) from [<c0558ee4>] (__irq_usr+0x44/0x5c)
      ...
      
      If __ethtool_get_link_ksettings() is failed in
      prb_calc_retire_blk_tmo(), msec and tmo will be zero, so tov_in_jiffies
      is zero and the timer expire for retire_blk_timer is turn to
      mod_timer(&pkc->retire_blk_timer, jiffies + 0),
      which will trigger cpu usage of softirq is 100%.
      
      Fixes: f6fb8f10 ("af-packet: TPACKET_V3 flexible buffer implementation.")
      Tested-by: NXiao Jiangfeng <xiaojiangfeng@huawei.com>
      Signed-off-by: NMao Wenan <maowenan@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      Reviewed-by: NYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      e35230dd
    • E
      net/packet: fix race in tpacket_snd() · 5ab72d34
      Eric Dumazet 提交于
      [ Upstream commit 32d3182c ]
      
      packet_sendmsg() checks tx_ring.pg_vec to decide
      if it must call tpacket_snd().
      
      Problem is that the check is lockless, meaning another thread
      can issue a concurrent setsockopt(PACKET_TX_RING ) to flip
      tx_ring.pg_vec back to NULL.
      
      Given that tpacket_snd() grabs pg_vec_lock mutex, we can
      perform the check again to solve the race.
      
      syzbot reported :
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 11429 Comm: syz-executor394 Not tainted 5.3.0-rc4+ #101
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:packet_lookup_frame+0x8d/0x270 net/packet/af_packet.c:474
      Code: c1 ee 03 f7 73 0c 80 3c 0e 00 0f 85 cb 01 00 00 48 8b 0b 89 c0 4c 8d 24 c1 48 b8 00 00 00 00 00 fc ff df 4c 89 e1 48 c1 e9 03 <80> 3c 01 00 0f 85 94 01 00 00 48 8d 7b 10 4d 8b 3c 24 48 b8 00 00
      RSP: 0018:ffff88809f82f7b8 EFLAGS: 00010246
      RAX: dffffc0000000000 RBX: ffff8880a45c7030 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 1ffff110148b8e06 RDI: ffff8880a45c703c
      RBP: ffff88809f82f7e8 R08: ffff888087aea200 R09: fffffbfff134ae50
      R10: fffffbfff134ae4f R11: ffffffff89a5727f R12: 0000000000000000
      R13: 0000000000000001 R14: ffff8880a45c6ac0 R15: 0000000000000000
      FS:  00007fa04716f700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fa04716edb8 CR3: 0000000091eb4000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       packet_current_frame net/packet/af_packet.c:487 [inline]
       tpacket_snd net/packet/af_packet.c:2667 [inline]
       packet_sendmsg+0x590/0x6250 net/packet/af_packet.c:2975
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg+0xd7/0x130 net/socket.c:657
       ___sys_sendmsg+0x3e2/0x920 net/socket.c:2311
       __sys_sendmmsg+0x1bf/0x4d0 net/socket.c:2413
       __do_sys_sendmmsg net/socket.c:2442 [inline]
       __se_sys_sendmmsg net/socket.c:2439 [inline]
       __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2439
       do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 69e3c75f ("net: TX_RING and packet mmap")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      5ab72d34
    • E
      net/packet: fix memory leak in packet_set_ring() · 9593dc8a
      Eric Dumazet 提交于
      [ Upstream commit 55655e3d ]
      
      syzbot found we can leak memory in packet_set_ring(), if user application
      provides buggy parameters.
      
      Fixes: 7f953ab2 ("af_packet: TX_RING support for TPACKET_V3")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      9593dc8a
    • N
      af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET · 05f37b55
      Neil Horman 提交于
      [ Upstream commit 89ed5b51 ]
      
      When an application is run that:
      a) Sets its scheduler to be SCHED_FIFO
      and
      b) Opens a memory mapped AF_PACKET socket, and sends frames with the
      MSG_DONTWAIT flag cleared, its possible for the application to hang
      forever in the kernel.  This occurs because when waiting, the code in
      tpacket_snd calls schedule, which under normal circumstances allows
      other tasks to run, including ksoftirqd, which in some cases is
      responsible for freeing the transmitted skb (which in AF_PACKET calls a
      destructor that flips the status bit of the transmitted frame back to
      available, allowing the transmitting task to complete).
      
      However, when the calling application is SCHED_FIFO, its priority is
      such that the schedule call immediately places the task back on the cpu,
      preventing ksoftirqd from freeing the skb, which in turn prevents the
      transmitting task from detecting that the transmission is complete.
      
      We can fix this by converting the schedule call to a completion
      mechanism.  By using a completion queue, we force the calling task, when
      it detects there are no more frames to send, to schedule itself off the
      cpu until such time as the last transmitted skb is freed, allowing
      forward progress to be made.
      
      Tested by myself and the reporter, with good results
      
      Change Notes:
      
      V1->V2:
      	Enhance the sleep logic to support being interruptible and
      allowing for honoring to SK_SNDTIMEO (Willem de Bruijn)
      
      V2->V3:
      	Rearrage the point at which we wait for the completion queue, to
      avoid needing to check for ph/skb being null at the end of the loop.
      Also move the complete call to the skb destructor to avoid needing to
      modify __packet_set_status.  Also gate calling complete on
      packet_read_pending returning zero to avoid multiple calls to complete.
      (Willem de Bruijn)
      
      	Move timeo computation within loop, to re-fetch the socket
      timeout since we also use the timeo variable to record the return code
      from the wait_for_complete call (Neil Horman)
      
      V3->V4:
      	Willem has requested that the control flow be restored to the
      previous state.  Doing so lets us eliminate the need for the
      po->wait_on_complete flag variable, and lets us get rid of the
      packet_next_frame function, but introduces another complexity.
      Specifically, but using the packet pending count, we can, if an
      applications calls sendmsg multiple times with MSG_DONTWAIT set, each
      set of transmitted frames, when complete, will cause
      tpacket_destruct_skb to issue a complete call, for which there will
      never be a wait_on_completion call.  This imbalance will lead to any
      future call to wait_for_completion here to return early, when the frames
      they sent may not have completed.  To correct this, we need to re-init
      the completion queue on every call to tpacket_snd before we enter the
      loop so as to ensure we wait properly for the frames we send in this
      iteration.
      
      	Change the timeout and interrupted gotos to out_put rather than
      out_status so that we don't try to free a non-existant skb
      	Clean up some extra newlines (Willem de Bruijn)
      Reviewed-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Reported-by: NMatteo Croce <mcroce@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      05f37b55
    • W
      packet: unconditionally free po->rollover · 1037d951
      Willem de Bruijn 提交于
      [ Upstream commit afa0925c ]
      
      Rollover used to use a complex RCU mechanism for assignment, which had
      a race condition. The below patch fixed the bug and greatly simplified
      the logic.
      
      The feature depends on fanout, but the state is private to the socket.
      Fanout_release returns f only when the last member leaves and the
      fanout struct is to be freed.
      
      Destroy rollover unconditionally, regardless of fanout state.
      
      Fixes: 57f015f5 ("packet: fix crash in fanout_demux_rollover()")
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Diagnosed-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      1037d951
    • W
      packet: in recvmsg msg_name return at least sizeof sockaddr_ll · 6c566167
      Willem de Bruijn 提交于
      mainline inclusion
      from mainline-5.1
      commit b2cf86e1
      category: bugfix
      bugzilla: 14326
      CVE: NA
      
      -------------------------------------------------
      
      Packet send checks that msg_name is at least sizeof sockaddr_ll.
      Packet recv must return at least this length, so that its output
      can be passed unmodified to packet send.
      
      This ceased to be true since adding support for lladdr longer than
      sll_addr. Since, the return value uses true address length.
      
      Always return at least sizeof sockaddr_ll, even if address length
      is shorter. Zero the padding bytes.
      
      Change v1->v2: do not overwrite zeroed padding again. use copy_len.
      
      Fixes: 0fb375fb ("[AF_PACKET]: Allow for > 8 byte hardware addresses.")
      Suggested-by: NDavid Laight <David.Laight@aculab.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NZhiqiang Liu <liuzhiqiang26@huawei.com>
      Reviewed-by: NWenan Mao <maowenan@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      6c566167
    • Y
      packet: Fix error path in packet_init · 9d790132
      YueHaibing 提交于
      [ Upstream commit 36096f2f ]
      
      kernel BUG at lib/list_debug.c:47!
      invalid opcode: 0000 [#1
      CPU: 0 PID: 12914 Comm: rmmod Tainted: G        W         5.1.0+ #47
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
      RIP: 0010:__list_del_entry_valid+0x53/0x90
      Code: 48 8b 32 48 39 fe 75 35 48 8b 50 08 48 39 f2 75 40 b8 01 00 00 00 5d c3 48
      89 fe 48 89 c2 48 c7 c7 18 75 fe 82 e8 cb 34 78 ff <0f> 0b 48 89 fe 48 c7 c7 50 75 fe 82 e8 ba 34 78 ff 0f 0b 48 89 f2
      RSP: 0018:ffffc90001c2fe40 EFLAGS: 00010286
      RAX: 000000000000004e RBX: ffffffffa0184000 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: ffff888237a17788 RDI: 00000000ffffffff
      RBP: ffffc90001c2fe40 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffc90001c2fe10 R11: 0000000000000000 R12: 0000000000000000
      R13: ffffc90001c2fe50 R14: ffffffffa0184000 R15: 0000000000000000
      FS:  00007f3d83634540(0000) GS:ffff888237a00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000555c350ea818 CR3: 0000000231677000 CR4: 00000000000006f0
      Call Trace:
       unregister_pernet_operations+0x34/0x120
       unregister_pernet_subsys+0x1c/0x30
       packet_exit+0x1c/0x369 [af_packet
       __x64_sys_delete_module+0x156/0x260
       ? lockdep_hardirqs_on+0x133/0x1b0
       ? do_syscall_64+0x12/0x1f0
       do_syscall_64+0x6e/0x1f0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      When modprobe af_packet, register_pernet_subsys
      fails and does a cleanup, ops->list is set to LIST_POISON1,
      but the module init is considered to success, then while rmmod it,
      BUG() is triggered in __list_del_entry_valid which is called from
      unregister_pernet_subsys. This patch fix error handing path in
      packet_init to avoid possilbe issue if some error occur.
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      9d790132
    • W
      packet: validate msg_namelen in send directly · e734219e
      Willem de Bruijn 提交于
      [ Upstream commit 486efdc8 ]
      
      Packet sockets in datagram mode take a destination address. Verify its
      length before passing to dev_hard_header.
      
      Prior to 2.6.14-rc3, the send code ignored sll_halen. This is
      established behavior. Directly compare msg_namelen to dev->addr_len.
      
      Change v1->v2: initialize addr in all paths
      
      Fixes: 6b8d95f1 ("packet: validate address length if non-zero")
      Suggested-by: NDavid Laight <David.Laight@aculab.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      e734219e
    • M
      packets: Always register packet sk in the same order · 90e84afa
      Maxime Chevallier 提交于
      [ Upstream commit a4dc6a49 ]
      
      When using fanouts with AF_PACKET, the demux functions such as
      fanout_demux_cpu will return an index in the fanout socket array, which
      corresponds to the selected socket.
      
      The ordering of this array depends on the order the sockets were added
      to a given fanout group, so for FANOUT_CPU this means sockets are bound
      to cpus in the order they are configured, which is OK.
      
      However, when stopping then restarting the interface these sockets are
      bound to, the sockets are reassigned to the fanout group in the reverse
      order, due to the fact that they were inserted at the head of the
      interface's AF_PACKET socket list.
      
      This means that traffic that was directed to the first socket in the
      fanout group is now directed to the last one after an interface restart.
      
      In the case of FANOUT_CPU, traffic from CPU0 will be directed to the
      socket that used to receive traffic from the last CPU after an interface
      restart.
      
      This commit introduces a helper to add a socket at the tail of a list,
      then uses it to register AF_PACKET sockets.
      
      Note that this changes the order in which sockets are listed in /proc and
      with sock_diag.
      
      Fixes: dc99f600 ("packet: Add fanout support")
      Signed-off-by: NMaxime Chevallier <maxime.chevallier@bootlin.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      90e84afa
    • C
      net/packet: Set __GFP_NOWARN upon allocation in alloc_pg_vec · 2d1ee4ee
      Christoph Paasch 提交于
      [ Upstream commit 398f0132 ]
      
      Since commit fc62814d ("net/packet: fix 4gb buffer limit due to overflow check")
      one can now allocate packet ring buffers >= UINT_MAX. However, syzkaller
      found that that triggers a warning:
      
      [   21.100000] WARNING: CPU: 2 PID: 2075 at mm/page_alloc.c:4584 __alloc_pages_nod0
      [   21.101490] Modules linked in:
      [   21.101921] CPU: 2 PID: 2075 Comm: syz-executor.0 Not tainted 5.0.0 #146
      [   21.102784] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.5.1 01/01/2011
      [   21.103887] RIP: 0010:__alloc_pages_nodemask+0x2a0/0x630
      [   21.104640] Code: fe ff ff 65 48 8b 04 25 c0 de 01 00 48 05 90 0f 00 00 41 bd 01 00 00 00 48 89 44 24 48 e9 9c fe 3
      [   21.107121] RSP: 0018:ffff88805e1cf920 EFLAGS: 00010246
      [   21.107819] RAX: 0000000000000000 RBX: ffffffff85a488a0 RCX: 0000000000000000
      [   21.108753] RDX: 0000000000000000 RSI: dffffc0000000000 RDI: 0000000000000000
      [   21.109699] RBP: 1ffff1100bc39f28 R08: ffffed100bcefb67 R09: ffffed100bcefb67
      [   21.110646] R10: 0000000000000001 R11: ffffed100bcefb66 R12: 000000000000000d
      [   21.111623] R13: 0000000000000000 R14: ffff88805e77d888 R15: 000000000000000d
      [   21.112552] FS:  00007f7c7de05700(0000) GS:ffff88806d100000(0000) knlGS:0000000000000000
      [   21.113612] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   21.114405] CR2: 000000000065c000 CR3: 000000005e58e006 CR4: 00000000001606e0
      [   21.115367] Call Trace:
      [   21.115705]  ? __alloc_pages_slowpath+0x21c0/0x21c0
      [   21.116362]  alloc_pages_current+0xac/0x1e0
      [   21.116923]  kmalloc_order+0x18/0x70
      [   21.117393]  kmalloc_order_trace+0x18/0x110
      [   21.117949]  packet_set_ring+0x9d5/0x1770
      [   21.118524]  ? packet_rcv_spkt+0x440/0x440
      [   21.119094]  ? lock_downgrade+0x620/0x620
      [   21.119646]  ? __might_fault+0x177/0x1b0
      [   21.120177]  packet_setsockopt+0x981/0x2940
      [   21.120753]  ? __fget+0x2fb/0x4b0
      [   21.121209]  ? packet_release+0xab0/0xab0
      [   21.121740]  ? sock_has_perm+0x1cd/0x260
      [   21.122297]  ? selinux_secmark_relabel_packet+0xd0/0xd0
      [   21.123013]  ? __fget+0x324/0x4b0
      [   21.123451]  ? selinux_netlbl_socket_setsockopt+0x101/0x320
      [   21.124186]  ? selinux_netlbl_sock_rcv_skb+0x3a0/0x3a0
      [   21.124908]  ? __lock_acquire+0x529/0x3200
      [   21.125453]  ? selinux_socket_setsockopt+0x5d/0x70
      [   21.126075]  ? __sys_setsockopt+0x131/0x210
      [   21.126533]  ? packet_release+0xab0/0xab0
      [   21.127004]  __sys_setsockopt+0x131/0x210
      [   21.127449]  ? kernel_accept+0x2f0/0x2f0
      [   21.127911]  ? ret_from_fork+0x8/0x50
      [   21.128313]  ? do_raw_spin_lock+0x11b/0x280
      [   21.128800]  __x64_sys_setsockopt+0xba/0x150
      [   21.129271]  ? lockdep_hardirqs_on+0x37f/0x560
      [   21.129769]  do_syscall_64+0x9f/0x450
      [   21.130182]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      We should allocate with __GFP_NOWARN to handle this.
      
      Cc: Kal Conley <kal.conley@dectris.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Fixes: fc62814d ("net/packet: fix 4gb buffer limit due to overflow check")
      Signed-off-by: NChristoph Paasch <cpaasch@apple.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      2d1ee4ee
    • K
      net/packet: fix 4gb buffer limit due to overflow check · 5879a28e
      Kal Conley 提交于
      mainline inclusion
      from mainline-v5.0
      commit fc62814d
      category: bugfix
      bugzilla: 9556
      CVE: NA
      
      -------------------------------------------------
      
      When calculating rb->frames_per_block * req->tp_block_nr the result
      can overflow. Check it for overflow without limiting the total buffer
      size to UINT_MAX.
      
      This change fixes support for packet ring buffers >= UINT_MAX.
      
      Fixes: 8f8d28e4 ("net/packet: fix overflow in check for tp_frame_nr")
      Signed-off-by: NKal Conley <kal.conley@dectris.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NShangli <shangli1@huawei.com>
      Reviewed-by: NMao Wenan <maowenan@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      5879a28e
    • W
      net: add missing SOF_TIMESTAMPING_OPT_ID support · ac73ebc1
      Willem de Bruijn 提交于
      mainline inclusion
      from mainline-4.20
      commit 8f932f76
      category: bugfix
      bugzilla: 6159
      CVE: NA
      
      -------------------------------------------------
      
      SOF_TIMESTAMPING_OPT_ID is supported on TCP, UDP and RAW sockets.
      But it was missing on RAW with IPPROTO_IP, PF_PACKET and CAN.
      
      Add skb_setup_tx_timestamp that configures both tx_flags and tskey
      for these paths that do not need corking or use bytestream keys.
      
      Fixes: 09c2d251 ("net-timestamp: add key to disambiguate concurrent datagrams")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NLin Miaohe <linmiaohe@huawei.com>
      Signed-off-by: NMao Wenan <maowenan@huawei.com>
      Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      ac73ebc1
    • N
      af_packet: fix raw sockets over 6in4 tunnel · ce024ef6
      Nicolas Dichtel 提交于
      mainline inclusion
      from mainline-5.0
      commit 88a8121d
      category: bugfix
      bugzilla: 7143
      CVE: NA
      
      -------------------------------------------------
      
      Since commit cb9f1b78, scapy (which uses an AF_PACKET socket in
      SOCK_RAW mode) is unable to send a basic icmp packet over a sit tunnel:
      
      Here is a example of the setup:
      $ ip link set ntfp2 up
      $ ip addr add 10.125.0.1/24 dev ntfp2
      $ ip tunnel add tun1 mode sit ttl 64 local 10.125.0.1 remote 10.125.0.2 dev ntfp2
      $ ip addr add fd00:cafe:cafe::1/128 dev tun1
      $ ip link set dev tun1 up
      $ ip route add fd00:200::/64 dev tun1
      $ scapy
      >>> p = []
      >>> p += IPv6(src='fd00:100::1', dst='fd00:200::1')/ICMPv6EchoRequest()
      >>> send(p, count=1, inter=0.1)
      >>> quit()
      $ ip -s link ls dev tun1 | grep -A1 "TX.*errors"
          TX: bytes  packets  errors  dropped carrier collsns
          0          0        1       0       0       0
      
      The problem is that the network offset is set to the hard_header_len of the
      output device (tun1, ie 14 + 20) and in our case, because the packet is
      small (48 bytes) the pskb_inet_may_pull() fails (it tries to pull 40 bytes
      (ipv6 header) starting from the network offset).
      
      This problem is more generally related to device with variable hard header
      length. To avoid a too intrusive patch in the current release, a (ugly)
      workaround is proposed in this patch. It has to be cleaned up in net-next.
      
      Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=993675a3100b1
      Link: http://patchwork.ozlabs.org/patch/1024489/
      Fixes: cb9f1b78 ("ip: validate header length on virtual device xmit")
      CC: Willem de Bruijn <willemb@google.com>
      CC: Maxim Mikityanskiy <maximmi@mellanox.com>
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NMao Wenan <maowenan@huawei.com>
      Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      ce024ef6
    • J
      packet: Do not leak dev refcounts on error exit · 93e0a147
      Jason Gunthorpe 提交于
      [ Upstream commit d972f3dc ]
      
      'dev' is non NULL when the addr_len check triggers so it must goto a label
      that does the dev_put otherwise dev will have a leaked refcount.
      
      This bug causes the ib_ipoib module to become unloadable when using
      systemd-network as it triggers this check on InfiniBand links.
      
      Fixes: 99137b78 ("packet: validate address length")
      Reported-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      93e0a147
    • W
      packet: validate address length if non-zero · b5f9b26a
      Willem de Bruijn 提交于
      [ Upstream commit 6b8d95f1 ]
      
      Validate packet socket address length if a length is given. Zero
      length is equivalent to not setting an address.
      
      Fixes: 99137b78 ("packet: validate address length")
      Reported-by: NIdo Schimmel <idosch@idosch.org>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      b5f9b26a
    • W
      packet: validate address length · 729aad07
      Willem de Bruijn 提交于
      [ Upstream commit 99137b78 ]
      
      Packet sockets with SOCK_DGRAM may pass an address for use in
      dev_hard_header. Ensure that it is of sufficient length.
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      729aad07
  4. 06 12月, 2018 1 次提交
    • W
      packet: copy user buffers before orphan or clone · f2a67e68
      Willem de Bruijn 提交于
      [ Upstream commit 5cd8d46ea1562be80063f53c7c6a5f40224de623 ]
      
      tpacket_snd sends packets with user pages linked into skb frags. It
      notifies that pages can be reused when the skb is released by setting
      skb->destructor to tpacket_destruct_skb.
      
      This can cause data corruption if the skb is orphaned (e.g., on
      transmit through veth) or cloned (e.g., on mirror to another psock).
      
      Create a kernel-private copy of data in these cases, same as tun/tap
      zerocopy transmission. Reuse that infrastructure: mark the skb as
      SKBTX_ZEROCOPY_FRAG, which will trigger copy in skb_orphan_frags(_rx).
      
      Unlike other zerocopy packets, do not set shinfo destructor_arg to
      struct ubuf_info. tpacket_destruct_skb already uses that ptr to notify
      when the original skb is released and a timestamp is recorded. Do not
      change this timestamp behavior. The ubuf_info->callback is not needed
      anyway, as no zerocopy notification is expected.
      
      Mark destructor_arg as not-a-uarg by setting the lower bit to 1. The
      resulting value is not a valid ubuf_info pointer, nor a valid
      tpacket_snd frame address. Add skb_zcopy_.._nouarg helpers for this.
      
      The fix relies on features introduced in commit 52267790 ("sock:
      add MSG_ZEROCOPY"), so can be backported as is only to 4.14.
      
      Tested with from `./in_netns.sh ./txring_overwrite` from
      http://github.com/wdebruij/kerneltools/tests
      
      Fixes: 69e3c75f ("net: TX_RING and packet mmap")
      Reported-by: NAnand H. Krishnan <anandhkrishnan@gmail.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f2a67e68
  5. 05 10月, 2018 1 次提交
  6. 01 9月, 2018 1 次提交
  7. 14 8月, 2018 1 次提交
  8. 07 8月, 2018 1 次提交
    • W
      packet: refine ring v3 block size test to hold one frame · 4576cd46
      Willem de Bruijn 提交于
      TPACKET_V3 stores variable length frames in fixed length blocks.
      Blocks must be able to store a block header, optional private space
      and at least one minimum sized frame.
      
      Frames, even for a zero snaplen packet, store metadata headers and
      optional reserved space.
      
      In the block size bounds check, ensure that the frame of the
      chosen configuration fits. This includes sockaddr_ll and optional
      tp_reserve.
      
      Syzbot was able to construct a ring with insuffient room for the
      sockaddr_ll in the header of a zero-length frame, triggering an
      out-of-bounds write in dev_parse_header.
      
      Convert the comparison to less than, as zero is a valid snap len.
      This matches the test for minimum tp_frame_size immediately below.
      
      Fixes: f6fb8f10 ("af-packet: TPACKET_V3 flexible buffer implementation.")
      Fixes: eb73190f ("net/packet: refine check for priv area size")
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4576cd46
  9. 05 8月, 2018 1 次提交
  10. 13 7月, 2018 1 次提交
  11. 10 7月, 2018 2 次提交
  12. 07 7月, 2018 1 次提交
  13. 04 7月, 2018 1 次提交
  14. 29 6月, 2018 1 次提交
    • L
      Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43
      Linus Torvalds 提交于
      The poll() changes were not well thought out, and completely
      unexplained.  They also caused a huge performance regression, because
      "->poll()" was no longer a trivial file operation that just called down
      to the underlying file operations, but instead did at least two indirect
      calls.
      
      Indirect calls are sadly slow now with the Spectre mitigation, but the
      performance problem could at least be largely mitigated by changing the
      "->get_poll_head()" operation to just have a per-file-descriptor pointer
      to the poll head instead.  That gets rid of one of the new indirections.
      
      But that doesn't fix the new complexity that is completely unwarranted
      for the regular case.  The (undocumented) reason for the poll() changes
      was some alleged AIO poll race fixing, but we don't make the common case
      slower and more complex for some uncommon special case, so this all
      really needs way more explanations and most likely a fundamental
      redesign.
      
      [ This revert is a revert of about 30 different commits, not reverted
        individually because that would just be unnecessarily messy  - Linus ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a11e1d43
  15. 22 6月, 2018 1 次提交
    • E
      net/packet: fix use-after-free · 945d015e
      Eric Dumazet 提交于
      We should put copy_skb in receive_queue only after
      a successful call to virtio_net_hdr_from_skb().
      
      syzbot report :
      
      BUG: KASAN: use-after-free in __skb_unlink include/linux/skbuff.h:1843 [inline]
      BUG: KASAN: use-after-free in __skb_dequeue include/linux/skbuff.h:1863 [inline]
      BUG: KASAN: use-after-free in skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815
      Read of size 8 at addr ffff8801b044ecc0 by task syz-executor217/4553
      
      CPU: 0 PID: 4553 Comm: syz-executor217 Not tainted 4.18.0-rc1+ #111
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
       print_address_description+0x6c/0x20b mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
       __skb_unlink include/linux/skbuff.h:1843 [inline]
       __skb_dequeue include/linux/skbuff.h:1863 [inline]
       skb_dequeue+0x16a/0x180 net/core/skbuff.c:2815
       skb_queue_purge+0x26/0x40 net/core/skbuff.c:2852
       packet_set_ring+0x675/0x1da0 net/packet/af_packet.c:4331
       packet_release+0x630/0xd90 net/packet/af_packet.c:2991
       __sock_release+0xd7/0x260 net/socket.c:603
       sock_close+0x19/0x20 net/socket.c:1186
       __fput+0x35b/0x8b0 fs/file_table.c:209
       ____fput+0x15/0x20 fs/file_table.c:243
       task_work_run+0x1ec/0x2a0 kernel/task_work.c:113
       exit_task_work include/linux/task_work.h:22 [inline]
       do_exit+0x1b08/0x2750 kernel/exit.c:865
       do_group_exit+0x177/0x440 kernel/exit.c:968
       __do_sys_exit_group kernel/exit.c:979 [inline]
       __se_sys_exit_group kernel/exit.c:977 [inline]
       __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:977
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x4448e9
      Code: Bad RIP value.
      RSP: 002b:00007ffd5f777ca8 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004448e9
      RDX: 00000000004448e9 RSI: 000000000000fcfb RDI: 0000000000000001
      RBP: 00000000006cf018 R08: 00007ffd0000a45b R09: 0000000000000000
      R10: 00007ffd5f777e48 R11: 0000000000000202 R12: 00000000004021f0
      R13: 0000000000402280 R14: 0000000000000000 R15: 0000000000000000
      
      Allocated by task 4553:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
       kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554
       skb_clone+0x1f5/0x500 net/core/skbuff.c:1282
       tpacket_rcv+0x28f7/0x3200 net/packet/af_packet.c:2221
       deliver_skb net/core/dev.c:1925 [inline]
       deliver_ptype_list_skb net/core/dev.c:1940 [inline]
       __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611
       __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693
       netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767
       netif_receive_skb+0xbf/0x420 net/core/dev.c:4791
       tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571
       tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981
       tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009
       call_write_iter include/linux/fs.h:1795 [inline]
       new_sync_write fs/read_write.c:474 [inline]
       __vfs_write+0x6c6/0x9f0 fs/read_write.c:487
       vfs_write+0x1f8/0x560 fs/read_write.c:549
       ksys_write+0x101/0x260 fs/read_write.c:598
       __do_sys_write fs/read_write.c:610 [inline]
       __se_sys_write fs/read_write.c:607 [inline]
       __x64_sys_write+0x73/0xb0 fs/read_write.c:607
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 4553:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
       kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
       __cache_free mm/slab.c:3498 [inline]
       kmem_cache_free+0x86/0x2d0 mm/slab.c:3756
       kfree_skbmem+0x154/0x230 net/core/skbuff.c:582
       __kfree_skb net/core/skbuff.c:642 [inline]
       kfree_skb+0x1a5/0x580 net/core/skbuff.c:659
       tpacket_rcv+0x189e/0x3200 net/packet/af_packet.c:2385
       deliver_skb net/core/dev.c:1925 [inline]
       deliver_ptype_list_skb net/core/dev.c:1940 [inline]
       __netif_receive_skb_core+0x1bfb/0x3680 net/core/dev.c:4611
       __netif_receive_skb+0x2c/0x1e0 net/core/dev.c:4693
       netif_receive_skb_internal+0x12e/0x7d0 net/core/dev.c:4767
       netif_receive_skb+0xbf/0x420 net/core/dev.c:4791
       tun_rx_batched.isra.55+0x4ba/0x8c0 drivers/net/tun.c:1571
       tun_get_user+0x2af1/0x42f0 drivers/net/tun.c:1981
       tun_chr_write_iter+0xb9/0x154 drivers/net/tun.c:2009
       call_write_iter include/linux/fs.h:1795 [inline]
       new_sync_write fs/read_write.c:474 [inline]
       __vfs_write+0x6c6/0x9f0 fs/read_write.c:487
       vfs_write+0x1f8/0x560 fs/read_write.c:549
       ksys_write+0x101/0x260 fs/read_write.c:598
       __do_sys_write fs/read_write.c:610 [inline]
       __se_sys_write fs/read_write.c:607 [inline]
       __x64_sys_write+0x73/0xb0 fs/read_write.c:607
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The buggy address belongs to the object at ffff8801b044ecc0
       which belongs to the cache skbuff_head_cache of size 232
      The buggy address is located 0 bytes inside of
       232-byte region [ffff8801b044ecc0, ffff8801b044eda8)
      The buggy address belongs to the page:
      page:ffffea0006c11380 count:1 mapcount:0 mapping:ffff8801d9be96c0 index:0x0
      flags: 0x2fffc0000000100(slab)
      raw: 02fffc0000000100 ffffea0006c17988 ffff8801d9bec248 ffff8801d9be96c0
      raw: 0000000000000000 ffff8801b044e040 000000010000000c 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8801b044eb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff8801b044ec00: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
      >ffff8801b044ec80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
                                                 ^
       ffff8801b044ed00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff8801b044ed80: fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc
      
      Fixes: 58d19b19 ("packet: vnet_hdr support for tpacket_rcv")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      945d015e
  16. 13 6月, 2018 1 次提交
    • K
      treewide: Use array_size() in vzalloc() · fad953ce
      Kees Cook 提交于
      The vzalloc() function has no 2-factor argument form, so multiplication
      factors need to be wrapped in array_size(). This patch replaces cases of:
      
              vzalloc(a * b)
      
      with:
              vzalloc(array_size(a, b))
      
      as well as handling cases of:
      
              vzalloc(a * b * c)
      
      with:
      
              vzalloc(array3_size(a, b, c))
      
      This does, however, attempt to ignore constant size factors like:
      
              vzalloc(4 * 1024)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        vzalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        vzalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        vzalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
        vzalloc(
      -	sizeof(TYPE) * (COUNT_ID)
      +	array_size(COUNT_ID, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT_ID
      +	array_size(COUNT_ID, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * (COUNT_CONST)
      +	array_size(COUNT_CONST, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT_CONST
      +	array_size(COUNT_CONST, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT_ID)
      +	array_size(COUNT_ID, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT_ID
      +	array_size(COUNT_ID, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT_CONST)
      +	array_size(COUNT_CONST, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT_CONST
      +	array_size(COUNT_CONST, sizeof(THING))
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
        vzalloc(
      -	SIZE * COUNT
      +	array_size(COUNT, SIZE)
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        vzalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        vzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        vzalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        vzalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        vzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        vzalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        vzalloc(C1 * C2 * C3, ...)
      |
        vzalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants.
      @@
      expression E1, E2;
      constant C1, C2;
      @@
      
      (
        vzalloc(C1 * C2, ...)
      |
        vzalloc(
      -	E1 * E2
      +	array_size(E1, E2)
        , ...)
      )
      Signed-off-by: NKees Cook <keescook@chromium.org>
      fad953ce
  17. 08 6月, 2018 1 次提交
    • W
      net: in virtio_net_hdr only add VLAN_HLEN to csum_start if payload holds vlan · fd3a8862
      Willem de Bruijn 提交于
      Tun, tap, virtio, packet and uml vector all use struct virtio_net_hdr
      to communicate packet metadata to userspace.
      
      For skbuffs with vlan, the first two return the packet as it may have
      existed on the wire, inserting the VLAN tag in the user buffer.  Then
      virtio_net_hdr.csum_start needs to be adjusted by VLAN_HLEN bytes.
      
      Commit f09e2249 ("macvtap: restore vlan header on user read")
      added this feature to macvtap. Commit 3ce9b20f ("macvtap: Fix
      csum_start when VLAN tags are present") then fixed up csum_start.
      
      Virtio, packet and uml do not insert the vlan header in the user
      buffer.
      
      When introducing virtio_net_hdr_from_skb to deduplicate filling in
      the virtio_net_hdr, the variant from macvtap which adds VLAN_HLEN was
      applied uniformly, breaking csum offset for packets with vlan on
      virtio and packet.
      
      Make insertion of VLAN_HLEN optional. Convert the callers to pass it
      when needed.
      
      Fixes: e858fae2 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion")
      Fixes: 1276f24e ("packet: use common code for virtio_net_hdr and skb GSO conversion")
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fd3a8862
  18. 04 6月, 2018 1 次提交
    • E
      net/packet: refine check for priv area size · eb73190f
      Eric Dumazet 提交于
      syzbot was able to trick af_packet again [1]
      
      Various commits tried to address the problem in the past,
      but failed to take into account V3 header size.
      
      [1]
      
      tpacket_rcv: packet too big, clamped from 72 to 4294967224. macoff=96
      BUG: KASAN: use-after-free in prb_run_all_ft_ops net/packet/af_packet.c:1016 [inline]
      BUG: KASAN: use-after-free in prb_fill_curr_block.isra.59+0x4e5/0x5c0 net/packet/af_packet.c:1039
      Write of size 2 at addr ffff8801cb62000e by task kworker/1:2/2106
      
      CPU: 1 PID: 2106 Comm: kworker/1:2 Not tainted 4.17.0-rc7+ #77
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Workqueue: ipv6_addrconf addrconf_dad_work
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1b9/0x294 lib/dump_stack.c:113
       print_address_description+0x6c/0x20b mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
       __asan_report_store2_noabort+0x17/0x20 mm/kasan/report.c:436
       prb_run_all_ft_ops net/packet/af_packet.c:1016 [inline]
       prb_fill_curr_block.isra.59+0x4e5/0x5c0 net/packet/af_packet.c:1039
       __packet_lookup_frame_in_block net/packet/af_packet.c:1094 [inline]
       packet_current_rx_frame net/packet/af_packet.c:1117 [inline]
       tpacket_rcv+0x1866/0x3340 net/packet/af_packet.c:2282
       dev_queue_xmit_nit+0x891/0xb90 net/core/dev.c:2018
       xmit_one net/core/dev.c:3049 [inline]
       dev_hard_start_xmit+0x16b/0xc10 net/core/dev.c:3069
       __dev_queue_xmit+0x2724/0x34c0 net/core/dev.c:3584
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3617
       neigh_resolve_output+0x679/0xad0 net/core/neighbour.c:1358
       neigh_output include/net/neighbour.h:482 [inline]
       ip6_finish_output2+0xc9c/0x2810 net/ipv6/ip6_output.c:120
       ip6_finish_output+0x5fe/0xbc0 net/ipv6/ip6_output.c:154
       NF_HOOK_COND include/linux/netfilter.h:277 [inline]
       ip6_output+0x227/0x9b0 net/ipv6/ip6_output.c:171
       dst_output include/net/dst.h:444 [inline]
       NF_HOOK include/linux/netfilter.h:288 [inline]
       ndisc_send_skb+0x100d/0x1570 net/ipv6/ndisc.c:491
       ndisc_send_ns+0x3c1/0x8d0 net/ipv6/ndisc.c:633
       addrconf_dad_work+0xbef/0x1340 net/ipv6/addrconf.c:4033
       process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
       worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
       kthread+0x345/0x410 kernel/kthread.c:240
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
      
      The buggy address belongs to the page:
      page:ffffea00072d8800 count:0 mapcount:-127 mapping:0000000000000000 index:0xffff8801cb620e80
      flags: 0x2fffc0000000000()
      raw: 02fffc0000000000 0000000000000000 ffff8801cb620e80 00000000ffffff80
      raw: ffffea00072e3820 ffffea0007132d20 0000000000000002 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8801cb61ff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff8801cb61ff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      >ffff8801cb620000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                            ^
       ffff8801cb620080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff8801cb620100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      
      Fixes: 2b6867c2 ("net/packet: fix overflow in check for priv area size")
      Fixes: dc808110 ("packet: handle too big packets for PACKET_V3")
      Fixes: f6fb8f10 ("af-packet: TPACKET_V3 flexible buffer implementation.")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eb73190f
  19. 26 5月, 2018 1 次提交
  20. 25 5月, 2018 1 次提交
    • W
      packet: fix reserve calculation · 9aad13b0
      Willem de Bruijn 提交于
      Commit b84bbaf7 ("packet: in packet_snd start writing at link
      layer allocation") ensures that packet_snd always starts writing
      the link layer header in reserved headroom allocated for this
      purpose.
      
      This is needed because packets may be shorter than hard_header_len,
      in which case the space up to hard_header_len may be zeroed. But
      that necessary padding is not accounted for in skb->len.
      
      The fix, however, is buggy. It calls skb_push, which grows skb->len
      when moving skb->data back. But in this case packet length should not
      change.
      
      Instead, call skb_reserve, which moves both skb->data and skb->tail
      back, without changing length.
      
      Fixes: b84bbaf7 ("packet: in packet_snd start writing at link layer allocation")
      Reported-by: NTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9aad13b0