1. 14 1月, 2022 1 次提交
  2. 12 10月, 2021 3 次提交
  3. 15 6月, 2021 1 次提交
  4. 03 6月, 2021 2 次提交
  5. 09 3月, 2021 1 次提交
    • Y
      net: fix proc_fs init handling in af_packet and tls · 7c06e2b6
      Yonatan Linik 提交于
      stable inclusion
      from stable-5.10.18
      commit 06ab1e63ec5c72255fae316c59ea9c0475eadac8
      bugzilla: 50148
      
      --------------------------------
      
      [ Upstream commit a268e0f2 ]
      
      proc_fs was used, in af_packet, without a surrounding #ifdef,
      although there is no hard dependency on proc_fs.
      That caused the initialization of the af_packet module to fail
      when CONFIG_PROC_FS=n.
      
      Specifically, proc_create_net() was used in af_packet.c,
      and when it fails, packet_net_init() returns -ENOMEM.
      It will always fail when the kernel is compiled without proc_fs,
      because, proc_create_net() for example always returns NULL.
      
      The calling order that starts in af_packet.c is as follows:
      packet_init()
      register_pernet_subsys()
      register_pernet_operations()
      __register_pernet_operations()
      ops_init()
      ops->init() (packet_net_ops.init=packet_net_init())
      proc_create_net()
      
      It worked in the past because register_pernet_subsys()'s return value
      wasn't checked before this Commit 36096f2f ("packet: Fix error path in
      packet_init.").
      It always returned an error, but was not checked before, so everything
      was working even when CONFIG_PROC_FS=n.
      
      The fix here is simply to add the necessary #ifdef.
      
      This also fixes a similar error in tls_proc.c, that was found by Jakub
      Kicinski.
      
      Fixes: d26b698d ("net/tls: add skeleton of MIB statistics")
      Fixes: 36096f2f ("packet: Fix error path in packet_init")
      Signed-off-by: NYonatan Linik <yonatanlinik@gmail.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: NChen Jun <chenjun102@huawei.com>
      Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
      7c06e2b6
  6. 24 11月, 2020 1 次提交
    • E
      net/packet: fix packet receive on L3 devices without visible hard header · d5496990
      Eyal Birger 提交于
      In the patchset merged by commit b9fcf0a0
      ("Merge branch 'support-AF_PACKET-for-layer-3-devices'") L3 devices which
      did not have header_ops were given one for the purpose of protocol parsing
      on af_packet transmit path.
      
      That change made af_packet receive path regard these devices as having a
      visible L3 header and therefore aligned incoming skb->data to point to the
      skb's mac_header. Some devices, such as ipip, xfrmi, and others, do not
      reset their mac_header prior to ingress and therefore their incoming
      packets became malformed.
      
      Ideally these devices would reset their mac headers, or af_packet would be
      able to rely on dev->hard_header_len being 0 for such cases, but it seems
      this is not the case.
      
      Fix by changing af_packet RX ll visibility criteria to include the
      existence of a '.create()' header operation, which is used when creating
      a device hard header - via dev_hard_header() - by upper layers, and does
      not exist in these L3 devices.
      
      As this predicate may be useful in other situations, add it as a common
      dev_has_header() helper in netdevice.h.
      
      Fixes: b9fcf0a0 ("Merge branch 'support-AF_PACKET-for-layer-3-devices'")
      Signed-off-by: NEyal Birger <eyal.birger@gmail.com>
      Acked-by: NJason A. Donenfeld <Jason@zx2c4.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/r/20201121062817.3178900-1-eyal.birger@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      d5496990
  7. 20 9月, 2020 1 次提交
  8. 18 9月, 2020 1 次提交
    • X
      net/packet: Fix a comment about mac_header · b79a80bd
      Xie He 提交于
      1. Change all "dev->hard_header" to "dev->header_ops"
      
      2. On receiving incoming frames when header_ops == NULL:
      
      The comment only says what is wrong, but doesn't say what is right.
      This patch changes the comment to make it clear what is right.
      
      3. On transmitting and receiving outgoing frames when header_ops == NULL:
      
      The comment explains that the LL header will be later added by the driver.
      
      However, I think it's better to simply say that the LL header is invisible
      to us. This phrasing is better from a software engineering perspective,
      because this makes it clear that what happens in the driver should be
      hidden from us and we should not care about what happens internally in the
      driver.
      
      4. On resuming the LL header (for RAW frames) when header_ops == NULL:
      
      The comment says we are "unlikely" to restore the LL header.
      
      However, we should say that we are "unable" to restore it.
      It's not possible (rather than not likely) to restore it, because:
      
      1) There is no way for us to restore because the LL header internally
      processed by the driver should be invisible to us.
      
      2) In function packet_rcv and tpacket_rcv, the code only tries to restore
      the LL header when header_ops != NULL.
      
      Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
      Signed-off-by: NXie He <xie.he.0141@gmail.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b79a80bd
  9. 15 9月, 2020 1 次提交
    • X
      net/packet: Fix a comment about hard_header_len and headroom allocation · b4c58814
      Xie He 提交于
      This comment is outdated and no longer reflects the actual implementation
      of af_packet.c.
      
      Reasons for the new comment:
      
      1.
      
      In af_packet.c, the function packet_snd first reserves a headroom of
      length (dev->hard_header_len + dev->needed_headroom).
      Then if the socket is a SOCK_DGRAM socket, it calls dev_hard_header,
      which calls dev->header_ops->create, to create the link layer header.
      If the socket is a SOCK_RAW socket, it "un-reserves" a headroom of
      length (dev->hard_header_len), and checks if the user has provided a
      header sized between (dev->min_header_len) and (dev->hard_header_len)
      (in dev_validate_header).
      This shows the developers of af_packet.c expect hard_header_len to
      be consistent with header_ops.
      
      2.
      
      In af_packet.c, the function packet_sendmsg_spkt has a FIXME comment.
      That comment states that prepending an LL header internally in a driver
      is considered a bug. I believe this bug can be fixed by setting
      hard_header_len to 0, making the internal header completely invisible
      to af_packet.c (and requesting the headroom in needed_headroom instead).
      
      3.
      
      There is a commit for a WiFi driver:
      commit 9454f7a8 ("mwifiex: set needed_headroom, not hard_header_len")
      According to the discussion about it at:
        https://patchwork.kernel.org/patch/11407493/
      The author tried to set the WiFi driver's hard_header_len to the Ethernet
      header length, and request additional header space internally needed by
      setting needed_headroom.
      This means this usage is already adopted by driver developers.
      
      Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Brian Norris <briannorris@chromium.org>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NXie He <xie.he.0141@gmail.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b4c58814
  10. 07 9月, 2020 1 次提交
  11. 05 9月, 2020 1 次提交
  12. 24 8月, 2020 1 次提交
  13. 14 8月, 2020 1 次提交
    • J
      af_packet: TPACKET_V3: fix fill status rwlock imbalance · 88fd1cb8
      John Ogness 提交于
      After @blk_fill_in_prog_lock is acquired there is an early out vnet
      situation that can occur. In that case, the rwlock needs to be
      released.
      
      Also, since @blk_fill_in_prog_lock is only acquired when @tp_version
      is exactly TPACKET_V3, only release it on that exact condition as
      well.
      
      And finally, add sparse annotation so that it is clearer that
      prb_fill_curr_block() and prb_clear_blk_fill_status() are acquiring
      and releasing @blk_fill_in_prog_lock, respectively. sparse is still
      unable to understand the balance, but the warnings are now on a
      higher level that make more sense.
      
      Fixes: 632ca50f ("af_packet: TPACKET_V3: replace busy-wait loop")
      Signed-off-by: NJohn Ogness <john.ogness@linutronix.de>
      Reported-by: Nkernel test robot <lkp@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      88fd1cb8
  14. 25 7月, 2020 2 次提交
  15. 20 7月, 2020 2 次提交
  16. 17 7月, 2020 1 次提交
  17. 02 7月, 2020 1 次提交
  18. 15 3月, 2020 1 次提交
    • W
      net/packet: tpacket_rcv: avoid a producer race condition · 61fad681
      Willem de Bruijn 提交于
      PACKET_RX_RING can cause multiple writers to access the same slot if a
      fast writer wraps the ring while a slow writer is still copying. This
      is particularly likely with few, large, slots (e.g., GSO packets).
      
      Synchronize kernel thread ownership of rx ring slots with a bitmap.
      
      Writers acquire a slot race-free by testing tp_status TP_STATUS_KERNEL
      while holding the sk receive queue lock. They release this lock before
      copying and set tp_status to TP_STATUS_USER to release to userspace
      when done. During copying, another writer may take the lock, also see
      TP_STATUS_KERNEL, and start writing to the same slot.
      
      Introduce a new rx_owner_map bitmap with a bit per slot. To acquire a
      slot, test and set with the lock held. To release race-free, update
      tp_status and owner bit as a transaction, so take the lock again.
      
      This is the one of a variety of discussed options (see Link below):
      
      * instead of a shadow ring, embed the data in the slot itself, such as
      in tp_padding. But any test for this field may match a value left by
      userspace, causing deadlock.
      
      * avoid the lock on release. This leaves a small race if releasing the
      shadow slot before setting TP_STATUS_USER. The below reproducer showed
      that this race is not academic. If releasing the slot after tp_status,
      the race is more subtle. See the first link for details.
      
      * add a new tp_status TP_KERNEL_OWNED to avoid the transactional store
      of two fields. But, legacy applications may interpret all non-zero
      tp_status as owned by the user. As libpcap does. So this is possible
      only opt-in by newer processes. It can be added as an optional mode.
      
      * embed the struct at the tail of pg_vec to avoid extra allocation.
      The implementation proved no less complex than a separate field.
      
      The additional locking cost on release adds contention, no different
      than scaling on multicore or multiqueue h/w. In practice, below
      reproducer nor small packet tcpdump showed a noticeable change in
      perf report in cycles spent in spinlock. Where contention is
      problematic, packet sockets support mitigation through PACKET_FANOUT.
      And we can consider adding opt-in state TP_KERNEL_OWNED.
      
      Easy to reproduce by running multiple netperf or similar TCP_STREAM
      flows concurrently with `tcpdump -B 129 -n greater 60000`.
      
      Based on an earlier patchset by Jon Rosen. See links below.
      
      I believe this issue goes back to the introduction of tpacket_rcv,
      which predates git history.
      
      Link: https://www.mail-archive.com/netdev@vger.kernel.org/msg237222.htmlSuggested-by: NJon Rosen <jrosen@cisco.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NJon Rosen <jrosen@cisco.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      61fad681
  19. 12 3月, 2020 1 次提交
  20. 27 12月, 2019 1 次提交
  21. 19 12月, 2019 1 次提交
    • A
      packet: clarify timestamp overflow · d413fcb4
      Arnd Bergmann 提交于
      The memory mapped packet socket data structure in version 1 through 3
      all contain 32-bit second values for the packet time stamps, which makes
      them suffer from the overflow of time_t in y2038 or y2106 (depending
      on whether user space interprets the value as signed or unsigned).
      
      The implementation uses the deprecated getnstimeofday() function.
      
      In order to get rid of that, this changes the code to use
      ktime_get_real_ts64() as a replacement, documenting the nature of the
      overflow. As long as the user applications treat the timestamps as
      unsigned, or only use the difference between timestamps, they are
      fine, and changing the timestamps to 64-bit wouldn't require a more
      invasive user space API change.
      
      Note: a lot of other APIs suffer from incompatible structures when
      time_t gets redefined to 64-bit in 32-bit user space, but this one
      does not.
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Link: https://lore.kernel.org/lkml/CAF=yD-Jomr-gWSR-EBNKnSpFL46UeG564FLfqTCMNEm-prEaXA@mail.gmail.com/T/#uSigned-off-by: NArnd Bergmann <arnd@arndb.de>
      d413fcb4
  22. 10 12月, 2019 1 次提交
    • M
      af_packet: set defaule value for tmo · b43d1f9f
      Mao Wenan 提交于
      There is softlockup when using TPACKET_V3:
      ...
      NMI watchdog: BUG: soft lockup - CPU#2 stuck for 60010ms!
      (__irq_svc) from [<c0558a0c>] (_raw_spin_unlock_irqrestore+0x44/0x54)
      (_raw_spin_unlock_irqrestore) from [<c027b7e8>] (mod_timer+0x210/0x25c)
      (mod_timer) from [<c0549c30>]
      (prb_retire_rx_blk_timer_expired+0x68/0x11c)
      (prb_retire_rx_blk_timer_expired) from [<c027a7ac>]
      (call_timer_fn+0x90/0x17c)
      (call_timer_fn) from [<c027ab6c>] (run_timer_softirq+0x2d4/0x2fc)
      (run_timer_softirq) from [<c021eaf4>] (__do_softirq+0x218/0x318)
      (__do_softirq) from [<c021eea0>] (irq_exit+0x88/0xac)
      (irq_exit) from [<c0240130>] (msa_irq_exit+0x11c/0x1d4)
      (msa_irq_exit) from [<c0209cf0>] (handle_IPI+0x650/0x7f4)
      (handle_IPI) from [<c02015bc>] (gic_handle_irq+0x108/0x118)
      (gic_handle_irq) from [<c0558ee4>] (__irq_usr+0x44/0x5c)
      ...
      
      If __ethtool_get_link_ksettings() is failed in
      prb_calc_retire_blk_tmo(), msec and tmo will be zero, so tov_in_jiffies
      is zero and the timer expire for retire_blk_timer is turn to
      mod_timer(&pkc->retire_blk_timer, jiffies + 0),
      which will trigger cpu usage of softirq is 100%.
      
      Fixes: f6fb8f10 ("af-packet: TPACKET_V3 flexible buffer implementation.")
      Tested-by: NXiao Jiangfeng <xiaojiangfeng@huawei.com>
      Signed-off-by: NMao Wenan <maowenan@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b43d1f9f
  23. 09 11月, 2019 1 次提交
    • E
      packet: fix data-race in fanout_flow_is_huge() · b756ad92
      Eric Dumazet 提交于
      KCSAN reported the following data-race [1]
      
      Adding a couple of READ_ONCE()/WRITE_ONCE() should silence it.
      
      Since the report hinted about multiple cpus using the history
      concurrently, I added a test avoiding writing on it if the
      victim slot already contains the desired value.
      
      [1]
      
      BUG: KCSAN: data-race in fanout_demux_rollover / fanout_demux_rollover
      
      read to 0xffff8880b01786cc of 4 bytes by task 18921 on cpu 1:
       fanout_flow_is_huge net/packet/af_packet.c:1303 [inline]
       fanout_demux_rollover+0x33e/0x3f0 net/packet/af_packet.c:1353
       packet_rcv_fanout+0x34e/0x490 net/packet/af_packet.c:1453
       deliver_skb net/core/dev.c:1888 [inline]
       dev_queue_xmit_nit+0x15b/0x540 net/core/dev.c:1958
       xmit_one net/core/dev.c:3195 [inline]
       dev_hard_start_xmit+0x3f5/0x430 net/core/dev.c:3215
       __dev_queue_xmit+0x14ab/0x1b40 net/core/dev.c:3792
       dev_queue_xmit+0x21/0x30 net/core/dev.c:3825
       neigh_direct_output+0x1f/0x30 net/core/neighbour.c:1530
       neigh_output include/net/neighbour.h:511 [inline]
       ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116
       __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
       __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
       ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
       dst_output include/net/dst.h:436 [inline]
       ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179
       ip6_send_skb+0x53/0x110 net/ipv6/ip6_output.c:1795
       udp_v6_send_skb.isra.0+0x3ec/0xa70 net/ipv6/udp.c:1173
       udpv6_sendmsg+0x1906/0x1c20 net/ipv6/udp.c:1471
       inet6_sendmsg+0x6d/0x90 net/ipv6/af_inet6.c:576
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg+0x9f/0xc0 net/socket.c:657
       ___sys_sendmsg+0x2b7/0x5d0 net/socket.c:2311
       __sys_sendmmsg+0x123/0x350 net/socket.c:2413
       __do_sys_sendmmsg net/socket.c:2442 [inline]
       __se_sys_sendmmsg net/socket.c:2439 [inline]
       __x64_sys_sendmmsg+0x64/0x80 net/socket.c:2439
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      write to 0xffff8880b01786cc of 4 bytes by task 18922 on cpu 0:
       fanout_flow_is_huge net/packet/af_packet.c:1306 [inline]
       fanout_demux_rollover+0x3a4/0x3f0 net/packet/af_packet.c:1353
       packet_rcv_fanout+0x34e/0x490 net/packet/af_packet.c:1453
       deliver_skb net/core/dev.c:1888 [inline]
       dev_queue_xmit_nit+0x15b/0x540 net/core/dev.c:1958
       xmit_one net/core/dev.c:3195 [inline]
       dev_hard_start_xmit+0x3f5/0x430 net/core/dev.c:3215
       __dev_queue_xmit+0x14ab/0x1b40 net/core/dev.c:3792
       dev_queue_xmit+0x21/0x30 net/core/dev.c:3825
       neigh_direct_output+0x1f/0x30 net/core/neighbour.c:1530
       neigh_output include/net/neighbour.h:511 [inline]
       ip6_finish_output2+0x7a2/0xec0 net/ipv6/ip6_output.c:116
       __ip6_finish_output net/ipv6/ip6_output.c:142 [inline]
       __ip6_finish_output+0x2d7/0x330 net/ipv6/ip6_output.c:127
       ip6_finish_output+0x41/0x160 net/ipv6/ip6_output.c:152
       NF_HOOK_COND include/linux/netfilter.h:294 [inline]
       ip6_output+0xf2/0x280 net/ipv6/ip6_output.c:175
       dst_output include/net/dst.h:436 [inline]
       ip6_local_out+0x74/0x90 net/ipv6/output_core.c:179
       ip6_send_skb+0x53/0x110 net/ipv6/ip6_output.c:1795
       udp_v6_send_skb.isra.0+0x3ec/0xa70 net/ipv6/udp.c:1173
       udpv6_sendmsg+0x1906/0x1c20 net/ipv6/udp.c:1471
       inet6_sendmsg+0x6d/0x90 net/ipv6/af_inet6.c:576
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg+0x9f/0xc0 net/socket.c:657
       ___sys_sendmsg+0x2b7/0x5d0 net/socket.c:2311
       __sys_sendmmsg+0x123/0x350 net/socket.c:2413
       __do_sys_sendmmsg net/socket.c:2442 [inline]
       __se_sys_sendmmsg net/socket.c:2439 [inline]
       __x64_sys_sendmmsg+0x64/0x80 net/socket.c:2439
       do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 18922 Comm: syz-executor.3 Not tainted 5.4.0-rc6+ #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      
      Fixes: 3b3a5b0a ("packet: rollover huge flows before small flows")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b756ad92
  24. 02 10月, 2019 1 次提交
    • F
      netfilter: drop bridge nf reset from nf_reset · 895b5c9f
      Florian Westphal 提交于
      commit 174e2381
      ("sk_buff: drop all skb extensions on free and skb scrubbing") made napi
      recycle always drop skb extensions.  The additional skb_ext_del() that is
      performed via nf_reset on napi skb recycle is not needed anymore.
      
      Most nf_reset() calls in the stack are there so queued skb won't block
      'rmmod nf_conntrack' indefinitely.
      
      This removes the skb_ext_del from nf_reset, and renames it to a more
      fitting nf_reset_ct().
      
      In a few selected places, add a call to skb_ext_reset to make sure that
      no active extensions remain.
      
      I am submitting this for "net", because we're still early in the release
      cycle.  The patch applies to net-next too, but I think the rename causes
      needless divergence between those trees.
      Suggested-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      895b5c9f
  25. 16 8月, 2019 1 次提交
    • E
      net/packet: fix race in tpacket_snd() · 32d3182c
      Eric Dumazet 提交于
      packet_sendmsg() checks tx_ring.pg_vec to decide
      if it must call tpacket_snd().
      
      Problem is that the check is lockless, meaning another thread
      can issue a concurrent setsockopt(PACKET_TX_RING ) to flip
      tx_ring.pg_vec back to NULL.
      
      Given that tpacket_snd() grabs pg_vec_lock mutex, we can
      perform the check again to solve the race.
      
      syzbot reported :
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 1 PID: 11429 Comm: syz-executor394 Not tainted 5.3.0-rc4+ #101
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:packet_lookup_frame+0x8d/0x270 net/packet/af_packet.c:474
      Code: c1 ee 03 f7 73 0c 80 3c 0e 00 0f 85 cb 01 00 00 48 8b 0b 89 c0 4c 8d 24 c1 48 b8 00 00 00 00 00 fc ff df 4c 89 e1 48 c1 e9 03 <80> 3c 01 00 0f 85 94 01 00 00 48 8d 7b 10 4d 8b 3c 24 48 b8 00 00
      RSP: 0018:ffff88809f82f7b8 EFLAGS: 00010246
      RAX: dffffc0000000000 RBX: ffff8880a45c7030 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 1ffff110148b8e06 RDI: ffff8880a45c703c
      RBP: ffff88809f82f7e8 R08: ffff888087aea200 R09: fffffbfff134ae50
      R10: fffffbfff134ae4f R11: ffffffff89a5727f R12: 0000000000000000
      R13: 0000000000000001 R14: ffff8880a45c6ac0 R15: 0000000000000000
      FS:  00007fa04716f700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fa04716edb8 CR3: 0000000091eb4000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       packet_current_frame net/packet/af_packet.c:487 [inline]
       tpacket_snd net/packet/af_packet.c:2667 [inline]
       packet_sendmsg+0x590/0x6250 net/packet/af_packet.c:2975
       sock_sendmsg_nosec net/socket.c:637 [inline]
       sock_sendmsg+0xd7/0x130 net/socket.c:657
       ___sys_sendmsg+0x3e2/0x920 net/socket.c:2311
       __sys_sendmmsg+0x1bf/0x4d0 net/socket.c:2413
       __do_sys_sendmmsg net/socket.c:2442 [inline]
       __se_sys_sendmmsg net/socket.c:2439 [inline]
       __x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2439
       do_syscall_64+0xfd/0x6a0 arch/x86/entry/common.c:296
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 69e3c75f ("net: TX_RING and packet mmap")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      32d3182c
  26. 27 6月, 2019 1 次提交
    • N
      af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET · 89ed5b51
      Neil Horman 提交于
      When an application is run that:
      a) Sets its scheduler to be SCHED_FIFO
      and
      b) Opens a memory mapped AF_PACKET socket, and sends frames with the
      MSG_DONTWAIT flag cleared, its possible for the application to hang
      forever in the kernel.  This occurs because when waiting, the code in
      tpacket_snd calls schedule, which under normal circumstances allows
      other tasks to run, including ksoftirqd, which in some cases is
      responsible for freeing the transmitted skb (which in AF_PACKET calls a
      destructor that flips the status bit of the transmitted frame back to
      available, allowing the transmitting task to complete).
      
      However, when the calling application is SCHED_FIFO, its priority is
      such that the schedule call immediately places the task back on the cpu,
      preventing ksoftirqd from freeing the skb, which in turn prevents the
      transmitting task from detecting that the transmission is complete.
      
      We can fix this by converting the schedule call to a completion
      mechanism.  By using a completion queue, we force the calling task, when
      it detects there are no more frames to send, to schedule itself off the
      cpu until such time as the last transmitted skb is freed, allowing
      forward progress to be made.
      
      Tested by myself and the reporter, with good results
      
      Change Notes:
      
      V1->V2:
      	Enhance the sleep logic to support being interruptible and
      allowing for honoring to SK_SNDTIMEO (Willem de Bruijn)
      
      V2->V3:
      	Rearrage the point at which we wait for the completion queue, to
      avoid needing to check for ph/skb being null at the end of the loop.
      Also move the complete call to the skb destructor to avoid needing to
      modify __packet_set_status.  Also gate calling complete on
      packet_read_pending returning zero to avoid multiple calls to complete.
      (Willem de Bruijn)
      
      	Move timeo computation within loop, to re-fetch the socket
      timeout since we also use the timeo variable to record the return code
      from the wait_for_complete call (Neil Horman)
      
      V3->V4:
      	Willem has requested that the control flow be restored to the
      previous state.  Doing so lets us eliminate the need for the
      po->wait_on_complete flag variable, and lets us get rid of the
      packet_next_frame function, but introduces another complexity.
      Specifically, but using the packet pending count, we can, if an
      applications calls sendmsg multiple times with MSG_DONTWAIT set, each
      set of transmitted frames, when complete, will cause
      tpacket_destruct_skb to issue a complete call, for which there will
      never be a wait_on_completion call.  This imbalance will lead to any
      future call to wait_for_completion here to return early, when the frames
      they sent may not have completed.  To correct this, we need to re-init
      the completion queue on every call to tpacket_snd before we enter the
      loop so as to ensure we wait properly for the frames we send in this
      iteration.
      
      	Change the timeout and interrupted gotos to out_put rather than
      out_status so that we don't try to free a non-existant skb
      	Clean up some extra newlines (Willem de Bruijn)
      Reviewed-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Reported-by: NMatteo Croce <mcroce@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      89ed5b51
  27. 24 6月, 2019 1 次提交
  28. 15 6月, 2019 8 次提交