  1. 16 Jan 2023 (3 commits)
  2. 12 Dec 2022 (1 commit)
  3. 24 Nov 2022 (1 commit)
    • virtio_net: Fix probe failed when modprobe virtio_net · b0686565
      Li Zetao authored
      When doing the following test steps, an error was found:
        step 1: modprobe virtio_net succeeded
          # modprobe virtio_net        <-- OK
      
        step 2: fault injection in register_netdevice()
          # modprobe -r virtio_net     <-- OK
          # ...
            FAULT_INJECTION: forcing a failure.
            name failslab, interval 1, probability 0, space 0, times 0
            CPU: 0 PID: 3521 Comm: modprobe
            Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
            Call Trace:
             <TASK>
             ...
             should_failslab+0xa/0x20
             ...
             dev_set_name+0xc0/0x100
             netdev_register_kobject+0xc2/0x340
             register_netdevice+0xbb9/0x1320
             virtnet_probe+0x1d72/0x2658 [virtio_net]
             ...
             </TASK>
            virtio_net: probe of virtio0 failed with error -22
      
        step 3: modprobe virtio_net failed
          # modprobe virtio_net        <-- failed
            virtio_net: probe of virtio0 failed with error -2
      
      The root cause of the problem is that the queues are not
      disabled on the error handling path when register_netdevice()
      fails in virtnet_probe(), so setup_vq() returns "-ENOENT" on
      the next modprobe call.
      
      virtio_pci_modern_device uses virtqueues to send and
      receive messages, and "queue_enable" records whether the
      queues are available. In vp_modern_find_vqs(), all queues
      will be selected and activated, but once queues are enabled
      there is no way to go back except reset.
      
      Fix it by resetting the virtio device on the error handling path. This
      makes error handling follow the same order as normal device
      cleanup in virtnet_remove() which does: unregister, destroy
      failover, then reset. And that flow is better tested than
      error handling so we can be reasonably sure it works well.
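
      A minimal sketch of the error-path ordering described above (illustrative
      only; it reuses helpers the driver already has, such as
      net_failover_destroy() and virtio_reset_device(), and is not the exact
      hunk of the patch):

       /* in virtnet_probe(), when register_netdevice() fails */
       err = register_netdevice(dev);
       if (err) {
               /* Mirror virtnet_remove(): destroy failover, then reset the
                * device so the already-enabled virtqueues are cleared and a
                * later probe does not hit -ENOENT in setup_vq(). */
               net_failover_destroy(vi->failover);
               virtio_reset_device(vdev);
               /* ... free virtqueues and the netdev ... */
               return err;
       }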
      
      Fixes: 02465555 ("virtio_net: fix use after free on allocation failure")
      Signed-off-by: Li Zetao <lizetao1@huawei.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Link: https://lore.kernel.org/r/20221122150046.3910638-1-lizetao1@huawei.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  4. 29 Oct 2022 (1 commit)
  5. 07 Oct 2022 (2 commits)
  6. 01 Sep 2022 (1 commit)
  7. 16 Aug 2022 (1 commit)
  8. 12 Aug 2022 (1 commit)
  9. 11 Aug 2022 (7 commits)
  10. 08 Aug 2022 (1 commit)
  11. 27 Jul 2022 (1 commit)
    • virtio-net: fix the race between refill work and close · 5a159128
      Jason Wang authored
      We try using cancel_delayed_work_sync() to prevent the work from
      enabling NAPI. This is insufficient since we don't disable the source
      of the refill work scheduling. This means a NAPI poll callback that
      runs after cancel_delayed_work_sync() can schedule the refill work,
      which can then re-enable NAPI and lead to a use-after-free [1].
      
      Since the work can enable NAPI, we can't simply disable NAPI before
      calling cancel_delayed_work_sync(). So fix this by introducing a
      dedicated boolean to control whether or not the work could be
      scheduled from NAPI.
      
      [1]
      ==================================================================
      BUG: KASAN: use-after-free in refill_work+0x43/0xd4
      Read of size 2 at addr ffff88810562c92e by task kworker/2:1/42
      
      CPU: 2 PID: 42 Comm: kworker/2:1 Not tainted 5.19.0-rc1+ #480
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      Workqueue: events refill_work
      Call Trace:
       <TASK>
       dump_stack_lvl+0x34/0x44
       print_report.cold+0xbb/0x6ac
       ? _printk+0xad/0xde
       ? refill_work+0x43/0xd4
       kasan_report+0xa8/0x130
       ? refill_work+0x43/0xd4
       refill_work+0x43/0xd4
       process_one_work+0x43d/0x780
       worker_thread+0x2a0/0x6f0
       ? process_one_work+0x780/0x780
       kthread+0x167/0x1a0
       ? kthread_exit+0x50/0x50
       ret_from_fork+0x22/0x30
       </TASK>
      ...
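
      A minimal sketch of the dedicated-boolean approach described above
      (illustrative; it assumes fields named refill_lock and refill_enabled
      in struct virtnet_info, which need not match the actual patch):

       /* close/remove paths: forbid scheduling, then cancel pending work */
       static void virtnet_disable_refill(struct virtnet_info *vi)
       {
               spin_lock_bh(&vi->refill_lock);
               vi->refill_enabled = false;
               spin_unlock_bh(&vi->refill_lock);
       }

       /* NAPI path: only schedule the refill work while it is still allowed */
       static void virtnet_try_schedule_refill(struct virtnet_info *vi)
       {
               spin_lock_bh(&vi->refill_lock);
               if (vi->refill_enabled)
                       schedule_delayed_work(&vi->refill, 0);
               spin_unlock_bh(&vi->refill_lock);
       }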
      
      Fixes: b2baed69 ("virtio_net: set/cancel work on ndo_open/ndo_stop")
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 27 Jun 2022 (1 commit)
    • virtio-net: fix race between ndo_open() and virtio_device_ready() · 50c0ada6
      Jason Wang authored
      We currently call virtio_device_ready() after netdev
      registration. Since ndo_open() can be called immediately
      after register_netdev(), there is a race between
      ndo_open() and virtio_device_ready(): the driver may start to use the
      device before DRIVER_OK, which violates the spec.
      
      Fix this by switching to register_netdevice() and protecting
      virtio_device_ready() with rtnl_lock() to make sure ndo_open() can
      only be called after virtio_device_ready().
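
      A minimal sketch of this ordering (illustrative, not the exact patch):

       rtnl_lock();
       err = register_netdevice(dev);
       if (err) {
               rtnl_unlock();
               goto err_cleanup;        /* hypothetical error label */
       }
       /* ndo_open() also needs rtnl, so it cannot run before this point */
       virtio_device_ready(vdev);
       rtnl_unlock();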
      
      Fixes: 4baf1e33 ("virtio_net: enable VQs early")
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Message-Id: <20220617072949.30734-1-jasowang@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
  13. 23 Jun 2022 (1 commit)
    • virtio_net: fix xdp_rxq_info bug after suspend/resume · 8af52fe9
      Stephan Gerhold authored
      The following sequence currently causes a driver bug warning
      when using virtio_net:
      
        # ip link set eth0 up
        # echo mem > /sys/power/state (or e.g. # rtcwake -s 10 -m mem)
        <resume>
        # ip link set eth0 down
      
        Missing register, driver bug
        WARNING: CPU: 0 PID: 375 at net/core/xdp.c:138 xdp_rxq_info_unreg+0x58/0x60
        Call trace:
         xdp_rxq_info_unreg+0x58/0x60
         virtnet_close+0x58/0xac
         __dev_close_many+0xac/0x140
         __dev_change_flags+0xd8/0x210
         dev_change_flags+0x24/0x64
         do_setlink+0x230/0xdd0
         ...
      
      This happens because virtnet_freeze() frees the receive_queue
      completely (including struct xdp_rxq_info) but does not call
      xdp_rxq_info_unreg(). Similarly, virtnet_restore() sets up the
      receive_queue again but does not call xdp_rxq_info_reg().
      
      Actually, parts of virtnet_freeze_down() and virtnet_restore_up()
      are almost identical to virtnet_close() and virtnet_open(): only
      the calls to xdp_rxq_info_(un)reg() are missing. This means that
      we can fix this easily and avoid such problems in the future by
      just calling virtnet_close()/open() from the freeze/restore handlers.
      
      Aside from adding the missing xdp_rxq_info calls, the only difference
      is that the refill work is only cancelled if netif_running(). However,
      this should not make any functional difference since the refill work
      should only be active if the network interface is actually up.
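
      A rough sketch of what the reworked handlers look like with this
      approach (illustrative and simplified, not the exact patch):

       static void virtnet_freeze_down(struct virtio_device *vdev)
       {
               struct virtnet_info *vi = vdev->priv;

               netif_device_detach(vi->dev);
               if (netif_running(vi->dev))
                       virtnet_close(vi->dev);   /* also calls xdp_rxq_info_unreg() */
       }

       static int virtnet_restore_up(struct virtio_device *vdev)
       {
               struct virtnet_info *vi = vdev->priv;
               int err = 0;

               virtio_device_ready(vdev);
               if (netif_running(vi->dev))
                       err = virtnet_open(vi->dev);   /* re-registers xdp_rxq_info */
               netif_device_attach(vi->dev);
               return err;
       }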
      
      Fixes: 754b8a21 ("virtio_net: setup xdp_rxq_info")
      Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Link: https://lore.kernel.org/r/20220621114845.3650258-1-stephan.gerhold@kernkonzept.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  14. 08 May 2022 (1 commit)
  15. 06 May 2022 (1 commit)
  16. 26 Apr 2022 (1 commit)
    • virtio_net: fix wrong buf address calculation when using xdp · acb16b39
      Nikolay Aleksandrov authored
      We received a report[1] of kernel crashes when Cilium is used in XDP
      mode with virtio_net after updating to newer kernels. After
      investigating, it turned out that when using mergeable bufs with an
      XDP program which adjusts xdp.data or xdp.data_meta, page_to_skb()
      calculates the build_skb address wrong because the offset can become
      less than the headroom, so it gets an address in the previous page
      (-X bytes depending on how much lower the offset is):
       page_to_skb: page addr ffff9eb2923e2000 buf ffff9eb2923e1ffc offset 252 headroom 256
      
      This is a pr_err() I added at the beginning of page_to_skb() which
      clearly shows an offset that is less than the headroom after adding
      4 bytes of metadata via an xdp prog. The calculations done are:
       receive_mergeable():
       headroom = VIRTIO_XDP_HEADROOM; // VIRTIO_XDP_HEADROOM == 256 bytes
       offset = xdp.data - page_address(xdp_page) -
                vi->hdr_len - metasize;
      
       page_to_skb():
       p = page_address(page) + offset;
       ...
       buf = p - headroom;
      
      As seen above, buf now ends up 4 bytes before the page's starting
      address, and build_skb() later sets it as skb->head and skb->data.
      Depending on what's done with the skb (most often when it's freed)
      we get all kinds of corruptions and BUG_ON() triggers in mm[2]. We
      have to recalculate the new headroom after the xdp program has run,
      similar to how offset and len are recalculated. Headroom is directly
      related to data_hard_start, data and data_meta, so we use them to
      get the new size. The result is correct (similar pr_err() in
      page_to_skb(), one case of xdp_page and one case of a virtnet buf):
       a) Case with 4 bytes of metadata
       [  115.949641] page_to_skb: page addr ffff8b4dcfad2000 offset 252 headroom 252
       [  121.084105] page_to_skb: page addr ffff8b4dcf018000 offset 20732 headroom 252
       b) Case of pushing data +32 bytes
       [  153.181401] page_to_skb: page addr ffff8b4dd0c4d000 offset 288 headroom 288
       [  158.480421] page_to_skb: page addr ffff8b4dd00b0000 offset 24864 headroom 288
       c) Case of pushing data -33 bytes
       [  835.906830] page_to_skb: page addr ffff8b4dd3270000 offset 223 headroom 223
       [  840.839910] page_to_skb: page addr ffff8b4dcdd68000 offset 12511 headroom 223
      
      Offset and headroom are equal because offset points to the start of
      the reserved bytes for the virtio_net header, which are at buf start +
      headroom, while data points at buf start + vnet hdr size + headroom,
      so when data or data_meta are adjusted by the xdp prog both the
      headroom size and the offset change equally. We can use data_hard_start
      to compute the new headroom after the xdp prog (linearized / page start
      case; the virtnet buf case is similar, just with a bigger base offset):
       xdp.data_hard_start = page_address + vnet_hdr
       xdp.data = page_address + vnet_hdr + headroom
       new headroom after xdp prog = xdp.data - xdp.data_hard_start - metasize
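
      As a sketch, the recalculation in C looks roughly like this (illustrative
      only, reusing the names from the description above):

       /* after bpf_prog_run_xdp() has possibly moved data/data_meta */
       metasize = xdp.data - xdp.data_meta;
       headroom = xdp.data - xdp.data_hard_start - metasize;
       offset   = xdp.data - page_address(xdp_page) - vi->hdr_len - metasize;
       /* page_to_skb() then derives buf = p - headroom, which now stays
        * inside the page even after the XDP program moved xdp.data */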
      
      An example reproducer xdp prog[3] is below.
      
      [1] https://github.com/cilium/cilium/issues/19453
      
      [2] Two of the many traces:
       [   40.437400] BUG: Bad page state in process swapper/0  pfn:14940
       [   40.916726] BUG: Bad page state in process systemd-resolve  pfn:053b7
       [   41.300891] kernel BUG at include/linux/mm.h:720!
       [   41.301801] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
       [   41.302784] CPU: 1 PID: 1181 Comm: kubelet Kdump: loaded Tainted: G    B   W         5.18.0-rc1+ #37
       [   41.304458] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014
       [   41.306018] RIP: 0010:page_frag_free+0x79/0xe0
       [   41.306836] Code: 00 00 75 ea 48 8b 07 a9 00 00 01 00 74 e0 48 8b 47 48 48 8d 50 ff a8 01 48 0f 45 fa eb d0 48 c7 c6 18 b8 30 a6 e8 d7 f8 fc ff <0f> 0b 48 8d 78 ff eb bc 48 8b 07 a9 00 00 01 00 74 3a 66 90 0f b6
       [   41.310235] RSP: 0018:ffffac05c2a6bc78 EFLAGS: 00010292
       [   41.311201] RAX: 000000000000003e RBX: 0000000000000000 RCX: 0000000000000000
       [   41.312502] RDX: 0000000000000001 RSI: ffffffffa6423004 RDI: 00000000ffffffff
       [   41.313794] RBP: ffff993c98823600 R08: 0000000000000000 R09: 00000000ffffdfff
       [   41.315089] R10: ffffac05c2a6ba68 R11: ffffffffa698ca28 R12: ffff993c98823600
       [   41.316398] R13: ffff993c86311ebc R14: 0000000000000000 R15: 000000000000005c
       [   41.317700] FS:  00007fe13fc56740(0000) GS:ffff993cdd900000(0000) knlGS:0000000000000000
       [   41.319150] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [   41.320152] CR2: 000000c00008a000 CR3: 0000000014908000 CR4: 0000000000350ee0
       [   41.321387] Call Trace:
       [   41.321819]  <TASK>
       [   41.322193]  skb_release_data+0x13f/0x1c0
       [   41.322902]  __kfree_skb+0x20/0x30
       [   41.343870]  tcp_recvmsg_locked+0x671/0x880
       [   41.363764]  tcp_recvmsg+0x5e/0x1c0
       [   41.384102]  inet_recvmsg+0x42/0x100
       [   41.406783]  ? sock_recvmsg+0x1d/0x70
       [   41.428201]  sock_read_iter+0x84/0xd0
       [   41.445592]  ? 0xffffffffa3000000
       [   41.462442]  new_sync_read+0x148/0x160
       [   41.479314]  ? 0xffffffffa3000000
       [   41.496937]  vfs_read+0x138/0x190
       [   41.517198]  ksys_read+0x87/0xc0
       [   41.535336]  do_syscall_64+0x3b/0x90
       [   41.551637]  entry_SYSCALL_64_after_hwframe+0x44/0xae
       [   41.568050] RIP: 0033:0x48765b
       [   41.583955] Code: e8 4a 35 fe ff eb 88 cc cc cc cc cc cc cc cc e8 fb 7a fe ff 48 8b 7c 24 10 48 8b 74 24 18 48 8b 54 24 20 48 8b 44 24 08 0f 05 <48> 3d 01 f0 ff ff 76 20 48 c7 44 24 28 ff ff ff ff 48 c7 44 24 30
       [   41.632818] RSP: 002b:000000c000a2f5b8 EFLAGS: 00000212 ORIG_RAX: 0000000000000000
       [   41.664588] RAX: ffffffffffffffda RBX: 000000c000062000 RCX: 000000000048765b
       [   41.681205] RDX: 0000000000005e54 RSI: 000000c000e66000 RDI: 0000000000000016
       [   41.697164] RBP: 000000c000a2f608 R08: 0000000000000001 R09: 00000000000001b4
       [   41.713034] R10: 00000000000000b6 R11: 0000000000000212 R12: 00000000000000e9
       [   41.728755] R13: 0000000000000001 R14: 000000c000a92000 R15: ffffffffffffffff
       [   41.744254]  </TASK>
       [   41.758585] Modules linked in: br_netfilter bridge veth netconsole virtio_net
      
       and
      
       [   33.524802] BUG: Bad page state in process systemd-network  pfn:11e60
       [   33.528617] page ffffe05dc0147b00 ffffe05dc04e7a00 ffff8ae9851ec000 (1) len 82 offset 252 metasize 4 hroom 0 hdr_len 12 data ffff8ae9851ec10c data_meta ffff8ae9851ec108 data_end ffff8ae9851ec14e
       [   33.529764] page:000000003792b5ba refcount:0 mapcount:-512 mapping:0000000000000000 index:0x0 pfn:0x11e60
       [   33.532463] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff)
       [   33.532468] raw: 000fffffc0000000 0000000000000000 dead000000000122 0000000000000000
       [   33.532470] raw: 0000000000000000 0000000000000000 00000000fffffdff 0000000000000000
       [   33.532471] page dumped because: nonzero mapcount
       [   33.532472] Modules linked in: br_netfilter bridge veth netconsole virtio_net
       [   33.532479] CPU: 0 PID: 791 Comm: systemd-network Kdump: loaded Not tainted 5.18.0-rc1+ #37
       [   33.532482] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1.fc35 04/01/2014
       [   33.532484] Call Trace:
       [   33.532496]  <TASK>
       [   33.532500]  dump_stack_lvl+0x45/0x5a
       [   33.532506]  bad_page.cold+0x63/0x94
       [   33.532510]  free_pcp_prepare+0x290/0x420
       [   33.532515]  free_unref_page+0x1b/0x100
       [   33.532518]  skb_release_data+0x13f/0x1c0
       [   33.532524]  kfree_skb_reason+0x3e/0xc0
       [   33.532527]  ip6_mc_input+0x23c/0x2b0
       [   33.532531]  ip6_sublist_rcv_finish+0x83/0x90
       [   33.532534]  ip6_sublist_rcv+0x22b/0x2b0
      
      [3] XDP program to reproduce (xdp_pass.c):
       #include <linux/bpf.h>
       #include <bpf/bpf_helpers.h>
      
       SEC("xdp_pass")
       int xdp_pkt_pass(struct xdp_md *ctx)
       {
                bpf_xdp_adjust_head(ctx, -(int)32);
                return XDP_PASS;
       }
      
       char _license[] SEC("license") = "GPL";
      
       compile: clang -O2 -g -Wall -target bpf -c xdp_pass.c -o xdp_pass.o
       load on virtio_net: ip link set enp1s0 xdpdrv obj xdp_pass.o sec xdp_pass
      
      CC: stable@vger.kernel.org
      CC: Jason Wang <jasowang@redhat.com>
      CC: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      CC: Daniel Borkmann <daniel@iogearbox.net>
      CC: "Michael S. Tsirkin" <mst@redhat.com>
      CC: virtualization@lists.linux-foundation.org
      Fixes: 8fb7da9e ("virtio_net: get build_skb() buf by data ptr")
      Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
      Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Link: https://lore.kernel.org/r/20220425103703.3067292-1-razor@blackwall.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  17. 29 Mar 2022 (4 commits)
  18. 15 Feb 2022 (1 commit)
  19. 16 Jan 2022 (1 commit)
  20. 15 Jan 2022 (1 commit)
  21. 16 Dec 2021 (1 commit)
  22. 14 Dec 2021 (1 commit)
  23. 25 Nov 2021 (1 commit)
  24. 22 Nov 2021 (1 commit)
  25. 17 Nov 2021 (1 commit)
  26. 01 Nov 2021 (2 commits)
  27. 28 Oct 2021 (1 commit)