1. 03 Aug, 2021 1 commit
  2. 11 Jul, 2021 1 commit
  3. 08 Jul, 2021 1 commit
    • virtio_net: disable cb aggressively · a7766ef1
      Michael S. Tsirkin committed
      There are currently two cases where we poll TX vq not in response to a
      callback: start xmit and rx napi.  We currently do this with callbacks
      enabled, which can cause extra interrupts from the card.  This used not
      to be a big issue as we ran with interrupts disabled, but that is no
      longer the case, and in some cases the rate of spurious interrupts is so
      high that Linux detects this and actually kills the interrupt.
      
      Fix up by disabling the callbacks before polling the tx vq.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
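The effect of disabling callbacks before polling can be illustrated with a toy model. This is only a userspace sketch, not driver code: `struct toy_vq` and every `toy_*` function are hypothetical names invented here, and the model assumes the device raises one interrupt per completed buffer whenever callbacks are enabled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a TX virtqueue; not the kernel's struct virtqueue. */
struct toy_vq {
    bool cb_enabled;    /* would a completion raise an interrupt? */
    size_t used;        /* completed buffers not yet reclaimed */
    size_t interrupts;  /* interrupts raised so far */
};

/* Device side: complete one buffer, interrupting only if callbacks are on. */
static void toy_device_complete(struct toy_vq *vq)
{
    vq->used++;
    if (vq->cb_enabled)
        vq->interrupts++;
}

/* Driver side: reclaim everything that has completed; returns the count. */
static size_t toy_free_old_xmit(struct toy_vq *vq)
{
    size_t n = vq->used;

    vq->used = 0;
    return n;
}

/*
 * Poll the queue while the device completes `burst` buffers.
 * With disable_first set, callbacks are switched off before polling,
 * which is the behaviour the commit message describes.
 */
static size_t toy_poll_tx(struct toy_vq *vq, size_t burst, bool disable_first)
{
    size_t reclaimed = 0;

    if (disable_first)
        vq->cb_enabled = false;
    for (size_t i = 0; i < burst; i++) {
        toy_device_complete(vq);
        reclaimed += toy_free_old_xmit(vq);
    }
    return reclaimed;
}
```

In this model both variants reclaim the same number of buffers; only the variant that disables callbacks first avoids the interrupts that nobody needed.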
  4. 03 Jul, 2021 3 commits
  5. 24 Jun, 2021 1 commit
  6. 08 Jun, 2021 1 commit
  7. 04 Jun, 2021 1 commit
    • virtio-net: fix for skb_over_panic inside big mode · 1a802423
      Xuan Zhuo committed
      In virtio-net's large packet mode, there is a hole in the space behind
      buf.
      
          hdr_padded_len - hdr_len
      
      We must take this into account when calculating tailroom.
      
      [   44.544385] skb_put.cold (net/core/skbuff.c:5254 (discriminator 1) net/core/skbuff.c:5252 (discriminator 1))
      [   44.544864] page_to_skb (drivers/net/virtio_net.c:485)
      [   44.545361] receive_buf (drivers/net/virtio_net.c:849 drivers/net/virtio_net.c:1131)
      [   44.545870] ? netif_receive_skb_list_internal (net/core/dev.c:5714)
      [   44.546628] ? dev_gro_receive (net/core/dev.c:6103)
      [   44.547135] ? napi_complete_done (./include/linux/list.h:35 net/core/dev.c:5867 net/core/dev.c:5862 net/core/dev.c:6565)
      [   44.547672] virtnet_poll (drivers/net/virtio_net.c:1427 drivers/net/virtio_net.c:1525)
      [   44.548251] __napi_poll (net/core/dev.c:6985)
      [   44.548744] net_rx_action (net/core/dev.c:7054 net/core/dev.c:7139)
      [   44.549264] __do_softirq (./arch/x86/include/asm/jump_label.h:19 ./include/linux/jump_label.h:200 ./include/trace/events/irq.h:142 kernel/softirq.c:560)
      [   44.549762] irq_exit_rcu (kernel/softirq.c:433 kernel/softirq.c:637 kernel/softirq.c:649)
      [   44.551384] common_interrupt (arch/x86/kernel/irq.c:240 (discriminator 13))
      [   44.551991] ? asm_common_interrupt (./arch/x86/include/asm/idtentry.h:638)
      [   44.552654] asm_common_interrupt (./arch/x86/include/asm/idtentry.h:638)
      
      Fixes: fb32856b ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Reported-by: Corentin Noël <corentin.noel@collabora.com>
      Tested-by: Corentin Noël <corentin.noel@collabora.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
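The tailroom arithmetic from this commit message can be sketched in plain C. This is an illustration under stated assumptions, not the driver's code: the function names are hypothetical, and the numbers used with it (a 4096-byte buffer, a 12-byte header padded to 16, a 320-byte skb_shared_info) are made-up example values.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Bytes left in the buffer after the payload.
 * hdr_skip is the distance from the start of the buffer to the payload:
 * hdr_padded_len when the hole after the header is accounted for,
 * hdr_len when it is (wrongly) ignored.
 */
static long toy_tailroom(size_t truesize, size_t headroom,
                         size_t hdr_skip, size_t payload_len)
{
    return (long)truesize - (long)headroom - (long)hdr_skip -
           (long)payload_len;
}

/* build_skb() is only safe if skb_shared_info fits in the tailroom. */
static bool toy_shinfo_fits(long tailroom, size_t shinfo_size)
{
    return tailroom >= (long)shinfo_size;
}
```

With the example values and a 3762-byte payload, skipping only hdr_len leaves 322 bytes and appears to fit a 320-byte skb_shared_info, while skipping hdr_padded_len leaves 318 bytes and does not. That 4-byte difference is the hole the commit message refers to.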
  8. 02 Jun, 2021 2 commits
    • virtio_net: get build_skb() buf by data ptr · 8fb7da9e
      Xuan Zhuo committed
      In mergeable mode, the page passed into page_to_skb() may be a head
      page, not the page where the current data is located. So when trying to
      get the buf where the data is located, we should get buf based on
      headroom instead of offset.
      
      This patch solves this problem. Without this patch the original code
      still runs, because if the page is not the page of the current data,
      the calculated tailroom will be less than 0 and the build_skb() path
      will not be entered. The significance of this patch is to fix this
      logic error, allowing build_skb() to be used in more situations.
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: fix for unable to handle page fault for address · 5c37711d
      Xuan Zhuo committed
      In mergeable mode, when XDP is enabled, if the headroom of buf is smaller
      than virtnet_get_headroom(), xdp_linearize_page() will be called but the
      "headroom" variable is still 0, which leads to wrong logic after
      entering page_to_skb().
      
      [   16.600944] BUG: unable to handle page fault for address: ffffecbfff7b43c8
      [   16.602175] #PF: supervisor read access in kernel mode
      [   16.603350] #PF: error_code(0x0000) - not-present page
      [   16.604200] PGD 0 P4D 0
      [   16.604686] Oops: 0000 [#1] SMP PTI
      [   16.605306] CPU: 4 PID: 715 Comm: sh Tainted: G    B             5.12.0+ #312
      [   16.606429] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/04
      [   16.608217] RIP: 0010:unmap_page_range+0x947/0xde0
      [   16.609014] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065
      [   16.611863] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286
      [   16.612720] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359
      [   16.613853] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005
      [   16.614976] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030
      [   16.616124] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f
      [   16.617276] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000
      [   16.618423] FS:  0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000
      [   16.619738] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   16.620670] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0
      [   16.621792] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   16.622920] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   16.624047] Call Trace:
      [   16.624525]  ? release_pages+0x24d/0x730
      [   16.625209]  unmap_single_vma+0xa9/0x130
      [   16.625885]  unmap_vmas+0x76/0xf0
      [   16.626480]  exit_mmap+0xa0/0x210
      [   16.627129]  mmput+0x67/0x180
      [   16.627673]  do_exit+0x3d1/0xf10
      [   16.628259]  ? do_user_addr_fault+0x231/0x840
      [   16.629000]  do_group_exit+0x53/0xd0
      [   16.629631]  __x64_sys_exit_group+0x1d/0x20
      [   16.630354]  do_syscall_64+0x3c/0x80
      [   16.630988]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [   16.631828] RIP: 0033:0x7f1a043d0191
      [   16.632464] Code: Unable to access opcode bytes at RIP 0x7f1a043d0167.
      [   16.633502] RSP: 002b:00007ffe3d993308 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
      [   16.634737] RAX: ffffffffffffffda RBX: 00007f1a044c9490 RCX: 00007f1a043d0191
      [   16.635857] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
      [   16.636986] RBP: 0000000000000000 R08: ffffffffffffff88 R09: 0000000000000001
      [   16.638120] R10: 0000000000000008 R11: 0000000000000246 R12: 00007f1a044c9490
      [   16.639245] R13: 0000000000000001 R14: 00007f1a044c9968 R15: 0000000000000000
      [   16.640408] Modules linked in:
      [   16.640958] CR2: ffffecbfff7b43c8
      [   16.641557] ---[ end trace bc4891c6ce46354c ]---
      [   16.642335] RIP: 0010:unmap_page_range+0x947/0xde0
      [   16.643135] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065
      [   16.645983] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286
      [   16.646845] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359
      [   16.647970] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005
      [   16.649091] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030
      [   16.650250] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f
      [   16.651394] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000
      [   16.652529] FS:  0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000
      [   16.653887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   16.654841] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0
      [   16.655992] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   16.657150] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   16.658290] Kernel panic - not syncing: Fatal exception
      [   16.659613] Kernel Offset: disabled
      [   16.660234] ---[ end Kernel panic - not syncing: Fatal exception ]---
      
      Fixes: fb32856b ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 01 Jun, 2021 1 commit
  10. 14 May, 2021 2 commits
    • virtio-net: get build_skb() buf by data ptr · 7bf64460
      Xuan Zhuo committed
      In mergeable mode, the page passed into page_to_skb() may be a head
      page, not the page where the current data is located. So when trying to
      get the buf where the data is located, you should directly use the
      pointer (p) to get the address corresponding to the page.
      
      At the same time, the offset of the data in the page should also be
      obtained using offset_in_page().
      
      This patch solves this problem. Without this patch the original code
      still runs, because if the page is not the page of the current data,
      the calculated tailroom will be less than 0 and the build_skb() path
      will not be entered. The significance of this patch is to fix this
      logic error, allowing build_skb() to be used in more situations.
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
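Deriving the page and the in-page offset from the data pointer, rather than from the head page, can be sketched with address masking. This is a userspace illustration only: `TOY_PAGE_SIZE` and the `toy_*` helpers are hypothetical, a 4096-byte page is assumed, and `toy_offset_in_page()` mirrors what the kernel's offset_in_page() computes.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u  /* assumed page size for this sketch */

/* Offset of an address within the page that contains it. */
static size_t toy_offset_in_page(uintptr_t addr)
{
    return (size_t)(addr & (TOY_PAGE_SIZE - 1));
}

/* Start address of the page that actually contains addr. */
static uintptr_t toy_page_start(uintptr_t addr)
{
    return addr & ~(uintptr_t)(TOY_PAGE_SIZE - 1);
}

/*
 * Room left in one page when the data offset is measured from `base`.
 * If `base` is a head page and the data lives in a later page, the
 * result is negative, which is why the old code never reached
 * build_skb() in that case.
 */
static long toy_room_from(uintptr_t base, uintptr_t data)
{
    return (long)TOY_PAGE_SIZE - (long)(data - base);
}
```

For data located 100 bytes into the second page of a buffer, measuring from the head page yields a negative value, while measuring from `toy_page_start(data)` yields the real room left in that page.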
    • virtio-net: fix for unable to handle page fault for address · 6c66c147
      Xuan Zhuo committed
      In mergeable mode, when XDP is enabled, if the headroom of buf is smaller
      than virtnet_get_headroom(), xdp_linearize_page() will be called but the
      "headroom" variable is still 0, which leads to wrong logic after
      entering page_to_skb().
      
      [   16.600944] BUG: unable to handle page fault for address: ffffecbfff7b43c8
      [   16.602175] #PF: supervisor read access in kernel mode
      [   16.603350] #PF: error_code(0x0000) - not-present page
      [   16.604200] PGD 0 P4D 0
      [   16.604686] Oops: 0000 [#1] SMP PTI
      [   16.605306] CPU: 4 PID: 715 Comm: sh Tainted: G    B             5.12.0+ #312
      [   16.606429] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/04
      [   16.608217] RIP: 0010:unmap_page_range+0x947/0xde0
      [   16.609014] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065
      [   16.611863] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286
      [   16.612720] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359
      [   16.613853] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005
      [   16.614976] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030
      [   16.616124] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f
      [   16.617276] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000
      [   16.618423] FS:  0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000
      [   16.619738] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   16.620670] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0
      [   16.621792] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   16.622920] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   16.624047] Call Trace:
      [   16.624525]  ? release_pages+0x24d/0x730
      [   16.625209]  unmap_single_vma+0xa9/0x130
      [   16.625885]  unmap_vmas+0x76/0xf0
      [   16.626480]  exit_mmap+0xa0/0x210
      [   16.627129]  mmput+0x67/0x180
      [   16.627673]  do_exit+0x3d1/0xf10
      [   16.628259]  ? do_user_addr_fault+0x231/0x840
      [   16.629000]  do_group_exit+0x53/0xd0
      [   16.629631]  __x64_sys_exit_group+0x1d/0x20
      [   16.630354]  do_syscall_64+0x3c/0x80
      [   16.630988]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [   16.631828] RIP: 0033:0x7f1a043d0191
      [   16.632464] Code: Unable to access opcode bytes at RIP 0x7f1a043d0167.
      [   16.633502] RSP: 002b:00007ffe3d993308 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
      [   16.634737] RAX: ffffffffffffffda RBX: 00007f1a044c9490 RCX: 00007f1a043d0191
      [   16.635857] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
      [   16.636986] RBP: 0000000000000000 R08: ffffffffffffff88 R09: 0000000000000001
      [   16.638120] R10: 0000000000000008 R11: 0000000000000246 R12: 00007f1a044c9490
      [   16.639245] R13: 0000000000000001 R14: 00007f1a044c9968 R15: 0000000000000000
      [   16.640408] Modules linked in:
      [   16.640958] CR2: ffffecbfff7b43c8
      [   16.641557] ---[ end trace bc4891c6ce46354c ]---
      [   16.642335] RIP: 0010:unmap_page_range+0x947/0xde0
      [   16.643135] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065
      [   16.645983] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286
      [   16.646845] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359
      [   16.647970] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005
      [   16.649091] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030
      [   16.650250] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f
      [   16.651394] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000
      [   16.652529] FS:  0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000
      [   16.653887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   16.654841] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0
      [   16.655992] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   16.657150] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   16.658290] Kernel panic - not syncing: Fatal exception
      [   16.659613] Kernel Offset: disabled
      [   16.660234] ---[ end Kernel panic - not syncing: Fatal exception ]---
      
      Fixes: fb32856b ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 03 May, 2021 1 commit
  12. 24 Apr, 2021 1 commit
    • virtio-net: fix use-after-free in skb_gro_receive · f80bd740
      Xuan Zhuo committed
      When "headroom" > 0, the actual allocated memory space is the entire
      page, so the address of the page should be used when passing it to
      build_skb().
      
      BUG: KASAN: use-after-free in skb_gro_receive (net/core/skbuff.c:4260)
      Write of size 16 at addr ffff88811619fffc by task kworker/u9:0/534
      CPU: 2 PID: 534 Comm: kworker/u9:0 Not tainted 5.12.0-rc7-custom-16372-gb150be05b806 #3382
      Hardware name: QEMU MSN2700, BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      Workqueue: xprtiod xs_stream_data_receive_workfn [sunrpc]
      Call Trace:
       <IRQ>
      dump_stack (lib/dump_stack.c:122)
      print_address_description.constprop.0 (mm/kasan/report.c:233)
      kasan_report.cold (mm/kasan/report.c:400 mm/kasan/report.c:416)
      skb_gro_receive (net/core/skbuff.c:4260)
      tcp_gro_receive (net/ipv4/tcp_offload.c:266 (discriminator 1))
      tcp4_gro_receive (net/ipv4/tcp_offload.c:316)
      inet_gro_receive (net/ipv4/af_inet.c:1545 (discriminator 2))
      dev_gro_receive (net/core/dev.c:6075)
      napi_gro_receive (net/core/dev.c:6168 net/core/dev.c:6198)
      receive_buf (drivers/net/virtio_net.c:1151) virtio_net
      virtnet_poll (drivers/net/virtio_net.c:1415 drivers/net/virtio_net.c:1519) virtio_net
      __napi_poll (net/core/dev.c:6964)
      net_rx_action (net/core/dev.c:7033 net/core/dev.c:7118)
      __do_softirq (./arch/x86/include/asm/jump_label.h:25 ./include/linux/jump_label.h:200 ./include/trace/events/irq.h:142 kernel/softirq.c:346)
      irq_exit_rcu (kernel/softirq.c:221 kernel/softirq.c:422 kernel/softirq.c:434)
      common_interrupt (arch/x86/kernel/irq.c:240 (discriminator 14))
      </IRQ>
      
      Fixes: fb32856b ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Reported-by: Ido Schimmel <idosch@nvidia.com>
      Tested-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 21 Apr, 2021 2 commits
    • virtio-net: fix use-after-free in page_to_skb() · af39c8f7
      Eric Dumazet committed
      KASAN/syzbot had 4 reports, one of them being:
      
      BUG: KASAN: slab-out-of-bounds in memcpy include/linux/fortify-string.h:191 [inline]
      BUG: KASAN: slab-out-of-bounds in page_to_skb+0x5cf/0xb70 drivers/net/virtio_net.c:480
      Read of size 12 at addr ffff888014a5f800 by task systemd-udevd/8445
      
      CPU: 0 PID: 8445 Comm: systemd-udevd Not tainted 5.12.0-rc8-next-20210419-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:79 [inline]
       dump_stack+0x141/0x1d7 lib/dump_stack.c:120
       print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:233
       __kasan_report mm/kasan/report.c:419 [inline]
       kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:436
       check_region_inline mm/kasan/generic.c:180 [inline]
       kasan_check_range+0x13d/0x180 mm/kasan/generic.c:186
       memcpy+0x20/0x60 mm/kasan/shadow.c:65
       memcpy include/linux/fortify-string.h:191 [inline]
       page_to_skb+0x5cf/0xb70 drivers/net/virtio_net.c:480
       receive_mergeable drivers/net/virtio_net.c:1009 [inline]
       receive_buf+0x2bc0/0x6250 drivers/net/virtio_net.c:1119
       virtnet_receive drivers/net/virtio_net.c:1411 [inline]
       virtnet_poll+0x568/0x10b0 drivers/net/virtio_net.c:1516
       __napi_poll+0xaf/0x440 net/core/dev.c:6962
       napi_poll net/core/dev.c:7029 [inline]
       net_rx_action+0x801/0xb40 net/core/dev.c:7116
       __do_softirq+0x29b/0x9fe kernel/softirq.c:559
       invoke_softirq kernel/softirq.c:433 [inline]
       __irq_exit_rcu+0x136/0x200 kernel/softirq.c:637
       irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
       common_interrupt+0xa4/0xd0 arch/x86/kernel/irq.c:240
      
      Fixes: fb32856b ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: virtualization@lists.linux-foundation.org
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • virtio-net: restrict build_skb() use to some arches · f5d7872a
      Eric Dumazet committed
      build_skb() is supposed to be followed by
      skb_reserve(skb, NET_IP_ALIGN), so that IP headers are word-aligned.
      (Best practice is to reserve NET_IP_ALIGN+NET_SKB_PAD, but the NET_SKB_PAD
      part is only a performance optimization if tunnel encaps are added.)
      
      Unfortunately virtio_net has not provisioned this reserve.
      We can only use build_skb() for arches where NET_IP_ALIGN == 0
      
      We might refine this later, with enough testing.
      
      Fixes: fb32856b ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: virtualization@lists.linux-foundation.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
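Why the reserve matters for alignment can be shown with a little arithmetic. The sketch assumes a buffer whose start is 4-byte aligned and a plain 14-byte Ethernet header with no VLAN tag; `TOY_ETH_HLEN` and `toy_ip_header_aligned()` are hypothetical names used only here.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_ETH_HLEN 14u  /* Ethernet header length, no VLAN tag */

/*
 * With `reserve` bytes skipped at the start of a 4-byte-aligned buffer,
 * the IP header begins at offset reserve + TOY_ETH_HLEN. It is
 * word-aligned only if that offset is a multiple of 4.
 */
static bool toy_ip_header_aligned(size_t reserve)
{
    return (reserve + TOY_ETH_HLEN) % 4 == 0;
}
```

A reserve of 2 puts the IP header at offset 16, which is aligned; a reserve of 0 puts it at offset 14, which is not. Architectures that set NET_IP_ALIGN to 0 are the ones that tolerate the unaligned case, which is why the commit limits build_skb() to them.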
  14. 17 Apr, 2021 1 commit
    • virtio-net: page_to_skb() use build_skb when there's sufficient tailroom · fb32856b
      Xuan Zhuo committed
      In page_to_skb(), if we have enough tailroom to hold skb_shared_info, we
      can use build_skb() to create the skb directly, with no need to allocate
      additional space. It also saves a 'frags slot', which is very friendly
      to GRO.
      
      Here, if the payload of the received packet is too small (less than
      GOOD_COPY_LEN), we still choose to copy it directly to the space obtained
      from napi_alloc_skb(), so that we can reuse these pages.
      
      Testing Machine:
          The four queues of the network card are bound to the cpu1.
      
      Test command:
          for ((i=0;i<5;++i)); do sockperf tp --ip 192.168.122.64 -m 1000 -t 150& done
      
      The size of the UDP packet is 1000, so with this patch there will
      always be enough tailroom to use build_skb(). The sent UDP packets
      are discarded because there is no port to receive them. The irqsoftd
      of the machine is at 100%; we observe the received quantity displayed by
      sar -n DEV 1:
      
      no build_skb:  956864.00 rxpck/s
      build_skb:    1158465.00 rxpck/s
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Suggested-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
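The choice between copying and build_skb() described in this commit message can be written as a small decision function. This is a sketch, not the driver's logic: the enum and function names are hypothetical, the handling of a payload of exactly GOOD_COPY_LEN bytes is an assumption, and the third outcome simply stands for whatever path the driver used before this patch.

```c
#include <stddef.h>

#define TOY_GOOD_COPY_LEN 128u  /* value quoted elsewhere in this log */

enum toy_rx_path {
    TOY_COPY_SMALL,  /* copy payload into a freshly allocated skb */
    TOY_BUILD_SKB,   /* wrap the receive buffer with build_skb() */
    TOY_OLD_PATH     /* not enough tailroom: previous behaviour */
};

/*
 * len:         payload length in bytes
 * tailroom:    free bytes after the payload in the receive buffer
 * shinfo_size: space needed for skb_shared_info
 */
static enum toy_rx_path toy_choose_rx_path(size_t len, size_t tailroom,
                                           size_t shinfo_size)
{
    if (len <= TOY_GOOD_COPY_LEN)  /* boundary handling is assumed */
        return TOY_COPY_SMALL;
    if (tailroom >= shinfo_size)
        return TOY_BUILD_SKB;
    return TOY_OLD_PATH;
}
```

For the 1000-byte packets used in the test above, the payload is well over the copy threshold, so the outcome depends only on whether the tailroom can hold skb_shared_info.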
  15. 07 Apr, 2021 1 commit
    • virtio_net: Do not pull payload in skb->head · 0f6925b3
      Eric Dumazet committed
      Xuan Zhuo reported that commit 3226b158 ("net: avoid 32 x truesize
      under-estimation for tiny skbs") brought a ~10% performance drop.
      
      The reason for the performance drop was that GRO was forced
      to chain sk_buff (using skb_shinfo(skb)->frag_list), which
      uses more memory but also causes packet consumers to incur
      a lot of overhead handling all the tiny skbs.
      
      It turns out that virtio_net page_to_skb() has a wrong strategy:
      It allocates skbs with GOOD_COPY_LEN (128) bytes in skb->head, then
      copies 128 bytes from the page, before feeding the packet to GRO stack.
      
      This was suboptimal before commit 3226b158 ("net: avoid 32 x truesize
      under-estimation for tiny skbs") because GRO was using 2 frags per MSS,
      meaning we were not packing MSS with 100% efficiency.
      
      The fix is to pull only the Ethernet header in page_to_skb().
      
      Then, we change virtio_net_hdr_to_skb() to pull the missing
      headers, instead of assuming they were already pulled by callers.
      
      This fixes the performance regression, but could also allow virtio_net
      to accept packets with more than 128 bytes of headers.
      
      Many thanks to Xuan Zhuo for his report, and his tests/help.
      
      Fixes: 3226b158 ("net: avoid 32 x truesize under-estimation for tiny skbs")
      Reported-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Link: https://www.spinics.net/lists/netdev/msg731397.html
      Co-Developed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Jason Wang <jasowang@redhat.com>
      Cc: virtualization@lists.linux-foundation.org
      Acked-by: Jason Wang <jasowang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
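The change in how many bytes land in skb->head can be sketched as two small functions. This is an illustration, not the patch itself: the names are hypothetical, and the rule that a frame small enough to fit in the head is still copied whole is an assumption of this sketch rather than something the commit message states.

```c
#include <stddef.h>

#define TOY_GOOD_COPY_LEN 128u  /* from the commit message */
#define TOY_ETH_HLEN 14u        /* Ethernet header length */

/* Old strategy: copy up to GOOD_COPY_LEN bytes into skb->head. */
static size_t toy_head_copy_before(size_t len)
{
    return len < TOY_GOOD_COPY_LEN ? len : TOY_GOOD_COPY_LEN;
}

/*
 * New strategy: pull only the Ethernet header and leave the rest in
 * the page. Assumption: frames that fit entirely in head_room are
 * still copied whole.
 */
static size_t toy_head_copy_after(size_t len, size_t head_room)
{
    return len <= head_room ? len : TOY_ETH_HLEN;
}
```

For a 1500-byte frame the old strategy copies 128 bytes of headers and payload into the head, while the new one copies 14 and leaves later headers to be pulled on demand.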
  16. 19 Mar, 2021 1 commit
  17. 18 Mar, 2021 2 commits
  18. 11 Mar, 2021 1 commit
  19. 25 Feb, 2021 1 commit
  20. 23 Feb, 2021 1 commit
  21. 09 Jan, 2021 2 commits
  22. 24 Dec, 2020 1 commit
  23. 19 Dec, 2020 1 commit
  24. 01 Dec, 2020 1 commit
  25. 22 Oct, 2020 1 commit
    • Revert "virtio-net: ethtool configurable RXCSUM" · cf8691cb
      Michael S. Tsirkin committed
      This reverts commit 3618ad2a.
      
      When control vq is not negotiated, that commit causes a crash:
      
      [   72.229171] kernel BUG at drivers/net/virtio_net.c:1667!
      [   72.230266] invalid opcode: 0000 [#1] PREEMPT SMP
      [   72.231172] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc8-02934-g3618ad2a #1
      [   72.231172] EIP: virtnet_send_command+0x120/0x140
      [   72.231172] Code: 00 0f 94 c0 8b 7d f0 65 33 3d 14 00 00 00 75 1c 8d 65 f4 5b 5e 5f 5d c3 66 90 be 01 00 00 00 e9 6e ff ff ff 8d b6 00 00 00 00 <0f> 0b e8 d9 bb 82 00 eb 17 8d b4 26 00 00 00 00 8d b4 26 00 00 00
      [   72.231172] EAX: 0000000d EBX: f72895c0 ECX: 00000017 EDX: 00000011
      [   72.231172] ESI: f7197800 EDI: ed69bd00 EBP: ed69bcf4 ESP: ed69bc98
      [   72.231172] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010246
      [   72.231172] CR0: 80050033 CR2: 00000000 CR3: 02c84000 CR4: 000406f0
      [   72.231172] Call Trace:
      [   72.231172]  ? __virt_addr_valid+0x45/0x60
      [   72.231172]  ? ___cache_free+0x51f/0x760
      [   72.231172]  ? kobject_uevent_env+0xf4/0x560
      [   72.231172]  virtnet_set_guest_offloads+0x4d/0x80
      [   72.231172]  virtnet_set_features+0x85/0x120
      [   72.231172]  ? virtnet_set_guest_offloads+0x80/0x80
      [   72.231172]  __netdev_update_features+0x27a/0x8e0
      [   72.231172]  ? kobject_uevent+0xa/0x20
      [   72.231172]  ? netdev_register_kobject+0x12c/0x160
      [   72.231172]  register_netdevice+0x4fe/0x740
      [   72.231172]  register_netdev+0x1c/0x40
      [   72.231172]  virtnet_probe+0x728/0xb60
      [   72.231172]  ? _raw_spin_unlock+0x1d/0x40
      [   72.231172]  ? virtio_vdpa_get_status+0x1c/0x20
      [   72.231172]  virtio_dev_probe+0x1c6/0x271
      [   72.231172]  really_probe+0x195/0x2e0
      [   72.231172]  driver_probe_device+0x26/0x60
      [   72.231172]  device_driver_attach+0x49/0x60
      [   72.231172]  __driver_attach+0x46/0xc0
      [   72.231172]  ? device_driver_attach+0x60/0x60
      [   72.231172]  bus_add_driver+0x197/0x1c0
      [   72.231172]  driver_register+0x66/0xc0
      [   72.231172]  register_virtio_driver+0x1b/0x40
      [   72.231172]  virtio_net_driver_init+0x61/0x86
      [   72.231172]  ? veth_init+0x14/0x14
      [   72.231172]  do_one_initcall+0x76/0x2e4
      [   72.231172]  ? rdinit_setup+0x2a/0x2a
      [   72.231172]  do_initcalls+0xb2/0xd5
      [   72.231172]  kernel_init_freeable+0x14f/0x179
      [   72.231172]  ? rest_init+0x100/0x100
      [   72.231172]  kernel_init+0xd/0xe0
      [   72.231172]  ret_from_fork+0x1c/0x30
      [   72.231172] Modules linked in:
      [   72.269563] ---[ end trace a6ebc4afea0e6cb1 ]---
      
      The reason is that virtnet_set_features now calls virtnet_set_guest_offloads
      unconditionally; it used to call it only when there was something
      to configure.
      
      If the device does not have a control vq, everything breaks.
      
      Revert the original commit for now.
      
      Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Fixes: 3618ad2a ("virtio-net: ethtool configurable RXCSUM")
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Link: https://lore.kernel.org/r/20201021142944.13615-1-mst@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
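The difference between calling the offload command unconditionally and calling it only when something changes can be modelled in a few lines. This is a toy, not the driver: `struct toy_dev` and the `toy_*` functions are hypothetical, and a failing return value stands in for the kernel BUG seen in the trace above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy device state; not the kernel's struct virtnet_info. */
struct toy_dev {
    bool has_ctrl_vq;         /* was a control virtqueue negotiated? */
    uint64_t guest_offloads;  /* currently configured offload bits */
    int commands_sent;        /* control commands issued so far */
};

/* Sending a command without a control vq is the crashing case. */
static int toy_send_command(struct toy_dev *dev)
{
    if (!dev->has_ctrl_vq)
        return -1;  /* the real driver hits BUG() here */
    dev->commands_sent++;
    return 0;
}

/* Guarded variant: only talk to the device when there is a change. */
static int toy_set_features(struct toy_dev *dev, uint64_t wanted)
{
    if (wanted == dev->guest_offloads)
        return 0;  /* nothing to configure, no command needed */
    if (toy_send_command(dev) != 0)
        return -1;
    dev->guest_offloads = wanted;
    return 0;
}
```

With the guard in place, a device without a control vq survives a feature update that changes nothing, which corresponds to the registration-time call in the trace.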
  26. 14 Oct, 2020 1 commit
  27. 30 Sep, 2020 1 commit
  28. 11 Sep, 2020 1 commit
    • net: remove napi_hash_del() from driver-facing API · 5198d545
      Jakub Kicinski committed
      We allow drivers to call napi_hash_del() before calling
      netif_napi_del() to batch RCU grace periods. This makes
      the API asymmetric and leaks internal implementation details.
      Soon we will want the grace period to protect more than just
      the NAPI hash table.
      
      Restructure the API and have drivers call a new function -
      __netif_napi_del() if they want to take care of RCU waits.
      
      Note that only core was checking the return status from
      napi_hash_del() so the new helper does not report if the
      NAPI was actually deleted.
      
      Some notes on driver oddness:
       - veth observed the grace period before calling netif_napi_del()
         but that should not matter
       - myri10ge observed normal RCU flavor
       - bnx2x and enic did not actually observe the grace period
         (unless they did so implicitly)
       - virtio_net and enic only unhashed Rx NAPIs
      
      The last two points seem to indicate that the calls to
      napi_hash_del() were a left over rather than an optimization.
      Regardless, it's easy enough to correct them.
      
      This patch may introduce extra synchronize_net() calls for
      interfaces which set NAPI_STATE_NO_BUSY_POLL and depend on
      free_netdev() to call netif_napi_del(). This seems inevitable
      since we want to use RCU for netpoll dev->napi_list traversal,
      and almost no drivers set IFF_DISABLE_NETPOLL.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
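The batching that __netif_napi_del() makes possible can be illustrated by counting grace periods in a toy model. Everything named `toy_*` is hypothetical; the model assumes the plain delete waits for one grace period per NAPI, while the double-underscore variant leaves the wait to the caller.

```c
#include <stddef.h>

struct toy_stats {
    size_t grace_periods;  /* how many times we waited for RCU */
    size_t deleted;        /* how many NAPI instances were removed */
};

/* Stands in for synchronize_net(): one RCU grace period. */
static void toy_synchronize_net(struct toy_stats *s)
{
    s->grace_periods++;
}

/* Models __netif_napi_del(): unlink only, the caller handles the wait. */
static void toy_napi_del_nosync(struct toy_stats *s)
{
    s->deleted++;
}

/* Models netif_napi_del(): unlink, then wait. */
static void toy_napi_del(struct toy_stats *s)
{
    toy_napi_del_nosync(s);
    toy_synchronize_net(s);
}

/* Tear down n NAPIs one by one: n grace periods. */
static void toy_teardown_simple(struct toy_stats *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        toy_napi_del(s);
}

/* Tear down n NAPIs with a single shared grace period. */
static void toy_teardown_batched(struct toy_stats *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        toy_napi_del_nosync(s);
    toy_synchronize_net(s);
}
```

Both teardown routines remove the same number of instances; the batched one pays for a single grace period regardless of the queue count.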
  29. 24 Aug, 2020 1 commit
  30. 05 Aug, 2020 1 commit
  31. 26 Jul, 2020 1 commit
  32. 02 Jun, 2020 1 commit
  33. 15 May, 2020 1 commit
    • virtio_net: Add XDP frame size in two code paths · 9ce6146e
      Jesper Dangaard Brouer committed
      The virtio_net driver is running inside the guest-OS. There are two
      XDP receive code-paths in virtio_net, namely receive_small() and
      receive_mergeable(). The receive_big() function does not support XDP.
      
      In receive_small() the frame size is available in buflen. The buffers
      backing these frames are allocated in add_recvbuf_small() with the same
      size, except for the headroom, but the tailroom has room reserved for
      skb_shared_info. The headroom is encoded in the ctx pointer as a value.
      
      In receive_mergeable() the frame size is more dynamic. There are two
      basic cases: (1) the buffer size is based on an exponentially weighted
      moving average (see DECLARE_EWMA) of packet length, or (2) in case
      virtnet_get_headroom() returns any headroom, the buffer size is
      PAGE_SIZE. The ctx pointer is this time used for encoding two values:
      the buffer len "truesize" and the headroom. In case (1), if the rx buffer
      size is underestimated, the packet will have been split over more
      buffers (num_buf info in virtio_net_hdr_mrg_rxbuf placed at the top of
      the buffer area). If that happens the XDP path does a xdp_linearize_page
      operation.
      
      V3: Adjust frame_sz in receive_mergeable() case, spotted by Jason Wang.
      
      The code is really hard to follow, so here are some hints for reviewers.
      The receive_mergeable() case gets frames that were allocated in
      add_recvbuf_mergeable(), which uses headroom=virtnet_get_headroom(),
      and the 'buf' ptr is advanced by this headroom.  The headroom can only
      be 0 or VIRTIO_XDP_HEADROOM, as virtnet_get_headroom() is really
      simple:
      
        static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
        {
      	return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0;
        }
      
      As frame_sz is an offset size from xdp.data_hard_start, reviewers
      should notice how this is calculated in receive_mergeable():
      
        int offset = buf - page_address(page);
        [...]
        data = page_address(xdp_page) + offset;
        xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;
      
      The calculated offset will always be VIRTIO_XDP_HEADROOM when
      reaching this code.  Thus, xdp.data_hard_start will be the page-start
      address plus vi->hdr_len.  Given this, xdp.frame_sz needs to be
      reduced by vi->hdr_len.
      
      IMHO a followup patch should clean up this code to make it easier
      to maintain and understand, but it is outside the scope of this
      patchset.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Jason Wang <jasowang@redhat.com>
      Link: https://lore.kernel.org/bpf/158945344436.97035.9445115070189151680.stgit@firesoul
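Two ideas from this commit message lend themselves to a short sketch: packing truesize and headroom into one pointer-sized ctx value, and the frame_sz arithmetic for the mergeable case. The code is illustrative only. The `toy_*` names, the 22-bit split point, and the 256-byte XDP headroom are assumptions of this sketch, not values taken from the text above.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_CTX_SHIFT 22u      /* assumed split between the two fields */
#define TOY_XDP_HEADROOM 256u  /* assumed value of VIRTIO_XDP_HEADROOM */

/* Pack headroom into the high bits and truesize into the low bits. */
static uintptr_t toy_len_to_ctx(unsigned int truesize, unsigned int headroom)
{
    return ((uintptr_t)headroom << TOY_CTX_SHIFT) | truesize;
}

static unsigned int toy_ctx_to_headroom(uintptr_t ctx)
{
    return (unsigned int)(ctx >> TOY_CTX_SHIFT);
}

static unsigned int toy_ctx_to_truesize(uintptr_t ctx)
{
    return (unsigned int)(ctx & (((uintptr_t)1 << TOY_CTX_SHIFT) - 1));
}

/*
 * Offset of data_hard_start from the start of the page, following
 * data_hard_start = page + offset - VIRTIO_XDP_HEADROOM + hdr_len.
 */
static size_t toy_hard_start_offset(size_t offset, size_t hdr_len)
{
    return offset - TOY_XDP_HEADROOM + hdr_len;
}

/* frame_sz is measured from data_hard_start, so hdr_len is subtracted. */
static size_t toy_frame_sz(size_t truesize, size_t hdr_len)
{
    return truesize - hdr_len;
}
```

With an offset equal to the XDP headroom and a 12-byte header, data_hard_start sits 12 bytes into the page, so a 4096-byte buffer leaves a frame size of 4084.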