- 03 Aug, 2021 1 commit
-
-
Committed by Jakub Kicinski
We ended up merging two versions of the same patch set:

commit 8fb7da9e ("virtio_net: get build_skb() buf by data ptr")
commit 5c37711d ("virtio-net: fix for unable to handle page fault for address")

into net, and

commit 7bf64460 ("virtio-net: get build_skb() buf by data ptr")
commit 6c66c147 ("virtio-net: fix for unable to handle page fault for address")

into net-next. Redo the merge from commit 12628565 ("Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net"), so that the most recent code remains.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 11 Jul, 2021 1 commit
-
-
Committed by Yunjian Wang
As virtqueue_add_sgs() can fail, we should check the return value.

Addresses-Coverity-ID: 1464439 ("Unchecked return value")
Signed-off-by: Yunjian Wang <wangyunjian@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 08 Jul, 2021 1 commit
-
-
Committed by Michael S. Tsirkin
There are currently two cases where we poll the TX vq not in response to a callback: start xmit and rx napi. We currently do this with callbacks enabled, which can cause extra interrupts from the card. This used not to be a big issue, as we ran with interrupts disabled, but that is no longer the case, and in some cases the rate of spurious interrupts is so high that Linux detects this and actually kills the interrupt.

Fix up by disabling the callbacks before polling the tx vq.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
- 03 Jul, 2021 3 commits
-
-
Committed by Michael S. Tsirkin
We currently check num_free outside the tx queue lock, which is unsafe: new packets can arrive in the meantime and there won't be space in the queue. The result is a spurious queue wakeup, causing overhead and even packet drops.

Move the check under the lock to fix that.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
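The locking rule described in this commit can be illustrated with a small user-space model. This is only a sketch in Python, not driver code: the class `TxQueue`, its fields, and the `wake_threshold` parameter are hypothetical names invented for the illustration. The point it shows is that the free-slot check and the wake decision happen while holding the same lock that the transmit path takes, so the count cannot change between the check and the wakeup.

```python
import threading


class TxQueue:
    """Toy model of a transmit queue guarded by a single lock (hypothetical)."""

    def __init__(self, capacity):
        self.lock = threading.Lock()
        self.num_free = capacity
        self.stopped = False

    def xmit(self, needed):
        # Transmit path: consume slots, or stop the queue when it is full.
        with self.lock:
            if self.num_free < needed:
                self.stopped = True
                return False
            self.num_free -= needed
            return True

    def complete(self, freed, wake_threshold):
        # Completion path: the num_free check is made under the same lock,
        # so a concurrent xmit() cannot shrink num_free between the check
        # and the wake decision.
        with self.lock:
            self.num_free += freed
            if self.stopped and self.num_free >= wake_threshold:
                self.stopped = False
                return True  # queue woken
            return False
```

In this model a wakeup is reported only when the queue really has `wake_threshold` free slots at the moment of the decision.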
-
Committed by Michael S. Tsirkin
It's unsafe to operate a vq from multiple threads. Unfortunately this is exactly what we do when invoking clean tx poll from rx napi. The same happens with napi-tx even without the opportunistic cleaning from the receive interrupt: that races with processing the vq in start_xmit.

As a fix, move everything that deals with the vq under the tx lock.

Fixes: b92f1e67 ("virtio-net: transmit napi")
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Committed by Xie Yongji
Do some cleanups in virtnet_restore() when virtnet_cpu_notif_add() fails.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20210517084516.332-1-xieyongji@bytedance.com
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
- 24 Jun, 2021 1 commit
-
-
Committed by Xianting Tian
virtio_find_vqs_ctx() is defined but currently never called; this is the right place to use it.

Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 08 Jun, 2021 1 commit
-
-
Committed by Xianting Tian
We should not directly BUG() when there is a hdr error; it is better to print a message when such an error happens. The caller of xmit_skb() already does this.

Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 04 Jun, 2021 1 commit
-
-
Committed by Xuan Zhuo
In virtio-net's large packet mode, there is a hole in the space behind buf:

    hdr_padded_len - hdr_len

We must take this into account when calculating tailroom.

[ 44.544385] skb_put.cold (net/core/skbuff.c:5254 (discriminator 1) net/core/skbuff.c:5252 (discriminator 1))
[ 44.544864] page_to_skb (drivers/net/virtio_net.c:485)
[ 44.545361] receive_buf (drivers/net/virtio_net.c:849 drivers/net/virtio_net.c:1131)
[ 44.545870] ? netif_receive_skb_list_internal (net/core/dev.c:5714)
[ 44.546628] ? dev_gro_receive (net/core/dev.c:6103)
[ 44.547135] ? napi_complete_done (./include/linux/list.h:35 net/core/dev.c:5867 net/core/dev.c:5862 net/core/dev.c:6565)
[ 44.547672] virtnet_poll (drivers/net/virtio_net.c:1427 drivers/net/virtio_net.c:1525)
[ 44.548251] __napi_poll (net/core/dev.c:6985)
[ 44.548744] net_rx_action (net/core/dev.c:7054 net/core/dev.c:7139)
[ 44.549264] __do_softirq (./arch/x86/include/asm/jump_label.h:19 ./include/linux/jump_label.h:200 ./include/trace/events/irq.h:142 kernel/softirq.c:560)
[ 44.549762] irq_exit_rcu (kernel/softirq.c:433 kernel/softirq.c:637 kernel/softirq.c:649)
[ 44.551384] common_interrupt (arch/x86/kernel/irq.c:240 (discriminator 13))
[ 44.551991] ? asm_common_interrupt (./arch/x86/include/asm/idtentry.h:638)
[ 44.552654] asm_common_interrupt (./arch/x86/include/asm/idtentry.h:638)

Fixes: fb32856b ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reported-by: Corentin Noël <corentin.noel@collabora.com>
Tested-by: Corentin Noël <corentin.noel@collabora.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
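The effect of the hole on the tailroom calculation can be worked through with a few numbers. The following is a minimal arithmetic sketch in Python, not the driver's code: the function name, the buffer layout (headroom, header, hole, payload, tailroom) and all sizes used below are assumptions chosen for illustration.

```python
def tailroom(truesize, headroom, hdr_len, hdr_padded_len, payload_len,
             count_hole=True):
    """Bytes left after the payload in a buffer laid out as
    [headroom][header][hole][payload][tailroom] (assumed layout)."""
    # The hole is the padding between the end of the header and the payload.
    hole = (hdr_padded_len - hdr_len) if count_hole else 0
    return truesize - headroom - hdr_len - hole - payload_len


# Illustrative numbers only: a 4096-byte buffer, a 12-byte header padded
# to 16 bytes, and a 3764-byte payload.
with_hole = tailroom(4096, 0, 12, 16, 3764)
without_hole = tailroom(4096, 0, 12, 16, 3764, count_hole=False)
```

With these assumed sizes, ignoring the hole overestimates the tailroom by `hdr_padded_len - hdr_len` bytes. If the space needed after the payload happened to sit between the two results, the overestimate would let a write run past the end of the buffer, which matches the kind of skb_put failure shown in the trace.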
-
- 02 Jun, 2021 2 commits
-
-
Committed by Xuan Zhuo
In the case of merge, the page passed into page_to_skb() may be a head page, not the page where the current data is located. So when trying to get the buf where the data is located, we should get buf based on headroom instead of offset.

This patch solves this problem. Without this patch the original code still runs, because if the page is not the page holding the current data, the calculated tailroom is less than 0 and the build_skb() path is not taken. The point of this patch is to correct that logic so that more cases can use build_skb().

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Xuan Zhuo
In merge mode, when xdp is enabled, if the headroom of buf is smaller than virtnet_get_headroom(), xdp_linearize_page() will be called but the variable of "headroom" is still 0, which leads to wrong logic after entering page_to_skb(). [ 16.600944] BUG: unable to handle page fault for address: ffffecbfff7b43c8[ 16.602175] #PF: supervisor read access in kernel mode [ 16.603350] #PF: error_code(0x0000) - not-present page [ 16.604200] PGD 0 P4D 0 [ 16.604686] Oops: 0000 [#1] SMP PTI [ 16.605306] CPU: 4 PID: 715 Comm: sh Tainted: G B 5.12.0+ #312 [ 16.606429] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/04 [ 16.608217] RIP: 0010:unmap_page_range+0x947/0xde0 [ 16.609014] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065 [ 16.611863] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286 [ 16.612720] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359 [ 16.613853] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005 [ 16.614976] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030 [ 16.616124] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f [ 16.617276] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000 [ 16.618423] FS: 0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000 [ 16.619738] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 16.620670] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0 [ 16.621792] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 16.622920] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 16.624047] Call Trace: [ 16.624525] ? release_pages+0x24d/0x730 [ 16.625209] unmap_single_vma+0xa9/0x130 [ 16.625885] unmap_vmas+0x76/0xf0 [ 16.626480] exit_mmap+0xa0/0x210 [ 16.627129] mmput+0x67/0x180 [ 16.627673] do_exit+0x3d1/0xf10 [ 16.628259] ? 
do_user_addr_fault+0x231/0x840 [ 16.629000] do_group_exit+0x53/0xd0 [ 16.629631] __x64_sys_exit_group+0x1d/0x20 [ 16.630354] do_syscall_64+0x3c/0x80 [ 16.630988] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 16.631828] RIP: 0033:0x7f1a043d0191 [ 16.632464] Code: Unable to access opcode bytes at RIP 0x7f1a043d0167. [ 16.633502] RSP: 002b:00007ffe3d993308 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 [ 16.634737] RAX: ffffffffffffffda RBX: 00007f1a044c9490 RCX: 00007f1a043d0191 [ 16.635857] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000 [ 16.636986] RBP: 0000000000000000 R08: ffffffffffffff88 R09: 0000000000000001 [ 16.638120] R10: 0000000000000008 R11: 0000000000000246 R12: 00007f1a044c9490 [ 16.639245] R13: 0000000000000001 R14: 00007f1a044c9968 R15: 0000000000000000 [ 16.640408] Modules linked in: [ 16.640958] CR2: ffffecbfff7b43c8 [ 16.641557] ---[ end trace bc4891c6ce46354c ]--- [ 16.642335] RIP: 0010:unmap_page_range+0x947/0xde0 [ 16.643135] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065 [ 16.645983] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286 [ 16.646845] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359 [ 16.647970] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005 [ 16.649091] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030 [ 16.650250] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f [ 16.651394] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000 [ 16.652529] FS: 0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000 [ 16.653887] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 16.654841] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0 [ 16.655992] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 16.657150] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 16.658290] Kernel panic - not syncing: Fatal exception [ 16.659613] 
Kernel Offset: disabled [ 16.660234] ---[ end Kernel panic - not syncing: Fatal exception ]---

Fixes: fb32856b ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 01 Jun, 2021 1 commit
-
-
Committed by Xie Yongji
This adds validation for the used length (which might come from an untrusted device) to avoid data corruption or loss.

Signed-off-by: Xie Yongji <xieyongji@bytedance.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210531135852.113-1-xieyongji@bytedance.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
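The idea of validating a device-reported length can be sketched as follows. This is a simplified Python model, not the driver's code: the function names, the `stats` dictionary, and the choice to count rejected buffers under an `rx_length_errors` key are assumptions made for the illustration.

```python
def validate_used_len(used_len, buf_capacity):
    """Return True if the length reported by the device fits the buffer
    that was actually posted to it."""
    return 0 <= used_len <= buf_capacity


def receive(used_len, buf_capacity, stats):
    """Accept a buffer only if its reported length is plausible."""
    if not validate_used_len(used_len, buf_capacity):
        # Never trust the length: drop the buffer instead of reading
        # or writing beyond what was allocated.
        stats["rx_length_errors"] += 1
        return None
    return used_len
```

The check costs one comparison per buffer and keeps a misbehaving or malicious device from steering the receive path outside the memory that was handed to it.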
-
- 14 May, 2021 2 commits
-
-
Committed by Xuan Zhuo
In the case of merge, the page passed into page_to_skb() may be a head page, not the page where the current data is located. So when trying to get the buf where the data is located, you should directly use the pointer (p) to get the address corresponding to the page. At the same time, the offset of the data in the page should also be obtained using offset_in_page().

This patch solves this problem. Without this patch the original code still runs, because if the page is not the page holding the current data, the calculated tailroom is less than 0 and the build_skb() path is not taken. The point of this patch is to correct that logic so that more cases can use build_skb().

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Xuan Zhuo
In merge mode, when xdp is enabled, if the headroom of buf is smaller than virtnet_get_headroom(), xdp_linearize_page() will be called but the variable of "headroom" is still 0, which leads to wrong logic after entering page_to_skb(). [ 16.600944] BUG: unable to handle page fault for address: ffffecbfff7b43c8[ 16.602175] #PF: supervisor read access in kernel mode [ 16.603350] #PF: error_code(0x0000) - not-present page [ 16.604200] PGD 0 P4D 0 [ 16.604686] Oops: 0000 [#1] SMP PTI [ 16.605306] CPU: 4 PID: 715 Comm: sh Tainted: G B 5.12.0+ #312 [ 16.606429] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/04 [ 16.608217] RIP: 0010:unmap_page_range+0x947/0xde0 [ 16.609014] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065 [ 16.611863] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286 [ 16.612720] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359 [ 16.613853] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005 [ 16.614976] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030 [ 16.616124] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f [ 16.617276] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000 [ 16.618423] FS: 0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000 [ 16.619738] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 16.620670] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0 [ 16.621792] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 16.622920] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 16.624047] Call Trace: [ 16.624525] ? release_pages+0x24d/0x730 [ 16.625209] unmap_single_vma+0xa9/0x130 [ 16.625885] unmap_vmas+0x76/0xf0 [ 16.626480] exit_mmap+0xa0/0x210 [ 16.627129] mmput+0x67/0x180 [ 16.627673] do_exit+0x3d1/0xf10 [ 16.628259] ? 
do_user_addr_fault+0x231/0x840 [ 16.629000] do_group_exit+0x53/0xd0 [ 16.629631] __x64_sys_exit_group+0x1d/0x20 [ 16.630354] do_syscall_64+0x3c/0x80 [ 16.630988] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 16.631828] RIP: 0033:0x7f1a043d0191 [ 16.632464] Code: Unable to access opcode bytes at RIP 0x7f1a043d0167. [ 16.633502] RSP: 002b:00007ffe3d993308 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 [ 16.634737] RAX: ffffffffffffffda RBX: 00007f1a044c9490 RCX: 00007f1a043d0191 [ 16.635857] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000 [ 16.636986] RBP: 0000000000000000 R08: ffffffffffffff88 R09: 0000000000000001 [ 16.638120] R10: 0000000000000008 R11: 0000000000000246 R12: 00007f1a044c9490 [ 16.639245] R13: 0000000000000001 R14: 00007f1a044c9968 R15: 0000000000000000 [ 16.640408] Modules linked in: [ 16.640958] CR2: ffffecbfff7b43c8 [ 16.641557] ---[ end trace bc4891c6ce46354c ]--- [ 16.642335] RIP: 0010:unmap_page_range+0x947/0xde0 [ 16.643135] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065 [ 16.645983] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286 [ 16.646845] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359 [ 16.647970] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005 [ 16.649091] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030 [ 16.650250] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f [ 16.651394] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000 [ 16.652529] FS: 0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000 [ 16.653887] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 16.654841] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0 [ 16.655992] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 16.657150] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 16.658290] Kernel panic - not syncing: Fatal exception [ 16.659613] 
Kernel Offset: disabled [ 16.660234] ---[ end Kernel panic - not syncing: Fatal exception ]---

Fixes: fb32856b ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 03 May, 2021 1 commit
-
-
Committed by Max Gurtovoy
Not all virtio_net devices support the ctrl queue feature. Thus, there is no need to allocate unused resources.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Link: https://lore.kernel.org/r/20210502093319.61313-1-mgurtovoy@nvidia.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
- 24 Apr, 2021 1 commit
-
-
Committed by Xuan Zhuo
When "headroom" > 0, the actual allocated memory space is the entire page, so the address of the page should be used when passing it to build_skb().

BUG: KASAN: use-after-free in skb_gro_receive (net/core/skbuff.c:4260)
Write of size 16 at addr ffff88811619fffc by task kworker/u9:0/534

CPU: 2 PID: 534 Comm: kworker/u9:0 Not tainted 5.12.0-rc7-custom-16372-gb150be05b806 #3382
Hardware name: QEMU MSN2700, BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: xprtiod xs_stream_data_receive_workfn [sunrpc]
Call Trace:
<IRQ>
dump_stack (lib/dump_stack.c:122)
print_address_description.constprop.0 (mm/kasan/report.c:233)
kasan_report.cold (mm/kasan/report.c:400 mm/kasan/report.c:416)
skb_gro_receive (net/core/skbuff.c:4260)
tcp_gro_receive (net/ipv4/tcp_offload.c:266 (discriminator 1))
tcp4_gro_receive (net/ipv4/tcp_offload.c:316)
inet_gro_receive (net/ipv4/af_inet.c:1545 (discriminator 2))
dev_gro_receive (net/core/dev.c:6075)
napi_gro_receive (net/core/dev.c:6168 net/core/dev.c:6198)
receive_buf (drivers/net/virtio_net.c:1151) virtio_net
virtnet_poll (drivers/net/virtio_net.c:1415 drivers/net/virtio_net.c:1519) virtio_net
__napi_poll (net/core/dev.c:6964)
net_rx_action (net/core/dev.c:7033 net/core/dev.c:7118)
__do_softirq (./arch/x86/include/asm/jump_label.h:25 ./include/linux/jump_label.h:200 ./include/trace/events/irq.h:142 kernel/softirq.c:346)
irq_exit_rcu (kernel/softirq.c:221 kernel/softirq.c:422 kernel/softirq.c:434)
common_interrupt (arch/x86/kernel/irq.c:240 (discriminator 14))
</IRQ>

Fixes: fb32856b ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reported-by: Ido Schimmel <idosch@nvidia.com>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 21 Apr, 2021 2 commits
-
-
Committed by Eric Dumazet
KASAN/syzbot had 4 reports, one of them being:

BUG: KASAN: slab-out-of-bounds in memcpy include/linux/fortify-string.h:191 [inline]
BUG: KASAN: slab-out-of-bounds in page_to_skb+0x5cf/0xb70 drivers/net/virtio_net.c:480
Read of size 12 at addr ffff888014a5f800 by task systemd-udevd/8445

CPU: 0 PID: 8445 Comm: systemd-udevd Not tainted 5.12.0-rc8-next-20210419-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:79 [inline]
dump_stack+0x141/0x1d7 lib/dump_stack.c:120
print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:233
__kasan_report mm/kasan/report.c:419 [inline]
kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:436
check_region_inline mm/kasan/generic.c:180 [inline]
kasan_check_range+0x13d/0x180 mm/kasan/generic.c:186
memcpy+0x20/0x60 mm/kasan/shadow.c:65
memcpy include/linux/fortify-string.h:191 [inline]
page_to_skb+0x5cf/0xb70 drivers/net/virtio_net.c:480
receive_mergeable drivers/net/virtio_net.c:1009 [inline]
receive_buf+0x2bc0/0x6250 drivers/net/virtio_net.c:1119
virtnet_receive drivers/net/virtio_net.c:1411 [inline]
virtnet_poll+0x568/0x10b0 drivers/net/virtio_net.c:1516
__napi_poll+0xaf/0x440 net/core/dev.c:6962
napi_poll net/core/dev.c:7029 [inline]
net_rx_action+0x801/0xb40 net/core/dev.c:7116
__do_softirq+0x29b/0x9fe kernel/softirq.c:559
invoke_softirq kernel/softirq.c:433 [inline]
__irq_exit_rcu+0x136/0x200 kernel/softirq.c:637
irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
common_interrupt+0xa4/0xd0 arch/x86/kernel/irq.c:240

Fixes: fb32856b ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Dumazet
build_skb() is supposed to be followed by skb_reserve(skb, NET_IP_ALIGN), so that IP headers are word-aligned. (Best practice is to reserve NET_IP_ALIGN+NET_SKB_PAD, but the NET_SKB_PAD part is only a performance optimization if tunnel encaps are added.)

Unfortunately virtio_net has not provisioned this reserve. We can only use build_skb() for arches where NET_IP_ALIGN == 0.

We might refine this later, with enough testing.

Fixes: fb32856b ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: David S. Miller <davem@davemloft.net>
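The alignment argument can be checked with simple arithmetic. The sketch below is an illustration in Python, not kernel code: the helper names are hypothetical, and it assumes a plain 14-byte Ethernet header with no VLAN tag and a 4-byte word size.

```python
ETH_HLEN = 14  # length of an untagged Ethernet header, in bytes


def ip_header_aligned(reserve, word=4):
    """True if the IP header, which follows the Ethernet header, starts on
    a word boundary when `reserve` bytes are skipped at the buffer start."""
    return (reserve + ETH_HLEN) % word == 0


def build_skb_allowed(net_ip_align):
    """Model of the rule in this commit: with no reserve provisioned,
    build_skb() is only usable where the required reserve is zero."""
    return net_ip_align == 0
```

Reserving 2 bytes places the IP header at offset 16, which is word-aligned; with no reserve it starts at offset 14, which is not. Architectures that define NET_IP_ALIGN as 0 are the ones that tolerate this, which is why the commit restricts build_skb() to them.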
-
- 17 Apr, 2021 1 commit
-
-
Committed by Xuan Zhuo
In page_to_skb(), if we have enough tailroom to hold skb_shared_info, we can use build_skb() to create the skb directly. There is no need to allocate additional space, and it saves a 'frags slot', which is very friendly to GRO.

Here, if the payload of the received packet is too small (less than GOOD_COPY_LEN), we still choose to copy it directly to the space obtained from napi_alloc_skb(), so that we can reuse these pages.

Testing machine: the four queues of the network card are bound to cpu1.

Test command:

    for ((i=0;i<5;++i)); do sockperf tp --ip 192.168.122.64 -m 1000 -t 150& done

The size of the UDP packet is 1000, so with this patch there will always be enough tailroom to use build_skb(). The sent UDP packets are discarded because there is no port to receive them. The irqsoftd of the machine is 100%; we observe the received quantity displayed by sar -n DEV 1:

no build_skb: 956864.00 rxpck/s
build_skb: 1158465.00 rxpck/s

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
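The decision described in this commit, together with the reported numbers, can be sketched as follows. This is an illustrative Python model, not the driver's code: `choose_rx_path` and `speedup` are hypothetical helpers, the value 320 for the size of skb_shared_info is an assumption (the real size depends on the kernel configuration), and treating a length exactly equal to GOOD_COPY_LEN as "small" is also an assumption. The value 128 for GOOD_COPY_LEN is taken from the 07 Apr, 2021 commit message in this log.

```python
GOOD_COPY_LEN = 128  # copy threshold, as quoted elsewhere in this log
SHINFO_SIZE = 320    # ASSUMED size of skb_shared_info, illustrative only


def choose_rx_path(length, tailroom):
    """Pick the receive path for one buffer in this simplified model."""
    if length <= GOOD_COPY_LEN:
        return "copy"       # small packet: copy it so the page can be reused
    if tailroom >= SHINFO_SIZE:
        return "build_skb"  # room for skb_shared_info behind the data
    return "frags"          # otherwise attach the page as a fragment


def speedup(before_pps, after_pps):
    """Relative gain in packets per second."""
    return (after_pps - before_pps) / before_pps
```

Applied to the figures in the commit message, `speedup(956864.00, 1158465.00)` is roughly 0.21, that is, about a 21% increase in received packets per second for this particular test.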
-
- 07 Apr, 2021 1 commit
-
-
Committed by Eric Dumazet
Xuan Zhuo reported that commit 3226b158 ("net: avoid 32 x truesize under-estimation for tiny skbs") brought a ~10% performance drop.

The reason for the performance drop was that GRO was forced to chain sk_buff (using skb_shinfo(skb)->frag_list), which uses more memory but also causes packet consumers to go over a lot of overhead handling all the tiny skbs.

It turns out that virtio_net page_to_skb() has a wrong strategy: it allocates skbs with GOOD_COPY_LEN (128) bytes in skb->head, then copies 128 bytes from the page, before feeding the packet to the GRO stack.

This was suboptimal before commit 3226b158 ("net: avoid 32 x truesize under-estimation for tiny skbs") because GRO was using 2 frags per MSS, meaning we were not packing MSS with 100% efficiency.

The fix is to pull only the ethernet header in page_to_skb().

Then, we change virtio_net_hdr_to_skb() to pull the missing headers, instead of assuming they were already pulled by callers.

This fixes the performance regression, but could also allow virtio_net to accept packets with more than 128 bytes of headers.

Many thanks to Xuan Zhuo for his report, and his tests/help.

Fixes: 3226b158 ("net: avoid 32 x truesize under-estimation for tiny skbs")
Reported-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Link: https://www.spinics.net/lists/netdev/msg731397.html
Co-Developed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 19 Mar, 2021 1 commit
-
-
Committed by Antoine Tenart
Move the xps maps (xps_cpus_map and xps_rxqs_map) to an array in net_device. That simplifies the code a lot, removing the need for many if/else conditionals, as the correct map is available using its offset in the array.

This should not modify the xps maps behaviour in any way.

Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 18 Mar, 2021 2 commits
-
-
Committed by Lorenzo Bianconi
We want to change the current ndo_xdp_xmit drop semantics because it will allow us to implement better queue overflow handling. This is working towards the larger goal of an XDP TX queue-hook.

Move XDP_REDIRECT error path handling from each XDP ethernet driver to devmap code. According to the new APIs, the driver implementing the ndo_xdp_xmit pointer will break the tx loop whenever the hw reports a tx error, and it will just return to the devmap caller the number of successfully transmitted frames. It will be devmap's responsibility to free dropped frames.

Move each XDP ndo_xdp_xmit capable driver to the new APIs:
- veth
- virtio-net
- mvneta
- mvpp2
- socionext
- amazon ena
- bnxt
- freescale (dpaa2, dpaa)
- xen-frontend
- qede
- ice
- igb
- ixgbe
- i40e
- mlx5
- ti (cpsw, cpsw-new)
- tun
- sfc

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Camelia Groza <camelia.groza@nxp.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Shay Agroskin <shayagr@amazon.com>
Link: https://lore.kernel.org/bpf/ed670de24f951cfd77590decf0229a0ad7fd12f6.1615201152.git.lorenzo@kernel.org
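The division of responsibility described in this commit can be modelled in a few lines. The following is a user-space sketch in Python, not kernel code: `driver_xmit`, `devmap_flush`, `hw_send` and `free` are hypothetical stand-ins for the driver's transmit routine, the devmap caller, the hardware, and the frame-release function.

```python
def driver_xmit(frames, hw_send):
    """Driver side: stop at the first hardware error and report how many
    frames were transmitted. The driver frees nothing."""
    sent = 0
    for frame in frames:
        if not hw_send(frame):
            break
        sent += 1
    return sent


def devmap_flush(frames, hw_send, free):
    """Caller side: free every frame the driver did not transmit."""
    sent = driver_xmit(frames, hw_send)
    for frame in frames[sent:]:
        free(frame)
    return sent
```

In this model the driver only reports a count, and the caller owns the cleanup of everything from the first failed frame onwards, so the error path lives in one place rather than in every driver.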
-
Committed by Alexander Duyck
Update the code to replace instances of snprintf and a pointer update with just calling ethtool_sprintf.

Also replace the char pointer with a u8 pointer to avoid having to recast the pointer type.

Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Alexander Duyck <alexanderduyck@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 11 Mar, 2021 1 commit
-
-
Committed by Xuan Zhuo
The number of queues implemented by many virtio backends is limited, while some machines have a large number of CPUs. In this case, it is often impossible to allocate a separate queue for XDP_TX/XDP_REDIRECT, and then xdp cannot be loaded at all, even if the xdp program does not use XDP_TX/XDP_REDIRECT.

This patch allows XDP_TX/XDP_REDIRECT to run by reusing the existing SQ with __netif_tx_lock() held when there are not enough queues.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Reviewed-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
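The fallback described in this commit amounts to a simple capacity check. The sketch below is a Python illustration, not the driver's logic verbatim: `plan_xdp_queues`, its parameters, and the returned dictionary keys are hypothetical names, and it assumes one dedicated XDP send queue per CPU is what would be requested when enough queues exist.

```python
def plan_xdp_queues(curr_queue_pairs, nr_cpus, max_queue_pairs):
    """Decide whether XDP gets dedicated send queues or shares the
    existing ones under the tx lock (simplified model)."""
    if curr_queue_pairs + nr_cpus <= max_queue_pairs:
        # Enough queues: one extra, lock-free send queue per CPU.
        return {"xdp_queue_pairs": nr_cpus, "needs_tx_lock": False}
    # Not enough queues: reuse the regular send queues, serialised by
    # taking the tx lock around each XDP transmit.
    return {"xdp_queue_pairs": 0, "needs_tx_lock": True}
```

For example, a device with 8 queue pairs and 4 in use can give 4 CPUs dedicated queues, while the same device on a 64-CPU machine falls back to the shared, locked path instead of refusing to load the program.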
-
- 25 Feb, 2021 1 commit
-
-
Committed by Xuan Zhuo
Virtio net supports the case where the skb linear space is empty, so add priv_flags.

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20210218204908.5455-4-alobakin@pm.me
-
- 23 Feb, 2021 1 commit
-
-
Committed by Gustavo A. R. Silva
In preparation to enable -Wimplicit-fallthrough for Clang, fix a warning by explicitly adding a goto statement instead of letting the code fall through to the next case.

Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/cb9b9534572bc476f4fb7b49a73dc8646b780c84.1605896060.git.gustavoars@kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
- 09 Jan, 2021 2 commits
-
-
Committed by Lorenzo Bianconi
Introduce the xdp_prepare_buff utility routine to initialize per-descriptor xdp_buff fields (e.g. xdp_buff pointers). Rely on xdp_prepare_buff() in all XDP capable drivers.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Shay Agroskin <shayagr@amazon.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Camelia Groza <camelia.groza@nxp.com>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Link: https://lore.kernel.org/bpf/45f46f12295972a97da8ca01990b3e71501e9d89.1608670965.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
Committed by Lorenzo Bianconi
Introduce the xdp_init_buff utility routine to initialize xdp_buff fields that are constant over NAPI iterations (e.g. frame_sz or rxq pointer). Rely on xdp_init_buff in all XDP capable drivers.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Shay Agroskin <shayagr@amazon.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Camelia Groza <camelia.groza@nxp.com>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Link: https://lore.kernel.org/bpf/7f8329b6da1434dc2b05a77f2e800b29628a8913.1608670965.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 24 Dec, 2020 1 commit
-
-
Committed by Jeff Dike
virtnet_set_channels can recursively call cpus_read_lock if CONFIG_XPS and CONFIG_HOTPLUG are enabled.

The path is:

virtnet_set_channels - calls get_online_cpus(), which is a trivial wrapper around cpus_read_lock()
netif_set_real_num_tx_queues
netif_reset_xps_queues_gt
netif_reset_xps_queues - calls cpus_read_lock()

This call chain and potential deadlock happens when the number of TX queues is reduced.

This commit removes the netif_set_real_num_[tr]x_queues calls from inside the get/put_online_cpus section, as they don't require that it be held.

Fixes: 47be2479 ("virtio-net: fix the set affinity bug when CPU IDs are not consecutive")
Signed-off-by: Jeff Dike <jdike@akamai.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20201223025421.671-1-jdike@akamai.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 19 Dec, 2020 1 commit
-
-
Committed by Dan Carpenter
Set a negative error code instead of returning success if the MTU has been changed to something invalid.

Fixes: fe36cbe0 ("virtio_net: clear MTU when out of range")
Reported-by: Robert Buhren <robert.buhren@sect.tu-berlin.de>
Reported-by: Felicitas Hetzelt <file@sect.tu-berlin.de>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Link: https://lore.kernel.org/r/X8pGVJSeeCdII1Ys@mwanda
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
-
- 01 Dec, 2020 1 commit
-
-
Committed by Björn Töpel
Add napi_id to the xdp_rxq_info structure, and make sure the XDP socket picks up the napi_id in the Rx path. The napi_id is used to find the corresponding NAPI structure for socket busy polling.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/bpf/20201130185205.196029-7-bjorn.topel@gmail.com
-
- 22 Oct, 2020 1 commit
-
-
Committed by Michael S. Tsirkin
This reverts commit 3618ad2a.

When control vq is not negotiated, that commit causes a crash:

  [ 72.229171] kernel BUG at drivers/net/virtio_net.c:1667!
  [ 72.230266] invalid opcode: 0000 [#1] PREEMPT SMP
  [ 72.231172] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc8-02934-g3618ad2a #1
  [ 72.231172] EIP: virtnet_send_command+0x120/0x140
  [ 72.231172] Code: 00 0f 94 c0 8b 7d f0 65 33 3d 14 00 00 00 75 1c 8d 65 f4 5b 5e 5f 5d c3 66 90 be 01 00 00 00 e9 6e ff ff ff 8d b6 00 +00 00 00 <0f> 0b e8 d9 bb 82 00 eb 17 8d b4 26 00 00 00 00 8d b4 26 00 00 00
  [ 72.231172] EAX: 0000000d EBX: f72895c0 ECX: 00000017 EDX: 00000011
  [ 72.231172] ESI: f7197800 EDI: ed69bd00 EBP: ed69bcf4 ESP: ed69bc98
  [ 72.231172] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010246
  [ 72.231172] CR0: 80050033 CR2: 00000000 CR3: 02c84000 CR4: 000406f0
  [ 72.231172] Call Trace:
  [ 72.231172] ? __virt_addr_valid+0x45/0x60
  [ 72.231172] ? ___cache_free+0x51f/0x760
  [ 72.231172] ? kobject_uevent_env+0xf4/0x560
  [ 72.231172] virtnet_set_guest_offloads+0x4d/0x80
  [ 72.231172] virtnet_set_features+0x85/0x120
  [ 72.231172] ? virtnet_set_guest_offloads+0x80/0x80
  [ 72.231172] __netdev_update_features+0x27a/0x8e0
  [ 72.231172] ? kobject_uevent+0xa/0x20
  [ 72.231172] ? netdev_register_kobject+0x12c/0x160
  [ 72.231172] register_netdevice+0x4fe/0x740
  [ 72.231172] register_netdev+0x1c/0x40
  [ 72.231172] virtnet_probe+0x728/0xb60
  [ 72.231172] ? _raw_spin_unlock+0x1d/0x40
  [ 72.231172] ? virtio_vdpa_get_status+0x1c/0x20
  [ 72.231172] virtio_dev_probe+0x1c6/0x271
  [ 72.231172] really_probe+0x195/0x2e0
  [ 72.231172] driver_probe_device+0x26/0x60
  [ 72.231172] device_driver_attach+0x49/0x60
  [ 72.231172] __driver_attach+0x46/0xc0
  [ 72.231172] ? device_driver_attach+0x60/0x60
  [ 72.231172] bus_add_driver+0x197/0x1c0
  [ 72.231172] driver_register+0x66/0xc0
  [ 72.231172] register_virtio_driver+0x1b/0x40
  [ 72.231172] virtio_net_driver_init+0x61/0x86
  [ 72.231172] ? veth_init+0x14/0x14
  [ 72.231172] do_one_initcall+0x76/0x2e4
  [ 72.231172] ? rdinit_setup+0x2a/0x2a
  [ 72.231172] do_initcalls+0xb2/0xd5
  [ 72.231172] kernel_init_freeable+0x14f/0x179
  [ 72.231172] ? rest_init+0x100/0x100
  [ 72.231172] kernel_init+0xd/0xe0
  [ 72.231172] ret_from_fork+0x1c/0x30
  [ 72.231172] Modules linked in:
  [ 72.269563] ---[ end trace a6ebc4afea0e6cb1 ]---

The reason is that virtnet_set_features now calls virtnet_set_guest_offloads unconditionally; it used to call it only when there was something to configure. If the device does not have a control vq, everything breaks.

Revert the original commit for now.

Cc: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Fixes: 3618ad2a ("virtio-net: ethtool configurable RXCSUM")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20201021142944.13615-1-mst@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
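
The failure mode described in this revert can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct mock_virtnet`, `mock_send_command()` and `mock_set_features()` are hypothetical stand-ins, not the driver's real types or functions, and the guard shown here is one plausible way to restore the "only when there is something to configure" behaviour, not necessarily how the driver was later fixed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for driver state; NOT the real struct virtnet_info. */
struct mock_virtnet {
    bool has_cvq;            /* was a control virtqueue negotiated? */
    uint64_t guest_offloads; /* offloads currently programmed */
    int commands_sent;       /* number of control commands issued */
};

/* Models sending a control command: without a control vq the real driver
 * hits a BUG; here we return an error instead. */
static int mock_send_command(struct mock_virtnet *vi, uint64_t offloads)
{
    if (!vi->has_cvq)
        return -1;
    vi->guest_offloads = offloads;
    vi->commands_sent++;
    return 0;
}

/* Only talk to the device when the requested offloads actually change. */
static int mock_set_features(struct mock_virtnet *vi, uint64_t wanted)
{
    if (wanted == vi->guest_offloads)
        return 0; /* nothing to configure, so no control command */
    return mock_send_command(vi, wanted);
}
```

With this guard, a device without a control vq is never sent a command during registration as long as the requested offloads match the current ones; an actual change still fails, which is why a real driver would also need to avoid advertising the feature in that case.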
-
- 14 Oct, 2020 1 commit
-
-
Committed by Tonghao Zhang
Allow users to configure RXCSUM separately with ethtool -K, reusing the existing virtnet_set_guest_offloads helper that configures RXCSUM for XDP. This is conditional on VIRTIO_NET_F_CTRL_GUEST_OFFLOADS. If Rx checksum is disabled, LRO should also be disabled.

Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20201012015820.62042-1-xiangxia.m.yue@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
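
The one-way dependency stated in this message (turning off Rx checksum also turns off LRO, but not the reverse) can be sketched as a feature fix-up function. The flag names `MOCK_F_RXCSUM` and `MOCK_F_LRO` and the function `mock_fix_features()` are hypothetical and chosen for illustration; they are not the kernel's NETIF_F_* definitions or the driver's actual code.

```c
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define MOCK_F_RXCSUM (1u << 0)
#define MOCK_F_LRO    (1u << 1)

/* Adjust a requested feature set so it respects the dependency:
 * LRO requires Rx checksum, but Rx checksum does not require LRO. */
static uint32_t mock_fix_features(uint32_t requested)
{
    if (!(requested & MOCK_F_RXCSUM))
        requested &= ~MOCK_F_LRO; /* no Rx checksum: drop LRO as well */
    return requested;
}
```

Requesting LRO alone yields an empty set, while requesting Rx checksum alone is left unchanged, so disabling LRO (as a bridge might) does not cost checksum offload.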
-
- 30 Sep, 2020 1 commit
-
-
Committed by Tonghao Zhang
Open vSwitch and Linux bridge disable LRO on an interface when it is added to them. Currently, disabling LRO also disables virtio-net checksum offload, which reduces forwarding performance.

Fixes: a02e8964 ("virtio-net: ethtool configurable LRO")
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 11 Sep, 2020 1 commit
-
-
Committed by Jakub Kicinski
We allow drivers to call napi_hash_del() before calling netif_napi_del() to batch RCU grace periods. This makes the API asymmetric and leaks internal implementation details. Soon we will want the grace period to protect more than just the NAPI hash table.

Restructure the API and have drivers call a new function, __netif_napi_del(), if they want to take care of RCU waits.

Note that only core was checking the return status from napi_hash_del(), so the new helper does not report whether the NAPI was actually deleted.

Some notes on driver oddness:
 - veth observed the grace period before calling netif_napi_del() but that should not matter
 - myri10ge observed normal RCU flavor
 - bnx2x and enic did not actually observe the grace period (unless they did so implicitly)
 - virtio_net and enic only unhashed Rx NAPIs

The last two points seem to indicate that the calls to napi_hash_del() were a leftover rather than an optimization. Regardless, it's easy enough to correct them.

This patch may introduce extra synchronize_net() calls for interfaces which set NAPI_STATE_NO_BUSY_POLL and depend on free_netdev() to call netif_napi_del(). This seems inevitable since we want to use RCU for netpoll dev->napi_list traversal, and almost no drivers set IFF_DISABLE_NETPOLL.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 24 Aug, 2020 1 commit
-
-
Committed by Gustavo A. R. Silva
Replace the existing /* fall through */ comments and their variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings where applicable.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
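
For readers unfamiliar with the pseudo-keyword, the following userspace sketch shows how a `fallthrough` macro replaces the comment. The macro definition is a simplified stand-in modelled on the kernel's approach and assumes a compiler that supports `__has_attribute` (recent GCC or Clang); `classify()` is a hypothetical example function.

```c
/* Simplified stand-in for the kernel macro; uses the statement attribute
 * when the compiler provides it, otherwise an empty statement. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0) /* fallthrough */
#endif

/* Hypothetical example: level 2 accumulates the level 1 score as well. */
static int classify(int level)
{
    int score = 0;

    switch (level) {
    case 2:
        score += 10;
        fallthrough; /* intentional: continue into case 1 */
    case 1:
        score += 1;
        break;
    default:
        break;
    }
    return score;
}
```

Unlike a comment, the attribute is visible to the compiler, so warnings such as `-Wimplicit-fallthrough` can distinguish intended fall-through from a missing `break`.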
-
- 05 Aug, 2020 1 commit
-
-
Committed by Michael S. Tsirkin
Speed and duplex config fields depend on VIRTIO_NET_F_SPEED_DUPLEX which, being feature bit 63 (> 31), depends on VIRTIO_F_VERSION_1. Accordingly, use LE accessors for these fields.

Reported-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
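
The point of a little-endian accessor is that the field is decoded the same way regardless of the CPU's byte order. The sketch below shows the idea in portable userspace C; `read_le32()` is a hypothetical helper written for illustration and is not the accessor the driver uses.

```c
#include <stdint.h>

/* Decode a 32-bit little-endian value from raw config bytes.
 * Works identically on little-endian and big-endian hosts because it
 * assembles the value byte by byte instead of casting the pointer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] |
           ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) |
           ((uint32_t)p[3] << 24);
}
```

For example, the bytes `10 27 00 00` decode to 10000 (0x2710), which would be a plausible link speed in Mb/s; a native-endian read of the same bytes on a big-endian guest would give a different, wrong value.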
-
- 26 Jul, 2020 1 commit
-
-
Committed by Andrii Nakryiko
Now that BPF program/link management is centralized in generic net_device code, kernel code never queries the program id from drivers, so the XDP_QUERY_PROG/XDP_QUERY_PROG_HW commands are unnecessary.

This patch removes all the implementations of those commands in the kernel, along with xdp_attachment_query().

This patch was compile-tested on allyesconfig.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-10-andriin@fb.com
-
- 02 Jun, 2020 1 commit
-
-
Committed by Lorenzo Bianconi
In order to use the standard 'xdp' prefix, rename the convert_to_xdp_frame utility routine to xdp_convert_buff_to_frame and replace all occurrences.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/6344f739be0d1a08ab2b9607584c4d5478c8c083.1590698295.git.lorenzo@kernel.org
-
- 15 May, 2020 1 commit
-
-
Committed by Jesper Dangaard Brouer
The virtio_net driver runs inside the guest OS. There are two XDP receive code paths in virtio_net, namely receive_small() and receive_mergeable(). The receive_big() function does not support XDP.

In receive_small() the frame size is available in buflen. The buffers backing these frames are allocated in add_recvbuf_small() with the same size, except for the headroom, but the tailroom has room reserved for skb_shared_info. The headroom is encoded in the ctx pointer as a value.

In receive_mergeable() the frame size is more dynamic. There are two basic cases: (1) the buffer size is based on an exponentially weighted moving average (see DECLARE_EWMA) of packet length, or (2) if virtnet_get_headroom() returns any headroom, the buffer size is PAGE_SIZE. The ctx pointer is this time used for encoding two values: the buffer len "truesize" and the headroom. In case (1), if the rx buffer size is underestimated, the packet will have been split over more buffers (num_buf info in virtio_net_hdr_mrg_rxbuf placed at the top of the buffer area). If that happens the XDP path does a xdp_linearize_page operation.

V3: Adjust frame_sz in receive_mergeable() case, spotted by Jason Wang.

The code is really hard to follow, so some hints to reviewers. The receive_mergeable() case gets frames that were allocated in add_recvbuf_mergeable(), which uses headroom=virtnet_get_headroom(), and the 'buf' ptr is advanced by this headroom. The headroom can only be 0 or VIRTIO_XDP_HEADROOM, as virtnet_get_headroom is really simple:

  static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
  {
      return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0;
  }

As frame_sz is an offset size from xdp.data_hard_start, reviewers should notice how this is calculated in receive_mergeable():

  int offset = buf - page_address(page);
  [...]
  data = page_address(xdp_page) + offset;
  xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;

The calculated offset will always be VIRTIO_XDP_HEADROOM when reaching this code. Thus, xdp.data_hard_start will be the page-start address plus vi->hdr_len. Given this, xdp.frame_sz needs to be reduced by vi->hdr_len.

IMHO a followup patch should clean up this code to make it easier to maintain and understand, but it is outside the scope of this patchset.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/bpf/158945344436.97035.9445115070189151680.stgit@firesoul
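
The offset arithmetic in this message can be checked with a small userspace model. The constants below are assumptions for illustration (a 4096-byte page and a 256-byte XDP headroom), and `struct mock_xdp_layout` and `mock_layout()` are hypothetical names; only the two formulas are taken from the commit text.

```c
#include <stddef.h>

/* Assumed values for illustration only. */
#define MOCK_PAGE_SIZE    4096u
#define MOCK_XDP_HEADROOM 256u

/* Offsets are measured from the start of the page. */
struct mock_xdp_layout {
    size_t hard_start_off; /* where data_hard_start lands */
    size_t frame_sz;       /* usable frame size from data_hard_start */
};

/* buf_off models "buf - page_address(page)"; hdr_len models vi->hdr_len. */
static struct mock_xdp_layout mock_layout(size_t buf_off, size_t hdr_len)
{
    struct mock_xdp_layout l;

    /* data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len */
    l.hard_start_off = buf_off - MOCK_XDP_HEADROOM + hdr_len;
    /* frame_sz shrinks by hdr_len so the frame still ends at page end */
    l.frame_sz = MOCK_PAGE_SIZE - hdr_len;
    return l;
}
```

With `buf_off` equal to the headroom and a 12-byte header, `hard_start_off` is 12 and `frame_sz` is 4084, so `hard_start_off + frame_sz` is exactly the page size. Without the reduction, the frame would appear to extend `hdr_len` bytes past the end of the page.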
-