提交 · 91f41f01d2195d4d059ad7f141e41d40a45e1e1c · openeuler / Kernel

29 3月, 2022 3 次提交

drivers/net/virtio_net: Added RSS hash report. · 91f41f01

由 Andrew Melnychenko 提交于 3月 28, 2022

Added features for RSS hash report.
If hash is provided - it sets to skb.
Added checks if rss and/or hash are enabled together.
Signed-off-by: NAndrew Melnychenko <andrew@daynix.com>
Link: https://lore.kernel.org/r/20220328175336.10802-4-andrew@daynix.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>

91f41f01

drivers/net/virtio_net: Added basic RSS support. · c7114b12

由 Andrew Melnychenko 提交于 3月 28, 2022

Added features for RSS.
Added initialization, RXHASH feature and ethtool ops.
By default RSS/RXHASH is disabled.
Virtio RSS "IPv6 extensions" hashes disabled.
Added ethtools ops to set key and indirection table.
Signed-off-by: NAndrew Melnychenko <andrew@daynix.com>
Link: https://lore.kernel.org/r/20220328175336.10802-3-andrew@daynix.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>

c7114b12

drivers/net/virtio_net: Fixed padded vheader to use v1 with hash. · c1ddc42d

由 Andrew Melnychenko 提交于 3月 28, 2022

The header v1 provides additional info about RSS.
Added changes to computing proper header length.
In the next patches, the header may contain RSS hash info
for the hash population.
Signed-off-by: NAndrew Melnychenko <andrew@daynix.com>
Link: https://lore.kernel.org/r/20220328175336.10802-2-andrew@daynix.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>

c1ddc42d

16 1月, 2022 1 次提交

cpumask: replace cpumask_next_* with cpumask_first_* where appropriate · 9b51d9d8

由 Yury Norov 提交于 8月 14, 2021

cpumask_first() is a more effective analogue of 'next' version if n == -1
(which means start == 0). This patch replaces 'next' with 'first' where
things look trivial.

There's no cpumask_first_zero() function, so create it.
Signed-off-by: NYury Norov <yury.norov@gmail.com>
Tested-by: NWolfram Sang <wsa+renesas@sang-engineering.com>

9b51d9d8

15 1月, 2022 1 次提交

virtio: wrap config->reset calls · d9679d00

由 Michael S. Tsirkin 提交于 10月 13, 2021

This will enable cleanups down the road.
The idea is to disable cbs, then add "flush_queued_cbs" callback
as a parameter, this way drivers can flush any work
queued after callbacks have been disabled.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20211013105226.20225-1-mst@redhat.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>

d9679d00

16 12月, 2021 1 次提交

virtio_net: fix rx_drops stat for small pkts · 053c9e18

由 Wenliang Wang 提交于 12月 16, 2021

We found the stat of rx drops for small pkts does not increment when
build_skb fail, it's not coherent with other mode's rx drops stat.
Signed-off-by: NWenliang Wang <wangwenliang.1995@bytedance.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

053c9e18

14 12月, 2021 1 次提交

bpf: Let bpf_warn_invalid_xdp_action() report more info · c8064e5b

由 Paolo Abeni 提交于 11月 30, 2021

In non trivial scenarios, the action id alone is not sufficient to
identify the program causing the warning. Before the previous patch,
the generated stack-trace pointed out at least the involved device
driver.

Let's additionally include the program name and id, and the relevant
device name.

If the user needs additional infos, he can fetch them via a kernel
probe, leveraging the arguments added here.
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NToke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/ddb96bb975cbfddb1546cf5da60e77d5100b533c.1638189075.git.pabeni@redhat.com

c8064e5b

25 11月, 2021 1 次提交

Revert "virtio-net: don't let virtio core to validate used length" · fcfb65f8

由 Michael S. Tsirkin 提交于 11月 24, 2021

This reverts commit 816625c1.

Attempts to validate length in the core did not work out.
We'll drop them, so revert the dependent changes in drivers.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

fcfb65f8

22 11月, 2021 1 次提交

ethtool: extend ringparam setting/getting API with rx_buf_len · 74624944

由 Hao Chen 提交于 11月 18, 2021

Add two new parameters kernel_ringparam and extack for
.get_ringparam and .set_ringparam to extend more ring params
through netlink.
Signed-off-by: NHao Chen <chenhao288@hisilicon.com>
Signed-off-by: NGuangbin Huang <huangguangbin2@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

74624944

17 11月, 2021 1 次提交

net: annotate accesses to queue->trans_start · 5337824f

由 Eric Dumazet 提交于 11月 16, 2021

In following patches, dev_watchdog() will no longer stop all queues.
It will read queue->trans_start locklessly.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5337824f

01 11月, 2021 2 次提交

virtio-net: don't let virtio core to validate used length · 816625c1

由 Jason Wang 提交于 10月 27, 2021

For RX virtuqueue, the used length is validated in all the three paths
(big, small and mergeable). For control vq, we never tries to use used
length. So this patch forbids the core to validate the used length.
Signed-off-by: NJason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211027022107.14357-3-jasowang@redhat.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>

816625c1

virtio_net: clarify tailroom logic · fc02e8cb

由 Michael S. Tsirkin 提交于 10月 09, 2021

Make tailroom math follow same logic as everything else, subtracing
values in the order in which things are laid out in the buffer.
Tested-by: NCorentin Noël <corentin.noel@collabora.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

fc02e8cb

28 10月, 2021 1 次提交

net: virtio: use eth_hw_addr_set() · f2edaa4a

由 Jakub Kicinski 提交于 10月 27, 2021

Commit 406f42fa ("net-next: When a bond have a massive amount
of VLANs...") introduced a rbtree for faster Ethernet address look
up. To maintain netdev->dev_addr in this tree we need to make all
the writes to it go through appropriate helpers.

Even though the current code uses dev->addr_len the we can switch
to eth_hw_addr_set() instead of dev_addr_set(). The netdev is
always allocated by alloc_etherdev_mq() and there are at least two
places which assume Ethernet address:
 - the line below calling eth_hw_addr_random()
 - virtnet_set_mac_address() -> eth_commit_mac_addr_change()
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20211027152012.3393077-1-kuba@kernel.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>

f2edaa4a

10 10月, 2021 1 次提交

virtio_net: skip RCU read lock by checking xdp_enabled of vi · 6213f07c

由 Li RongQing 提交于 10月 09, 2021

networking benchmark shows that __rcu_read_lock and
__rcu_read_unlock takes some cpu cycles, and we can avoid
calling them partially in virtio rx path by check xdp_enabled
of vi, and xdp is disabled most of time
Signed-off-by: NLi RongQing <lirongqing@baidu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6213f07c

09 10月, 2021 1 次提交

virtio-net: fix for skb_over_panic inside big mode · 732b74d6

由 Xuan Zhuo 提交于 10月 09, 2021

commit 12628565 ("Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net")
accidentally reverted the effect of
commit 1a802423 ("virtio-net: fix for skb_over_panic inside big mode")
on drivers/net/virtio_net.c

As a result, users of crosvm (which is using large packet mode)
are experiencing crashes with 5.14-rc1 and above that do not
occur with 5.13.

Crash trace:

[   61.346677] skbuff: skb_over_panic: text:ffffffff881ae2c7 len:3762 put:3762 head:ffff8a5ec8c22000 data:ffff8a5ec8c22010 tail:0xec2 end:0xec0 dev:<NULL>
[   61.369192] kernel BUG at net/core/skbuff.c:111!
[   61.372840] invalid opcode: 0000 [#1] SMP PTI
[   61.374892] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 5.14.0-rc1 linux-v5.14-rc1-for-mesa-ci.tar.bz2 #1
[   61.376450] Hardware name: ChromiumOS crosvm, BIOS 0

..

[   61.393635] Call Trace:
[   61.394127]  <IRQ>
[   61.394488]  skb_put.cold+0x10/0x10
[   61.395095]  page_to_skb+0xf7/0x410
[   61.395689]  receive_buf+0x81/0x1660
[   61.396228]  ? netif_receive_skb_list_internal+0x1ad/0x2b0
[   61.397180]  ? napi_gro_flush+0x97/0xe0
[   61.397896]  ? detach_buf_split+0x67/0x120
[   61.398573]  virtnet_poll+0x2cf/0x420
[   61.399197]  __napi_poll+0x25/0x150
[   61.399764]  net_rx_action+0x22f/0x280
[   61.400394]  __do_softirq+0xba/0x257
[   61.401012]  irq_exit_rcu+0x8e/0xb0
[   61.401618]  common_interrupt+0x7b/0xa0
[   61.402270]  </IRQ>

See
https://lore.kernel.org/r/5edaa2b7c2fe4abd0347b8454b2ac032b6694e2c.camel%40collabora.com
for the report.

Apply the original 1a802423 ("virtio-net: fix for skb_over_panic inside big mode")
again, the original logic still holds:

In virtio-net's large packet mode, there is a hole in the space behind
buf.

    hdr_padded_len - hdr_len

We must take this into account when calculating tailroom.

Cc: Greg KH <gregkh@linuxfoundation.org>
Fixes: fb32856b ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
Fixes: 12628565 ("Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net")
Signed-off-by: NXuan Zhuo <xuanzhuo@linux.alibaba.com>
Reported-by: NCorentin Noël <corentin.noel@collabora.com>
Tested-by: NCorentin Noël <corentin.noel@collabora.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

732b74d6

20 9月, 2021 1 次提交

virtio_net: introduce TX timeout watchdog · a520794b

由 Tony Lu 提交于 9月 17, 2021

This implements ndo_tx_timeout handler and put this into stats. When
there is something wrong to send out packets, we could notice tx timeout
events and total timeout counter.

We have suffered send timeout issues due to the backends hung. With this,
we can find the details, and collect the counters by monitor systems.
Signed-off-by: NTony Lu <tony.ly@linux.alibaba.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a520794b

19 9月, 2021 2 次提交

virtio_net: use netdev_warn_once to output warn when without enough queues · 9ce4e3d6

由 Xuan Zhuo 提交于 9月 18, 2021

This warning is output when virtnet does not have enough queues, but it
only needs to be printed once to inform the user of this situation. It
is not necessary to print it every time. If the user loads xdp
frequently, this log appears too much.
Signed-off-by: NXuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9ce4e3d6

virtio-net: fix pages leaking when building skb in big mode · afd92d82

由 Jason Wang 提交于 9月 17, 2021

We try to use build_skb() if we had sufficient tailroom. But we forget
to release the unused pages chained via private in big mode which will
leak pages. Fixing this by release the pages after building the skb in
big mode.

Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Fixes: fb32856b ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
Signed-off-by: NJason Wang <jasowang@redhat.com>
Reviewed-by: NXuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

afd92d82

29 8月, 2021 1 次提交

virtio_net: reduce raw_smp_processor_id() calling in virtnet_xdp_get_sq · 3dcc1edc

由 Li RongQing 提交于 8月 26, 2021

smp_processor_id()/raw* will be called once each when not
more queues in virtnet_xdp_get_sq() which is called in
non-preemptible context, so it's safe to call the function
smp_processor_id() once.
Signed-off-by: NLi RongQing <lirongqing@baidu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3dcc1edc

24 8月, 2021 1 次提交

ethtool: extend coalesce setting uAPI with CQE mode · f3ccfda1

由 Yufeng Mo 提交于 8月 20, 2021

In order to support more coalesce parameters through netlink,
add two new parameter kernel_coal and extack for .set_coalesce
and .get_coalesce, then some extra info can return to user with
the netlink API.
Signed-off-by: NYufeng Mo <moyufeng@huawei.com>
Signed-off-by: NHuazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

f3ccfda1

17 8月, 2021 1 次提交

virtio-net: use NETIF_F_GRO_HW instead of NETIF_F_LRO · dbcf24d1

由 Jason Wang 提交于 8月 17, 2021

Commit a02e8964 ("virtio-net: ethtool configurable LRO")
maps LRO to virtio guest offloading features and allows the
administrator to enable and disable those features via ethtool.

This leads to several issues:

- For a device that doesn't support control guest offloads, the "LRO"
  can't be disabled triggering WARN in dev_disable_lro() when turning
  off LRO or when enabling forwarding bridging etc.

- For a device that supports control guest offloads, the guest
  offloads are disabled in cases of bridging, forwarding etc slowing
  down the traffic.

Fix this by using NETIF_F_GRO_HW instead. Though the spec does not
guarantee packets to be re-segmented as the original ones,
we can add that to the spec, possibly with a flag for devices to
differentiate between GRO and LRO.

Further, we never advertised LRO historically before a02e8964
("virtio-net: ethtool configurable LRO") and so bridged/forwarded
configs effectively always relied on virtio receive offloads behaving
like GRO - thus even if this breaks any configs it is at least not
a regression.

Fixes: a02e8964 ("virtio-net: ethtool configurable LRO")
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Reported-by: NIvan <ivan@prestigetransportation.com>
Tested-by: NIvan <ivan@prestigetransportation.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dbcf24d1

05 8月, 2021 1 次提交

virtio_net: Replace deprecated CPU-hotplug functions. · a0d1d0f4

由 Sebastian Andrzej Siewior 提交于 8月 03, 2021

The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().

Replace deprecated CPU-hotplug functions with the official version.
The behavior remains unchanged.

Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

a0d1d0f4

03 8月, 2021 1 次提交

virtio-net: realign page_to_skb() after merges · c32325b8

由 Jakub Kicinski 提交于 8月 02, 2021

We ended up merging two versions of the same patch set:

commit 8fb7da9e ("virtio_net: get build_skb() buf by data ptr")
commit 5c37711d ("virtio-net: fix for unable to handle page fault for address")

into net, and

commit 7bf64460 ("virtio-net: get build_skb() buf by data ptr")
commit 6c66c147 ("virtio-net: fix for unable to handle page fault for address")

into net-next. Redo the merge from commit 12628565 ("Merge
ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net"), so that
the most recent code remains.
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Acked-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c32325b8

11 7月, 2021 1 次提交

virtio_net: check virtqueue_add_sgs() return value · 222722bc

由 Yunjian Wang 提交于 7月 10, 2021

As virtqueue_add_sgs() can fail, we should check the return value.

Addresses-Coverity-ID: 1464439 ("Unchecked return value")
Signed-off-by: NYunjian Wang <wangyunjian@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

222722bc

08 7月, 2021 1 次提交

virtio_net: disable cb aggressively · a7766ef1

由 Michael S. Tsirkin 提交于 4月 13, 2021

There are currently two cases where we poll TX vq not in response to a
callback: start xmit and rx napi. We currently do this with callbacks
enabled which can cause extra interrupts from the card. Used not to be
a big issue as we run with interrupts disabled but that is no longer the
case, and in some cases the rate of spurious interrupts is so high
linux detects this and actually kills the interrupt.

Fix up by disabling the callbacks before polling the tx vq.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

a7766ef1

03 7月, 2021 3 次提交

virtio_net: move txq wakeups under tx q lock · 22bc63c5

由 Michael S. Tsirkin 提交于 4月 13, 2021

We currently check num_free outside tx q lock
which is unsafe: new packets can arrive meanwhile
and there won't be space in the queue.
Thus a spurious queue wakeup causing overhead
and even packet drops.

Move the check under the lock to fix that.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

22bc63c5

virtio_net: move tx vq operation under tx queue lock · 5a2f966d

由 Michael S. Tsirkin 提交于 4月 13, 2021

It's unsafe to operate a vq from multiple threads.
Unfortunately this is exactly what we do when invoking
clean tx poll from rx napi.
Same happens with napi-tx even without the
opportunistic cleaning from the receive interrupt: that races
with processing the vq in start_xmit.

As a fix move everything that deals with the vq to under tx lock.

Fixes: b92f1e67 ("virtio-net: transmit napi")
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

5a2f966d

virtio_net: Fix error handling in virtnet_restore() · 3f2869ca

由 Xie Yongji 提交于 5月 17, 2021

Do some cleanups in virtnet_restore() when virtnet_cpu_notif_add() failed.
Signed-off-by: NXie Yongji <xieyongji@bytedance.com>
Link: https://lore.kernel.org/r/20210517084516.332-1-xieyongji@bytedance.comAcked-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>

3f2869ca

24 6月, 2021 1 次提交

virtio_net: Use virtio_find_vqs_ctx() helper · a2f7dc00

由 Xianting Tian 提交于 6月 23, 2021

virtio_find_vqs_ctx() is defined but never be called currently,
it is the right place to use it.
Signed-off-by: NXianting Tian <xianting.tian@linux.alibaba.com>
Reviewed-by: NStefano Garzarella <sgarzare@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a2f7dc00

08 6月, 2021 1 次提交

virtio_net: Remove BUG() to avoid machine dead · 85eb1389

由 Xianting Tian 提交于 6月 05, 2021

We should not directly BUG() when there is hdr error, it is
better to output a print when such error happens. Currently,
the caller of xmit_skb() already did it.
Signed-off-by: NXianting Tian <xianting.tian@linux.alibaba.com>
Reviewed-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

85eb1389

04 6月, 2021 1 次提交

virtio-net: fix for skb_over_panic inside big mode · 1a802423

由 Xuan Zhuo 提交于 6月 04, 2021

In virtio-net's large packet mode, there is a hole in the space behind
buf.

    hdr_padded_len - hdr_len

We must take this into account when calculating tailroom.

[   44.544385] skb_put.cold (net/core/skbuff.c:5254 (discriminator 1) net/core/skbuff.c:5252 (discriminator 1))
[   44.544864] page_to_skb (drivers/net/virtio_net.c:485) [   44.545361] receive_buf (drivers/net/virtio_net.c:849 drivers/net/virtio_net.c:1131)
[   44.545870] ? netif_receive_skb_list_internal (net/core/dev.c:5714)
[   44.546628] ? dev_gro_receive (net/core/dev.c:6103)
[   44.547135] ? napi_complete_done (./include/linux/list.h:35 net/core/dev.c:5867 net/core/dev.c:5862 net/core/dev.c:6565)
[   44.547672] virtnet_poll (drivers/net/virtio_net.c:1427 drivers/net/virtio_net.c:1525)
[   44.548251] __napi_poll (net/core/dev.c:6985)
[   44.548744] net_rx_action (net/core/dev.c:7054 net/core/dev.c:7139)
[   44.549264] __do_softirq (./arch/x86/include/asm/jump_label.h:19 ./include/linux/jump_label.h:200 ./include/trace/events/irq.h:142 kernel/softirq.c:560)
[   44.549762] irq_exit_rcu (kernel/softirq.c:433 kernel/softirq.c:637 kernel/softirq.c:649)
[   44.551384] common_interrupt (arch/x86/kernel/irq.c:240 (discriminator 13))
[   44.551991] ? asm_common_interrupt (./arch/x86/include/asm/idtentry.h:638)
[   44.552654] asm_common_interrupt (./arch/x86/include/asm/idtentry.h:638)

Fixes: fb32856b ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
Signed-off-by: NXuan Zhuo <xuanzhuo@linux.alibaba.com>
Reported-by: NCorentin Noël <corentin.noel@collabora.com>
Tested-by: NCorentin Noël <corentin.noel@collabora.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1a802423

02 6月, 2021 2 次提交

virtio_net: get build_skb() buf by data ptr · 8fb7da9e

由 Xuan Zhuo 提交于 6月 01, 2021

In the case of merge, the page passed into page_to_skb() may be a head
page, not the page where the current data is located. So when trying to
get the buf where the data is located, we should get buf based on
headroom instead of offset.

This patch solves this problem. But if you don't use this patch, the
original code can also run, because if the page is not the page of the
current data, the calculated tailroom will be less than 0, and will not
enter the logic of build_skb() . The significance of this patch is to
modify this logical problem, allowing more situations to use
build_skb().
Signed-off-by: NXuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8fb7da9e

virtio-net: fix for unable to handle page fault for address · 5c37711d

由 Xuan Zhuo 提交于 6月 01, 2021

In merge mode, when xdp is enabled, if the headroom of buf is smaller
than virtnet_get_headroom(), xdp_linearize_page() will be called but the
variable of "headroom" is still 0, which leads to wrong logic after
entering page_to_skb().

[   16.600944] BUG: unable to handle page fault for address: ffffecbfff7b43c8[   16.602175] #PF: supervisor read access in kernel mode
[   16.603350] #PF: error_code(0x0000) - not-present page
[   16.604200] PGD 0 P4D 0
[   16.604686] Oops: 0000 [#1] SMP PTI
[   16.605306] CPU: 4 PID: 715 Comm: sh Tainted: G    B             5.12.0+ #312
[   16.606429] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/04
[   16.608217] RIP: 0010:unmap_page_range+0x947/0xde0
[   16.609014] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065
[   16.611863] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286
[   16.612720] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359
[   16.613853] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005
[   16.614976] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030
[   16.616124] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f
[   16.617276] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000
[   16.618423] FS:  0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000
[   16.619738] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   16.620670] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0
[   16.621792] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   16.622920] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   16.624047] Call Trace:
[   16.624525]  ? release_pages+0x24d/0x730
[   16.625209]  unmap_single_vma+0xa9/0x130
[   16.625885]  unmap_vmas+0x76/0xf0
[   16.626480]  exit_mmap+0xa0/0x210
[   16.627129]  mmput+0x67/0x180
[   16.627673]  do_exit+0x3d1/0xf10
[   16.628259]  ? do_user_addr_fault+0x231/0x840
[   16.629000]  do_group_exit+0x53/0xd0
[   16.629631]  __x64_sys_exit_group+0x1d/0x20
[   16.630354]  do_syscall_64+0x3c/0x80
[   16.630988]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   16.631828] RIP: 0033:0x7f1a043d0191
[   16.632464] Code: Unable to access opcode bytes at RIP 0x7f1a043d0167.
[   16.633502] RSP: 002b:00007ffe3d993308 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[   16.634737] RAX: ffffffffffffffda RBX: 00007f1a044c9490 RCX: 00007f1a043d0191
[   16.635857] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
[   16.636986] RBP: 0000000000000000 R08: ffffffffffffff88 R09: 0000000000000001
[   16.638120] R10: 0000000000000008 R11: 0000000000000246 R12: 00007f1a044c9490
[   16.639245] R13: 0000000000000001 R14: 00007f1a044c9968 R15: 0000000000000000
[   16.640408] Modules linked in:
[   16.640958] CR2: ffffecbfff7b43c8
[   16.641557] ---[ end trace bc4891c6ce46354c ]---
[   16.642335] RIP: 0010:unmap_page_range+0x947/0xde0
[   16.643135] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065
[   16.645983] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286
[   16.646845] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359
[   16.647970] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005
[   16.649091] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030
[   16.650250] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f
[   16.651394] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000
[   16.652529] FS:  0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000
[   16.653887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   16.654841] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0
[   16.655992] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   16.657150] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   16.658290] Kernel panic - not syncing: Fatal exception
[   16.659613] Kernel Offset: disabled
[   16.660234] ---[ end Kernel panic - not syncing: Fatal exception ]---

Fixes: fb32856b ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
Signed-off-by: NXuan Zhuo <xuanzhuo@linux.alibaba.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5c37711d

01 6月, 2021 1 次提交

virtio-net: Add validation for used length · ad993a95

由 Xie Yongji 提交于 5月 31, 2021

This adds validation for used length (might come
from an untrusted device) to avoid data corruption
or loss.
Signed-off-by: NXie Yongji <xieyongji@bytedance.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Link: https://lore.kernel.org/r/20210531135852.113-1-xieyongji@bytedance.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

ad993a95

14 5月, 2021 2 次提交

virtio-net: get build_skb() buf by data ptr · 7bf64460

由 Xuan Zhuo 提交于 5月 13, 2021

In the case of merge, the page passed into page_to_skb() may be a head
page, not the page where the current data is located. So when trying to
get the buf where the data is located, you should directly use the
pointer(p) to get the address corresponding to the page.

At the same time, the offset of the data in the page should also be
obtained using offset_in_page().

This patch solves this problem. But if you don’t use this patch, the
original code can also run, because if the page is not the page of the
current data, the calculated tailroom will be less than 0, and will not
enter the logic of build_skb() . The significance of this patch is to
modify this logical problem, allowing more situations to use
build_skb().
Signed-off-by: NXuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7bf64460

virtio-net: fix for unable to handle page fault for address · 6c66c147

由 Xuan Zhuo 提交于 5月 13, 2021

In merge mode, when xdp is enabled, if the headroom of buf is smaller
than virtnet_get_headroom(), xdp_linearize_page() will be called but the
variable of "headroom" is still 0, which leads to wrong logic after
entering page_to_skb().

[   16.600944] BUG: unable to handle page fault for address: ffffecbfff7b43c8[   16.602175] #PF: supervisor read access in kernel mode
[   16.603350] #PF: error_code(0x0000) - not-present page
[   16.604200] PGD 0 P4D 0
[   16.604686] Oops: 0000 [#1] SMP PTI
[   16.605306] CPU: 4 PID: 715 Comm: sh Tainted: G    B             5.12.0+ #312
[   16.606429] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/04
[   16.608217] RIP: 0010:unmap_page_range+0x947/0xde0
[   16.609014] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065
[   16.611863] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286
[   16.612720] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359
[   16.613853] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005
[   16.614976] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030
[   16.616124] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f
[   16.617276] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000
[   16.618423] FS:  0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000
[   16.619738] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   16.620670] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0
[   16.621792] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   16.622920] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   16.624047] Call Trace:
[   16.624525]  ? release_pages+0x24d/0x730
[   16.625209]  unmap_single_vma+0xa9/0x130
[   16.625885]  unmap_vmas+0x76/0xf0
[   16.626480]  exit_mmap+0xa0/0x210
[   16.627129]  mmput+0x67/0x180
[   16.627673]  do_exit+0x3d1/0xf10
[   16.628259]  ? do_user_addr_fault+0x231/0x840
[   16.629000]  do_group_exit+0x53/0xd0
[   16.629631]  __x64_sys_exit_group+0x1d/0x20
[   16.630354]  do_syscall_64+0x3c/0x80
[   16.630988]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[   16.631828] RIP: 0033:0x7f1a043d0191
[   16.632464] Code: Unable to access opcode bytes at RIP 0x7f1a043d0167.
[   16.633502] RSP: 002b:00007ffe3d993308 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[   16.634737] RAX: ffffffffffffffda RBX: 00007f1a044c9490 RCX: 00007f1a043d0191
[   16.635857] RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
[   16.636986] RBP: 0000000000000000 R08: ffffffffffffff88 R09: 0000000000000001
[   16.638120] R10: 0000000000000008 R11: 0000000000000246 R12: 00007f1a044c9490
[   16.639245] R13: 0000000000000001 R14: 00007f1a044c9968 R15: 0000000000000000
[   16.640408] Modules linked in:
[   16.640958] CR2: ffffecbfff7b43c8
[   16.641557] ---[ end trace bc4891c6ce46354c ]---
[   16.642335] RIP: 0010:unmap_page_range+0x947/0xde0
[   16.643135] Code: 00 00 08 00 48 83 f8 01 45 19 e4 41 f7 d4 41 83 e4 03 e9 a4 fd ff ff e8 b7 63 ed ff 4c 89 e0 48 c1 e0 065
[   16.645983] RSP: 0018:ffffc90002503c58 EFLAGS: 00010286
[   16.646845] RAX: ffffecbfff7b43c0 RBX: 00007f19f7203000 RCX: ffffffff812ff359
[   16.647970] RDX: ffff888107778000 RSI: 0000000000000000 RDI: 0000000000000005
[   16.649091] RBP: ffffea000425e000 R08: 0000000000000000 R09: 3030303030303030
[   16.650250] R10: ffffffff82ed7d94 R11: 6637303030302052 R12: 7c00000afffded0f
[   16.651394] R13: 0000000000000001 R14: ffff888119ee7010 R15: 00007f19f7202000
[   16.652529] FS:  0000000000000000(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000
[   16.653887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   16.654841] CR2: ffffecbfff7b43c8 CR3: 0000000103220005 CR4: 0000000000370ee0
[   16.655992] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   16.657150] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   16.658290] Kernel panic - not syncing: Fatal exception
[   16.659613] Kernel Offset: disabled
[   16.660234] ---[ end Kernel panic - not syncing: Fatal exception ]---

Fixes: fb32856b ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
Signed-off-by: NXuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6c66c147

03 5月, 2021 1 次提交

virtio-net: don't allocate control_buf if not supported · 122b84a1

由 Max Gurtovoy 提交于 5月 02, 2021

Not all virtio_net devices support the ctrl queue feature. Thus, there
is no need to allocate unused resources.
Signed-off-by: NMax Gurtovoy <mgurtovoy@nvidia.com>
Link: https://lore.kernel.org/r/20210502093319.61313-1-mgurtovoy@nvidia.comSigned-off-by: NMichael S. Tsirkin <mst@redhat.com>

122b84a1

24 4月, 2021 1 次提交

virtio-net: fix use-after-free in skb_gro_receive · f80bd740

由 Xuan Zhuo 提交于 4月 22, 2021

When "headroom" > 0, the actual allocated memory space is the entire
page, so the address of the page should be used when passing it to
build_skb().

BUG: KASAN: use-after-free in skb_gro_receive (net/core/skbuff.c:4260)
Write of size 16 at addr ffff88811619fffc by task kworker/u9:0/534
CPU: 2 PID: 534 Comm: kworker/u9:0 Not tainted 5.12.0-rc7-custom-16372-gb150be05b806 #3382
Hardware name: QEMU MSN2700, BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Workqueue: xprtiod xs_stream_data_receive_workfn [sunrpc]
Call Trace:
 <IRQ>
dump_stack (lib/dump_stack.c:122)
print_address_description.constprop.0 (mm/kasan/report.c:233)
kasan_report.cold (mm/kasan/report.c:400 mm/kasan/report.c:416)
skb_gro_receive (net/core/skbuff.c:4260)
tcp_gro_receive (net/ipv4/tcp_offload.c:266 (discriminator 1))
tcp4_gro_receive (net/ipv4/tcp_offload.c:316)
inet_gro_receive (net/ipv4/af_inet.c:1545 (discriminator 2))
dev_gro_receive (net/core/dev.c:6075)
napi_gro_receive (net/core/dev.c:6168 net/core/dev.c:6198)
receive_buf (drivers/net/virtio_net.c:1151) virtio_net
virtnet_poll (drivers/net/virtio_net.c:1415 drivers/net/virtio_net.c:1519) virtio_net
__napi_poll (net/core/dev.c:6964)
net_rx_action (net/core/dev.c:7033 net/core/dev.c:7118)
__do_softirq (./arch/x86/include/asm/jump_label.h:25 ./include/linux/jump_label.h:200 ./include/trace/events/irq.h:142 kernel/softirq.c:346)
irq_exit_rcu (kernel/softirq.c:221 kernel/softirq.c:422 kernel/softirq.c:434)
common_interrupt (arch/x86/kernel/irq.c:240 (discriminator 14))
</IRQ>

Fixes: fb32856b ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
Signed-off-by: NXuan Zhuo <xuanzhuo@linux.alibaba.com>
Reported-by: NIdo Schimmel <idosch@nvidia.com>
Tested-by: NIdo Schimmel <idosch@nvidia.com>
Acked-by: NJason Wang <jasowang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f80bd740

21 4月, 2021 2 次提交

virtio-net: fix use-after-free in page_to_skb() · af39c8f7

由 Eric Dumazet 提交于 4月 20, 2021

KASAN/syzbot had 4 reports, one of them being:

BUG: KASAN: slab-out-of-bounds in memcpy include/linux/fortify-string.h:191 [inline]
BUG: KASAN: slab-out-of-bounds in page_to_skb+0x5cf/0xb70 drivers/net/virtio_net.c:480
Read of size 12 at addr ffff888014a5f800 by task systemd-udevd/8445

CPU: 0 PID: 8445 Comm: systemd-udevd Not tainted 5.12.0-rc8-next-20210419-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 <IRQ>
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x141/0x1d7 lib/dump_stack.c:120
 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:233
 __kasan_report mm/kasan/report.c:419 [inline]
 kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:436
 check_region_inline mm/kasan/generic.c:180 [inline]
 kasan_check_range+0x13d/0x180 mm/kasan/generic.c:186
 memcpy+0x20/0x60 mm/kasan/shadow.c:65
 memcpy include/linux/fortify-string.h:191 [inline]
 page_to_skb+0x5cf/0xb70 drivers/net/virtio_net.c:480
 receive_mergeable drivers/net/virtio_net.c:1009 [inline]
 receive_buf+0x2bc0/0x6250 drivers/net/virtio_net.c:1119
 virtnet_receive drivers/net/virtio_net.c:1411 [inline]
 virtnet_poll+0x568/0x10b0 drivers/net/virtio_net.c:1516
 __napi_poll+0xaf/0x440 net/core/dev.c:6962
 napi_poll net/core/dev.c:7029 [inline]
 net_rx_action+0x801/0xb40 net/core/dev.c:7116
 __do_softirq+0x29b/0x9fe kernel/softirq.c:559
 invoke_softirq kernel/softirq.c:433 [inline]
 __irq_exit_rcu+0x136/0x200 kernel/softirq.c:637
 irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
 common_interrupt+0xa4/0xd0 arch/x86/kernel/irq.c:240

Fixes: fb32856b ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Reported-by: NGuenter Roeck <linux@roeck-us.net>
Reported-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

af39c8f7

virtio-net: restrict build_skb() use to some arches · f5d7872a

由 Eric Dumazet 提交于 4月 20, 2021

build_skb() is supposed to be followed by
skb_reserve(skb, NET_IP_ALIGN), so that IP headers are word-aligned.
(Best practice is to reserve NET_IP_ALIGN+NET_SKB_PAD, but the NET_SKB_PAD
part is only a performance optimization if tunnel encaps are added.)

Unfortunately virtio_net has not provisioned this reserve.
We can only use build_skb() for arches where NET_IP_ALIGN == 0

We might refine this later, with enough testing.

Fixes: fb32856b ("virtio-net: page_to_skb() use build_skb when there's sufficient tailroom")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NGuenter Roeck <linux@roeck-us.net>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f5d7872a

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功